import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useRCustomEffect} from '../../../useCustomEffect'; 
export default function ExpandCombinationsVariablesPart1(){
useRCustomEffect()
return ( <div>
<div className="page-columns page-rows-contents page-layout-article" id="quarto-content">
<main className="content" id="quarto-document-content">
<header className="quarto-title-block default" id="title-block-header">
<div className="quarto-title">
<h1 className="title">Create All Possible Combinations of Selected Variables (1/3): basics of <code>expand()</code> and <code>nesting()</code></h1>
</div>
<div className="quarto-title-meta">
</div>
</header>
<ul>
<li><code>expand()</code> creates all possible unique combinations of the levels in the selected variables.</li>
<li><code>nesting()</code> inside <code>expand()</code> finds combinations already present in the input dataset.</li>
</ul>
<hr/>
<p>We’ll demonstrate both functions using the following dataset. Note that in this example, the <code>size</code> variable is defined to have four distinct levels, <code>XS</code>, <code>S</code>, <code>M</code> and <code>L</code>, but only the first three levels are present in the dataset.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb1"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span className="fu">library</span>(tidyr) </span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a></span><br/>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a>fruits <span className="ot">&lt;-</span> <span className="fu">tibble</span>(</span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a>  <span className="at">type =</span> <span className="fu">c</span>(<span className="st">"apple"</span>, <span className="st">"apple"</span>, <span className="st">"orange"</span>, <span className="st">"orange"</span>, <span className="st">"orange"</span>, <span className="st">"orange"</span>),</span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a>  <span className="at">year =</span> <span className="fu">rep</span>(<span className="fu">c</span>(<span className="dv">2023</span>, <span className="dv">2024</span>), <span className="at">each =</span> <span className="dv">3</span>),</span>
<span id="cb1-7"><a aria-hidden="true" href="#cb1-7" tabindex="-1"></a>  <span className="at">size =</span> <span className="fu">factor</span>(</span>
<span id="cb1-8"><a aria-hidden="true" href="#cb1-8" tabindex="-1"></a>    <span className="fu">c</span>(<span className="st">"XS"</span>, <span className="st">"S"</span>, <span className="st">"S"</span>, <span className="st">"S"</span>, <span className="st">"S"</span>, <span className="st">"M"</span>),</span>
<span id="cb1-9"><a aria-hidden="true" href="#cb1-9" tabindex="-1"></a>    <span className="at">levels =</span> <span className="fu">c</span>(<span className="st">"XS"</span>, <span className="st">"S"</span>, <span className="st">"M"</span>, <span className="st">"L"</span>)</span>
<span id="cb1-10"><a aria-hidden="true" href="#cb1-10" tabindex="-1"></a>  ),</span>
<span id="cb1-11"><a aria-hidden="true" href="#cb1-11" tabindex="-1"></a>  <span className="at">weights =</span> <span className="fu">rnorm</span>(<span className="dv">6</span>, <span className="fu">as.numeric</span>(size) <span className="sc">+</span> <span className="dv">2</span>)</span>
<span id="cb1-12"><a aria-hidden="true" href="#cb1-12" tabindex="-1"></a>)</span>
<span id="cb1-13"><a aria-hidden="true" href="#cb1-13" tabindex="-1"></a></span><br/>
<span id="cb1-14"><a aria-hidden="true" href="#cb1-14" tabindex="-1"></a>fruits</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 6 × 4
<br/>  type    year size  weights
<br/>  &lt;chr&gt;  &lt;dbl&gt; &lt;fct&gt;   &lt;dbl&gt;
<br/>1 apple   2023 XS       3.33
<br/>2 apple   2023 S        6.66
<br/>3 orange  2023 S        3.91
<br/>4 orange  2024 S        3.18
<br/>5 orange  2024 S        3.14
<br/>6 orange  2024 M        3.77</code></pre>
</div>
</div>
<p><strong><code>expand()</code> creates a new dataset that shows <em>all possible unique combinations</em> of selected variables.</strong> For instance, <code>expand(type, size)</code> creates (2 types) x (4 sizes) = 8 combinations (rows), and <code>expand(type, size, year)</code> creates (2 types) x (4 sizes) x (2 years) = 16 combinations (rows). Note that for a factor variable, the <em>full set</em> of levels is included in the combinations (e.g., the level <code>L</code> of the <code>size</code> variable, which never occurs in the data), not just the levels that appear in the data. If you want to use only the factor levels seen in the input dataset, use <code>fct_drop()</code> from the <Link to="https://forcats.tidyverse.org/index.html">forcats</Link> package to drop the unused factor levels.</p>
<div id="flex">
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb3"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a><span className="co">#</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> </span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a>  <span className="fu">expand</span>(type, size)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 8 × 2
<br/>  type   size 
<br/>  &lt;chr&gt;  &lt;fct&gt;
<br/>1 apple  XS   
<br/>2 apple  S    
<br/>3 apple  M    
<br/>4 apple  L    
<br/>5 orange XS   
<br/>6 orange S    
<br/>7 orange M    
<br/>8 orange L    </code></pre>
</div>
</div>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb5"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span className="co"># drop missing factor level 'L' of 'size' variable</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(</span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a>  type, <span className="at">size =</span> forcats<span className="sc">::</span><span className="fu">fct_drop</span>(size))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 6 × 2
<br/>  type   size 
<br/>  &lt;chr&gt;  &lt;fct&gt;
<br/>1 apple  XS   
<br/>2 apple  S    
<br/>3 apple  M    
<br/>4 orange XS   
<br/>5 orange S    
<br/>6 orange M    </code></pre>
</div>
</div>
</div>
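<p>As a quick check of the row counts described above, the sketch below counts the rows that <code>expand()</code> returns for two and for three variables. It assumes the <code>fruits</code> dataset defined earlier; <code>nrow()</code> is a base R function, used here only for illustration.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span><span className="co"># (2 types) x (4 size levels) = 8 rows</span></span>
<span>fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(type, size) <span className="sc">%&gt;%</span> <span className="fu">nrow</span>()</span>
<span></span><br/>
<span><span className="co"># (2 types) x (4 size levels) x (2 years) = 16 rows</span></span>
<span>fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(type, size, year) <span className="sc">%&gt;%</span> <span className="fu">nrow</span>()</span></code></pre></div>
</div>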
<p>You can use the helper function <strong><code>nesting()</code> inside <code>expand()</code> to include only unique combinations that <em>already appear in the input dataset</em></strong>. For instance, in the code below, the <code>size</code> level <code>L</code> is not included. Likewise, combinations such as <code>type</code> of <code>apple</code> with <code>size</code> of <code>M</code>, or <code>type</code> of <code>apple</code> with <code>year</code> of <code>2024</code>, are not present in the input dataset <code>fruits</code>, and are thus not included in the output.</p>
<div id="flex">
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb7"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> </span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a>  <span className="fu">expand</span>(<span className="fu">nesting</span>(type, size))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 4 × 2
<br/>  type   size 
<br/>  &lt;chr&gt;  &lt;fct&gt;
<br/>1 apple  XS   
<br/>2 apple  S    
<br/>3 orange S    
<br/>4 orange M    </code></pre>
</div>
</div>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb9"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> </span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a>  <span className="fu">expand</span>(<span className="fu">nesting</span>(type, size, year))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 5 × 3
<br/>  type   size   year
<br/>  &lt;chr&gt;  &lt;fct&gt; &lt;dbl&gt;
<br/>1 apple  XS     2023
<br/>2 apple  S      2023
<br/>3 orange S      2023
<br/>4 orange S      2024
<br/>5 orange M      2024</code></pre>
</div>
</div>
</div>
<p>The code above is equivalent to selecting unique combinations with <Link to="/R/data-wrangling/dplyr/8-keep-distinct-rows"><code>distinct()</code></Link> and then sorting the rows with <Link to="/R/data-wrangling/dplyr/7-arrange"><code>arrange()</code></Link> (both functions from the <Link to="/R/data-wrangling/dplyr/0-introduction">dplyr</Link> package). For instance, the following two lines produce the same result.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb11"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(<span className="fu">nesting</span>(type, size))</span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">distinct</span>(type, size) <span className="sc">%&gt;%</span> <span className="fu">arrange</span>(type, size)</span></code></pre></div>
</div>
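<p>If you want to check this equivalence yourself, one possible approach is to store both results and compare them. This is a minimal sketch that assumes the <code>fruits</code> dataset defined earlier; the names <code>by_nesting</code> and <code>by_distinct</code> are illustrative only, and <code>all.equal()</code> from base R reports whether the two tibbles match.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span><span className="co"># unique combinations found by nesting() inside expand()</span></span>
<span>by_nesting <span className="ot">&lt;-</span> fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(<span className="fu">nesting</span>(type, size))</span>
<span></span><br/>
<span><span className="co"># unique combinations found by distinct(), then sorted by arrange()</span></span>
<span>by_distinct <span className="ot">&lt;-</span> fruits <span className="sc">%&gt;%</span> <span className="fu">distinct</span>(type, size) <span className="sc">%&gt;%</span> <span className="fu">arrange</span>(type, size)</span>
<span></span><br/>
<span><span className="co"># compare the two results</span></span>
<span><span className="fu">all.equal</span>(by_nesting, by_distinct)</span></code></pre></div>
</div>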
<p>You can combine these two types of combinations: first find the unique combinations already present in the dataset with <code>nesting()</code>, and then cross them with additional variables to include all possible combinations.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb12"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb12-1"><a aria-hidden="true" href="#cb12-1" tabindex="-1"></a><span className="co"># find combinations of 'type' and 'size' already present in dataset</span></span>
<span id="cb12-2"><a aria-hidden="true" href="#cb12-2" tabindex="-1"></a><span className="co"># and then cross with 'year' including all possible combinations</span></span>
<span id="cb12-3"><a aria-hidden="true" href="#cb12-3" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(<span className="fu">nesting</span>(type, size), year)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 8 × 3
<br/>  type   size   year
<br/>  &lt;chr&gt;  &lt;fct&gt; &lt;dbl&gt;
<br/>1 apple  XS     2023
<br/>2 apple  XS     2024
<br/>3 apple  S      2023
<br/>4 apple  S      2024
<br/>5 orange S      2023
<br/>6 orange S      2024
<br/>7 orange M      2023
<br/>8 orange M      2024</code></pre>
</div>
</div>
<p>New variables that are not in the dataset can also be supplied to create additional combinations.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb14"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb14-1"><a aria-hidden="true" href="#cb14-1" tabindex="-1"></a><span className="co"># create combinations with a new variable 'store'</span></span>
<span id="cb14-2"><a aria-hidden="true" href="#cb14-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(</span>
<span id="cb14-3"><a aria-hidden="true" href="#cb14-3" tabindex="-1"></a>  <span className="fu">nesting</span>(type, size), <span className="at">store =</span> <span className="fu">c</span>(<span className="st">"Walmart"</span>, <span className="st">"Costco"</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 8 × 3
<br/>  type   size  store  
<br/>  &lt;chr&gt;  &lt;fct&gt; &lt;chr&gt;  
<br/>1 apple  XS    Costco 
<br/>2 apple  XS    Walmart
<br/>3 apple  S     Costco 
<br/>4 apple  S     Walmart
<br/>5 orange S     Costco 
<br/>6 orange S     Walmart
<br/>7 orange M     Costco 
<br/>8 orange M     Walmart</code></pre>
</div>
</div>
<hr/>
<p>Now you are familiar with the basics of <code>expand()</code> and <code>nesting()</code>. In the <Link to="../expand-combinations-variables-part2">next section</Link>, we’ll discuss how to use <code>expand()</code> in conjunction with some <Link to="/R/data-wrangling/dplyr/0-introduction">dplyr</Link> functions for further applications.</p>
</main>
</div>
</div>
)}