import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useRCustomEffect} from '../../../useCustomEffect'; 
export default function ExpandCombinationsVariablesPart2(){
useRCustomEffect()
return ( <div>
<div className="page-columns page-rows-contents page-layout-article" id="quarto-content">
<main className="content" id="quarto-document-content">
<header className="quarto-title-block default" id="title-block-header">
<div className="quarto-title">
<h1 className="title">Create All Possible Combinations of Selected Variables (2/3): Use <code>expand()</code> with dplyr functions</h1>
</div>
<div className="quarto-title-meta">
</div>
</header>
<ul>
<li>Use <code>group_by()</code> before <code>expand()</code> to create combinations within each group.</li>
<li>Use <code>expand()</code> with <code>anti_join()</code> to find the missing combinations.</li>
<li>Use <code>expand()</code> with <code>right_join()</code> to convert implicit missing combinations into explicit missing values, which can also be done with the <code>complete()</code> function.</li>
</ul>
<hr/>
<p>In this tutorial, we’ll continue using the <code>fruits</code> dataset to demonstrate how <code>expand()</code> works in conjunction with functions from the <Link to="/R/data-wrangling/dplyr/0-introduction">dplyr</Link> package.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb1"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span className="fu">library</span>(tidyr) </span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a></span><br/>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a>fruits <span className="ot">&lt;-</span> <span className="fu">tibble</span>(</span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a>  <span className="at">type =</span> <span className="fu">c</span>(<span className="st">"apple"</span>, <span className="st">"apple"</span>, <span className="st">"orange"</span>, <span className="st">"orange"</span>, <span className="st">"orange"</span>, <span className="st">"orange"</span>),</span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a>  <span className="at">year =</span> <span className="fu">rep</span>(<span className="fu">c</span>(<span className="dv">2023</span>, <span className="dv">2024</span>), <span className="at">each =</span> <span className="dv">3</span>),</span>
<span id="cb1-7"><a aria-hidden="true" href="#cb1-7" tabindex="-1"></a>  <span className="at">size =</span> <span className="fu">factor</span>(</span>
<span id="cb1-8"><a aria-hidden="true" href="#cb1-8" tabindex="-1"></a>    <span className="fu">c</span>(<span className="st">"XS"</span>, <span className="st">"S"</span>, <span className="st">"S"</span>, <span className="st">"S"</span>, <span className="st">"S"</span>, <span className="st">"M"</span>),</span>
<span id="cb1-9"><a aria-hidden="true" href="#cb1-9" tabindex="-1"></a>    <span className="at">levels =</span> <span className="fu">c</span>(<span className="st">"XS"</span>, <span className="st">"S"</span>, <span className="st">"M"</span>, <span className="st">"L"</span>)</span>
<span id="cb1-10"><a aria-hidden="true" href="#cb1-10" tabindex="-1"></a>  ),</span>
<span id="cb1-11"><a aria-hidden="true" href="#cb1-11" tabindex="-1"></a>  <span className="at">weights =</span> <span className="fu">rnorm</span>(<span className="dv">6</span>, <span className="fu">as.numeric</span>(size) <span className="sc">+</span> <span className="dv">2</span>)</span>
<span id="cb1-12"><a aria-hidden="true" href="#cb1-12" tabindex="-1"></a>)</span>
<span id="cb1-13"><a aria-hidden="true" href="#cb1-13" tabindex="-1"></a></span><br/>
<span id="cb1-14"><a aria-hidden="true" href="#cb1-14" tabindex="-1"></a>fruits</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 6 × 4
<br/>  type    year size  weights
<br/>  &lt;chr&gt;  &lt;dbl&gt; &lt;fct&gt;   &lt;dbl&gt;
<br/>1 apple   2023 XS       3.80
<br/>2 apple   2023 S        4.41
<br/>3 orange  2023 S        3.71
<br/>4 orange  2024 S        1.90
<br/>5 orange  2024 S        5.46
<br/>6 orange  2024 M        6.82</code></pre>
</div>
</div>
<p><strong>You can use <Link to="/R/data-wrangling/dplyr/6-grouped-dataset"><code>group_by()</code></Link> before <code>expand()</code> to create combinations within each group.</strong> Within each group, only the values present in that group are combined, with one exception: for a factor variable, the full set of levels is always used, regardless of the group. For instance, when <code>size</code> is a factor, all four levels (XS, S, M, and L) are combined within each <code>type</code>; however, if <code>size</code> is a character vector (or any other non-factor type), only the sizes present in each group are used.</p>
<div id="flex">
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb3"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a><span className="co"># size as a "factor"</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> </span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a>  </span>
<span id="cb3-4"><a aria-hidden="true" href="#cb3-4" tabindex="-1"></a>  <span className="fu">group_by</span>(type) <span className="sc">%&gt;%</span> </span>
<span id="cb3-5"><a aria-hidden="true" href="#cb3-5" tabindex="-1"></a>  <span className="fu">expand</span>(year, size)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 12 × 3
<br/># Groups:   type [2]
<br/>   type    year size 
<br/>   &lt;chr&gt;  &lt;dbl&gt; &lt;fct&gt;
<br/> 1 apple   2023 XS   
<br/> 2 apple   2023 S    
<br/> 3 apple   2023 M    
<br/> 4 apple   2023 L    
<br/> 5 orange  2023 XS   
<br/> 6 orange  2023 S    
<br/> 7 orange  2023 M    
<br/> 8 orange  2023 L    
<br/> 9 orange  2024 XS   
<br/>10 orange  2024 S    
<br/>11 orange  2024 M    
<br/>12 orange  2024 L    </code></pre>
</div>
</div>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb5"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span className="co"># size as a "character"</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> </span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a>  <span className="fu">mutate</span>(<span className="at">size =</span> <span className="fu">as.character</span>(size)) <span className="sc">%&gt;%</span> </span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a>  <span className="fu">group_by</span>(type) <span className="sc">%&gt;%</span> </span>
<span id="cb5-5"><a aria-hidden="true" href="#cb5-5" tabindex="-1"></a>  <span className="fu">expand</span>(year, size)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 6 × 3
<br/># Groups:   type [2]
<br/>  type    year size 
<br/>  &lt;chr&gt;  &lt;dbl&gt; &lt;chr&gt;
<br/>1 apple   2023 S    
<br/>2 apple   2023 XS   
<br/>3 orange  2023 M    
<br/>4 orange  2023 S    
<br/>5 orange  2024 M    
<br/>6 orange  2024 S    </code></pre>
</div>
</div>
</div>
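<p>To see what grouping changes, it may help to compare the grouped call with a plain, ungrouped <code>expand()</code> over all three variables. The sketch below is an illustration added here (not part of the original example) and simply counts rows with base R’s <code>nrow()</code>. Without grouping, every <code>type</code> is crossed with every <code>year</code>, so apples also get rows for 2024 (2 types × 2 years × 4 size levels = 16 rows). With grouping, apples are only combined with 2023, the one year in which they appear, giving the 12 rows of the factor example above.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb13"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a><span className="co"># ungrouped: every type is crossed with every year and every size level</span></span>
<span id="cb13-2"><a aria-hidden="true" href="#cb13-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(type, year, size) <span className="sc">%&gt;%</span> <span className="fu">nrow</span>()  <span className="co"># 16</span></span>
<span id="cb13-3"><a aria-hidden="true" href="#cb13-3" tabindex="-1"></a></span><br/>
<span id="cb13-4"><a aria-hidden="true" href="#cb13-4" tabindex="-1"></a><span className="co"># grouped by type: only the years observed within each type are used</span></span>
<span id="cb13-5"><a aria-hidden="true" href="#cb13-5" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">group_by</span>(type) <span className="sc">%&gt;%</span> <span className="fu">expand</span>(year, size) <span className="sc">%&gt;%</span> <span className="fu">nrow</span>()  <span className="co"># 12</span></span></code></pre></div>
</div>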
<p><strong>You can use <code>expand()</code> with <Link to="/R/data-wrangling/dplyr/18-filtering-join-two-datasets"><code>anti_join()</code></Link> to figure out which combinations are missing.</strong> (Recall that <code>anti_join(A, B)</code> returns the rows of dataset A that have no match in B.) For instance, the code below first creates all possible combinations of <code>type</code>, <code>size</code>, and <code>year</code>, and then keeps only those that are not present in <code>fruits</code>.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb7"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a><span className="co"># find missing combinations (relative to all the possibilities)</span></span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a>all_combinations <span className="ot">&lt;-</span> fruits <span className="sc">%&gt;%</span> <span className="fu">expand</span>(type, size, year)</span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a>all_combinations <span className="sc">%&gt;%</span> <span className="fu">anti_join</span>(fruits) </span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 11 × 3
<br/>   type   size   year
<br/>   &lt;chr&gt;  &lt;fct&gt; &lt;dbl&gt;
<br/> 1 apple  XS     2024
<br/> 2 apple  S      2024
<br/> 3 apple  M      2023
<br/> 4 apple  M      2024
<br/> 5 apple  L      2023
<br/> 6 apple  L      2024
<br/> 7 orange XS     2023
<br/> 8 orange XS     2024
<br/> 9 orange M      2023
<br/>10 orange L      2023
<br/>11 orange L      2024</code></pre>
</div>
</div>
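<p>Called without a <code>by</code> argument, <code>anti_join()</code> matches on every column the two datasets share (here <code>type</code>, <code>size</code>, and <code>year</code>) and prints a message naming the columns it used. As a minor variation on the code above, you might prefer to spell out the key columns, which documents the intent and silences that message; the result should be the same 11 combinations.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb14"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb14-1"><a aria-hidden="true" href="#cb14-1" tabindex="-1"></a><span className="co"># same anti-join, with the key columns spelled out explicitly</span></span>
<span id="cb14-2"><a aria-hidden="true" href="#cb14-2" tabindex="-1"></a>all_combinations <span className="sc">%&gt;%</span> </span>
<span id="cb14-3"><a aria-hidden="true" href="#cb14-3" tabindex="-1"></a>  <span className="fu">anti_join</span>(fruits, <span className="at">by =</span> <span className="fu">c</span>(<span className="st">"type"</span>, <span className="st">"size"</span>, <span className="st">"year"</span>))</span></code></pre></div>
</div>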
<p><strong>You can use <code>expand()</code> with <Link to="/R/data-wrangling/dplyr/17-mutating-join-two-datasets"><code>right_join()</code></Link> to convert implicit missing combinations into explicit missing values.</strong> All rows of <code>fruits</code> are kept, and each combination absent from <code>fruits</code> is added as a new row with <code>NA</code> in the <code>weights</code> variable.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb9"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">right_join</span>(all_combinations)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 17 × 4
<br/>   type    year size  weights
<br/>   &lt;chr&gt;  &lt;dbl&gt; &lt;fct&gt;   &lt;dbl&gt;
<br/> 1 apple   2023 XS       3.80
<br/> 2 apple   2023 S        4.41
<br/> 3 orange  2023 S        3.71
<br/> 4 orange  2024 S        1.90
<br/> 5 orange  2024 S        5.46
<br/> 6 orange  2024 M        6.82
<br/> 7 apple   2024 XS      NA   
<br/> 8 apple   2024 S       NA   
<br/> 9 apple   2023 M       NA   
<br/>10 apple   2024 M       NA   
<br/>11 apple   2023 L       NA   
<br/>12 apple   2024 L       NA   
<br/>13 orange  2023 XS      NA   
<br/>14 orange  2024 XS      NA   
<br/>15 orange  2023 M       NA   
<br/>16 orange  2023 L       NA   
<br/>17 orange  2024 L       NA   </code></pre>
</div>
</div>
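<p>Because <code>fruits</code> has no missing <code>weights</code> of its own, the rows with <code>NA</code> in <code>weights</code> should be exactly the combinations reported by <code>anti_join()</code> earlier. As a quick cross-check (a sketch added for illustration, not part of the original workflow), you can keep only those rows with dplyr’s <code>filter()</code> and base R’s <code>is.na()</code>; this should return the same 11 combinations, now with a <code>weights</code> column.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb15"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb15-1"><a aria-hidden="true" href="#cb15-1" tabindex="-1"></a><span className="co"># keep only the rows added by right_join(), i.e. the previously implicit missing combinations</span></span>
<span id="cb15-2"><a aria-hidden="true" href="#cb15-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> </span>
<span id="cb15-3"><a aria-hidden="true" href="#cb15-3" tabindex="-1"></a>  <span className="fu">right_join</span>(all_combinations, <span className="at">by =</span> <span className="fu">c</span>(<span className="st">"type"</span>, <span className="st">"size"</span>, <span className="st">"year"</span>)) <span className="sc">%&gt;%</span> </span>
<span id="cb15-4"><a aria-hidden="true" href="#cb15-4" tabindex="-1"></a>  <span className="fu">filter</span>(<span className="fu">is.na</span>(weights))</span></code></pre></div>
</div>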
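<p>The code above can also be written with the <strong><code>complete()</code></strong> function, which produces the same 17 rows, except that they are sorted by <code>type</code>, <code>year</code>, and <code>size</code> instead of having the missing combinations appended at the end.</p>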
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb11"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">complete</span>(type, year, size)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 17 × 4
<br/>   type    year size  weights
<br/>   &lt;chr&gt;  &lt;dbl&gt; &lt;fct&gt;   &lt;dbl&gt;
<br/> 1 apple   2023 XS       3.80
<br/> 2 apple   2023 S        4.41
<br/> 3 apple   2023 M       NA   
<br/> 4 apple   2023 L       NA   
<br/> 5 apple   2024 XS      NA   
<br/> 6 apple   2024 S       NA   
<br/> 7 apple   2024 M       NA   
<br/> 8 apple   2024 L       NA   
<br/> 9 orange  2023 XS      NA   
<br/>10 orange  2023 S        3.71
<br/>11 orange  2023 M       NA   
<br/>12 orange  2023 L       NA   
<br/>13 orange  2024 XS      NA   
<br/>14 orange  2024 S        1.90
<br/>15 orange  2024 S        5.46
<br/>16 orange  2024 M        6.82
<br/>17 orange  2024 L       NA   </code></pre>
</div>
</div>
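<p>If <code>NA</code> is not the placeholder you want, <code>complete()</code> also accepts a <code>fill</code> argument: a named list giving, for each column, the value to use instead of <code>NA</code>. The sketch below assumes, purely for illustration, that a missing combination should be recorded as a weight of <code>0</code>; the 11 newly created rows would then show <code>0</code> rather than <code>NA</code> in <code>weights</code>.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb16"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb16-1"><a aria-hidden="true" href="#cb16-1" tabindex="-1"></a><span className="co"># fill the newly created rows with 0 instead of NA (0 is an illustrative choice)</span></span>
<span id="cb16-2"><a aria-hidden="true" href="#cb16-2" tabindex="-1"></a>fruits <span className="sc">%&gt;%</span> <span className="fu">complete</span>(type, year, size, <span className="at">fill =</span> <span className="fu">list</span>(<span className="at">weights =</span> <span className="dv">0</span>))</span></code></pre></div>
</div>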
<hr/>
<p>You have now learned how to use <code>expand()</code> to create combinations of the variables (columns) in an input dataset. In the next section, you’ll learn about a similar function, <Link to="../expand-combinations-variables-part3"><code>expand_grid()</code></Link>, which instead creates combinations from the values of <em>vectors</em>.</p>
</main>
</div>
</div>
)}