import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useRCustomEffect} from '../../../useCustomEffect'; 
export default function RepeatedOperationsOnMultipleColumn(){
useRCustomEffect()
return ( <div>
<div className="page-columns page-rows-contents page-layout-article" id="quarto-content">
<main className="content" id="quarto-document-content">
<header className="quarto-title-block default" id="title-block-header">
<div className="quarto-title">
<h1 className="title">Repeat Operations Across Multiple Columns</h1>
</div>
<div className="quarto-title-meta">
</div>
</header>
<p>In an earlier section, you learned several <Link to="../2-select-columns">selection helper functions</Link> that make column selection efficient. In this tutorial, you’ll learn another powerful function, <strong><code>across()</code></strong>, which lets you apply the same operation to multiple columns at once.</p>
<section className="level3" id="basics-of-repeated-operations-across-columns">
<h3 className="anchored" data-anchor-id="basics-of-repeated-operations-across-columns">Basics of repeated operations across columns</h3>
<p>Often you want to apply the same operation to multiple columns. In the following example, <code>n_distinct()</code> is typed out three times to count the unique entries in each of three columns: “name”, “hair_color”, and “skin_color”.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb1"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a><span className="co"># a subset of the first 5 columns</span></span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a>starwars2 <span className="ot">&lt;-</span> starwars[, <span className="dv">1</span><span className="sc">:</span><span className="dv">5</span>] </span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a></span><br/>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="at">name       =</span> <span className="fu">n_distinct</span>( name       ),</span>
<span id="cb1-7"><a aria-hidden="true" href="#cb1-7" tabindex="-1"></a>            <span className="at">hair_color =</span> <span className="fu">n_distinct</span>( hair_color ),</span>
<span id="cb1-8"><a aria-hidden="true" href="#cb1-8" tabindex="-1"></a>            <span className="at">skin_color =</span> <span className="fu">n_distinct</span>( skin_color ))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 3
<br/>     name hair_color skin_color
<br/>    &lt;int&gt;      &lt;int&gt;      &lt;int&gt;
<br/>  1    87         12         31</code></pre>
</div>
</div>
<p>The code can be simplified considerably with the <strong><code>across()</code></strong> function. Inside this function, the columns to be processed are passed to the first argument, <strong><code>.cols</code></strong>, and the function name (without parentheses) is passed to the second argument, <strong><code>.fns</code></strong>. Note that both arguments go inside the <code>across()</code> call.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb3"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="fu">across</span>(<span className="at">.cols =</span> <span className="fu">c</span>(name, hair_color, skin_color), </span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a>                   <span className="at">.fns =</span> n_distinct))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 3
<br/>     name hair_color skin_color
<br/>    &lt;int&gt;      &lt;int&gt;      &lt;int&gt;
<br/>  1    87         12         31</code></pre>
</div>
</div>
<p>You can use other selection helpers inside <code>across()</code> to select columns. (The argument names of <code>across()</code> are omitted below.)</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb5"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span className="co"># count unique values separately for each character variable</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="fu">across</span>(<span className="fu">where</span>(is.character), n_distinct))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 3
<br/>     name hair_color skin_color
<br/>    &lt;int&gt;      &lt;int&gt;      &lt;int&gt;
<br/>  1    87         12         31</code></pre>
</div>
</div>
<p>The following example attempts to calculate the mean of all numeric columns. Because both columns contain <code>NA</code> values, each mean is returned as <code>NA</code>.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb7"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="fu">across</span>(<span className="fu">where</span>(is.numeric), mean))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 2
<br/>    height  mass
<br/>     &lt;dbl&gt; &lt;dbl&gt;
<br/>  1     NA    NA</code></pre>
</div>
</div>
<p>To pass additional arguments to the <code>mean()</code> function (e.g., <code>na.rm = TRUE</code> to remove the <code>NA</code> values), use the <Link to="../2-select-columns"><strong>purrr style</strong></Link>: the tilde operator <code>~</code> creates an anonymous function of the form <code>~function(.x, ...)</code>, where <code>.x</code> stands for the current column.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb9"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a><span className="co"># use the purrr style to pass extra arguments to the mean function</span></span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb9-3"><a aria-hidden="true" href="#cb9-3" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="fu">across</span>(<span className="fu">where</span>(is.numeric), <span className="sc">~</span><span className="fu">mean</span>(.x, <span className="at">na.rm =</span> TRUE)))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 2
<br/>    height  mass
<br/>     &lt;dbl&gt; &lt;dbl&gt;
<br/>  1   175.  97.3</code></pre>
</div>
</div>
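<p>The same call can also be written with base R’s anonymous-function shorthand <code>\(x)</code> in place of the purrr-style formula. The snippet below is a minimal sketch, assuming R 4.1 or later (where the shorthand is available); it should return the same two means as the formula version.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb9b"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb9b-1"><a aria-hidden="true" href="#cb9b-1" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb9b-2"><a aria-hidden="true" href="#cb9b-2" tabindex="-1"></a>starwars2 <span className="ot">&lt;-</span> starwars[, <span className="dv">1</span><span className="sc">:</span><span className="dv">5</span>]</span>
<span id="cb9b-3"><a aria-hidden="true" href="#cb9b-3" tabindex="-1"></a></span><br/>
<span id="cb9b-4"><a aria-hidden="true" href="#cb9b-4" tabindex="-1"></a><span className="co"># base R lambda \(x) instead of the formula ~mean(.x, ...); assumes R &gt;= 4.1</span></span>
<span id="cb9b-5"><a aria-hidden="true" href="#cb9b-5" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb9b-6"><a aria-hidden="true" href="#cb9b-6" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="fu">across</span>(<span className="fu">where</span>(is.numeric), \(x) <span className="fu">mean</span>(x, <span className="at">na.rm =</span> TRUE)))</span></code></pre></div>
</div>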
<p>The argument <strong><code>.names</code></strong> specifies the names of the output columns using the <Link to="https://glue.tidyverse.org/">glue syntax</Link>. When a single function is supplied, the default is equivalent to <code>&#123;.col&#125;</code>; that is, the original column names are kept. The following example adds the suffix “_mean” to the output column names.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb11"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">summarise</span>(<span className="fu">across</span>(</span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a>  <span className="fu">where</span>(is.numeric), <span className="sc">~</span><span className="fu">mean</span>(.x, <span className="at">na.rm =</span> TRUE), </span>
<span id="cb11-3"><a aria-hidden="true" href="#cb11-3" tabindex="-1"></a>  <span className="at">.names =</span> <span className="st">"&#123;.col&#125;_mean"</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 2
<br/>    height_mean mass_mean
<br/>          &lt;dbl&gt;     &lt;dbl&gt;
<br/>  1        175.      97.3</code></pre>
</div>
</div>
<p>The grouping variables (specified inside <Link to="../6-grouped-dataset"><code>group_by</code></Link>) are <em>not</em> selected by <code>across()</code>. In the example below, unique values are counted in all character columns <em>separately for each kind of hair color</em> (the grouping variable); although “hair_color” is also a character variable, it is not included in <code>across()</code>.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb13"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb13-2"><a aria-hidden="true" href="#cb13-2" tabindex="-1"></a>  <span className="fu">group_by</span>(hair_color) <span className="sc">%&gt;%</span> </span>
<span id="cb13-3"><a aria-hidden="true" href="#cb13-3" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="fu">across</span>(<span className="fu">where</span>(is.character), n_distinct))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 12 × 3
<br/>    hair_color     name skin_color
<br/>    &lt;chr&gt;         &lt;int&gt;      &lt;int&gt;
<br/>  1 auburn            1          1
<br/>  2 auburn, grey      1          1
<br/>  3 auburn, white     1          1
<br/>  4 black            13          7
<br/>  # ℹ 8 more rows</code></pre>
</div>
</div>
</section>
<section className="level3" id="operate-multiple-functions-across-columns">
<h3 className="anchored" data-anchor-id="operate-multiple-functions-across-columns">Apply multiple functions across columns</h3>
<p>To apply multiple functions to the same set of columns, pass a list of functions to the <code>.fns</code> argument. The example below calculates the mean, the standard deviation, and the number of missing values (<code>NA</code>) in each of the numeric columns.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb15"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb15-1"><a aria-hidden="true" href="#cb15-1" tabindex="-1"></a>f <span className="ot">&lt;-</span> <span className="fu">list</span>(</span>
<span id="cb15-2"><a aria-hidden="true" href="#cb15-2" tabindex="-1"></a>  <span className="at">m =</span> <span className="sc">~</span><span className="fu">mean</span>(.x, <span className="at">na.rm =</span> TRUE), <span className="co"># calculate mean</span></span>
<span id="cb15-3"><a aria-hidden="true" href="#cb15-3" tabindex="-1"></a>  <span className="at">s =</span> <span className="sc">~</span><span className="fu">sd</span>(.x, <span className="at">na.rm =</span> TRUE),   <span className="co"># calculate standard deviation</span></span>
<span id="cb15-4"><a aria-hidden="true" href="#cb15-4" tabindex="-1"></a>  <span className="at">n =</span> <span className="sc">~</span><span className="fu">sum</span>(<span className="fu">is.na</span>(.x))          <span className="co"># count number of missing values</span></span>
<span id="cb15-5"><a aria-hidden="true" href="#cb15-5" tabindex="-1"></a>)</span>
<span id="cb15-6"><a aria-hidden="true" href="#cb15-6" tabindex="-1"></a></span><br/>
<span id="cb15-7"><a aria-hidden="true" href="#cb15-7" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">summarise</span>(<span className="fu">across</span>(<span className="fu">where</span>(is.numeric), f))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 6
<br/>    height_m height_s height_n mass_m mass_s mass_n
<br/>       &lt;dbl&gt;    &lt;dbl&gt;    &lt;int&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;int&gt;
<br/>  1     175.     34.8        6   97.3   169.     28</code></pre>
</div>
</div>
<p>When a list of functions is supplied, <code>.names</code> defaults to <code>&#123;.col&#125;_&#123;.fn&#125;</code>; that is, each new column is named by combining the original column name with the function name. The following script swaps the naming order (function name first, column name second) and connects the two with a hyphen.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb17"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb17-1"><a aria-hidden="true" href="#cb17-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">summarise</span>(</span>
<span id="cb17-2"><a aria-hidden="true" href="#cb17-2" tabindex="-1"></a>  <span className="fu">across</span>(<span className="fu">where</span>(is.numeric), f, <span className="at">.names =</span> <span className="st">"&#123;.fn&#125;-&#123;.col&#125;"</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 6
<span>    `m-height` `s-height` `n-height` `m-mass` `s-mass` `n-mass`
</span><br/>         &lt;dbl&gt;      &lt;dbl&gt;      &lt;int&gt;    &lt;dbl&gt;    &lt;dbl&gt;    &lt;int&gt;
<br/>  1       175.       34.8          6     97.3     169.       28</code></pre>
</div>
</div>
<p>By default, columns produced from the same variable are clustered together. Alternatively, you can use <Link to="../13-reorder-columns"><code>relocate()</code></Link> to cluster columns produced by the same function, using <Link to="../2-select-columns">selection helpers</Link> to select blocks of columns.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb19"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb19-1"><a aria-hidden="true" href="#cb19-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">summarise</span>(</span>
<span id="cb19-2"><a aria-hidden="true" href="#cb19-2" tabindex="-1"></a>  <span className="fu">across</span>(<span className="fu">where</span>(is.numeric), f, <span className="at">.names =</span> <span className="st">"&#123;.fn&#125;-&#123;.col&#125;"</span>)) <span className="sc">%&gt;%</span> </span>
<span id="cb19-3"><a aria-hidden="true" href="#cb19-3" tabindex="-1"></a>  <span className="co"># start with mean columns, then standard deviation columns</span></span>
<span id="cb19-4"><a aria-hidden="true" href="#cb19-4" tabindex="-1"></a>  <span className="fu">relocate</span>(<span className="fu">contains</span>(<span className="st">"m-"</span>), <span className="fu">contains</span>(<span className="st">"s-"</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 6
<span>    `m-height` `m-mass` `s-height` `s-mass` `n-height` `n-mass`
</span><br/>         &lt;dbl&gt;    &lt;dbl&gt;      &lt;dbl&gt;    &lt;dbl&gt;      &lt;int&gt;    &lt;int&gt;
<br/>  1       175.     97.3       34.8     169.          6       28</code></pre>
</div>
</div>
</section>
<section className="level3" id="use-with-other-dplyr-functions">
<h3 className="anchored" data-anchor-id="use-with-other-dplyr-functions">Use with other dplyr functions</h3>
<p>🎁 <strong><em>Bonus skill</em></strong></p>
<p><code>across()</code> works the same way with <Link to="../4-mutate-columns"><code>mutate()</code></Link> and other dplyr verbs. The following example rescales all numeric variables to the range 0 to 1.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb21"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb21-1"><a aria-hidden="true" href="#cb21-1" tabindex="-1"></a><span className="co"># define a function that scales a numeric vector to the range [0, 1]</span></span>
<span id="cb21-2"><a aria-hidden="true" href="#cb21-2" tabindex="-1"></a>rescale_0_1 <span className="ot">&lt;-</span> <span className="cf">function</span>(x) &#123;</span>
<span id="cb21-3"><a aria-hidden="true" href="#cb21-3" tabindex="-1"></a>  minim <span className="ot">&lt;-</span> <span className="fu">min</span>(x, <span className="at">na.rm =</span> TRUE)</span>
<span id="cb21-4"><a aria-hidden="true" href="#cb21-4" tabindex="-1"></a>  maxim <span className="ot">&lt;-</span> <span className="fu">max</span>(x, <span className="at">na.rm =</span> TRUE)</span>
<span id="cb21-5"><a aria-hidden="true" href="#cb21-5" tabindex="-1"></a>  (x <span className="sc">-</span> minim) <span className="sc">/</span> (maxim <span className="sc">-</span> minim)</span>
<span id="cb21-6"><a aria-hidden="true" href="#cb21-6" tabindex="-1"></a>&#125;</span>
<span id="cb21-7"><a aria-hidden="true" href="#cb21-7" tabindex="-1"></a></span><br/>
<span id="cb21-8"><a aria-hidden="true" href="#cb21-8" tabindex="-1"></a><span className="co"># overwrite the original columns with scaled values</span></span>
<span id="cb21-9"><a aria-hidden="true" href="#cb21-9" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">mutate</span>(<span className="fu">across</span>(<span className="fu">where</span>(is.numeric), rescale_0_1))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 87 × 5
<br/>    name           height   mass hair_color skin_color 
<br/>    &lt;chr&gt;           &lt;dbl&gt;  &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;      
<br/>  1 Luke Skywalker  0.535 0.0462 blond      fair       
<br/>  2 C-3PO           0.510 0.0447 &lt;NA&gt;       gold       
<br/>  3 R2-D2           0.152 0.0127 &lt;NA&gt;       white, blue
<br/>  4 Darth Vader     0.687 0.0901 none       white      
<br/>  # ℹ 83 more rows</code></pre>
</div>
</div>
<div className="cell">
<div className="sourceCode cell-code" id="cb23"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb23-1"><a aria-hidden="true" href="#cb23-1" tabindex="-1"></a><span className="co"># keep original columns,</span></span>
<span id="cb23-2"><a aria-hidden="true" href="#cb23-2" tabindex="-1"></a><span className="co"># include scaled values as new columns with suffix "_scaled"</span></span>
<span id="cb23-3"><a aria-hidden="true" href="#cb23-3" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">mutate</span>(<span className="fu">across</span>(</span>
<span id="cb23-4"><a aria-hidden="true" href="#cb23-4" tabindex="-1"></a>  <span className="fu">where</span>(is.numeric), rescale_0_1, <span className="at">.names =</span> <span className="st">"&#123;.col&#125;_scaled"</span>),</span>
<span id="cb23-5"><a aria-hidden="true" href="#cb23-5" tabindex="-1"></a>  <span className="co"># put new columns after the "mass" column</span></span>
<span id="cb23-6"><a aria-hidden="true" href="#cb23-6" tabindex="-1"></a>  <span className="at">.after =</span> mass)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 87 × 7
<br/>    name           height  mass height_scaled mass_scaled hair_color skin_color 
<br/>    &lt;chr&gt;           &lt;int&gt; &lt;dbl&gt;         &lt;dbl&gt;       &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;      
<br/>  1 Luke Skywalker    172    77         0.535      0.0462 blond      fair       
<br/>  2 C-3PO             167    75         0.510      0.0447 &lt;NA&gt;       gold       
<br/>  3 R2-D2              96    32         0.152      0.0127 &lt;NA&gt;       white, blue
<br/>  4 Darth Vader       202   136         0.687      0.0901 none       white      
<br/>  # ℹ 83 more rows</code></pre>
</div>
</div>
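<p>To see what <code>rescale_0_1()</code> does in isolation, it can help to apply it to a small vector. The sketch below repeats the function definition so that it runs on its own; the input values are made up for illustration and do not come from the starwars data. The smallest value maps to 0, the largest to 1, and <code>NA</code> stays <code>NA</code>.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb23b"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb23b-1"><a aria-hidden="true" href="#cb23b-1" tabindex="-1"></a><span className="co"># same definition as above, repeated so the snippet is self-contained</span></span>
<span id="cb23b-2"><a aria-hidden="true" href="#cb23b-2" tabindex="-1"></a>rescale_0_1 <span className="ot">&lt;-</span> <span className="cf">function</span>(x) &#123;</span>
<span id="cb23b-3"><a aria-hidden="true" href="#cb23b-3" tabindex="-1"></a>  minim <span className="ot">&lt;-</span> <span className="fu">min</span>(x, <span className="at">na.rm =</span> TRUE)</span>
<span id="cb23b-4"><a aria-hidden="true" href="#cb23b-4" tabindex="-1"></a>  maxim <span className="ot">&lt;-</span> <span className="fu">max</span>(x, <span className="at">na.rm =</span> TRUE)</span>
<span id="cb23b-5"><a aria-hidden="true" href="#cb23b-5" tabindex="-1"></a>  (x <span className="sc">-</span> minim) <span className="sc">/</span> (maxim <span className="sc">-</span> minim)</span>
<span id="cb23b-6"><a aria-hidden="true" href="#cb23b-6" tabindex="-1"></a>&#125;</span>
<span id="cb23b-7"><a aria-hidden="true" href="#cb23b-7" tabindex="-1"></a></span><br/>
<span id="cb23b-8"><a aria-hidden="true" href="#cb23b-8" tabindex="-1"></a><span className="co"># illustrative input (hypothetical values): returns 0, 0.5, 1, and NA</span></span>
<span id="cb23b-9"><a aria-hidden="true" href="#cb23b-9" tabindex="-1"></a><span className="fu">rescale_0_1</span>(<span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">15</span>, <span className="dv">20</span>, NA))</span></code></pre></div>
</div>
<p>Here 15 lies halfway between the minimum (10) and the maximum (20), so it is scaled to 0.5.</p>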
</section>
</main>
</div>
</div>
)}