import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useRCustomEffect} from '../../../useCustomEffect'; 
export default function GroupedDataset(){
useRCustomEffect()
return ( <div>
<div className="page-columns page-rows-contents page-layout-article" id="quarto-content">
<main className="content" id="quarto-document-content">
<header className="quarto-title-block default" id="title-block-header">
<div className="quarto-title">
<h1 className="title">Grouped Dataset (in Rows Subset)</h1>
</div>
<div className="quarto-title-meta">
</div>
</header>
<section className="level3" id="basics-group-the-dataset-into-subset-of-rows">
<h3 className="anchored" data-anchor-id="basics-group-the-dataset-into-subset-of-rows">Basics: Group the dataset into subsets of rows</h3>
<p><strong><code>group_by()</code></strong> divides a data frame into groups (subsets of rows) based on one or more specified columns (the grouping variables), so that subsequent operations are performed separately for each group. By itself, <code>group_by()</code> does not change any values in the dataset. It only attaches grouping information, which appears as a <code># Groups:</code> line when the tibble is printed.</p>
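<p>You can confirm that grouping is metadata rather than a change to the data by inspecting a grouped tibble with dplyr's helper functions. The sketch below uses the <code>starwars</code> data that ships with dplyr. The name <code>by_species</code> is only illustrative.</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)

# Attach grouping information; the rows and columns themselves are unchanged
by_species <- starwars %>% group_by(species)

is_grouped_df(by_species)            # TRUE: the tibble now carries groups
group_vars(by_species)               # "species"
n_groups(by_species)                 # one group per distinct species (NA forms its own group)
nrow(by_species) == nrow(starwars)   # TRUE: no rows were added or dropped`}</code></pre></div>
</div>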
<p><code>group_by()</code> is often used before <code>summarise()</code> to compute summary statistics for each group separately. In the script below, compare the result without and with a grouping variable.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb1"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a></span><br/>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a><span className="co"># calculate the average height of ALL beings </span></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="at">height.mean =</span> <span className="fu">mean</span>(height, <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 1
<br/>    height.mean
<br/>          &lt;dbl&gt;
<br/>  1        175.</code></pre>
</div>
<div className="sourceCode cell-code" id="cb3"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a><span className="co"># Calculate the average height for EACH species</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a>  <span className="fu">group_by</span>(species) <span className="sc">%&gt;%</span> </span>
<span id="cb3-4"><a aria-hidden="true" href="#cb3-4" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="at">height.mean =</span> <span className="fu">mean</span>(height, <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 38 × 2
<br/>    species  height.mean
<br/>    &lt;chr&gt;          &lt;dbl&gt;
<br/>  1 Aleena            79
<br/>  2 Besalisk         198
<br/>  3 Cerean           198
<br/>  4 Chagrian         196
<br/>  # ℹ 34 more rows</code></pre>
</div>
</div>
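<p>Next to a per-group statistic, it is often useful to know how many rows each group contains. A minimal sketch follows. It uses <code>n()</code>, which returns the size of the current group inside <code>summarise()</code>. It also uses <code>count()</code>, which is roughly equivalent to <code>group_by()</code> followed by <code>summarise(n = n())</code>.</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)

# Size and mean height of each species
species_summary <- starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),                                   # number of rows in the current group
    height.mean = mean(height, na.rm = TRUE)   # NaN when every height in a group is NA
  )

# count() gives the same group sizes in one step
species_counts <- starwars %>% count(species)

species_summary
species_counts`}</code></pre></div>
</div>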
<p>You can also group the dataset by unique combinations of <em>multiple</em> variables. Note that, by default, <strong>each call to <code>summarise()</code> drops the <em>last</em> grouping variable</strong> from the grouping structure of its output (when every group is summarized to a single row). The summarized dataset below is therefore still grouped by the “species” variable. The <code>.groups</code> argument of <code>summarise()</code> lets you override this behavior.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb5"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span className="co"># Calculate mean height for each "species" - "gender" combination</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a>  <span className="fu">group_by</span>(species, gender) <span className="sc">%&gt;%</span> </span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="at">height.mean =</span> <span className="fu">mean</span>(height, <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 42 × 3
<br/>  # Groups:   species [38]
<br/>    species  gender    height.mean
<br/>    &lt;chr&gt;    &lt;chr&gt;           &lt;dbl&gt;
<br/>  1 Aleena   masculine          79
<br/>  2 Besalisk masculine         198
<br/>  3 Cerean   masculine         198
<br/>  4 Chagrian masculine         196
<br/>  # ℹ 38 more rows</code></pre>
</div>
</div>
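<p>If the default regrouping is not what you want, the <code>.groups</code> argument of <code>summarise()</code> (available since dplyr 1.0.0) controls how the output is grouped. Here is a short sketch comparing the default with <code>"drop"</code> and <code>"keep"</code>:</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)

by_species_gender <- starwars %>% group_by(species, gender)

# Default: the last grouping variable ("gender") is dropped
res_default <- by_species_gender %>%
  summarise(height.mean = mean(height, na.rm = TRUE))

# "drop": remove all grouping from the result
res_drop <- by_species_gender %>%
  summarise(height.mean = mean(height, na.rm = TRUE), .groups = "drop")

# "keep": keep the same grouping as the input
res_keep <- by_species_gender %>%
  summarise(height.mean = mean(height, na.rm = TRUE), .groups = "keep")

group_vars(res_default)  # "species"
group_vars(res_drop)     # character(0)
group_vars(res_keep)     # "species" "gender"`}</code></pre></div>
</div>
<p>Passing <code>.groups</code> explicitly also suppresses the message dplyr prints about how the output was regrouped.</p>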
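<p><code>group_by()</code> is also commonly combined with <Link to="../3-filter-rows"><code>filter()</code></Link> and <Link to="../4-mutate-columns"><code>mutate()</code></Link>. On a grouped dataset, summary functions such as <code>max()</code> inside these verbs are computed within each group rather than over the whole dataset.</p>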
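<p>To see what grouping changes in these verbs, compare the same <code>filter()</code> with and without <code>group_by()</code>. This is an illustrative sketch, and the object names are arbitrary.</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)

# Ungrouped: max() is computed over the whole column,
# so only the overall tallest being(s) are kept
overall_tallest <- starwars %>%
  filter(height == max(height, na.rm = TRUE))

# Grouped: max() is computed within each species-gender group,
# so the tallest being(s) of every group are kept.
# If every height in a group is NA, max() warns and returns -Inf,
# and that group keeps no rows.
group_tallest <- starwars %>%
  group_by(species, gender) %>%
  filter(height == max(height, na.rm = TRUE)) %>%
  ungroup()

nrow(overall_tallest)
nrow(group_tallest)`}</code></pre></div>
</div>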
<div className="cell">
<div className="sourceCode cell-code" id="cb7"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a><span className="co"># find the tallest being in each gender of each species</span></span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a><span className="co"># use "==" to test whether two values are equal</span></span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb7-4"><a aria-hidden="true" href="#cb7-4" tabindex="-1"></a>  <span className="fu">group_by</span>(species, gender) <span className="sc">%&gt;%</span> </span>
<span id="cb7-5"><a aria-hidden="true" href="#cb7-5" tabindex="-1"></a>  <span className="fu">filter</span>(height <span className="sc">==</span> <span className="fu">max</span>(height, <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 42 × 14
<br/>  # Groups:   species, gender [42]
<br/>    name      height  mass hair_color skin_color eye_color birth_year sex   gender
<br/>    &lt;chr&gt;      &lt;int&gt; &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;      &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; 
<br/>  1 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
<br/>  2 Greedo       173    74 &lt;NA&gt;       green      black           44   male  mascu…
<br/>  3 Jabba De…    175  1358 &lt;NA&gt;       green-tan… orange         600   herm… mascu…
<br/>  4 Yoda          66    17 white      green      brown          896   male  mascu…
<br/>  # ℹ 38 more rows
<br/>  # ℹ 5 more variables: homeworld &lt;chr&gt;, species &lt;chr&gt;, films &lt;list&gt;,
<br/>  #   vehicles &lt;list&gt;, starships &lt;list&gt;</code></pre>
</div>
<div className="sourceCode cell-code" id="cb9"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a><span className="co"># calculate each being's height relative to the tallest member of its species</span></span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb9-3"><a aria-hidden="true" href="#cb9-3" tabindex="-1"></a>  <span className="fu">group_by</span>(species) <span className="sc">%&gt;%</span> </span>
<span id="cb9-4"><a aria-hidden="true" href="#cb9-4" tabindex="-1"></a>  <span className="fu">mutate</span>(</span>
<span id="cb9-5"><a aria-hidden="true" href="#cb9-5" tabindex="-1"></a>    <span className="at">height.relative =</span> height <span className="sc">/</span> <span className="fu">max</span>(height, <span className="at">na.rm =</span> <span className="cn">TRUE</span>), </span>
<span id="cb9-6"><a aria-hidden="true" href="#cb9-6" tabindex="-1"></a>    <span className="co"># put the new column before "mass" column</span></span>
<span id="cb9-7"><a aria-hidden="true" href="#cb9-7" tabindex="-1"></a>    <span className="co"># </span><span className="al">NOTE</span><span className="co">: now the new column is the 3rd column in the output dataset</span></span>
<span id="cb9-8"><a aria-hidden="true" href="#cb9-8" tabindex="-1"></a>    <span className="at">.before =</span> mass) </span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 87 × 15
<br/>  # Groups:   species [38]
<br/>    name   height height.relative  mass hair_color skin_color eye_color birth_year
<br/>    &lt;chr&gt;   &lt;int&gt;           &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;      &lt;chr&gt;          &lt;dbl&gt;
<br/>  1 Luke …    172           0.851    77 blond      fair       blue            19  
<br/>  2 C-3PO     167           0.835    75 &lt;NA&gt;       gold       yellow         112  
<br/>  3 R2-D2      96           0.48     32 &lt;NA&gt;       white, bl… red             33  
<br/>  4 Darth…    202           1       136 none       white      yellow          41.9
<br/>  # ℹ 83 more rows
<br/>  # ℹ 7 more variables: sex &lt;chr&gt;, gender &lt;chr&gt;, homeworld &lt;chr&gt;, species &lt;chr&gt;,
<br/>  #   films &lt;list&gt;, vehicles &lt;list&gt;, starships &lt;list&gt;</code></pre>
</div>
</div>
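<p>In dplyr 1.1.0 and later, verbs such as <code>mutate()</code>, <code>filter()</code>, and <code>summarise()</code> also accept a <code>.by</code> argument. It groups the data for that single operation only and returns an ungrouped result. The sketch below assumes dplyr 1.1.0 or newer and reproduces the relative-height calculation above both ways:</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)  # .by requires dplyr >= 1.1.0

# Persistent grouping: the result stays grouped until ungroup()
rel_group_by <- starwars %>%
  group_by(species) %>%
  mutate(height.relative = height / max(height, na.rm = TRUE)) %>%
  ungroup()

# Per-operation grouping: the result is returned ungrouped
rel_by <- starwars %>%
  mutate(height.relative = height / max(height, na.rm = TRUE), .by = species)

is_grouped_df(rel_by)                                            # FALSE
all.equal(rel_group_by$height.relative, rel_by$height.relative)  # TRUE`}</code></pre></div>
</div>
<p>Both versions keep the original row order, so the two columns can be compared element by element.</p>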
</section>
<section className="level3" id="overwrite-add-or-remove-grouping-variables">
<h3 className="anchored" data-anchor-id="overwrite-add-or-remove-grouping-variables">Overwrite, add, or remove grouping variables</h3>
<p>By default, a new call to <code>group_by()</code> replaces the existing grouping variables.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb11"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a>  <span className="fu">group_by</span>(species) <span className="sc">%&gt;%</span> </span>
<span id="cb11-3"><a aria-hidden="true" href="#cb11-3" tabindex="-1"></a>  </span>
<span id="cb11-4"><a aria-hidden="true" href="#cb11-4" tabindex="-1"></a>  <span className="co"># replace the prior grouping variable "species"</span></span>
<span id="cb11-5"><a aria-hidden="true" href="#cb11-5" tabindex="-1"></a>  <span className="fu">group_by</span>(gender, sex) <span className="sc">%&gt;%</span> </span>
<span id="cb11-6"><a aria-hidden="true" href="#cb11-6" tabindex="-1"></a>  </span>
<span id="cb11-7"><a aria-hidden="true" href="#cb11-7" tabindex="-1"></a>  <span className="co"># calculate for each combination of "gender" and "sex"</span></span>
<span id="cb11-8"><a aria-hidden="true" href="#cb11-8" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="at">height.max =</span> <span className="fu">max</span>(height, <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 6 × 3
<br/>  # Groups:   gender [3]
<br/>    gender    sex            height.max
<br/>    &lt;chr&gt;     &lt;chr&gt;               &lt;int&gt;
<br/>  1 feminine  female                213
<br/>  2 feminine  none                   96
<br/>  3 masculine hermaphroditic        175
<br/>  4 masculine male                  264
<br/>  # ℹ 2 more rows</code></pre>
</div>
</div>
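<p>You can check which variables a dataset is currently grouped by with <code>group_vars()</code>. Here is a minimal check of the replacing behavior described above:</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)

regrouped <- starwars %>%
  group_by(species) %>%
  group_by(gender, sex)   # replaces "species"; does not add to it

group_vars(regrouped)     # "gender" "sex"`}</code></pre></div>
</div>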
<p>To keep the existing grouping variables and add new ones to them, set <strong><code>.add = TRUE</code></strong> in the subsequent <code>group_by()</code> call.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb13"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb13-2"><a aria-hidden="true" href="#cb13-2" tabindex="-1"></a>  <span className="fu">group_by</span>(species) <span className="sc">%&gt;%</span> </span>
<span id="cb13-3"><a aria-hidden="true" href="#cb13-3" tabindex="-1"></a>  </span>
<span id="cb13-4"><a aria-hidden="true" href="#cb13-4" tabindex="-1"></a>  <span className="co"># ADD another two grouping variables "gender" and "sex" </span></span>
<span id="cb13-5"><a aria-hidden="true" href="#cb13-5" tabindex="-1"></a>  <span className="fu">group_by</span>(gender, sex, <span className="at">.add =</span> <span className="cn">TRUE</span>) <span className="sc">%&gt;%</span> </span>
<span id="cb13-6"><a aria-hidden="true" href="#cb13-6" tabindex="-1"></a>  </span>
<span id="cb13-7"><a aria-hidden="true" href="#cb13-7" tabindex="-1"></a>  <span className="co"># calculate for each combination of "species", "gender", and "sex"</span></span>
<span id="cb13-8"><a aria-hidden="true" href="#cb13-8" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="at">height.max =</span> <span className="fu">max</span>(height, <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 42 × 4
<br/>  # Groups:   species, gender [42]
<br/>    species  gender    sex   height.max
<br/>    &lt;chr&gt;    &lt;chr&gt;     &lt;chr&gt;      &lt;int&gt;
<br/>  1 Aleena   masculine male          79
<br/>  2 Besalisk masculine male         198
<br/>  3 Cerean   masculine male         198
<br/>  4 Chagrian masculine male         196
<br/>  # ℹ 38 more rows</code></pre>
</div>
</div>
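<p>To confirm the combined grouping produced by <code>.add = TRUE</code>, inspect it with <code>group_vars()</code>. The new variables are appended after the existing ones. A minimal sketch:</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)

added <- starwars %>%
  group_by(species) %>%
  group_by(gender, sex, .add = TRUE)   # keeps "species" and appends the new variables

group_vars(added)   # "species" "gender" "sex"`}</code></pre></div>
</div>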
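<p>Use <strong><code>ungroup()</code></strong> to remove the grouping structure from the dataset. It is good practice to ungroup once you have finished the grouped operations, so that later steps are not unintentionally performed per group.</p>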
<div className="cell">
<div className="sourceCode cell-code" id="cb15"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb15-1"><a aria-hidden="true" href="#cb15-1" tabindex="-1"></a>starwars <span className="sc">%&gt;%</span> </span>
<span id="cb15-2"><a aria-hidden="true" href="#cb15-2" tabindex="-1"></a>  <span className="fu">group_by</span>(species, gender, sex, homeworld) <span className="sc">%&gt;%</span> </span>
<span id="cb15-3"><a aria-hidden="true" href="#cb15-3" tabindex="-1"></a>  <span className="fu">summarise</span>(<span className="at">height.max =</span> <span className="fu">max</span>(height)) <span className="sc">%&gt;%</span> </span>
<span id="cb15-4"><a aria-hidden="true" href="#cb15-4" tabindex="-1"></a>  <span className="fu">ungroup</span>() </span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 65 × 5
<br/>    species  gender    sex   homeworld   height.max
<br/>    &lt;chr&gt;    &lt;chr&gt;     &lt;chr&gt; &lt;chr&gt;            &lt;int&gt;
<br/>  1 Aleena   masculine male  Aleen Minor         79
<br/>  2 Besalisk masculine male  Ojom               198
<br/>  3 Cerean   masculine male  Cerea              198
<br/>  4 Chagrian masculine male  Champala           196
<br/>  # ℹ 61 more rows</code></pre>
</div>
</div>
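<p><code>ungroup()</code> can also remove only some of the grouping variables. In dplyr 1.0.0 and later, columns passed to <code>ungroup()</code> are dropped from the grouping while the rest are kept. A sketch:</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">{`library(dplyr)

grouped <- starwars %>% group_by(species, gender, sex)

# Remove only "sex" from the grouping; "species" and "gender" remain
partly_ungrouped <- grouped %>% ungroup(sex)
group_vars(partly_ungrouped)    # "species" "gender"

# With no arguments, ungroup() removes all grouping
fully_ungrouped <- grouped %>% ungroup()
is_grouped_df(fully_ungrouped)  # FALSE`}</code></pre></div>
</div>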
</section>
</main>
</div>
</div>
)}