import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useRCustomEffect} from '../../../useCustomEffect'; 
export default function DataMasking(){
useRCustomEffect()
return ( <div>
<div className="page-columns page-rows-contents page-layout-article" id="quarto-content">
<main className="content" id="quarto-document-content">
<header className="quarto-title-block default" id="title-block-header">
<div className="quarto-title">
<h1 className="title">Data Masking</h1>
</div>
<div className="quarto-title-meta">
</div>
</header>
<section className="level3" id="what-is-data-masking">
<h3 className="anchored" data-anchor-id="what-is-data-masking">What is data masking?</h3>
<p><strong>Data masking</strong> is a special feature used in many dplyr functions, such as <Link to="../7-arrange"><code>arrange()</code></Link>, <Link to="../16-count-observations"><code>count()</code></Link>, <Link to="../3-filter-rows"><code>filter()</code></Link>, <Link to="../6-grouped-dataset"><code>group_by()</code></Link>, <Link to="../4-mutate-columns"><code>mutate()</code></Link>, and <Link to="../5-summarize"><code>summarise()</code></Link> (the relevant arguments of these functions are marked with <code>&lt;data-masking&gt;</code> in their R documentation). Data masking allows you to refer to variables (columns) in a data frame as if they were standalone variables (objects) in an environment, without explicitly using the <code>data$var</code> syntax, which makes data wrangling easier and reduces typing.</p>
<p>Consider the following script. It requires typing <code>starwars2$</code> repeatedly to access the column values.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb1"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a>starwars2 <span className="ot">&lt;-</span> starwars[, <span className="dv">1</span><span className="sc">:</span><span className="dv">5</span>]</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a>starwars2[starwars2<span className="sc">$</span>height <span className="sc">&gt;</span> <span className="dv">100</span> <span className="sc">&amp;</span> starwars2<span className="sc">$</span>skin_color <span className="sc">==</span> <span className="st">"gold"</span>, ]</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 1 × 5
<br/>    name  height  mass hair_color skin_color
<br/>    &lt;chr&gt;  &lt;int&gt; &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;     
<br/>  1 C-3PO    167    75 &lt;NA&gt;       gold</code></pre>
</div>
</div>
<p>Data masking makes the script more succinct and readable.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb3"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">filter</span>(height <span className="sc">&gt;</span> <span className="dv">100</span> <span className="sc">&amp;</span> skin_color <span className="sc">==</span> <span className="st">"gold"</span>)</span></code></pre></div>
</div>
<p>The above example presents two types of “variables”:</p>
<ul>
<li><p><strong>Environment-variables</strong> (e.g., <code>starwars</code> and <code>starwars2</code>) are “programming” variables that are stored in an environment (e.g., the global environment). They are created by importing an external file or by assignment via <code>&lt;-</code> (as with <code>starwars2</code>), or they exist as built-in datasets of R packages (as with <code>starwars</code>).</p></li>
<li><p><strong>Data-variables</strong> (e.g., <code>height</code> and <code>skin_color</code>) are “statistical” variables that are stored in a dataset. In the above example, data masking lets you access these variables directly by their variable (column) names, the same way you call a variable (object) from an environment.</p></li>
</ul>
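<p>To make the distinction concrete, the short sketch below mixes both kinds of variables in a single call. The object <code>min_height</code> is a hypothetical name introduced only for this illustration: it lives in the global environment, whereas <code>height</code> and <code>skin_color</code> are columns of <code>starwars2</code>.</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">library(dplyr)
<br/>
<br/>starwars2 &lt;- starwars[, 1:5]
<br/>
<br/># environment-variable: a plain object created by assignment (hypothetical name)
<br/>min_height &lt;- 100
<br/>
<br/># height and skin_color are data-variables (columns of starwars2);
<br/># min_height is not a column, so it is looked up in the environment
<br/>tall_gold &lt;- starwars2 %&gt;% filter(height &gt; min_height, skin_color == "gold")
<br/>tall_gold</code></pre></div>
</div>
<p>If a column and an environment object ever share the same name, the column takes precedence inside a data-masking function, which is one reason to keep their names distinct.</p>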
</section>
<section className="level3" id="data-masking-in-programming">
<h3 className="anchored" data-anchor-id="data-masking-in-programming">Data masking in programming</h3>
<p>While data masking makes it easier to access data-variables in routine data wrangling, it requires a bit of extra syntax when used in programming, for example inside your own functions. Two separate syntaxes are needed, depending on whether the column is supplied as a data-variable or as an environment-variable.</p>
<section className="level4" id="access-data-variable-via">
<h4 className="anchored" data-anchor-id="access-data-variable-via">Access data-variable via <strong><code>&#123;&#123; &#125;&#125;</code></strong></h4>
<p>The following script creates a function that groups a dataset by one variable and then calculates the mean of a second variable for each group. Here <strong><code>var1</code> and <code>var2</code> are treated as data-variables</strong>. Inside the function definition, they are accessed in <code>group_by()</code> and <code>summarise()</code> with <strong>doubled curly braces <code>&#123;&#123; var &#125;&#125;</code></strong>.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb4"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a>func_uniqueLevels <span className="ot">&lt;-</span> <span className="cf">function</span>(mydata, var1, var2)&#123;</span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a>  mydata <span className="sc">%&gt;%</span> </span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a>    <span className="co"># wrap var1 and var2 in '&#123;&#123; &#125;&#125;' </span></span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a>    <span className="fu">group_by</span>(&#123;&#123; var1 &#125;&#125;) <span className="sc">%&gt;%</span> </span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a>    <span className="fu">summarise</span>(<span className="at">mean =</span> <span className="fu">mean</span>(&#123;&#123; var2 &#125;&#125;, <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a>&#125;</span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a></span><br/>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a>  <span className="co"># calculate the mean height for each hair color</span></span>
<span id="cb4-10"><a aria-hidden="true" href="#cb4-10" tabindex="-1"></a>  <span className="fu">func_uniqueLevels</span>(<span className="at">var1 =</span> hair_color, <span className="at">var2 =</span> height)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 12 × 2
<br/>    hair_color     mean
<br/>    &lt;chr&gt;         &lt;dbl&gt;
<br/>  1 auburn         150 
<br/>  2 auburn, grey   180 
<br/>  3 auburn, white  182 
<br/>  4 black          174.
<br/>  # ℹ 8 more rows</code></pre>
</div>
</div>
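<p>Because the doubled curly braces forward whatever the caller wrote, the argument does not have to be a bare column name; an expression built from columns should work as well. The sketch below assumes a hypothetical helper <code>mean_by()</code> (same pattern as the function above, with illustrative argument names) and passes a unit conversion as the second variable.</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">library(dplyr)
<br/>
<br/>starwars2 &lt;- starwars[, 1:5]
<br/>
<br/># mean_by() is a hypothetical helper written for this illustration
<br/>mean_by &lt;- function(mydata, group_var, value_var) &#123;
<br/>  mydata %&gt;%
<br/>    group_by(&#123;&#123; group_var &#125;&#125;) %&gt;%
<br/>    summarise(mean = mean(&#123;&#123; value_var &#125;&#125;, na.rm = TRUE))
<br/>&#125;
<br/>
<br/># value_var receives an expression: height converted from cm to m
<br/>mean_height_m &lt;- starwars2 %&gt;%
<br/>  mean_by(group_var = skin_color, value_var = height / 100)
<br/>mean_height_m</code></pre></div>
</div>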
</section>
<section className="level4" id="access-environment-variable-via-.data">
<h4 className="anchored" data-anchor-id="access-environment-variable-via-.data">Access environment-variable via <strong><code>.data[[ ]]</code></strong></h4>
<p>A data-variable can equivalently be accessed with the <strong><code>.data[[ "var" ]]</code></strong> syntax. It consists of the fixed <code>.data</code> pronoun (a special construct, not a dataset), doubled square brackets, and the <strong>quoted</strong> name of the data-variable. For instance, the following two lines are equivalent:</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb6"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">filter</span>(height <span className="sc">&gt;</span> <span className="dv">10</span>)</span>
<span id="cb6-2"><a aria-hidden="true" href="#cb6-2" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">filter</span>(.data[[<span className="st">"height"</span>]] <span className="sc">&gt;</span> <span className="dv">10</span>)</span></code></pre></div>
</div>
<p>The same syntax can be used when defining a function that takes the column name from an environment-variable. For instance, the function <code>func_uniqueLevels</code> created earlier can be rewritten as the code below. Here <strong><code>var1</code> and <code>var2</code> are defined as environment-variables</strong> holding column names as strings (e.g., <code>var1 = "hair_color"</code>), i.e., <strong>objects available in the <em>local</em> environment of the function</strong> <code>func_uniqueLevels</code>. Therefore, they are accessed in <code>group_by()</code> and <code>summarise()</code> with the <code>.data[[var]]</code> syntax.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb7"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a>func_uniqueLevels <span className="ot">&lt;-</span> <span className="cf">function</span>(mydata, var1, var2)&#123;</span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a>  mydata <span className="sc">%&gt;%</span> </span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a>    <span className="co"># wrap var1 and var2 in .data[[...]]</span></span>
<span id="cb7-4"><a aria-hidden="true" href="#cb7-4" tabindex="-1"></a>    <span className="fu">group_by</span>(.data[[var1]] ) <span className="sc">%&gt;%</span> </span>
<span id="cb7-5"><a aria-hidden="true" href="#cb7-5" tabindex="-1"></a>    <span className="fu">summarise</span>(<span className="at">mean =</span> <span className="fu">mean</span>(.data[[var2]], <span className="at">na.rm =</span> <span className="cn">TRUE</span>))</span>
<span id="cb7-6"><a aria-hidden="true" href="#cb7-6" tabindex="-1"></a>&#125;</span>
<span id="cb7-7"><a aria-hidden="true" href="#cb7-7" tabindex="-1"></a></span><br/>
<span id="cb7-8"><a aria-hidden="true" href="#cb7-8" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> </span>
<span id="cb7-9"><a aria-hidden="true" href="#cb7-9" tabindex="-1"></a>  <span className="co"># Note! the column names are supplied as quoted strings</span></span>
<span id="cb7-10"><a aria-hidden="true" href="#cb7-10" tabindex="-1"></a>  <span className="fu">func_uniqueLevels</span>(<span className="at">var1 =</span> <span className="st">"hair_color"</span>, <span className="at">var2 =</span> <span className="st">"height"</span>)</span></code></pre></div>
</div>
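<p>The string-based form is convenient when the column names themselves are data, for example when iterating over a character vector. The following is a minimal sketch under that assumption; <code>numeric_cols</code> and <code>col_means</code> are hypothetical names chosen for this illustration.</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">library(dplyr)
<br/>
<br/>starwars2 &lt;- starwars[, 1:5]
<br/>
<br/># column names stored as strings in an environment-variable (hypothetical name)
<br/>numeric_cols &lt;- c("height", "mass")
<br/>
<br/># col holds one column name at a time; .data[[col]] retrieves that column
<br/>col_means &lt;- sapply(numeric_cols, function(col) &#123;
<br/>  starwars2 %&gt;%
<br/>    summarise(mean = mean(.data[[col]], na.rm = TRUE)) %&gt;%
<br/>    pull(mean)
<br/>&#125;)
<br/>
<br/># a numeric vector named after the columns
<br/>col_means</code></pre></div>
</div>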
</section>
</section>
<section className="level3" id="naming-of-columns">
<h3 className="anchored" data-anchor-id="naming-of-columns">Naming of columns</h3>
<p>A newly generated column can be named programmatically by using <strong><code>:=</code></strong> instead of <code>=</code>. As in the discussion above, the naming syntax depends on whether a data-variable or an environment-variable is being accessed.</p>
<section className="level4" id="name-after-data-variables-with">
<h4 className="anchored" data-anchor-id="name-after-data-variables-with">Name after data-variables with <code>&#123;&#123; &#125;&#125;</code></h4>
<p>If the name of a newly created column is derived from data-variables, wrap each variable in double curly brackets <code>&#123;&#123; &#125;&#125;</code> inside a quoted string. The function below names the new column after the two original column names, joined by a hyphen.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb8"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb8-1"><a aria-hidden="true" href="#cb8-1" tabindex="-1"></a>func.paste <span className="ot">&lt;-</span> <span className="cf">function</span>(mydata, var1, var2)&#123;</span>
<span id="cb8-2"><a aria-hidden="true" href="#cb8-2" tabindex="-1"></a>  mydata <span className="sc">%&gt;%</span> </span>
<span id="cb8-3"><a aria-hidden="true" href="#cb8-3" tabindex="-1"></a>    <span className="co"># new column name in double curly brackets</span></span>
<span id="cb8-4"><a aria-hidden="true" href="#cb8-4" tabindex="-1"></a>    <span className="fu">mutate</span>(<span className="st">"&#123;&#123;var1&#125;&#125;-&#123;&#123;var2&#125;&#125;"</span> <span className="sc">:=</span> <span className="fu">paste</span>(&#123;&#123;var1&#125;&#125;, <span className="st">"-"</span>, &#123;&#123;var2&#125;&#125;),</span>
<span id="cb8-5"><a aria-hidden="true" href="#cb8-5" tabindex="-1"></a>           <span className="at">.before =</span> <span className="dv">2</span>) <span className="co"># new column as the 2nd column</span></span>
<span id="cb8-6"><a aria-hidden="true" href="#cb8-6" tabindex="-1"></a>&#125;</span>
<span id="cb8-7"><a aria-hidden="true" href="#cb8-7" tabindex="-1"></a></span><br/>
<span id="cb8-8"><a aria-hidden="true" href="#cb8-8" tabindex="-1"></a><span className="co"># Note! The new column is output as the 2nd column</span></span>
<span id="cb8-9"><a aria-hidden="true" href="#cb8-9" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">func.paste</span>(<span className="at">var1 =</span> height, <span className="at">var2 =</span> mass)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 87 × 6
<br/>    name           `height-mass` height  mass hair_color skin_color 
<br/>    &lt;chr&gt;          &lt;chr&gt;          &lt;int&gt; &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;      
<br/>  1 Luke Skywalker 172 - 77         172    77 blond      fair       
<br/>  2 C-3PO          167 - 75         167    75 &lt;NA&gt;       gold       
<br/>  3 R2-D2          96 - 32           96    32 &lt;NA&gt;       white, blue
<br/>  4 Darth Vader    202 - 136        202   136 none       white      
<br/>  # ℹ 83 more rows</code></pre>
</div>
</div>
</section>
<section className="level4" id="name-after-environemnt-variables-using-the-glue-syntax">
<h4 className="anchored" data-anchor-id="name-after-environemnt-variables-using-the-glue-syntax">Name after environment-variables using the glue syntax <code>&#123; &#125;</code></h4>
<p>If the new column’s name is derived from an environment-variable, you can use the glue syntax (with a <em>single</em> pair of curly brackets <code>&#123; &#125;</code>). The example immediately above can be rewritten as the code below. The new column name is built from <code>var1</code> and <code>var2</code>, which are objects holding column names as strings in the <em>local</em> environment of the function.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb10"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb10-1"><a aria-hidden="true" href="#cb10-1" tabindex="-1"></a>func.paste <span className="ot">&lt;-</span> <span className="cf">function</span>(mydata, var1, var2)&#123;</span>
<span id="cb10-2"><a aria-hidden="true" href="#cb10-2" tabindex="-1"></a>  mydata <span className="sc">%&gt;%</span> </span>
<span id="cb10-3"><a aria-hidden="true" href="#cb10-3" tabindex="-1"></a>    <span className="co"># new column name in SINGLE pair of curly brackets</span></span>
<span id="cb10-4"><a aria-hidden="true" href="#cb10-4" tabindex="-1"></a>    <span className="fu">mutate</span>(<span className="st">"&#123;var1&#125;-&#123;var2&#125;"</span> <span className="sc">:=</span> <span className="fu">paste</span>(.data[[var1]], <span className="st">"-"</span>, .data[[var2]])) </span>
<span id="cb10-5"><a aria-hidden="true" href="#cb10-5" tabindex="-1"></a>&#125;</span>
<span id="cb10-6"><a aria-hidden="true" href="#cb10-6" tabindex="-1"></a></span><br/>
<span id="cb10-7"><a aria-hidden="true" href="#cb10-7" tabindex="-1"></a><span className="co"># column names supplied as quoted strings</span></span>
<span id="cb10-8"><a aria-hidden="true" href="#cb10-8" tabindex="-1"></a>starwars2 <span className="sc">%&gt;%</span> <span className="fu">func.paste</span>(<span className="at">var1 =</span> <span className="st">"height"</span>, <span className="at">var2 =</span> <span className="st">"mass"</span>)</span></code></pre></div>
</div>
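<p>The glue string can also contain fixed text around the interpolated name, which helps when the new column should carry a prefix or suffix. The sketch below assumes a hypothetical helper <code>add_log()</code> that stores the natural logarithm of a column under the name <code>log_</code> followed by the original column name.</p>
<div className="cell">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">library(dplyr)
<br/>
<br/>starwars2 &lt;- starwars[, 1:5]
<br/>
<br/># add_log() is a hypothetical helper written for this illustration
<br/>add_log &lt;- function(mydata, var) &#123;
<br/>  mydata %&gt;%
<br/>    # "log_" is fixed text; &#123;var&#125; is replaced by the string stored in var
<br/>    mutate("log_&#123;var&#125;" := log(.data[[var]]))
<br/>&#125;
<br/>
<br/># column name supplied as a quoted string; the new column is named log_mass
<br/>with_log &lt;- starwars2 %&gt;% add_log(var = "mass")
<br/>names(with_log)</code></pre></div>
</div>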
</section>
</section>
</main>
</div>
</div>
)}