import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useRCustomEffect} from '../../../useCustomEffect'; 
export default function SeparateColumns(){
useRCustomEffect()
return ( <div>
<div className="page-columns page-rows-contents page-layout-article" id="quarto-content">
<main className="content" id="quarto-document-content">
<header className="quarto-title-block default" id="title-block-header">
<div className="quarto-title">
<h1 className="title">Split a Column into Multiple Columns</h1>
</div>
<div className="quarto-title-meta">
</div>
</header>
<p><code>separate()</code>, from the <code>tidyr</code> package, splits a single character column into multiple columns based on a specified separator. This tutorial covers the following topics:</p>
<ul>
<li><a href="#basics">The basics of <code>separate()</code></a></li>
<li><a href="#unequal_length">Deal with unequal numbers of split pieces</a></li>
<li><a href="#RegEx">Use regular expressions to specify separators</a></li>
</ul>
<hr/>
<section className="level3" id="the-basics-of-separate">
<h3 className="anchored" data-anchor-id="the-basics-of-separate"><span id="basics">The basics of <code>separate()</code></span></h3>
<p><code>separate()</code> has four basic arguments, and can be summarized as <code>separate(data, col, into, sep)</code>:</p>
<ul>
<li><code>data</code>: the input data frame. It can be conveniently passed into this function using the <Link to="/R/data-wrangling/dplyr/1-pipe-operator">pipe operator <code>%&gt;%</code></Link>.</li>
<li><code>col</code> specifies the name of the column to split into separate columns.</li>
<li><code>into</code> specifies the names of new variables to create.</li>
<li><code>sep</code> specifies the separator used to split <code>col</code>, and is interpreted as a regular expression. If not specified, any sequence of non-alphanumeric characters is treated as a separator. In the example below, we use the dot as the separator. Because a dot in a regular expression is a <Link to="/R/data-wrangling/regular-expression/4-wildcards">wildcard</Link> that matches any character, we turn it into a <em>literal</em> dot by putting two backslashes before it (<code>"\\."</code>).</li>
</ul>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb1"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span className="fu">library</span>(tidyr)</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a></span><br/>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a>df <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="st">"ABC.XYZ.8701"</span>, <span className="st">"apple.juice.1765"</span>, <span className="cn">NA</span>, <span className="st">"yes.no.123"</span>))</span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a>df</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 4 × 1
<br/>  x               
<br/>  &lt;chr&gt;           
<br/>1 ABC.XYZ.8701    
<br/>2 apple.juice.1765
<br/>3 &lt;NA&gt;            
<br/>4 yes.no.123      </code></pre>
</div>
</div>
<p>Here we split the column <code>x</code> into three separate columns (<code>A</code>, <code>B</code>, <code>C</code>) based on the dot (<code>.</code>) separator.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb3"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a>df <span className="sc">%&gt;%</span> <span className="fu">separate</span>(</span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a>  <span className="at">col =</span> x,                 <span className="co"># split column "x"</span></span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a>  <span className="at">into =</span> <span className="fu">c</span>(<span className="st">"A"</span>, <span className="st">"B"</span>, <span className="st">"C"</span>), <span className="co"># create three new columns "A", "B", and "C"</span></span>
<span id="cb3-4"><a aria-hidden="true" href="#cb3-4" tabindex="-1"></a>  <span className="at">sep =</span> <span className="st">"</span><span className="sc">\\</span><span className="st">."</span>)             <span className="co"># split based on separator dot</span></span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 4 × 3
<br/>  A     B     C    
<br/>  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
<br/>1 ABC   XYZ   8701 
<br/>2 apple juice 1765 
<br/>3 &lt;NA&gt;  &lt;NA&gt;  &lt;NA&gt; 
<br/>4 yes   no    123  </code></pre>
</div>
</div>
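<p>As a brief illustrative sketch of why the escaping matters, the three calls below split the same data with an escaped dot, with a dot inside a character class, and with the default separator. The object names <code>escaped</code>, <code>bracketed</code>, and <code>by_default</code> are made up for this example, and the comparison assumes the data contains no separator other than the dot.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">library(tidyr)
<br/>library(dplyr)
<br/>
<br/># same data as in the example above
<br/>df &lt;- tibble(x = c("ABC.XYZ.8701", "apple.juice.1765", NA, "yes.no.123"))
<br/>
<br/># escaped dot: the backslashes make the dot literal
<br/>escaped &lt;- df %&gt;% separate(col = x, into = c("A", "B", "C"), sep = "\\.")
<br/>
<br/># dot inside a character class: also literal, no backslashes needed
<br/>bracketed &lt;- df %&gt;% separate(col = x, into = c("A", "B", "C"), sep = "[.]")
<br/>
<br/># default separator: any run of non-alphanumeric characters
<br/>by_default &lt;- df %&gt;% separate(col = x, into = c("A", "B", "C"))
<br/>
<br/># compare the three results
<br/>identical(escaped, bracketed)
<br/>identical(escaped, by_default)</code></pre></div>
</div>
<p>For this data, both comparisons are expected to report <code>TRUE</code>, since the dot is the only non-alphanumeric character present. An unescaped <code>sep = "."</code> would instead treat every character as a separator.</p>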
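<p>Use <code>NA</code> in <code>into</code> to omit the corresponding piece from the output. This example also leaves out <code>sep</code>, relying on the default separator (any sequence of non-alphanumeric characters).</p>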
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb5"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span className="co"># retain only the last two columns in the output</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a>df <span className="sc">%&gt;%</span> <span className="fu">separate</span>(<span className="at">col =</span> x, <span className="at">into =</span> <span className="fu">c</span>(<span className="cn">NA</span>, <span className="st">"B"</span>, <span className="st">"C"</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 4 × 2
<br/>  B     C    
<br/>  &lt;chr&gt; &lt;chr&gt;
<br/>1 XYZ   8701 
<br/>2 juice 1765 
<br/>3 &lt;NA&gt;  &lt;NA&gt; 
<br/>4 no    123  </code></pre>
</div>
</div>
<p>By default, all new columns are created as character columns. Use <code>convert = TRUE</code> to automatically convert each new column to an appropriate data type.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb7"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a>df <span className="sc">%&gt;%</span> <span className="fu">separate</span>(</span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a>  <span className="at">col =</span> x, <span className="at">into =</span> <span className="fu">c</span>(<span className="st">"A"</span>, <span className="st">"B"</span>, <span className="st">"C"</span>),</span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a>  <span className="at">convert =</span> <span className="cn">TRUE</span>) <span className="co"># column "C" is parsed as integer</span></span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 4 × 3
<br/>  A     B         C
<br/>  &lt;chr&gt; &lt;chr&gt; &lt;int&gt;
<br/>1 ABC   XYZ    8701
<br/>2 apple juice  1765
<br/>3 &lt;NA&gt;  &lt;NA&gt;     NA
<br/>4 yes   no      123</code></pre>
</div>
</div>
</section>
<section className="level3" id="deal-with-unequal-length-of-split-pieces">
<h3 className="anchored" data-anchor-id="deal-with-unequal-length-of-split-pieces"><span id="unequal_length">Deal with unequal numbers of split pieces</span></h3>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb9"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a>df2 <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="st">"A"</span>, <span className="st">"A B"</span>, <span className="st">"X Y Z"</span>))</span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a>df2</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 3 × 1
<br/>  x    
<br/>  &lt;chr&gt;
<br/>1 A    
<br/>2 A B  
<br/>3 X Y Z</code></pre>
</div>
</div>
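<p>The following script splits column <code>x</code> into two columns using the default separator, which here matches the white space between the letters. However, only the second row yields exactly two pieces: the first row yields a single piece, and the third row yields three. <code>separate()</code> therefore emits the warnings shown in the comments, fills the missing piece with <code>NA</code>, and discards the extra piece.</p>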
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb11"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a><span className="co"># Warning messages:</span></span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a><span className="co"># 1: Expected 2 pieces. Additional pieces discarded in 1 rows [3]. </span></span>
<span id="cb11-3"><a aria-hidden="true" href="#cb11-3" tabindex="-1"></a><span className="co"># 2: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].</span></span>
<span id="cb11-4"><a aria-hidden="true" href="#cb11-4" tabindex="-1"></a>df2 <span className="sc">%&gt;%</span> <span className="fu">separate</span>(<span className="at">col =</span> x, <span className="at">into =</span> <span className="fu">c</span>(<span className="st">"col_1"</span>, <span className="st">"col_2"</span>))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 3 × 2
<br/>  col_1 col_2
<br/>  &lt;chr&gt; &lt;chr&gt;
<br/>1 A     &lt;NA&gt; 
<br/>2 A     B    
<br/>3 X     Y    </code></pre>
</div>
</div>
<p>When rows yield different numbers of pieces, use the <code>fill</code> and <code>extra</code> arguments to control the output:</p>
<ul>
<li><p><code>fill</code> specifies how to handle rows with <em>not enough</em> pieces (as in the first row). It accepts three values:</p>
<ul>
<li><code>"warn"</code> (the default): emit a warning and fill with missing values on the right.</li>
<li><code>"right"</code>: fill with missing values on the right.</li>
<li><code>"left"</code>: fill with missing values on the left.</li>
</ul></li>
<li><p><code>extra</code> specifies how to handle rows with <em>too many</em> pieces (as in the third row). It accepts three values:</p>
<ul>
<li><code>"warn"</code> (the default): emit a warning and drop extra values.</li>
<li><code>"drop"</code>: drop any extra values without a warning.</li>
<li><code>"merge"</code>: split into at most <code>length(into)</code> pieces, keeping any remaining text unsplit in the last column.</li>
</ul></li>
</ul>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb13"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a>df2 <span className="sc">%&gt;%</span> </span>
<span id="cb13-2"><a aria-hidden="true" href="#cb13-2" tabindex="-1"></a>  <span className="fu">separate</span>(</span>
<span id="cb13-3"><a aria-hidden="true" href="#cb13-3" tabindex="-1"></a>    <span className="at">col =</span> x, <span className="at">into =</span> <span className="fu">c</span>(<span className="st">"col_1"</span>, <span className="st">"col_2"</span>), </span>
<span id="cb13-4"><a aria-hidden="true" href="#cb13-4" tabindex="-1"></a>    <span className="at">fill =</span> <span className="st">"right"</span>, <span className="co"># fill with NA on the right (for 1st row)</span></span>
<span id="cb13-5"><a aria-hidden="true" href="#cb13-5" tabindex="-1"></a>    <span className="at">extra =</span> <span className="st">"drop"</span>) <span className="co"># remove extra pieces (for 3rd row)</span></span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 3 × 2
<br/>  col_1 col_2
<br/>  &lt;chr&gt; &lt;chr&gt;
<br/>1 A     &lt;NA&gt; 
<br/>2 A     B    
<br/>3 X     Y    </code></pre>
</div>
</div>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb15"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb15-1"><a aria-hidden="true" href="#cb15-1" tabindex="-1"></a>df2 <span className="sc">%&gt;%</span> </span>
<span id="cb15-2"><a aria-hidden="true" href="#cb15-2" tabindex="-1"></a>  <span className="fu">separate</span>(</span>
<span id="cb15-3"><a aria-hidden="true" href="#cb15-3" tabindex="-1"></a>    <span className="at">col =</span> x, <span className="at">into =</span> <span className="fu">c</span>(<span className="st">"col_1"</span>, <span className="st">"col_2"</span>), </span>
<span id="cb15-4"><a aria-hidden="true" href="#cb15-4" tabindex="-1"></a>    <span className="at">fill =</span> <span className="st">"left"</span>,   <span className="co"># fill with NA on the left (for 1st row)</span></span>
<span id="cb15-5"><a aria-hidden="true" href="#cb15-5" tabindex="-1"></a>    <span className="at">extra =</span> <span className="st">"merge"</span>) <span className="co"># merge extra pieces (for 3rd row)</span></span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 3 × 2
<br/>  col_1 col_2
<br/>  &lt;chr&gt; &lt;chr&gt;
<br/>1 &lt;NA&gt;  A    
<br/>2 A     B    
<br/>3 X     Y Z  </code></pre>
</div>
</div>
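<p>Alternatively, keep every piece by supplying enough column names in <code>into</code> to hold the longest row. With <code>fill = "right"</code>, shorter rows are padded with <code>NA</code> on the right without a warning.</p>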
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb17"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb17-1"><a aria-hidden="true" href="#cb17-1" tabindex="-1"></a>df2 <span className="sc">%&gt;%</span> <span className="fu">separate</span>(<span className="at">col =</span> x, <span className="at">into =</span> <span className="fu">c</span>(<span className="st">"col_1"</span>, <span className="st">"col_2"</span>, <span className="st">"col_3"</span>),</span>
<span id="cb17-2"><a aria-hidden="true" href="#cb17-2" tabindex="-1"></a>                 <span className="at">fill =</span> <span className="st">"right"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 3 × 3
<br/>  col_1 col_2 col_3
<br/>  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
<br/>1 A     &lt;NA&gt;  &lt;NA&gt; 
<br/>2 A     B     &lt;NA&gt; 
<br/>3 X     Y     Z    </code></pre>
</div>
</div>
</section>
<section className="level3" id="use-regular-expression-to-specify-separators">
<h3 className="anchored" data-anchor-id="use-regular-expression-to-specify-separators"><span id="RegEx">Use regular expressions to specify separators</span></h3>
<p>You can use <Link to="/R/data-wrangling/regular-expression/0-introduction">regular expressions</Link> to split on any of several separator characters. Below, the <Link to="/R/data-wrangling/regular-expression/1-character-class">character class</Link> <code>[.?: ]</code> treats a dot, a question mark, a colon, or a white space as a valid separator. Inside a character class the dot is already literal, so it needs no backslashes.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb19"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb19-1"><a aria-hidden="true" href="#cb19-1" tabindex="-1"></a>df3 <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="cn">NA</span>, <span className="st">"x?y"</span>, <span className="st">"x.z"</span>, <span className="st">"y:z"</span>, <span className="st">"abc ABC"</span>))</span>
<span id="cb19-2"><a aria-hidden="true" href="#cb19-2" tabindex="-1"></a>df3</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 5 × 1
<br/>  x      
<br/>  &lt;chr&gt;  
<br/>1 &lt;NA&gt;   
<br/>2 x?y    
<br/>3 x.z    
<br/>4 y:z    
<br/>5 abc ABC</code></pre>
</div>
</div>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code" id="cb21"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb21-1"><a aria-hidden="true" href="#cb21-1" tabindex="-1"></a>df3 <span className="sc">%&gt;%</span> <span className="fu">separate</span>(<span className="at">col =</span> x, <span className="at">into =</span> <span className="fu">c</span>(<span className="st">"A"</span>,<span className="st">"B"</span>), <span className="at">sep =</span> <span className="st">"[.?: ]"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r"># A tibble: 5 × 2
<br/>  A     B    
<br/>  &lt;chr&gt; &lt;chr&gt;
<br/>1 &lt;NA&gt;  &lt;NA&gt; 
<br/>2 x     y    
<br/>3 x     z    
<br/>4 y     z    
<br/>5 abc   ABC  </code></pre>
</div>
</div>
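<p>As a further sketch, adding the quantifier <code>+</code> after the character class lets a single pattern absorb a run of consecutive separators, so that a string such as <code>"x: y"</code> is not split on the colon and the space separately. The data frame <code>df4</code>, its values, and the name <code>result</code> are hypothetical and only serve this illustration.</p>
<div className="cell" data-layout-align="center">
<div className="sourceCode cell-code"><pre className="sourceCode r code-with-copy"><code className="sourceCode r">library(tidyr)
<br/>library(dplyr)
<br/>
<br/># hypothetical data: each row has two or more separator characters in a row
<br/>df4 &lt;- tibble(x = c("x: y", "a..b", "m ? n"))
<br/>
<br/># "+" matches one or more consecutive separator characters as one separator
<br/>result &lt;- df4 %&gt;% separate(col = x, into = c("A", "B"), sep = "[.?: ]+")
<br/>result</code></pre></div>
</div>
<p>With this pattern, each row should split into exactly two pieces, so no <code>fill</code> or <code>extra</code> handling is needed.</p>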
</section>
</main>
</div>
</div>
)}