import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useRCustomEffect} from '../../../useCustomEffect'; 
export default function MutatingJoinTwoDatasets(){
useRCustomEffect()
return ( <div>
<div className="page-columns page-rows-contents page-layout-article" id="quarto-content">
<main className="content" id="quarto-document-content">
<header className="quarto-title-block default" id="title-block-header">
<div className="quarto-title">
<h1 className="title">Merge Two Datasets with Mutating Join</h1>
</div>
<div className="quarto-title-meta">
</div>
</header>
<p><strong>Mutating join</strong> is one of the most common types of data merge, and a critical skill to master in data wrangling. This tutorial covers the technique in detail:</p>
<ul>
<li><a href="#basics">Basics: four types of mutating join</a></li>
<li><a href="#multiple_match">Join with matches in multiple rows</a></li>
<li><a href="#pecified_condition">Join with specified condition using <code>join_by()</code></a></li>
</ul>
<hr/>
<section className="level3" id="basics">
<h3 className="anchored" data-anchor-id="basics">Basics: four types of mutating join</h3>
<p><strong>Mutating join</strong> combines the columns of two datasets, matching rows (observations) based on shared variables (the <em>key</em> columns). It follows the basic syntax <code>*_join(A, B, by = "col_name")</code>, where A and B are the two datasets and <code>by</code> gives the names (in quotes) of the key columns on which the join is based. The output combines all columns of A and B, with the columns of A displayed first, followed by the columns of B.</p>
<p>The star <code>*</code> represents four types of mutating join, depending on how matched and unmatched rows are kept in the output:</p>
<ul>
<li><a href="#inner_join"><code>inner_join()</code></a> returns matched rows found in both A and B (an intersection).</li>
<li><a href="#left_join"><code>left_join()</code></a> returns all rows of A.</li>
<li><a href="#right_join"><code>right_join()</code></a> returns all rows of B.</li>
<li><a href="#full_join"><code>full_join()</code></a> returns all rows of A and B (a union).</li>
</ul>
<p>The last three joins are collectively known as outer joins.</p>
<p>The demonstrations below use the pipe operator in the form <code>A %&gt;% *_join(B, by)</code>, and two datasets built into dplyr, <code>band_instruments</code> and <code>band_members</code>.</p>
<div id="flex">
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb1"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a>band_instruments</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>    name  plays 
<br/>    &lt;chr&gt; &lt;chr&gt; 
<br/>  1 John  guitar
<br/>  2 Paul  bass  
<br/>  3 Keith guitar</code></pre>
</div>
</div>
</div>
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb3"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a>band_members</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>    name  band   
<br/>    &lt;chr&gt; &lt;chr&gt;  
<br/>  1 Mick  Stones 
<br/>  2 John  Beatles
<br/>  3 Paul  Beatles</code></pre>
</div>
</div>
</div>
</div>
<section className="level4" id="inner_join">
<h4 className="anchored" data-anchor-id="inner_join"></h4>
<p>Use <strong><code>inner_join()</code></strong> to return only the rows found in <em>both</em> datasets. When each key value appears at most once in each dataset, the merged dataset has no more rows than either A or B. In the example below, joined by the key column “name”, the rows for “John” and “Paul” are shared by both datasets, and are the only rows retained in the combined dataset.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb5"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a>band_members <span className="sc">%&gt;%</span> </span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a>  <span className="fu">inner_join</span>(band_instruments, <span className="at">by =</span> <span className="st">"name"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 2 × 3
<br/>    name  band    plays 
<br/>    &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt; 
<br/>  1 John  Beatles guitar
<br/>  2 Paul  Beatles bass</code></pre>
</div>
</div>
</section>
<section className="level4" id="left_join">
<h4 className="anchored" data-anchor-id="left_join"></h4>
<p>With <strong><code>left_join()</code></strong>, the left-side dataset A (<code>band_members</code>) is the benchmark, and all its rows are returned. If a key in A (e.g., “Mick”) is not found in the right-side dataset B (<code>band_instruments</code>), the columns coming from B are filled with the missing value <code>NA</code> in that row.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb7"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a>band_members <span className="sc">%&gt;%</span> </span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a>  <span className="fu">left_join</span>(band_instruments, <span className="at">by =</span> <span className="st">"name"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 3
<br/>    name  band    plays 
<br/>    &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt; 
<br/>  1 Mick  Stones  &lt;NA&gt;  
<br/>  2 John  Beatles guitar
<br/>  3 Paul  Beatles bass</code></pre>
</div>
</div>
</section>
<section className="level4" id="right_join">
<h4 className="anchored" data-anchor-id="right_join"></h4>
<p>With <strong><code>right_join()</code></strong>, the right-side dataset B (<code>band_instruments</code>) is the benchmark, and all its rows are returned (though the columns of A are still displayed first in the joined dataset). If a key in B (e.g., “Keith”) is not found in A, the columns coming from A are filled with <code>NA</code> in that row.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb9"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a>band_members <span className="sc">%&gt;%</span> </span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a>  <span className="fu">right_join</span>(band_instruments, <span className="at">by =</span> <span className="st">"name"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 3
<br/>    name  band    plays 
<br/>    &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt; 
<br/>  1 John  Beatles guitar
<br/>  2 Paul  Beatles bass  
<br/>  3 Keith &lt;NA&gt;    guitar</code></pre>
</div>
</div>
</section>
<section className="level4" id="full_join">
<h4 className="anchored" data-anchor-id="full_join"></h4>
<p><strong><code>full_join()</code></strong> returns all rows from both datasets A and B. The merged dataset has at least as many rows as the larger of A and B. As the key “Mick” is in A but not in B, and “Keith” is in B but not in A, the joined dataset contains two <code>NA</code> values.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb11"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a>band_members <span className="sc">%&gt;%</span> </span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a>  <span className="fu">full_join</span>(band_instruments, <span className="at">by =</span> <span className="st">"name"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 4 × 3
<br/>    name  band    plays 
<br/>    &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt; 
<br/>  1 Mick  Stones  &lt;NA&gt;  
<br/>  2 John  Beatles guitar
<br/>  3 Paul  Beatles bass  
<br/>  4 Keith &lt;NA&gt;    guitar</code></pre>
</div>
</div>
<p>If there are multiple key variables to join by, pass their quoted names as a character vector: <code>by = c("Var1", "Var2", "Var3", ...)</code>.</p>
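<p>As a minimal sketch of a multi-key join, assume two small hypothetical tibbles, <code>sales</code> and <code>targets</code> (made up for illustration, not part of dplyr), that share the key columns “store” and “year”. A row is matched only when both keys agree:</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb49"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb49-1"><a aria-hidden="true" href="#cb49-1" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb49-2"><a aria-hidden="true" href="#cb49-2" tabindex="-1"></a></span><br/>
<span id="cb49-3"><a aria-hidden="true" href="#cb49-3" tabindex="-1"></a><span className="co"># hypothetical data, for illustration only</span></span>
<span id="cb49-4"><a aria-hidden="true" href="#cb49-4" tabindex="-1"></a>sales <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">store =</span> <span className="fu">c</span>(<span className="st">"a"</span>, <span className="st">"a"</span>, <span className="st">"b"</span>),</span>
<span id="cb49-5"><a aria-hidden="true" href="#cb49-5" tabindex="-1"></a>                <span className="at">year =</span> <span className="fu">c</span>(<span className="dv">2020</span>, <span className="dv">2021</span>, <span className="dv">2020</span>),</span>
<span id="cb49-6"><a aria-hidden="true" href="#cb49-6" tabindex="-1"></a>                <span className="at">revenue =</span> <span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">20</span>, <span className="dv">30</span>))</span>
<span id="cb49-7"><a aria-hidden="true" href="#cb49-7" tabindex="-1"></a>targets <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">store =</span> <span className="fu">c</span>(<span className="st">"a"</span>, <span className="st">"b"</span>, <span className="st">"b"</span>),</span>
<span id="cb49-8"><a aria-hidden="true" href="#cb49-8" tabindex="-1"></a>                  <span className="at">year =</span> <span className="fu">c</span>(<span className="dv">2020</span>, <span className="dv">2020</span>, <span className="dv">2021</span>),</span>
<span id="cb49-9"><a aria-hidden="true" href="#cb49-9" tabindex="-1"></a>                  <span className="at">target =</span> <span className="fu">c</span>(<span className="dv">15</span>, <span className="dv">25</span>, <span className="dv">35</span>))</span>
<span id="cb49-10"><a aria-hidden="true" href="#cb49-10" tabindex="-1"></a></span><br/>
<span id="cb49-11"><a aria-hidden="true" href="#cb49-11" tabindex="-1"></a><span className="co"># rows match only when both store and year agree</span></span>
<span id="cb49-12"><a aria-hidden="true" href="#cb49-12" tabindex="-1"></a>sales <span className="sc">%&gt;%</span> <span className="fu">inner_join</span>(targets, <span className="at">by =</span> <span className="fu">c</span>(<span className="st">"store"</span>, <span className="st">"year"</span>))</span></code></pre></div>
</div>
<p>Under these assumptions, only store “a” in 2020 and store “b” in 2020 appear in both tibbles, so the result should have two rows.</p>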
<p>It is good practice to specify the key columns explicitly for readability. If the <code>by</code> argument is omitted, the join uses all variables that datasets A and B have in common, and prints a message such as <code>Joining with `by = join_by(name)`</code> in the console so that you can check the keys.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb13"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a>band_members <span className="sc">%&gt;%</span> <span className="fu">full_join</span>(band_instruments)</span></code></pre></div>
<div className="cell-output cell-output-stderr">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  Joining with `by = join_by(name)`</code></pre>
</div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 4 × 3
<br/>    name  band    plays 
<br/>    &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt; 
<br/>  1 Mick  Stones  &lt;NA&gt;  
<br/>  2 John  Beatles guitar
<br/>  3 Paul  Beatles bass  
<br/>  4 Keith &lt;NA&gt;    guitar</code></pre>
</div>
</div>
</section>
</section>
<section className="level3" id="multiple_match">
<h3 className="anchored" data-anchor-id="multiple_match">Join with matches in multiple rows</h3>
<p>By default, if a row in A matches multiple rows in B, <em>all</em> matched rows in B will be returned (in all four types of join). Consider the following example:</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb16"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb16-1"><a aria-hidden="true" href="#cb16-1" tabindex="-1"></a>A <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">20</span>, <span className="dv">30</span>),</span>
<span id="cb16-2"><a aria-hidden="true" href="#cb16-2" tabindex="-1"></a>            <span className="at">z =</span> <span className="fu">c</span>(<span className="st">"a"</span>, <span className="st">"b"</span>, <span className="st">"c"</span>))</span>
<span id="cb16-3"><a aria-hidden="true" href="#cb16-3" tabindex="-1"></a></span><br/>
<span id="cb16-4"><a aria-hidden="true" href="#cb16-4" tabindex="-1"></a>B <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">10</span>, <span className="dv">20</span>), </span>
<span id="cb16-5"><a aria-hidden="true" href="#cb16-5" tabindex="-1"></a>            <span className="at">y =</span> <span className="fu">c</span>(<span className="st">"red"</span>, <span className="st">"blue"</span>, <span className="st">"yellow"</span>))</span></code></pre></div>
</div>
<div id="flex">
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb17"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb17-1"><a aria-hidden="true" href="#cb17-1" tabindex="-1"></a>A</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>        x z    
<br/>    &lt;dbl&gt; &lt;chr&gt;
<br/>  1    10 a    
<br/>  2    20 b    
<br/>  3    30 c</code></pre>
</div>
</div>
</div>
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb19"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb19-1"><a aria-hidden="true" href="#cb19-1" tabindex="-1"></a>B</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>        x y     
<br/>    &lt;dbl&gt; &lt;chr&gt; 
<br/>  1    10 red   
<br/>  2    10 blue  
<br/>  3    20 yellow</code></pre>
</div>
</div>
</div>
</div>
<p>When joined by the column “x”, the first row in A matches the first two rows in B, so both of those rows in B are returned in the output.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb21"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb21-1"><a aria-hidden="true" href="#cb21-1" tabindex="-1"></a>A <span className="sc">%&gt;%</span> <span className="fu">left_join</span>(B, <span className="at">by =</span> <span className="st">"x"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 4 × 3
<br/>        x z     y     
<br/>    &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; 
<br/>  1    10 a     red   
<br/>  2    10 a     blue  
<br/>  3    20 b     yellow
<br/>  4    30 c     &lt;NA&gt;</code></pre>
</div>
</div>
<p>In case of multiple matches, the joining behavior can be further customized using the <code>multiple</code> argument, which accepts the following values:</p>
<ul>
<li><code>"all"</code>: the default, return all matched rows of B.</li>
<li><code>"any"</code>: return one matched row in B, with no guarantee on which one.</li>
<li><code>"first"</code>: return the first matched row in B.</li>
<li><code>"last"</code>: return the last matched row in B.</li>
</ul>
<div className="cell">
<div className="sourceCode cell-code" id="cb23"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb23-1"><a aria-hidden="true" href="#cb23-1" tabindex="-1"></a><span className="co"># return only the first match</span></span>
<span id="cb23-2"><a aria-hidden="true" href="#cb23-2" tabindex="-1"></a>A <span className="sc">%&gt;%</span> <span className="fu">left_join</span>(B, <span className="at">by =</span> <span className="st">"x"</span>, <span className="at">multiple =</span> <span className="st">"first"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 3
<br/>        x z     y     
<br/>    &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; 
<br/>  1    10 a     red   
<br/>  2    20 b     yellow
<br/>  3    30 c     &lt;NA&gt;</code></pre>
</div>
</div>
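<p>The other options follow the same pattern. As a sketch, with the same A and B as above (redefined here so that the snippet runs on its own), <code>multiple = "last"</code> keeps the last matching row of B for each row of A, which is “blue” for <code>x = 10</code>:</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb51"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb51-1"><a aria-hidden="true" href="#cb51-1" tabindex="-1"></a><span className="fu">library</span>(dplyr)</span>
<span id="cb51-2"><a aria-hidden="true" href="#cb51-2" tabindex="-1"></a></span><br/>
<span id="cb51-3"><a aria-hidden="true" href="#cb51-3" tabindex="-1"></a>A <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">20</span>, <span className="dv">30</span>), <span className="at">z =</span> <span className="fu">c</span>(<span className="st">"a"</span>, <span className="st">"b"</span>, <span className="st">"c"</span>))</span>
<span id="cb51-4"><a aria-hidden="true" href="#cb51-4" tabindex="-1"></a>B <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">10</span>, <span className="dv">20</span>), <span className="at">y =</span> <span className="fu">c</span>(<span className="st">"red"</span>, <span className="st">"blue"</span>, <span className="st">"yellow"</span>))</span>
<span id="cb51-5"><a aria-hidden="true" href="#cb51-5" tabindex="-1"></a></span><br/>
<span id="cb51-6"><a aria-hidden="true" href="#cb51-6" tabindex="-1"></a><span className="co"># return only the last match</span></span>
<span id="cb51-7"><a aria-hidden="true" href="#cb51-7" tabindex="-1"></a>A <span className="sc">%&gt;%</span> <span className="fu">left_join</span>(B, <span className="at">by =</span> <span className="st">"x"</span>, <span className="at">multiple =</span> <span className="st">"last"</span>)</span></code></pre></div>
</div>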
</section>
<section className="level3" id="pecified_condition">
<h3 className="anchored" data-anchor-id="pecified_condition">Join with specified condition using <strong><code>join_by()</code></strong></h3>
<p><strong><code>join_by()</code></strong> specifies how the join keys are matched, and gives more flexibility when merging datasets.</p>
<section className="level4" id="join-with-equality">
<h4 className="anchored" data-anchor-id="join-with-equality"><em>Join with equality</em></h4>
<p>Consider a new dataset <code>band_instruments2</code>, in which the musician column is named “artist” rather than “name” as in <code>band_members</code>. Because the two datasets share no column name, they cannot be joined on a common column name as in the examples above.</p>
<div id="flex">
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb25"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb25-1"><a aria-hidden="true" href="#cb25-1" tabindex="-1"></a>band_members</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>    name  band   
<br/>    &lt;chr&gt; &lt;chr&gt;  
<br/>  1 Mick  Stones 
<br/>  2 John  Beatles
<br/>  3 Paul  Beatles</code></pre>
</div>
</div>
</div>
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb27"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb27-1"><a aria-hidden="true" href="#cb27-1" tabindex="-1"></a>band_instruments2</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>    artist plays 
<br/>    &lt;chr&gt;  &lt;chr&gt; 
<br/>  1 John   guitar
<br/>  2 Paul   bass  
<br/>  3 Keith  guitar</code></pre>
</div>
</div>
</div>
</div>
<p>If the join key columns of the two datasets have different names, use the equality expression <code>==</code> inside <code>join_by()</code> to match them. Below, <code>name == artist</code> takes the first column name from the left dataset and the second from the right dataset; the order cannot be switched.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb29"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb29-1"><a aria-hidden="true" href="#cb29-1" tabindex="-1"></a>band_members <span className="sc">%&gt;%</span> </span>
<span id="cb29-2"><a aria-hidden="true" href="#cb29-2" tabindex="-1"></a>  <span className="fu">full_join</span>(band_instruments2,</span>
<span id="cb29-3"><a aria-hidden="true" href="#cb29-3" tabindex="-1"></a>            <span className="at">by =</span> <span className="fu">join_by</span>(name <span className="sc">==</span> artist))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 4 × 3
<br/>    name  band    plays 
<br/>    &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt; 
<br/>  1 Mick  Stones  &lt;NA&gt;  
<br/>  2 John  Beatles guitar
<br/>  3 Paul  Beatles bass  
<br/>  4 Keith &lt;NA&gt;    guitar</code></pre>
</div>
</div>
<p>By default, the join keys shared by both datasets are coalesced into a single column. Alternatively, use <code>keep = TRUE</code> to display the key columns from both datasets. In the example below, both the original join keys “name” from dataset A and “artist” from B are displayed in the output dataset.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb31"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb31-1"><a aria-hidden="true" href="#cb31-1" tabindex="-1"></a>band_members <span className="sc">%&gt;%</span> </span>
<span id="cb31-2"><a aria-hidden="true" href="#cb31-2" tabindex="-1"></a>  <span className="fu">full_join</span>(band_instruments2, <span className="at">by =</span> <span className="fu">join_by</span>(name <span className="sc">==</span> artist), </span>
<span id="cb31-3"><a aria-hidden="true" href="#cb31-3" tabindex="-1"></a>            <span className="at">keep =</span> <span className="cn">TRUE</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 4 × 4
<br/>    name  band    artist plays 
<br/>    &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt;  &lt;chr&gt; 
<br/>  1 Mick  Stones  &lt;NA&gt;   &lt;NA&gt;  
<br/>  2 John  Beatles John   guitar
<br/>  3 Paul  Beatles Paul   bass  
<br/>  4 &lt;NA&gt;  &lt;NA&gt;    Keith  guitar</code></pre>
</div>
</div>
</section>
<section className="level4" id="join-with-inequality">
<h4 className="anchored" data-anchor-id="join-with-inequality"><em>Join with inequality</em></h4>
<p>Consider the following datasets:</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb33"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb33-1"><a aria-hidden="true" href="#cb33-1" tabindex="-1"></a>A <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">id_A =</span> <span className="fu">c</span>(<span className="dv">1</span>, <span className="dv">2</span>, <span className="dv">3</span>), <span className="at">sales_A =</span> <span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">20</span>, <span className="dv">30</span>))</span>
<span id="cb33-2"><a aria-hidden="true" href="#cb33-2" tabindex="-1"></a>B <span className="ot">&lt;-</span> <span className="fu">tibble</span>(<span className="at">id_B =</span> <span className="fu">c</span>(<span className="dv">1</span>, <span className="dv">3</span>, <span className="dv">4</span>), <span className="at">sales_B =</span> <span className="fu">c</span>(<span className="dv">5</span>, <span className="dv">20</span>, <span className="dv">40</span>))</span></code></pre></div>
</div>
<div id="flex">
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb34"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb34-1"><a aria-hidden="true" href="#cb34-1" tabindex="-1"></a>A</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>     id_A sales_A
<br/>    &lt;dbl&gt;   &lt;dbl&gt;
<br/>  1     1      10
<br/>  2     2      20
<br/>  3     3      30</code></pre>
</div>
</div>
</div>
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb36"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb36-1"><a aria-hidden="true" href="#cb36-1" tabindex="-1"></a>B</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 3 × 2
<br/>     id_B sales_B
<br/>    &lt;dbl&gt;   &lt;dbl&gt;
<br/>  1     1       5
<br/>  2     3      20
<br/>  3     4      40</code></pre>
</div>
</div>
</div>
</div>
<p>Instead of matching on equal key values, you can define a different matching condition: e.g., combine columns of A and B wherever the “sales_A” value is larger than the “sales_B” value. (This is also a case of <a href="#multiple_match">multiple matches</a>, with the 3rd row in A matching the first two rows in B.)</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb38"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb38-1"><a aria-hidden="true" href="#cb38-1" tabindex="-1"></a>A <span className="sc">%&gt;%</span> <span className="fu">left_join</span>(B, <span className="at">by =</span> <span className="fu">join_by</span>(sales_A <span className="sc">&gt;</span> sales_B))</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">  # A tibble: 4 × 4
<br/>     id_A sales_A  id_B sales_B
<br/>    &lt;dbl&gt;   &lt;dbl&gt; &lt;dbl&gt;   &lt;dbl&gt;
<br/>  1     1      10     1       5
<br/>  2     2      20     1       5
<br/>  3     3      30     1       5
<br/>  4     3      30     3      20</code></pre>
</div>
</div>
</section>
<section className="level4" id="join-with-na-values">
<h4 className="anchored" data-anchor-id="join-with-na-values"><em>Join with <code>NA</code> values</em></h4>
<p>By default, an <code>NA</code> in one key column is considered a match to an <code>NA</code> in the other (<code>NA</code> is treated like any other value). Alternatively, use <code>na_matches = "never"</code> so that <code>NA</code> values never match.</p>
<div className="cell">
<div className="sourceCode cell-code" id="cb40"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb40-1"><a aria-hidden="true" href="#cb40-1" tabindex="-1"></a>A <span className="ot">&lt;-</span> <span className="fu">data.frame</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="dv">1</span>, <span className="dv">2</span>, <span className="cn">NA</span>), <span className="at">y =</span> <span className="fu">c</span>(<span className="dv">10</span>, <span className="dv">20</span>, <span className="dv">40</span>))</span>
<span id="cb40-2"><a aria-hidden="true" href="#cb40-2" tabindex="-1"></a>B <span className="ot">&lt;-</span> <span className="fu">data.frame</span>(<span className="at">x =</span> <span className="fu">c</span>(<span className="dv">1</span>, <span className="cn">NA</span>, <span className="dv">3</span>), <span className="at">z =</span> <span className="fu">c</span>(<span className="dv">30</span>, <span className="dv">50</span>, <span className="dv">60</span>))</span></code></pre></div>
</div>
<div id="flex">
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb41"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb41-1"><a aria-hidden="true" href="#cb41-1" tabindex="-1"></a>A</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">     x  y
<br/>  1  1 10
<br/>  2  2 20
<br/>  3 NA 40</code></pre>
</div>
</div>
</div>
<div>
<div className="cell">
<div className="sourceCode cell-code" id="cb43"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb43-1"><a aria-hidden="true" href="#cb43-1" tabindex="-1"></a>B</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">     x  z
<br/>  1  1 30
<br/>  2 NA 50
<br/>  3  3 60</code></pre>
</div>
</div>
</div>
</div>
<div className="cell">
<div className="sourceCode cell-code" id="cb45"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb45-1"><a aria-hidden="true" href="#cb45-1" tabindex="-1"></a><span className="co"># 1st and 3rd row in A match the first two rows in B</span></span>
<span id="cb45-2"><a aria-hidden="true" href="#cb45-2" tabindex="-1"></a>A <span className="sc">%&gt;%</span> <span className="fu">inner_join</span>(B, <span className="at">by =</span> <span className="st">"x"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">     x  y  z
<br/>  1  1 10 30
<br/>  2 NA 40 50</code></pre>
</div>
<div className="sourceCode cell-code" id="cb47"><pre className="sourceCode r code-with-copy"><code className="sourceCode r"><span id="cb47-1"><a aria-hidden="true" href="#cb47-1" tabindex="-1"></a><span className="co"># only the 1st row of the two datasets is a match</span></span>
<span id="cb47-2"><a aria-hidden="true" href="#cb47-2" tabindex="-1"></a>A <span className="sc">%&gt;%</span> <span className="fu">inner_join</span>(B, <span className="at">by =</span> <span className="st">"x"</span>, <span className="at">na_matches =</span> <span className="st">"never"</span>)</span></code></pre></div>
<div className="cell-output cell-output-stdout">
<pre className="demo-highlight sourceCode r rcss"><code className="sourceCode r">    x  y  z
<br/>  1 1 10 30</code></pre>
</div>
</div>
</section>
</section>
</main>
</div>
</div>
)}