import React from 'react'; 
import {Link} from 'react-router-dom'; 
import {useSparkCustomEffect} from '../../useCustomEffect'; 
export default function PythonOutput(){
useSparkCustomEffect()
return ( <div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
</div>
<div class="jp-InputArea jp-Cell-inputArea"><div class="jp-InputPrompt jp-InputArea-prompt">
</div><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
<h1 id="Understand-PySpark-DataFrame">Quick Introduction to PySpark DataFrame<a class="anchor-link" href="#Understand-PySpark-DataFrame">¶</a></h1><p>A <strong>DataFrame</strong> is a core data structure in PySpark: a table of rows organized into named columns. The rows of a PySpark DataFrame are split into partitions, and those partitions are distributed across the nodes of a cluster. Each node processes its partitions independently, in parallel with the others. This distribution lets PySpark handle datasets too large for a single machine by combining the computational power of all the machines in the cluster.</p>
<h3 id="Key-Characteristics-of-PySpark-DataFrame">Key Characteristics of PySpark DataFrame<a class="anchor-link" href="#Key-Characteristics-of-PySpark-DataFrame">¶</a></h3><ul>
<li><strong>Distributed</strong>: Unlike a pandas DataFrame, which is held in the memory of a single machine, a PySpark DataFrame is partitioned across the cluster. This makes it suitable for processing large datasets in parallel.</li>
<li><strong>Immutable</strong>: Once created, a DataFrame cannot be changed. Transformations such as <code>filter()</code> or <code>withColumn()</code> return a new DataFrame and leave the original unchanged.</li>
<li><strong>Schema-based</strong>: Every PySpark DataFrame has a schema that defines the structure of its data, including column names, data types, and whether each column may contain nulls. Spark uses the schema to validate column references and types before running a query and to optimize how the query executes.</li>
<li><strong>Lazy Evaluation</strong>: Transformations on a DataFrame are not executed immediately. Spark records them in a query plan and runs that plan only when an action, such as <code>count()</code> or <code>collect()</code>, is called.</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
</div>
<div class="jp-InputArea jp-Cell-inputArea"><div class="jp-InputPrompt jp-InputArea-prompt">
</div><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
<p>Continue reading <Link to="../createDataFrame">Create a PySpark DataFrame Using createDataFrame()</Link> and <Link to="../spark-dataframe-reader">Import Files into PySpark DataFrames</Link> to learn how to create DataFrames from various data structures and sources.</p>
</div>
</div>
</div>
</div>
</div>
)}