Spark SQL

Spark SQL is the module within Apache Spark that provides structured data processing capabilities. This document provides a technical overview of Spark SQL’s architecture, explaining how SQL queries and DataFrame operations are parsed, analyzed, optimized, and executed within the Spark framework. For information about the user-facing APIs, see Apache Spark Overview.

Core Components of Spark SQL

Spark SQL’s architecture consists of several key components that work together to process structured data:

SQL Parsing

The parsing phase converts SQL text into an unresolved logical plan. Spark SQL uses ANTLR4 for parsing SQL statements:

xufei.biz

Explorer

Spark SQL

Core Components of Spark SQL

SQL Parsing

Graph View

Table of Contents

Backlinks