spark-instrumented-optimizer/sql
Reynold Xin 77a98162d1 [SPARK-18025] Use commit protocol API in structured streaming
## What changes were proposed in this pull request?
This patch adds a new commit protocol implementation ManifestFileCommitProtocol that follows the existing streaming flow, and uses it in FileStreamSink to consolidate the write path in structured streaming with the batch mode write path.

This deletes a lot of code, and would make it trivial to support other functionalities that are currently available in batch but not in streaming, including all file formats and bucketing.

## How was this patch tested?
Should be covered by existing tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #15710 from rxin/SPARK-18025.
2016-11-01 18:06:57 -07:00
catalyst [SPARK-17764][SQL] Add to_json supporting to convert nested struct column to JSON string 2016-11-01 12:46:41 -07:00
core [SPARK-18025] Use commit protocol API in structured streaming 2016-11-01 18:06:57 -07:00
hive [SPARK-18167] Disable flaky SQLQuerySuite test 2016-11-01 12:35:34 -07:00
hive-thriftserver [SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Server 2016-11-01 16:23:47 -07:00
README.md [SPARK-16557][SQL] Remove stale doc in sql/README.md 2016-07-14 19:24:42 -07:00

Spark SQL

This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.

Spark SQL is broken up into four subprojects:

  • Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
  • Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
  • Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
  • HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server.
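
To make the Catalyst description above more concrete, here is a minimal, self-contained sketch of what "manipulating trees of relational operators and expressions" can look like. It is a toy model in plain Python, not Catalyst's actual classes or API: the node types (`Literal`, `Attribute`, `Add`), the `transform_up` helper, and the `fold_constants` rule are all illustrative names assumed for this example.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Toy expression nodes (illustrative only, not Catalyst's real classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]


def transform_up(expr: Expr, rule: Callable[[Expr], Expr]) -> Expr:
    """Rewrite children first, then apply the rule to the rebuilt node."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr: Expr) -> Expr:
    """Hypothetical rewrite rule: replace Add(Literal, Literal) with one Literal."""
    if (
        isinstance(expr, Add)
        and isinstance(expr.left, Literal)
        and isinstance(expr.right, Literal)
    ):
        return Literal(expr.left.value + expr.right.value)
    return expr


# Expression tree for: x + (1 + 2)
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform_up(tree, fold_constants)
print(optimized)  # Add(left=Attribute(name='x'), right=Literal(value=3))
```

The rule is a plain function from node to node and the traversal is kept separate from it. That separation is what lets a tree-rewriting framework stay agnostic about how the resulting plan is eventually executed.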