
gatorsmile 3884455780 [SPARK-31087] [SQL] Add Back Multiple Removed APIs	2020-03-28 22:05:16 -07:00

### What changes were proposed in this pull request?
Based on the discussion in the mailing list [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html), this PR adds back the following APIs, whose maintenance costs are relatively small:
- functions.toDegrees/toRadians
- functions.approxCountDistinct
- functions.monotonicallyIncreasingId
- Column.!==
- Dataset.explode
- Dataset.registerTempTable
- SQLContext.getOrCreate, setActive, clearActive, constructors

Below are the other APIs removed in the original PR [https://issues.apache.org/jira/browse/SPARK-25908] that are not added back in this PR:
- Remove some AccumulableInfo .apply() methods
- Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy
- Remove unused Python StorageLevel constants
- Remove unused multiclass option in libsvm parsing
- Remove references to deprecated spark configs like spark.yarn.am.port
- Remove TaskContext.isRunningLocally
- Remove ShuffleMetrics.shuffle* methods
- Remove BaseReadWrite.context in favor of session

### Why are the changes needed?
Avoid breaking commonly used APIs.

### Does this PR introduce any user-facing change?
Adding back the APIs that were removed in the 3.0 branch does not introduce user-facing changes, because Spark 3.0 has not been released.

### How was this patch tested?
Added a new test suite for these APIs.

Author: gatorsmile <gatorsmile@gmail.com>
Author: yi.wu <yi.wu@databricks.com>

Closes #27821 from gatorsmile/addAPIBackV2.
catalyst	[SPARK-31261][SQL] Avoid npe when reading bad csv input with `columnNameCorruptRecord` specified	2020-03-29 13:30:14 +09:00
core	[SPARK-31087] [SQL] Add Back Multiple Removed APIs	2020-03-28 22:05:16 -07:00
hive	[SPARK-31088][SQL] Add back HiveContext and createExternalTable	2020-03-26 23:51:15 -07:00
hive-thriftserver	[SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir	2020-03-27 12:05:45 +08:00
create-docs.sh	[SPARK-30510][SQL][DOCS] Publicly document Spark SQL configuration options	2020-02-09 19:20:47 +09:00
gen-sql-api-docs.py	[SPARK-30510][SQL][DOCS] Publicly document Spark SQL configuration options	2020-02-09 19:20:47 +09:00
gen-sql-config-docs.py	[SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder	2020-02-22 09:46:42 +09:00
mkdocs.yml	[SPARK-30731] Update deprecated Mkdocs option	2020-02-19 17:28:58 +09:00
README.md	[SPARK-30510][SQL][DOCS] Publicly document Spark SQL configuration options	2020-02-09 19:20:47 +09:00

README.md

Spark SQL

This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.

Spark SQL is broken up into four subprojects:

- Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
- Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes the public SQLContext interface (since Spark 2.0, SparkSession is the preferred entry point), which allows users to execute SQL or language-integrated queries against existing RDDs and Parquet files.
- Hive Support (sql/hive) - Includes extensions that allow users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
- HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2-compatible server (for JDBC/ODBC).
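The split between Catalyst (immutable trees of operators and expressions, rewritten by rules) and the execution layer (turning a plan into something that runs) can be illustrated with a toy model. The sketch below is a minimal Python analogy, not Spark code: the classes Col, Lit, Add, Gt, Scan, Filter and Project and the functions transform_expr, fold_constants, optimize and execute are hypothetical names invented for illustration. Catalyst itself is written in Scala and ships real rules such as ConstantFolding; in this sketch a naive in-memory interpreter over lists of dicts stands in for the sql/core planner that produces RDD-based physical operators.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union

# ---- Expression nodes (hypothetical toy classes, not Catalyst's API) ----
@dataclass(frozen=True)
class Col:
    name: str  # reference to a column of the input row

@dataclass(frozen=True)
class Lit:
    value: object  # constant value

@dataclass(frozen=True)
class Add:
    left: Expr
    right: Expr

@dataclass(frozen=True)
class Gt:
    left: Expr
    right: Expr

Expr = Union[Col, Lit, Add, Gt]

# ---- Logical plan nodes ----
@dataclass(frozen=True)
class Scan:
    table: str  # name looked up in a catalog at execution time

@dataclass(frozen=True)
class Filter:
    cond: Expr
    child: Plan

@dataclass(frozen=True)
class Project:
    cols: tuple[str, ...]
    child: Plan

Plan = Union[Scan, Filter, Project]

def transform_expr(e: Expr, rule: Callable[[Expr], Expr]) -> Expr:
    """Apply `rule` bottom-up: rebuild children first, then rewrite the node."""
    if isinstance(e, (Add, Gt)):
        e = type(e)(transform_expr(e.left, rule), transform_expr(e.right, rule))
    return rule(e)

def fold_constants(e: Expr) -> Expr:
    """Replace Add(Lit, Lit) with a single Lit; leave other nodes unchanged."""
    if isinstance(e, Add) and isinstance(e.left, Lit) and isinstance(e.right, Lit):
        return Lit(e.left.value + e.right.value)
    return e

def optimize(plan: Plan) -> Plan:
    """Return a new plan with fold_constants applied to every Filter condition."""
    if isinstance(plan, Filter):
        return Filter(transform_expr(plan.cond, fold_constants), optimize(plan.child))
    if isinstance(plan, Project):
        return Project(plan.cols, optimize(plan.child))
    return plan

def eval_expr(e: Expr, row: dict) -> object:
    """Evaluate an expression against one row."""
    if isinstance(e, Col):
        return row[e.name]
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Add):
        return eval_expr(e.left, row) + eval_expr(e.right, row)
    if isinstance(e, Gt):
        return eval_expr(e.left, row) > eval_expr(e.right, row)
    raise TypeError(f"unknown expression: {e!r}")

def execute(plan: Plan, catalog: dict[str, list[dict]]) -> list[dict]:
    """Naive interpreter standing in for physical planning and execution."""
    if isinstance(plan, Scan):
        return list(catalog[plan.table])
    if isinstance(plan, Filter):
        return [r for r in execute(plan.child, catalog) if eval_expr(plan.cond, r)]
    if isinstance(plan, Project):
        return [{c: r[c] for c in plan.cols} for r in execute(plan.child, catalog)]
    raise TypeError(f"unknown plan: {plan!r}")

if __name__ == "__main__":
    catalog = {"people": [{"name": "alice", "age": 30}, {"name": "bob", "age": 15}]}
    # Roughly: SELECT name FROM people WHERE age > 10 + 8
    plan = Project(("name",), Filter(Gt(Col("age"), Add(Lit(10), Lit(8))), Scan("people")))
    optimized = optimize(plan)
    print(optimized.child.cond)         # Gt(left=Col(name='age'), right=Lit(value=18))
    print(execute(optimized, catalog))  # [{'name': 'alice'}]
```

Because the nodes are frozen, a rule returns a new tree rather than mutating the old one. Catalyst's trees are likewise immutable, which is what lets independent rewrite rules be composed and re-applied safely.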

Running ./sql/create-docs.sh generates SQL documentation for built-in functions under sql/site, and SQL configuration documentation that gets included as part of configuration.md in the main docs directory.
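The configuration part of that output is, at its core, a table of setting names, defaults, descriptions and versions. The sketch below illustrates the idea under stated assumptions: ConfigEntrySketch and render_config_table are hypothetical names, the entries are invented, and the column layout (including a "Since Version" column, loosely following the version property mentioned in the SPARK-30840 commit listed above) is only assumed to resemble Spark's configuration tables. It does not reproduce what gen-sql-config-docs.py actually emits.

```python
from dataclasses import dataclass
from html import escape
from typing import Iterable


@dataclass(frozen=True)
class ConfigEntrySketch:
    """Hypothetical stand-in for one documented SQL configuration entry."""
    key: str
    default: str
    doc: str
    version: str  # release that introduced the entry (assumed field)


HEADER = ("Property Name", "Default", "Meaning", "Since Version")


def render_config_table(entries: Iterable[ConfigEntrySketch]) -> str:
    """Render entries as an HTML table sorted by key, escaping all cell text."""
    lines = ['<table class="table">']
    lines.append("<tr>" + "".join(f"<th>{h}</th>" for h in HEADER) + "</tr>")
    for e in sorted(entries, key=lambda entry: entry.key):
        cells = (e.key, e.default, e.doc, e.version)
        lines.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical entries for illustration only; these are not real Spark settings.
    print(render_config_table([
        ConfigEntrySketch("spark.sql.hypothetical.b", "200", "Example B.", "3.0.0"),
        ConfigEntrySketch("spark.sql.hypothetical.a", "true", "Enabled when x < y.", "3.0.0"),
    ]))
```

Sorting by key keeps the generated page stable across runs, and escaping the description text prevents characters such as `<` from breaking the surrounding HTML.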