This PR only applies to the master branch (1.5.0-SNAPSHOT), since it references `org.apache.parquet` classes, which only appear in Parquet 1.7.0.
Author: Cheng Lian <lian@databricks.com>
Closes #6683 from liancheng/output-committer-docs and squashes the following commits:
b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
This fixes various minor documentation issues on the Spark SQL page
Author: Lars Francke <lars.francke@gmail.com>
Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:
dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
34eff2c [Lars Francke] Minor documentation fixes
1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`.
2. Validate the value when setting a configuration that is defined in SQLConf.
3. Use `SET -v` to display all public configurations.
Author: zsxwing <zsxwing@gmail.com>
Closes #6747 from zsxwing/sqlconf and squashes the following commits:
7d09bad [zsxwing] Use SQLConfEntry in HiveContext
49f6213 [zsxwing] Add getConf, setConf to SQLContext and HiveContext
e014f53 [zsxwing] Merge branch 'master' into sqlconf
93dad8e [zsxwing] Fix the unit tests
cf950c1 [zsxwing] Fix the code style and tests
3c5f03e [zsxwing] Add unsetConf(SQLConfEntry) and fix the code style
a2f4add [zsxwing] getConf will return the default value if a config is not set
037b1db [zsxwing] Add schema to SetCommand
0520c3c [zsxwing] Merge branch 'master' into sqlconf
7afb0ec [zsxwing] Fix the configurations about HiveThriftServer
7e728e3 [zsxwing] Add doc for SQLConfEntry and fix 'toString'
5e95b10 [zsxwing] Add enumConf
c6ba76d [zsxwing] setRawString => setConfString, getRawString => getConfString
4abd807 [zsxwing] Fix the test for 'set -v'
6e47e56 [zsxwing] Fix the compilation error
8973ced [zsxwing] Remove floatConf
1fc3a8b [zsxwing] Remove the 'conf' command and use 'set -v' instead
99c9c16 [zsxwing] Fix tests that use SQLConfEntry as a string
88a03cc [zsxwing] Add new lines between confs and return types
ce7c6c8 [zsxwing] Remove seqConf
f3c1b33 [zsxwing] Refactor SQLConf to display better error message
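The typed-configuration idea in items 1 to 3 can be sketched in plain Python. This is not Spark's `SQLConfEntry` API: `ConfEntry`, `Conf`, `parse_bool`, `set_conf_string`, `get_conf`, and `set_v` are hypothetical names chosen for illustration, and `example.internal.flag` is a made-up key. The sketch only mirrors the behaviors the commit describes: each entry carries a default, a parser, and a doc string; values for registered keys are validated when set; an unset key falls back to its default; and a `SET -v`-style listing shows public entries only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ConfEntry:
    """Hypothetical typed config entry: key, raw default, parser, doc, visibility."""
    key: str
    default: str                  # stored as a raw string, parsed on read
    parse: Callable[[str], Any]   # converts a raw string; raises ValueError if invalid
    doc: str
    is_public: bool = True


def parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {raw!r}")
    return value == "true"


class Conf:
    def __init__(self, entries: List[ConfEntry]) -> None:
        self._entries: Dict[str, ConfEntry] = {e.key: e for e in entries}
        self._settings: Dict[str, str] = {}

    def set_conf_string(self, key: str, raw: str) -> None:
        entry = self._entries.get(key)
        if entry is not None:
            entry.parse(raw)  # registered keys are validated; unregistered keys are stored unchecked
        self._settings[key] = raw

    def get_conf(self, entry: ConfEntry) -> Any:
        # Fall back to the entry's default when the key was never set.
        return entry.parse(self._settings.get(entry.key, entry.default))

    def set_v(self) -> List[Tuple[str, str, str]]:
        """(key, raw value, doc) rows for public entries, like a `SET -v` listing."""
        return [
            (e.key, self._settings.get(e.key, e.default), e.doc)
            for e in sorted(self._entries.values(), key=lambda e: e.key)
            if e.is_public
        ]


SHUFFLE_PARTITIONS = ConfEntry(
    "spark.sql.shuffle.partitions", "200", int, "Number of partitions used when shuffling.")
INTERNAL_FLAG = ConfEntry(
    "example.internal.flag", "false", parse_bool, "Made-up internal flag.", is_public=False)
```

With these definitions, `Conf([SHUFFLE_PARTITIONS, INTERNAL_FLAG]).get_conf(SHUFFLE_PARTITIONS)` returns `200` until the key is set, and `set_conf_string("spark.sql.shuffle.partitions", "ten")` raises `ValueError` instead of storing a bad value.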
Typo in thriftserver section
Author: Moussa Taifi <moutai10@gmail.com>
Closes #6847 from moutai/patch-1 and squashes the following commits:
1bd29df [Moussa Taifi] Update sql-programming-guide.md
Author: Peter Hoffmann <ph@peter-hoffmann.com>
Closes #6815 from hoffmann/patch-1 and squashes the following commits:
2abb6da [Peter Hoffmann] fix read/write mixup
Author: Cheng Lian <lian@databricks.com>
Closes #6749 from liancheng/java-sample-fix and squashes the following commits:
5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide
JIRA: https://issues.apache.org/jira/browse/SPARK-7939
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #6503 from viirya/disable_partition_type_inference and squashes the following commits:
3e90470 [Liang-Chi Hsieh] Default to enable type inference and update docs.
455edb1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into disable_partition_type_inference
9a57933 [Liang-Chi Hsieh] Add conf to enable/disable partition column type inference.
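Partition column type inference can be illustrated with a simplified Python sketch. It assumes partition values come from `key=value` directory segments and only distinguishes integers, decimals written with a dot, and strings; Spark's actual inference rules are more involved. The `type_inference` flag stands in for the setting this commit adds (later Spark SQL guides document it as `spark.sql.sources.partitionColumnTypeInference.enabled`), and `infer_value` and `parse_partition_path` are hypothetical helpers.

```python
import re
from typing import Dict, Union

Value = Union[int, float, str]

_INT = re.compile(r"-?\d+")
_DECIMAL = re.compile(r"-?(\d+\.\d*|\.\d+)")


def infer_value(raw: str, type_inference: bool) -> Value:
    """Simplified inference: integer, then decimal, else leave the value as a string."""
    if type_inference:
        if _INT.fullmatch(raw):
            return int(raw)
        if _DECIMAL.fullmatch(raw):
            return float(raw)
    return raw


def parse_partition_path(path: str, type_inference: bool = True) -> Dict[str, Value]:
    """Collect partition columns from the `key=value` directory segments of `path`."""
    columns: Dict[str, Value] = {}
    for segment in path.strip("/").split("/"):
        key, sep, raw = segment.partition("=")
        if sep and key:  # segments without '=' (table root, file name) are skipped
            columns[key] = infer_value(raw, type_inference)
    return columns
```

For `/data/people/year=2015/month=06/country=US/part-00000.parquet`, inference turns `month=06` into the integer `6`; with inference disabled every value stays a string such as `"06"`, which is the kind of case where a user might want to turn it off.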
Add documentation for spark.sql.planner.externalSort
Author: Luca Martinetti <luca@luca.io>
Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:
985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort
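The idea behind an external sort, keeping only a bounded buffer in memory and spilling sorted runs to disk before merging them, can be sketched in Python. This is the generic external merge sort technique, not Spark's implementation of `spark.sql.planner.externalSort`; `external_sort` and `max_in_memory` are hypothetical names, and integers are used as records to keep the spill file format trivial.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill(sorted_run: List[int], directory: str) -> str:
    """Write one sorted run to a temporary file, one integer per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_run)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a spilled run back from disk."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values: Iterable[int], max_in_memory: int) -> List[int]:
    """Sort `values`, buffering at most `max_in_memory` items before spilling a run."""
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    with tempfile.TemporaryDirectory() as directory:
        run_paths: List[str] = []
        buffer: List[int] = []
        for value in values:
            buffer.append(value)
            if len(buffer) == max_in_memory:
                run_paths.append(_spill(sorted(buffer), directory))
                buffer = []
        if buffer:
            run_paths.append(_spill(sorted(buffer), directory))
        # k-way merge of the sorted runs. The result is collected into a list only
        # so the temporary files can be deleted; a real sorter would stream it out.
        return list(heapq.merge(*(_read_run(p) for p in run_paths)))
```

The trade-off is the one the setting describes: an in-memory sort is faster when a partition fits in memory, while spilling bounds memory use at the cost of disk I/O.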
Author: Reynold Xin <rxin@databricks.com>
Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:
c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.
Author: Cheng Lian <lian@databricks.com>
Closes #6520 from liancheng/spark-7849 and squashes the following commits:
705264b [Cheng Lian] Updates SQL programming guide for 1.4
This PR adds a new top-level SparkR programming guide. This will be useful for R users, since our R APIs don't directly match the Scala/Python APIs and we need to explain SparkR without using RDDs in the examples.
cc rxin davies pwendell
cc cafreeman: it would be great if you could also take a look at this!
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6490 from shivaram/sparkr-guide and squashes the following commits:
d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries
408dce5 [Shivaram Venkataraman] Fix link
dbb86e3 [Shivaram Venkataraman] Fix minor typo
9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example
d09703c [Shivaram Venkataraman] Fix default argument in read.df
ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better
This contribution is my original work and I license the work to the project under the project's open source license
Author: Matt Wise <mwise@quixey.com>
Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits:
e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
sqlCtx -> sqlContext
You can check the docs by running:
```shell
cd docs
SKIP_SCALADOC=1 jekyll serve
```
cc shivaram
Author: Davies Liu <davies@databricks.com>
Closes #5442 from davies/r_docs and squashes the following commits:
7a12ec6 [Davies Liu] remove rdd in R docs
8496b26 [Davies Liu] remove the docs related to RDD
e23b9d6 [Davies Liu] delete R docs for RDD API
222e4ff [Davies Liu] Merge branch 'master' into r_docs
89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
f0a10e1 [Davies Liu] address comments from @shivaram
f61de71 [Davies Liu] Update pairRDD.R
3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
2f10a77 [Davies Liu] address comments from @cafreeman
9c2a062 [Davies Liu] mention R api together with Python API
23f751a [Davies Liu] Fill in SparkR examples in programming guide
add docs for https://issues.apache.org/jira/browse/SPARK-6994
Author: vidmantas zemleris <vidmantas@vinted.com>
Closes #6030 from vidma/docs/row-with-named-fields and squashes the following commits:
241b401 [vidmantas zemleris] [SPARK-6994][SQL] Update docs for fetching Row fields by name
Author: Reynold Xin <rxin@databricks.com>
Closes #6062 from rxin/agg-retain-doc and squashes the following commits:
43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
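What "retaining grouping columns" means for a grouped aggregation can be shown with a plain-Python sketch over lists of dicts. `group_agg`, `mean`, and the `retain_group_columns` flag are hypothetical; to our understanding the Spark 1.4 guide describes the real switch as `spark.sql.retainGroupColumns` (set it to `false` for the 1.3 behavior), but this code does not call Spark and the output column name `avg(salary)` is only illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def group_agg(rows: Sequence[Row], key: str, column: str, agg_name: str,
              agg: Callable[[List[float]], float],
              retain_group_columns: bool = True) -> List[Row]:
    """Group `rows` by `key` and aggregate `column`, optionally keeping the key column."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    result: List[Row] = []
    for group_value, values in groups.items():
        # With retention on, each output row starts with the grouping value.
        out: Row = {key: group_value} if retain_group_columns else {}
        out[f"{agg_name}({column})"] = agg(values)
        result.append(out)
    return result


def mean(values: List[float]) -> float:
    return sum(values) / len(values)
```

Without the grouping column, the aggregated rows can no longer be matched to their groups, which is why keeping it is the more useful default.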
JIRA: https://issues.apache.org/jira/browse/SPARK-7516
In sql-programming-guide, the deprecated Python DataFrame API inferSchema() should be replaced by createDataFrame():
schemaPeople = sqlContext.inferSchema(people) ->
schemaPeople = sqlContext.createDataFrame(people)
Author: gchen <chenguancheng@gmail.com>
Closes #6041 from gchen/python-docs and squashes the following commits:
c27eb7c [gchen] replace inferSchema() with createDataFrame()
fix typo
Author: Ken Geis <geis.ken@gmail.com>
Closes #5674 from kgeis/patch-1 and squashes the following commits:
5ae67de [Ken Geis] Update sql-programming-guide.md
This patch fixes the Java examples for Spark SQL that programmatically define a schema and map Rows.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes #5569 from ogirardot/branch-1.3 and squashes the following commits:
c29e58d [Olivier Girardot] SPARK-6992 : Fix documentation example for Spark SQL on StructType
(cherry picked from commit c9b1ba4b16a7afe93d45bf75b128cc0dd287ded0)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This patch includes:
* documenting how to use map after an SQL query via javaRDD
* fixing the first few Java examples, which were written in Scala
Thank you for your time,
Olivier.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes #5564 from ogirardot/branch-1.3 and squashes the following commits:
9f8d60e [Olivier Girardot] SPARK-6988 : Fix documentation regarding DataFrames using the Java API
(cherry picked from commit 6b528dc139da594ef2e651d84bd91fe0f738a39d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
https://issues.apache.org/jira/browse/SPARK-6863
Author: Santiago M. Mola <santiago.mola@sap.com>
Closes #5472 from smola/fix/sql-docs and squashes the following commits:
42503d4 [Santiago M. Mola] [SPARK-6863] Fix formatting on SQL programming guide.
Use `sqlContext` in the PySpark shell to make it consistent with the SQL programming guide. `sqlCtx` is also kept for compatibility.
Author: Davies Liu <davies@databricks.com>
Closes #5425 from davies/sqlCtx and squashes the following commits:
af67340 [Davies Liu] sqlCtx -> sqlContext
15a278f [Davies Liu] use sqlContext in python shell
Author: Michael Armbrust <michael@databricks.com>
Closes #5192 from marmbrus/fixJDBCDocs and squashes the following commits:
b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example
The types need to be imported specifically, not through the more general pyspark.sql module.
Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Author: anabranch <wac.chambers@gmail.com>
Closes #5179 from anabranch/master and squashes the following commits:
8fa67bf [anabranch] Corrected SqlContext Import
603b080 [Bill Chambers] [DOCUMENTATION]Fixed Missing Type Import in Documentation
Author: vinodkc <vinod.kc.in@gmail.com>
Closes #5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits:
2c6aee6 [vinodkc] Spark 1.3 doc fixes
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>
Closes #5120 from kamilsmuga/master and squashes the following commits:
fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
Author: Tijo Thomas <tijoparacka@gmail.com>
Closes #5068 from tijoparacka/fix_sql_dataframe_example and squashes the following commits:
6953ac1 [Tijo Thomas] Handled Java and Python example sections
0751a74 [Tijo Thomas] Fixed compiler and errors in Dataframe examples
Also fixed a bunch of minor styling issues.
Author: Cheng Lian <lian@databricks.com>
Closes #5001 from liancheng/parquet-doc and squashes the following commits:
89ad3db [Cheng Lian] Addresses @rxin's comments
7eb6955 [Cheng Lian] Docs for the new Parquet data source
415eefb [Cheng Lian] Some minor formatting improvements
Add the missing `toDF()` call in docs/sql-programming-guide.md.
Author: zzcclp <xm_zzc@sina.com>
Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits:
9a96c7b [zzcclp] Miss toDF()
Author: Michael Armbrust <michael@databricks.com>
Closes #4958 from marmbrus/sqlDocs and squashes the following commits:
9351dbc [Michael Armbrust] fix parquet example
6877e13 [Michael Armbrust] add sql examples
d81b7e7 [Michael Armbrust] rxins comments
e393528 [Michael Armbrust] fix order
19c2735 [Michael Armbrust] more on data source load/store
00d5914 [Michael Armbrust] Update SQL Docs with JDBC and Migration Guide
Author: Reynold Xin <rxin@databricks.com>
Closes #4954 from rxin/df-docs and squashes the following commits:
c592c70 [Reynold Xin] [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #4656 from CodingCat/fix_typo and squashes the following commits:
b41d15c [CodingCat] recover
689fe46 [CodingCat] fix typo
Updated the examples to use the new API and added the DataFrame concept.
Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>
Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
8d5376a [Antonio Navarro Perez] fixed typo
8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which can take an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
Author: Davies Liu <davies@databricks.com>
Closes #4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
Trivial fix.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #4400 from adrian-wang/docdate and squashes the following commits:
31bbe40 [Daoyuan Wang] doc fix for date
- Add meta description tags on some of the most important doc pages
- Shorten the titles of some pages to have more relevant keywords; for
example there's no reason to have "Spark SQL Programming Guide - Spark
1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
documentation".
Author: Matei Zaharia <matei@databricks.com>
Closes #4381 from mateiz/docs-seo and squashes the following commits:
4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3820 from adrian-wang/parquettimestamp and squashes the following commits:
b1e2a0d [Daoyuan Wang] fix for nanos
4dadef1 [Daoyuan Wang] fix wrong read
93f438d [Daoyuan Wang] parquet timestamp support
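The commits above only say that Parquet timestamp support was added and that nanoseconds were handled. A common on-disk form for such timestamps is the 12-byte INT96 layout written by Impala and Hive: a little-endian 64-bit count of nanoseconds since midnight, followed by a little-endian 32-bit Julian day number. Assuming that layout (the log itself does not state it), encoding and decoding can be sketched in Python; `encode_int96`, `decode_int96`, and `to_datetime` are hypothetical helpers, and all values are treated as UTC.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000


def encode_int96(nanos_since_epoch: int) -> bytes:
    """Pack nanoseconds since the Unix epoch (UTC) into the assumed 12-byte layout."""
    days, nanos_of_day = divmod(nanos_since_epoch, NANOS_PER_DAY)  # nanos_of_day >= 0
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def decode_int96(raw: bytes) -> int:
    """Unpack a 12-byte value into nanoseconds since the Unix epoch (UTC)."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def to_datetime(nanos_since_epoch: int) -> datetime:
    # datetime has microsecond precision, so the last three digits are dropped here.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=nanos_since_epoch // 1000)
```

Keeping the full nanosecond count as an integer, and converting to `datetime` only at the edge, avoids losing the sub-microsecond part that a `datetime` cannot hold.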
Follow-up to #3925
/cc rxin
Author: scwf <wangfei1@huawei.com>
Closes #4095 from scwf/sql-doc and squashes the following commits:
97e311b [scwf] update sql doc since now expose only one version of the data type APIs
`CACHE TABLE tbl` is now __eager__ by default, not __lazy__.
Author: luogankun <luogankun@gmail.com>
Closes #3773 from luogankun/SPARK-4930 and squashes the following commits:
cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ...
bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager
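The eager versus lazy distinction can be modeled with a small in-memory cache. `TableCache` and its methods are hypothetical and only mimic the semantics named in these commits: `CACHE TABLE tbl` reads the table immediately, while `CACHE LAZY TABLE tbl` defers the read until the table is first scanned. The `loads` counter makes the difference observable; this is not how Spark's cache manager is implemented.

```python
from typing import Callable, Dict, List, Set


class TableCache:
    """Hypothetical cache illustrating eager vs. lazy table caching."""

    def __init__(self, load: Callable[[str], List[tuple]]) -> None:
        self._load = load                      # reads a table from its source
        self._cached: Dict[str, List[tuple]] = {}
        self._pending: Set[str] = set()        # marked lazy, not yet materialized
        self.loads = 0                         # number of source reads so far

    def _read(self, name: str) -> List[tuple]:
        self.loads += 1
        return self._load(name)

    def cache_table(self, name: str, lazy: bool = False) -> None:
        if lazy:
            self._pending.add(name)            # like CACHE LAZY TABLE: defer the work
        else:
            self._cached[name] = self._read(name)  # like CACHE TABLE: read now

    def scan(self, name: str) -> List[tuple]:
        if name in self._cached:
            return self._cached[name]
        rows = self._read(name)
        if name in self._pending:              # first use of a lazily cached table
            self._pending.discard(name)
            self._cached[name] = rows
        return rows
```

Eager caching pays the read cost up front, so the first query is already served from the cache; lazy caching avoids that cost for tables that may never be queried.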
* This commit aims to avoid the confusion I faced when trying
to load a regular, valid multi-line JSON file; see also
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
Author: Peter Vandenabeele <peter@vandenabeele.com>
Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:
1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
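The note concerns the line-oriented input that `jsonFile` expects: each line must hold one complete, self-contained JSON object, so a pretty-printed object spread over several lines is rejected even though the file as a whole is valid JSON. A minimal Python sketch of that contract follows; `read_json_lines` is a hypothetical helper and does not reproduce Spark's actual error handling.

```python
import json
from typing import Any, Dict, List


def read_json_lines(text: str) -> List[Dict[str, Any]]:
    """Parse text in which every non-blank line is one complete JSON object."""
    records: List[Dict[str, Any]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number} is not a complete JSON value: {err.msg}") from err
        if not isinstance(record, dict):
            raise ValueError(f"line {number} holds a JSON value that is not an object")
        records.append(record)
    return records
```

Input such as `{"name":"Michael"}` on one line and `{"name":"Andy", "age":30}` on the next parses into two records, while the same object pretty-printed over three lines fails on its first line, `{`.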
Add HTTP protocol support and test cases to the Spark Thrift server, so users can deploy the Thrift server in either TCP or HTTP mode.
Author: Judy Nash <judynash@microsoft.com>
Author: judynash <judynash@microsoft.com>
Closes #3672 from judynash/master and squashes the following commits:
526315d [Judy Nash] correct spacing on startThriftServer method
31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue
47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length
2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide
1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master'
2b1d312 [Judy Nash] updated http thrift server support based on feedback
377532c [judynash] add HTTP protocol spark thrift server
Author: Andy Konwinski <andykonwinski@gmail.com>
Closes #3611 from andyk/patch-3 and squashes the following commits:
7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3535 from adrian-wang/datedoc and squashes the following commits:
18ff1ed [Daoyuan Wang] [DOC] Date type