This fixes various minor documentation issues on the Spark SQL page
Author: Lars Francke <lars.francke@gmail.com>
Closes#6890 from lfrancke/SPARK-8462 and squashes the following commits:
dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
34eff2c [Lars Francke] Minor documentation fixes
(cherry picked from commit 4ce3bab89f)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Typo in thriftserver section
Author: Moussa Taifi <moutai10@gmail.com>
Closes#6847 from moutai/patch-1 and squashes the following commits:
1bd29df [Moussa Taifi] Update sql-programming-guide.md
(cherry picked from commit dc455b8833)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Author: Peter Hoffmann <ph@peter-hoffmann.com>
Closes#6815 from hoffmann/patch-1 and squashes the following commits:
2abb6da [Peter Hoffmann] fix read/write mixup
(cherry picked from commit f3f2a4397d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes#6749 from liancheng/java-sample-fix and squashes the following commits:
5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide
(cherry picked from commit 8f7308f9c4)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Add documentation for spark.sql.planner.externalSort
Author: Luca Martinetti <luca@luca.io>
Closes#6272 from lucamartinetti/docs-externalsort and squashes the following commits:
985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort
(cherry picked from commit 4060526cd3)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6522 from rxin/sql-doc-1.4 and squashes the following commits:
c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.
(cherry picked from commit 00a7137900)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes#6520 from liancheng/spark-7849 and squashes the following commits:
705264b [Cheng Lian] Updates SQL programming guide for 1.4
(cherry picked from commit 6e3f0c7810)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This PR adds a new SparkR programming guide at the top-level. This will be useful for R users as our APIs don't directly match the Scala/Python APIs and as we need to explain SparkR without using RDDs as examples etc.
cc rxin davies pwendell
cc cafreeman -- Would be great if you could also take a look at this !
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes#6490 from shivaram/sparkr-guide and squashes the following commits:
d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries
408dce5 [Shivaram Venkataraman] Fix link
dbb86e3 [Shivaram Venkataraman] Fix minor typo
9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example
d09703c [Shivaram Venkataraman] Fix default argument in read.df
ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better
(cherry picked from commit 5f48e5c33b)
Signed-off-by: Davies Liu <davies@databricks.com>
This contribution is my original work and I license the work to the project under the project's open source license
Author: Matt Wise <mwise@quixey.com>
Closes#6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits:
e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
(cherry picked from commit 35410614de)
Signed-off-by: Reynold Xin <rxin@databricks.com>
sqlCtx -> sqlContext
You can check the docs by:
```
$ cd docs
$ SKIP_SCALADOC=1 jekyll serve
```
cc shivaram
Author: Davies Liu <davies@databricks.com>
Closes#5442 from davies/r_docs and squashes the following commits:
7a12ec6 [Davies Liu] remove rdd in R docs
8496b26 [Davies Liu] remove the docs related to RDD
e23b9d6 [Davies Liu] delete R docs for RDD API
222e4ff [Davies Liu] Merge branch 'master' into r_docs
89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
f0a10e1 [Davies Liu] address comments from @shivaram
f61de71 [Davies Liu] Update pairRDD.R
3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
2f10a77 [Davies Liu] address comments from @cafreeman
9c2a062 [Davies Liu] mention R api together with Python API
23f751a [Davies Liu] Fill in SparkR examples in programming guide
(cherry picked from commit 7af3818c6b)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
add docs for https://issues.apache.org/jira/browse/SPARK-6994
Author: vidmantas zemleris <vidmantas@vinted.com>
Closes#6030 from vidma/docs/row-with-named-fields and squashes the following commits:
241b401 [vidmantas zemleris] [SPARK-6994][SQL] Update docs for fetching Row fields by name
(cherry picked from commit 640f63b959)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6062 from rxin/agg-retain-doc and squashes the following commits:
43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
(cherry picked from commit 3a9b6997df)
Signed-off-by: Reynold Xin <rxin@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7516
In sql-programming-guide, deprecated python data frame api inferSchema() should be replaced by createDataFrame():
schemaPeople = sqlContext.inferSchema(people) ->
schemaPeople = sqlContext.createDataFrame(people)
Author: gchen <chenguancheng@gmail.com>
Closes#6041 from gchen/python-docs and squashes the following commits:
c27eb7c [gchen] replace inferSchema() with createDataFrame()
(cherry picked from commit 8e674331d9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: ksonj <kson@siberie.de>
Closes#5971 from ksonj/doc and squashes the following commits:
dadfebb [ksonj] __getitem__ is cleaner than __getattr__
(cherry picked from commit fae4e2d609)
Signed-off-by: Reynold Xin <rxin@databricks.com>
fix typo
Author: Ken Geis <geis.ken@gmail.com>
Closes#5674 from kgeis/patch-1 and squashes the following commits:
5ae67de [Ken Geis] Update sql-programming-guide.md
This patch is fixing the Java examples for Spark SQL when defining
programmatically a Schema and mapping Rows.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes#5569 from ogirardot/branch-1.3 and squashes the following commits:
c29e58d [Olivier Girardot] SPARK-6992 : Fix documentation example for Spark SQL on StructType
(cherry picked from commit c9b1ba4b16a7afe93d45bf75b128cc0dd287ded0)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This patch includes :
* adding how to use map after an sql query using javaRDD
* fixing the first few java examples that were written in Scala
Thank you for your time,
Olivier.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes#5564 from ogirardot/branch-1.3 and squashes the following commits:
9f8d60e [Olivier Girardot] SPARK-6988 : Fix documentation regarding DataFrames using the Java API
(cherry picked from commit 6b528dc139da594ef2e651d84bd91fe0f738a39d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
https://issues.apache.org/jira/browse/SPARK-6863
Author: Santiago M. Mola <santiago.mola@sap.com>
Closes#5472 from smola/fix/sql-docs and squashes the following commits:
42503d4 [Santiago M. Mola] [SPARK-6863] Fix formatting on SQL programming guide.
Use `sqlContext` in PySpark shell, make it consistent with SQL programming guide. `sqlCtx` is also kept for compatibility.
Author: Davies Liu <davies@databricks.com>
Closes#5425 from davies/sqlCtx and squashes the following commits:
af67340 [Davies Liu] sqlCtx -> sqlContext
15a278f [Davies Liu] use sqlContext in python shell
Author: Michael Armbrust <michael@databricks.com>
Closes#5192 from marmbrus/fixJDBCDocs and squashes the following commits:
b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example
Needed to import the types specifically, not the more general pyspark.sql
Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Author: anabranch <wac.chambers@gmail.com>
Closes#5179 from anabranch/master and squashes the following commits:
8fa67bf [anabranch] Corrected SqlContext Import
603b080 [Bill Chambers] [DOCUMENTATION]Fixed Missing Type Import in Documentation
Author: vinodkc <vinod.kc.in@gmail.com>
Closes#5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits:
2c6aee6 [vinodkc] Spark 1.3 doc fixes
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>
Closes#5120 from kamilsmuga/master and squashes the following commits:
fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
Author: Tijo Thomas <tijoparacka@gmail.com>
Closes#5068 from tijoparacka/fix_sql_dataframe_example and squashes the following commits:
6953ac1 [Tijo Thomas] Handled Java and Python example sections
0751a74 [Tijo Thomas] Fixed compiler and errors in Dataframe examples
Also fixed a bunch of minor styling issues.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5001)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#5001 from liancheng/parquet-doc and squashes the following commits:
89ad3db [Cheng Lian] Addresses @rxin's comments
7eb6955 [Cheng Lian] Docs for the new Parquet data source
415eefb [Cheng Lian] Some minor formatting improvements
Miss `toDF()` function in docs/sql-programming-guide.md
Author: zzcclp <xm_zzc@sina.com>
Closes#4977 from zzcclp/SPARK-6275 and squashes the following commits:
9a96c7b [zzcclp] Miss toDF()
Author: Michael Armbrust <michael@databricks.com>
Closes#4958 from marmbrus/sqlDocs and squashes the following commits:
9351dbc [Michael Armbrust] fix parquet example
6877e13 [Michael Armbrust] add sql examples
d81b7e7 [Michael Armbrust] rxins comments
e393528 [Michael Armbrust] fix order
19c2735 [Michael Armbrust] more on data source load/store
00d5914 [Michael Armbrust] Update SQL Docs with JDBC and Migration Guide
Author: Reynold Xin <rxin@databricks.com>
Closes#4954 from rxin/df-docs and squashes the following commits:
c592c70 [Reynold Xin] [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames.
Author: CodingCat <zhunansjtu@gmail.com>
Closes#4656 from CodingCat/fix_typo and squashes the following commits:
b41d15c [CodingCat] recover
689fe46 [CodingCat] fix typo
Updated examples using the new api and added DataFrame concept
Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>
Closes#4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
8d5376a [Antonio Navarro Perez] fixed typo
8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
Deprecate inferSchema() and applySchema(), use createDataFrame() instead, which could take an optional `schema` to create an DataFrame from an RDD. The `schema` could be StructType or list of names of columns.
Author: Davies Liu <davies@databricks.com>
Closes#4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
Trivial fix.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4400 from adrian-wang/docdate and squashes the following commits:
31bbe40 [Daoyuan Wang] doc fix for date
- Add meta description tags on some of the most important doc pages
- Shorten the titles of some pages to have more relevant keywords; for
example there's no reason to have "Spark SQL Programming Guide - Spark
1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
documentation".
Author: Matei Zaharia <matei@databricks.com>
Closes#4381 from mateiz/docs-seo and squashes the following commits:
4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#3820 from adrian-wang/parquettimestamp and squashes the following commits:
b1e2a0d [Daoyuan Wang] fix for nanos
4dadef1 [Daoyuan Wang] fix wrong read
93f438d [Daoyuan Wang] parquet timestamp support
Follow up of #3925
/cc rxin
Author: scwf <wangfei1@huawei.com>
Closes#4095 from scwf/sql-doc and squashes the following commits:
97e311b [scwf] update sql doc since now expose only one version of the data type APIs
`CACHE TABLE tbl` is now __eager__ by default not __lazy__
Author: luogankun <luogankun@gmail.com>
Closes#3773 from luogankun/SPARK-4930 and squashes the following commits:
cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ...
bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager
* This commit hopes to avoid the confusion I faced when trying
to submit a regular, valid multi-line JSON file, also see
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
Author: Peter Vandenabeele <peter@vandenabeele.com>
Closes#3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:
1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
Add HTTP protocol support and test cases to spark thrift server, so users can deploy thrift server in both TCP and http mode.
Author: Judy Nash <judynash@microsoft.com>
Author: judynash <judynash@microsoft.com>
Closes#3672 from judynash/master and squashes the following commits:
526315d [Judy Nash] correct spacing on startThriftServer method
31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue
47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length
2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide
1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master'
2b1d312 [Judy Nash] updated http thrift server support based on feedback
377532c [judynash] add HTTP protocol spark thrift server
Author: Andy Konwinski <andykonwinski@gmail.com>
Closes#3611 from andyk/patch-3 and squashes the following commits:
7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#3535 from adrian-wang/datedoc and squashes the following commits:
18ff1ed [Daoyuan Wang] [DOC] Date type
Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3440)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:
2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3498)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#3498 from liancheng/fix-sql-doc-typo and squashes the following commits:
865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide
Author: Andy Konwinski <andykonwinski@gmail.com>
Closes#3323 from andyk/patch-2 and squashes the following commits:
4699fdc [Andy Konwinski] Fix broken link to Row class scaladoc
Let's give this another go using a version of Hive that shades its JLine dependency.
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>
Closes#3159 from pwendell/scala-2.11-prashant and squashes the following commits:
e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
2121071 [Patrick Wendell] Migrating version detection to PySpark
b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
f5cad4e [Patrick Wendell] Add Scala 2.11 docs
210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
e22b104 [Patrick Wendell] Small fix in pom file
ec402ab [Patrick Wendell] Various fixes
0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
4eaec65 [Prashant Sharma] Changed scripts to ignore target.
5167bea [Prashant Sharma] small correction
a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
cb059b0 [Prashant Sharma] Code review
0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
Author: wangfei <wangfei1@huawei.com>
Closes#3127 from scwf/patch-9 and squashes the following commits:
e39a560 [wangfei] now support dynamic partitioning