fix typo
Author: Ken Geis <geis.ken@gmail.com>
Closes #5674 from kgeis/patch-1 and squashes the following commits:
5ae67de [Ken Geis] Update sql-programming-guide.md
This patch fixes the Java examples for Spark SQL that programmatically
define a schema and map Rows.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes #5569 from ogirardot/branch-1.3 and squashes the following commits:
c29e58d [Olivier Girardot] SPARK-6992 : Fix documentation example for Spark SQL on StructType
(cherry picked from commit c9b1ba4b16a7afe93d45bf75b128cc0dd287ded0)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This patch includes:
* showing how to use map after a SQL query via javaRDD
* fixing the first few Java examples, which were written in Scala
Thank you for your time,
Olivier.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes #5564 from ogirardot/branch-1.3 and squashes the following commits:
9f8d60e [Olivier Girardot] SPARK-6988 : Fix documentation regarding DataFrames using the Java API
(cherry picked from commit 6b528dc139da594ef2e651d84bd91fe0f738a39d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
https://issues.apache.org/jira/browse/SPARK-6863
Author: Santiago M. Mola <santiago.mola@sap.com>
Closes #5472 from smola/fix/sql-docs and squashes the following commits:
42503d4 [Santiago M. Mola] [SPARK-6863] Fix formatting on SQL programming guide.
Use `sqlContext` in PySpark shell, make it consistent with SQL programming guide. `sqlCtx` is also kept for compatibility.
Author: Davies Liu <davies@databricks.com>
Closes #5425 from davies/sqlCtx and squashes the following commits:
af67340 [Davies Liu] sqlCtx -> sqlContext
15a278f [Davies Liu] use sqlContext in python shell
Author: Michael Armbrust <michael@databricks.com>
Closes #5192 from marmbrus/fixJDBCDocs and squashes the following commits:
b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example
Needed to import the types specifically, not the more general pyspark.sql
Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Author: anabranch <wac.chambers@gmail.com>
Closes #5179 from anabranch/master and squashes the following commits:
8fa67bf [anabranch] Corrected SqlContext Import
603b080 [Bill Chambers] [DOCUMENTATION]Fixed Missing Type Import in Documentation
Author: vinodkc <vinod.kc.in@gmail.com>
Closes #5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits:
2c6aee6 [vinodkc] Spark 1.3 doc fixes
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>
Closes #5120 from kamilsmuga/master and squashes the following commits:
fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
Author: Tijo Thomas <tijoparacka@gmail.com>
Closes #5068 from tijoparacka/fix_sql_dataframe_example and squashes the following commits:
6953ac1 [Tijo Thomas] Handled Java and Python example sections
0751a74 [Tijo Thomas] Fixed compiler and errors in Dataframe examples
Also fixed a bunch of minor styling issues.
Author: Cheng Lian <lian@databricks.com>
Closes #5001 from liancheng/parquet-doc and squashes the following commits:
89ad3db [Cheng Lian] Addresses @rxin's comments
7eb6955 [Cheng Lian] Docs for the new Parquet data source
415eefb [Cheng Lian] Some minor formatting improvements
The `toDF()` call is missing in docs/sql-programming-guide.md
Author: zzcclp <xm_zzc@sina.com>
Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits:
9a96c7b [zzcclp] Miss toDF()
Author: Michael Armbrust <michael@databricks.com>
Closes #4958 from marmbrus/sqlDocs and squashes the following commits:
9351dbc [Michael Armbrust] fix parquet example
6877e13 [Michael Armbrust] add sql examples
d81b7e7 [Michael Armbrust] rxins comments
e393528 [Michael Armbrust] fix order
19c2735 [Michael Armbrust] more on data source load/store
00d5914 [Michael Armbrust] Update SQL Docs with JDBC and Migration Guide
Author: Reynold Xin <rxin@databricks.com>
Closes #4954 from rxin/df-docs and squashes the following commits:
c592c70 [Reynold Xin] [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #4656 from CodingCat/fix_typo and squashes the following commits:
b41d15c [CodingCat] recover
689fe46 [CodingCat] fix typo
Updated the examples to use the new API and added the DataFrame concept
Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>
Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
8d5376a [Antonio Navarro Perez] fixed typo
8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which can take an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
Author: Davies Liu <davies@databricks.com>
Closes #4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
Trivial fix.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #4400 from adrian-wang/docdate and squashes the following commits:
31bbe40 [Daoyuan Wang] doc fix for date
- Add meta description tags on some of the most important doc pages
- Shorten the titles of some pages to have more relevant keywords; for
example there's no reason to have "Spark SQL Programming Guide - Spark
1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
documentation".
Author: Matei Zaharia <matei@databricks.com>
Closes #4381 from mateiz/docs-seo and squashes the following commits:
4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3820 from adrian-wang/parquettimestamp and squashes the following commits:
b1e2a0d [Daoyuan Wang] fix for nanos
4dadef1 [Daoyuan Wang] fix wrong read
93f438d [Daoyuan Wang] parquet timestamp support
Follow up of #3925
/cc rxin
Author: scwf <wangfei1@huawei.com>
Closes #4095 from scwf/sql-doc and squashes the following commits:
97e311b [scwf] update sql doc since now expose only one version of the data type APIs
`CACHE TABLE tbl` is now __eager__ by default, not __lazy__
Author: luogankun <luogankun@gmail.com>
Closes #3773 from luogankun/SPARK-4930 and squashes the following commits:
cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ...
bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager
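As a rough illustration of the statement forms named in the commit titles above, the following Python sketch assembles `CACHE [LAZY] TABLE ... [AS SELECT ...]` strings. The helper `cache_table_sql` and the table names are hypothetical and not part of Spark; only the keywords come from the commit messages.

```python
def cache_table_sql(table, lazy=False, select=None):
    """Build a CACHE TABLE statement string (hypothetical helper, not a Spark API)."""
    parts = ["CACHE"]
    if lazy:
        # Without LAZY the table is cached eagerly, per the note above.
        parts.append("LAZY")
    parts.extend(["TABLE", table])
    if select is not None:
        # Optional "AS SELECT ..." form named in the commit title.
        parts.extend(["AS", select])
    return " ".join(parts)


eager_stmt = cache_table_sql("tbl")
lazy_stmt = cache_table_sql("tbl", lazy=True)
ctas_stmt = cache_table_sql("recent", lazy=True, select="SELECT * FROM tbl")

print(eager_stmt)
print(lazy_stmt)
print(ctas_stmt)
```

The resulting strings could then be passed to whatever SQL entry point is in use; that step is omitted here because it needs a running Spark context.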
* This commit hopes to avoid the confusion I faced when trying
to submit a regular, valid multi-line JSON file; see also
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
Author: Peter Vandenabeele <peter@vandenabeele.com>
Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:
1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
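To make the note concrete, here is a minimal sketch, using only the Python standard library, of turning a regular multi-line JSON array into the one-object-per-line layout the note describes. The sample records and the helper name `to_json_lines` are illustrative assumptions, not part of the patch.

```python
import json

# A regular, valid multi-line JSON document: one array spanning several lines.
multi_line_json = """
[
  {"name": "Michael"},
  {"name": "Andy", "age": 30}
]
"""


def to_json_lines(text):
    """Rewrite a JSON array as one compact, self-contained object per line."""
    records = json.loads(text)
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


json_lines = to_json_lines(multi_line_json)
print(json_lines)
```

Each output line parses on its own as a complete JSON object, which is the property a line-oriented reader relies on.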
Add HTTP protocol support and test cases to the Spark Thrift server, so users can deploy it in either TCP or HTTP mode.
Author: Judy Nash <judynash@microsoft.com>
Author: judynash <judynash@microsoft.com>
Closes #3672 from judynash/master and squashes the following commits:
526315d [Judy Nash] correct spacing on startThriftServer method
31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue
47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length
2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide
1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master'
2b1d312 [Judy Nash] updated http thrift server support based on feedback
377532c [judynash] add HTTP protocol spark thrift server
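For orientation, a client connecting in HTTP mode needs a JDBC URL that selects the HTTP transport. The sketch below formats such a URL in Python. The property names (`hive.server2.transport.mode`, `hive.server2.thrift.http.path`) and the defaults (port 10001, endpoint `cliservice`) are assumptions based on the usual HiveServer2 settings and are not stated in this commit message; the helper itself is hypothetical.

```python
def thrift_http_jdbc_url(host, port=10001, database="default", endpoint="cliservice"):
    """Format a beeline-style JDBC URL for HTTP mode (hypothetical helper)."""
    return (
        "jdbc:hive2://{host}:{port}/{db}"
        "?hive.server2.transport.mode=http"
        ";hive.server2.thrift.http.path={endpoint}"
    ).format(host=host, port=port, db=database, endpoint=endpoint)


url = thrift_http_jdbc_url("localhost")
print(url)
```

Check the property names and defaults against the SQL programming guide for the Spark version in use before relying on them.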
Author: Andy Konwinski <andykonwinski@gmail.com>
Closes #3611 from andyk/patch-3 and squashes the following commits:
7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3535 from adrian-wang/datedoc and squashes the following commits:
18ff1ed [Daoyuan Wang] [DOC] Date type
Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on.
Author: Cheng Lian <lian@databricks.com>
Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:
2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown
Author: Cheng Lian <lian@databricks.com>
Closes #3498 from liancheng/fix-sql-doc-typo and squashes the following commits:
865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide
Author: Andy Konwinski <andykonwinski@gmail.com>
Closes #3323 from andyk/patch-2 and squashes the following commits:
4699fdc [Andy Konwinski] Fix broken link to Row class scaladoc
Let's give this another go using a version of Hive that shades its JLine dependency.
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>
Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
2121071 [Patrick Wendell] Migrating version detection to PySpark
b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
f5cad4e [Patrick Wendell] Add Scala 2.11 docs
210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
e22b104 [Patrick Wendell] Small fix in pom file
ec402ab [Patrick Wendell] Various fixes
0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
4eaec65 [Prashant Sharma] Changed scripts to ignore target.
5167bea [Prashant Sharma] small correction
a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
cb059b0 [Prashant Sharma] Code review
0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
Author: wangfei <wangfei1@huawei.com>
Closes #3127 from scwf/patch-9 and squashes the following commits:
e39a560 [wangfei] now support dynamic partitioning
- Turns on compression for in-memory cached data by default.
- Changes the default Parquet compression format back to gzip (we have seen more OOMs with production workloads due to the way Snappy allocates memory).
- Ups the batch size to 10,000 rows.
- Increases the broadcast threshold to 10 MB.
- Uses our Parquet implementation instead of the Hive one by default.
- Caches Parquet metadata by default.
Author: Michael Armbrust <michael@databricks.com>
Closes #3064 from marmbrus/fasterDefaults and squashes the following commits:
97ee9f8 [Michael Armbrust] parquet codec docs
e641694 [Michael Armbrust] Remote also
a12866a [Michael Armbrust] Cache metadata.
2d73acc [Michael Armbrust] Update docs defaults.
d63d2d5 [Michael Armbrust] document parquet option
da373f9 [Michael Armbrust] More aggressive defaults
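The defaults listed in this commit message can be written down as explicit settings, which may help when comparing against an older deployment. The Python sketch below does so. The commit message does not name the configuration keys, so the key names here are an assumed mapping to Spark SQL options and should be checked against the guide for the Spark version in use.

```python
# Assumed mapping from the bullets in the commit message to Spark SQL setting
# names; the commit message itself does not name the keys.
FASTER_DEFAULTS = {
    "spark.sql.inMemoryColumnarStorage.compressed": "true",
    "spark.sql.parquet.compression.codec": "gzip",
    "spark.sql.inMemoryColumnarStorage.batchSize": str(10000),
    "spark.sql.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),  # 10 MB in bytes
    "spark.sql.hive.convertMetastoreParquet": "true",
    "spark.sql.parquet.cacheMetadata": "true",
}


def as_set_statements(settings):
    """Render each setting as a SQL `SET key=value` statement, sorted by key."""
    return ["SET {}={}".format(key, value) for key, value in sorted(settings.items())]


set_statements = as_set_statements(FASTER_DEFAULTS)
for stmt in set_statements:
    print(stmt)
```

Writing the threshold as `10 * 1024 * 1024` keeps the byte value (10485760) tied to the "10 MB" wording rather than hard-coding it.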
In sql-programming-guide.md, the package name "scala.math.sql" is wrong.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2873 from sarutak/wrong-packagename-fix and squashes the following commits:
4d5ecf4 [Kousuke Saruta] Fixed wrong package name in sql-programming-guide.md
We have changed the output format of `printSchema`. This PR will update our SQL programming guide to show the updated format. Also, it fixes a typo (the value type of `StructType` in Java API).
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #2630 from yhuai/sqlDoc and squashes the following commits:
267d63e [Yin Huai] Update the output of printSchema and fix a typo.
https://issues.apache.org/jira/browse/SPARK-3715
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes #2567 from WangTaoTheTonic/minortypo and squashes the following commits:
9cc3f7a [WangTaoTheTonic] minor typo
Author: Michael Armbrust <michael@databricks.com>
Closes #2527 from marmbrus/patch-1 and squashes the following commits:
a0f9f1c [Michael Armbrust] [SQL][DOCS] Clarify that the server is for JDBC and ODBC
Author: Grega Kespret <grega.kespret@gmail.com>
Closes #2479 from gregakespret/patch-1 and squashes the following commits:
dd6b90a [Grega Kespret] Update docs to use jsonRDD instead of wrong jsonRdd.
Author: Michael Armbrust <michael@databricks.com>
Closes #2434 from marmbrus/patch-1 and squashes the following commits:
67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
Taken from liancheng's updates. Merged conflicts with #2316.
Author: Michael Armbrust <michael@databricks.com>
Closes #2384 from marmbrus/sqlDocUpdate and squashes the following commits:
2db6319 [Michael Armbrust] @liancheng's updates
* Fixed random typo
* Added in missing description for DecimalType
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes #2367 from nchammas/patch-1 and squashes the following commits:
aa528be [Nicholas Chammas] doc fix for SQL DecimalType
3247ac1 [Nicholas Chammas] [SQL] [Docs] typo fixes
After #1889, the default value of `containsNull` in an `ArrayType` is `true`.
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #2374 from yhuai/containsNull and squashes the following commits:
dc609a3 [Yin Huai] Update the SQL programming guide to show the correct default value of containsNull in an ArrayType (the default value is true instead of false).
Author: Henry Cook <hcook@eecs.berkeley.edu>
Closes #2316 from hcook/sql-docs and squashes the following commits:
373f94b [Henry Cook] Minor edits to sql programming guide.
As [reported on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8131.html):
* Code fencing with triple-backticks doesn’t seem to work like it does on GitHub. Newlines are lost. Instead, use 4-space indent to format small code blocks.
* Nested bullets need 2 leading spaces, not 1.
* Spellcheck!
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>
Closes #2201 from nchammas/sql-doc-fixes and squashes the following commits:
873f889 [Nicholas Chammas] [Docs] fix skip-api flag
5195e0c [Nicholas Chammas] [Docs] SQL doc formatting and typo fixes
3b26c8d [nchammas] [Spark QA] Link to console output on test time out
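The first formatting point in this commit message (replace triple-backtick fences with 4-space-indented blocks) is mechanical enough to sketch. The Python function below is an illustrative assumption of how such a conversion might be scripted; it is not the tooling used for this patch, and it deliberately ignores edge cases such as nested or indented fences.

```python
FENCE = "`" * 3  # built at runtime so the example holds no literal fence


def fences_to_indented(markdown):
    """Replace triple-backtick fenced blocks with 4-space-indented blocks."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.strip().startswith(FENCE):
            # Drop the fence line itself and toggle the in-block state.
            in_fence = not in_fence
            continue
        # Indent non-empty lines inside a fence; leave everything else as-is.
        out.append("    " + line if in_fence and line else line)
    return "\n".join(out)


sample = "\n".join(
    ["Example:", "", FENCE + "scala", "val x = 1", "x + 1", FENCE, "", "Done."]
)
converted = fences_to_indented(sample)
print(converted)
```

The blank lines around the block are preserved, which matters because an indented code block in Markdown must be separated from the surrounding paragraph.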
Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it has ultimately led to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things.
This has been suggested in the past by liancheng.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #2006 from pwendell/hiveserver and squashes the following commits:
742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver
034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
This definitely needs review as I am not familiar with this part of Spark.
I tested this locally and it did seem to work.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #1937 from pwendell/scheduler and squashes the following commits:
b858e33 [Patrick Wendell] SPARK-3025: Allow JDBC clients to set a fair scheduler pool