## What changes were proposed in this pull request?
Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`.
## How was this patch tested?
This PR only changes the unit test code, examples, and comments. It should be safe.
This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged.
Author: Sean Zhong <seanzhong@databricks.com>
Closes #13098 from clockfly/spark-15171-remove-deprecation.
## What changes were proposed in this pull request?
dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame.
The function signature is:
dapply(df, function(localDF) {}, schema = NULL)
R function input: local data.frame from the partition on local node
R function output: local data.frame
Schema specifies the Row format of the resulting DataFrame. It must match the R function's output.
If schema is not specified, each partition of the resulting DataFrame is serialized in R into a single byte array. Such a DataFrame can be processed by successive calls to dapply().
## How was this patch tested?
SparkR unit tests.
Author: Sun Rui <rui.sun@intel.com>
Author: Sun Rui <sunrui2016@gmail.com>
Closes #12493 from sun-rui/SPARK-12919.
## What changes were proposed in this pull request?
This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.
- Remove the incorrect usage of `map`. `lapply` should be used in `sparkR` where needed; however, `lapply` is still private, so the corrected example will be added later.
- Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
- Fix datatypes in `sparkr.md`.
- Update a data result in `sparkr.md`.
- Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet
- Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet
- Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
- Other minor syntax fixes and a typo.
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12649 from dongjoon-hyun/SPARK-14883.
## What changes were proposed in this pull request?
Removing references to assembly jar in documentation.
Adding an additional (previously undocumented) usage of spark-submit to run examples.
## How was this patch tested?
Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit.
Author: Mark Grover <mark@apache.org>
Closes #12365 from markgrover/spark-14601.
## What changes were proposed in this pull request?
This PR fixes the `age` data types from `integer` to `long` in `SQL Programming Guide: JSON Datasets`.
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12290 from dongjoon-hyun/minor_fix_type_in_json_example.
## What changes were proposed in this pull request?
This patch removes DirectParquetOutputCommitter. This was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures. If there are multiple attempts of the same task (e.g. speculation or task failures or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue.
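The failure mode described above can be illustrated with a small local-filesystem analogy. This is only a sketch, not the behavior of S3 or of any Hadoop committer: the class `CommitSketch`, its methods, and the `_attempt_` file naming are hypothetical. It contrasts an attempt that writes straight to the final file with one that stages its output and renames it only on success.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CommitSketch {
    // Simulates a task attempt that may die after writing only half of its data.
    private static void writeMaybePartially(Path target, String data, boolean failHalfway)
            throws IOException {
        String written = failHalfway ? data.substring(0, data.length() / 2) : data;
        Files.write(target, written.getBytes(StandardCharsets.UTF_8));
        if (failHalfway) {
            throw new IOException("simulated task failure");
        }
    }

    // Direct style (hypothetical): every attempt writes straight to the final file.
    static void writeDirect(Path finalFile, String data, boolean failHalfway) throws IOException {
        writeMaybePartially(finalFile, data, failHalfway);
    }

    // Staged style (hypothetical): each attempt writes to its own file, renamed only on success.
    static void writeStaged(Path finalFile, int attempt, String data, boolean failHalfway)
            throws IOException {
        Path staging = finalFile.resolveSibling("_attempt_" + attempt + "_" + finalFile.getFileName());
        writeMaybePartially(staging, data, failHalfway);
        Files.move(staging, finalFile, StandardCopyOption.REPLACE_EXISTING);
    }

    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Path direct = dir.resolve("direct.txt");
        Path staged = dir.resolve("staged.txt");

        // Attempt 0 succeeds; a second (for example speculative) attempt then fails halfway.
        writeDirect(direct, "abcdef", false);
        try { writeDirect(direct, "abcdef", true); } catch (IOException expected) { }

        writeStaged(staged, 0, "abcdef", false);
        try { writeStaged(staged, 1, "abcdef", true); } catch (IOException expected) { }

        System.out.println("direct: " + read(direct)); // direct: abc
        System.out.println("staged: " + read(staged)); // staged: abcdef
    }
}
```

In this sketch the failed second attempt leaves `direct.txt` truncated, while `staged.txt` still holds the complete output of the first attempt.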
## How was this patch tested?
Removed the related tests also.
Author: Reynold Xin <rxin@databricks.com>
Closes #12229 from rxin/SPARK-10063.
This change modifies the "assembly/" module to just copy needed
dependencies to its build directory, and modifies the packaging
script to pick those up (and remove duplicate jars packages in the
examples module).
I also made some minor adjustments to dependencies to remove some
test jars from the final packaging, and remove jars that conflict with each
other when packaged separately (e.g. servlet api).
Also note that this change restores guava in applications' classpaths, even
though it's still shaded inside Spark. This is now needed for the Hadoop
libraries that are packaged with Spark, which now are not processed by
the shade plugin.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #11796 from vanzin/SPARK-13579.
## What changes were proposed in this pull request?
Since the developer API for the pluggable parser was removed in #10801, the docs should be updated accordingly.
## How was this patch tested?
This patch will not affect the real code path.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #11758 from adrian-wang/spark12855.
## What changes were proposed in this pull request?
`Shark` was merged into `Spark SQL` in [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The following sections seem to be the only remaining legacy. For Spark 2.x, we should clean up those docs.
**Migration Guide**
```
- ## Migration Guide for Shark Users
- ...
- ### Scheduling
- ...
- ### Reducer number
- ...
- ### Caching
```
## How was this patch tested?
Pass the Jenkins test.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11770 from dongjoon-hyun/SPARK-13942.
## What changes were proposed in this pull request?
To make `docs/examples` (and other related code) simpler, more readable, and more user-friendly, this PR replaces existing code like the following with the `diamond` operator.
```
- final ArrayList<Product2<Object, Object>> dataToWrite =
- new ArrayList<Product2<Object, Object>>();
+ final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
```
Java 7 and higher support the **diamond** operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, the Spark Java code uses it inconsistently.
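As a further illustration of the same substitution, here is a minimal, self-contained sketch. The class `DiamondExample` and its methods are made up for this example and are not part of Spark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiamondExample {
    // Explicit style: the type arguments are repeated on the right-hand side.
    static Map<String, List<Integer>> explicit() {
        Map<String, List<Integer>> byKey = new HashMap<String, List<Integer>>();
        List<Integer> values = new ArrayList<Integer>();
        values.add(1);
        byKey.put("a", values);
        return byKey;
    }

    // Diamond style: the compiler infers the type arguments from the declared type.
    static Map<String, List<Integer>> diamond() {
        Map<String, List<Integer>> byKey = new HashMap<>();
        List<Integer> values = new ArrayList<>();
        values.add(1);
        byKey.put("a", values);
        return byKey;
    }

    public static void main(String[] args) {
        // Both forms build equal objects; only the source text differs.
        System.out.println(explicit().equals(diamond())); // prints "true"
    }
}
```

The diamond is assigned to a typed local variable here rather than passed directly as a method argument, because Java 7 infers less in argument position than later versions do.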
## How was this patch tested?
Manual.
Pass the existing tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11541 from dongjoon-hyun/SPARK-13702.
## What changes were proposed in this pull request?
This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.
## How was this patch tested?
Manual tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11300 from dongjoon-hyun/minor_fix_typos.
In spark-env.sh.template, there are multi-byte characters; this PR removes them.
Author: Sasaki Toru <sasakitoa@nttdata.co.jp>
Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds.
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Closes #10901 from maropu/DocFix.
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.
In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).
If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).
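The driver-selection logic described above can be sketched with plain `java.sql` calls. This is a simplified illustration, not the patch's actual code: the class `DriverSelection` and its method names are hypothetical, and only the `DriverManager`, `Driver`, and `Class.forName` calls are real JDK APIs.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;
import java.util.Properties;

public class DriverSelection {
    // Scans the drivers registered with DriverManager for the exact requested class.
    static Driver findRegisteredDriver(String driverClassName) throws SQLException {
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getName().equals(driverClassName)) {
                return driver;
            }
        }
        throw new SQLException("No registered JDBC driver with class " + driverClassName);
    }

    // The user named a driver: load it, then connect through that driver specifically
    // instead of letting DriverManager.getConnection() pick any driver that accepts the URL.
    static Connection connectWith(String driverClassName, String url, Properties props)
            throws SQLException, ClassNotFoundException {
        // Loading the class lets the driver register itself, as JDBC drivers typically do.
        Class.forName(driverClassName);
        // Driver.connect returns null if this driver does not accept the URL.
        return findRegisteredDriver(driverClassName).connect(url, props);
    }

    // No driver named: resolve the class name once, so the same name can be
    // handed to every JVM that later opens a connection.
    static String resolveDriverClass(String url) throws SQLException {
        return DriverManager.getDriver(url).getClass().getName();
    }
}
```

Calling `connect` on the selected `Driver` instance bypasses `DriverManager.getConnection()`, so a different driver that accepts the same subprotocol cannot be chosen instead.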
This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10519 from JoshRosen/jdbc-driver-precedence.
This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server.
Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is by default on, no JDBC connection can modify global configurations like the newly added one.
Author: Cheng Lian <lian@databricks.com>
Closes #9740 from liancheng/spark-11089.single-session-option.
This fix is to add one line to explain the current behavior of Spark SQL when writing Parquet files. All columns are forced to be nullable for compatibility reasons.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #9314 from gatorsmile/lossNull.
It doesn't show up as a hyperlink currently. It will show up as a hyperlink after this change.
Author: Rohit Agarwal <mindprince@gmail.com>
Closes #9544 from mindprince/patch-2.
Doc change to align with HiveConf default in terms of where to create `warehouse` directory.
Author: xin Wu <xinwu@us.ibm.com>
Closes #9365 from xwu0226/spark-10046-commit.
This snippet seems to be mistakenly introduced at two places in #5348.
Author: Rohit Agarwal <mindprince@gmail.com>
Closes #9540 from mindprince/patch-1.
The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9269 from JoshRosen/SPARK-11299.
Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs.
Follow-on to https://github.com/apache/spark/pull/8385
CC nssalian
Author: Sean Owen <sowen@cloudera.com>
Closes #8968 from srowen/SPARK-9570.
It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`.
This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong.
/CC yhuai
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #8776 from sarutak/SPARK-10584-2.
The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1.
Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #8739 from sarutak/SPARK-10584.
https://issues.apache.org/jira/browse/SPARK-10287
After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet).
Author: Yin Huai <yhuai@databricks.com>
Closes #8469 from yhuai/jsonRefresh.
spark.sql.tungsten.enabled will be the default value for both codegen and unsafe; the two separate settings are kept internally for debugging and testing.
cc marmbrus rxin
Author: Davies Liu <davies@databricks.com>
Closes #7998 from davies/tungsten and squashes the following commits:
c1c16da [Davies Liu] update doc
1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe
(cherry picked from commit 4e70e8256c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
#7142 made codegen enabled by default so let's modify the corresponding documents.
Closes #7142
Author: KaiXinXiaoLei <huleilei1@huawei.com>
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #7863 from sarutak/SPARK-9535 and squashes the following commits:
0884424 [Kousuke Saruta] Removed a line which mentioned about the effect of codegen enabled
3c11af0 [Kousuke Saruta] Merge branch 'sqlconfig' of https://github.com/KaiXinXiaoLei/spark into SPARK-9535
4ee531d [KaiXinXiaoLei] delete space
4cfd11d [KaiXinXiaoLei] change spark.sql.planner.externalSort
d624cf8 [KaiXinXiaoLei] sql config is wrong
Use print(x) not print x for Python 3 in eval examples
CC sethah mengxr -- just wanted to close this out before 1.5
Author: Sean Owen <sowen@cloudera.com>
Closes #7822 from srowen/SPARK-9490 and squashes the following commits:
01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too
bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples
PARQUET-136 and PARQUET-173 have been fixed in parquet-mr 1.7.0. It's time to enable filter push-down by default now.
Author: Cheng Lian <lian@databricks.com>
Closes #7612 from liancheng/spark-9207 and squashes the following commits:
77e6b5e [Cheng Lian] Enables Parquet filter push-down by default
…ng-guide#Manually Specifying Options to be in sync with java,python, R version
Author: Alok Singh <singhal@us.ibm.com>
Closes #7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits:
d3c20ba [Alok Singh] fix the file to .parquet from .json
d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version
Author: Sun Rui <rui.sun@intel.com>
Closes #7287 from sun-rui/SPARK-8894 and squashes the following commits:
da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.
Fixed comment given by rxin
Author: Tijo Thomas <tijoparacka@gmail.com>
Closes #7281 from tijoparacka/modification_for_python_style and squashes the following commits:
6334e21 [Tijo Thomas] removed space
3de4cd8 [Tijo Thomas] python Style update
Modified the deprecated jdbc api in the documentation.
Author: Tijo Thomas <tijoparacka@gmail.com>
Closes #7039 from tijoparacka/JIRA_8615 and squashes the following commits:
6e73b8a [Tijo Thomas] Reverted new lines
4042fcf [Tijo Thomas] updated to sql documentation
a27949c [Tijo Thomas] Fixed Sample deprecated code
This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0.
Author: Cheng Lian <lian@databricks.com>
Closes #6683 from liancheng/output-committer-docs and squashes the following commits:
b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
This fixes various minor documentation issues on the Spark SQL page
Author: Lars Francke <lars.francke@gmail.com>
Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:
dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
34eff2c [Lars Francke] Minor documentation fixes
1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`.
2. Verify the value when setting a configuration if this is in SQLConf.
3. Use `SET -v` to display all public configurations.
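The three points above can be pictured with a small stand-alone sketch. It is not Spark's `SQLConfEntry` or `SQLConf`: the class `ConfSketch`, its nested `Entry` type, the method names, and the key `example.partitions` are all hypothetical, chosen only to show a typed entry with a default, validation on set, and a listing of all registered entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ConfSketch {
    // Hypothetical typed entry: key, default value, doc string, and a parser that rejects bad input.
    static final class Entry<T> {
        final String key;
        final T defaultValue;
        final String doc;
        final Function<String, T> parser;

        Entry(String key, T defaultValue, String doc, Function<String, T> parser) {
            this.key = key;
            this.defaultValue = defaultValue;
            this.doc = doc;
            this.parser = parser;
        }
    }

    private final Map<String, Entry<?>> registry = new LinkedHashMap<>();
    private final Map<String, String> settings = new LinkedHashMap<>();

    void register(Entry<?> entry) {
        registry.put(entry.key, entry);
    }

    // Validates the value against the registered entry (if any) before storing it.
    void set(String key, String value) {
        Entry<?> entry = registry.get(key);
        if (entry != null) {
            entry.parser.apply(value); // throws, e.g. NumberFormatException, on invalid input
        }
        settings.put(key, value);
    }

    // Returns the parsed value, or the entry's default when the key was never set.
    <T> T get(Entry<T> entry) {
        String raw = settings.get(entry.key);
        return raw == null ? entry.defaultValue : entry.parser.apply(raw);
    }

    // Lists every registered entry with its default and doc string.
    String describeAll() {
        StringBuilder out = new StringBuilder();
        for (Entry<?> e : registry.values()) {
            out.append(e.key).append('\t').append(e.defaultValue).append('\t').append(e.doc).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ConfSketch conf = new ConfSketch();
        Entry<Integer> partitions =
            new Entry<>("example.partitions", 200, "Hypothetical partition count.", Integer::parseInt);
        conf.register(partitions);
        conf.set("example.partitions", "8");
        System.out.println(conf.get(partitions)); // prints "8"
        System.out.print(conf.describeAll());
    }
}
```

Because validation happens in `set`, a bad value is rejected at the point where it is supplied and the previously stored value stays in effect.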
Author: zsxwing <zsxwing@gmail.com>
Closes #6747 from zsxwing/sqlconf and squashes the following commits:
7d09bad [zsxwing] Use SQLConfEntry in HiveContext
49f6213 [zsxwing] Add getConf, setConf to SQLContext and HiveContext
e014f53 [zsxwing] Merge branch 'master' into sqlconf
93dad8e [zsxwing] Fix the unit tests
cf950c1 [zsxwing] Fix the code style and tests
3c5f03e [zsxwing] Add unsetConf(SQLConfEntry) and fix the code style
a2f4add [zsxwing] getConf will return the default value if a config is not set
037b1db [zsxwing] Add schema to SetCommand
0520c3c [zsxwing] Merge branch 'master' into sqlconf
7afb0ec [zsxwing] Fix the configurations about HiveThriftServer
7e728e3 [zsxwing] Add doc for SQLConfEntry and fix 'toString'
5e95b10 [zsxwing] Add enumConf
c6ba76d [zsxwing] setRawString => setConfString, getRawString => getConfString
4abd807 [zsxwing] Fix the test for 'set -v'
6e47e56 [zsxwing] Fix the compilation error
8973ced [zsxwing] Remove floatConf
1fc3a8b [zsxwing] Remove the 'conf' command and use 'set -v' instead
99c9c16 [zsxwing] Fix tests that use SQLConfEntry as a string
88a03cc [zsxwing] Add new lines between confs and return types
ce7c6c8 [zsxwing] Remove seqConf
f3c1b33 [zsxwing] Refactor SQLConf to display better error message
Typo in thriftserver section
Author: Moussa Taifi <moutai10@gmail.com>
Closes #6847 from moutai/patch-1 and squashes the following commits:
1bd29df [Moussa Taifi] Update sql-programming-guide.md
Author: Peter Hoffmann <ph@peter-hoffmann.com>
Closes #6815 from hoffmann/patch-1 and squashes the following commits:
2abb6da [Peter Hoffmann] fix read/write mixup
Author: Cheng Lian <lian@databricks.com>
Closes #6749 from liancheng/java-sample-fix and squashes the following commits:
5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide
JIRA: https://issues.apache.org/jira/browse/SPARK-7939
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #6503 from viirya/disable_partition_type_inference and squashes the following commits:
3e90470 [Liang-Chi Hsieh] Default to enable type inference and update docs.
455edb1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into disable_partition_type_inference
9a57933 [Liang-Chi Hsieh] Add conf to enable/disable partition column type inference.
Add documentation for spark.sql.planner.externalSort
Author: Luca Martinetti <luca@luca.io>
Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:
985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort
Author: Reynold Xin <rxin@databricks.com>
Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:
c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.
Author: Cheng Lian <lian@databricks.com>
Closes #6520 from liancheng/spark-7849 and squashes the following commits:
705264b [Cheng Lian] Updates SQL programming guide for 1.4
This PR adds a new SparkR programming guide at the top-level. This will be useful for R users as our APIs don't directly match the Scala/Python APIs and as we need to explain SparkR without using RDDs as examples etc.
cc rxin davies pwendell
cc cafreeman -- Would be great if you could also take a look at this!
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6490 from shivaram/sparkr-guide and squashes the following commits:
d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries
408dce5 [Shivaram Venkataraman] Fix link
dbb86e3 [Shivaram Venkataraman] Fix minor typo
9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example
d09703c [Shivaram Venkataraman] Fix default argument in read.df
ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better
This contribution is my original work and I license the work to the project under the project's open source license
Author: Matt Wise <mwise@quixey.com>
Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits:
e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
sqlCtx -> sqlContext
You can check the docs by:
```
$ cd docs
$ SKIP_SCALADOC=1 jekyll serve
```
cc shivaram
Author: Davies Liu <davies@databricks.com>
Closes #5442 from davies/r_docs and squashes the following commits:
7a12ec6 [Davies Liu] remove rdd in R docs
8496b26 [Davies Liu] remove the docs related to RDD
e23b9d6 [Davies Liu] delete R docs for RDD API
222e4ff [Davies Liu] Merge branch 'master' into r_docs
89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
f0a10e1 [Davies Liu] address comments from @shivaram
f61de71 [Davies Liu] Update pairRDD.R
3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
2f10a77 [Davies Liu] address comments from @cafreeman
9c2a062 [Davies Liu] mention R api together with Python API
23f751a [Davies Liu] Fill in SparkR examples in programming guide
add docs for https://issues.apache.org/jira/browse/SPARK-6994
Author: vidmantas zemleris <vidmantas@vinted.com>
Closes #6030 from vidma/docs/row-with-named-fields and squashes the following commits:
241b401 [vidmantas zemleris] [SPARK-6994][SQL] Update docs for fetching Row fields by name
Author: Reynold Xin <rxin@databricks.com>
Closes #6062 from rxin/agg-retain-doc and squashes the following commits:
43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
JIRA: https://issues.apache.org/jira/browse/SPARK-7516
In sql-programming-guide, deprecated python data frame api inferSchema() should be replaced by createDataFrame():
schemaPeople = sqlContext.inferSchema(people) ->
schemaPeople = sqlContext.createDataFrame(people)
Author: gchen <chenguancheng@gmail.com>
Closes #6041 from gchen/python-docs and squashes the following commits:
c27eb7c [gchen] replace inferSchema() with createDataFrame()
fix typo
Author: Ken Geis <geis.ken@gmail.com>
Closes #5674 from kgeis/patch-1 and squashes the following commits:
5ae67de [Ken Geis] Update sql-programming-guide.md
This patch fixes the Java examples for Spark SQL that
programmatically define a schema and map Rows.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes #5569 from ogirardot/branch-1.3 and squashes the following commits:
c29e58d [Olivier Girardot] SPARK-6992 : Fix documentation example for Spark SQL on StructType
(cherry picked from commit c9b1ba4b16a7afe93d45bf75b128cc0dd287ded0)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This patch includes:
* adding how to use map after an sql query using javaRDD
* fixing the first few java examples that were written in Scala
Thank you for your time,
Olivier.
Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
Closes #5564 from ogirardot/branch-1.3 and squashes the following commits:
9f8d60e [Olivier Girardot] SPARK-6988 : Fix documentation regarding DataFrames using the Java API
(cherry picked from commit 6b528dc139da594ef2e651d84bd91fe0f738a39d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
https://issues.apache.org/jira/browse/SPARK-6863
Author: Santiago M. Mola <santiago.mola@sap.com>
Closes #5472 from smola/fix/sql-docs and squashes the following commits:
42503d4 [Santiago M. Mola] [SPARK-6863] Fix formatting on SQL programming guide.
Use `sqlContext` in PySpark shell, make it consistent with SQL programming guide. `sqlCtx` is also kept for compatibility.
Author: Davies Liu <davies@databricks.com>
Closes #5425 from davies/sqlCtx and squashes the following commits:
af67340 [Davies Liu] sqlCtx -> sqlContext
15a278f [Davies Liu] use sqlContext in python shell
Author: Michael Armbrust <michael@databricks.com>
Closes #5192 from marmbrus/fixJDBCDocs and squashes the following commits:
b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example
Needed to import the types specifically, not the more general pyspark.sql
Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Author: anabranch <wac.chambers@gmail.com>
Closes #5179 from anabranch/master and squashes the following commits:
8fa67bf [anabranch] Corrected SqlContext Import
603b080 [Bill Chambers] [DOCUMENTATION]Fixed Missing Type Import in Documentation
Author: vinodkc <vinod.kc.in@gmail.com>
Closes #5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits:
2c6aee6 [vinodkc] Spark 1.3 doc fixes
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>
Closes #5120 from kamilsmuga/master and squashes the following commits:
fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
Author: Tijo Thomas <tijoparacka@gmail.com>
Closes #5068 from tijoparacka/fix_sql_dataframe_example and squashes the following commits:
6953ac1 [Tijo Thomas] Handled Java and Python example sections
0751a74 [Tijo Thomas] Fixed compiler and errors in Dataframe examples
Also fixed a bunch of minor styling issues.
Author: Cheng Lian <lian@databricks.com>
Closes #5001 from liancheng/parquet-doc and squashes the following commits:
89ad3db [Cheng Lian] Addresses @rxin's comments
7eb6955 [Cheng Lian] Docs for the new Parquet data source
415eefb [Cheng Lian] Some minor formatting improvements
The `toDF()` function is missing in docs/sql-programming-guide.md
Author: zzcclp <xm_zzc@sina.com>
Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits:
9a96c7b [zzcclp] Miss toDF()
Author: Michael Armbrust <michael@databricks.com>
Closes #4958 from marmbrus/sqlDocs and squashes the following commits:
9351dbc [Michael Armbrust] fix parquet example
6877e13 [Michael Armbrust] add sql examples
d81b7e7 [Michael Armbrust] rxins comments
e393528 [Michael Armbrust] fix order
19c2735 [Michael Armbrust] more on data source load/store
00d5914 [Michael Armbrust] Update SQL Docs with JDBC and Migration Guide
Author: Reynold Xin <rxin@databricks.com>
Closes #4954 from rxin/df-docs and squashes the following commits:
c592c70 [Reynold Xin] [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #4656 from CodingCat/fix_typo and squashes the following commits:
b41d15c [CodingCat] recover
689fe46 [CodingCat] fix typo
Updated examples using the new api and added DataFrame concept
Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>
Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
8d5376a [Antonio Navarro Perez] fixed typo
8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which can take an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
Author: Davies Liu <davies@databricks.com>
Closes #4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
Trivial fix.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #4400 from adrian-wang/docdate and squashes the following commits:
31bbe40 [Daoyuan Wang] doc fix for date
- Add meta description tags on some of the most important doc pages
- Shorten the titles of some pages to have more relevant keywords; for
example there's no reason to have "Spark SQL Programming Guide - Spark
1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
documentation".
Author: Matei Zaharia <matei@databricks.com>
Closes #4381 from mateiz/docs-seo and squashes the following commits:
4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3820 from adrian-wang/parquettimestamp and squashes the following commits:
b1e2a0d [Daoyuan Wang] fix for nanos
4dadef1 [Daoyuan Wang] fix wrong read
93f438d [Daoyuan Wang] parquet timestamp support
Follow up of #3925
/cc rxin
Author: scwf <wangfei1@huawei.com>
Closes #4095 from scwf/sql-doc and squashes the following commits:
97e311b [scwf] update sql doc since now expose only one version of the data type APIs
`CACHE TABLE tbl` is now __eager__ by default not __lazy__
Author: luogankun <luogankun@gmail.com>
Closes #3773 from luogankun/SPARK-4930 and squashes the following commits:
cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ...
bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager
* This commit hopes to avoid the confusion I faced when trying
to submit a regular, valid multi-line JSON file, also see
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
Author: Peter Vandenabeele <peter@vandenabeele.com>
Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:
1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
Add HTTP protocol support and test cases to the Spark Thrift server, so users can deploy the Thrift server in either TCP or HTTP mode.
Author: Judy Nash <judynash@microsoft.com>
Author: judynash <judynash@microsoft.com>
Closes #3672 from judynash/master and squashes the following commits:
526315d [Judy Nash] correct spacing on startThriftServer method
31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue
47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length
2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide
1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master'
2b1d312 [Judy Nash] updated http thrift server support based on feedback
377532c [judynash] add HTTP protocol spark thrift server
Author: Andy Konwinski <andykonwinski@gmail.com>
Closes #3611 from andyk/patch-3 and squashes the following commits:
7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3535 from adrian-wang/datedoc and squashes the following commits:
18ff1ed [Daoyuan Wang] [DOC] Date type