Commit graph

147 commits

Author SHA1 Message Date
Felix Cheung 79aa1d82ca [SQL][DOC] SQL programming guide add deprecated methods in 2.0.0
## What changes were proposed in this pull request?

Doc changes

## How was this patch tested?

manual

liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13827 from felixcheung/sqldocdeprecate.
2016-06-22 10:37:13 +08:00
Takeshi YAMAMURO 41e0ffb19f [SPARK-15894][SQL][DOC] Update docs for controlling #partitions
## What changes were proposed in this pull request?
Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes ` in Other Configuration Options.

## How was this patch tested?
N/A

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.
2016-06-21 14:27:16 +08:00
Felix Cheung 58f6e27dd7 [SPARK-15863][SQL][DOC][SPARKR] sql programming guide updates to include sparkSession in R
## What changes were proposed in this pull request?

Update doc as per discussion in PR #13592

## How was this patch tested?

manual

shivaram liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13799 from felixcheung/rsqlprogrammingguide.
2016-06-21 13:56:37 +08:00
Cheng Lian 6df8e38860 [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0
## What changes were proposed in this pull request?

Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete.

We may also want to add more examples for Scala/Java Dataset typed transformations.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes #13592 from liancheng/sql-programming-guide-2.0.
2016-06-20 14:50:28 -07:00
Mortada Mehyar 675a73715d [DOCUMENTATION] fixed groupby aggregation example for pyspark
## What changes were proposed in this pull request?

fixing documentation for the groupby/agg example in python

## How was this patch tested?

the existing example in the documentation dose not contain valid syntax (missing parenthesis) and is not using `Column` in the expression for `agg()`

after the fix here's how I tested it:

```
In [1]: from pyspark.sql import Row

In [2]: import pyspark.sql.functions as func

In [3]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:records = [{'age': 19, 'department': 1, 'expense': 100},
: {'age': 20, 'department': 1, 'expense': 200},
: {'age': 21, 'department': 2, 'expense': 300},
: {'age': 22, 'department': 2, 'expense': 300},
: {'age': 23, 'department': 3, 'expense': 300}]
:--

In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records])

In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show()

+----------+----------+--------+------------+
|department|department|max(age)|sum(expense)|
+----------+----------+--------+------------+
|         1|         1|      20|         300|
|         2|         2|      22|         600|
|         3|         3|      23|         300|
+----------+----------+--------+------------+

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #13587 from mortada/groupby_agg_doc_fix.
2016-06-10 00:23:34 -07:00
gatorsmile 6cb8f836da [SPARK-15396][SQL][DOC] It can't connect hive metastore database
#### What changes were proposed in this pull request?
The `hive.metastore.warehouse.dir` property in hive-site.xml is deprecated since Spark 2.0.0. Users might not be able to connect to the existing metastore if they do not use the new conf parameter `spark.sql.warehouse.dir`.

This PR is to update the document and example for explaining the latest changes in the configuration of default location of database.

Below is the screenshot of the latest generated docs:

<img width="681" alt="screenshot 2016-05-20 08 38 10" src="https://cloud.githubusercontent.com/assets/11567269/15433296/a05c4ace-1e66-11e6-8d2b-73682b32e9c2.png">

<img width="789" alt="screenshot 2016-05-20 08 53 26" src="https://cloud.githubusercontent.com/assets/11567269/15433734/645dc42e-1e68-11e6-9476-effc9f8721bb.png">

<img width="789" alt="screenshot 2016-05-20 08 53 37" src="https://cloud.githubusercontent.com/assets/11567269/15433738/68569f92-1e68-11e6-83d3-ef5bb221a8d8.png">

No change is made in the R's example.

<img width="860" alt="screenshot 2016-05-20 08 54 38" src="https://cloud.githubusercontent.com/assets/11567269/15433779/965b8312-1e68-11e6-8bc4-53c88ceacde2.png">

#### How was this patch tested?
N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes #13225 from gatorsmile/document.
2016-05-21 23:12:27 -07:00
Sean Zhong 25b315e6ca [SPARK-15171][SQL] Remove the references to deprecated method dataset.registerTempTable
## What changes were proposed in this pull request?

Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`.

## How was this patch tested?

This PR only changes the unit test code, examples, and comments. It should be safe.
This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged.

Author: Sean Zhong <seanzhong@databricks.com>

Closes #13098 from clockfly/spark-15171-remove-deprecation.
2016-05-18 09:01:59 +08:00
Sun Rui 4ae9fe091c [SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR.
## What changes were proposed in this pull request?

dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame.

The function signature is:

	dapply(df, function(localDF) {}, schema = NULL)

R function input: local data.frame from the partition on local node
R function output: local data.frame

Schema specifies the Row format of the resulting DataFrame. It must match the R function's output.
If schema is not specified, each partition of the result DataFrame will be serialized in R into a single byte array. Such resulting DataFrame can be processed by successive calls to dapply().

## How was this patch tested?
SparkR unit tests.

Author: Sun Rui <rui.sun@intel.com>
Author: Sun Rui <sunrui2016@gmail.com>

Closes #12493 from sun-rui/SPARK-12919.
2016-04-29 16:41:07 -07:00
Dongjoon Hyun 6ab4d9e0c7 [SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date
## What changes were proposed in this pull request?

This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.

- Remove the wrong usage of `map`. We need to use `lapply` in `sparkR` if needed. However, `lapply` is private so far. The corrected example will be added later.
- Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
- Fix datatypes in `sparkr.md`.
- Update a data result in `sparkr.md`.
- Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet
- Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet
- Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
- Other minor syntax fixes and a typo.

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12649 from dongjoon-hyun/SPARK-14883.
2016-04-24 22:10:27 -07:00
Mark Grover ff9ae61a3b [SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly
## What changes were proposed in this pull request?

Removing references to assembly jar in documentation.
Adding an additional (previously undocumented) usage of spark-submit to run examples.

## How was this patch tested?

Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit.

Author: Mark Grover <mark@apache.org>

Closes #12365 from markgrover/spark-14601.
2016-04-14 18:51:43 -07:00
Dongjoon Hyun 1a0cca1fc8 [MINOR][DOCS] Fix wrong data types in JSON Datasets example.
## What changes were proposed in this pull request?

This PR fixes the `age` data types from `integer` to `long` in `SQL Programming Guide: JSON Datasets`.

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12290 from dongjoon-hyun/minor_fix_type_in_json_example.
2016-04-11 09:03:11 +01:00
Reynold Xin 9ca0760d67 [SPARK-10063][SQL] Remove DirectParquetOutputCommitter
## What changes were proposed in this pull request?
This patch removes DirectParquetOutputCommitter. This was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures. If there are multiple attempts of the same task (e.g. speculation or task failures or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue.

## How was this patch tested?
Removed the related tests also.

Author: Reynold Xin <rxin@databricks.com>

Closes #12229 from rxin/SPARK-10063.
2016-04-07 00:51:45 -07:00
Marcelo Vanzin 24d7d2e453 [SPARK-13579][BUILD] Stop building the main Spark assembly.
This change modifies the "assembly/" module to just copy needed
dependencies to its build directory, and modifies the packaging
script to pick those up (and remove duplicate jars packages in the
examples module).

I also made some minor adjustments to dependencies to remove some
test jars from the final packaging, and remove jars that conflict with each
other when packaged separately (e.g. servlet api).

Also note that this change restores guava in applications' classpaths, even
though it's still shaded inside Spark. This is now needed for the Hadoop
libraries that are packaged with Spark, which now are not processed by
the shade plugin.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11796 from vanzin/SPARK-13579.
2016-04-04 16:52:22 -07:00
Daoyuan Wang d1c193a2f1 [SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test
## What changes were proposed in this pull request?

Since developer API of plug-able parser has been removed in #10801 , docs should be updated accordingly.

## How was this patch tested?

This patch will not affect the real code path.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #11758 from adrian-wang/spark12855.
2016-03-16 22:52:10 -07:00
Dongjoon Hyun 4ce2d24e2a [SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x
## What changes were proposed in this pull request?

`Shark` was merged into `Spark SQL` since [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The followings seem to be the only legacy. For Spark 2.x, we had better clean up those docs.

**Migration Guide**
```
- ## Migration Guide for Shark Users
- ...
- ### Scheduling
- ...
- ### Reducer number
- ...
- ### Caching
```

## How was this patch tested?

Pass the Jenkins test.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11770 from dongjoon-hyun/SPARK-13942.
2016-03-16 15:50:24 -07:00
Dongjoon Hyun c3689bc24e [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code.
## What changes were proposed in this pull request?

In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing codes like the followings by using `diamond` operator.

```
-    final ArrayList<Product2<Object, Object>> dataToWrite =
-      new ArrayList<Product2<Object, Object>>();
+    final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
```

Java 7 or higher supports **diamond** operator which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code use mixed usage of this.

## How was this patch tested?

Manual.
Pass the existing tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11541 from dongjoon-hyun/SPARK-13702.
2016-03-09 10:31:26 +00:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was the this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
Sasaki Toru c2f21d8898 [SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template
In spark-env.sh.template, there are multi-byte characters, this PR will remove it.

Author: Sasaki Toru <sasakitoa@nttdata.co.jp>

Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
2016-02-11 09:30:36 +00:00
Sebastián Ramírez c882ec57de [SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation
Update JDBC documentation based on http://stackoverflow.com/a/30947090/219530 as SPARK_CLASSPATH is deprecated.

Also, that's how it worked, it didn't work with the SPARK_CLASSPATH or the --jars alone.

This would solve issue: https://issues.apache.org/jira/browse/SPARK-13040

Author: Sebastián Ramírez <tiangolo@gmail.com>

Closes #10948 from tiangolo/patch-docs-jdbc.
2016-02-09 08:49:34 +00:00
Takeshi YAMAMURO da9146c91a [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md
ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #10901 from maropu/DocFix.
2016-02-01 12:02:06 -08:00
Sun Rui 1b2a918e59 [SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR.
Author: Sun Rui <rui.sun@intel.com>

Closes #10201 from sun-rui/SPARK-12204.
2016-01-20 21:08:15 -08:00
Brandon Bradley a767ee8a05 [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting
Warning users about casting changes.

Author: Brandon Bradley <bradleytastic@gmail.com>

Closes #10708 from blbradley/spark-12758.
2016-01-11 14:21:50 -08:00
Josh Rosen 6c83d938cc [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.

In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with the our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.

This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).

If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).

This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10519 from JoshRosen/jdbc-driver-precedence.
2016-01-04 10:39:42 -08:00
Yin Huai ac8cdf1cdc [SPARK-11678][SQL][DOCS] Document basePath in the programming guide.
This PR adds document for `basePath`, which is a new parameter used by `HadoopFsRelation`.

The compiled doc is shown below.
![image](https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png)

JIRA: https://issues.apache.org/jira/browse/SPARK-11678

Author: Yin Huai <yhuai@databricks.com>

Closes #10211 from yhuai/basePathDoc.
2015-12-09 18:09:36 -08:00
Michael Armbrust 3959489423 [SPARK-12069][SQL] Update documentation with Datasets
Author: Michael Armbrust <michael@databricks.com>

Closes #10060 from marmbrus/docs.
2015-12-08 15:58:35 -08:00
woj-i 6a8cf80cc8 [SPARK-11821] Propagate Kerberos keytab for all environments
andrewor14 the same PR as in branch 1.5
harishreedharan

Author: woj-i <wojciechindyk@gmail.com>

Closes #9859 from woj-i/master.
2015-12-01 11:05:45 -08:00
Stephen Samuel 026ea2eab1 Updated sql programming guide to include jdbc fetch size
Author: Stephen Samuel <sam@sksamuel.com>

Closes #9377 from sksamuel/master.
2015-11-23 19:52:12 -08:00
Cheng Lian 7b1407c7b9 [SPARK-11089][SQL] Adds option for disabling multi-session in Thrift server
This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server.

Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is by default on, no JDBC connection can modify global configurations like the newly added one.

Author: Cheng Lian <lian@databricks.com>

Closes #9740 from liancheng/spark-11089.single-session-option.
2015-11-17 11:17:52 -08:00
gatorsmile 2f38378856 [SPARK-11360][DOC] Loss of nullability when writing parquet files
This fix is to add one line to explain the current behavior of Spark SQL when writing Parquet files. All columns are forced to be nullable for compatibility reasons.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #9314 from gatorsmile/lossNull.
2015-11-09 16:06:48 -08:00
Rohit Agarwal b541b31630 [DOC][MINOR][SQL] Fix internal link
It doesn't show up as a hyperlink currently. It will show up as a hyperlink after this change.

Author: Rohit Agarwal <mindprince@gmail.com>

Closes #9544 from mindprince/patch-2.
2015-11-09 13:28:00 +01:00
xin Wu 26739059bc [SPARK-10046][SQL] Hive warehouse dir not set in current directory when not …
Doc change to align with HiveConf default in terms of where to create `warehouse` directory.

Author: xin Wu <xinwu@us.ibm.com>

Closes #9365 from xwu0226/spark-10046-commit.
2015-11-08 12:28:19 -08:00
Rohit Agarwal 5c4e6d7ec9 [DOC][SQL] Remove redundant out-of-place python snippet
This snippet seems to be mistakenly introduced at two places in #5348.

Author: Rohit Agarwal <mindprince@gmail.com>

Closes #9540 from mindprince/patch-1.
2015-11-08 14:24:26 +00:00
Wenchen Fan e0fc9c7e59 [SPARK-11197][SQL] add doc for run SQL on files directly
Author: Wenchen Fan <wenchen@databricks.com>

Closes #9467 from cloud-fan/doc.
2015-11-04 09:33:30 -08:00
lewuathe d648a4ad54 [DOC] Missing link to R DataFrame API doc
Author: lewuathe <lewuathe@me.com>
Author: Lewuathe <lewuathe@me.com>

Closes #9394 from Lewuathe/missing-link-to-R-dataframe.
2015-11-03 16:38:22 -08:00
Josh Rosen b67dc6a434 [SPARK-11299][DOC] Fix link to Scala DataFrame Functions reference
The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9269 from JoshRosen/SPARK-11299.
2015-10-25 10:31:44 +01:00
Sean Owen 82bbc2a5f2 [SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'.
Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs.
Follow-on to https://github.com/apache/spark/pull/8385
CC nssalian

Author: Sean Owen <sowen@cloudera.com>

Closes #8968 from srowen/SPARK-9570.
2015-10-04 09:31:52 +01:00
Jacek Laskowski ca9fe540fe [SPARK-10662] [DOCS] Code snippets are not properly formatted in tables
* Backticks are processed properly in Spark Properties table
* Removed unnecessary spaces
* See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8795 from jaceklaskowski/docs-yarn-formatting.
2015-09-21 19:46:39 +01:00
Josh Rosen 2117eea71e [SPARK-10710] Remove ability to disable spilling in core and SQL
It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`.

This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
2015-09-19 21:40:21 -07:00
Kousuke Saruta d507f9c0b7 [SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong.
In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong.

/CC yhuai

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #8776 from sarutak/SPARK-10584-2.
2015-09-19 01:59:36 -07:00
Kousuke Saruta cf2821ef5f [SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong.
The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1.
Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #8739 from sarutak/SPARK-10584.
2015-09-14 12:06:23 -07:00
GuoQiang Li 5369be8068 [SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide
Author: GuoQiang Li <witgo@qq.com>

Closes #8520 from witgo/SPARK-10350.
2015-08-29 13:20:27 -07:00
Yin Huai b3dd569ad4 [SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path
https://issues.apache.org/jira/browse/SPARK-10287

After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet).

Author: Yin Huai <yhuai@databricks.com>

Closes #8469 from yhuai/jsonRefresh.
2015-08-27 16:11:25 -07:00
Michael Armbrust dc86a227e4 [SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide
Author: Michael Armbrust <michael@databricks.com>

Closes #8441 from marmbrus/documentation.
2015-08-27 11:45:15 -07:00
Cheng Lian 0fac144f6b [SPARK-9424] [SQL] Parquet programming guide updates for 1.5
Author: Cheng Lian <lian@databricks.com>

Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.
2015-08-26 18:14:54 -07:00
Bill Chambers b23c4d3ffc Fix Broken Link
Link was broken because it included tick marks.

Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #8302 from anabranch/patch-1.
2015-08-19 00:05:01 -07:00
Davies Liu 17284db314 [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe
spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing.

cc marmbrus rxin

Author: Davies Liu <davies@databricks.com>

Closes #7998 from davies/tungsten and squashes the following commits:

c1c16da [Davies Liu] update doc
1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe

(cherry picked from commit 4e70e8256c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-08-06 19:42:02 -07:00
Davies Liu 49b1504fe3 Revert "[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe"
This reverts commit 4e70e8256c.
2015-08-06 17:36:12 -07:00
Davies Liu 4e70e8256c [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe
spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing.

cc marmbrus rxin

Author: Davies Liu <davies@databricks.com>

Closes #7998 from davies/tungsten and squashes the following commits:

c1c16da [Davies Liu] update doc
1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe
2015-08-06 17:30:31 -07:00
KaiXinXiaoLei 536d2adc12 [SPARK-9535][SQL][DOCS] Modify document for codegen.
#7142 made codegen enabled by default so let's modify the corresponding documents.

Closes #7142

Author: KaiXinXiaoLei <huleilei1@huawei.com>
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #7863 from sarutak/SPARK-9535 and squashes the following commits:

0884424 [Kousuke Saruta] Removed a line which mentioned about the effect of codegen enabled
3c11af0 [Kousuke Saruta] Merge branch 'sqlconfig' of https://github.com/KaiXinXiaoLei/spark into SPARK-9535
4ee531d [KaiXinXiaoLei] delete space
4cfd11d [KaiXinXiaoLei] change spark.sql.planner.externalSort
d624cf8 [KaiXinXiaoLei] sql config is wrong
2015-08-02 20:04:21 -07:00
Sean Owen 873ab0f969 [SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement
Use print(x) not print x for Python 3 in eval examples
CC sethah mengxr -- just wanted to close this out before 1.5

Author: Sean Owen <sowen@cloudera.com>

Closes #7822 from srowen/SPARK-9490 and squashes the following commits:

01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too
bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples
2015-07-31 13:45:28 -07:00