11388 commits

Author SHA1 Message Date
Davies Liu 292ee1a994 [SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala
Add schema()/format()/options() for the reader, and mode()/format()/options()/partitionBy() for the writer.

cc rxin yhuai  pwendell

Author: Davies Liu <davies@databricks.com>

Closes #6578 from davies/readwrite and squashes the following commits:

720d293 [Davies Liu] address comments
b65dfa2 [Davies Liu] Update readwriter.py
1299ab6 [Davies Liu] make Python API consistent with Scala

(cherry picked from commit 445647a1a3)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
2015-06-02 08:37:26 -07:00
Yin Huai 87941ff8c4 [SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid collapsing nondeterministic projects.
This closes #6570.

Author: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>

Closes #6573 from rxin/deterministic and squashes the following commits:

356cd22 [Reynold Xin] Added unit test for the optimizer.
da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
da56200 [Yin Huai] Comments.
e38f264 [Yin Huai] Comment.
f9d6a73 [Yin Huai] Add a deterministic method to Expression.

(cherry picked from commit 0f80990bfa)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/random.scala
2015-06-02 00:21:27 -07:00
Yin Huai 4940630f56 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early
https://issues.apache.org/jira/browse/SPARK-8020

Author: Yin Huai <yhuai@databricks.com>

Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits:

0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.

(cherry picked from commit 7b7f7b6c6f)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-02 00:17:09 -07:00
Davies Liu 9d6475b93d [SPARK-6917] [SQL] DecimalType is not read back when non-native type exists
cc yhuai

Author: Davies Liu <davies@databricks.com>

Closes #6558 from davies/decimalType and squashes the following commits:

c877ca8 [Davies Liu] Update ParquetConverter.scala
48cc57c [Davies Liu] Update ParquetConverter.scala
b43845c [Davies Liu] add test
3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists

(cherry picked from commit bcb47ad771)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 23:12:37 -07:00
Xiangrui Meng 011b07e238 [SPARK-7582] [MLLIB] user guide for StringIndexer
This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:

4bba4f1 [Xiangrui Meng] fix example
ba1cd1b [Xiangrui Meng] fix style
7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer

(cherry picked from commit 0221c7f0ef)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-01 22:03:37 -07:00
Reynold Xin 575f3b3aa6 Fixed typo in the previous commit.
(cherry picked from commit b53a011647)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:42:22 -07:00
Yin Huai e6d58955c3 [SPARK-7965] [SPARK-7972] [SQL] Handle expressions containing multiple window expressions and make parser match window frames in case insensitive way
JIRAs:
https://issues.apache.org/jira/browse/SPARK-7965
https://issues.apache.org/jira/browse/SPARK-7972

Author: Yin Huai <yhuai@databricks.com>

Closes #6524 from yhuai/7965-7972 and squashes the following commits:

c12c79c [Yin Huai] Add doc for returned value.
de64328 [Yin Huai] Address rxin's comments.
fc9b1ad [Yin Huai] wip
2996da4 [Yin Huai] scala style
20b65b7 [Yin Huai] Handle expressions containing multiple window expressions.
9568b21 [Yin Huai] case insensitive matches
41f633d [Yin Huai] Failed test case.

(cherry picked from commit e797dba58e)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:40:35 -07:00
zsxwing 0d21a41be9 [SPARK-8025][Streaming]Add JavaDoc style deprecation for deprecated Streaming methods
Scala `deprecated` annotation actually doesn't show up in JavaDoc.

Author: zsxwing <zsxwing@gmail.com>

Closes #6564 from zsxwing/SPARK-8025 and squashes the following commits:

2faa2bb [zsxwing] Add JavaDoc style deprecation for deprecated Streaming methods

(cherry picked from commit 7f74bb3bc6)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:36:55 -07:00
Reynold Xin 3af4c0b4e8 [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API
Author: Reynold Xin <rxin@databricks.com>

Closes #6569 from rxin/freqItemsWarning and squashes the following commits:

7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.

(cherry picked from commit 4c868b9943)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:29:46 -07:00
Shivaram Venkataraman d542a35ad7 [SPARK-8027] [SPARKR] Add maven profile to build R package docs
Also use that profile in create-release.sh

cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:

8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh

(cherry picked from commit cae9306c4f)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-01 21:21:55 -07:00
Reynold Xin 8ac23762ec [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API
Author: Reynold Xin <rxin@databricks.com>

Closes #6565 from rxin/alias and squashes the following commits:

286d880 [Reynold Xin] [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API

(cherry picked from commit 89f642a0e8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:13:21 -07:00
Reynold Xin efc0e05323 [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear
Author: Reynold Xin <rxin@databricks.com>

Closes #6566 from rxin/crosstab and squashes the following commits:

e0ace1c [Reynold Xin] [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear

(cherry picked from commit 6396cc0303)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:11:26 -07:00
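
The SPARK-7982 change above (use 0 instead of null for pairs that don't appear) can be illustrated without Spark. The following is a minimal pure-Python sketch of a contingency table with zero fill; the function name `crosstab` and its return shape are made up for illustration and are not the DataFrame API.

```python
from collections import Counter


def crosstab(pairs):
    """Count (row, col) pairs and fill pairs that never occur with 0."""
    pairs = list(pairs)  # allow any iterable, since it is traversed more than once
    counts = Counter(pairs)
    row_keys = sorted({r for r, _ in pairs})
    col_keys = sorted({c for _, c in pairs})
    # Every (row, col) combination gets a cell; missing pairs default to 0
    # rather than None, so callers can do arithmetic without null checks.
    table = {r: {c: counts.get((r, c), 0) for c in col_keys} for r in row_keys}
    return row_keys, col_keys, table
```

With zero fill, downstream code can sum rows or columns directly, which is presumably why a numeric default is friendlier than null here.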
Shivaram Venkataraman cbfb682ab9 [SPARK-8028] [SPARKR] Use addJar instead of setJars in SparkR
This prevents the spark.jars from being cleared while using `--packages` or `--jars`

cc pwendell davies brkyvz

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6568 from shivaram/SPARK-8028 and squashes the following commits:

3a9cf1f [Shivaram Venkataraman] Use addJar instead of setJars in SparkR This prevents the spark.jars from being cleared

(cherry picked from commit 6b44278ef7)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-01 21:01:26 -07:00
Andrew Or f5a9833f3f [MINOR] [UI] Improve error message on log page
Currently, if a bad log type is specified, we get a blank page.
We should provide a more informative error message.
2015-06-01 20:11:38 -07:00
Tathagata Das a7c8b00b72 [SPARK-7958] [STREAMING] Handled exception in StreamingContext.start() to prevent leaking of actors
StreamingContext.start() can throw an exception because DStream.validateAtStart() fails (say, the checkpoint directory is not set for StateDStream). But by then JobScheduler, JobGenerator, and ReceiverTracker have already started, along with their actors. Those cannot be shut down, because the only way to do that is to call StreamingContext.stop(), which cannot be called as the context has not been marked as ACTIVE.

The solution in this PR is to stop the internal scheduler if start throws an exception, and mark the context as STOPPED.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6559 from tdas/SPARK-7958 and squashes the following commits:

20b2ec1 [Tathagata Das] Added synchronized
790b617 [Tathagata Das] Handled exception in StreamingContext.start()

(cherry picked from commit 2f9c7519d6)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-06-01 20:05:11 -07:00
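
The cleanup pattern described in SPARK-7958 above (stop the internal scheduler if start throws, and mark the context as STOPPED) might look roughly like the sketch below. It is a Python illustration under stated assumptions, not Spark's Scala code: `FakeScheduler` and `SketchContext` are hypothetical names, and the lock stands in for the "Added synchronized" commit.

```python
import threading


class FakeScheduler:
    """Hypothetical stand-in for the internal scheduler and its actors."""

    def __init__(self, fail_on_start=False):
        self.fail_on_start = fail_on_start
        self.running = False

    def start(self):
        self.running = True  # background resources are live from here on
        if self.fail_on_start:
            raise ValueError("validation failed after resources were started")

    def stop(self):
        self.running = False


class SketchContext:
    INITIALIZED, ACTIVE, STOPPED = "INITIALIZED", "ACTIVE", "STOPPED"

    def __init__(self, scheduler):
        self._lock = threading.Lock()
        self._scheduler = scheduler
        self.state = self.INITIALIZED

    def start(self):
        with self._lock:
            if self.state != self.INITIALIZED:
                raise RuntimeError("context was already started or stopped")
            try:
                self._scheduler.start()
                self.state = self.ACTIVE
            except Exception:
                # Release whatever start() left running, record the terminal
                # state, then let the caller see the original error.
                self._scheduler.stop()
                self.state = self.STOPPED
                raise
```

The point of the sketch is that a failed start leaves nothing running and leaves the context in a state from which no further cleanup call is required.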
Michael Nazario a76c2e128b [SPARK-7899] [PYSPARK] Fix Python 3 pyspark/sql/types module conflict
This PR makes the types module in `pyspark/sql/types` work with pylint static analysis by removing the dynamic naming of the `pyspark/sql/_types` module to `pyspark/sql/types`.

Tests are now loaded using `$PYSPARK_DRIVER_PYTHON -m module` rather than `$PYSPARK_DRIVER_PYTHON module.py`. The old method adds the location of `module.py` to `sys.path`, so this change prevents accidental use of relative paths in Python.

Author: Michael Nazario <mnazario@palantir.com>

Closes #6439 from mnazario/feature/SPARK-7899 and squashes the following commits:

366ef30 [Michael Nazario] Remove hack on random.py
bb8b04d [Michael Nazario] Make doctests consistent with other tests
6ee4f75 [Michael Nazario] Change test scripts to use "-m"
673528f [Michael Nazario] Move _types back to types
2015-06-01 16:56:04 -07:00
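
The `sys.path` difference mentioned in SPARK-7899 above can be observed with a small stand-alone experiment. This is a minimal sketch, not the PR's test scripts: the package name `pkg`, the module `probe`, and both helper functions are made up for illustration, and it launches `sys.executable` rather than `$PYSPARK_DRIVER_PYTHON`.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def first_path_entry(args, cwd):
    """Run a child interpreter in `cwd` and return the sys.path[0] it prints."""
    out = subprocess.check_output([sys.executable] + args, cwd=cwd)
    return out.decode().strip()


def compare_launch_styles():
    """Launch the same file as a script and as a module; report sys.path[0] for each."""
    root = os.path.realpath(tempfile.mkdtemp())
    try:
        pkg_dir = os.path.join(root, "pkg")
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        probe = os.path.join(pkg_dir, "probe.py")
        with open(probe, "w") as f:
            f.write("import sys\nprint(sys.path[0])\n")
        # Old style: `python path/to/probe.py`
        as_script = first_path_entry([probe], cwd=root)
        # New style: `python -m pkg.probe`, run from the package's parent directory
        as_module = first_path_entry(["-m", "pkg.probe"], cwd=root)
        return root, pkg_dir, as_script, as_module
    finally:
        shutil.rmtree(root, ignore_errors=True)
```

Run as a script, the child puts the script's own directory first on `sys.path`, so sibling files such as a local `types.py` can shadow standard-library modules. Run with `-m`, the first entry is the working directory instead.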
Xiangrui Meng 4cafc63524 [SPARK-7584] [MLLIB] User guide for VectorAssembler
This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:

11313f6 [Xiangrui Meng] simplify Java example
0cd47f3 [Xiangrui Meng] update user guide
fd36292 [Xiangrui Meng] update Java unit test
ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
e399942 [Xiangrui Meng] scala/python example code

(cherry picked from commit 90c606925e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-01 15:05:21 -07:00
Davies Liu d023300f4e [SPARK-7497] [PYSPARK] [STREAMING] fix streaming flaky tests
Increase the duration and timeout in streaming python tests.

Author: Davies Liu <davies@databricks.com>

Closes #6239 from davies/flaky_tests and squashes the following commits:

d6aee8f [Davies Liu] fix window tests
26317f7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into flaky_tests
7947db6 [Davies Liu] fix streaming flaky tests

(cherry picked from commit b7ab0299b0)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-06-01 14:40:40 -07:00
Nishkam Ravi 2f41cf3e29 [DOC] Minor modification to Streaming docs with regards to parallel data receiving
pwendell tdas

Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes #6544 from nishkamravi2/master_nravi and squashes the following commits:

46e8c03 [Nishkam Ravi] Slight modification to streaming docs

(cherry picked from commit e7c7e51f2e)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-01 21:37:40 +01:00
Davies Liu 78a6723e87 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton
Author: Davies Liu <davies@databricks.com>

Closes #6532 from davies/decimal and squashes the following commits:

c7fcbce [Davies Liu] Update tests.py
1425359 [Davies Liu] DecimalType should not be singleton

(cherry picked from commit 91777a1c3a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 19:56:03 -07:00
Josh Rosen df0bf71ee0 [HOTFIX] Remove trailing whitespace to fix Scalastyle checks
866652c903 enabled this check.
2015-05-31 16:34:20 -07:00
Sun Rui f1d4e7e311 [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.
Author: Sun Rui <rui.sun@intel.com>

Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:

dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.

(cherry picked from commit 46576ab303)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-05-31 15:02:16 -07:00
Reynold Xin bab0fab68f [SPARK-3850] Turn style checker on for trailing whitespaces.
Author: Reynold Xin <rxin@databricks.com>

Closes #6541 from rxin/trailing-whitespace-on and squashes the following commits:

f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing whitespaces.

(cherry picked from commit 866652c903)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 14:23:48 -07:00
Yuhao Yang 4d5ce46772 [SPARK-7949] [MLLIB] [DOC] update document with some missing save/load
add save load for examples:
KMeansModel
PowerIterationClusteringModel
Word2VecModel
IsotonicRegressionModel

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:

7f9f06d [Yuhao Yang] add missing imports
c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docSaveLoad
1dd77cc [Yuhao Yang] update document with some missing save/load

(cherry picked from commit 0674700303)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-31 11:52:04 -07:00
Reynold Xin 70cf9c3495 [SPARK-3850] Trim trailing spaces for MLlib.
Author: Reynold Xin <rxin@databricks.com>

Closes #6534 from rxin/whitespace-mllib and squashes the following commits:

38926e3 [Reynold Xin] [SPARK-3850] Trim trailing spaces for MLlib.

(cherry picked from commit e1067d0ad1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 11:35:46 -07:00
zsxwing 8a72bc9170 [MINOR] Add license for dagre-d3 and graphlib-dot
Add license for dagre-d3 and graphlib-dot

Author: zsxwing <zsxwing@gmail.com>

Closes #6539 from zsxwing/LICENSE and squashes the following commits:

82b0475 [zsxwing] Add license for dagre-d3 and graphlib-dot

(cherry picked from commit d1d2def2f5)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-31 11:18:20 -07:00
Reynold Xin 01f38f75d9 [SPARK-7979] Enforce structural type checker.
Author: Reynold Xin <rxin@databricks.com>

Closes #6536 from rxin/structural-type-checker and squashes the following commits:

f833151 [Reynold Xin] Fixed compilation.
633f9a1 [Reynold Xin] Fixed typo.
d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.

(cherry picked from commit 4b5f12bac9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 01:40:57 -07:00
Reynold Xin a1904fa79e [SPARK-3850] Trim trailing spaces for SQL.
Author: Reynold Xin <rxin@databricks.com>

Closes #6535 from rxin/whitespace-sql and squashes the following commits:

de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL.

(cherry picked from commit 63a50be13d)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala
2015-05-31 00:52:02 -07:00
Reynold Xin f63eab950b [SPARK-3850] Trim trailing spaces for examples/streaming/yarn.
Author: Reynold Xin <rxin@databricks.com>

Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:

7b7b3a0 [Reynold Xin] Reset again.
dc14597 [Reynold Xin] Reset scalastyle.
cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.

(cherry picked from commit 564bc11e98)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 00:48:29 -07:00
Reynold Xin a7c217166b [SPARK-3850] Trim trailing spaces for core.
Author: Reynold Xin <rxin@databricks.com>

Closes #6533 from rxin/whitespace-2 and squashes the following commits:

038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.

(cherry picked from commit 74fdc97c72)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala
	core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
2015-05-31 00:17:47 -07:00
Reynold Xin 2016927f70 [SPARK-7975] Add style checker to disallow overriding equals covariantly.
Author: Reynold Xin <rxin@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>

Closes #6527 from rxin/covariant-equals and squashes the following commits:

e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker

(cherry picked from commit 7896e99b2a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 00:06:02 -07:00
Cheng Lian 0d093d6e78 [SQL] [MINOR] Adds @deprecated Scaladoc entry for SchemaRDD
Author: Cheng Lian <lian@databricks.com>

Closes #6529 from liancheng/schemardd-deprecation-fix and squashes the following commits:

49765c2 [Cheng Lian] Adds @deprecated Scaladoc entry for SchemaRDD

(cherry picked from commit 8764dccebd)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 23:49:47 -07:00
Reynold Xin adfc9d1fa0 [SPARK-7976] Add style checker to disallow overriding finalize.
Author: Reynold Xin <rxin@databricks.com>

Closes #6528 from rxin/style-finalizer and squashes the following commits:

a2211ca [Reynold Xin] [SPARK-7976] Enable NoFinalizeChecker.

(cherry picked from commit 084fef76e9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 23:36:37 -07:00
Reynold Xin 5e268d3956 Update documentation for the new DataFrame reader/writer interface.
Author: Reynold Xin <rxin@databricks.com>

Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:

c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.

(cherry picked from commit 00a7137900)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 20:10:08 -07:00
Reynold Xin e74ea78276 [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods
Scala deprecated annotation actually doesn't show up in JavaDoc.

Author: Reynold Xin <rxin@databricks.com>

Closes #6523 from rxin/df-deprecated-javadoc and squashes the following commits:

26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods.

(cherry picked from commit c63e1a742b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 19:51:58 -07:00
Reynold Xin dc58e688ab [SQL] Tighten up visibility for JavaDoc.
I went through all the JavaDocs and tightened up visibility.

Author: Reynold Xin <rxin@databricks.com>

Closes #6526 from rxin/sql-1.4-visibility-for-docs and squashes the following commits:

bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc.

(cherry picked from commit 14b314dc2c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 19:51:17 -07:00
Xiangrui Meng a60b8bf329 [SPARK-5610] [DOC] update genjavadocSettings to use the patched version of genjavadoc
This PR updates `genjavadocSettings` to use a patched version of `genjavadoc-plugin` that hides package private classes/methods/interfaces in the generated Java API doc. The patch can be found at: https://github.com/typesafehub/genjavadoc/compare/master...mengxr:spark-1.4.

It wasn't merged into the main repo because there exist corner cases where a package private Scala class has to be a Java public class in order to compile. This doesn't seem to apply to the Spark codebase. So we release a patched version under `org.spark-project` and use it in the Spark build. brkyvz is publishing the artifacts to Maven Central.

We need more people to audit the generated APIs and make sure we don't have false negatives.

Currently listed classes under `org.apache.spark.rdd`:
![screen shot 2015-05-29 at 12 48 52 pm](https://cloud.githubusercontent.com/assets/829644/7891396/28fb9daa-0601-11e5-8ed8-4e9522d25a71.png)

After this PR:
![screen shot 2015-05-29 at 12 48 23 pm](https://cloud.githubusercontent.com/assets/829644/7891408/408e210e-0601-11e5-975c-ff0a02eb5c91.png)

cc: pwendell rxin srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #6506 from mengxr/SPARK-5610 and squashes the following commits:

489c785 [Xiangrui Meng] update genjavadocSettings to use the patched version of genjavadoc

(cherry picked from commit 2b258e1c07)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 17:22:31 -07:00
Mike Dusenberry df56309b04 [SPARK-7920] [MLLIB] Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).
The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector.

This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:

9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.

(cherry picked from commit 1281a35188)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-30 16:51:08 -07:00
Yanbo Liang 2790bb0354 [SPARK-7918] [MLLIB] MLlib Python doc parity check for evaluation and feature
Check the MLlib Python evaluation and feature docs, then make them as complete as the Scala docs.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #6461 from yanboliang/spark-7918 and squashes the following commits:

940e3f1 [Yanbo Liang] truncate too long line and remove extra sparse
a80ae58 [Yanbo Liang] MLlib Python doc parity check for evaluation and feature

(cherry picked from commit 1617363fbb)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-30 16:24:26 -07:00
Reynold Xin 6d7cf5382d Updated SQL programming guide's Hive connectivity section.
(cherry picked from commit 7716a5a1ec)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 14:58:11 -07:00
Cheng Lian b2b7601471 [SPARK-7849] [SQL] [Docs] Updates SQL programming guide for 1.4
Author: Cheng Lian <lian@databricks.com>

Closes #6520 from liancheng/spark-7849 and squashes the following commits:

705264b [Cheng Lian] Updates SQL programming guide for 1.4

(cherry picked from commit 6e3f0c7810)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 12:16:16 -07:00
Taka Shinagawa e7ba3ea86b [DOCS] [MINOR] Update for the Hadoop versions table with hadoop-2.6
Updated the doc for the hadoop-2.6 profile, which is new to Spark 1.4

Author: Taka Shinagawa <taka.epsilon@gmail.com>

Closes #6450 from mrt/docfix2 and squashes the following commits:

db1c43b [Taka Shinagawa] Updated the hadoop versions for hadoop-2.6 profile
323710e [Taka Shinagawa] The hadoop-2.6 profile is added to the Hadoop versions table

(cherry picked from commit 3ab71eb9d5)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-30 08:26:06 -04:00
Sean Owen d6e9eade64 [SPARK-7890] [DOCS] Document that Scala 2.11 now supports Kafka
Remove caveat about Kafka / JDBC not being supported for Scala 2.11

Author: Sean Owen <sowen@cloudera.com>

Closes #6470 from srowen/SPARK-7890 and squashes the following commits:

4652634 [Sean Owen] One more rewording
7b7f3c8 [Sean Owen] Restore note about JDBC component
126744d [Sean Owen] Remove caveat about Kafka / JDBC not being supported for Scala 2.11

(cherry picked from commit 8c8de3ed86)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-30 07:59:43 -04:00
Octavian Geagla 2c45009dad [SPARK-7459] [MLLIB] ElementwiseProduct Java example
Author: Octavian Geagla <ogeagla@gmail.com>

Closes #6008 from ogeagla/elementwise-prod-doc and squashes the following commits:

72e6dc0 [Octavian Geagla] [SPARK-7459] [MLLIB] Java example import.
cf2afbd [Octavian Geagla] [SPARK-7459] [MLLIB] Update description of example.
b66431b [Octavian Geagla] [SPARK-7459] [MLLIB] Add override annotation to java example, make scala example use same data as java.
6b26b03 [Octavian Geagla] [SPARK-7459] [MLLIB] Fix line which is too long.
79af020 [Octavian Geagla] [SPARK-7459] [MLLIB] Actually don't use Java 8.
9d5b31a [Octavian Geagla] [SPARK-7459] [MLLIB] Don't use Java 8
4f0c92f [Octavian Geagla] [SPARK-7459] [MLLIB] ElementwiseProduct Java example.

(cherry picked from commit e3a4374833)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-30 00:00:49 -07:00
Timothy Chen 8938a74893 [SPARK-7962] [MESOS] Fix master url parsing in rest submission client.
Only parse the master URL as a standalone master URL when it starts with spark://

Author: Timothy Chen <tnachen@gmail.com>

Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:

61a1198 [Timothy Chen] Fix master url parsing in rest submission client.

(cherry picked from commit 78657d53d7)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-29 23:56:27 -07:00
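
The guard described in SPARK-7962 above can be sketched as follows. This is a Python illustration, not the REST submission client's Scala code: the function name `standalone_masters` is hypothetical, and the comma-separated multi-master form of `spark://` URLs is an assumption of this sketch.

```python
def standalone_masters(master_url):
    """Split a spark:// URL into one URL per master; leave other schemes alone."""
    prefix = "spark://"
    if not master_url.startswith(prefix):
        # e.g. a mesos:// URL: do not try to interpret it as standalone.
        return [master_url]
    hosts = master_url[len(prefix):].split(",")
    return [prefix + host for host in hosts]
```

For example, `standalone_masters("mesos://h1:5050")` hands the URL back untouched, while a `spark://` URL listing two hosts yields two `spark://` URLs.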
Octavian Geagla 11a4b30d1e [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct
Author: Octavian Geagla <ogeagla@gmail.com>

Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits:

4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.

(cherry picked from commit da2112aef2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-29 23:55:29 -07:00
Burak Yavuz 1513cffa35 [SPARK-7957] Preserve partitioning when using randomSplit
cc JoshRosen
Thanks for noticing this!

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:

497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit

(cherry picked from commit 7ed06c3992)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-29 22:19:23 -07:00
Taka Shinagawa 400e6dbce2 [DOCS][Tiny] Added a missing dash(-) in docs/configuration.md
The first line had only two dashes (--) instead of three (---). Because of this missing dash (-), the 'jekyll build' command was not converting configuration.md to _site/configuration.html

Author: Taka Shinagawa <taka.epsilon@gmail.com>

Closes #6513 from mrt/docfix3 and squashes the following commits:

c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from converting configuration.md to html format

(cherry picked from commit 3792d25836)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-29 20:35:26 -07:00
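
A check for the kind of mistake fixed in the docs commit above could be as small as the sketch below. It is a simplified approximation written for illustration, not Jekyll's own parser; the function name `has_front_matter` is hypothetical.

```python
def has_front_matter(text):
    """Return True if `text` opens with a '---' line and has a later closing '---' line."""
    lines = text.splitlines()
    # The opening delimiter must be the very first line and exactly three dashes
    # (trailing whitespace is tolerated in this sketch).
    if not lines or lines[0].rstrip() != "---":
        return False
    return any(line.rstrip() == "---" for line in lines[1:])
```

A page whose first line is `--` fails this check, which matches the symptom in the commit message: the file was not treated as a page to convert.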
Ram Sriharsha 9a88be1833 [SPARK-6013] [ML] Add more Python ML examples for spark.ml
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6443 from harsha2010/SPARK-6013 and squashes the following commits:

732506e [Ram Sriharsha] Code Review Feedback
121c211 [Ram Sriharsha] python style fix
5f9b8c3 [Ram Sriharsha] python style fixes
925ca86 [Ram Sriharsha] Simple Params Example
8b372b1 [Ram Sriharsha] GBT Example
965ec14 [Ram Sriharsha] Random Forest Example

(cherry picked from commit dbf8ff38de)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-29 15:22:38 -07:00
Shivaram Venkataraman 2bd4460548 [SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init
cc davies

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6507 from shivaram/sparkr-init and squashes the following commits:

6fdd169 [Shivaram Venkataraman] Create SparkContext in sparkRSQL init

(cherry picked from commit 5fb97dca9b)
Signed-off-by: Davies Liu <davies@databricks.com>
2015-05-29 15:08:50 -07:00