The new test uses CV to compare `maxIter=0` and `maxIter=1`, and validates the evaluation result. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6572 from mengxr/SPARK-7432 and squashes the following commits:
c236bb8 [Xiangrui Meng] fix flacky cv doctest
(cherry picked from commit bd97840d5c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Add schema()/format()/options() for the reader, and mode()/format()/options()/partitionBy() for the writer.
cc rxin yhuai pwendell
Author: Davies Liu <davies@databricks.com>
Closes #6578 from davies/readwrite and squashes the following commits:
720d293 [Davies Liu] address comments
b65dfa2 [Davies Liu] Update readwriter.py
1299ab6 [Davies Liu] make Python API consistent with Scala
(cherry picked from commit 445647a1a3)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
This closes #6570.
Author: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6573 from rxin/deterministic and squashes the following commits:
356cd22 [Reynold Xin] Added unit test for the optimizer.
da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
da56200 [Yin Huai] Comments.
e38f264 [Yin Huai] Comment.
f9d6a73 [Yin Huai] Add a deterministic method to Expression.
(cherry picked from commit 0f80990bfa)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/random.scala
https://issues.apache.org/jira/browse/SPARK-8020
Author: Yin Huai <yhuai@databricks.com>
Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits:
0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.
(cherry picked from commit 7b7f7b6c6f)
Signed-off-by: Yin Huai <yhuai@databricks.com>
cc yhuai
Author: Davies Liu <davies@databricks.com>
Closes #6558 from davies/decimalType and squashes the following commits:
c877ca8 [Davies Liu] Update ParquetConverter.scala
48cc57c [Davies Liu] Update ParquetConverter.scala
b43845c [Davies Liu] add test
3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists
(cherry picked from commit bcb47ad771)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:
4bba4f1 [Xiangrui Meng] fix example
ba1cd1b [Xiangrui Meng] fix style
7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
(cherry picked from commit 0221c7f0ef)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
The Scala `deprecated` annotation doesn't actually show up in JavaDoc.
Author: zsxwing <zsxwing@gmail.com>
Closes #6564 from zsxwing/SPARK-8025 and squashes the following commits:
2faa2bb [zsxwing] Add JavaDoc style deprecation for deprecated Streaming methods
(cherry picked from commit 7f74bb3bc6)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6569 from rxin/freqItemsWarning and squashes the following commits:
7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.
(cherry picked from commit 4c868b9943)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Also use that profile in create-release.sh
cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
(cherry picked from commit cae9306c4f)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Reynold Xin <rxin@databricks.com>
Closes #6565 from rxin/alias and squashes the following commits:
286d880 [Reynold Xin] [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API
(cherry picked from commit 89f642a0e8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6566 from rxin/crosstab and squashes the following commits:
e0ace1c [Reynold Xin] [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear
(cherry picked from commit 6396cc0303)
Signed-off-by: Reynold Xin <rxin@databricks.com>
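The crosstab change above can be illustrated with a small plain-Python sketch. It is not Spark's implementation: `crosstab` here is a hypothetical helper over a list of `(row, col)` pairs, and it only shows the intended result shape, where a pair that never occurs gets a count of 0 rather than a missing (null) entry.

```python
from collections import Counter


def crosstab(pairs):
    """Build a contingency table from (row, col) pairs.

    Hypothetical helper for illustration only; pairs that never
    occur are reported as 0 instead of being left out.
    """
    pairs = list(pairs)
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Every (row, col) combination gets an entry, defaulting to 0.
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


table = crosstab([("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")])
```

In this sketch `table["b"]["y"]` is 0 even though the pair `("b", "y")` never appears in the input.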
This prevents the spark.jars from being cleared while using `--packages` or `--jars`
cc pwendell davies brkyvz
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6568 from shivaram/SPARK-8028 and squashes the following commits:
3a9cf1f [Shivaram Venkataraman] Use addJar instead of setJars in SparkR This prevents the spark.jars from being cleared
(cherry picked from commit 6b44278ef7)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
StreamingContext.start() can throw an exception because DStream.validateAtStart() fails (say, the checkpoint directory is not set for a StateDStream). But by then the JobScheduler, JobGenerator, and ReceiverTracker have already started, along with their actors. They cannot be shut down, because the only way to do that is to call StreamingContext.stop(), which cannot be called as the context has not been marked as ACTIVE.
The solution in this PR is to stop the internal scheduler if start() throws an exception, and to mark the context as STOPPED.
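The failure handling described above can be sketched in plain Python. This is only an illustration of the pattern, not the Spark code: `Scheduler`, `Context`, and `State` are hypothetical stand-ins, and the lock mirrors the "Added synchronized" commit only loosely.

```python
import threading
from enum import Enum


class State(Enum):
    INITIALIZED = "INITIALIZED"
    ACTIVE = "ACTIVE"
    STOPPED = "STOPPED"


class Scheduler:
    """Hypothetical stand-in for the internal scheduler."""

    def __init__(self, fail_on_start=False):
        self.fail_on_start = fail_on_start
        self.running = False

    def start(self):
        # Background components come up first ...
        self.running = True
        # ... and a later validation step may still fail.
        if self.fail_on_start:
            raise ValueError("checkpoint directory not set")

    def stop(self):
        self.running = False


class Context:
    """Hypothetical stand-in for the streaming context."""

    def __init__(self, scheduler):
        self._lock = threading.Lock()
        self.scheduler = scheduler
        self.state = State.INITIALIZED

    def start(self):
        with self._lock:
            if self.state is not State.INITIALIZED:
                raise RuntimeError("cannot start from state " + self.state.name)
            try:
                self.scheduler.start()
                self.state = State.ACTIVE
            except Exception:
                # Clean up whatever was started, then mark the context
                # as STOPPED so it is not left half-started.
                self.scheduler.stop()
                self.state = State.STOPPED
                raise
```

The exception is re-raised so the caller still sees the original failure; the only difference is that nothing keeps running in the background afterwards.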
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #6559 from tdas/SPARK-7958 and squashes the following commits:
20b2ec1 [Tathagata Das] Added synchronized
790b617 [Tathagata Das] Handled exception in StreamingContext.start()
(cherry picked from commit 2f9c7519d6)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This PR makes the types module in `pyspark/sql/types` work with pylint static analysis by removing the dynamic naming of the `pyspark/sql/_types` module to `pyspark/sql/types`.
Tests are now loaded using `$PYSPARK_DRIVER_PYTHON -m module` rather than `$PYSPARK_DRIVER_PYTHON module.py`. The old method adds the location of `module.py` to `sys.path`, so this change prevents accidental use of relative paths in Python.
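The `sys.path` difference mentioned above can be checked with a small standalone script. This is a sketch that does not use PySpark: `demo_pkg` and `mod.py` are throwaway names created in a temporary directory, and `sys.executable` stands in for `$PYSPARK_DRIVER_PYTHON`.

```python
import os
import subprocess
import sys
import tempfile

# Throwaway module: prints whether its own directory is on sys.path.
MODULE_SOURCE = (
    "import os, sys\n"
    "here = os.path.realpath(os.path.dirname(os.path.abspath(__file__)))\n"
    "entries = [os.path.realpath(p or os.getcwd()) for p in sys.path]\n"
    "print(here in entries)\n"
)


def run_both_ways():
    """Run the module as a script and with -m; return both outputs."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        with open(os.path.join(pkg, "mod.py"), "w") as f:
            f.write(MODULE_SOURCE)

        # Drop variables that would change how sys.path is built.
        env = {k: v for k, v in os.environ.items()
               if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

        def run(args):
            result = subprocess.run([sys.executable] + args, cwd=root, env=env,
                                    capture_output=True, text=True, check=True)
            return result.stdout.strip()

        as_script = run([os.path.join("demo_pkg", "mod.py")])
        as_module = run(["-m", "demo_pkg.mod"])
        return as_script, as_module
```

On a stock CPython 3 interpreter, the script run should report that the package directory is on `sys.path` and the `-m` run should report that it is not, so sibling files can no longer shadow other modules by accident.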
Author: Michael Nazario <mnazario@palantir.com>
Closes #6439 from mnazario/feature/SPARK-7899 and squashes the following commits:
366ef30 [Michael Nazario] Remove hack on random.py
bb8b04d [Michael Nazario] Make doctests consistent with other tests
6ee4f75 [Michael Nazario] Change test scripts to use "-m"
673528f [Michael Nazario] Move _types back to types
This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java.
jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:
11313f6 [Xiangrui Meng] simplify Java example
0cd47f3 [Xiangrui Meng] update user guide
fd36292 [Xiangrui Meng] update Java unit test
ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
e399942 [Xiangrui Meng] scala/python example code
(cherry picked from commit 90c606925e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Increase the duration and timeout in streaming python tests.
Author: Davies Liu <davies@databricks.com>
Closes #6239 from davies/flaky_tests and squashes the following commits:
d6aee8f [Davies Liu] fix window tests
26317f7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into flaky_tests
7947db6 [Davies Liu] fix streaming flaky tests
(cherry picked from commit b7ab0299b0)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
pwendell tdas
Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>
Closes #6544 from nishkamravi2/master_nravi and squashes the following commits:
46e8c03 [Nishkam Ravi] Slight modification to streaming docs
(cherry picked from commit e7c7e51f2e)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Author: Davies Liu <davies@databricks.com>
Closes #6532 from davies/decimal and squashes the following commits:
c7fcbce [Davies Liu] Update tests.py
1425359 [Davies Liu] DecimalType should not be singleton
(cherry picked from commit 91777a1c3a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Sun Rui <rui.sun@intel.com>
Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:
dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.
(cherry picked from commit 46576ab303)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Reynold Xin <rxin@databricks.com>
Closes #6541 from rxin/trailing-whitespace-on and squashes the following commits:
f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing whitespaces.
(cherry picked from commit 866652c903)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Add save/load to the examples for:
KMeansModel
PowerIterationClusteringModel
Word2VecModel
IsotonicRegressionModel
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:
7f9f06d [Yuhao Yang] add missing imports
c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docSaveLoad
1dd77cc [Yuhao Yang] update document with some missing save/load
(cherry picked from commit 0674700303)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6534 from rxin/whitespace-mllib and squashes the following commits:
38926e3 [Reynold Xin] [SPARK-3850] Trim trailing spaces for MLlib.
(cherry picked from commit e1067d0ad1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Add license for dagre-d3 and graphlib-dot
Author: zsxwing <zsxwing@gmail.com>
Closes #6539 from zsxwing/LICENSE and squashes the following commits:
82b0475 [zsxwing] Add license for dagre-d3 and graphlib-dot
(cherry picked from commit d1d2def2f5)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6535 from rxin/whitespace-sql and squashes the following commits:
de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL.
(cherry picked from commit 63a50be13d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala
Author: Reynold Xin <rxin@databricks.com>
Closes #6533 from rxin/whitespace-2 and squashes the following commits:
038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.
(cherry picked from commit 74fdc97c72)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
Author: Reynold Xin <rxin@databricks.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes #6527 from rxin/covariant-equals and squashes the following commits:
e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker
(cherry picked from commit 7896e99b2a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes #6529 from liancheng/schemardd-deprecation-fix and squashes the following commits:
49765c2 [Cheng Lian] Adds @deprecated Scaladoc entry for SchemaRDD
(cherry picked from commit 8764dccebd)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:
c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.
(cherry picked from commit 00a7137900)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The Scala deprecated annotation doesn't actually show up in JavaDoc.
Author: Reynold Xin <rxin@databricks.com>
Closes #6523 from rxin/df-deprecated-javadoc and squashes the following commits:
26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods.
(cherry picked from commit c63e1a742b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
I went through all the JavaDocs and tightened up visibility.
Author: Reynold Xin <rxin@databricks.com>
Closes #6526 from rxin/sql-1.4-visibility-for-docs and squashes the following commits:
bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc.
(cherry picked from commit 14b314dc2c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This PR updates `genjavadocSettings` to use a patched version of `genjavadoc-plugin` that hides package private classes/methods/interfaces in the generated Java API doc. The patch can be found at: https://github.com/typesafehub/genjavadoc/compare/master...mengxr:spark-1.4.
It wasn't merged into the main repo because there exist corner cases where a package private Scala class has to be a Java public class in order to compile. This doesn't seem to apply to the Spark codebase. So we release a patched version under `org.spark-project` and use it in the Spark build. brkyvz is publishing the artifacts to Maven Central.
We need more people to audit the generated APIs and make sure we don't have false negatives.
Currently listed classes under `org.apache.spark.rdd`:
![screen shot 2015-05-29 at 12 48 52 pm](https://cloud.githubusercontent.com/assets/829644/7891396/28fb9daa-0601-11e5-8ed8-4e9522d25a71.png)
After this PR:
![screen shot 2015-05-29 at 12 48 23 pm](https://cloud.githubusercontent.com/assets/829644/7891408/408e210e-0601-11e5-975c-ff0a02eb5c91.png)
cc: pwendell rxin srowen
Author: Xiangrui Meng <meng@databricks.com>
Closes #6506 from mengxr/SPARK-5610 and squashes the following commits:
489c785 [Xiangrui Meng] update genjavadocSettings to use the patched version of genjavadoc
(cherry picked from commit 2b258e1c07)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector.
This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:
9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.
(cherry picked from commit 1281a35188)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Check the MLlib Python evaluation and feature docs, and make them as complete as the Scala docs.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #6461 from yanboliang/spark-7918 and squashes the following commits:
940e3f1 [Yanbo Liang] truncate too long line and remove extra sparse
a80ae58 [Yanbo Liang] MLlib Python doc parity check for evaluation and feature
(cherry picked from commit 1617363fbb)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes #6520 from liancheng/spark-7849 and squashes the following commits:
705264b [Cheng Lian] Updates SQL programming guide for 1.4
(cherry picked from commit 6e3f0c7810)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Updated the doc for the hadoop-2.6 profile, which is new to Spark 1.4
Author: Taka Shinagawa <taka.epsilon@gmail.com>
Closes #6450 from mrt/docfix2 and squashes the following commits:
db1c43b [Taka Shinagawa] Updated the hadoop versions for hadoop-2.6 profile
323710e [Taka Shinagawa] The hadoop-2.6 profile is added to the Hadoop versions table
(cherry picked from commit 3ab71eb9d5)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Remove caveat about Kafka / JDBC not being supported for Scala 2.11
Author: Sean Owen <sowen@cloudera.com>
Closes #6470 from srowen/SPARK-7890 and squashes the following commits:
4652634 [Sean Owen] One more rewording
7b7f3c8 [Sean Owen] Restore note about JDBC component
126744d [Sean Owen] Remove caveat about Kafka / JDBC not being supported for Scala 2.11
(cherry picked from commit 8c8de3ed86)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Author: Octavian Geagla <ogeagla@gmail.com>
Closes #6008 from ogeagla/elementwise-prod-doc and squashes the following commits:
72e6dc0 [Octavian Geagla] [SPARK-7459] [MLLIB] Java example import.
cf2afbd [Octavian Geagla] [SPARK-7459] [MLLIB] Update description of example.
b66431b [Octavian Geagla] [SPARK-7459] [MLLIB] Add override annotation to java example, make scala example use same data as java.
6b26b03 [Octavian Geagla] [SPARK-7459] [MLLIB] Fix line which is too long.
79af020 [Octavian Geagla] [SPARK-7459] [MLLIB] Actually don't use Java 8.
9d5b31a [Octavian Geagla] [SPARK-7459] [MLLIB] Don't use Java 8
4f0c92f [Octavian Geagla] [SPARK-7459] [MLLIB] ElementwiseProduct Java example.
(cherry picked from commit e3a4374833)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Only parse the standalone master URL when the master URL starts with `spark://`.
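The guard described above can be sketched as follows. This is an illustration in Python rather than the actual client code: `standalone_masters` is a hypothetical helper, and the comma-separated form is assumed from the standalone `spark://host1:port1,host2:port2` URL syntax.

```python
SPARK_PREFIX = "spark://"


def standalone_masters(master_url):
    """Return the host:port entries of a standalone master URL.

    Hypothetical helper: URLs with any other scheme (for example
    mesos://) are not parsed at all and yield None.
    """
    if not master_url.startswith(SPARK_PREFIX):
        return None
    # A standalone URL may list several masters separated by commas.
    return [m for m in master_url[len(SPARK_PREFIX):].split(",") if m]
```

Returning `None` for non-standalone URLs keeps the caller from treating a Mesos or other master as a list of standalone hosts.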
Author: Timothy Chen <tnachen@gmail.com>
Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:
61a1198 [Timothy Chen] Fix master url parsing in rest submission client.
(cherry picked from commit 78657d53d7)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Octavian Geagla <ogeagla@gmail.com>
Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits:
4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.
(cherry picked from commit da2112aef2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
cc JoshRosen
Thanks for noticing this!
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:
497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit
(cherry picked from commit 7ed06c3992)
Signed-off-by: Reynold Xin <rxin@databricks.com>