Commit graph

11362 commits

Author SHA1 Message Date
Reynold Xin 01f38f75d9 [SPARK-7979] Enforce structural type checker.
Author: Reynold Xin <rxin@databricks.com>

Closes #6536 from rxin/structural-type-checker and squashes the following commits:

f833151 [Reynold Xin] Fixed compilation.
633f9a1 [Reynold Xin] Fixed typo.
d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.

(cherry picked from commit 4b5f12bac9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 01:40:57 -07:00
Reynold Xin a1904fa79e [SPARK-3850] Trim trailing spaces for SQL.
Author: Reynold Xin <rxin@databricks.com>

Closes #6535 from rxin/whitespace-sql and squashes the following commits:

de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL.

(cherry picked from commit 63a50be13d)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala
2015-05-31 00:52:02 -07:00
Reynold Xin f63eab950b [SPARK-3850] Trim trailing spaces for examples/streaming/yarn.
Author: Reynold Xin <rxin@databricks.com>

Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:

7b7b3a0 [Reynold Xin] Reset again.
dc14597 [Reynold Xin] Reset scalastyle.
cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.

(cherry picked from commit 564bc11e98)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 00:48:29 -07:00
Reynold Xin a7c217166b [SPARK-3850] Trim trailing spaces for core.
Author: Reynold Xin <rxin@databricks.com>

Closes #6533 from rxin/whitespace-2 and squashes the following commits:

038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.

(cherry picked from commit 74fdc97c72)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala
	core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
2015-05-31 00:17:47 -07:00
Reynold Xin 2016927f70 [SPARK-7975] Add style checker to disallow overriding equals covariantly.
Author: Reynold Xin <rxin@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>

Closes #6527 from rxin/covariant-equals and squashes the following commits:

e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker

(cherry picked from commit 7896e99b2a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 00:06:02 -07:00
Cheng Lian 0d093d6e78 [SQL] [MINOR] Adds @deprecated Scaladoc entry for SchemaRDD
Author: Cheng Lian <lian@databricks.com>

Closes #6529 from liancheng/schemardd-deprecation-fix and squashes the following commits:

49765c2 [Cheng Lian] Adds @deprecated Scaladoc entry for SchemaRDD

(cherry picked from commit 8764dccebd)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 23:49:47 -07:00
Reynold Xin adfc9d1fa0 [SPARK-7976] Add style checker to disallow overriding finalize.
Author: Reynold Xin <rxin@databricks.com>

Closes #6528 from rxin/style-finalizer and squashes the following commits:

a2211ca [Reynold Xin] [SPARK-7976] Enable NoFinalizeChecker.

(cherry picked from commit 084fef76e9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 23:36:37 -07:00
Reynold Xin 5e268d3956 Update documentation for the new DataFrame reader/writer interface.
Author: Reynold Xin <rxin@databricks.com>

Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:

c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.

(cherry picked from commit 00a7137900)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 20:10:08 -07:00
Reynold Xin e74ea78276 [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods
Scala deprecated annotation actually doesn't show up in JavaDoc.

Author: Reynold Xin <rxin@databricks.com>

Closes #6523 from rxin/df-deprecated-javadoc and squashes the following commits:

26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods.

(cherry picked from commit c63e1a742b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 19:51:58 -07:00
Reynold Xin dc58e688ab [SQL] Tighten up visibility for JavaDoc.
I went through all the JavaDocs and tightened up visibility.

Author: Reynold Xin <rxin@databricks.com>

Closes #6526 from rxin/sql-1.4-visibility-for-docs and squashes the following commits:

bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc.

(cherry picked from commit 14b314dc2c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 19:51:17 -07:00
Xiangrui Meng a60b8bf329 [SPARK-5610] [DOC] update genjavadocSettings to use the patched version of genjavadoc
This PR updates `genjavadocSettings` to use a patched version of `genjavadoc-plugin` that hides package private classes/methods/interfaces in the generated Java API doc. The patch can be found at: https://github.com/typesafehub/genjavadoc/compare/master...mengxr:spark-1.4.

It wasn't merged into the main repo because there exist corner cases where a package private Scala class has to be a Java public class in order to compile. This doesn't seem to apply to the Spark codebase. So we release a patched version under `org.spark-project` and use it in the Spark build. brkyvz is publishing the artifacts to Maven Central.

Need more people audit the generated APIs and make sure we don't have false negatives.

Current listed classes under `org.apache.spark.rdd`:
![screen shot 2015-05-29 at 12 48 52 pm](https://cloud.githubusercontent.com/assets/829644/7891396/28fb9daa-0601-11e5-8ed8-4e9522d25a71.png)

After this PR:
![screen shot 2015-05-29 at 12 48 23 pm](https://cloud.githubusercontent.com/assets/829644/7891408/408e210e-0601-11e5-975c-ff0a02eb5c91.png)

cc: pwendell rxin srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #6506 from mengxr/SPARK-5610 and squashes the following commits:

489c785 [Xiangrui Meng] update genjavadocSettings to use the patched version of genjavadoc

(cherry picked from commit 2b258e1c07)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 17:22:31 -07:00
Mike Dusenberry df56309b04 [SPARK-7920] [MLLIB] Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).
The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector.

This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:

9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.

(cherry picked from commit 1281a35188)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-30 16:51:08 -07:00
Yanbo Liang 2790bb0354 [SPARK-7918] [MLLIB] MLlib Python doc parity check for evaluation and feature
Check then make the MLlib Python evaluation and feature doc to be as complete as the Scala doc.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #6461 from yanboliang/spark-7918 and squashes the following commits:

940e3f1 [Yanbo Liang] truncate too long line and remove extra sparse
a80ae58 [Yanbo Liang] MLlib Python doc parity check for evaluation and feature

(cherry picked from commit 1617363fbb)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-30 16:24:26 -07:00
Reynold Xin 6d7cf5382d Updated SQL programming guide's Hive connectivity section.
(cherry picked from commit 7716a5a1ec)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 14:58:11 -07:00
Cheng Lian b2b7601471 [SPARK-7849] [SQL] [Docs] Updates SQL programming guide for 1.4
Author: Cheng Lian <lian@databricks.com>

Closes #6520 from liancheng/spark-7849 and squashes the following commits:

705264b [Cheng Lian] Updates SQL programming guide for 1.4

(cherry picked from commit 6e3f0c7810)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-30 12:16:16 -07:00
Taka Shinagawa e7ba3ea86b [DOCS] [MINOR] Update for the Hadoop versions table with hadoop-2.6
Updated the doc for the hadoop-2.6 profile, which is new to Spark 1.4

Author: Taka Shinagawa <taka.epsilon@gmail.com>

Closes #6450 from mrt/docfix2 and squashes the following commits:

db1c43b [Taka Shinagawa] Updated the hadoop versions for hadoop-2.6 profile
323710e [Taka Shinagawa] The hadoop-2.6 profile is added to the Hadoop versions table

(cherry picked from commit 3ab71eb9d5)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-30 08:26:06 -04:00
Sean Owen d6e9eade64 [SPARK-7890] [DOCS] Document that Spark 2.11 now supports Kafka
Remove caveat about Kafka / JDBC not being supported for Scala 2.11

Author: Sean Owen <sowen@cloudera.com>

Closes #6470 from srowen/SPARK-7890 and squashes the following commits:

4652634 [Sean Owen] One more rewording
7b7f3c8 [Sean Owen] Restore note about JDBC component
126744d [Sean Owen] Remove caveat about Kafka / JDBC not being supported for Scala 2.11

(cherry picked from commit 8c8de3ed86)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-30 07:59:43 -04:00
Octavian Geagla 2c45009dad [SPARK-7459] [MLLIB] ElementwiseProduct Java example
Author: Octavian Geagla <ogeagla@gmail.com>

Closes #6008 from ogeagla/elementwise-prod-doc and squashes the following commits:

72e6dc0 [Octavian Geagla] [SPARK-7459] [MLLIB] Java example import.
cf2afbd [Octavian Geagla] [SPARK-7459] [MLLIB] Update description of example.
b66431b [Octavian Geagla] [SPARK-7459] [MLLIB] Add override annotation to java example, make scala example use same data as java.
6b26b03 [Octavian Geagla] [SPARK-7459] [MLLIB] Fix line which is too long.
79af020 [Octavian Geagla] [SPARK-7459] [MLLIB] Actually don't use Java 8.
9d5b31a [Octavian Geagla] [SPARK-7459] [MLLIB] Don't use Java 8
4f0c92f [Octavian Geagla] [SPARK-7459] [MLLIB] ElementwiseProduct Java example.

(cherry picked from commit e3a4374833)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-30 00:00:49 -07:00
Timothy Chen 8938a74893 [SPARK-7962] [MESOS] Fix master url parsing in rest submission client.
Only parse standalone master url when master url starts with spark://

Author: Timothy Chen <tnachen@gmail.com>

Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:

61a1198 [Timothy Chen] Fix master url parsing in rest submission client.

(cherry picked from commit 78657d53d7)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-29 23:56:27 -07:00
Octavian Geagla 11a4b30d1e [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct
Author: Octavian Geagla <ogeagla@gmail.com>

Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits:

4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.

(cherry picked from commit da2112aef2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-29 23:55:29 -07:00
Burak Yavuz 1513cffa35 [SPARK-7957] Preserve partitioning when using randomSplit
cc JoshRosen
Thanks for noticing this!

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:

497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit

(cherry picked from commit 7ed06c3992)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-29 22:19:23 -07:00
Taka Shinagawa 400e6dbce2 [DOCS][Tiny] Added a missing dash(-) in docs/configuration.md
The first line had only two dashes (--) instead of three(---). Because of this missing dash(-), 'jekyll build' command was not converting configuration.md to _site/configuration.html

Author: Taka Shinagawa <taka.epsilon@gmail.com>

Closes #6513 from mrt/docfix3 and squashes the following commits:

c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from converting configuration.md to html format

(cherry picked from commit 3792d25836)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-29 20:35:26 -07:00
Ram Sriharsha 9a88be1833 [SPARK-6013] [ML] Add more Python ML examples for spark.ml
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6443 from harsha2010/SPARK-6013 and squashes the following commits:

732506e [Ram Sriharsha] Code Review Feedback
121c211 [Ram Sriharsha] python style fix
5f9b8c3 [Ram Sriharsha] python style fixes
925ca86 [Ram Sriharsha] Simple Params Example
8b372b1 [Ram Sriharsha] GBT Example
965ec14 [Ram Sriharsha] Random Forest Example

(cherry picked from commit dbf8ff38de)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-29 15:22:38 -07:00
Shivaram Venkataraman 2bd4460548 [SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init
cc davies

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6507 from shivaram/sparkr-init and squashes the following commits:

6fdd169 [Shivaram Venkataraman] Create SparkContext in sparkRSQL init

(cherry picked from commit 5fb97dca9b)
Signed-off-by: Davies Liu <davies@databricks.com>
2015-05-29 15:08:50 -07:00
Shivaram Venkataraman cf4122e4d4 [SPARK-6806] [SPARKR] [DOCS] Add a new SparkR programming guide
This PR adds a new SparkR programming guide at the top-level. This will be useful for R users as our APIs don't directly match the Scala/Python APIs and as we need to explain SparkR without using RDDs as examples etc.

cc rxin davies pwendell

cc cafreeman -- Would be great if you could also take a look at this !

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6490 from shivaram/sparkr-guide and squashes the following commits:

d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries
408dce5 [Shivaram Venkataraman] Fix link
dbb86e3 [Shivaram Venkataraman] Fix minor typo
9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example
d09703c [Shivaram Venkataraman] Fix default argument in read.df
ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better

(cherry picked from commit 5f48e5c33b)
Signed-off-by: Davies Liu <davies@databricks.com>
2015-05-29 14:12:18 -07:00
Reynold Xin f40605f064 [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker.
…

Author: Reynold Xin <rxin@databricks.com>

Closes #6491 from rxin/more-whitespace and squashes the following commits:

f6e63dc [Reynold Xin] [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker.

(cherry picked from commit 94f62a4979)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-29 13:39:02 -07:00
Patrick Wendell e549874c33 Preparing development version 1.4.0-SNAPSHOT 2015-05-29 13:07:07 -07:00
Patrick Wendell dd109a8746 Preparing Spark release v1.4.0-rc3 2015-05-29 13:06:59 -07:00
Patrick Wendell 18811ca20b Revert "[SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior"
This reverts commit 645e611644.
2015-05-29 13:03:52 -07:00
Patrick Wendell c68abaa34e Preparing development version 1.4.0-SNAPSHOT 2015-05-29 12:15:18 -07:00
Patrick Wendell fb60503ff2 Preparing Spark release v1.4.0-rc3 2015-05-29 12:15:13 -07:00
MechCoder 4be701aa50 [SPARK-7946] [MLLIB] DecayFactor wrongly set in StreamingKMeans
Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6497 from MechCoder/spark-7946 and squashes the following commits:

2fdd0a3 [MechCoder] Add non-regression test
8c988c6 [MechCoder] [SPARK-7946] DecayFactor wrongly set in StreamingKMeans

(cherry picked from commit 6181937f31)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-29 11:36:48 -07:00
Cheng Lian 645e611644 [SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
The `HiveThriftServer2Test` relies on proper logging behavior to assert whether the Thrift server daemon process is started successfully. However, some other jar files listed in the classpath may potentially contain an unexpected Log4J configuration file which overrides the logging behavior.

This PR writes a temporary `log4j.properties` and prepend it to driver classpath before starting the testing Thrift server process to ensure proper logging behavior.

cc andrewor14 yhuai

Author: Cheng Lian <lian@databricks.com>

Closes #6493 from liancheng/override-log4j and squashes the following commits:

c489e0e [Cheng Lian] Fixes minor Scala styling issue
b46ef0d [Cheng Lian] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior

(cherry picked from commit 4782e13040)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-29 11:11:47 -07:00
Reynold Xin 62df047a36 HOTFIX: Scala style checker for DataTypeSuite.scala. 2015-05-29 11:06:33 -07:00
Cheng Lian caea7a618d [SPARK-7950] [SQL] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext()
When starting `HiveThriftServer2` via `startWithContext`, property `spark.sql.hive.version` isn't set. This causes Simba ODBC driver 1.0.8.1006 behaves differently and fails simple queries.

Hive2 JDBC driver works fine in this case. Also, when starting the server with `start-thriftserver.sh`, both Hive2 JDBC driver and Simba ODBC driver works fine.

Please refer to [SPARK-7950] [1] for details.

[1]: https://issues.apache.org/jira/browse/SPARK-7950

Author: Cheng Lian <lian@databricks.com>

Closes #6500 from liancheng/odbc-bugfix and squashes the following commits:

051e3a3 [Cheng Lian] Fixes import order
3a97376 [Cheng Lian] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext()

(cherry picked from commit e7b6177557)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-05-29 10:43:44 -07:00
Reynold Xin 23bd05fff7 HOTFIX: Scala style checker failure due to a missing space in TachyonBlockManager.scala. 2015-05-29 09:37:46 -07:00
Tim Ellison 459c3d22e0 [SPARK-7756] [CORE] Use testing cipher suites common to Oracle and IBM security providers
Add alias names for supported cipher suites to the sample SSL configuration.

The IBM JSSE provider reports its cipher suite with an SSL_ prefix, but accepts TLS_ prefixed suite names as an alias.  However, Jetty filters the requested ciphers based on the provider's reported supported suites, so the TLS_ versions are never passed through to JSSE causing an SSL handshake failure.

Author: Tim Ellison <t.p.ellison@gmail.com>

Closes #6282 from tellison/SSLFailure and squashes the following commits:

8de8a3e [Tim Ellison] Update SecurityManagerSuite with new expected suite names
96158b2 [Tim Ellison] Update the sample configs to use ciphers that are common to both the Oracle and IBM security providers.
705421b [Tim Ellison] Merge branch 'master' of github.com:tellison/spark into SSLFailure
68b9425 [Tim Ellison] Merge branch 'master' of https://github.com/apache/spark into SSLFailure
b0c35f6 [Tim Ellison] [CORE] Add aliases used for cipher suites in IBM provider

(cherry picked from commit bf46580708)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-29 05:15:00 -04:00
Xiangrui Meng 509a7cafcc [SPARK-7912] [SPARK-7921] [MLLIB] Update OneHotEncoder to handle ML attributes and change includeFirst to dropLast
This PR contains two major changes to `OneHotEncoder`:

1. more robust handling of ML attributes. If the input attribute is unknown, we look at the values to get the max category index
2. change `includeFirst` to `dropLast` and leave the default to `true`. There are couple benefits:

    a. consistent with other tutorials of one-hot encoding (or dummy coding) (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm)
    b. keep the indices unmodified in the output vector. If we drop the first, all indices will be shifted by 1.
    c. If users use `StringIndex`, the last element is the least frequent one.

Sorry for including two changes in one PR! I'll update the user guide in another PR.

jkbradley sryza

Author: Xiangrui Meng <meng@databricks.com>

Closes #6466 from mengxr/SPARK-7912 and squashes the following commits:

a280dca [Xiangrui Meng] fix tests
d8f234d [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7912
171b276 [Xiangrui Meng] mention the difference between our impl vs sklearn's
00dfd96 [Xiangrui Meng] update OneHotEncoder in Python
208ddad [Xiangrui Meng] update OneHotEncoder to handle ML attributes and change includeFirst to dropLast

(cherry picked from commit 23452be944)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-29 00:51:24 -07:00
Patrick Wendell 6bf5a42084 Preparing development version 1.4.0-SNAPSHOT 2015-05-28 23:40:27 -07:00
Patrick Wendell f2796816be Preparing Spark release v1.4.0-rc3 2015-05-28 23:40:22 -07:00
Reynold Xin 55dc7a6933 [SPARK-7929] Turn whitespace checker on for more token types.
This is the last batch of changes to complete SPARK-7929.

Previous related PRs:
https://github.com/apache/spark/pull/6480
https://github.com/apache/spark/pull/6478
https://github.com/apache/spark/pull/6477
https://github.com/apache/spark/pull/6476
https://github.com/apache/spark/pull/6475
https://github.com/apache/spark/pull/6474
https://github.com/apache/spark/pull/6473

Author: Reynold Xin <rxin@databricks.com>

Closes #6487 from rxin/whitespace-lint and squashes the following commits:

b33d43d [Reynold Xin] [SPARK-7929] Turn whitespace checker on for more token types.

(cherry picked from commit 97a60cf75d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-28 23:00:08 -07:00
Patrick Wendell 119c93af9c Preparing development version 1.4.0-SNAPSHOT 2015-05-28 22:57:31 -07:00
Patrick Wendell 2d97d7a0aa Preparing Spark release v1.4.0-rc3 2015-05-28 22:57:26 -07:00
Patrick Wendell e419821c3b [HOTFIX] Minor style fix from last commit 2015-05-28 22:48:25 -07:00
Tathagata Das 7a52fdf25f [SPARK-7931] [STREAMING] Do not restart receiver when stopped
Attempts to restart the socket receiver when it is supposed to be stopped causes undesirable error messages.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6483 from tdas/SPARK-7931 and squashes the following commits:

09aeee1 [Tathagata Das] Do not restart receiver when stopped
2015-05-28 22:48:23 -07:00
Xiangrui Meng 68559423ac [SPARK-7922] [MLLIB] use DataFrames for user/item factors in ALSModel
Expose user/item factors in DataFrames. This is to be more consistent with the pipeline API. It also helps maintain consistent APIs across languages. This PR also removed fitting params from `ALSModel`.

coderxiang

Author: Xiangrui Meng <meng@databricks.com>

Closes #6468 from mengxr/SPARK-7922 and squashes the following commits:

7bfb1d5 [Xiangrui Meng] update ALSModel in PySpark
1ba5607 [Xiangrui Meng] use DataFrames for user/item factors in ALS

(cherry picked from commit db95137897)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-28 22:38:46 -07:00
Tathagata Das f7cb272b7c [SPARK-7930] [CORE] [STREAMING] Fixed shutdown hook priorities
Shutdown hook for temp directories had priority 100 while SparkContext was 50. So the local root directory was deleted before SparkContext was shutdown. This leads to scary errors on running jobs, at the time of shutdown. This is especially a problem when running streaming examples, where Ctrl-C is the only way to shutdown.

The fix in this PR is to make the temp directory shutdown priority lower than SparkContext, so that the temp dirs are the last thing to get deleted, after the SparkContext has been shut down. Also, the DiskBlockManager shutdown priority is change from default 100 to temp_dir_prio + 1, so that it gets invoked just before all temp dirs are cleared.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6482 from tdas/SPARK-7930 and squashes the following commits:

d7cbeb5 [Tathagata Das] Removed unnecessary line
1514d0b [Tathagata Das] Fixed shutdown hook priorities

(cherry picked from commit cd3d9a5c0c)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
2015-05-28 22:28:31 -07:00
Kay Ousterhout aee046dfa1 [SPARK-7932] Fix misleading scheduler delay visualization
The existing code rounds down to the nearest percent when computing the proportion
of a task's time that was spent on each phase of execution, and then computes
the scheduler delay proportion as 100 - sum(all other proportions).  As a result,
a few extra percent can end up in the scheduler delay. This commit eliminates
the rounding so that the time visualizations correspond properly to the real times.

sarutak If you could take a look at this, that would be great! Not sure if there's a good
reason to round here that I missed.

cc shivaram

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #6484 from kayousterhout/SPARK-7932 and squashes the following commits:

1723cc4 [Kay Ousterhout] [SPARK-7932] Fix misleading scheduler delay visualization

(cherry picked from commit 04ddcd4db7)
Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
2015-05-28 22:09:59 -07:00
Xiangrui Meng 1d49d8c3fd [MINOR] fix RegressionEvaluator doc
`make clean html` under `python/doc` returns
~~~
/Users/meng/src/spark/python/pyspark/ml/evaluation.py:docstring of pyspark.ml.evaluation.RegressionEvaluator.setParams:3: WARNING: Definition list ends without a blank line; unexpected unindent.
~~~

harsha2010

Author: Xiangrui Meng <meng@databricks.com>

Closes #6469 from mengxr/fix-regression-evaluator-doc and squashes the following commits:

91e2dad [Xiangrui Meng] fix RegressionEvaluator doc

(cherry picked from commit 834e699524)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-28 21:26:49 -07:00
Xiangrui Meng 6e99dd5d04 [SPARK-7926] [PYSPARK] use the official Pyrolite release
Switch to the official Pyrolite release from the one published under `org.spark-project`. Thanks irmen for making the releases on Maven Central. We didn't upgrade to 4.6 because we don't have enough time for QA. I excludes `serpent` from its dependencies because we don't use it in Spark.
~~~
[info]   +-net.jpountz.lz4:lz4:1.3.0
[info]   +-net.razorvine:pyrolite:4.4
[info]   +-net.sf.py4j:py4j:0.8.2.1
~~~

davies

Author: Xiangrui Meng <meng@databricks.com>

Closes #6472 from mengxr/SPARK-7926 and squashes the following commits:

7b3c6bf [Xiangrui Meng] use the official Pyrolite release

(cherry picked from commit c45d58c143)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-28 21:21:01 -07:00