Commit graph

11154 commits

Author SHA1 Message Date
Josh Rosen 6df71eb8c1 [SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for https://github.com/xerial/snappy-java/issues/107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.
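
The wrapping idea can be sketched as follows. This is a minimal illustration, not the code from the patch: the class name `CloseOnceOutputStream` is hypothetical, and it wraps any `java.io.OutputStream` rather than `SnappyOutputStream` specifically.

```scala
import java.io.{IOException, OutputStream}

// Hypothetical wrapper, not Spark's actual class: makes close() idempotent
// and rejects writes that arrive after close().
class CloseOnceOutputStream(underlying: OutputStream) extends OutputStream {
  private var closed = false

  private def ensureOpen(): Unit =
    if (closed) throw new IOException("Stream is closed")

  override def write(b: Int): Unit = { ensureOpen(); underlying.write(b) }

  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    ensureOpen()
    underlying.write(b, off, len)
  }

  override def flush(): Unit = { ensureOpen(); underlying.flush() }

  // Only the first call reaches the wrapped stream; later calls are no-ops.
  override def close(): Unit =
    if (!closed) {
      closed = true
      underlying.close()
    }
}
```

With a wrapper like this, a second `close()` never reaches the underlying stream, so a non-idempotent `close()` there cannot be triggered twice.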

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660

(cherry picked from commit f2cc6b5bcc)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
2015-05-17 09:33:49 -07:00
Steve Loughran 0feb3ded2e [SPARK-7669] Builds against Hadoop 2.6+ get inconsistent curator depend…
This adds a new profile, `hadoop-2.6`, copying over the hadoop-2.4 properties, updating ZK to 3.4.6 and making the curator version a configurable option. That keeps the curator-recipes JAR in sync with that used in hadoop.

There's one more option to consider: making the full curator-client version explicit with its own dependency version. This will pin down the version from hadoop and hive imports

Author: Steve Loughran <stevel@hortonworks.com>

Closes #6191 from steveloughran/stevel/SPARK-7669-hadoop-2.6 and squashes the following commits:

e3e281a [Steve Loughran] SPARK-7669 declare the version of curator-client and curator-framework JARs
2901ea9 [Steve Loughran] SPARK-7669 Builds against Hadoop 2.6+ get inconsistent curator dependencies

(cherry picked from commit 50217667cc)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-17 17:03:20 +01:00
Liang-Chi Hsieh 898be62489 [SPARK-7447] [SQL] Don't re-merge Parquet schema when the relation is deserialized
JIRA: https://issues.apache.org/jira/browse/SPARK-7447

`MetadataCache` in `ParquetRelation2` is annotated as `transient`. When `ParquetRelation2` is deserialized, we ask `MetadataCache` to refresh and perform schema merging again. This is time-consuming, especially when there are many Parquet files.

With the new `FSBasedParquetRelation`, although `MetadataCache` is not `transient` now, `MetadataCache.refresh()` still performs schema merging again when the relation is deserialized.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6012 from viirya/without_remerge_schema and squashes the following commits:

2663957 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into without_remerge_schema
6ac7d93 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into without_remerge_schema
b0fc09b [Liang-Chi Hsieh] Don't generate and merge parquetSchema multiple times.

(cherry picked from commit 3399055787)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-05-17 15:42:40 +08:00
Shivaram Venkataraman 0ed376afad [MINOR] Add 1.3, 1.3.1 to master branch EC2 scripts
cc pwendell

P.S: I can't believe this was outdated all along ?

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6215 from shivaram/update-ec2-map and squashes the following commits:

ae3937a [Shivaram Venkataraman] Add 1.3, 1.3.1 to master branch EC2 scripts

(cherry picked from commit 1a7b9ce80b)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-05-17 00:12:46 -07:00
Cheng Lian 671a6bca5f [MINOR] [SQL] Removes an unreachable case clause
This case clause is already covered by the one above, and generates a compilation warning.

Author: Cheng Lian <lian@databricks.com>

Closes #6214 from liancheng/remove-unreachable-code and squashes the following commits:

c38ca7c [Cheng Lian] Removes an unreachable case clause

(cherry picked from commit ba4f8ca0d9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-16 23:20:19 -07:00
Reynold Xin 17e078671e [SPARK-7654][SQL] Move JDBC into DataFrame's reader/writer interface.
Also moved all the deprecated functions into one place for SQLContext and DataFrame, and updated tests to use the new API.

Author: Reynold Xin <rxin@databricks.com>

Closes #6210 from rxin/df-writer-reader-jdbc and squashes the following commits:

7465c2c [Reynold Xin] Fixed unit test.
118e609 [Reynold Xin] Updated tests.
3441b57 [Reynold Xin] Updated javadoc.
13cdd1c [Reynold Xin] [SPARK-7654][SQL] Move JDBC into DataFrame's reader/writer interface.

(cherry picked from commit 517eb37a85)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-16 22:02:00 -07:00
zsxwing 84949104c9 [SPARK-7655][Core] Deserializing value should not hold the TaskSchedulerImpl lock
We should not call `DirectTaskResult.value` while holding the `TaskSchedulerImpl` lock. Deserializing a large object may take tens of seconds.
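
A rough sketch of the locking pattern, under the assumption that the expensive step can be done before the lock is taken. `ResultHandler` and its members are hypothetical stand-ins, not the scheduler's real classes.

```scala
// Hypothetical stand-ins for a scheduler and a serialized task result.
class ResultHandler(deserialize: Array[Byte] => String) {
  private val lock = new Object
  private var finished = Map.empty[Long, String]

  def handle(taskId: Long, bytes: Array[Byte]): Unit = {
    // Potentially slow: do it before taking the lock.
    val value = deserialize(bytes)
    // Only the cheap bookkeeping runs while the lock is held.
    lock.synchronized {
      finished += (taskId -> value)
    }
  }

  def result(taskId: Long): Option[String] =
    lock.synchronized(finished.get(taskId))
}
```

Other threads that need the lock then wait only for the map update, not for the deserialization.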

Author: zsxwing <zsxwing@gmail.com>

Closes #6195 from zsxwing/SPARK-7655 and squashes the following commits:

21f502e [zsxwing] Add more comments
e25fa88 [zsxwing] Add comments
15010b5 [zsxwing] Deserialize value should not hold the TaskSchedulerImpl lock

(cherry picked from commit 3b6ef2c539)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-16 21:03:28 -07:00
Reynold Xin bd057f8b55 [SPARK-7654][MLlib] Migrate MLlib to the DataFrame reader/writer API.
Author: Reynold Xin <rxin@databricks.com>

Closes #6211 from rxin/mllib-reader and squashes the following commits:

79a2cb9 [Reynold Xin] [SPARK-7654][MLlib] Migrate MLlib to the DataFrame reader/writer API.

(cherry picked from commit 161d0b4a41)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-16 15:04:26 -07:00
Matthew Brandyberry 8bde352bd7 [BUILD] update jblas dependency version to 1.2.4
jblas 1.2.4 includes native library support for PPC64LE.

Author: Matthew Brandyberry <mbrandy@us.ibm.com>

Closes #6199 from mtbrandy/jblas-1.2.4 and squashes the following commits:

9df9301 [Matthew Brandyberry] [BUILD] update jblas dependency version to 1.2.4

(cherry picked from commit 1b4e710e5c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-16 18:17:59 +01:00
Cheng Lian 856619d485 [HOTFIX] [SQL] Fixes DataFrameWriter.mode(String)
We forgot an assignment there.
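
For illustration only, here is a miniature builder showing why the assignment matters. `WriterSketch` and its mode names are hypothetical and are not Spark's `DataFrameWriter`.

```scala
// Hypothetical miniature of a builder-style writer; not Spark's DataFrameWriter.
class WriterSketch {
  private var saveMode: String = "error"

  def mode(name: String): WriterSketch = {
    // The assignment is the important part: computing the normalized value
    // without storing it would leave the default in place.
    saveMode = name.toLowerCase match {
      case m @ ("overwrite" | "append" | "ignore" | "error") => m
      case other => throw new IllegalArgumentException(s"Unknown save mode: $other")
    }
    this
  }

  def currentMode: String = saveMode
}
```

Without the `saveMode = ...` assignment the method would still compile and return `this`, which is why a test case is the natural guard.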

/cc rxin

Author: Cheng Lian <lian@databricks.com>

Closes #6212 from liancheng/fix-df-writer and squashes the following commits:

711fbb0 [Cheng Lian] Adds a test case
3b72d78 [Cheng Lian] Fixes DataFrameWriter.mode(String)

(cherry picked from commit ce6391296a)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-05-16 20:57:26 +08:00
zsxwing ad5b0b1ce2 [SPARK-7655][Core][SQL] Remove 'scala.concurrent.ExecutionContext.Implicits.global' in 'ask' and 'BroadcastHashJoin'
Both `AkkaRpcEndpointRef.ask` and `BroadcastHashJoin` use `scala.concurrent.ExecutionContext.Implicits.global`. However, the tasks in `BroadcastHashJoin` are usually long-running and can occupy all threads in `global`, so `ask` never gets a chance to process the replies.

For `ask`, actually the tasks are very simple, so we can use `MoreExecutors.sameThreadExecutor()`. For `BroadcastHashJoin`, it's better to use `ThreadUtils.newDaemonCachedThreadPool`.
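
The second half of that approach, giving long-running work its own daemon thread pool, might look roughly like this. The sketch uses only the JDK and the Scala standard library; `DedicatedPoolSketch` and the thread name are made up, and it does not reproduce `ThreadUtils`.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object DedicatedPoolSketch {
  // Daemon threads so that the pool never keeps the JVM alive on its own.
  private val daemonFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "long-running-sketch")
      t.setDaemon(true)
      t
    }
  }

  // Separate context for long-running work, leaving the global one free.
  val longRunning: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newCachedThreadPool(daemonFactory))

  // The context is passed explicitly instead of importing Implicits.global.
  def slowTask(): Future[String] = Future {
    Thread.sleep(50)
    Thread.currentThread().getName
  }(longRunning)

  def runOnce(): String = Await.result(slowTask(), 10.seconds)
}
```

Passing the context explicitly makes it visible at each call site which pool a future runs on.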

Author: zsxwing <zsxwing@gmail.com>

Closes #6200 from zsxwing/SPARK-7655-2 and squashes the following commits:

cfdc605 [zsxwing] Remove redundant imort and minor doc fix
cf83153 [zsxwing] Add "sameThread" and "newDaemonCachedThreadPool with maxThreadNumber" to ThreadUtils
08ad0ee [zsxwing] Remove 'scala.concurrent.ExecutionContext.Implicits.global' in 'ask' and 'BroadcastHashJoin'

(cherry picked from commit 47e7ffe36b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-16 00:44:36 -07:00
Nishkam Ravi e7607e5cbc [SPARK-7672] [CORE] Use int conversion in translating kryoserializer.buffer.mb to kryoserializer.buffer
In translating spark.kryoserializer.buffer.mb to spark.kryoserializer.buffer, the use of toDouble leads to a "Fractional values not supported" error even when spark.kryoserializer.buffer.mb is an integer.
ilganeli, andrewor14
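
The effect is easy to reproduce in plain Scala. In the sketch below, the factor of 1000 and the `k` suffix are assumptions about how the old value is translated, and `KryoBufferSketch` is a hypothetical name.

```scala
object KryoBufferSketch {
  // Interpolating a Double keeps the fractional part in the string.
  def viaDouble(mb: String): String = s"${mb.toDouble * 1000}k"

  // Converting to Int first yields a whole number.
  def viaInt(mb: String): String = s"${(mb.toDouble * 1000).toInt}k"
}
```

Here `KryoBufferSketch.viaDouble("64")` produces `"64000.0k"`, which a parser that rejects fractional values would refuse, while `KryoBufferSketch.viaInt("64")` produces `"64000k"`.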

Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes #6198 from nishkamravi2/master_nravi and squashes the following commits:

171a53c [nishkamravi2] Update SparkConfSuite.scala
5261bf6 [Nishkam Ravi] Add a test for deprecated config spark.kryoserializer.buffer.mb
5190f79 [Nishkam Ravi] In translating from deprecated spark.kryoserializer.buffer.mb to spark.kryoserializer.buffer use int conversion since fractions are not permissible
059ce82 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
eaa13b5 [nishkamravi2] Update Client.scala
981afd2 [Nishkam Ravi] Check for read permission before initiating copy
1b81383 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
0f1abd0 [nishkamravi2] Update Utils.scala
474e3bf [nishkamravi2] Update DiskBlockManager.scala
97c383e [nishkamravi2] Update Utils.scala
8691e0c [Nishkam Ravi] Add a try/catch block around Utils.removeShutdownHook
2be1e76 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
1c13b79 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
bad4349 [nishkamravi2] Update Main.java
36a6f87 [Nishkam Ravi] Minor changes and bug fixes
b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
d9658d6 [Nishkam Ravi] Changes for SPARK-6406
ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ac58975 [Nishkam Ravi] spark-class changes
06bfeb0 [nishkamravi2] Update spark-class
35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
d453197 [nishkamravi2] Update NewHadoopRDD.scala
6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
0ce2c32 [nishkamravi2] Update HadoopRDD.scala
f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
494d8c0 [nishkamravi2] Update DiskBlockManager.scala
3c5ddba [nishkamravi2] Update DiskBlockManager.scala
f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
535295a [nishkamravi2] Update TaskSetManager.scala
3e1b616 [Nishkam Ravi] Modify test for maxResultSize
9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles

(cherry picked from commit 0ac8b01a07)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-16 08:24:34 +01:00
Sean Owen 1fc35607d7 [SPARK-4556] [BUILD] binary distribution assembly can't run in local mode
Add note on building a runnable distribution with make-distribution.sh

Author: Sean Owen <sowen@cloudera.com>

Closes #6186 from srowen/SPARK-4556 and squashes the following commits:

4002966 [Sean Owen] Add pointer to --help flag
9fa7883 [Sean Owen] Add note on building a runnable distribution with make-distribution.sh

(cherry picked from commit 1fd33815f4)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-16 08:18:50 +01:00
FavioVazquez 7e3f9fea65 [SPARK-7671] Fix wrong URLs in MLlib Data Types Documentation
The URL of Matrices in the MLlib Data Types documentation (Local matrix, Scala section) is wrong. It points to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices but Matrices is an object that implements factory methods for Matrix and has no companion class. The correct link should point to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$

There is another mistake, in the Local Vector section in Scala, Java and Python

In the Scala section the URL of Vectors points to the trait Vector (https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and not to the factory methods implemented in Vectors.

The correct link should be: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$

In the Java section the URL of Vectors points to the Interface Vector (https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html) and not to the Class Vectors

The correct link should be:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vectors.html

In the Python section the URL of Vectors points to the class Vector (https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vector) and not the Class Vectors

The correct link should be:
https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors

Author: FavioVazquez <favio.vazquezp@gmail.com>

Closes #6196 from FavioVazquez/fix-typo-matrices-mllib-datatypes and squashes the following commits:

3e9efd5 [FavioVazquez] - Fixed wrong URLs in the MLlib Data Types Documentation
9af7074 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
edab1ef [FavioVazquez] Merge remote-tracking branch 'upstream/master'
b2e2f8c [FavioVazquez] Merge remote-tracking branch 'upstream/master'

(cherry picked from commit d41ae4344c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-16 08:07:16 +01:00
Reynold Xin 9da55b5706 [SPARK-7654][SQL] DataFrameReader and DataFrameWriter for input/output API
This patch introduces DataFrameWriter and DataFrameReader.

DataFrameReader interface, accessible through SQLContext.read, contains methods that create DataFrames. These methods used to reside in SQLContext. Example usage:
```scala
sqlContext.read.json("...")
sqlContext.read.parquet("...")
```

DataFrameWriter interface, accessible through DataFrame.write, implements a builder pattern to avoid the proliferation of options in writing DataFrame out. It currently implements:
- mode
- format (e.g. "parquet", "json")
- options (generic options passed down into data sources)
- partitionBy (partitioning columns)
Example usage:
```scala
df.write.mode("append").format("json").partitionBy("date").saveAsTable("myJsonTable")
```

TODO:

- [ ] Documentation update
- [ ] Move JDBC into reader / writer?
- [ ] Deprecate the old interfaces
- [ ] Move the generic load interface into reader.
- [ ] Update example code and documentation

Author: Reynold Xin <rxin@databricks.com>

Closes #6175 from rxin/reader-writer and squashes the following commits:

b146c95 [Reynold Xin] Deprecation of old APIs.
bd8abdf [Reynold Xin] Fixed merge conflict.
26abea2 [Reynold Xin] Added general load methods.
244fbec [Reynold Xin] Added equivalent to example.
4f15d92 [Reynold Xin] Added documentation for partitionBy.
7e91611 [Reynold Xin] [SPARK-7654][SQL] DataFrameReader and DataFrameWriter for input/output API.

(cherry picked from commit 578bfeeff5)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-15 22:09:52 -07:00
AiHe f41be8fb38 [SPARK-7473] [MLLIB] Add reservoir sample in RandomForest
Sample features by reservoir sampling, using the existing API.
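
For readers unfamiliar with the technique, here is a standalone sketch of reservoir sampling (Algorithm R). It is not the existing Spark API that the patch reuses; `ReservoirSketch.sample` is a hypothetical helper.

```scala
import scala.util.Random

object ReservoirSketch {
  // Algorithm R: keep the first k items, then replace a random slot with
  // probability k / (i + 1) for the item at zero-based index i.
  def sample[T](input: Iterator[T], k: Int, rng: Random): Vector[T] = {
    val reservoir = scala.collection.mutable.ArrayBuffer.empty[T]
    var seen = 0L
    while (input.hasNext) {
      val item = input.next()
      if (reservoir.size < k) {
        reservoir += item
      } else {
        // Uniform index in [0, seen]; only indices below k replace an element.
        val j = (rng.nextDouble() * (seen + 1)).toLong
        if (j < k) reservoir(j.toInt) = item
      }
      seen += 1
    }
    reservoir.toVector
  }
}
```

The input is consumed in a single pass, and the input length does not need to be known in advance.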

Author: AiHe <ai.he@ussuning.com>

Closes #5988 from AiHe/reservoir and squashes the following commits:

e7a41ac [AiHe] remove non-robust testing case
28ffb9a [AiHe] set seed as rng.nextLong
37459e1 [AiHe] set fixed seed
1e98a4c [AiHe] [MLLIB][tree] Add reservoir sample in RandomForest

(cherry picked from commit deb411335a)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-15 20:42:59 -07:00
Davies Liu 8164fbc255 [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files
dataframe.py is split into column.py, group.py and dataframe.py:
```
   360 column.py
  1223 dataframe.py
   183 group.py
```

Author: Davies Liu <davies@databricks.com>

Closes #6201 from davies/split_df and squashes the following commits:

fc8f5ab [Davies Liu] split dataframe.py into multiple files

(cherry picked from commit d7b69946cb)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-15 20:09:23 -07:00
Davies Liu 61806f6fd1 [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python
Author: Davies Liu <davies@databricks.com>

Closes #6206 from davies/sql_type and squashes the following commits:

33d6860 [Davies Liu] [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python

(cherry picked from commit adfd366814)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-15 20:05:33 -07:00
Ram Sriharsha 04323ba4ab [SPARK-7575] [ML] [DOC] Example code for OneVsRest
Java and Scala examples for OneVsRest. Fixes the base classifier to be Logistic Regression and accepts the configuration parameters of the base classifier.

Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6115 from harsha2010/SPARK-7575 and squashes the following commits:

87ad3c7 [Ram Sriharsha] extra line
f5d9891 [Ram Sriharsha] Merge branch 'master' into SPARK-7575
7076084 [Ram Sriharsha] cleanup
dfd660c [Ram Sriharsha] cleanup
8703e4f [Ram Sriharsha] update doc
cb23995 [Ram Sriharsha] fix commandline options for JavaOneVsRestExample
69e91f8 [Ram Sriharsha] cleanup
7f4e127 [Ram Sriharsha] cleanup
d4c40d0 [Ram Sriharsha] Code Review fixes
461eb38 [Ram Sriharsha] cleanup
e0106d9 [Ram Sriharsha] Fix typo
935cf56 [Ram Sriharsha] Try to match Java and Scala Example Commandline options
5323ff9 [Ram Sriharsha] cleanup
196a59a [Ram Sriharsha] cleanup
6adfa0c [Ram Sriharsha] Style Fix
8cfc5d5 [Ram Sriharsha] [SPARK-7575] Example code for OneVsRest

(cherry picked from commit cc12a86fb0)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-15 19:33:46 -07:00
Josh Rosen ed75cc02bc [SPARK-7563] OutputCommitCoordinator.stop() should only run on the driver
This fixes a bug where an executor that exits can cause the driver's OutputCommitCoordinator to stop. To fix this, we use an `isDriver` flag and check it in `stop()`.

See https://issues.apache.org/jira/browse/SPARK-7563 for more details.
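
The guard can be pictured with a small sketch. `CoordinatorSketch` is a hypothetical class that only models the `isDriver` check, not the real `OutputCommitCoordinator`.

```scala
// Hypothetical coordinator; only the driver-side instance owns the shared state.
class CoordinatorSketch(isDriver: Boolean) {
  @volatile private var stopped = false

  def stop(): Unit = {
    // Executors hold an instance too, but their shutdown must not stop it.
    if (isDriver) {
      stopped = true
    }
  }

  def isStopped: Boolean = stopped
}
```

Calling `stop()` on an instance created with `isDriver = false` is then a no-op.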

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6197 from JoshRosen/SPARK-7563 and squashes the following commits:

04b2cc5 [Josh Rosen] [SPARK-7563] OutputCommitCoordinator.stop() should only be executed on the driver

(cherry picked from commit 2c04c8a1ae)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
2015-05-15 18:06:12 -07:00
Kay Ousterhout 6f78d03d2a [SPARK-7676] Bug fix and cleanup of stage timeline view
cc pwendell sarutak

This commit cleans up some unnecessary code, eliminates the feature where when you mouse-over a box in the timeline, the corresponding task is highlighted in the table (because that feature is only useful in the rare case when you have a very small number of tasks, in which case it's easy to figure out the mapping anyway), and fixes a bug where nothing shows up if you try to visualize a stage with only 1 task.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #6202 from kayousterhout/SPARK-7676 and squashes the following commits:

dfd29d4 [Kay Ousterhout] [SPARK-7676] Bug fix and cleanup of stage timeline view

(cherry picked from commit e745456476)
Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
2015-05-15 17:45:23 -07:00
Liang-Chi Hsieh e847d86215 [SPARK-7556] [ML] [DOC] Add user guide for spark.ml Binarizer, including Scala, Java and Python examples
JIRA: https://issues.apache.org/jira/browse/SPARK-7556

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6116 from viirya/binarizer_doc and squashes the following commits:

40cb677 [Liang-Chi Hsieh] Better print out.
5b7ef1d [Liang-Chi Hsieh] Make examples more clear.
1bf9c09 [Liang-Chi Hsieh] For comments.
6cf8cba [Liang-Chi Hsieh] Add user guide for Binarizer.

(cherry picked from commit c8696337e2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-15 15:05:13 -07:00
Iulian Dragos 31e6404995 [SPARK-7677] [STREAMING] Add Kafka modules to the 2.11 build.
This is somewhat related to [SPARK-6154](https://issues.apache.org/jira/browse/SPARK-6154), though it only touches Kafka, not the jline dependency for thriftserver.

I tested this locally on 2.11 (./run-tests) and everything looked good (I had to disable mima, because `MimaBuild` hardcodes 2.10 for the previous version -- that's another PR).

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #6149 from dragos/issue/spark-2.11-kafka and squashes the following commits:

aa15d99 [Iulian Dragos] Add Kafka modules to the 2.11 build.

(cherry picked from commit 6e77105e11)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
2015-05-15 14:57:56 -07:00
qhuang 9ef6d743a6 [SPARK-7226] [SPARKR] Support math functions in R DataFrame
Author: qhuang <qian.huang@intel.com>

Closes #6170 from hqzizania/master and squashes the following commits:

f20c39f [qhuang] add tests units and fixes
2a7d121 [qhuang] use a function name more familiar to R users
07aa72e [qhuang] Support math functions in R DataFrame

(cherry picked from commit 50da9e8916)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-05-15 14:06:39 -07:00
Kousuke Saruta a5f7b3b9c7 [SPARK-7296] Add timeline visualization for stages in the UI.
This PR builds on #2342 by adding a timeline view for the Stage page,
showing how tasks spend their time.

With this timeline, we can understand the following things about a Stage.

* When/where each task ran
* Total duration of each task
* Proportion of the time each task spends

Also, this timeline view is scrollable and zoomable.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #5843 from sarutak/stage-page-timeline and squashes the following commits:

4ba9604 [Kousuke Saruta] Fixed the order of legends
16bb552 [Kousuke Saruta] Removed border of legend area
2e5d605 [Kousuke Saruta] Modified warning message
16cb2e6 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
7ae328f [Kousuke Saruta] Modified code style
d5f794a [Kousuke Saruta] Fixed performance issues more
64e6642 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
e4a3354 [Kousuke Saruta] minor code style change
878e3b8 [Kousuke Saruta] Fixed a bug that tooltip remains
b9d8f1b [Kousuke Saruta] Fixed performance issue
ac8842b [Kousuke Saruta] Fixed layout
2319739 [Kousuke Saruta] Modified appearances more
81903ab [Kousuke Saruta] Modified appearances
a79dcc3 [Kousuke Saruta] Modified appearance
55a390c [Kousuke Saruta] Ignored scalastyle for a line-comment
29eae3e [Kousuke Saruta] limited to longest 1000 tasks
2a9e376 [Kousuke Saruta] Minor cleanup
385b6d2 [Kousuke Saruta] Added link feature
ba1ac3e [Kousuke Saruta] Fixed style
2ae8520 [Kousuke Saruta] Updated bootstrap-tooltip.js from 2.2.2 to 2.3.2
af430f1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into stage-page-timeline
e694b8e [Kousuke Saruta] Added timeline view to StagePage
8f6610c [Kousuke Saruta] Fixed conflict
b587cf2 [Kousuke Saruta] initial commit
11fe67d [Kousuke Saruta] Fixed conflict
79ac03d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
a91abd3 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
ef34a5b [Kousuke Saruta] Implement tooltip using bootstrap
b09d0c5 [Kousuke Saruta] Move `stroke` and `fill` attribute of rect elements to css
d3c63c8 [Kousuke Saruta] Fixed a little bit bugs
a36291b [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
28714b6 [Kousuke Saruta] Fixed highlight issue
0dc4278 [Kousuke Saruta] Addressed most of Patrics's feedbacks
8110acf [Kousuke Saruta] Added scroll limit to Job timeline
974a64a [Kousuke Saruta] Removed unused function
ee7a7f0 [Kousuke Saruta] Refactored
6a91872 [Kousuke Saruta] Temporary commit
6693f34 [Kousuke Saruta] Added link to job/stage box in the timeline in order to move to corresponding row when we click
8f88222 [Kousuke Saruta] Added job/stage description
aeed4b1 [Kousuke Saruta] Removed stage timeline
fc1696c [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
999ccd4 [Kousuke Saruta] Improved scalability
0fc6a31 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
19815ae [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
68b7540 [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
52b5f0b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
dec85db [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
fcdab7d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
dab7cc1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
09cce97 [Kousuke Saruta] Cleanuped
16f82cf [Kousuke Saruta] Cleanuped
9fb522e [Kousuke Saruta] Cleanuped
d05f2c2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
e85e9aa [Kousuke Saruta] Cleanup: Added TimelineViewUtils.scala
a76e569 [Kousuke Saruta] Removed unused setting in timeline-view.css
5ce1b21 [Kousuke Saruta] Added vis.min.js, vis.min.css and vis.map to .rat-exclude
082f709 [Kousuke Saruta] Added Timeline-View feature for Applications, Jobs and Stages

(cherry picked from commit 9b6cf285d0)
Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
2015-05-15 13:58:08 -07:00
ehnalis 7dc0ff3f12 [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
Added a simple check for SparkContext.
Also added two reasonable null checks on the AM object.

Author: ehnalis <zoltan.zvara@gmail.com>

Closes #6083 from ehnalis/cluster and squashes the following commits:

926bd96 [ehnalis] Moved check to SparkContext.
7c89b6e [ehnalis] Remove false line.
ea2a5fe [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
4924e01 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
39e4fa3 [ehnalis] SPARK-7504 [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
9f287c5 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode

(cherry picked from commit 8e3822a079)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-15 12:16:03 -07:00
Kousuke Saruta e319719e06 [SPARK-7664] [WEBUI] DAG visualization: Fix incorrect link paths of DAG.
In JobPage, we can jump to a StagePage by clicking the corresponding box in the DAG viz, but the link path is incorrect.

When we click a box like the following ...
![screenshot_from_2015-05-15 19 24 25](https://cloud.githubusercontent.com/assets/4736016/7651528/5f7ef824-fb3c-11e4-9518-8c9ade2dff7a.png)

We jump to the index page.
![screenshot_from_2015-05-15 19 24 45](https://cloud.githubusercontent.com/assets/4736016/7651534/6d666274-fb3c-11e4-971c-c3f2dc2b1da2.png)

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #6184 from sarutak/fix-link-path-of-dag-viz and squashes the following commits:

faba3ba [Kousuke Saruta] Fix a incorrect link

(cherry picked from commit ad92af9dbb)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-15 11:54:40 -07:00
Sean Owen fe3c7340f0 [SPARK-5412] [DEPLOY] Cannot bind Master to a specific hostname as per the documentation
Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that things like --host have effect on start-master.sh as per docs

Author: Sean Owen <sowen@cloudera.com>

Closes #6185 from srowen/SPARK-5412 and squashes the following commits:

b3ce9da [Sean Owen] Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that things like --host have effect on start-master.sh as per docs

(cherry picked from commit 8ab1450d39)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-15 11:30:26 -07:00
Tim Ellison 866e4b5204 [CORE] Protect additional test vars from early GC
Fix more places in which some test variables could be collected early by aggressive JVM optimization.
Added a couple of comments to note where existing references are sufficient in the same test pattern.

Author: Tim Ellison <t.p.ellison@gmail.com>

Closes #6187 from tellison/DefeatEarlyGC and squashes the following commits:

27329d9 [Tim Ellison] [CORE] Protect additional test vars from early GC

(cherry picked from commit 270d4b5181)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-15 11:27:35 -07:00
Oleksii Kostyliev c58b9c6167 [SPARK-7233] [CORE] Detect REPL mode once
<h3>Description</h3>
Detect REPL mode once per JVM lifespan.
The previous behavior was to check for interpreter mode every time a job was submitted. When many short-lived jobs were executed, this caused massive mutual blocking between submission threads.

For more details please refer to https://issues.apache.org/jira/browse/SPARK-7233.

<h3>Notes</h3>
* I inverted the return value in case of catching an exception from `true` to `false`. It seems more logical to assume that if the REPL class is not found, we aren't in the interpreter mode.
* I would personally call `classForName` with just the Spark classloader (`org.apache.spark.util.Utils#getSparkClassLoader`), but `org.apache.spark.util.Utils#getContextOrSparkClassLoader` is said to be preferable.
* I struggled to come up with a concise, readable and clear unit test. Suggestions are welcome.
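
The "detect once" idea described above can be sketched roughly as follows. This is a Python analogue for illustration only, not the Scala change itself; `probe_repl_class`, `is_in_interpreter`, the counter, and the module name are all hypothetical stand-ins for the real class lookup.

```python
import functools

probe_calls = 0  # counts how often the expensive lookup actually runs


def probe_repl_class() -> bool:
    """Hypothetical stand-in for the class lookup that detects the REPL."""
    global probe_calls
    probe_calls += 1
    try:
        # Placeholder module name (an assumption); it is not expected to exist.
        __import__("hypothetical_repl_module")
        return True
    except ImportError:
        # A failed lookup is treated as "not in the interpreter".
        return False


@functools.lru_cache(maxsize=None)
def is_in_interpreter() -> bool:
    """Run the probe on first use and reuse the cached answer afterwards."""
    return probe_repl_class()


# Simulate many job submissions; the probe should not be repeated.
for _ in range(1000):
    is_in_interpreter()
```

Caching the answer moves the lookup off the per-submission path, which is the point of the change; the sketch makes no claim about how the real code handles concurrent first calls.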

Author: Oleksii Kostyliev <etander@gmail.com>
Author: Oleksii Kostyliev <okostyliev@thunderhead.com>

Closes #5835 from preeze/SPARK-7233 and squashes the following commits:

69bb9e4 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
26dcc24 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
c6f9685 [Oleksii Kostyliev] Merge remote-tracking branch 'remotes/upstream/master' into SPARK-7233
b78a983 [Oleksii Kostyliev] SPARK-7527: revert the fix and let it be addressed separately at a later stage
b64d441 [Oleksii Kostyliev] SPARK-7233: inline inInterpreter parameter into instantiateClass
86e2606 [Oleksii Kostyliev] SPARK-7233, SPARK-7527: Handle interpreter mode properly.
c7ee69c [Oleksii Kostyliev] Merge remote-tracking branch 'upstream/master' into SPARK-7233
d6c07fc [Oleksii Kostyliev] SPARK-7233: properly handle the inverted meaning of isInInterpreter
c319039 [Oleksii Kostyliev] SPARK-7233: move inInterpreter to Utils and make it lazy

(cherry picked from commit b1b9d5802e)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-15 11:21:06 -07:00
FlytxtRnD dfdae5800c [SPARK-7651] [MLLIB] [PYSPARK] GMM predict, predictSoft should raise error on bad input
In the Python API for Gaussian Mixture Model, predict() and predictSoft() methods should raise an error when the input argument is not an RDD.
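
A minimal sketch of this kind of input check, assuming a stand-in `RDD` class so that it runs without Spark. `SketchMixtureModel` and the error message are hypothetical and are not the actual PySpark code.

```python
class RDD:
    """Minimal stand-in for pyspark.RDD so the sketch runs without Spark."""

    def __init__(self, items):
        self.items = list(items)


class SketchMixtureModel:
    """Hypothetical model; only the input check is of interest here."""

    def predict(self, x):
        if not isinstance(x, RDD):
            # Fail early with a clear message instead of an obscure error later.
            raise TypeError(
                "x should be represented by an RDD, got %s" % type(x).__name__
            )
        # Placeholder result: assign every point to cluster 0.
        return [0 for _ in x.items]
```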

Author: FlytxtRnD <meethu.mathew@flytxt.com>

Closes #6180 from FlytxtRnD/GmmPredictException and squashes the following commits:

4b6aa11 [FlytxtRnD] Raise error if the input to predict()/predictSoft() is not an RDD

(cherry picked from commit 8f4aaba0e4)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-15 10:43:26 -07:00
Liang-Chi Hsieh d1f5651004 [SPARK-7668] [MLLIB] Preserve isTransposed property for Matrix after calling map function
JIRA: https://issues.apache.org/jira/browse/SPARK-7668
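
To illustrate why the flag matters, here is a small Python sketch of a dense matrix whose flat value array is read row-major when transposed and column-major otherwise. `SketchDenseMatrix` is a hypothetical class written for this note, not MLlib code; the point is only that `map` must carry the layout flag over to the result.

```python
class SketchDenseMatrix:
    """Hypothetical dense matrix backed by a flat list of values."""

    def __init__(self, num_rows, num_cols, values, is_transposed=False):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.values = list(values)
        self.is_transposed = is_transposed

    def get(self, i, j):
        if self.is_transposed:
            # Row-major layout.
            return self.values[i * self.num_cols + j]
        # Column-major layout.
        return self.values[i + j * self.num_rows]

    def map(self, f):
        # Pass is_transposed along; dropping it would reinterpret the layout.
        return SketchDenseMatrix(
            self.num_rows,
            self.num_cols,
            [f(v) for v in self.values],
            self.is_transposed,
        )
```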

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6188 from viirya/fix_matrix_map and squashes the following commits:

2a7cc97 [Liang-Chi Hsieh] Preserve isTransposed property for Matrix after calling map function.

(cherry picked from commit f96b85ab44)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-15 10:03:44 -07:00
Kousuke Saruta a17a0ee776 [SPARK-7503] [YARN] Resources in .sparkStaging directory can't be cleaned up on error
When we run applications on YARN in cluster mode, resources uploaded to the .sparkStaging directory can't be cleaned up if uploading local resources fails.

You can see this issue by running the following command.
```
bin/spark-submit --master yarn --deploy-mode cluster --class <someClassName> <non-existing-jar>
```
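
The cleanup-on-error pattern can be sketched in Python as below. This is an illustration under stated assumptions, not the YARN client code: `submit_with_cleanup` and `staging_root` are hypothetical names, and a local temporary directory stands in for the staging directory.

```python
import os
import shutil
import tempfile


def submit_with_cleanup(local_files, staging_root):
    """Copy files into a fresh staging directory; delete it if any copy fails."""
    staging_dir = tempfile.mkdtemp(prefix="sketch-staging-", dir=staging_root)
    try:
        for path in local_files:
            shutil.copy(path, staging_dir)  # raises if the file does not exist
    except Exception:
        # Remove the partially populated staging directory, then re-raise.
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise
    return staging_dir
```

Calling it with a path that does not exist raises the original error and leaves nothing behind under `staging_root`.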

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #6026 from sarutak/delete-uploaded-resources-on-error and squashes the following commits:

caef9f4 [Kousuke Saruta] Fixed style
882f921 [Kousuke Saruta] Wrapped Client#submitApplication with try/catch blocks in order to delete resources on error
1786ca4 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into delete-uploaded-resources-on-error
f61071b [Kousuke Saruta] Fixed cleanup problem

(cherry picked from commit c64ff8036c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-15 11:37:50 +01:00
Cheng Lian bcb2c5d169 [SPARK-7591] [SQL] Partitioning support API tweaks
Please see [SPARK-7591] [1] for the details.

/cc rxin marmbrus yhuai

[1]: https://issues.apache.org/jira/browse/SPARK-7591

Author: Cheng Lian <lian@databricks.com>

Closes #6150 from liancheng/spark-7591 and squashes the following commits:

af422e7 [Cheng Lian] Addresses @rxin's comments
37d1738 [Cheng Lian] Fixes HadoopFsRelation partition columns initialization
2fc680a [Cheng Lian] Fixes Scala style issue
189ad23 [Cheng Lian] Removes HadoopFsRelation constructor arguments
522c24e [Cheng Lian] Adds OutputWriterFactory
047d40d [Cheng Lian] Renames FSBased* to HadoopFs*, also renamed FSBasedParquetRelation back to ParquetRelation2

(cherry picked from commit fdf5bba35d)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-05-15 16:21:22 +08:00
Yanbo Liang c0bb974a46 [SPARK-6258] [MLLIB] GaussianMixture Python API parity check
Implement the Python API for the major disparities in the GaussianMixture clustering algorithm between Scala and Python:
```scala
GaussianMixture
    setInitialModel
GaussianMixtureModel
    k
```

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #6087 from yanboliang/spark-6258 and squashes the following commits:

b3af21c [Yanbo Liang] fix typo
2b645c1 [Yanbo Liang] fix doc
638b4b7 [Yanbo Liang] address comments
b5bcade [Yanbo Liang] GaussianMixture Python API parity check

(cherry picked from commit 94761485b2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-15 00:19:20 -07:00
zsxwing 0ba99f061a [SPARK-7650] [STREAMING] [WEBUI] Move streaming css and js files to the streaming project
cc tdas

Author: zsxwing <zsxwing@gmail.com>

Closes #6160 from zsxwing/SPARK-7650 and squashes the following commits:

fe6ae15 [zsxwing] Fix the import order
a4ffd99 [zsxwing] Merge branch 'master' into SPARK-7650
dc402b6 [zsxwing] Move streaming css and js files to the streaming project

(cherry picked from commit cf842d42a7)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 23:51:51 -07:00
Kan Zhang 6742b4ecc6 [CORE] Remove unreachable Heartbeat message from Worker
It doesn't look to me like Heartbeat is sent to Worker by anyone.

Author: Kan Zhang <kzhang@apache.org>

Closes #6163 from kanzhang/deadwood and squashes the following commits:

56be118 [Kan Zhang] [core] Remove unreachable Heartbeat message from Worker

(cherry picked from commit daf4ae72fe)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 23:50:57 -07:00
Josh Rosen 1206a5597b [HOTFIX] Add workaround for SPARK-7660 to fix JavaAPISuite failures. 2015-05-14 23:26:51 -07:00
Yin Huai 7aa269f4bb [SQL] When creating partitioned table scan, explicitly create UnionRDD.
Otherwise, it will cause stack overflow when there are many partitions.

Author: Yin Huai <yhuai@databricks.com>

Closes #6162 from yhuai/partitionUnionedRDD and squashes the following commits:

fa016d8 [Yin Huai] Explicitly create UnionRDD.

(cherry picked from commit e8f0e016ea)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-05-15 12:04:39 +08:00
Liang-Chi Hsieh bac45229aa [SPARK-7098][SQL] Make the WHERE clause with timestamp show consistent result
JIRA: https://issues.apache.org/jira/browse/SPARK-7098

The WHERE clause with timestamp shows inconsistent results. This PR fixes it.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #5682 from viirya/consistent_timestamp and squashes the following commits:

171445a [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into consistent_timestamp
4e98520 [Liang-Chi Hsieh] Make the WHERE clause with timestamp show consistent result.

(cherry picked from commit f9705d4613)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-14 20:49:32 -07:00
Michael Armbrust 778a0548cc [SPARK-7548] [SQL] Add explode function for DataFrames
Add an `explode` function for dataframes and modify the analyzer so that single table generating functions can be present in a select clause along with other expressions. There are currently the following restrictions:
 - only top level TGFs are allowed (i.e. no `select(explode('list) + 1)`)
 - only one may be present in a single select to avoid potentially confusing implicit Cartesian products.
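
For readers unfamiliar with the operation, here is a plain-Python sketch of what exploding a list column does to rows. It uses dictionaries rather than DataFrames, and `explode_rows` is a hypothetical helper written for this note, not part of the Spark API.

```python
def explode_rows(rows, column):
    """Emit one output row per element of row[column], copying the other fields."""
    out = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)     # keep the other expressions in the select
            new_row[column] = item  # replace the list with a single element
            out.append(new_row)
    return out


rows = [{"id": 1, "list": [10, 20]}, {"id": 2, "list": []}]
exploded = explode_rows(rows, "list")
```

A row whose list is empty contributes no output rows in this sketch.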

TODO:
 - [ ] Python

Author: Michael Armbrust <michael@databricks.com>

Closes #6107 from marmbrus/explodeFunction and squashes the following commits:

7ee2c87 [Michael Armbrust] whitespace
6f80ba3 [Michael Armbrust] Update dataframe.py
c176c89 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
81b5da3 [Michael Armbrust] style
d3faa05 [Michael Armbrust] fix self join case
f9e1e3e [Michael Armbrust] fix python, add since
4f0d0a9 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
e710fe4 [Michael Armbrust] add java and python
52ca0dc [Michael Armbrust] [SPARK-7548][SQL] Add explode function for dataframes.

(cherry picked from commit 6d0633e3ec)
Signed-off-by: Michael Armbrust <michael@databricks.com>
2015-05-14 19:51:00 -07:00
Xiangrui Meng a238c23b02 [SPARK-7619] [PYTHON] fix docstring signature
Just realized that we need `\` at the end of the docstring. brkyvz

Author: Xiangrui Meng <meng@databricks.com>

Closes #6161 from mengxr/SPARK-7619 and squashes the following commits:

e44495f [Xiangrui Meng] fix docstring signature

(cherry picked from commit 48fc38f584)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-14 18:16:48 -07:00
Xiangrui Meng f91bb57efa [SPARK-7648] [MLLIB] Add weights and intercept to GLM wrappers in spark.ml
Otherwise, users can only use `transform` on the models. brkyvz

Author: Xiangrui Meng <meng@databricks.com>

Closes #6156 from mengxr/SPARK-7647 and squashes the following commits:

1ae3d2d [Xiangrui Meng] add weights and intercept to LogisticRegression in Python
f49eb46 [Xiangrui Meng] add weights and intercept to LinearRegressionModel

(cherry picked from commit 723853edab)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-14 18:14:07 -07:00
zsxwing 79983f17d9 [SPARK-7645] [STREAMING] [WEBUI] Show milliseconds in the UI if the batch interval < 1 second
I also updated the summary of the Streaming page.

![screen shot 2015-05-14 at 11 52 59 am](https://cloud.githubusercontent.com/assets/1000778/7640103/13cdf68e-fa36-11e4-84ec-e2a3954f4319.png)
![screen shot 2015-05-14 at 12 39 33 pm](https://cloud.githubusercontent.com/assets/1000778/7640151/4cc066ac-fa36-11e4-8494-2821d6a6f17c.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #6154 from zsxwing/SPARK-7645 and squashes the following commits:

5db6ca1 [zsxwing] Add UIUtils.formatBatchTime
e4802df [zsxwing] Show milliseconds in the UI if the batch interval < 1 second

(cherry picked from commit b208f998b5)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 16:58:46 -07:00
zsxwing 3358485778 [SPARK-7649] [STREAMING] [WEBUI] Use window.localStorage to store the status rather than the url
Use window.localStorage to store the status rather than the url so that the url won't be changed.

cc tdas

Author: zsxwing <zsxwing@gmail.com>

Closes #6158 from zsxwing/SPARK-7649 and squashes the following commits:

3c56fef [zsxwing] Use window.localStorage to store the status rather than the url

(cherry picked from commit 0a317c124c)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 16:57:55 -07:00
Xiangrui Meng 8d8876d3b3 [SPARK-7643] [UI] use the correct size in RDDPage for storage info and partitions
`dataDistribution` and `partitions` are `Option[Seq[_]]`. andrewor14 squito

Author: Xiangrui Meng <meng@databricks.com>

Closes #6157 from mengxr/SPARK-7643 and squashes the following commits:

99fe8a4 [Xiangrui Meng] use the correct size in RDDPage for storage info and partitions

(cherry picked from commit 57ed16cf93)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 16:56:38 -07:00
Rex Xiong 894214f9ea [SPARK-7598] [DEPLOY] Add aliveWorkers metrics in Master
In a Spark Standalone setup, when some workers are DEAD, they stay in the master's worker list for a while.
The master.workers metric only shows the total number of workers; we need to monitor how many workers are actually ALIVE to ensure the cluster is healthy.
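
The difference between the two gauges can be sketched in Python as follows. The `Worker` tuple, the state strings, and `worker_gauges` are hypothetical; only the metric names `workers` and `aliveWorkers` come from this commit.

```python
from collections import namedtuple

# Hypothetical worker record: an identifier plus a state string.
Worker = namedtuple("Worker", ["worker_id", "state"])


def worker_gauges(workers):
    """Return the total worker count and the count of workers in state ALIVE."""
    return {
        "workers": len(workers),
        "aliveWorkers": sum(1 for w in workers if w.state == "ALIVE"),
    }


gauges = worker_gauges([
    Worker("w1", "ALIVE"),
    Worker("w2", "DEAD"),
    Worker("w3", "ALIVE"),
])
```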

Author: Rex Xiong <pengx@microsoft.com>

Closes #6117 from twilightgod/add-aliveWorker-metrics and squashes the following commits:

6be69a5 [Rex Xiong] Fix comment for aliveWorkers metrics
a882f39 [Rex Xiong] Fix style for aliveWorkers metrics
38ce955 [Rex Xiong] Add aliveWorkers metrics in Master

(cherry picked from commit 93dbb3ad83)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 16:55:37 -07:00
tedyu fceaffc49b Make SPARK prefix a variable
Author: tedyu <yuzhihong@gmail.com>

Closes #6153 from ted-yu/master and squashes the following commits:

4e0bac5 [tedyu] Use JIRA_PROJECT_NAME as variable name
ab982aa [tedyu] Make SPARK prefix a variable

(cherry picked from commit 11a1a135d1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-14 15:26:47 -07:00
ksonj a49a145884 [SPARK-7278] [PySpark] DateType should find datetime.datetime acceptable
DateType should not be restricted to `datetime.date` but accept `datetime.datetime` objects as well. Could someone with a little more insight verify this?
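
A minimal sketch of the kind of check involved, assuming the verification compares exact types rather than using `isinstance` (an assumption about the implementation, not something stated in this commit). `STRICT`, `RELAXED`, and `accepts` are hypothetical names.

```python
import datetime

# Hypothetical tables of acceptable Python types for a date column.
STRICT = (datetime.date,)
RELAXED = (datetime.date, datetime.datetime)


def accepts(value, acceptable):
    """Exact-type membership test, so subclasses are not matched implicitly."""
    return type(value) in acceptable


stamp = datetime.datetime(2015, 5, 14, 15, 11, 9)
day = datetime.date(2015, 5, 14)
```

Under an exact-type check, `datetime.datetime` is rejected even though it subclasses `datetime.date`, so it has to be listed explicitly.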

Author: ksonj <kson@siberie.de>

Closes #6057 from ksonj/dates and squashes the following commits:

68a158e [ksonj] DateType should find datetime.datetime acceptable too

(cherry picked from commit 5d7d4f887d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-14 15:11:09 -07:00
Wenchen Fan aa8a0f9637 [SQL][minor] rename apply for QueryPlanner
A follow-up of https://github.com/apache/spark/pull/5624

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #6142 from cloud-fan/tmp and squashes the following commits:

971a92b [Wenchen Fan] use plan instead of execute
24c5ffe [Wenchen Fan] rename apply

(cherry picked from commit f2cd00be35)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-14 10:25:32 -07:00