Commit graph

10866 commits

Author SHA1 Message Date
Timothy Chen 2022193412 [SPARK-7216] [MESOS] Add driver details page to Mesos cluster UI.
Add a details page that displays Mesos driver in the Mesos cluster UI

Author: Timothy Chen <tnachen@gmail.com>

Closes #5763 from tnachen/mesos_cluster_page and squashes the following commits:

55f36eb [Timothy Chen] Add driver details page to Mesos cluster UI.
2015-05-01 18:36:42 -07:00
Sandy Ryza 099327d537 [SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a negative n...
...umber of executors

Author: Sandy Ryza <sandy@cloudera.com>

Closes #5704 from sryza/sandy-spark-6954 and squashes the following commits:

b7890fb [Sandy Ryza] Avoid ramping up to an existing number of executors
6eb516a [Sandy Ryza] SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors
2015-05-01 18:33:15 -07:00
Holden Karau ae98eec730 [SPARK-3444] Provide an easy way to change log level
Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from pwendel & davies

Author: Holden Karau <holden@pigscanfly.ca>

Closes #5791 from holdenk/SPARK-3444-provide-an-easy-way-to-change-log-level-r2 and squashes the following commits:

3bf3be9 [Holden Karau] fix exception
42ba873 [Holden Karau] fix exception
9117244 [Holden Karau] Only allow valid log levels, throw exception if invalid log level.
338d7bf [Holden Karau] rename setLoggingLevel to setLogLevel
fac14a0 [Holden Karau] Fix style errors
d9d03f3 [Holden Karau] Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from @pwendel & @davies
2015-05-01 18:02:51 -07:00
cody koeninger 4786484076 [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
i don't think this should be merged until after 1.3.0 is final

Author: cody koeninger <cody@koeninger.org>
Author: Helena Edelson <helena.edelson@datastax.com>

Closes #4537 from koeninger/wip-2808-kafka-0.8.2-upgrade and squashes the following commits:

803aa2c [cody koeninger] [SPARK-2808][Streaming][Kafka] code cleanup per TD
e6dfaf6 [cody koeninger] [SPARK-2808][Streaming][Kafka] pointless whitespace change to trigger jenkins again
1770abc [cody koeninger] [SPARK-2808][Streaming][Kafka] make waitUntilLeaderOffset easier to call, call it from python tests as well
d4267e9 [cody koeninger] [SPARK-2808][Streaming][Kafka] fix stderr redirect in python test script
30d991d [cody koeninger] [SPARK-2808][Streaming][Kafka] remove stderr prints since it breaks python 3 syntax
1d896e2 [cody koeninger] [SPARK-2808][Streaming][Kafka] add even even more logging to python test
4c4557f [cody koeninger] [SPARK-2808][Streaming][Kafka] add even more logging to python test
115aeee [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
2712649 [cody koeninger] [SPARK-2808][Streaming][Kafka] add more logging to python test, see why its timing out in jenkins
2b92d3f [cody koeninger] [SPARK-2808][Streaming][Kafka] wait for leader offsets in the java test as well
3824ce3 [cody koeninger] [SPARK-2808][Streaming][Kafka] naming / comments per tdas
61b3464 [cody koeninger] [SPARK-2808][Streaming][Kafka] delay for second send in boundary condition test
af6f3ec [cody koeninger] [SPARK-2808][Streaming][Kafka] delay test until latest leader offset matches expected value
9edab4c [cody koeninger] [SPARK-2808][Streaming][Kafka] more shots in the dark on jenkins failing test
c70ee43 [cody koeninger] [SPARK-2808][Streaming][Kafka] add more asserts to test, try to figure out why it fails on jenkins but not locally
1d10751 [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
ed02d2c [cody koeninger] [SPARK-2808][Streaming][Kafka] move default argument for api version to overloaded method, for binary compat
407382e [cody koeninger] [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2.1
77de6c2 [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
6953429 [cody koeninger] [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
2e67c66 [Helena Edelson] #SPARK-2808 Update to Kafka 0.8.2.0 GA from beta.
d9dc2bc [Helena Edelson] Merge remote-tracking branch 'upstream/master' into wip-2808-kafka-0.8.2-upgrade
e768164 [Helena Edelson] #2808 update kafka to version 0.8.2
2015-05-01 17:54:56 -07:00
jerryshao b88c275e6e [SPARK-7112][Streaming][WIP] Add a InputInfoTracker to track all the input streams
Author: jerryshao <saisai.shao@intel.com>
Author: Saisai Shao <saisai.shao@intel.com>

Closes #5680 from jerryshao/SPARK-7111 and squashes the following commits:

339f854 [Saisai Shao] Add an end-to-end test
812bcaf [jerryshao] Continue address the comments
abd0036 [jerryshao] Address the comments
727264e [jerryshao] Fix comment typo
6682bef [jerryshao] Fix compile issue
8325787 [jerryshao] Fix rebase issue
17fa251 [jerryshao] Refactor to build InputInfoTracker
ee1b536 [jerryshao] Add DirectStreamTracker to track the direct streams
2015-05-01 17:46:06 -07:00
zsxwing ebc25a4ddf [SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler
Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler when stopping them.

Author: zsxwing <zsxwing@gmail.com>

Closes #5845 from zsxwing/SPARK-7309 and squashes the following commits:

6c004fd [zsxwing] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler
2015-05-01 17:41:55 -07:00
Cheng Hao 98e7045805 [SPARK-6999] [SQL] Remove the infinite recursive method (useless)
Remove the method, since it causes infinite recursive calls. And seems it's a dummy method, since we have the API:
`def createDataFrame(rowRDD: JavaRDD[Row], schema: StructType): DataFrame`

Author: Cheng Hao <hao.cheng@intel.com>

Closes #5804 from chenghao-intel/spark_6999 and squashes the following commits:

63220a8 [Cheng Hao] remove the infinite recursive method (useless)
2015-05-01 19:39:30 -05:00
Rajendra Gokhale (rvgcentos) e6fb37712e [SPARK-7304] [BUILD] Include $@ in call to mvn consistently in make-distribution.sh
Adding the $ allows the caller of this script to supply additional arguments to the mvn command and is consistent with how mvn is being invoked elsewhere in the scripts

Author: Rajendra Gokhale (rvgcentos) <rvg@cloudera.com>

Closes #5846 from palamau/master and squashes the following commits:

e5f2adb [Rajendra Gokhale (rvgcentos)] Add $@ in call to mvn consistently in make-distribution.sh
2015-05-01 17:01:36 -07:00
Yin Huai 41c6a44b1a [SPARK-7312][SQL] SPARK-6913 broke jdk6 build
JIRA: https://issues.apache.org/jira/browse/SPARK-7312

Author: Yin Huai <yhuai@databricks.com>

Closes #5847 from yhuai/jdbcJava6 and squashes the following commits:

68433a2 [Yin Huai] compile with Java 6
2015-05-01 16:47:00 -07:00
Patrick Wendell 5c1fabafab Ignore flakey test in SparkSubmitUtilsSuite 2015-05-01 14:42:58 -07:00
Hari Shreedharan b1f4ca82d1 [SPARK-5342] [YARN] Allow long running Spark apps to run on secure YARN/HDFS
Take 2. Does the same thing as #4688, but fixes Hadoop-1 build.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #5823 from harishreedharan/kerberos-longrunning and squashes the following commits:

3c86bba [Hari Shreedharan] Import fixes. Import postfixOps explicitly.
4d04301 [Hari Shreedharan] Minor formatting fixes.
b5e7a72 [Hari Shreedharan] Remove reflection, use a method in SparkHadoopUtil to update the token renewer.
7bff6e9 [Hari Shreedharan] Make sure all required classes are present in the jar. Fix import order.
e851f70 [Hari Shreedharan] Move the ExecutorDelegationTokenRenewer to yarn module. Use reflection to use it.
36eb8a9 [Hari Shreedharan] Change the renewal interval config param. Fix a bunch of comments.
611923a [Hari Shreedharan] Make sure the namenodes are listed correctly for creating tokens.
09fe224 [Hari Shreedharan] Use token.renew to get token's renewal interval rather than using hdfs-site.xml
6963bbc [Hari Shreedharan] Schedule renewal in AM before starting user class. Else, a restarted AM cannot access HDFS if the user class tries to.
072659e [Hari Shreedharan] Fix build failure caused by thread factory getting moved to ThreadUtils.
f041dd3 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
42eead4 [Hari Shreedharan] Remove RPC part. Refactor and move methods around, use renewal interval rather than max lifetime to create new tokens.
ebb36f5 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
bc083e3 [Hari Shreedharan] Overload RegisteredExecutor to send tokens. Minor doc updates.
7b19643 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
8a4f268 [Hari Shreedharan] Added docs in the security guide. Changed some code to ensure that the renewer objects are created only if required.
e800c8b [Hari Shreedharan] Restore original RegisteredExecutor message, and send new tokens via NewTokens message.
0e9507e [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
7f1bc58 [Hari Shreedharan] Minor fixes, cleanup.
bcd11f9 [Hari Shreedharan] Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup.
f74303c [Hari Shreedharan] Move the new logic into specialized classes. Add cleanup for old credentials files.
2f9975c [Hari Shreedharan] Ensure new tokens are written out immediately on AM restart. Also, pikc up the latest suffix from HDFS if the AM is restarted.
61b2b27 [Hari Shreedharan] Account for AM restarts by making sure lastSuffix is read from the files on HDFS.
62c45ce [Hari Shreedharan] Relogin from keytab periodically.
fa233bd [Hari Shreedharan] Adding logging, fixing minor formatting and ordering issues.
42813b4 [Hari Shreedharan] Remove utils.sh, which was re-added due to merge with master.
0de27ee [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
55522e3 [Hari Shreedharan] Fix failure caused by Preconditions ambiguity.
9ef5f1b [Hari Shreedharan] Added explanation of how the credentials refresh works, some other minor fixes.
f4fd711 [Hari Shreedharan] Fix SparkConf usage.
2debcea [Hari Shreedharan] Change the file structure for credentials files. I will push a followup patch which adds a cleanup mechanism for old credentials files. The credentials files are small and few enough for it to cause issues on HDFS.
af6d5f0 [Hari Shreedharan] Cleaning up files where changes weren't required.
f0f54cb [Hari Shreedharan] Be more defensive when updating the credentials file.
f6954da [Hari Shreedharan] Got rid of Akka communication to renew, instead the executors check a known file's modification time to read the credentials.
5c11c3e [Hari Shreedharan] Move tests to YarnSparkHadoopUtil to fix compile issues.
b4cb917 [Hari Shreedharan] Send keytab to AM via DistributedCache rather than directly via HDFS
0985b4e [Hari Shreedharan] Write tokens to HDFS and read them back when required, rather than sending them over the wire.
d79b2b9 [Hari Shreedharan] Make sure correct credentials are passed to FileSystem#addDelegationTokens()
8c6928a [Hari Shreedharan] Fix issue caused by direct creation of Actor object.
fb27f46 [Hari Shreedharan] Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started. Also schedule re-logins in CoarseGrainedSchedulerBackend#start()
41efde0 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
d282d7a [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
bcfc374 [Hari Shreedharan] Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil.
f8fe694 [Hari Shreedharan] Handle None if keytab-login is not scheduled.
2b0d745 [Hari Shreedharan] [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
ccba5bc [Hari Shreedharan] WIP: More changes wrt kerberos
77914dd [Hari Shreedharan] WIP: Add kerberos principal and keytab to YARN client.
2015-05-01 15:32:09 -05:00
Burak Yavuz 4dc8d74491 [SPARK-7240][SQL] Single pass covariance calculation for dataframes
Added the calculation of covariance between two columns to DataFrames.

cc mengxr rxin

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5825 from brkyvz/df-cov and squashes the following commits:

cb18046 [Burak Yavuz] changed to sample covariance
f2e862b [Burak Yavuz] fixed failed test
51e39b8 [Burak Yavuz] moved implementation
0c6a759 [Burak Yavuz] addressed math comments
8456eca [Burak Yavuz] fix pyStyle3
aa2ad29 [Burak Yavuz] fix pyStyle2
4e97a50 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into df-cov
e3b0b85 [Burak Yavuz] addressed comments v0.1
a7115f1 [Burak Yavuz] fix python style
7dc6dbc [Burak Yavuz] reorder imports
408cb77 [Burak Yavuz] initial commit
2015-05-01 13:29:17 -07:00
Marcelo Vanzin 7b5dd3e3c0 [SPARK-7281] [YARN] Add option to set AM's lib path in client mode.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5813 from vanzin/SPARK-7281 and squashes the following commits:

1cb6f42 [Marcelo Vanzin] [SPARK-7281] [yarn] Add option to set AM's lib path in client mode.
2015-05-01 21:20:46 +01:00
Nishkam Ravi f53a48827e [SPARK-7213] [YARN] Check for read permissions before copying a Hadoop config file
Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes #5760 from nishkamravi2/master_nravi and squashes the following commits:

eaa13b5 [nishkamravi2] Update Client.scala
981afd2 [Nishkam Ravi] Check for read permission before initiating copy
1b81383 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
0f1abd0 [nishkamravi2] Update Utils.scala
474e3bf [nishkamravi2] Update DiskBlockManager.scala
97c383e [nishkamravi2] Update Utils.scala
8691e0c [Nishkam Ravi] Add a try/catch block around Utils.removeShutdownHook
2be1e76 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
1c13b79 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
bad4349 [nishkamravi2] Update Main.java
36a6f87 [Nishkam Ravi] Minor changes and bug fixes
b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
d9658d6 [Nishkam Ravi] Changes for SPARK-6406
ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ac58975 [Nishkam Ravi] spark-class changes
06bfeb0 [nishkamravi2] Update spark-class
35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
d453197 [nishkamravi2] Update NewHadoopRDD.scala
6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
0ce2c32 [nishkamravi2] Update HadoopRDD.scala
f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
494d8c0 [nishkamravi2] Update DiskBlockManager.scala
3c5ddba [nishkamravi2] Update DiskBlockManager.scala
f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
535295a [nishkamravi2] Update TaskSetManager.scala
3e1b616 [Nishkam Ravi] Modify test for maxResultSize
9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
2015-05-01 21:15:33 +01:00
Patrick Wendell c6d9a42942 Revert "[SPARK-7224] added mock repository generator for --packages tests"
This reverts commit 7dacc08ab3.
2015-05-01 13:01:43 -07:00
Patrick Wendell 58d6584d34 Revert "[SPARK-7287] enabled fixed test"
This reverts commit 7cf1eb79b1.
2015-05-01 13:01:14 -07:00
Reynold Xin 37537760d1 [SPARK-7274] [SQL] Create Column expression for array/struct creation.
Author: Reynold Xin <rxin@databricks.com>

Closes #5802 from rxin/SPARK-7274 and squashes the following commits:

19aecaa [Reynold Xin] Fixed unicode tests.
bfc1538 [Reynold Xin] Export all Python functions.
2517b8c [Reynold Xin] Code review.
23da335 [Reynold Xin] Fixed Python bug.
132002e [Reynold Xin] Fixed tests.
56fce26 [Reynold Xin] Added Python support.
b0d591a [Reynold Xin] Fixed debug error.
86926a6 [Reynold Xin] Added test suite.
7dbb9ab [Reynold Xin] Ok one more.
470e2f5 [Reynold Xin] One more MLlib ...
e2d14f0 [Reynold Xin] [SPARK-7274][SQL] Create Column expression for array/struct creation.
2015-05-01 12:49:02 -07:00
Liang-Chi Hsieh 1686032728 [SPARK-7183] [NETWORK] Fix memory leak of TransportRequestHandler.streamIds
JIRA: https://issues.apache.org/jira/browse/SPARK-7183

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #5743 from viirya/fix_requesthandler_memory_leak and squashes the following commits:

cf2c086 [Liang-Chi Hsieh] For comments.
97e205c [Liang-Chi Hsieh] Remove unused import.
d35f19a [Liang-Chi Hsieh] For comments.
f9a0c37 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_requesthandler_memory_leak
45908b7 [Liang-Chi Hsieh] for style.
17f020f [Liang-Chi Hsieh] Remove unused import.
37a4b6c [Liang-Chi Hsieh] Remove streamIds from TransportRequestHandler.
3b3f38a [Liang-Chi Hsieh] Fix memory leak of TransportRequestHandler.streamIds.
2015-05-01 11:59:12 -07:00
Sean Owen 1262e310cd [SPARK-6846] [WEBUI] [HOTFIX] return to GET for kill link in UI since YARN AM won't proxy POST
Partial undoing of SPARK-6846; YARN AM proxy won't forward POSTs, so go back to GET for kill links in Spark UI. Standalone UIs are not affected.

Author: Sean Owen <sowen@cloudera.com>

Closes #5837 from srowen/SPARK-6846.2 and squashes the following commits:

c17c386 [Sean Owen] Partial undoing of SPARK-6846; YARN AM proxy won't forward POSTs, so go back to GET for kill links in Spark UI. Standalone UIs are not affected.
2015-05-01 19:57:37 +01:00
Dan McClary 7d427222dc [SPARK-5854] personalized page rank
Here's a modification to PageRank which does personalized PageRank.  The approach is basically similar to that outlined by Bahmani et al. from 2010 (http://arxiv.org/pdf/1006.2880.pdf).

I'm sure this needs tuning up or other considerations, so let me know how I can improve this.

Author: Dan McClary <dan.mcclary@gmail.com>
Author: dwmclary <dan.mcclary@gmail.com>

Closes #4774 from dwmclary/SPARK-5854-Personalized-PageRank and squashes the following commits:

8b907db [dwmclary] fixed scalastyle errors in PageRankSuite
2c20e5d [dwmclary] merged with upstream master
d6cebac [dwmclary] updated as per style requests
7d00c23 [Dan McClary] fixed line overrun in personalizedVertexPageRank
d711677 [Dan McClary] updated vertexProgram to restore binary compatibility for inner method
bb8d507 [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank
fba0edd [Dan McClary] fixed silly mistakes
de51be2 [Dan McClary] cleaned up whitespace between comments and methods
0c30d0c [Dan McClary] updated to maintain binary compatibility
aaf0b4b [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank
76773f6 [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank
44ada8e [Dan McClary] updated tolerance on chain PPR
1ffed95 [Dan McClary] updated tolerance on chain PPR
b67ac69 [Dan McClary] updated tolerance on chain PPR
a560942 [Dan McClary] rolled PPR into pregel code for PageRank
6dc2c29 [Dan McClary] initial implementation of personalized page rank
2015-05-01 11:55:43 -07:00
niranda 27de6fef6a changing persistence engine trait to an abstract class
Author: niranda <niranda.perera@gmail.com>

Closes #5832 from nirandaperera/PersistanceEngine_abstract_class and squashes the following commits:

67b9d5a [niranda] changing persistence engine trait to an abstract class
2015-05-01 11:27:45 -07:00
Chris Biow c8c481da18 Limit help option regex
Added word-boundary delimiters so that embedded text such as "-h" within command line options and values doesn't trigger the usage script and exit.

Author: Chris Biow <chris.biow@10gen.com>

Closes #5816 from cbiow/patch-1 and squashes the following commits:

36b3726 [Chris Biow] Limit help option regex
2015-05-01 19:26:55 +01:00
Liang-Chi Hsieh 7630213cab [SPARK-5891] [ML] Add Binarizer ML Transformer
JIRA: https://issues.apache.org/jira/browse/SPARK-5891

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #5699 from viirya/add_binarizer and squashes the following commits:

1a0b9a4 [Liang-Chi Hsieh] For comments.
bc397f2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_binarizer
cc4f03c [Liang-Chi Hsieh] Implement threshold param and use merged params map.
7564c63 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_binarizer
1682f8c [Liang-Chi Hsieh] Add Binarizer ML Transformer.
2015-05-01 08:31:01 -07:00
Debasish Das 3b514af8a0 [SPARK-3066] [MLLIB] Support recommendAll in matrix factorization model
This is based on #3098 from debasish83.

1. BLAS' GEMM is used to compute inner products.
2. Reverted changes to MovieLensALS. SPARK-4231 should be addressed in a separate PR.
3. ~~Fixed a bug in topByKey~~

Closes #3098

debasish83 coderxiang

Author: Debasish Das <debasish.das@one.verizon.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #5829 from mengxr/SPARK-3066 and squashes the following commits:

22e6a87 [Xiangrui Meng] topByKey was correct. update its usage
389b381 [Xiangrui Meng] fix indentation
49953de [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-3066
cb9799a [Xiangrui Meng] revert MovieLensALS
f864f5e [Xiangrui Meng] update test and fix a bug in topByKey
c5e0181 [Xiangrui Meng] use GEMM and topByKey
3a0c4eb [Debasish Das] updated with spark master
98fa424 [Debasish Das] updated with master
ee99571 [Debasish Das] addressed initial review comments;merged with master;added tests for batch predict APIs in matrix factorization
3f97c49 [Debasish Das] fixed spark coding style for imports
7163a5c [Debasish Das] Added API for batch user and product recommendation; MAP calculation for product recommendation per user using randomized split
d144f57 [Debasish Das] recommendAll API to MatrixFactorizationModel, uses topK finding using BoundedPriorityQueue similar to RDD.top
f38a1b5 [Debasish Das] use sampleByKey for per user sampling
10cbb37 [Debasish Das] provide ratio for topN product validation; generate MAP and prec@k metric for movielens dataset
9fa063e [Debasish Das] import scala.math.round
4bbae0f [Debasish Das] comments fixed as per scalastyle
cd3ab31 [Debasish Das] merged with AbstractParams serialization bug
9b3951f [Debasish Das] validate user/product on MovieLens dataset through user input and compute map measure along with rmse
2015-05-01 08:27:46 -07:00
Marcelo Vanzin 3052f4916e [SPARK-4705] Handle multiple app attempts event logs, history server.
This change modifies the event logging listener to write the logs for different application
attempts to different files. The attempt ID is set by the scheduler backend, so as long
as the backend returns that ID to SparkContext, things should work. Currently, the
YARN backend does that.

The history server was also modified to model multiple attempts per application. Each
attempt has its own UI and a separate row in the listing table, so that users can look at
all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info
when all the applications being shown have a single attempt.

Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com>
Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com>
Author: twinkle sachdeva <twinkle.sachdeva@guavus.com>

Closes #5432 from vanzin/SPARK-4705 and squashes the following commits:

7e289fa [Marcelo Vanzin] Review feedback.
f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
bc885b7 [Marcelo Vanzin] Review feedback.
76a3651 [Marcelo Vanzin] Fix log cleaner, add test.
7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
1aa309d [Marcelo Vanzin] Improve sorting of app attempts.
2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name.
9d59d92 [Marcelo Vanzin] Scalastyle...
d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent.
ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id.
f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist.
9092af5 [Marcelo Vanzin] Fix HistoryServer test.
3a14503 [Marcelo Vanzin] Argh scalastyle.
657ec18 [Marcelo Vanzin] Fix yarn history URL, app links.
c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes.
ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes.
cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option.
88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts.
3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model.
5fd5c6f [Marcelo Vanzin] Fix my broken rebase.
318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II
6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
2015-05-01 09:50:55 -05:00
Kousuke Saruta 7fe0f3f2b4 [SPARK-3468] [WEBUI] Timeline-View feature
I sometimes trouble-shoot and analyse the cause of long time spending job.

At the time, I find the stages which spends long time or fails, then I find the tasks which spends long time or fails, next I analyse the proportion of each phase in a task.

Another case, I find executors which spends long time for running a task and analyse the details of a task.

In such situation, I think it's helpful to visualize timeline  for each application, job, task and the details of proportion of activity for each task.

I added 3 timeline features to existing Web UI.

[Application Timeline View]
This view shows following things.

* When each executor was added/removed and the reason why it's removed.
* When each job  was started/finished.
* Status of each job.

![screenshot from 2015-04-01 16 49 25](https://cloud.githubusercontent.com/assets/4736016/6936886/e35fd582-d891-11e4-980d-8de13f50e442.png)

[Stage Timeline View]
Similar to Application Timeline View, this view shows following things.

* When each executor was added/removed and the reason why it's removed.
* When each job was started/finished.
* Status of each stage.

![screenshot from 2015-04-01 16 50 59](https://cloud.githubusercontent.com/assets/4736016/6936900/0dca6526-d892-11e4-84a8-efd9037af444.png)

[Task Assignment Timeline View]
This view shows following things.

* When each task started/finished
* How long each task spent and the proportion.
* Status of each task.
* Where each task ran on.

![screenshot from 2015-04-01 16 51 54](https://cloud.githubusercontent.com/assets/4736016/6936910/20fd5acc-d892-11e4-9018-80e463881fc2.png)

All the view above is zoomable by mouse wheel action and scrollable by drag action.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2342 from sarutak/timeline-viewer-feature and squashes the following commits:

11fe67d [Kousuke Saruta] Fixed conflict
79ac03d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
a91abd3 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
ef34a5b [Kousuke Saruta] Implement tooltip using bootstrap
b09d0c5 [Kousuke Saruta] Move `stroke` and `fill` attribute of rect elements to css
d3c63c8 [Kousuke Saruta] Fixed a little bit bugs
a36291b [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
28714b6 [Kousuke Saruta] Fixed highlight issue
0dc4278 [Kousuke Saruta] Addressed most of Patrics's feedbacks
8110acf [Kousuke Saruta] Added scroll limit to Job timeline
974a64a [Kousuke Saruta] Removed unused function
ee7a7f0 [Kousuke Saruta] Refactored
6a91872 [Kousuke Saruta] Temporary commit
6693f34 [Kousuke Saruta] Added link to job/stage box in the timeline in order to move to corresponding row when we click
8f88222 [Kousuke Saruta] Added job/stage description
aeed4b1 [Kousuke Saruta] Removed stage timeline
fc1696c [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
999ccd4 [Kousuke Saruta] Improved scalability
0fc6a31 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
19815ae [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
68b7540 [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
52b5f0b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
dec85db [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
fcdab7d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
dab7cc1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
09cce97 [Kousuke Saruta] Cleanuped
16f82cf [Kousuke Saruta] Cleanuped
9fb522e [Kousuke Saruta] Cleanuped
d05f2c2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
e85e9aa [Kousuke Saruta] Cleanup: Added TimelineViewUtils.scala
a76e569 [Kousuke Saruta] Removed unused setting in timeline-view.css
5ce1b21 [Kousuke Saruta] Added vis.min.js, vis.min.css and vis.map to .rat-exclude
082f709 [Kousuke Saruta] Added Timeline-View feature for Applications, Jobs and Stages
2015-05-01 01:39:56 -07:00
MechCoder c24aeb6a31 [SPARK-6257] [PYSPARK] [MLLIB] MLlib API missing items in Recommendation
Adds

rank, recommendUsers and RecommendProducts to MatrixFactorizationModel in PySpark.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #5807 from MechCoder/spark-6257 and squashes the following commits:

09629c6 [MechCoder] doc
953b326 [MechCoder] [SPARK-6257] MLlib API missing items in Recommendation
2015-04-30 23:51:00 -07:00
zsxwing 14b32886fa [SPARK-7291] [CORE] Fix a flaky test in AkkaRpcEnvSuite
Read the port from RpcEnv to check the result so that it will success even if port conflicts

Author: zsxwing <zsxwing@gmail.com>

Closes #5822 from zsxwing/SPARK-7291 and squashes the following commits:

e521b84 [zsxwing] Fix a flaky test in AkkaRpcEnvSuite
2015-04-30 23:44:33 -07:00
Burak Yavuz 7cf1eb79b1 [SPARK-7287] enabled fixed test
andrewor14 pwendell I reenabled the test. Let's see if it's fixed. I did also notice that `--jars` started to fail after this was ignored though in the JIRA. like [here](https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/2238/consoleFull), you see that --jars fails for the same exact reason.

Has any change been made to Spark Submit recently? Did the test setup on Jenkins change? If we look into flaky tests last month, you wouldn't find this test among them.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5826 from brkyvz/restart-test and squashes the following commits:

f509f68 [Burak Yavuz] enabled fixed test
2015-04-30 23:39:58 -07:00
Sandy Ryza 0a2b15ce43 [SPARK-4550] In sort-based shuffle, store map outputs in serialized form
Refer to the JIRA for the design doc and some perf results.

I wanted to call out some of the more possibly controversial changes up front:
* Map outputs are only stored in serialized form when Kryo is in use.  I'm still unsure whether Java-serialized objects can be relocated.  At the very least, Java serialization writes out a stream header which causes problems with the current approach, so I decided to leave investigating this to future work.
* The shuffle now explicitly operates on key-value pairs instead of any object.  Data is written to shuffle files in alternating keys and values instead of key-value tuples.  `BlockObjectWriter.write` now accepts a key argument and a value argument instead of any object.
* The map output buffer can hold a max of Integer.MAX_VALUE bytes.  Though this wouldn't be terribly difficult to change.
* When spilling occurs, the objects that still in memory at merge time end up serialized and deserialized an extra time.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4450 from sryza/sandy-spark-4550 and squashes the following commits:

8c70dd9 [Sandy Ryza] Fix serialization
9c16fe6 [Sandy Ryza] Fix a couple tests and move getAutoReset to KryoSerializerInstance
6c54e06 [Sandy Ryza] Fix scalastyle
d8462d8 [Sandy Ryza] SPARK-4550
2015-04-30 23:14:14 -07:00
Patrick Wendell a9fc50552e HOTFIX: Disable buggy dependency checker 2015-04-30 22:39:58 -07:00
Zhan Zhang 36a7a6807e [SPARK-6479] [BLOCK MANAGER] Create off-heap block storage API
This is the classes for creating off-heap block storage API. It also includes the migration for Tachyon. The diff seems to be big, but it mainly just rename tachyon to offheap. New implementation for hdfs will be submit for review in spark-6112.

Author: Zhan Zhang <zhazhan@gmail.com>

Closes #5430 from zhzhan/SPARK-6479 and squashes the following commits:

60acd84 [Zhan Zhang] minor change to kickoff the test
12f54c9 [Zhan Zhang] solve merge conflicts
a54132c [Zhan Zhang] solve review comments
ffb8e00 [Zhan Zhang] rebase to sparkcontext change
6e121e0 [Zhan Zhang] resolve review comments and restructure blockmanasger code
a7aed6c [Zhan Zhang] add Tachyon migration code
186de31 [Zhan Zhang] initial commit for off-heap block storage api
2015-04-30 22:24:31 -07:00
Burak Yavuz b5347a4664 [SPARK-7248] implemented random number generators for DataFrames
Adds the functions `rand` (Uniform Dist) and `randn` (Normal Dist.) as expressions to DataFrames.

cc mengxr rxin

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5819 from brkyvz/df-rng and squashes the following commits:

50d69d4 [Burak Yavuz] add seed for test that failed
4234c3a [Burak Yavuz] fix Rand expression
13cad5c [Burak Yavuz] couple fixes
7d53953 [Burak Yavuz] waiting for hive tests
b453716 [Burak Yavuz] move radn with seed down
03637f0 [Burak Yavuz] fix broken hive func
c5909eb [Burak Yavuz] deleted old implementation of Rand
6d43895 [Burak Yavuz] implemented random generators
2015-04-30 21:56:03 -07:00
zsxwing 69a739c7f5 [SPARK-7282] [STREAMING] Fix the race conditions in StreamingListenerSuite
Fixed the following flaky test
```Scala
[info] StreamingListenerSuite:
[info] - batch info reporting (782 milliseconds)
[info] - receiver info reporting *** FAILED *** (3 seconds, 911 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 10 times over 3.4735783689999997 seconds. Last failure message: 0 did not equal 1. (StreamingListenerSuite.scala:104)
[info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:
[info]   at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
[info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
[info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
[info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
[info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply$mcV$sp(StreamingListenerSuite.scala:104)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.runTest(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.run(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info]   at java.lang.Thread.run(Thread.java:745)
[info]   Cause: org.scalatest.exceptions.TestFailedException: 0 did not equal 1
[info]   at org.scalatest.MatchersHelper$.newTestFailedException(MatchersHelper.scala:160)
[info]   at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6231)
[info]   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6277)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcV$sp(StreamingListenerSuite.scala:105)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply(StreamingListenerSuite.scala:104)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply(StreamingListenerSuite.scala:104)
[info]   at org.scalatest.concurrent.Eventually$class.makeAValiantAttempt$1(Eventually.scala:394)
[info]   at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:408)
[info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
[info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
[info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
[info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply$mcV$sp(StreamingListenerSuite.scala:104)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
[info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.runTest(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
[info]   at org.apache.spark.streaming.StreamingListenerSuite.run(StreamingListenerSuite.scala:34)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info]   at java.lang.Thread.run(Thread.java:745)
```

The original codes didn't have a memory barrier in the `eventually` closure, which might fail the test, because JVM doesn't guarantee the memory consistency between different threads without  a memory barrier.

This PR used `ConcurrentLinkedQueue` to set up the memory barrier.

Author: zsxwing <zsxwing@gmail.com>

Closes #5812 from zsxwing/SPARK-7282 and squashes the following commits:

59115ef [zsxwing] Use SynchronizedBuffer
014dd2b [zsxwing] Fix the race conditions in StreamingListenerSuite
2015-04-30 21:32:11 -07:00
Patrick Wendell beeafcfd6e Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support"
This reverts commit 3ba5aaab82.
2015-04-30 20:33:36 -07:00
scwf 473552fa5d [SPARK-7123] [SQL] support table.star in sqlcontext
Run following sql get error
`SELECT r.*
FROM testData l join testData2 r on (l.key = r.a)`

Author: scwf <wangfei1@huawei.com>

Closes #5690 from scwf/tablestar and squashes the following commits:

3b2e2b6 [scwf] support table.star
2015-04-30 18:50:14 -07:00
Cheng Hao 3ba5aaab82 [SPARK-5213] [SQL] Pluggable SQL Parser Support
This PR aims to make the SQL Parser Pluggable, and user can register it's own parser via Spark SQL CLI.

```
# add the jar into the classpath
$hchengmydesktop:spark>bin/spark-sql --jars sql99.jar

-- switch to "hiveql" dialect
   spark-sql>SET spark.sql.dialect=hiveql;
   spark-sql>SELECT * FROM src LIMIT 1;

-- switch to "sql" dialect
   spark-sql>SET spark.sql.dialect=sql;
   spark-sql>SELECT * FROM src LIMIT 1;

-- switch to a custom dialect
   spark-sql>SET spark.sql.dialect=com.xxx.xxx.SQL99Dialect;
   spark-sql>SELECT * FROM src LIMIT 1;

-- register the non-exist SQL dialect
   spark-sql> SET spark.sql.dialect=NotExistedClass;
   spark-sql> SELECT * FROM src LIMIT 1;
-- Exception will be thrown and switch to default sql dialect ("sql" for SQLContext and "hiveql" for HiveContext)
```

Author: Cheng Hao <hao.cheng@intel.com>

Closes #4015 from chenghao-intel/sqlparser and squashes the following commits:

493775c [Cheng Hao] update the code as feedback
81a731f [Cheng Hao] remove the unecessary comment
aab0b0b [Cheng Hao] polish the code a little bit
49b9d81 [Cheng Hao] shrink the comment for rebasing
2015-04-30 18:49:06 -07:00
Vyacheslav Baranov e991255e72 [SPARK-6913][SQL] Fixed "java.sql.SQLException: No suitable driver found"
Fixed `java.sql.SQLException: No suitable driver found` when loading DataFrame into Spark SQL if the driver is supplied with `--jars` argument.

The problem is in `java.sql.DriverManager` class that can't access drivers loaded by Spark ClassLoader.

Wrappers that forward requests are created for these drivers.

Also, it's not necessary any more to include JDBC drivers in `--driver-class-path` in local mode, specifying in `--jars` argument is sufficient.

Author: Vyacheslav Baranov <slavik.baranov@gmail.com>

Closes #5782 from SlavikBaranov/SPARK-6913 and squashes the following commits:

510c43f [Vyacheslav Baranov] [SPARK-6913] Fixed review comments
b2a727c [Vyacheslav Baranov] [SPARK-6913] Fixed thread race on driver registration
c8294ae [Vyacheslav Baranov] [SPARK-6913] Fixed "No suitable driver found" when using using JDBC driver added with SparkContext.addJar
2015-04-30 18:45:14 -07:00
wangfei a0d8a61ab1 [SPARK-7109] [SQL] Push down left side filter for left semi join
Now in spark sql optimizer we only push down right side filter for left semi join, actually we can push down left side filter because left semi join is doing filter on left table essentially.

Author: wangfei <wangfei1@huawei.com>
Author: scwf <wangfei1@huawei.com>

Closes #5677 from scwf/leftsemi and squashes the following commits:

483d205 [wangfei] update with master to fix compile issue
82df0e1 [wangfei] Merge branch 'master' of https://github.com/apache/spark into leftsemi
d68a053 [wangfei] added apply
8f48a3d [scwf] added test
ebadaa9 [wangfei] left filter push down for left semi join
2015-04-30 18:18:54 -07:00
scwf 079733817f [SPARK-7093] [SQL] Using newPredicate in NestedLoopJoin to enable code generation
Using newPredicate in NestedLoopJoin instead of InterpretedPredicate to make it can make use of code generation

Author: scwf <wangfei1@huawei.com>

Closes #5665 from scwf/NLP and squashes the following commits:

d19dd31 [scwf] improvement
a887c02 [scwf] improve for NLP boundCondition
2015-04-30 18:15:56 -07:00
rakeshchalasani ee04413935 [SPARK-7280][SQL] Add "drop" column/s on a data frame
Takes a column name/s and returns a new DataFrame that drops a column/s.

Author: rakeshchalasani <vnit.rakesh@gmail.com>

Closes #5818 from rakeshchalasani/SPARK-7280 and squashes the following commits:

ce2ec09 [rakeshchalasani] Minor edit
45c06f1 [rakeshchalasani] Change withColumnRename and format changes
f68945a [rakeshchalasani] Minor fix
0b9104d [rakeshchalasani] Drop one column at a time
289afd2 [rakeshchalasani] [SPARK-7280][SQL] Add "drop" column/s on a data frame
2015-04-30 17:42:50 -07:00
Burak Yavuz 149b3ee2da [SPARK-7242][SQL][MLLIB] Frequent items for DataFrames
Finding frequent items with possibly false positives, using the algorithm described in `http://www.cs.umd.edu/~samir/498/karp.pdf`.
public API under:
```
df.stat.freqItems(cols: Array[String], support: Double = 0.001): DataFrame
```

The output is a local DataFrame having the input column names with `-freqItems` appended to it. This is a single pass algorithm that may return false positives, but no false negatives.

cc mengxr rxin

Let's get the implementations in, I can add python API in a follow up PR.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5799 from brkyvz/freq-items and squashes the following commits:

a6ec82c [Burak Yavuz] addressed comments v?
39b1bba [Burak Yavuz] removed toSeq
0915e23 [Burak Yavuz] addressed comments v2.1
3a5c177 [Burak Yavuz] addressed comments v2.0
482e741 [Burak Yavuz] removed old import
38e784d [Burak Yavuz] addressed comments v1.0
8279d4d [Burak Yavuz] added default value for support
3d82168 [Burak Yavuz] made base implementation
2015-04-30 16:40:32 -07:00
DB Tsai 1c3e402e66 [SPARK-7279] Removed diffSum which is theoretical zero in LinearRegression and coding formating
Author: DB Tsai <dbt@netflix.com>

Closes #5809 from dbtsai/format and squashes the following commits:

6904eed [DB Tsai] triger jenkins
9146e19 [DB Tsai] initial commit
2015-04-30 16:26:51 -07:00
Josh Rosen fa01bec484 [Build] Enable MiMa checks for SQL
Now that 1.3 has been released, we should enable MiMa checks for the `sql` subproject.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5727 from JoshRosen/enable-more-mima-checks and squashes the following commits:

3ad302b [Josh Rosen] Merge remote-tracking branch 'origin/master' into enable-more-mima-checks
0c48e4d [Josh Rosen] Merge remote-tracking branch 'origin/master' into enable-more-mima-checks
e276cee [Josh Rosen] Fix SQL MiMa checks via excludes and private[sql]
44d0d01 [Josh Rosen] Add back 'launcher' exclude
1aae027 [Josh Rosen] Enable MiMa checks for launcher and sql projects.
2015-04-30 16:23:01 -07:00
Zhongshuai Pei 77cc25fb74 [SPARK-7267][SQL]Push down Project when it's child is Limit
SQL
```
select key from (select key,value from t1 limit 100) t2 limit 10
```
Optimized Logical Plan before modifying
```
== Optimized Logical Plan ==
Limit 10
  Project key#228
    Limit 100
      MetastoreRelation default, t1, None
```
Optimized Logical Plan after modifying
```
== Optimized Logical Plan ==
Limit 10
  Limit 100
    Project key#228
      MetastoreRelation default, t1, None
```
After this, we can combine limits

Author: Zhongshuai Pei <799203320@qq.com>
Author: DoingDone9 <799203320@qq.com>

Closes #5797 from DoingDone9/ProjectLimit and squashes the following commits:

70d0fca [Zhongshuai Pei] Update FilterPushdownSuite.scala
dc83ae9 [Zhongshuai Pei] Update FilterPushdownSuite.scala
485c61c [Zhongshuai Pei] Update Optimizer.scala
f03fe7f [Zhongshuai Pei] Merge pull request #12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request #10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request #9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request #8 from apache/master
802261c [DoingDone9] Merge pull request #7 from apache/master
d00303b [DoingDone9] Merge pull request #6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
2015-04-30 15:22:13 -07:00
Josh Rosen 07a86205f9 [SPARK-7288] Suppress compiler warnings due to use of sun.misc.Unsafe; add facade in front of Unsafe; remove use of Unsafe.setMemory
This patch suppresses compiler warnings due to our use of `sun.misc.Unsafe` (introduced in #5725).  These warnings can only be suppressed via the `-XDignore.symbol.file` javac flag; the `SuppressWarnings` annotation won't work for these.

In order to restrict uses of this compiler flag to the `unsafe` module, I placed a facade in front of `Unsafe` so that other modules won't call it directly. This facade also will also help us to avoid accidental usage of deprecated Unsafe methods or methods that aren't supported in Java 6.

I also removed an unnecessary use of `Unsafe.setMemory`, which isn't present in certain versions of Java 6, and excluded the new `unsafe` module from Javadoc.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5814 from JoshRosen/unsafe-compiler-warnings-fixes and squashes the following commits:

9e8c483 [Josh Rosen] Exclude new unsafe module from Javadoc
ba75ecf [Josh Rosen] Only apply -XDignore.symbol.file flag in unsafe project.
7403345 [Josh Rosen] Put facade in front of Unsafe.
50230c0 [Josh Rosen] Remove usage of Unsafe.setMemory
96d41c9 [Josh Rosen] Use -XDignore.symbol.file to suppress warnings about sun.misc.Unsafe usage
2015-04-30 15:21:00 -07:00
Liang-Chi Hsieh 6702324b60 [SPARK-7196][SQL] Support precision and scale of decimal type for JDBC
JIRA: https://issues.apache.org/jira/browse/SPARK-7196

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #5777 from viirya/jdbc_precision and squashes the following commits:

f40f5e6 [Liang-Chi Hsieh] Support precision and scale for NUMERIC type.
49acbf9 [Liang-Chi Hsieh] Add unit test.
a509e19 [Liang-Chi Hsieh] Support precision and scale of decimal type for JDBC.
2015-04-30 15:13:43 -07:00
Patrick Wendell e0628f2fae Revert "[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YARN/HDFS"
This reverts commit 6c65da6bb7.
2015-04-30 14:59:20 -07:00
Joseph K. Bradley adbdb19a7d [SPARK-7207] [ML] [BUILD] Added ml.recommendation, ml.regression to SparkBuild
Added ml.recommendation, ml.regression to SparkBuild

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #5758 from jkbradley/SPARK-7207 and squashes the following commits:

a28158a [Joseph K. Bradley] Added ml.recommendation, ml.regression to SparkBuild
2015-04-30 14:39:27 -07:00
Hari Shreedharan 6c65da6bb7 [SPARK-5342] [YARN] Allow long running Spark apps to run on secure YARN/HDFS
Current Spark apps running on Secure YARN/HDFS would not be able to write data
to HDFS after 7 days, since delegation tokens cannot be renewed beyond that. This
means Spark Streaming apps will not be able to run on Secure YARN.

This commit adds basic functionality to fix this issue. In this patch:
- new parameters are added - principal and keytab, which can be used to login to a KDC
- the client logs in, and then get tokens to start the AM
- the keytab is copied to the staging directory
- the AM waits for 60% of the time till expiry of the tokens and then logs in using the keytab
- each time after 60% of the time, new tokens are created and sent to the executors

Currently, to avoid complicating the architecture, we set the keytab and principal in the
SparkHadoopUtil singleton, and schedule a login. Once the login is completed, a callback is scheduled.

This is being posted for feedback, so I can gather feedback on the general implementation.

There are currently a bunch of things to do:
- [x] logging
- [x] testing - I plan to manually test this soon. If you have ideas of how to add unit tests, comment.
- [x] add code to ensure that if these params are set in non-YARN cluster mode, we complain
- [x] documentation
- [x] Have the executors request for credentials from the AM, so that retries are possible.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #4688 from harishreedharan/kerberos-longrunning and squashes the following commits:

36eb8a9 [Hari Shreedharan] Change the renewal interval config param. Fix a bunch of comments.
611923a [Hari Shreedharan] Make sure the namenodes are listed correctly for creating tokens.
09fe224 [Hari Shreedharan] Use token.renew to get token's renewal interval rather than using hdfs-site.xml
6963bbc [Hari Shreedharan] Schedule renewal in AM before starting user class. Else, a restarted AM cannot access HDFS if the user class tries to.
072659e [Hari Shreedharan] Fix build failure caused by thread factory getting moved to ThreadUtils.
f041dd3 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
42eead4 [Hari Shreedharan] Remove RPC part. Refactor and move methods around, use renewal interval rather than max lifetime to create new tokens.
ebb36f5 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
bc083e3 [Hari Shreedharan] Overload RegisteredExecutor to send tokens. Minor doc updates.
7b19643 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
8a4f268 [Hari Shreedharan] Added docs in the security guide. Changed some code to ensure that the renewer objects are created only if required.
e800c8b [Hari Shreedharan] Restore original RegisteredExecutor message, and send new tokens via NewTokens message.
0e9507e [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
7f1bc58 [Hari Shreedharan] Minor fixes, cleanup.
bcd11f9 [Hari Shreedharan] Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup.
f74303c [Hari Shreedharan] Move the new logic into specialized classes. Add cleanup for old credentials files.
2f9975c [Hari Shreedharan] Ensure new tokens are written out immediately on AM restart. Also, pikc up the latest suffix from HDFS if the AM is restarted.
61b2b27 [Hari Shreedharan] Account for AM restarts by making sure lastSuffix is read from the files on HDFS.
62c45ce [Hari Shreedharan] Relogin from keytab periodically.
fa233bd [Hari Shreedharan] Adding logging, fixing minor formatting and ordering issues.
42813b4 [Hari Shreedharan] Remove utils.sh, which was re-added due to merge with master.
0de27ee [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
55522e3 [Hari Shreedharan] Fix failure caused by Preconditions ambiguity.
9ef5f1b [Hari Shreedharan] Added explanation of how the credentials refresh works, some other minor fixes.
f4fd711 [Hari Shreedharan] Fix SparkConf usage.
2debcea [Hari Shreedharan] Change the file structure for credentials files. I will push a followup patch which adds a cleanup mechanism for old credentials files. The credentials files are small and few enough for it to cause issues on HDFS.
af6d5f0 [Hari Shreedharan] Cleaning up files where changes weren't required.
f0f54cb [Hari Shreedharan] Be more defensive when updating the credentials file.
f6954da [Hari Shreedharan] Got rid of Akka communication to renew, instead the executors check a known file's modification time to read the credentials.
5c11c3e [Hari Shreedharan] Move tests to YarnSparkHadoopUtil to fix compile issues.
b4cb917 [Hari Shreedharan] Send keytab to AM via DistributedCache rather than directly via HDFS
0985b4e [Hari Shreedharan] Write tokens to HDFS and read them back when required, rather than sending them over the wire.
d79b2b9 [Hari Shreedharan] Make sure correct credentials are passed to FileSystem#addDelegationTokens()
8c6928a [Hari Shreedharan] Fix issue caused by direct creation of Actor object.
fb27f46 [Hari Shreedharan] Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started. Also schedule re-logins in CoarseGrainedSchedulerBackend#start()
41efde0 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
d282d7a [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
bcfc374 [Hari Shreedharan] Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil.
f8fe694 [Hari Shreedharan] Handle None if keytab-login is not scheduled.
2b0d745 [Hari Shreedharan] [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
ccba5bc [Hari Shreedharan] WIP: More changes wrt kerberos
77914dd [Hari Shreedharan] WIP: Add kerberos principal and keytab to YARN client.
2015-04-30 13:03:23 -05:00