Commit graph

7435 commits

Author SHA1 Message Date
Reynold Xin 0db5d5a22e Added SignalLogger to HistoryServer.
This was omitted in #1260. @aarondav

Author: Reynold Xin <rxin@apache.org>

Closes #1300 from rxin/historyServer and squashes the following commits:

af720a3 [Reynold Xin] Added SignalLogger to HistoryServer.
2014-07-04 17:33:07 -07:00
Patrick Wendell fc71658938 HOTFIX: Clean before building docs during release.
If the docs are built after a Maven build has finished the intermediate
state somehow causes a compiler bug during sbt compilation. This just
does a clean before attempting to build the docs.
2014-07-04 10:01:19 -07:00
Yanjie Gao 5dadda8645 [SPARK-2234][SQL]Spark SQL basicOperators add Except operator
Hi all,
I want to submit a Except operator in basicOperators.scala
In SQL case.SQL support two table do except operator.
select * from table1
except
select * from table2
This operator support the substract function .Return an table with the elements from `this` that are not in `other`.This operator should limit the input SparkPlan Seq only has two member.The check will later support
JIRA:https://issues.apache.org/jira/browse/SPARK-2234

Author: Yanjie Gao <gaoyanjie55@163.com>
Author: YanjieGao <396154235@qq.com>
Author: root <root@node4.(none)>
Author: gaoyanjie <gaoyanjie55@163.com>

Closes #1151 from YanjieGao/patch-6 and squashes the following commits:

f19f899 [YanjieGao] add a new blank line in basicoperators.scala
2ff7d73 [YanjieGao] resolve the identation in SqlParser and SparkStrategies
fdb5227 [YanjieGao] Merge remote branch 'upstream/master' into patch-6
9940d19 [YanjieGao] make comment less than 100c
09c7413 [YanjieGao] pr 1151 SqlParser add cache ,basic Operator rename Except and modify comment
b4b5867 [root] Merge remote branch 'upstream/master' into patch-6
b4c3869 [Yanjie Gao] change SparkStrategies Sparkcontext to SqlContext
7e0ec29 [Yanjie Gao] delete multi test
7e7c83f [Yanjie Gao] delete conflict except
b01beb8 [YanjieGao] resolve conflict sparkstrategies and basicOperators
4dc8166 [YanjieGao] resolve conflict
fa68a98 [Yanjie Gao] Update joins.scala
8e6bb00 [Yanjie Gao] delete conflict except
dd9ba5e [Yanjie Gao] Update joins.scala
a0d4e73 [Yanjie Gao] delete skew join
60f5ddd [Yanjie Gao] update less than 100c
0e72233 [Yanjie Gao] update SQLQuerySuite on master branch
7f916b5 [Yanjie Gao] update execution/basicOperators on master branch
a28dece [Yanjie Gao] Update logical/basicOperators on master branch
a639935 [Yanjie Gao] Update SparkStrategies.scala
3bf7def [Yanjie Gao] update SqlParser on master branch
26f833f [Yanjie Gao] update SparkStrategies.scala on master branch
8dd063f [Yanjie Gao] Update logical/basicOperators on master branch
9847dcf [Yanjie Gao] update SqlParser on masterbranch
d6a4604 [Yanjie Gao] Update joins.scala
424c507 [Yanjie Gao] Update joins.scala
7680742 [Yanjie Gao] Update SqlParser.scala
a7193d8 [gaoyanjie] [SPARK-2234][SQL]Spark SQL basicOperators add Except operator #1151
5c8a224 [Yanjie Gao] update the line less than 100c
ee066b3 [Yanjie Gao] Update basicOperators.scala
32a80ab [Yanjie Gao] remove except in HiveQl
cf232eb [Yanjie Gao] update 1comment 2space3 left.out
f1ea3f3 [Yanjie Gao] remove comment
7ea9b91 [Yanjie Gao] remove annotation
7f3d613 [Yanjie Gao] update .map(_.copy())
670a1bb [Yanjie Gao] Update HiveQl.scala
3fe7746 [Yanjie Gao] Update SQLQuerySuite.scala
a36eb0a [Yanjie Gao] Update basicOperators.scala
7859e56 [Yanjie Gao] Update SparkStrategies.scala
052346d [Yanjie Gao] Subtract is conflict with Subtract(e1,e2)
aab3785 [Yanjie Gao] Update SQLQuerySuite.scala
4bf80b1 [Yanjie Gao] update subtract to except
4bdd520 [Yanjie Gao] Update SqlParser.scala
2d4bfbd [Yanjie Gao] Update SQLQuerySuite.scala
0808921 [Yanjie Gao] SQLQuerySuite
a8a1948 [Yanjie Gao] SparkStrategies
1fe96c0 [Yanjie Gao] HiveQl.scala update
3305e40 [Yanjie Gao] SqlParser
7a98c37 [Yanjie Gao] Update basicOperators.scala
cf5b9d0 [Yanjie Gao] Update basicOperators.scala
8945835 [Yanjie Gao] object SkewJoin extends Strategy
2b98962 [Yanjie Gao] Update SqlParser.scala
dd32980 [Yanjie Gao] update1
68815b2 [Yanjie Gao] Reformat the code style
4eb43ec [Yanjie Gao] Update basicOperators.scala
aa06072 [Yanjie Gao] Reformat the code sytle
2014-07-04 02:43:57 -07:00
Reynold Xin b3e768e154 [SPARK-2059][SQL] Add analysis checks
This replaces #1263 with a test case.

Author: Reynold Xin <rxin@apache.org>
Author: Michael Armbrust <michael@databricks.com>

Closes #1265 from rxin/sql-analysis-error and squashes the following commits:

a639e01 [Reynold Xin] Added a test case for unresolved attribute analysis.
7371e1b [Reynold Xin] Merge pull request #1263 from marmbrus/analysisChecks
448c088 [Michael Armbrust] Add analysis checks
2014-07-04 00:53:41 -07:00
baishuo(白硕) 0bbe61223e Update SQLConf.scala
use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap

Author: baishuo(白硕) <vc_java@hotmail.com>

Closes #1272 from baishuo/master and squashes the following commits:

51ec55d [baishuo(白硕)] Update SQLConf.scala
63da043 [baishuo(白硕)] Update SQLConf.scala
36b6dbd [baishuo(白硕)] Update SQLConf.scala
864faa0 [baishuo(白硕)] Update SQLConf.scala
593096b [baishuo(白硕)] Update SQLConf.scala
7304d9b [baishuo(白硕)] Update SQLConf.scala
843581c [baishuo(白硕)] Update SQLConf.scala
1d3e4a2 [baishuo(白硕)] Update SQLConf.scala
0740f28 [baishuo(白硕)] Update SQLConf.scala
2014-07-04 00:25:31 -07:00
Prashant Sharma d43415075b [SPARK-1199][REPL] Remove VALId and use the original import style for defined classes.
This is an alternate solution to #1176.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1179 from ScrapCodes/SPARK-1199/repl-fix-second-approach and squashes the following commits:

820b34b [Prashant Sharma] Here we generate two kinds of import wrappers based on whether it is a class or not.
2014-07-04 00:05:27 -07:00
Cheng Lian 544880457d [SPARK-2059][SQL] Don't throw TreeNodeException in execution.ExplainCommand
This is a fix for the problem revealed by PR #1265.

Currently `HiveComparisonSuite` ignores output of `ExplainCommand` since Catalyst query plan is quite different from Hive query plan. But exceptions throw from `CheckResolution` still breaks test cases. This PR catches any `TreeNodeException` and reports it as part of the query explanation.

After merging this PR, PR #1265 can also be merged safely.

For a normal query:

```
scala> hql("explain select key from src").foreach(println)
...
[Physical execution plan:]
[HiveTableScan [key#9], (MetastoreRelation default, src, None), None]
```

For a wrong query with unresolved attribute(s):

```
scala> hql("explain select kay from src").foreach(println)
...
[Error occurred during query planning: ]
[Unresolved attributes: 'kay, tree:]
[Project ['kay]]
[ LowerCaseSchema ]
[  MetastoreRelation default, src, None]
```

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1294 from liancheng/safe-explain and squashes the following commits:

4318911 [Cheng Lian] Don't throw TreeNodeException in `execution.ExplainCommand`
2014-07-03 23:41:54 -07:00
Aaron Davidson 97a0bfe1c0 SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark
JIRA: https://issues.apache.org/jira/browse/SPARK-2282

This issue is caused by a buildup of sockets in the TIME_WAIT stage of TCP, which is a stage that lasts for some period of time after the communication closes.

This solution simply allows us to reuse sockets that are in TIME_WAIT, to avoid issues with the buildup of the rapid creation of these sockets.

Author: Aaron Davidson <aaron@databricks.com>

Closes #1220 from aarondav/SPARK-2282 and squashes the following commits:

2e5cab3 [Aaron Davidson] SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark
2014-07-03 23:02:36 -07:00
Andrew Or 3894a49be9 [SPARK-2307][Reprise] Correctly report RDD blocks on SparkUI
**Problem.** The existing code in `ExecutorPage.scala` requires a linear scan through all the blocks to filter out the uncached ones. Every refresh could be expensive if there are many blocks and many executors.

**Solution.** The proper semantics should be the following: `StorageStatusListener` should contain only block statuses that are cached. This means as soon as a block is unpersisted by any mean, its status should be removed. This is reflected in the changes made in `StorageStatusListener.scala`.

Further, the `StorageTab` must stop relying on the `StorageStatusListener` changing a dropped block's status to `StorageLevel.NONE` (which no longer happens). This is reflected in the changes made in `StorageTab.scala` and `StorageUtils.scala`.

----------

If you have been following this chain of PRs like pwendell, you will quickly notice that this reverts the changes in #1249, which reverts the changes in #1080. In other words, we are adding back the changes from #1080, and fixing SPARK-2307 on top of those changes. Please ask questions if you are confused.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1255 from andrewor14/storage-ui-fix-reprise and squashes the following commits:

45416fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into storage-ui-fix-reprise
a82ea25 [Andrew Or] Add tests for StorageStatusListener
8773b01 [Andrew Or] Update comment / minor changes
3afde3f [Andrew Or] Correctly report the number of blocks on SparkUI
2014-07-03 22:48:23 -07:00
Aaron Davidson 586feb5c95 [SPARK-2350] Don't NPE while launching drivers
Prior to this change, we could throw a NPE if we launch a driver while another one is waiting, because removing from an iterator while iterating over it is not safe.

Author: Aaron Davidson <aaron@databricks.com>

Closes #1289 from aarondav/master-fail and squashes the following commits:

1cf1cf4 [Aaron Davidson] SPARK-2350: Don't NPE while launching drivers
2014-07-03 22:31:41 -07:00
Raymond Liu 5fa0a05763 [SPARK-1097] Workaround Hadoop conf ConcurrentModification issue
Workaround Hadoop conf ConcurrentModification issue

Author: Raymond Liu <raymond.liu@intel.com>

Closes #1273 from colorant/hadoopRDD and squashes the following commits:

994e98b [Raymond Liu] Address comments
e2cda3d [Raymond Liu] Workaround Hadoop conf ConcurrentModification issue
2014-07-03 19:24:22 -07:00
Clément MATHIEU fdc4c112e7 Streaming programming guide typos
Fix a bad Java code sample and a broken link in the streaming programming guide.

Author: Clément MATHIEU <clement@unportant.info>

Closes #1286 from cykl/streaming-programming-guide-typos and squashes the following commits:

b0908cb [Clément MATHIEU] Fix broken URL
9d3c535 [Clément MATHIEU] Spark streaming requires at least two working threads (scala version was OK)
2014-07-03 18:31:18 -07:00
Zongheng Yang d4c30cd991 [HOTFIX] Synchronize on SQLContext.settings in tests.
Let's see if this fixes the ongoing series of test failures in a master build machine (https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT-pre-YARN/SPARK_HADOOP_VERSION=1.0.4,label=centos/81/).

pwendell marmbrus

Author: Zongheng Yang <zongheng.y@gmail.com>

Closes #1277 from concretevitamin/test-fix and squashes the following commits:

28c88bd [Zongheng Yang] Synchronize on SQLContext.settings in tests.
2014-07-03 17:37:53 -07:00
Prashant Sharma 731f683b1b [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
Trivial fix.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:

77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
2014-07-03 15:06:58 -07:00
Yijie Shen a9b52e5623 [SPARK-2342] Evaluation helper's output type doesn't conform to input ty...
The function cast doesn't conform to the intention of "Those expressions are supposed to be in the same data type, and also the return type." comment

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #1283 from yijieshen/master and squashes the following commits:

c7aaa4b [Yijie Shen] [SPARK-2342] Evaluation helper's output type doesn't conform to input type
2014-07-03 13:22:13 -07:00
Sean Owen 2b36344f58 SPARK-1675. Make clear whether computePrincipalComponents requires centered data
Just closing out this small JIRA, resolving with a comment change.

Author: Sean Owen <sowen@cloudera.com>

Closes #1171 from srowen/SPARK-1675 and squashes the following commits:

45ee9b7 [Sean Owen] Add simple note that data need not be centered for computePrincipalComponents
2014-07-03 11:54:51 -07:00
Andrew Or c480537739 [SPARK] Fix NPE for ExternalAppendOnlyMap
It did not handle null keys very gracefully before.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1288 from andrewor14/fix-external and squashes the following commits:

312b8d8 [Andrew Or] Abstract key hash code
ed5adf9 [Andrew Or] Fix NPE for ExternalAppendOnlyMap
2014-07-03 10:26:50 -07:00
yantangzhai 3bbeca6489 [SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has error
The spark.local.dir is configured as a list of multiple paths as follows /data1/sparkenv/local,/data2/sparkenv/local. If the disk data2 of the driver node has error, the application will exit since DiskBlockManager exits directly at createLocalDirs. If the disk data2 of the worker node has error, the executor will exit either.
DiskBlockManager should not exit directly at createLocalDirs if one of spark.local.dir has error. Since spark.local.dir has multiple paths, a problem should not affect the overall situation.
I think DiskBlockManager could ignore the bad directory at createLocalDirs.

Author: yantangzhai <tyz0303@163.com>

Closes #1274 from YanTangZhai/SPARK-2324 and squashes the following commits:

609bf48 [yantangzhai] [SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has error
df08673 [yantangzhai] [SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has error
2014-07-03 10:14:35 -07:00
Takuya UESHIN bc7041a42d [SPARK-2287] [SQL] Make ScalaReflection be able to handle Generic case classes.
Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1226 from ueshin/issues/SPARK-2287 and squashes the following commits:

32ef7c3 [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`.
541dc8d [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287
fac5fae [Takuya UESHIN] Remove unnecessary method receiver.
d306e60 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287
7de5706 [Takuya UESHIN] Make ScalaReflection be able to handle Generic case classes.
2014-07-02 10:10:36 -07:00
Takuya UESHIN 1e2c26c83d [SPARK-2328] [SQL] Add execution of SHOW TABLES before TestHive.reset().
`PruningSuite` is executed first of Hive tests unfortunately, `TestHive.reset()` breaks the test environment.
To prevent this, we must run a query before calling reset the first time.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1268 from ueshin/issues/SPARK-2328 and squashes the following commits:

043ceac [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`.
2014-07-02 10:07:01 -07:00
Ximo Guanter Gonzalbez 5c6ec94da1 SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG
**Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and other.
**Testing** Unit tests added.

Author: Ximo Guanter Gonzalbez <ximo@tid.es>

Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits:

fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions
e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG
2014-07-02 10:03:44 -07:00
CodingCat 6596392da0 update the comments in SqlParser
SqlParser has been case-insensitive after dab5439a08 was merged

Author: CodingCat <zhunansjtu@gmail.com>

Closes #1275 from CodingCat/master and squashes the following commits:

17931cd [CodingCat] update the comments in SqlParser
2014-07-01 20:37:10 -07:00
Kay Ousterhout 05c3d90e35 [SPARK-2185] Emit warning when task size exceeds a threshold.
This functionality was added in an earlier commit but shortly
after was removed due to a bad git merge (totally my fault).

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1149 from kayousterhout/warning_bug and squashes the following commits:

3f1bb00 [Kay Ousterhout] Fixed Json tests
462a664 [Kay Ousterhout] Removed task set name from warning message
e89b2f6 [Kay Ousterhout] Fixed Json tests.
7af424c [Kay Ousterhout] Emit warning when task size exceeds a threshold.
2014-07-01 01:56:51 -07:00
Peter MacKinnon 3319a3e3c6 SPARK-2332 [build] add exclusion for old servlet-api on hadoop-client in core
Fix for class of test suite failures in jenkins

Author: Peter MacKinnon <pmackinn@redhat.com>

Closes #1271 from pdmack/master and squashes the following commits:

cfe59fd [Peter MacKinnon] exclude servlet-api in hadoop-client for sbt
6f39fec [Peter MacKinnon] add exclusion for old servlet-api on hadoop-client in core
2014-07-01 00:28:38 -07:00
Sean Owen 04fa1223ee SPARK-2293. Replace RDD.zip usage by map with predict inside.
This is the only occurrence of this pattern in the examples that needs to be replaced. It only addresses the example change.

Author: Sean Owen <sowen@cloudera.com>

Closes #1250 from srowen/SPARK-2293 and squashes the following commits:

6b1b28c [Sean Owen] Compute prediction-and-label RDD directly rather than by zipping, for efficiency
2014-06-30 16:03:38 -07:00
Reynold Xin 5fccb567b3 [SPARK-2318] When exiting on a signal, print the signal name first.
Author: Reynold Xin <rxin@apache.org>

Closes #1260 from rxin/signalhandler1 and squashes the following commits:

8e73552 [Reynold Xin] Uh add Logging back in ApplicationMaster.
0402ba8 [Reynold Xin] Synchronize SignalLogger.register.
dc70705 [Reynold Xin] Added SignalLogger to YARN ApplicationMaster.
79a21b4 [Reynold Xin] Added license header.
0da052c [Reynold Xin] Added the SignalLogger itself.
e587d2e [Reynold Xin] [SPARK-2318] When exiting on a signal, print the signal name first.
2014-06-30 15:12:38 -07:00
Reynold Xin 358ae1534d [SPARK-2322] Exception in resultHandler should NOT crash DAGScheduler and shutdown SparkContext.
This should go into 1.0.1.

Author: Reynold Xin <rxin@apache.org>

Closes #1264 from rxin/SPARK-2322 and squashes the following commits:

c77c07f [Reynold Xin] Added comment to SparkDriverExecutionException and a test case for accumulator.
5d8d920 [Reynold Xin] [SPARK-2322] Exception in resultHandler could crash DAGScheduler and shutdown SparkContext.
2014-06-30 11:50:22 -07:00
Andrew Ash 6803642253 SPARK-2077 Log serializer that actually ends up being used
I could settle with this being a debug also if we provided an example of how to turn it on in `log4j.properties`

https://issues.apache.org/jira/browse/SPARK-2077

Author: Andrew Ash <andrew@andrewash.com>

Closes #1017 from ash211/SPARK-2077 and squashes the following commits:

580f680 [Andrew Ash] Drop to debug
0266415 [Andrew Ash] SPARK-2077 Log serializer that actually ends up being used
2014-06-29 23:29:05 -07:00
William Benton a484030dae SPARK-897: preemptively serialize closures
These commits cause `ClosureCleaner.clean` to attempt to serialize the cleaned closure with the default closure serializer and throw a `SparkException` if doing so fails.  This behavior is enabled by default but can be disabled at individual callsites of `SparkContext.clean`.

Commit 98e01ae8 fixes some no-op assertions in `GraphSuite` that this work exposed; I'm happy to put that in a separate PR if that would be more appropriate.

Author: William Benton <willb@redhat.com>

Closes #143 from willb/spark-897 and squashes the following commits:

bceab8a [William Benton] Commented DStream corner cases for serializability checking.
64d04d2 [William Benton] FailureSuite now checks both messages and causes.
3b3f74a [William Benton] Stylistic and doc cleanups.
b215dea [William Benton] Fixed spurious failures in ImplicitOrderingSuite
be1ecd6 [William Benton] Don't check serializability of DStream transforms.
abe816b [William Benton] Make proactive serializability checking optional.
5bfff24 [William Benton] Adds proactive closure-serializablilty checking
ed2ccf0 [William Benton] Test cases for SPARK-897.
2014-06-29 23:27:34 -07:00
jerryshao 66135a341d [SPARK-2104] Fix task serializing issues when sort with Java non serializable class
Details can be see in [SPARK-2104](https://issues.apache.org/jira/browse/SPARK-2104). This work is based on Reynold's work, add some unit tests to validate the issue.

@rxin , would you please take a look at this PR, thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes #1245 from jerryshao/SPARK-2104 and squashes the following commits:

c8ee362 [jerryshao] Make field partitions transient
2b41917 [jerryshao] Minor changes
47d763c [jerryshao] Fix task serializing issue when sort with Java non serializable class
2014-06-29 23:00:00 -07:00
Kay Ousterhout 7b71a0e096 [SPARK-1683] Track task read metrics.
This commit adds a new metric in TaskMetrics to record
the input data size and displays this information in the UI.

An earlier version of this commit also added the read time,
which can be useful for diagnosing straggler problems,
but unfortunately that change introduced a significant performance
regression for jobs that don't do much computation. In order to
track read time, we'll need to do sampling.

The screenshots below show the UI with the new "Input" field,
which I added to the stage summary page, the executor summary page,
and the per-stage page.

![image](https://cloud.githubusercontent.com/assets/1108612/3167930/2627f92a-eb77-11e3-861c-98ea5bb7a1a2.png)

![image](https://cloud.githubusercontent.com/assets/1108612/3167936/475a889c-eb77-11e3-9706-f11c48751f17.png)

![image](https://cloud.githubusercontent.com/assets/1108612/3167948/80ebcf12-eb77-11e3-87ed-349fce6a770c.png)

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #962 from kayousterhout/read_metrics and squashes the following commits:

f13b67d [Kay Ousterhout] Correctly format input bytes on executor page
8b70cde [Kay Ousterhout] Added comment about potential inaccuracy of bytesRead
d1016e8 [Kay Ousterhout] Udated SparkListenerSuite test
8461492 [Kay Ousterhout] Miniscule style fix
ae04d99 [Kay Ousterhout] Remove input metrics for parallel collections
719f19d [Kay Ousterhout] Style fixes
bb6ec62 [Kay Ousterhout] Small fixes
869ac7b [Kay Ousterhout] Updated Json tests
44a0301 [Kay Ousterhout] Fixed accidentally added line
4bd0568 [Kay Ousterhout] Added input source, renamed Hdfs to Hadoop.
f27e535 [Kay Ousterhout] Updates based on review comments and to fix rebase
bf41029 [Kay Ousterhout] Updated Json tests to pass
0fc33e0 [Kay Ousterhout] Added explicit backward compatibility test
4e52925 [Kay Ousterhout] Added Json output and associated tests.
365400b [Kay Ousterhout] [SPARK-1683] Track task read metrics.
2014-06-29 22:01:42 -07:00
Reynold Xin cdf613fc52 [SPARK-2320] Reduce exception/code block font size in web ui
Author: Reynold Xin <rxin@apache.org>

Closes #1261 from rxin/ui-pre-size and squashes the following commits:

7ab1a69 [Reynold Xin] [SPARK-2320] Reduce exception/code block font size in web ui
2014-06-29 16:46:28 -07:00
Reynold Xin 2053d793cc Improve MapOutputTracker error logging.
Author: Reynold Xin <rxin@apache.org>

Closes #1258 from rxin/mapOutputTracker and squashes the following commits:

a7c95b6 [Reynold Xin] Improve MapOutputTracker error logging.
2014-06-28 21:05:03 -07:00
Matthew Farrellee 3c104c79d2 [SPARK-1394] Remove SIGCHLD handler in worker subprocess
It should not be the responsibility of the worker subprocess, which
does not intentionally fork, to try and cleanup child processes. Doing
so is complex and interferes with operations such as
platform.system().

If it is desirable to have tighter control over subprocesses, then
namespaces should be used and it should be the manager's resposibility
to handle cleanup.

Author: Matthew Farrellee <matt@redhat.com>

Closes #1247 from mattf/SPARK-1394 and squashes the following commits:

c36f308 [Matthew Farrellee] [SPARK-1394] Remove SIGCHLD handler in worker subprocess
2014-06-28 18:39:27 -07:00
Guillaume Ballet b8f2e13aec [SPARK-2233] make-distribution script should list the git hash in the RELEASE file
This patch adds the git revision hash (short version) to the RELEASE file. It uses git instead of simply checking for the existence of .git, so as to make sure that this is a functional repository.

Author: Guillaume Ballet <gballet@gmail.com>

Closes #1216 from gballet/master and squashes the following commits:

eabc50f [Guillaume Ballet] Refactored the script to take comments into account.
d93e5e8 [Guillaume Ballet] [SPARK 2233] make-distribution script now lists the git hash tag in the RELEASE file.
2014-06-28 13:07:12 -07:00
Matthew Farrellee 0e0686d3ef [SPARK-2003] Fix python SparkContext example
Author: Matthew Farrellee <matt@redhat.com>

Closes #1246 from mattf/SPARK-2003 and squashes the following commits:

b12e7ca [Matthew Farrellee] [SPARK-2003] Fix python SparkContext example
2014-06-27 18:20:33 -07:00
Andrew Or f17510e371 [SPARK-2259] Fix highly misleading docs on cluster / client deploy modes
The existing docs are highly misleading. For standalone mode, for example, it encourages the user to use standalone-cluster mode, which is not officially supported. The safeguards have been added in Spark submit itself to prevent bad documentation from leading users down the wrong path in the future.

This PR is prompted by countless headaches users of Spark have run into on the mailing list.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1200 from andrewor14/submit-docs and squashes the following commits:

5ea2460 [Andrew Or] Rephrase cluster vs client explanation
c827f32 [Andrew Or] Clarify spark submit messages
9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards
2014-06-27 16:11:31 -07:00
Andrew Or 21e0f77b63 [SPARK-2307] SparkUI - storage tab displays incorrect RDDs
The issue here is that the `StorageTab` listens for updates from the `StorageStatusListener`, but when a block is kicked out of the cache, `StorageStatusListener` removes it from its list. Thus, there is no way for the `StorageTab` to know whether a block has been dropped.

This issue was introduced in #1080, which was itself a bug fix. Here we revert that PR and offer a different fix for the original bug (SPARK-2144).

Author: Andrew Or <andrewor14@gmail.com>

Closes #1249 from andrewor14/storage-ui-fix and squashes the following commits:

af019ce [Andrew Or] Fix SPARK-2307
2014-06-27 15:23:25 -07:00
witgo 18f29b96c7 SPARK-2181:The keys for sorting the columns of Executor page in SparkUI are incorrect
Author: witgo <witgo@qq.com>

Closes #1135 from witgo/SPARK-2181 and squashes the following commits:

39dad90 [witgo] The keys for sorting the columns of Executor page in SparkUI are incorrect
2014-06-26 21:59:21 -07:00
Xiangrui Meng c23f5db32b [SPARK-2251] fix concurrency issues in random sampler
The following code is very likely to throw an exception:

~~~
val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1)
rdd.zip(rdd).count()
~~~

because the same random number generator is used in compute partitions.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1229 from mengxr/fix-sample and squashes the following commits:

f1ee3d7 [Xiangrui Meng] fix concurrency issues in random sampler
2014-06-26 21:46:55 -07:00
Reynold Xin d1636dd72f [SPARK-2297][UI] Make task attempt and speculation more explicit in UI.
New UI:

![screen shot 2014-06-26 at 1 43 52 pm](https://cloud.githubusercontent.com/assets/323388/3404643/82b9ddc6-fd73-11e3-96f9-f7592a7aee79.png)

Author: Reynold Xin <rxin@apache.org>

Closes #1236 from rxin/ui-task-attempt and squashes the following commits:

3b645dd [Reynold Xin] Expose attemptId in Stage.
c0474b1 [Reynold Xin] Beefed up unit test.
c404bdd [Reynold Xin] Fix ReplayListenerSuite.
f56be4b [Reynold Xin] Fixed JsonProtocolSuite.
e29e0f7 [Reynold Xin] Minor update.
5e4354a [Reynold Xin] [SPARK-2297][UI] Make task attempt and speculation more explicit in UI.
2014-06-26 21:13:26 -07:00
Reynold Xin bf578deaf2 Removed throwable field from FetchFailedException and added MetadataFetchFailedException
FetchFailedException used to have a Throwable field, but in reality we never propagate any of the throwable/exceptions back to the driver because Executor explicitly looks for FetchFailedException and then sends FetchFailed as the TaskEndReason.

This pull request removes the throwable and adds a MetadataFetchFailedException that extends FetchFailedException (so now MapOutputTracker throws MetadataFetchFailedException instead).

Author: Reynold Xin <rxin@apache.org>

Closes #1227 from rxin/metadataFetchException and squashes the following commits:

5cb1e0a [Reynold Xin] MetadataFetchFailedException extends FetchFailedException.
8861ee2 [Reynold Xin] Throw MetadataFetchFailedException in MapOutputTracker.
2014-06-26 21:12:16 -07:00
Cheng Hao 981bde9b05 [SQL]Extract the joinkeys from join condition
Extract the join keys from equality conditions, that can be evaluated using equi-join.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #1190 from chenghao-intel/extract_join_keys and squashes the following commits:

4a1060a [Cheng Hao] Fix some of the small issues
ceb4924 [Cheng Hao] Remove the redundant pattern of join keys extraction
cec34e8 [Cheng Hao] Update the code style issues
dcc4584 [Cheng Hao] Extract the joinkeys from join condition
2014-06-26 19:18:11 -07:00
Patrick Wendell f1f7385a50 Strip '@' symbols when merging pull requests.
Currently all of the commits with 'X' in them cause person X to
receive e-mails every time someone makes a public fork of Spark.

marmbrus who requested this.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1239 from pwendell/strip and squashes the following commits:

22e5a97 [Patrick Wendell] Strip '@' symbols when merging pull requests.
2014-06-26 17:09:24 -07:00
Zichuan Ye 62d4a0fa99 Fixing AWS instance type information based upon current EC2 data
Fixed a problem in previous file in which some information regarding AWS instance types were wrong. Such information was updated base upon current AWS EC2 data.

Author: Zichuan Ye <jerry@tangentds.com>

Closes #1156 from jerry86/master and squashes the following commits:

ff36e95 [Zichuan Ye] Fixing AWS instance type information based upon current EC2 data
2014-06-26 15:21:29 -07:00
Reynold Xin 6587ef7c17 [SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure
Also added inline doc for each TaskEndReason.

Author: Reynold Xin <rxin@apache.org>

Closes #1225 from rxin/SPARK-2286 and squashes the following commits:

6a7959d [Reynold Xin] Fix unit test failure.
cf9d5eb [Reynold Xin] Merge branch 'master' into SPARK-2286
a61fae1 [Reynold Xin] Move to line above ...
38c7391 [Reynold Xin] [SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure.
2014-06-26 14:03:22 -07:00
Takuya UESHIN 32a1ad7531 [SPARK-2295] [SQL] Make JavaBeans nullability stricter.
Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1235 from ueshin/issues/SPARK-2295 and squashes the following commits:

201c508 [Takuya UESHIN] Make JavaBeans nullability stricter.
2014-06-26 13:37:19 -07:00
Kay Ousterhout 48a82a827c Remove use of spark.worker.instances
spark.worker.instances was added as part of this commit: 1617816090

My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility,
but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances) so should
not have been added.

@sryza @pwendell @tgravescs LMK if I'm understanding this correctly

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1214 from kayousterhout/yarn_config and squashes the following commits:

3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances
2014-06-26 08:20:27 -05:00
Takuya UESHIN e4899a2537 [SPARK-2254] [SQL] ScalaRefection should mark primitive types as non-nullable.
Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1193 from ueshin/issues/SPARK-2254 and squashes the following commits:

cfd6088 [Takuya UESHIN] Modify ScalaRefection.schemaFor method to return nullability of Scala Type.
2014-06-25 23:55:31 -07:00
Szul, Piotr 441cdcca64 [SPARK-2172] PySpark cannot import mllib modules in YARN-client mode
Include pyspark/mllib python sources as resources in the mllib.jar.
This way they will be included in the final assembly

Author: Szul, Piotr <Piotr.Szul@csiro.au>

Closes #1223 from piotrszul/branch-1.0 and squashes the following commits:

69d5174 [Szul, Piotr] Removed unsed resource directory src/main/resource from mllib pom
f8c52a0 [Szul, Piotr] [SPARK-2172] PySpark cannot import mllib modules in YARN-client mode Include pyspark/mllib python sources as resources in the jar

(cherry picked from commit fa167194ce)
Signed-off-by: Reynold Xin <rxin@apache.org>
2014-06-25 23:07:16 -07:00