3397 commits

Author SHA1 Message Date
nravi 70c8116c0a Workaround in Spark for ConcurrentModification issue (JIRA Hadoop-10456, Spark-1097)
This fix has gone into Hadoop 2.4.1. For developers using Hadoop versions earlier than 2.4.1, it would be good to have a workaround in Spark as well.

The fix has also been tested for performance; no regressions were found.

Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes #1000 from nishkamravi2/master and squashes the following commits:

eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
2014-06-13 10:52:21 -07:00
Xiangrui Meng b3736e3d2f [HOTFIX] add math3 version to pom
Passed `mvn package`.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1075 from mengxr/takeSample-fix and squashes the following commits:

45b4590 [Xiangrui Meng] add math3 version to pom
2014-06-13 02:59:38 -07:00
Andrew Or 44daec5abd [Minor] Fix style, formatting and naming in BlockManager etc.
This is a precursor to a bigger change. I wanted to separate out the relatively insignificant changes so the ultimate PR is not inflated.

(Warning: this PR is full of unimportant nitpicks)

Author: Andrew Or <andrewor14@gmail.com>

Closes #1058 from andrewor14/bm-minor and squashes the following commits:

8e12eaf [Andrew Or] SparkException -> BlockException
c36fd53 [Andrew Or] Make parts of BlockManager more readable
0a5f378 [Andrew Or] Entry -> MemoryEntry
e9762a5 [Andrew Or] Tone down string interpolation (minor reverts)
c4de9ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into bm-minor
b3470f1 [Andrew Or] More string interpolation (minor)
7f9dcab [Andrew Or] Use string interpolation (minor)
94a425b [Andrew Or] Refactor against duplicate code + minor changes
8a6a7dc [Andrew Or] Exception -> SparkException
97c410f [Andrew Or] Deal with MIMA excludes
2480f1d [Andrew Or] Fixes in StorageLevel.scala
abb0163 [Andrew Or] Style, formatting and naming fixes
2014-06-12 20:40:58 -07:00
Doris Xin 1de1d703bf SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
Modified the takeSample method in RDD to use the ScaSRS sampling technique to improve performance. Added a private method that computes sampling rate > sample_size/total to ensure sufficient sample size with success rate >= 0.9999. Added a unit test for the private method to validate choice of sampling rate.
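
The sampling-rate idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `compute_fraction`, the `delta = 1e-4` failure probability (chosen to match the stated success rate of 0.9999), and the exact form of the bound are not taken from this commit.

```python
import math


def compute_fraction(sample_size, total, delta=1e-4):
    """Hypothetical helper: a Bernoulli sampling rate padded above
    sample_size / total, intended to yield at least `sample_size` items
    with probability of roughly 1 - delta (sampling without replacement)."""
    if total <= 0 or sample_size < 0:
        raise ValueError("total must be positive and sample_size non-negative")
    fraction = sample_size / total
    gamma = -math.log(delta) / total
    padded = fraction + gamma + math.sqrt(gamma * gamma + 2 * gamma * fraction)
    # Clamp to a valid probability; the tiny lower bound avoids a zero rate.
    return min(1.0, max(1e-10, padded))


rate = compute_fraction(100, 10_000)
```

The returned rate is always strictly larger than `sample_size / total`, so a single pass usually collects enough elements and the caller only has to trim the surplus.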

Author: Doris Xin <doris.s.xin@gmail.com>
Author: dorx <doris.s.xin@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #916 from dorx/takeSample and squashes the following commits:

5b061ae [Doris Xin] merge master
444e750 [Doris Xin] edge cases
3de882b [dorx] Merge pull request #2 from mengxr/SPARK-1939
82dde31 [Xiangrui Meng] update pyspark's takeSample
48d954d [Doris Xin] remove unused imports from RDDSuite
fb1452f [Doris Xin] allowing num to be greater than count in all cases
1481b01 [Doris Xin] washing test tubes and making coffee
dc699f3 [Doris Xin] give back imports removed by accident in rdd.py
64e445b [Doris Xin] logWarning as soon as it enters the while loop
55518ed [Doris Xin] added TODO for logging in rdd.py
eff89e2 [Doris Xin] addressed reviewer comments.
ecab508 [Doris Xin] fixed checkstyle violation
0a9b3e3 [Doris Xin] "reviewer comment addressed"
f80f270 [Doris Xin] Merge branch 'master' into takeSample
ae3ad04 [Doris Xin] fixed edge cases to prevent overflow
065ebcd [Doris Xin] Merge branch 'master' into takeSample
9bdd36e [Doris Xin] Check sample size and move computeFraction
e3fd6a6 [Doris Xin] Merge branch 'master' into takeSample
7cab53a [Doris Xin] fixed import bug in rdd.py
ffea61a [Doris Xin] SPARK-1939: Refactor takeSample method in RDD
1441977 [Doris Xin] SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
2014-06-12 19:44:27 -07:00
Ariel Rabkin 0154587ab7 document laziness of parallelize
Took me several hours to figure out this behavior. It would be good to highlight it in the documentation.

Author: Ariel Rabkin <asrabkin@cs.princeton.edu>

Closes #1070 from asrabkin/master and squashes the following commits:

29a076e [Ariel Rabkin] doc fix
2014-06-12 17:51:33 -07:00
Patrick Wendell 1c04652c8f SPARK-1843: Replace assemble-deps with env variable.
(This change is actually small, I moved some logic into
compute-classpath that was previously in spark-class).

Assemble deps has existed for a while to allow developers to
run local code with new changes quickly. When I'm developing I
typically use a simpler approach which just prepends the Spark
classes to the classpath before the assembly jar. This is well
defined in the JVM and the Spark classes take precedence over those
in the assembly.

This approach is portable across both builds which is the main reason I'd
like to switch to it. It's also a bit easier to toggle on and off quickly.

The way you use this is the following:
```
$ ./bin/spark-shell # Use spark with the normal assembly
$ export SPARK_PREPEND_CLASSES=true
$ ./bin/spark-shell # Now it's using compiled classes
$ unset SPARK_PREPEND_CLASSES
$ ./bin/spark-shell # Back to normal
```

Author: Patrick Wendell <pwendell@gmail.com>

Closes #877 from pwendell/assemble-deps and squashes the following commits:

8a11345 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into assemble-deps
faa3168 [Patrick Wendell] Adding a warning for compatibility
3f151a7 [Patrick Wendell] Small fix
bbfb73c [Patrick Wendell] Review feedback
328e9f8 [Patrick Wendell] SPARK-1843: Replace assemble-deps with env variable.
2014-06-12 15:43:32 -07:00
Marcelo Vanzin ecde5b8375 [SPARK-2080] Yarn: report HS URL in client mode, correct user in cluster mode.
Yarn client mode was not setting the app's tracking URL to the
History Server's URL when configured by the user. Now client mode
behaves the same as cluster mode.

In SparkContext.scala, the "user.name" system property had precedence
over the SPARK_USER environment variable. This means that SPARK_USER
was never used, since "user.name" is always set by the JVM. In Yarn
cluster mode, this means the application always reported itself as
being run by user "yarn" (or whatever user was running the Yarn NM).
One could argue that the correct fix would be to use UGI.getCurrentUser()
here, but at least for Yarn that will match what SPARK_USER is set
to.
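
The precedence problem described above can be illustrated with a small Python sketch. The function names are hypothetical and the `env` dictionary stands in for the process environment; the actual change lives in SparkContext.scala.

```python
def resolve_user_before(env, jvm_user_name):
    # Old order: "user.name" is consulted first. Because the JVM always
    # sets it, SPARK_USER is never reached.
    return jvm_user_name or env.get("SPARK_USER")


def resolve_user_after(env, jvm_user_name):
    # New order: SPARK_USER wins when present; "user.name" is the fallback.
    return env.get("SPARK_USER") or jvm_user_name


env = {"SPARK_USER": "alice"}
before = resolve_user_before(env, "yarn")
after = resolve_user_after(env, "yarn")
```

With the old order the application reports the user running the NodeManager (`yarn` in this example); with the new order it reports the submitting user.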

Author: Marcelo Vanzin <vanzin@cloudera.com>

This patch had conflicts when merged, resolved by
Committer: Thomas Graves <tgraves@apache.org>

Closes #1002 from vanzin/yarn-client-url and squashes the following commits:

4046e04 [Marcelo Vanzin] Set HS link in yarn-alpha also.
4c692d9 [Marcelo Vanzin] Yarn: report HS URL in client mode, correct user in cluster mode.
2014-06-12 16:19:36 -05:00
Doris Xin 83c226d454 [SPARK-2088] fix NPE in toString
After deserialization, the transient field creationSiteInfo does not get backfilled with the default value, but the toString method, which is invoked by the serializer, expects the field to always be non-null. An NPE is thrown when toString is called by the serializer when creationSiteInfo is null.
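
A rough Python analogue of the failure mode and the defensive fix is sketched below. The class and field names are made up for illustration, and `copy.copy` is used as a stand-in for a serialization round trip because it goes through the same `__getstate__` hook that `pickle` uses.

```python
import copy


class Tracked:
    """Toy object with a field that is skipped during serialization,
    loosely mirroring a transient field on the JVM."""

    def __init__(self, name):
        self.name = name
        self.creation_site = "created in __init__"

    def __getstate__(self):
        state = self.__dict__.copy()
        state["creation_site"] = None  # dropped, like a transient field
        return state

    def __repr__(self):
        # Guard against the field being None after the round trip.
        site = self.creation_site if self.creation_site is not None else "<unknown>"
        return f"Tracked({self.name!r}) @ {site}"


restored = copy.copy(Tracked("rdd"))
text = repr(restored)
```

Without the guard in `__repr__`, any code that formats `creation_site` as if it were always set would fail on the restored object.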

Author: Doris Xin <doris.s.xin@gmail.com>

Closes #1028 from dorx/toStringNPE and squashes the following commits:

f20021e [Doris Xin] unit test for toString after deserialization
6f0a586 [Doris Xin] Merge branch 'master' into toStringNPE
f47fecf [Doris Xin] Merge branch 'master' into toStringNPE
76199c6 [Doris Xin] [SPARK-2088] fix NPE in toString
2014-06-12 12:53:07 -07:00
Sandy Ryza ce92a9c18f SPARK-554. Add aggregateByKey.
Author: Sandy Ryza <sandy@cloudera.com>
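
For readers unfamiliar with the operation, the semantics of an aggregate-by-key can be sketched in plain Python. This is not Spark code: `partitions` is a hypothetical list of lists of `(key, value)` pairs standing in for an RDD, and the function name is illustrative.

```python
import copy


def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Fold values into a per-key accumulator inside each partition
    (seq_op), then merge the per-partition accumulators (comb_op)."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key not in acc:
                # Each key in each partition starts from its own zero value.
                acc[key] = copy.deepcopy(zero_value)
            acc[key] = seq_op(acc[key], value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


# Compute (sum, count) per key; the accumulator type differs from the value type.
parts = [[("a", 1), ("b", 2)], [("a", 3)]]
result = aggregate_by_key(
    parts,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
```

The point of the two functions is that the accumulator may have a different type from the values, which a plain reduce-by-key cannot express.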

Closes #705 from sryza/sandy-spark-554 and squashes the following commits:

2302b8f [Sandy Ryza] Add MIMA exclude
f52e0ad [Sandy Ryza] Fix Python tests for real
2f3afa3 [Sandy Ryza] Fix Python test
0b735e9 [Sandy Ryza] Fix line lengths
ae56746 [Sandy Ryza] Fix doc (replace T with V)
c2be415 [Sandy Ryza] Java and Python aggregateByKey
23bf400 [Sandy Ryza] SPARK-554.  Add aggregateByKey.
2014-06-12 08:14:25 -07:00
Henry Saputra 4d8ae709fb Cleanup on Connection and ConnectionManager
Simple cleanup on Connection and ConnectionManager to make IDE happy while working on an issue:
1. Replace var with val
2. Add parentheses to Queue#dequeue to be consistent with side-effects.
3. Remove return on final line of a method.

Author: Henry Saputra <henry.saputra@gmail.com>

Closes #1060 from hsaputra/cleanup_connection_classes and squashes the following commits:

245fd09 [Henry Saputra] Cleanup on Connection and ConnectionManager to make IDE happy while working on an issue: 1. Replace var with val 2. Add parentheses to Queue#dequeue to be consistent with side-effects. 3. Remove return on final line of a method.
2014-06-11 23:17:51 -07:00
Yadong e056320cc8 'killFuture' is never used
Author: Yadong <qiyadong2010@gmail.com>

Closes #1052 from watermen/bug-fix1 and squashes the following commits:

409d09a [Yadong] 'killFuture' is never used
2014-06-11 20:58:39 -07:00
Matei Zaharia 508fd371d6 [SPARK-2044] Pluggable interface for shuffles
This is a first cut at moving shuffle logic behind a pluggable interface, as described at https://issues.apache.org/jira/browse/SPARK-2044, to let us more easily experiment with new shuffle implementations. It moves the existing shuffle code to a class HashShuffleManager behind a general ShuffleManager interface.

Two things are still missing to make this complete:
* MapOutputTracker needs to be hidden behind the ShuffleManager interface; this will also require adding methods to ShuffleManager that will let the DAGScheduler interact with it as it does with the MapOutputTracker today
* The code to do map-side and reduce-side combine in ShuffledRDD, PairRDDFunctions, etc. needs to be moved into the ShuffleManager's readers and writers

However, some of these may also be done later after we merge the current interface.
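
The general shape of such a pluggable interface can be sketched in Python. Everything below is hypothetical and heavily simplified: the method names `write` and `read`, their parameters, and the in-memory bucket storage are illustrative assumptions, not the interface added by this PR.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class ShuffleManager(ABC):
    """Hypothetical, simplified interface; method names are illustrative."""

    @abstractmethod
    def write(self, shuffle_id, map_id, records, num_reducers):
        """Store the output of one map task."""

    @abstractmethod
    def read(self, shuffle_id, reduce_id):
        """Return an iterator over the records for one reduce partition."""


class HashShuffleManager(ShuffleManager):
    """In-memory stand-in that buckets records by hash(key) % num_reducers."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def write(self, shuffle_id, map_id, records, num_reducers):
        for key, value in records:
            bucket = (shuffle_id, hash(key) % num_reducers)
            self._buckets[bucket].append((key, value))

    def read(self, shuffle_id, reduce_id):
        return iter(self._buckets[(shuffle_id, reduce_id)])


manager = HashShuffleManager()
manager.write(0, 0, [(1, "a"), (2, "b"), (3, "c")], num_reducers=2)
odd_keys = list(manager.read(0, 1))
```

Callers depend only on the abstract class, so an alternative implementation (for example a sort-based one) could be swapped in without touching them.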

Author: Matei Zaharia <matei@databricks.com>

Closes #1009 from mateiz/pluggable-shuffle and squashes the following commits:

7a09862 [Matei Zaharia] review comments
be33d3f [Matei Zaharia] review comments
1513d4e [Matei Zaharia] Add ASF header
ac56831 [Matei Zaharia] Bug fix and better error message
4f681ba [Matei Zaharia] Move write part of ShuffleMapTask to ShuffleManager
f6f011d [Matei Zaharia] Move hash shuffle reader behind ShuffleManager interface
55c7717 [Matei Zaharia] Changed RDD code to use ShuffleReader
75cc044 [Matei Zaharia] Partial work to move hash shuffle in
2014-06-11 20:45:29 -07:00
Prashant Sharma e508f599f8 [SPARK-2108] Mark SparkContext methods that return block information as developer API's
Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1047 from ScrapCodes/SPARK-2108/mark-as-dev-api and squashes the following commits:

073ee34 [Prashant Sharma] [SPARK-2108] Mark SparkContext methods that return block information as developer API's
2014-06-11 10:49:34 -07:00
witgo c48b6222ea Resolve scalatest warnings during build
Author: witgo <witgo@qq.com>

Closes #1032 from witgo/ShouldMatchers and squashes the following commits:

7ebf34c [witgo] Resolve scalatest warnings during build
2014-06-10 20:24:05 -07:00
Tathagata Das 4823bf470e [SPARK-1940] Enabling rolling of executor logs, and automatic cleanup of old executor logs
Currently, in the default log4j configuration, all the executor logs get sent to the file `[executor-working-dir]/stderr`. This does not allow log files to be rolled, so old logs cannot be removed.

Using log4j's RollingFileAppender allows log4j logs to be rolled, but all the logs then go to a different set of files, not `stdout` and `stderr`. The logs are therefore no longer visible in the Spark web UI, which only reads the files `stdout` and `stderr`. Furthermore, it still does not allow stdout and stderr to be cleared periodically in case a large amount of output gets written to them (e.g. by an explicit `println` inside a map function).

This PR solves this by implementing a simple `RollingFileAppender` within Spark (disabled by default). When enabled (using the configuration parameter `spark.executor.rollingLogs.enabled`), the logs can be rolled over either by time interval (set with `spark.executor.rollingLogs.interval`, daily by default) or by log size (set with `spark.executor.rollingLogs.size`). Finally, old logs can be deleted automatically by specifying how many of the latest log files to keep (set with `spark.executor.rollingLogs.keepLastN`). The web UI has also been modified to show the logs across the rolled-over files.

You can test this locally (without waiting a whole day) by setting `spark.executor.rollingLogs.enabled=true` and `spark.executor.rollingLogs.interval=minutely`. Continuously generate logs by running Spark jobs; the generated log files should look like this (`stderr` and `stdout` are the current log files being written to):

```
stderr
stderr--2014-05-27--14-37
stderr--2014-05-27--14-47
stderr--2014-05-27--15-05
stdout
stdout--2014-05-27--14-47
```

The web UI should show logs across these files.
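
The naming and cleanup behaviour described above can be sketched in Python. The helper names are hypothetical; the only detail taken from this description is the `stderr--2014-05-27--14-37` file name pattern, and treating a non-positive keep count as "keep everything" is an assumption.

```python
from datetime import datetime


def rolled_name(active_file, when):
    """Name for a rolled-over copy, following the pattern in the listing above."""
    return f"{active_file}--{when.strftime('%Y-%m-%d--%H-%M')}"


def files_to_delete(rolled_files, keep_last_n):
    """Rolled files to remove so that only the `keep_last_n` newest remain.
    The timestamp suffix sorts lexicographically in chronological order."""
    if keep_last_n <= 0:
        return []  # assumption: a non-positive value disables cleanup
    ordered = sorted(rolled_files)
    return ordered[:-keep_last_n]


name = rolled_name("stderr", datetime(2014, 5, 27, 14, 37))
stale = files_to_delete(
    ["stderr--2014-05-27--15-05", "stderr--2014-05-27--14-37", "stderr--2014-05-27--14-47"],
    keep_last_n=2,
)
```

Putting the most significant time field first in the suffix is what lets a plain string sort order the rolled files by age.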

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #895 from tdas/rolling-logs and squashes the following commits:

fd8f87f [Tathagata Das] Minor change.
d326aee [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
ad956c1 [Tathagata Das] Scala style fix.
1f0a6ec [Tathagata Das] Some more changes based on Patrick's PR comments.
c8bfe4e [Tathagata Das] Refactored FileAppender to a package spark.util.logging and broke up the file into multiple files. Changed configuration parameter names.
4224409 [Tathagata Das] Style fix.
108a9f8 [Tathagata Das] Added better constraint handling for rolling policies.
f7da977 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
9134495 [Tathagata Das] Simplified rolling logs by removing Daily/Hourly/MinutelyRollingFileAppender, and removing the setting rollingLogs.enabled
312d874 [Tathagata Das] Minor fixes based on PR comments.
8a67d83 [Tathagata Das] Fixed comments.
b36cfd6 [Tathagata Das] Implemented RollingPolicy, TimeBasedRollingPolicy and SizeBasedRollingPolicy, and changed RollingFileAppender accordingly.
b7e8272 [Tathagata Das] Style fix.
374c9a9 [Tathagata Das] Added missing license.
24354ea [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
6cc09c7 [Tathagata Das] Fixed bugs in rolling logs, and added more debug statements.
adf4910 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
931f8fb [Tathagata Das] Changed log viewer in Spark web UI to handle rolling log files.
cb4fb6d [Tathagata Das] Added FileAppender and RollingFileAppender to generate rolling executor logs.
2014-06-10 20:22:02 -07:00
Nick Pentreath f971d6cb60 SPARK-1416: PySpark support for SequenceFile and Hadoop InputFormats
So I finally resurrected this PR. It seems the old one against the incubator mirror is no longer available, so I cannot reference it.

This adds initial support for reading Hadoop ```SequenceFile```s, as well as arbitrary Hadoop ```InputFormat```s, in PySpark.

# Overview
The basics are as follows:
1. ```PythonRDD``` object contains the relevant methods, that are in turn invoked by ```SparkContext``` in PySpark
2. The SequenceFile or InputFormat is read on the Scala side and converted from ```Writable``` instances to the relevant Scala classes (in the case of primitives)
3. Pyrolite is used to serialize Java objects. If this fails, the fallback is ```toString```
4. ```PickleSerializer``` on the Python side deserializes.

This works "out of the box" for simple ```Writable```s:
* ```Text```
* ```IntWritable```, ```DoubleWritable```, ```FloatWritable```
* ```NullWritable```
* ```BooleanWritable```
* ```BytesWritable```
* ```MapWritable```

It also works for simple, "struct-like" classes. Due to the way Pyrolite works, this requires that the classes satisfy the JavaBeans conventions (i.e. with fields, a no-arg constructor, and getters/setters). (Perhaps in future some sugar for case classes and reflection could be added.)

I've tested it out with ```EsInputFormat``` as an example and it works very nicely:
```python
# Assumes an existing SparkContext `sc` with elasticsearch-hadoop on the classpath.
conf = {"es.resource": "index/type"}
rdd = sc.newAPIHadoopRDD("org.elasticsearch.hadoop.mr.EsInputFormat",
                         "org.apache.hadoop.io.NullWritable",
                         "org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=conf)
rdd.first()
```

I suspect for things like HBase/Cassandra it will be a bit trickier to get it to work out of the box.

# Some things still outstanding:
1. ~~Requires ```msgpack-python``` and will fail without it. As originally discussed with Josh, add a ```as_strings``` argument that defaults to ```False```, that can be used if ```msgpack-python``` is not available~~
2. ~~I see from https://github.com/apache/spark/pull/363 that Pyrolite is being used there for SerDe between Scala and Python. @ahirreddy @mateiz what is the plan behind this - is Pyrolite preferred? It seems from a cursory glance that adapting the ```msgpack```-based SerDe here to use Pyrolite wouldn't be too hard~~
3. ~~Support the key and value "wrapper" that would allow a Scala/Java function to be plugged in that would transform whatever the key/value Writable class is into something that can be serialized (e.g. convert some custom Writable to a JavaBean or ```java.util.Map``` that can be easily serialized)~~
4. Support ```saveAsSequenceFile``` and ```saveAsHadoopFile``` etc. This would require SerDe in the reverse direction, that can be handled by Pyrolite. Will work on this as a separate PR

Author: Nick Pentreath <nick.pentreath@gmail.com>

Closes #455 from MLnick/pyspark-inputformats and squashes the following commits:

268df7e [Nick Pentreath] Documentation changes per @pwendell comments
761269b [Nick Pentreath] Address @pwendell comments, simplify default writable conversions and remove registry.
4c972d8 [Nick Pentreath] Add license headers
d150431 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
cde6af9 [Nick Pentreath] Parameterize converter trait
5ebacfa [Nick Pentreath] Update docs for PySpark input formats
a985492 [Nick Pentreath] Move Converter examples to own package
365d0be [Nick Pentreath] Make classes private[python]. Add docs and @Experimental annotation to Converter interface.
eeb8205 [Nick Pentreath] Fix path relative to SPARK_HOME in tests
1eaa08b [Nick Pentreath] HBase -> Cassandra app name oversight
3f90c3e [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
2c18513 [Nick Pentreath] Add examples for reading HBase and Cassandra InputFormats from Python
b65606f [Nick Pentreath] Add converter interface
5757f6e [Nick Pentreath] Default key/value classes for sequenceFile are None
085b55f [Nick Pentreath] Move input format tests to tests.py and clean up docs
43eb728 [Nick Pentreath] PySpark InputFormats docs into programming guide
94beedc [Nick Pentreath] Clean up args in PythonRDD. Set key/value converter defaults to None for PySpark context.py methods
1a4a1d6 [Nick Pentreath] Address @mateiz style comments
01e0813 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
15a7d07 [Nick Pentreath] Remove default args for key/value classes. Arg names to camelCase
9fe6bd5 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
84fe8e3 [Nick Pentreath] Python programming guide space formatting
d0f52b6 [Nick Pentreath] Python programming guide
7caa73a [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
93ef995 [Nick Pentreath] Add back context.py changes
9ef1896 [Nick Pentreath] Recover earlier changes lost in previous merge for serializers.py
077ecb2 [Nick Pentreath] Recover earlier changes lost in previous merge for context.py
5af4770 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
35b8e3a [Nick Pentreath] Another fix for test ordering
bef3afb [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
e001b94 [Nick Pentreath] Fix test failures due to ordering
78978d9 [Nick Pentreath] Add doc for SequenceFile and InputFormat support to Python programming guide
64eb051 [Nick Pentreath] Scalastyle fix
e7552fa [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
44f2857 [Nick Pentreath] Remove msgpack dependency and switch serialization to Pyrolite, plus some clean up and refactoring
c0ebfb6 [Nick Pentreath] Change sequencefile test data generator to easily be called from PySpark tests
1d7c17c [Nick Pentreath] Amend tests to auto-generate sequencefile data in temp dir
17a656b [Nick Pentreath] remove binary sequencefile for tests
f60959e [Nick Pentreath] Remove msgpack dependency and serializer from PySpark
450e0a2 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
31a2fff [Nick Pentreath] Scalastyle fixes
fc5099e [Nick Pentreath] Add Apache license headers
4e08983 [Nick Pentreath] Clean up docs for PySpark context methods
b20ec7e [Nick Pentreath] Clean up merge duplicate dependencies
951c117 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
f6aac55 [Nick Pentreath] Bring back msgpack
9d2256e [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
1bbbfb0 [Nick Pentreath] Clean up SparkBuild from merge
a67dfad [Nick Pentreath] Clean up Msgpack serialization and registering
7237263 [Nick Pentreath] Add back msgpack serializer and hadoop file code lost during merging
25da1ca [Nick Pentreath] Add generator for nulls, bools, bytes and maps
65360d5 [Nick Pentreath] Adding test SequenceFiles
0c612e5 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
d72bf18 [Nick Pentreath] msgpack
dd57922 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
e67212a [Nick Pentreath] Add back msgpack dependency
f2d76a0 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
41856a5 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
97ef708 [Nick Pentreath] Remove old writeToStream
2beeedb [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
795a763 [Nick Pentreath] Change name to WriteInputFormatTestDataGenerator. Cleanup some var names. Use SPARK_HOME in path for writing test sequencefile data.
174f520 [Nick Pentreath] Add back graphx settings
703ee65 [Nick Pentreath] Add back msgpack
619c0fa [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
1c8efbc [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
eb40036 [Nick Pentreath] Remove unused comment lines
4d7ef2e [Nick Pentreath] Fix indentation
f1d73e3 [Nick Pentreath] mergeConfs returns a copy rather than mutating one of the input arguments
0f5cd84 [Nick Pentreath] Remove unused pair UTF8 class. Add comments to msgpack deserializer
4294cbb [Nick Pentreath] Add old Hadoop api methods. Clean up and expand comments. Clean up argument names
818a1e6 [Nick Pentreath] Add sequencefile and Hadoop InputFormat support to PythonRDD
4e7c9e3 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
c304cc8 [Nick Pentreath] Adding supporting sequencefiles for tests. Cleaning up
4b0a43f [Nick Pentreath] Refactoring utils into own objects. Cleaning up old commented-out code
d86325f [Nick Pentreath] Initial WIP of PySpark support for SequenceFile and arbitrary Hadoop InputFormat
2014-06-09 22:21:03 -07:00
Kay Ousterhout 6cf335d79a Added a TaskSetManager unit test.
This test ensures that when there are no
alive executors that satisfy a particular locality level,
the TaskSetManager doesn't ever use that as the maximum
allowed locality level (this optimization ensures that a
job doesn't wait extra time in an attempt to satisfy
a scheduling locality level that is impossible).

@mateiz and @lirui-intel this unit test illustrates an issue
with #892 (it fails with that patch).

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1024 from kayousterhout/scheduler_unit_test and squashes the following commits:

de6a08f [Kay Ousterhout] Added a TaskSetManager unit test.
2014-06-09 13:13:53 -07:00
Andrew Ash 35630c86ff SPARK-1944 Document --verbose in spark-shell -h
https://issues.apache.org/jira/browse/SPARK-1944

Author: Andrew Ash <andrew@andrewash.com>

Closes #1020 from ash211/SPARK-1944 and squashes the following commits:

a831c4d [Andrew Ash] SPARK-1944 Document --verbose in spark-shell -h
2014-06-09 10:21:21 -07:00
Andrew Ash 32ee9f0668 Grammar: read -> reads
Author: Andrew Ash <andrew@andrewash.com>

Closes #1016 from ash211/patch-6 and squashes the following commits:

e3865c8 [Andrew Ash] Grammar: read -> reads
2014-06-08 23:20:10 -07:00
Neville Li 15ddbef414 [SPARK-2067] use relative path for Spark logo in UI
Author: Neville Li <neville@spotify.com>

Closes #1006 from nevillelyh/gh/SPARK-2067 and squashes the following commits:

9ee64cf [Neville Li] [SPARK-2067] use relative path for Spark logo in UI
2014-06-08 23:18:27 -07:00
Reynold Xin 219dc00b30 SPARK-1628 follow up: Improve RangePartitioner's documentation.
Adding a paragraph clarifying a weird behavior in RangePartitioner.

See also #549.

Author: Reynold Xin <rxin@apache.org>

Closes #1012 from rxin/partitioner-doc and squashes the following commits:

6f0109e [Reynold Xin] SPARK-1628 follow up: Improve RangePartitioner's documentation.
2014-06-08 18:39:57 -07:00
zsxwing a71c6d1cf0 SPARK-1628: Add missing hashCode methods in Partitioner subclasses
JIRA: https://issues.apache.org/jira/browse/SPARK-1628

Added `hashCode` in HashPartitioner, RangePartitioner, PythonPartitioner and PageRankUtils.CustomPartitioner.
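
The underlying rule is that objects that compare equal must also hash equally. A Python analogue is sketched below; the class is a toy stand-in, not Spark's `HashPartitioner`.

```python
class HashPartitioner:
    """Toy partitioner whose equality and hash both depend on num_partitions."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partition(self, key):
        return hash(key) % self.num_partitions

    def __eq__(self, other):
        return (
            isinstance(other, HashPartitioner)
            and other.num_partitions == self.num_partitions
        )

    def __hash__(self):
        # Must agree with __eq__: equal partitioners hash to the same value.
        return hash(self.num_partitions)


a, b = HashPartitioner(4), HashPartitioner(4)
distinct = len({a, b})
```

One difference from the JVM is worth noting: Python makes a class unhashable when `__eq__` is defined without `__hash__`, whereas on the JVM the identity-based `hashCode` silently remains, so equal partitioners can land in different hash buckets.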

Author: zsxwing <zsxwing@gmail.com>

Closes #549 from zsxwing/SPARK-1628 and squashes the following commits:

2620936 [zsxwing] SPARK-1628: Add missing hashCode methods in Partitioner subclasses
2014-06-08 14:18:52 -07:00
Neville Li 7b877b2705 SPARK-2056 Set RDD name to input path
Author: Neville Li <neville@spotify.com>

Closes #992 from nevillelyh/master and squashes the following commits:

3011739 [Neville Li] [SPARK-2056] Set RDD name to input path
2014-06-07 16:22:26 -07:00
witgo 41c4a33105 [SPARK-1841]: update scalatest to version 2.1.5
Author: witgo <witgo@qq.com>

Closes #713 from witgo/scalatest and squashes the following commits:

b627a6a [witgo] merge master
51fb3d6 [witgo] merge master
3771474 [witgo] fix RDDSuite
996d6f9 [witgo] fix TimeStampedWeakValueHashMap test
9dfa4e7 [witgo] merge bug
1479b22 [witgo] merge master
29b9194 [witgo] fix code style
022a7a2 [witgo] fix test dependency
a52c0fa [witgo] fix test dependency
cd8f59d [witgo] Merge branch 'master' of https://github.com/apache/spark into scalatest
046540d [witgo] fix RDDSuite.scala
2c543b9 [witgo] fix ReplSuite.scala
c458928 [witgo] update scalatest to version 2.1.5
2014-06-06 11:45:21 -07:00
Matei Zaharia b45c13e7d7 SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys
The current implementation reads one key with the next hash code as it finishes reading the keys with the current hash code, which may cause it to miss some matches of the next key. This can cause operations like join to give the wrong result when reduce tasks spill to disk and there are hash collisions, as values won't be matched together. This PR fixes it by not reading in that next key, using a peeking iterator instead.
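
The peeking approach can be sketched in Python. The class and function names are hypothetical, and the stream is assumed to be sorted by key hash, as a spilled map would be.

```python
class PeekingIterator:
    """Iterator wrapper that can look at the next element without consuming it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._head = []  # holds at most one buffered element

    def has_next(self):
        if not self._head:
            try:
                self._head.append(next(self._it))
            except StopIteration:
                return False
        return True

    def peek(self):
        if not self.has_next():
            raise StopIteration
        return self._head[0]

    def next(self):
        if not self.has_next():
            raise StopIteration
        return self._head.pop()


def read_same_hash(stream, key_hash=hash):
    """Consume all leading (key, value) pairs that share the first key's
    hash code, leaving the first pair of the next group unread."""
    if not stream.has_next():
        return []
    current = key_hash(stream.peek()[0])
    group = []
    while stream.has_next() and key_hash(stream.peek()[0]) == current:
        group.append(stream.next())
    return group


# Keys 1 and 11 collide under this toy hash function; key 2 does not.
stream = PeekingIterator([(1, "a"), (11, "b"), (2, "c")])
first_group = read_same_hash(stream, key_hash=lambda k: k % 10)
```

Because the boundary element is only peeked at, it is still available when the next group is read, so no key is lost to an earlier read.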

Author: Matei Zaharia <matei@databricks.com>

Closes #986 from mateiz/spark-2043 and squashes the following commits:

0959514 [Matei Zaharia] Added unit test for having many hash collisions
892debb [Matei Zaharia] SPARK-2043: don't read a key with the next hash code in ExternalAppendOnlyMap, instead use a buffered iterator to only read values with the current hash code.
2014-06-05 23:01:48 -07:00
CrazyJvm 3d3f8c8004 Use pluggable clock in DAGScheduler #SPARK-2031
DAGScheduler now supports a pluggable clock, as TaskSetManager does.
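
The pluggable-clock pattern in general looks roughly like the Python sketch below. All class and method names are hypothetical; the point is only that the component takes its clock as a dependency so tests can control time.

```python
import time


class SystemClock:
    """Real clock, returning wall-clock time in milliseconds."""

    def get_time(self):
        return int(time.time() * 1000)


class FakeClock:
    """Test clock whose time only moves when advance() is called."""

    def __init__(self, start_ms=0):
        self._now = start_ms

    def get_time(self):
        return self._now

    def advance(self, ms):
        self._now += ms


class Scheduler:
    """Toy component that takes its clock as a dependency."""

    def __init__(self, clock=None):
        self.clock = clock if clock is not None else SystemClock()
        self.started_at = self.clock.get_time()

    def elapsed_ms(self):
        return self.clock.get_time() - self.started_at


clock = FakeClock(start_ms=1000)
scheduler = Scheduler(clock)
clock.advance(250)
elapsed = scheduler.elapsed_ms()
```

A test can now assert on timing-dependent behaviour without sleeping.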

Author: CrazyJvm <crazyjvm@gmail.com>

Closes #976 from CrazyJvm/clock and squashes the following commits:

6779a4c [CrazyJvm] Use pluggable clock in DAGScheduler
2014-06-05 17:44:46 -07:00
CodingCat 89cdbb087c SPARK-1677: allow user to disable output dir existence checking
https://issues.apache.org/jira/browse/SPARK-1677

For compatibility with older versions of Spark, it would be nice to have an option `spark.hadoop.validateOutputSpecs` (default true) that lets the user disable the output directory existence check.

Author: CodingCat <zhunansjtu@gmail.com>

Closes #947 from CodingCat/SPARK-1677 and squashes the following commits:

7930f83 [CodingCat] miao
c0c0e03 [CodingCat] bug fix and doc update
5318562 [CodingCat] bug fix
13219b5 [CodingCat] allow user to disable output dir existence checking
2014-06-05 11:39:35 -07:00
Takuya UESHIN 7c160293d6 [SPARK-2029] Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.
Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #974 from ueshin/issues/SPARK-2029 and squashes the following commits:

e19e8f4 [Takuya UESHIN] Bump version number to 1.1.0-SNAPSHOT.
2014-06-05 11:27:33 -07:00
Colin McCabe 1765c8d0dd SPARK-1518: FileLogger: Fix compile against Hadoop trunk
In Hadoop trunk (currently Hadoop 3.0.0), the deprecated
FSDataOutputStream#sync() method has been removed.  Instead, we should
call FSDataOutputStream#hflush, which does the same thing as the
deprecated method used to do.

Author: Colin McCabe <cmccabe@cloudera.com>

Closes #898 from cmccabe/SPARK-1518 and squashes the following commits:

752b9d7 [Colin McCabe] FileLogger: Fix compile against Hadoop trunk
2014-06-04 15:56:29 -07:00
Sean Owen d341b17c2a SPARK-1973. Add randomSplit to JavaRDD (with tests, and tidy Java tests)
I'd like to use randomSplit through the Java API, and would like to add a convenience wrapper for this method to JavaRDD. This is fairly trivial. (In fact, is the intent that JavaRDD not wrap every RDD method, and that sometimes users should just use JavaRDD.wrapRDD()?)

Along the way, I added tests for it, and also touched up the Java API test style and behavior. This is maybe the more useful part of this small change.
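
For context, the behaviour of a weighted random split can be sketched in plain Python. This single-pass version is an illustrative assumption and is not how Spark implements `randomSplit`; the function name and the `seed` parameter are likewise hypothetical.

```python
import random


def random_split(items, weights, seed=None):
    """Assign each item to exactly one split, with probability
    proportional to that split's weight."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)

    # Cumulative upper bounds of each split's slice of [0, 1).
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift

    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for item in items:
        draw = rng.random()
        for index, upper in enumerate(bounds):
            if draw < upper:
                splits[index].append(item)
                break
    return splits


train, test = random_split(range(1000), [0.7, 0.3], seed=42)
```

Weights need not sum to one because they are normalized first, and a split with weight zero simply stays empty.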

Author: Sean Owen <sowen@cloudera.com>
Author: Xiangrui Meng <meng@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Xiangrui Meng <meng@databricks.com>

Closes #919 from srowen/SPARK-1973 and squashes the following commits:

148cb7b [Sean Owen] Some final Java test polish, while we are at it
1fc3f3e [Xiangrui Meng] more cleaning on Java 8 tests
9ebc57f [Sean Owen] Use accumulator instead of temp files to test foreach
5efb0be [Sean Owen] Add Java randomSplit, and unit tests (including for sample)
5dcc158 [Sean Owen] Simplified Java 8 test with new language features, and fixed the name of MLB's greatest team
91a1769 [Sean Owen] Touch up minor style issues in existing Java API suite test
2014-06-04 11:27:08 -07:00
Kan Zhang c402a4a685 [SPARK-1817] RDD.zip() should verify partition sizes for each partition
RDD.zip() will throw an exception if it finds partition sizes are not the same.
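
The check can be illustrated with a strict zip over two iterables in Python. This is a sketch of the idea for a single pair of partitions; the function name and error message are illustrative, not Spark's.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that cannot collide with real elements


def zip_partition(left, right):
    """Pair up elements and fail if one side runs out before the other."""
    for a, b in zip_longest(left, right, fillvalue=_MISSING):
        if a is _MISSING or b is _MISSING:
            raise ValueError("Can only zip partitions with the same number of elements")
        yield a, b


pairs = list(zip_partition([1, 2], ["a", "b"]))
```

Python's built-in `zip` silently truncates to the shorter input, which is exactly the kind of silent mismatch this check is meant to surface.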

Author: Kan Zhang <kzhang@apache.org>

Closes #944 from kanzhang/SPARK-1817 and squashes the following commits:

c073848 [Kan Zhang] [SPARK-1817] Cosmetic updates
524c670 [Kan Zhang] [SPARK-1817] RDD.zip() should verify partition sizes for each partition
2014-06-03 22:47:18 -07:00
Sean Owen 4ca0625669 SPARK-1806 (addendum) Use non-deprecated methods in Mesos 0.18
The update to Mesos 0.18 caused some deprecation warnings in the build. The change to the non-deprecated version is straightforward as it emulates what the Mesos driver does with the deprecated method anyway (c5aa1dd221/src/sched/sched.cpp (L1354))

Author: Sean Owen <sowen@cloudera.com>

Closes #920 from srowen/SPARK-1806 and squashes the following commits:

8d76b6a [Sean Owen] Use non-deprecated methods in Mesos 0.18
2014-06-03 22:37:20 -07:00
Reynold Xin 1faef149f7 SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog.
I also corrected some errors in the previous HLL approximate-count API, including the fact that relativeSD wasn't really a measure of error (even though we used it to test error bounds in test results).

Author: Reynold Xin <rxin@apache.org>

Closes #897 from rxin/hll and squashes the following commits:

4d83f41 [Reynold Xin] New error bound and non-randomness.
f154ea0 [Reynold Xin] Added a comment on the value bound for testing.
e367527 [Reynold Xin] One more round of code review.
41e649a [Reynold Xin] Update final mima list.
9e320c8 [Reynold Xin] Incorporate code review feedback.
e110d70 [Reynold Xin] Merge branch 'master' into hll
354deb8 [Reynold Xin] Added comment on the Mima exclude rules.
acaa524 [Reynold Xin] Added the right exclude rules in MimaExcludes.
6555bfe [Reynold Xin] Added a default method and re-arranged MimaExcludes.
1db1522 [Reynold Xin] Excluded util.SerializableHyperLogLog from MIMA check.
9221b27 [Reynold Xin] Merge branch 'master' into hll
88cfe77 [Reynold Xin] Updated documentation and restored the old incorrect API to maintain API compatibility.
1294be6 [Reynold Xin] Updated HLL+.
e7786cb [Reynold Xin] Merge branch 'master' into hll
c0ef0c2 [Reynold Xin] SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog.
2014-06-03 18:37:40 -07:00
Ankur Dave b1feb60209 [SPARK-1991] Support custom storage levels for vertices and edges
This PR adds support for specifying custom storage levels for the vertices and edges of a graph. This enables GraphX to handle graphs larger than memory size by specifying MEMORY_AND_DISK and then repartitioning the graph to use many small partitions, each of which does fit in memory. Spark will then automatically load partitions from disk as needed.

The user specifies the desired vertex and edge storage levels when building the graph by passing them to the graph constructor. These are then stored in the `targetStorageLevel` attribute of the VertexRDD and EdgeRDD respectively. Whenever GraphX needs to cache a VertexRDD or EdgeRDD (because it plans to use it more than once, for example), it uses the specified target storage level. Also, when the user calls `Graph#cache()`, the vertices and edges are persisted using their target storage levels.

In order to facilitate propagating the target storage levels across VertexRDD and EdgeRDD operations, we remove raw calls to the constructors and instead introduce the `withPartitionsRDD` and `withTargetStorageLevel` methods.

I tested this change by running PageRank and triangle count on a severely memory-constrained cluster (1 executor with 300 MB of memory, and a 1 GB graph). Before this PR, these algorithms used to fail with OutOfMemoryErrors. With this PR, and using the DISK_ONLY storage level, they succeed.

Author: Ankur Dave <ankurdave@gmail.com>

Closes #946 from ankurdave/SPARK-1991 and squashes the following commits:

ce17d95 [Ankur Dave] Move pickStorageLevel to StorageLevel.fromString
ccaf06f [Ankur Dave] Shadow members in withXYZ() methods rather than using underscores
c34abc0 [Ankur Dave] Exclude all of GraphX from compatibility checks vs. 1.0.0
c5ca068 [Ankur Dave] Revert "Exclude all of GraphX from binary compatibility checks"
34bcefb [Ankur Dave] Exclude all of GraphX from binary compatibility checks
6fdd137 [Ankur Dave] [SPARK-1991] Support custom storage levels for vertices and edges
2014-06-03 14:54:26 -07:00
Wenchen Fan(Cloud) 45e9bc85db [SPARK-1912] fix compress memory issue during reduce
When we need to read a compressed block, we first create a compression stream instance (LZF or Snappy) and use it to wrap that block.
Say a reducer task needs to read 1000 local shuffle blocks: it first prepares to read those 1000 blocks, which means creating 1000 compression stream instances to wrap them. Initializing a compression instance allocates some memory, so having many instances alive at the same time is a problem.
The reducer actually reads the shuffle blocks one by one, so we can initialize the compression instances lazily.
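
The lazy-initialization idea can be sketched in plain Python as follows. The class name `LazyProxyIterator` appears in the squashed commits below, but this body is only an illustrative guess at the technique, not the Scala code from the patch; `open_block` is a hypothetical stand-in for creating a decompression stream.

```python
class LazyProxyIterator:
    """Iterator that defers building the real iterator until the first element is requested."""

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable that creates the real iterator
        self._inner = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._inner is None:
            # The expensive setup (for example a compression stream) happens here,
            # only when a consumer actually starts reading this block.
            self._inner = iter(self._factory())
        return next(self._inner)


opened = []


def open_block(block_id):
    """Hypothetical stand-in for wrapping a block in a decompression stream."""
    opened.append(block_id)
    return iter([block_id * 10, block_id * 10 + 1])


# Preparing 1000 blocks allocates nothing until a block is actually read.
blocks = [LazyProxyIterator(lambda i=i: open_block(i)) for i in range(1000)]
opened_before_read = list(opened)
first = next(blocks[0])
```

After the last line only block 0 has been opened; the other 999 wrappers hold nothing but a callable.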

Author: Wenchen Fan(Cloud) <cloud0fan@gmail.com>

Closes #860 from cloud-fan/fix-compress and squashes the following commits:

0924a6b [Wenchen Fan(Cloud)] rename 'doWork' into 'getIterator'
07f32c2 [Wenchen Fan(Cloud)] move the LazyProxyIterator to dataDeserialize
d80c426 [Wenchen Fan(Cloud)] remove empty lines in short class
2c8adb2 [Wenchen Fan(Cloud)] add inline comment
8ebff77 [Wenchen Fan(Cloud)] fix compress memory issue during reduce
2014-06-03 13:18:20 -07:00
Syed Hashmi 7782a304ad [SPARK-1942] Stop clearing spark.driver.port in unit tests
Stop resetting spark.driver.port in unit tests (Scala, Java and Python).

Author: Syed Hashmi <shashmi@cloudera.com>
Author: CodingCat <zhunansjtu@gmail.com>

Closes #943 from syedhashmi/master and squashes the following commits:

885f210 [Syed Hashmi] Removing unnecessary file (created by mergetool)
b8bd4b5 [Syed Hashmi] Merge remote-tracking branch 'upstream/master'
b895e59 [Syed Hashmi] Revert "[SPARK-1784] Add a new partitioner"
57b6587 [Syed Hashmi] Revert "[SPARK-1784] Add a balanced partitioner"
1574769 [Syed Hashmi] [SPARK-1942] Stop clearing spark.driver.port in unit tests
4354836 [Syed Hashmi] Revert "SPARK-1686: keep schedule() calling in the main thread"
fd36542 [Syed Hashmi] [SPARK-1784] Add a balanced partitioner
6668015 [CodingCat] SPARK-1686: keep schedule() calling in the main thread
4ca94cc [Syed Hashmi] [SPARK-1784] Add a new partitioner
2014-06-03 12:04:47 -07:00
witgo d8c005d537 Improve maven plugin configuration
Author: witgo <witgo@qq.com>

Closes #786 from witgo/maven_plugin and squashes the following commits:

5de86a2 [witgo] Merge branch 'master' of https://github.com/apache/spark into maven_plugin
c35ef73 [witgo] Improve maven plugin configuration
2014-05-31 14:36:27 -07:00
Aaron Davidson 9909efc10a SPARK-1839: PySpark RDD#take() shouldn't always read from driver
This patch simply ports over the Scala implementation of RDD#take(), which reads the first partition at the driver, then decides how many more partitions it needs to read and will possibly start a real job if it's more than 1. (Note that SparkContext#runJob(allowLocal=true) only runs the job locally if there's 1 partition selected and no parent stages.)
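
The incremental scan described above can be sketched in plain Python over a list of lists standing in for partitions. This is not the PySpark code from the patch: the function signature and, in particular, the 1.5x extrapolation used to pick the next batch size are assumptions chosen for illustration.

```python
def take(partitions, n):
    """Return the first n items, scanning partitions in growing batches.

    Hypothetical sketch; `partitions` is a list of lists, not an RDD.
    """
    taken = []
    scanned = 0
    total = len(partitions)
    while len(taken) < n and scanned < total:
        if scanned == 0:
            to_try = 1  # first pass: read only the first partition
        elif not taken:
            to_try = total - scanned  # nothing found so far: try all the rest
        else:
            # Assumed heuristic: extrapolate from the yield so far, with 50% headroom.
            to_try = max(1, int(1.5 * n * scanned / len(taken)) - scanned)
        for part in partitions[scanned:scanned + to_try]:
            remaining = n - len(taken)
            taken.extend(part[:remaining])
        scanned = min(total, scanned + to_try)
    return taken


result = take([[1, 2], [3, 4], [5, 6]], 3)
```

Here the third partition is never read, because the first two already supply the three requested items.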

Author: Aaron Davidson <aaron@databricks.com>

Closes #922 from aarondav/take and squashes the following commits:

fa06df9 [Aaron Davidson] SPARK-1839: PySpark RDD#take() shouldn't always read from driver
2014-05-31 13:04:57 -07:00
Aaron Davidson 7d52777eff Super minor: Close inputStream in SparkSubmitArguments
`Properties#load()` doesn't close the InputStream, but it'd be closed after being GC'd anyway...

Also changed file.getName to file, because getName only shows the filename. This will show the full (possibly relative) path, which is less confusing if it's not found.

Author: Aaron Davidson <aaron@databricks.com>

Closes #914 from aarondav/tiny and squashes the following commits:

db9d072 [Aaron Davidson] Super minor: Close inputStream in SparkSubmitArguments
2014-05-31 12:36:58 -07:00
Chen Chao 9ecc40d3ae correct tiny comment error
Author: Chen Chao <crazyjvm@gmail.com>

Closes #928 from CrazyJvm/patch-8 and squashes the following commits:

144328b [Chen Chao] correct tiny comment error
2014-05-31 00:06:49 -07:00
Zhen Peng ff562b2396 [SPARK-1901] worker should make sure executor has exited before updating executor's info
https://issues.apache.org/jira/browse/SPARK-1901

Author: Zhen Peng <zhenpeng01@baidu.com>

Closes #854 from zhpengg/bugfix-worker-kills-executor and squashes the following commits:

21d380b [Zhen Peng] add some error messages
506cea6 [Zhen Peng] add some docs for killProcess()
a0b9860 [Zhen Peng] [SPARK-1901] worker should make sure executor has exited before updating executor's info
2014-05-30 10:12:51 -07:00
witgo 4dbb27b0cf [SPARK-1712]: TaskDescription instance is too big causes Spark to hang
Author: witgo <witgo@qq.com>

Closes #694 from witgo/SPARK-1712_new and squashes the following commits:

0f52483 [witgo] review commit
83ce29b [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
52e6752 [witgo] reset test SparkContext
63636b6 [witgo] review commit
44a59ee [witgo] review commit
3b6d48c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
926bd6a [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
9a5cfad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
03cc562 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
b0930b0 [witgo] review commit
b1174bd [witgo] merge master
f76679b [witgo] merge master
689495d [witgo] fix scala style bug
1d35c3c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
062c182 [witgo] fix small bug for code style
0a428cf [witgo] add unit tests
158b2dc [witgo] review commit
4afe71d [witgo] review commit
9e4ffa7 [witgo] review commit
1d35c7d [witgo] fix hang
7965580 [witgo] fix Statement order
0e29eac [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
3ea1ca1 [witgo] remove duplicate serialize
743a7ad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
86e2048 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
2a89adc [witgo] SPARK-1712: TaskDescription instance is too big causes Spark to hang
2014-05-28 15:57:05 -07:00
lianhuiwang 95e4c9c6fb bugfix worker DriverStateChanged state should match DriverState.FAILED

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #864 from lianhuiwang/master and squashes the following commits:

480ce94 [lianhuiwang] address aarondav comments
f2b5970 [lianhuiwang] bugfix worker DriverStateChanged state should match DriverState.FAILED
2014-05-27 11:53:38 -07:00
zsxwing 549830b0db SPARK-1932: Fix race conditions in onReceiveCallback and cachedPeers
`var cachedPeers: Seq[BlockManagerId] = null` is used in `def replicate(blockId: BlockId, data: ByteBuffer, level: StorageLevel)` without proper protection.

There are two places that call `replicate(blockId, bytesAfterPut, level)`:
* 17f3075bc4/core/src/main/scala/org/apache/spark/storage/BlockManager.scala (L644) runs in `connectionManager.futureExecContext`
* 17f3075bc4/core/src/main/scala/org/apache/spark/storage/BlockManager.scala (L752) `doPut` runs in `connectionManager.handleMessageExecutor`. `org.apache.spark.storage.BlockManagerWorker` calls `blockManager.putBytes` in `connectionManager.handleMessageExecutor`.

As they run in different `Executor`s, this is a race condition that may cause the memory pointed to by `cachedPeers` to be incorrect even if `cachedPeers != null`.

The race condition in `onReceiveCallback` is that it is set in `BlockManagerWorker` but read from a different thread in `ConnectionManager.handleMessageExecutor`.
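
One common way to protect a lazily populated field that is shared between threads is to guard both the check and the assignment with a lock. The plain-Python sketch below shows that pattern; it is not necessarily how this patch fixes the issue, and `PeerCache` and `fetch_peers` are hypothetical names.

```python
import threading


class PeerCache:
    """Lazily fetched peer list that is safe to read from several threads."""

    def __init__(self, fetch_peers):
        self._fetch_peers = fetch_peers  # callable returning the current peers
        self._lock = threading.Lock()
        self._peers = None

    def get(self):
        # The null check and the assignment happen under the same lock, so no
        # thread can observe a half-initialized value or trigger a second fetch.
        with self._lock:
            if self._peers is None:
                self._peers = list(self._fetch_peers())
            return self._peers


fetch_calls = []


def fetch_peers():
    fetch_calls.append(1)
    return ["peer-1", "peer-2"]


cache = PeerCache(fetch_peers)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get())) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight threads see the same list, and the fetch runs exactly once.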

Author: zsxwing <zsxwing@gmail.com>

Closes #887 from zsxwing/SPARK-1932 and squashes the following commits:

524f69c [zsxwing] SPARK-1932: Fix race conditions in onReceiveCallback and cachedPeers
2014-05-26 23:17:39 -07:00
Reynold Xin 90e281b55a SPARK-1933: Throw a more meaningful exception when a directory is passed to addJar/addFile.
https://issues.apache.org/jira/browse/SPARK-1933

Author: Reynold Xin <rxin@apache.org>

Closes #888 from rxin/addfile and squashes the following commits:

8c402a3 [Reynold Xin] Updated comment.
ff6c162 [Reynold Xin] SPARK-1933: Throw a more meaningful exception when a directory is passed to addJar/addFile.
2014-05-26 22:05:23 -07:00
Reynold Xin ef690e1f69 Fixed the error message for OutOfMemoryError in DAGScheduler. 2014-05-26 21:31:27 -07:00
Zhen Peng 8d271c90fa SPARK-1929 DAGScheduler suspended by local task OOM
DAGScheduler does not handle local task OOM properly, and will wait for the job result forever.

Author: Zhen Peng <zhenpeng01@baidu.com>

Closes #883 from zhpengg/bugfix-dag-scheduler-oom and squashes the following commits:

76f7eda [Zhen Peng] remove redundant memory allocations
aa63161 [Zhen Peng] SPARK-1929 DAGScheduler suspended by local task OOM
2014-05-26 21:30:25 -07:00
Patrick Wendell b6d22af040 HOTFIX: Add no-arg SparkContext constructor in Java
Self explanatory.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #878 from pwendell/java-constructor and squashes the following commits:

2cc1605 [Patrick Wendell] HOTFIX: Add no-arg SparkContext constructor in Java
2014-05-25 20:13:32 -07:00
Zhen Peng 4e4831b8fa [SPARK-1886] check executor id existence when executor exit
Author: Zhen Peng <zhenpeng01@baidu.com>

Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits:

cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit
2014-05-24 20:40:19 -07:00
Andrew Or 5081a0a9d4 [SPARK-1900 / 1918] PySpark on YARN is broken
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails. This time it is because Python does not understand URI schemes.

This PR fixes this by automatically resolving all paths passed as command line arguments to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For Python, we strip the URI scheme before we actually try to run the file.
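
The two steps above, resolving a bare path to a URI and stripping the scheme again before handing the file to Python, can be sketched as follows. This is a plain-Python illustration that assumes POSIX-style paths only; it is not the Scala `resolveURIs` code from this PR, and the function names are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def resolve_uri(path):
    """Return path unchanged if it already has a scheme, else an absolute file: URI.

    Assumes POSIX-style paths; Windows drive letters are not handled here.
    """
    if urlparse(path).scheme:
        return path
    return "file:" + posixpath.abspath(path)


def strip_scheme(uri):
    """Drop the URI scheme so the result can be passed to the Python interpreter."""
    return urlparse(uri).path


resolved = resolve_uri("/path/to/sheep.py")
runnable = strip_scheme(resolved)
```

With this sketch, `resolved` is `file:/path/to/sheep.py` and `runnable` is `/path/to/sheep.py`, while a path that already carries a scheme, such as an `hdfs://` URI, is left untouched by `resolve_uri`.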

Much of the code was originally written by @mengxr. Tested on a YARN cluster. More tests pending.

Author: Andrew Or <andrewor14@gmail.com>

Closes #853 from andrewor14/submit-paths and squashes the following commits:

0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
2014-05-24 18:01:49 -07:00