Commit graph

2142 commits

Author SHA1 Message Date
Josh Rosen fc5b2e8b83 Merge pull request #457 from markhamstra/commutative
Add commutative requirement for 'reduce' to Python docstring.
2013-02-09 15:54:48 -08:00
Mark Hamstra b7a1fb5c5d Add commutative requirement for 'reduce' to Python docstring. 2013-02-09 12:14:11 -08:00
Matei Zaharia 51db4c1f30 Merge pull request #453 from markhamstra/commutative
Change docs on 'reduce' since the merging of local reduces no longer pre...
2013-02-09 10:36:30 -08:00
Mark Hamstra b8863a79d3 Merge branch 'master' of https://github.com/mesos/spark into commutative
Conflicts:
	core/src/main/scala/spark/RDD.scala
2013-02-08 18:26:00 -08:00
Matei Zaharia b53174a6f3 Merge pull request #454 from MLnick/ipython
SPARK-685 Adding IPYTHON environment variable support for launching pyspark using ...
2013-02-07 18:29:04 -08:00
Nick Pentreath 21d3946d17 Adding IPYTHON environment variable support for launching pyspark using ipython shell 2013-02-07 16:54:31 +02:00
Mark Hamstra 934a53c8b6 Change docs on 'reduce' since the merging of local reduces no longer preserves
ordering, so the reduce function must also be commutative.
2013-02-05 22:19:58 -08:00
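Illustration for commit 934a53c8b6 above: a minimal Python sketch, not Spark code, of why a reduce function must be commutative once per-partition results can be merged in any order. The partition contents and the `merge_in_order` helper are hypothetical and exist only for this example.

```python
from functools import reduce
from operator import add

def merge_in_order(partials, arrival_order, f):
    """Merge per-partition results in the order they arrive (hypothetical helper)."""
    return reduce(f, [partials[i] for i in arrival_order])

# Hypothetical data split across three partitions.
partitions = [["a", "b"], ["c"], ["d", "e"]]

# Local reduce on each partition. String concatenation is associative
# but not commutative.
local_concat = [reduce(add, p) for p in partitions]   # "ab", "c", "de"
# Integer addition is both associative and commutative.
local_counts = [len(p) for p in partitions]           # 2, 1, 2

concat_in_order = merge_in_order(local_concat, [0, 1, 2], add)
concat_shuffled = merge_in_order(local_concat, [2, 0, 1], add)
count_in_order = merge_in_order(local_counts, [0, 1, 2], add)
count_shuffled = merge_in_order(local_counts, [2, 0, 1], add)

print(concat_in_order, concat_shuffled)   # the two strings differ
print(count_in_order, count_shuffled)     # the two counts agree
```

With a function that is only associative, the result depends on arrival order. With one that is also commutative, it does not.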
Matei Zaharia 9cfa068379 Merge pull request #450 from stephenh/inlinemergepair
Inline mergePair to look more like the narrow dep branch.
2013-02-05 18:28:44 -08:00
Matei Zaharia 03eefbb200 Merge pull request #451 from stephenh/fixdeathpactexception
Handle Terminated to avoid endless DeathPactExceptions.
2013-02-05 18:27:54 -08:00
Stephen Haberman 870b2aaf5d Merge branch 'master' into fixdeathpactexception
Conflicts:
	core/src/main/scala/spark/deploy/worker/Worker.scala
2013-02-05 20:27:09 -06:00
Matei Zaharia a4611d66f0 Merge pull request #449 from stephenh/longerdriversuite
Increase DriverSuite timeout.
2013-02-05 17:58:22 -08:00
Stephen Haberman 0e19093fd8 Handle Terminated to avoid endless DeathPactExceptions.
Credit to Roland Kuhn, Akka's tech lead, for pointing out this
very obvious fix, but StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operations in preStart are async.

So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages which, since we didn't
handle them, turned into DeathPactExceptions, which started
a postRestart/preStart infinite loop.
2013-02-05 18:58:00 -06:00
Stephen Haberman 1ba3393ceb Increase DriverSuite timeout. 2013-02-05 17:56:50 -06:00
Stephen Haberman 8bd0e888f3 Inline mergePair to look more like the narrow dep branch.
No functionality changes, I think this is just more consistent
given mergePair isn't called multiple times/recursive.

Also added a comment to explain the usual case of having two parent RDDs.
2013-02-05 17:50:25 -06:00
Matei Zaharia 2d9eca9fbb Merge pull request #447 from pwendell/streaming-constructor
Streaming constructor which takes JavaSparkContext
2013-02-05 11:45:44 -08:00
Patrick Wendell 7eea64aa4c Streaming constructor which takes JavaSparkContext
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00
Matei Zaharia f6ec547ea7 Small fix to test for distinct 2013-02-04 13:14:54 -08:00
Matei Zaharia aa4ee1e9e5 Fix failing test 2013-02-04 11:06:31 -08:00
Matei Zaharia f7b4e428be Merge pull request #445 from JoshRosen/pyspark_fixes
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
2013-02-03 21:36:36 -08:00
Josh Rosen e61729113d Remove unnecessary doctest __main__ methods. 2013-02-03 21:29:40 -08:00
Matei Zaharia 3bfaf3ab1d Merge pull request #379 from stephenh/sparkmem
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Matei Zaharia 88ee6163a1 Merge pull request #422 from squito/blockmanager_info
RDDInfo available from SparkContext
2013-02-02 23:44:13 -08:00
Matei Zaharia cd4ca93679 Merge pull request #436 from stephenh/removeextraloop
Once we find a split with no block, we don't have to look for more.
2013-02-02 23:39:28 -08:00
Matei Zaharia d5daaab381 Merge pull request #442 from stephenh/fixsystemnames
Fix createActorSystem not actually using the systemName parameter.
2013-02-02 23:38:46 -08:00
Matei Zaharia 9163c3705d Formatting 2013-02-02 23:34:47 -08:00
Josh Rosen 8fbd5380b7 Fetch fewer objects in PySpark's take() method. 2013-02-03 06:44:49 +00:00
Josh Rosen 2415c18f48 Fix reporting of PySpark doctest failures. 2013-02-03 06:44:11 +00:00
Matei Zaharia 34a7bcdb3a Formatting 2013-02-02 19:40:30 -08:00
Matei Zaharia 85019d76a4 Merge pull request #427 from woggling/dag-sched-tests
Tests for DAGScheduler
2013-02-02 19:09:59 -08:00
Stephen Haberman 7aba123f0c Further simplify checking for Nil. 2013-02-02 13:53:28 -06:00
Charles Reiss 6107957962 Merge remote-tracking branch 'base/master' into dag-sched-tests
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-02-02 00:33:30 -08:00
Stephen Haberman cae8a6795c Fix dangling old variable names. 2013-02-02 02:15:39 -06:00
Stephen Haberman 696eec32c9 Move executorMemory up into SchedulerBackend. 2013-02-02 02:03:26 -06:00
Stephen Haberman 103c375ba0 Merge branch 'master' into sparkmem 2013-02-02 01:57:18 -06:00
Stephen Haberman 28e0cb9f31 Fix createActorSystem not actually using the systemName parameter.
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.

This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.

Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Charles Reiss 1fd5ee323d Code review changes: add sc.stop; style of multiline comments; parens on procedure calls. 2013-02-01 22:33:38 -08:00
Matei Zaharia ae26911ec0 Add back test for distinct without parens 2013-02-01 21:07:24 -08:00
Matei Zaharia 7ae4b6a23d Merge pull request #441 from stephenh/lessnoisyakka
Reduce the amount of duplicate logging Akka does to stdout.
2013-02-01 21:03:37 -08:00
Stephen Haberman 12c1eb4756 Reduce the amount of duplicate logging Akka does to stdout.
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.

See:

http://doc.akka.io/docs/akka/2.0/general/configuration.html

    # Log level for the very basic logger activated during AkkaApplication startup
    # Options: ERROR, WARNING, INFO, DEBUG
    # stdout-loglevel = "WARNING"
2013-02-01 21:21:44 -06:00
Matei Zaharia 8b3041c723 Reduced the memory usage of reduce and similar operations
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
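Illustration for commit 8b3041c723 above: a rough Python sketch, not the actual Spark driver code, contrasting the two strategies the message describes. The function names and the `partial_results` generator are assumptions made for this example.

```python
def reduce_after_buffering(results, f):
    """Old strategy as described: hold every partial result, then merge."""
    buffered = list(results)   # memory grows with the number of results
    if not buffered:
        raise ValueError("no results to reduce")
    acc = buffered[0]
    for value in buffered[1:]:
        acc = f(acc, value)
    return acc

def reduce_incrementally(results, f):
    """New strategy as described: merge each result as soon as it arrives."""
    acc = None
    seen_any = False
    for value in results:      # only the running value is kept
        if seen_any:
            acc = f(acc, value)
        else:
            acc = value
            seen_any = True
    if not seen_any:
        raise ValueError("no results to reduce")
    return acc

def partial_results():
    """Hypothetical stream of per-task results reaching the driver."""
    yield from [3, 1, 4, 1, 5]

def plus(a, b):
    return a + b

buffered_total = reduce_after_buffering(partial_results(), plus)
incremental_total = reduce_incrementally(partial_results(), plus)
```

Both functions return the same value for an associative and commutative `f`. The incremental version keeps a single accumulator instead of a list of all results.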
Matei Zaharia 4529876db0 Merge branch 'master' of github.com:mesos/spark 2013-02-01 14:07:38 -08:00
Matei Zaharia 9970926ede formatting 2013-02-01 14:07:34 -08:00
Matei Zaharia 79c24abe4c Merge pull request #432 from stephenh/moreprivacy
Add more private declarations.
2013-02-01 14:06:55 -08:00
Matei Zaharia de340ddf0b Merge pull request #437 from stephenh/cancelmetacleaner
Stop BlockManagers metadataCleaner.
2013-02-01 12:59:25 -08:00
Matei Zaharia 0455650713 Merge pull request #439 from JoshRosen/spark-580
Use spark.local.dir for PySpark temp files (SPARK-580).
2013-02-01 12:07:42 -08:00
Josh Rosen e211f405bc Use spark.local.dir for PySpark temp files (SPARK-580). 2013-02-01 11:50:27 -08:00
Matei Zaharia b6a6092177 Merge pull request #438 from JoshRosen/spark-674
Do not launch JavaGateways on workers (SPARK-674).
2013-02-01 11:29:47 -08:00
Josh Rosen 9cc6ff9c4e Do not launch JavaGateways on workers (SPARK-674).
The problem was that the gateway was being initialized whenever the
pyspark.context module was loaded.  The fix uses lazy initialization
that occurs only when SparkContext instances are actually constructed.

I also made the gateway and jvm variables private.

This change results in ~3-4x performance improvement when running the
PySpark unit tests.
2013-02-01 11:13:10 -08:00
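Illustration for commit 9cc6ff9c4e above: a minimal sketch of the lazy-initialization pattern the message describes, assuming a class-level private slot that is filled on first construction. `LazyContext`, `_launch_gateway`, and `launch_count` are hypothetical names, not PySpark's API, and the "gateway" here is a plain placeholder object.

```python
class LazyContext:
    """Hypothetical stand-in for a context that needs an expensive gateway."""

    _gateway = None      # private; nothing is launched at import time
    launch_count = 0     # bookkeeping for this example only

    @classmethod
    def _launch_gateway(cls):
        # Placeholder for the expensive start-up step.
        cls.launch_count += 1
        return object()

    @classmethod
    def _ensure_gateway(cls):
        # Launch on first use, then reuse the same instance.
        if cls._gateway is None:
            cls._gateway = cls._launch_gateway()
        return cls._gateway

    def __init__(self):
        self._gw = LazyContext._ensure_gateway()
```

Defining or importing the class launches nothing. The first `LazyContext()` call performs the launch, and later instances share the same gateway, so code that only imports the module never pays the start-up cost.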
Imran Rashid c6190067ae remove unneeded (and unused) filter on block info 2013-02-01 09:55:25 -08:00
Stephen Haberman 59c57e48df Stop BlockManagers metadataCleaner. 2013-02-01 10:34:02 -06:00