597 commits

Author SHA1 Message Date
Matei Zaharia 6607f546cc Added an option to spread out jobs in the standalone mode. 2012-11-08 23:13:12 -08:00
Matei Zaharia 66cbdee941 Fix for connections not being reused (from Josh Rosen) 2012-11-08 09:53:40 -08:00
Imran Rashid 809b2bb1fe fix bug in getting slave id out of mesos 2012-11-08 00:34:28 -08:00
Matei Zaharia bb1bce7924 Various fixes to standalone mode and web UI:
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
2012-11-07 16:49:53 -08:00
Matei Zaharia e2b8477487 Made Akka timeout and message frame size configurable, and upped the defaults 2012-11-06 15:58:05 -08:00
Shivaram Venkataraman a7d967a1ca Remove unnecessary hash-map put in MemoryStore 2012-11-01 10:46:38 -07:00
root e782187b4a Don't throw an error in the block manager when a block is cached on the master due to
a locally computed operation

Conflicts:

	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2012-10-26 00:33:45 -07:00
Matei Zaharia f63a40fd99 Strip leading mesos:// in URLs passed to Mesos 2012-10-24 21:52:13 -07:00
Matei Zaharia d290e964ea Merge pull request #281 from rxin/memreport
Added a method to report slave memory status; force serialize accumulator update in local mode.
2012-10-23 22:04:35 -07:00
Matei Zaharia 0bd20c63e2 Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev
Conflicts:
	core/src/main/scala/spark/Dependency.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
2012-10-23 22:01:45 -07:00
Thomas Dudziak d9c2a89c57 Support for Hadoop 2 distributions such as cdh4 2012-10-18 16:08:54 -07:00
Reynold Xin 63fae9bc23 Serialize accumulator updates in TaskResult for local mode. 2012-10-15 21:38:28 -07:00
Reynold Xin 42d20fa8da Added a method to report slave memory status. 2012-10-14 22:30:53 -07:00
Matei Zaharia 64dbf8d372 Made ShuffleDependency automatically find a shuffle ID for itself 2012-10-14 10:00:22 -07:00
Matei Zaharia 8815aeba0c Take executor environment vars as an argument to SparkContext 2012-10-13 15:31:11 -07:00
Josh Rosen 33cd3a0c12 Remove map-side combining from ShuffleMapTask.
This separation of concerns simplifies the 
ShuffleDependency and ShuffledRDD interfaces.

Map-side combining can be performed in a
mapPartitions() call prior to shuffling the RDD.

I don't anticipate this having much of a 
performance impact: in both approaches, each tuple
is hashed twice: once in the bucket partitioning
and once in the combiner's hashtable.  The same
steps are being performed, but in a different
order and through one extra Iterator.
2012-10-13 14:59:20 -07:00
Josh Rosen 10bcd217d2 Remove mapSideCombine field from Aggregator.
Instead, the presence or absence of a ShuffleDependency's aggregator
will control whether map-side combining is performed.
2012-10-13 14:59:20 -07:00
Josh Rosen 4775c55641 Change ShuffleFetcher to return an Iterator. 2012-10-13 14:59:20 -07:00
Josh Rosen 110832e88f Add helper methods to Aggregator. 2012-10-13 14:57:56 -07:00
Denny 0700d1920a Protect from null env variables in mesos. 2012-10-13 13:57:59 -07:00
Denny 21047d923e Protect from setting null environment variables. 2012-10-13 13:44:24 -07:00
Denny fa41d50f7d Don't use system envs for Mesos. 2012-10-13 13:15:50 -07:00
Denny 67c42a41d0 Let the user specify environment variables to be passed to the Executors.
Also removed unused variables in the ExecutorRunner.
2012-10-13 13:08:44 -07:00
Matei Zaharia b4067cbad4 More doc updates, and moved Serializer to a subpackage. 2012-10-12 18:19:21 -07:00
Matei Zaharia 8d7b77bcb5 Some doc and usability improvements:
- Added a StorageLevels class for easy access to StorageLevel constants
  in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Matei Zaharia 682b2d9329 Added a test for when an RDD only partially fits in memory 2012-10-12 14:58:26 -07:00
Matei Zaharia dca496bb77 Document cartesian() operation 2012-10-12 14:46:41 -07:00
Matei Zaharia 23015ccac0 Merge pull request #271 from shivaram/block-manager-npe-fix
Change block manager to accept an ArrayBuffer
2012-10-12 14:36:28 -07:00
Shivaram Venkataraman 8577523f37 Add test to verify if RDD is computed even if block manager has insufficient
memory
2012-10-12 14:14:57 -07:00
Patrick Wendell dc8adbd359 Adding Java documentation 2012-10-11 00:49:03 -07:00
Shivaram Venkataraman 2cf40c5fd5 Change block manager to accept an ArrayBuffer instead of an iterator to ensure
that the computation can proceed even if we run out of memory to cache the
block. Update CacheTracker to use this new interface.
2012-10-11 00:42:46 -07:00
Denny d3f095f904 Fixed bug when fetching Jar dependencies.
Instead of checking currentFiles check currentJars.
2012-10-10 16:09:53 -07:00
Matei Zaharia ee2fcb2ce6 Added documentation to all the *RDDFunction classes, and moved them into
the spark package to make them more visible. Also documented various
other miscellaneous things in the API.
2012-10-09 18:38:36 -07:00
Matei Zaharia bc0bc672d0 Updates to documentation:
- Edited quick start and tuning guide to simplify them a little
- Simplified top menu bar
- Made private a SparkContext constructor parameter that was left as
  public
- Various small fixes
2012-10-09 14:30:23 -07:00
Andy Konwinski 1d79ff6028 Fixes a typo, adds scaladoc comments to SparkContext constructors. 2012-10-08 22:49:17 -07:00
Patrick Wendell ac310098ef More docs in RDD class 2012-10-08 22:25:11 -07:00
Andy Konwinski bd688940a1 A start on scaladoc for the public APIs. 2012-10-08 21:13:29 -07:00
Mosharaf Chowdhury edc67bfba8 Merge branch 'dev' into bc-fix-dev 2012-10-08 16:19:13 -07:00
Matei Zaharia efc5423210 Made compression configurable separately for shuffle, broadcast and RDDs 2012-10-07 11:30:53 -07:00
Matei Zaharia 039cc6228e Merge pull request #251 from JoshRosen/docs/internals
Document Dependency classes and make minor interface improvements
2012-10-07 09:56:53 -07:00
Reynold Xin f66c0e9561 Changed the println to logInfo in Utils.fetchFile. 2012-10-07 01:53:24 -07:00
Matei Zaharia d72db3d7dc Merge pull request #250 from rxin/dev
Fixed a bug in addFile where, if the file is specified as "file:///", the symlink is created incorrectly in local mode.
2012-10-07 00:56:53 -07:00
Reynold Xin 80f59e17e2 Fixed a bug in addFile where, if the file is specified as "file:///", the
symlink is created incorrectly in local mode.
2012-10-07 00:54:38 -07:00
Josh Rosen e10308f5a0 Make ShuffleDependency.aggregator explicitly optional.
It was confusing to be using

    new Aggregator[K, V, V](null, null, null, false)

to represent the absence of an aggregator.
2012-10-07 00:36:04 -07:00
Matei Zaharia f930fe5d81 Improve error message 2012-10-07 07:34:36 +00:00
Matei Zaharia a3bf0ce57f Don't crash on ask timeout exceptions in deploy.Client.stop() (fixes a crash in tests) 2012-10-07 07:25:41 +00:00
Matei Zaharia eca570f66a Removed the need to sleep in tests due to waiting for Akka to shut down 2012-10-07 00:17:59 -07:00
Josh Rosen 4f72066a9a Document the Dependency classes. 2012-10-07 00:05:37 -07:00
Josh Rosen 3f2571fe98 Remove unused isShuffle field from Dependency. 2012-10-07 00:03:55 -07:00
Matei Zaharia b2fc3dd902 Log message 2012-10-07 06:43:52 +00:00