Commit graph

641 commits

Author SHA1 Message Date
Denny 5e4076e3f2 Merge branch 'dev' into feature/fileserver
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2012-09-11 16:57:17 -07:00
Denny 77873d2c8e Formatting 2012-09-11 16:51:46 -07:00
Denny 24b9b37314 Subclass URLClassLoader instead of using reflection 2012-09-11 16:51:08 -07:00
Denny 31c53e917d Use stageId as index for fileSet caches. 2012-09-11 16:10:45 -07:00
Matei Zaharia 943df48348 Merge branch 'dev' of github.com:mesos/spark into dev 2012-09-11 16:00:37 -07:00
Matei Zaharia 6d7f907e73 Manually merge pull request #175 by Imran Rashid 2012-09-11 16:00:06 -07:00
Reynold Xin 7af7c79ce5 Updated the logError call from the previous commit to conform to
logError API.
2012-09-11 14:32:24 -07:00
Reynold Xin 38b9119c96 Log entire exception (including stack trace) in BlockManagerWorker. 2012-09-11 11:31:35 -07:00
Denny 4d3471dd07 Fix serialization bugs and added local cluster tests 2012-09-10 15:39:58 -07:00
Denny b864c36a30 Dynamically adding jar files and caching fileSets. 2012-09-10 12:49:09 -07:00
Denny f275fb07da General FileServer
A general fileserver for both JARs and regular files.
2012-09-10 12:48:59 -07:00
Matei Zaharia a13780670d Added a unit test for local-cluster mode and simplified some of the code involved in that 2012-09-10 12:48:58 -07:00
Denny f2ac55840c Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend 2012-09-10 12:48:58 -07:00
Denny 9ead8ab14e Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner 2012-09-10 12:48:58 -07:00
Denny 8bb3c73977 Renamed spark-cluster to spark-local. 2012-09-10 12:48:58 -07:00
Denny a367c20f49 Fix wrong counting 2012-09-10 12:48:57 -07:00
Denny 93fe331e6d Delete old DeployUtils. 2012-09-10 12:48:57 -07:00
Denny cf074f9c96 Renamed class. 2012-09-10 12:48:57 -07:00
Denny 3749f94184 Start a standalone cluster locally. 2012-09-10 12:48:57 -07:00
Matei Zaharia 995982b3c9 Added a unit test for local-cluster mode and simplified some of the code involved in that 2012-09-07 17:08:36 -07:00
Matei Zaharia 8d2fcc2832 Merge pull request #189 from dennybritz/feature/localcluster
Simulating a Spark standalone cluster locally
2012-09-07 15:43:43 -07:00
Denny 7ff9311add Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend 2012-09-07 14:09:12 -07:00
Denny 4e7b264cf7 Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner 2012-09-07 11:39:44 -07:00
root c2da64409a Randomize the order of block fetches in getMultiple 2012-09-06 23:16:26 +00:00
Denny 886183e591 Renamed spark-cluster to spark-local. 2012-09-05 17:10:54 -07:00
Reynold Xin c308fbcb79 Removed cache add/remove log messages from CacheTracker.
Added log messages on BlockManagerMaster to reflect block add/remove.
Also did some minor cleanup of storage package code.
2012-09-05 15:59:48 -07:00
Denny babbca0a2f Fix wrong counting 2012-09-04 22:04:18 -07:00
Denny 9326509f66 Delete old DeployUtils. 2012-09-04 21:15:23 -07:00
Denny 1588d4dbe6 Renamed class. 2012-09-04 21:13:25 -07:00
Denny 22dde6e020 Start a standalone cluster locally. 2012-09-04 20:56:30 -07:00
Matei Zaharia a842c63044 Minor formatting fixes 2012-09-03 16:24:00 -07:00
Harvey 3076b038f4 Start fetching a remote block when a received remote block has been passed
to the reduce function
2012-09-01 12:01:35 -07:00
Matei Zaharia 389fb4cc54 End runJob() with a SparkException when a task fails too many times in
one of the cluster schedulers.
2012-08-31 17:47:43 -07:00
Mosharaf Chowdhury 31ffe8d528 Synchronization bug fix in broadcast implementations 2012-08-30 22:26:43 -07:00
Mosharaf Chowdhury 3883532545 Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations. 2012-08-30 21:43:00 -07:00
Matei Zaharia a480dec6b2 Deserialize multi-get results in the caller's thread. This fixes an
issue with shared buffers in the KryoSerializer.
2012-08-30 20:01:06 -07:00
Reynold Xin a8a2a08a1a Added a test for testing map-side combine on/off switch. 2012-08-30 12:34:28 -07:00
Reynold Xin 5945bcdcc5 Added a new flag in Aggregator to indicate applying map side combiners. 2012-08-29 23:32:08 -07:00
Reynold Xin c68e820b2a Merge branch 'dev' of github.com:mesos/spark into dev 2012-08-29 23:01:19 -07:00
Reynold Xin 940869dfda Disable running combiners on map tasks when mergeCombiners function is
not specified by the user.
2012-08-29 23:00:02 -07:00
Matei Zaharia bf2e9cb08e Fault tolerance and block store fixes discovered through streaming tests. 2012-08-27 23:07:50 -07:00
Reynold Xin 3a6a95dc24 Removed the deserialization cache for ShuffleMapTask because it was
causing concurrency problems (some variables in Shark get set to null).
The cost of task deserialization on slaves is trivial compared with the
execution time of the task anyway.
2012-08-27 22:33:15 -07:00
Matei Zaharia 2c16ae36d7 Set log level in tests to WARN 2012-08-23 20:38:14 -07:00
Matei Zaharia deedb9e7b7 Fix further issues with tests and broadcast.
The broadcast fix is to store values as MEMORY_ONLY_DESER instead of
MEMORY_ONLY, which will save substantial time on serialization.
2012-08-23 20:31:49 -07:00
Matei Zaharia 59b831b9d1 Fixed test failures due to broadcast not stopping correctly 2012-08-23 19:59:55 -07:00
Matei Zaharia 7310a6f499 Merge pull request #147 from mosharaf/dev
Broadcast refactoring/cleaning up
2012-08-23 19:38:28 -07:00
Matei Zaharia 25a6a39e6d Added other SparkContext constructors to JavaSparkContext 2012-08-19 18:59:16 -07:00
Shivaram Venkataraman 0f4fbb057b Change BlockManagerSuite test cases to use a deterministic size estimator and
update the results to match the new estimates
2012-08-13 13:32:23 -07:00
Shivaram Venkataraman 22ba3a3f77 Add test-cases for 32-bit and no-compressed oops scenarios. 2012-08-13 13:32:10 -07:00
Shivaram Venkataraman 1f68c4b03b Update test cases to match the new size estimates. Uses 64-bit and compressed
oops setting to get deterministic results
2012-08-13 13:31:54 -07:00
Shivaram Venkataraman 1ea269110c Move object size and pointer size initialization into a function to enable unit-testing 2012-08-13 13:31:45 -07:00
Shivaram Venkataraman 44661df9cc If spark.test.useCompressedOops is set, use that to infer compressed oops
setting. This is useful to get a deterministic test case
2012-08-13 13:31:39 -07:00
Shivaram Venkataraman 0dd8fe73ba Use HotSpotDiagnosticMXBean to get if CompressedOops are in use or not 2012-08-13 13:31:29 -07:00
Shivaram Venkataraman 80104ce1da Add link to Java wiki which specifies what changes with compressed oops 2012-08-13 13:31:21 -07:00
Shivaram Venkataraman 00ab5490b3 Changes to make size estimator more accurate. Fixes object size, pointer size
according to architecture and also aligns objects and arrays when computing
instance sizes. Verified using Eclipse Memory Analysis Tool (MAT)
2012-08-13 13:31:11 -07:00
Matei Zaharia 6ae3c375a9 Renamed apply() to call() in Java API and allowed it to throw Exceptions 2012-08-12 23:10:19 +02:00
Matei Zaharia 0141879c40 Use Promises instead of having a Future wait on a thread in
ConnectionManager.
2012-08-12 22:16:32 +02:00
Matei Zaharia 845a870242 Return remotely fetched blocks in a pipelined fashion from BlockManager 2012-08-12 20:01:38 +02:00
Matei Zaharia e17ed9a21d Switch to Akka futures in connection manager.
It's still not good because each Future ends up waiting on a lock, but
it seems to work better than Scala Actors, and more importantly it
allows us to use onComplete and other listeners on futures.
2012-08-12 19:40:37 +02:00
Matei Zaharia ad8a7612a4 Changed multi-get method in BlockManager to return an iterator 2012-08-12 19:18:01 +02:00
Matei Zaharia 3c94e5c188 Merge pull request #168 from shivaram/dev
Use JavaConversion to get a scala iterator
2012-08-10 00:57:33 -07:00
Matei Zaharia e463e7a333 Merge pull request #167 from JoshRosen/piped-rdd-fixes
Detect non-zero exit status from PipedRDD process
2012-08-10 00:56:42 -07:00
Josh Rosen 59c22fb444 Print exit status in PipedRDD failure exception. 2012-08-10 00:33:56 -07:00
Shivaram Venkataraman 1803cce692 Use an implicit conversion to get the scala iterator 2012-08-08 14:31:04 -07:00
Shivaram Venkataraman 674fcf56bf Use JavaConversion to get a scala iterator 2012-08-08 14:10:23 -07:00
Shivaram Venkataraman f4aaec7a48 Avoid a copy in ShuffleMapTask by creating an iterator that will be used by the
block manager.
2012-08-08 00:47:02 -07:00
Mosharaf Chowdhury d821dd3ccc BroadcastManager is a class now (replaced Broadcast object) 2012-08-05 01:10:51 -07:00
Mosharaf Chowdhury b4804119f9 Merge remote-tracking branch 'upstream/dev' into dev 2012-08-04 20:42:12 -07:00
Matei Zaharia 88b016db2a Merge pull request #160 from dennybritz/clusterscripts
Standalone cluster scripts
2012-08-04 17:45:20 -07:00
Mosharaf Chowdhury 1b0534af8f Merge branch 'dev' into bc-bm 2012-08-04 00:30:08 -07:00
Mosharaf Chowdhury d11b457e67 Merge remote-tracking branch 'upstream/dev' into dev 2012-08-04 00:28:10 -07:00
Mosharaf Chowdhury 24b7eb872c Bug fixed. Broadcast now works with BlockManager. 2012-08-04 00:27:28 -07:00
Shivaram Venkataraman ce3444d2cb Fix testcheckpoint to reuse spark context defined in the class 2012-08-03 18:52:26 -07:00
Matei Zaharia 62898b631f Made range partition balance tests more aggressive.
This is because we pull out such a large sample (10x the number of
partitions) that we should expect pretty good balance. The tests are
also deterministic so there's no worry about them failing irreproducibly.
2012-08-03 16:46:48 -04:00
Matei Zaharia 6601a6212b Added a unit test for cross-partition balancing in sort, and changes to
RangePartitioner to make it pass. It turns out that the first partition
was always kind of small due to how we picked partition boundaries.
2012-08-03 16:40:45 -04:00
Harvey 1170de3757 Fix for partitioning when sorting in descending order 2012-08-03 16:40:38 -04:00
Paul Cavallaro d05c0f97ca Logging Throwables in Info and Debug
Logging Throwables in logInfo and logDebug instead of swallowing them.

Conflicts:

	core/src/main/scala/spark/Logging.scala
2012-08-03 16:40:21 -04:00
Denny 0008994044 merged dev branch 2012-08-02 16:00:33 -07:00
Denny 53008c2d8a Settings variables and bugfix for stop script. 2012-08-02 15:59:39 -07:00
Matei Zaharia 71a958b0b7 Merge branch 'dev' of github.com:mesos/spark into dev
Conflicts:
	project/SparkBuild.scala
2012-08-02 17:23:13 -04:00
Denny 7312a5c30f Use spray's implicit Marshaller for Futures. 2012-08-02 14:11:27 -07:00
Denny ba7e30fb5e Mostly stylistic changes. 2012-08-02 13:55:09 -07:00
Shivaram Venkataraman 1a07bb9ba4 Avoid an extra partition copy by passing an iterator to blockManager.put 2012-08-02 12:22:33 -07:00
Shivaram Venkataraman 6790908b11 Use maxMemory to better estimate memory available for BlockManager cache 2012-08-02 12:05:05 -07:00
Denny 863c31b7c1 Moved resources into static folder 2012-08-02 09:48:36 -07:00
Denny 0ee44c225e Spark standalone mode cluster scripts.
Heavily inspired by Hadoop cluster scripts ;-)
2012-08-01 20:38:52 -07:00
Denny 6c670c37dd Webui improvements. 2012-08-01 19:47:57 -07:00
Denny 1b29e90a79 merge dev branch 2012-08-01 14:06:09 -07:00
Denny 011220fa55 Compact job page. 2012-08-01 11:26:45 -07:00
Denny 7a295fee96 Spark WebUI Implementation. 2012-08-01 11:01:09 -07:00
Mosharaf Chowdhury f23395e8c5 Merge remote-tracking branch 'upstream/dev' into dev 2012-07-30 19:39:49 -07:00
Matei Zaharia 3ee2530c0c Merge branch 'block-manager-fix' into dev 2012-07-30 13:58:46 -07:00
Matei Zaharia 400221f851 Merge branch 'dev' of git://github.com/tdas/spark into dev 2012-07-30 13:54:57 -07:00
Matei Zaharia ed1b0f8388 Made BlockManagerMaster no longer be a singleton.
Also cleaned up a few formatting things throughout block manager code.
2012-07-30 13:53:47 -07:00
Matei Zaharia f471c82558 Various reorganization and formatting fixes 2012-07-30 11:24:01 -07:00
Mosharaf Chowdhury 5932a87cac Merge remote-tracking branch 'upstream/dev' into dev 2012-07-29 18:20:45 -07:00
Matei Zaharia d7f089323a Fixed AccumulatorSuite to clean up SparkContext with BeforeAndAfter 2012-07-28 20:25:42 -07:00
Imran Rashid f7149c5e46 tasks cannot access value of accumulator 2012-07-28 20:16:17 -07:00
Imran Rashid 244cbbe33a one more minor cleanup to scaladoc 2012-07-28 20:16:10 -07:00
Imran Rashid 3b392c67db fix up scaladoc, naming of type parameters 2012-07-28 20:16:01 -07:00
Imran Rashid f1face1ea9 rename addToAccum to addAccumulator 2012-07-28 20:16:01 -07:00
Imran Rashid 2d666b9d76 add some functionality to Vector, delete copy in AccumulatorSuite 2012-07-28 20:15:51 -07:00
Imran Rashid edc6972f8e move Vector class into core and spark.util package 2012-07-28 20:15:42 -07:00
Imran Rashid 83659af11c Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=)
Conflicts:

	core/src/main/scala/spark/Accumulators.scala
2012-07-28 20:13:51 -07:00
Imran Rashid 79d58ed20a improve scaladoc 2012-07-28 20:12:41 -07:00
Imran Rashid ae07f3864c add Accumulatable, add corresponding docs & tests for accumulators 2012-07-28 20:12:41 -07:00
Matei Zaharia f6f917bd00 Add a sleep to prevent a failing test.
The BlockManager's put seems to be slightly asynchronous, which can
cause it to fail this test by not removing stuff from the cache before
we put the next value. We should probably change the semantics of put()
in this case but it's hard right now. It will also be hard for
asynchronously replicated puts.
2012-07-27 16:59:36 -07:00
Matei Zaharia c0c78d2119 Renamed test more descriptively 2012-07-27 16:28:18 -07:00
Matei Zaharia dee8ff1b9d Added a second version of union() without varargs. 2012-07-27 16:27:52 -07:00
Tathagata Das cf429699e1 Updated the new checkpoint RDD to remember partitioning of the original RDD. 2012-07-27 23:16:37 +00:00
Mosharaf Chowdhury b5be936d7c Broadcasts using BlockManager instead of BoundedMemoryCache 2012-07-27 15:38:46 -07:00
Mosharaf Chowdhury 1f19fbb8db Merge remote-tracking branch 'upstream/dev' into dev
Conflicts:
	core/src/main/scala/spark/broadcast/Broadcast.scala
2012-07-27 15:18:23 -07:00
Matei Zaharia b51d733a57 Fixed Java union methods having same erasure.
Changed union() methods on lists to take a separate "first element"
argument in order to differentiate them to the compiler, because Java 7
considered it an error to have them all take Lists parameterized with
different types.
2012-07-27 12:23:27 -07:00
Tathagata Das 3e271c3b61 Merge branch 'dev' of github.com:tdas/spark into dev 2012-07-27 12:01:04 -07:00
Tathagata Das 024905f682 Added BlockRDD and a first-cut version of checkpoint() to RDD class. 2012-07-27 12:00:49 -07:00
Tathagata Das d1eee44a03 Fixed more stuff in BoundedMemoryCache. 2012-07-27 18:33:32 +00:00
Tathagata Das d1b7f41671 Fixed bug in BoundedMemoryCache. 2012-07-27 09:00:45 -07:00
Tathagata Das 435d129bec Fixed bugs in block dropping code of MemoryStore and changed synchronized HashMap to ConcurrentHashMap in BlockManager. 2012-07-27 10:02:26 +00:00
Tathagata Das 0426769f89 Modified the block dropping code for better performance. 2012-07-26 20:53:45 -07:00
Matei Zaharia 5c5aa2ff81 Merge pull request #153 from JoshRosen/new-java-api
Java API
2012-07-26 17:20:52 -07:00
Josh Rosen c5e2810dc7 Add persist(), splits(), glom(), and mapPartitions() to Java API. 2012-07-26 12:46:47 -07:00
Josh Rosen bf61c10072 Detect non-zero exit status from PipedRDD process. 2012-07-26 11:32:59 -07:00
Josh Rosen 6a78e88237 Minor cleanup and optimizations in Java API.
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
2012-07-24 09:47:00 -07:00
Denny 4f4a34c025 Stylistic changes
Conflicts:

	core/src/test/scala/spark/MesosSchedulerSuite.scala
2012-07-23 16:32:20 -07:00
Denny 866e6949df Always destroy SparkContext in after block for the unit tests.
Conflicts:

	core/src/test/scala/spark/ShuffleSuite.scala
2012-07-23 16:29:17 -07:00
Matei Zaharia 600e99728d Fix a bug where an input path was added to a Hadoop job configuration twice 2012-07-23 16:16:19 -07:00
Josh Rosen 042dcbde33 Add type annotations to Java API methods.
Add missing Scala Map to java.util.Map conversions.
2012-07-22 17:35:29 -07:00
Josh Rosen e23938c3be Use mapValues() in JavaPairRDD.cogroupResultToJava(). 2012-07-22 15:10:01 -07:00
Josh Rosen 01dce3f569 Add Java API
Add distinct() method to RDD.

Fix bug in DoubleRDDFunctions.
2012-07-18 17:34:29 -07:00
Mosharaf Chowdhury 85cd9979f2 Fix for isLocal 2012-07-13 01:13:14 -07:00
Mosharaf Chowdhury 1c83fd4b66 Merged with Upstream dev 2012-07-13 01:08:28 -07:00
Mosharaf Chowdhury bb4ee580fa Cleaning BitTorrentBroadcast code... 2012-07-13 01:04:01 -07:00
Mosharaf Chowdhury 8ccffe21da Cleaned TreeBroadcast 2012-07-13 00:54:25 -07:00
Matei Zaharia 628bb5ca7f Allow null keys in Spark's reduce and group by 2012-07-12 18:36:02 -07:00
Matei Zaharia e2a67a8024 Fixes to coarse-grained Mesos scheduler in dealing with failed nodes 2012-07-12 18:21:52 -07:00
Matei Zaharia be622cf867 Formatting 2012-07-11 17:31:44 -07:00
Matei Zaharia e8ae77df24 Added more methods for loading/saving with new Hadoop API 2012-07-11 17:31:33 -07:00
Mosharaf Chowdhury 34999d97f5 Added stop() to the Broadcast subsystem 2012-07-10 01:03:47 -07:00
Mosharaf Chowdhury d6a9680604 Slightly better check for isLocal 2012-07-10 00:16:47 -07:00
Mosharaf Chowdhury 701f49e0d9 Refactoring 2012-07-09 22:39:47 -07:00
Mosharaf Chowdhury cf1c60a1de Refactoring 2012-07-09 22:07:46 -07:00
Mosharaf Chowdhury e71f69ad3d Refactoring 2012-07-09 22:07:17 -07:00
Mosharaf Chowdhury ca02a92332 Refactored TrackMultipleValues out. 2012-07-09 21:35:39 -07:00
Mosharaf Chowdhury 654576ef1a Tweaks 2012-07-09 21:12:42 -07:00
Mosharaf Chowdhury 425c247269 Removed some unused stuff 2012-07-08 14:29:04 -07:00
Matei Zaharia 0a47284003 More work to allow Spark to run on the standalone deploy cluster. 2012-07-08 14:00:04 -07:00
Mosharaf Chowdhury c7c5258e25 Compiles without Dfs 2012-07-08 13:22:12 -07:00
Mosharaf Chowdhury 178bb29f05 Removed Chained and Dfs broadcast implementations 2012-07-08 11:57:00 -07:00
Matei Zaharia 1aa63f775b Added back coarse-grained Mesos scheduler based on StandaloneScheduler. 2012-07-08 10:52:13 -07:00
Matei Zaharia c5cc10cda3 More work on standalone scheduler 2012-07-06 20:17:44 -07:00
Matei Zaharia 909b325243 Further refactoring, and start of a standalone scheduler backend 2012-07-06 17:56:44 -07:00
Matei Zaharia 4e2fe0bdaf Miscellaneous bug fixes 2012-07-06 16:33:40 -07:00
Matei Zaharia e72afdb817 Some refactoring to make cluster scheduler pluggable. 2012-07-06 15:23:26 -07:00
Matei Zaharia 5d1a887bed Further updates to run processes on cluster. 2012-07-01 17:13:31 -07:00
Matei Zaharia 51c46eaca0 More work on standalone deploy system. 2012-07-01 01:05:59 -07:00
Matei Zaharia a6eb9fda61 Detect connection and disconnection of slaves 2012-06-30 17:46:56 -07:00
Matei Zaharia 408b5a1332 More work on deploy code (adding Worker class) 2012-06-30 16:45:57 -07:00
Matei Zaharia 2fb6e7d71e Initial framework to get a master and web UI up. 2012-06-30 14:45:55 -07:00
Matei Zaharia c53670b9bf Various code style fixes, mostly from IntelliJ IDEA 2012-06-29 18:47:12 -07:00
Matei Zaharia c6be4ffbf9 Fixes to CoarseMesosScheduler 2012-06-29 16:18:51 -07:00
Matei Zaharia 3a58efa5a5 Allow binding to a free port and change Akka logging to use SLF4J. Also
fixes various bugs in the previous code when running on Mesos.
2012-06-29 16:02:21 -07:00
Matei Zaharia 3920189932 Upgraded to Akka 2 and fixed test execution (which was still parallel
across projects).
2012-06-28 23:51:28 -07:00
root 6ad3e1f1b4 Various fixes when running on Mesos 2012-06-20 06:48:26 +00:00
Tathagata Das e896a505e2 Added testcase for ByteBufferInputStream bugs. 2012-06-17 16:11:12 -07:00
Tathagata Das 40536e3668 Fixed nasty corner case bug in ByteBufferInputStream. Could not add a test case for this as I could not figure out how to deterministically reproduce the bug in a short testcase. 2012-06-17 13:28:41 -07:00
Matei Zaharia 2893b30550 Various fixes to get unit tests running. In particular, shut down
ConnectionManager and DAGScheduler properly, plus a fix to
LocalScheduler that was not merged in from 0.5 and was actually caught
by one of the tests.
2012-06-17 00:28:45 -07:00
Matei Zaharia b3eeac55b8 Fixed HttpBroadcast to work with this branch's Serializer. 2012-06-15 23:54:38 -07:00
Matei Zaharia f58da6164e Merge branch 'master' into dev 2012-06-15 23:47:11 -07:00
Tathagata Das 5f54bdf98b Added shutdown for akka to SparkContext.stop(). Helps a little, but many testsuites still fail. 2012-06-13 20:49:00 -04:00
Tathagata Das c6156da9e2 Multiple bug fixes to pass the testsuites ShuffleSuite and BlockManagerSuite. 2012-06-13 16:26:49 -04:00
Matei Zaharia 879bc0bece Merge branch 'master' into mesos-0.9 2012-06-09 16:24:16 -07:00
Matei Zaharia 4b05798c06 Further bug fix to HttpBroadcast 2012-06-09 16:24:03 -07:00
Matei Zaharia 587a16a7ef Merge branch 'master' into mesos-0.9 2012-06-09 16:17:07 -07:00
Matei Zaharia 8ed662862e Bug fix to HttpBroadcast 2012-06-09 16:16:55 -07:00
Matei Zaharia 2fd9f994ae Merge branch 'master' into mesos-0.9 2012-06-09 15:58:35 -07:00
Matei Zaharia e75b1b5cb4 Change the default broadcast implementation to a simple HTTP-based
broadcast. Fixes #139.
2012-06-09 15:58:07 -07:00
Matei Zaharia a96558caa3 Performance improvements to shuffle operations: in particular, preserve
RDD partitioning in more cases where it's possible, and use iterators
instead of materializing collections when doing joins.
2012-06-09 14:44:18 -07:00
Matei Zaharia c2c7299d7a Added BlockManagerSuite, which I'd forgotten to merge. 2012-06-07 13:47:10 -07:00
Matei Zaharia 63051dd2bc Merge in engine improvements from the Spark Streaming project, developed
jointly with Tathagata Das and Haoyuan Li. This commit imports the changes
and ports them to Mesos 0.9, but does not yet pass unit tests due to
various classes not supporting a graceful stop() yet.
2012-06-07 12:45:38 -07:00
Matei Zaharia 7e1c97fc4b Merge branch 'master' into mesos-0.9 2012-06-06 16:48:59 -07:00
Matei Zaharia 048276799a Commit task outputs to Hadoop-supported storage systems in parallel on the
cluster instead of on the master. Fixes #110.
2012-06-06 16:46:53 -07:00
Matei Zaharia 6888bc7191 Merge branch 'master' into mesos-0.9 2012-06-06 16:14:19 -07:00
Matei Zaharia 6ae2746d1e Handle arrays that contain the same element many times better in
SizeEstimator. Also added a test for SizeEstimator. Fixes #136.
2012-06-06 16:13:02 -07:00
Matei Zaharia 0a617958d1 Some refactoring to make BoundedMemoryCache test similar to others 2012-06-06 16:12:08 -07:00
Matei Zaharia dbc3c86ae3 Merge branch 'master' into mesos-0.9
Conflicts:
	core/src/main/scala/spark/Executor.scala
2012-06-03 17:44:04 -07:00
Matei Zaharia e141f644ca Merge pull request #132 from Benky/rb-first-iteration
Little refactoring and unit tests for CacheTrackerActor
2012-05-26 13:15:06 -07:00
Richard Benkovsky ae64920337 MesosScheduler refactoring 2012-05-22 11:04:54 +02:00
Richard Benkovsky 3a1bcd4028 Added tests for CacheTrackerActor 2012-05-22 11:04:54 +02:00
Richard Benkovsky 8f2f736d53 Little refactoring 2012-05-22 11:04:54 +02:00
Richard Benkovsky 518506a7c5 Added tests for Utils.copyStream 2012-05-22 11:04:51 +02:00
Richard Benkovsky f162fc2beb Formatting fixed 2012-05-22 09:45:38 +02:00
Richard Benkovsky 565245871f BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity 2012-05-20 22:13:35 +02:00
Richard Benkovsky 822a4be37d Utils.memoryBytesToString fixed 2012-05-19 15:13:20 +02:00
Reynold Xin d0c6e9f639 Made some RDD dependencies transient to reduce the amount of data needed
to be serialized in closure serialization. This can significantly reduce
the task setup time in Shark when the query involves a large number of
(Hive) partitions.
2012-05-16 14:16:55 -07:00
Reynold Xin 16461e2eda Updated Cache's put method to use a case class for response. Previously
it was pretty ugly that put() should return -1 for failures.
2012-05-15 00:31:52 -07:00
Reynold Xin 019e48833f Added the capacity to report cache usage status back to the cache
tracker. This is essential for building a dashboard to see the status of
caches on all slaves.
2012-05-14 18:39:04 -07:00
Matei Zaharia f48742683a Made caches dataset-aware so that they won't cyclically evict partitions
from the same dataset.
2012-05-06 20:14:40 -07:00
Matei Zaharia bd2ab635a7 Fixed the way the JAR server is created after finding issue at Twitter 2012-05-05 20:05:15 -07:00
Matei Zaharia 32a4f4623c Merge pull request #129 from mesos/rxin
Force serialize/deserialize task results in local execution mode.
2012-04-24 16:18:39 -07:00
Reynold Xin 761ea65a98 Added a test for the previous commit (failing to serialize task results
would throw an exception for local tasks).
2012-04-24 15:14:35 -07:00
Reynold Xin 9821cd4d42 Force serialize/deserialize task results in local execution mode. 2012-04-24 14:55:28 -07:00
Antonio 3e48818993 Removed commented-out System.exit call 2012-04-23 11:42:58 -07:00
Antonio 39d99168dc Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions 2012-04-20 14:46:43 -07:00
Reynold Xin e601b3b9e5 Added the ability to set environmental variables in piped rdd. 2012-04-17 16:40:56 -07:00
Matei Zaharia 3b745176e0 Bug fix to pluggable closure serialization change 2012-04-12 17:53:02 +00:00
Matei Zaharia 112655f032 Merge pull request #121 from rxin/kryo-closure
Added an option (spark.closure.serializer) to specify the serializer for closures.
2012-04-10 14:21:02 -07:00
Reynold Xin d295ccb43c Added a closureSerializer field in SparkEnv and use it to serialize
tasks.
2012-04-10 13:29:46 -07:00
Reynold Xin 968f75f6af Added an option (spark.closure.serializer) to specify the serializer for
closures. This enables using Kryo as the closure serializer.
2012-04-09 21:59:56 -07:00
Matei Zaharia a69c0738d1 Merge branch 'master' into mesos-0.9 2012-04-08 23:41:36 -07:00
Matei Zaharia a633974143 Merge branch 'master' of github.com:mesos/spark 2012-04-08 23:41:25 -07:00
Matei Zaharia 0229d5390f Merge branch 'master' into mesos-0.9 2012-04-08 23:39:37 -07:00
Matei Zaharia d401e1b3e8 Fix a possible deadlock in MesosScheduler 2012-04-08 23:38:49 -07:00
Ankur Dave 7be1c7b331 Report entry dropping in BoundedMemoryCache 2012-04-06 15:49:32 -07:00
Matei Zaharia a8bb324ed9 Merge branch 'master' into mesos-0.9 2012-04-05 14:53:22 -07:00
Matei Zaharia 816d4e5840 Pass local IP address instead of hostname in spark.master.host. Fixes #117. 2012-04-05 14:53:17 -07:00
Matei Zaharia 335a6036ad Converted some tabs to spaces 2012-04-05 11:58:01 -07:00
Matei Zaharia 8c95a85438 Use Runtime.maxMemory instead of Runtime.totalMemory in
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:39:35 -04:00
Matei Zaharia 03d5b3b48d Use Runtime.maxMemory instead of Runtime.totalMemory in
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:38:19 -04:00
Matei Zaharia 95fb1a16b8 Use Mesos 0.9 RC3 JAR and protobuf 2.4.1 2012-03-30 11:38:49 -04:00
Matei Zaharia dfa3b6b544 Fixes to work with the very latest Mesos 0.9 API 2012-03-29 22:12:35 -04:00
Matei Zaharia 4d52cc6738 Merge branch 'master' into mesos-0.9 2012-03-29 21:29:39 -04:00
Reynold Xin 42dcdbcb2f Removed the extra spaces in OrderedRDDFunctions and SortedRDD. 2012-03-29 15:21:57 -07:00
Matei Zaharia 08cda89e8a Further fixes to how Mesos is found and used 2012-03-17 13:39:14 -07:00
Matei Zaharia 3c3fdf6eca Merge branch 'master' into mesos-0.9 2012-03-17 13:09:21 -07:00
Matei Zaharia c7af538ac1 Some fixes to sorting for when the RDD has fewer elements than the
number of partitions we ask to partition it into. Also, removed a test
that was taking way too long to run.
2012-03-17 13:08:36 -07:00
Matei Zaharia a099a63a8a Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0 2012-03-17 12:31:34 -07:00
Matei Zaharia a5e2b6a6bd Merge pull request #112 from cengle/master
Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection
2012-03-06 13:38:32 -08:00
Matei Zaharia 97eee50825 Fixes a nasty bug that could happen when tasks fail, because calling
wait() with a timeout of 0 on a Java object means "wait forever".
2012-03-01 13:43:17 -08:00
Cliff Engle dd68cb6099 Get key and value container from RecordReader 2012-02-29 16:33:23 -08:00
Matei Zaharia 1e10df0a46 Merge pull request #111 from alupher/master
Adding sorting to RDDs
2012-02-24 15:50:14 -08:00
Antonio 0d93d95bcf Removed unnecessary import 2012-02-21 19:57:12 -08:00
Antonio 2990298f71 Added sorting testing suite 2012-02-21 19:54:21 -08:00
Matei Zaharia aa04f87cd2 Added support for parallel execution of jobs in DAGScheduler. 2012-02-19 22:50:23 -08:00
Antonio 620798161b Added fixes to sorting 2012-02-13 00:07:39 -08:00
Matei Zaharia 2587ce1690 Fixed a deadlock that occurred with MesosScheduler due to an earlier
synchronization change
2012-02-11 21:22:45 -08:00
Antonio e93f622665 Added sorting by key for pair RDDs 2012-02-11 00:56:28 -08:00
Matei Zaharia 98f008b721 Formatting fixes 2012-02-10 10:52:03 -08:00
Matei Zaharia 7660a8b12f Merge branch 'formatting'
Conflicts:
	core/src/main/scala/spark/DAGScheduler.scala
	core/src/main/scala/spark/SimpleShuffleFetcher.scala
	core/src/main/scala/spark/SparkContext.scala
2012-02-10 10:42:14 -08:00
haoyuan 194c42ab79 Code format. 2012-02-10 08:19:53 -08:00
Matei Zaharia 8f5ed51234 Delete Spark's temporary directories when the JVM exits. 2012-02-09 22:58:24 -08:00
Matei Zaharia c0a0df3285 Made the default cache BoundedMemoryCache, and reduced its default size 2012-02-09 22:32:02 -08:00
Matei Zaharia a766780f4c Added some tests for multithreaded access to Spark. 2012-02-09 22:27:53 -08:00
Matei Zaharia 0e93891d3d Replaced LocalFileShuffle with a non-singleton ShuffleManager class
and made DAGScheduler automatically set SparkEnv.
2012-02-09 22:14:56 -08:00
haoyuan 445e0bb1b5 Format the code a bit more. 2012-02-09 15:50:26 -08:00
haoyuan 651932e703 Format the code as coding style agreed by Matei/TD/Haoyuan 2012-02-09 13:26:23 -08:00
Matei Zaharia e02dc83a5b IO optimizations 2012-02-06 20:40:39 -08:00
Matei Zaharia c40e766368 Use java.util.HashMap in shuffles 2012-02-06 19:20:25 -08:00
Matei Zaharia b267175ab5 Synchronization fix in case SparkContext is used from multiple threads. 2012-02-06 14:28:18 -08:00
Matei Zaharia 43a3335090 Simplifying test 2012-02-05 22:46:51 -08:00
Hiral Patel b47952342e Add register immutable map to kryo serializer 2012-01-26 15:24:20 -08:00