Commit graph

1951 commits

Author SHA1 Message Date
Denny 485803d740 Merge branch 'dev' of github.com:radlab/spark into kafka 2012-11-06 09:41:45 -08:00
Denny 0c1de43fc7 Working on kafka. 2012-11-06 09:41:42 -08:00
Tathagata Das f8bb719cd2 Added a few more comments to the checkpoint-related functions. 2012-11-05 17:53:56 -08:00
Tathagata Das 395167f2b2 Made more bug fixes for checkpointing. 2012-11-05 16:11:50 -08:00
Tathagata Das 72b2303f99 Fixed major bugs in checkpointing. 2012-11-05 11:41:36 -08:00
Tathagata Das d154238789 Made checkpointing of the DStream graph work with checkpointing of RDDs. For streams that require checkpointing of their RDDs, the default checkpoint interval is set to 10 seconds. 2012-11-04 12:12:06 -08:00
Matei Zaharia dfce7e74a7 Merge pull request #298 from JoshRosen/fix/ec2-existing-cluster-check
Fix check for existing instances during spark-ec2 launch
2012-11-03 18:35:26 -07:00
Josh Rosen 594eed31c4 Fix check for existing instances during EC2 launch. 2012-11-03 17:02:47 -07:00
Tathagata Das 596154eabe Merge branch 'dev-checkpoint' into dev 2012-11-02 17:05:22 -07:00
Tathagata Das 3fb5c9ee24 Fixed serialization bug in countByWindow, added countByKey and countByKeyAndWindow, and added testcases for them. 2012-11-02 12:12:25 -07:00
Matei Zaharia 590e4aa9cb Merge pull request #296 from shivaram/block-manager-fix
Remove unnecessary hash-map put in MemoryStore
2012-11-01 11:54:23 -07:00
Matei Zaharia 4a47d1a476 Merge pull request #297 from JoshRosen/fix/ec2-spot-instances
Cancel spot instance requests when exiting spark-ec2
2012-11-01 11:31:18 -07:00
Shivaram Venkataraman a7d967a1ca Remove unnecessary hash-map put in MemoryStore 2012-11-01 10:46:38 -07:00
Tathagata Das 34e569f40e Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added tests to ensure that running a job on an RDD on which checkpointing is in progress does not hurt the result of the job. 2012-10-31 00:56:40 -07:00
Josh Rosen 96c9bcfd8d Cancel spot instance requests when exiting spark-ec2. 2012-10-30 23:32:38 -07:00
Tathagata Das 0dcd770fdc Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them. 2012-10-30 16:09:37 -07:00
Denny ceec1a1a6a Nicer storage level format on RDD page 2012-10-29 15:03:01 -07:00
Denny eb95212f4d Code formatting 2012-10-29 14:57:32 -07:00
Denny 531ac136bf BlockManager UI. 2012-10-29 14:53:47 -07:00
Tathagata Das ac12abc17f Modified RDD API to make dependencies a var (so it can be changed to a checkpointed Hadoop RDD) and to make other references to parent RDDs go either through dependencies or through a weak reference (to allow finalizing when dependencies no longer refer to them). 2012-10-29 11:55:27 -07:00
Josh Rosen 2ccf3b6652 Fix PySpark hash partitioning bug.
A Java array's hashCode is based on its object
identity, not its elements, so this was causing
serialized keys to be hashed incorrectly.

This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
Josh Rosen 7859879aaa Bump required Py4J version and add test for large broadcast variables. 2012-10-28 16:48:25 -07:00
Tathagata Das 1b900183c8 Added save operations to DStreams. 2012-10-27 18:55:50 -07:00
Matei Zaharia 51477e8874 Merge pull request #294 from JoshRosen/docs/quickstart
Fix minor typos in quickstart and Scala programming guides
2012-10-27 16:56:39 -07:00
Josh Rosen 33bea24f8e Fix Spark groupId in Scala Programming Guide. 2012-10-26 15:01:28 -07:00
root e782187b4a Don't throw an error in the block manager when a block is cached on the master due to
a locally computed operation

Conflicts:

	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2012-10-26 00:33:45 -07:00
Tathagata Das 650d717544 Merge branch 'dev' of github.com:radlab/spark into dev 2012-10-25 13:03:18 -07:00
Matei Zaharia 863a55ae42 Merge remote-tracking branch 'public/master' into dev
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
	core/src/main/scala/spark/KryoSerializer.scala
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/executor/Executor.scala
	core/src/main/scala/spark/network/Connection.scala
	core/src/main/scala/spark/network/ConnectionManagerTest.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/NewHadoopRDD.scala
	core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockMessage.scala
	core/src/main/scala/spark/storage/BlockStore.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	project/SparkBuild.scala
	run
2012-10-24 23:21:00 -07:00
Tathagata Das 926e05b030 Added tests for the file input stream. 2012-10-24 23:14:37 -07:00
Matei Zaharia f63a40fd99 Strip leading mesos:// in URLs passed to Mesos 2012-10-24 21:52:13 -07:00
Tathagata Das ed71df46cd Minor fixes. 2012-10-24 16:49:40 -07:00
Tathagata Das 1ef6ea2513 Added tests for testing network input stream. 2012-10-24 14:44:20 -07:00
Matei Zaharia d290e964ea Merge pull request #281 from rxin/memreport
Added a method to report slave memory status; force serialize accumulator update in local mode.
2012-10-23 22:04:35 -07:00
Matei Zaharia 0bd20c63e2 Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev
Conflicts:
	core/src/main/scala/spark/Dependency.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
2012-10-23 22:01:45 -07:00
Matei Zaharia 7849216bba Merge pull request #286 from JoshRosen/ec2-error-handling
Allow EC2 script to stop/destroy cluster after master/slave failures
2012-10-23 21:15:43 -07:00
Matei Zaharia 46b87dfc3a Merge pull request #292 from tomdz/tweaked-run-file
Tweaked run file to live more happily with typesafe's debian package
2012-10-23 21:14:06 -07:00
Tathagata Das 020d643484 Renamed the streaming testsuites. 2012-10-23 16:24:05 -07:00
Tathagata Das 0e5d9be4df Renamed APIs to create queueStream and fileStream. 2012-10-23 15:17:05 -07:00
Tathagata Das c2731dd3ef Updated StateDStream api to use Options instead of nulls. 2012-10-23 15:10:27 -07:00
Tathagata Das 19191d178d Renamed the network input streams. 2012-10-23 14:40:24 -07:00
Josh Rosen c4aa10154e Fix minor typos in quick start guide. 2012-10-23 13:49:52 -07:00
Tathagata Das a6de5758f1 Modified API of NetworkInputDStreams and got ObjectInputDStream and RawInputDStream working. 2012-10-23 01:41:13 -07:00
Tathagata Das 2c87c853ba Renamed examples 2012-10-22 15:31:19 -07:00
Thomas Dudziak f595bb53d1 Tweaked run file to live more happily with typesafe's debian package 2012-10-22 13:11:05 -07:00
Matei Zaharia 0967e71a00 Bump up version to 0.7.0-SNAPSHOT for master branch 2012-10-22 11:49:42 -07:00
Matei Zaharia 902a608187 Update version to 0.6.1-SNAPSHOT to show this is in development 2012-10-22 11:43:57 -07:00
Josh Rosen d4f2e5b0ef Remove PYTHONPATH from SparkContext's executorEnvs.
It makes more sense to pass it in the dictionary
of environment variables that is used to construct
PythonRDD.
2012-10-22 10:28:59 -07:00
Tathagata Das d85c66636b Added MapValueDStream, FlatMappedValuesDStream and CoGroupedDStream, and therefore DStream operations mapValue, flatMapValues, cogroup, and join. Also, added tests for DStream operations filter, glom, mapPartitions, groupByKey, mapValues, flatMapValues, cogroup, and join. 2012-10-21 17:40:08 -07:00
Tathagata Das c4a2b6f636 Fixed some bugs in tests for forgetting RDDs, and made sure that use of manual clock leads to a zeroTime of 0 in the DStreams (more intuitive). 2012-10-21 10:41:25 -07:00
Matei Zaharia 1be335e8fa Merge branch 'master' into dev 2012-10-21 00:05:02 -07:00
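
The hash partitioning bug described in commit 2ccf3b6652 above (a Java array's hashCode depends on object identity, not on its elements) can be illustrated with a small Python sketch. This is not the code from that commit. The names JavaLikeArray, content_hash and content_partition are hypothetical, and the class only imitates identity-based hashing by relying on Python's default object hash. The content hash is a 31-multiplier polynomial in the spirit of java.util.Arrays.hashCode, not a bit-for-bit reproduction of it.

```python
class JavaLikeArray:
    """Hypothetical stand-in for a Java byte[] holding a serialized key.

    It deliberately does not override __eq__ or __hash__, so Python falls
    back to identity-based equality and hashing, which mimics how a Java
    array inherits Object.hashCode().
    """

    def __init__(self, data: bytes):
        self.data = data


def identity_partition(key: JavaLikeArray, num_partitions: int) -> int:
    # Buggy approach: the partition depends on which object holds the
    # bytes, so two equal serialized keys may land in different partitions.
    return hash(key) % num_partitions


def content_hash(data: bytes) -> int:
    # Polynomial hash over the elements, truncated to 32 bits.
    h = 1
    for byte in data:
        h = (31 * h + byte) & 0xFFFFFFFF
    return h


def content_partition(key: JavaLikeArray, num_partitions: int) -> int:
    # Fixed approach: the partition depends only on the key's bytes.
    return content_hash(key.data) % num_partitions


if __name__ == "__main__":
    a = JavaLikeArray(b"spark")
    b = JavaLikeArray(b"spark")
    # Same contents, different objects: identity hashes differ.
    print(hash(a) == hash(b))
    # Content-based partitioning agrees for both copies.
    print(content_partition(a, 8) == content_partition(b, 8))
```

With identity-based hashing, two copies of the same serialized key are not guaranteed to reach the same partition, which is the symptom the commit message describes. Hashing the contents removes that dependence.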