1739 commits

Author SHA1 Message Date
Patrick Wendell c438faeac4 Merge pull request #10 from radlab/datahandler-fix
Several code-quality improvements to DataHandler.
2013-01-02 17:07:12 -08:00
Patrick Wendell 2ef993d159 BufferingBlockCreator -> NetworkReceiver.BlockGenerator 2013-01-02 14:19:51 -08:00
Patrick Wendell 96a6ff0b09 Merge branch 'dev-merge' into datahandler-fix
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell 493d65ce65 Several code-quality improvements to DataHandler.
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction
  offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Tathagata Das 3dc87dd923 Fixed compilation bug in RDDSuite created during merge for mesos/master. 2013-01-01 16:38:04 -08:00
Tathagata Das d34dba25c2 Merge branch 'mesos' into dev-merge 2013-01-01 15:48:39 -08:00
Tathagata Das 02497f0cd4 Updated Streaming Programming Guide. 2013-01-01 12:21:32 -08:00
Matei Zaharia 55809fbc6d Merge pull request #349 from woggling/cache-finally
Avoid stalls when computation of cached RDD throws exception
2013-01-01 08:21:33 -08:00
Matei Zaharia c593f6329e Merge pull request #348 from JoshRosen/spark-597
Raise exception when hashing Java arrays (SPARK-597)
2013-01-01 08:20:06 -08:00
Charles Reiss 58072a7340 Remove some dead comments 2013-01-01 08:07:44 -08:00
Charles Reiss 21636ee4fa Test with exception while computing cached RDD. 2013-01-01 08:07:40 -08:00
Charles Reiss feadaf72f4 Mark key as not loading in CacheTracker even when compute() fails 2013-01-01 07:57:20 -08:00
Josh Rosen f803953998 Raise exception when hashing Java arrays (SPARK-597) 2012-12-31 20:20:11 -08:00
Tathagata Das 18b9b3b99f More classes made private[streaming] to hide from scala docs. 2012-12-30 20:00:42 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and methods private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
Matei Zaharia 3f74f729a1 Merge pull request #345 from JoshRosen/fix/add-file
Fix deletion of files in current working directory by clearFiles()
2012-12-29 15:01:33 -08:00
Patrick Wendell 518111573f Merge pull request #8 from radlab/twitter-example
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Josh Rosen 397e67103c Change Utils.fetchFile() warning to SparkException. 2012-12-28 17:37:13 -08:00
Josh Rosen d64fa72d2e Add addFile() and addJar() to JavaSparkContext. 2012-12-28 17:00:57 -08:00
Josh Rosen bd237d4a9d Add synchronization to LocalScheduler.updateDependencies(). 2012-12-28 17:00:57 -08:00
Josh Rosen f1bf4f0385 Skip deletion of files in clearFiles().
This fixes an issue where Spark could delete
original files in the current working directory
that were added to the job using addFile().

There was also the potential for addFile() to
overwrite local files, which is addressed by
changing Utils.fetchFile() to log a warning
instead of overwriting a file with new contents.

This is a short-term fix; a better long-term
solution would be to remove the dependence on
storing files in the current working directory,
since we can't change the cwd from Java.
2012-12-28 17:00:57 -08:00
Tathagata Das 0bc0a60d30 Modifications to make sure LocalScheduler terminates cleanly without errors when SparkContext is shut down, to minimize spurious exceptions during master failure tests. 2012-12-27 15:37:33 -08:00
Tathagata Das 7c33f76291 Merge branch 'mesos' into dev-merge 2012-12-26 19:19:07 -08:00
Tathagata Das 836042bb9f Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/CartesianRDD.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/CoalescedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	core/src/main/scala/spark/rdd/FlatMappedRDD.scala
	core/src/main/scala/spark/rdd/GlommedRDD.scala
	core/src/main/scala/spark/rdd/HadoopRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
	core/src/main/scala/spark/rdd/MappedRDD.scala
	core/src/main/scala/spark/rdd/PipedRDD.scala
	core/src/main/scala/spark/rdd/SampledRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
	core/src/main/scala/spark/rdd/UnionRDD.scala
	core/src/main/scala/spark/scheduler/ResultTask.scala
	core/src/test/scala/spark/CheckpointSuite.scala
2012-12-26 19:09:01 -08:00
Matei Zaharia 84587a9bf3 Merge pull request #343 from markhamstra/spark-601
lookup() needn't fail when there is no partitioner
2012-12-24 15:28:05 -08:00
Mark Hamstra 903f3518df fall back to filter-map-collect when calling lookup() on an RDD without a partitioner 2012-12-24 13:18:45 -08:00
Matei Zaharia b575cbe069 Merge pull request #342 from markhamstra/spark-645
Allow distinct() to be called without parentheses
2012-12-24 08:04:50 -08:00
Mark Hamstra 61be8566e2 Allow distinct() to be called without parentheses when using the default number of splits. 2012-12-24 02:36:47 -08:00
Patrick Wendell bce84ceabb Minor changes after review and general cleanup.
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell 9ac4cb1c5f Adding a Twitter InputDStream with an example 2012-12-21 17:18:19 -08:00
Reynold Xin a6bb41c6d3 Updated Kryo version for Maven pom file. 2012-12-21 16:25:50 -08:00
Reynold Xin c68a076037 Updated Kryo documentation for Kryo version update. 2012-12-21 16:03:17 -08:00
Reynold Xin 60f7338092 Remove the call to close input stream in Kryo serializer. 2012-12-21 15:49:33 -08:00
Matei Zaharia 3334b7c6b5 Merge pull request #341 from rxin/4a3fb06ac2d11125feb08acbbd4df76d1e91b677
Kryo2 update against Spark master
2012-12-21 15:31:23 -08:00
Reynold Xin eac566a7f4 Merge branch 'master' of github.com:mesos/spark into dev
Conflicts:
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/PairRDDFunctions.scala
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/CartesianRDD.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/CoalescedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	core/src/main/scala/spark/rdd/FlatMappedRDD.scala
	core/src/main/scala/spark/rdd/GlommedRDD.scala
	core/src/main/scala/spark/rdd/HadoopRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
	core/src/main/scala/spark/rdd/MappedRDD.scala
	core/src/main/scala/spark/rdd/PipedRDD.scala
	core/src/main/scala/spark/rdd/SampledRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
	core/src/main/scala/spark/rdd/UnionRDD.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockManagerId.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/MetadataCleaner.scala
	core/src/main/scala/spark/util/TimeStampedHashMap.scala
	core/src/test/scala/spark/storage/BlockManagerSuite.scala
	run
2012-12-20 14:53:40 -08:00
Tathagata Das 8512dd3225 Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/test/scala/spark/CheckpointSuite.scala
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das fe777eb77d Fixed bugs in CheckpointRDD and spark.CheckpointSuite. 2012-12-20 13:39:27 -08:00
Tathagata Das f9c5b0a6fe Changed checkpoint writing and reading process. 2012-12-20 11:52:23 -08:00
Matei Zaharia 5e51b889fe Merge pull request #327 from rxin/spark-633
Added the ability in block manager to remove blocks.
2012-12-20 11:33:38 -08:00
Reynold Xin 9397c5014e Let the slave notify the master block removal. 2012-12-20 01:37:09 -08:00
Matei Zaharia e7051767f7 Merge pull request #337 from pwendell/worker-liveness-ui
SPARK-616: Logging dead workers in Web UI.
2012-12-19 15:31:32 -08:00
Reynold Xin 68c52d80ec Moved BlockManager's IdGenerator into BlockManager object. Removed some
excessive debug messages.
2012-12-19 15:27:23 -08:00
Matei Zaharia 30b47794da Merge pull request #340 from tomdz/deb-packaging-tweaks
Tweaked debian packaging to be a bit more in line with debian standards
2012-12-19 12:07:03 -08:00
Thomas Dudziak 5488ac67c3 Tweaked debian packaging to be a bit more in line with debian standards 2012-12-19 10:20:43 -08:00
Matei Zaharia 1e6e154d6d Merge pull request #338 from tomdz/repl-pom-fix
Fixed repl maven build
2012-12-18 14:03:29 -08:00
Tathagata Das 5184141936 Introduced getSplits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData. 2012-12-18 13:30:53 -08:00
Thomas Dudziak 4af6cad37a Fixed repl maven build to produce artifacts with the appropriate hadoop classifier and extracted repl fat-jar and debian packaging into a separate project to make Maven happy 2012-12-18 12:08:19 -08:00
Patrick Wendell bfac06e1f6 SPARK-616: Logging dead workers in Web UI.
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers which die and
re-register using different actor IDs.
2012-12-17 23:09:05 -08:00
Tathagata Das 72eed2b95e Converted CheckpointState in RDDCheckpointData to use scala Enumeration. 2012-12-17 18:52:43 -08:00