ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Tathagata Das	214345ceac	Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29 , along with updates to doc comments in SparkContext.checkpoint().	2013-01-19 23:50:17 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00
Stephen Haberman	4ee6b22775	Merge branch 'master' into tupleBy Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-01-08 09:10:10 -06:00
Stephen Haberman	8dc06069fe	Rename RDD.tupleBy to keyBy.	2013-01-06 15:21:45 -06:00
Stephen Haberman	1fdb6946b5	Add RDD.tupleBy.	2013-01-05 13:07:59 -06:00
Stephen Haberman	f4e6b9361f	Add RDD.collect(PartialFunction).	2013-01-05 12:14:08 -06:00
Tathagata Das	d34dba25c2	Merge branch 'mesos' into dev-merge	2013-01-01 15:48:39 -08:00
Josh Rosen	f803953998	Raise exception when hashing Java arrays (SPARK-597)	2012-12-31 20:20:11 -08:00
Tathagata Das	9e644402c1	Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.	2012-12-29 18:31:51 -08:00
Tathagata Das	7c33f76291	Merge branch 'mesos' into dev-merge	2012-12-26 19:19:07 -08:00
Tathagata Das	836042bb9f	Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/scheduler/ResultTask.scala core/src/test/scala/spark/CheckpointSuite.scala	2012-12-26 19:09:01 -08:00
Mark Hamstra	61be8566e2	Allow distinct() to be called without parentheses when using the default number of splits.	2012-12-24 02:36:47 -08:00
Reynold Xin	eac566a7f4	Merge branch 'master' of github.com:mesos/spark into dev Conflicts: core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/PairRDDFunctions.scala core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerId.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala core/src/main/scala/spark/util/MetadataCleaner.scala core/src/main/scala/spark/util/TimeStampedHashMap.scala core/src/test/scala/spark/storage/BlockManagerSuite.scala run	2012-12-20 14:53:40 -08:00
Tathagata Das	5184141936	Introduced getSpits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData.	2012-12-18 13:30:53 -08:00
Reynold Xin	4f076e105e	SPARK-635: Pass a TaskContext object to compute() interface and use that to close Hadoop input stream. Incorporated Matei's command.	2012-12-13 16:41:15 -08:00
Reynold Xin	eacb98e900	SPARK-635: Pass a TaskContext object to compute() interface and use that to close Hadoop input stream.	2012-12-13 15:41:53 -08:00
Tathagata Das	8e74fac215	Made checkpoint data in RDDs optional to further reduce serialized size.	2012-12-11 15:36:12 -08:00
Tathagata Das	746afc2e65	Bunch of bug fixes related to checkpointing in RDDs. RDDCheckpointData object is used to lock all serialization and dependency changes for checkpointing. ResultTask converted to Externalizable and serialized RDD is cached like ShuffleMapTask.	2012-12-10 23:36:37 -08:00
Tathagata Das	21a0852976	Refactored RDD checkpointing to minimize extra fields in RDD class.	2012-12-04 22:10:25 -08:00
Tathagata Das	b4dba55f78	Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.	2012-12-02 02:03:05 +00:00
Matei Zaharia	f86960cba9	Merge pull request #313 from rxin/pde_size_compress Added a partition preserving flag to MapPartitionsWithSplitRDD.	2012-11-27 22:39:25 -08:00
Matei Zaharia	27e43abd19	Added a zip() operation for RDDs with the same shape (number of partitions and number of elements in each partition)	2012-11-27 22:27:47 -08:00
Reynold Xin	bd6dd1a3a6	Added a partition preserving flag to MapPartitionsWithSplitRDD.	2012-11-27 19:43:30 -08:00
Tathagata Das	c97ebf6437	Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.	2012-11-19 23:22:07 +00:00
Tathagata Das	8a25d530ed	Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.	2012-11-13 02:16:28 -08:00
Tathagata Das	d154238789	Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.	2012-11-04 12:12:06 -08:00
Tathagata Das	34e569f40e	Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that jobs running on an RDD on which checkpointing is in progress does hurt the result of the job.	2012-10-31 00:56:40 -07:00
Tathagata Das	0dcd770fdc	Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.	2012-10-30 16:09:37 -07:00
Tathagata Das	ac12abc17f	Modified RDD API to make dependencies a var (therefore can be changed to checkpointed hadoop rdd) and othere references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more).	2012-10-29 11:55:27 -07:00
Josh Rosen	4775c55641	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00
Matei Zaharia	b4067cbad4	More doc updates, and moved Serializer to a subpackage.	2012-10-12 18:19:21 -07:00
Matei Zaharia	dca496bb77	Document cartesian() operation	2012-10-12 14:46:41 -07:00
Patrick Wendell	dc8adbd359	Adding Java documentation	2012-10-11 00:49:03 -07:00
Matei Zaharia	ee2fcb2ce6	Added documentation to all the *RDDFunction classes, and moved them into the spark package to make them more visible. Also documented various other miscellaneous things in the API.	2012-10-09 18:38:36 -07:00
Andy Konwinski	1d79ff6028	Fixes a typo, adds scaladoc comments to SparkContext constructors.	2012-10-08 22:49:17 -07:00
Patrick Wendell	ac310098ef	More docs in RDD class	2012-10-08 22:25:11 -07:00
Andy Konwinski	bd688940a1	A start on scaladoc for the public APIs.	2012-10-08 21:13:29 -07:00
Matei Zaharia	65113b7e1b	Only group elements ten at a time into SequenceFile records in saveAsObjectFile	2012-10-06 17:14:41 -07:00
Andy Konwinski	a242cdd0a6	Factor subclasses of RDD out of RDD.scala into their own classes in the rdd package.	2012-10-05 19:53:54 -07:00
Andy Konwinski	e0067da082	Moves all files in core/src/main/scala/ that have RDD in them from package spark to package spark.rdd and updates all references to them.	2012-10-05 19:23:45 -07:00
Matei Zaharia	8c82f43db3	Scaladoc documentation for some core Spark functionality	2012-10-04 22:59:36 -07:00
Matei Zaharia	6cf5dffc72	Make more stuff private[spark]	2012-10-02 22:28:55 -07:00
Matei Zaharia	802aa8aef9	Some bug fixes and logging fixes for broadcast.	2012-10-01 15:20:42 -07:00
Matei Zaharia	53f90d0f0e	Use underscores instead of colons in RDD IDs	2012-10-01 10:48:53 -07:00
Matei Zaharia	2f11e3c285	Merge pull request #227 from JoshRosen/fix/distinct_numsplits Allow controlling number of splits in distinct().	2012-09-28 23:57:24 -07:00
Josh Rosen	8654165e69	Use null as dummy value in distinct().	2012-09-28 23:55:17 -07:00
Josh Rosen	37c199bbb0	Allow controlling number of splits in distinct().	2012-09-28 23:44:19 -07:00
Matei Zaharia	3d7267999d	Print and track user call sites in more places in Spark	2012-09-28 17:42:00 -07:00
Matei Zaharia	9f6efbf06a	Merge pull request #225 from pwendell/dev Log message which records RDD origin	2012-09-28 16:28:07 -07:00
Patrick Wendell	9fc78f8f29	Fixing some whitespace issues	2012-09-28 16:05:50 -07:00

121 commits