spark-instrumented-optimizer


Matei Zaharia 64ba6a8c2c Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
api	Simplify checkpointing code and RDD class a little:	2013-01-28 22:30:12 -08:00
broadcast	Merge branch 'master' into streaming	2013-01-20 12:47:55 -08:00
deploy	Clean up BlockManagerUI a little (make it not be an object, merge with	2013-01-27 23:56:14 -08:00
executor	Rename more things from slave to executor	2013-01-27 23:17:20 -08:00
network	Refactor daemon thread pool creation.	2013-01-21 23:31:00 -08:00
partial	Make more stuff private[spark]	2012-10-02 22:28:55 -07:00
rdd	Simplify checkpointing code and RDD class a little:	2013-01-28 22:30:12 -08:00
scheduler	Merge pull request #413 from pwendell/stage-logging	2013-01-28 22:01:52 -08:00
serializer	More doc updates, and moved Serializer to a subpackage.	2012-10-12 18:19:21 -07:00
storage	Some DEBUG-level log cleanup.	2013-01-28 20:29:35 -08:00
util	Simplify checkpointing code and RDD class a little:	2013-01-28 22:30:12 -08:00
Accumulators.scala	Minor cleanup.	2013-01-21 15:55:46 -06:00
Aggregator.scala	Remove mapSideCombine field from Aggregator.	2012-10-13 14:59:20 -07:00
BlockStoreShuffleFetcher.scala	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00
Cache.scala	Make classes package private	2012-10-02 19:00:19 -07:00
CacheManager.scala	Simplify checkpointing code and RDD class a little:	2013-01-28 22:30:12 -08:00
ClosureCleaner.scala	Make classes package private	2012-10-02 19:00:19 -07:00
Dependency.scala	Updated PruneDependency to change "split" to "partition".	2013-01-23 22:22:03 -08:00
DoubleRDDFunctions.scala	Added documentation to all the *RDDFunction classes, and moved them into	2012-10-09 18:38:36 -07:00
FetchFailedException.scala	Make classes package private	2012-10-02 19:00:19 -07:00
HadoopWriter.scala	Support for Hadoop 2 distributions such as cdh4	2012-10-18 16:08:54 -07:00
HttpFileServer.scala	Don't download files to master's working directory.	2013-01-21 17:34:17 -08:00
HttpServer.scala	Fix for hanging spark.HttpFileServer with kind of virtual network	2013-01-22 23:08:34 +09:00
JavaSerializer.scala	More doc updates, and moved Serializer to a subpackage.	2012-10-12 18:19:21 -07:00
KryoSerializer.scala	Fix compile error due to cherry-pick	2013-01-23 13:07:27 -08:00
Logging.scala	Minor cleanup.	2013-01-21 15:55:46 -06:00
MapOutputTracker.scala	Track workers by executor ID instead of hostname to allow multiple	2013-01-27 19:23:49 -08:00
package.scala	Scaladoc documentation for some core Spark functionality	2012-10-04 22:59:36 -07:00
PairRDDFunctions.scala	Simplify checkpointing code and RDD class a little:	2013-01-28 22:30:12 -08:00
ParallelCollection.scala	Further simplify getOrElse call.	2013-01-21 21:30:24 -06:00
Partitioner.scala	Raise exception when hashing Java arrays (SPARK-597)	2012-12-31 20:20:11 -08:00
RDD.scala	Simplify checkpointing code and RDD class a little:	2013-01-28 22:30:12 -08:00
RDDCheckpointData.scala	Simplify checkpointing code and RDD class a little:	2013-01-28 22:30:12 -08:00
SequenceFileRDDFunctions.scala	Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also	2013-01-07 15:57:33 -08:00
SerializableWritable.scala	Fix issue #65 : Change @serializable to extends Serializable in 2.9 branch	2011-08-02 10:16:33 +01:00
ShuffleFetcher.scala	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00
SizeEstimator.scala	Remove dependencies on sun jvm classes. Instead use reflection to infer	2013-01-07 15:57:18 -08:00
SoftReferenceCache.scala	Make classes package private	2012-10-02 19:00:19 -07:00
SparkContext.scala	add long and float accumulatorparams	2013-01-28 20:23:11 -08:00
SparkEnv.scala	Track workers by executor ID instead of hostname to allow multiple	2013-01-27 19:23:49 -08:00
SparkException.scala	Upgraded to Akka 2 and fixed test execution (which was still parallel	2012-06-28 23:51:28 -07:00
SparkFiles.java	Allow PySpark's SparkFiles to be used from driver	2013-01-23 10:58:50 -08:00
Split.scala	Various code style fixes, mostly from IntelliJ IDEA	2012-06-29 18:47:12 -07:00
TaskContext.scala	Minor cleanup.	2013-01-21 15:55:46 -06:00
TaskEndReason.scala	Stylistic changes and Public Accumulable and Broadcast	2012-10-02 19:28:37 -07:00
TaskState.scala	Make classes package private	2012-10-02 19:00:19 -07:00
Utils.scala	Clean up BlockManagerUI a little (make it not be an object, merge with	2013-01-27 23:56:14 -08:00