spark-instrumented-optimizer/project
Josh Rosen b5f1ab701a [SPARK-13990] Automatically pick serializer when caching RDDs
Building on the `SerializerManager` introduced in SPARK-13926 / #11755, this patch modifies Spark's BlockManager to use RDDs' ClassTags to select the best serializer when caching RDD blocks.
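
As a rough illustration of ClassTag-driven selection, the sketch below assumes a rule along these lines: use Kryo for primitives, primitive arrays and `String`, and use the default serializer for everything else. `SerializerPicker`, `SerializerChoice`, `KryoChoice` and `DefaultChoice` are hypothetical names. They are not the `SerializerManager` API.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-ins for the two serializers; only the selection logic is modeled.
sealed trait SerializerChoice
case object KryoChoice extends SerializerChoice
case object DefaultChoice extends SerializerChoice

object SerializerPicker {
  // Assumed "safe for Kryo" set: primitive types...
  private val primitiveTags: Set[ClassTag[_]] = Set[ClassTag[_]](
    ClassTag.Boolean, ClassTag.Byte, ClassTag.Char, ClassTag.Short,
    ClassTag.Int, ClassTag.Long, ClassTag.Float, ClassTag.Double)

  // ...arrays of those primitives (ClassTag#wrap gives the tag of Array[T])...
  private val primitiveArrayTags: Set[ClassTag[_]] =
    primitiveTags.map(t => t.wrap: ClassTag[_])

  // ...and String.
  private val stringTag: ClassTag[_] = implicitly[ClassTag[String]]

  private val kryoSafeTags: Set[ClassTag[_]] =
    primitiveTags ++ primitiveArrayTags + stringTag

  // ClassTags compare by runtime class, so tags obtained anywhere match this set.
  def canUseKryo(ct: ClassTag[_]): Boolean = kryoSafeTags.contains(ct)

  // Everything else, including ClassTag.Any, falls back to the default serializer.
  def pick(ct: ClassTag[_]): SerializerChoice =
    if (canUseKryo(ct)) KryoChoice else DefaultChoice
}
```

With this rule, `SerializerPicker.pick(ClassTag.Int)` returns `KryoChoice`. `SerializerPicker.pick(ClassTag.Any)` returns `DefaultChoice`. Falling back to the default serializer keeps unknown types correct at the cost of speed.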

When storing a local block, the BlockManager `put()` methods use implicits to record ClassTags and store those tags in the blocks' BlockInfo records. When reading a local block, the stored ClassTag is used to pick the appropriate serializer. When a block is stored with replication, the ClassTag is written into the block transfer metadata and is also stored in the remote BlockManager.
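
Here is a minimal sketch of the local put/get flow described above. It assumes a toy in-memory store. `ToyBlockStore` and `ToyBlockInfo` are hypothetical stand-ins for BlockManager and BlockInfo, and the Kryo/Java rule in `serializerFor` is an assumption. The sketch shows how a `T: ClassTag` context bound lets `put()` capture the tag implicitly. It also shows how a later read picks a serializer from the stored tag alone. Replication is not modeled.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical, simplified stand-in for BlockInfo: only the ClassTag is modeled.
final case class ToyBlockInfo(classTag: ClassTag[_])

// Hypothetical in-memory block store; not Spark's BlockManager API.
class ToyBlockStore {
  private val infos = mutable.HashMap.empty[String, ToyBlockInfo]
  private val blocks = mutable.HashMap.empty[String, Seq[Any]]

  // Assumed selection rule: Kryo for primitives and String, Java otherwise.
  private def serializerFor(ct: ClassTag[_]): String =
    if (ct.runtimeClass.isPrimitive || ct.runtimeClass == classOf[String]) "kryo"
    else "java"

  // The context bound `T: ClassTag` makes the compiler supply the ClassTag
  // implicitly at each call site; put() records it next to the block.
  def put[T: ClassTag](blockId: String, values: Seq[T]): Unit = {
    infos(blockId) = ToyBlockInfo(implicitly[ClassTag[T]])
    blocks(blockId) = values
  }

  // On read, the serializer comes from the ClassTag recorded by put(),
  // so the reader does not need to know the element type.
  def get(blockId: String): Option[(Seq[Any], String)] =
    for {
      info <- infos.get(blockId)
      values <- blocks.get(blockId)
    } yield (values, serializerFor(info.classTag))
}
```

For example, `store.put("rdd_0_0", Seq(1, 2, 3))` records `ClassTag.Int`, so `store.get("rdd_0_0")` reports `"kryo"`. A block put as `Seq[Any]` records `ClassTag.Any` and gets `"java"`.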

There are two or three places, including TorrentBroadcast and BlockRDD, where we don't properly pass ClassTags. I think this happens to work because the missing ClassTag is always `ClassTag.Any`, but it may be worth looking more carefully at those places to see whether we should be more explicit.
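
A ClassTag can silently end up as `ClassTag.Any` when the value's static type is `Any`. The compiler derives the tag from the static type, not the runtime class. `ClassTagInference.tagOf` is a hypothetical helper used only to illustrate this.

```scala
import scala.reflect.ClassTag

object ClassTagInference {
  // Hypothetical helper: returns whatever ClassTag the compiler supplies for T.
  def tagOf[T: ClassTag](value: T): ClassTag[T] = implicitly[ClassTag[T]]
}
```

Here `ClassTagInference.tagOf(42)` yields `ClassTag.Int`. By contrast, `tagOf(x)` for `val x: Any = "text"` yields `ClassTag.Any`, even though the runtime value is a `String`. A selection rule keyed on the tag would then fall back to its default serializer.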

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11801 from JoshRosen/pick-best-serializer-for-caching.
2016-03-21 17:19:39 -07:00
project [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only. 2015-11-30 09:30:58 +00:00
build.properties [SPARK-13834][BUILD] Update sbt and sbt plugins for 2.x. 2016-03-13 18:47:04 -07:00
MimaBuild.scala [SPARK-13948] MiMa check should catch if the visibility changes to private 2016-03-16 23:02:25 -07:00
MimaExcludes.scala [SPARK-13990] Automatically pick serializer when caching RDDs 2016-03-21 17:19:39 -07:00
plugins.sbt [SPARK-13834][BUILD] Update sbt and sbt plugins for 2.x. 2016-03-13 18:47:04 -07:00
SparkBuild.scala [SPARK-13948] MiMa check should catch if the visibility changes to private 2016-03-16 23:02:25 -07:00