214345ceac | Tathagata Das | 2013-01-19 23:50:17 -08:00
  Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29, along with updates to doc comments in SparkContext.checkpoint().

cd1521cfdb | Tathagata Das | 2013-01-15 12:08:51 -08:00
  Merge branch 'master' into streaming
  Conflicts:
    core/src/main/scala/spark/rdd/CoGroupedRDD.scala
    core/src/main/scala/spark/rdd/FilteredRDD.scala
    docs/_layouts/global.html
    docs/index.md
    run

4ee6b22775 | Stephen Haberman | 2013-01-08 09:10:10 -06:00
  Merge branch 'master' into tupleBy
  Conflicts:
    core/src/test/scala/spark/RDDSuite.scala

8dc06069fe | Stephen Haberman | 2013-01-06 15:21:45 -06:00
  Rename RDD.tupleBy to keyBy.

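The tupleBy-to-keyBy rename above concerns a small RDD helper. As a minimal sketch of its semantics (using a plain Scala Seq as a stand-in for an RDD, which is an assumption for illustration, not Spark's implementation), keyBy pairs each element with a computed key:

```scala
// Sketch of RDD.keyBy semantics on a plain Scala Seq (an RDD stand-in):
// pair every element x with the key f(x), producing (key, value) tuples
// that key-based operations such as groupByKey can then consume.
def keyBy[T, K](xs: Seq[T])(f: T => K): Seq[(K, T)] =
  xs.map(x => (f(x), x))

val words = Seq("apple", "banana", "cherry")
println(keyBy(words)(_.length))  // List((5,apple), (6,banana), (6,cherry))
```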
1fdb6946b5 | Stephen Haberman | 2013-01-05 13:07:59 -06:00
  Add RDD.tupleBy.

f4e6b9361f | Stephen Haberman | 2013-01-05 12:14:08 -06:00
  Add RDD.collect(PartialFunction).

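RDD.collect(PartialFunction) mirrors the collect method already available on standard Scala collections; the plain-Seq example below shows only those shared semantics, not the RDD code itself:

```scala
// collect applies a PartialFunction in a single pass: elements for which
// the function is not defined are dropped; matching elements are transformed.
val mixed: Seq[Any] = Seq(1, "two", 3, "four")
val doubled = mixed.collect { case n: Int => n * 2 }
println(doubled)  // List(2, 6)
```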
d34dba25c2 | Tathagata Das | 2013-01-01 15:48:39 -08:00
  Merge branch 'mesos' into dev-merge

f803953998 | Josh Rosen | 2012-12-31 20:20:11 -08:00
  Raise exception when hashing Java arrays (SPARK-597)

9e644402c1 | Tathagata Das | 2012-12-29 18:31:51 -08:00
  Improved Jekyll and Scala docs. Made many classes and methods private to remove them from the Scala docs.

7c33f76291 | Tathagata Das | 2012-12-26 19:19:07 -08:00
  Merge branch 'mesos' into dev-merge

836042bb9f | Tathagata Das | 2012-12-26 19:09:01 -08:00
  Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge
  Conflicts:
    core/src/main/scala/spark/ParallelCollection.scala
    core/src/main/scala/spark/RDD.scala
    core/src/main/scala/spark/rdd/BlockRDD.scala
    core/src/main/scala/spark/rdd/CartesianRDD.scala
    core/src/main/scala/spark/rdd/CoGroupedRDD.scala
    core/src/main/scala/spark/rdd/CoalescedRDD.scala
    core/src/main/scala/spark/rdd/FilteredRDD.scala
    core/src/main/scala/spark/rdd/FlatMappedRDD.scala
    core/src/main/scala/spark/rdd/GlommedRDD.scala
    core/src/main/scala/spark/rdd/HadoopRDD.scala
    core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
    core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
    core/src/main/scala/spark/rdd/MappedRDD.scala
    core/src/main/scala/spark/rdd/PipedRDD.scala
    core/src/main/scala/spark/rdd/SampledRDD.scala
    core/src/main/scala/spark/rdd/ShuffledRDD.scala
    core/src/main/scala/spark/rdd/UnionRDD.scala
    core/src/main/scala/spark/scheduler/ResultTask.scala
    core/src/test/scala/spark/CheckpointSuite.scala

61be8566e2 | Mark Hamstra | 2012-12-24 02:36:47 -08:00
  Allow distinct() to be called without parentheses when using the default number of splits.

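The parentheses fix above hinges on a Scala language detail: a method declared with a parameter list (even one whose argument is defaulted) cannot be invoked as bare `rdd.distinct`; a separate parameterless overload is needed. The sketch below uses a made-up stand-in class (not Spark's RDD, and the default of 8 splits is an assumption for illustration):

```scala
// Minimal stand-in showing why a no-paren overload is needed: only a
// method declared without a parameter list can be called without
// parentheses in Scala.
class FakeRDD(xs: Seq[Int]) {
  def distinct(numSplits: Int): Seq[Int] = xs.distinct  // numSplits unused here
  def distinct: Seq[Int] = distinct(8)  // delegates with an assumed default
}

val r = new FakeRDD(Seq(1, 2, 2, 3))
println(r.distinct)     // List(1, 2, 3) -- no parentheses needed
println(r.distinct(4))  // List(1, 2, 3)
```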
eac566a7f4 | Reynold Xin | 2012-12-20 14:53:40 -08:00
  Merge branch 'master' of github.com:mesos/spark into dev
  Conflicts:
    core/src/main/scala/spark/MapOutputTracker.scala
    core/src/main/scala/spark/PairRDDFunctions.scala
    core/src/main/scala/spark/ParallelCollection.scala
    core/src/main/scala/spark/RDD.scala
    core/src/main/scala/spark/rdd/BlockRDD.scala
    core/src/main/scala/spark/rdd/CartesianRDD.scala
    core/src/main/scala/spark/rdd/CoGroupedRDD.scala
    core/src/main/scala/spark/rdd/CoalescedRDD.scala
    core/src/main/scala/spark/rdd/FilteredRDD.scala
    core/src/main/scala/spark/rdd/FlatMappedRDD.scala
    core/src/main/scala/spark/rdd/GlommedRDD.scala
    core/src/main/scala/spark/rdd/HadoopRDD.scala
    core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
    core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
    core/src/main/scala/spark/rdd/MappedRDD.scala
    core/src/main/scala/spark/rdd/PipedRDD.scala
    core/src/main/scala/spark/rdd/SampledRDD.scala
    core/src/main/scala/spark/rdd/ShuffledRDD.scala
    core/src/main/scala/spark/rdd/UnionRDD.scala
    core/src/main/scala/spark/storage/BlockManager.scala
    core/src/main/scala/spark/storage/BlockManagerId.scala
    core/src/main/scala/spark/storage/BlockManagerMaster.scala
    core/src/main/scala/spark/storage/StorageLevel.scala
    core/src/main/scala/spark/util/MetadataCleaner.scala
    core/src/main/scala/spark/util/TimeStampedHashMap.scala
    core/src/test/scala/spark/storage/BlockManagerSuite.scala
    run

5184141936 | Tathagata Das | 2012-12-18 13:30:53 -08:00
  Introduced getSplits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData.

4f076e105e | Reynold Xin | 2012-12-13 16:41:15 -08:00
  SPARK-635: Pass a TaskContext object to the compute() interface and use that to close the Hadoop input stream. Incorporated Matei's comments.

eacb98e900 | Reynold Xin | 2012-12-13 15:41:53 -08:00
  SPARK-635: Pass a TaskContext object to the compute() interface and use that to close the Hadoop input stream.

8e74fac215 | Tathagata Das | 2012-12-11 15:36:12 -08:00
  Made checkpoint data in RDDs optional to further reduce serialized size.

746afc2e65 | Tathagata Das | 2012-12-10 23:36:37 -08:00
  Bunch of bug fixes related to checkpointing in RDDs. The RDDCheckpointData object is used to lock all serialization and dependency changes for checkpointing. ResultTask converted to Externalizable, and the serialized RDD is cached like ShuffleMapTask.

21a0852976 | Tathagata Das | 2012-12-04 22:10:25 -08:00
  Refactored RDD checkpointing to minimize extra fields in the RDD class.

b4dba55f78 | Tathagata Das | 2012-12-02 02:03:05 +00:00
  Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.

f86960cba9 | Matei Zaharia | 2012-11-27 22:39:25 -08:00
  Merge pull request #313 from rxin/pde_size_compress
  Added a partition preserving flag to MapPartitionsWithSplitRDD.

27e43abd19 | Matei Zaharia | 2012-11-27 22:27:47 -08:00
  Added a zip() operation for RDDs with the same shape (number of partitions and number of elements in each partition)

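A plain-Seq illustration of the zip semantics introduced above (Spark's RDD.zip additionally enforces the same-shape requirement; the Seq version here only shows the positional pairing):

```scala
// zip pairs the i-th element of one collection with the i-th of the other.
// For RDDs, both sides must have the same number of partitions and the same
// number of elements per partition so this pairing is well defined.
val ids   = Seq(1, 2, 3)
val names = Seq("a", "b", "c")
println(ids.zip(names))  // List((1,a), (2,b), (3,c))
```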
bd6dd1a3a6 | Reynold Xin | 2012-11-27 19:43:30 -08:00
  Added a partition preserving flag to MapPartitionsWithSplitRDD.

c97ebf6437 | Tathagata Das | 2012-11-19 23:22:07 +00:00
  Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.

8a25d530ed | Tathagata Das | 2012-11-13 02:16:28 -08:00
  Optimized checkpoint writing by reusing the FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and were therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.

d154238789 | Tathagata Das | 2012-11-04 12:12:06 -08:00
  Made checkpointing of the DStream graph work with checkpointing of RDDs. For streams requiring checkpointing of their RDDs, the default checkpoint interval is set to 10 seconds.

34e569f40e | Tathagata Das | 2012-10-31 00:56:40 -07:00
  Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added tests to ensure that jobs running on an RDD while checkpointing is in progress do not hurt the result of the job.

0dcd770fdc | Tathagata Das | 2012-10-30 16:09:37 -07:00
  Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.

ac12abc17f | Tathagata Das | 2012-10-29 11:55:27 -07:00
  Modified the RDD API to make dependencies a var (so they can be changed to a checkpointed Hadoop RDD) and other references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies no longer refer to them).

4775c55641 | Josh Rosen | 2012-10-13 14:59:20 -07:00
  Change ShuffleFetcher to return an Iterator.

b4067cbad4 | Matei Zaharia | 2012-10-12 18:19:21 -07:00
  More doc updates, and moved Serializer to a subpackage.

dca496bb77 | Matei Zaharia | 2012-10-12 14:46:41 -07:00
  Document cartesian() operation

dc8adbd359 | Patrick Wendell | 2012-10-11 00:49:03 -07:00
  Adding Java documentation

ee2fcb2ce6 | Matei Zaharia | 2012-10-09 18:38:36 -07:00
  Added documentation to all the *RDDFunction classes, and moved them into the spark package to make them more visible. Also documented various other miscellaneous things in the API.

1d79ff6028 | Andy Konwinski | 2012-10-08 22:49:17 -07:00
  Fixes a typo, adds scaladoc comments to SparkContext constructors.

ac310098ef | Patrick Wendell | 2012-10-08 22:25:11 -07:00
  More docs in RDD class

bd688940a1 | Andy Konwinski | 2012-10-08 21:13:29 -07:00
  A start on scaladoc for the public APIs.

65113b7e1b | Matei Zaharia | 2012-10-06 17:14:41 -07:00
  Only group elements ten at a time into SequenceFile records in saveAsObjectFile

a242cdd0a6 | Andy Konwinski | 2012-10-05 19:53:54 -07:00
  Factor subclasses of RDD out of RDD.scala into their own classes in the rdd package.

e0067da082 | Andy Konwinski | 2012-10-05 19:23:45 -07:00
  Moves all files in core/src/main/scala/ that have RDD in them from package spark to package spark.rdd and updates all references to them.

8c82f43db3 | Matei Zaharia | 2012-10-04 22:59:36 -07:00
  Scaladoc documentation for some core Spark functionality

6cf5dffc72 | Matei Zaharia | 2012-10-02 22:28:55 -07:00
  Make more stuff private[spark]

802aa8aef9 | Matei Zaharia | 2012-10-01 15:20:42 -07:00
  Some bug fixes and logging fixes for broadcast.

53f90d0f0e | Matei Zaharia | 2012-10-01 10:48:53 -07:00
  Use underscores instead of colons in RDD IDs

2f11e3c285 | Matei Zaharia | 2012-09-28 23:57:24 -07:00
  Merge pull request #227 from JoshRosen/fix/distinct_numsplits
  Allow controlling number of splits in distinct().

8654165e69 | Josh Rosen | 2012-09-28 23:55:17 -07:00
  Use null as dummy value in distinct().

37c199bbb0 | Josh Rosen | 2012-09-28 23:44:19 -07:00
  Allow controlling number of splits in distinct().

3d7267999d | Matei Zaharia | 2012-09-28 17:42:00 -07:00
  Print and track user call sites in more places in Spark

9f6efbf06a | Matei Zaharia | 2012-09-28 16:28:07 -07:00
  Merge pull request #225 from pwendell/dev
  Log message which records RDD origin

9fc78f8f29 | Patrick Wendell | 2012-09-28 16:05:50 -07:00
  Fixing some whitespace issues