Josh Rosen
85b8f2c64f
Add epydoc API documentation for PySpark.
2012-12-27 18:04:10 -08:00
Tathagata Das
0bc0a60d30
Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests.
2012-12-27 15:37:33 -08:00
Josh Rosen
2d98fff065
Add IPython support to pyspark-shell.
...
Suggested by / based on code from @MLnick
2012-12-27 10:17:36 -08:00
Tathagata Das
7c33f76291
Merge branch 'mesos' into dev-merge
2012-12-26 19:19:07 -08:00
Tathagata Das
836042bb9f
Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge
...
Conflicts:
core/src/main/scala/spark/ParallelCollection.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/CartesianRDD.scala
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/CoalescedRDD.scala
core/src/main/scala/spark/rdd/FilteredRDD.scala
core/src/main/scala/spark/rdd/FlatMappedRDD.scala
core/src/main/scala/spark/rdd/GlommedRDD.scala
core/src/main/scala/spark/rdd/HadoopRDD.scala
core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
core/src/main/scala/spark/rdd/MappedRDD.scala
core/src/main/scala/spark/rdd/PipedRDD.scala
core/src/main/scala/spark/rdd/SampledRDD.scala
core/src/main/scala/spark/rdd/ShuffledRDD.scala
core/src/main/scala/spark/rdd/UnionRDD.scala
core/src/main/scala/spark/scheduler/ResultTask.scala
core/src/test/scala/spark/CheckpointSuite.scala
2012-12-26 19:09:01 -08:00
Josh Rosen
1dca0c5180
Remove debug output from PythonPartitioner.
2012-12-26 18:23:06 -08:00
Josh Rosen
e2dad15621
Add support for batched serialization of Python objects in PySpark.
2012-12-26 18:16:09 -08:00
Josh Rosen
4608902fb8
Use filesystem to collect RDDs in PySpark.
...
Passing large volumes of data through Py4J seems
to be slow. It appears to be faster to write the
data to the local filesystem and read it back from
Python.
2012-12-24 17:20:10 -08:00
Matei Zaharia
84587a9bf3
Merge pull request #343 from markhamstra/spark-601
...
lookup() needn't fail when there is no partitioner
2012-12-24 15:28:05 -08:00
Josh Rosen
ccd075cf96
Reduce object overhead in Pyspark shuffle and collect
2012-12-24 15:01:13 -08:00
Mark Hamstra
903f3518df
fall back to filter-map-collect when calling lookup() on an RDD without a partitioner
2012-12-24 13:18:45 -08:00
Matei Zaharia
b575cbe069
Merge pull request #342 from markhamstra/spark-645
...
Allow distinct() to be called without parentheses
2012-12-24 08:04:50 -08:00
Mark Hamstra
61be8566e2
Allow distinct() to be called without parentheses when using the default number of splits.
2012-12-24 02:36:47 -08:00
Patrick Wendell
bce84ceabb
Minor changes after review and general cleanup.
...
- Added filters to Twitter example
- Removed un-used import
- Some code clean-up
2012-12-21 20:57:46 -08:00
Patrick Wendell
9ac4cb1c5f
Adding a Twitter InputDStream with an example
2012-12-21 17:18:19 -08:00
Reynold Xin
a6bb41c6d3
Updated Kryo version for Maven pom file.
2012-12-21 16:25:50 -08:00
Reynold Xin
c68a076037
Updated Kryo documentation for Kryo version update.
2012-12-21 16:03:17 -08:00
Reynold Xin
60f7338092
Remove the call to close input stream in Kryo serializer.
2012-12-21 15:49:33 -08:00
Matei Zaharia
3334b7c6b5
Merge pull request #341 from rxin/4a3fb06ac2d11125feb08acbbd4df76d1e91b677
...
Kryo2 update against Spark master
2012-12-21 15:31:23 -08:00
Reynold Xin
eac566a7f4
Merge branch 'master' of github.com:mesos/spark into dev
...
Conflicts:
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/PairRDDFunctions.scala
core/src/main/scala/spark/ParallelCollection.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/CartesianRDD.scala
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/CoalescedRDD.scala
core/src/main/scala/spark/rdd/FilteredRDD.scala
core/src/main/scala/spark/rdd/FlatMappedRDD.scala
core/src/main/scala/spark/rdd/GlommedRDD.scala
core/src/main/scala/spark/rdd/HadoopRDD.scala
core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
core/src/main/scala/spark/rdd/MappedRDD.scala
core/src/main/scala/spark/rdd/PipedRDD.scala
core/src/main/scala/spark/rdd/SampledRDD.scala
core/src/main/scala/spark/rdd/ShuffledRDD.scala
core/src/main/scala/spark/rdd/UnionRDD.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockManagerId.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/StorageLevel.scala
core/src/main/scala/spark/util/MetadataCleaner.scala
core/src/main/scala/spark/util/TimeStampedHashMap.scala
core/src/test/scala/spark/storage/BlockManagerSuite.scala
run
2012-12-20 14:53:40 -08:00
Tathagata Das
8512dd3225
Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
...
Conflicts:
core/src/main/scala/spark/ParallelCollection.scala
core/src/test/scala/spark/CheckpointSuite.scala
streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das
fe777eb77d
Fixed bugs in CheckpointRDD and spark.CheckpointSuite.
2012-12-20 13:39:27 -08:00
Tathagata Das
f9c5b0a6fe
Changed checkpoint writing and reading process.
2012-12-20 11:52:23 -08:00
Matei Zaharia
5e51b889fe
Merge pull request #327 from rxin/spark-633
...
Added the ability in block manager to remove blocks.
2012-12-20 11:33:38 -08:00
Reynold Xin
9397c5014e
Let the slave notify the master block removal.
2012-12-20 01:37:09 -08:00
Matei Zaharia
e7051767f7
Merge pull request #337 from pwendell/worker-liveness-ui
...
SPARK-616: Logging dead workers in Web UI.
2012-12-19 15:31:32 -08:00
Reynold Xin
68c52d80ec
Moved BlockManager's IdGenerator into BlockManager object. Removed some
...
excessive debug messages.
2012-12-19 15:27:23 -08:00
Matei Zaharia
30b47794da
Merge pull request #340 from tomdz/deb-packaging-tweaks
...
Tweaked debian packaging to be a bit more in line with debian standards
2012-12-19 12:07:03 -08:00
Thomas Dudziak
5488ac67c3
Tweaked debian packaging to be a bit more in line with debian standards
2012-12-19 10:20:43 -08:00
Matei Zaharia
1e6e154d6d
Merge pull request #338 from tomdz/repl-pom-fix
...
Fixed repl maven build
2012-12-18 14:03:29 -08:00
Tathagata Das
5184141936
Introduced getSpits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData.
2012-12-18 13:30:53 -08:00
Thomas Dudziak
4af6cad37a
Fixed repl maven build to produce artifacts with the appropriate hadoop classifier and extracted repl fat-jar and debian packaging into a separate project to make Maven happy
2012-12-18 12:08:19 -08:00
Patrick Wendell
bfac06e1f6
SPARK-616: Logging dead workers in Web UI.
...
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers which die and
re-register using different actor ID's.
2012-12-17 23:09:05 -08:00
Tathagata Das
72eed2b95e
Converted CheckpointState in RDDCheckpointData to use scala Enumeration.
2012-12-17 18:52:43 -08:00
Matei Zaharia
b82a6dd2c7
Merge pull request #332 from JoshRosen/spark-607
...
Add try-finally to handle MapOutputTracker timeouts
2012-12-14 11:41:16 -08:00
Reynold Xin
06f855c24d
Merge branch 'spark-633' of github.com:rxin/spark into spark-633
2012-12-14 00:27:24 -08:00
Reynold Xin
8c01295b85
Fixed conflicts from merging Charles' and TD's block manager changes.
2012-12-14 00:26:36 -08:00
Matei Zaharia
1072f970cc
Merge pull request #331 from woggling/deploy-exit-status
...
Have standalone cluster report exit codes to clients
2012-12-13 22:43:48 -08:00
Charles Reiss
c528932a41
Code review cleanup.
2012-12-13 22:37:16 -08:00
Charles Reiss
0aad42b5e7
Have standalone cluster report exit codes to clients. Addresses SPARK-639.
2012-12-13 22:37:16 -08:00
Reynold Xin
0235667f73
Merge branch 'master' of github.com:mesos/spark into spark-633
2012-12-13 22:33:41 -08:00
Reynold Xin
97434f49b8
Merged TD's block manager refactoring.
2012-12-13 22:32:19 -08:00
Matei Zaharia
d6d910471d
Merge pull request #333 from rxin/master
...
Fixed the broken Java unit test from SPARK-635.
2012-12-13 22:31:53 -08:00
Reynold Xin
f4a9e1b9be
Fixed the broken Java unit test from SPARK-635.
2012-12-13 22:22:12 -08:00
Reynold Xin
41e58a519a
Merge branch 'master' of github.com:mesos/spark into spark-633
2012-12-13 22:06:47 -08:00
Josh Rosen
cf52d9cade
Add try-finally to handle MapOutputTracker timeouts.
2012-12-13 21:53:30 -08:00
Matei Zaharia
05e225f988
Merge pull request #329 from woggling/executor-status-codes
...
Executor exit status codes
2012-12-13 20:14:10 -08:00
Charles Reiss
b054d3b222
ExecutorLostReason -> ExecutorLossReason
2012-12-13 18:44:07 -08:00
Charles Reiss
24d7aa2d15
Extra whitespace in ExecutorExitCode
2012-12-13 18:39:23 -08:00
Matei Zaharia
012aa4e3d4
Merge pull request #330 from JoshRosen/spark-638
...
Use spark-env.sh to configure standalone master
2012-12-13 18:05:32 -08:00