13261 commits

Author SHA1 Message Date
Matei Zaharia b4a2214218 More fault tolerance fixes to catch lost tasks 2012-08-27 22:49:29 -07:00
Matei Zaharia 291abc2c28 Merge pull request #181 from rxin/dev
Removed the deserialization cache for ShuffleMapTask
2012-08-27 22:38:22 -07:00
Reynold Xin 3a6a95dc24 Removed the deserialization cache for ShuffleMapTask because it was
causing concurrency problems (some variables in Shark get set to null).
The cost of task deserialization on slaves is trivial compared with the
execution time of the task anyway.
2012-08-27 22:33:15 -07:00
Josh Rosen 4143678509 Fix minor bugs in Python API examples. 2012-08-27 00:24:47 -07:00
Josh Rosen bff6a46359 Add pipe(), saveAsTextFile(), sc.union() to Python API. 2012-08-27 00:24:47 -07:00
Josh Rosen 200d248dcc Simplify Python worker; pipeline the map step of partitionBy(). 2012-08-27 00:24:39 -07:00
Josh Rosen 6904cb77d4 Use local combiners in Python API combineByKey(). 2012-08-27 00:19:26 -07:00
Josh Rosen 8b64b7ecd8 Add countByKey(), reduceByKeyLocally() to Python API 2012-08-27 00:19:22 -07:00
Josh Rosen 08b201d810 Add mapPartitions(), glom(), countByValue() to Python API. 2012-08-27 00:19:14 -07:00
Josh Rosen f79a1e4d2a Add broadcast variables to Python API. 2012-08-27 00:16:47 -07:00
Josh Rosen 65e8406029 Implement fold() in Python API. 2012-08-27 00:16:47 -07:00
root e2cf197a0a Made WordCount2 even more configurable 2012-08-27 03:34:15 +00:00
root 9635823947 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 03:08:25 +00:00
Matei Zaharia b914cd0dfa Serialize generation correctly in ShuffleMapTask 2012-08-26 20:07:59 -07:00
root 20f6b0cfc9 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 03:01:03 +00:00
Matei Zaharia 69c2ab0408 logging 2012-08-26 20:00:58 -07:00
root 89c5c03035 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 02:53:07 +00:00
Matei Zaharia 117e3f8c86 Fix a bug that was causing FetchFailedException not to be thrown 2012-08-26 19:52:56 -07:00
root beb6456442 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 02:37:49 +00:00
Matei Zaharia 3c9c44a8d3 More helpful log messages 2012-08-26 19:37:43 -07:00
root 7b59943d79 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 01:57:12 +00:00
Matei Zaharia 26dfd20c9a Detect disconnected slaves in StandaloneScheduler 2012-08-26 18:56:56 -07:00
root b78c5ae803 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 01:16:39 +00:00
Matei Zaharia 29e83f39e9 Fix replication with MEMORY_ONLY_DESER_2 2012-08-26 18:16:25 -07:00
root 9de1c3abf9 Tweaks to WordCount2 2012-08-27 00:57:00 +00:00
Matei Zaharia 57796b183e Code style 2012-08-26 17:25:22 -07:00
Matei Zaharia 22b1a20e61 Made Time and Interval immutable 2012-08-26 17:04:34 -07:00
Matei Zaharia 23a29b6d19 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-26 16:45:37 -07:00
Matei Zaharia b120e24fe0 Add equals and hashCode to Time 2012-08-26 16:45:14 -07:00
root b08ff710af Added sliding word count, and some fixes to reduce window DStream 2012-08-26 23:40:50 +00:00
Matei Zaharia 06ef7c3d1b Less debug info 2012-08-26 16:29:20 -07:00
Matei Zaharia ad6537321e Make Time serializable 2012-08-26 16:27:23 -07:00
Matei Zaharia 741899b21e Fix sendMessageReliablySync 2012-08-26 16:26:06 -07:00
Matei Zaharia 51453eb87b Merge pull request #179 from JoshRosen/fix/sparklr-caching
Cache points in SparkLR example
2012-08-26 15:32:50 -07:00
Josh Rosen 566feafe1d Cache points in SparkLR example. 2012-08-26 15:24:43 -07:00
Josh Rosen f3b852ce66 Refactor Python MappedRDD to use iterator pipelines. 2012-08-24 19:44:14 -07:00
Josh Rosen 4b52300487 Fix options parsing in Python pi example. 2012-08-24 19:42:47 -07:00
Matei Zaharia e7a5cbb543 Reduce log4j verbosity for streaming 2012-08-24 16:45:01 -07:00
Matei Zaharia 091b1438f5 Fix WordCount job name 2012-08-24 16:43:59 -07:00
Matei Zaharia 5a8015d2db Merge remote-tracking branch 'public/dev' into dev 2012-08-24 16:11:44 -07:00
Mosharaf Chowdhury edd1a740a6 Merge remote-tracking branch 'upstream/dev' into dev 2012-08-23 20:43:27 -07:00
Matei Zaharia 2c16ae36d7 Set log level in tests to WARN 2012-08-23 20:38:14 -07:00
Matei Zaharia deedb9e7b7 Fix further issues with tests and broadcast.
The broadcast fix is to store values as MEMORY_ONLY_DESER instead of
MEMORY_ONLY, which will save substantial time on serialization.
2012-08-23 20:31:49 -07:00
Mosharaf Chowdhury 3b1f5480a4 Merge remote-tracking branch 'upstream/dev' into dev 2012-08-23 20:16:50 -07:00
Matei Zaharia 59b831b9d1 Fixed test failures due to broadcast not stopping correctly 2012-08-23 19:59:55 -07:00
Matei Zaharia 7310a6f499 Merge pull request #147 from mosharaf/dev
Broadcast refactoring/cleaning up
2012-08-23 19:38:28 -07:00
Mosharaf Chowdhury 995ad6ba36 Merge remote-tracking branch 'upstream/dev' into dev 2012-08-23 09:51:38 -07:00
Josh Rosen 607b53abfc Use numpy in Python k-means example. 2012-08-22 00:43:55 -07:00
Matei Zaharia 79c82b6cfd Merge pull request #173 from squito/accum_localValue
make accumulator.localValue public, add tests
2012-08-22 00:11:21 -07:00
Josh Rosen fd94e5443c Use only cPickle for serialization in Python API.
Objects serialized with JSON can be compared for equality, but JSON can be slow
to serialize and only supports a limited range of data types.
2012-08-21 14:01:27 -07:00
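
The last commit message above weighs JSON against pickling for serializing Python objects. The sketch below illustrates only the "limited range of data types" point. It is not code from the repository: the sample values and the `json_supports` helper are hypothetical, and Python 3's `pickle` module stands in for Python 2's `cPickle`.

```python
import json
import pickle  # Python 3 module; stands in here for Python 2's cPickle

# Hypothetical sample record: a (key, count) tuple.
record = ("spark", 3)

# JSON has no tuple type, so a tuple comes back as a list.
json_round_trip = json.loads(json.dumps(record))

# Pickle preserves the original Python type.
pickle_round_trip = pickle.loads(pickle.dumps(record, protocol=2))


def json_supports(value):
    """Return True if the standard json module can serialize value."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


print(json_round_trip)                        # a list, not a tuple
print(pickle_round_trip)                      # still a tuple
print(json_supports({1, 2}))                  # sets are not JSON serializable
print(pickle.loads(pickle.dumps({1, 2})))     # sets survive pickling
```

The sketch does not measure speed, so it says nothing about the commit's performance claim.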