Commit graph

10995 commits

Author SHA1 Message Date
Matei Zaharia 389fb4cc54 End runJob() with a SparkException when a task fails too many times in one of the cluster schedulers. 2012-08-31 17:47:43 -07:00
Matei Zaharia 51fb13dd16 Bug fix 2012-08-31 15:36:11 -07:00
Matei Zaharia ce42a46375 Bug fix 2012-08-31 15:35:35 -07:00
Matei Zaharia f92d4a6ac1 Better output messages for streaming job duration 2012-08-31 15:33:48 -07:00
Matei Zaharia 607b8fffcd End runJob with a SparkException when a Mesos task fails too many times 2012-08-31 11:40:12 -07:00
Tathagata Das 2d01d38a41 Added StateDStream, corresponding stateful stream operations, and testcases. Also refactored few PairDStreamFunctions methods. 2012-08-31 03:47:34 -07:00
root e1da274a48 WordCount tweaks 2012-08-31 07:16:19 +00:00
root 113277549c Really fixed the replication-3 issue. The problem was a few buffers not being rewound. 2012-08-31 05:39:35 +00:00
Mosharaf Chowdhury baf2a7ccd2 Merge remote-tracking branch 'upstream/dev' into dev 2012-08-30 22:28:14 -07:00
Mosharaf Chowdhury 31ffe8d528 Synchronization bug fix in broadcast implementations 2012-08-30 22:26:43 -07:00
Matei Zaharia 101ae493e2 Replicate serialized blocks properly, without sharing a ByteBuffer. 2012-08-30 22:24:14 -07:00
Mosharaf Chowdhury 3883532545 Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations. 2012-08-30 21:43:00 -07:00
Matei Zaharia a480dec6b2 Deserialize multi-get results in the caller's thread. This fixes an issue with shared buffers in the KryoSerializer. 2012-08-30 20:01:06 -07:00
Matei Zaharia 1b3e3352eb Deserialize multi-get results in the caller's thread. This fixes an issue with shared buffers with the KryoSerializer. 2012-08-30 17:59:25 -07:00
root d4d2cb670f Make checkpoint interval configurable in WordCount2 2012-08-31 00:34:57 +00:00
root c4366eb764 Fixes to ShuffleFetcher 2012-08-31 00:34:24 +00:00
Mosharaf Chowdhury 8f2bd399da Merge remote-tracking branch 'upstream/dev' into dev 2012-08-30 15:21:08 -07:00
Matei Zaharia bf3212615a Merge pull request #184 from rxin/dev: Disable running combiners on map tasks when mergeCombiners function is not specified by the user. 2012-08-30 14:12:40 -07:00
Reynold Xin a8a2a08a1a Added a test for testing map-side combine on/off switch. 2012-08-30 12:34:28 -07:00
Matei Zaharia 62e5326af0 Wording 2012-08-30 08:37:43 -07:00
Matei Zaharia e8ac9221dc Update sbt build command to create JARs 2012-08-30 08:36:39 -07:00
Reynold Xin 5945bcdcc5 Added a new flag in Aggregator to indicate applying map side combiners. 2012-08-29 23:32:08 -07:00
Reynold Xin c68e820b2a Merge branch 'dev' of github.com:mesos/spark into dev 2012-08-29 23:01:19 -07:00
Reynold Xin 940869dfda Disable running combiners on map tasks when mergeCombiners function is not specified by the user. 2012-08-29 23:00:02 -07:00
Tathagata Das 4db3a96766 Made minor changes to reduce compilation errors in Eclipse. Twirl stuff still does not compile in Eclipse. 2012-08-29 13:04:01 -07:00
Matei Zaharia 84bf7924d6 Made region used by spark-ec2 configurable. 2012-08-28 22:40:48 -07:00
Matei Zaharia 47507d69d9 Made region used by spark-ec2 configurable. 2012-08-28 22:40:00 -07:00
root 1f8085b8d0 Compile fixes 2012-08-29 03:20:56 +00:00
Mosharaf Chowdhury c74455f309 Merge remote-tracking branch 'upstream/dev' into dev 2012-08-28 14:56:57 -07:00
Tathagata Das 43e66146f7 Merge branch 'dev' of github.com/radlab/spark into dev 2012-08-28 13:51:05 -07:00
Tathagata Das b5b93a621c Added capability to take streaming input from network. Renamed SparkStreamContext to StreamingContext. 2012-08-28 12:35:19 -07:00
Matei Zaharia bf2e9cb08e Fault tolerance and block store fixes discovered through streaming tests. 2012-08-27 23:07:50 -07:00
Matei Zaharia 17af2df0cd Log levels 2012-08-27 23:07:32 -07:00
Matei Zaharia a0b34d826a Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 22:49:52 -07:00
Matei Zaharia b4a2214218 More fault tolerance fixes to catch lost tasks 2012-08-27 22:49:29 -07:00
Matei Zaharia 291abc2c28 Merge pull request #181 from rxin/dev: Removed the deserialization cache for ShuffleMapTask 2012-08-27 22:38:22 -07:00
Reynold Xin 3a6a95dc24 Removed the deserialization cache for ShuffleMapTask because it was causing concurrency problems (some variables in Shark get set to null). The cost of task deserialization on slaves is trivial compared with the execution time of the task anyway. 2012-08-27 22:33:15 -07:00
Josh Rosen 4143678509 Fix minor bugs in Python API examples. 2012-08-27 00:24:47 -07:00
Josh Rosen bff6a46359 Add pipe(), saveAsTextFile(), sc.union() to Python API. 2012-08-27 00:24:47 -07:00
Josh Rosen 200d248dcc Simplify Python worker; pipeline the map step of partitionBy(). 2012-08-27 00:24:39 -07:00
Josh Rosen 6904cb77d4 Use local combiners in Python API combineByKey(). 2012-08-27 00:19:26 -07:00
Josh Rosen 8b64b7ecd8 Add countByKey(), reduceByKeyLocally() to Python API 2012-08-27 00:19:22 -07:00
Josh Rosen 08b201d810 Add mapPartitions(), glom(), countByValue() to Python API. 2012-08-27 00:19:14 -07:00
Josh Rosen f79a1e4d2a Add broadcast variables to Python API. 2012-08-27 00:16:47 -07:00
Josh Rosen 65e8406029 Implement fold() in Python API. 2012-08-27 00:16:47 -07:00
root e2cf197a0a Made WordCount2 even more configurable 2012-08-27 03:34:15 +00:00
root 9635823947 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 03:08:25 +00:00
Matei Zaharia b914cd0dfa Serialize generation correctly in ShuffleMapTask 2012-08-26 20:07:59 -07:00
root 20f6b0cfc9 Merge branch 'dev' of github.com:radlab/spark into dev 2012-08-27 03:01:03 +00:00
Matei Zaharia 69c2ab0408 logging 2012-08-26 20:00:58 -07:00