Commit graph

2748 commits

Author SHA1 Message Date
Patrick Wendell 224fbac061 SPARK-742: TaskMetrics should not employ per-record timing.
This patch does three things:

1. Makes TimedIterator a trait with two implementations (one a no-op)
2. Makes the default behavior to use the no-op implementation
3. Removes DelegateBlockFetchTracker. This is just cleanup, but the trait
   doesn't seem to reduce complexity in any way.

In the future we can add other implementations, e.g. ones which perform sampling.
2013-04-29 11:13:43 -07:00
Matei Zaharia 0f45347c7b More unit test fixes 2013-04-28 22:29:27 -07:00
Matei Zaharia bce4089f22 Fix BlockManagerSuite to deal with clearing spark.hostPort 2013-04-28 22:23:48 -07:00
Matei Zaharia 68c07ea198 Merge pull request #582 from shivaram/master
Add zip partitions interface
2013-04-28 20:19:33 -07:00
Shivaram Venkataraman 604d3bf56c Rename partition class and add scala doc 2013-04-28 16:31:07 -07:00
Shivaram Venkataraman 15acd49f07 Actually rename classes to ZippedPartitions*
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman 6e84635ab9 Rename classes from MapZipped* to Zipped* 2013-04-28 15:58:40 -07:00
Matei Zaharia f6ee9a8728 Merge pull request #583 from mridulm/master
Fix issues with streaming test cases after yarn branch merge
2013-04-28 15:36:04 -07:00
Mridul Muralidharan 430c531464 Remove debug statements 2013-04-29 00:24:30 +05:30
Mridul Muralidharan 3a89a76b87 Make log message more descriptive to aid in debugging 2013-04-29 00:04:12 +05:30
Mridul Muralidharan 9bd439502e Remove spurious commit 2013-04-28 23:09:08 +05:30
Mridul Muralidharan 7fa6978a1e Allow CheckpointWriter pending tasks to finish 2013-04-28 23:08:10 +05:30
Mridul Muralidharan 00c7a37604 Merge branch 'master' of github.com:mridulm/spark 2013-04-28 22:44:34 +05:30
Mridul Muralidharan afee902443 Attempt to fix streaming test failures after yarn branch merge 2013-04-28 22:26:45 +05:30
Shivaram Venkataraman 0cc6642b7c Rename to zipPartitions and style changes 2013-04-28 05:11:03 -07:00
Shivaram Venkataraman c9c4954d99 Add an interface to zip iterators of multiple RDDs
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
Matei Zaharia 1f20ef2567 Merge branch 'master' of github.com:mesos/spark 2013-04-25 20:03:13 -07:00
Matei Zaharia 1b169f190c Exclude old versions of Netty, which had a different Maven organization 2013-04-25 19:52:12 -07:00
Matei Zaharia cf54b824ff Merge pull request #580 from pwendell/quickstart
SPARK-739 Have quickstart standalone job use README
2013-04-25 11:45:58 -07:00
Patrick Wendell a72134a6ac SPARK-739 Have quickstart standalone job use README 2013-04-25 10:39:28 -07:00
Matei Zaharia 6e6b5204ea Create an empty directory when checkpointing a 0-partition RDD (fixes a
test failure on Hadoop 2.0)
2013-04-25 00:42:37 -07:00
Matei Zaharia eef9ea1993 Update unit test memory to 2 GB 2013-04-25 00:42:29 -07:00
Matei Zaharia 01d9ba5038 Add back line removed during YARN merge 2013-04-25 00:11:27 -07:00
Reynold Xin ba6ffa6a5f Allow the specification of a shuffle serializer in the read path (for
local block reads).
2013-04-24 17:38:07 -07:00
Reynold Xin aa618ed2a2 Allow changing the serializer on a per shuffle basis. 2013-04-24 14:52:49 -07:00
Matei Zaharia 118a6c76f5 Merge pull request #575 from mridulm/master
Manual merge of yarn branch to trunk
2013-04-24 08:42:30 -07:00
Mridul Muralidharan 3b594a4e3b Do not add signature files; they cause validation errors when using the assembled file 2013-04-24 10:18:25 +05:30
Mridul Muralidharan dd515ca3ee Attempt at fixing merge conflict 2013-04-24 09:24:17 +05:30
Mridul Muralidharan d09db1c051 concurrentRestrictions fails for this PR but works for master, probably due to a version change 2013-04-24 09:15:29 +05:30
Mridul Muralidharan adcda84f96 Pull latest SparkBuild.scala from master and merge conflicts 2013-04-24 08:57:25 +05:30
Reynold Xin 31ce6c66d6 Added a BlockObjectWriter interface in block manager so ShuffleMapTask
doesn't need to build up an array buffer for each shuffle bucket.
2013-04-23 17:48:59 -07:00
Mridul Muralidharan 5b85c715c8 Revert to 2.0.2-alpha: 0.23.7 has protocol changes that break against Cloudera 2013-04-24 02:57:51 +05:30
Mridul Muralidharan 8faf5c51c3 Patch from Thomas Graves to improve the YARN Client and move to a more production-ready Hadoop YARN branch 2013-04-24 02:31:57 +05:30
Mridul Muralidharan b11058f42c Ensure that Maven package adds YARN jars to the shaded jar for the hadoop2-yarn profile 2013-04-23 22:48:32 +05:30
koeninger dfac0aa5c2 Prevent the MySQL driver from pulling the entire ResultSet into memory. Explicitly close the ResultSet and Statement. 2013-04-22 21:12:52 -05:00
Mridul Muralidharan 7acab3ab45 Address review comments; add a new API to SparkHadoopUtil to create an appropriate Configuration; modify an example to show how to use SplitInfo 2013-04-22 08:01:13 +05:30
koeninger b2a3f24dde First attempt at an RDD that pulls data from JDBC sources 2013-04-21 00:29:37 -05:00
Matei Zaharia 17e076de80 Turn on forking in test JVMs to reduce the pressure on perm gen and code
cache sizes due to having 2 instances of the Scala compiler and a bunch
of classloaders.
2013-04-18 22:25:57 -07:00
Mridul Muralidharan ac2e8e8720 Add some basic documentation 2013-04-19 00:13:19 +05:30
Mridul Muralidharan 5ee2f5c483 Cache pattern; add (commented-out) alternatives for check* APIs 2013-04-17 23:13:34 +05:30
Mridul Muralidharan f07961060d Add a small note on spark.tasks.schedule.aggression 2013-04-17 23:13:02 +05:30
Matei Zaharia 5d8a71c484 Merge pull request #570 from jey/increase-codecache-size
Increase ReservedCodeCacheSize for sbt
2013-04-16 19:48:02 -07:00
Mridul Muralidharan 5d891534fd Move back to 2.0.2-alpha, since 2.0.3-alpha is not yet available in Cloudera. Also add the Netty dependency explicitly to prevent resolving to the older 2.3x version, and comment out retrievePattern to ensure the correct Netty is picked up 2013-04-17 05:54:43 +05:30
Mridul Muralidharan 46779b4745 Move back to 2.0.2-alpha, since 2.0.3-alpha is not yet available in Cloudera 2013-04-17 05:53:28 +05:30
Mridul Muralidharan 02dffd2eb0 Ensure all ask/await calls block for spark.akka.askTimeout so that the timeout is controllable, instead of arbitrary timeouts spread across the codebase. Our tests use 30 seconds, though the default of 10 is maintained 2013-04-17 05:52:57 +05:30
Mridul Muralidharan a402b23bcd Adjust classpath order so that our jars take precedence over what is in the CLASSPATH variable. This seems logical; hopefully it causes no issues 2013-04-17 05:52:00 +05:30
Mridul Muralidharan bcdde331c3 Move from master to driver 2013-04-17 04:12:18 +05:30
Jey Kottalam 6bfe4bf3eb Increase ReservedCodeCacheSize for sbt 2013-04-16 09:50:59 -07:00
Mridul Muralidharan ad80f68eb5 Remove spurious debug statements 2013-04-16 22:15:34 +05:30
Mridul Muralidharan f7969f72ee Fix exception when the checkpoint path does not exist (for example, when the RDD being checkpointed has no data) 2013-04-16 21:51:38 +05:30