ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	084df85816	Merge branch 'master' of github.com:mridulm/spark	2013-04-30 10:22:37 -07:00
Matei Zaharia	f708dda81e	Merge pull request #585 from pwendell/listener-perf [Fix SPARK-742] Task Metrics should not employ per-record timing by default	2013-04-30 07:51:40 -07:00
Mridul Muralidharan	e46d547ccd	Fix issues reported by Reynold	2013-04-30 16:15:56 +05:30
Reynold Xin	1055785a83	Allow specifying the shuffle write file buffer size. The default buffer size is 8KB in FastBufferedOutputStream, which is too small and would cause a lot of disk seeks.	2013-04-29 23:33:56 -07:00
Reynold Xin	7007201201	Added a shuffle block manager so it is easier in the future to consolidate shuffle output files.	2013-04-29 23:07:03 -07:00
Reynold Xin	ed4ddf4487	Merge branch 'master' of github.com:mesos/spark into blockmanager	2013-04-29 20:08:23 -07:00
Matei Zaharia	f1f92c88eb	Build against Hadoop 1 by default	2013-04-29 17:08:45 -07:00
Reynold Xin	d3586ef438	Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager Conflicts: core/src/main/scala/spark/storage/DiskStore.scala	2013-04-29 15:44:18 -07:00
Patrick Wendell	016ce1fa9c	Using full package name for util	2013-04-29 12:02:27 -07:00
Patrick Wendell	540be6b154	Modified version of the fix which just removes all per-record tracking.	2013-04-29 11:32:07 -07:00
Patrick Wendell	224fbac061	Spark-742: TaskMetrics should not employ per-record timing. This patch does three things: 1. Makes TimedIterator a trait with two implementations (one a no-op) 2. Makes the default behavior to use the no-op implementation 3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like the trait doesn't really reduce complexity in any way. In the future we can add other implementations, e.g. ones which perform sampling.	2013-04-29 11:13:43 -07:00
Matei Zaharia	0f45347c7b	More unit test fixes	2013-04-28 22:29:27 -07:00
Matei Zaharia	bce4089f22	Fix BlockManagerSuite to deal with clearing spark.hostPort	2013-04-28 22:23:48 -07:00
Matei Zaharia	68c07ea198	Merge pull request #582 from shivaram/master Add zip partitions interface	2013-04-28 20:19:33 -07:00
Shivaram Venkataraman	604d3bf56c	Rename partition class and add scala doc	2013-04-28 16:31:07 -07:00
Shivaram Venkataraman	15acd49f07	Actually rename classes to ZippedPartitions* (the previous commit only renamed the file)	2013-04-28 16:03:22 -07:00
Shivaram Venkataraman	6e84635ab9	Rename classes from MapZipped* to Zipped*	2013-04-28 15:58:40 -07:00
Matei Zaharia	f6ee9a8728	Merge pull request #583 from mridulm/master Fix issues with streaming test cases after yarn branch merge	2013-04-28 15:36:04 -07:00
Mridul Muralidharan	430c531464	Remove debug statements	2013-04-29 00:24:30 +05:30
Mridul Muralidharan	3a89a76b87	Make log message more descriptive to aid in debugging	2013-04-29 00:04:12 +05:30
Mridul Muralidharan	9bd439502e	Remove spurious commit	2013-04-28 23:09:08 +05:30
Mridul Muralidharan	7fa6978a1e	Allow CheckpointWriter pending tasks to finish	2013-04-28 23:08:10 +05:30
Mridul Muralidharan	00c7a37604	Merge branch 'master' of github.com:mridulm/spark	2013-04-28 22:44:34 +05:30
Mridul Muralidharan	afee902443	Attempt to fix streaming test failures after yarn branch merge	2013-04-28 22:26:45 +05:30
Shivaram Venkataraman	0cc6642b7c	Rename to zipPartitions and style changes	2013-04-28 05:11:03 -07:00
Shivaram Venkataraman	c9c4954d99	Add an interface to zip iterators of multiple RDDs The current code supports 2, 3 or 4 arguments but can be extended to more arguments if required.	2013-04-26 16:57:46 -07:00
Matei Zaharia	1f20ef2567	Merge branch 'master' of github.com:mesos/spark	2013-04-25 20:03:13 -07:00
Matei Zaharia	1b169f190c	Exclude old versions of Netty, which had a different Maven organization	2013-04-25 19:52:12 -07:00
Matei Zaharia	cf54b824ff	Merge pull request #580 from pwendell/quickstart SPARK-739 Have quickstart standalone job use README	2013-04-25 11:45:58 -07:00
Patrick Wendell	a72134a6ac	SPARK-739 Have quickstart standalone job use README	2013-04-25 10:39:28 -07:00
Matei Zaharia	6e6b5204ea	Create an empty directory when checkpointing a 0-partition RDD (fixes a test failure on Hadoop 2.0)	2013-04-25 00:42:37 -07:00
Matei Zaharia	eef9ea1993	Update unit test memory to 2 GB	2013-04-25 00:42:29 -07:00
Matei Zaharia	01d9ba5038	Add back line removed during YARN merge	2013-04-25 00:11:27 -07:00
Reynold Xin	ba6ffa6a5f	Allow the specification of a shuffle serializer in the read path (for local block reads).	2013-04-24 17:38:07 -07:00
Reynold Xin	aa618ed2a2	Allow changing the serializer on a per shuffle basis.	2013-04-24 14:52:49 -07:00
Matei Zaharia	118a6c76f5	Merge pull request #575 from mridulm/master Manual merge of yarn branch to trunk	2013-04-24 08:42:30 -07:00
Mridul Muralidharan	3b594a4e3b	Do not add signature files - results in validation errors when using assembled file	2013-04-24 10:18:25 +05:30
Mridul Muralidharan	dd515ca3ee	Attempt at fixing merge conflict	2013-04-24 09:24:17 +05:30
Mridul Muralidharan	d09db1c051	concurrentRestrictions fails for this PR - but works for master, probably some version change	2013-04-24 09:15:29 +05:30
Mridul Muralidharan	adcda84f96	Pull latest SparkBuild.scala from master and merge conflicts	2013-04-24 08:57:25 +05:30
Reynold Xin	31ce6c66d6	Added a BlockObjectWriter interface in block manager so ShuffleMapTask doesn't need to build up an array buffer for each shuffle bucket.	2013-04-23 17:48:59 -07:00
Mridul Muralidharan	5b85c715c8	Revert back to 2.0.2-alpha : 0.23.7 has protocol changes which break against cloudera	2013-04-24 02:57:51 +05:30
Mridul Muralidharan	8faf5c51c3	Patch from Thomas Graves to improve the YARN Client, and move to more production ready hadoop yarn branch	2013-04-24 02:31:57 +05:30
Mridul Muralidharan	b11058f42c	Ensure that maven package adds yarn jars as part of shaded jar for hadoop2-yarn profile	2013-04-23 22:48:32 +05:30
koeninger	dfac0aa5c2	prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement.	2013-04-22 21:12:52 -05:00
Mridul Muralidharan	7acab3ab45	Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo	2013-04-22 08:01:13 +05:30
koeninger	b2a3f24dde	first attempt at an RDD to pull data from JDBC sources	2013-04-21 00:29:37 -05:00
Matei Zaharia	17e076de80	Turn on forking in test JVMs to reduce the pressure on perm gen and code cache sizes due to having 2 instances of the Scala compiler and a bunch of classloaders.	2013-04-18 22:25:57 -07:00
Mridul Muralidharan	ac2e8e8720	Add some basic documentation	2013-04-19 00:13:19 +05:30
Andrew xia	8436bd5d4a	remove TaskSetQueueManager and update code style	2013-04-19 02:17:22 +08:00

2821 commits