ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	abdc1f8bbb	Merge pull request #847 from rxin/rdd Allow subclasses of Product2 in all key-value related classes	2013-08-19 18:30:56 -07:00
Matei Zaharia	498a26189b	Small fixes to web UI: - Use SPARK_PUBLIC_DNS environment variable if set (for EC2) - Use a non-ephemeral port (3030 instead of 33000) by default - Updated test to use non-ephemeral port too	2013-08-19 18:17:49 -07:00
Reynold Xin	5054abd41b	Code review feedback. (added tests for cogroup and substract; added more documentation on MutablePair)	2013-08-19 12:58:02 -07:00
Jey Kottalam	6f6944c807	Update SBT build to use simpler fix for Hadoop 0.23.9	2013-08-19 12:33:13 -07:00
Reynold Xin	acc4aa1f47	Added a test for sorting using MutablePair's.	2013-08-19 11:02:10 -07:00
Reynold Xin	71d705a66e	Made PairRDDFunctions taking only Tuple2, but made the rest of the shuffle code path working with general Product2.	2013-08-19 00:40:43 -07:00
Reynold Xin	2a7b99c08b	Added the missing RDD files and cleaned up SparkContext.	2013-08-18 20:39:29 -07:00
Reynold Xin	82bf4c0339	Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc).	2013-08-18 20:25:45 -07:00
Matei Zaharia	8ac3d1e263	Added unit tests for ClusterTaskSetManager, and fix a bug found with resetting locality level after a non-local launch	2013-08-18 19:51:07 -07:00
Matei Zaharia	4004cf775d	Added some comments on threading in scheduler code	2013-08-18 19:51:07 -07:00
Matei Zaharia	2a4ed10210	Address some review comments: - When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers - Simplify ClusterScheduler.prioritizeContainers - Add docs on the new configuration options	2013-08-18 19:51:07 -07:00
Matei Zaharia	222c897128	Comment cleanup (via Kay) and some debug messages	2013-08-18 19:51:07 -07:00
Matei Zaharia	cf39d45d14	More scheduling fixes: - Added periodic revival of offers in StandaloneSchedulerBackend - Replaced task scheduling aggression with multi-level delay scheduling in ClusterTaskSetManager - Fixed ZippedRDD preferred locations because they can't currently be process-local - Fixed some uses of hostPort	2013-08-18 19:51:07 -07:00
Matei Zaharia	90a04dab8d	Initial work towards scheduler refactoring: - Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort based data structures and just use executorID, since the hostPort vs host stuff is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos. - Replaced most hostPort-based data structures and fields as above. - Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise. - Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name. - Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This change didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later. - Renamed MapOutputTracker "generations" to "epochs".	2013-08-18 19:51:06 -07:00
Jey Kottalam	23f4622aff	Remove redundant dependencies from POMs	2013-08-18 18:53:57 -07:00
Jey Kottalam	bdd861c6c3	Fix Maven build with Hadoop 0.23.9	2013-08-18 18:28:57 -07:00
Matei Zaharia	8fa0747978	Merge pull request #840 from AndreSchumacher/zipegg Implementing SPARK-878 for PySpark: adding zip and egg files to context ...	2013-08-18 17:02:54 -07:00
Jey Kottalam	47a7c4338a	Don't assume spark-examples JAR always exists	2013-08-18 16:59:02 -07:00
Jey Kottalam	44000b10ff	Make YARN POM file valid	2013-08-18 16:23:22 -07:00
Evan Sparks	07fe910669	Fixing typos in Java tests, and addressing alignment issues.	2013-08-18 15:03:13 -07:00
Evan Sparks	b291db712e	Centralizing linear data generator and mllib regression tests to use it.	2013-08-18 15:03:13 -07:00
Evan Sparks	b659af83d3	Adding Linear Regression, and refactoring Ridge Regression.	2013-08-18 15:03:13 -07:00
Matei Zaharia	1e137a5a21	Merge pull request #846 from rxin/rdd Two minor RDD refactoring	2013-08-17 22:22:32 -07:00
Reynold Xin	2c00ea3efc	Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.	2013-08-17 21:43:29 -07:00
Reynold Xin	0e84fee76b	Removed the mapSideCombine option in partitionBy.	2013-08-17 21:13:41 -07:00
Reynold Xin	10af952a3d	Removed the mapSideCombine option in CoGroupedRDD.	2013-08-17 21:07:34 -07:00
Reynold Xin	5d050a3e1f	Removed the unused shuffleId in ShuffleDependency's constructor.	2013-08-16 23:23:16 -07:00
Matei Zaharia	e89ffc7b3c	Merge pull request #839 from jegonzal/zip_partitions Currying RDD.zipPartitions	2013-08-16 14:02:34 -07:00
Jey Kottalam	67b593607c	Rename YARN build flag to SPARK_WITH_YARN	2013-08-16 14:00:05 -07:00
Jey Kottalam	b1d99744a8	Fix SBT build under Hadoop 0.23.x	2013-08-16 13:50:12 -07:00
Jey Kottalam	c1e547bb7f	Updates to repl and example POMs to match SBT build	2013-08-16 13:50:12 -07:00
Jey Kottalam	ad580b94d5	Maven build now also works with YARN	2013-08-16 13:50:12 -07:00
Jey Kottalam	741ecd56fe	Forgot to remove a few references to ${classifier}	2013-08-16 13:50:12 -07:00
Jey Kottalam	9dd15fe700	Don't mark hadoop-client as 'provided'	2013-08-16 13:50:12 -07:00
Jey Kottalam	11b42a84db	Maven build now works with CDH hadoop-2.0.0-mr1	2013-08-16 13:50:12 -07:00
Jey Kottalam	353fab2440	Initial changes to make Maven build agnostic of hadoop version	2013-08-16 13:50:12 -07:00
Jey Kottalam	8add2d7a59	Fix repl/assembly when YARN enabled	2013-08-16 13:50:12 -07:00
Jey Kottalam	3f98eff63a	Allow make-distribution.sh to specify Hadoop version used	2013-08-16 13:50:09 -07:00
Joseph E. Gonzalez	53b2639a1e	Reversing the argument order in zipPartitions to enable stronger type inference.	2013-08-16 12:38:59 -07:00
Andre Schumacher	c7e348faec	Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path	2013-08-16 11:58:20 -07:00
Holden Karau	8fc40818d7	Fix	2013-08-15 23:08:48 -07:00
Reynold Xin	1fb1b09928	Merge pull request #841 from rxin/json Use the JSON formatter from Scala library and removed dependency on lift-json.	2013-08-15 22:15:05 -07:00
Matei Zaharia	c69c48947d	Merge pull request #843 from Reinvigorate/bug-879 fixing typo in conf/slaves	2013-08-15 20:55:09 -07:00
seanm	a5193a8fac	fixing typo	2013-08-15 20:52:58 -06:00
Reynold Xin	c961c19b7b	Use the JSON formatter from Scala library and removed dependency on lift-json. It made the JSON creation slightly more complicated, but reduces one external dependency. The scala library also properly escape "/" (which lift-json doesn't).	2013-08-15 18:23:01 -07:00
Reynold Xin	eddbf43b54	Revert "Merge pull request #834 from Daemoen/master" This reverts commit `230ab2722e`, reversing changes made to `659553b21d`.	2013-08-15 17:49:37 -07:00
Reynold Xin	230ab2722e	Merge pull request #834 from Daemoen/master Updated json output to allow for display of worker state	2013-08-15 17:45:17 -07:00
Patrick Wendell	659553b21d	Merge pull request #836 from pwendell/rename Rename `memoryBytesToString` and `memoryMegabytesToString`	2013-08-15 16:56:31 -07:00
Jey Kottalam	a0f0848463	Update default version of Hadoop to 1.2.1	2013-08-15 16:50:37 -07:00
Jey Kottalam	a06a9d5c5f	Rename HadoopWriter to SparkHadoopWriter since it's outside of our package	2013-08-15 16:50:37 -07:00

3913 commits