ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Evan Chan	27726079e4	Print out more friendly error if listFiles() fails. listFiles() could return null if the I/O fails, and this currently results in an ugly NPE which is hard to diagnose.	2013-09-09 12:58:12 -07:00
Y.CORP.YAHOO.COM\tgraves	2186d93285	Add metrics-ganglia to core pom file	2013-09-09 12:37:33 -05:00
Stephen Haberman	59003d387d	Use a set since shuffle could change order.	2013-09-09 11:45:03 -05:00
Stephen Haberman	6471bfec73	Reword 'evenly distributed' to 'distributed with a hash partitioner'.	2013-09-09 11:44:15 -05:00
Matei Zaharia	bf984e2745	Merge pull request #890 from mridulm/master Fix hash bug	2013-09-08 23:50:24 -07:00
Reynold Xin	e9d4f44a7a	Merge pull request #909 from mateiz/exec-id-fix Fix an instance where full standalone mode executor IDs were passed to	2013-09-08 23:36:48 -07:00
Matei Zaharia	7d3204b056	Merge pull request #905 from mateiz/docs2 Job scheduling and cluster mode docs	2013-09-08 21:39:12 -07:00
Patrick Wendell	f68848d95d	Merge pull request #906 from pwendell/ganglia-sink Clean-up of Metrics Code/Docs and Add Ganglia Sink	2013-09-08 18:32:16 -07:00
Matei Zaharia	f9b7f58de2	Fix an instance where full standalone mode executor IDs were passed to StandaloneSchedulerBackend instead of the smaller IDs used within Spark (that lack the application name). This was reported by ClearStory in https://github.com/clearstorydata/spark/pull/9. Also fixed some messages that said slave instead of executor.	2013-09-08 18:27:50 -07:00
Matei Zaharia	170b3869ee	Fix unit test failure due to changed default	2013-09-08 17:51:27 -07:00
Patrick Wendell	b4e382c210	Adding sc name in metrics source	2013-09-08 16:06:49 -07:00
Patrick Wendell	c190b48bf5	Adding more docs and some code cleanup	2013-09-08 13:46:28 -07:00
Stephen Haberman	df5fd35273	Add better docs for coalesce. Include the useful tip that if shuffle=true, coalesce can actually increase the number of partitions. This makes coalesce more like a generic `RDD.repartition` operation. (Ideally this `RDD.repartition` could automatically choose either a coalesce or a shuffle if numPartitions was either less than or greater than, respectively, the current number of partitions.)	2013-09-08 15:39:04 -05:00
Matei Zaharia	04cfb3aa9d	Merge pull request #898 from ilikerps/660 SPARK-660: Add StorageLevel support in Python	2013-09-08 10:33:20 -07:00
Patrick Wendell	8de8ee5d3c	Ganglia sink	2013-09-08 10:08:18 -07:00
Matei Zaharia	651a96adf7	More fair scheduler docs and property names. Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.	2013-09-08 00:29:11 -07:00
Matei Zaharia	98fb69822c	Work in progress: - Add job scheduling docs - Rename some fair scheduler properties - Organize intro page better - Link to Apache wiki for "contributing to Spark"	2013-09-08 00:29:11 -07:00
Aaron Davidson	c1cc8c4da2	Export StorageLevel and refactor	2013-09-07 14:41:31 -07:00
Aaron Davidson	8001687af5	Remove reflection, hard-code StorageLevels. The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell would have to call a private method of SparkContext. Having StorageLevel available in sc also doesn't seem like the end of the world. There may be a better solution, though. As for creating the StorageLevel object itself, this seems to be the best way in Python 2 for creating singleton, enum-like objects: http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python	2013-09-07 09:34:07 -07:00
Reynold Xin	210eae26f4	Fixed the bug that ResultTask was not properly deserializing outputId.	2013-09-07 21:59:47 +08:00
Aaron Davidson	b8a0b6ea5e	Memoize StorageLevels read from JVM	2013-09-06 15:36:04 -07:00
Reynold Xin	1e15feb5a3	Hot fix to resolve the compilation error caused by SPARK-821.	2013-09-06 22:44:05 +08:00
Patrick Wendell	ddcb9d310a	Merge pull request #895 from ilikerps/821 SPARK-821: Don't cache results when action run locally on driver	2013-09-05 23:54:09 -07:00
Aaron Davidson	a63d4c7dc2	SPARK-660: Add StorageLevel support in Python. It uses reflection... I am not proud of that fact, but it at least ensures compatibility (sans refactoring of the StorageLevel stuff).	2013-09-05 23:36:27 -07:00
Aaron Davidson	3a04e76c89	Reynold's second round of comments	2013-09-05 21:43:26 -07:00
Matei Zaharia	699c331f2f	Merge pull request #891 from xiajunluan/SPARK-864 [SPARK-864]DAGScheduler Exception if we delete Worker and StandaloneExecutorBackend then add Worker	2013-09-05 20:21:53 -07:00
Aaron Davidson	4f2236a1c5	Add unit test and address comments	2013-09-05 18:06:30 -07:00
Aaron Davidson	1418d18af4	SPARK-821: Don't cache results when action run locally on driver. Caching the results of local actions (e.g., rdd.first()) causes the driver to store entire partitions in its own memory, which may be highly constrained. This patch simply makes the CacheManager avoid caching the result of all locally-run computations.	2013-09-05 15:34:42 -07:00
Andrew xia	7c15e3c5de	Fix bug SPARK-864	2013-09-05 15:56:11 +08:00
Patrick Wendell	5c7494d7c1	Merge pull request #893 from ilikerps/master SPARK-884: Add unit test to validate Spark JSON output	2013-09-04 22:47:03 -07:00
Aaron Davidson	714e7f9e32	Fix line over 100 chars	2013-09-04 22:40:08 -07:00
Aaron Davidson	37db141aef	Address Patrick's comments	2013-09-04 21:34:20 -07:00
Aaron Davidson	9e6f2b6822	SPARK-884: Add unit test to validate Spark JSON output. This unit test simply validates that the outputs of the JsonProtocol methods are syntactically valid JSON.	2013-09-04 15:26:46 -07:00
Mridul Muralidharan	1e2474b814	Address review comments - rename toHash to nonNegativeHash	2013-09-04 07:46:46 +05:30
Mridul Muralidharan	b3a82b7df3	Fix hash bug - caused failure after 35k stages, sigh	2013-09-04 07:02:25 +05:30
Mark Hamstra	c9bc8af3d1	Removed repetitive import; fixes hidden definition compiler warning.	2013-09-03 15:25:20 -07:00
Patrick Wendell	c592a3c9b9	Minor spacing fix	2013-09-03 14:39:11 -07:00
Patrick Wendell	19f70273d2	Merge pull request #878 from tgravescs/yarnUILink Link the Spark UI up to the Yarn UI	2013-09-03 14:29:10 -07:00
Matei Zaharia	68df2464d1	Merge pull request #889 from alig/master Return the port the WebUI is bound to (useful if port 0 was used)	2013-09-03 13:01:17 -07:00
Y.CORP.YAHOO.COM\tgraves	41c1b5b9a0	Update based on review comments. Change function to prependBaseUri and fix formatting.	2013-09-03 14:46:51 -05:00
Y.CORP.YAHOO.COM\tgraves	c8cc276110	Review comment changes and update to org.apache packaging	2013-09-03 10:50:21 -05:00
Y.CORP.YAHOO.COM\tgraves	547fc4a412	Merge remote-tracking branch 'mesos/master' into yarnUILink Conflicts: core/src/main/scala/org/apache/spark/ui/UIUtils.scala core/src/main/scala/org/apache/spark/ui/jobs/PoolTable.scala core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala docs/running-on-yarn.md	2013-09-03 08:36:59 -05:00
Ali Ghodsi	b25918d841	Merge branch 'master' of https://github.com/alig/spark Conflicts: core/src/main/scala/org/apache/spark/deploy/master/Master.scala	2013-09-03 00:56:12 -07:00
Ali Ghodsi	bd0788505f	Using configured akka timeouts	2013-09-03 00:50:35 -07:00
Ali Ghodsi	cbfef9b3ff	Sort order of imports to match project guidelines	2013-09-02 19:33:55 -07:00
Ali Ghodsi	36d8fca2cc	Reynold's comment fixed	2013-09-02 19:31:09 -07:00
Ali Ghodsi	e452bd6d77	Brushing the code up slightly	2013-09-02 19:04:08 -07:00
Ali Ghodsi	cf7b115496	Enabling getting the actual WEBUI port	2013-09-02 18:21:21 -07:00
Matei Zaharia	12b2f1f9c9	Add missing license headers found with RAT	2013-09-02 12:23:03 -07:00
Matei Zaharia	246bf67f58	Fix test	2013-09-02 10:57:34 -07:00
Matei Zaharia	9329a7d4cd	Fix spark.io.compression.codec and change default codec to LZF	2013-09-02 10:15:22 -07:00
Matei Zaharia	6550e5e60c	Allow PySpark to launch worker.py directly on Windows	2013-09-01 18:06:15 -07:00
Matei Zaharia	3db404a43a	Run script fixes for Windows after package & assembly change	2013-09-01 23:45:57 +00:00
Matei Zaharia	0a8cc30921	Move some classes to more appropriate packages: * RDD, RDDFunctions -> org.apache.spark.rdd * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util * JavaSerializer, KryoSerializer -> org.apache.spark.serializer	2013-09-01 14:13:16 -07:00
Matei Zaharia	5701eb92c7	Fix some URLs	2013-09-01 14:13:16 -07:00
Matei Zaharia	12495ec63a	Remove shutdown hook to stop jetty; this is unnecessary for releasing ports and creates noisy log messages	2013-09-01 14:13:15 -07:00
Matei Zaharia	46eecd110a	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
Matei Zaharia	a30fac16ca	Merge pull request #883 from alig/master Don't require the spark home environment variable to be set for standalone mode (change needed by SIMR)	2013-09-01 12:27:50 -07:00
Matei Zaharia	e34bc3a8ee	Small tweak	2013-08-31 17:47:15 -07:00
Matei Zaharia	2ee6a7e32a	Print output from spark-daemon only when it fails to launch	2013-08-31 17:31:07 -07:00
Ali Ghodsi	250bddc255	Don't require spark home to be set for standalone mode	2013-08-31 17:29:05 -07:00
Matei Zaharia	25ac50668b	Various web UI improvements: - Use "fluid" layout that can expand to wide browser windows, instead of the old one's limit of 1200 px - Remove unnecessary <hr> elements - Switch back to Bootstrap's default theme and tweak progress bar colors - Make headers more consistent between deploy and app UIs - Replace some inline CSS with stylesheets	2013-08-31 16:55:40 -07:00
Y.CORP.YAHOO.COM\tgraves	96452eea56	fix up minor things	2013-08-30 16:04:31 -05:00
Y.CORP.YAHOO.COM\tgraves	bac46266a9	Link the Spark UI to the Yarn UI	2013-08-30 15:55:32 -05:00
Mikhail Bautin	35090958b3	Also add getConf to NewHadoopRDD	2013-08-30 11:03:57 -07:00
Mikhail Bautin	5e30172f70	Make HadoopRDD's configuration accessible	2013-08-30 11:01:06 -07:00
Matei Zaharia	ca71620950	Merge pull request #857 from mateiz/assembly Change build and run instructions to use assemblies	2013-08-29 21:51:14 -07:00
Matei Zaharia	666d93c294	Update Maven build to create assemblies expected by new scripts. This includes the following changes: - The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client - There is now a bigtop-dist package to build the old BigTop assembly - The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin - Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it - run-example now adds the original Spark classpath as well because the Maven examples assembly lists spark-core and such as provided - The various Maven projects add a spark-yarn dependency correctly	2013-08-29 21:19:06 -07:00
Matei Zaharia	aab345c463	Fix finding of assembly JAR, as well as some pointers to ./run	2013-08-29 21:19:06 -07:00
Matei Zaharia	ab0e625d9e	Fix PySpark for assembly run and include it in dist	2013-08-29 21:19:06 -07:00
Matei Zaharia	53cd50c069	Change build and run instructions to use assemblies. This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.	2013-08-29 21:19:04 -07:00
jerryshao	f3dbe6b215	Fix removed block zero size log reporting	2013-08-30 09:39:01 +08:00
Patrick Wendell	abdbacf252	Merge pull request #871 from pwendell/expose-local Expose `isLocal` in SparkContext.	2013-08-28 21:11:31 -07:00
Patrick Wendell	30d2421112	Make local variable public	2013-08-28 19:53:31 -07:00
Matei Zaharia	baa84e7e4c	Merge pull request #865 from tgravescs/fixtmpdir Spark on Yarn should use yarn approved directories for spark.local.dir and tmp	2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves	aac1214ee4	Change Executor to only look at the env variable SPARK_YARN_MODE	2013-08-28 13:26:26 -05:00
Y.CORP.YAHOO.COM\tgraves	3f206bf0b5	Updated based on review comments.	2013-08-27 14:34:27 -05:00
Y.CORP.YAHOO.COM\tgraves	cf52a3cba6	Allow for Executors to have different directories than the Spark Master for Yarn	2013-08-27 11:00:21 -05:00
Reynold Xin	a77e0abb96	Added worker state to the cluster master JSON ui.	2013-08-26 11:21:03 -07:00
Reynold Xin	9db1e50344	Revert "Merge pull request #841 from rxin/json" This reverts commit `1fb1b09928`, reversing changes made to `c69c48947d`.	2013-08-26 11:05:14 -07:00
Matei Zaharia	8a36fd09dd	Merge pull request #854 from markhamstra/pomUpdate Synced sbt and maven builds to use the same dependencies, etc.	2013-08-22 10:13:35 -07:00
Matei Zaharia	c2d00f12e2	Merge pull request #832 from alig/coalesce Coalesced RDD with locality	2013-08-22 10:13:03 -07:00
Mark Hamstra	ff6f1b0500	Synced sbt and maven builds	2013-08-21 13:50:24 -07:00
Mark Hamstra	5eea613ec0	Removed meaningless types	2013-08-20 16:49:18 -07:00
Ali Ghodsi	f20ed14e87	Merged in from upstream to use TaskLocation instead of strings	2013-08-20 16:21:43 -07:00
Ali Ghodsi	5cd21c4195	added curly braces to make the code more consistent	2013-08-20 16:16:05 -07:00
Ali Ghodsi	db4bc55bef	indent	2013-08-20 16:16:05 -07:00
Ali Ghodsi	c0942a710f	Bug in test fixed	2013-08-20 16:16:05 -07:00
Ali Ghodsi	5db41919b5	Added a test to make sure no locality preferences are ignored	2013-08-20 16:16:05 -07:00
Ali Ghodsi	7b123b3126	Simpler code	2013-08-20 16:16:05 -07:00
Ali Ghodsi	9192c358e4	simpler code	2013-08-20 16:16:05 -07:00
Ali Ghodsi	a75a64eade	Fixed almost all of Matei's feedback	2013-08-20 16:16:05 -07:00
Ali Ghodsi	f1c853d76d	fixed Matei's comments	2013-08-20 16:16:04 -07:00
Ali Ghodsi	890ea6ba79	making CoalescedRDDPartition public	2013-08-20 16:16:04 -07:00
Ali Ghodsi	d6b6c680be	comment in the test to make it more understandable	2013-08-20 16:16:04 -07:00
Ali Ghodsi	b69e7166ba	Coalescer now uses current preferred locations for derived RDDs. Made run() in DAGScheduler thread safe and added a method to be able to ask it for preferred locations. Added a similar method that wraps the former inside SparkContext.	2013-08-20 16:16:04 -07:00
Ali Ghodsi	3b5bb8a4ae	added one test that will test a future functionality	2013-08-20 16:13:37 -07:00
Ali Ghodsi	33a0f59354	Added error messages to the tests to make failed tests less cryptic	2013-08-20 16:13:37 -07:00
Ali Ghodsi	abcefb3858	fixed matei's comments	2013-08-20 16:13:37 -07:00
Ali Ghodsi	35537e6341	Made a function object that returns the coalesced groups	2013-08-20 16:13:37 -07:00

2129 commits