ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
jerryshao	ffa5f8e11d	Fix issue when local properties pass from parent to child thread	2013-09-18 17:33:24 +08:00
Reynold Xin	37d8f37a8e	Added a submitJob interface that returns a Future of the result.	2013-09-17 21:13:59 -07:00
Reynold Xin	1cb42e6b2d	Properly handle job failure when the job gets killed.	2013-09-16 22:10:45 -07:00
Reynold Xin	cbc48be13b	Initial commit for job killing.	2013-09-16 18:54:06 -07:00
Holden Karau	bfcddf4700	Make mapPartitionsWithIndex work with JavaRDD's	2013-09-14 15:53:42 -07:00
Holden Karau	74f710f6cd	Start of working on SPARK-615	2013-09-11 22:35:58 -07:00
Mike	d34672f668	Set currentMemory to 0 in clear(). Remove unnecessary entries.get() call.	2013-09-11 18:01:19 -07:00
Kay Ousterhout	93c4253275	Changed localProperties to use ThreadLocal (not DynamicVariable). The fact that DynamicVariable uses an InheritableThreadLocal can cause problems where the properties end up being shared across threads in certain circumstances.	2013-09-11 13:01:39 -07:00
Patrick Wendell	91a59e6b10	Merge pull request #919 from mateiz/jets3t Add explicit jets3t dependency, which is excluded in hadoop-client	2013-09-11 10:21:48 -07:00
Patrick Wendell	b9128d34bf	Merge pull request #922 from pwendell/port-change Change default port number from 3030 to 4030.	2013-09-11 10:03:06 -07:00
Patrick Wendell	bddf135670	Change port from 3030 to 4040	2013-09-11 10:01:38 -07:00
David McCauley	5dd875c5b5	SPARK-894 - Not all WebUI fields delivered VIA JSON	2013-09-11 10:46:37 +01:00
Mike	293c758cc0	Remove MemoryStore$Entry.dropPending, unused as of `42e0a68082`.	2013-09-10 00:24:35 -07:00
Matei Zaharia	f117dc6d0d	Add explicit jets3t dependency, which is excluded in hadoop-client	2013-09-10 06:39:25 +00:00
Matei Zaharia	c81377b9ed	Merge pull request #915 from ooyala/master Get rid of / improve ugly NPE when Utils.deleteRecursively() fails	2013-09-09 20:16:19 -07:00
Evan Chan	fdb8b0eec3	Style fix: put body of if within curly braces	2013-09-09 14:29:32 -07:00
Matei Zaharia	a85758c200	Merge pull request #907 from stephenh/document_coalesce_shuffle Add better docs for coalesce.	2013-09-09 13:45:40 -07:00
Evan Chan	27726079e4	Print out more friendly error if listFiles() fails listFiles() could return null if the I/O fails, and this currently results in an ugly NPE which is hard to diagnose.	2013-09-09 12:58:12 -07:00
Y.CORP.YAHOO.COM\tgraves	2186d93285	Add metrics-ganglia to core pom file	2013-09-09 12:37:33 -05:00
Stephen Haberman	59003d387d	Use a set since shuffle could change order.	2013-09-09 11:45:03 -05:00
Stephen Haberman	6471bfec73	Reword 'evenly distributed' to 'distributed with a hash partitioner'.	2013-09-09 11:44:15 -05:00
Matei Zaharia	bf984e2745	Merge pull request #890 from mridulm/master Fix hash bug	2013-09-08 23:50:24 -07:00
Reynold Xin	e9d4f44a7a	Merge pull request #909 from mateiz/exec-id-fix Fix an instance where full standalone mode executor IDs were passed to	2013-09-08 23:36:48 -07:00
Matei Zaharia	7d3204b056	Merge pull request #905 from mateiz/docs2 Job scheduling and cluster mode docs	2013-09-08 21:39:12 -07:00
Patrick Wendell	f68848d95d	Merge pull request #906 from pwendell/ganglia-sink Clean-up of Metrics Code/Docs and Add Ganglia Sink	2013-09-08 18:32:16 -07:00
Matei Zaharia	f9b7f58de2	Fix an instance where full standalone mode executor IDs were passed to StandaloneSchedulerBackend instead of the smaller IDs used within Spark (that lack the application name). This was reported by ClearStory in https://github.com/clearstorydata/spark/pull/9. Also fixed some messages that said slave instead of executor.	2013-09-08 18:27:50 -07:00
Matei Zaharia	170b3869ee	Fix unit test failure due to changed default	2013-09-08 17:51:27 -07:00
Patrick Wendell	b4e382c210	Adding sc name in metrics source	2013-09-08 16:06:49 -07:00
Patrick Wendell	c190b48bf5	Adding more docs and some code cleanup	2013-09-08 13:46:28 -07:00
Stephen Haberman	df5fd35273	Add better docs for coalesce. Include the useful tip that if shuffle=true, coalesce can actually increase the number of partitions. This makes coalesce more like a generic `RDD.repartition` operation. (Ideally this `RDD.repartition` could automatically choose either a coalesce or a shuffle if numPartitions was either less than or greater than, respectively, the current number of partitions.)	2013-09-08 15:39:04 -05:00
Matei Zaharia	04cfb3aa9d	Merge pull request #898 from ilikerps/660 SPARK-660: Add StorageLevel support in Python	2013-09-08 10:33:20 -07:00
Patrick Wendell	8de8ee5d3c	Ganglia sink	2013-09-08 10:08:18 -07:00
Matei Zaharia	651a96adf7	More fair scheduler docs and property names. Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.	2013-09-08 00:29:11 -07:00
Matei Zaharia	98fb69822c	Work in progress: - Add job scheduling docs - Rename some fair scheduler properties - Organize intro page better - Link to Apache wiki for "contributing to Spark"	2013-09-08 00:29:11 -07:00
Aaron Davidson	c1cc8c4da2	Export StorageLevel and refactor	2013-09-07 14:41:31 -07:00
Aaron Davidson	8001687af5	Remove reflection, hard-code StorageLevels The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell would have to call a private method of SparkContext. Having StorageLevel available in sc also doesn't seem like the end of the world. There may be a better solution, though. As for creating the StorageLevel object itself, this seems to be the best way in Python 2 for creating singleton, enum-like objects: http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python	2013-09-07 09:34:07 -07:00
Reynold Xin	210eae26f4	Fixed the bug that ResultTask was not properly deserializing outputId.	2013-09-07 21:59:47 +08:00
Aaron Davidson	b8a0b6ea5e	Memoize StorageLevels read from JVM	2013-09-06 15:36:04 -07:00
Reynold Xin	1e15feb5a3	Hot fix to resolve the compilation error caused by SPARK-821.	2013-09-06 22:44:05 +08:00
Patrick Wendell	ddcb9d310a	Merge pull request #895 from ilikerps/821 SPARK-821: Don't cache results when action run locally on driver	2013-09-05 23:54:09 -07:00
Aaron Davidson	a63d4c7dc2	SPARK-660: Add StorageLevel support in Python It uses reflection... I am not proud of that fact, but it at least ensures compatibility (sans refactoring of the StorageLevel stuff).	2013-09-05 23:36:27 -07:00
Aaron Davidson	3a04e76c89	Reynold's second round of comments	2013-09-05 21:43:26 -07:00
Matei Zaharia	699c331f2f	Merge pull request #891 from xiajunluan/SPARK-864 [SPARK-864]DAGScheduler Exception if we delete Worker and StandaloneExecutorBackend then add Worker	2013-09-05 20:21:53 -07:00
Aaron Davidson	4f2236a1c5	Add unit test and address comments	2013-09-05 18:06:30 -07:00
Aaron Davidson	1418d18af4	SPARK-821: Don't cache results when action run locally on driver Caching the results of local actions (e.g., rdd.first()) causes the driver to store entire partitions in its own memory, which may be highly constrained. This patch simply makes the CacheManager avoid caching the result of all locally-run computations.	2013-09-05 15:34:42 -07:00
Andrew xia	7c15e3c5de	Fix bug SPARK-864	2013-09-05 15:56:11 +08:00
Patrick Wendell	5c7494d7c1	Merge pull request #893 from ilikerps/master SPARK-884: Add unit test to validate Spark JSON output	2013-09-04 22:47:03 -07:00
Aaron Davidson	714e7f9e32	Fix line over 100 chars	2013-09-04 22:40:08 -07:00
Aaron Davidson	37db141aef	Address Patrick's comments	2013-09-04 21:34:20 -07:00
Aaron Davidson	9e6f2b6822	SPARK-884: Add unit test to validate Spark JSON output This unit test simply validates that the outputs of the JsonProtocol methods are syntactically valid JSON.	2013-09-04 15:26:46 -07:00
Mridul Muralidharan	1e2474b814	Address review comments - rename toHash to nonNegativeHash	2013-09-04 07:46:46 +05:30
Mridul Muralidharan	b3a82b7df3	Fix hash bug - caused failure after 35k stages, sigh	2013-09-04 07:02:25 +05:30
Mark Hamstra	c9bc8af3d1	Removed repetitive import; fixes hidden definition compiler warning.	2013-09-03 15:25:20 -07:00
Patrick Wendell	c592a3c9b9	Minor spacing fix	2013-09-03 14:39:11 -07:00
Patrick Wendell	19f70273d2	Merge pull request #878 from tgravescs/yarnUILink Link the Spark UI up to the Yarn UI	2013-09-03 14:29:10 -07:00
Matei Zaharia	68df2464d1	Merge pull request #889 from alig/master Return the port the WebUI is bound to (useful if port 0 was used)	2013-09-03 13:01:17 -07:00
Y.CORP.YAHOO.COM\tgraves	41c1b5b9a0	Update based on review comments. Change function to prependBaseUri and fix formatting.	2013-09-03 14:46:51 -05:00
Y.CORP.YAHOO.COM\tgraves	c8cc276110	Review comment changes and update to org.apache packaging	2013-09-03 10:50:21 -05:00
Y.CORP.YAHOO.COM\tgraves	547fc4a412	Merge remote-tracking branch 'mesos/master' into yarnUILink Conflicts: core/src/main/scala/org/apache/spark/ui/UIUtils.scala core/src/main/scala/org/apache/spark/ui/jobs/PoolTable.scala core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala docs/running-on-yarn.md	2013-09-03 08:36:59 -05:00
Ali Ghodsi	b25918d841	Merge branch 'master' of https://github.com/alig/spark Conflicts: core/src/main/scala/org/apache/spark/deploy/master/Master.scala	2013-09-03 00:56:12 -07:00
Ali Ghodsi	bd0788505f	Using configured akka timeouts	2013-09-03 00:50:35 -07:00
Ali Ghodsi	cbfef9b3ff	Sort order of imports to match project guidelines	2013-09-02 19:33:55 -07:00
Ali Ghodsi	36d8fca2cc	Reynold's comment fixed	2013-09-02 19:31:09 -07:00
Ali Ghodsi	e452bd6d77	Brushing the code up slightly	2013-09-02 19:04:08 -07:00
Ali Ghodsi	cf7b115496	Enabling getting the actual WEBUI port	2013-09-02 18:21:21 -07:00
Matei Zaharia	12b2f1f9c9	Add missing license headers found with RAT	2013-09-02 12:23:03 -07:00
Matei Zaharia	246bf67f58	Fix test	2013-09-02 10:57:34 -07:00
Matei Zaharia	9329a7d4cd	Fix spark.io.compression.codec and change default codec to LZF	2013-09-02 10:15:22 -07:00
Matei Zaharia	6550e5e60c	Allow PySpark to launch worker.py directly on Windows	2013-09-01 18:06:15 -07:00
Matei Zaharia	3db404a43a	Run script fixes for Windows after package & assembly change	2013-09-01 23:45:57 +00:00
Matei Zaharia	0a8cc30921	Move some classes to more appropriate packages: * RDD, RDDFunctions -> org.apache.spark.rdd Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util * JavaSerializer, KryoSerializer -> org.apache.spark.serializer	2013-09-01 14:13:16 -07:00
Matei Zaharia	5701eb92c7	Fix some URLs	2013-09-01 14:13:16 -07:00
Matei Zaharia	12495ec63a	Remove shutdown hook to stop jetty; this is unnecessary for releasing ports and creates noisy log messages	2013-09-01 14:13:15 -07:00
Matei Zaharia	46eecd110a	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
Matei Zaharia	a30fac16ca	Merge pull request #883 from alig/master Don't require the spark home environment variable to be set for standalone mode (change needed by SIMR)	2013-09-01 12:27:50 -07:00
Matei Zaharia	e34bc3a8ee	Small tweak	2013-08-31 17:47:15 -07:00
Matei Zaharia	2ee6a7e32a	Print output from spark-daemon only when it fails to launch	2013-08-31 17:31:07 -07:00
Ali Ghodsi	250bddc255	Don't require spark home to be set for standalone mode	2013-08-31 17:29:05 -07:00
Matei Zaharia	25ac50668b	Various web UI improvements: - Use "fluid" layout that can expand to wide browser windows, instead of the old one's limit of 1200 px - Remove unnecessary <hr> elements - Switch back to Bootstrap's default theme and tweak progress bar colors - Make headers more consistent between deploy and app UIs - Replace some inline CSS with stylesheets	2013-08-31 16:55:40 -07:00
Y.CORP.YAHOO.COM\tgraves	96452eea56	fix up minor things	2013-08-30 16:04:31 -05:00
Y.CORP.YAHOO.COM\tgraves	bac46266a9	Link the Spark UI to the Yarn UI	2013-08-30 15:55:32 -05:00
Mikhail Bautin	35090958b3	Also add getConf to NewHadoopRDD	2013-08-30 11:03:57 -07:00
Mikhail Bautin	5e30172f70	Make HadoopRDD's configuration accessible	2013-08-30 11:01:06 -07:00
Matei Zaharia	ca71620950	Merge pull request #857 from mateiz/assembly Change build and run instructions to use assemblies	2013-08-29 21:51:14 -07:00
Matei Zaharia	666d93c294	Update Maven build to create assemblies expected by new scripts This includes the following changes: - The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client - There is now a bigtop-dist package to build the old BigTop assembly - The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin - Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it - run-example now adds the original Spark classpath as well because the Maven examples assembly lists spark-core and such as provided - The various Maven projects add a spark-yarn dependency correctly	2013-08-29 21:19:06 -07:00
Matei Zaharia	aab345c463	Fix finding of assembly JAR, as well as some pointers to ./run	2013-08-29 21:19:06 -07:00
Matei Zaharia	ab0e625d9e	Fix PySpark for assembly run and include it in dist	2013-08-29 21:19:06 -07:00
Matei Zaharia	53cd50c069	Change build and run instructions to use assemblies This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.	2013-08-29 21:19:04 -07:00
jerryshao	f3dbe6b215	Fix removed block zero size log reporting	2013-08-30 09:39:01 +08:00
Patrick Wendell	abdbacf252	Merge pull request #871 from pwendell/expose-local Expose `isLocal` in SparkContext.	2013-08-28 21:11:31 -07:00
Patrick Wendell	30d2421112	Make local variable public	2013-08-28 19:53:31 -07:00
Matei Zaharia	baa84e7e4c	Merge pull request #865 from tgravescs/fixtmpdir Spark on Yarn should use yarn approved directories for spark.local.dir and tmp	2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves	aac1214ee4	Change Executor to only look at the env variable SPARK_YARN_MODE	2013-08-28 13:26:26 -05:00
Y.CORP.YAHOO.COM\tgraves	3f206bf0b5	Updated based on review comments.	2013-08-27 14:34:27 -05:00
Y.CORP.YAHOO.COM\tgraves	cf52a3cba6	Allow for Executors to have different directories than the Spark Master for Yarn	2013-08-27 11:00:21 -05:00
Reynold Xin	a77e0abb96	Added worker state to the cluster master JSON ui.	2013-08-26 11:21:03 -07:00
Reynold Xin	9db1e50344	Revert "Merge pull request #841 from rxin/json" This reverts commit `1fb1b09928`, reversing changes made to `c69c48947d`.	2013-08-26 11:05:14 -07:00
Matei Zaharia	8a36fd09dd	Merge pull request #854 from markhamstra/pomUpdate Synced sbt and maven builds to use the same dependencies, etc.	2013-08-22 10:13:35 -07:00
Matei Zaharia	c2d00f12e2	Merge pull request #832 from alig/coalesce Coalesced RDD with locality	2013-08-22 10:13:03 -07:00
Mark Hamstra	ff6f1b0500	Synced sbt and maven builds	2013-08-21 13:50:24 -07:00
Mark Hamstra	5eea613ec0	Removed meaningless types	2013-08-20 16:49:18 -07:00
Ali Ghodsi	f20ed14e87	Merged in from upstream to use TaskLocation instead of strings	2013-08-20 16:21:43 -07:00
Ali Ghodsi	5cd21c4195	added curly braces to make the code more consistent	2013-08-20 16:16:05 -07:00
Ali Ghodsi	db4bc55bef	indent	2013-08-20 16:16:05 -07:00
Ali Ghodsi	c0942a710f	Bug in test fixed	2013-08-20 16:16:05 -07:00
Ali Ghodsi	5db41919b5	Added a test to make sure no locality preferences are ignored	2013-08-20 16:16:05 -07:00
Ali Ghodsi	7b123b3126	Simpler code	2013-08-20 16:16:05 -07:00
Ali Ghodsi	9192c358e4	simpler code	2013-08-20 16:16:05 -07:00
Ali Ghodsi	a75a64eade	Fixed almost all of Matei's feedback	2013-08-20 16:16:05 -07:00
Ali Ghodsi	f1c853d76d	fixed Matei's comments	2013-08-20 16:16:04 -07:00
Ali Ghodsi	890ea6ba79	making CoalescedRDDPartition public	2013-08-20 16:16:04 -07:00
Ali Ghodsi	d6b6c680be	comment in the test to make it more understandable	2013-08-20 16:16:04 -07:00
Ali Ghodsi	b69e7166ba	Coalescer now uses current preferred locations for derived RDDs. Made run() in DAGScheduler thread safe and added a method to be able to ask it for preferred locations. Added a similar method that wraps the former inside SparkContext.	2013-08-20 16:16:04 -07:00
Ali Ghodsi	3b5bb8a4ae	added one test that will test a future functionality	2013-08-20 16:13:37 -07:00
Ali Ghodsi	33a0f59354	Added error messages to the tests to make failed tests less cryptic	2013-08-20 16:13:37 -07:00
Ali Ghodsi	abcefb3858	fixed matei's comments	2013-08-20 16:13:37 -07:00
Ali Ghodsi	35537e6341	Made a function object that returns the coalesced groups	2013-08-20 16:13:37 -07:00
Ali Ghodsi	339598c080	several of Reynold's suggestions implemented	2013-08-20 16:13:37 -07:00
Ali Ghodsi	02d6464f2f	space removed	2013-08-20 16:13:37 -07:00
Ali Ghodsi	4f99be1ffd	use count rather than foreach	2013-08-20 16:13:37 -07:00
Ali Ghodsi	f67753cdfc	made preferredLocation a val of the surrounding case class	2013-08-20 16:13:37 -07:00
Ali Ghodsi	f24861b60a	Fix bug in tests	2013-08-20 16:13:36 -07:00
Ali Ghodsi	f6e47e8b51	Renamed split to partition	2013-08-20 16:13:36 -07:00
Ali Ghodsi	937f72feb8	word wrap before 100 chars per line	2013-08-20 16:13:36 -07:00
Ali Ghodsi	c4d59910b1	added goals inline as comment	2013-08-20 16:13:36 -07:00
Ali Ghodsi	7a2a33e32d	Large scale load and locality tests for the coalesced partitions added	2013-08-20 16:13:36 -07:00
Ali Ghodsi	66edf854aa	Bug, should compute slack wrt parent partition size, not number of bins	2013-08-20 16:13:36 -07:00
Ali Ghodsi	1ede102ba5	load balancing coalescer	2013-08-20 16:13:36 -07:00
Matei Zaharia	aa2b89d98d	Merge remote-tracking branch 'jey/hadoop-agnostic' Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala	2013-08-20 10:14:15 -07:00
Mark Hamstra	1630fbf838	changeGeneration --> changeEpoch renaming	2013-08-20 00:17:16 -07:00
Mark Hamstra	ad18410427	Renamed 'priority' to 'jobId' and assorted minor changes	2013-08-20 00:07:04 -07:00
Matei Zaharia	8cae72e94e	Merge pull request #828 from mateiz/sched-improvements Scheduler fixes and improvements	2013-08-19 23:40:04 -07:00
Matei Zaharia	efeb142981	Merge pull request #849 from mateiz/web-fixes Small fixes to web UI	2013-08-19 19:23:50 -07:00
Matei Zaharia	793a722f8e	Allow some wiggle room in UISuite port test and in EC2 ports	2013-08-19 18:51:00 -07:00
Matei Zaharia	abdc1f8bbb	Merge pull request #847 from rxin/rdd Allow subclasses of Product2 in all key-value related classes	2013-08-19 18:30:56 -07:00
Matei Zaharia	498a26189b	Small fixes to web UI: - Use SPARK_PUBLIC_DNS environment variable if set (for EC2) - Use a non-ephemeral port (3030 instead of 33000) by default - Updated test to use non-ephemeral port too	2013-08-19 18:17:49 -07:00
Reynold Xin	5054abd41b	Code review feedback. (added tests for cogroup and subtract; added more documentation on MutablePair)	2013-08-19 12:58:02 -07:00
Reynold Xin	acc4aa1f47	Added a test for sorting using MutablePairs.	2013-08-19 11:02:10 -07:00
Reynold Xin	71d705a66e	Made PairRDDFunctions taking only Tuple2, but made the rest of the shuffle code path working with general Product2.	2013-08-19 00:40:43 -07:00
Reynold Xin	2a7b99c08b	Added the missing RDD files and cleaned up SparkContext.	2013-08-18 20:39:29 -07:00
Reynold Xin	82bf4c0339	Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc).	2013-08-18 20:25:45 -07:00
Matei Zaharia	8ac3d1e263	Added unit tests for ClusterTaskSetManager, and fix a bug found with resetting locality level after a non-local launch	2013-08-18 19:51:07 -07:00
Matei Zaharia	4004cf775d	Added some comments on threading in scheduler code	2013-08-18 19:51:07 -07:00
Matei Zaharia	2a4ed10210	Address some review comments: - When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers - Simplify ClusterScheduler.prioritizeContainers - Add docs on the new configuration options	2013-08-18 19:51:07 -07:00
Matei Zaharia	222c897128	Comment cleanup (via Kay) and some debug messages	2013-08-18 19:51:07 -07:00
Matei Zaharia	cf39d45d14	More scheduling fixes: - Added periodic revival of offers in StandaloneSchedulerBackend - Replaced task scheduling aggression with multi-level delay scheduling in ClusterTaskSetManager - Fixed ZippedRDD preferred locations because they can't currently be process-local - Fixed some uses of hostPort	2013-08-18 19:51:07 -07:00
Matei Zaharia	90a04dab8d	Initial work towards scheduler refactoring: - Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort based data structures and just use executorID, since the hostPort vs host stuff is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos. - Replaced most hostPort-based data structures and fields as above. - Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise. - Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name. - Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This change didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later. - Renamed MapOutputTracker "generations" to "epochs".	2013-08-18 19:51:06 -07:00
Jey Kottalam	bdd861c6c3	Fix Maven build with Hadoop 0.23.9	2013-08-18 18:28:57 -07:00
Matei Zaharia	8fa0747978	Merge pull request #840 from AndreSchumacher/zipegg Implementing SPARK-878 for PySpark: adding zip and egg files to context ...	2013-08-18 17:02:54 -07:00
Reynold Xin	2c00ea3efc	Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.	2013-08-17 21:43:29 -07:00
Reynold Xin	0e84fee76b	Removed the mapSideCombine option in partitionBy.	2013-08-17 21:13:41 -07:00
Reynold Xin	10af952a3d	Removed the mapSideCombine option in CoGroupedRDD.	2013-08-17 21:07:34 -07:00
Reynold Xin	5d050a3e1f	Removed the unused shuffleId in ShuffleDependency's constructor.	2013-08-16 23:23:16 -07:00
Matei Zaharia	e89ffc7b3c	Merge pull request #839 from jegonzal/zip_partitions Currying RDD.zipPartitions	2013-08-16 14:02:34 -07:00
Jey Kottalam	ad580b94d5	Maven build now also works with YARN	2013-08-16 13:50:12 -07:00
Jey Kottalam	9dd15fe700	Don't mark hadoop-client as 'provided'	2013-08-16 13:50:12 -07:00
Jey Kottalam	11b42a84db	Maven build now works with CDH hadoop-2.0.0-mr1	2013-08-16 13:50:12 -07:00
Jey Kottalam	353fab2440	Initial changes to make Maven build agnostic of hadoop version	2013-08-16 13:50:12 -07:00
Joseph E. Gonzalez	53b2639a1e	Reversing the argument order in zipPartitions to enable stronger type inference.	2013-08-16 12:38:59 -07:00
Andre Schumacher	c7e348faec	Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path	2013-08-16 11:58:20 -07:00
Reynold Xin	c961c19b7b	Use the JSON formatter from Scala library and removed dependency on lift-json. It made the JSON creation slightly more complicated, but reduces one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).	2013-08-15 18:23:01 -07:00
Reynold Xin	eddbf43b54	Revert "Merge pull request #834 from Daemoen/master" This reverts commit `230ab2722e`, reversing changes made to `659553b21d`.	2013-08-15 17:49:37 -07:00
Reynold Xin	230ab2722e	Merge pull request #834 from Daemoen/master Updated json output to allow for display of worker state	2013-08-15 17:45:17 -07:00
Patrick Wendell	659553b21d	Merge pull request #836 from pwendell/rename Rename `memoryBytesToString` and `memoryMegabytesToString`	2013-08-15 16:56:31 -07:00
Jey Kottalam	a06a9d5c5f	Rename HadoopWriter to SparkHadoopWriter since it's outside of our package	2013-08-15 16:50:37 -07:00
Jey Kottalam	8f979edef5	Fix newTaskAttemptID to work under YARN	2013-08-15 16:50:37 -07:00
Jey Kottalam	e2d7656ca3	re-enable YARN support	2013-08-15 16:50:37 -07:00
Jey Kottalam	bd0bab47c9	SparkEnv isn't available this early, and not needed anyway	2013-08-15 16:50:37 -07:00
Jey Kottalam	4f43fd791a	make SparkHadoopUtil a member of SparkEnv	2013-08-15 16:50:37 -07:00
Jey Kottalam	43ebcb8484	rename HadoopMapRedUtil => SparkHadoopMapRedUtil, HadoopMapReduceUtil => SparkHadoopMapReduceUtil	2013-08-15 16:50:37 -07:00
Jey Kottalam	8b1c1520fc	add comment	2013-08-15 16:50:37 -07:00
Jey Kottalam	69c3bbf688	dynamically detect hadoop version	2013-08-15 16:50:37 -07:00
Jey Kottalam	f67b94ad4f	remove core/src/hadoop{1,2} dirs	2013-08-15 16:50:36 -07:00
Jey Kottalam	b877e20a33	move yarn to its own directory	2013-08-15 16:50:36 -07:00
Patrick Wendell	4c6ade1ad5	Rename `memoryBytesToString` and `memoryMegabytesToString` These are used all over the place now and they are not specific to memory at all. memoryBytesToString --> bytesToString memoryMegabytesToString --> megabytesToString	2013-08-15 15:58:07 -07:00
Reynold Xin	1a51deae8a	More minor UI changes including code review feedback.	2013-08-15 14:34:07 -07:00
Daemoen	ad2e8b5126	Updated json output to allow for display of worker state Ops teams need to ensure that the cluster is functional and performant. Having to scrape the html source for worker state won't work reliably, and will be slow. By exposing the state in the json output, ops teams are able to ensure a fully functional environment by querying for the json output and parsing for dead nodes.	2013-08-15 12:19:14 -07:00
Reynold Xin	2d2a556bdf	Various UI improvements.	2013-08-14 23:23:09 -07:00
Reynold Xin	290e3e6e65	Renamed setCurrentJobDescription to setJobDescription.	2013-08-14 18:40:53 -07:00
Reynold Xin	3886b54933	A few small scheduler / job description changes. 1. Renamed SparkContext.addLocalProperty to setLocalProperty. And allow this function to unset a property. 2. Renamed SparkContext.setDescription to setCurrentJobDescription. 3. Throw an exception if the fair scheduler allocation file is invalid.	2013-08-14 17:19:42 -07:00
Matei Zaharia	839f2d4f3f	Merge pull request #822 from pwendell/ui-features Adding GC Stats to TaskMetrics (and three small fixes)	2013-08-14 16:17:23 -07:00
Patrick Wendell	04ad78b09d	Style cleanup based on Matei feedback	2013-08-14 14:57:21 -07:00
Kay Ousterhout	a88aa5e6ed	Fixed 2 bugs in executor UI. 1) UI crashed if the executor UI was loaded before any tasks started. 2) The total tasks was incorrectly reported due to using string (rather than int) arithmetic.	2013-08-13 23:44:58 -07:00
Patrick Wendell	c223176388	Small style clean-up	2013-08-13 16:56:37 -07:00
Patrick Wendell	fab5cee111	Correcting terminology in RDD page	2013-08-13 16:25:55 -07:00
Patrick Wendell	024e5c5ce1	Correct sorting order for stages	2013-08-13 16:25:55 -07:00
Patrick Wendell	4e9f0c2df6	Capturing GC details in TaskMetrics	2013-08-13 16:25:55 -07:00
Patrick Wendell	f0382007dc	Bug fix for display of shuffle read/write metrics. This fixes an error where empty cells are missing if a given task has no shuffle read/write.	2013-08-13 16:25:55 -07:00
Matei Zaharia	d316af9c84	Merge pull request #821 from pwendell/print-launch-command Print run command to stderr rather than stdout	2013-08-13 15:31:01 -07:00
Patrick Wendell	a7feb69ae8	Print run command to stderr rather than stdout	2013-08-13 15:07:03 -07:00
Kay Ousterhout	1beb843a6f	Reuse the set of failed states rather than creating a new object each time	2013-08-13 14:27:40 -07:00
Kay Ousterhout	c92dd627ca	Properly account for killed tasks. The TaskState class's isFinished() method didn't return true for KILLED tasks, which means some resources are never reclaimed for tasks that are killed. This also made it inconsistent with the isFinished() method used by CoarseMesosSchedulerBackend.	2013-08-13 12:40:15 -07:00
Patrick Wendell	ed6a1646e6	Slight change to pr-784	2013-08-13 09:29:40 -07:00
Patrick Wendell	a0133bfbad	Merge pull request #784 from jerryshao/dev-metrics-servlet Add MetricsServlet for Spark metrics system	2013-08-13 09:28:18 -07:00
Matei Zaharia	65d0d91fba	Merge pull request #807 from JoshRosen/guava-optional Change scala.Option to Guava Optional in Java APIs	2013-08-12 19:00:57 -07:00
Josh Rosen	cf08bb7a3e	Fix import organization.	2013-08-12 18:55:02 -07:00
jerryshao	09c7179e81	MetricsServlet code refactor according to comments	2013-08-12 13:23:23 +08:00
jerryshao	320e87e7ab	Add MetricsServlet for Spark metrics system	2013-08-12 13:23:23 +08:00
Reynold Xin	e5b9ed2833	Merge pull request #808 from pwendell/ui_compressed_bytes Report compressed bytes read when calculating TaskMetrics	2013-08-11 17:22:47 -07:00
Patrick Wendell	3d8f281604	Report compressed bytes read when calculating TaskMetrics	2013-08-11 16:25:57 -07:00
Matei Zaharia	379648630b	Merge pull request #805 from woggle/hadoop-rdd-jobconf Use new Configuration() instead of slower new JobConf() in SerializableWritable	2013-08-11 14:51:47 -07:00
Josh Rosen	d7f78b443b	Change scala.Option to Guava Optional in Java APIs.	2013-08-11 12:05:09 -07:00
Charles Reiss	6402b539d0	Use new Configuration() instead of new JobConf() for ObjectWritable. JobConf's constructor loads default config files in some versions of Hadoop, which is quite slow, and we only need the Configuration object to pass the correct ClassLoader.	2013-08-10 21:31:05 -07:00
Matei Zaharia	71c63de22f	Merge pull request #795 from mridulm/master Fix bug reported in PR 791 : a race condition in ConnectionManager and Connection	2013-08-10 10:21:20 -07:00
Matei Zaharia	d3277a0daf	Merge remote-tracking branch 'origin/pr/792' Conflicts: core/src/main/scala/spark/ui/jobs/IndexPage.scala core/src/main/scala/spark/ui/jobs/StagePage.scala	2013-08-10 10:18:50 -07:00
Patrick Wendell	d17eeb997d	Merge pull request #785 from anfeng/master expose HDFS file system stats via Executor metrics	2013-08-10 09:02:27 -07:00
Kay Ousterhout	14d14f451a	Shortened names, as per Matei's suggestion	2013-08-10 07:50:27 -07:00
Matei Zaharia	cd247ba5bb	Merge pull request #786 from shivaram/mllib-java Java fixes, tests and examples for ALS, KMeans	2013-08-09 20:41:13 -07:00
Kay Ousterhout	7810a76512	Only print event queue full error message once	2013-08-09 18:20:48 -07:00
Kay Ousterhout	44ca8629d8	Style fix: removing unnecessary return type	2013-08-09 17:22:50 -07:00
Kay Ousterhout	29b79714f9	Style fixes based on code review	2013-08-09 16:46:34 -07:00
Kay Ousterhout	81e1d4a7d1	Refactored SparkListener to process all events asynchronously. This commit fixes issues where SparkListeners that take a while to process events slow the DAGScheduler. This commit also fixes a bug in the UI where if a user goes to a web page of a stage that does not exist, they can create a memory leak (granted, this is not an issue at small scale -- probably only an issue if someone actively tried to DOS the UI).	2013-08-09 13:27:41 -07:00
Matei Zaharia	b09d4b79e8	Merge pull request #799 from woggle/sync-fix Remove extra synchronization in ResultTask	2013-08-09 13:17:08 -07:00
Patrick Wendell	cc6b92e80e	Merge pull request #775 from pwendell/print-launch-command Log the launch command for Spark daemons	2013-08-09 13:00:33 -07:00
Patrick Wendell	3970b580c2	Using quotes when printing out command	2013-08-09 11:53:32 -07:00
Charles Reiss	9dfc280f74	Remove extra synchronization in ResultTask	2013-08-09 11:09:02 -07:00
Matei Zaharia	f94fc75c3f	Merge pull request #788 from shane-huang/sparkjavaopts For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as ...	2013-08-09 10:04:03 -07:00
Matei Zaharia	d1e1c1b24d	Add test for Kryo with WrappedArray (which was failing in Chill 0.3.0)	2013-08-08 13:34:11 -07:00
Matei Zaharia	5a4003c1ac	Update to Chill 0.3.1	2013-08-08 13:30:27 -07:00
Mridul Muralidharan	c230ca3b4e	Change line size	2013-08-08 22:28:40 +05:30
Mridul Muralidharan	dc47084f4e	Attempt to fix bug reported in PR 791 : a race condition in ConnectionManager and Connection	2013-08-08 22:19:27 +05:30
Kay Ousterhout	88049a214d	Fixed 3 bugs that caused UI to crash (including SPARK-810). One bug caused the UI to crash if you try to look at a job's status before any of the tasks have finished. The second bug was a concurrency issue where two different threads (the scheduling thread and a UI thread) could be reading/updating the data structures in JobProgressListener concurrently. The third bug mis-used an Option, also causing the UI to crash under certain conditions.	2013-08-07 23:09:25 -07:00
Patrick Wendell	b4321edf68	Reverting bootstrap change	2013-08-07 22:18:18 -07:00
Patrick Wendell	21392f2a73	Change I forgot to merge in	2013-08-07 21:45:32 -07:00
Patrick Wendell	706394b370	Bumping font size to 14px and fixing style issue in progress bars	2013-08-07 21:27:04 -07:00
Patrick Wendell	8c0d668468	Merge branch 'master' into bootstrap-design Conflicts: core/src/main/scala/spark/ui/UIUtils.scala core/src/main/scala/spark/ui/jobs/IndexPage.scala core/src/main/scala/spark/ui/storage/RDDPage.scala	2013-08-07 21:06:03 -07:00
Kay Ousterhout	b88e26248e	Fixed issue in UI that limited scheduler throughput. Removal of items from ArrayBuffers in the UI code was slow and significantly impacted scheduler throughput. This commit improves scheduler throughput by 5x.	2013-08-07 14:42:05 -07:00
shane-huang	cbc5107e36	For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as default and let application env override default options if applicable Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-08-07 14:36:48 +08:00
Matei Zaharia	6b043a6f11	Merge pull request #724 from dlyubimov/SPARK-826 SPARK-826: fold(), reduce(), collect() always attempt to use java serialization	2013-08-06 22:31:02 -07:00
Matei Zaharia	7c4b7a53b1	Merge remote-tracking branch 'origin/pr/781' Conflicts: core/src/main/resources/spark/ui/static/webui.css	2013-08-06 17:19:49 -07:00
Karen Feng	908032e79b	Used saturated colors for progress bars	2013-08-06 16:52:21 -07:00
Karen Feng	8bc497fa10	Lightened color of progress bars	2013-08-06 16:33:05 -07:00
Karen Feng	ca1903ea63	Overlays progress text on top of bar	2013-08-06 15:45:42 -07:00
Matei Zaharia	df4d10d630	Merge pull request #779 from adatao/adatao-global-SparkEnv [HOTFIX] Extend thread safety for SparkEnv.get()	2013-08-06 15:44:05 -07:00
Shivaram Venkataraman	471fbadd0c	Java examples, tests for KMeans and ALS - Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it easier to call from Java - Renames class methods from `train` to `run` to enable static methods to be called from Java. - Add unit tests which check if both static / class methods can be called. - Also add examples which port the main() function in ALS, KMeans to the examples project. Couple of minor changes to existing code: - Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily - Workaround a bug where using double[] from Java leads to class cast exception in KMeans init	2013-08-06 15:43:46 -07:00
anfeng	dda2ac8b5d	reformat registerFileSystemStat()	2013-08-06 15:22:25 -07:00
Karen Feng	099528b6c4	Pre-sorts stage/env tables, changes text/link of stage summaries	2013-08-06 14:52:12 -07:00
Karen Feng	254a930730	Reverse sorts StageTable by submitted time	2013-08-06 14:18:38 -07:00
Karen Feng	5ed5b73026	Sorts first column of env tables	2013-08-06 13:59:53 -07:00
anfeng	0748c60817	expose HDFS file system stats via Executor metrics	2013-08-06 11:47:06 -07:00
Reynold Xin	d031f73679	Merge pull request #782 from WANdisco/master SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD	2013-08-05 22:33:00 -07:00
Matei Zaharia	1b63dea816	Merge pull request #769 from markhamstra/NegativeCores SPARK-847 + SPARK-845: Zombie workers and negative cores	2013-08-05 22:21:26 -07:00
Alexander Pivovarov	a30866438b	SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD	2013-08-05 21:48:43 -07:00
Matei Zaharia	8b277892c9	Merge pull request #774 from pwendell/job-description Show user-defined job name in UI	2013-08-05 19:14:52 -07:00
Christopher Nguyen	b1bbbe699c	[HOTFIX] Mark lastSetSparkEnv @volatile in case it gets HotSpot-cached On branch adatao-global-SparkEnv Changes to be committed: modified: core/src/main/scala/spark/SparkEnv.scala	2013-08-05 17:22:27 -07:00
Mark Hamstra	35d8f5ee52	Moved handling of timed out workers within the Master actor	2013-08-05 13:13:56 -07:00
Mark Hamstra	37ccf9301a	milliseconds -> seconds in timeOutDeadWorkers logging	2013-08-05 13:13:56 -07:00
Mark Hamstra	cdd1af562e	Timeout zombie workers	2013-08-05 13:13:56 -07:00
Mikhail Bautin	e8bec8365f	Only reduce the number of cores once when removing an executor	2013-08-05 13:13:56 -07:00
Karen Feng	95025afdec	Made most small fixes for SPARK-849 except for table sort, task progress overlay	2013-08-05 13:04:56 -07:00
Bill Zhao	87134b3648	SPARK-850: give better console message	2013-08-05 11:55:35 -07:00
Christopher Nguyen	39e4fda76f	[HOTFIX] Extend thread safety for SparkEnv.get() A ThreadLocal SparkEnv.env is facing various situations leading to NullPointerExceptions, where SparkEnv.env set in one thread is not gettable in another thread, but often assumed to be available. See, e.g., https://groups.google.com/forum/#!topic/spark-developers/GLx8yunSj0A This hotfixes SparkEnv.env to return either (a) the ThreadLocal value if non-null, or (b) the previously set value in any thread. This approach preserves SparkEnv.set() thread safety needed by RDD.compute() and possibly other places. A refactoring that parameterizes SparkEnv should be addressed subsequently. On branch adatao-global-SparkEnv Changes to be committed: modified: core/src/main/scala/spark/SparkEnv.scala	2013-08-05 02:09:54 -07:00
Patrick Wendell	f3660d5ab8	Make output formatting consistent between bash/scala	2013-08-03 21:30:15 -07:00
Patrick Wendell	ad94fbb322	Log the launch command for Spark executors	2013-08-03 09:19:46 -07:00
Matei Zaharia	22abbc10d6	Merge pull request #772 from karenfeng/ui-843 Show app duration	2013-08-02 16:37:59 -07:00
Patrick Wendell	5b3784a79c	Show user-defined job name in UI	2013-08-02 15:47:41 -07:00
Karen Feng	b3ae5b25d5	Shows time the app has been running	2013-08-02 13:25:14 -07:00
Patrick Wendell	9d7dfd2d5a	Merge pull request #743 from pwendell/app-metrics Add application metrics to standalone master	2013-08-01 17:41:58 -07:00
Patrick Wendell	f1d2ad550e	under_scores --> camelCase for config options	2013-08-01 15:26:26 -07:00
Patrick Wendell	12d9c82c9b	Small style fix	2013-08-01 15:25:52 -07:00
Patrick Wendell	37bc64a205	Adding application-level metrics. This adds metrics for applications in the deploy Master.	2013-08-01 15:25:52 -07:00
Karen Feng	73692f3cb9	Unify, reduce body font size	2013-08-01 15:10:30 -07:00
Patrick Wendell	87fd321a5a	Minor refactoring and code cleanup	2013-08-01 15:02:31 -07:00
Patrick Wendell	b10199413a	Slight refactoring to SparkContext functions	2013-08-01 15:00:42 -07:00
Patrick Wendell	cfcd77b5da	Increasing inter job arrival	2013-08-01 15:00:42 -07:00
Patrick Wendell	5faac7f4f3	Minor style fixes	2013-08-01 15:00:42 -07:00
Patrick Wendell	5e7b38fbb3	Merge pull request #695 from xiajunluan/pool_ui Enhance job ui in spark ui system with adding pool information	2013-08-01 14:59:33 -07:00
Karen Feng	47600e9579	Removed hr margin	2013-08-01 14:57:04 -07:00
Karen Feng	e648a62fc8	Inserted needed line break for log paging	2013-08-01 14:46:19 -07:00
Karen Feng	686d6266c4	Use nav pills instead of default	2013-08-01 14:41:49 -07:00
Karen Feng	86d372d17f	Removed line breaks	2013-08-01 14:37:21 -07:00
Karen Feng	99803d88b9	Reduced all header sizes	2013-08-01 14:18:33 -07:00
Karen Feng	d216d687ef	Reduced size of table text to compact	2013-08-01 13:27:23 -07:00
Karen Feng	5dae283996	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update	2013-08-01 11:28:28 -07:00
Matei Zaharia	0a96493ac6	Merge pull request #760 from karenfeng/heading-update Clean up web UI page headers	2013-08-01 11:27:17 -07:00
Patrick Wendell	9177bea2b4	Removing extra imports	2013-08-01 10:42:50 -07:00
Patrick Wendell	3e4d5e5f8b	Merge branch 'master' into master-json Conflicts: core/src/main/scala/spark/deploy/master/ui/IndexPage.scala	2013-08-01 10:42:07 -07:00
Patrick Wendell	ffc034e4fb	Import cleanup	2013-08-01 10:39:56 -07:00
Andrew xia	d58502a156	fix bug of spark "SubmitStage" listener as unit test error	2013-08-01 23:21:41 +08:00
Andrew xia	3b5a11e765	change function name "setName" to "setProperties" as "setName" is also member of Thread class	2013-08-01 19:37:15 +08:00
Dmitriy Lyubimov	d29ee3689b	Merge fixes merge commit hasn't picked	2013-08-01 00:21:26 -07:00
Dmitriy Lyubimov	cb6be5bd7e	Merge remote-tracking branch 'mesos/master' into SPARK-826 Conflicts: core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala core/src/test/scala/spark/KryoSerializerSuite.scala	2013-07-31 22:09:22 -07:00
Dmitriy Lyubimov	28f1550f01	More elegant rewrite of the same.	2013-07-31 21:41:00 -07:00
Dmitriy Lyubimov	7c52ecc6a4	(1) added reduce test case. (2) added nested streaming in ParallelCollectionRDD (3) added kryo with fold test which still doesn't work	2013-07-31 19:27:30 -07:00
Matei Zaharia	3097d75d6f	Merge remote-tracking branch 'dlyubimov/SPARK-827' Conflicts: docs/configuration.md	2013-07-31 18:36:43 -07:00
Karen Feng	7c9c5ef6c6	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update	2013-07-31 16:39:26 -07:00
Karen Feng	02cde8efdf	Replaces theme with Bootswatch Spacelab theme	2013-07-31 16:34:07 -07:00
Karen Feng	09cd67bf98	Changed bootstrap colors, fixed logpaging buttons	2013-07-31 16:18:53 -07:00
Matei Zaharia	39c75f3033	Merge pull request #757 from BlackNiuza/result_task_generation Bug fix: SPARK-837	2013-07-31 15:52:36 -07:00
Matei Zaharia	14bf2fe039	Merge pull request #749 from benh/spark-executor-uri Added property 'spark.executor.uri' for launching on Mesos.	2013-07-31 14:18:16 -07:00
Benjamin Hindman	4692ea4892	Used 'uri.split('/').last' instead of 'new File(uri).getName()'.	2013-07-31 12:29:44 -07:00
Karen Feng	c453967f9a	Reduced size of heading	2013-07-31 11:57:50 -07:00
Matei Zaharia	a386ced2c6	Merge pull request #754 from rxin/compression Compression codec change	2013-07-31 11:22:50 -07:00
Karen Feng	49e6344142	Removed master URL from job UI, reduced heading size of basic spark pages	2013-07-31 11:17:59 -07:00
Reynold Xin	c61843a69f	Changed other LZF uses to use the compression codec interface.	2013-07-31 10:32:13 -07:00
Patrick Wendell	89da9d94b3	Add JSON path to master index page	2013-07-31 09:47:53 -07:00
BlackNiuza	9a815de4bf	write and read generation in ResultTask	2013-08-01 00:36:47 +08:00
Roman Tkalenko	0c6553714a	Refactored Vector.apply(length, initializer) replacing excessive code with library method (also removed unused variable ```ans``` as minor change)	2013-07-31 19:05:46 +03:00
Matei Zaharia	12553e5c55	Simplified nonNegativeMod to match previous version	2013-07-31 08:50:28 -07:00
Matei Zaharia	d4556f4207	Merge pull request #751 from cdshines/master Cleaned Partitioner & PythonPartitioner source by taking out non-related logic to Utils	2013-07-31 08:48:14 -07:00
Andrew xia	5670c96f29	Merge branch 'master' into Pool_UI Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/scheduler/SparkListener.scala core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala core/src/main/scala/spark/ui/jobs/IndexPage.scala core/src/main/scala/spark/ui/jobs/JobProgressUI.scala	2013-07-31 19:36:36 +08:00
cdshines	fefb03cbd7	Eliminated code duplication, refactored to pattern-matching style Partitioner and PythonPartitioner	2013-07-31 13:19:42 +03:00
Dmitriy Lyubimov	96664431cb	IDEA flipped JavaSerialized import at some point to a wrong class.	2013-07-30 23:10:09 -07:00
Dmitriy Lyubimov	c219fc94fd	Minor, style	2013-07-30 22:08:39 -07:00
Dmitriy Lyubimov	f4b4b8836e	reverting back to one-by-one serialization for parallelize()	2013-07-30 19:00:58 -07:00
jerryshao	bf9318091a	Add Apache license header to metrics system	2013-07-31 09:42:16 +08:00
Reynold Xin	98024eadc3	Renamed compressionOutputStream and compressionInputStream to compressedOutputStream and compressedInputStream.	2013-07-30 18:28:46 -07:00
Dmitriy Lyubimov	abada94ebf	removing default constructor (not Externalizable any more)	2013-07-30 18:04:02 -07:00
Dmitriy Lyubimov	943c6590c9	realigning "extends" back manually	2013-07-30 18:01:35 -07:00
Dmitriy Lyubimov	ca33b12e98	resetting wrap and continuation indent = 4	2013-07-30 17:51:44 -07:00
Reynold Xin	dae12fef9e	Updated the configuration option for Snappy block size to be consistent with the documentation.	2013-07-30 17:49:31 -07:00
Dmitriy Lyubimov	984b56155a	changing approaches for parallelize(): java serialization needs to avoid writing headers!	2013-07-30 17:36:59 -07:00
Reynold Xin	311aae76a2	Added Snappy dependency to Maven build files.	2013-07-30 17:25:42 -07:00
Reynold Xin	56774b176e	Added unit test for compression codecs.	2013-07-30 17:12:33 -07:00
Reynold Xin	ad7e9d0d64	CompressionCodec cleanup. Moved it to spark.io package.	2013-07-30 17:11:54 -07:00
Dmitriy Lyubimov	ef9529a943	refactoring using writeByteBuffer() from Utils.	2013-07-30 16:24:23 -07:00
Dmitriy Lyubimov	43394b9a6d	fixing formatting	2013-07-30 16:13:41 -07:00
Dmitriy Lyubimov	13a9d66645	adding ===	2013-07-30 16:10:55 -07:00
Reynold Xin	368c58eac5	Merge branch 'lazy_file_open' of github.com:lyogavin/spark into compression Conflicts: project/SparkBuild.scala	2013-07-30 16:04:18 -07:00
Patrick Wendell	e87de037d6	Merge pull request #744 from karenfeng/bootstrap-update Use Bootstrap progress bars in web UI	2013-07-30 15:00:08 -07:00
Karen Feng	26144c400f	Fixed wrap style	2013-07-30 12:40:41 -07:00
Karen Feng	218d7c4ed8	Fixed style, lowered height of progress bars	2013-07-30 12:39:17 -07:00
Karen Feng	f1cab31b73	Removed intermediate set for activeTasks, removed progress bar margin	2013-07-30 11:06:47 -07:00
Dmitriy Lyubimov	1bca91633e	+ bug fixes; test added Conflicts: core/src/test/scala/spark/KryoSerializerSuite.scala	2013-07-30 11:04:11 -07:00
Benjamin Hindman	f6f46455eb	Added property 'spark.executor.uri' for launching on Mesos without requiring Spark to be installed. Using 'make_distribution.sh' a user can put a Spark distribution at a URI supported by Mesos (e.g., 'hdfs://...') and then set that when launching their job. Also added SPARK_EXECUTOR_URI for the REPL.	2013-07-29 23:32:52 -07:00
Josh Rosen	49be084ed3	Use File.pathSeparator instead of hardcoding ':'.	2013-07-29 22:08:57 -07:00
Josh Rosen	b95732632b	Do not inherit master's PYTHONPATH on workers. This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may potentially break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell.	2013-07-29 22:08:57 -07:00
Andrew xia	5406013997	refactor codes less than 100 character per line	2013-07-30 11:41:38 +08:00
Andrew xia	614ee16cc4	refactor job ui with pool information	2013-07-30 10:57:26 +08:00
Dmitriy Lyubimov	8e5cd041bb	initial externalization of ParallelCollectionRDD's split	2013-07-29 19:02:53 -07:00
Reynold Xin	81720e13fc	Moved all StandaloneClusterMessage's into StandaloneClusterMessages object.	2013-07-29 17:53:01 -07:00
Reynold Xin	23b5da14ed	Moved block manager messages into BlockManagerMessages object.	2013-07-29 17:42:05 -07:00
Reynold Xin	105f4d22e9	Removed Cache and SoftReferenceCache since they are no longer used.	2013-07-29 17:30:38 -07:00
Reynold Xin	17e62113d4	Moved DeployMessage's into its own DeployMessages object. Also renamed MasterState to MasterStateResponse and WorkerState to WorkerStateResponse for clarity.	2013-07-29 17:14:44 -07:00
Karen Feng	87b821dc39	Fixed continuity of executorToTasksActive, changed color of progress bars	2013-07-29 16:50:51 -07:00
Karen Feng	c7b2788948	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update Conflicts: core/src/main/scala/spark/ui/jobs/IndexPage.scala	2013-07-29 16:36:07 -07:00
Patrick Wendell	c99b674405	Merge pull request #735 from karenfeng/ui-807 Totals for shuffle data and CPU time	2013-07-29 16:32:55 -07:00
Karen Feng	2d6da9195a	Alphabetized imports	2013-07-29 15:50:52 -07:00
Karen Feng	478a2886d9	Added started tasks to progress bar	2013-07-29 14:51:07 -07:00
Karen Feng	e04a37a332	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update	2013-07-29 14:32:48 -07:00
Reynold Xin	fe7298b587	Merge pull request #741 from pwendell/usability Fix two small usability issues	2013-07-29 14:01:00 -07:00
Karen Feng	43a2cc15c0	Use Bootstrap progress bars in web UI	2013-07-29 13:37:24 -07:00
Matei Zaharia	b9d6783f36	Optimize Python take() to not compute entire first partition	2013-07-29 02:51:43 -04:00
Dmitriy Lyubimov	f5067abe85	changes per comments.	2013-07-27 23:08:00 -07:00
Karen Feng	077f2dad22	Fixed outdated bugs	2013-07-27 16:39:36 -07:00
Patrick Wendell	bcafb36c1e	Slight wording change	2013-07-27 16:03:50 -07:00
Patrick Wendell	8177165ac4	Log executor on finish	2013-07-27 16:02:06 -07:00
Patrick Wendell	c2223e6801	Improve catch scope and logging for client stop() This does two things: 1. Catches the more general `TimeoutException`, since those can be thrown. 2. Logs at info level when a timeout is detected.	2013-07-27 16:02:06 -07:00
Karen Feng	5a93e3c58c	Cleaned up code based on pwendell's suggestions	2013-07-27 15:55:26 -07:00
Karen Feng	dcc4743a95	Moved val now to render	2013-07-27 12:52:53 -07:00
Karen Feng	1714693324	Current time called once with value now	2013-07-27 12:24:41 -07:00
Dmitriy Lyubimov	6a47cee721	style	2013-07-26 22:35:13 -07:00
Dmitriy Lyubimov	0c391feb73	Maximum task failures configurable	2013-07-26 22:34:43 -07:00
Dmitriy Lyubimov	23f3e0f117	mixing in SharedSparkContext for the kryo-collect test	2013-07-26 19:15:11 -07:00
Karen Feng	bd4cc52e30	Made metrics Option instead of Some, fixed NullPointerException	2013-07-26 17:23:18 -07:00
Reynold Xin	cb366774c8	Merge pull request #738 from harsha2010/pruning Fix bug in Partition Pruning.	2013-07-26 16:59:30 -07:00
harshars	392d7474fd	Code review	2013-07-26 15:23:15 -07:00
harshars	72cf7ec0e5	Indentation	2013-07-26 15:16:41 -07:00
harshars	822aac8f5a	Indentation	2013-07-26 15:10:32 -07:00
harshars	743fc4e7aa	Fix Bug in Partition Pruning, index of Pruned Partitions should inherit from parent	2013-07-26 14:35:17 -07:00
Karen Feng	3fbe9eaac0	Displays shuffle read/write only if exists, wraps if statements, trims old vals, grabs current time once	2013-07-26 11:51:38 -07:00
Karen Feng	22faeab261	Split Shuffle Activity overview column for read/write	2013-07-25 17:14:18 -07:00
Karen Feng	d4bbc8bd25	Shows totals for shuffle data and CPU time in Stage, homepage overviews including active time	2013-07-25 15:59:52 -07:00
Charles Reiss	a6de90c927	For standalone mode, get JAVA_HOME, SPARK_JAVA_OPTS, SPARK_LIBRARY_PATH from application env, not worker env	2013-07-25 12:42:30 -07:00
Matei Zaharia	8eb8b52997	Fix Chill version in Maven	2013-07-25 08:58:02 -07:00
Matei Zaharia	e2421c1311	Update Chill reference in pom.xml too	2013-07-25 00:05:43 -07:00
ryanlecompte	e56aa75de0	fix wrapping	2013-07-24 22:08:09 -07:00
ryanlecompte	fc4b025314	add test	2013-07-24 20:53:15 -07:00
ryanlecompte	a1c515fb02	add copyright back in	2013-07-24 20:50:32 -07:00
ryanlecompte	8e0939f5a9	refactor Kryo serializer support to use chill/chill-java	2013-07-24 20:43:57 -07:00
Karen Feng	57009eef90	Fixed consistency of "success" status string	2013-07-24 13:43:09 -07:00
Karen Feng	4280e1768d	Removed finished status for task info, changed name of success case	2013-07-24 12:48:48 -07:00
Karen Feng	bd3931c874	Changed ifs with returns to if/else	2013-07-24 11:27:17 -07:00
Karen Feng	93c6015f82	Shows task status and running tasks on Stage Page: fixes SPARK-804 and 811	2013-07-24 10:53:02 -07:00
jerryshao	31ec72b243	Code refactor according to comments	2013-07-24 14:57:47 +08:00
jerryshao	8d1ef7f2df	Code style changes	2013-07-24 14:57:47 +08:00
Andrew xia	05637de842	Change class xxxInstrumentation to class xxxSource	2013-07-24 14:57:47 +08:00
Andrew xia	ed1a3bc206	continue to refactor code style and functions	2013-07-24 14:57:47 +08:00
jerryshao	5730193e0c	Fix some typos	2013-07-24 14:57:47 +08:00
jerryshao	a79f6077f0	Add Maven metrics library dependency and code changes	2013-07-24 14:57:47 +08:00
jerryshao	1daff54b2e	Change Executor MetricsSystem initialize code to SparkEnv	2013-07-24 14:57:47 +08:00
Andrew xia	5f8802c1fb	Register and init metricsSystem in SparkContext Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala	2013-07-24 14:57:47 +08:00
Andrew xia	9cea0c2818	Refactor metricsSystem unit test, add resource files.	2013-07-24 14:57:47 +08:00
Andrew xia	7d2eada451	Add metrics source of DAGScheduler and blockManager Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala	2013-07-24 14:57:47 +08:00
jerryshao	e9ac88754d	Remove twice add Source bug and code clean	2013-07-24 14:57:47 +08:00
jerryshao	e080588f73	Add metrics system unit test	2013-07-24 14:57:47 +08:00
jerryshao	5ce5dc9fcd	Add default properties to deal with no configure file situation	2013-07-24 14:57:47 +08:00
jerryshao	871bc1687e	Add Executor instrumentation	2013-07-24 14:57:46 +08:00
jerryshao	7fb574bf66	Code clean and remarshal	2013-07-24 14:57:46 +08:00
Andrew xia	4d6dd67fa1	refactor metrics system 1.change source abstract class to support MetricRegistry 2.change master/work/jvm source class	2013-07-24 14:57:46 +08:00
jerryshao	03f9871116	MetricsSystem refactor	2013-07-24 14:57:46 +08:00
jerryshao	c3daad3f65	Update metric source support for instrumentation	2013-07-24 14:57:46 +08:00
jerryshao	9dec8c73e6	Add Master and Worker instrumentation support	2013-07-24 14:57:46 +08:00
jerryshao	503acd3a37	Build metrics system framework	2013-07-24 14:57:46 +08:00
Matei Zaharia	b011329040	Merge pull request #727 from rxin/scheduler Scheduler code style cleanup.	2013-07-23 22:50:09 -07:00
Matei Zaharia	876125b997	Merge pull request #726 from rxin/spark-826 SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure	2013-07-23 22:28:21 -07:00
Reynold Xin	3dae1df66f	Moved non-serializable closure catching exception from submitStage to submitMissingTasks	2013-07-23 20:29:07 -07:00
Reynold Xin	d33b8a2a0f	Added comments on task closure serialization.	2013-07-23 20:28:39 -07:00
Reynold Xin	85ab8114bc	Moved non-serializable closure catching exception from submitStage to submitMissingTasks	2013-07-23 20:25:58 -07:00
Matei Zaharia	6a31b7191d	Small bug fix	2013-07-23 16:20:24 -07:00

2446 commits