ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
jerryshao	f3dbe6b215	Fix removed block zero size log reporting	2013-08-30 09:39:01 +08:00
Patrick Wendell	abdbacf252	Merge pull request #871 from pwendell/expose-local Expose `isLocal` in SparkContext.	2013-08-28 21:11:31 -07:00
Patrick Wendell	30d2421112	Make local variable public	2013-08-28 19:53:31 -07:00
Matei Zaharia	baa84e7e4c	Merge pull request #865 from tgravescs/fixtmpdir Spark on Yarn should use yarn approved directories for spark.local.dir and tmp	2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves	aac1214ee4	Change Executor to only look at the env variable SPARK_YARN_MODE	2013-08-28 13:26:26 -05:00
Y.CORP.YAHOO.COM\tgraves	3f206bf0b5	Updated based on review comments.	2013-08-27 14:34:27 -05:00
Y.CORP.YAHOO.COM\tgraves	cf52a3cba6	Allow for Executors to have different directories than the Spark Master for Yarn	2013-08-27 11:00:21 -05:00
Reynold Xin	a77e0abb96	Added worker state to the cluster master JSON ui.	2013-08-26 11:21:03 -07:00
Reynold Xin	9db1e50344	Revert "Merge pull request #841 from rxin/json" This reverts commit `1fb1b09928`, reversing changes made to `c69c48947d`.	2013-08-26 11:05:14 -07:00
Matei Zaharia	c2d00f12e2	Merge pull request #832 from alig/coalesce Coalesced RDD with locality	2013-08-22 10:13:03 -07:00
Mark Hamstra	5eea613ec0	Removed meaningless types	2013-08-20 16:49:18 -07:00
Ali Ghodsi	f20ed14e87	Merged in from upstream to use TaskLocation instead of strings	2013-08-20 16:21:43 -07:00
Ali Ghodsi	5cd21c4195	added curly braces to make the code more consistent	2013-08-20 16:16:05 -07:00
Ali Ghodsi	db4bc55bef	indent	2013-08-20 16:16:05 -07:00
Ali Ghodsi	7b123b3126	Simpler code	2013-08-20 16:16:05 -07:00
Ali Ghodsi	9192c358e4	simpler code	2013-08-20 16:16:05 -07:00
Ali Ghodsi	a75a64eade	Fixed almost all of Matei's feedback	2013-08-20 16:16:05 -07:00
Ali Ghodsi	f1c853d76d	fixed Matei's comments	2013-08-20 16:16:04 -07:00
Ali Ghodsi	890ea6ba79	making CoalescedRDDPartition public	2013-08-20 16:16:04 -07:00
Ali Ghodsi	b69e7166ba	Coalescer now uses current preferred locations for derived RDDs. Made run() in DAGScheduler thread safe and added a method to be able to ask it for preferred locations. Added a similar method that wraps the former inside SparkContext.	2013-08-20 16:16:04 -07:00
Ali Ghodsi	abcefb3858	fixed matei's comments	2013-08-20 16:13:37 -07:00
Ali Ghodsi	35537e6341	Made a function object that returns the coalesced groups	2013-08-20 16:13:37 -07:00
Ali Ghodsi	339598c080	several of Reynold's suggestions implemented	2013-08-20 16:13:37 -07:00
Ali Ghodsi	02d6464f2f	space removed	2013-08-20 16:13:37 -07:00
Ali Ghodsi	4f99be1ffd	use count rather than foreach	2013-08-20 16:13:37 -07:00
Ali Ghodsi	f67753cdfc	made preferredLocation a val of the surrounding case class	2013-08-20 16:13:37 -07:00
Ali Ghodsi	f24861b60a	Fix bug in tests	2013-08-20 16:13:36 -07:00
Ali Ghodsi	f6e47e8b51	Renamed split to partition	2013-08-20 16:13:36 -07:00
Ali Ghodsi	937f72feb8	word wrap before 100 chars per line	2013-08-20 16:13:36 -07:00
Ali Ghodsi	c4d59910b1	added goals inline as comment	2013-08-20 16:13:36 -07:00
Ali Ghodsi	7a2a33e32d	Large scale load and locality tests for the coalesced partitions added	2013-08-20 16:13:36 -07:00
Ali Ghodsi	66edf854aa	Bug, should compute slack wrt parent partition size, not number of bins	2013-08-20 16:13:36 -07:00
Ali Ghodsi	1ede102ba5	load balancing coalescer	2013-08-20 16:13:36 -07:00
Matei Zaharia	aa2b89d98d	Merge remote-tracking branch 'jey/hadoop-agnostic' Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala	2013-08-20 10:14:15 -07:00
Mark Hamstra	1630fbf838	changeGeneration --> changeEpoch renaming	2013-08-20 00:17:16 -07:00
Mark Hamstra	ad18410427	Renamed 'priority' to 'jobId' and assorted minor changes	2013-08-20 00:07:04 -07:00
Matei Zaharia	8cae72e94e	Merge pull request #828 from mateiz/sched-improvements Scheduler fixes and improvements	2013-08-19 23:40:04 -07:00
Matei Zaharia	efeb142981	Merge pull request #849 from mateiz/web-fixes Small fixes to web UI	2013-08-19 19:23:50 -07:00
Matei Zaharia	abdc1f8bbb	Merge pull request #847 from rxin/rdd Allow subclasses of Product2 in all key-value related classes	2013-08-19 18:30:56 -07:00
Matei Zaharia	498a26189b	Small fixes to web UI: - Use SPARK_PUBLIC_DNS environment variable if set (for EC2) - Use a non-ephemeral port (3030 instead of 33000) by default - Updated test to use non-ephemeral port too	2013-08-19 18:17:49 -07:00
Reynold Xin	5054abd41b	Code review feedback. (added tests for cogroup and subtract; added more documentation on MutablePair)	2013-08-19 12:58:02 -07:00
Reynold Xin	71d705a66e	Made PairRDDFunctions taking only Tuple2, but made the rest of the shuffle code path working with general Product2.	2013-08-19 00:40:43 -07:00
Reynold Xin	2a7b99c08b	Added the missing RDD files and cleaned up SparkContext.	2013-08-18 20:39:29 -07:00
Reynold Xin	82bf4c0339	Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc).	2013-08-18 20:25:45 -07:00
Matei Zaharia	8ac3d1e263	Added unit tests for ClusterTaskSetManager, and fix a bug found with resetting locality level after a non-local launch	2013-08-18 19:51:07 -07:00
Matei Zaharia	4004cf775d	Added some comments on threading in scheduler code	2013-08-18 19:51:07 -07:00
Matei Zaharia	2a4ed10210	Address some review comments: - When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers - Simplify ClusterScheduler.prioritizeContainers - Add docs on the new configuration options	2013-08-18 19:51:07 -07:00
Matei Zaharia	222c897128	Comment cleanup (via Kay) and some debug messages	2013-08-18 19:51:07 -07:00
Matei Zaharia	cf39d45d14	More scheduling fixes: - Added periodic revival of offers in StandaloneSchedulerBackend - Replaced task scheduling aggression with multi-level delay scheduling in ClusterTaskSetManager - Fixed ZippedRDD preferred locations because they can't currently be process-local - Fixed some uses of hostPort	2013-08-18 19:51:07 -07:00
Matei Zaharia	90a04dab8d	Initial work towards scheduler refactoring: - Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort based data structures and just use executorID, since the hostPort vs host stuff is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos. - Replaced most hostPort-based data structures and fields as above. - Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise. - Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name. - Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This change didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later. - Renamed MapOutputTracker "generations" to "epochs".	2013-08-18 19:51:06 -07:00
Matei Zaharia	8fa0747978	Merge pull request #840 from AndreSchumacher/zipegg Implementing SPARK-878 for PySpark: adding zip and egg files to context ...	2013-08-18 17:02:54 -07:00
Reynold Xin	2c00ea3efc	Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.	2013-08-17 21:43:29 -07:00
Reynold Xin	0e84fee76b	Removed the mapSideCombine option in partitionBy.	2013-08-17 21:13:41 -07:00
Reynold Xin	10af952a3d	Removed the mapSideCombine option in CoGroupedRDD.	2013-08-17 21:07:34 -07:00
Reynold Xin	5d050a3e1f	Removed the unused shuffleId in ShuffleDependency's constructor.	2013-08-16 23:23:16 -07:00
Matei Zaharia	e89ffc7b3c	Merge pull request #839 from jegonzal/zip_partitions Currying RDD.zipPartitions	2013-08-16 14:02:34 -07:00
Joseph E. Gonzalez	53b2639a1e	Reversing the argument order in zipPartitions to enable stronger type inference.	2013-08-16 12:38:59 -07:00
Andre Schumacher	c7e348faec	Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path	2013-08-16 11:58:20 -07:00
Reynold Xin	c961c19b7b	Use the JSON formatter from the Scala library and removed dependency on lift-json. It made the JSON creation slightly more complicated, but removes one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).	2013-08-15 18:23:01 -07:00
Reynold Xin	eddbf43b54	Revert "Merge pull request #834 from Daemoen/master" This reverts commit `230ab2722e`, reversing changes made to `659553b21d`.	2013-08-15 17:49:37 -07:00
Reynold Xin	230ab2722e	Merge pull request #834 from Daemoen/master Updated json output to allow for display of worker state	2013-08-15 17:45:17 -07:00
Patrick Wendell	659553b21d	Merge pull request #836 from pwendell/rename Rename `memoryBytesToString` and `memoryMegabytesToString`	2013-08-15 16:56:31 -07:00
Jey Kottalam	a06a9d5c5f	Rename HadoopWriter to SparkHadoopWriter since it's outside of our package	2013-08-15 16:50:37 -07:00
Jey Kottalam	8f979edef5	Fix newTaskAttemptID to work under YARN	2013-08-15 16:50:37 -07:00
Jey Kottalam	e2d7656ca3	re-enable YARN support	2013-08-15 16:50:37 -07:00
Jey Kottalam	bd0bab47c9	SparkEnv isn't available this early, and not needed anyway	2013-08-15 16:50:37 -07:00
Jey Kottalam	4f43fd791a	make SparkHadoopUtil a member of SparkEnv	2013-08-15 16:50:37 -07:00
Jey Kottalam	43ebcb8484	rename HadoopMapRedUtil => SparkHadoopMapRedUtil, HadoopMapReduceUtil => SparkHadoopMapReduceUtil	2013-08-15 16:50:37 -07:00
Jey Kottalam	8b1c1520fc	add comment	2013-08-15 16:50:37 -07:00
Jey Kottalam	69c3bbf688	dynamically detect hadoop version	2013-08-15 16:50:37 -07:00
Jey Kottalam	f67b94ad4f	remove core/src/hadoop{1,2} dirs	2013-08-15 16:50:36 -07:00
Patrick Wendell	4c6ade1ad5	Rename `memoryBytesToString` and `memoryMegabytesToString` These are used all over the place now and they are not specific to memory at all. memoryBytesToString --> bytesToString memoryMegabytesToString --> megabytesToString	2013-08-15 15:58:07 -07:00
Reynold Xin	1a51deae8a	More minor UI changes including code review feedback.	2013-08-15 14:34:07 -07:00
Daemoen	ad2e8b5126	Updated json output to allow for display of worker state Ops teams need to ensure that the cluster is functional and performant. Having to scrape the html source for worker state won't work reliably, and will be slow. By exposing the state in the json output, ops teams are able to ensure a fully functional environment by querying for the json output and parsing for dead nodes.	2013-08-15 12:19:14 -07:00
Reynold Xin	2d2a556bdf	Various UI improvements.	2013-08-14 23:23:09 -07:00
Reynold Xin	290e3e6e65	Renamed setCurrentJobDescription to setJobDescription.	2013-08-14 18:40:53 -07:00
Reynold Xin	3886b54933	A few small scheduler / job description changes. 1. Renamed SparkContext.addLocalProperty to setLocalProperty. And allow this function to unset a property. 2. Renamed SparkContext.setDescription to setCurrentJobDescription. 3. Throw an exception if the fair scheduler allocation file is invalid.	2013-08-14 17:19:42 -07:00
Matei Zaharia	839f2d4f3f	Merge pull request #822 from pwendell/ui-features Adding GC Stats to TaskMetrics (and three small fixes)	2013-08-14 16:17:23 -07:00
Patrick Wendell	04ad78b09d	Style cleanup based on Matei feedback	2013-08-14 14:57:21 -07:00
Kay Ousterhout	a88aa5e6ed	Fixed 2 bugs in executor UI. 1) UI crashed if the executor UI was loaded before any tasks started. 2) The total tasks was incorrectly reported due to using string (rather than int) arithmetic.	2013-08-13 23:44:58 -07:00
Patrick Wendell	c223176388	Small style clean-up	2013-08-13 16:56:37 -07:00
Patrick Wendell	fab5cee111	Correcting terminology in RDD page	2013-08-13 16:25:55 -07:00
Patrick Wendell	024e5c5ce1	Correct sorting order for stages	2013-08-13 16:25:55 -07:00
Patrick Wendell	4e9f0c2df6	Capturing GC details in TaskMetrics	2013-08-13 16:25:55 -07:00
Patrick Wendell	f0382007dc	Bug fix for display of shuffle read/write metrics. This fixes an error where empty cells are missing if a given task has no shuffle read/write.	2013-08-13 16:25:55 -07:00
Matei Zaharia	d316af9c84	Merge pull request #821 from pwendell/print-launch-command Print run command to stderr rather than stdout	2013-08-13 15:31:01 -07:00
Patrick Wendell	a7feb69ae8	Print run command to stderr rather than stdout	2013-08-13 15:07:03 -07:00
Kay Ousterhout	1beb843a6f	Reuse the set of failed states rather than creating a new object each time	2013-08-13 14:27:40 -07:00
Kay Ousterhout	c92dd627ca	Properly account for killed tasks. The TaskState class's isFinished() method didn't return true for KILLED tasks, which means some resources are never reclaimed for tasks that are killed. This also made it inconsistent with the isFinished() method used by CoarseMesosSchedulerBackend.	2013-08-13 12:40:15 -07:00
Patrick Wendell	ed6a1646e6	Slight change to pr-784	2013-08-13 09:29:40 -07:00
Patrick Wendell	a0133bfbad	Merge pull request #784 from jerryshao/dev-metrics-servlet Add MetricsServlet for Spark metrics system	2013-08-13 09:28:18 -07:00
Matei Zaharia	65d0d91fba	Merge pull request #807 from JoshRosen/guava-optional Change scala.Option to Guava Optional in Java APIs	2013-08-12 19:00:57 -07:00
Josh Rosen	cf08bb7a3e	Fix import organization.	2013-08-12 18:55:02 -07:00
jerryshao	09c7179e81	MetricsServlet code refactor according to comments	2013-08-12 13:23:23 +08:00
jerryshao	320e87e7ab	Add MetricsServlet for Spark metrics system	2013-08-12 13:23:23 +08:00
Reynold Xin	e5b9ed2833	Merge pull request #808 from pwendell/ui_compressed_bytes Report compressed bytes read when calculating TaskMetrics	2013-08-11 17:22:47 -07:00
Patrick Wendell	3d8f281604	Report compressed bytes read when calculating TaskMetrics	2013-08-11 16:25:57 -07:00
Matei Zaharia	379648630b	Merge pull request #805 from woggle/hadoop-rdd-jobconf Use new Configuration() instead of slower new JobConf() in SerializableWritable	2013-08-11 14:51:47 -07:00
Josh Rosen	d7f78b443b	Change scala.Option to Guava Optional in Java APIs.	2013-08-11 12:05:09 -07:00
Charles Reiss	6402b539d0	Use new Configuration() instead of new JobConf() for ObjectWritable. JobConf's constructor loads default config files in some versions of Hadoop, which is quite slow, and we only need the Configuration object to pass the correct ClassLoader.	2013-08-10 21:31:05 -07:00
Matei Zaharia	71c63de22f	Merge pull request #795 from mridulm/master Fix bug reported in PR 791 : a race condition in ConnectionManager and Connection	2013-08-10 10:21:20 -07:00
Matei Zaharia	d3277a0daf	Merge remote-tracking branch 'origin/pr/792' Conflicts: core/src/main/scala/spark/ui/jobs/IndexPage.scala core/src/main/scala/spark/ui/jobs/StagePage.scala	2013-08-10 10:18:50 -07:00
Patrick Wendell	d17eeb997d	Merge pull request #785 from anfeng/master expose HDFS file system stats via Executor metrics	2013-08-10 09:02:27 -07:00
Kay Ousterhout	14d14f451a	Shortened names, as per Matei's suggestion	2013-08-10 07:50:27 -07:00
Matei Zaharia	cd247ba5bb	Merge pull request #786 from shivaram/mllib-java Java fixes, tests and examples for ALS, KMeans	2013-08-09 20:41:13 -07:00
Kay Ousterhout	7810a76512	Only print event queue full error message once	2013-08-09 18:20:48 -07:00
Kay Ousterhout	44ca8629d8	Style fix: removing unnecessary return type	2013-08-09 17:22:50 -07:00
Kay Ousterhout	29b79714f9	Style fixes based on code review	2013-08-09 16:46:34 -07:00
Kay Ousterhout	81e1d4a7d1	Refactored SparkListener to process all events asynchronously. This commit fixes issues where SparkListeners that take a while to process events slow the DAGScheduler. This commit also fixes a bug in the UI where if a user goes to a web page of a stage that does not exist, they can create a memory leak (granted, this is not an issue at small scale -- probably only an issue if someone actively tried to DOS the UI).	2013-08-09 13:27:41 -07:00
Matei Zaharia	b09d4b79e8	Merge pull request #799 from woggle/sync-fix Remove extra synchronization in ResultTask	2013-08-09 13:17:08 -07:00
Patrick Wendell	cc6b92e80e	Merge pull request #775 from pwendell/print-launch-command Log the launch command for Spark daemons	2013-08-09 13:00:33 -07:00
Patrick Wendell	3970b580c2	Using quotes when printing out command	2013-08-09 11:53:32 -07:00
Charles Reiss	9dfc280f74	Remove extra synchronization in ResultTask	2013-08-09 11:09:02 -07:00
Matei Zaharia	f94fc75c3f	Merge pull request #788 from shane-huang/sparkjavaopts For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as ...	2013-08-09 10:04:03 -07:00
Mridul Muralidharan	c230ca3b4e	Change line size	2013-08-08 22:28:40 +05:30
Mridul Muralidharan	dc47084f4e	Attempt to fix bug reported in PR 791 : a race condition in ConnectionManager and Connection	2013-08-08 22:19:27 +05:30
Kay Ousterhout	88049a214d	Fixed 3 bugs that caused UI to crash (including SPARK-810). One bug caused the UI to crash if you try to look at a job's status before any of the tasks have finished. The second bug was a concurrency issue where two different threads (the scheduling thread and a UI thread) could be reading/updating the data structures in JobProgressListener concurrently. The third bug mis-used an Option, also causing the UI to crash under certain conditions.	2013-08-07 23:09:25 -07:00
Patrick Wendell	b4321edf68	Reverting bootstrap change	2013-08-07 22:18:18 -07:00
Patrick Wendell	21392f2a73	Change I forgot to merge in	2013-08-07 21:45:32 -07:00
Patrick Wendell	706394b370	Bumping font size to 14px and fixing style issue in progress bars	2013-08-07 21:27:04 -07:00
Patrick Wendell	8c0d668468	Merge branch 'master' into bootstrap-design Conflicts: core/src/main/scala/spark/ui/UIUtils.scala core/src/main/scala/spark/ui/jobs/IndexPage.scala core/src/main/scala/spark/ui/storage/RDDPage.scala	2013-08-07 21:06:03 -07:00
Kay Ousterhout	b88e26248e	Fixed issue in UI that limited scheduler throughput. Removal of items from ArrayBuffers in the UI code was slow and significantly impacted scheduler throughput. This commit improves scheduler throughput by 5x.	2013-08-07 14:42:05 -07:00
shane-huang	cbc5107e36	For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as default and let application env override default options if applicable Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-08-07 14:36:48 +08:00
Matei Zaharia	6b043a6f11	Merge pull request #724 from dlyubimov/SPARK-826 SPARK-826: fold(), reduce(), collect() always attempt to use java serialization	2013-08-06 22:31:02 -07:00
Matei Zaharia	7c4b7a53b1	Merge remote-tracking branch 'origin/pr/781' Conflicts: core/src/main/resources/spark/ui/static/webui.css	2013-08-06 17:19:49 -07:00
Karen Feng	908032e79b	Used saturated colors for progress bars	2013-08-06 16:52:21 -07:00
Karen Feng	8bc497fa10	Lightened color of progress bars	2013-08-06 16:33:05 -07:00
Karen Feng	ca1903ea63	Overlays progress text on top of bar	2013-08-06 15:45:42 -07:00
Matei Zaharia	df4d10d630	Merge pull request #779 from adatao/adatao-global-SparkEnv [HOTFIX] Extend thread safety for SparkEnv.get()	2013-08-06 15:44:05 -07:00
Shivaram Venkataraman	471fbadd0c	Java examples, tests for KMeans and ALS - Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it easier to call from Java - Renames class methods from `train` to `run` to enable static methods to be called from Java. - Add unit tests which check if both static / class methods can be called. - Also add examples which port the main() function in ALS, KMeans to the examples project. Couple of minor changes to existing code: - Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily - Workaround a bug where using double[] from Java leads to class cast exception in KMeans init	2013-08-06 15:43:46 -07:00
anfeng	dda2ac8b5d	reformat registerFileSystemStat()	2013-08-06 15:22:25 -07:00
Karen Feng	099528b6c4	Pre-sorts stage/env tables, changes text/link of stage summaries	2013-08-06 14:52:12 -07:00
Karen Feng	254a930730	Reverse sorts StageTable by submitted time	2013-08-06 14:18:38 -07:00
Karen Feng	5ed5b73026	Sorts first column of env tables	2013-08-06 13:59:53 -07:00
anfeng	0748c60817	expose HDFS file system stats via Executor metrics	2013-08-06 11:47:06 -07:00
Reynold Xin	d031f73679	Merge pull request #782 from WANdisco/master SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD	2013-08-05 22:33:00 -07:00
Matei Zaharia	1b63dea816	Merge pull request #769 from markhamstra/NegativeCores SPARK-847 + SPARK-845: Zombie workers and negative cores	2013-08-05 22:21:26 -07:00
Alexander Pivovarov	a30866438b	SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD	2013-08-05 21:48:43 -07:00
Matei Zaharia	8b277892c9	Merge pull request #774 from pwendell/job-description Show user-defined job name in UI	2013-08-05 19:14:52 -07:00
Christopher Nguyen	b1bbbe699c	[HOTFIX] Mark lastSetSparkEnv @volatile in case it gets HotSpot-cached On branch adatao-global-SparkEnv Changes to be committed: modified: core/src/main/scala/spark/SparkEnv.scala	2013-08-05 17:22:27 -07:00
Mark Hamstra	35d8f5ee52	Moved handling of timed out workers within the Master actor	2013-08-05 13:13:56 -07:00
Mark Hamstra	37ccf9301a	milliseconds -> seconds in timeOutDeadWorkers logging	2013-08-05 13:13:56 -07:00
Mark Hamstra	cdd1af562e	Timeout zombie workers	2013-08-05 13:13:56 -07:00
Mikhail Bautin	e8bec8365f	Only reduce the number of cores once when removing an executor	2013-08-05 13:13:56 -07:00
Karen Feng	95025afdec	Made most small fixes for SPARK-849 except for table sort, task progress overlay	2013-08-05 13:04:56 -07:00
Bill Zhao	87134b3648	SPARK-850: give better console message	2013-08-05 11:55:35 -07:00
Christopher Nguyen	39e4fda76f	[HOTFIX] Extend thread safety for SparkEnv.get() A ThreadLocal SparkEnv.env is facing various situations leading to NullPointerExceptions, where SparkEnv.env set in one thread is not gettable in another thread, but often assumed to be available. See, e.g., https://groups.google.com/forum/#!topic/spark-developers/GLx8yunSj0A This hotfixes SparkEnv.env to return either (a) the ThreadLocal value if non-null, or (b) the previously set value in any thread. This approach preserves SparkEnv.set() thread safety needed by RDD.compute() and possibly other places. A refactoring that parameterizes SparkEnv should be addressed subsequently. On branch adatao-global-SparkEnv Changes to be committed: modified: core/src/main/scala/spark/SparkEnv.scala	2013-08-05 02:09:54 -07:00
Patrick Wendell	f3660d5ab8	Make output formatting consistent between bash/scala	2013-08-03 21:30:15 -07:00
Patrick Wendell	ad94fbb322	Log the launch command for Spark executors	2013-08-03 09:19:46 -07:00
Matei Zaharia	22abbc10d6	Merge pull request #772 from karenfeng/ui-843 Show app duration	2013-08-02 16:37:59 -07:00
Patrick Wendell	5b3784a79c	Show user-defined job name in UI	2013-08-02 15:47:41 -07:00
Karen Feng	b3ae5b25d5	Shows time the app has been running	2013-08-02 13:25:14 -07:00
Patrick Wendell	9d7dfd2d5a	Merge pull request #743 from pwendell/app-metrics Add application metrics to standalone master	2013-08-01 17:41:58 -07:00
Patrick Wendell	f1d2ad550e	under_scores --> camelCase for config options	2013-08-01 15:26:26 -07:00
Patrick Wendell	12d9c82c9b	Small style fix	2013-08-01 15:25:52 -07:00
Patrick Wendell	37bc64a205	Adding application-level metrics. This adds metrics for applications in the deploy Master.	2013-08-01 15:25:52 -07:00
Karen Feng	73692f3cb9	Unify, reduce body font size	2013-08-01 15:10:30 -07:00
Patrick Wendell	87fd321a5a	Minor refactoring and code cleanup	2013-08-01 15:02:31 -07:00
Patrick Wendell	b10199413a	Slight refactoring to SparkContext functions	2013-08-01 15:00:42 -07:00
Patrick Wendell	cfcd77b5da	Increasing inter job arrival	2013-08-01 15:00:42 -07:00
Patrick Wendell	5faac7f4f3	Minor style fixes	2013-08-01 15:00:42 -07:00
Patrick Wendell	5e7b38fbb3	Merge pull request #695 from xiajunluan/pool_ui Enhance job ui in spark ui system with adding pool information	2013-08-01 14:59:33 -07:00
Karen Feng	47600e9579	Removed hr margin	2013-08-01 14:57:04 -07:00
Karen Feng	e648a62fc8	Inserted needed line break for log paging	2013-08-01 14:46:19 -07:00
Karen Feng	686d6266c4	Use nav pills instead of default	2013-08-01 14:41:49 -07:00
Karen Feng	86d372d17f	Removed line breaks	2013-08-01 14:37:21 -07:00
Karen Feng	99803d88b9	Reduced all header sizes	2013-08-01 14:18:33 -07:00
Karen Feng	d216d687ef	Reduced size of table text to compact	2013-08-01 13:27:23 -07:00
Karen Feng	5dae283996	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update	2013-08-01 11:28:28 -07:00
Matei Zaharia	0a96493ac6	Merge pull request #760 from karenfeng/heading-update Clean up web UI page headers	2013-08-01 11:27:17 -07:00
Patrick Wendell	9177bea2b4	Removing extra imports	2013-08-01 10:42:50 -07:00
Patrick Wendell	3e4d5e5f8b	Merge branch 'master' into master-json Conflicts: core/src/main/scala/spark/deploy/master/ui/IndexPage.scala	2013-08-01 10:42:07 -07:00
Patrick Wendell	ffc034e4fb	Import cleanup	2013-08-01 10:39:56 -07:00
Andrew xia	d58502a156	Fix bug in Spark "SubmitStage" listener that caused a unit test error	2013-08-01 23:21:41 +08:00
Andrew xia	3b5a11e765	change function name "setName" to "setProperties" as "setName" is also member of Thread class	2013-08-01 19:37:15 +08:00
Dmitriy Lyubimov	cb6be5bd7e	Merge remote-tracking branch 'mesos/master' into SPARK-826 Conflicts: core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala core/src/test/scala/spark/KryoSerializerSuite.scala	2013-07-31 22:09:22 -07:00
Dmitriy Lyubimov	28f1550f01	More elegant rewrite of the same.	2013-07-31 21:41:00 -07:00
Dmitriy Lyubimov	7c52ecc6a4	(1) added reduce test case. (2) added nested streaming in ParallelCollectionRDD (3) added kryo with fold test which still doesn't work	2013-07-31 19:27:30 -07:00
Matei Zaharia	3097d75d6f	Merge remote-tracking branch 'dlyubimov/SPARK-827' Conflicts: docs/configuration.md	2013-07-31 18:36:43 -07:00
Karen Feng	7c9c5ef6c6	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update	2013-07-31 16:39:26 -07:00
Karen Feng	02cde8efdf	Replaces theme with Bootswatch Spacelab theme	2013-07-31 16:34:07 -07:00
Karen Feng	09cd67bf98	Changed bootstrap colors, fixed logpaging buttons	2013-07-31 16:18:53 -07:00
Matei Zaharia	39c75f3033	Merge pull request #757 from BlackNiuza/result_task_generation Bug fix: SPARK-837	2013-07-31 15:52:36 -07:00
Matei Zaharia	14bf2fe039	Merge pull request #749 from benh/spark-executor-uri Added property 'spark.executor.uri' for launching on Mesos.	2013-07-31 14:18:16 -07:00
Benjamin Hindman	4692ea4892	Used 'uri.split('/').last' instead of 'new File(uri).getName()'.	2013-07-31 12:29:44 -07:00
Karen Feng	c453967f9a	Reduced size of heading	2013-07-31 11:57:50 -07:00
Matei Zaharia	a386ced2c6	Merge pull request #754 from rxin/compression Compression codec change	2013-07-31 11:22:50 -07:00
Karen Feng	49e6344142	Removed master URL from job UI, reduced heading size of basic spark pages	2013-07-31 11:17:59 -07:00
Reynold Xin	c61843a69f	Changed other LZF uses to use the compression codec interface.	2013-07-31 10:32:13 -07:00
Patrick Wendell	89da9d94b3	Add JSON path to master index page	2013-07-31 09:47:53 -07:00
BlackNiuza	9a815de4bf	write and read generation in ResultTask	2013-08-01 00:36:47 +08:00
Roman Tkalenko	0c6553714a	Refactored Vector.apply(length, initializer) replacing excessive code with library method (also removed unused variable `ans` as minor change)	2013-07-31 19:05:46 +03:00
Matei Zaharia	12553e5c55	Simplified nonNegativeMod to match previous version	2013-07-31 08:50:28 -07:00
Matei Zaharia	d4556f4207	Merge pull request #751 from cdshines/master Cleaned Partitioner & PythonPartitioner source by taking out non-related logic to Utils	2013-07-31 08:48:14 -07:00
Andrew xia	5670c96f29	Merge branch 'master' into Pool_UI Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/scheduler/SparkListener.scala core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala core/src/main/scala/spark/ui/jobs/IndexPage.scala core/src/main/scala/spark/ui/jobs/JobProgressUI.scala	2013-07-31 19:36:36 +08:00
cdshines	fefb03cbd7	Eliminated code duplication, refactored to pattern-matching style Partitioner and PythonPartitioner	2013-07-31 13:19:42 +03:00
Dmitriy Lyubimov	96664431cb	IDEA flipped JavaSerialized import at some point to a wrong class.	2013-07-30 23:10:09 -07:00
Dmitriy Lyubimov	c219fc94fd	Minor, style	2013-07-30 22:08:39 -07:00
Dmitriy Lyubimov	f4b4b8836e	reverting back to one-by-one serialization for parallelize()	2013-07-30 19:00:58 -07:00
jerryshao	bf9318091a	Add Apache license header to metrics system	2013-07-31 09:42:16 +08:00
Reynold Xin	98024eadc3	Renamed compressionOutputStream and compressionInputStream to compressedOutputStream and compressedInputStream.	2013-07-30 18:28:46 -07:00
Dmitriy Lyubimov	abada94ebf	removing default constructor (not Externalizable any more)	2013-07-30 18:04:02 -07:00
Dmitriy Lyubimov	943c6590c9	realigning "extends" back manually	2013-07-30 18:01:35 -07:00
Dmitriy Lyubimov	ca33b12e98	resetting wrap and continuation indent = 4	2013-07-30 17:51:44 -07:00
Reynold Xin	dae12fef9e	Updated the configuration option for Snappy block size to be consistent with the documentation.	2013-07-30 17:49:31 -07:00
Dmitriy Lyubimov	984b56155a	changing approaches for parallelize(): java serialization needs to avoid writing headers!	2013-07-30 17:36:59 -07:00
Reynold Xin	ad7e9d0d64	CompressionCodec cleanup. Moved it to spark.io package.	2013-07-30 17:11:54 -07:00
Dmitriy Lyubimov	ef9529a943	refactoring using writeByteBuffer() from Utils.	2013-07-30 16:24:23 -07:00
Dmitriy Lyubimov	43394b9a6d	fixing formatting	2013-07-30 16:13:41 -07:00
Reynold Xin	368c58eac5	Merge branch 'lazy_file_open' of github.com:lyogavin/spark into compression Conflicts: project/SparkBuild.scala	2013-07-30 16:04:18 -07:00
Patrick Wendell	e87de037d6	Merge pull request #744 from karenfeng/bootstrap-update Use Bootstrap progress bars in web UI	2013-07-30 15:00:08 -07:00
Karen Feng	26144c400f	Fixed wrap style	2013-07-30 12:40:41 -07:00
Karen Feng	218d7c4ed8	Fixed style, lowered height of progress bars	2013-07-30 12:39:17 -07:00
Karen Feng	f1cab31b73	Removed intermediate set for activeTasks, removed progress bar margin	2013-07-30 11:06:47 -07:00
Dmitriy Lyubimov	1bca91633e	+ bug fixes; test added Conflicts: core/src/test/scala/spark/KryoSerializerSuite.scala	2013-07-30 11:04:11 -07:00
Benjamin Hindman	f6f46455eb	Added property 'spark.executor.uri' for launching on Mesos without requiring Spark to be installed. Using 'make_distribution.sh' a user can put a Spark distribution at a URI supported by Mesos (e.g., 'hdfs://...') and then set that when launching their job. Also added SPARK_EXECUTOR_URI for the REPL.	2013-07-29 23:32:52 -07:00
Josh Rosen	49be084ed3	Use File.pathSeparator instead of hardcoding ':'.	2013-07-29 22:08:57 -07:00
Josh Rosen	b95732632b	Do not inherit master's PYTHONPATH on workers. This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may potentially break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell.	2013-07-29 22:08:57 -07:00
Andrew xia	5406013997	refactor codes less than 100 character per line	2013-07-30 11:41:38 +08:00
Andrew xia	614ee16cc4	refactor job ui with pool information	2013-07-30 10:57:26 +08:00
Dmitriy Lyubimov	8e5cd041bb	initial externalization of ParallelCollectionRDD's split	2013-07-29 19:02:53 -07:00
Reynold Xin	81720e13fc	Moved all StandaloneClusterMessage's into StandaloneClusterMessages object.	2013-07-29 17:53:01 -07:00
Reynold Xin	23b5da14ed	Moved block manager messages into BlockManagerMessages object.	2013-07-29 17:42:05 -07:00
Reynold Xin	105f4d22e9	Removed Cache and SoftReferenceCache since they are no longer used.	2013-07-29 17:30:38 -07:00
Reynold Xin	17e62113d4	Moved DeployMessage's into its own DeployMessages object. Also renamed MasterState to MasterStateResponse and WorkerState to WorkerStateResponse for clarity.	2013-07-29 17:14:44 -07:00
Karen Feng	87b821dc39	Fixed continuity of executorToTasksActive, changed color of progress bars	2013-07-29 16:50:51 -07:00
Karen Feng	c7b2788948	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update Conflicts: core/src/main/scala/spark/ui/jobs/IndexPage.scala	2013-07-29 16:36:07 -07:00
Patrick Wendell	c99b674405	Merge pull request #735 from karenfeng/ui-807 Totals for shuffle data and CPU time	2013-07-29 16:32:55 -07:00
Karen Feng	2d6da9195a	Alphabetized imports	2013-07-29 15:50:52 -07:00
Karen Feng	478a2886d9	Added started tasks to progress bar	2013-07-29 14:51:07 -07:00
Karen Feng	e04a37a332	Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update	2013-07-29 14:32:48 -07:00
Reynold Xin	fe7298b587	Merge pull request #741 from pwendell/usability Fix two small usability issues	2013-07-29 14:01:00 -07:00
Karen Feng	43a2cc15c0	Use Bootstrap progress bars in web UI	2013-07-29 13:37:24 -07:00
Matei Zaharia	b9d6783f36	Optimize Python take() to not compute entire first partition	2013-07-29 02:51:43 -04:00
Dmitriy Lyubimov	f5067abe85	changes per comments.	2013-07-27 23:08:00 -07:00
Karen Feng	077f2dad22	Fixed outdated bugs	2013-07-27 16:39:36 -07:00
Patrick Wendell	bcafb36c1e	Slight wording change	2013-07-27 16:03:50 -07:00
Patrick Wendell	8177165ac4	Log executor on finish	2013-07-27 16:02:06 -07:00
Patrick Wendell	c2223e6801	Improve catch scope and logging for client stop() This does two things: 1. Catches the more general `TimeoutException`, since those can be thrown. 2. Logs at info level when a timeout is detected.	2013-07-27 16:02:06 -07:00
Karen Feng	5a93e3c58c	Cleaned up code based on pwendell's suggestions	2013-07-27 15:55:26 -07:00
Karen Feng	dcc4743a95	Moved val now to render	2013-07-27 12:52:53 -07:00
Karen Feng	1714693324	Current time called once with value now	2013-07-27 12:24:41 -07:00
Dmitriy Lyubimov	6a47cee721	style	2013-07-26 22:35:13 -07:00
Dmitriy Lyubimov	0c391feb73	Maximum task failures configurable	2013-07-26 22:34:43 -07:00
Karen Feng	bd4cc52e30	Made metrics Option instead of Some, fixed NullPointerException	2013-07-26 17:23:18 -07:00
Reynold Xin	cb366774c8	Merge pull request #738 from harsha2010/pruning Fix bug in Partition Pruning.	2013-07-26 16:59:30 -07:00
harshars	392d7474fd	Code review	2013-07-26 15:23:15 -07:00
harshars	72cf7ec0e5	Indentation	2013-07-26 15:16:41 -07:00
harshars	822aac8f5a	Indentation	2013-07-26 15:10:32 -07:00
harshars	743fc4e7aa	Fix Bug in Partition Pruning, index of Pruned Partitions should inherit from parent	2013-07-26 14:35:17 -07:00
Karen Feng	3fbe9eaac0	Displays shuffle read/write only if exists, wraps if statements, trims old vals, grabs current time once	2013-07-26 11:51:38 -07:00
Karen Feng	22faeab261	Split Shuffle Activity overview column for read/write	2013-07-25 17:14:18 -07:00
Karen Feng	d4bbc8bd25	Shows totals for shuffle data and CPU time in Stage, homepage overviews including active time	2013-07-25 15:59:52 -07:00
Charles Reiss	a6de90c927	For standalone mode, get JAVA_HOME, SPARK_JAVA_OPTS, SPARK_LIBRARY_PATH from application env, not worker env	2013-07-25 12:42:30 -07:00
ryanlecompte	e56aa75de0	fix wrapping	2013-07-24 22:08:09 -07:00
ryanlecompte	8e0939f5a9	refactor Kryo serializer support to use chill/chill-java	2013-07-24 20:43:57 -07:00
Karen Feng	57009eef90	Fixed consistency of "success" status string	2013-07-24 13:43:09 -07:00
Karen Feng	4280e1768d	Removed finished status for task info, changed name of success case	2013-07-24 12:48:48 -07:00
Karen Feng	bd3931c874	Changed ifs with returns to if/else	2013-07-24 11:27:17 -07:00
Karen Feng	93c6015f82	Shows task status and running tasks on Stage Page: fixes SPARK-804 and 811	2013-07-24 10:53:02 -07:00
jerryshao	31ec72b243	Code refactor according to comments	2013-07-24 14:57:47 +08:00
jerryshao	8d1ef7f2df	Code style changes	2013-07-24 14:57:47 +08:00
Andrew xia	05637de842	Change class xxxInstrumentation to class xxxSource	2013-07-24 14:57:47 +08:00
Andrew xia	ed1a3bc206	continue to refactor code style and functions	2013-07-24 14:57:47 +08:00
jerryshao	5730193e0c	Fix some typos	2013-07-24 14:57:47 +08:00
jerryshao	a79f6077f0	Add Maven metrics library dependency and code changes	2013-07-24 14:57:47 +08:00
jerryshao	1daff54b2e	Change Executor MetricsSystem initialize code to SparkEnv	2013-07-24 14:57:47 +08:00
Andrew xia	5f8802c1fb	Register and init metricsSystem in SparkContext Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala	2013-07-24 14:57:47 +08:00
Andrew xia	7d2eada451	Add metrics source of DAGScheduler and blockManager Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala	2013-07-24 14:57:47 +08:00
jerryshao	e9ac88754d	Remove twice add Source bug and code clean	2013-07-24 14:57:47 +08:00
jerryshao	5ce5dc9fcd	Add default properties to deal with no configure file situation	2013-07-24 14:57:47 +08:00
jerryshao	871bc1687e	Add Executor instrumentation	2013-07-24 14:57:46 +08:00
jerryshao	7fb574bf66	Code clean and remarshal	2013-07-24 14:57:46 +08:00
Andrew xia	4d6dd67fa1	refactor metrics system 1.change source abstract class to support MetricRegistry 2.change master/work/jvm source class	2013-07-24 14:57:46 +08:00
jerryshao	03f9871116	MetricsSystem refactor	2013-07-24 14:57:46 +08:00
jerryshao	c3daad3f65	Update metric source support for instrumentation	2013-07-24 14:57:46 +08:00
jerryshao	9dec8c73e6	Add Master and Worker instrumentation support	2013-07-24 14:57:46 +08:00
jerryshao	503acd3a37	Build metrics system framework	2013-07-24 14:57:46 +08:00
Matei Zaharia	b011329040	Merge pull request #727 from rxin/scheduler Scheduler code style cleanup.	2013-07-23 22:50:09 -07:00
Matei Zaharia	876125b997	Merge pull request #726 from rxin/spark-826 SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure	2013-07-23 22:28:21 -07:00
Reynold Xin	3dae1df66f	Moved non-serializable closure catching exception from submitStage to submitMissingTasks	2013-07-23 20:29:07 -07:00
Reynold Xin	d33b8a2a0f	Added comments on task closure serialization.	2013-07-23 20:28:39 -07:00
Reynold Xin	85ab8114bc	Moved non-serializable closure catching exception from submitStage to submitMissingTasks	2013-07-23 20:25:58 -07:00
Matei Zaharia	6a31b7191d	Small bug fix	2013-07-23 16:20:24 -07:00
Matei Zaharia	2f1736c396	Merge pull request #725 from karenfeng/task-start Creates task start events	2013-07-23 15:53:30 -07:00
Karen Feng	abc78cd331	Modifies instead of copies HashSets, fixes comment style	2013-07-23 15:47:16 -07:00
Karen Feng	383684daaa	Replaces Seq with HashSet, removes redundant import	2013-07-23 15:33:27 -07:00
Reynold Xin	f2422d4f29	SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure.	2013-07-23 15:30:20 -07:00
Reynold Xin	5ed38b4d1d	Scheduler code style cleanup.	2013-07-23 15:28:59 -07:00
Reynold Xin	101b8cc78a	SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure.	2013-07-23 15:28:20 -07:00
Karen Feng	9f2dbb2a7c	Adds/removes active tasks only once	2013-07-23 15:10:09 -07:00
Dmitriy Lyubimov	ef82ff8564	Merge branch 'master' into SPARK-826 Conflicts: core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-07-23 13:43:00 -07:00
Karen Feng	0200801a55	Tracks task start events and shows number of active tasks on Executor UI	2013-07-23 13:35:43 -07:00
Dmitriy Lyubimov	310e73d566	style	2013-07-23 13:23:25 -07:00
Matei Zaharia	f369e0e51b	Merge pull request #720 from ooyala/2013-07/persistent-rdds-api Add a public method getCachedRdds to SparkContext	2013-07-23 13:22:27 -07:00
Dmitriy Lyubimov	ac60d06381	Re-working in terms of changes to TaskSetManager. Verified with Standalone and Local mode.	2013-07-23 13:13:19 -07:00
Evan Chan	4830e22562	Rename method per rxin feedback	2013-07-23 09:50:13 -07:00
Evan Chan	2c2bfbe294	Add toMap method to TimeStampedHashMap and use it	2013-07-23 01:36:44 -07:00
Matei Zaharia	401aac8b18	Merge pull request #719 from karenfeng/ui-808 Creates Executors tab for Jobs UI	2013-07-22 16:57:16 -07:00
Karen Feng	872c97ad82	Split task columns, memory columns sort by numeric value	2013-07-22 16:54:37 -07:00
Karen Feng	2eea974795	Executors UI now calls executor ID from TaskInfo instead of TaskMetrics	2013-07-22 15:15:54 -07:00
Dmitriy Lyubimov	8ca0c31944	removing non-pertinent comment	2013-07-22 14:48:46 -07:00
Dmitriy Lyubimov	b4b230e606	Fixing for LocalScheduler with test, that much works ..	2013-07-22 14:42:47 -07:00
Karen Feng	85c4d7bf3b	Shows number of complete/total/failed tasks (bug: failed tasks assigned to null executor)	2013-07-22 14:35:47 -07:00
Josh Rosen	f649dabb4a	Fix bug: DoubleRDDFunctions.sampleStdev() computed non-sample stdev(). Update JavaDoubleRDD to add new methods and docs. Fixes SPARK-825.	2013-07-22 13:21:48 -07:00
Karen Feng	8901f379c9	Fixed memory used/remaining/total bug	2013-07-22 09:58:03 -07:00
Karen Feng	636b19f833	Merge branch 'master' of https://github.com/mesos/spark into ui-808	2013-07-22 09:53:26 -07:00
Evan Chan	0337d88321	Add a public method getCachedRdds to SparkContext	2013-07-21 18:26:14 -07:00
Karen Feng	865dc63bac	Changed table format for executors	2013-07-19 15:57:01 -07:00
Karen Feng	81bb5dc640	Creates Executors tab for application with RDD block and memory/disk used, solves SPARK-808	2013-07-19 14:08:30 -07:00
Konstantin Boudnik	cfce9a6a36	Regression: default webui-port can't be set via command line "--webui-port" anymore	2013-07-19 14:00:58 -07:00
Liang-Chi Hsieh	aa6f83289b	A better fix for giving local jars under Yarn mode.	2013-07-19 22:25:28 +08:00
Liang-Chi Hsieh	a613628c50	Do not copy local jars given to SparkContext in yarn mode since the Context is not running on local. This bug causes failure when jars can not be found. Example codes (such as spark.examples.SparkPi) can not work without this fix under yarn mode.	2013-07-19 16:59:12 +08:00
Prashant Sharma	039087e1e3	Fixed formatting As per review comments on #709	2013-07-17 11:46:00 +05:30
Matei Zaharia	af3c9d5042	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
Matei Zaharia	b1f9f64743	Merge branch 'master' of github.com:mesos/spark	2013-07-16 11:01:53 -07:00
Matei Zaharia	5c388808a8	SPARK-814: Result stages should be named after action	2013-07-16 11:01:14 -07:00
Prashant Sharma	f89cc7ae3c	Fixed warning for type erasure	2013-07-16 14:59:24 +05:30
Prashant Sharma	50f3cd8890	Fixed warning enumerations	2013-07-16 14:39:46 +05:30
Prashant Sharma	55da6e9504	Fixed warning erasure -> runtimeClass	2013-07-16 14:37:08 +05:30
Prashant Sharma	ff14f38f3d	Fixed warning Throwables	2013-07-16 14:34:56 +05:30
Prashant Sharma	63addd93a8	Fixed warning ClassManifest -> ClassTag	2013-07-16 14:09:52 +05:30
Reynold Xin	69316603d6	Throw a more meaningful message when runJob is called to launch tasks on non-existent partitions.	2013-07-15 22:50:11 -07:00
Karen Feng	6dc7c9bfb1	Removed job UI column, linked description to job UI	2013-07-15 16:33:50 -07:00
Karen Feng	fbf5aa761e	Removed log message, added field in master UI to link to log UI	2013-07-15 15:50:03 -07:00
Karen Feng	eac381a957	Merge branch 'ui-802' of https://github.com/karenfeng/spark into ui-802	2013-07-15 15:48:44 -07:00
Karen Feng	3955711250	Added field to master UI with link to job UI	2013-07-15 15:47:21 -07:00
Karen Feng	0d78b6d9cd	Links to job UI from standalone deploy cluster web UI: fixes SPARK-802	2013-07-15 13:47:38 -07:00
Karen Feng	b2aaa1199e	Adds app name in HTML page titles on job web UI: fixes SPARK-806	2013-07-15 11:44:42 -07:00
Prashant Sharma	a1e56a43b3	Fixed compilation issues as Map is by default immutable.Map in scala-2.10	2013-07-15 11:28:18 +05:30
Prashant Sharma	a3494d405d	Merge branch 'master' of github.com:mesos/spark into scala-2.10 Conflicts: core/src/main/scala/spark/Utils.scala core/src/test/scala/spark/ui/UISuite.scala project/SparkBuild.scala run	2013-07-15 11:15:55 +05:30
Matei Zaharia	d47c16f78d	Add an option to disable reference tracking in Kryo	2013-07-15 01:55:54 +00:00
Matei Zaharia	10c05937bd	Merge pull request #699 from pwendell/ui-env Add `Environment` tab to SparkUI.	2013-07-14 11:45:18 -07:00
Patrick Wendell	4883586838	Responding to Matei's review	2013-07-14 10:37:26 -07:00
Matei Zaharia	b91a218cea	Cosmetic fixes to web UI	2013-07-14 07:31:33 +00:00
Matei Zaharia	a44a7b1238	Determine Spark core classes better in getCallSite	2013-07-14 07:23:09 +00:00
root	e271fde10b	Fixed a delay scheduling bug in the YARN branch, found by Patrick	2013-07-14 06:24:29 +00:00
Patrick Wendell	ddb97f0fdf	Add `Environment` tab to SparkUI. This adds a tab which displays system property and classpath information. This can be useful in debugging various types of issues such as: 1. Extra/incorrect Hadoop jars being included in the classpath 2. Spark launching with a different JRE version than intended 3. Spark system properties not being set to intended values 4. User added jars that conflict with Spark jars	2013-07-13 16:14:40 -07:00
Matei Zaharia	77c69ae5a0	Merge pull request #697 from pwendell/block-locations Show block locations in Web UI.	2013-07-12 23:05:21 -07:00
Matei Zaharia	5a7835c152	Merge pull request #691 from karenfeng/logpaging Create log pages	2013-07-12 20:28:21 -07:00
Matei Zaharia	71ccca0cc1	Merge pull request #696 from woggle/executor-env Pass executor env vars (e.g. SPARK_CLASSPATH) to compute-classpath.sh	2013-07-12 20:25:06 -07:00
Matei Zaharia	90fc3f30cd	Merge pull request #692 from Reinvigorate/takeOrdered adding takeOrdered() to RDD	2013-07-12 20:23:36 -07:00
Patrick Wendell	08150f19ab	Minor style fix	2013-07-12 19:32:35 -07:00
Patrick Wendell	6855338e14	Show block locations in Web UI. This fixes SPARK-769. Support is added for enumerating the locations of blocks in the UI. There is also some minor cleanup in StorageUtils.	2013-07-12 19:30:32 -07:00
Charles Reiss	531a7e5574	Pass executor env vars (e.g. SPARK_CLASSPATH) to compute-classpath.	2013-07-12 12:58:25 -07:00
seanm	a1662326e9	comment adjustment to takeOrdered	2013-07-12 08:38:19 -07:00
Prashant Sharma	a220e11a07	Merge branch 'master' of github.com:mesos/spark into scala-2.10	2013-07-12 15:12:46 +05:30
Prashant Sharma	e86d5dbaad	Merge branch 'master' into master-merge Conflicts: README.md core/pom.xml core/src/main/scala/spark/deploy/JsonProtocol.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml project/SparkBuild.scala streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala	2013-07-12 14:49:16 +05:30
Andrew xia	2080e25006	Enhance job ui in spark ui system with adding pool information	2013-07-12 14:25:18 +08:00
seanm	a2c915fba8	giving order to top and making tests more clear	2013-07-11 18:55:00 -07:00
Karen Feng	5c67ca0278	Remove "Bytes" in lieu of String notation	2013-07-11 17:31:59 -07:00
Karen Feng	6d054487bf	Replace default buffer value to 100 GB, changed buttons to use String notation, removed default buffer parameter in UI URLs	2013-07-11 17:12:17 -07:00
Karen Feng	a32784109d	Fixed links for "Back to Master"	2013-07-11 16:57:55 -07:00
Karen Feng	ece2388585	Removed logPageLength from logPage	2013-07-11 16:35:56 -07:00
Karen Feng	9ed036ccdb	Replaced logPageLength with byteLength to prevent buffer shrink bug	2013-07-11 16:33:53 -07:00
Karen Feng	fdc226a14c	Clarified start and end byte variable names	2013-07-11 15:36:43 -07:00
Karen Feng	5d5dbc39f6	getByteRange moved to WorkerWebUI, takes converted parameters, returns only start/end offset	2013-07-11 15:22:45 -07:00
Karen Feng	15fd11d657	Removed redundant calls to request by logPage	2013-07-11 15:01:50 -07:00
Karen Feng	11872888ca	Created getByteRange function for logs and log pages, removed lastNBytes function	2013-07-11 14:56:37 -07:00
Matei Zaharia	018d04c64e	Merge pull request #684 from woggle/mesos-classloader Explicitly set class loader for MesosSchedulerDriver callbacks.	2013-07-11 12:48:37 -07:00
Karen Feng	e3a3fcf61b	Scrollbar on log pages appear automatically	2013-07-11 12:16:38 -07:00
Karen Feng	044d4577ec	Fixed capitalization of log page	2013-07-11 12:02:15 -07:00
Karen Feng	0ecc33f0c8	Added byte range, page title with log name, previous/next bytes buttons, initialization to end of log, large default buffer, buggy back to master link	2013-07-11 11:25:58 -07:00
Karen Feng	74bd3fc680	Added byte range on log pages	2013-07-10 15:44:28 -07:00
Karen Feng	24196c91f0	Changed buffer to 10,000 bytes, created scrollbar for fixed-height log	2013-07-10 15:27:52 -07:00
Karen Feng	f5f3b272f8	Fixed mixup of start/end, moved more import files	2013-07-10 14:52:29 -07:00
Karen Feng	0d4580360b	Fixed docstring of offsetBytes to match params and wrapped for 100+ character lines	2013-07-10 13:24:26 -07:00
Karen Feng	04263e4d46	Made some minor style changes	2013-07-10 13:15:42 -07:00
Karen Feng	cfb6447ac4	Fixed for nonexistent bytes, added unit tests, changed stdout-page to stdout	2013-07-10 11:47:57 -07:00
seanm	ee4ce2fc51	adding takeOrdered to java API	2013-07-10 10:46:04 -07:00
seanm	24705d0f46	adding takeOrdered() to RDD	2013-07-10 10:33:11 -07:00
Karen Feng	620a6974c6	Allows for larger files, refactors lastNBytes, removes old Log column, fixes imports, uses map	2013-07-10 10:20:53 -07:00
Karen Feng	b6072b58bf	Fixes style, makes "std__-page" consistent, reads only parts of files	2013-07-09 17:25:10 -07:00
Karen Feng	13fc6f248c	Clean commit of log paging	2013-07-09 14:17:15 -07:00
Charles Reiss	e47253e0cc	Reset ClassLoader in MesosSchedulerBackend, too. (per review comments). Also set ClassLoader for all mesos callbacks, not just statusUpdate, registered.	2013-07-09 01:23:23 -07:00
Charles Reiss	8c1d1c98e0	Explicitly set class loader for MesosSchedulerDriver callbacks.	2013-07-08 12:25:46 -07:00
Shivaram Venkataraman	4af0d63cb1	Remove akka LogLevel fix as we no longer use spray	2013-07-07 10:42:43 -07:00
Shivaram Venkataraman	a948f06725	Suppress log messages in sbt test with two changes: 1. Set akka log level to ERROR before shutting down the actorSystem. This avoids akka log messages (like Spray) from falling back to INFO on the Stdout logger 2. Initialize netty to use SLF4J in LocalSparkContext. This ensures that stack trace thrown during shutdown is handled by SLF4J instead of stdout	2013-07-07 04:09:08 -07:00
Patrick Wendell	32b9d21a97	Fix occasional failure in UI listener. If a task fails before the metrics are initialized, it remains possible that the metrics field will be `None`. This patch accounts for that possibility by keeping metrics as an `Option` at all times.	2013-07-06 16:40:02 -07:00
Matei Zaharia	1ffadb2d9e	Merge remote-tracking branch 'pwendell/ui-updates' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml	2013-07-06 15:51:41 -07:00
Matei Zaharia	94871e4703	Merge pull request #655 from tgravescs/master Add support for running Spark on Yarn on a secure Hadoop Cluster	2013-07-06 15:26:19 -07:00
Matei Zaharia	3f918b33f8	Merge pull request #672 from holdenk/master s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning	2013-07-06 12:45:18 -07:00
Matei Zaharia	7ba7fa110b	Merge pull request #674 from liancheng/master Bug fix: SPARK-789	2013-07-06 11:45:08 -07:00
BlackNiuza	44a2440039	Remove active job from idToActiveJob when job finished or aborted	2013-07-07 01:33:09 +08:00
Patrick Wendell	37abe84212	Tracking some task metrics even during failures.	2013-07-06 09:19:59 -07:00
Patrick Wendell	84b7fc54e6	Enforcing correct sort order for formatted strings	2013-07-05 17:21:08 -07:00
Matei Zaharia	652ea0f1d8	Allow RDD.takeSample to give samples bigger than the RDD Before, when withReplacement was set to true, we would not get a sample bigger than the RDD's count(). Conflicts: core/src/main/scala/spark/RDD.scala core/src/test/scala/spark/RDDSuite.scala	2013-07-05 11:15:13 -07:00
Matei Zaharia	6586c5e28b	Added a SparkContext accessor to RDD	2013-07-05 11:13:46 -07:00
jerryshao	e4ff544a8d	Clean StageToInfos periodically when spark.cleaner.ttl is enabled	2013-07-05 10:34:45 +08:00
Lian Cheng	c0c3155c3c	Bug fix: SPARK-789 https://spark-project.atlassian.net/browse/SPARK-789	2013-07-05 00:54:10 +08:00
Holden Karau	0f06d6217d	s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning	2013-07-04 01:05:39 -07:00
Gavin Li	94238aae57	fix dependencies	2013-07-03 18:08:38 +00:00
Prashant Sharma	a5f1f6a907	Merge branch 'master' into master-merge Conflicts: core/pom.xml core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/RDDCheckpointData.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/api/python/PythonRDD.scala core/src/main/scala/spark/deploy/client/Client.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/ZippedRDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/BlockManagerMasterActor.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala core/src/test/scala/spark/SizeEstimatorSuite.scala pom.xml project/SparkBuild.scala repl/src/main/scala/spark/repl/SparkILoop.scala repl/src/test/scala/spark/repl/ReplSuite.scala streaming/src/main/scala/spark/streaming/StreamingContext.scala streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala	2013-07-03 11:43:26 +05:30
Gavin Li	96130c30d9	add compression codec trait and snappy compression	2013-07-03 05:49:04 +00:00
$Y.CORP.YAHOO.COM\tgraves$ Y.CORP.YAHOO.COM\tgraves	923cf92900	Rework from pull request. Removed --user option from Spark on Yarn Client, made the user of JAVA_HOME environment variable conditional on if its set, and created addCredentials in each of the SparkHadoopUtil classes to only add the credentials when the profile is hadoop2-yarn.	2013-07-02 21:18:59 -05:00
Patrick Wendell	39e2325675	Removing dead code	2013-07-02 16:28:40 -07:00
Patrick Wendell	8ca1cc1786	Adding truncation for log files	2013-07-02 16:10:50 -07:00
Patrick Wendell	9a42d04efa	Throw exception for missing resource	2013-07-01 14:43:13 -07:00
Patrick Wendell	1025d7d1ef	Package refactoring	2013-07-01 14:40:53 -07:00
Patrick Wendell	30b9034241	Fixing bug where logs aren't shown	2013-07-01 13:48:01 -07:00
Patrick Wendell	8688689387	Various formatting changes	2013-07-01 13:40:12 -07:00
Patrick Wendell	735c951a09	Adding test script	2013-07-01 09:33:22 -07:00
Patrick Wendell	5de326db7d	Print exception message	2013-07-01 09:19:45 -07:00
root	ec31e68d5d	Fixed PySpark perf regression by not using socket.makefile(), and improved debuggability by letting "print" statements show up in the executor's stderr Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala	2013-07-01 06:26:31 +00:00
root	3296d132b6	Fix performance bug with new Python code not using buffered streams	2013-07-01 06:25:43 +00:00
Matei Zaharia	03d0b858c8	Made use of spark.executor.memory setting consistent and documented it Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-06-30 15:46:46 -07:00
Patrick Wendell	e721ff7e5a	Allowing details for failed stages	2013-06-29 11:26:30 -07:00
Patrick Wendell	473961d82e	Styling for progress bar	2013-06-29 08:38:04 -07:00
Patrick Wendell	249f0e54ba	Minor changes from Matei's review	2013-06-28 13:25:26 -07:00
Patrick Wendell	c537e869f3	Missing logo file	2013-06-27 22:02:03 -07:00
Patrick Wendell	62c2c6b856	Forcing Jetty to run as daemon	2013-06-27 21:47:22 -07:00
Patrick Wendell	a55190d314	Adding better tabs for UI headers.	2013-06-27 19:14:51 -07:00
Patrick Wendell	362d996c81	Handful of changes based on matei's review - Avoid exception when no tasks have finished for a stage - Adding DOCTYPE so css renders properly - Adding progress slider	2013-06-27 19:14:28 -07:00
Patrick Wendell	92a4c2a5f6	Fixing bug in local scheduler time recording	2013-06-27 12:33:06 -07:00
Stephen Haberman	d7011632d1	Wrap lines.	2013-06-26 12:35:57 -05:00
Patrick Wendell	ee692482a6	One more private class	2013-06-26 09:07:32 -07:00
Patrick Wendell	a59c15a37e	Adding config option for retained stages	2013-06-26 08:54:57 -07:00
Patrick Wendell	274193664a	Bumping timeouts	2013-06-26 08:51:28 -07:00
Patrick Wendell	b14ad509ba	Moving static ui package	2013-06-26 08:46:51 -07:00
Patrick Wendell	2cbaa0734b	Making all new classes package private	2013-06-26 08:44:55 -07:00
Stephen Haberman	d11025dc6a	Be cute with Option and getenv.	2013-06-26 09:53:35 -05:00
Matei Zaharia	6c8d1b2ca6	Fix computation of classpath when we launch java directly The previous version assumed that a CLASSPATH environment variable was set by the "run" script when launching the process that starts the ExecutorRunner, but unfortunately this is not true in tests. Instead, we factor the classpath calculation into an external script and call that. NOTE: This includes a Windows version but hasn't yet been tested there.	2013-06-25 18:21:00 -04:00
Matei Zaharia	15b00914c5	Some fixes to the launch-java-directly change: - Split SPARK_JAVA_OPTS into multiple command-line arguments if it contains spaces; this splitting follows quoting rules in bash - Add the Scala JARs to the classpath if they're not in the CLASSPATH variable because the ExecutorRunner is launched with "scala" (this can happen when using local-cluster URLs in spark-shell)	2013-06-25 17:17:27 -04:00
Matei Zaharia	7e0191c6ea	Merge remote-tracking branch 'cgrothaus/SPARK-698' Conflicts: run	2013-06-25 15:47:40 -04:00
Patrick Wendell	d66bd6f885	Adding another unit test to Web UI suite	2013-06-24 17:12:55 -07:00
Patrick Wendell	f7389330c3	Allowing for requested port on construction	2013-06-24 16:51:52 -07:00
Patrick Wendell	42157027f2	A few bug fixes and a unit test	2013-06-24 16:25:05 -07:00
Patrick Wendell	a4248138b4	Minor style cleanup	2013-06-24 14:22:28 -07:00
Patrick Wendell	b5e6e8bcc8	Cleaning up some code for Job Progress	2013-06-24 14:13:24 -07:00
Patrick Wendell	93e8ed85aa	Work around for initialization issue	2013-06-24 13:11:18 -07:00
Patrick Wendell	f6e64b5cd6	Updating based on changes to JobLogger (and one small change to JobLogger)	2013-06-24 12:40:41 -07:00
Matei Zaharia	78ffe164b3	Clone the zero value for each key in foldByKey The old version reused the object within each task, leading to overwriting of the object when a mutable type is used, which is expected to be common in fold. Conflicts: core/src/test/scala/spark/ShuffleSuite.scala	2013-06-23 10:26:53 -07:00
Matei Zaharia	0e0f9d3069	Fix search path for REPL class loader to really find added JARs	2013-06-22 17:44:04 -07:00
Matei Zaharia	3e61beff7b	Merge pull request #648 from shivaram/netty-dbg Shuffle fixes and cleanup	2013-06-22 16:22:47 -07:00
Patrick Wendell	7e9f1ed0de	Some cleanup of styling	2013-06-22 10:31:37 -07:00
Patrick Wendell	3b7ebdeeb8	Handling entirely failed stages	2013-06-22 10:31:37 -07:00
Patrick Wendell	be6107ce44	Some tweaking with shared page header	2013-06-22 10:31:37 -07:00
Patrick Wendell	9a24d1a2d0	Using scala in XML imports	2013-06-22 10:31:37 -07:00
Patrick Wendell	f91e1c4822	Linking RDD information when available in stages	2013-06-22 10:31:37 -07:00
Patrick Wendell	a86bb459e2	Showing shuffle status and purging old stages	2013-06-22 10:31:37 -07:00
Patrick Wendell	3485e73376	Style cleanup	2013-06-22 10:31:37 -07:00
Patrick Wendell	dd696f3a3d	Some renaming and comments	2013-06-22 10:31:37 -07:00
Patrick Wendell	5c872e9ef5	Documentation and some refactoring	2013-06-22 10:31:37 -07:00
Patrick Wendell	17776323a6	More work on percentile data:	2013-06-22 10:31:37 -07:00
Patrick Wendell	dcf6a68177	Refactoring into different modules	2013-06-22 10:31:36 -07:00
Patrick Wendell	ce81c320ac	Adding helper function to make listing tables	2013-06-22 10:31:36 -07:00
Patrick Wendell	9fd5dc3ea9	Initial steps towards job progress UI	2013-06-22 10:31:36 -07:00
Patrick Wendell	bc4a811c57	Stash	2013-06-22 10:31:36 -07:00
Patrick Wendell	77c53f7868	Refactoring UI packages	2013-06-22 10:31:36 -07:00
Patrick Wendell	8b5c7e71c4	Import cleanup	2013-06-22 10:31:36 -07:00
Patrick Wendell	32a45d01b1	Removing twirl files	2013-06-22 10:31:36 -07:00
Patrick Wendell	4e1f202481	Removing dead code	2013-06-22 10:31:36 -07:00
Patrick Wendell	d6fde4ffe4	Some JSON cleanup	2013-06-22 10:31:36 -07:00
Patrick Wendell	91ec5a1a04	Changing JSON protocol and removing spray code	2013-06-22 10:31:36 -07:00
Patrick Wendell	fc94576ece	Adding worker version of UI	2013-06-22 10:31:36 -07:00
Patrick Wendell	ee73c09ac9	Some comments	2013-06-22 10:31:36 -07:00
Patrick Wendell	9161db5478	Cleaning up master web UI	2013-06-22 10:31:36 -07:00
Patrick Wendell	e55cf0245f	Adding WebUI file	2013-06-22 10:31:35 -07:00
Patrick Wendell	f85fd7a793	Commenting unfinished part	2013-06-22 10:31:35 -07:00
Patrick Wendell	2c36a514aa	Spray refactoring for master web UI	2013-06-22 10:31:35 -07:00
Patrick Wendell	7e6977b6c5	Fix in storage status page	2013-06-22 10:31:35 -07:00
Patrick Wendell	950f83535a	Adding deterministic port	2013-06-22 10:31:35 -07:00
Patrick Wendell	7cd70dc2c1	Minor cleanup	2013-06-22 10:31:35 -07:00
Patrick Wendell	e66f570194	Completely hacked version of block manager UI in jetty	2013-06-22 10:31:35 -07:00
Patrick Wendell	60fbf7e461	Partially working checkpoint	2013-06-22 10:31:35 -07:00
Matei Zaharia	1ef5d0d2c9	Merge pull request #644 from shimingfei/joblogger add Joblogger to Spark (on new Spark code)	2013-06-22 09:35:57 -07:00
Jey Kottalam	1ba3c17303	use parens when calling method with side-effects	2013-06-21 12:14:16 -04:00
Jey Kottalam	edb18ca928	Rename PythonWorker to PythonWorkerFactory	2013-06-21 12:14:16 -04:00
Jey Kottalam	62c4781400	Add tests and fixes for Python daemon shutdown	2013-06-21 12:14:16 -04:00
Jey Kottalam	c79a6078c3	Prefork Python worker processes	2013-06-21 12:14:16 -04:00
Jey Kottalam	40afe0d2a5	Add Python timing instrumentation	2013-06-21 12:14:16 -04:00
Mingfei	2fc794a6c7	small modify in DAGScheduler	2013-06-21 18:21:35 +08:00
Mingfei	4b9862ac9c	small format modification	2013-06-21 17:55:32 +08:00
Mingfei	aa7aa587be	some format modification	2013-06-21 17:48:41 +08:00
Mingfei	5240795154	edit according to comments	2013-06-21 17:38:23 +08:00
Matei Zaharia	71030ba3eb	Merge pull request #654 from lyogavin/enhance_pipe fix typo and coding style in #638	2013-06-19 15:21:03 -07:00
Thomas Graves	bad51c7cb4	upmerge with latest mesos/spark master and fix hbase compile with hadoop2-yarn profile	2013-06-19 14:39:13 -05:00
Thomas Graves	75d78c7ac9	Add support for Spark on Yarn on a secure Hadoop cluster	2013-06-19 11:18:42 -05:00
Matei Zaharia	7902baddc7	Update ASM to version 4.0	2013-06-19 13:34:30 +02:00
Gavin Li	0a2a9bce1e	fix typo and coding style	2013-06-18 21:30:13 +00:00
jerryshao	1e9269c3ee	reduce ZippedPartitionsRDD's getPreferredLocations complexity	2013-06-18 09:49:06 +08:00
Matei Zaharia	db42451a52	Merge pull request #643 from adatao/master Bug fix: Zero-length partitions result in NaN for overall mean & variance	2013-06-17 15:26:36 -07:00
Matei Zaharia	e82a2ffcc9	Merge pull request #653 from rxin/logging SPARK-781: Log the temp directory path when Spark says "Failed to create temp directory."	2013-06-17 15:13:15 -07:00
Matei Zaharia	ec193c7d89	Merge remote-tracking branch 'xiajunluan/xiajunluan' Conflicts: core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala	2013-06-18 00:11:50 +02:00
Reynold Xin	be3c406edf	Fixed the typo pointed out by Matei.	2013-06-17 17:07:51 -04:00
Reynold Xin	1450296797	SPARK-781: Log the temp directory path when Spark says "Failed to create temp directory".	2013-06-17 16:58:23 -04:00
Gavin Li	4508089fc3	refine comments and add sc.clean	2013-06-17 05:23:46 +00:00
Gavin Li	e6ae049283	Merge remote-tracking branch 'upstream1/master' into enhance_pipe	2013-06-16 22:53:39 +00:00
Gavin Li	fb6d733fa8	update according to comments	2013-06-16 22:32:55 +00:00
Matei Zaharia	f961aac8b2	Merge pull request #649 from ryanlecompte/master Add top K method to RDD using a bounded priority queue	2013-06-15 00:53:41 -07:00
ryanlecompte	e8801d4490	use delegation for BoundedPriorityQueue, add Java API	2013-06-14 23:39:05 -07:00
Reynold Xin	2cc188fd54	SPARK-774: cogroup should also disable map side combine by default	2013-06-14 00:10:54 -07:00
Reynold Xin	6738178d0d	SPARK-772: groupByKey should disable map side combine.	2013-06-13 23:59:42 -07:00
ryanlecompte	93b3f5e535	drop unneeded ClassManifest implicit	2013-06-13 16:26:35 -07:00
ryanlecompte	44b8dbaede	use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs	2013-06-13 16:23:15 -07:00
Shivaram Venkataraman	1d9f0df065	Fix some comments and style	2013-06-13 14:46:25 -07:00
Mingfei	967a6a699d	modify sparklister function interface according to comments	2013-06-13 14:36:07 +08:00
Shivaram Venkataraman	5da4287b1d	Merge branch 'netty-dbg' of github.com:shivaram/spark into netty-dbg	2013-06-12 16:38:37 -07:00
Shivaram Venkataraman	5e9a9317c5	Merge branch 'master' of git://github.com/mesos/spark into netty-dbg	2013-06-12 16:38:01 -07:00
ryanlecompte	db5bca08ff	add a new top K method to RDD using a bounded priority queue	2013-06-12 10:54:16 -07:00
Andrew xia	190ec61799	change code style and debug info	2013-06-10 15:27:02 +08:00
Patrick Wendell	ef14dc2e77	Adding Java-API version of compression codec	2013-06-09 18:09:46 -07:00
Patrick Wendell	df592192e7	Monads FTW	2013-06-09 18:09:24 -07:00
Patrick Wendell	d1bbcebae5	Adding compression to Hadoop save functions	2013-06-09 11:39:35 -07:00
Mingfei	ade822011d	not check return value of eventQueue.take	2013-06-08 16:26:45 +08:00
Matei Zaharia	5b5b5aedbf	Fixed a few test issues due to Akka 2.1, as well as SBT memory. Unfortunately, in Akka 2.1, ActorSystem.awaitTermination hangs for remote actors, and Akka also leaves a non-daemon Netty thread even when run in daemon mode. Thus I had to comment out some of the calls to awaitTermination, and we still have one failing test.	2013-06-08 01:09:24 -07:00
Mingfei	4fd86e0e10	delete test code for joblogger in SparkContext	2013-06-08 15:45:47 +08:00
Mingfei	362f0f93ac	Merge branch 'master' of https://github.com/mesos/spark	2013-06-08 15:20:13 +08:00
Mingfei	1a4d93c025	modify to pass job annotation by localProperties and use daemon thread to do joblogger's work	2013-06-08 14:23:39 +08:00
Matei Zaharia	b58a29295b	Small formatting and style fixes	2013-06-07 22:51:28 -07:00
Matei Zaharia	c8fc423bc2	Merge pull request #631 from jerryshao/master Fix block manager UI display issue when enable spark.cleaner.ttl	2013-06-07 22:43:18 -07:00
Matei Zaharia	c9ca0a4a58	Small code style fix to SchedulingAlgorithm.scala	2013-06-07 22:40:44 -07:00
Matei Zaharia	1ae60bcb36	Merge pull request #634 from xiajunluan/master [Spark-753] Fix ClusterSchedulSuite unit test failed	2013-06-07 22:39:06 -07:00
Shivaram Venkataraman	ac480fd977	Clean up variables and counters in BlockFetcherIterator	2013-06-06 16:34:27 -07:00
Gavin Li	e179ff8a32	update according to comments	2013-06-05 22:41:05 +00:00
Shivaram Venkataraman	cb2f5046ee	Pass in bufferSize to BufferedOutputStream	2013-06-05 15:09:02 -07:00
Shivaram Venkataraman	c851957fe4	Don't write zero block files with java serializer	2013-06-05 14:28:38 -07:00
Christopher Nguyen	9d35904357	In the current code, when both partitions happen to have zero-length, the return mean will be NaN. Consequently, the result of mean after reducing over all partitions will also be NaN, which is not correct if there are partitions with non-zero length. This patch fixes this issue.	2013-06-04 22:12:47 -07:00
Matei Zaharia	fff3728552	Merge pull request #640 from pwendell/timeout-update Fixing bug in BlockManager timeout	2013-06-04 16:09:50 -07:00
Patrick Wendell	061fd3ae36	Fixing bug in BlockManager timeout	2013-06-04 19:02:44 -04:00
Matei Zaharia	f420d4f228	Merge pull request #639 from pwendell/timeout-update Bump akka and blockmanager timeouts to 60 seconds	2013-06-04 15:25:58 -07:00
Patrick Wendell	8bd4e12104	Bump akka and blockmanager timeouts to 60 seconds	2013-06-04 18:14:24 -04:00
Shivaram Venkataraman	96943a1cc0	var to val	2013-06-03 12:29:38 -07:00
Shivaram Venkataraman	cd347f547a	Reuse the file object as it is valid after delete	2013-06-03 12:27:51 -07:00
Shivaram Venkataraman	a058b0acf3	Delete a file for a block if it already exists.	2013-06-03 12:10:00 -07:00
Andrew xia	606bb1b450	Fix schedulingAlgorithm bugs for unit test	2013-06-03 10:29:23 +08:00
Shivaram Venkataraman	038cfc1a9a	Make connect timeout configurable	2013-05-31 23:32:18 -07:00
Shivaram Venkataraman	91aca92249	Another round of Netty fixes. 1. Avoid race condition between stop and copier completion 2. Handle socket exceptions by reporting them and filling in a failed FetchResult	2013-05-31 23:21:38 -07:00
Gavin Li	9f84315c05	enhance pipe to support what we can do in hadoop streaming	2013-06-01 00:26:10 +00:00
Reynold Xin	de1167bf2c	Incorporated Charles' feedback to put rdd metadata removal in BlockManagerMasterActor.	2013-05-31 15:54:57 -07:00
Reynold Xin	ba5e544461	More block manager cleanup. Implemented a removeRdd method in BlockManager, and use that to implement RDD.unpersist. Previously, unpersist needs to send B akka messages, where B = number of blocks. Now unpersist only needs to send W akka messages, where W = the number of workers.	2013-05-31 01:48:16 -07:00
jerryshao	926f41cc52	fix block manager UI display issue when enable spark.cleaner.ttl	2013-05-31 09:32:52 +08:00
Reynold Xin	bed1b08169	Do not create symlink for local add file. Instead, copy the file. This prevents Spark from changing the original file's permission, and also allow add file to work on non-posix operating systems.	2013-05-30 16:21:49 -07:00
Shivaram Venkataraman	3b0cd17343	Merge branch 'master' of git://github.com/mesos/spark Conflicts: core/src/test/scala/spark/ShuffleSuite.scala	2013-05-30 14:36:24 -07:00
Andrew xia	c3db3ea554	1. Add unit test for local scheduler 2. Move localTaskSetManager to a new file	2013-05-30 20:49:40 +08:00
Andrew xia	ecceb101d3	implement FIFO and fair scheduler for spark local mode	2013-05-30 10:43:01 +08:00
Shivaram Venkataraman	19fd6d54c0	Also flush serializer in revertPartialWrites	2013-05-29 17:29:34 -07:00
Shivaram Venkataraman	618c8cae1e	Skip fetching zero-sized blocks in OIO. Also unify splitLocalRemoteBlocks for netty/nio and add a test case	2013-05-29 13:18:54 -07:00
Matei Zaharia	6ed71390d9	Merge pull request #626 from stephenh/remove-add-if-no-port Remove unused addIfNoPort.	2013-05-29 10:14:22 -07:00
Shivaram Venkataraman	b79b10a6d6	Flush serializer to fix zero-size kryo blocks bug. Also convert the local-cluster test case to check for non-zero block sizes	2013-05-29 00:52:55 -07:00
Matei Zaharia	41d230ccb0	Merge pull request #611 from squito/classloader Use default classloaders for akka & deserializing task results	2013-05-28 23:35:24 -07:00
Shivaram Venkataraman	fbc1ab3468	Couple of Netty fixes a. Fix the port number by reading it from the bound channel b. Fix the shutdown sequence to make sure we actually block on the channel c. Fix the unit test to use two JVMs.	2013-05-28 16:27:16 -07:00
Stephen Haberman	4fe1fbdd51	Remove unused addIfNoPort.	2013-05-28 16:26:32 -05:00
Matei Zaharia	3db1e17baa	Merge pull request #620 from jerryshao/master Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations	2013-05-27 21:31:43 -07:00
Matei Zaharia	e8d4b6c296	Merge pull request #529 from xiajunluan/master [SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler	2013-05-25 21:09:03 -07:00
Reynold Xin	26962c9340	Automatically configure Netty port. This makes unit tests using local-cluster pass. Previously they were failing because Netty was trying to bind to the same port for all processes. Pair programmed with @shivaram.	2013-05-24 16:39:33 -07:00
Reynold Xin	6ea085169d	Fixed the bug that shuffle serializer is ignored by the new shuffle block iterators for local blocks. Also added a unit test for that.	2013-05-24 14:08:37 -07:00
jerryshao	bd3ea8f2a6	fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException	2013-05-24 14:26:19 +08:00
Charles Reiss	f350f14084	Use ARRAY_SAMPLE_SIZE constant instead of 100.0	2013-05-21 18:11:33 -07:00
Andrew xia	ecd6d75c6a	fix bug of unit tests	2013-05-21 06:49:23 +08:00
Reynold Xin	5912cc4967	Merge pull request #610 from JoshRosen/spark-747 Throw exception if TaskResult exceeds Akka frame size	2013-05-17 19:58:40 -07:00
Reynold Xin	8d78c5f89f	Changed the logging level from info to warning when addJar(null) is called.	2013-05-17 18:51:35 -07:00
Andrew xia	3d4672eaa9	Merge branch 'master' into xiajunluan Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala	2013-05-18 07:28:03 +08:00
Andrew xia	d19753b9c7	expose TaskSetManager type to resourceOffer function in ClusterScheduler	2013-05-18 06:45:19 +08:00
Andrew xia	c6e2770bfe	Fix ClusterScheduler bug to avoid allocating tasks to same slave	2013-05-17 05:10:38 +08:00
Mridul Muralidharan	f0881f8d48	Hope this does not turn into a bike shed change	2013-05-17 01:58:50 +05:30
Mridul Muralidharan	feddd2530d	Filter out nulls - prevent NPE	2013-05-16 17:49:14 +05:30
Josh Rosen	b8e46b6074	Abort job if result exceeds Akka frame size; add test.	2013-05-16 01:57:57 -07:00
Matei Zaharia	2f576aba8f	Merge pull request #602 from rxin/shufflemerge Manual merge & cleanup of Shane's Shuffle Performance Optimization	2013-05-15 18:06:24 -07:00
Reynold Xin	203d7b7c14	Merge pull request #593 from squito/driver_ui_link Master UI has link to Application UI	2013-05-15 00:47:20 -07:00
Reynold Xin	f3491cb89b	Merge branch 'master' of github.com:mesos/spark into shufflemerge Conflicts: core/src/main/scala/spark/storage/BlockManager.scala core/src/test/scala/spark/DistributedSuite.scala project/SparkBuild.scala	2013-05-15 00:31:52 -07:00
Reynold Xin	f9d40a5848	Added a comment in JdbcRDD for example usage.	2013-05-14 23:29:57 -07:00
Reynold Xin	81ad2fa331	Merge branch 'jdbc' of github.com:koeninger/spark Conflicts: project/SparkBuild.scala	2013-05-14 23:12:00 -07:00
Imran Rashid	38d4b97c6d	use threads classloader when deserializing task results; classnotfoundexception includes classloader	2013-05-14 22:32:14 -07:00
Imran Rashid	d7d1da79d3	when akka starts, use akkas default classloader (current thread)	2013-05-14 22:32:09 -07:00
Matei Zaharia	016ac86830	Merge pull request #601 from rxin/emptyrdd-master EmptyRDD (master branch 0.8)	2013-05-13 21:45:36 -07:00
Matei Zaharia	4b354e0a08	Merge pull request #589 from mridulm/master Add support for instance local scheduling	2013-05-13 17:39:19 -07:00
Patrick Wendell	7f0833647b	Capturing class name	2013-05-12 07:54:03 -07:00
Patrick Wendell	72b9c4cb6e	Small fix	2013-05-11 23:53:50 -07:00
Patrick Wendell	1c15b85051	Removing import	2013-05-11 23:52:53 -07:00
Patrick Wendell	059ab88754	Changing technique to use same code path in all cases	2013-05-11 23:50:54 -07:00
Cody Koeninger	3da2305ed0	code cleanup per rxin comments	2013-05-11 23:59:07 -05:00
Josh Rosen	440719109e	Throw exception if task result exceeds Akka frame size. This partially addresses SPARK-747.	2013-05-11 19:17:13 -07:00
Patrick Wendell	0345954530	SPARK-738: Spark should detect and squash nonserializable exceptions	2013-05-11 14:17:09 -07:00
Mark Hamstra	6e6b3e0d7e	Actually use the cleaned closure in foreachPartition	2013-05-10 13:02:34 -07:00
Imran Rashid	0ab818d508	fix linebreak	2013-05-09 00:38:59 -07:00
Reynold Xin	5d70ee4663	Cleaned up connection manager (moved many classes to their own files).	2013-05-07 22:42:15 -07:00
Reynold Xin	8388e8dd7a	Minor style fix in DiskStore...	2013-05-07 18:40:35 -07:00
Reynold Xin	547dcbe494	Cleaned up Scala files in network/netty from Shane's PR.	2013-05-07 18:39:33 -07:00
Reynold Xin	9e64396ca4	Cleaned up the Java files from Shane's PR.	2013-05-07 18:30:54 -07:00
Reynold Xin	0e5cc30868	Cleaned up BlockManager and BlockFetcherIterator from Shane's PR.	2013-05-07 18:18:24 -07:00
Reynold Xin	8b79485171	Moved BlockFetcherIterator to its own file.	2013-05-07 17:02:32 -07:00
Reynold Xin	90577ada69	Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge Conflicts: core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/DiskStore.scala project/SparkBuild.scala	2013-05-07 15:56:19 -07:00
Reynold Xin	0fd84965f6	Added EmptyRDD.	2013-05-06 15:40:34 -07:00
Imran Rashid	22a5063ae4	switch from separating appUI host & port to combining into just appUiUrl	2013-05-05 12:19:11 -07:00
Matei Zaharia	7af92f248b	Merge pull request #597 from JoshRosen/webui-fixes Two minor bug fixes for Spark Web UI	2013-05-04 22:29:17 -07:00
Josh Rosen	42b1953c53	Fix SPARK-630: app details page shows finished executors as running.	2013-05-04 18:34:47 -07:00
Josh Rosen	c0688451a6	Fix wrong closing tags in web UI HTML.	2013-05-04 18:34:46 -07:00
Josh Rosen	d48e9fde01	Fix SPARK-629: weird number of cores in job details page.	2013-05-04 18:34:45 -07:00
Mridul Muralidharan	25198d7e9e	Merge branch 'master' of github.com:mridulm/spark	2013-05-04 20:45:56 +05:30
Mridul Muralidharan	5b011d18d7	Merge from master	2013-05-04 20:41:27 +05:30
Mridul Muralidharan	edb57c8331	Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach	2013-05-04 19:47:45 +05:30
Matei Zaharia	3bf2c868c3	Merge pull request #594 from shivaram/master Add zip partitions to Java API	2013-05-03 18:27:30 -07:00
Shivaram Venkataraman	bb8a434f9d	Add zipPartitions to Java API.	2013-05-03 15:14:02 -07:00
Imran Rashid	6fae936088	applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui	2013-05-03 12:59:10 -07:00
Mridul Muralidharan	ea2a6f91d3	pull from master	2013-05-04 00:35:59 +05:30
Reynold Xin	93091f6936	Merge branch 'master' of github.com:mesos/spark into blockmanager	2013-05-03 01:02:32 -07:00
Reynold Xin	2bc895a829	Updated according to Matei's code review comment.	2013-05-03 01:02:16 -07:00
Mridul Muralidharan	11589c39d9	Fix ZippedRDD as part Matei's suggestion	2013-05-03 12:23:30 +05:30
Matei Zaharia	6fe9d4e61e	Merge pull request #592 from woggling/localdir-fix Don't accept generated local directory names that can't be created	2013-05-02 21:33:56 -07:00
Matei Zaharia	538ee755b4	Merge pull request #581 from jerryshao/master fix [SPARK-740] block manage UI throws exception when enabling Spark Streaming	2013-05-02 09:01:42 -07:00
Charles Reiss	c847dd3da2	Don't accept generated temp directory names that can't be created successfully.	2013-05-01 23:19:10 -07:00
Reynold Xin	4a31877408	Added the unpersist api to JavaRDD.	2013-05-01 20:31:54 -07:00
Reynold Xin	98df9d2853	Added removeRdd function in BlockManager.	2013-05-01 20:17:09 -07:00
Mridul Muralidharan	dfde9ce9dd	comment out debug versions of checkHost, etc from Utils - which were used to test	2013-05-02 07:41:33 +05:30
Mridul Muralidharan	1b5aaeadc7	Integrate review comments 2	2013-05-02 07:30:06 +05:30
jerryshao	c047f0e3ad	filter out Spark streaming block RDD and sort RDDInfo with id	2013-05-02 09:48:32 +08:00
Mridul Muralidharan	609a817f52	Integrate review comments on pull request	2013-05-02 06:44:33 +05:30
Reynold Xin	204eb32e14	Changed the type of the persistentRdds hashmap back to TimeStampedHashMap.	2013-05-01 16:14:58 -07:00
Reynold Xin	34637b97ec	Added SparkContext.cleanup back. Not sure why it was removed before ...	2013-05-01 16:12:37 -07:00
Reynold Xin	3227ec8edd	Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist. Also updated unit tests to make sure they are properly testing for concurrency.	2013-05-01 16:07:44 -07:00
harshars	8481562731	Merged Ram's commit on removing RDDs. Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-05-01 14:42:17 -07:00
Mridul Muralidharan	27764a00f4	Fix some npe introduced accidentally	2013-05-01 20:56:05 +05:30
Mridul Muralidharan	d960e7e0f8	a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling. b) Add some fixes to test code to ensure it passes (and fixes some other issues). c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.	2013-05-01 20:24:00 +05:30
Prashant Sharma	dbe2887da7	Fixed deprecated method warning	2013-05-01 13:22:49 +05:30
Matei Zaharia	aa8fe1a209	Merge pull request #586 from mridulm/master Pull request to address issues Reynold Xin reported	2013-04-30 22:30:18 -07:00
Reynold Xin	dd7bef3147	Two minor fixes according to Ryan LeCompte's review.	2013-04-30 15:02:32 -07:00
Reynold Xin	cea6174573	Merge branch 'master' of github.com:mesos/spark into blockmanager Conflicts: core/src/main/scala/spark/BlockStoreShuffleFetcher.scala	2013-04-30 13:28:35 -07:00
Mridul Muralidharan	60cabb35cb	Add addition catch block for exception too	2013-05-01 01:17:14 +05:30
Mridul Muralidharan	3b748ced22	Be more aggressive and defensive in all uses of SelectionKey in select loop	2013-05-01 00:30:30 +05:30
Mridul Muralidharan	0f45477be1	Change indentation	2013-05-01 00:10:02 +05:30
Mridul Muralidharan	538614acfe	Be more aggressive and defensive in select also	2013-05-01 00:05:32 +05:30
Mridul Muralidharan	48854e1dbf	If key is not valid, close connection	2013-04-30 23:59:33 +05:30
Matei Zaharia	f708dda81e	Merge pull request #585 from pwendell/listener-perf [Fix SPARK-742] Task Metrics should not employ per-record timing by default	2013-04-30 07:51:40 -07:00
Mridul Muralidharan	e46d547ccd	Fix issues reported by Reynold	2013-04-30 16:15:56 +05:30
Reynold Xin	1055785a83	Allow specifying the shuffle write file buffer size. The default buffer size is 8KB in FastBufferedOutputStream, which is too small and would cause a lot of disk seeks.	2013-04-29 23:33:56 -07:00
Reynold Xin	7007201201	Added a shuffle block manager so it is easier in the future to consolidate shuffle output files.	2013-04-29 23:07:03 -07:00
Reynold Xin	d3586ef438	Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager Conflicts: core/src/main/scala/spark/storage/DiskStore.scala	2013-04-29 15:44:18 -07:00
Patrick Wendell	016ce1fa9c	Using full package name for util	2013-04-29 12:02:27 -07:00
Patrick Wendell	540be6b154	Modified version of the fix which just removes all per-record tracking.	2013-04-29 11:32:07 -07:00
Patrick Wendell	224fbac061	Spark-742: TaskMetrics should not employ per-record timing. This patch does three things: 1. Makes TimedIterator a trait with two implementations (one a no-op) 2. Makes the default behavior to use the no-op implementation 3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like the trait doesn't really reduce complexity in any way. In the future we can add other implementations, e.g. ones which perform sampling.	2013-04-29 11:13:43 -07:00
Prashant Sharma	24bbf318b3	Fixed other warnings	2013-04-29 19:56:28 +05:30
Prashant Sharma	d3518f57cd	Fixed warning: erasure -> runtimeClass	2013-04-29 18:14:25 +05:30
Prashant Sharma	8f3ac240cb	Fixed Warning: ClassManifest -> ClassTag	2013-04-29 16:39:13 +05:30
Shivaram Venkataraman	604d3bf56c	Rename partition class and add scala doc	2013-04-28 16:31:07 -07:00
Shivaram Venkataraman	15acd49f07	Actually rename classes to ZippedPartitions* (the previous commit only renamed the file)	2013-04-28 16:03:22 -07:00
Shivaram Venkataraman	6e84635ab9	Rename classes from MapZipped* to Zipped*	2013-04-28 15:58:40 -07:00
Shivaram Venkataraman	0cc6642b7c	Rename to zipPartitions and style changes	2013-04-28 05:11:03 -07:00
Shivaram Venkataraman	c9c4954d99	Add an interface to zip iterators of multiple RDDs The current code supports 2, 3 or 4 arguments but can be extended to more arguments if required.	2013-04-26 16:57:46 -07:00
Matei Zaharia	6e6b5204ea	Create an empty directory when checkpointing a 0-partition RDD (fixes a test failure on Hadoop 2.0)	2013-04-25 00:42:37 -07:00
Reynold Xin	ba6ffa6a5f	Allow the specification of a shuffle serializer in the read path (for local block reads).	2013-04-24 17:38:07 -07:00
Reynold Xin	aa618ed2a2	Allow changing the serializer on a per shuffle basis.	2013-04-24 14:52:49 -07:00
Prashant Sharma	ad88f083a6	scala 2.10 and master merge	2013-04-24 18:08:26 +05:30
Mridul Muralidharan	dd515ca3ee	Attempt at fixing merge conflict	2013-04-24 09:24:17 +05:30
Reynold Xin	31ce6c66d6	Added a BlockObjectWriter interface in block manager so ShuffleMapTask doesn't need to build up an array buffer for each shuffle bucket.	2013-04-23 17:48:59 -07:00
koeninger	dfac0aa5c2	prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement.	2013-04-22 21:12:52 -05:00
Prashant Sharma	185bb9525a	Manually merged scala-2.10 and master	2013-04-22 14:14:03 +05:30
Mridul Muralidharan	7acab3ab45	Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo	2013-04-22 08:01:13 +05:30
koeninger	b2a3f24dde	first attempt at an RDD to pull data from JDBC sources	2013-04-21 00:29:37 -05:00
Andrew xia	8436bd5d4a	remove TaskSetQueueManager and update code style	2013-04-19 02:17:22 +08:00
Andrew xia	e0603d7e8b	refactor the Schedulable interface and add unit test for SchedulingAlgorithm	2013-04-18 13:13:54 +08:00
Mridul Muralidharan	5ee2f5c483	Cache pattern, add (commented out) alternatives for check* apis	2013-04-17 23:13:34 +05:30
Mridul Muralidharan	f07961060d	Add a small note on spark.tasks.schedule.aggression	2013-04-17 23:13:02 +05:30
Mridul Muralidharan	02dffd2eb0	Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though default of 10 is maintained	2013-04-17 05:52:57 +05:30
Mridul Muralidharan	ad80f68eb5	remove spurious debug statements	2013-04-16 22:15:34 +05:30
Mridul Muralidharan	f7969f72ee	Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example)	2013-04-16 21:51:38 +05:30
Mridul Muralidharan	323ab8ff3b	Scala does not prevent variable shadowing ! Sick error due to it ...	2013-04-16 17:05:10 +05:30
shane-huang	b493f55a4f	fix a bug in netty Block Fetcher Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-04-16 10:01:01 +08:00
Mridul Muralidharan	59c380d69a	Fix npe	2013-04-16 03:29:38 +05:30
Mridul Muralidharan	dd2b64ec97	Fix bug with atomic update	2013-04-16 03:19:24 +05:30
Mridul Muralidharan	5540ab8243	Use hostname instead of hostport for executor, fix creation of workdir	2013-04-16 02:57:43 +05:30
Mridul Muralidharan	eb7e95e833	Commit job to persist files	2013-04-16 02:56:36 +05:30
Matei Zaharia	a64c107449	Make ShuffledRDD.prev transient	2013-04-15 16:41:51 -04:00
Mridul Muralidharan	19652a44be	Fix issue with FileSuite failing	2013-04-15 19:16:36 +05:30
Mridul Muralidharan	54b3d45b81	Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues	2013-04-15 18:26:50 +05:30
Mridul Muralidharan	d90d2af103	Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues	2013-04-15 18:12:11 +05:30
Matei Zaharia	c35d530bcf	Fix compile error	2013-04-13 12:43:12 -04:00
Andrew Ash	29d3440efb	Add details when BlockManager heartbeats time out Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs Before: WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats After: WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms	2013-04-11 01:54:02 -03:00
Andrew xia	2f883c515f	Continue to update codes for scala code style 1.refactor braces for "class" "if" "while" "for" "match" 2.make code lines less than 100 3.refactor class parameter and extends definition	2013-04-09 13:02:50 +08:00
Matei Zaharia	054feb6448	Fixed a bug with zip	2013-04-07 21:15:21 -04:00
Matei Zaharia	b5900d47b1	Fix compile warning	2013-04-07 20:55:42 -04:00
Matei Zaharia	6962d40b44	Fix deprecated warning	2013-04-07 20:27:33 -04:00
Mridul Muralidharan	6798a09df8	Add support for building against hadoop2-yarn : adding new maven profile for it	2013-04-07 17:47:38 +05:30
shane-huang	df47b40b76	Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages change reference from io.Source to scala.io.Source to avoid looking into io.netty package Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-04-07 14:37:12 +08:00
Andrew xia	2b373dd07a	add properties default value null to fix sbt/sbt test errors	2013-04-02 12:11:14 +08:00
Mark Hamstra	e215f67923	Correct sense of 'filter out' in comment.	2013-03-31 08:00:13 -07:00
Mark Hamstra	8bcdc64005	Fixed broken filter in getWritableClass[T]	2013-03-30 22:09:52 -07:00
Matei Zaharia	9831bc1a09	Merge pull request #539 from cgrothaus/fix-webui-workdirpath Bugfix: WorkerWebUI must respect workDirPath from Worker	2013-03-29 22:16:22 -07:00
Matei Zaharia	3cc8ab6e29	Merge pull request #541 from stephenh/shufflecoalesce Add a shuffle parameter to coalesce.	2013-03-29 22:14:07 -07:00
Andrew xia	1a28f92711	change some typo and some spacing	2013-03-29 08:34:28 +08:00
Andrew xia	def3d1c84a	1.remove redundant spacing in source code 2.replace get/set functions with val and var definition	2013-03-29 08:20:35 +08:00
Holden Karau	f5df729b12	Explicitly catch all throwables (warning in 2.10)	2013-03-24 16:15:32 -07:00
Stephen Haberman	dd854d5b9f	Use Boolean in the Java API, and != for assert.	2013-03-23 11:49:45 -05:00
Stephen Haberman	4ca273edc4	Merge branch 'master' into shufflecoalesce Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-03-23 11:45:45 -05:00
Matei Zaharia	b8949cab88	Merge pull request #505 from stephenh/volatile Make Executor fields volatile since they're read from the thread pool.	2013-03-23 07:19:34 -07:00
Matei Zaharia	fd53f2fc7b	Merge pull request #510 from markhamstra/WithThing mapWith, flatMapWith and filterWith	2013-03-23 07:13:21 -07:00
Andrew xia	d1d9bdaabe	Just update typo and comments	2013-03-23 07:25:30 +08:00
Stephen Haberman	00170eb0b9	Fix are/our typo.	2013-03-22 12:59:08 -05:00
Stephen Haberman	1c67c7dfd1	Add a shuffle parameter to coalesce. This is useful for when you want just 1 output file (part-00000) but still up the upstream RDD to be computed in parallel.	2013-03-22 08:54:44 -05:00
Christoph Grothaus	445f387ef4	Bugfix: WorkerWebUI must respect workDirPath from Worker	2013-03-22 11:08:40 +01:00
Matei Zaharia	35588490cb	Merge pull request #538 from rxin/cogroup Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 19:27:47 -07:00
Stephen Haberman	4f4215311a	Merge branch 'master' into volatile	2013-03-20 15:37:10 -05:00
Matei Zaharia	b812e6b7bb	Merge pull request #526 from markhamstra/foldByKey Add foldByKey	2013-03-20 11:21:02 -07:00
Reynold Xin	d48ee7e55e	Merge branch 'master' of github.com:mesos/spark into cogroup	2013-03-20 14:00:28 +08:00
Reynold Xin	00a11304fd	Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 13:49:51 +08:00
Matei Zaharia	945d1e720e	Merge pull request #536 from sasurfer/master CoalescedRDD for many partitions	2013-03-19 21:59:06 -07:00
Matei Zaharia	1cbbe94ac1	Merge pull request #534 from stephenh/removetrycatch Remove try/catch block that can't be hit.	2013-03-19 21:34:34 -07:00
Andrey Kouznetsov	bd167f83b0	call setConf from input format if it is Configurable	2013-03-19 17:15:15 +04:00
Giovanni Delussu	aceae029f7	CoalescedRDD changed to work with a big number of partitions both in the original and the new coalesced RDD. The limitation was in the range that Scala.Int can represent.	2013-03-19 11:25:45 +01:00
Stephen Haberman	fb34967815	Remove try/catch block that can't be hit.	2013-03-18 01:55:50 -05:00
Mark Hamstra	ab33e27cc9	constructorOfA -> constructA in doc comments	2013-03-16 15:29:15 -07:00
Mark Hamstra	9784fc1fcd	fix wayward comma in doc comment	2013-03-16 15:25:02 -07:00
Mark Hamstra	32979b5e7d	whitespace	2013-03-16 13:36:46 -07:00
Mark Hamstra	ca9f81e8fc	refactor foldByKey to use combineByKey	2013-03-16 13:31:01 -07:00
Mark Hamstra	1fb192ef40	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-16 12:17:13 -07:00
Mark Hamstra	80fc8c82ed	_With[Matei]	2013-03-16 12:16:29 -07:00
Mark Hamstra	38454c4aed	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-16 11:54:44 -07:00
Matei Zaharia	c1e9cdc49f	Merge pull request #525 from stephenh/subtractByKey Add PairRDDFunctions.subtractByKey.	2013-03-16 11:47:45 -07:00
Mark Hamstra	ef75be3bf7	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-15 21:41:24 -07:00
Andrew xia	5892393140	refactor fair scheduler implementation 1.Change "pool" properties to be the member of ActiveJob 2.Abstract the Schedulable of Pool and TaskSetManager 3.Abstract the FIFO and FS comparator algorithm 4.Miscellaneous changing of class define and construction	2013-03-16 11:13:38 +08:00
Matei Zaharia	cdbfd1e196	Merge pull request #516 from squito/fix_local_metrics Fix local metrics	2013-03-15 15:13:28 -07:00
Mark Hamstra	857010392b	Fuller implementation of foldByKey	2013-03-15 10:56:05 -07:00
Mark Hamstra	16a4ca4537	restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test	2013-03-14 13:58:37 -07:00
Mark Hamstra	b1422cbdd5	added foldByKey	2013-03-14 12:59:58 -07:00
Stephen Haberman	7786881f47	Fix tabs that snuck in.	2013-03-14 14:57:12 -05:00
Stephen Haberman	7d8bb4df3a	Allow subtractByKey's other argument to have a different value type.	2013-03-14 14:44:15 -05:00
Stephen Haberman	4632c45af1	Finished subtractByKeys.	2013-03-14 10:35:34 -05:00
Matei Zaharia	4032beba49	Merge pull request #521 from stephenh/earlyclose Close the reader in HadoopRDD as soon as iteration end.	2013-03-13 19:29:46 -07:00
Stephen Haberman	63fe225587	Simplify SubtractedRDD in preparation from subtractByKey.	2013-03-13 17:17:34 -05:00
Mark Hamstra	cd5b947cf6	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-13 13:16:14 -07:00
Stephen Haberman	1a175d13b9	Add NextIterator.closeIfNeeded.	2013-03-13 10:17:39 -05:00
Stephen Haberman	8f00d23598	Remove NextIterator.close default implementation.	2013-03-12 12:30:10 -05:00
Harold Lim	0b64e5f1ac	Removed some commented code	2013-03-12 13:31:27 +08:00
Harold Lim	f5b1fecb9f	Cleaned up the code	2013-03-12 13:31:27 +08:00
Harold Lim	b5325182a3	Updated/Refactored the Fair Task Scheduler. It does not inherit ClusterScheduler anymore. Rather, ClusterScheduler internally uses TaskSetQueuesManager that handles the scheduling of taskset queues. This is the class that should be extended to support other scheduling policies	2013-03-12 13:31:27 +08:00
Harold Lim	54ed7c4af4	Changed the name of the system property to set the allocation xml	2013-03-12 13:31:27 +08:00
Harold Lim	c07087364b	Made changes to the SparkContext to have a DynamicVariable for setting local properties that can be passed down the stack. Added an implementation of the fair scheduler	2013-03-12 13:31:27 +08:00
Stephen Haberman	9e68f48625	More quickly call close in HadoopRDD. This also refactors out the common "gotNext" iterator pattern into a shared utility class.	2013-03-11 23:59:17 -05:00
Charles Reiss	769d399674	Send block sizes as longs.	2013-03-11 14:17:05 -07:00
Mark Hamstra	1289e7176b	refactored _With API and added foreachPartition	2013-03-10 22:27:13 -07:00
Mark Hamstra	b57df1f5e3	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-10 16:56:31 -07:00
Matei Zaharia	91a9d093bd	Merge pull request #512 from patelh/fix-kryo-serializer Fix reference bug in Kryo serializer, add test, update version	2013-03-10 15:48:23 -07:00
Matei Zaharia	557cfd0f4d	Merge pull request #515 from woggling/deploy-app-death Notify standalone deploy client of application death.	2013-03-10 15:44:57 -07:00
Matei Zaharia	a59cc6060f	Merge remote-tracking branch 'stephenh/nomocks' Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-03-10 13:39:10 -07:00
Imran Rashid	20f01a0a1b	enable task metrics in local mode, add tests	2013-03-09 21:17:31 -08:00
Imran Rashid	ec30188a2a	rename remoteFetchWaitTime to fetchWaitTime, since it also includes time from local fetches	2013-03-09 21:16:53 -08:00
Charles Reiss	b0983c5762	Notify standalone deploy client of application death. Usually, this isn't necessary since the application will be removed as a result of the deploy client disconnecting, but occasionally, the standalone deploy master removes an application otherwise. Also mark applications as FAILED instead of FINISHED when they are killed as a result of their executors failing too many times.	2013-03-09 11:29:45 -08:00
Hiral Patel	664e5fd24b	Fix reference bug in Kryo serializer, add test, update version	2013-03-07 22:16:11 -08:00
Mark Hamstra	5ff0810b11	refactor mapWith, flatMapWith and filterWith to each use two parameter lists	2013-03-05 12:25:44 -08:00
Mark Hamstra	d046d8ad32	whitespace formatting	2013-03-05 00:48:13 -08:00
Mark Hamstra	9148b968cf	mapWith, flatMapWith and filterWith	2013-03-04 15:48:47 -08:00
Matei Zaharia	9f0dc829cb	Fix TaskMetrics not being serializable	2013-03-04 12:08:31 -08:00
Matei Zaharia	04fb81ffe5	Merge pull request #506 from rxin/spark-706 Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-03-03 17:20:07 -08:00
Imran Rashid	0bd1d00c2a	minor cleanup based on feedback in review request	2013-03-03 16:46:45 -08:00
Imran Rashid	f1006b99ff	change CleanupIterator to CompletionIterator	2013-03-03 16:39:05 -08:00
Imran Rashid	8fef5b9c5f	refactoring of TaskMetrics	2013-03-03 16:34:04 -08:00
Imran Rashid	d36abdb053	Merge branch 'master' into stageInfo	2013-03-03 15:20:46 -08:00
Reynold Xin	44134e12bb	Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-02-28 15:14:59 -08:00
Stephen Haberman	6415c2bb60	Don't create the Executor until we have everything it needs.	2013-02-28 12:38:09 -06:00
Stephen Haberman	80eecd2cb1	Make Executor fields volatile since they're read from the thread pool.	2013-02-28 10:41:07 -06:00
Mosharaf Chowdhury	4ab387bcdb	Fixed master datastructure updates after removing an application; and a typo.	2013-02-27 13:52:44 -08:00
Matei Zaharia	ece3edfffa	Fix a problem with no hosts being counted as alive in the first job	2013-02-26 12:11:03 -08:00
Matei Zaharia	73697e2891	Fix overly large thread names in PySpark	2013-02-26 12:07:59 -08:00
Stephen Haberman	a65aa549ff	Override DAGScheduler.runLocally so we can remove the Thread.sleep.	2013-02-25 23:49:32 -06:00
Stephen Haberman	a4adeb255c	Merge branch 'master' into nomocks Conflicts: core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-02-25 23:48:52 -06:00
Tathagata Das	c02e064938	Fixed replication bug in BlockManager	2013-02-25 17:27:46 -08:00
Matei Zaharia	490f056cdd	Allow passing sparkHome and JARs to StreamingContext constructor Also warns if spark.cleaner.ttl is not set in the version where you pass your own SparkContext.	2013-02-25 15:13:30 -08:00
Matei Zaharia	568bdaf8ae	Set spark.deploy.spreadOut to true by default in 0.7 (improves locality)	2013-02-25 14:34:55 -08:00
Matei Zaharia	1ef58dadcc	Add a config property for Akka lifecycle event logging	2013-02-25 14:01:24 -08:00
Matei Zaharia	ceaec4a675	Merge pull request #498 from pwendell/shutup-akka Disable remote lifecycle logging from Akka.	2013-02-25 12:31:24 -08:00
Patrick Wendell	85a85646d9	Disable remote lifecycle logging from Akka. This changes the default setting to `off` for remote lifecycle events. When this is on, it is very chatty at the INFO level. It also prints out several ERROR messages sometimes when sc.stop() is called.	2013-02-25 12:25:43 -08:00
Imran Rashid	8f17387d97	remove bogus comment	2013-02-25 10:31:06 -08:00
Matei Zaharia	6ae9a22c3e	Get spark.default.paralellism on each call to defaultPartitioner, instead of only once, in case the user changes it across Spark uses	2013-02-25 10:28:08 -08:00
Matei Zaharia	d6e6abece3	Merge pull request #459 from stephenh/bettersplits Change defaultPartitioner to use upstream split size.	2013-02-25 09:22:04 -08:00
Stephen Haberman	c44ccf2862	Use default parallelism if its set.	2013-02-24 23:54:03 -06:00
Stephen Haberman	44032bc476	Merge branch 'master' into bettersplits Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/test/scala/spark/ShuffleSuite.scala	2013-02-24 22:08:14 -06:00
Christoph Grothaus	f39f2b7636	Incorporate feedback from mateiz: - we do not need getEnvOrEmpty - Instead of saving SPARK_NONDAEMON_JAVA_OPTS, it would be better to modify the scripts to use a different variable name for the JAVA_OPTS they do eventually use	2013-02-24 21:24:30 +01:00
Tathagata Das	dff53d1b94	Merge branch 'mesos-master' into streaming	2013-02-24 12:17:22 -08:00
Matei Zaharia	3b9f929467	Merge pull request #468 from haitaoyao/master support customized java options for Master, Worker, Executor, and Repl	2013-02-23 23:38:15 -08:00
Stephen Haberman	37c7a71f9c	Add subtract to JavaRDD, JavaDoubleRDD, and JavaPairRDD.	2013-02-24 00:27:53 -06:00
Stephen Haberman	f442e7d83c	Update for split->partition rename.	2013-02-24 00:27:14 -06:00
Stephen Haberman	cec87a0653	Merge branch 'master' into subtract	2013-02-23 23:27:55 -06:00
Tathagata Das	d853aa9658	Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs.	2013-02-23 17:42:26 -08:00
Patrick Wendell	931f439be9	Responding to code review	2013-02-23 15:40:41 -08:00
Patrick Wendell	f51b0f93f2	Adding Java-accessible methods to Vector.scala This is needed for the Strata machine learning tutorial (and also is generally helpful).	2013-02-23 13:26:59 -08:00
Matei Zaharia	d942d39072	Handle exceptions in RecordReader.close() better (suggested by Jim Donahue)	2013-02-23 11:19:07 -08:00
Matei Zaharia	c89824046a	Merge pull request #490 from woggling/conn-death Detect when SendingConnections disconnect even if we aren't sending to them	2013-02-22 22:58:19 -08:00
Charles Reiss	c8a7886921	Detect when SendingConnections drop by trying to read them. Comment fix	2013-02-22 16:11:52 -08:00
Matei Zaharia	d4d7993bf5	Several fixes to the work to log when no resources can be used by a job. Fixed some of the messages as well as code style.	2013-02-22 15:51:37 -08:00
Matei Zaharia	f33662c133	Merge remote-tracking branch 'pwendell/starvation-check' Also fixed a bug where master was offering executors on dead workers Conflicts: core/src/main/scala/spark/deploy/master/Master.scala	2013-02-22 15:27:41 -08:00
Matei Zaharia	7341de0d48	Merge pull request #475 from JoshRosen/spark-668 Remove hack workaround for SPARK-668	2013-02-22 14:56:18 -08:00
Patrick Wendell	f8c3a03d55	SPARK-702: Replace Function --> JFunction in JavaAPI Suite. In a few places the Scala (rather than Java) function class is used.	2013-02-22 12:54:15 -08:00
Imran Rashid	0f37b43b40	make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD	2013-02-21 16:56:28 -08:00
Imran Rashid	9230617f23	add cleanup iterator	2013-02-21 16:55:14 -08:00
Imran Rashid	81bd07da26	sparkListeners should be a val	2013-02-21 15:21:45 -08:00
Imran Rashid	796e934d31	add some docs & some cleanup	2013-02-21 15:19:34 -08:00
Imran Rashid	394d3acc3e	store taskInfo & metrics together in a tuple	2013-02-21 15:19:34 -08:00
Imran Rashid	7960927cf4	get rid of a bunch of boilerplate; more formatting happens in Listener, not StageInfo	2013-02-21 15:19:34 -08:00
Imran Rashid	d0bfac3eed	taskInfo tracks if a task is run on a preferred host	2013-02-21 15:19:34 -08:00
Imran Rashid	6f62a57858	add runtime breakdowns	2013-02-21 15:19:34 -08:00
Imran Rashid	176cb20703	add task result size; better formatting for time interval distributions; cleanup distribution formatting	2013-02-21 15:19:33 -08:00
Imran Rashid	f2fcabf2ea	add timing around parts of executor & track result size	2013-02-21 15:19:33 -08:00
Imran Rashid	ff127cfcd3	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-21 15:16:21 -08:00
Imran Rashid	baab23abdf	TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task	2013-02-21 14:13:01 -08:00
haitao.yao	8215b95547	Merge branch 'mesos'	2013-02-21 10:07:24 +08:00
Christoph Grothaus	85a35c6840	Fix SPARK-698. From ExecutorRunner, launch java directly instead via the run scripts.	2013-02-20 21:42:11 +01:00
Tathagata Das	334ab92441	Fixed bug in CheckpointSuite	2013-02-20 10:26:36 -08:00
Tathagata Das	1cb725e417	Merge branch 'mesos-master' into streaming	2013-02-20 09:55:35 -08:00
Tathagata Das	fb9956256d	Merge branch 'mesos-master' into streaming Conflicts: core/src/main/scala/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala	2013-02-20 09:01:29 -08:00
Matei Zaharia	05bc02e80b	Merge pull request #482 from woggling/shutdown-exceptions Don't call System.exit over uncaught exceptions from shutdown hooks	2013-02-19 20:56:15 -08:00
haitao.yao	6a3d44c673	Merge branch 'mesos'	2013-02-20 10:23:58 +08:00
Charles Reiss	092c631fa8	Pull detection of being in a shutdown hook into utility function.	2013-02-19 17:49:55 -08:00
Reynold Xin	130f704baf	Added a method to create PartitionPruningRDD.	2013-02-19 16:03:52 -08:00
Charles Reiss	d0588bd6d7	Catch/log errors deleting temp dirs	2013-02-19 13:04:06 -08:00
Charles Reiss	687581c3ec	Paranoid uncaught exception handling for exceptions during shutdown	2013-02-19 13:03:02 -08:00
haitao.yao	7c129388fb	Merge branch 'mesos'	2013-02-19 11:22:24 +08:00
Matei Zaharia	7151e1e4c8	Rename "jobs" to "applications" in the standalone cluster	2013-02-17 23:23:08 -08:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Stephen Haberman	4281e579c2	Update more javadocs.	2013-02-16 00:45:03 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
haitao.yao	a9cfac347a	Merge branch 'mesos'	2013-02-16 10:11:28 +08:00
Imran Rashid	bffee929ab	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-15 10:35:04 -08:00
Imran Rashid	893bad9089	use appid instead of frameworkid; simplify stupid condition	2013-02-13 20:30:21 -08:00
Imran Rashid	8f18e7e863	include jobid in Executor commandline args	2013-02-13 13:05:13 -08:00
Matei Zaharia	bfeed4725d	Merge pull request #465 from pwendell/java-sort-fix SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 18:23:12 -08:00
Patrick Wendell	21df6ffc13	SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 17:43:26 -08:00
Matei Zaharia	ea08537143	Fixed an exponential recursion that could happen with doCheckpoint due to lack of memoization	2013-02-11 13:23:50 -08:00
Josh Rosen	e9fb25426e	Remove hack workaround for SPARK-668. Renaming the type parameters solves this problem (see SPARK-694). I tried this fix earlier, but it didn't work because I didn't run `sbt/sbt clean` first.	2013-02-11 11:19:20 -08:00
Imran Rashid	e9f53ec0ea	undo change to onCompleteCallbacks	2013-02-11 09:36:49 -08:00
Matei Zaharia	da8afbc77e	Some bug and formatting fixes to FT Conflicts: core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:43:38 -08:00
root	1b47fa2752	Detect hard crashes of workers using a heartbeat mechanism. Also fixes some issues in the rest of the code with detecting workers this way. Conflicts: core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:28:28 -08:00
Matei Zaharia	8c66c49962	Tweak web UI so that people don't get confused about master URL format Conflicts: core/src/main/twirl/spark/deploy/master/index.scala.html core/src/main/twirl/spark/deploy/worker/index.scala.html	2013-02-10 21:58:34 -08:00
Imran Rashid	d9461b15d3	cleanup a bunch of imports	2013-02-10 21:41:40 -08:00
Tathagata Das	16baea62bc	Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.	2013-02-10 19:14:49 -08:00
Imran Rashid	383af599bb	SparkContext.addSparkListener; "std" listener in StatsReportListener	2013-02-10 14:19:37 -08:00
Imran Rashid	b7d9e24394	use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver	2013-02-10 14:18:52 -08:00
Stephen Haberman	680f42e6cd	Change defaultPartitioner to use upstream split size. Previously it used the SparkContext.defaultParallelism, which occasionally ended up being a very bad guess. Looking at upstream RDDs seems to make better use of the context. Also sorted the upstream RDDs by partition size first, as if we have a hugely-partitioned RDD and tiny-partitioned RDD, it is unlikely we want the resulting RDD to be tiny-partitioned.	2013-02-10 02:27:03 -06:00
Patrick Wendell	2ed791fd7f	Minor fixes	2013-02-09 22:00:38 -08:00
Patrick Wendell	1859c9f93c	Changing to use Timer based on code review	2013-02-09 21:55:17 -08:00
Matei Zaharia	ccb1ca4a23	Merge pull request #448 from squito/fetch_maxBytesInFlight add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-09 18:15:18 -08:00
Matei Zaharia	f750daa510	Merge pull request #452 from stephenh/misc Add RDD.coalesce, clean up some RDDs, other misc.	2013-02-09 18:12:56 -08:00
Stephen Haberman	4619ee0787	Move JavaRDDLike.coalesce into the right places.	2013-02-09 20:05:42 -06:00
Stephen Haberman	921be76533	Use stubs instead of mocks for DAGSchedulerSuite.	2013-02-09 16:42:18 -06:00
Stephen Haberman	fb7599870f	Fix JavaRDDLike.coalesce return type.	2013-02-09 16:10:52 -06:00
Stephen Haberman	2a18cd826c	Add back return types.	2013-02-09 10:12:04 -06:00
Stephen Haberman	da52b16b38	Remove RDD.coalesce default arguments.	2013-02-09 10:11:54 -06:00
Imran Rashid	04e828f7c1	general fixes to Distribution, plus some tests	2013-02-08 19:07:36 -08:00
Mark Hamstra	b8863a79d3	Merge branch 'master' of https://github.com/mesos/spark into commutative Conflicts: core/src/main/scala/spark/RDD.scala	2013-02-08 18:26:00 -08:00
Mark Hamstra	934a53c8b6	Change docs on 'reduce' since the merging of local reduces no longer preserves ordering, so the reduce function must also be commutative.	2013-02-05 22:19:58 -08:00
Stephen Haberman	a9c8d53cfa	Clean up RDDs, mainly to use getSplits. Also made sure clearDependencies() was calling super, to ensure the getSplits/getDependencies vars in the RDD base class get cleaned up.	2013-02-05 22:16:59 -06:00
Stephen Haberman	f4d43cb43e	Remove unneeded zipWithIndex. Also rename r->rdd and remove unneeded extra type info.	2013-02-05 21:26:45 -06:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Stephen Haberman	67df7f2fa2	Add private, minor formatting.	2013-02-05 21:08:21 -06:00

2632 commits