ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Aaron Davidson	da896115ec	Change BlockId filename to name + rest of Patrick's comments	2013-10-13 11:15:02 -07:00
Aaron Davidson	d60352283c	Add unit test and address rest of Reynold's comments	2013-10-12 22:45:15 -07:00
Aaron Davidson	a395911138	Refactor BlockId into an actual type This is an unfortunately invasive change which converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now: + Type safety + Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types. + Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.) + It will only get harder to make this change as time goes on. Since this touches a lot of files, it'd be best to either get this patch in quickly or throw it on the ground to avoid too many secondary merge conflicts.	2013-10-12 22:44:57 -07:00
Reynold Xin	88866ea9c9	Fixed PairRDDFunctionsSuite after removing InterruptibleRDD.	2013-10-12 20:05:23 -07:00
Reynold Xin	6b288b75d4	Job cancellation: address Matei's code review feedback.	2013-10-12 15:53:31 -07:00
Reynold Xin	97ffebbe87	Fixed dagscheduler suite because of a logging message change.	2013-10-11 16:18:22 -07:00
Reynold Xin	a61cf40ab9	Job cancellation: addressed code review feedback from Kay.	2013-10-11 15:58:14 -07:00
Reynold Xin	e2047d3927	Making takeAsync and collectAsync deterministic.	2013-10-11 13:04:45 -07:00
Reynold Xin	42fb1df694	Merge branch 'master' of github.com:apache/incubator-spark into kill Conflicts: core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala	2013-10-10 23:48:05 -07:00
Reynold Xin	d9e724e756	Fixed the broken local scheduler test.	2013-10-10 23:08:13 -07:00
Reynold Xin	37397b73ba	Added comprehensive tests for job cancellation in a variety of environments (local vs cluster, fifo vs fair).	2013-10-10 22:57:43 -07:00
Reynold Xin	80cdbf4f49	Switched to use daemon thread in executor and fixed a bug in job cancellation for fair scheduler.	2013-10-10 22:40:48 -07:00
Reynold Xin	ec2e2ed1e1	Use the same Executor in LocalScheduler as in ClusterScheduler.	2013-10-10 18:55:25 -07:00
Matei Zaharia	c71499b779	Merge pull request #19 from aarondav/master-zk Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch `d5a96fe`), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from `d5a96fe`. Additionally, places where a Master could be specified by a spark:// URL can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.	2013-10-10 17:16:42 -07:00
Matei Zaharia	001d13f7b9	Merge branch 'master' into fast-map Conflicts: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala	2013-10-10 13:26:43 -07:00
Reynold Xin	3bd2890d2b	Fixed the deadlock situation in multi-job actions and added more unit tests.	2013-10-10 12:07:09 -07:00
Prashant Sharma	34da58ae50	Changed message-frame-size to maximum-frame-size as property. Removed a test accidentally added during merge.	2013-10-10 15:13:44 +05:30
Reynold Xin	0353f74a9a	Put the job cancellation handling into the dagscheduler's main event loop.	2013-10-10 00:28:00 -07:00
Reynold Xin	dbae7795ba	Merge branch 'master' of github.com:apache/incubator-spark into kill Conflicts: core/src/main/scala/org/apache/spark/CacheManager.scala core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerSource.scala	2013-10-09 22:57:35 -07:00
Reynold Xin	53895f9cde	Implemented FutureAction, FutureJob, CancellablePromise. Implemented more unit tests for async actions.	2013-10-09 22:43:06 -07:00
Prashant Sharma	026ab75661	Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10	2013-10-10 09:42:55 +05:30
Prashant Sharma	26860639c5	Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala project/SparkBuild.scala	2013-10-10 09:42:23 +05:30
Reynold Xin	320418f7c8	Merge pull request #49 from mateiz/kryo-fix-2 Fix Chill serialization of Range objects It used to write out each element one by one, creating very large objects.	2013-10-09 16:55:30 -07:00
Matei Zaharia	c84c205289	Fix Chill serialization of Range objects, which used to write out each element, and register user and Spark classes before Chill's serializers to let them override Chill's behavior in general.	2013-10-09 16:23:40 -07:00
Kay Ousterhout	36966f65df	Style fixes	2013-10-09 15:36:34 -07:00
Kay Ousterhout	a34a4e8174	Fix race condition in SparkListenerSuite (fixes SPARK-908).	2013-10-09 15:07:53 -07:00
Matei Zaharia	4acbc5afdd	Moved files that were in the wrong directory after package rename	2013-10-08 23:16:17 -07:00
Matei Zaharia	0e40cfabf8	Fix some review comments	2013-10-08 23:16:16 -07:00
Matei Zaharia	b535db7d89	Added a fast and low-memory append-only map implementation for cogroup and parallel reduce operations	2013-10-08 23:14:38 -07:00
Prashant Sharma	7be75682b9	Merge branch 'master' into wip-merge-master Conflicts: bagel/pom.xml core/pom.xml core/src/test/scala/org/apache/spark/ui/UISuite.scala examples/pom.xml mllib/pom.xml pom.xml project/SparkBuild.scala repl/pom.xml streaming/pom.xml tools/pom.xml In scala 2.10, a shorter representation is used for naming artifacts so changed to shorter scala version for artifacts and made it a property in pom.	2013-10-08 11:29:40 +05:30
Reynold Xin	213b70a2db	Merge pull request #31 from sundeepn/branch-0.8 Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 is having a package conflict with easymock's dependencies. (cherry picked from commit `023e3fdf00`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-10-07 10:54:22 -07:00
Patrick Wendell	aa9fb84994	Merging build changes in from 0.8	2013-10-05 22:07:00 -07:00
Aaron Davidson	0f070279e7	Address Matei's comments	2013-10-05 15:15:29 -07:00
Martin Weindel	e09f4a9601	fixed some warnings	2013-10-05 23:08:23 +02:00
Prashant Sharma	3e41495288	Fixed tests, changed property akka.remote.netty.x to akka.remote.netty.tcp.x	2013-10-05 16:39:25 +05:30
Prashant Sharma	c810ee0690	Merge branch 'master' into scala-2.10 Conflicts: core/src/test/scala/org/apache/spark/DistributedSuite.scala project/SparkBuild.scala	2013-10-05 15:52:57 +05:30
Aaron Davidson	db6f154940	Fix race conditions during recovery One major change was the use of messages instead of raw functions as the parameter of Akka scheduled timers. Since messages are serialized, unlike raw functions, the behavior is easier to think about and doesn't cause race conditions when exceptions are thrown. Another change is to avoid using global pointers that might change without a lock.	2013-10-04 19:54:33 -07:00
Andre Schumacher	c84946fe21	Fixing SPARK-602: PythonPartitioner Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.	2013-10-04 11:56:47 -07:00
Reynold Xin	d29e8035a0	Added countAsync and various unit tests for async actions.	2013-10-03 15:13:44 -07:00
Reynold Xin	e8e917f209	Merge branch 'master' into kill Conflicts: core/src/main/scala/org/apache/spark/TaskEndReason.scala core/src/main/scala/org/apache/spark/executor/Executor.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-02 23:01:34 -07:00
Reynold Xin	1c48ba0d9f	Merge remote-tracking branch 'origin' into kill Conflicts: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-02 16:40:44 -07:00
Prashant Sharma	5829692885	Merge branch 'master' into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala docs/_config.yml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala	2013-10-01 11:57:24 +05:30
Kay Ousterhout	0dcad2edcb	Added additional unit test for repeated task failures	2013-09-30 23:26:15 -07:00
Kay Ousterhout	dea4677c88	Fixed compilation errors and broken test.	2013-09-30 22:07:01 -07:00
Kay Ousterhout	8deda427bc	Merge remote-tracking branch 'upstream/master' into results_through-bm Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala	2013-09-30 10:16:58 -07:00
Kay Ousterhout	58b764b7c6	Addressed Matei's code review comments	2013-09-30 10:11:59 -07:00
shane-huang	84849baf88	Merge branch 'reorgscripts' into scripts-reorg	2013-09-27 09:28:33 +08:00
Aaron Davidson	42d72308fb	Add license notices	2013-09-26 15:45:20 -07:00
Reynold Xin	70a0b993d4	Merge pull request #14 from kayousterhout/untangle_scheduler Improved organization of scheduling packages. This commit does not change any code -- only file organization. Please let me know if there was some masterminded strategy behind the existing organization that I failed to understand! There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it seems to be within the cluster package. The one thing about the scheduling code that seems a little funny to me is the naming of the SchedulerBackends. The StandaloneSchedulerBackend is not just for Standalone mode, but instead is used by Mesos coarse grained mode and Yarn, and the backend that is just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there was a reason for this naming that I'm just not aware of.	2013-09-26 14:11:54 -07:00
Patrick Wendell	6566a19b38	Merge pull request #9 from rxin/limit Smarter take/limit implementation.	2013-09-26 08:01:04 -07:00
Prashant Sharma	42f30b5590	Fixed UISuite, for case when port 4040 is already bound on machine running the test.	2013-09-26 14:38:42 +05:30
Prashant Sharma	604dc40996	Sync with master and some build fixes	2013-09-26 11:40:02 +05:30
Kay Ousterhout	d85fe41b2b	Improved organization of scheduling packages. This commit does not change any code -- only file organization. There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it is within the cluster package.	2013-09-25 12:45:46 -07:00
Reynold Xin	ff540a015b	Merge branch 'master' of github.com:markhamstra/incubator-spark	2013-09-23 11:55:02 -07:00
Kay Ousterhout	c75eb14fe5	Send Task results through the block manager when larger than Akka frame size. This change requires adding an extra failure mode: tasks can complete successfully, but the result gets lost or flushed from the block manager before it's been fetched.	2013-09-22 21:20:48 -07:00
shane-huang	dfbdc9ddb7	added spark-class and spark-executor to sbin Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-09-23 11:28:58 +08:00
Reynold Xin	a2ea069a5f	Merge pull request #937 from jerryshao/localProperties-fix Fix PR926 local properties issues in Spark Streaming like scenarios	2013-09-21 23:04:42 -07:00
Prashant Sharma	276c37a51c	Akka 2.2 migration	2013-09-22 08:20:12 +05:30
jerryshao	aa0c29f747	Add barrier for local properties unit test and fix some styles	2013-09-22 09:53:11 +08:00
Reynold Xin	42571d30d0	Smarter take/limit implementation.	2013-09-20 17:09:53 -07:00
Ankur Dave	026dba6aba	After unit tests, clear port properties unconditionally In MapOutputTrackerSuite, the "remote fetch" test sets spark.driver.port and spark.hostPort, assuming that they will be cleared by LocalSparkContext. However, the test never sets sc, so it remains null, causing LocalSparkContext to skip clearing these properties. Subsequent tests therefore fail with java.net.BindException: "Address already in use". This commit makes LocalSparkContext clear the properties even if sc is null.	2013-09-19 22:05:23 -07:00
jerryshao	ffa5f8e11d	Fix issue when local properties pass from parent to child thread	2013-09-18 17:33:24 +08:00
Reynold Xin	37d8f37a8e	Added a submitJob interface that returns a Future of the result.	2013-09-17 21:13:59 -07:00
Reynold Xin	cbc48be13b	Initial commit for job killing.	2013-09-16 18:54:06 -07:00
Prashant Sharma	383e151fd7	Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala project/SparkBuild.scala	2013-09-15 10:55:12 +05:30
Patrick Wendell	bddf135670	Change port from 3030 to 4040	2013-09-11 10:01:38 -07:00
Matei Zaharia	a85758c200	Merge pull request #907 from stephenh/document_coalesce_shuffle Add better docs for coalesce.	2013-09-09 13:45:40 -07:00
Stephen Haberman	59003d387d	Use a set since shuffle could change order.	2013-09-09 11:45:03 -05:00
Matei Zaharia	7d3204b056	Merge pull request #905 from mateiz/docs2 Job scheduling and cluster mode docs	2013-09-08 21:39:12 -07:00
Patrick Wendell	f68848d95d	Merge pull request #906 from pwendell/ganglia-sink Clean-up of Metrics Code/Docs and Add Ganglia Sink	2013-09-08 18:32:16 -07:00
Matei Zaharia	170b3869ee	Fix unit test failure due to changed default	2013-09-08 17:51:27 -07:00
Patrick Wendell	c190b48bf5	Adding more docs and some code cleanup	2013-09-08 13:46:28 -07:00
Stephen Haberman	df5fd35273	Add better docs for coalesce. Include the useful tip that if shuffle=true, coalesce can actually increase the number of partitions. This makes coalesce more like a generic `RDD.repartition` operation. (Ideally this `RDD.repartition` could automatically choose either a coalesce or a shuffle if numPartitions was either less than or greater than, respectively, the current number of partitions.)	2013-09-08 15:39:04 -05:00
Matei Zaharia	651a96adf7	More fair scheduler docs and property names. Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.	2013-09-08 00:29:11 -07:00
Matei Zaharia	98fb69822c	Work in progress: - Add job scheduling docs - Rename some fair scheduler properties - Organize intro page better - Link to Apache wiki for "contributing to Spark"	2013-09-08 00:29:11 -07:00
Reynold Xin	1e15feb5a3	Hot fix to resolve the compilation error caused by SPARK-821.	2013-09-06 22:44:05 +08:00
Prashant Sharma	4106ae9fbf	Merged with master	2013-09-06 17:53:01 +05:30
Aaron Davidson	3a04e76c89	Reynold's second round of comments	2013-09-05 21:43:26 -07:00
Aaron Davidson	4f2236a1c5	Add unit test and address comments	2013-09-05 18:06:30 -07:00
Aaron Davidson	1418d18af4	SPARK-821: Don't cache results when action run locally on driver Caching the results of local actions (e.g., rdd.first()) causes the driver to store entire partitions in its own memory, which may be highly constrained. This patch simply makes the CacheManager avoid caching the result of all locally-run computations.	2013-09-05 15:34:42 -07:00
Aaron Davidson	714e7f9e32	Fix line over 100 chars	2013-09-04 22:40:08 -07:00
Aaron Davidson	37db141aef	Address Patrick's comments	2013-09-04 21:34:20 -07:00
Aaron Davidson	9e6f2b6822	SPARK-884: Add unit test to validate Spark JSON output This unit test simply validates that the outputs of the JsonProtocol methods are syntactically valid JSON.	2013-09-04 15:26:46 -07:00
Mark Hamstra	c9bc8af3d1	Removed repetitive import; fixes hidden definition compiler warning.	2013-09-03 15:25:20 -07:00
Matei Zaharia	12b2f1f9c9	Add missing license headers found with RAT	2013-09-02 12:23:03 -07:00
Matei Zaharia	246bf67f58	Fix test	2013-09-02 10:57:34 -07:00
Matei Zaharia	0a8cc30921	Move some classes to more appropriate packages: * RDD, RDDFunctions -> org.apache.spark.rdd Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util * JavaSerializer, KryoSerializer -> org.apache.spark.serializer	2013-09-01 14:13:16 -07:00
Matei Zaharia	46eecd110a	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
Matei Zaharia	53cd50c069	Change build and run instructions to use assemblies This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.	2013-08-29 21:19:04 -07:00
Ali Ghodsi	c0942a710f	Bug in test fixed	2013-08-20 16:16:05 -07:00
Ali Ghodsi	5db41919b5	Added a test to make sure no locality preferences are ignored	2013-08-20 16:16:05 -07:00
Ali Ghodsi	7b123b3126	Simpler code	2013-08-20 16:16:05 -07:00
Ali Ghodsi	a75a64eade	Fixed almost all of Matei's feedback	2013-08-20 16:16:05 -07:00
Ali Ghodsi	f1c853d76d	fixed Matei's comments	2013-08-20 16:16:04 -07:00
Ali Ghodsi	d6b6c680be	comment in the test to make it more understandable	2013-08-20 16:16:04 -07:00
Ali Ghodsi	b69e7166ba	Coalescer now uses current preferred locations for derived RDDs. Made run() in DAGScheduler thread safe and added a method to be able to ask it for preferred locations. Added a similar method that wraps the former inside SparkContext.	2013-08-20 16:16:04 -07:00
Ali Ghodsi	3b5bb8a4ae	added one test that will test a future functionality	2013-08-20 16:13:37 -07:00
Ali Ghodsi	33a0f59354	Added error messages to the tests to make failed tests less cryptic	2013-08-20 16:13:37 -07:00
Ali Ghodsi	f24861b60a	Fix bug in tests	2013-08-20 16:13:36 -07:00
Ali Ghodsi	937f72feb8	word wrap before 100 chars per line	2013-08-20 16:13:36 -07:00
Ali Ghodsi	7a2a33e32d	Large scale load and locality tests for the coalesced partitions added	2013-08-20 16:13:36 -07:00
Ali Ghodsi	1ede102ba5	load balancing coalescer	2013-08-20 16:13:36 -07:00
Matei Zaharia	8cae72e94e	Merge pull request #828 from mateiz/sched-improvements Scheduler fixes and improvements	2013-08-19 23:40:04 -07:00
Matei Zaharia	efeb142981	Merge pull request #849 from mateiz/web-fixes Small fixes to web UI	2013-08-19 19:23:50 -07:00
Matei Zaharia	793a722f8e	Allow some wiggle room in UISuite port test and in EC2 ports	2013-08-19 18:51:00 -07:00
Matei Zaharia	498a26189b	Small fixes to web UI: - Use SPARK_PUBLIC_DNS environment variable if set (for EC2) - Use a non-ephemeral port (3030 instead of 33000) by default - Updated test to use non-ephemeral port too	2013-08-19 18:17:49 -07:00
Reynold Xin	5054abd41b	Code review feedback. (added tests for cogroup and subtract; added more documentation on MutablePair)	2013-08-19 12:58:02 -07:00
Reynold Xin	acc4aa1f47	Added a test for sorting using MutablePair's.	2013-08-19 11:02:10 -07:00
Reynold Xin	71d705a66e	Made PairRDDFunctions taking only Tuple2, but made the rest of the shuffle code path working with general Product2.	2013-08-19 00:40:43 -07:00
Matei Zaharia	8ac3d1e263	Added unit tests for ClusterTaskSetManager, and fix a bug found with resetting locality level after a non-local launch	2013-08-18 19:51:07 -07:00
Matei Zaharia	2a4ed10210	Address some review comments: - When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers - Simplify ClusterScheduler.prioritizeContainers - Add docs on the new configuration options	2013-08-18 19:51:07 -07:00
Matei Zaharia	222c897128	Comment cleanup (via Kay) and some debug messages	2013-08-18 19:51:07 -07:00
Matei Zaharia	90a04dab8d	Initial work towards scheduler refactoring: - Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort based data structures and just use executorID, since the hostPort vs host stuff is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos. - Replaced most hostPort-based data structures and fields as above. - Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise. - Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name. - Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This change didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later. - Renamed MapOutputTracker "generations" to "epochs".	2013-08-18 19:51:06 -07:00
Reynold Xin	2c00ea3efc	Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.	2013-08-17 21:43:29 -07:00
Matei Zaharia	e89ffc7b3c	Merge pull request #839 from jegonzal/zip_partitions Currying RDD.zipPartitions	2013-08-16 14:02:34 -07:00
Joseph E. Gonzalez	53b2639a1e	Reversing the argument order in zipPartitions to enable stronger type inference.	2013-08-16 12:38:59 -07:00
Patrick Wendell	659553b21d	Merge pull request #836 from pwendell/rename Rename `memoryBytesToString` and `memoryMegabytesToString`	2013-08-15 16:56:31 -07:00
Patrick Wendell	4c6ade1ad5	Rename `memoryBytesToString` and `memoryMegabytesToString` These are used all over the place now and they are not specific to memory at all. memoryBytesToString --> bytesToString memoryMegabytesToString --> megabytesToString	2013-08-15 15:58:07 -07:00
Reynold Xin	3886b54933	A few small scheduler / job description changes. 1. Renamed SparkContext.addLocalProperty to setLocalProperty. And allow this function to unset a property. 2. Renamed SparkContext.setDescription to setCurrentJobDescription. 3. Throw an exception if the fair scheduler allocation file is invalid.	2013-08-14 17:19:42 -07:00
Patrick Wendell	ed6a1646e6	Slight change to pr-784	2013-08-13 09:29:40 -07:00
Patrick Wendell	a0133bfbad	Merge pull request #784 from jerryshao/dev-metrics-servlet Add MetricsServlet for Spark metrics system	2013-08-13 09:28:18 -07:00
jerryshao	09c7179e81	MetricsServlet code refactor according to comments	2013-08-12 13:23:23 +08:00
jerryshao	320e87e7ab	Add MetricsServlet for Spark metrics system	2013-08-12 13:23:23 +08:00
Josh Rosen	d7f78b443b	Change scala.Option to Guava Optional in Java APIs.	2013-08-11 12:05:09 -07:00
Matei Zaharia	d1e1c1b24d	Add test for Kryo with WrappedArray (which was failing in Chill 0.3.0)	2013-08-08 13:34:11 -07:00
Matei Zaharia	6b043a6f11	Merge pull request #724 from dlyubimov/SPARK-826 SPARK-826: fold(), reduce(), collect() always attempt to use java serialization	2013-08-06 22:31:02 -07:00
Patrick Wendell	5b3784a79c	Show user-defined job name in UI	2013-08-02 15:47:41 -07:00
Patrick Wendell	5e7b38fbb3	Merge pull request #695 from xiajunluan/pool_ui Enhance job ui in spark ui system with adding pool information	2013-08-01 14:59:33 -07:00
Dmitriy Lyubimov	d29ee3689b	Merge fixes merge commit hasn't picked	2013-08-01 00:21:26 -07:00
Dmitriy Lyubimov	cb6be5bd7e	Merge remote-tracking branch 'mesos/master' into SPARK-826 Conflicts: core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala core/src/test/scala/spark/KryoSerializerSuite.scala	2013-07-31 22:09:22 -07:00
Dmitriy Lyubimov	28f1550f01	More elegant rewrite of the same.	2013-07-31 21:41:00 -07:00
Dmitriy Lyubimov	7c52ecc6a4	(1) added reduce test case. (2) added nested streaming in ParallelCollectionRDD (3) added kryo with fold test which still doesn't work	2013-07-31 19:27:30 -07:00
Andrew xia	5670c96f29	Merge branch 'master' into Pool_UI Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/scheduler/SparkListener.scala core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala core/src/main/scala/spark/ui/jobs/IndexPage.scala core/src/main/scala/spark/ui/jobs/JobProgressUI.scala	2013-07-31 19:36:36 +08:00
Reynold Xin	98024eadc3	Renamed compressionOutputStream and compressionInputStream to compressedOutputStream and compressedInputStream.	2013-07-30 18:28:46 -07:00
Reynold Xin	56774b176e	Added unit test for compression codecs.	2013-07-30 17:12:33 -07:00
Reynold Xin	ad7e9d0d64	CompressionCodec cleanup. Moved it to spark.io package.	2013-07-30 17:11:54 -07:00
Dmitriy Lyubimov	13a9d66645	adding ===	2013-07-30 16:10:55 -07:00
Dmitriy Lyubimov	1bca91633e	+ bug fixes; test added Conflicts: core/src/test/scala/spark/KryoSerializerSuite.scala	2013-07-30 11:04:11 -07:00
Dmitriy Lyubimov	23f3e0f117	mixing in SharedSparkContext for the kryo-collect test	2013-07-26 19:15:11 -07:00
Reynold Xin	cb366774c8	Merge pull request #738 from harsha2010/pruning Fix bug in Partition Pruning.	2013-07-26 16:59:30 -07:00
harshars	392d7474fd	Code review	2013-07-26 15:23:15 -07:00
harshars	822aac8f5a	Indentation	2013-07-26 15:10:32 -07:00
harshars	743fc4e7aa	Fix Bug in Partition Pruning, index of Pruned Partitions should inherit from parent	2013-07-26 14:35:17 -07:00
ryanlecompte	fc4b025314	add test	2013-07-24 20:53:15 -07:00
ryanlecompte	a1c515fb02	add copyright back in	2013-07-24 20:50:32 -07:00
ryanlecompte	8e0939f5a9	refactor Kryo serializer support to use chill/chill-java	2013-07-24 20:43:57 -07:00
jerryshao	31ec72b243	Code refactor according to comments	2013-07-24 14:57:47 +08:00
Andrew xia	05637de842	Change class xxxInstrumentation to class xxxSource	2013-07-24 14:57:47 +08:00
Andrew xia	ed1a3bc206	continue to refactor code style and functions	2013-07-24 14:57:47 +08:00
Andrew xia	9cea0c2818	Refactor metricsSystem unit test, add resource files.	2013-07-24 14:57:47 +08:00
jerryshao	e080588f73	Add metrics system unit test	2013-07-24 14:57:47 +08:00
Matei Zaharia	b011329040	Merge pull request #727 from rxin/scheduler Scheduler code style cleanup.	2013-07-23 22:50:09 -07:00
Reynold Xin	3dae1df66f	Moved non-serializable closure catching exception from submitStage to submitMissingTasks	2013-07-23 20:29:07 -07:00
Reynold Xin	85ab8114bc	Moved non-serializable closure catching exception from submitStage to submitMissingTasks	2013-07-23 20:25:58 -07:00
Reynold Xin	f2422d4f29	SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure.	2013-07-23 15:30:20 -07:00
Reynold Xin	5ed38b4d1d	Scheduler code style cleanup.	2013-07-23 15:28:59 -07:00
Reynold Xin	101b8cc78a	SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure.	2013-07-23 15:28:20 -07:00
Dmitriy Lyubimov	72bac09c42	Leaking spark context in the test	2013-07-23 15:19:07 -07:00
Dmitriy Lyubimov	ef82ff8564	Merge branch 'master' into SPARK-826 Conflicts: core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-07-23 13:43:00 -07:00
Matei Zaharia	f369e0e51b	Merge pull request #720 from ooyala/2013-07/persistent-rdds-api Add a public method getCachedRdds to SparkContext	2013-07-23 13:22:27 -07:00
Evan Chan	efd6418c1b	Move getPersistentRDDs testing to a new Suite	2013-07-23 10:40:41 -07:00
Matei Zaharia	ea1cfabfdd	Merge branch 'master' of github.com:mesos/spark	2013-07-22 16:22:02 -07:00
Matei Zaharia	8e38e77232	Fix a test that was using an outdated config setting	2013-07-22 16:05:32 -07:00
Dmitriy Lyubimov	b4b230e606	Fixing for LocalScheduler with test, that much works ..	2013-07-22 14:42:47 -07:00
Josh Rosen	f649dabb4a	Fix bug: DoubleRDDFunctions.sampleStdev() computed non-sample stdev(). Update JavaDoubleRDD to add new methods and docs. Fixes SPARK-825.	2013-07-22 13:21:48 -07:00
Evan Chan	0337d88321	Add a public method getCachedRdds to SparkContext	2013-07-21 18:26:14 -07:00
Matei Zaharia	af3c9d5042	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
Matei Zaharia	b1f9f64743	Merge branch 'master' of github.com:mesos/spark	2013-07-16 11:01:53 -07:00
Matei Zaharia	5c388808a8	SPARK-814: Result stages should be named after action	2013-07-16 11:01:14 -07:00
Matei Zaharia	f347cc3f65	Fix deprecation warning and style issues	2013-07-16 10:53:30 -07:00
Reynold Xin	69316603d6	Throw a more meaningful message when runJob is called to launch tasks on non-existent partitions.	2013-07-15 22:50:11 -07:00
Prashant Sharma	b59152a7c3	Changed to master version of the test, messed up during merge.	2013-07-15 12:09:17 +05:30
Prashant Sharma	a3494d405d	Merge branch 'master' of github.com:mesos/spark into scala-2.10 Conflicts: core/src/main/scala/spark/Utils.scala core/src/test/scala/spark/ui/UISuite.scala project/SparkBuild.scala run	2013-07-15 11:15:55 +05:30
Matei Zaharia	5a7835c152	Merge pull request #691 from karenfeng/logpaging Create log pages	2013-07-12 20:28:21 -07:00
Karen Feng	73984b96a8	Removed unit test of nonexistent function Utils.lastNBytes	2013-07-12 14:26:56 -07:00
Prashant Sharma	e86d5dbaad	Merge branch 'master' into master-merge Conflicts: README.md core/pom.xml core/src/main/scala/spark/deploy/JsonProtocol.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml project/SparkBuild.scala streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala	2013-07-12 14:49:16 +05:30
Andrew xia	2080e25006	Enhance job ui in spark ui system with adding pool information	2013-07-12 14:25:18 +08:00
seanm	a2c915fba8	giving order to top and making tests more clear	2013-07-11 18:55:00 -07:00
Karen Feng	f5f3b272f8	Fixed mixup of start/end, moved more import files	2013-07-10 14:52:29 -07:00
Karen Feng	dbe948d9a2	Moved appropriate import files from UISuite to UtilsSuite	2013-07-10 14:15:41 -07:00
Karen Feng	5f8a20b4a8	Moved unit tests for Utils from UISuite to UtilsSuite	2013-07-10 13:53:39 -07:00
Karen Feng	04263e4d46	Made some minor style changes	2013-07-10 13:15:42 -07:00
Karen Feng	cfb6447ac4	Fixed for nonexistent bytes, added unit tests, changed stdout-page to stdout	2013-07-10 11:47:57 -07:00
seanm	24705d0f46	adding takeOrdered() to RDD	2013-07-10 10:33:11 -07:00
Shivaram Venkataraman	d362d0f411	Ignore stderr when calling cat on a non-existing file	2013-07-07 04:09:46 -07:00
Shivaram Venkataraman	7d6d9e6ab2	Set DriverSuite log level to WARN	2013-07-07 04:09:15 -07:00
Shivaram Venkataraman	a948f06725	Suppress log messages in sbt test with two changes: 1. Set akka log level to ERROR before shutting down the actorSystem. This avoids akka log messages (like Spray) from falling back to INFO on the Stdout logger 2. Initialize netty to use SLF4J in LocalSparkContext. This ensures that stack trace thrown during shutdown is handled by SLF4J instead of stdout	2013-07-07 04:09:08 -07:00
Matei Zaharia	1ffadb2d9e	Merge remote-tracking branch 'pwendell/ui-updates' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml	2013-07-06 15:51:41 -07:00
Matei Zaharia	2a36e5449b	Merge pull request #673 from xiajunluan/master Add config template file for fair scheduler feature	2013-07-06 12:43:21 -07:00
Matei Zaharia	399bd65ef5	Fixed compile error due to merge	2013-07-05 11:27:06 -07:00
Matei Zaharia	652ea0f1d8	Allow RDD.takeSample to give samples bigger than the RDD Before, when withReplacement was set to true, we would not get a sample bigger than the RDD's count(). Conflicts: core/src/main/scala/spark/RDD.scala core/src/test/scala/spark/RDDSuite.scala	2013-07-05 11:15:13 -07:00
Andrew xia	6ccfb73ca9	Add fair scheduler config template file	2013-07-04 19:19:44 +08:00
Prashant Sharma	a5f1f6a907	Merge branch 'master' into master-merge Conflicts: core/pom.xml core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/RDDCheckpointData.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/api/python/PythonRDD.scala core/src/main/scala/spark/deploy/client/Client.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/ZippedRDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/BlockManagerMasterActor.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala core/src/test/scala/spark/SizeEstimatorSuite.scala pom.xml project/SparkBuild.scala repl/src/main/scala/spark/repl/SparkILoop.scala repl/src/test/scala/spark/repl/ReplSuite.scala streaming/src/main/scala/spark/streaming/StreamingContext.scala streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala	2013-07-03 11:43:26 +05:30
Patrick Wendell	8ca1cc1786	Adding truncation for log files	2013-07-02 16:10:50 -07:00
Patrick Wendell	8688689387	Various formatting changes	2013-07-01 13:40:12 -07:00
Matei Zaharia	50ca17635a	Merge pull request #664 from pwendell/test-fix Removing incorrect test statement	2013-06-27 22:24:52 -07:00
Patrick Wendell	c767e74370	Removing incorrect test statement	2013-06-27 21:48:58 -07:00
Patrick Wendell	362d996c81	Handful of changes based on matei's review - Avoid exception when no tasks have finished for a stage - Adding DOCTYPE so css renders properly - Adding progress slider	2013-06-27 19:14:28 -07:00
Patrick Wendell	2cbaa0734b	Making all new classes package private	2013-06-26 08:44:55 -07:00
Matei Zaharia	9f0d913295	Refactored tests to share SparkContexts in some of them Creating these seems to take a while and clutters the output with Akka stuff, so it would be nice to share them.	2013-06-25 19:18:30 -04:00

751 commits