Commit graph

314 commits

Author SHA1 Message Date
Matei Zaharia c2c7299d7a Added BlockManagerSuite, which I'd forgotten to merge. 2012-06-07 13:47:10 -07:00
Matei Zaharia 63051dd2bc Merge in engine improvements from the Spark Streaming project, developed
jointly with Tathagata Das and Haoyuan Li. This commit imports the changes
and ports them to Mesos 0.9, but does not yet pass unit tests due to
various classes not supporting a graceful stop() yet.
2012-06-07 12:45:38 -07:00
Matei Zaharia 7e1c97fc4b Merge branch 'master' into mesos-0.9 2012-06-06 16:48:59 -07:00
Matei Zaharia 048276799a Commit task outputs to Hadoop-supported storage systems in parallel on the
cluster instead of on the master. Fixes #110.
2012-06-06 16:46:53 -07:00
Matei Zaharia 6888bc7191 Merge branch 'master' into mesos-0.9 2012-06-06 16:14:19 -07:00
Matei Zaharia 6ae2746d1e Better handle arrays that contain the same element many times in
SizeEstimator. Also added a test for SizeEstimator. Fixes #136.
2012-06-06 16:13:02 -07:00
Matei Zaharia 0a617958d1 Some refactoring to make BoundedMemoryCache test similar to others 2012-06-06 16:12:08 -07:00
Matei Zaharia dbc3c86ae3 Merge branch 'master' into mesos-0.9
Conflicts:
	core/src/main/scala/spark/Executor.scala
2012-06-03 17:44:04 -07:00
Matei Zaharia e141f644ca Merge pull request #132 from Benky/rb-first-iteration
Little refactoring and unit tests for CacheTrackerActor
2012-05-26 13:15:06 -07:00
Richard Benkovsky ae64920337 MesosScheduler refactoring 2012-05-22 11:04:54 +02:00
Richard Benkovsky 3a1bcd4028 Added tests for CacheTrackerActor 2012-05-22 11:04:54 +02:00
Richard Benkovsky 8f2f736d53 Little refactoring 2012-05-22 11:04:54 +02:00
Richard Benkovsky 518506a7c5 Added tests for Utils.copyStream 2012-05-22 11:04:51 +02:00
Richard Benkovsky f162fc2beb Formatting fixed 2012-05-22 09:45:38 +02:00
Richard Benkovsky 565245871f BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity 2012-05-20 22:13:35 +02:00
Richard Benkovsky 822a4be37d Utils.memoryBytesToString fixed 2012-05-19 15:13:20 +02:00
Reynold Xin d0c6e9f639 Made some RDD dependencies transient to reduce the amount of data needed
to be serialized in closure serialization. This can significantly reduce
the task setup time in Shark when the query involves a large number of
(Hive) partitions.
2012-05-16 14:16:55 -07:00
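
The mechanism behind the commit above is plain Java serialization: fields marked @transient are skipped when an object captured by a closure is serialized. A minimal, Spark-free sketch (class and field names are illustrative):

    import java.io._

    // A @transient field is skipped by Java serialization, so closures that
    // capture a Holder do not drag the large array along with them.
    class Holder(@transient val big: Array[Byte], val name: String) extends Serializable

    object TransientDemo {
      def main(args: Array[String]): Unit = {
        val h = new Holder(new Array[Byte](10 * 1024 * 1024), "partition-0")
        val buf = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buf)
        out.writeObject(h)
        out.close()
        // The serialized form stays tiny because `big` was never written.
        println(s"serialized size: ${buf.size()} bytes")
      }
    }
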
Reynold Xin 16461e2eda Updated Cache's put method to use a case class for response. Previously
it was pretty ugly that put() returned -1 for failures.
2012-05-15 00:31:52 -07:00
Reynold Xin 019e48833f Added the ability to report cache usage status back to the cache
tracker. This is essential for building a dashboard to see the status of
caches on all slaves.
2012-05-14 18:39:04 -07:00
Matei Zaharia f48742683a Made caches dataset-aware so that they won't cyclically evict partitions
from the same dataset.
2012-05-06 20:14:40 -07:00
Matei Zaharia bd2ab635a7 Fixed the way the JAR server is created after finding issue at Twitter 2012-05-05 20:05:15 -07:00
Matei Zaharia 32a4f4623c Merge pull request #129 from mesos/rxin
Force serialize/deserialize task results in local execution mode.
2012-04-24 16:18:39 -07:00
Reynold Xin 761ea65a98 Added a test for the previous commit (failing to serialize task results
would throw an exception for local tasks).
2012-04-24 15:14:35 -07:00
Reynold Xin 9821cd4d42 Force serialize/deserialize task results in local execution mode. 2012-04-24 14:55:28 -07:00
Antonio 3e48818993 Removed commented-out System.exit call 2012-04-23 11:42:58 -07:00
Antonio 39d99168dc Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions 2012-04-20 14:46:43 -07:00
Reynold Xin e601b3b9e5 Added the ability to set environment variables in piped RDDs. 2012-04-17 16:40:56 -07:00
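
A hedged sketch of what the piped-RDD feature above enables; the pipe(command, env) signature and the spark package name are assumed from contemporary Spark, and the shell command is illustrative:

    import spark.SparkContext

    object PipeEnvExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "PipeEnvExample")
        val lines = sc.parallelize(Seq("a", "b", "c"))
        // The external command sees GREETING in its environment and receives
        // the partition's elements on stdin.
        val piped = lines.pipe(Seq("/bin/sh", "-c", "echo $GREETING $(cat)"),
                               Map("GREETING" -> "hello"))
        println(piped.collect().toList)
        sc.stop()
      }
    }
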
Matei Zaharia 3b745176e0 Bug fix to pluggable closure serialization change 2012-04-12 17:53:02 +00:00
Matei Zaharia 112655f032 Merge pull request #121 from rxin/kryo-closure
Added an option (spark.closure.serializer) to specify the serializer for closures.
2012-04-10 14:21:02 -07:00
Reynold Xin d295ccb43c Added a closureSerializer field in SparkEnv and use it to serialize
tasks.
2012-04-10 13:29:46 -07:00
Reynold Xin 968f75f6af Added an option (spark.closure.serializer) to specify the serializer for
closures. This enables using Kryo as the closure serializer.
2012-04-09 21:59:56 -07:00
Matei Zaharia a69c0738d1 Merge branch 'master' into mesos-0.9 2012-04-08 23:41:36 -07:00
Matei Zaharia a633974143 Merge branch 'master' of github.com:mesos/spark 2012-04-08 23:41:25 -07:00
Matei Zaharia 0229d5390f Merge branch 'master' into mesos-0.9 2012-04-08 23:39:37 -07:00
Matei Zaharia d401e1b3e8 Fix a possible deadlock in MesosScheduler 2012-04-08 23:38:49 -07:00
Ankur Dave 7be1c7b331 Report entry dropping in BoundedMemoryCache 2012-04-06 15:49:32 -07:00
Matei Zaharia a8bb324ed9 Merge branch 'master' into mesos-0.9 2012-04-05 14:53:22 -07:00
Matei Zaharia 816d4e5840 Pass local IP address instead of hostname in spark.master.host. Fixes #117. 2012-04-05 14:53:17 -07:00
Matei Zaharia 335a6036ad Converted some tabs to spaces 2012-04-05 11:58:01 -07:00
Matei Zaharia 8c95a85438 Use Runtime.maxMemory instead of Runtime.totalMemory in
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:39:35 -04:00
Matei Zaharia 03d5b3b48d Use Runtime.maxMemory instead of Runtime.totalMemory in
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:38:19 -04:00
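
The distinction behind the two commits above, shown with plain JVM calls: totalMemory() only reflects the heap committed so far (it starts at -Xms and grows), while maxMemory() is the -Xmx ceiling that a memory-bounded cache should size itself against.

    object HeapBoundsDemo {
      def main(args: Array[String]): Unit = {
        val rt = Runtime.getRuntime
        println(s"totalMemory = ${rt.totalMemory()} bytes (heap committed so far)")
        println(s"maxMemory   = ${rt.maxMemory()} bytes (upper bound set by -Xmx)")
      }
    }
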
Matei Zaharia 95fb1a16b8 Use Mesos 0.9 RC3 JAR and protobuf 2.4.1 2012-03-30 11:38:49 -04:00
Matei Zaharia dfa3b6b544 Fixes to work with the very latest Mesos 0.9 API 2012-03-29 22:12:35 -04:00
Matei Zaharia 4d52cc6738 Merge branch 'master' into mesos-0.9 2012-03-29 21:29:39 -04:00
Reynold Xin 42dcdbcb2f Removed the extra spaces in OrderedRDDFunctions and SortedRDD. 2012-03-29 15:21:57 -07:00
Matei Zaharia 08cda89e8a Further fixes to how Mesos is found and used 2012-03-17 13:39:14 -07:00
Matei Zaharia 3c3fdf6eca Merge branch 'master' into mesos-0.9 2012-03-17 13:09:21 -07:00
Matei Zaharia c7af538ac1 Some fixes to sorting for when the RDD has fewer elements than the
number of partitions we ask to partition it into. Also, removed a test
that was taking way too long to run.
2012-03-17 13:08:36 -07:00
Matei Zaharia a099a63a8a Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0 2012-03-17 12:31:34 -07:00
Matei Zaharia a5e2b6a6bd Merge pull request #112 from cengle/master
Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection
2012-03-06 13:38:32 -08:00
Matei Zaharia 97eee50825 Fixes a nasty bug that could happen when tasks fail, because calling
wait() with a timeout of 0 on a Java object means "wait forever".
2012-03-01 13:43:17 -08:00
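
The pitfall fixed above, in isolation: Object.wait(0) means "wait with no timeout", i.e. potentially forever, not "return immediately", so a computed timeout needs a guard. Illustrative sketch:

    object WaitZeroDemo {
      def main(args: Array[String]): Unit = {
        val lock = new Object
        val remainingMillis = 0L            // e.g. deadline minus current time
        lock.synchronized {
          if (remainingMillis > 0) {
            lock.wait(remainingMillis)      // only wait when a real timeout remains
          }
        }
        println("did not block forever")
      }
    }
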
Cliff Engle dd68cb6099 Get key and value container from RecordReader 2012-02-29 16:33:23 -08:00
Matei Zaharia 1e10df0a46 Merge pull request #111 from alupher/master
Adding sorting to RDDs
2012-02-24 15:50:14 -08:00
Antonio 0d93d95bcf Removed unnecessary import 2012-02-21 19:57:12 -08:00
Antonio 2990298f71 Added sorting testing suite 2012-02-21 19:54:21 -08:00
Matei Zaharia aa04f87cd2 Added support for parallel execution of jobs in DAGScheduler. 2012-02-19 22:50:23 -08:00
Antonio 620798161b Added fixes to sorting 2012-02-13 00:07:39 -08:00
Matei Zaharia 2587ce1690 Fixed a deadlock that occurred with MesosScheduler due to an earlier
synchronization change
2012-02-11 21:22:45 -08:00
Antonio e93f622665 Added sorting by key for pair RDDs 2012-02-11 00:56:28 -08:00
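
A hedged sketch of the pair-RDD sorting described above; the sortByKey name and the implicit-conversion import are assumed from contemporary Spark:

    import spark.SparkContext
    import spark.SparkContext._

    object SortByKeyExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "SortByKeyExample")
        val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
        println(pairs.sortByKey().collect().toList)  // List((a,1), (b,2), (c,3))
        sc.stop()
      }
    }
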
Matei Zaharia 98f008b721 Formatting fixes 2012-02-10 10:52:03 -08:00
Matei Zaharia 7660a8b12f Merge branch 'formatting'
Conflicts:
	core/src/main/scala/spark/DAGScheduler.scala
	core/src/main/scala/spark/SimpleShuffleFetcher.scala
	core/src/main/scala/spark/SparkContext.scala
2012-02-10 10:42:14 -08:00
haoyuan 194c42ab79 Code formatting. 2012-02-10 08:19:53 -08:00
Matei Zaharia 8f5ed51234 Delete Spark's temporary directories when the JVM exits. 2012-02-09 22:58:24 -08:00
Matei Zaharia c0a0df3285 Made the default cache BoundedMemoryCache, and reduced its default size 2012-02-09 22:32:02 -08:00
Matei Zaharia a766780f4c Added some tests for multithreaded access to Spark. 2012-02-09 22:27:53 -08:00
Matei Zaharia 0e93891d3d Replaced LocalFileShuffle with a non-singleton ShuffleManager class
and made DAGScheduler automatically set SparkEnv.
2012-02-09 22:14:56 -08:00
haoyuan 445e0bb1b5 Format the code a bit more. 2012-02-09 15:50:26 -08:00
haoyuan 651932e703 Format the code according to the coding style agreed on by Matei/TD/Haoyuan 2012-02-09 13:26:23 -08:00
Matei Zaharia e02dc83a5b IO optimizations 2012-02-06 20:40:39 -08:00
Matei Zaharia c40e766368 Use java.util.HashMap in shuffles 2012-02-06 19:20:25 -08:00
Matei Zaharia b267175ab5 Synchronization fix in case SparkContext is used from multiple threads. 2012-02-06 14:28:18 -08:00
Matei Zaharia 43a3335090 Simplifying test 2012-02-05 22:46:51 -08:00
Hiral Patel b47952342e Register immutable Map with the Kryo serializer 2012-01-26 15:24:20 -08:00
Matei Zaharia fabcc82528 Merge pull request #103 from edisontung/master
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia fd5581a0d3 Fixed a failure recovery bug and added some tests for fault recovery. 2012-01-13 19:17:27 -08:00
Matei Zaharia eb05154b7a Fixed a failure recovery bug and added some tests for fault recovery. 2012-01-13 19:08:25 -08:00
Edison Tung 1ecc221f84 Fixed bugs
I've fixed the bugs detailed in the diff. One of the bugs was already
fixed on the local file (forgot to commit).
2012-01-09 11:59:52 -08:00
Matei Zaharia e269f6f7ea Register RDDs with the MapOutputTracker even if they have no partitions.
Fixes #105.
2012-01-05 15:59:20 -05:00
Matei Zaharia 3034fc0d91 Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16' 2011-12-14 18:19:43 +01:00
Matei Zaharia 6a650cbbdf Make Spark port default to 7077 so that it's not an ephemeral port that might be taken 2011-12-14 18:18:22 +01:00
Matei Zaharia 735843a049 Merge remote-tracking branch 'origin/charles-newhadoop' 2011-12-02 21:59:30 -08:00
Charles Reiss 66f05f383e Add new Hadoop API reading support. 2011-12-01 14:02:10 -08:00
Charles Reiss 02d43e6986 Add new Hadoop API writing support. 2011-12-01 14:01:28 -08:00
Edison Tung 42f8847a21 Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD 2011-12-01 13:43:25 -08:00
Edison Tung de01b6deaa Fixed bug in RDD
Math.min takes 2 args, not 1. This was not committed earlier for some
reason
2011-12-01 13:34:37 -08:00
Matei Zaharia 22b8fcf632 Added fold() and aggregate() operations that reuse an object to
merge results into rather than requiring a new object allocation
for each element merged. Fixes #95.
2011-11-30 11:37:47 -08:00
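
A sketch of the two operations named above, assuming the RDD.fold(zero)(op) and RDD.aggregate(zero)(seqOp, combOp) signatures; both take a zero value that is reused as the merge buffer rather than allocating a new object per element:

    import spark.SparkContext

    object FoldAggregateExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "FoldAggregateExample")
        val nums = sc.parallelize(1 to 100)
        val sum = nums.fold(0)(_ + _)
        // aggregate: zero value, fold an element into an accumulator, merge accumulators
        val (total, count) = nums.aggregate((0, 0))(
          (acc, n) => (acc._1 + n, acc._2 + 1),
          (a, b) => (a._1 + b._1, a._2 + b._2))
        println(s"sum=$sum mean=${total.toDouble / count}")
        sc.stop()
      }
    }
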
Matei Zaharia 09dd58b3a7 Send SPARK_JAVA_OPTS to slave nodes. 2011-11-30 11:34:58 -08:00
Edison Tung a3bc012af8 added takeSamples method
takeSamples method takes a specified number of samples from the RDD and
returns them in an array.
2011-11-21 16:38:44 -08:00
Ankur Dave ad4ebff42c Deduplicate exceptions when printing them
The first time they appear, exceptions are printed in full, including
a stack trace. After that, they are printed in abbreviated form. They
are periodically reprinted in full; the reprint interval defaults to 5
seconds and is configurable using the property
spark.logging.exceptionPrintInterval.
2011-11-14 01:54:53 +00:00
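
The reprint interval above is controlled by an ordinary JVM system property; a minimal sketch (the value shown and its units, assumed to be milliseconds, are illustrative only):

    object ExceptionPrintIntervalConfig {
      def main(args: Array[String]): Unit = {
        // Illustrative value; units assumed to be milliseconds.
        System.setProperty("spark.logging.exceptionPrintInterval", "10000")
      }
    }
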
Ankur Dave 35b6358a7c Report errors in tasks to the driver via a Mesos status update
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.

Here's what the reporting currently looks like:

    # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
    [...]
    11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
    11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
    [...]
    11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia 07532021fe Bug fix: reject offers that we didn't find any tasks for 2011-11-08 23:05:54 -08:00
Matei Zaharia 9e4c79a4d3 Closure cleaner unit test 2011-11-08 00:40:15 -08:00
Matei Zaharia f346e64637 Updates to the closure cleaner to work better with closures in classes.
Before, the cleaner attempted to clone $outer objects that were classes
(as opposed to nested closures) and preserve only their used fields,
which was bad because it would miss fields that are accessed indirectly
by methods, and in general it would confuse user code. Now we keep a
reference to those objects without cloning them. This is not perfect
because the user still needs to be careful of what they'll carry along
into closures, but it works better in some cases that seemed confusing
before. We need to improve the documentation on what variables get
passed along with a closure and possibly add some debugging tools for it
as well.

Fixes #71 -- that code now works in the REPL.
2011-11-08 00:33:28 -08:00
Matei Zaharia c2b7fd6899 Make parallelize() work efficiently for ranges of Long, Double, etc
(splitting them into sub-ranges). Fixes #87.
2011-11-02 15:16:02 -07:00
Matei Zaharia 157279e9eb Update Spark to work with the latest Mesos API 2011-10-30 14:10:56 -07:00
root 3a0e6c4363 Miscellaneous fixes:
- Executor should initialize logging properly
- groupByKey should allow custom partitioner
2011-10-17 18:07:35 +00:00
root 62aa820084 Merge branch 'ankur-master' 2011-10-14 02:14:07 +00:00
Ankur Dave 2d7057bf5d Implement PairRDDFunctions.partitionBy 2011-10-09 15:52:09 -07:00
Ankur Dave 06637cb69e Fix PairRDDFunctions.groupWith partitioning
This commit fixes a bug in groupWith that was causing it to destroy
partitioning information. It replaces a call to map with a call to
mapValues, which preserves partitioning.
2011-10-09 15:48:46 -07:00
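
The difference the fix above relies on, in a hedged sketch (spark.HashPartitioner and the rdd.partitioner field are assumed from contemporary Spark): map() discards partitioning information, while mapValues() preserves it, so downstream co-partitioned operations avoid a re-shuffle.

    import spark.SparkContext
    import spark.SparkContext._
    import spark.HashPartitioner

    object MapValuesPartitioningExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "MapValuesPartitioningExample")
        val pairs = sc.parallelize(Seq((1, "a"), (2, "b"))).partitionBy(new HashPartitioner(4))
        println(pairs.map { case (k, v) => (k, v.toUpperCase) }.partitioner)  // None
        println(pairs.mapValues(_.toUpperCase).partitioner)                   // Some(...) - preserved
        sc.stop()
      }
    }
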
Ankur Dave 2911a783d6 Add custom partitioner support to PairRDDFunctions.combineByKey 2011-10-09 15:47:20 -07:00
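
A hedged sketch of combineByKey with an explicit partitioner; the argument order (createCombiner, mergeValue, mergeCombiners, partitioner) is assumed from contemporary Spark:

    import spark.SparkContext
    import spark.SparkContext._
    import spark.HashPartitioner

    object CombineByKeyExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "CombineByKeyExample")
        val scores = sc.parallelize(Seq(("alice", 3), ("bob", 5), ("alice", 4)))
        val totals = scores.combineByKey(
          (v: Int) => v,                  // createCombiner: first value seen for a key
          (c: Int, v: Int) => c + v,      // mergeValue: fold a value into a combiner
          (c1: Int, c2: Int) => c1 + c2,  // mergeCombiners: merge across partitions
          new HashPartitioner(2))
        println(totals.collect().toList)
        sc.stop()
      }
    }
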