ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Imran Rashid	83659af11c	Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=) Conflicts: core/src/main/scala/spark/Accumulators.scala	2012-07-28 20:13:51 -07:00
Imran Rashid	79d58ed20a	improve scaladoc	2012-07-28 20:12:41 -07:00
Imran Rashid	ae07f3864c	add Accumulatable, add corresponding docs & tests for accumulators	2012-07-28 20:12:41 -07:00
Matei Zaharia	dee8ff1b9d	Added a second version of union() without varargs.	2012-07-27 16:27:52 -07:00
Tathagata Das	cf429699e1	Updated the new checkpoint RDD to remember partitioning of the original RDD.	2012-07-27 23:16:37 +00:00
Mosharaf Chowdhury	b5be936d7c	Broadcasts using BlockManager instead of BoundedMemoryCache	2012-07-27 15:38:46 -07:00
Mosharaf Chowdhury	1f19fbb8db	Merge remote-tracking branch 'upstream/dev' into dev Conflicts: core/src/main/scala/spark/broadcast/Broadcast.scala	2012-07-27 15:18:23 -07:00
Matei Zaharia	b51d733a57	Fixed Java union methods having same erasure. Changed union() methods on lists to take a separate "first element" argument in order to differentiate them to the compiler, because Java 7 considered it an error to have them all take Lists parameterized with different types.	2012-07-27 12:23:27 -07:00
Tathagata Das	3e271c3b61	Merge branch 'dev' of github.com:tdas/spark into dev	2012-07-27 12:01:04 -07:00
Tathagata Das	024905f682	Added BlockRDD and a first-cut version of checkpoint() to RDD class.	2012-07-27 12:00:49 -07:00
Tathagata Das	d1eee44a03	Fixed more stuff in BoundedMemoryCache.	2012-07-27 18:33:32 +00:00
Tathagata Das	d1b7f41671	Fixed bug in BoundedMemoryCache.	2012-07-27 09:00:45 -07:00
Tathagata Das	435d129bec	Fixed bugs in block dropping code of MemoryStore and changed synchronized HashMap to ConcurrentHashMap in BlockManager.	2012-07-27 10:02:26 +00:00
Tathagata Das	0426769f89	Modified the block dropping code for better performance.	2012-07-26 20:53:45 -07:00
Matei Zaharia	5c5aa2ff81	Merge pull request #153 from JoshRosen/new-java-api Java API	2012-07-26 17:20:52 -07:00
Josh Rosen	c5e2810dc7	Add persist(), splits(), glom(), and mapPartitions() to Java API.	2012-07-26 12:46:47 -07:00
Josh Rosen	bf61c10072	Detect non-zero exit status from PipedRDD process.	2012-07-26 11:32:59 -07:00
Josh Rosen	6a78e88237	Minor cleanup and optimizations in Java API. - Add override keywords. - Cache RDDs and counts in TC example. - Clean up JavaRDDLike's abstract methods.	2012-07-24 09:47:00 -07:00
Denny	4f4a34c025	Stylistic changes Conflicts: core/src/test/scala/spark/MesosSchedulerSuite.scala	2012-07-23 16:32:20 -07:00
Matei Zaharia	600e99728d	Fix a bug where an input path was added to a Hadoop job configuration twice	2012-07-23 16:16:19 -07:00
Josh Rosen	042dcbde33	Add type annotations to Java API methods. Add missing Scala Map to java.util.Map conversions.	2012-07-22 17:35:29 -07:00
Josh Rosen	e23938c3be	Use mapValues() in JavaPairRDD.cogroupResultToJava().	2012-07-22 15:10:01 -07:00
Josh Rosen	01dce3f569	Add Java API Add distinct() method to RDD. Fix bug in DoubleRDDFunctions.	2012-07-18 17:34:29 -07:00
Mosharaf Chowdhury	85cd9979f2	Fix for isLocal	2012-07-13 01:13:14 -07:00
Mosharaf Chowdhury	1c83fd4b66	Merged with Upstream dev	2012-07-13 01:08:28 -07:00
Mosharaf Chowdhury	bb4ee580fa	Cleaning BitTorrentBroadcast code...	2012-07-13 01:04:01 -07:00
Mosharaf Chowdhury	8ccffe21da	Cleaned TreeBroadcast	2012-07-13 00:54:25 -07:00
Matei Zaharia	628bb5ca7f	Allow null keys in Spark's reduce and group by	2012-07-12 18:36:02 -07:00
Matei Zaharia	e2a67a8024	Fixes to coarse-grained Mesos scheduler in dealing with failed nodes	2012-07-12 18:21:52 -07:00
Matei Zaharia	be622cf867	Formatting	2012-07-11 17:31:44 -07:00
Matei Zaharia	e8ae77df24	Added more methods for loading/saving with new Hadoop API	2012-07-11 17:31:33 -07:00
Mosharaf Chowdhury	34999d97f5	Added stop() to the Broadcast subsystem	2012-07-10 01:03:47 -07:00
Mosharaf Chowdhury	d6a9680604	Slightly better check for isLocal	2012-07-10 00:16:47 -07:00
Mosharaf Chowdhury	701f49e0d9	Refactoring	2012-07-09 22:39:47 -07:00
Mosharaf Chowdhury	cf1c60a1de	Refactoring	2012-07-09 22:07:46 -07:00
Mosharaf Chowdhury	e71f69ad3d	Refactoring	2012-07-09 22:07:17 -07:00
Mosharaf Chowdhury	ca02a92332	Refactored TrackMultipleValues out.	2012-07-09 21:35:39 -07:00
Mosharaf Chowdhury	654576ef1a	Tweaks	2012-07-09 21:12:42 -07:00
Mosharaf Chowdhury	425c247269	Removed some unused stuff	2012-07-08 14:29:04 -07:00
Matei Zaharia	0a47284003	More work to allow Spark to run on the standalone deploy cluster.	2012-07-08 14:00:04 -07:00
Mosharaf Chowdhury	c7c5258e25	Compiles without Dfs	2012-07-08 13:22:12 -07:00
Mosharaf Chowdhury	178bb29f05	Removed Chained and Dfs broadcast implementations	2012-07-08 11:57:00 -07:00
Matei Zaharia	1aa63f775b	Added back coarse-grained Mesos scheduler based on StandaloneScheduler.	2012-07-08 10:52:13 -07:00
Matei Zaharia	c5cc10cda3	More work on standalone scheduler	2012-07-06 20:17:44 -07:00
Matei Zaharia	909b325243	Further refactoring, and start of a standalone scheduler backend	2012-07-06 17:56:44 -07:00
Matei Zaharia	4e2fe0bdaf	Miscellaneous bug fixes	2012-07-06 16:33:40 -07:00
Matei Zaharia	e72afdb817	Some refactoring to make cluster scheduler pluggable.	2012-07-06 15:23:26 -07:00
Matei Zaharia	5d1a887bed	Further updates to run processes on cluster.	2012-07-01 17:13:31 -07:00
Matei Zaharia	51c46eaca0	More work on standalone deploy system.	2012-07-01 01:05:59 -07:00
Matei Zaharia	a6eb9fda61	Detect connection and disconnection of slaves	2012-06-30 17:46:56 -07:00
Matei Zaharia	408b5a1332	More work on deploy code (adding Worker class)	2012-06-30 16:45:57 -07:00
Matei Zaharia	2fb6e7d71e	Initial framework to get a master and web UI up.	2012-06-30 14:45:55 -07:00
Matei Zaharia	c53670b9bf	Various code style fixes, mostly from IntelliJ IDEA	2012-06-29 18:47:12 -07:00
Matei Zaharia	c6be4ffbf9	Fixes to CoarseMesosScheduler	2012-06-29 16:18:51 -07:00
Matei Zaharia	3a58efa5a5	Allow binding to a free port and change Akka logging to use SLF4J. Also fixes various bugs in the previous code when running on Mesos.	2012-06-29 16:02:21 -07:00
Matei Zaharia	3920189932	Upgraded to Akka 2 and fixed test execution (which was still parallel across projects).	2012-06-28 23:51:28 -07:00
root	6ad3e1f1b4	Various fixes when running on Mesos	2012-06-20 06:48:26 +00:00
Tathagata Das	40536e3668	Fixed nasty corner case bug in ByteBufferInputStream. Could not add a test case for this as I could not figure out how to deterministically reproduce the bug in a short testcase.	2012-06-17 13:28:41 -07:00
Matei Zaharia	2893b30550	Various fixes to get unit tests running. In particular, shut down ConnectionManager and DAGScheduler properly, plus a fix to LocalScheduler that was not merged in from 0.5 and was actually caught by one of the tests.	2012-06-17 00:28:45 -07:00
Matei Zaharia	b3eeac55b8	Fixed HttpBroadcast to work with this branch's Serializer.	2012-06-15 23:54:38 -07:00
Matei Zaharia	f58da6164e	Merge branch 'master' into dev	2012-06-15 23:47:11 -07:00
Tathagata Das	5f54bdf98b	Added shutdown for akka to SparkContext.stop(). Helps a little, but many testsuites still fail.	2012-06-13 20:49:00 -04:00
Tathagata Das	c6156da9e2	Multiple bug fixes to pass the testsuites ShuffleSuite and BlockManagerSuite.	2012-06-13 16:26:49 -04:00
Matei Zaharia	879bc0bece	Merge branch 'master' into mesos-0.9	2012-06-09 16:24:16 -07:00
Matei Zaharia	4b05798c06	Further bug fix to HttpBroadcast	2012-06-09 16:24:03 -07:00
Matei Zaharia	587a16a7ef	Merge branch 'master' into mesos-0.9	2012-06-09 16:17:07 -07:00
Matei Zaharia	8ed662862e	Bug fix to HttpBroadcast	2012-06-09 16:16:55 -07:00
Matei Zaharia	2fd9f994ae	Merge branch 'master' into mesos-0.9	2012-06-09 15:58:35 -07:00
Matei Zaharia	e75b1b5cb4	Change the default broadcast implementation to a simple HTTP-based broadcast. Fixes #139.	2012-06-09 15:58:07 -07:00
Matei Zaharia	a96558caa3	Performance improvements to shuffle operations: in particular, preserve RDD partitioning in more cases where it's possible, and use iterators instead of materializing collections when doing joins.	2012-06-09 14:44:18 -07:00
Matei Zaharia	63051dd2bc	Merge in engine improvements from the Spark Streaming project, developed jointly with Tathagata Das and Haoyuan Li. This commit imports the changes and ports them to Mesos 0.9, but does not yet pass unit tests due to various classes not supporting a graceful stop() yet.	2012-06-07 12:45:38 -07:00
Matei Zaharia	7e1c97fc4b	Merge branch 'master' into mesos-0.9	2012-06-06 16:48:59 -07:00
Matei Zaharia	048276799a	Commit task outputs to Hadoop-supported storage systems in parallel on the cluster instead of on the master. Fixes #110.	2012-06-06 16:46:53 -07:00
Matei Zaharia	6888bc7191	Merge branch 'master' into mesos-0.9	2012-06-06 16:14:19 -07:00
Matei Zaharia	6ae2746d1e	Handle arrays that contain the same element many times better in SizeEstimator. Also added a test for SizeEstimator. Fixes #136.	2012-06-06 16:13:02 -07:00
Matei Zaharia	dbc3c86ae3	Merge branch 'master' into mesos-0.9 Conflicts: core/src/main/scala/spark/Executor.scala	2012-06-03 17:44:04 -07:00
Matei Zaharia	e141f644ca	Merge pull request #132 from Benky/rb-first-iteration Little refactoring and unit tests for CacheTrackerActor	2012-05-26 13:15:06 -07:00
Richard Benkovsky	ae64920337	MesosScheduler refactoring	2012-05-22 11:04:54 +02:00
Richard Benkovsky	3a1bcd4028	Added tests for CacheTrackerActor	2012-05-22 11:04:54 +02:00
Richard Benkovsky	8f2f736d53	Little refactoring	2012-05-22 11:04:54 +02:00
Richard Benkovsky	f162fc2beb	Formatting fixed	2012-05-22 09:45:38 +02:00
Richard Benkovsky	565245871f	BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity	2012-05-20 22:13:35 +02:00
Richard Benkovsky	822a4be37d	Utils.memoryBytesToString fixed	2012-05-19 15:13:20 +02:00
Reynold Xin	d0c6e9f639	Made some RDD dependencies transient to reduce the amount of data needed to be serialized in closure serialization. This can significantly reduce the task setup time in Shark when the query involves a large number of (Hive) partitions.	2012-05-16 14:16:55 -07:00
Reynold Xin	16461e2eda	Updated Cache's put method to use a case class for response. Previously it was pretty ugly that put() should return -1 for failures.	2012-05-15 00:31:52 -07:00
Reynold Xin	019e48833f	Added the capacity to report cache usage status back to the cache tracker. This is essential for building a dashboard to see the status of caches on all slaves.	2012-05-14 18:39:04 -07:00
Matei Zaharia	f48742683a	Made caches dataset-aware so that they won't cyclically evict partitions from the same dataset.	2012-05-06 20:14:40 -07:00
Matei Zaharia	bd2ab635a7	Fixed the way the JAR server is created after finding issue at Twitter	2012-05-05 20:05:15 -07:00
Matei Zaharia	32a4f4623c	Merge pull request #129 from mesos/rxin Force serialize/deserialize task results in local execution mode.	2012-04-24 16:18:39 -07:00
Reynold Xin	9821cd4d42	Force serialize/deserialize task results in local execution mode.	2012-04-24 14:55:28 -07:00
Antonio	3e48818993	Removed commented-out System.exit call	2012-04-23 11:42:58 -07:00
Antonio	39d99168dc	Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions	2012-04-20 14:46:43 -07:00
Reynold Xin	e601b3b9e5	Added the ability to set environmental variables in piped rdd.	2012-04-17 16:40:56 -07:00
Matei Zaharia	3b745176e0	Bug fix to pluggable closure serialization change	2012-04-12 17:53:02 +00:00
Matei Zaharia	112655f032	Merge pull request #121 from rxin/kryo-closure Added an option (spark.closure.serializer) to specify the serializer for closures.	2012-04-10 14:21:02 -07:00
Reynold Xin	d295ccb43c	Added a closureSerializer field in SparkEnv and use it to serialize tasks.	2012-04-10 13:29:46 -07:00
Reynold Xin	968f75f6af	Added an option (spark.closure.serializer) to specify the serializer for closures. This enables using Kryo as the closure serializer.	2012-04-09 21:59:56 -07:00
Matei Zaharia	a69c0738d1	Merge branch 'master' into mesos-0.9	2012-04-08 23:41:36 -07:00
Matei Zaharia	a633974143	Merge branch 'master' of github.com:mesos/spark	2012-04-08 23:41:25 -07:00
Matei Zaharia	0229d5390f	Merge branch 'master' into mesos-0.9	2012-04-08 23:39:37 -07:00
Matei Zaharia	d401e1b3e8	Fix a possible deadlock in MesosScheduler	2012-04-08 23:38:49 -07:00
Ankur Dave	7be1c7b331	Report entry dropping in BoundedMemoryCache	2012-04-06 15:49:32 -07:00
Matei Zaharia	a8bb324ed9	Merge branch 'master' into mesos-0.9	2012-04-05 14:53:22 -07:00
Matei Zaharia	816d4e5840	Pass local IP address instead of hostname in spark.master.host. Fixes #117 .	2012-04-05 14:53:17 -07:00
Matei Zaharia	335a6036ad	Converted some tabs to spaces	2012-04-05 11:58:01 -07:00
Matei Zaharia	8c95a85438	Use Runtime.maxMemory instead of Runtime.totalMemory in BoundedMemoryCache, in case the JVM was not started with its initial heap size equaling its maximum one (-Xms == -Xmx).	2012-03-30 13:39:35 -04:00
Matei Zaharia	03d5b3b48d	Use Runtime.maxMemory instead of Runtime.totalMemory in BoundedMemoryCache, in case the JVM was not started with its initial heap size equaling its maximum one (-Xms == -Xmx).	2012-03-30 13:38:19 -04:00
Matei Zaharia	dfa3b6b544	Fixes to work with the very latest Mesos 0.9 API	2012-03-29 22:12:35 -04:00
Matei Zaharia	4d52cc6738	Merge branch 'master' into mesos-0.9	2012-03-29 21:29:39 -04:00
Reynold Xin	42dcdbcb2f	Removed the extra spaces in OrderedRDDFunctions and SortedRDD.	2012-03-29 15:21:57 -07:00
Matei Zaharia	08cda89e8a	Further fixes to how Mesos is found and used	2012-03-17 13:39:14 -07:00
Matei Zaharia	3c3fdf6eca	Merge branch 'master' into mesos-0.9	2012-03-17 13:09:21 -07:00
Matei Zaharia	c7af538ac1	Some fixes to sorting for when the RDD has fewer elements than the number of partitions we ask to partition it into. Also, removed a test that was taking way too long to run.	2012-03-17 13:08:36 -07:00
Matei Zaharia	a099a63a8a	Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0	2012-03-17 12:31:34 -07:00
Matei Zaharia	a5e2b6a6bd	Merge pull request #112 from cengle/master Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection	2012-03-06 13:38:32 -08:00
Matei Zaharia	97eee50825	Fixes a nasty bug that could happen when tasks fail, because calling wait() with a timeout of 0 on a Java object means "wait forever".	2012-03-01 13:43:17 -08:00
Cliff Engle	dd68cb6099	Get key and value container from RecordReader	2012-02-29 16:33:23 -08:00
Matei Zaharia	1e10df0a46	Merge pull request #111 from alupher/master Adding sorting to RDDs	2012-02-24 15:50:14 -08:00
Matei Zaharia	aa04f87cd2	Added support for parallel execution of jobs in DAGScheduler.	2012-02-19 22:50:23 -08:00
Antonio	620798161b	Added fixes to sorting	2012-02-13 00:07:39 -08:00
Matei Zaharia	2587ce1690	Fixed a deadlock that occurred with MesosScheduler due to an earlier synchronization change	2012-02-11 21:22:45 -08:00
Antonio	e93f622665	Added sorting by key for pair RDDs	2012-02-11 00:56:28 -08:00
Matei Zaharia	98f008b721	Formatting fixes	2012-02-10 10:52:03 -08:00
Matei Zaharia	7660a8b12f	Merge branch 'formatting' Conflicts: core/src/main/scala/spark/DAGScheduler.scala core/src/main/scala/spark/SimpleShuffleFetcher.scala core/src/main/scala/spark/SparkContext.scala	2012-02-10 10:42:14 -08:00
haoyuan	194c42ab79	Code format.	2012-02-10 08:19:53 -08:00
Matei Zaharia	8f5ed51234	Delete Spark's temporary directories when the JVM exits.	2012-02-09 22:58:24 -08:00
Matei Zaharia	c0a0df3285	Made the default cache BoundedMemoryCache, and reduced its default size	2012-02-09 22:32:02 -08:00
Matei Zaharia	0e93891d3d	Replaced LocalFileShuffle with a non-singleton ShuffleManager class and made DAGScheduler automatically set SparkEnv.	2012-02-09 22:14:56 -08:00
haoyuan	445e0bb1b5	Format the code a bit more.	2012-02-09 15:50:26 -08:00
haoyuan	651932e703	Format the code as coding style agreed by Matei/TD/Haoyuan	2012-02-09 13:26:23 -08:00
Matei Zaharia	e02dc83a5b	IO optimizations	2012-02-06 20:40:39 -08:00
Matei Zaharia	c40e766368	Use java.util.HashMap in shuffles	2012-02-06 19:20:25 -08:00
Matei Zaharia	b267175ab5	Synchronization fix in case SparkContext is used from multiple threads.	2012-02-06 14:28:18 -08:00
Hiral Patel	b47952342e	Add register immutable map to kryo serializer	2012-01-26 15:24:20 -08:00
Matei Zaharia	fabcc82528	Merge pull request #103 from edisontung/master Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans	2012-01-13 19:20:03 -08:00
Matei Zaharia	fd5581a0d3	Fixed a failure recovery bug and added some tests for fault recovery.	2012-01-13 19:17:27 -08:00
Edison Tung	1ecc221f84	Fixed bugs I've fixed the bugs detailed in the diff. One of the bugs was already fixed on the local file (forgot to commit).	2012-01-09 11:59:52 -08:00
Matei Zaharia	e269f6f7ea	Register RDDs with the MapOutputTracker even if they have no partitions. Fixes #105.	2012-01-05 15:59:20 -05:00
Matei Zaharia	3034fc0d91	Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16'	2011-12-14 18:19:43 +01:00
Matei Zaharia	6a650cbbdf	Make Spark port default to 7077 so that it's not an ephemeral port that might be taken	2011-12-14 18:18:22 +01:00
Matei Zaharia	735843a049	Merge remote-tracking branch 'origin/charles-newhadoop'	2011-12-02 21:59:30 -08:00
Charles Reiss	66f05f383e	Add new Hadoop API reading support.	2011-12-01 14:02:10 -08:00
Charles Reiss	02d43e6986	Add new Hadoop API writing support.	2011-12-01 14:01:28 -08:00
Edison Tung	42f8847a21	Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD	2011-12-01 13:43:25 -08:00
Edison Tung	de01b6deaa	Fixed bug in RDD Math.min takes 2 args, not 1. This was not committed earlier for some reason	2011-12-01 13:34:37 -08:00
Matei Zaharia	22b8fcf632	Added fold() and aggregate() operations that reuse an object to merge results into rather than requiring a new object allocation for each element merged. Fixes #95.	2011-11-30 11:37:47 -08:00
Matei Zaharia	09dd58b3a7	Send SPARK_JAVA_OPTS to slave nodes.	2011-11-30 11:34:58 -08:00
Edison Tung	a3bc012af8	added takeSamples method takeSamples method takes a specified number of samples from the RDD and outputs it in an array.	2011-11-21 16:38:44 -08:00
Ankur Dave	ad4ebff42c	Deduplicate exceptions when printing them The first time they appear, exceptions are printed in full, including a stack trace. After that, they are printed in abbreviated form. They are periodically reprinted in full; the reprint interval defaults to 5 seconds and is configurable using the property spark.logging.exceptionPrintInterval.	2011-11-14 01:54:53 +00:00
Ankur Dave	35b6358a7c	Report errors in tasks to the driver via a Mesos status update When a task throws an exception, the Spark executor previously just logged it to a local file on the slave and exited. This commit causes Spark to also report the exception back to the driver using a Mesos status update, so the user doesn't have to look through a log file on the slave. Here's what the reporting currently looks like: # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050 [...] 11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1) 11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling [...] 11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s	2011-11-14 01:54:53 +00:00

416 commits