ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Ankur Dave	6c6e47e3cd	Use BufferedOutputStream in ShuffleMapTask	2011-10-09 15:43:31 -07:00
Matei Zaharia	1069740264	Added a jarOfObject method to get the JAR of the class that an object belongs to, which seems like a more common case.	2011-08-29 23:27:10 -07:00
Matei Zaharia	0aa23bf17e	Added a convenience method for getting the JAR file that loaded a class (useful for jobs to pass their own JAR files to SparkContext).	2011-08-29 22:59:44 -07:00
Matei Zaharia	a161f00610	Made a log message slightly less ugly	2011-08-27 16:58:54 -07:00
Matei Zaharia	c22043f150	Minor fix: can use >= when checking memory	2011-08-02 19:11:17 -07:00
Ismael Juma	6ff57f5594	Use scala.math instead of Math as the latter is deprecated.	2011-08-02 10:25:47 +01:00
Ismael Juma	620de2dd1d	Change currentThread to Thread.currentThread as the former is deprecated.	2011-08-02 10:25:16 +01:00
Ismael Juma	0fba22b3d2	Fix issue #65: Change @serializable to extends Serializable in 2.9 branch. Note that we use scala.Serializable introduced in Scala 2.9 instead of java.io.Serializable. Also, case classes inherit from scala.Serializable by default.	2011-08-02 10:16:33 +01:00
Matei Zaharia	711575391d	Merge branch 'scala-2.9' Conflicts: project/build/SparkProject.scala	2011-08-01 15:25:26 -07:00
Matei Zaharia	4050d661c5	Updated to newest Mesos API, which includes better memory accounting by specifying per-executor memory.	2011-08-01 13:54:48 -07:00
Matei Zaharia	d12122502b	Various improvements to Kryo serializer: - Replaced modified Kryo version with the standard one augmented with the kryo-serializers package, which includes support for classes with no-arg constructors (that was why we had a modified Kryo before) - The kryo-serializers version also fixes issue #72. - Added a bunch of tests. - Serialize maps and a few other common types properly by default.	2011-07-21 22:09:33 -07:00
Matei Zaharia	baa72e2747	Removed a debug statement that slipped in as a println	2011-07-21 16:09:33 -07:00
Matei Zaharia	2bfd7931e8	Merge branch 'new-rdds-protobuf' Conflicts: core/src/main/scala/spark/Executor.scala core/src/main/scala/spark/RDD.scala	2011-07-21 16:08:39 -07:00
Matei Zaharia	1450fd74d9	Merge branch 'master' into scala-2.9	2011-07-14 17:37:24 -04:00
Matei Zaharia	ccf48388cd	Lowered default number of splits for files	2011-07-14 17:37:04 -04:00
Matei Zaharia	146a18c2a4	Merge branch 'master' into scala-2.9	2011-07-14 17:29:17 -04:00
Matei Zaharia	c8eb8b2b90	Set class loader for remote actors to fix a bug that happens in 2.9	2011-07-14 17:29:11 -04:00
Matei Zaharia	8ea67307b9	Merge branch 'master' into scala-2.9	2011-07-14 14:47:12 -04:00
Matei Zaharia	e4c3402d2d	Renamed ParallelArray to ParallelCollection	2011-07-14 14:47:01 -04:00
Matei Zaharia	9ac461d85d	Remove RDD.toString because it looked confusing	2011-07-14 14:39:32 -04:00
Matei Zaharia	797b4547c3	Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter	2011-07-14 14:08:34 -04:00
Matei Zaharia	3efd9e94d8	Merge branch 'master' into scala-2.9	2011-07-14 12:42:57 -04:00
Matei Zaharia	0ccfe20755	Forgot to add a file	2011-07-14 12:42:50 -04:00
Matei Zaharia	38f38dda5b	Merge branch 'master' into scala-2.9	2011-07-14 12:42:02 -04:00
Matei Zaharia	969644df8e	Cleaned up a few issues to do with default parallelism levels. Also renamed HadoopFileWriter to HadoopWriter (since it's not only for files) and fixed a bug for lookup().	2011-07-14 12:40:56 -04:00
Matei Zaharia	2fb906e8e5	Merge branch 'master' into scala-2.9	2011-07-14 00:20:14 -04:00
Matei Zaharia	2604939f64	Simplified and documented code a little and added test	2011-07-14 00:19:00 -04:00
Matei Zaharia	2439e51a03	Merge branch 'master' into implicit-sequencefile	2011-07-13 23:20:22 -04:00
Matei Zaharia	d0c7958364	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/HadoopFileWriter.scala	2011-07-13 23:09:33 -04:00
Matei Zaharia	9c0069188b	Updated save code to allow non-file-based OutputFormats and added a test for file-related stuff	2011-07-13 23:04:06 -04:00
Matei Zaharia	da8a3b8926	Increase default value of spark.locality.wait a little	2011-07-13 20:07:24 -04:00
Matei Zaharia	080869c6ef	Merge branch 'master' into scala-2.9	2011-07-13 00:20:08 -04:00
Matei Zaharia	842e14d567	Added mapPartitions operation and a bunch of tests for RDD ops	2011-07-13 00:19:52 -04:00
Matei Zaharia	9b568d37f7	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/RDD.scala	2011-07-11 22:25:53 -04:00
Matei Zaharia	d05fea24f3	Simplified parallel shuffle fetcher to use URLConnection	2011-07-11 22:12:36 -04:00
Matei Zaharia	25c3a7781c	Moved PairRDD and SequenceFileRDD functions to separate source files	2011-07-10 00:06:15 -04:00
Matei Zaharia	b7f1f62ff5	bug fix	2011-07-09 18:53:02 -04:00
Matei Zaharia	003480f374	Register byte[] with Kryo serializer	2011-07-09 18:08:07 -04:00
Matei Zaharia	aea5cb4413	Added parallel shuffle fetcher	2011-07-09 17:25:56 -04:00
Matei Zaharia	4b1646a25f	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:55 -04:00
Matei Zaharia	07a97d47c2	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:34 -04:00
Matei Zaharia	3488c386a9	Initial work to make stuff like sequenceFile[Int, Int] work without requiring the user to provide a Writable type. The approach here might not be the best but it seems to work correctly.	2011-06-28 17:07:04 -07:00
Matei Zaharia	5633299ec6	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-27 22:50:59 -07:00
Matei Zaharia	b0ecf1ee41	Don't pass a null context when running tasks locally	2011-06-27 22:50:43 -07:00
Matei Zaharia	85cad5d9dd	Fixed HadoopFileWriter to compile for Scala 2.9	2011-06-27 22:44:14 -07:00
Matei Zaharia	393607d5ef	Merge branch 'master' into scala-2.9	2011-06-27 18:08:25 -07:00
Matei Zaharia	2f652f1656	Fix a compile error	2011-06-27 18:07:16 -07:00
Tathagata Das	3f08e1129f	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/SparkContext.scala	2011-06-27 13:43:44 -07:00
Tathagata Das	ad842ac823	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/RDD.scala	2011-06-27 13:39:11 -07:00
Matei Zaharia	bae8a97968	Merge branch 'master' into scala-2.9 Conflicts: repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala	2011-06-26 19:22:27 -07:00
Matei Zaharia	c4dd68ae21	Merge branch 'mos-bt'. This merge keeps only the broadcast work in mos-bt because the structure of shuffle has changed with the new RDD design. We still need some kind of parallel shuffle but that will be added later. Conflicts: core/src/main/scala/spark/BitTorrentBroadcast.scala core/src/main/scala/spark/ChainedBroadcast.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala core/src/main/scala/spark/shuffle/DfsShuffle.scala	2011-06-26 18:22:12 -07:00
Tathagata Das	38f2ba99cc	Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles. 1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type) 2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options 3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile 4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile 5> SparkContext.objectFile() opens the saved ObjectFiles	2011-06-24 19:51:21 -07:00
Olivier Grisel	2e3531d8bf	Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 11:00:51 +02:00
Tathagata Das	3d2befe831	Improved HadoopFileWriter (saves key and value classes to jobconf)	2011-06-23 08:11:22 -07:00
Matei Zaharia	214250016a	Added simple version of lookup	2011-06-20 11:59:16 -07:00
Matei Zaharia	23b42af70a	Merge branch 'master' into scala-2.9	2011-06-19 23:06:21 -07:00
Matei Zaharia	23b1c309fb	Added pipe() operation on RDDs for mapping through a shell command.	2011-06-19 23:05:19 -07:00
Tathagata Das	b5e6645505	Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext. 1> HadoopFileWriter works correctly with task failures 2> It can also take a user-specified JobConf object for configuration settings 3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class 4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter	2011-06-16 20:57:57 -07:00
Tathagata Das	869836a2fa	Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task	2011-06-10 19:47:28 -07:00
Tathagata Das	389e56156f	HadoopFileWriter changed to use Hadoop's OutputCommitter	2011-06-09 15:29:22 -07:00
Tathagata Das	24d845833c	First-cut implementation of RDD.SaveAsText	2011-06-05 04:14:43 -07:00
Matei Zaharia	3297706ab2	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-01 11:46:31 -07:00
Matei Zaharia	9bb448a151	Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57.	2011-06-01 11:45:47 -07:00
Matei Zaharia	850fe3274e	Make the runJob API public. Fixes #56.	2011-06-01 11:38:44 -07:00
Ismael Juma	82f10bd794	Remove unnecessary toStream calls.	2011-06-01 16:12:42 +01:00
Matei Zaharia	10fe324845	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-05-31 23:48:11 -07:00
Matei Zaharia	5166d76843	Ensure logging is initialized before spawning any threads to fix issue #45	2011-05-31 23:47:32 -07:00
Matei Zaharia	0afd35a8dd	Some docs in ClosureCleaner	2011-05-31 22:06:30 -07:00
Matei Zaharia	8b0390d344	Instantiate NullWritable properly in HadoopFile	2011-05-30 23:54:14 -07:00
Matei Zaharia	4096c2287e	Various fixes	2011-05-29 18:46:01 -07:00
Matei Zaharia	ef706ae959	Merge branch 'master' into new-rdds-protobuf Conflicts: run	2011-05-29 16:20:23 -07:00
Matei Zaharia	c501cff924	Executor was looking for the wrong constructor for ExecutorClassLoader	2011-05-29 16:15:59 -07:00
Ismael Juma	1396678baa	Move REPL classes to separate module.	2011-05-27 11:22:50 +01:00
Ismael Juma	164ef4c751	Use explicit asInstanceOf instead of misleading unchecked pattern matching. Also enable -unchecked warnings in SBT build file.	2011-05-27 07:57:10 +01:00
Ismael Juma	89c8ea2bb2	Replace deprecated `-` and `--` with suggested filterNot (which is uglier).	2011-05-26 22:22:37 +01:00
Ismael Juma	94f05683bd	Replace deprecated `first` with `head`.	2011-05-26 22:13:41 +01:00
Ismael Juma	0b6a862b68	Use math instead of Math as the latter is deprecated.	2011-05-26 22:06:36 +01:00
Ismael Juma	1f27d94c48	Use Array.iterator instead of Iterator.fromArray as the latter is deprecated.	2011-05-26 22:04:42 +01:00
Ismael Juma	1993a8e556	Use += instead of + for mutable sequences as the latter is deprecated.	2011-05-26 21:59:48 +01:00
root	5ef938615f	Initial work on making stuff compile with protobuf Mesos	2011-05-24 22:27:08 +00:00
Matei Zaharia	cec427e777	Fixed a bug with preferred locations having changed meaning in new RDDs	2011-05-22 17:12:29 -07:00
Matei Zaharia	4c888b2933	Fix queue type for executor	2011-05-22 16:42:05 -07:00
Matei Zaharia	bea3a33012	doc tweak	2011-05-22 16:03:41 -07:00
Matei Zaharia	9bde5a54cb	class loader fix	2011-05-22 16:00:41 -07:00
Matei Zaharia	91c07a33d9	Various fixes to serialization	2011-05-21 22:50:08 -07:00
Matei Zaharia	f61b61c4ac	Merge branch 'master' into new-rdds	2011-05-21 21:25:58 -07:00
Matei Zaharia	24a1e7f838	Scheduler can now recover from lost map outputs	2011-05-20 00:19:53 -07:00
Matei Zaharia	82329b0b28	Updated scheduler to support running on just some partitions of final RDD	2011-05-19 12:47:09 -07:00
Matei Zaharia	328e51b693	Various minor fixes	2011-05-19 11:19:25 -07:00
Matei Zaharia	fd1d255821	Stop objectifying various trackers, caches, etc.	2011-05-17 12:41:13 -07:00
Matei Zaharia	4db50e26c7	Fixed unit tests by making them clean up the SparkContext after use and thus clean up the various singletons (RDDCache, MapOutputTracker, etc). This isn't perfect yet (ideally we shouldn't use singleton objects at all) but we can fix that later.	2011-05-13 12:03:58 -07:00
Matei Zaharia	aca8150c52	Ensure that AddedToCache messages make it home before tasks finish	2011-05-13 11:43:52 -07:00
Matei Zaharia	16c886a581	Optimization for count()	2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury	db7a2c4897	Issue #42 fixed.	2011-04-28 14:30:48 -07:00
Ankur Dave	a4c04f3f6f	Error handling for disk I/O in DiskSpillingCache. Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.	2011-04-27 23:23:29 -07:00
Ankur Dave	12ff0d2dc3	Bring an entry back into memory after fetching it from disk	2011-04-27 22:59:05 -07:00
Ankur Dave	e30313aa2c	Added DiskSpillingCache. DiskSpillingCache is a BoundedMemoryCache that spills entries to disk when it runs out of space. Currently the implementation is very simple. In particular, it's missing the following features: - Error handling for disk I/O, including checking of disk space levels - Bringing an entry back into memory after fetching it from disk. In addition, here are some features that aren't critical but should be implemented soon: - Spilling based on a user-set priority in addition to LRU - Caching into a subdirectory of spark.DiskSpillingCache.cacheDir rather than the root directory	2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury	60d1121343	Refactoring: daemonThreadFactories have all been moved to the Utils object instead of having multiple copies in Broadcast and Shuffle objects.	2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury	e898e108a3	Cleanup + refactoring...	2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury	0567646180	Shuffle is also working from its own subpackage.	2011-04-27 21:11:41 -07:00

207 commits