Commit graph

7542 commits

Author SHA1 Message Date
Reynold Xin 761ea65a98 Added a test for the previous commit (failing to serialize task results
would throw an exception for local tasks).
2012-04-24 15:14:35 -07:00
Reynold Xin 9821cd4d42 Force serialize/deserialize task results in local execution mode. 2012-04-24 14:55:28 -07:00
Antonio 3e48818993 Removed commented-out System.exit call 2012-04-23 11:42:58 -07:00
Antonio 39d99168dc Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions 2012-04-20 14:46:43 -07:00
Reynold Xin e601b3b9e5 Added the ability to set environmental variables in piped rdd. 2012-04-17 16:40:56 -07:00
Matei Zaharia 3b745176e0 Bug fix to pluggable closure serialization change 2012-04-12 17:53:02 +00:00
Matei Zaharia 112655f032 Merge pull request #121 from rxin/kryo-closure
Added an option (spark.closure.serializer) to specify the serializer for closures.
2012-04-10 14:21:02 -07:00
Reynold Xin d295ccb43c Added a closureSerializer field in SparkEnv and use it to serialize
tasks.
2012-04-10 13:29:46 -07:00
Reynold Xin 968f75f6af Added an option (spark.closure.serializer) to specify the serializer for
closures. This enables using Kryo as the closure serializer.
2012-04-09 21:59:56 -07:00
Matei Zaharia a69c0738d1 Merge branch 'master' into mesos-0.9 2012-04-08 23:41:36 -07:00
Matei Zaharia a633974143 Merge branch 'master' of github.com:mesos/spark 2012-04-08 23:41:25 -07:00
Matei Zaharia 0229d5390f Merge branch 'master' into mesos-0.9 2012-04-08 23:39:37 -07:00
Matei Zaharia d401e1b3e8 Fix a possible deadlock in MesosScheduler 2012-04-08 23:38:49 -07:00
Ankur Dave 7be1c7b331 Report entry dropping in BoundedMemoryCache 2012-04-06 15:49:32 -07:00
Matei Zaharia a8bb324ed9 Merge branch 'master' into mesos-0.9 2012-04-05 14:53:22 -07:00
Matei Zaharia 816d4e5840 Pass local IP address instead of hostname in spark.master.host. Fixes #117. 2012-04-05 14:53:17 -07:00
Matei Zaharia 335a6036ad Converted some tabs to spaces 2012-04-05 11:58:01 -07:00
Matei Zaharia 8c95a85438 Use Runtime.maxMemory instead of Runtime.totalMemory in
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:39:35 -04:00
Matei Zaharia 03d5b3b48d Use Runtime.maxMemory instead of Runtime.totalMemory in
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:38:19 -04:00
Matei Zaharia 95fb1a16b8 Use Mesos 0.9 RC3 JAR and protobuf 2.4.1 2012-03-30 11:38:49 -04:00
Matei Zaharia dfa3b6b544 Fixes to work with the very latest Mesos 0.9 API 2012-03-29 22:12:35 -04:00
Matei Zaharia 4d52cc6738 Merge branch 'master' into mesos-0.9 2012-03-29 21:29:39 -04:00
Reynold Xin 42dcdbcb2f Removed the extra spaces in OrderedRDDFunctions and SortedRDD. 2012-03-29 15:21:57 -07:00
Matei Zaharia 08cda89e8a Further fixes to how Mesos is found and used 2012-03-17 13:39:14 -07:00
Matei Zaharia 3c3fdf6eca Merge branch 'master' into mesos-0.9 2012-03-17 13:09:21 -07:00
Matei Zaharia c7af538ac1 Some fixes to sorting for when the RDD has fewer elements than the
number of partitions we ask to partition it into. Also, removed a test
that was taking way too long to run.
2012-03-17 13:08:36 -07:00
Matei Zaharia a099a63a8a Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0 2012-03-17 12:31:34 -07:00
Matei Zaharia a5e2b6a6bd Merge pull request #112 from cengle/master
Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection
2012-03-06 13:38:32 -08:00
Matei Zaharia 97eee50825 Fixes a nasty bug that could happen when tasks fail, because calling
wait() with a timeout of 0 on a Java object means "wait forever".
2012-03-01 13:43:17 -08:00
Cliff Engle dd68cb6099 Get key and value container from RecordReader 2012-02-29 16:33:23 -08:00
Matei Zaharia 1e10df0a46 Merge pull request #111 from alupher/master
Adding sorting to RDDs
2012-02-24 15:50:14 -08:00
Antonio 0d93d95bcf Removed unnecessary import 2012-02-21 19:57:12 -08:00
Antonio 2990298f71 Added sorting testing suite 2012-02-21 19:54:21 -08:00
Matei Zaharia aa04f87cd2 Added support for parallel execution of jobs in DAGScheduler. 2012-02-19 22:50:23 -08:00
Antonio 620798161b Added fixes to sorting 2012-02-13 00:07:39 -08:00
Matei Zaharia 2587ce1690 Fixed a deadlock that occured with MesosScheduler due to an earlier
synchronization change
2012-02-11 21:22:45 -08:00
Antonio e93f622665 Added sorting by key for pair RDDs 2012-02-11 00:56:28 -08:00
Matei Zaharia 98f008b721 Formatting fixes 2012-02-10 10:52:03 -08:00
Matei Zaharia 7660a8b12f Merge branch 'formatting'
Conflicts:
	core/src/main/scala/spark/DAGScheduler.scala
	core/src/main/scala/spark/SimpleShuffleFetcher.scala
	core/src/main/scala/spark/SparkContext.scala
2012-02-10 10:42:14 -08:00
haoyuan 194c42ab79 Code format. 2012-02-10 08:19:53 -08:00
Matei Zaharia 8f5ed51234 Delete Spark's temporary directories when the JVM exits. 2012-02-09 22:58:24 -08:00
Matei Zaharia c0a0df3285 Made the default cache BoundedMemoryCache, and reduced its default size 2012-02-09 22:32:02 -08:00
Matei Zaharia a766780f4c Added some tests for multithreaded access to Spark. 2012-02-09 22:27:53 -08:00
Matei Zaharia 0e93891d3d Replaced LocalFileShuffle with a non-singleton ShuffleManager class
and made DAGScheduler automatically set SparkEnv.
2012-02-09 22:14:56 -08:00
haoyuan 445e0bb1b5 Format the code a bit mroe. 2012-02-09 15:50:26 -08:00
haoyuan 651932e703 Format the code as coding style agreed by Matei/TD/Haoyuan 2012-02-09 13:26:23 -08:00
Matei Zaharia e02dc83a5b IO optimizations 2012-02-06 20:40:39 -08:00
Matei Zaharia c40e766368 Use java.util.HashMap in shuffles 2012-02-06 19:20:25 -08:00
Matei Zaharia b267175ab5 Synchronization fix in case SparkContext is used from multiple threads. 2012-02-06 14:28:18 -08:00
Matei Zaharia 43a3335090 Simplifying test 2012-02-05 22:46:51 -08:00
Hiral Patel b47952342e Add register immutable map to kryo serializer 2012-01-26 15:24:20 -08:00
Matei Zaharia fabcc82528 Merge pull request #103 from edisontung/master
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia fd5581a0d3 Fixed a failure recovery bug and added some tests for fault recovery. 2012-01-13 19:17:27 -08:00
Matei Zaharia eb05154b7a Fixed a failure recovery bug and added some tests for fault recovery. 2012-01-13 19:08:25 -08:00
Edison Tung 1ecc221f84 Fixed bugs
I've fixed the bugs detailed in the diff. One of the bugs was already
fixed on the local file (forgot to commit).
2012-01-09 11:59:52 -08:00
Matei Zaharia e269f6f7ea Register RDDs with the MapOutputTracker even if they have no partitions.
Fixes #105.
2012-01-05 15:59:20 -05:00
Matei Zaharia 3034fc0d91 Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16' 2011-12-14 18:19:43 +01:00
Matei Zaharia 6a650cbbdf Make Spark port default to 7077 so that it's not an ephemeral port that might be taken 2011-12-14 18:18:22 +01:00
Matei Zaharia 735843a049 Merge remote-tracking branch 'origin/charles-newhadoop' 2011-12-02 21:59:30 -08:00
Charles Reiss 66f05f383e Add new Hadoop API reading support. 2011-12-01 14:02:10 -08:00
Charles Reiss 02d43e6986 Add new Hadoop API writing support. 2011-12-01 14:01:28 -08:00
Edison Tung 42f8847a21 Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD 2011-12-01 13:43:25 -08:00
Edison Tung de01b6deaa Fixed bug in RDD
Math.min takes 2 args, not 1. This was not committed earlier for some
reason
2011-12-01 13:34:37 -08:00
Matei Zaharia 22b8fcf632 Added fold() and aggregate() operations that reuse an object to
merge results into rather than requiring a new object allocation
for each element merged. Fixes #95.
2011-11-30 11:37:47 -08:00
Matei Zaharia 09dd58b3a7 Send SPARK_JAVA_OPTS to slave nodes. 2011-11-30 11:34:58 -08:00
Edison Tung a3bc012af8 added takeSamples method
takeSamples method takes a specified number of samples from the RDD and
outputs it in an array.
2011-11-21 16:38:44 -08:00
Ankur Dave ad4ebff42c Deduplicate exceptions when printing them
The first time they appear, exceptions are printed in full, including
a stack trace. After that, they are printed in abbreviated form. They
are periodically reprinted in full; the reprint interval defaults to 5
seconds and is configurable using the property
spark.logging.exceptionPrintInterval.
2011-11-14 01:54:53 +00:00
Ankur Dave 35b6358a7c Report errors in tasks to the driver via a Mesos status update
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.

Here's what the reporting currently looks like:

    # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
    [...]
    11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
    11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
    [...]
    11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia 07532021fe Bug fix: reject offers that we didn't find any tasks for 2011-11-08 23:05:54 -08:00
Matei Zaharia 9e4c79a4d3 Closure cleaner unit test 2011-11-08 00:40:15 -08:00
Matei Zaharia f346e64637 Updates to the closure cleaner to work better with closures in classes.
Before, the cleaner attempted to clone $outer objects that were classes
(as opposed to nested closures) and preserve only their used fields,
which was bad because it would miss fields that are accessed indirectly
by methods, and in general it would confuse user code. Now we keep a
reference to those objects without cloning them. This is not perfect
because the user still needs to be careful of what they'll carry along
into closures, but it works better in some cases that seemed confusing
before. We need to improve the documentation on what variables get
passed along with a closure and possibly add some debugging tools for it
as well.

Fixes #71 -- that code now works in the REPL.
2011-11-08 00:33:28 -08:00
Matei Zaharia c2b7fd6899 Make parallelize() work efficiently for ranges of Long, Double, etc
(splitting them into sub-ranges). Fixes #87.
2011-11-02 15:16:02 -07:00
Matei Zaharia 157279e9eb Update Spark to work with the latest Mesos API 2011-10-30 14:10:56 -07:00
root 3a0e6c4363 Miscellaneous fixes:
- Executor should initialize logging properly
- groupByKey should allow custom partitioner
2011-10-17 18:07:35 +00:00
root 62aa820084 Merge branch 'ankur-master' 2011-10-14 02:14:07 +00:00
Ankur Dave 2d7057bf5d Implement PairRDDFunctions.partitionBy 2011-10-09 15:52:09 -07:00
Ankur Dave 06637cb69e Fix PairRDDFunctions.groupWith partitioning
This commit fixes a bug in groupWith that was causing it to destroy
partitioning information. It replaces a call to map with a call to
mapValues, which preserves partitioning.
2011-10-09 15:48:46 -07:00
Ankur Dave 2911a783d6 Add custom partitioner support to PairRDDFunctions.combineByKey 2011-10-09 15:47:20 -07:00
Ankur Dave 6c6e47e3cd Use BufferedOutputStream in ShuffleMapTask 2011-10-09 15:43:31 -07:00
Matei Zaharia 1069740264 Added a jarOfObject method to get the JAR of the class that an object
belongs to, which seems like a more common case.
2011-08-29 23:27:10 -07:00
Matei Zaharia 0aa23bf17e Added a convenience method for getting the JAR file that loaded a class
(useful for jobs to pass their own JAR files to SparkContext).
2011-08-29 22:59:44 -07:00
Matei Zaharia a161f00610 Made a log message slightly less ugly 2011-08-27 16:58:54 -07:00
Matei Zaharia 3759bcd061 New mesos.jar 2011-08-10 14:03:48 -07:00
Matei Zaharia c22043f150 Minor fix: can use >= when checking memory 2011-08-02 19:11:17 -07:00
Ismael Juma 6ff57f5594 Use scala.math instead of Math as the latter is deprecated. 2011-08-02 10:25:47 +01:00
Ismael Juma 620de2dd1d Change currentThread to Thread.currentThread as the former is deprecated. 2011-08-02 10:25:16 +01:00
Ismael Juma 0fba22b3d2 Fix issue #65: Change @serializable to extends Serializable in 2.9 branch
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
Matei Zaharia 711575391d Merge branch 'scala-2.9'
Conflicts:
	project/build/SparkProject.scala
2011-08-01 15:25:26 -07:00
Matei Zaharia 4050d661c5 Updated to newest Mesos API, which includes better memory accounting
by specifying per-executor memory.
2011-08-01 13:54:48 -07:00
Matei Zaharia d12122502b Various improvements to Kryo serializer:
- Replaced modified Kryo version with the standard one augmented with
  the kryo-serializers package, which includes support for classes with
  no-arg constructors (that was why we had a modified Kryo before)
- The kryo-serializers version also fixes issue #72.
- Added a bunch of tests.
- Serialize maps and a few other common types properly by default.
2011-07-21 22:09:33 -07:00
Matei Zaharia baa72e2747 Removed a debug statement that slipped in as a println 2011-07-21 16:09:33 -07:00
Matei Zaharia 2bfd7931e8 Merge branch 'new-rdds-protobuf'
Conflicts:
	core/src/main/scala/spark/Executor.scala
	core/src/main/scala/spark/RDD.scala
2011-07-21 16:08:39 -07:00
Matei Zaharia 1450fd74d9 Merge branch 'master' into scala-2.9 2011-07-14 17:37:24 -04:00
Matei Zaharia ccf48388cd Lowered default number of splits for files 2011-07-14 17:37:04 -04:00
Matei Zaharia 146a18c2a4 Merge branch 'master' into scala-2.9 2011-07-14 17:29:17 -04:00
Matei Zaharia c8eb8b2b90 Set class loader for remote actors to fix a bug that happens in 2.9 2011-07-14 17:29:11 -04:00
Matei Zaharia 8ea67307b9 Merge branch 'master' into scala-2.9 2011-07-14 14:47:12 -04:00
Matei Zaharia e4c3402d2d Renamed ParallelArray to ParallelCollection 2011-07-14 14:47:01 -04:00
Matei Zaharia 9ac461d85d Remove RDD.toString because it looked confusing 2011-07-14 14:39:32 -04:00
Matei Zaharia 797b4547c3 Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter 2011-07-14 14:08:34 -04:00
Matei Zaharia 3efd9e94d8 Merge branch 'master' into scala-2.9 2011-07-14 12:42:57 -04:00
Matei Zaharia 0ccfe20755 Forgot to add a file 2011-07-14 12:42:50 -04:00
Matei Zaharia 38f38dda5b Merge branch 'master' into scala-2.9 2011-07-14 12:42:02 -04:00
Matei Zaharia 969644df8e Cleaned up a few issues to do with default parallelism levels. Also
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia 2fb906e8e5 Merge branch 'master' into scala-2.9 2011-07-14 00:20:14 -04:00
Matei Zaharia 2604939f64 Simplified and documented code a little and added test 2011-07-14 00:19:00 -04:00
Matei Zaharia 2439e51a03 Merge branch 'master' into implicit-sequencefile 2011-07-13 23:20:22 -04:00
Matei Zaharia d0c7958364 Merge branch 'master' into scala-2.9
Conflicts:
	core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia 9c0069188b Updated save code to allow non-file-based OutputFormats and added a test
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia da8a3b8926 Increase default value of spark.locality.wait a little 2011-07-13 20:07:24 -04:00
Matei Zaharia 080869c6ef Merge branch 'master' into scala-2.9 2011-07-13 00:20:08 -04:00
Matei Zaharia 842e14d567 Added mapPartitions operation and a bunch of tests for RDD ops 2011-07-13 00:19:52 -04:00
Matei Zaharia 9b568d37f7 Merge branch 'master' into scala-2.9
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-07-11 22:25:53 -04:00
Matei Zaharia d05fea24f3 Simplified parallel shuffle fetcher to use URLConnection 2011-07-11 22:12:36 -04:00
Matei Zaharia 25c3a7781c Moved PairRDD and SequenceFileRDD functions to separate source files 2011-07-10 00:06:15 -04:00
Matei Zaharia b7f1f62ff5 bug fix 2011-07-09 18:53:02 -04:00
Matei Zaharia 003480f374 Register byte[] with Kryo serializer 2011-07-09 18:08:07 -04:00
Matei Zaharia aea5cb4413 Added parallel shuffle fetcher 2011-07-09 17:25:56 -04:00
Matei Zaharia 4b1646a25f Support for non-filesystem-based Hadoop data sources 2011-07-06 20:37:55 -04:00
Matei Zaharia 07a97d47c2 Support for non-filesystem-based Hadoop data sources 2011-07-06 20:37:34 -04:00
Matei Zaharia 3488c386a9 Initial work to make stuff like sequenceFile[Int, Int] work without
requiring the user to provide a Writable type. The approach here might
not be the best but it seems to work correctly.
2011-06-28 17:07:04 -07:00
Matei Zaharia 5633299ec6 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-06-27 22:50:59 -07:00
Matei Zaharia b0ecf1ee41 Don't pass a null context when running tasks locally 2011-06-27 22:50:43 -07:00
Matei Zaharia 85cad5d9dd Fixed HadoopFileWriter to compile for Scala 2.9 2011-06-27 22:44:14 -07:00
Matei Zaharia 393607d5ef Merge branch 'master' into scala-2.9 2011-06-27 18:08:25 -07:00
Matei Zaharia 2f652f1656 Fix a compile error 2011-06-27 18:07:16 -07:00
Tathagata Das 3f08e1129f Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das ad842ac823 Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia bae8a97968 Merge branch 'master' into scala-2.9
Conflicts:
	repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Matei Zaharia c4dd68ae21 Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.

Conflicts:
	core/src/main/scala/spark/BitTorrentBroadcast.scala
	core/src/main/scala/spark/ChainedBroadcast.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/Utils.scala
	core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
	core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das 38f2ba99cc Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to a ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Olivier Grisel 2e3531d8bf Implemented RDD.leftOuterJoin and RDD.rightOuterJoin 2011-06-24 11:00:51 +02:00
Tathagata Das 3d2befe831 Improved HadoopFileWriter (saves key and value classes to jobconf) 2011-06-23 08:11:22 -07:00
Olivier Grisel 005d1605a4 add missing test for RDD.groupWith 2011-06-23 02:10:52 +02:00
Matei Zaharia 214250016a Added simple version of lookup 2011-06-20 11:59:16 -07:00
Matei Zaharia 23b42af70a Merge branch 'master' into scala-2.9 2011-06-19 23:06:21 -07:00
Matei Zaharia 23b1c309fb Added pipe() operation on RDDs for mapping through a shell command. 2011-06-19 23:05:19 -07:00
Tathagata Das b5e6645505 Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take an user specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das 869836a2fa Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task 2011-06-10 19:47:28 -07:00
Tathagata Das 389e56156f HadoopFileWriter changed to use Hadoop's OutputCommitter 2011-06-09 15:29:22 -07:00
Tathagata Das 24d845833c First-cut implementation of RDD.SaveAsText 2011-06-05 04:14:43 -07:00
Matei Zaharia 3297706ab2 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-06-01 11:46:31 -07:00
Matei Zaharia 9bb448a151 Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57. 2011-06-01 11:45:47 -07:00
Matei Zaharia 850fe3274e Make the runJob API public. Fixes #56. 2011-06-01 11:38:44 -07:00
Ismael Juma 82f10bd794 Remove unnecessary toStream calls. 2011-06-01 16:12:42 +01:00
Matei Zaharia 10fe324845 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-05-31 23:48:11 -07:00
Matei Zaharia 5166d76843 Ensure logging is initialized before spawning any threads to fix issue #45 2011-05-31 23:47:32 -07:00
Matei Zaharia 0afd35a8dd Some docs in ClosureCleaner 2011-05-31 22:06:30 -07:00
Matei Zaharia 8b0390d344 Instantiate NullWritable properly in HadoopFile 2011-05-30 23:54:14 -07:00
Matei Zaharia 4096c2287e Various fixes 2011-05-29 18:46:01 -07:00
Matei Zaharia ef706ae959 Merge branch 'master' into new-rdds-protobuf
Conflicts:
	run
2011-05-29 16:20:23 -07:00
Matei Zaharia c501cff924 Executor was looking for the wrong constructor for ExecutorClassLoader 2011-05-29 16:15:59 -07:00
Ismael Juma 1396678baa Move REPL classes to separate module. 2011-05-27 11:22:50 +01:00
Ismael Juma 051da8b4ad Delete liblzf from lib as it's no longer used. 2011-05-27 11:22:10 +01:00
Ismael Juma ae1a1f91f1 Remove several dependencies from git and configure them as SBT managed dependencies.
Upgrade some of the dependencies while at it.
2011-05-27 11:22:01 +01:00
Ismael Juma 164ef4c751 Use explicit asInstanceOf instead of misleading unchecked pattern matching.
Also enable -unchecked warnings in SBT build file.
2011-05-27 07:57:10 +01:00
Ismael Juma 89c8ea2bb2 Replace deprecated - and -- with suggested filterNot (which is uglier). 2011-05-26 22:22:37 +01:00
Ismael Juma 94f05683bd Replace deprecated first with head. 2011-05-26 22:13:41 +01:00
Ismael Juma 0b6a862b68 Use math instead of Math as the latter is deprecated. 2011-05-26 22:06:36 +01:00
Ismael Juma 1f27d94c48 Use Array.iterator instead of Iterator.fromArray as the latter is deprecated. 2011-05-26 22:04:42 +01:00
Ismael Juma 1993a8e556 Use += instead of + for mutable sequences as the latter is deprecated. 2011-05-26 21:59:48 +01:00
root 5ef938615f Initial work on making stuff compile with protobuf Mesos 2011-05-24 22:27:08 +00:00
Matei Zaharia cec427e777 Fixed a bug with preferred locations having changed meaning in new RDDs 2011-05-22 17:12:29 -07:00
Matei Zaharia 4c888b2933 Fix queue type for executor 2011-05-22 16:42:05 -07:00
Matei Zaharia bea3a33012 doc tweak 2011-05-22 16:03:41 -07:00
Matei Zaharia 9bde5a54cb class loader fix 2011-05-22 16:00:41 -07:00
Matei Zaharia 91c07a33d9 Various fixes to serialization 2011-05-21 22:50:08 -07:00
Matei Zaharia f61b61c4ac Merge branch 'master' into new-rdds 2011-05-21 21:25:58 -07:00
Matei Zaharia 24a1e7f838 Scheduler can now recover from lost map outputs 2011-05-20 00:19:53 -07:00
Matei Zaharia 82329b0b28 Updated scheduler to support running on just some partitions of final RDD 2011-05-19 12:47:09 -07:00
Matei Zaharia 328e51b693 Various minor fixes 2011-05-19 11:19:25 -07:00
Matei Zaharia fd1d255821 Stop objectifying various trackers, caches, etc. 2011-05-17 12:41:13 -07:00
Matei Zaharia 4db50e26c7 Fixed unit tests by making them clean up the SparkContext after use and
thus clean up the various singletons (RDDCache, MapOutputTracker, etc).
This isn't perfect yet (ideally we shouldn't use singleton objects at
all) but we can fix that later.
2011-05-13 12:03:58 -07:00
Matei Zaharia aca8150c52 Ensure that AddedToCache messages make it home before tasks finish 2011-05-13 11:43:52 -07:00
Matei Zaharia 16c886a581 Optimization for count() 2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury db7a2c4897 Issue #42 fixed. 2011-04-28 14:30:48 -07:00
Ankur Dave a4c04f3f6f Error handling for disk I/O in DiskSpillingCache
Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.
2011-04-27 23:23:29 -07:00
Ankur Dave 12ff0d2dc3 Bring an entry back into memory after fetching it from disk 2011-04-27 22:59:05 -07:00
Ankur Dave e30313aa2c Added DiskSpillingCache
DiskSpillingCache is a BoundedMemoryCache that spills entries to disk
when it runs out of space. Currently the implementation is very
simple. In particular, it's missing the following features:

- Error handling for disk I/O, including checking of disk space levels
- Bringing an entry back into memory after fetching it from disk

In addition, here are some features that aren't critical but should be
implemented soon:

- Spilling based on a user-set priority in addition to LRU
- Caching into a subdirectory of spark.DiskSpillingCache.cacheDir
  rather than the root directory
2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury 60d1121343 Refactoring: daemonThreadFactories have all been moved to the Utils
object instead of having multiple copies in Broadcast and Shuffle
objects.
2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury e898e108a3 Cleanup + refactoring... 2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury 0567646180 Shuffle is also working from its own subpackage. 2011-04-27 21:11:41 -07:00
Mosharaf Chowdhury 2742de707a Removed some shuffle implementations. Remaining ones all use local files
to write map outputs.
2011-04-27 20:53:43 -07:00
Mosharaf Chowdhury 9d78779257 Merge branch 'mos-shuffle-tracked' into mos-bt
Conflicts:
	core/src/main/scala/spark/Broadcast.scala
2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury ac7e066383 Merge branch 'master' into mos-shuffle-tracked
Conflicts:
	.gitignore
	core/src/main/scala/spark/LocalFileShuffle.scala
	src/scala/spark/BasicLocalFileShuffle.scala
	src/scala/spark/Broadcast.scala
	src/scala/spark/LocalFileShuffle.scala
2011-04-27 14:35:03 -07:00
Mosharaf Chowdhury 4e4c41026c Added support for custom classes. (from 49ea48) 2011-04-27 12:30:16 -07:00
Mosharaf Chowdhury 65848da8df Refacoring... 2011-04-26 17:41:31 -07:00
Mosharaf Chowdhury b8ab7862b8 Moved broadcast-related code to separate directory under spark.broadcast
package.
2011-04-26 17:22:52 -07:00
Mosharaf Chowdhury e31007248c Merge branch 'master' into mos-bt 2011-04-26 12:04:14 -07:00
Mosharaf Chowdhury 9257a55e3a Refactoring... 2011-04-26 11:45:36 -07:00
Mosharaf Chowdhury 9d2d533493 Temporary fix for issue #42. 2011-04-21 17:40:26 -07:00
Timothy Hunter 5c9535228a fixed small bug when classpath has some strange formatting 2011-04-18 17:12:29 -07:00
Mosharaf Chowdhury a8f47a62b9 Renamed MaxRxPeers to MaxTxPeers to MaxTxSlots and MaxRxSlots
respectively for clarity (most probably they were misunderstood and
misused)
2011-04-13 16:24:19 -07:00
Matei Zaharia 94ba95bcb2 Added flatMapValues 2011-04-12 19:51:58 -07:00
Mosharaf Chowdhury b67a968b5d hasBlocks is now AtomicInteger (even though it was ok) 2011-04-02 22:03:18 -07:00
Mosharaf Chowdhury 5bf3c83b13 BroadcastSuperTracker (right now for BT) is contacted over TCP instead
of direct procedure call.

Need to do the same for others and consolidate all broadcast mechanisms.
2011-04-01 19:31:28 -07:00
Mosharaf Chowdhury 733a130108 Formatting... 2011-04-01 14:51:24 -07:00
Mosharaf Chowdhury 4636aea598 Formatting... 2011-04-01 14:49:59 -07:00
Mosharaf Chowdhury addd569e52 Each broadcasted variable can have different blockSize. Corresponding
logic to adapt blockSize based on network condition is not yet
implemented.

Formatting + consolidation.
2011-03-31 14:51:46 -07:00
Mosharaf Chowdhury 815f3411ec Consolidated Broadcast config params. 2011-03-30 16:45:51 -07:00
Mosharaf Chowdhury a18a28b08e Removed gossip-related code that were already commented out.
More formatting.
2011-03-30 14:22:09 -07:00
Mosharaf Chowdhury 43aceafd70 Formatting... 2011-03-30 12:18:50 -07:00
Mosharaf Chowdhury 73b165220d Random is the default choice; rarestFirst didn't work well in
experiments.
2011-03-29 13:06:43 -07:00
Matei Zaharia d840fa8d0c Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-09 00:40:07 -08:00
root ff5b13799a Some tweaks to make Kryo cache work better 2011-03-09 03:31:50 -05:00
Matei Zaharia 7febdfbe29 Better reuse of buffers in Kryo serialization 2011-03-08 12:36:36 -08:00
Matei Zaharia 8ee3ec29ee Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-08 11:58:19 -08:00
Matei Zaharia 7408230bfa Updated modified Kryo to use objenesis 2011-03-08 11:58:08 -08:00
Matei Zaharia ab1216cb14 Register None and Nil properly 2011-03-08 11:52:58 -08:00
Matei Zaharia d39f5dd15e Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-08 10:28:50 -08:00
Matei Zaharia 4f0d0a7b73 stuff 2011-03-08 10:28:26 -08:00
Matei Zaharia 8b6f3db415 Merge remote branch 'origin/custom-serialization' into new-rdds 2011-03-07 19:20:28 -08:00
Matei Zaharia 38f6bce33d Added SerializingCache 2011-03-07 19:16:24 -08:00
Matei Zaharia 6316c7979d Remove some logging 2011-03-07 18:56:36 -08:00
Matei Zaharia e7b4b047a6 Added pluggable serializers and Kryo serialization 2011-03-07 18:41:53 -08:00
Matei Zaharia 467f056e29 Remove commented code 2011-03-06 23:38:41 -08:00
Matei Zaharia bce95b8458 Finished cogroup stuff 2011-03-06 23:38:16 -08:00
Matei Zaharia 04c2d6a60c stuff 2011-03-06 19:27:03 -08:00
Matei Zaharia 0fb691dd28 Various fixes to get MesosScheduler working with new RDDs 2011-03-06 16:16:38 -08:00
Matei Zaharia 1df5a65a01 Pass cache locations correctly to DAGScheduler. 2011-03-06 12:16:38 -08:00
Matei Zaharia e1436f1eaa Merge remote branch 'origin/master' into new-rdds 2011-03-06 11:11:47 -08:00
Matei Zaharia 370b95816f Added sampling for large arrays in SizeEstimator 2011-03-06 11:11:20 -08:00
Matei Zaharia a789e9aaea Merge remote branch 'origin/master' into new-rdds 2011-03-01 10:33:37 -08:00
Matei Zaharia 021c50a8d4 Remove unnecessary lock which was there to work around a bug in
Configuration in Hadoop 0.20.0
2011-03-01 10:28:38 -08:00
Matei Zaharia adaba4d550 Removed old slf4j jars that came with Hadoop 2011-03-01 10:28:21 -08:00
Matei Zaharia 447debb771 Updated Hadoop to 0.20.2 to include some bug fixes 2011-03-01 10:27:48 -08:00
Matei Zaharia 9e59afd710 More work on new RDD design 2011-02-27 19:15:52 -08:00
Matei Zaharia f38f86d59e More stuff 2011-02-27 14:27:12 -08:00
Matei Zaharia 2e6023f2bf stuff 2011-02-26 23:41:44 -08:00
Matei Zaharia 309367c477 Initial work towards new RDD design 2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury 0416cc22d2 Picking peers weighted by the number of rare blocks they have. A block is rare if there are at most 2 copies in the neighborhood. Better number can be used (some function of neighborhood size) 2011-02-15 16:27:44 -08:00
Mosharaf Chowdhury cf81da9485 Optimization: Master sends out at least one copy of each block first regardless of whatever a client is asking for. Once one copy of each block is out, Master then responds to specific blocks from individual receivers. 2011-02-14 15:08:33 -08:00
Mosharaf Chowdhury 2b946fb2d1 pickBlockRarestFirst and gossips commented OUT for now.
Problem with the rarestFirst implemention is that we are picking peers randomly first and then picking blocks from the random peer using rarestFirst. NOT the right away to do it. It should be the other way around.
Problem with gossip is that peers might end up overwriting newer information by older ones. To fix that we either have to have timestamps or must match the bitVectors before overwriting.
2011-02-13 13:53:15 -08:00
Mosharaf Chowdhury ca2895ebb0 Fix in rarestFirst implemenation.
If there are more than one rarest blocks, pick randomly between them (was deterministic before)
2011-02-10 20:37:44 -08:00
Mosharaf Chowdhury 520bbdc7e3 Peers now gossip about their neighbors when they talk. 2011-02-10 20:15:30 -08:00
Matei Zaharia dc24aecd8f Close record readers in HadoopFile after finishing a split 2011-02-10 12:07:48 -08:00
Mosharaf Chowdhury 441462bc7f Fixed some warnings during compilation. 2011-02-09 12:11:43 -08:00
Mosharaf Chowdhury 1a73c0d265 Merged with master. Using sbt. 2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury 495b38658e Merge branch 'master' into mos-bt 2011-02-09 10:40:23 -08:00
Matei Zaharia 99f3f23efa Changed default shuffle to LocalFileShuffle because it's way faster for small files 2011-02-08 17:03:03 -08:00
Matei Zaharia ec28b607fd Merge branch 'master' into sbt
Conflicts:
	Makefile
	core/src/main/java/spark/compress/lzf/LZF.java
	core/src/main/java/spark/compress/lzf/LZFInputStream.java
	core/src/main/java/spark/compress/lzf/LZFOutputStream.java
	core/src/main/native/spark_compress_lzf_LZF.c
	run
2011-02-02 00:25:54 -08:00
Matei Zaharia e5c4cd8a5e Made examples and core subprojects 2011-02-01 15:11:08 -08:00