Commit graph

213 commits

Author SHA1 Message Date
Matei Zaharia 1069740264 Added a jarOfObject method to get the JAR of the class that an object
belongs to, which seems like a more common case.
2011-08-29 23:27:10 -07:00
Matei Zaharia 0aa23bf17e Added a convenience method for getting the JAR file that loaded a class
(useful for jobs to pass their own JAR files to SparkContext).
2011-08-29 22:59:44 -07:00
Matei Zaharia a161f00610 Made a log message slightly less ugly 2011-08-27 16:58:54 -07:00
Matei Zaharia 3759bcd061 New mesos.jar 2011-08-10 14:03:48 -07:00
Matei Zaharia c22043f150 Minor fix: can use >= when checking memory 2011-08-02 19:11:17 -07:00
Ismael Juma 6ff57f5594 Use scala.math instead of Math as the latter is deprecated. 2011-08-02 10:25:47 +01:00
Ismael Juma 620de2dd1d Change currentThread to Thread.currentThread as the former is deprecated. 2011-08-02 10:25:16 +01:00
Ismael Juma 0fba22b3d2 Fix issue #65: Change @serializable to extends Serializable in 2.9 branch
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
Matei Zaharia 711575391d Merge branch 'scala-2.9'
Conflicts:
	project/build/SparkProject.scala
2011-08-01 15:25:26 -07:00
Matei Zaharia 4050d661c5 Updated to newest Mesos API, which includes better memory accounting
by specifying per-executor memory.
2011-08-01 13:54:48 -07:00
Matei Zaharia d12122502b Various improvements to Kryo serializer:
- Replaced modified Kryo version with the standard one augmented with
  the kryo-serializers package, which includes support for classes with
  no-arg constructors (that was why we had a modified Kryo before)
- The kryo-serializers version also fixes issue #72.
- Added a bunch of tests.
- Serialize maps and a few other common types properly by default.
2011-07-21 22:09:33 -07:00
Matei Zaharia baa72e2747 Removed a debug statement that slipped in as a println 2011-07-21 16:09:33 -07:00
Matei Zaharia 2bfd7931e8 Merge branch 'new-rdds-protobuf'
Conflicts:
	core/src/main/scala/spark/Executor.scala
	core/src/main/scala/spark/RDD.scala
2011-07-21 16:08:39 -07:00
Matei Zaharia 1450fd74d9 Merge branch 'master' into scala-2.9 2011-07-14 17:37:24 -04:00
Matei Zaharia ccf48388cd Lowered default number of splits for files 2011-07-14 17:37:04 -04:00
Matei Zaharia 146a18c2a4 Merge branch 'master' into scala-2.9 2011-07-14 17:29:17 -04:00
Matei Zaharia c8eb8b2b90 Set class loader for remote actors to fix a bug that happens in 2.9 2011-07-14 17:29:11 -04:00
Matei Zaharia 8ea67307b9 Merge branch 'master' into scala-2.9 2011-07-14 14:47:12 -04:00
Matei Zaharia e4c3402d2d Renamed ParallelArray to ParallelCollection 2011-07-14 14:47:01 -04:00
Matei Zaharia 9ac461d85d Remove RDD.toString because it looked confusing 2011-07-14 14:39:32 -04:00
Matei Zaharia 797b4547c3 Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter 2011-07-14 14:08:34 -04:00
Matei Zaharia 3efd9e94d8 Merge branch 'master' into scala-2.9 2011-07-14 12:42:57 -04:00
Matei Zaharia 0ccfe20755 Forgot to add a file 2011-07-14 12:42:50 -04:00
Matei Zaharia 38f38dda5b Merge branch 'master' into scala-2.9 2011-07-14 12:42:02 -04:00
Matei Zaharia 969644df8e Cleaned up a few issues to do with default parallelism levels. Also
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia 2fb906e8e5 Merge branch 'master' into scala-2.9 2011-07-14 00:20:14 -04:00
Matei Zaharia 2604939f64 Simplified and documented code a little and added test 2011-07-14 00:19:00 -04:00
Matei Zaharia 2439e51a03 Merge branch 'master' into implicit-sequencefile 2011-07-13 23:20:22 -04:00
Matei Zaharia d0c7958364 Merge branch 'master' into scala-2.9
Conflicts:
	core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia 9c0069188b Updated save code to allow non-file-based OutputFormats and added a test
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia da8a3b8926 Increase default value of spark.locality.wait a little 2011-07-13 20:07:24 -04:00
Matei Zaharia 080869c6ef Merge branch 'master' into scala-2.9 2011-07-13 00:20:08 -04:00
Matei Zaharia 842e14d567 Added mapPartitions operation and a bunch of tests for RDD ops 2011-07-13 00:19:52 -04:00
Matei Zaharia 9b568d37f7 Merge branch 'master' into scala-2.9
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-07-11 22:25:53 -04:00
Matei Zaharia d05fea24f3 Simplified parallel shuffle fetcher to use URLConnection 2011-07-11 22:12:36 -04:00
Matei Zaharia 25c3a7781c Moved PairRDD and SequenceFileRDD functions to separate source files 2011-07-10 00:06:15 -04:00
Matei Zaharia b7f1f62ff5 bug fix 2011-07-09 18:53:02 -04:00
Matei Zaharia 003480f374 Register byte[] with Kryo serializer 2011-07-09 18:08:07 -04:00
Matei Zaharia aea5cb4413 Added parallel shuffle fetcher 2011-07-09 17:25:56 -04:00
Matei Zaharia 4b1646a25f Support for non-filesystem-based Hadoop data sources 2011-07-06 20:37:55 -04:00
Matei Zaharia 07a97d47c2 Support for non-filesystem-based Hadoop data sources 2011-07-06 20:37:34 -04:00
Matei Zaharia 3488c386a9 Initial work to make stuff like sequenceFile[Int, Int] work without
requiring the user to provide a Writable type. The approach here might
not be the best but it seems to work correctly.
2011-06-28 17:07:04 -07:00
Matei Zaharia 5633299ec6 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-06-27 22:50:59 -07:00
Matei Zaharia b0ecf1ee41 Don't pass a null context when running tasks locally 2011-06-27 22:50:43 -07:00
Matei Zaharia 85cad5d9dd Fixed HadoopFileWriter to compile for Scala 2.9 2011-06-27 22:44:14 -07:00
Matei Zaharia 393607d5ef Merge branch 'master' into scala-2.9 2011-06-27 18:08:25 -07:00
Matei Zaharia 2f652f1656 Fix a compile error 2011-06-27 18:07:16 -07:00
Tathagata Das 3f08e1129f Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das ad842ac823 Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia bae8a97968 Merge branch 'master' into scala-2.9
Conflicts:
	repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Matei Zaharia c4dd68ae21 Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.

Conflicts:
	core/src/main/scala/spark/BitTorrentBroadcast.scala
	core/src/main/scala/spark/ChainedBroadcast.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/Utils.scala
	core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
	core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das 38f2ba99cc Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Olivier Grisel 2e3531d8bf Implemented RDD.leftOuterJoin and RDD.rightOuterJoin 2011-06-24 11:00:51 +02:00
Tathagata Das 3d2befe831 Improved HadoopFileWriter (saves key and value classes to jobconf) 2011-06-23 08:11:22 -07:00
Olivier Grisel 005d1605a4 add missing test for RDD.groupWith 2011-06-23 02:10:52 +02:00
Matei Zaharia 214250016a Added simple version of lookup 2011-06-20 11:59:16 -07:00
Matei Zaharia 23b42af70a Merge branch 'master' into scala-2.9 2011-06-19 23:06:21 -07:00
Matei Zaharia 23b1c309fb Added pipe() operation on RDDs for mapping through a shell command. 2011-06-19 23:05:19 -07:00
Tathagata Das b5e6645505 Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das 869836a2fa Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task 2011-06-10 19:47:28 -07:00
Tathagata Das 389e56156f HadoopFileWriter changed to use Hadoop's OutputCommitter 2011-06-09 15:29:22 -07:00
Tathagata Das 24d845833c First-cut implementation of RDD.SaveAsText 2011-06-05 04:14:43 -07:00
Matei Zaharia 3297706ab2 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-06-01 11:46:31 -07:00
Matei Zaharia 9bb448a151 Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57. 2011-06-01 11:45:47 -07:00
Matei Zaharia 850fe3274e Make the runJob API public. Fixes #56. 2011-06-01 11:38:44 -07:00
Ismael Juma 82f10bd794 Remove unnecessary toStream calls. 2011-06-01 16:12:42 +01:00
Matei Zaharia 10fe324845 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-05-31 23:48:11 -07:00
Matei Zaharia 5166d76843 Ensure logging is initialized before spawning any threads to fix issue #45 2011-05-31 23:47:32 -07:00
Matei Zaharia 0afd35a8dd Some docs in ClosureCleaner 2011-05-31 22:06:30 -07:00
Matei Zaharia 8b0390d344 Instantiate NullWritable properly in HadoopFile 2011-05-30 23:54:14 -07:00
Matei Zaharia 4096c2287e Various fixes 2011-05-29 18:46:01 -07:00
Matei Zaharia ef706ae959 Merge branch 'master' into new-rdds-protobuf
Conflicts:
	run
2011-05-29 16:20:23 -07:00
Matei Zaharia c501cff924 Executor was looking for the wrong constructor for ExecutorClassLoader 2011-05-29 16:15:59 -07:00
Ismael Juma 1396678baa Move REPL classes to separate module. 2011-05-27 11:22:50 +01:00
Ismael Juma 051da8b4ad Delete liblzf from lib as it's no longer used. 2011-05-27 11:22:10 +01:00
Ismael Juma ae1a1f91f1 Remove several dependencies from git and configure them as SBT managed dependencies.
Upgrade some of the dependencies while at it.
2011-05-27 11:22:01 +01:00
Ismael Juma 164ef4c751 Use explicit asInstanceOf instead of misleading unchecked pattern matching.
Also enable -unchecked warnings in SBT build file.
2011-05-27 07:57:10 +01:00
Ismael Juma 89c8ea2bb2 Replace deprecated - and -- with suggested filterNot (which is uglier). 2011-05-26 22:22:37 +01:00
Ismael Juma 94f05683bd Replace deprecated first with head. 2011-05-26 22:13:41 +01:00
Ismael Juma 0b6a862b68 Use math instead of Math as the latter is deprecated. 2011-05-26 22:06:36 +01:00
Ismael Juma 1f27d94c48 Use Array.iterator instead of Iterator.fromArray as the latter is deprecated. 2011-05-26 22:04:42 +01:00
Ismael Juma 1993a8e556 Use += instead of + for mutable sequences as the latter is deprecated. 2011-05-26 21:59:48 +01:00
root 5ef938615f Initial work on making stuff compile with protobuf Mesos 2011-05-24 22:27:08 +00:00
Matei Zaharia cec427e777 Fixed a bug with preferred locations having changed meaning in new RDDs 2011-05-22 17:12:29 -07:00
Matei Zaharia 4c888b2933 Fix queue type for executor 2011-05-22 16:42:05 -07:00
Matei Zaharia bea3a33012 doc tweak 2011-05-22 16:03:41 -07:00
Matei Zaharia 9bde5a54cb class loader fix 2011-05-22 16:00:41 -07:00
Matei Zaharia 91c07a33d9 Various fixes to serialization 2011-05-21 22:50:08 -07:00
Matei Zaharia f61b61c4ac Merge branch 'master' into new-rdds 2011-05-21 21:25:58 -07:00
Matei Zaharia 24a1e7f838 Scheduler can now recover from lost map outputs 2011-05-20 00:19:53 -07:00
Matei Zaharia 82329b0b28 Updated scheduler to support running on just some partitions of final RDD 2011-05-19 12:47:09 -07:00
Matei Zaharia 328e51b693 Various minor fixes 2011-05-19 11:19:25 -07:00
Matei Zaharia fd1d255821 Stop objectifying various trackers, caches, etc. 2011-05-17 12:41:13 -07:00
Matei Zaharia 4db50e26c7 Fixed unit tests by making them clean up the SparkContext after use and
thus clean up the various singletons (RDDCache, MapOutputTracker, etc).
This isn't perfect yet (ideally we shouldn't use singleton objects at
all) but we can fix that later.
2011-05-13 12:03:58 -07:00
Matei Zaharia aca8150c52 Ensure that AddedToCache messages make it home before tasks finish 2011-05-13 11:43:52 -07:00
Matei Zaharia 16c886a581 Optimization for count() 2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury db7a2c4897 Issue #42 fixed. 2011-04-28 14:30:48 -07:00
Ankur Dave a4c04f3f6f Error handling for disk I/O in DiskSpillingCache
Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.
2011-04-27 23:23:29 -07:00
Ankur Dave 12ff0d2dc3 Bring an entry back into memory after fetching it from disk 2011-04-27 22:59:05 -07:00
Ankur Dave e30313aa2c Added DiskSpillingCache
DiskSpillingCache is a BoundedMemoryCache that spills entries to disk
when it runs out of space. Currently the implementation is very
simple. In particular, it's missing the following features:

- Error handling for disk I/O, including checking of disk space levels
- Bringing an entry back into memory after fetching it from disk

In addition, here are some features that aren't critical but should be
implemented soon:

- Spilling based on a user-set priority in addition to LRU
- Caching into a subdirectory of spark.DiskSpillingCache.cacheDir
  rather than the root directory
2011-04-27 22:32:35 -07:00
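
The three DiskSpillingCache commits at the end of this log (e30313aa2c, 12ff0d2dc3, a4c04f3f6f) describe a bounded in-memory cache that spills entries to disk, brings an entry back into memory after it is fetched from disk, and handles disk I/O errors. The following is a minimal sketch of that idea only, not the Spark implementation: it is written in Python rather than Scala, it bounds the cache by entry count rather than by memory size, and the class name `DiskSpillingCacheSketch`, its methods, and the file naming are all hypothetical.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class DiskSpillingCacheSketch:
    """Hypothetical LRU cache that spills evicted entries to disk.

    Illustrative only: bounded by entry count, not by bytes.
    """

    def __init__(self, max_entries, cache_dir=None):
        self.max_entries = max_entries
        # Assumption: spill into a fresh temp directory unless one is given.
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="spill-cache-")
        self.memory = OrderedDict()  # key -> value, least recently used first
        self.on_disk = {}            # key -> path of the spilled file
        self._next_id = 0

    def put(self, key, value):
        # A new value makes any spilled copy of the same key stale.
        self._drop_disk_copy(key)
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        path = self.on_disk.get(key)
        if path is None:
            return None
        try:
            with open(path, "rb") as f:
                value = pickle.load(f)
        except (OSError, pickle.UnpicklingError, EOFError):
            # Disk read failed: treat the entry as a cache miss.
            self.on_disk.pop(key, None)
            return None
        # Bring the entry back into memory after fetching it from disk.
        self._drop_disk_copy(key)
        self.memory[key] = value
        self._evict_if_needed()
        return value

    def _evict_if_needed(self):
        while len(self.memory) > self.max_entries:
            old_key, old_value = self.memory.popitem(last=False)
            path = os.path.join(self.cache_dir, "entry-%d.bin" % self._next_id)
            self._next_id += 1
            try:
                with open(path, "wb") as f:
                    pickle.dump(old_value, f)
                self.on_disk[old_key] = path
            except OSError:
                # Disk write failed: the evicted entry is simply dropped.
                pass

    def _drop_disk_copy(self, key):
        path = self.on_disk.pop(key, None)
        if path is not None:
            try:
                os.remove(path)
            except OSError:
                pass
```

With `max_entries=2`, putting three keys spills the least recently used one to disk, and a later `get` on that key reloads it into memory and spills another entry in its place. A cache miss returns `None` in this sketch; how the original class reports misses is not stated in the log.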