Commit graph

496 commits

Author SHA1 Message Date
Matei Zaharia 25c3a7781c Moved PairRDD and SequenceFileRDD functions to separate source files 2011-07-10 00:06:15 -04:00
Matei Zaharia b7f1f62ff5 bug fix 2011-07-09 18:53:02 -04:00
Matei Zaharia 003480f374 Register byte[] with Kryo serializer 2011-07-09 18:08:07 -04:00
Matei Zaharia aea5cb4413 Added parallel shuffle fetcher 2011-07-09 17:25:56 -04:00
Matei Zaharia 4b1646a25f Support for non-filesystem-based Hadoop data sources 2011-07-06 20:37:55 -04:00
Matei Zaharia b0ecf1ee41 Don't pass a null context when running tasks locally 2011-06-27 22:50:43 -07:00
Matei Zaharia 2f652f1656 Fix a compile error 2011-06-27 18:07:16 -07:00
tdas ae63972a89 Merge pull request #64 from mesos/td-rdd-save
Functionality to save RDDs to Hadoop files
2011-06-27 13:44:55 -07:00
Tathagata Das 3f08e1129f Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das ad842ac823 Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia b187675b68 Print version number 0.3 in REPL 2011-06-26 18:27:01 -07:00
Matei Zaharia c4dd68ae21 Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.

Conflicts:
	core/src/main/scala/spark/BitTorrentBroadcast.scala
	core/src/main/scala/spark/ChainedBroadcast.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/Utils.scala
	core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
	core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das 38f2ba99cc Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Matei Zaharia b626562d54 Merge pull request #63 from ogrisel/outer-join
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 12:22:15 -07:00
Olivier Grisel 2e3531d8bf Implemented RDD.leftOuterJoin and RDD.rightOuterJoin 2011-06-24 11:00:51 +02:00
Matei Zaharia 095dd9c444 Merge pull request #62 from ogrisel/cogroup-test
Add missing test for RDD.groupWith
2011-06-23 10:12:24 -07:00
Matei Zaharia e8e35d5fb5 Merge pull request #61 from ogrisel/better-readme
Better readme
2011-06-23 10:11:14 -07:00
Tathagata Das 3d2befe831 Improved HadoopFileWriter (saves key and value classes to jobconf) 2011-06-23 08:11:22 -07:00
Olivier Grisel 7ef48a4df0 typo 2011-06-23 02:28:17 +02:00
Olivier Grisel 5b9e0a126d format 2011-06-23 02:27:14 +02:00
Olivier Grisel 236bcd0d9b Markdown rendering for the toplevel README.md to improve readability on github 2011-06-23 02:24:04 +02:00
Olivier Grisel 005d1605a4 add missing test for RDD.groupWith 2011-06-23 02:10:52 +02:00
Matei Zaharia 214250016a Added simple version of lookup 2011-06-20 11:59:16 -07:00
Matei Zaharia 23b1c309fb Added pipe() operation on RDDs for mapping through a shell command. 2011-06-19 23:05:19 -07:00
Tathagata Das b5e6645505 Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das 869836a2fa Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task 2011-06-10 19:47:28 -07:00
Tathagata Das 389e56156f HadoopFileWriter changed to use Hadoop's OutputCommitter 2011-06-09 15:29:22 -07:00
Matei Zaharia a413b8e59d Merge pull request #59 from ijuma/master
Move managedStyle to SparkProject
2011-06-07 00:41:50 -07:00
Tathagata Das 24d845833c First-cut implementation of RDD.SaveAsText 2011-06-05 04:14:43 -07:00
Ismael Juma 1ad4dcd3de Move managedStyle to SparkProject.
I had added it to DepJar by mistake.
2011-06-02 14:06:54 +01:00
Matei Zaharia 9bb448a151 Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57. 2011-06-01 11:45:47 -07:00
Matei Zaharia 850fe3274e Make the runJob API public. Fixes #56. 2011-06-01 11:38:44 -07:00
Matei Zaharia 5166d76843 Ensure logging is initialized before spawning any threads to fix issue #45 2011-05-31 23:47:32 -07:00
Matei Zaharia 96daa31a01 Pass quoted arguments properly to run 2011-05-31 23:34:16 -07:00
Matei Zaharia 7862995569 Give SBT a bit more memory so it can do an update / compile / test in one JVM 2011-05-31 23:33:47 -07:00
Matei Zaharia 8b0390d344 Instantiate NullWritable properly in HadoopFile 2011-05-30 23:54:14 -07:00
Matei Zaharia c501cff924 Executor was looking for the wrong constructor for ExecutorClassLoader 2011-05-29 16:15:59 -07:00
Matei Zaharia 50ac1d2a40 Merge remote-tracking branch 'ijuma/issue51' 2011-05-29 15:41:12 -07:00
Ismael Juma 0c62ee4321 Depend on jetty-server in compile scope and upgrade to 7.4.2.
As Matei described: "We're using Jetty to run an HTTP server, not to embed Spark
in a webapp"
2011-05-29 20:12:50 +01:00
Matei Zaharia 22c8b84d8b Merge pull request #53 from ijuma/master
Use explicit asInstanceOf instead of misleading unchecked pattern matching.
2011-05-27 10:20:54 -07:00
Ismael Juma e3b323321d Use ManagedStyle.Maven. 2011-05-27 14:56:01 +01:00
Ismael Juma 3a6b0b8a57 Publish javadoc and sources. 2011-05-27 14:55:51 +01:00
Ismael Juma 59f1f42a9a Update run to work with SBT managed dependencies and the newly introduced repl module. 2011-05-27 11:22:59 +01:00
Ismael Juma 3af6003c87 Update sbt to 0.7.7. 2011-05-27 11:22:59 +01:00
Ismael Juma 1396678baa Move REPL classes to separate module. 2011-05-27 11:22:50 +01:00
Ismael Juma 3e8114ddbd Change project.organization to org.spark-project to fit Maven convention. 2011-05-27 11:22:10 +01:00
Ismael Juma 7b7dfdb085 Set project.version to 0.3-SNAPSHOT. 2011-05-27 11:22:10 +01:00
Ismael Juma 051da8b4ad Delete liblzf from lib as it's no longer used. 2011-05-27 11:22:10 +01:00
Ismael Juma ae1a1f91f1 Remove several dependencies from git and configure them as SBT managed dependencies.
Upgrade some of the dependencies while at it.
2011-05-27 11:22:01 +01:00
Ismael Juma 164ef4c751 Use explicit asInstanceOf instead of misleading unchecked pattern matching.
Also enable -unchecked warnings in SBT build file.
2011-05-27 07:57:10 +01:00