Commit graph

176 commits

Author SHA1 Message Date
Tathagata Das def8126d77 Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags. 2013-02-14 17:49:43 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Prashant Sharma 291dd47c7f Taking FeederActor out as seperate program 2013-02-08 14:34:07 +05:30
Tathagata Das 4cc223b478 Merge branch 'mesos-master' into streaming 2013-02-07 13:59:31 -08:00
Tathagata Das 12300758cc Merge pull request #372 from Reinvigorate/sm-kafka
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Patrick Wendell dab81a8511 Fixing to match Spark styleguide 2013-02-05 20:57:04 -08:00
Patrick Wendell cc37601ecb Adding an example with an OLAP roll-up 2013-02-04 14:18:11 -08:00
Mikhail Bautin fe3eceab57 Remove activation of profiles by default
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Prashant Sharma 4496bf197b Improved document comment in example 2013-01-25 14:34:38 +05:30
Prashant Sharma d17065c4b5 actor as receiver 2013-01-22 13:28:29 +05:30
Prashant Sharma 43bfd7bb21 Changed method name of createReceiver to getReceiver as it is not intended to be a factory. 2013-01-21 11:39:30 +05:30
Matei Zaharia 6e3754bf47 Add Maven build file for streaming, and fix some issues in SBT file
As part of this, changed our Scala 2.9.2 Kafka library to be available
as a local Maven repository, following the example in
(http://blog.dub.podval.org/2010/01/maven-in-project-repository.html)
2013-01-20 19:22:24 -08:00
Matei Zaharia 86057ec7c8 Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-01-20 12:47:55 -08:00
Matei Zaharia 2a8c2a6790 Minor formatting fixes 2013-01-20 10:24:53 -08:00
Tathagata Das 4f8fe58b25 Merge branch 'mesos-streaming' into streaming
Conflicts:
	core/src/main/scala/spark/api/java/JavaRDDLike.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
	core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Prashant Sharma 56b9bd197c Plug in actor as stream receiver API 2013-01-19 22:04:07 +05:30
Prashant Sharma bb6ab92e31 Changed method name of createReceiver to getReceiver as it is not intended to be a factory. 2013-01-19 22:04:07 +05:30
seanm d3064fe707 kafkaStream API cleanup. A quorum of zookeepers can now be specified 2013-01-18 21:34:29 -07:00
Patrick Wendell 12b72b3e73 NetworkWordCount example 2013-01-17 22:37:56 -08:00
Patrick Wendell e0165bf714 Adding queueStream and some slight refactoring 2013-01-17 21:25:49 -08:00
Patrick Wendell 6fba7683c2 Small doc fix 2013-01-17 18:46:24 -08:00
Nick Pentreath a5ba7a9f32 Use only one update function and pass in transpose of ratings matrix where appropriate 2013-01-17 16:21:00 +02:00
Nick Pentreath a512df551f Fixed index error missing first argument 2013-01-17 16:05:27 +02:00
Nick Pentreath 42fbef3c2a Adding default command line args to SparkALS 2013-01-17 15:54:59 +02:00
Tathagata Das cd1521cfdb Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	docs/_layouts/global.html
	docs/index.md
	run
2013-01-15 12:08:51 -08:00
Patrick Wendell d182a57cae Two changes:
- Updating countByX() types based on bug fix
- Porting new documentation to Java
2013-01-14 10:03:55 -08:00
Patrick Wendell 3461cd99b7 Flume example and bug fix 2013-01-14 09:42:36 -08:00
Tathagata Das 0a2e333341 Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream. 2013-01-13 16:18:39 -08:00
Eric Zhang ba06e9c97c Update examples/src/main/scala/spark/examples/LocalLR.scala
fix spelling mistake
2013-01-13 15:33:11 +08:00
Shivaram Venkataraman bbc56d85ed Rename environment variable for hadoop profiles to hadoopVersion 2013-01-12 15:24:13 -08:00
Shivaram Venkataraman 9262522306 Activate hadoop2 profile in pom.xml with -Dhadoop=2 2013-01-10 22:07:34 -08:00
Shivaram Venkataraman f7adb382ac Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now
by using -Dhadoop -Phadoop2.
2013-01-08 03:19:43 -08:00
Patrick Wendell 6c502e3793 Making the Twitter example distributed.
This adds a distributed (receiver-based) implementation of the
Twitter dstream. It also changes the example to perform a
distributed sort rather than collecting the dataset at one node.
2013-01-07 22:01:11 -08:00
Tathagata Das 8c1b872512 Moved Twitter example to the where the other examples are. 2013-01-07 17:48:10 -08:00
Shivaram Venkataraman 4bbe07e5ec Activate hadoop1 profile by default for maven builds 2013-01-07 17:46:22 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Tathagata Das af8738dfb5 Moved Spark Streaming examples to examples sub-project. 2013-01-06 19:31:54 -08:00
Thomas Dudziak 02d64f9662 Mark hadoop dependencies provided in all library artifacts 2012-12-10 21:27:54 -08:00
Matei Zaharia ccff0a089a Use the same output directories that SBT had in subprojects
This will make it easier to make the "run" script work with a Maven build
2012-12-10 10:58:56 -08:00
Thomas Dudziak 3b643e86bc Updated versions in the pom.xml files to match current master 2012-11-27 17:50:42 -08:00
Thomas Dudziak 69297c64be Addressed code review comments 2012-11-27 15:45:16 -08:00
Thomas Dudziak 811a32257b Added maven and debian build files 2012-11-20 16:19:51 -08:00
root acf8272324 Fix K-means example a little 2012-11-10 23:07:21 -08:00
Matei Zaharia 8d7b77bcb5 Some doc and usability improvements:
- Added a StorageLevels class for easy access to StorageLevel constants
  in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Mosharaf Chowdhury 119e50c7b9 Conflict fixed 2012-10-02 22:25:39 -07:00
Matei Zaharia 56c90485fd More updates to documentation 2012-09-25 19:31:07 -07:00
Mosharaf Chowdhury 3883532545 Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations. 2012-08-30 21:43:00 -07:00
Josh Rosen 566feafe1d Cache points in SparkLR example. 2012-08-26 15:24:43 -07:00
Matei Zaharia 6ae3c375a9 Renamed apply() to call() in Java API and allowed it to throw Exceptions 2012-08-12 23:10:19 +02:00
Imran Rashid edc6972f8e move Vector class into core and spark.util package 2012-07-28 20:15:42 -07:00
Josh Rosen 2a60c998cc Remove StringOps.split() from Java WordCount. 2012-07-25 10:13:06 -07:00
Josh Rosen 6a78e88237 Minor cleanup and optimizations in Java API.
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
2012-07-24 09:47:00 -07:00
Josh Rosen 460da878fc Improve Java API examples
- Replace JavaLR example with JavaHdfsLR example.
- Use anonymous classes in JavaWordCount; add options.
- Remove @Override annotations.
2012-07-22 14:40:39 -07:00
Josh Rosen 01dce3f569 Add Java API
Add distinct() method to RDD.

Fix bug in DoubleRDDFunctions.
2012-07-18 17:34:29 -07:00
Matei Zaharia 28fed4ce3b Add System.exit(0) at the end of all the example programs. 2012-06-05 23:31:28 -07:00
haoyuan 651932e703 Format the code as coding style agreed by Matei/TD/Haoyuan 2012-02-09 13:26:23 -08:00
Matei Zaharia 100e800782 Some fixes to the examples (mostly to use functional API) 2012-01-31 00:33:18 -08:00
Matei Zaharia fabcc82528 Merge pull request #103 from edisontung/master
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia 3034fc0d91 Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16' 2011-12-14 18:19:43 +01:00
Matei Zaharia 72c4839c5f Fixed LocalFileLR to deal with a change in Scala IO sources
(you can no longer iterate over a Source multiple times).
2011-12-01 13:52:12 -08:00
Edison Tung 42f8847a21 Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD 2011-12-01 13:43:25 -08:00
Edison Tung e1c814be4c Renamed SparkLocalKMeans to SparkKMeans 2011-12-01 13:34:03 -08:00
Edison Tung 3b9d9de583 Added KMeans examples
LocalKMeans runs locally with a randomly generated dataset.
SparkLocalKMeans takes an input file and runs KMeans on it.
2011-11-21 16:37:58 -08:00
Ankur Dave 35b6358a7c Report errors in tasks to the driver via a Mesos status update
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.

Here's what the reporting currently looks like:

    # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
    [...]
    11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
    11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
    [...]
    11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia d4c8e69dc7 K-means example 2011-11-01 19:25:58 -07:00
Ismael Juma 0fba22b3d2 Fix issue #65: Change @serializable to extends Serializable in 2.9 branch
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
Matei Zaharia c4dd68ae21 Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.

Conflicts:
	core/src/main/scala/spark/BitTorrentBroadcast.scala
	core/src/main/scala/spark/ChainedBroadcast.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/Utils.scala
	core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
	core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Ismael Juma 26000af4fa Replace deprecated fromFunction with either tabulate or fill.
tabulate used if indexed used by function and fill otherwise.
2011-05-26 22:12:11 +01:00
Ismael Juma 0b6a862b68 Use math instead of Math as the latter is deprecated. 2011-05-26 22:06:36 +01:00
Mosharaf Chowdhury 9d78779257 Merge branch 'mos-shuffle-tracked' into mos-bt
Conflicts:
	core/src/main/scala/spark/Broadcast.scala
2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury ac7e066383 Merge branch 'master' into mos-shuffle-tracked
Conflicts:
	.gitignore
	core/src/main/scala/spark/LocalFileShuffle.scala
	src/scala/spark/BasicLocalFileShuffle.scala
	src/scala/spark/Broadcast.scala
	src/scala/spark/LocalFileShuffle.scala
2011-04-27 14:35:03 -07:00
Matei Zaharia 309367c477 Initial work towards new RDD design 2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury 1a73c0d265 Merged with master. Using sbt. 2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury 495b38658e Merge branch 'master' into mos-bt 2011-02-09 10:40:23 -08:00
Matei Zaharia a11fe23017 Moved examples to spark.examples package 2011-02-02 16:30:27 -08:00
Matei Zaharia e5c4cd8a5e Made examples and core subprojects 2011-02-01 15:11:08 -08:00