Commit graph

62 commits

Author SHA1 Message Date
Matei Zaharia 6e3754bf47 Add Maven build file for streaming, and fix some issues in SBT file
As part of this, changed our Scala 2.9.2 Kafka library to be available
as a local Maven repository, following the example in
(http://blog.dub.podval.org/2010/01/maven-in-project-repository.html)
2013-01-20 19:22:24 -08:00
Matei Zaharia 86057ec7c8 Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-01-20 12:47:55 -08:00
Matei Zaharia 2a8c2a6790 Minor formatting fixes 2013-01-20 10:24:53 -08:00
Tathagata Das 4f8fe58b25 Merge branch 'mesos-streaming' into streaming
Conflicts:
	core/src/main/scala/spark/api/java/JavaRDDLike.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
	core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Patrick Wendell 12b72b3e73 NetworkWordCount example 2013-01-17 22:37:56 -08:00
Patrick Wendell e0165bf714 Adding queueStream and some slight refactoring 2013-01-17 21:25:49 -08:00
Patrick Wendell 6fba7683c2 Small doc fix 2013-01-17 18:46:24 -08:00
Nick Pentreath a5ba7a9f32 Use only one update function and pass in transpose of ratings matrix where appropriate 2013-01-17 16:21:00 +02:00
Nick Pentreath a512df551f Fixed index error missing first argument 2013-01-17 16:05:27 +02:00
Nick Pentreath 42fbef3c2a Adding default command line args to SparkALS 2013-01-17 15:54:59 +02:00
Tathagata Das cd1521cfdb Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	docs/_layouts/global.html
	docs/index.md
	run
2013-01-15 12:08:51 -08:00
Patrick Wendell d182a57cae Two changes:
- Updating countByX() types based on bug fix
- Porting new documentation to Java
2013-01-14 10:03:55 -08:00
Patrick Wendell 3461cd99b7 Flume example and bug fix 2013-01-14 09:42:36 -08:00
Tathagata Das 0a2e333341 Removed stream id from the constructor of NetworkReceiver to make it easier for PluggableNetworkInputDStream. 2013-01-13 16:18:39 -08:00
Eric Zhang ba06e9c97c Update examples/src/main/scala/spark/examples/LocalLR.scala
fix spelling mistake
2013-01-13 15:33:11 +08:00
Shivaram Venkataraman bbc56d85ed Rename environment variable for hadoop profiles to hadoopVersion 2013-01-12 15:24:13 -08:00
Shivaram Venkataraman 9262522306 Activate hadoop2 profile in pom.xml with -Dhadoop=2 2013-01-10 22:07:34 -08:00
Shivaram Venkataraman f7adb382ac Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now
by using -Dhadoop -Phadoop2.
2013-01-08 03:19:43 -08:00
Patrick Wendell 6c502e3793 Making the Twitter example distributed.
This adds a distributed (receiver-based) implementation of the
Twitter dstream. It also changes the example to perform a
distributed sort rather than collecting the dataset at one node.
2013-01-07 22:01:11 -08:00
Tathagata Das 8c1b872512 Moved Twitter example to the where the other examples are. 2013-01-07 17:48:10 -08:00
Shivaram Venkataraman 4bbe07e5ec Activate hadoop1 profile by default for maven builds 2013-01-07 17:46:22 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Tathagata Das af8738dfb5 Moved Spark Streaming examples to examples sub-project. 2013-01-06 19:31:54 -08:00
Thomas Dudziak 02d64f9662 Mark hadoop dependencies provided in all library artifacts 2012-12-10 21:27:54 -08:00
Matei Zaharia ccff0a089a Use the same output directories that SBT had in subprojects
This will make it easier to make the "run" script work with a Maven build
2012-12-10 10:58:56 -08:00
Thomas Dudziak 3b643e86bc Updated versions in the pom.xml files to match current master 2012-11-27 17:50:42 -08:00
Thomas Dudziak 69297c64be Addressed code review comments 2012-11-27 15:45:16 -08:00
Thomas Dudziak 811a32257b Added maven and debian build files 2012-11-20 16:19:51 -08:00
root acf8272324 Fix K-means example a little 2012-11-10 23:07:21 -08:00
Matei Zaharia 8d7b77bcb5 Some doc and usability improvements:
- Added a StorageLevels class for easy access to StorageLevel constants
  in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Mosharaf Chowdhury 119e50c7b9 Conflict fixed 2012-10-02 22:25:39 -07:00
Matei Zaharia 56c90485fd More updates to documentation 2012-09-25 19:31:07 -07:00
Mosharaf Chowdhury 3883532545 Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations. 2012-08-30 21:43:00 -07:00
Josh Rosen 566feafe1d Cache points in SparkLR example. 2012-08-26 15:24:43 -07:00
Matei Zaharia 6ae3c375a9 Renamed apply() to call() in Java API and allowed it to throw Exceptions 2012-08-12 23:10:19 +02:00
Imran Rashid edc6972f8e move Vector class into core and spark.util package 2012-07-28 20:15:42 -07:00
Josh Rosen 2a60c998cc Remove StringOps.split() from Java WordCount. 2012-07-25 10:13:06 -07:00
Josh Rosen 6a78e88237 Minor cleanup and optimizations in Java API.
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
2012-07-24 09:47:00 -07:00
Josh Rosen 460da878fc Improve Java API examples
- Replace JavaLR example with JavaHdfsLR example.
- Use anonymous classes in JavaWordCount; add options.
- Remove @Override annotations.
2012-07-22 14:40:39 -07:00
Josh Rosen 01dce3f569 Add Java API
Add distinct() method to RDD.

Fix bug in DoubleRDDFunctions.
2012-07-18 17:34:29 -07:00
Matei Zaharia 28fed4ce3b Add System.exit(0) at the end of all the example programs. 2012-06-05 23:31:28 -07:00
haoyuan 651932e703 Format the code as coding style agreed by Matei/TD/Haoyuan 2012-02-09 13:26:23 -08:00
Matei Zaharia 100e800782 Some fixes to the examples (mostly to use functional API) 2012-01-31 00:33:18 -08:00
Matei Zaharia fabcc82528 Merge pull request #103 from edisontung/master
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia 3034fc0d91 Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16' 2011-12-14 18:19:43 +01:00
Matei Zaharia 72c4839c5f Fixed LocalFileLR to deal with a change in Scala IO sources
(you can no longer iterate over a Source multiple times).
2011-12-01 13:52:12 -08:00
Edison Tung 42f8847a21 Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD 2011-12-01 13:43:25 -08:00
Edison Tung e1c814be4c Renamed SparkLocalKMeans to SparkKMeans 2011-12-01 13:34:03 -08:00
Edison Tung 3b9d9de583 Added KMeans examples
LocalKMeans runs locally with a randomly generated dataset.
SparkLocalKMeans takes an input file and runs KMeans on it.
2011-11-21 16:37:58 -08:00
Ankur Dave 35b6358a7c Report errors in tasks to the driver via a Mesos status update
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.

Here's what the reporting currently looks like:

    # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
    [...]
    11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
    11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
    [...]
    11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00