221 commits

Author SHA1 Message Date
Jey Kottalam 9dd15fe700 Don't mark hadoop-client as 'provided' 2013-08-16 13:50:12 -07:00
Jey Kottalam 11b42a84db Maven build now works with CDH hadoop-2.0.0-mr1 2013-08-16 13:50:12 -07:00
Jey Kottalam 353fab2440 Initial changes to make Maven build agnostic of hadoop version 2013-08-16 13:50:12 -07:00
Jey Kottalam 4f43fd791a make SparkHadoopUtil a member of SparkEnv 2013-08-15 16:50:37 -07:00
Evan Sparks ff9ebfabb4 Merge pull request #762 from shivaram/sgd-cleanup
Refactor SGD options into a new class.
2013-08-11 10:52:55 -07:00
Alexander Pivovarov 2d97cc46af Fixed path to JavaALS.java and JavaKMeans.java, fixed hadoop2-yarn profile 2013-08-10 23:04:50 -07:00
Matei Zaharia 4c4f769187 Optimize Scala PageRank to use reduceByKey 2013-08-10 18:09:54 -07:00
Matei Zaharia 06e4f2a8f2 Merge pull request #789 from MLnick/master
Adding Scala version of PageRank example
2013-08-10 18:06:23 -07:00
Matei Zaharia cd247ba5bb Merge pull request #786 from shivaram/mllib-java
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Matei Zaharia 06303a62e5 Optimize JavaPageRank to use reduceByKey instead of groupByKey 2013-08-08 18:50:00 -07:00
Shivaram Venkataraman 2812e72200 Add setters for optimizer, gradient in SGD.
Also remove java-specific constructor for LabeledPoint.
2013-08-08 16:24:31 -07:00
Shivaram Venkataraman e1a209f791 Remove Java-specific constructor for Rating.
The Scala constructor works for native Java types. Modify examples
to match this.
2013-08-08 14:36:02 -07:00
Nick Pentreath c4eea875ac Style changes as per Matei's comments 2013-08-08 12:40:37 +02:00
Nick Pentreath cce758b893 Adding Scala version of PageRank example 2013-08-07 16:38:52 +02:00
Shivaram Venkataraman 338b7a7455 Merge branch 'master' of git://github.com/mesos/spark into sgd-cleanup
Conflicts:
	mllib/src/main/scala/spark/mllib/util/MLUtils.scala
2013-08-06 21:21:55 -07:00
Shivaram Venkataraman 7db69d56f2 Refactor GLM algorithms and add Java tests
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
  pattern works
2013-08-06 17:23:22 -07:00
Shivaram Venkataraman 471fbadd0c Java examples, tests for KMeans and ALS
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
  easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
  called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
  examples project.

Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Workaround a bug where using double[] from Java leads to class cast exception in
  KMeans init
2013-08-06 15:43:46 -07:00
stayhf 882baee489 Got rid of unnecessary map function 2013-08-06 21:34:39 +00:00
stayhf 326a7a82e0 changes as reviewer requested 2013-08-06 21:03:24 +00:00
stayhf 98fd62605d Updated code with reviewer's suggestions 2013-08-05 00:30:28 +00:00
stayhf a682637301 Simple PageRank algorithm implementation in Java for SPARK-760 2013-08-03 06:01:16 +00:00
Matei Zaharia af3c9d5042 Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
Prashant Sharma e86d5dbaad Merge branch 'master' into master-merge
Conflicts:
	README.md
	core/pom.xml
	core/src/main/scala/spark/deploy/JsonProtocol.scala
	core/src/main/scala/spark/deploy/LocalSparkCluster.scala
	core/src/main/scala/spark/deploy/master/Master.scala
	core/src/main/scala/spark/deploy/master/MasterWebUI.scala
	core/src/main/scala/spark/deploy/worker/Worker.scala
	core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
	core/src/main/scala/spark/storage/BlockManagerUI.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	pom.xml
	project/SparkBuild.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
2013-07-12 14:49:16 +05:30
Mark Hamstra 0b39d66f3f pom cleanup 2013-07-08 16:07:09 -07:00
Mark Hamstra afdaf430bd Explicit dependencies for scala-library and scalap to prevent 2.9.2 vs. 2.9.3 problems 2013-07-08 15:40:50 -07:00
Prashant Sharma a5f1f6a907 Merge branch 'master' into master-merge
Conflicts:
	core/pom.xml
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/RDDCheckpointData.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/Utils.scala
	core/src/main/scala/spark/api/python/PythonRDD.scala
	core/src/main/scala/spark/deploy/client/Client.scala
	core/src/main/scala/spark/deploy/master/MasterWebUI.scala
	core/src/main/scala/spark/deploy/worker/Worker.scala
	core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/ZippedRDD.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/BlockManagerMasterActor.scala
	core/src/main/scala/spark/storage/BlockManagerUI.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	core/src/test/scala/spark/SizeEstimatorSuite.scala
	pom.xml
	project/SparkBuild.scala
	repl/src/main/scala/spark/repl/SparkILoop.scala
	repl/src/test/scala/spark/repl/ReplSuite.scala
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala
2013-07-03 11:43:26 +05:30
Konstantin Boudnik 6fdbc68f2c Fixing missed hbase dependency in examples hadoop2-yarn profile 2013-07-01 17:45:07 -07:00
Matei Zaharia ccfe953a4d Merge pull request #577 from skumargithub/master
Example of cumulative counting using updateStateByKey
2013-06-29 17:57:53 -07:00
Matei Zaharia 1667158544 Merge remote-tracking branch 'mrpotes/master' 2013-06-29 14:36:09 -07:00
James Phillpotts 176193b1e8 Fix usage and parameter extraction 2013-06-25 23:06:15 +01:00
James Phillpotts 366572edca Include a default OAuth implementation, and update examples and JavaStreamingContext 2013-06-25 22:59:34 +01:00
Tathagata Das c89af0a7f9 Merge branch 'master' into streaming
Conflicts:
	.gitignore
2013-06-24 23:57:47 -07:00
Matei Zaharia dbfab49d2a Merge remote-tracking branch 'milliondreams/casdemo'
Conflicts:
	project/SparkBuild.scala
2013-06-18 14:55:31 +02:00
Matei Zaharia b7794813b1 Fix run script on Windows for Scala 2.10 2013-06-15 09:37:13 -07:00
Rohit Rai b5b12823fa Fixing the style as per feedback 2013-06-13 14:05:46 +05:30
Rohit Rai b104c7f5c7 Example to write the output to cassandra 2013-06-03 15:15:52 +05:30
Rohit Rai 56c64c4033 A better way to read column value if you are sure the column exists in every row. 2013-06-03 12:48:35 +05:30
Rohit Rai 6d8423fd1b Adding deps to examples/pom.xml
Fixing exclusion in examples deps in SparkBuild.scala
2013-06-02 13:03:45 +05:30
Rohit Rai 81c2adc15c Removing infix call 2013-06-02 12:51:15 +05:30
Rohit Rai 3be7bdcefd Adding example to make Spark RDD from Cassandra 2013-06-01 19:32:17 +05:30
Ethan Jewett 3217d486f7 Add hBase dependency to examples POM 2013-05-20 19:41:38 -05:00
Ethan Jewett ee6f6aa6cd Add hBase example 2013-05-09 18:33:38 -05:00
Reynold Xin 012c9e5ab0 Revert "Merge pull request #596 from esjewett/master" because the
dependency on hbase introduces netty-3.2.2 which conflicts with
netty-3.5.3 already in Spark. This caused multiple test failures.

This reverts commit 0f1b7a06e1, reversing
changes made to aacca1b8a8.
2013-05-09 14:20:01 -07:00
Ethan Jewett a3d5f92210 Switch to using SparkContext method to create RDD 2013-05-07 11:43:06 -05:00
unknown cbf6a5ee1e Removed unused code, clarified intent of the program, batch size to 1 second 2013-05-06 08:05:45 -06:00
Ethan Jewett 7cff7e7897 Fix indents and mention other configuration options 2013-05-04 14:56:55 -05:00
Ethan Jewett 9290f16430 Remove unnecessary column family config 2013-05-04 12:39:14 -05:00
Ethan Jewett 02e8cfa617 HBase example 2013-05-04 12:31:30 -05:00
unknown 1d54401d7e Modified as per TD's suggestions 2013-04-30 23:01:32 -06:00
Prashant Sharma 8f3ac240cb Fixed Warning: ClassManifest -> ClassTag 2013-04-29 16:39:13 +05:30
Prashant Sharma 4b4a36ea7d Fixed pom.xml with updated dependencies. 2013-04-29 12:55:43 +05:30
Mridul Muralidharan dd515ca3ee Attempt at fixing merge conflict 2013-04-24 09:24:17 +05:30
unknown 0dc1e2d60f Example of cumulative counting using updateStateByKey 2013-04-22 09:22:45 -06:00
Mridul Muralidharan 7acab3ab45 Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo 2013-04-22 08:01:13 +05:30
seanm 7e56e99573 Surfacing decoders on KafkaInputDStream 2013-04-16 17:17:16 -06:00
Andrew Ash f1d8871ca1 Uniform whitespace across scala examples 2013-04-09 23:35:13 -04:00
Matei Zaharia 65caa8f711 Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
	docs/_config.yml
	project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Matei Zaharia b362df39ea Merge pull request #552 from MLnick/master
Bumping version for Twitter Algebird to latest
2013-04-07 17:17:52 -07:00
Mridul Muralidharan 6798a09df8 Add support for building against hadoop2-yarn : adding new maven profile for it 2013-04-07 17:47:38 +05:30
Nick Pentreath 0f54344fd8 Bumping Algebird version in examples now that it supports JDK 1.6 2013-04-03 13:15:34 +02:00
Erik van oosten b5e60c3253 Corrected order of CountMinSketchMonoid arguments 2013-04-02 15:25:22 +03:00
Jey Kottalam bc8ba222ff Bump development version to 0.8.0 2013-03-28 15:42:01 -07:00
Matei Zaharia ca4d083ec8 Merge pull request #528 from MLnick/java-examples
[SPARK-707] Adding Java versions of Pi, LogQuery and K-Means examples
2013-03-20 11:22:36 -07:00
Nick Pentreath 52398cc1a3 Java indentation 4 --> 2 spaces 2013-03-20 09:55:42 +02:00
Nick Pentreath 9fa47a2039 A few cosmetic changes for JavaKMeans 2013-03-19 15:31:03 +02:00
Nick Pentreath 568ddf7330 Adding Java K-Means example 2013-03-19 15:29:22 +02:00
Nick Pentreath b990caeb80 Changes to more closely match line length limit style 2013-03-17 20:03:27 +02:00
Mikhail Bautin 7fd2708eda Add a log4j compile dependency to fix build in IntelliJ
Also rename parent project to spark-parent (otherwise it shows up as
"parent" in IntelliJ, which is very confusing).
2013-03-15 11:41:51 -07:00
Nick Pentreath 13757b1198 Adding Java versions of Pi and LogQuery 2013-03-15 10:52:01 +02:00
Mark Hamstra 8b06b359da bump version to 0.7.1-SNAPSHOT in the subproject poms to keep the maven build building. 2013-02-28 23:34:34 -08:00
Matei Zaharia 5d7b591cfe Pass a code JAR to SparkContext in our examples. Fixes SPARK-594. 2013-02-25 19:34:32 -08:00
Matei Zaharia 6b87ef7c86 Fix compile error 2013-02-25 14:01:16 -08:00
Matei Zaharia 01bd136ba5 Use public method sparkContext instead of protected sc in streaming examples 2013-02-25 13:27:11 -08:00
Tathagata Das f282bc4960 Changed Algebird from 0.1.9 to 0.1.8 2013-02-24 12:44:12 -08:00
Tathagata Das c1a040db3a Fixed bugs in examples. 2013-02-24 11:00:30 -08:00
Tathagata Das 41285eaae3 Fixed differences in APIs of StreamingContext and JavaStreamingContext. Change rawNetworkStream to rawSocketStream, and added twitter, actor, zeroMQ streams to JavaStreamingContext. Also added them to JavaAPISuite. 2013-02-23 16:25:07 -08:00
Tathagata Das cfa65ebff1 Merge pull request #480 from MLnick/streaming-eg-algebird
[Streaming] Examples using Twitter's Algebird library
2013-02-22 12:29:04 -08:00
Tathagata Das 688e62718f Merge pull request #479 from ScrapCodes/zeromq-streaming
Zeromq streaming
2013-02-22 12:17:17 -08:00
Nick Pentreath d9bdae8cc2 Adding documentation for HLL and CMS examples. More efficient and clear use of the monoids. 2013-02-21 12:31:31 +02:00
Nick Pentreath 718474b9c6 Bumping Algebird to 0.1.9 2013-02-21 12:11:31 +02:00
Nick Pentreath 16d456742e Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird 2013-02-21 09:33:08 +02:00
Tathagata Das 972fe7714f Merge branch 'mesos-streaming' into streaming
Conflicts:
	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-20 11:06:01 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Prashant Sharma 4e5b09664c fixes corresponding to review feedback at pull request #479 2013-02-20 19:14:52 +05:30
Prashant Sharma 05dc385649 A bug fix post merge, following changes to AkkaUtils 2013-02-20 15:28:12 +05:30
Nick Pentreath 8a281399f9 Streaming example using Twitter Algebird's Count Min Sketch monoid 2013-02-19 17:56:02 +02:00
Nick Pentreath d8ee184d95 Dependencies and refactoring for streaming HLL example, and using context.twitterStream method 2013-02-19 17:42:57 +02:00
Prashant Sharma 8d44480d84 example for demonstrating ZeroMQ stream 2013-02-19 19:42:14 +05:30
Nick Pentreath 315ea069e8 Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird
Conflicts:
	project/SparkBuild.scala
2013-02-19 13:58:05 +02:00
Nick Pentreath 015893f0e8 Adding streaming HyperLogLog example using Algebird 2013-02-19 13:21:33 +02:00
Tathagata Das 7e30c46aaf Added comment to the KafkaWordCount, given by Sean McNamara. 2013-02-19 03:05:44 -08:00
Tathagata Das 9e82be1503 Merge branch 'streaming' into ScrapCodes-streaming-actor
Conflicts:
	docs/plugin-custom-receiver.md
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
	streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das 12ea14c211 Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver. 2013-02-18 15:18:34 -08:00
Tathagata Das 6a6e6bda57 Merge branch 'streaming' into ScrapCode-streaming
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das 4b8402e900 Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down. 2013-02-14 18:10:37 -08:00
Tathagata Das def8126d77 Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags. 2013-02-14 17:49:43 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Prashant Sharma 291dd47c7f Taking FeederActor out as separate program 2013-02-08 14:34:07 +05:30
Tathagata Das 4cc223b478 Merge branch 'mesos-master' into streaming 2013-02-07 13:59:31 -08:00
Tathagata Das 12300758cc Merge pull request #372 from Reinvigorate/sm-kafka
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00