Commit graph

364 commits

Author SHA1 Message Date
Matei Zaharia 1ffadb2d9e Merge remote-tracking branch 'pwendell/ui-updates'
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	pom.xml
2013-07-06 15:51:41 -07:00
Matei Zaharia 94871e4703 Merge pull request #655 from tgravescs/master
Add support for running Spark on Yarn on a secure Hadoop Cluster
2013-07-06 15:26:19 -07:00
Tathagata Das 280418ac45 Reduced the number of Iterator to ArrayBuffer copies in NetworkReceiver. 2013-07-05 21:38:21 -07:00
Y.CORP.YAHOO.COM\tgraves 923cf92900 Rework from pull request. Removed --user option from Spark on Yarn Client, made the user of JAVA_HOME environment
variable conditional on if its set, and created addCredentials in each of the SparkHadoopUtil classes
to only add the credentials when the profile is hadoop2-yarn.
2013-07-02 21:18:59 -05:00
Matei Zaharia 4358acfe07 Initialize Twitter4J OAuth from system properties instead of prompting 2013-06-29 15:25:06 -07:00
Matei Zaharia 1667158544 Merge remote-tracking branch 'mrpotes/master' 2013-06-29 14:36:09 -07:00
Patrick Wendell 362d996c81 Handful of changes based on matei's review
- Avoid exception when no tasks have finished for a stage
- Adding DOCTYPE so css renders properly
- Adding progress slider
2013-06-27 19:14:28 -07:00
James Phillpotts 366572edca Include a default OAuth implementation, and update examples and JavaStreamingContext 2013-06-25 22:59:34 +01:00
Tathagata Das c89af0a7f9 Merge branch 'master' into streaming
Conflicts:
	.gitignore
2013-06-24 23:57:47 -07:00
Tathagata Das 48c7e373c6 Minor formatting fixes 2013-06-24 23:11:04 -07:00
Tathagata Das 1249e9153b Merge pull request #572 from Reinvigorate/sm-block-interval
Adding spark.streaming.blockInterval property
2013-06-24 21:46:33 -07:00
Tathagata Das cfcda95f86 Merge pull request #571 from Reinvigorate/sm-kafka-serializers
Surfacing decoders on KafkaInputDStream
2013-06-24 21:44:50 -07:00
James Phillpotts 8955787a59 Twitter API v1 is retired - username/password auth no longer possible 2013-06-24 09:15:17 +01:00
James Phillpotts 93a1643405 Allow other twitter authorizations than username/password 2013-06-21 14:21:52 +01:00
Thomas Graves 75d78c7ac9 Add support for Spark on Yarn on a secure Hadoop cluster 2013-06-19 11:18:42 -05:00
Jey Kottalam e7982c798e Exclude old versions of Netty from Maven-based build 2013-05-18 21:24:58 -07:00
seanm f25282def5 fixing kafkaStream Java API and adding test 2013-05-10 17:34:28 -06:00
seanm 3632980b1b fixing indentation 2013-05-10 15:54:26 -06:00
seanm b95c1bdbba count() now uses a transform instead of ConstantInputDStream 2013-05-10 12:47:24 -06:00
seanm d761e7359d adding kafkaStream API tests 2013-05-10 12:05:10 -06:00
Reynold Xin 90577ada69 Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/DiskStore.scala
	project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Mridul Muralidharan 430c531464 Remove debug statements 2013-04-29 00:24:30 +05:30
Mridul Muralidharan 3a89a76b87 Make log message more descriptive to aid in debugging 2013-04-29 00:04:12 +05:30
Mridul Muralidharan 7fa6978a1e Allow CheckpointWriter pending tasks to finish 2013-04-28 23:08:10 +05:30
Mridul Muralidharan afee902443 Attempt to fix streaming test failures after yarn branch merge 2013-04-28 22:26:45 +05:30
Mridul Muralidharan dd515ca3ee Attempt at fixing merge conflict 2013-04-24 09:24:17 +05:30
seanm 7e56e99573 Surfacing decoders on KafkaInputDStream 2013-04-16 17:17:16 -06:00
seanm ab0f834dbb adding spark.streaming.blockInterval property 2013-04-16 11:57:05 -06:00
seanm b42d68c8ce fixing Spark Streaming count() so that 0 will be emitted when there is nothing to count 2013-04-15 12:54:55 -06:00
Matei Zaharia 65caa8f711 Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
	docs/_config.yml
	project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Mridul Muralidharan 6798a09df8 Add support for building against hadoop2-yarn : adding new maven profile for it 2013-04-07 17:47:38 +05:30
shane-huang df47b40b76 Shuffle Performance fix: Use netty embeded OIO file server instead of ConnectionManager
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package

Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
Jey Kottalam bc8ba222ff Bump development version to 0.8.0 2013-03-28 15:42:01 -07:00
Jey Kottalam b569b3f200 Move streaming test initialization into 'before' blocks 2013-03-28 15:08:41 -07:00
seanm 329ef34c2e fixing autooffset.reset behavior when set to 'largest' 2013-03-26 23:56:15 -06:00
Holden Karau 1f5381119f method first in trait IterableLike is deprecated: use `head' instead 2013-03-24 19:19:40 -07:00
seanm d61978d0ab keeping JavaStreamingContext in sync with StreamingContext + adding comments for better clarity 2013-03-15 23:36:52 -06:00
Mikhail Bautin 7fd2708eda Add a log4j compile dependency to fix build in IntelliJ
Also rename parent project to spark-parent (otherwise it shows up as
"parent" in IntelliJ, which is very confusing).
2013-03-15 11:41:51 -07:00
seanm 33fa1e7e4a removing dependency on ZookeeperConsumerConnector + purging last relic of kafka reliability that never solidified (ie- setOffsets) 2013-03-15 00:10:13 -06:00
seanm d069283211 fixing memory leak in kafka MessageHandler 2013-03-14 23:45:33 -06:00
seanm cfa8e769a8 KafkaInputDStream improvements. Allows more Kafka configurability 2013-03-14 23:45:19 -06:00
Stephen Haberman 0cf320485d Forgot equals. 2013-03-12 00:05:35 -05:00
Stephen Haberman 9e68f48625 More quickly call close in HadoopRDD.
This also refactors out the common "gotNext" iterator pattern into
a shared utility class.
2013-03-11 23:59:17 -05:00
Mark Hamstra b409073102 Instead of failing to bind to a fixed, already-in-use port, let the OS choose an available port for TestServer. 2013-03-01 15:05:07 -08:00
Mark Hamstra 8b06b359da bump version to 0.7.1-SNAPSHOT in the subproject poms to keep the maven build building. 2013-02-28 23:34:34 -08:00
Matei Zaharia 4223be3aa4 Merge pull request #503 from pwendell/bug-fix
createNewSparkContext should use sparkHome/jars/environment.
2013-02-25 19:43:05 -08:00
Patrick Wendell 284ba90958 createNewSparkContext should use sparkHome/jars/environment.
This fixes a bug introduced by Matei's recent change.
2013-02-25 19:40:52 -08:00
Matei Zaharia 5d7b591cfe Pass a code JAR to SparkContext in our examples. Fixes SPARK-594. 2013-02-25 19:34:32 -08:00
Tathagata Das bc4a6eb850 Changed Flume test to use the same port as other tests, so that can be controlled centrally. 2013-02-25 18:04:21 -08:00
Matei Zaharia 4d480ec59e Fixed something that was reported as a compile error in ScalaDoc.
For some reason, ScalaDoc complained about no such constructor for
StreamingContext; it doesn't seem like an actual Scala error but it
prevented sbt publish and from working because docs weren't built.
2013-02-25 15:53:43 -08:00
Matei Zaharia 490f056cdd Allow passing sparkHome and JARs to StreamingContext constructor
Also warns if spark.cleaner.ttl is not set in the version where you pass
your own SparkContext.
2013-02-25 15:13:30 -08:00
Tathagata Das 5ab37be983 Fixed class paths and dependencies based on Matei's comments. 2013-02-24 16:24:52 -08:00
Tathagata Das 28f8b721f6 Added back the initial spark job before starting streaming receivers 2013-02-24 13:01:54 -08:00
Tathagata Das 68c7934b1a Fixed missing dependencies in streaming/pom.xml 2013-02-24 11:51:45 -08:00
Tathagata Das b4eb24de96 Updated streaming programming guide with Java API info, and comments from Patrick. 2013-02-23 23:59:45 -08:00
Tathagata Das 41285eaae3 Fixed differences in APIs of StreamingContext and JavaStreamingContext. Change rawNetworkStream to rawSocketStream, and added twitter, actor, zeroMQ streams to JavaStreamingContext. Also added them to JavaAPISuite. 2013-02-23 16:25:07 -08:00
Tathagata Das d8cee52d52 Merge branch 'mesos-streaming' into streaming 2013-02-22 18:25:34 -08:00
Tathagata Das cfa65ebff1 Merge pull request #480 from MLnick/streaming-eg-algebird
[Streaming] Examples using Twitter's Algebird library
2013-02-22 12:29:04 -08:00
Tathagata Das 688e62718f Merge pull request #479 from ScrapCodes/zeromq-streaming
Zeromq streaming
2013-02-22 12:17:17 -08:00
Tathagata Das 208edaac1b Fixed condition in InputDStream isTimeValid. 2013-02-21 15:22:26 -08:00
Nick Pentreath 16d456742e Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird 2013-02-21 09:33:08 +02:00
Tathagata Das 972fe7714f Merge branch 'mesos-streaming' into streaming
Conflicts:
	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-20 11:06:01 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Prashant Sharma 4e5b09664c fixes corresponding to review feedback at pull request #479 2013-02-20 19:14:52 +05:30
Patrick Wendell 041c19e5f0 Small changes that were missing in merge 2013-02-19 08:44:20 -08:00
Patrick Wendell fed1122d74 Use RDD type for slice operator in Java.
This commit uses the RDD type in `slice`, making it available to both normal
and pair RDD's in java. It also updates the signature for `slice` to match
changes in the Scala API.
2013-02-19 08:32:38 -08:00
Patrick Wendell 35880de42e Use RDD type for transform operator in Java.
This is an improved implementation of the `transform` operator in Java.
The main difference is that this allows all four possible types of
transform functions

1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD

whereas previously only (1) and (3) were possible.

Conflicts:

	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-19 08:31:58 -08:00
Patrick Wendell 9d49a6b03f Use RDD type for foreach operator in Java. 2013-02-19 08:30:32 -08:00
Nick Pentreath d8ee184d95 Dependencies and refactoring for streaming HLL example, and using context.twitterStream method 2013-02-19 17:42:57 +02:00
Prashant Sharma 8d44480d84 example for demonstrating ZeroMQ stream 2013-02-19 19:42:14 +05:30
Prashant Sharma f7d3e309cb ZeroMQ stream as receiver 2013-02-19 19:32:52 +05:30
Tathagata Das 9e82be1503 Merge branch 'streaming' into ScrapCodes-streaming-actor
Conflicts:
	docs/plugin-custom-receiver.md
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
	streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Tathagata Das 12ea14c211 Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver. 2013-02-18 15:18:34 -08:00
Tathagata Das 6a6e6bda57 Merge branch 'streaming' into ScrapCode-streaming
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das 8ad561dc7d Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs. 2013-02-18 02:12:41 -08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Tathagata Das f98c7da23e Many changes to ensure better 2nd recovery if 2nd failure happens while
recovering from 1st failure
- Made the scheduler to checkpoint after clearing old metadata which
  ensures that a new checkpoint is written as soon as at least one batch
  gets computed  while recovering from a failure. This ensures that if
  there is a 2nd failure while recovering from 1st failure, the system
  start 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
  only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
  in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
  results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Stephen Haberman ae2234687d Make CoGroupedRDDs explicitly have the same key type. 2013-02-16 13:10:31 -06:00
Tathagata Das ddcb976b0d Made MasterFailureTest more robust. 2013-02-15 06:54:47 +00:00
Tathagata Das 4b8402e900 Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down. 2013-02-14 18:10:37 -08:00
Tathagata Das def8126d77 Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags. 2013-02-14 17:49:43 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Tathagata Das 03e8dc6861 Changes functions comments to make them more consistent. 2013-02-13 20:59:29 -08:00
Tathagata Das 12b020b668 Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters. 2013-02-13 20:53:50 -08:00
Tathagata Das 39addd3803 Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream. 2013-02-13 12:17:45 -08:00
Patrick Wendell 3f3e77f28b STREAMING-50: Support transform workaround in JavaPairDStream
This ports a useful workaround (the `transform` function) to
JavaPairDStream. It is necessary to do things like sorting which
are not supported yet in the core streaming API.
2013-02-12 14:02:32 -08:00
Patrick Wendell d09c36065c Using tuple swap() 2013-02-11 10:45:45 -08:00
Patrick Wendell 04786d0739 small fix 2013-02-11 10:05:49 -08:00
Patrick Wendell c65988bdc1 Fix for MapPartitions 2013-02-11 10:03:37 -08:00
Patrick Wendell 20cf770545 Fix for flatmap 2013-02-11 10:03:37 -08:00
Patrick Wendell 314d87a038 Indentation fix 2013-02-11 10:03:37 -08:00
Patrick Wendell f0b68c623c Initial cut at replacing K, V in Java files 2013-02-11 10:03:37 -08:00
Tathagata Das fd90daf850 Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream. 2013-02-10 19:48:42 -08:00
Tathagata Das 99a5fc498a Added an initial spark job to ensure worker nodes are initialized. 2013-02-09 15:18:05 -08:00
Tathagata Das 4cc223b478 Merge branch 'mesos-master' into streaming 2013-02-07 13:59:31 -08:00
Tathagata Das d55e3aa467 Updated JavaStreamingContext with updated kafkaStream API. 2013-02-07 13:59:18 -08:00
Tathagata Das c6b2f765d3 Merge branch 'mesos-streaming' into streaming 2013-02-07 13:13:53 -08:00
Tathagata Das 12300758cc Merge pull request #372 from Reinvigorate/sm-kafka
Removing offset management code that is non-existent in kafka 0.7.0+
2013-02-07 12:41:07 -08:00
Tathagata Das 915d9931fe Merge pull request #373 from Reinvigorate/sm-updateStateByKey
StateDStream changes to give updateStateByKey consistent behavior
2013-02-07 11:59:19 -08:00
Patrick Wendell 7eea64aa4c Streaming constructor which takes JavaSparkContext
It's sometimes helpful to directly pass a JavaSparkContext,
and take advantage of the various constructors available for that.
2013-02-05 11:43:16 -08:00