Commit graph

2583 commits

Author SHA1 Message Date
Shivaram Venkataraman 03f45a18d5 Use port 5080 for httpd/ganglia 2013-02-18 16:56:01 -08:00
Tathagata Das 12ea14c211 Renamed networkStream to socketStream, and pluggableNetworkStream to networkStream, as a way to create streams from arbitrary network receivers. 2013-02-18 15:18:34 -08:00
Tathagata Das 6a6e6bda57 Merge branch 'streaming' into ScrapCode-streaming
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das 8ad561dc7d Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to be a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs. 2013-02-18 02:12:41 -08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Matei Zaharia 06e5e6627f Renamed "splits" to "partitions" 2013-02-17 22:13:26 -08:00
Matei Zaharia 455d015076 Clean up EC2 script options a bit 2013-02-17 16:53:12 -08:00
Tathagata Das f98c7da23e Many changes to ensure a better 2nd recovery if a 2nd failure happens while
recovering from the 1st failure
- Made the scheduler checkpoint after clearing old metadata, which
  ensures that a new checkpoint is written as soon as at least one batch
  gets computed while recovering from a failure. This ensures that if
  there is a 2nd failure while recovering from the 1st failure, the system
  starts the 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
  only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
  in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
  results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Matei Zaharia 08e444df0e Change EC2 script to use 0.6 AMIs by default, for now 2013-02-17 14:01:48 -08:00
Matei Zaharia 2a907dceb3 Merge pull request #421 from shivaram/spark-ec2-change
Switch spark_ec2.py to use the new spark-ec2 scripts.
2013-02-17 13:48:43 -08:00
Matei Zaharia 340cc54e47 Merge pull request #471 from stephenh/parallelrdd
Move ParallelCollection into spark.rdd package.
2013-02-16 16:39:15 -08:00
Matei Zaharia 3260b6120e Merge pull request #470 from stephenh/morek
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 16:38:38 -08:00
Stephen Haberman 924f47dd11 Add RDD.subtract.
Instead of reusing the cogroup primitive, this adds a SubtractedRDD
that knows it only needs to keep rdd1's values (per split) in memory.
2013-02-16 13:38:42 -06:00
Stephen Haberman e7713adb99 Move ParallelCollection into spark.rdd package. 2013-02-16 13:20:48 -06:00
Stephen Haberman ae2234687d Make CoGroupedRDDs explicitly have the same key type. 2013-02-16 13:10:31 -06:00
Matei Zaharia 9d979fb630 Merge pull request #469 from stephenh/samepartitionercombine
If combineByKey is using the same partitioner, skip the shuffle.
2013-02-16 10:07:42 -08:00
Stephen Haberman 4328873294 Add assertion about dependencies. 2013-02-16 01:16:40 -06:00
Stephen Haberman c34b8ad2c5 Avoid a shuffle if combineByKey is passed the same partitioner. 2013-02-16 00:54:03 -06:00
Stephen Haberman 4281e579c2 Update more javadocs. 2013-02-16 00:45:03 -06:00
haitao.yao 858784459f support customized java options for master, worker, executor, repl shell 2013-02-16 14:42:06 +08:00
Stephen Haberman 6a2d957843 Tweak test names. 2013-02-16 00:33:49 -06:00
Stephen Haberman 37397106ce Remove fileServerSuite.txt. 2013-02-16 00:31:07 -06:00
Stephen Haberman 6cd68c31cb Update default.parallelism docs, have StandaloneSchedulerBackend use it.
Only brand new RDDs (e.g. parallelize and makeRDD) now use default
parallelism; everything else uses its largest parent's partitioner
or partition size.
2013-02-16 00:29:11 -06:00
Matei Zaharia beb7ab8708 Merge pull request #467 from squito/executor_job_id
include jobid in Executor commandline args
2013-02-15 22:09:24 -08:00
haitao.yao a9cfac347a Merge branch 'mesos' 2013-02-16 10:11:28 +08:00
Imran Rashid bffee929ab Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/storage/BlockManager.scala
2013-02-15 10:35:04 -08:00
Tathagata Das ddcb976b0d Made MasterFailureTest more robust. 2013-02-15 06:54:47 +00:00
Tathagata Das 3bcc6e5c03 Merge pull request #466 from pwendell/java-stream-transform
STREAMING-50: Support transform workaround in JavaPairDStream
2013-02-14 21:30:55 -08:00
Tathagata Das 4b8402e900 Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down. 2013-02-14 18:10:37 -08:00
Tathagata Das def8126d77 Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags. 2013-02-14 17:49:43 -08:00
Tathagata Das 2eacf22401 Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites. 2013-02-14 12:21:47 -08:00
Tathagata Das 03e8dc6861 Changed function comments to make them more consistent. 2013-02-13 20:59:29 -08:00
Tathagata Das 12b020b668 Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into a smaller number of functions with optional parameters. 2013-02-13 20:53:50 -08:00
Imran Rashid 893bad9089 use appid instead of frameworkid; simplify stupid condition 2013-02-13 20:30:21 -08:00
Matei Zaharia e8663e0fe5 Merge pull request #461 from JoshRosen/fix/issue-tracker-link
Update issue tracker link in contributing guide
2013-02-13 18:42:17 -08:00
Imran Rashid 8f18e7e863 include jobid in Executor commandline args 2013-02-13 13:05:13 -08:00
Tathagata Das 39addd3803 Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream. 2013-02-13 12:17:45 -08:00
Patrick Wendell 3f3e77f28b STREAMING-50: Support transform workaround in JavaPairDStream
This ports a useful workaround (the `transform` function) to
JavaPairDStream. It is necessary to do things like sorting which
are not supported yet in the core streaming API.
2013-02-12 14:02:32 -08:00
Matei Zaharia fd7e414bd0 Merge pull request #464 from pwendell/java-type-fix
SPARK-694: All references to [K, V] in JavaDStreamLike should be changed to [K2, V2]
2013-02-11 19:19:05 -08:00
Matei Zaharia bfeed4725d Merge pull request #465 from pwendell/java-sort-fix
SPARK-696: sortByKey should use 'ascending' parameter
2013-02-11 18:23:12 -08:00
Patrick Wendell 21df6ffc13 SPARK-696: sortByKey should use 'ascending' parameter 2013-02-11 17:43:26 -08:00
Matei Zaharia 582d31dff9 Formatting fixes 2013-02-11 13:24:54 -08:00
Matei Zaharia ea08537143 Fixed an exponential recursion that could happen with doCheckpoint due
to lack of memoization
2013-02-11 13:23:50 -08:00
Josh Rosen e9fb25426e Remove hack workaround for SPARK-668.
Renaming the type parameters solves this problem (see SPARK-694).

I tried this fix earlier, but it didn't work because I didn't run
`sbt/sbt clean` first.
2013-02-11 11:19:20 -08:00
Patrick Wendell d09c36065c Using tuple swap() 2013-02-11 10:45:45 -08:00
Patrick Wendell 04786d0739 small fix 2013-02-11 10:05:49 -08:00
Patrick Wendell c65988bdc1 Fix for MapPartitions 2013-02-11 10:03:37 -08:00
Patrick Wendell 20cf770545 Fix for flatmap 2013-02-11 10:03:37 -08:00
Patrick Wendell 314d87a038 Indentation fix 2013-02-11 10:03:37 -08:00
Patrick Wendell f0b68c623c Initial cut at replacing K, V in Java files 2013-02-11 10:03:37 -08:00