Commit graph

6602 commits

Author SHA1 Message Date
Hossein Falaki a7de8e9b1c Renamed countDistinct and countDistinctByKey methods to include Approx 2013-12-30 19:28:03 -08:00
Matei Zaharia 0fa5809768 Updated docs for SparkConf and handled review comments 2013-12-30 22:17:28 -05:00
Hossein Falaki d50ccc5ca9 Using origin version 2013-12-30 15:08:34 -08:00
Andrew Or 347fafe4fc Fix CheckpointSuite test fail 2013-12-30 13:10:33 -08:00
Andrew Or d6e7910d92 Simplify merge logic based on the invariant that all spills contain unique keys 2013-12-30 13:01:00 -08:00
Patrick Wendell 1cbef081e3 Response to Shivaram's review 2013-12-30 12:46:09 -08:00
Tathagata Das f4e4066191 Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects. 2013-12-30 11:13:24 -08:00
Andrew Or 2b71ab97c4 Merge pull request from aarondav: Utilize DiskBlockManager pathway for temp file writing
This gives us a couple advantages:

- Uses spark.local.dir and randomly selects a directory/disk.
- Ensures files are deleted on normal DiskBlockManager cleanup.
- Provides the same stats as the usual DiskBlockObjectWriter (currently unused).

Also enables basic cleanup when the iterator is fully drained.
Cleanup is still required for operations that fail or don't go through all elements.
2013-12-30 11:01:30 -08:00
Patrick Wendell 50e3b8ec4c Merge pull request #308 from kayousterhout/stage_naming
Changed naming of StageCompleted event to be consistent

The rest of the SparkListener events are named with "SparkListener"
as the prefix of the name; this commit renames the StageCompleted
event to SparkListenerStageCompleted for consistency.
2013-12-30 07:44:26 -08:00
Jianping J Wang 600421d8bc update 2013-12-30 23:42:55 +08:00
Jianping J Wang 29fe6bdaa2 refactor and bug fix 2013-12-30 23:41:15 +08:00
Lian, Cheng 6d0e2e86df Response to comments from Reynold, Ameet and Evan
* Arguments renamed according to Ameet's suggestion
* Using DoubleMatrix instead of Array[Double] in computation
* Removed arguments C (kinds of label) and D (dimension of feature vector) from NaiveBayes.train()
* Replaced reduceByKey with foldByKey to avoid modifying original input data
2013-12-30 22:46:32 +08:00
Patrick Wendell cffe1c1d5c SPARK-1008: Logging improvements
1. Adds a default log4j file that gets loaded if users haven't specified a log4j file.
2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings
   after building with SBT (and I've seen similar warnings on the mailing list).
2013-12-29 23:14:33 -08:00
Andrew Or 015a510b0a Merge branch 'master' of github.com:andrewor14/incubator-spark 2013-12-29 22:03:47 -08:00
Andrew Or 2a48d71528 Add test suite for ExternalAppendOnlyMap 2013-12-29 21:56:13 -08:00
Andrew Or 4a014dc59c Make serializer a parameter to ExternalAppendOnlyMap 2013-12-29 21:55:53 -08:00
Kay Ousterhout c2c1af39f5 Updated code style according to Patrick's comments 2013-12-29 21:10:08 -08:00
Aaron Davidson e3cac47e65 Use Comparator instead of Ordering
Lowers object creation costs.
2013-12-29 19:58:37 -08:00
Matei Zaharia 994f080f8a Properly show Spark properties on web UI, and change app name property 2013-12-29 22:19:33 -05:00
Matei Zaharia eaa8a68ff0 Fix some Python docs and make sure to unset SPARK_TESTING in Python
tests so we don't get the test spark.conf on the classpath.
2013-12-29 20:15:07 -05:00
Andrew Or 8fbff9f5d0 Address Aaron's comments 2013-12-29 16:22:44 -08:00
Matei Zaharia 11540b798d Added tests for SparkConf and fixed a bug
Typesafe Config caches system properties the first time it's invoked
by default, ignoring later changes unless you do something special
2013-12-29 18:44:06 -05:00
Matei Zaharia 1ee7f5aee4 Fix a change that was lost during merge 2013-12-29 18:15:46 -05:00
Matei Zaharia 0bd1900cbc Fix a few settings that were being read as system properties after merge 2013-12-29 15:38:46 -05:00
Patrick Wendell 7a99702ce2 Respect supervise option at Master 2013-12-29 12:12:58 -08:00
Matei Zaharia b4ceed40d6 Merge remote-tracking branch 'origin/master' into conf2
Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
	core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
	core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
	core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
	new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
	streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
	streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Patrick Wendell a8729770f5 Slight change to retry logic 2013-12-29 11:57:57 -08:00
Matei Zaharia 58c6fa2041 Add Python docs about SparkConf 2013-12-29 14:46:59 -05:00
Patrick Wendell 8da1012f9b TODO clean-up 2013-12-29 11:38:12 -08:00
Matei Zaharia 615fb649d6 Fix some other Python tests due to initializing JVM in a different way
The test in context.py created two different instances of the
SparkContext class by copying "globals", so that some tests can have a
global "sc" object and others can try initializing their own contexts.
This led to two JVM gateways being created since SparkConf also looked
at pyspark.context.SparkContext to get the JVM.
2013-12-29 14:32:05 -05:00
Patrick Wendell faefea3fd8 Adding driver ID to submission response 2013-12-29 11:31:10 -08:00
Patrick Wendell 6ffa9bb226 Documentation and adding supervise option 2013-12-29 11:26:56 -08:00
Patrick Wendell 35f6dc252a Changes to allow fate sharing of drivers/executors and workers. 2013-12-29 11:14:36 -08:00
Matei Zaharia cd00225db9 Add SparkConf support in Python 2013-12-29 14:03:39 -05:00
Lian, Cheng f150b6e76c Response to Reynold's comments 2013-12-29 17:13:01 +08:00
Reynold Xin 72a17b69f5 Revert "Merge pull request #310 from jyunfan/master"
This reverts commit 79b20e4dbe, reversing
changes made to 7375047d51.
2013-12-28 21:25:40 -10:00
Reynold Xin 79b20e4dbe Merge pull request #310 from jyunfan/master
Fix typo in the Accumulators section

Change 'val' to 'var'
2013-12-28 21:13:36 -10:00
Matei Zaharia 1c11f54a9b Fix Python use of getLocalDir 2013-12-29 00:11:36 -05:00
Tor Myklebust fec01664a7 Make Python function/line appear in the UI. 2013-12-28 23:34:16 -05:00
Tor Myklebust d812aeece9 Factor call site reporting out to SparkContext. 2013-12-28 23:21:49 -05:00
Matei Zaharia 20631348d1 Fix other failing tests 2013-12-28 23:17:58 -05:00
Jyun-Fan Tsai 17f6620a71 Fix typo in the Accumulators section
val => var
2013-12-29 11:30:02 +08:00
Matei Zaharia 0900d5c72a Add a StreamingContext constructor that takes a conf object 2013-12-28 21:38:07 -05:00
Matei Zaharia a8f316386a Fix CheckpointSuite test failures 2013-12-28 21:26:43 -05:00
Matei Zaharia 578bd1fc28 Fix test failures due to setting / clearing clock type in Streaming 2013-12-28 21:21:06 -05:00
Matei Zaharia 5bbe73864e Fix Executor not getting properties in local mode 2013-12-28 17:31:58 -05:00
Matei Zaharia a16c52ed1b Check for SPARK_YARN_MODE through a system property too since it can
sometimes be set that way (undoes a change in previous commit)
2013-12-28 17:24:21 -05:00
Matei Zaharia 642029e7f4 Various fixes to configuration code
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing

This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Patrick Wendell 7375047d51 Merge pull request #304 from kayousterhout/remove_unused
Removed unused failed and causeOfFailure variables (in TaskSetManager)
2013-12-28 13:25:06 -08:00
Matei Zaharia ad3dfd1531 Merge pull request #307 from kayousterhout/other_failure
Removed unused OtherFailure TaskEndReason.

The OtherFailure TaskEndReason was added by @mateiz 3 years ago in this commit: 24a1e7f838

Unless I am missing something, it doesn't seem to have been used then, and is not used now, so seems safe for deletion.
2013-12-27 22:10:14 -05:00