Hossein Falaki
a7de8e9b1c
Renamed countDistinct and countDistinctByKey methods to include Approx
2013-12-30 19:28:03 -08:00
Matei Zaharia
0fa5809768
Updated docs for SparkConf and handled review comments
2013-12-30 22:17:28 -05:00
Hossein Falaki
d50ccc5ca9
Using origin version
2013-12-30 15:08:34 -08:00
Andrew Or
347fafe4fc
Fix CheckpointSuite test fail
2013-12-30 13:10:33 -08:00
Andrew Or
d6e7910d92
Simplify merge logic based on the invariant that all spills contain unique keys
2013-12-30 13:01:00 -08:00
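A rough illustration of why the unique-keys invariant simplifies merging, written in plain Python rather than Spark's actual code. It assumes each spill is a list of (key, value) pairs sorted by key with no key repeated inside a spill; `merge_spills` and `combine` are hypothetical names for this sketch. Under that assumption a key can appear at most once per spill, so a k-way merge followed by grouping adjacent equal keys is enough.

```python
import heapq
from itertools import groupby
from operator import add, itemgetter


def merge_spills(spills, combine):
    """Merge sorted spills of (key, value) pairs into one stream.

    Assumes every spill is sorted by key and holds each key at most once,
    so each group below has at most len(spills) entries.
    """
    # k-way merge keeps the global key order without loading all spills at once
    merged = heapq.merge(*spills, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [value for _, value in group]
        acc = values[0]
        for value in values[1:]:
            acc = combine(acc, value)
        yield key, acc


spill_a = [("a", 1), ("c", 3)]
spill_b = [("a", 10), ("b", 2)]
print(list(merge_spills([spill_a, spill_b], add)))
# prints [('a', 11), ('b', 2), ('c', 3)]
```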
Patrick Wendell
1cbef081e3
Response to Shivaram's review
2013-12-30 12:46:09 -08:00
Tathagata Das
f4e4066191
Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects.
2013-12-30 11:13:24 -08:00
Andrew Or
2b71ab97c4
Merge pull request from aarondav: Utilize DiskBlockManager pathway for temp file writing
...
This gives us a couple advantages:
- Uses spark.local.dir and randomly selects a directory/disk.
- Ensures files are deleted on normal DiskBlockManager cleanup.
- Availability of same stats as usual DiskBlockObjectWriter (currently unused).
Also enable basic cleanup when iterator is fully drained.
Still requires cleanup for operations that fail or don't go through all elements.
2013-12-30 11:01:30 -08:00
Patrick Wendell
50e3b8ec4c
Merge pull request #308 from kayousterhout/stage_naming
...
Changed naming of StageCompleted event to be consistent
The rest of the SparkListener events are named with "SparkListener"
as the prefix of the name; this commit renames the StageCompleted
event to SparkListenerStageCompleted for consistency.
2013-12-30 07:44:26 -08:00
Jianping J Wang
600421d8bc
update
2013-12-30 23:42:55 +08:00
Jianping J Wang
29fe6bdaa2
refactor and bug fix
2013-12-30 23:41:15 +08:00
Lian, Cheng
6d0e2e86df
Response to comments from Reynold, Ameet and Evan
...
* Arguments renamed according to Ameet's suggestion
* Using DoubleMatrix instead of Array[Double] in computation
* Removed arguments C (kinds of label) and D (dimension of feature vector) from NaiveBayes.train()
* Replaced reduceByKey with foldByKey to avoid modifying original input data
2013-12-30 22:46:32 +08:00
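A small sketch of the reduceByKey versus foldByKey point, using plain Python dictionaries instead of Spark RDDs. The helpers `reduce_by_key`, `fold_by_key`, and `add_in_place` are hypothetical stand-ins, and the zero-value factory is an assumption of this sketch. With a combining function that mutates its accumulator, a reduce reuses the first input value as the accumulator and so overwrites the input, while a fold starts from a fresh zero value.

```python
def add_in_place(acc, vec):
    """Element-wise add vec into acc, mutating and returning acc."""
    for i, x in enumerate(vec):
        acc[i] += x
    return acc


def reduce_by_key(pairs, op):
    out = {}
    for key, value in pairs:
        # the first value seen for a key becomes the accumulator itself
        out[key] = op(out[key], value) if key in out else value
    return out


def fold_by_key(pairs, zero, op):
    out = {}
    for key, value in pairs:
        if key not in out:
            out[key] = zero()  # fresh accumulator, independent of the input
        out[key] = op(out[key], value)
    return out


data = [("a", [1, 2]), ("a", [3, 4])]

print(fold_by_key(data, lambda: [0, 0], add_in_place))  # prints {'a': [4, 6]}
print(data[0][1])  # prints [1, 2], the input is untouched

print(reduce_by_key(data, add_in_place))  # prints {'a': [4, 6]}
print(data[0][1])  # prints [4, 6], the first input vector was overwritten
```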
Patrick Wendell
cffe1c1d5c
SPARK-1008: Logging improvements
...
1. Adds a default log4j file that gets loaded if users haven't specified a log4j file.
2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings
after building with SBT (and I've seen similar warnings on the mailing list).
2013-12-29 23:14:33 -08:00
Andrew Or
015a510b0a
Merge branch 'master' of github.com:andrewor14/incubator-spark
2013-12-29 22:03:47 -08:00
Andrew Or
2a48d71528
Add test suite for ExternalAppendOnlyMap
2013-12-29 21:56:13 -08:00
Andrew Or
4a014dc59c
Make serializer a parameter to ExternalAppendOnlyMap
2013-12-29 21:55:53 -08:00
Kay Ousterhout
c2c1af39f5
Updated code style according to Patrick's comments
2013-12-29 21:10:08 -08:00
Aaron Davidson
e3cac47e65
Use Comparator instead of Ordering
...
lower object creation costs
2013-12-29 19:58:37 -08:00
Matei Zaharia
994f080f8a
Properly show Spark properties on web UI, and change app name property
2013-12-29 22:19:33 -05:00
Matei Zaharia
eaa8a68ff0
Fix some Python docs and make sure to unset SPARK_TESTING in Python
...
tests so we don't get the test spark.conf on the classpath.
2013-12-29 20:15:07 -05:00
Andrew Or
8fbff9f5d0
Address Aaron's comments
2013-12-29 16:22:44 -08:00
Matei Zaharia
11540b798d
Added tests for SparkConf and fixed a bug
...
Typesafe Config caches system properties the first time it's invoked
by default, ignoring later changes unless you do something special
2013-12-29 18:44:06 -05:00
Matei Zaharia
1ee7f5aee4
Fix a change that was lost during merge
2013-12-29 18:15:46 -05:00
Matei Zaharia
0bd1900cbc
Fix a few settings that were being read as system properties after merge
2013-12-29 15:38:46 -05:00
Patrick Wendell
7a99702ce2
Respect supervise option at Master
2013-12-29 12:12:58 -08:00
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Patrick Wendell
a8729770f5
Slight change to retry logic
2013-12-29 11:57:57 -08:00
Matei Zaharia
58c6fa2041
Add Python docs about SparkConf
2013-12-29 14:46:59 -05:00
Patrick Wendell
8da1012f9b
TODO clean-up
2013-12-29 11:38:12 -08:00
Matei Zaharia
615fb649d6
Fix some other Python tests due to initializing JVM in a different way
...
The test in context.py created two different instances of the
SparkContext class by copying "globals", so that some tests can have a
global "sc" object and others can try initializing their own contexts.
This led to two JVM gateways being created since SparkConf also looked
at pyspark.context.SparkContext to get the JVM.
2013-12-29 14:32:05 -05:00
Patrick Wendell
faefea3fd8
Adding driver ID to submission response
2013-12-29 11:31:10 -08:00
Patrick Wendell
6ffa9bb226
Documentation and adding supervise option
2013-12-29 11:26:56 -08:00
Patrick Wendell
35f6dc252a
Changes to allow fate sharing of drivers/executors and workers.
2013-12-29 11:14:36 -08:00
Matei Zaharia
cd00225db9
Add SparkConf support in Python
2013-12-29 14:03:39 -05:00
Lian, Cheng
f150b6e76c
Response to Reynold's comments
2013-12-29 17:13:01 +08:00
Reynold Xin
72a17b69f5
Revert "Merge pull request #310 from jyunfan/master"
...
This reverts commit 79b20e4dbe, reversing changes made to 7375047d51.
2013-12-28 21:25:40 -10:00
Reynold Xin
79b20e4dbe
Merge pull request #310 from jyunfan/master
...
Fix typo in the Accumulators section
Change 'val' to 'var'
2013-12-28 21:13:36 -10:00
Matei Zaharia
1c11f54a9b
Fix Python use of getLocalDir
2013-12-29 00:11:36 -05:00
Tor Myklebust
fec01664a7
Make Python function/line appear in the UI.
2013-12-28 23:34:16 -05:00
Tor Myklebust
d812aeece9
Factor call site reporting out to SparkContext.
2013-12-28 23:21:49 -05:00
Matei Zaharia
20631348d1
Fix other failing tests
2013-12-28 23:17:58 -05:00
Jyun-Fan Tsai
17f6620a71
Fix typo in the Accumulators section
...
val => var
2013-12-29 11:30:02 +08:00
Matei Zaharia
0900d5c72a
Add a StreamingContext constructor that takes a conf object
2013-12-28 21:38:07 -05:00
Matei Zaharia
a8f316386a
Fix CheckpointSuite test failures
2013-12-28 21:26:43 -05:00
Matei Zaharia
578bd1fc28
Fix test failures due to setting / clearing clock type in Streaming
2013-12-28 21:21:06 -05:00
Matei Zaharia
5bbe73864e
Fix Executor not getting properties in local mode
2013-12-28 17:31:58 -05:00
Matei Zaharia
a16c52ed1b
Check for SPARK_YARN_MODE through a system property too since it can
...
sometimes be set that way (undoes a change in previous commit)
2013-12-28 17:24:21 -05:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
...
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Patrick Wendell
7375047d51
Merge pull request #304 from kayousterhout/remove_unused
...
Removed unused failed and causeOfFailure variables (in TaskSetManager)
2013-12-28 13:25:06 -08:00
Matei Zaharia
ad3dfd1531
Merge pull request #307 from kayousterhout/other_failure
...
Removed unused OtherFailure TaskEndReason.
The OtherFailure TaskEndReason was added by @mateiz 3 years ago in this commit: 24a1e7f838
Unless I am missing something, it doesn't seem to have been used then, and is not used now, so seems safe for deletion.
2013-12-27 22:10:14 -05:00