Matei Zaharia
44b4a0f88f
Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia
6ad8540b40
Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Matei Zaharia
49f6472c0f
Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss
58fc6b2bed
Handle duplicate registrations better.
2013-01-26 18:30:44 -08:00
Charles Reiss
ad4232b4da
Fix deadlock in BlockManager reregistration triggered by failed updates.
2013-01-26 18:30:38 -08:00
Josh Rosen
d49cf0e587
Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Imran Rashid
49c05608f5
add metadatacleaner for persisentRdd map
2013-01-25 17:04:16 -08:00
Stephen Haberman
8efbda0b17
Call executeOnCompleteCallbacks in more finally blocks.
2013-01-25 14:55:33 -06:00
Imran Rashid
a1d9d1767d
fixup 1cadaa1, changed api of map
2013-01-25 10:05:26 -08:00
Imran Rashid
1cadaa164e
switch to TimeStampedHashMap for storing persistent Rdds
2013-01-25 09:30:21 -08:00
Imran Rashid
539491bbc3
code reformatting
2013-01-25 09:29:59 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Stephen Haberman
ec43a51b38
Merge branch 'master' into localsparkcontext
Conflicts:
core/src/test/scala/spark/FileServerSuite.scala
core/src/test/scala/spark/RDDSuite.scala
2013-01-24 21:17:30 -06:00
Patrick Wendell
b6fc6e6752
SPARK-541: Adding a warning for invalid Master URL
Right now Spark silently parses master URL's which do not match any
known regex as a Mesos URL. The Mesos error message when an invalid URL gets
passed is really confusing, so this warns the user when the implicit
conversion is happening.
2013-01-24 14:31:23 -08:00
Stephen Haberman
230bda2047
Add LocalSparkContext to manage common sc variable.
2013-01-24 11:01:01 -06:00
Matei Zaharia
0fe173a3a5
Merge pull request #410 from rxin/splitpruningrdd
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:10:15 -08:00
Reynold Xin
67a43bc7e6
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:06:52 -08:00
Matei Zaharia
fe5e4812fc
Merge pull request #409 from rxin/splitpruningrdd
Added pruntSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin
c109f29c97
Updated PruneDependency to change "split" to "partition".
2013-01-23 22:22:03 -08:00
Reynold Xin
eedc542a02
Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin
81004b967e
Marked prev RDD as transient in SplitsPruningRDD.
2013-01-23 21:54:27 -08:00
Reynold Xin
636e912f32
Created a PruneDependency to properly assign dependency for
SplitsPruningRDD.
2013-01-23 21:21:55 -08:00
Reynold Xin
45cd50d5fe
Updated assert == to ===.
2013-01-23 16:06:58 -08:00
Matei Zaharia
548856a224
Merge remote-tracking branch 'woggling/remove-machines'
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin
c24b3819dd
Added an extra assert for split size check.
2013-01-23 15:34:59 -08:00
Reynold Xin
eb222b7206
Added pruntSplits method to RDD.
2013-01-23 15:29:02 -08:00
Matei Zaharia
1dd82743e0
Fix compile error due to cherry-pick
2013-01-23 13:07:27 -08:00
Charles Reiss
5c7422292e
Remove more dead code from test.
2013-01-23 12:59:51 -08:00
Imran Rashid
e1985bfa04
be sure to set class loader of kryo instances
2013-01-23 12:51:09 -08:00
Charles Reiss
be4a115a7e
Clarify TODO.
2013-01-23 12:48:45 -08:00
Charles Reiss
88b9d240fd
Remove dead code in test.
2013-01-23 12:40:38 -08:00
Matei Zaharia
1a3aeeca23
Merge pull request #407 from woggling/no-cache-tracker
Eliminate CacheTracker
2013-01-23 12:28:48 -08:00
Charles Reiss
e1027ca639
Actually add CacheManager.
2013-01-23 12:22:11 -08:00
Matei Zaharia
4147e1d47b
Merge pull request #406 from tdas/master
Changed StorageLevel and BlockManagerId API to prevent duplication in memory
2013-01-23 12:18:31 -08:00
Matei Zaharia
4d77d554e1
Merge pull request #394 from JoshRosen/add_file_fix
Add SparkFiles.get() API to access files added through addFile().
2013-01-23 12:16:30 -08:00
Josh Rosen
ae2ed2947d
Allow PySpark's SparkFiles to be used from driver
Fix minor documentation formatting issues.
2013-01-23 10:58:50 -08:00
Tathagata Das
79d55700ce
One more fix. Made even default constructor of BlockManagerId private to prevent such problems in the future.
2013-01-23 01:57:09 -08:00
Charles Reiss
0b506dd2ec
Add tests of various node failure scenarios.
2013-01-23 01:38:15 -08:00
Charles Reiss
d209b6b764
Extra debugging from hostLost()
2013-01-23 01:35:14 -08:00
Charles Reiss
9a27062260
Force generation increment after shuffle map stage
2013-01-23 01:34:44 -08:00
Tathagata Das
155f31398d
Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides.
2013-01-23 01:10:26 -08:00
Tathagata Das
5e11f1e51f
Modified StorageLevel API to ensure zero duplicate objects.
2013-01-22 23:42:53 -08:00
Tathagata Das
bacade6caf
Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.
2013-01-22 22:55:26 -08:00
Josh Rosen
43e9ff9596
Add test for driver hanging on exit (SPARK-530).
2013-01-22 22:47:26 -08:00
Charles Reiss
2849931000
Eliminate CacheTracker.
Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster
queries.
Adds CacheManager to locally coordinate computation of cached RDDs.
2013-01-22 22:19:30 -08:00
Matei Zaharia
ebaa8f6519
Merge remote-tracking branch 'stephenh/cleanup'
Conflicts:
core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-01-22 21:05:45 -08:00
Matei Zaharia
d2d273868b
Merge pull request #397 from JoshRosen/refactoring/daemon-threads
Refactor daemon thread creation
2013-01-22 21:02:53 -08:00
Stephen Haberman
98d0b7747d
Fix Worker logInfo about unknown executor.
2013-01-22 18:11:51 -06:00
Stephen Haberman
8c51322cd0
Don't bother creating an exception.
2013-01-22 18:09:10 -06:00
Stephen Haberman
fdec42385a
Fix SPARK_MEM in ExecutorRunner.
2013-01-22 18:01:12 -06:00
Stephen Haberman
2437f6741b
Restore SPARK_MEM in executorEnvs.
2013-01-22 18:01:03 -06:00
Matei Zaharia
151c47eef5
Merge pull request #399 from NFLabs/master
Fix for hanging spark.HttpFileServer on the kind of virtual network
2013-01-22 15:49:24 -08:00
Stephen Haberman
250fe89679
Handle Master telling the Worker to kill an already-dead executor.
2013-01-22 16:29:05 -06:00
Stephen Haberman
6f2194f757
Call removeJob instead of killing the cluster.
2013-01-22 15:38:58 -06:00
Stephen Haberman
27b3f3f0a9
Handle slaveLost before slaveIdToHost knows about it.
2013-01-22 15:30:42 -06:00
Imran Rashid
905c720e5e
Merge branch 'master' into blockmanager_ui
Conflicts:
core/src/main/scala/spark/RDD.scala
2013-01-22 12:02:27 -08:00
Imran Rashid
50e2b23927
Fix up some problems from the merge
2013-01-22 11:46:01 -08:00
Stephen Haberman
588b24197a
Use default arguments instead of constructor overloads.
2013-01-22 10:19:30 -06:00
Leemoonsoo
7e9ee2e833
Fix for hanging spark.HttpFileServer with kind of virtual network
2013-01-22 23:08:34 +09:00
Charles Reiss
e353886a8c
Use generation numbers for fetch failure tracking
2013-01-22 00:23:31 -08:00
Josh Rosen
551a47a620
Refactor daemon thread pool creation.
2013-01-21 23:31:00 -08:00
Stephen Haberman
a8baeb9327
Further simplify getOrElse call.
2013-01-21 21:30:24 -06:00
Stephen Haberman
2d8218b871
Remove unneeded/now-broken saveAsNewAPIHadoopFile overload.
2013-01-21 20:00:27 -06:00
Josh Rosen
7b9e96c992
Add synchronization to Executor.updateDependencies() (SPARK-662)
2013-01-21 17:34:23 -08:00
Josh Rosen
ef711902c1
Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.
I also removed some unused code.
2013-01-21 17:34:17 -08:00
Stephen Haberman
ffd1623595
Minor cleanup.
2013-01-21 15:55:46 -06:00
Matei Zaharia
a88b44ed3b
Only bind to IPv4 addresses when trying to auto-detect external IP
2013-01-21 11:59:21 -08:00
Matei Zaharia
4d34c7fc3e
Fix compile error caused by cherry-pick
2013-01-21 11:33:48 -08:00
Imran Rashid
a3f571b539
more File -> String changes
2013-01-21 11:21:52 -08:00
Imran Rashid
fe26acc482
remove unused imports
2013-01-21 11:21:46 -08:00
Imran Rashid
c73107500e
send sparkHome as String instead of File over network
2013-01-21 11:21:39 -08:00
Imran Rashid
5bf73df7f0
oops, fix stupid compile error
2013-01-21 11:21:33 -08:00
Imran Rashid
aae5a920a4
get sparkHome the correct way
2013-01-21 11:21:28 -08:00
Imran Rashid
f116d6b5c6
executor can use a different sparkHome from Worker
2013-01-21 11:21:22 -08:00
Stephen Haberman
6ded481999
Merge branch 'master' into hadoopconf
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/api/java/JavaSparkContext.scala
2013-01-21 12:56:48 -06:00
Stephen Haberman
69a417858b
Also use hadoopConfiguration in newAPI methods.
2013-01-21 12:42:11 -06:00
Matei Zaharia
c0b9ceb8c3
Log remote lifecycle events in Akka for easier debugging
2013-01-21 00:23:53 -08:00
Matei Zaharia
c7b5e5f1ec
Merge pull request #389 from JoshRosen/python_rdd_checkpointing
Add checkpointing to the Python API
2013-01-20 17:10:44 -08:00
Josh Rosen
9f211dd3f0
Fix PythonPartitioner equality; see SPARK-654.
PythonPartitioner did not take the Python-side partitioning function
into account when checking for equality, which might cause problems
in the future.
2013-01-20 15:41:42 -08:00
Josh Rosen
5b6ea9e9a0
Update checkpointing API docs in Python/Java.
2013-01-20 15:31:41 -08:00
Josh Rosen
7ed1bf4b48
Add RDD checkpointing to Python API.
2013-01-20 13:19:19 -08:00
Matei Zaharia
86057ec7c8
Merge branch 'master' into streaming
Conflicts:
core/src/main/scala/spark/api/python/PythonRDD.scala
2013-01-20 12:47:55 -08:00
Matei Zaharia
8e7f098a2c
Added accumulators to PySpark
2013-01-20 01:57:44 -08:00
Tathagata Das
4f8fe58b25
Merge branch 'mesos-streaming' into streaming
Conflicts:
core/src/main/scala/spark/api/java/JavaRDDLike.scala
core/src/main/scala/spark/api/java/JavaSparkContext.scala
core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das
214345ceac
Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29 , along with updates to doc comments in SparkContext.checkpoint().
2013-01-19 23:50:17 -08:00
Imran Rashid
d98caa0fa0
Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui
Conflicts:
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/StorageLevel.scala
2013-01-18 18:11:26 -08:00
Patrick Wendell
ee0314c3b3
Merge branch 'streaming' into streaming-java-api
2013-01-17 18:43:00 -08:00
Patrick Wendell
d5570c7968
Adding checkpointing to Java API
2013-01-17 18:41:58 -08:00
Matei Zaharia
54c0f9f185
Fix code that assumed spark.local.dir is only a single directory
2013-01-17 17:40:55 -08:00
Fernand Pajot
742bc841ad
changed HttpBroadcast server cache to be in spark.local.dir instead of java.io.tmpdir
2013-01-17 16:56:11 -08:00
Matei Zaharia
aff1844155
Merge pull request #381 from squito/remove_threadpool
remove unused thread pool
2013-01-16 16:46:42 -08:00
Tathagata Das
f466ee44bc
Merge branch 'master' into streaming
Conflicts:
core/src/main/scala/spark/MapOutputTracker.scala
2013-01-16 12:57:11 -08:00
Imran Rashid
eae698f755
remove unused thread pool
2013-01-16 12:21:37 -08:00
Tathagata Das
a805ac4a7c
Disabled checkpoint for PairwiseRDD (pySpark).
2013-01-16 10:55:26 -08:00
Matei Zaharia
4beb084f64
Merge pull request #374 from woggling/null-mapout
Generate FetchFailedException even for cached missing map outputs
2013-01-15 14:22:29 -08:00
Tathagata Das
cd1521cfdb
Merge branch 'master' into streaming
Conflicts:
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/FilteredRDD.scala
docs/_layouts/global.html
docs/index.md
run
2013-01-15 12:08:51 -08:00
Charles Reiss
4078623b9f
Remove broken attempt to test fetching case.
2013-01-15 12:05:54 -08:00
Stephen Haberman
74d3b23929
Add spark.executor.memory to differentiate executor memory from spark-shell memory.
2013-01-15 14:03:28 -06:00
Stephen Haberman
d228bff440
Add a test.
2013-01-15 11:48:50 -06:00
Stephen Haberman
dd583b7ebf
Call executeOnCompleteCallbacks in a finally block.
2013-01-15 10:52:06 -06:00
Tathagata Das
eded21925a
Merge pull request #375 from tdas/streaming
Important bug fixes
2013-01-14 23:06:40 -08:00
Charles Reiss
b038999797
Fix accidental spark.master.host reuse
2013-01-14 17:04:44 -08:00
Charles Reiss
7ba34bc007
Additional tests for MapOutputTracker.
2013-01-14 15:27:02 -08:00
Charles Reiss
273fb5cc10
Throw FetchFailedException for cached missing locs
2013-01-14 15:26:48 -08:00
Tathagata Das
131be5d62e
Fixed bug in RDD checkpointing.
2013-01-14 03:28:25 -08:00
Tathagata Das
82b0cc90ca
Merge pull request #370 from tdas/streaming
Added more documentation and minor change in API for NetworkReceiver
2013-01-13 21:28:12 -08:00
Tathagata Das
0dbd411a56
Added documentation for PairDStreamFunctions.
2013-01-13 21:08:35 -08:00
Matei Zaharia
cb867e9ffb
Merge branch 'master' of github.com:mesos/spark
2013-01-13 19:34:32 -08:00
Matei Zaharia
72408e8dfa
Make filter preserve partitioner info, since it can
2013-01-13 19:34:07 -08:00
Matei Zaharia
9a34409810
Merge pull request #360 from rxin/cogroup-java
Changed CoGroupRDD's hash map from Scala to Java.
2013-01-13 15:31:08 -08:00
Reynold Xin
be7166146b
Removed the use of getOrElse to avoid Scala wrapper for every call.
2013-01-13 15:27:28 -08:00
Ryan LeCompte
c31931af7e
switch to uppercase constants
2013-01-13 10:39:47 -08:00
Ryan LeCompte
2305a2c1d9
more code cleanup
2013-01-13 10:01:56 -08:00
Mikhail Bautin
88d8f11365
Add missing dependency spray-json to Maven build
2013-01-13 00:46:25 -08:00
Matei Zaharia
fbb3fc4143
Merge pull request #346 from JoshRosen/python-api
Python API (PySpark)
2013-01-12 23:49:36 -08:00
Matei Zaharia
01413ca0e7
Merge pull request #364 from tysonjh/master
Executor and JobDescription JSON support added
2013-01-12 16:17:07 -08:00
Matei Zaharia
995075bf79
Merge pull request #355 from shivaram/default-hadoop-pom
Activate hadoop1 profile by default for maven builds
2013-01-12 15:38:36 -08:00
Shivaram Venkataraman
bbc56d85ed
Rename environment variable for hadoop profiles to hadoopVersion
2013-01-12 15:24:13 -08:00
Ryan LeCompte
addff2c466
add comment
2013-01-12 09:57:29 -08:00
Ryan LeCompte
ea20ae6618
add one extra test
2013-01-12 09:18:00 -08:00
Ryan LeCompte
2c77eeebb6
correct test params
2013-01-12 00:13:45 -08:00
Ryan LeCompte
0cfea7a2ec
add unit test
2013-01-11 23:48:07 -08:00
Ryan LeCompte
ff10b3aa09
add missing return
2013-01-11 21:03:57 -08:00
Ryan LeCompte
22445fbea9
attempt to sleep for more accurate time period, minor cleanup
2013-01-11 13:30:49 -08:00
Tyson
1731f1fed4
Added an optional format parameter for individual job queries and optimized the jobId query
2013-01-11 15:01:43 -05:00
Tyson
c063e8777e
Added implicit json writers for JobDescription and ExecutorRunner
2013-01-11 14:57:38 -05:00
Stephen Haberman
5c7a127219
Pass a new Configuration that wraps the default hadoopConfiguration.
2013-01-11 11:25:11 -06:00
Stephen Haberman
3e6519a36e
Use hadoopConfiguration for default JobConf in PairRDDFunctions.
2013-01-11 11:24:20 -06:00
Shivaram Venkataraman
9262522306
Activate hadoop2 profile in pom.xml with -Dhadoop=2
2013-01-10 22:07:34 -08:00
Matei Zaharia
2e914d9983
Formatting
2013-01-10 19:13:08 -08:00
Matei Zaharia
3548c9c0c8
Merge branch 'master' of github.com:mesos/spark
2013-01-10 19:06:40 -08:00
Matei Zaharia
6d1c230281
Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Matei Zaharia
248995c535
Merge pull request #356 from shane-huang/master
Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections
2013-01-10 17:52:23 -08:00
Reynold Xin
bd336f5f40
Changed CoGroupRDD's hash map from Scala to Java.
2013-01-10 17:13:04 -08:00
Stephen Haberman
d1864052c5
Fix invalid asInstanceOf cast.
2013-01-10 12:16:26 -06:00
Stephen Haberman
b15e851279
Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables.
For custom properties, use "spark.hadoop.*" as a prefix instead of just "hadoop.*".
2013-01-10 10:55:41 -06:00
shane-huang
9930a95d21
Modified Patch according to comments
2013-01-10 20:09:55 +08:00
Stephen Haberman
e3861ae395
Provide and expose a default Hadoop Configuration.
Any "hadoop.*" system properties will be passed along into configuration.
2013-01-09 17:08:14 -06:00
Tyson
549ee388a1
Removed io.spray spray-json dependency as it is not needed.
2013-01-09 15:12:23 -05:00
Tyson
bf9d9946f9
Query parameter reformatted to be more extensible and routing more robust
2013-01-09 11:29:58 -05:00
Tyson
0da2ff102e
Added url query parameter json and handler
2013-01-09 10:40:48 -05:00
Tyson
269fe018c7
JSON object definitions
2013-01-09 10:40:43 -05:00
Matei Zaharia
9cc764f523
Code style
2013-01-08 22:29:57 -08:00
Matei Zaharia
14972141f9
Merge pull request #344 from mbautin/log_preferred_hosts
Log preferred hosts
2013-01-08 22:26:34 -08:00
Josh Rosen
b57dd0f160
Add mapPartitionsWithSplit() to PySpark.
2013-01-08 16:05:02 -08:00
Stephen Haberman
8ac0f35be4
Add JavaRDDLike.keyBy.
2013-01-08 09:57:45 -06:00
Stephen Haberman
4ee6b22775
Merge branch 'master' into tupleBy
Conflicts:
core/src/test/scala/spark/RDDSuite.scala
2013-01-08 09:10:10 -06:00
shane-huang
e4cb72da8a
Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections.
2013-01-08 22:40:58 +08:00
Shivaram Venkataraman
f7adb382ac
Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now
by using -Dhadoop -Phadoop2.
2013-01-08 03:19:43 -08:00
Mikhail Bautin
4725b0f643
Fixing if/else coding style for preferred hosts logging
2013-01-07 20:09:26 -08:00