Commit graph

1065 commits

Author SHA1 Message Date
Matei Zaharia 44b4a0f88f Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia 6ad8540b40 Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Matei Zaharia 49f6472c0f Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss 58fc6b2bed Handle duplicate registrations better. 2013-01-26 18:30:44 -08:00
Charles Reiss ad4232b4da Fix deadlock in BlockManager reregistration triggered by failed updates. 2013-01-26 18:30:38 -08:00
Josh Rosen d49cf0e587 Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Imran Rashid 49c05608f5 add metadatacleaner for persistentRdd map 2013-01-25 17:04:16 -08:00
Stephen Haberman 8efbda0b17 Call executeOnCompleteCallbacks in more finally blocks. 2013-01-25 14:55:33 -06:00
Imran Rashid a1d9d1767d fixup 1cadaa1, changed api of map 2013-01-25 10:05:26 -08:00
Imran Rashid 1cadaa164e switch to TimeStampedHashMap for storing persistent Rdds 2013-01-25 09:30:21 -08:00
Imran Rashid 539491bbc3 code reformatting 2013-01-25 09:29:59 -08:00
Stephen Haberman 7dfb82a992 Replace old 'master' term with 'driver'. 2013-01-25 11:03:00 -06:00
Stephen Haberman ec43a51b38 Merge branch 'master' into localsparkcontext
Conflicts:
	core/src/test/scala/spark/FileServerSuite.scala
	core/src/test/scala/spark/RDDSuite.scala
2013-01-24 21:17:30 -06:00
Patrick Wendell b6fc6e6752 SPARK-541: Adding a warning for invalid Master URL
Right now Spark silently parses master URLs which do not match any
known regex as a Mesos URL. The Mesos error message when an invalid URL gets
passed is really confusing, so this warns the user when the implicit
conversion is happening.
2013-01-24 14:31:23 -08:00
Stephen Haberman 230bda2047 Add LocalSparkContext to manage common sc variable. 2013-01-24 11:01:01 -06:00
Matei Zaharia 0fe173a3a5 Merge pull request #410 from rxin/splitpruningrdd
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:10:15 -08:00
Reynold Xin 67a43bc7e6 Added a clearDependencies method in PartitionPruningRDD. 2013-01-23 23:06:52 -08:00
Matei Zaharia fe5e4812fc Merge pull request #409 from rxin/splitpruningrdd
Added pruneSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin c109f29c97 Updated PruneDependency to change "split" to "partition". 2013-01-23 22:22:03 -08:00
Reynold Xin eedc542a02 Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin 81004b967e Marked prev RDD as transient in SplitsPruningRDD. 2013-01-23 21:54:27 -08:00
Reynold Xin 636e912f32 Created a PruneDependency to properly assign dependency for
SplitsPruningRDD.
2013-01-23 21:21:55 -08:00
Reynold Xin 45cd50d5fe Updated assert == to ===. 2013-01-23 16:06:58 -08:00
Matei Zaharia 548856a224 Merge remote-tracking branch 'woggling/remove-machines'
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin c24b3819dd Added an extra assert for split size check. 2013-01-23 15:34:59 -08:00
Reynold Xin eb222b7206 Added pruneSplits method to RDD. 2013-01-23 15:29:02 -08:00
Matei Zaharia 1dd82743e0 Fix compile error due to cherry-pick 2013-01-23 13:07:27 -08:00
Charles Reiss 5c7422292e Remove more dead code from test. 2013-01-23 12:59:51 -08:00
Imran Rashid e1985bfa04 be sure to set class loader of kryo instances 2013-01-23 12:51:09 -08:00
Charles Reiss be4a115a7e Clarify TODO. 2013-01-23 12:48:45 -08:00
Charles Reiss 88b9d240fd Remove dead code in test. 2013-01-23 12:40:38 -08:00
Matei Zaharia 1a3aeeca23 Merge pull request #407 from woggling/no-cache-tracker
Eliminate CacheTracker
2013-01-23 12:28:48 -08:00
Charles Reiss e1027ca639 Actually add CacheManager. 2013-01-23 12:22:11 -08:00
Matei Zaharia 4147e1d47b Merge pull request #406 from tdas/master
Changed StorageLevel and BlockManagerId API to prevent duplication in memory
2013-01-23 12:18:31 -08:00
Matei Zaharia 4d77d554e1 Merge pull request #394 from JoshRosen/add_file_fix
Add SparkFiles.get() API to access files added through addFile().
2013-01-23 12:16:30 -08:00
Josh Rosen ae2ed2947d Allow PySpark's SparkFiles to be used from driver
Fix minor documentation formatting issues.
2013-01-23 10:58:50 -08:00
Tathagata Das 79d55700ce One more fix. Made even default constructor of BlockManagerId private to prevent such problems in the future. 2013-01-23 01:57:09 -08:00
Charles Reiss 0b506dd2ec Add tests of various node failure scenarios. 2013-01-23 01:38:15 -08:00
Charles Reiss d209b6b764 Extra debugging from hostLost() 2013-01-23 01:35:14 -08:00
Charles Reiss 9a27062260 Force generation increment after shuffle map stage 2013-01-23 01:34:44 -08:00
Tathagata Das 155f31398d Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides. 2013-01-23 01:10:26 -08:00
Tathagata Das 5e11f1e51f Modified StorageLevel API to ensure zero duplicate objects. 2013-01-22 23:42:53 -08:00
Tathagata Das bacade6caf Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite. 2013-01-22 22:55:26 -08:00
Josh Rosen 43e9ff9596 Add test for driver hanging on exit (SPARK-530). 2013-01-22 22:47:26 -08:00
Charles Reiss 2849931000 Eliminate CacheTracker.
Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster
queries.

Adds CacheManager to locally coordinate computation of cached RDDs.
2013-01-22 22:19:30 -08:00
Matei Zaharia ebaa8f6519 Merge remote-tracking branch 'stephenh/cleanup'
Conflicts:
	core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-01-22 21:05:45 -08:00
Matei Zaharia d2d273868b Merge pull request #397 from JoshRosen/refactoring/daemon-threads
Refactor daemon thread creation
2013-01-22 21:02:53 -08:00
Stephen Haberman 98d0b7747d Fix Worker logInfo about unknown executor. 2013-01-22 18:11:51 -06:00
Stephen Haberman 8c51322cd0 Don't bother creating an exception. 2013-01-22 18:09:10 -06:00
Stephen Haberman fdec42385a Fix SPARK_MEM in ExecutorRunner. 2013-01-22 18:01:12 -06:00
Stephen Haberman 2437f6741b Restore SPARK_MEM in executorEnvs. 2013-01-22 18:01:03 -06:00
Matei Zaharia 151c47eef5 Merge pull request #399 from NFLabs/master
Fix for hanging spark.HttpFileServer on the kind of virtual network
2013-01-22 15:49:24 -08:00
Stephen Haberman 250fe89679 Handle Master telling the Worker to kill an already-dead executor. 2013-01-22 16:29:05 -06:00
Stephen Haberman 6f2194f757 Call removeJob instead of killing the cluster. 2013-01-22 15:38:58 -06:00
Stephen Haberman 27b3f3f0a9 Handle slaveLost before slaveIdToHost knows about it. 2013-01-22 15:30:42 -06:00
Imran Rashid 905c720e5e Merge branch 'master' into blockmanager_ui
Conflicts:
	core/src/main/scala/spark/RDD.scala
2013-01-22 12:02:27 -08:00
Imran Rashid 50e2b23927 Fix up some problems from the merge 2013-01-22 11:46:01 -08:00
Stephen Haberman 588b24197a Use default arguments instead of constructor overloads. 2013-01-22 10:19:30 -06:00
Leemoonsoo 7e9ee2e833 Fix for hanging spark.HttpFileServer with kind of virtual network 2013-01-22 23:08:34 +09:00
Charles Reiss e353886a8c Use generation numbers for fetch failure tracking 2013-01-22 00:23:31 -08:00
Josh Rosen 551a47a620 Refactor daemon thread pool creation. 2013-01-21 23:31:00 -08:00
Stephen Haberman a8baeb9327 Further simplify getOrElse call. 2013-01-21 21:30:24 -06:00
Stephen Haberman 2d8218b871 Remove unneeded/now-broken saveAsNewAPIHadoopFile overload. 2013-01-21 20:00:27 -06:00
Josh Rosen 7b9e96c992 Add synchronization to Executor.updateDependencies() (SPARK-662) 2013-01-21 17:34:23 -08:00
Josh Rosen ef711902c1 Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.

I also removed some unused code.
2013-01-21 17:34:17 -08:00
Stephen Haberman ffd1623595 Minor cleanup. 2013-01-21 15:55:46 -06:00
Matei Zaharia a88b44ed3b Only bind to IPv4 addresses when trying to auto-detect external IP 2013-01-21 11:59:21 -08:00
Matei Zaharia 4d34c7fc3e Fix compile error caused by cherry-pick 2013-01-21 11:33:48 -08:00
Imran Rashid a3f571b539 more File -> String changes 2013-01-21 11:21:52 -08:00
Imran Rashid fe26acc482 remove unused imports 2013-01-21 11:21:46 -08:00
Imran Rashid c73107500e send sparkHome as String instead of File over network 2013-01-21 11:21:39 -08:00
Imran Rashid 5bf73df7f0 oops, fix stupid compile error 2013-01-21 11:21:33 -08:00
Imran Rashid aae5a920a4 get sparkHome the correct way 2013-01-21 11:21:28 -08:00
Imran Rashid f116d6b5c6 executor can use a different sparkHome from Worker 2013-01-21 11:21:22 -08:00
Stephen Haberman 6ded481999 Merge branch 'master' into hadoopconf
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
2013-01-21 12:56:48 -06:00
Stephen Haberman 69a417858b Also use hadoopConfiguration in newAPI methods. 2013-01-21 12:42:11 -06:00
Matei Zaharia c0b9ceb8c3 Log remote lifecycle events in Akka for easier debugging 2013-01-21 00:23:53 -08:00
Matei Zaharia c7b5e5f1ec Merge pull request #389 from JoshRosen/python_rdd_checkpointing
Add checkpointing to the Python API
2013-01-20 17:10:44 -08:00
Josh Rosen 9f211dd3f0 Fix PythonPartitioner equality; see SPARK-654.
PythonPartitioner did not take the Python-side partitioning function
into account when checking for equality, which might cause problems
in the future.
2013-01-20 15:41:42 -08:00
Josh Rosen 5b6ea9e9a0 Update checkpointing API docs in Python/Java. 2013-01-20 15:31:41 -08:00
Josh Rosen 7ed1bf4b48 Add RDD checkpointing to Python API. 2013-01-20 13:19:19 -08:00
Matei Zaharia 86057ec7c8 Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-01-20 12:47:55 -08:00
Matei Zaharia 8e7f098a2c Added accumulators to PySpark 2013-01-20 01:57:44 -08:00
Tathagata Das 4f8fe58b25 Merge branch 'mesos-streaming' into streaming
Conflicts:
	core/src/main/scala/spark/api/java/JavaRDDLike.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
	core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das 214345ceac Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29, along with updates to doc comments in SparkContext.checkpoint(). 2013-01-19 23:50:17 -08:00
Imran Rashid d98caa0fa0 Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui
Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
2013-01-18 18:11:26 -08:00
Patrick Wendell ee0314c3b3 Merge branch 'streaming' into streaming-java-api 2013-01-17 18:43:00 -08:00
Patrick Wendell d5570c7968 Adding checkpointing to Java API 2013-01-17 18:41:58 -08:00
Matei Zaharia 54c0f9f185 Fix code that assumed spark.local.dir is only a single directory 2013-01-17 17:40:55 -08:00
Fernand Pajot 742bc841ad changed HttpBroadcast server cache to be in spark.local.dir instead of java.io.tmpdir 2013-01-17 16:56:11 -08:00
Matei Zaharia aff1844155 Merge pull request #381 from squito/remove_threadpool
remove unused thread pool
2013-01-16 16:46:42 -08:00
Tathagata Das f466ee44bc Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/MapOutputTracker.scala
2013-01-16 12:57:11 -08:00
Imran Rashid eae698f755 remove unused thread pool 2013-01-16 12:21:37 -08:00
Tathagata Das a805ac4a7c Disabled checkpoint for PairwiseRDD (pySpark). 2013-01-16 10:55:26 -08:00
Matei Zaharia 4beb084f64 Merge pull request #374 from woggling/null-mapout
Generate FetchFailedException even for cached missing map outputs
2013-01-15 14:22:29 -08:00
Tathagata Das cd1521cfdb Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	docs/_layouts/global.html
	docs/index.md
	run
2013-01-15 12:08:51 -08:00
Charles Reiss 4078623b9f Remove broken attempt to test fetching case. 2013-01-15 12:05:54 -08:00
Stephen Haberman 74d3b23929 Add spark.executor.memory to differentiate executor memory from spark-shell memory. 2013-01-15 14:03:28 -06:00
Stephen Haberman d228bff440 Add a test. 2013-01-15 11:48:50 -06:00
Stephen Haberman dd583b7ebf Call executeOnCompleteCallbacks in a finally block. 2013-01-15 10:52:06 -06:00
Tathagata Das eded21925a Merge pull request #375 from tdas/streaming
Important bug fixes
2013-01-14 23:06:40 -08:00
Charles Reiss b038999797 Fix accidental spark.master.host reuse 2013-01-14 17:04:44 -08:00
Charles Reiss 7ba34bc007 Additional tests for MapOutputTracker. 2013-01-14 15:27:02 -08:00
Charles Reiss 273fb5cc10 Throw FetchFailedException for cached missing locs 2013-01-14 15:26:48 -08:00
Tathagata Das 131be5d62e Fixed bug in RDD checkpointing. 2013-01-14 03:28:25 -08:00
Tathagata Das 82b0cc90ca Merge pull request #370 from tdas/streaming
Added more documentation and minor change in API for NetworkReceiver
2013-01-13 21:28:12 -08:00
Tathagata Das 0dbd411a56 Added documentation for PairDStreamFunctions. 2013-01-13 21:08:35 -08:00
Matei Zaharia cb867e9ffb Merge branch 'master' of github.com:mesos/spark 2013-01-13 19:34:32 -08:00
Matei Zaharia 72408e8dfa Make filter preserve partitioner info, since it can 2013-01-13 19:34:07 -08:00
Matei Zaharia 9a34409810 Merge pull request #360 from rxin/cogroup-java
Changed CoGroupRDD's hash map from Scala to Java.
2013-01-13 15:31:08 -08:00
Reynold Xin be7166146b Removed the use of getOrElse to avoid Scala wrapper for every call. 2013-01-13 15:27:28 -08:00
Ryan LeCompte c31931af7e switch to uppercase constants 2013-01-13 10:39:47 -08:00
Ryan LeCompte 2305a2c1d9 more code cleanup 2013-01-13 10:01:56 -08:00
Mikhail Bautin 88d8f11365 Add missing dependency spray-json to Maven build 2013-01-13 00:46:25 -08:00
Matei Zaharia fbb3fc4143 Merge pull request #346 from JoshRosen/python-api
Python API (PySpark)
2013-01-12 23:49:36 -08:00
Matei Zaharia 01413ca0e7 Merge pull request #364 from tysonjh/master
Executor and JobDescription JSON support added
2013-01-12 16:17:07 -08:00
Matei Zaharia 995075bf79 Merge pull request #355 from shivaram/default-hadoop-pom
Activate hadoop1 profile by default for maven builds
2013-01-12 15:38:36 -08:00
Shivaram Venkataraman bbc56d85ed Rename environment variable for hadoop profiles to hadoopVersion 2013-01-12 15:24:13 -08:00
Ryan LeCompte addff2c466 add comment 2013-01-12 09:57:29 -08:00
Ryan LeCompte ea20ae6618 add one extra test 2013-01-12 09:18:00 -08:00
Ryan LeCompte 2c77eeebb6 correct test params 2013-01-12 00:13:45 -08:00
Ryan LeCompte 0cfea7a2ec add unit test 2013-01-11 23:48:07 -08:00
Ryan LeCompte ff10b3aa09 add missing return 2013-01-11 21:03:57 -08:00
Ryan LeCompte 22445fbea9 attempt to sleep for more accurate time period, minor cleanup 2013-01-11 13:30:49 -08:00
Tyson 1731f1fed4 Added an optional format parameter for individual job queries and optimized the jobId query 2013-01-11 15:01:43 -05:00
Tyson c063e8777e Added implicit json writers for JobDescription and ExecutorRunner 2013-01-11 14:57:38 -05:00
Stephen Haberman 5c7a127219 Pass a new Configuration that wraps the default hadoopConfiguration. 2013-01-11 11:25:11 -06:00
Stephen Haberman 3e6519a36e Use hadoopConfiguration for default JobConf in PairRDDFunctions. 2013-01-11 11:24:20 -06:00
Shivaram Venkataraman 9262522306 Activate hadoop2 profile in pom.xml with -Dhadoop=2 2013-01-10 22:07:34 -08:00
Matei Zaharia 2e914d9983 Formatting 2013-01-10 19:13:08 -08:00
Matei Zaharia 3548c9c0c8 Merge branch 'master' of github.com:mesos/spark 2013-01-10 19:06:40 -08:00
Matei Zaharia 6d1c230281 Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Matei Zaharia 248995c535 Merge pull request #356 from shane-huang/master
Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections
2013-01-10 17:52:23 -08:00
Reynold Xin bd336f5f40 Changed CoGroupRDD's hash map from Scala to Java. 2013-01-10 17:13:04 -08:00
Stephen Haberman d1864052c5 Fix invalid asInstanceOf cast. 2013-01-10 12:16:26 -06:00
Stephen Haberman b15e851279 Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables.
For custom properties, use "spark.hadoop.*" as a prefix instead of just "hadoop.*".
2013-01-10 10:55:41 -06:00
shane-huang 9930a95d21 Modified Patch according to comments 2013-01-10 20:09:55 +08:00
Stephen Haberman e3861ae395 Provide and expose a default Hadoop Configuration.
Any "hadoop.*" system properties will be passed along into configuration.
2013-01-09 17:08:14 -06:00
Tyson 549ee388a1 Removed io.spray spray-json dependency as it is not needed. 2013-01-09 15:12:23 -05:00
Tyson bf9d9946f9 Query parameter reformatted to be more extensible and routing more robust 2013-01-09 11:29:58 -05:00
Tyson 0da2ff102e Added url query parameter json and handler 2013-01-09 10:40:48 -05:00
Tyson 269fe018c7 JSON object definitions 2013-01-09 10:40:43 -05:00
Matei Zaharia 9cc764f523 Code style 2013-01-08 22:29:57 -08:00
Matei Zaharia 14972141f9 Merge pull request #344 from mbautin/log_preferred_hosts
Log preferred hosts
2013-01-08 22:26:34 -08:00
Josh Rosen b57dd0f160 Add mapPartitionsWithSplit() to PySpark. 2013-01-08 16:05:02 -08:00
Stephen Haberman 8ac0f35be4 Add JavaRDDLike.keyBy. 2013-01-08 09:57:45 -06:00
Stephen Haberman 4ee6b22775 Merge branch 'master' into tupleBy
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-01-08 09:10:10 -06:00
shane-huang e4cb72da8a Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections. 2013-01-08 22:40:58 +08:00
Shivaram Venkataraman f7adb382ac Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now
by using -Dhadoop -Phadoop2.
2013-01-08 03:19:43 -08:00
Mikhail Bautin 4725b0f643 Fixing if/else coding style for preferred hosts logging 2013-01-07 20:09:26 -08:00