882 commits

Author SHA1 Message Date
Josh Rosen 551a47a620 Refactor daemon thread pool creation. 2013-01-21 23:31:00 -08:00
Stephen Haberman a8baeb9327 Further simplify getOrElse call. 2013-01-21 21:30:24 -06:00
Stephen Haberman 2d8218b871 Remove unneeded/now-broken saveAsNewAPIHadoopFile overload. 2013-01-21 20:00:27 -06:00
Josh Rosen 7b9e96c992 Add synchronization to Executor.updateDependencies() (SPARK-662) 2013-01-21 17:34:23 -08:00
Josh Rosen ef711902c1 Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.

I also removed some unused code.
2013-01-21 17:34:17 -08:00
Stephen Haberman ffd1623595 Minor cleanup. 2013-01-21 15:55:46 -06:00
Matei Zaharia a88b44ed3b Only bind to IPv4 addresses when trying to auto-detect external IP 2013-01-21 11:59:21 -08:00
Matei Zaharia 4d34c7fc3e Fix compile error caused by cherry-pick 2013-01-21 11:33:48 -08:00
Imran Rashid a3f571b539 more File -> String changes 2013-01-21 11:21:52 -08:00
Imran Rashid fe26acc482 remove unused imports 2013-01-21 11:21:46 -08:00
Imran Rashid c73107500e send sparkHome as String instead of File over network 2013-01-21 11:21:39 -08:00
Imran Rashid 5bf73df7f0 oops, fix stupid compile error 2013-01-21 11:21:33 -08:00
Imran Rashid aae5a920a4 get sparkHome the correct way 2013-01-21 11:21:28 -08:00
Imran Rashid f116d6b5c6 executor can use a different sparkHome from Worker 2013-01-21 11:21:22 -08:00
Stephen Haberman 6ded481999 Merge branch 'master' into hadoopconf
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
2013-01-21 12:56:48 -06:00
Stephen Haberman 69a417858b Also use hadoopConfiguration in newAPI methods. 2013-01-21 12:42:11 -06:00
Matei Zaharia c0b9ceb8c3 Log remote lifecycle events in Akka for easier debugging 2013-01-21 00:23:53 -08:00
Matei Zaharia c7b5e5f1ec Merge pull request #389 from JoshRosen/python_rdd_checkpointing
Add checkpointing to the Python API
2013-01-20 17:10:44 -08:00
Josh Rosen 9f211dd3f0 Fix PythonPartitioner equality; see SPARK-654.
PythonPartitioner did not take the Python-side partitioning function
into account when checking for equality, which might cause problems
in the future.
2013-01-20 15:41:42 -08:00
Josh Rosen 5b6ea9e9a0 Update checkpointing API docs in Python/Java. 2013-01-20 15:31:41 -08:00
Josh Rosen 7ed1bf4b48 Add RDD checkpointing to Python API. 2013-01-20 13:19:19 -08:00
Matei Zaharia 86057ec7c8 Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-01-20 12:47:55 -08:00
Matei Zaharia 8e7f098a2c Added accumulators to PySpark 2013-01-20 01:57:44 -08:00
Tathagata Das 4f8fe58b25 Merge branch 'mesos-streaming' into streaming
Conflicts:
	core/src/main/scala/spark/api/java/JavaRDDLike.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
	core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das 214345ceac Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29, along with updates to doc comments in SparkContext.checkpoint(). 2013-01-19 23:50:17 -08:00
Imran Rashid d98caa0fa0 Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui
Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
2013-01-18 18:11:26 -08:00
Patrick Wendell ee0314c3b3 Merge branch 'streaming' into streaming-java-api 2013-01-17 18:43:00 -08:00
Patrick Wendell d5570c7968 Adding checkpointing to Java API 2013-01-17 18:41:58 -08:00
Matei Zaharia 54c0f9f185 Fix code that assumed spark.local.dir is only a single directory 2013-01-17 17:40:55 -08:00
Fernand Pajot 742bc841ad changed HttpBroadcast server cache to be in spark.local.dir instead of java.io.tmpdir 2013-01-17 16:56:11 -08:00
Matei Zaharia aff1844155 Merge pull request #381 from squito/remove_threadpool
remove unused thread pool
2013-01-16 16:46:42 -08:00
Tathagata Das f466ee44bc Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/MapOutputTracker.scala
2013-01-16 12:57:11 -08:00
Imran Rashid eae698f755 remove unused thread pool 2013-01-16 12:21:37 -08:00
Tathagata Das a805ac4a7c Disabled checkpoint for PairwiseRDD (pySpark). 2013-01-16 10:55:26 -08:00
Matei Zaharia 4beb084f64 Merge pull request #374 from woggling/null-mapout
Generate FetchFailedException even for cached missing map outputs
2013-01-15 14:22:29 -08:00
Tathagata Das cd1521cfdb Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	docs/_layouts/global.html
	docs/index.md
	run
2013-01-15 12:08:51 -08:00
Stephen Haberman dd583b7ebf Call executeOnCompleteCallbacks in a finally block. 2013-01-15 10:52:06 -06:00
Tathagata Das eded21925a Merge pull request #375 from tdas/streaming
Important bug fixes
2013-01-14 23:06:40 -08:00
Charles Reiss 273fb5cc10 Throw FetchFailedException for cached missing locs 2013-01-14 15:26:48 -08:00
Tathagata Das 131be5d62e Fixed bug in RDD checkpointing. 2013-01-14 03:28:25 -08:00
Tathagata Das 82b0cc90ca Merge pull request #370 from tdas/streaming
Added more documentation and minor change in API for NetworkReceiver
2013-01-13 21:28:12 -08:00
Tathagata Das 0dbd411a56 Added documentation for PairDStreamFunctions. 2013-01-13 21:08:35 -08:00
Matei Zaharia cb867e9ffb Merge branch 'master' of github.com:mesos/spark 2013-01-13 19:34:32 -08:00
Matei Zaharia 72408e8dfa Make filter preserve partitioner info, since it can 2013-01-13 19:34:07 -08:00
Matei Zaharia 9a34409810 Merge pull request #360 from rxin/cogroup-java
Changed CoGroupRDD's hash map from Scala to Java.
2013-01-13 15:31:08 -08:00
Reynold Xin be7166146b Removed the use of getOrElse to avoid Scala wrapper for every call. 2013-01-13 15:27:28 -08:00
Ryan LeCompte c31931af7e switch to uppercase constants 2013-01-13 10:39:47 -08:00
Ryan LeCompte 2305a2c1d9 more code cleanup 2013-01-13 10:01:56 -08:00
Matei Zaharia fbb3fc4143 Merge pull request #346 from JoshRosen/python-api
Python API (PySpark)
2013-01-12 23:49:36 -08:00
Ryan LeCompte addff2c466 add comment 2013-01-12 09:57:29 -08:00
Ryan LeCompte 0cfea7a2ec add unit test 2013-01-11 23:48:07 -08:00
Ryan LeCompte ff10b3aa09 add missing return 2013-01-11 21:03:57 -08:00
Ryan LeCompte 22445fbea9 attempt to sleep for more accurate time period, minor cleanup 2013-01-11 13:30:49 -08:00
Tyson 1731f1fed4 Added an optional format parameter for individual job queries and optimized the jobId query 2013-01-11 15:01:43 -05:00
Tyson c063e8777e Added implicit json writers for JobDescription and ExecutorRunner 2013-01-11 14:57:38 -05:00
Stephen Haberman 5c7a127219 Pass a new Configuration that wraps the default hadoopConfiguration. 2013-01-11 11:25:11 -06:00
Stephen Haberman 3e6519a36e Use hadoopConfiguration for default JobConf in PairRDDFunctions. 2013-01-11 11:24:20 -06:00
Matei Zaharia 2e914d9983 Formatting 2013-01-10 19:13:08 -08:00
Matei Zaharia 3548c9c0c8 Merge branch 'master' of github.com:mesos/spark 2013-01-10 19:06:40 -08:00
Matei Zaharia 6d1c230281 Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Matei Zaharia 248995c535 Merge pull request #356 from shane-huang/master
Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections
2013-01-10 17:52:23 -08:00
Reynold Xin bd336f5f40 Changed CoGroupRDD's hash map from Scala to Java. 2013-01-10 17:13:04 -08:00
Stephen Haberman d1864052c5 Fix invalid asInstanceOf cast. 2013-01-10 12:16:26 -06:00
Stephen Haberman b15e851279 Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables.
For custom properties, use "spark.hadoop.*" as a prefix instead of just "hadoop.*".
2013-01-10 10:55:41 -06:00
shane-huang 9930a95d21 Modified Patch according to comments 2013-01-10 20:09:55 +08:00
Stephen Haberman e3861ae395 Provide and expose a default Hadoop Configuration.
Any "hadoop.*" system properties will be passed along into configuration.
2013-01-09 17:08:14 -06:00
Tyson 549ee388a1 Removed io.spray spray-json dependency as it is not needed. 2013-01-09 15:12:23 -05:00
Tyson bf9d9946f9 Query parameter reformatted to be more extensible and routing more robust 2013-01-09 11:29:58 -05:00
Tyson 0da2ff102e Added url query parameter json and handler 2013-01-09 10:40:48 -05:00
Tyson 269fe018c7 JSON object definitions 2013-01-09 10:40:43 -05:00
Matei Zaharia 9cc764f523 Code style 2013-01-08 22:29:57 -08:00
Matei Zaharia 14972141f9 Merge pull request #344 from mbautin/log_preferred_hosts
Log preferred hosts
2013-01-08 22:26:34 -08:00
Josh Rosen b57dd0f160 Add mapPartitionsWithSplit() to PySpark. 2013-01-08 16:05:02 -08:00
Stephen Haberman 8ac0f35be4 Add JavaRDDLike.keyBy. 2013-01-08 09:57:45 -06:00
Stephen Haberman 4ee6b22775 Merge branch 'master' into tupleBy
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-01-08 09:10:10 -06:00
shane-huang e4cb72da8a Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections. 2013-01-08 22:40:58 +08:00
Mikhail Bautin 4725b0f643 Fixing if/else coding style for preferred hosts logging 2013-01-07 20:09:26 -08:00
Mikhail Bautin c41042c816 Log preferred hosts 2013-01-07 20:06:09 -08:00
Matei Zaharia f7cf035b9b Merge pull request #350 from tdas/streaming
Spark Streaming
2013-01-07 17:40:11 -08:00
Shivaram Venkataraman 77d751731c Remove unused BoundedMemoryCache file and associated test case. 2013-01-07 15:57:46 -08:00
Shivaram Venkataraman aed368a970 Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also
fix SequenceFileRDDFunctions to pick the right type conversion across Hadoop
versions
2013-01-07 15:57:33 -08:00
Shivaram Venkataraman f8d579a0c0 Remove dependencies on sun jvm classes. Instead use reflection to infer
HotSpot options and total physical memory size
2013-01-07 15:57:18 -08:00
Tathagata Das 3b0a3b89ac Added better docs for RDDCheckpointData 2013-01-07 14:55:49 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Matei Zaharia 1941d9602d Merge branch 'master' of github.com:mesos/spark 2013-01-07 16:50:39 -05:00
Matei Zaharia 9c32f300fb Add Accumulable.setValue for easier use in Java 2013-01-07 16:50:23 -05:00
Tathagata Das 1346126485 Changed cleanup to clearOldValues for TimeStampedHashMap and TimeStampedHashSet. 2013-01-07 12:11:27 -08:00
Stephen Haberman 8dc06069fe Rename RDD.tupleBy to keyBy. 2013-01-06 15:21:45 -06:00
Matei Zaharia 8fd3a70c18 Add PairRDD.keys() and values() to Java API 2013-01-05 22:46:45 -05:00
Matei Zaharia b1663752c6 Merge pull request #351 from stephenh/values
Add PairRDDFunctions.keys and values.
2013-01-05 19:15:54 -08:00
Matei Zaharia 0982572519 Add methods called just 'accumulator' for int/double in Java API 2013-01-05 22:11:28 -05:00
Matei Zaharia 86af64b0a6 Fix Accumulators in Java, and add a test for them 2013-01-05 20:55:17 -05:00
Matei Zaharia ecf9c08901 Fix Accumulators in Java, and add a test for them 2013-01-05 20:54:08 -05:00
Stephen Haberman 1fdb6946b5 Add RDD.tupleBy. 2013-01-05 13:07:59 -06:00
Stephen Haberman f4e6b9361f Add RDD.collect(PartialFunction). 2013-01-05 12:14:08 -06:00
Stephen Haberman 8d57c78c83 Add PairRDDFunctions.keys and values. 2013-01-05 12:04:01 -06:00
Josh Rosen 33beba3965 Change PySpark RDD.take() to not call iterator(). 2013-01-03 14:52:21 -08:00
Tathagata Das d34dba25c2 Merge branch 'mesos' into dev-merge 2013-01-01 15:48:39 -08:00
Josh Rosen b58340dbd9 Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
Josh Rosen 170e451fbd Minor documentation and style fixes for PySpark. 2013-01-01 13:52:14 -08:00