Commit graph

1090 commits

Author SHA1 Message Date
Tyson c063e8777e Added implicit json writers for JobDescription and ExecutorRunner 2013-01-11 14:57:38 -05:00
Stephen Haberman 5c7a127219 Pass a new Configuration that wraps the default hadoopConfiguration. 2013-01-11 11:25:11 -06:00
Stephen Haberman 3e6519a36e Use hadoopConfiguration for default JobConf in PairRDDFunctions. 2013-01-11 11:24:20 -06:00
Shivaram Venkataraman 9262522306 Activate hadoop2 profile in pom.xml with -Dhadoop=2 2013-01-10 22:07:34 -08:00
Matei Zaharia 2e914d9983 Formatting 2013-01-10 19:13:08 -08:00
Matei Zaharia 3548c9c0c8 Merge branch 'master' of github.com:mesos/spark 2013-01-10 19:06:40 -08:00
Matei Zaharia 6d1c230281 Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Matei Zaharia 248995c535 Merge pull request #356 from shane-huang/master
Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections
2013-01-10 17:52:23 -08:00
Reynold Xin bd336f5f40 Changed CoGroupRDD's hash map from Scala to Java. 2013-01-10 17:13:04 -08:00
Stephen Haberman d1864052c5 Fix invalid asInstanceOf cast. 2013-01-10 12:16:26 -06:00
Stephen Haberman b15e851279 Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables.
For custom properties, use "spark.hadoop.*" as a prefix instead of just "hadoop.*".
2013-01-10 10:55:41 -06:00
shane-huang 9930a95d21 Modified Patch according to comments 2013-01-10 20:09:55 +08:00
Stephen Haberman e3861ae395 Provide and expose a default Hadoop Configuration.
Any "hadoop.*" system properties will be passed along into configuration.
2013-01-09 17:08:14 -06:00
Tyson 549ee388a1 Removed io.spray spray-json dependency as it is not needed. 2013-01-09 15:12:23 -05:00
Tyson bf9d9946f9 Query parameter reformatted to be more extensible and routing more robust 2013-01-09 11:29:58 -05:00
Tyson 0da2ff102e Added url query parameter json and handler 2013-01-09 10:40:48 -05:00
Tyson 269fe018c7 JSON object definitions 2013-01-09 10:40:43 -05:00
Matei Zaharia 9cc764f523 Code style 2013-01-08 22:29:57 -08:00
Matei Zaharia 14972141f9 Merge pull request #344 from mbautin/log_preferred_hosts
Log preferred hosts
2013-01-08 22:26:34 -08:00
Josh Rosen b57dd0f160 Add mapPartitionsWithSplit() to PySpark. 2013-01-08 16:05:02 -08:00
Stephen Haberman 8ac0f35be4 Add JavaRDDLike.keyBy. 2013-01-08 09:57:45 -06:00
Stephen Haberman 4ee6b22775 Merge branch 'master' into tupleBy
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-01-08 09:10:10 -06:00
shane-huang e4cb72da8a Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections. 2013-01-08 22:40:58 +08:00
Shivaram Venkataraman f7adb382ac Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now
by using -Dhadoop -Phadoop2.
2013-01-08 03:19:43 -08:00
Mikhail Bautin 4725b0f643 Fixing if/else coding style for preferred hosts logging 2013-01-07 20:09:26 -08:00
Mikhail Bautin c41042c816 Log preferred hosts 2013-01-07 20:06:09 -08:00
Shivaram Venkataraman 4bbe07e5ec Activate hadoop1 profile by default for maven builds 2013-01-07 17:46:22 -08:00
Matei Zaharia f7cf035b9b Merge pull request #350 from tdas/streaming
Spark Streaming
2013-01-07 17:40:11 -08:00
Shivaram Venkataraman b1336e2fe4 Update expected size of strings to match our dummy string class 2013-01-07 17:00:32 -08:00
Tathagata Das 4719e6d8fe Changed locations for unit test logs. 2013-01-07 16:06:07 -08:00
Shivaram Venkataraman 55c66d365f Use a dummy string class in Size Estimator tests to make it resistant to jdk
versions
2013-01-07 15:58:00 -08:00
Shivaram Venkataraman 77d751731c Remove unused BoundedMemoryCache file and associated test case. 2013-01-07 15:57:46 -08:00
Shivaram Venkataraman aed368a970 Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also
fix SequenceFileRDDFunctions to pick the right type conversion across Hadoop
versions
2013-01-07 15:57:33 -08:00
Shivaram Venkataraman f8d579a0c0 Remove dependencies on sun jvm classes. Instead use reflection to infer
HotSpot options and total physical memory size
2013-01-07 15:57:18 -08:00
Tathagata Das 3b0a3b89ac Added better docs for RDDCheckpointData 2013-01-07 14:55:49 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Matei Zaharia 1941d9602d Merge branch 'master' of github.com:mesos/spark 2013-01-07 16:50:39 -05:00
Matei Zaharia 9c32f300fb Add Accumulable.setValue for easier use in Java 2013-01-07 16:50:23 -05:00
Tathagata Das 1346126485 Changed cleanup to clearOldValues for TimeStampedHashMap and TimeStampedHashSet. 2013-01-07 12:11:27 -08:00
Stephen Haberman 8dc06069fe Rename RDD.tupleBy to keyBy. 2013-01-06 15:21:45 -06:00
Matei Zaharia 8fd3a70c18 Add PairRDD.keys() and values() to Java API 2013-01-05 22:46:45 -05:00
Matei Zaharia b1663752c6 Merge pull request #351 from stephenh/values
Add PairRDDFunctions.keys and values.
2013-01-05 19:15:54 -08:00
Matei Zaharia 0982572519 Add methods called just 'accumulator' for int/double in Java API 2013-01-05 22:11:28 -05:00
Matei Zaharia 86af64b0a6 Fix Accumulators in Java, and add a test for them 2013-01-05 20:55:17 -05:00
Matei Zaharia ecf9c08901 Fix Accumulators in Java, and add a test for them 2013-01-05 20:54:08 -05:00
Stephen Haberman 1fdb6946b5 Add RDD.tupleBy. 2013-01-05 13:07:59 -06:00
Stephen Haberman 6a0db3b449 Fix typo. 2013-01-05 12:56:17 -06:00
Stephen Haberman f4e6b9361f Add RDD.collect(PartialFunction). 2013-01-05 12:14:08 -06:00
Stephen Haberman 8d57c78c83 Add PairRDDFunctions.keys and values. 2013-01-05 12:04:01 -06:00
Josh Rosen 33beba3965 Change PySpark RDD.take() to not call iterator(). 2013-01-03 14:52:21 -08:00
Tathagata Das 3dc87dd923 Fixed compilation bug in RDDSuite created during merge for mesos/master. 2013-01-01 16:38:04 -08:00
Tathagata Das d34dba25c2 Merge branch 'mesos' into dev-merge 2013-01-01 15:48:39 -08:00
Josh Rosen b58340dbd9 Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
Josh Rosen 170e451fbd Minor documentation and style fixes for PySpark. 2013-01-01 13:52:14 -08:00
Matei Zaharia 55809fbc6d Merge pull request #349 from woggling/cache-finally
Avoid stalls when computation of cached RDD throws exception
2013-01-01 08:21:33 -08:00
Charles Reiss 58072a7340 Remove some dead comments 2013-01-01 08:07:44 -08:00
Charles Reiss 21636ee4fa Test with exception while computing cached RDD. 2013-01-01 08:07:40 -08:00
Charles Reiss feadaf72f4 Mark key as not loading in CacheTracker even when compute() fails 2013-01-01 07:57:20 -08:00
Josh Rosen f803953998 Raise exception when hashing Java arrays (SPARK-597) 2012-12-31 20:20:11 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
Josh Rosen 59195c68ec Update PySpark for compatibility with TaskContext. 2012-12-29 16:01:03 -08:00
Josh Rosen c5cee53f20 Merge remote-tracking branch 'origin/master' into python-api
Conflicts:
	docs/quick-start.md
2012-12-29 16:00:51 -08:00
Josh Rosen 7ec3595de2 Fix bug (introduced by batching) in PySpark take() 2012-12-28 22:21:16 -08:00
Josh Rosen 397e67103c Change Utils.fetchFile() warning to SparkException. 2012-12-28 17:37:13 -08:00
Josh Rosen d64fa72d2e Add addFile() and addJar() to JavaSparkContext. 2012-12-28 17:00:57 -08:00
Josh Rosen bd237d4a9d Add synchronization to LocalScheduler.updateDependencies(). 2012-12-28 17:00:57 -08:00
Josh Rosen f1bf4f0385 Skip deletion of files in clearFiles().
This fixes an issue where Spark could delete
original files in the current working directory
that were added to the job using addFile().

There was also the potential for addFile() to
overwrite local files, which is addressed by
changing Utils.fetchFile() to log a warning
instead of overwriting a file with new contents.

This is a short-term fix; a better long-term
solution would be to remove the dependence on
storing files in the current working directory,
since we can't change the cwd from Java.
2012-12-28 17:00:57 -08:00
Josh Rosen fbadb1cda5 Mark api.python classes as private; echo Java output to stderr. 2012-12-28 09:06:11 -08:00
Tathagata Das 0bc0a60d30 Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests. 2012-12-27 15:37:33 -08:00
Tathagata Das 7c33f76291 Merge branch 'mesos' into dev-merge 2012-12-26 19:19:07 -08:00
Tathagata Das 836042bb9f Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/CartesianRDD.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/CoalescedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	core/src/main/scala/spark/rdd/FlatMappedRDD.scala
	core/src/main/scala/spark/rdd/GlommedRDD.scala
	core/src/main/scala/spark/rdd/HadoopRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
	core/src/main/scala/spark/rdd/MappedRDD.scala
	core/src/main/scala/spark/rdd/PipedRDD.scala
	core/src/main/scala/spark/rdd/SampledRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
	core/src/main/scala/spark/rdd/UnionRDD.scala
	core/src/main/scala/spark/scheduler/ResultTask.scala
	core/src/test/scala/spark/CheckpointSuite.scala
2012-12-26 19:09:01 -08:00
Josh Rosen 1dca0c5180 Remove debug output from PythonPartitioner. 2012-12-26 18:23:06 -08:00
Josh Rosen 4608902fb8 Use filesystem to collect RDDs in PySpark.
Passing large volumes of data through Py4J seems
to be slow.  It appears to be faster to write the
data to the local filesystem and read it back from
Python.
2012-12-24 17:20:10 -08:00
Mark Hamstra 903f3518df fall back to filter-map-collect when calling lookup() on an RDD without a partitioner 2012-12-24 13:18:45 -08:00
Mark Hamstra 61be8566e2 Allow distinct() to be called without parentheses when using the default number of splits. 2012-12-24 02:36:47 -08:00
Reynold Xin 60f7338092 Remove the call to close input stream in Kryo serializer. 2012-12-21 15:49:33 -08:00
Matei Zaharia 3334b7c6b5 Merge pull request #341 from rxin/4a3fb06ac2d11125feb08acbbd4df76d1e91b677
Kryo2 update against Spark master
2012-12-21 15:31:23 -08:00
Reynold Xin eac566a7f4 Merge branch 'master' of github.com:mesos/spark into dev
Conflicts:
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/PairRDDFunctions.scala
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/CartesianRDD.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/CoalescedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	core/src/main/scala/spark/rdd/FlatMappedRDD.scala
	core/src/main/scala/spark/rdd/GlommedRDD.scala
	core/src/main/scala/spark/rdd/HadoopRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
	core/src/main/scala/spark/rdd/MappedRDD.scala
	core/src/main/scala/spark/rdd/PipedRDD.scala
	core/src/main/scala/spark/rdd/SampledRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
	core/src/main/scala/spark/rdd/UnionRDD.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockManagerId.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/MetadataCleaner.scala
	core/src/main/scala/spark/util/TimeStampedHashMap.scala
	core/src/test/scala/spark/storage/BlockManagerSuite.scala
	run
2012-12-20 14:53:40 -08:00
Tathagata Das 8512dd3225 Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/test/scala/spark/CheckpointSuite.scala
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das fe777eb77d Fixed bugs in CheckpointRDD and spark.CheckpointSuite. 2012-12-20 13:39:27 -08:00
Tathagata Das f9c5b0a6fe Changed checkpoint writing and reading process. 2012-12-20 11:52:23 -08:00
Matei Zaharia 5e51b889fe Merge pull request #327 from rxin/spark-633
Added the ability in block manager to remove blocks.
2012-12-20 11:33:38 -08:00
Reynold Xin 9397c5014e Let the slave notify the master block removal. 2012-12-20 01:37:09 -08:00
Reynold Xin 68c52d80ec Moved BlockManager's IdGenerator into BlockManager object. Removed some
excessive debug messages.
2012-12-19 15:27:23 -08:00
Tathagata Das 5184141936 Introduced getSpits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData. 2012-12-18 13:30:53 -08:00
Patrick Wendell bfac06e1f6 SPARK-616: Logging dead workers in Web UI.
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers which die and
re-register using different actor ID's.
2012-12-17 23:09:05 -08:00
Tathagata Das 72eed2b95e Converted CheckpointState in RDDCheckpointData to use scala Enumeration. 2012-12-17 18:52:43 -08:00
Matei Zaharia b82a6dd2c7 Merge pull request #332 from JoshRosen/spark-607
Add try-finally to handle MapOutputTracker timeouts
2012-12-14 11:41:16 -08:00
Reynold Xin 06f855c24d Merge branch 'spark-633' of github.com:rxin/spark into spark-633 2012-12-14 00:27:24 -08:00
Reynold Xin 8c01295b85 Fixed conflicts from merging Charles' and TD's block manager changes. 2012-12-14 00:26:36 -08:00
Charles Reiss c528932a41 Code review cleanup. 2012-12-13 22:37:16 -08:00
Charles Reiss 0aad42b5e7 Have standalone cluster report exit codes to clients. Addresses SPARK-639. 2012-12-13 22:37:16 -08:00
Reynold Xin 0235667f73 Merge branch 'master' of github.com:mesos/spark into spark-633 2012-12-13 22:33:41 -08:00
Reynold Xin 97434f49b8 Merged TD's block manager refactoring. 2012-12-13 22:32:19 -08:00
Reynold Xin f4a9e1b9be Fixed the broken Java unit test from SPARK-635. 2012-12-13 22:22:12 -08:00
Reynold Xin 41e58a519a Merge branch 'master' of github.com:mesos/spark into spark-633 2012-12-13 22:06:47 -08:00
Josh Rosen cf52d9cade Add try-finally to handle MapOutputTracker timeouts. 2012-12-13 21:53:30 -08:00
Matei Zaharia 05e225f988 Merge pull request #329 from woggling/executor-status-codes
Executor exit status codes
2012-12-13 20:14:10 -08:00
Charles Reiss b054d3b222 ExecutorLostReason -> ExecutorLossReason 2012-12-13 18:44:07 -08:00
Charles Reiss 24d7aa2d15 Extra whitespace in ExecutorExitCode 2012-12-13 18:39:23 -08:00
Reynold Xin dc7d7fc286 Merge branch 'master' of github.com:mesos/spark into spark-633 2012-12-13 16:48:34 -08:00
Reynold Xin 4f076e105e SPARK-635: Pass a TaskContext object to compute() interface and use
that to close Hadoop input stream. Incorporated Matei's command.
2012-12-13 16:41:15 -08:00
Charles Reiss 829206f1a7 Explain slaveLost calls made by StandaloneSchedulerBackend 2012-12-13 16:23:36 -08:00
Charles Reiss a4041dd87f Log duplicate slaveLost() calls in ClusterScheduler. 2012-12-13 16:23:36 -08:00
Charles Reiss fa9df4a45d Normalize executor exit statuses and report them to the user. 2012-12-13 16:23:31 -08:00
Reynold Xin eacb98e900 SPARK-635: Pass a TaskContext object to compute() interface and use that
to close Hadoop input stream.
2012-12-13 15:41:53 -08:00
Josh Rosen 7c9e3d1c21 Return success or failure in BlockStore.remove(). 2012-12-13 15:22:27 -08:00
Reynold Xin 1b7a0451ed Added the ability in block manager to remove blocks. 2012-12-13 00:04:42 -08:00
Charles Reiss 1d8e2e6cff Call slaveLost on executor death for standalone clusters. 2012-12-12 21:15:34 -08:00
Tathagata Das 8e74fac215 Made checkpoint data in RDDs optional to further reduce serialized size. 2012-12-11 15:36:12 -08:00
Tathagata Das fa28f25619 Fixed bug in UnionRDD and CoGroupedRDD 2012-12-11 13:59:43 -08:00
Tathagata Das 2a87d816a2 Added clear property to JavaAPISuite to remove port binding errors. 2012-12-11 01:44:43 -08:00
Tathagata Das 746afc2e65 Bunch of bug fixes related to checkpointing in RDDs. RDDCheckpointData object is used to lock all serialization and dependency changes for checkpointing. ResultTask converted to Externalizable and serialized RDD is cached like ShuffleMapTask. 2012-12-10 23:36:37 -08:00
Reynold Xin 21b271f5bd Suppress shuffle block updates when a slave node comes back. 2012-12-10 20:36:03 -08:00
Matei Zaharia a1a2daa7ef Merge pull request #317 from woggling/block-manager-heartbeat
Implement block manager heartbeat
2012-12-10 11:03:55 -08:00
Charles Reiss b6b62d774f Decrease BlockManagerMaster logging verbosity 2012-12-10 00:31:55 -08:00
Charles Reiss 5d3e917d09 Use Akka scheduler for BlockManager heart beats.
Adds required ActorSystem argument to BlockManager constructors.
2012-12-10 00:31:50 -08:00
Charles Reiss b53dd28c90 Changed default block manager heartbeat interval to 5 s 2012-12-09 23:03:34 -08:00
Matei Zaharia beb440089e Merge pull request #310 from tomdz/master-mavenized
Maven build setup
2012-12-09 21:40:05 -08:00
Tathagata Das 4be1cdc8b9 Merge pull request #5 from radlab/flume-integration
Flume integration
2012-12-09 20:31:42 -08:00
Tathagata Das e427216018 Removed unnecessary testcases. 2012-12-08 12:46:59 -08:00
Matei Zaharia e1d7cd2276 Search for a non-loopback address in Utils.getLocalIpAddress 2012-12-08 00:33:11 -08:00
Patrick Wendell 3e796bdd57 Changes in response to TD's review. 2012-12-07 19:34:05 -08:00
Patrick Wendell c36ca10241 Adding locality aware parallelize 2012-12-07 16:42:36 -08:00
Tathagata Das 1f3a75ae9e Modified checkpoint testsuite to more comprehensively test checkpointing of various RDDs. Fixed checkpoint bug (splits referring to parent RDDs or parent splits) in UnionRDD and CoalescedRDD. Fixed bug in testing ShuffledRDD. Removed unnecessary and useless map-side combining step for narrow dependencies in CoGroupedRDD. Removed unncessary WeakReference stuff from many other RDDs. 2012-12-07 13:45:52 -08:00
Charles Reiss 714c8d32d5 Don't divide by milliseconds by 1000 more. 2012-12-06 18:38:34 -08:00
Charles Reiss 8f0819520c map -> foreach 2012-12-06 18:29:50 -08:00
Charles Reiss 7a033fd795 Make LocalSparkCluster use distinct IPs 2012-12-06 00:03:08 -08:00
Charles Reiss a2a94fdbc7 Tests for block manager heartbeats. 2012-12-05 23:36:05 -08:00
Charles Reiss d21ca010ac Add block manager heart beats.
Renames old message called 'HeartBeat' to 'BlockUpdate'.

The BlockManager periodically sends a heart beat message to the master.
If the manager is currently not registered. The master responds to the
heart beat by indicating whether the BlockManager is currently registered
with the master. Additionally, the master now also responds to block
updates by indicating whether the BlockManager in question is registered.
When the BlockManager detects (by heart beat or failed block update)
that it stopped being registered, it reregisters and sends block
updates for all its blocks.
2012-12-05 23:35:20 -08:00
Charles Reiss c9e54a6755 Track block managers by hostname; handle manager removal. 2012-12-05 23:35:20 -08:00
Charles Reiss 5afa2ee9e9 Actually put millis in _lastSeenMs 2012-12-05 23:35:20 -08:00
Charles Reiss 813ac71459 Don't use bogus port number in notifyADeadHost(). 2012-12-05 23:35:20 -08:00
Tathagata Das 21a0852976 Refactored RDD checkpointing to minimize extra fields in RDD class. 2012-12-04 22:10:25 -08:00
Tathagata Das a69a82be26 Added metadata cleaner to HttpBroadcast to clean up old broacast files. 2012-12-03 22:37:31 -08:00
Josh Rosen cdaa0fad51 Use external addresses in standalone WebUI on EC2. 2012-12-01 18:19:13 -08:00
Tathagata Das b4dba55f78 Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient. 2012-12-02 02:03:05 +00:00
Tathagata Das 477de94894 Minor modifications. 2012-12-01 13:15:06 -08:00
Tathagata Das 6fcd09f499 Added TimeStampedHashSet and used that to cleanup the list of registered RDD IDs in CacheTracker. 2012-11-29 02:06:33 -08:00
Tathagata Das c9789751bf Added metadata cleaner to BlockManager to remove old blocks completely. 2012-11-28 23:18:24 -08:00
Thomas Dudziak 84e584fa8c Code review feedback fix 2012-11-28 19:46:06 -08:00
Tathagata Das 9e9e9e1d89 Renamed CleanupTask to MetadataCleaner. 2012-11-28 18:48:14 -08:00
Tathagata Das e463ae4920 Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing. 2012-11-28 14:05:01 -08:00
Tathagata Das d5e7aad039 Bug fixes 2012-11-28 08:36:55 +00:00
Matei Zaharia f86960cba9 Merge pull request #313 from rxin/pde_size_compress
Added a partition preserving flag to MapPartitionsWithSplitRDD.
2012-11-27 22:39:25 -08:00
Matei Zaharia 3ebd8e1885 Added zip to Java API 2012-11-27 22:38:09 -08:00
Matei Zaharia 27e43abd19 Added a zip() operation for RDDs with the same shape (number of
partitions and number of elements in each partition)
2012-11-27 22:27:47 -08:00
Matei Zaharia f410a111ad Merge branch 'master' of github.com:mesos/spark 2012-11-27 20:51:58 -08:00
Josh Rosen 7d71b9a56a Fix NullPointerException caused by unregistered map outputs. 2012-11-27 20:51:51 -08:00
Matei Zaharia 935c468b71 Merge pull request #311 from woggling/map-output-npe
Fix NullPointerException when map output unregistered from MapOutputTracker twice
2012-11-27 20:50:48 -08:00
Reynold Xin bd6dd1a3a6 Added a partition preserving flag to MapPartitionsWithSplitRDD. 2012-11-27 19:43:30 -08:00
Reynold Xin f24bfd2dd1 For size compression, compress non zero values into non zero values. 2012-11-27 19:20:45 -08:00
Thomas Dudziak 3b643e86bc Updated versions in the pom.xml files to match current master 2012-11-27 17:50:42 -08:00
Charles Reiss cf79de425d Fix NullPointerException when unregistering a map output twice. 2012-11-27 16:12:05 -08:00
Charles Reiss 5fa868b98b Tests for MapOutputTracker. 2012-11-27 16:05:36 -08:00
Thomas Dudziak 69297c64be Addressed code review comments 2012-11-27 15:45:16 -08:00
Tathagata Das b18d70870a Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata. 2012-11-27 15:08:49 -08:00
Tathagata Das 0fe2fc4d5e Merged branch mesos/master to branch dev. 2012-11-26 13:16:59 -08:00
Thomas Dudziak 811a32257b Added maven and debian build files 2012-11-20 16:19:51 -08:00
Tathagata Das c97ebf6437 Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey. 2012-11-19 23:22:07 +00:00
Matei Zaharia 3ff6f4bdee Merge pull request #304 from mbautin/configurable_local_ip
SPARK-624: make the default local IP customizable
2012-11-19 13:23:39 -08:00
mbautin 00f4e3ff9c Addressing Matei's comment: SPARK_LOCAL_IP environment variable 2012-11-19 11:52:10 -08:00
Tathagata Das 10c1abcb6a Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits. 2012-11-17 17:27:00 -08:00
Charles Reiss 12c24e786c Set default uncaught exception handler to exit.
Among other things, should prevent OutOfMemoryErrors in some daemon threads
(such as the network manager) from causing a spark executor to enter a state
where it cannot make progress but does not report an error.
2012-11-16 20:12:31 -08:00
mbautin 1f5a7e0e64 SPARK-624: make the default local IP customizable 2012-11-15 13:57:47 -08:00
Matei Zaharia c23a74df0a Use DNS names instead of IP addresses in standalone mode, to allow
matching with data locality hints from storage systems.
2012-11-15 00:10:52 -08:00
Tathagata Das 8a25d530ed Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays. 2012-11-13 02:16:28 -08:00
Denny 05e3807354 Merge branch 'master' into blockmanagerUI 2012-11-12 10:56:54 -08:00
Denny 4a1be7e0db Refactor BlockManager UI and adding worker details. 2012-11-12 10:56:35 -08:00
Matei Zaharia 173e0354c0 Detect correctly when one has disconnected from a standalone cluster.
SPARK-617 #resolve
2012-11-11 21:06:57 -08:00
Denny 68e0a88282 Merge branch 'master' into blockmanagerUI 2012-11-11 14:00:02 -08:00
Denny b829fba749 Merge branch 'master' into blockmanagerUI
Conflicts:
	core/src/main/twirl/spark/deploy/worker/index.scala.html
2012-11-11 13:59:40 -08:00
Tathagata Das 04e9e9d93c Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations. 2012-11-11 08:54:21 -08:00
root acf8272324 Fix K-means example a little 2012-11-10 23:07:21 -08:00
Tathagata Das 355c8e4b17 Fixed deadlock in BlockManager. 2012-11-09 16:28:45 -08:00
Tathagata Das 9915989bfa Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks. 2012-11-09 15:46:15 -08:00
Tathagata Das de00bc63db Fixed deadlock in BlockManager.
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks to use BlockInfo objects as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block putting threads lock on a different object (not the memory store) thus making sure putting threads minimally blocks to the getting treads.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
2012-11-09 14:09:37 -08:00
Matei Zaharia 6607f546cc Added an option to spread out jobs in the standalone mode. 2012-11-08 23:13:12 -08:00
Matei Zaharia 66cbdee941 Fix for connections not being reused (from Josh Rosen) 2012-11-08 09:53:40 -08:00
Imran Rashid 809b2bb1fe fix bug in getting slave id out of mesos 2012-11-08 00:34:28 -08:00
Matei Zaharia bb1bce7924 Various fixes to standalone mode and web UI:
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
2012-11-07 16:49:53 -08:00
Matei Zaharia e2b8477487 Made Akka timeout and message frame size configurable, and upped the defaults 2012-11-06 15:58:05 -08:00
Tathagata Das 72b2303f99 Fixed major bugs in checkpointing. 2012-11-05 11:41:36 -08:00
Tathagata Das d154238789 Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds. 2012-11-04 12:12:06 -08:00
Shivaram Venkataraman a7d967a1ca Remove unnecessary hash-map put in MemoryStore 2012-11-01 10:46:38 -07:00
Tathagata Das 34e569f40e Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that jobs running on an RDD on which checkpointing is in progress does hurt the result of the job. 2012-10-31 00:56:40 -07:00
Tathagata Das 0dcd770fdc Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them. 2012-10-30 16:09:37 -07:00
Denny ceec1a1a6a Nicer storage level format on RDD page 2012-10-29 15:03:01 -07:00
Denny eb95212f4d code Formatting 2012-10-29 14:57:32 -07:00
Denny 531ac136bf BlockManager UI. 2012-10-29 14:53:47 -07:00
Tathagata Das ac12abc17f Modified RDD API to make dependencies a var (therefore can be changed to checkpointed hadoop rdd) and othere references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more). 2012-10-29 11:55:27 -07:00
Josh Rosen 2ccf3b6652 Fix PySpark hash partitioning bug.
A Java array's hashCode is based on its object
identify, not its elements, so this was causing
serialized keys to be hashed incorrectly.

This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
root e782187b4a Don't throw an error in the block manager when a block is cached on the master due to
a locally computed operation

Conflicts:

	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2012-10-26 00:33:45 -07:00
Matei Zaharia 863a55ae42 Merge remote-tracking branch 'public/master' into dev
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
	core/src/main/scala/spark/KryoSerializer.scala
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/executor/Executor.scala
	core/src/main/scala/spark/network/Connection.scala
	core/src/main/scala/spark/network/ConnectionManagerTest.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/NewHadoopRDD.scala
	core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockMessage.scala
	core/src/main/scala/spark/storage/BlockStore.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	project/SparkBuild.scala
	run
2012-10-24 23:21:00 -07:00
Matei Zaharia f63a40fd99 Strip leading mesos:// in URLs passed to Mesos 2012-10-24 21:52:13 -07:00
Matei Zaharia d290e964ea Merge pull request #281 from rxin/memreport
Added a method to report slave memory status; force serialize accumulator update in local mode.
2012-10-23 22:04:35 -07:00
Matei Zaharia 0bd20c63e2 Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev
Conflicts:
	core/src/main/scala/spark/Dependency.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
2012-10-23 22:01:45 -07:00
Josh Rosen d4f2e5b0ef Remove PYTHONPATH from SparkContext's executorEnvs.
It makes more sense to pass it in the dictionary
of environment variables that is used to construct
PythonRDD.
2012-10-22 10:28:59 -07:00
Josh Rosen c23bf1aff4 Add PySpark README and run scripts. 2012-10-20 00:22:27 +00:00
Josh Rosen 52989c8a2c Update Python API for v0.6.0 compatibility. 2012-10-19 10:24:49 -07:00
Josh Rosen e21eb6e00d Merge tag 'v0.6.0' into python-api 2012-10-19 09:44:32 -07:00
Thomas Dudziak d9c2a89c57 Support for Hadoop 2 distributions such as cdh4 2012-10-18 16:08:54 -07:00
Reynold Xin 4a3fb06ac2 Updated Kryo to 2.20. 2012-10-16 01:10:01 -07:00
Reynold Xin 63fae9bc23 Serialize accumulator updates in TaskResult for local mode. 2012-10-15 21:38:28 -07:00
Reynold Xin 42d20fa8da Added a method to report slave memory status. 2012-10-14 22:30:53 -07:00
Matei Zaharia 64dbf8d372 Made ShuffleDependency automatically find a shuffle ID for itself 2012-10-14 10:00:22 -07:00
Tathagata Das e95ff45b53 Implemented checkpointing of StreamingContext and DStream graph. 2012-10-13 20:10:49 -07:00
Matei Zaharia 8815aeba0c Take executor environment vars as an arguemnt to SparkContext 2012-10-13 15:31:11 -07:00
Josh Rosen 33cd3a0c12 Remove map-side combining from ShuffleMapTask.
This separation of concerns simplifies the 
ShuffleDependency and ShuffledRDD interfaces.

Map-side combining can be performed in a
mapPartitions() call prior to shuffling the RDD.

I don't anticipate this having much of a 
performance impact: in both approaches, each tuple
is hashed twice: once in the bucket partitioning
and once in the combiner's hashtable.  The same
steps are being performed, but in a different
order and through one extra Iterator.
2012-10-13 14:59:20 -07:00
Josh Rosen 10bcd217d2 Remove mapSideCombine field from Aggregator.
Instead, the presence or absense of a ShuffleDependency's aggregator
will control whether map-side combining is performed.
2012-10-13 14:59:20 -07:00
Josh Rosen 4775c55641 Change ShuffleFetcher to return an Iterator. 2012-10-13 14:59:20 -07:00
Josh Rosen 110832e88f Add helper methods to Aggregator. 2012-10-13 14:57:56 -07:00
Denny 0700d1920a Protect from null env variables in mesos. 2012-10-13 13:57:59 -07:00
Denny 21047d923e Protect from setting null environment variables. 2012-10-13 13:44:24 -07:00
Denny fa41d50f7d Don't use system envs for Mesos. 2012-10-13 13:15:50 -07:00
Denny 67c42a41d0 Let the user specify environment variables to be passed to the Executors.
Also removed unused variables in the ExecutorRunner.
2012-10-13 13:08:44 -07:00
Matei Zaharia b4067cbad4 More doc updates, and moved Serializer to a subpackage. 2012-10-12 18:19:21 -07:00
Matei Zaharia 8d7b77bcb5 Some doc and usability improvements:
- Added a StorageLevels class for easy access to StorageLevel constants
  in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Matei Zaharia 682b2d9329 Added a test for when an RDD only partially fits in memory 2012-10-12 14:58:26 -07:00
Matei Zaharia dca496bb77 Document cartesian() operation 2012-10-12 14:46:41 -07:00
Matei Zaharia 23015ccac0 Merge pull request #271 from shivaram/block-manager-npe-fix
Change block manager to accept a ArrayBuffer
2012-10-12 14:36:28 -07:00
Shivaram Venkataraman 8577523f37 Add test to verify if RDD is computed even if block manager has insufficient
memory
2012-10-12 14:14:57 -07:00
Patrick Wendell dc8adbd359 Adding Java documentation 2012-10-11 00:49:03 -07:00
Shivaram Venkataraman 2cf40c5fd5 Change block manager to accept a ArrayBuffer instead of an iterator to ensure
that the computation can proceed even if we run out of memory to cache the
block. Update CacheTracker to use this new interface
2012-10-11 00:42:46 -07:00
Denny d3f095f904 Fixed bug when fetching Jar dependencies.
Instead of checking currentFiles check currentJars.
2012-10-10 16:09:53 -07:00
Matei Zaharia ee2fcb2ce6 Added documentation to all the *RDDFunction classes, and moved them into
the spark package to make them more visible. Also documented various
other miscellaneous things in the API.
2012-10-09 18:38:36 -07:00
Matei Zaharia bc0bc672d0 Updates to documentation:
- Edited quick start and tuning guide to simplify them a little
- Simplified top menu bar
- Made private a SparkContext constructor parameter that was left as
  public
- Various small fixes
2012-10-09 14:30:23 -07:00
Andy Konwinski 1d79ff6028 Fixes a typo, adds scaladoc comments to SparkContext constructors. 2012-10-08 22:49:17 -07:00
Patrick Wendell ac310098ef More docs in RDD class 2012-10-08 22:25:11 -07:00
Andy Konwinski bd688940a1 A start on scaladoc for the public APIs. 2012-10-08 21:13:29 -07:00
Mosharaf Chowdhury edc67bfba8 Merge branch 'dev' into bc-fix-dev 2012-10-08 16:19:13 -07:00
Matei Zaharia efc5423210 Made compression configurable separately for shuffle, broadcast and RDDs 2012-10-07 11:30:53 -07:00
Matei Zaharia 039cc6228e Merge pull request #251 from JoshRosen/docs/internals
Document Dependency classes and make minor interface improvements
2012-10-07 09:56:53 -07:00
Reynold Xin f66c0e9561 Changed the println to logInfo in Utils.fetchFile. 2012-10-07 01:53:24 -07:00
Matei Zaharia d72db3d7dc Merge pull request #250 from rxin/dev
Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created incorrectly for local mode.
2012-10-07 00:56:53 -07:00
Reynold Xin 80f59e17e2 Fixed a bug in addFile that if the file is specified as "file:///", the
symlink is created wrong for local mode.
2012-10-07 00:54:38 -07:00
Josh Rosen e10308f5a0 Make ShuffleDependency.aggregator explicitly optional.
It was confusing to be using

    new Aggregator[K, V, V](null, null, null, false)

to represent the absence of an aggregator.
2012-10-07 00:36:04 -07:00
Matei Zaharia f930fe5d81 Improve error message 2012-10-07 07:34:36 +00:00
Matei Zaharia a3bf0ce57f Don't crash on ask timeout exceptions in deploy.Client.stop() (fixes a crash in tests) 2012-10-07 07:25:41 +00:00
Matei Zaharia eca570f66a Removed the need to sleep in tests due to waiting for Akka to shut down 2012-10-07 00:17:59 -07:00
Josh Rosen 4f72066a9a Document the Dependency classes. 2012-10-07 00:05:37 -07:00
Josh Rosen 3f2571fe98 Remove unused isShuffle field from Dependency. 2012-10-07 00:03:55 -07:00
Matei Zaharia b2fc3dd902 Log message 2012-10-07 06:43:52 +00:00
Matei Zaharia ea096f7cd5 More logging 2012-10-07 06:35:48 +00:00
root 554b42cb24 Log more info in MapOutputTracker 2012-10-07 05:02:18 +00:00
root a73b25826b Made Akka thread pool and message batch sizes configurable 2012-10-07 04:19:54 +00:00
root ce915cadee Made run script add test-classes onto the classpath only if SPARK_TESTING is set; fixes #216 2012-10-07 04:19:16 +00:00
root 975009d688 Avoid acquiring locks in BlockManager when fetching shuffle outputs 2012-10-07 04:02:10 +00:00
root 0bc63f7ef1 Log initial number of fetches in reducer 2012-10-07 03:51:04 +00:00
Matei Zaharia dc28a3ac0a Modified shuffle to limit the maximum outstanding data size in bytes,
instead of the maximum number of outstanding fetches. This should make
it faster when there are many small map output files, as well as more
robust to overallocating memory on large map outputs.
2012-10-06 20:07:10 -07:00
Matei Zaharia 9a3b3f32a3 Pass sizes of map outputs back to MapOutputTracker 2012-10-06 18:46:04 -07:00
Matei Zaharia 0e42832e6a Made block store return the size of each block put in 2012-10-06 18:00:53 -07:00
Matei Zaharia b0110de5b6 Warn about user programs that try to set spark.cache.class 2012-10-06 17:27:14 -07:00
Matei Zaharia 65113b7e1b Only group elements ten at a time into SequenceFile records in
saveAsObjectFile
2012-10-06 17:14:41 -07:00
Matei Zaharia 716e10ca32 Minor formatting fixes 2012-10-05 22:03:06 -07:00
Matei Zaharia 70f02fa912 Merge branch 'dev' of github.com:mesos/spark into dev 2012-10-05 22:00:22 -07:00
Andy Konwinski a242cdd0a6 Factor subclasses of RDD out of RDD.scala into their own classes
in the rdd package.
2012-10-05 19:53:54 -07:00
Andy Konwinski d7363a6b8a Moves all files in core/src/main/scala/ that have RDD in their name
from that directory to a new core/src/main/scala/rdd directory.
2012-10-05 19:23:45 -07:00
Andy Konwinski e0067da082 Moves all files in core/src/main/scala/ that have RDD in them from
package spark to package spark.rdd and updates all references to them.
2012-10-05 19:23:45 -07:00
Matei Zaharia 69588baf65 Cleaning up code slightly 2012-10-05 19:16:09 -07:00
root f52bc09a34 Reduce some overly aggressive logging in connection manager 2012-10-06 01:54:39 +00:00
Shivaram Venkataraman b6e4f46a96 Fix SizeEstimator tests to work with String classes in JDK 6 and 7
Conflicts:

	core/src/test/scala/spark/BoundedMemoryCacheSuite.scala
2012-10-05 16:58:57 -07:00
Matei Zaharia e3ae98b54e Merge pull request #247 from squito/dev
Dev
2012-10-05 10:27:18 -07:00
Imran Rashid e0698f8f26 change tests to show utility of localValue 2012-10-04 23:05:42 -07:00
Imran Rashid 82a3327862 make accumulator.localValue public, add tests
Conflicts:
	core/src/test/scala/spark/AccumulatorSuite.scala
2012-10-04 23:05:01 -07:00
Matei Zaharia 8c82f43db3 Scaladoc documentation for some core Spark functionality 2012-10-04 22:59:36 -07:00
Matei Zaharia c04aeaf365 Merge pull request #243 from andyk/mesos-from-maven-central
Removes the included mesos-0.9.0.jar and pulls it from Maven Central instead
2012-10-03 11:10:00 -07:00
Reynold Xin 45f4b7cc7e Made Serializer and JavaSerializer non private. 2012-10-03 10:20:59 -07:00
Andy Konwinski 5897567679 Removes the included mesos-0.9.0.jar and adds a libraryDependency to
the build file so that mesos-0.9.0-incubating.jar (which contains the
same class files, but has a silightly different name) will be pulled
down from Maven Central instead.
2012-10-03 08:58:05 -07:00
Matei Zaharia 833f1d0c86 Made StorageLevel public 2012-10-03 08:27:25 -07:00
Matei Zaharia 6cf5dffc72 Make more stuff private[spark] 2012-10-02 22:28:55 -07:00
Mosharaf Chowdhury 119e50c7b9 Conflict fixed 2012-10-02 22:25:39 -07:00
Matei Zaharia 626f701931 Merge pull request #240 from dennybritz/private_classes
Package-Private Classes
2012-10-02 21:24:32 -07:00
Denny 0361353a70 Make Java API abstract wrapped functions private 2012-10-02 20:02:53 -07:00
Denny b9badcd5bd accidentially removed trait 2012-10-02 19:35:07 -07:00
Denny 18a1faedf6 Stylistic changes and Public Accumulable and Broadcast 2012-10-02 19:28:37 -07:00
Denny b7a913e1fa Make dependency classes public - used by spark 2012-10-02 19:04:23 -07:00
Denny 4d9f4b01af Make classes package private 2012-10-02 19:00:19 -07:00
Matei Zaharia 97cbd699d7 Merge branch 'dev' of github.com:mesos/spark into dev 2012-10-02 17:31:01 -07:00
Matei Zaharia 5fda59ab99 Added a test for overly large blocks in memory store 2012-10-02 17:30:40 -07:00
Matei Zaharia 6098f7e87a Fixed cache replacement behavior of BlockManager:
- Partitions that get dropped to disk will now be loaded back into RAM
  after they're accessed again
- Same-RDD rule for cache replacement is now implemented (don't drop
  partitions from an RDD to make room for other partitions from itself)
- Items stored as MEMORY_AND_DISK go into memory only first, instead of
  being eagerly written out to disk
- MemoryStore.ensureFreeSpace is called within a lock on the writer
  thread to prevent race conditions (this can still be optimized to
  allow multiple concurrent calls to it but it's a start)
- MemoryStore does not accept blocks larger than its limit
2012-10-02 17:25:38 -07:00
Reynold Xin 7997585616 Added a check to make sure SPARK_MEM <= memoryPerSlave for local cluster
mode.
2012-10-02 15:45:25 -07:00
Reynold Xin 0898a21b95 Merge branch 'dev' of https://github.com/mesos/spark into dev 2012-10-02 13:08:01 -07:00
Matei Zaharia 22684653a5 Revert "Place Spray repo ahead of Cloudera in Maven search path"
This reverts commit 42e0a68082.
2012-10-02 12:01:32 -07:00
Reynold Xin b8cd681169 Allow whitespaces in cluster URL configuration for local cluster. 2012-10-02 11:52:12 -07:00
Matei Zaharia 42e0a68082 Place Spray repo ahead of Cloudera in Maven search path 2012-10-02 11:37:19 -07:00
Matei Zaharia b9fb8d6463 Include date in folder name for Spark local dir. 2012-10-01 15:55:16 -07:00
Matei Zaharia bc881e4798 Merge branch 'dev' of github.com:mesos/spark into dev 2012-10-01 15:21:56 -07:00
Matei Zaharia 802aa8aef9 Some bug fixes and logging fixes for broadcast. 2012-10-01 15:20:42 -07:00
Matei Zaharia 74a9244255 Write all unit test output to a file 2012-10-01 15:07:42 -07:00
Reynold Xin f264153162 Fixed #232: DirectBuffer's cleaner was empty and Spark tried to invoke
clean on it.
2012-10-01 14:07:34 -07:00
Matei Zaharia 3b348f909d Improve log messages from BlockManager 2012-10-01 12:01:38 -07:00
Matei Zaharia 0b84871dbc Remove some printlns in tests 2012-10-01 10:57:26 -07:00
Matei Zaharia 53f90d0f0e Use underscores instead of colons in RDD IDs 2012-10-01 10:48:53 -07:00
Matei Zaharia 2314132d57 Added a (failing) test for LRU with MEMORY_AND_DISK. 2012-09-30 22:52:16 -07:00
Matei Zaharia 3128c57f90 Simplified Class / ClassLoader test 2012-09-30 21:48:27 -07:00
Matei Zaharia 83143f9a5f Fixed several bugs that caused weird behavior with files in spark-shell:
- SizeEstimator was following through a ClassLoader field of Hadoop
  JobConfs, which referenced the whole interpreter, Scala compiler, etc.
  Chaos ensued, giving an estimated size in the tens of gigabytes.
- Broadcast variables in local mode were only stored as MEMORY_ONLY and
  never made accessible over a server, so they fell out of the cache when
  they were deemed too large and couldn't be reloaded.
2012-09-30 21:19:39 -07:00
Matei Zaharia fd0374b9de Comment 2012-09-29 21:43:06 -07:00
Matei Zaharia 5718cef2a4 Removed Logging trait from CoalescedRDD since we don't log anything 2012-09-29 21:40:43 -07:00