Commit graph

905 commits

Author SHA1 Message Date
Reynold Xin eac566a7f4 Merge branch 'master' of github.com:mesos/spark into dev
Conflicts:
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/PairRDDFunctions.scala
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/CartesianRDD.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/CoalescedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	core/src/main/scala/spark/rdd/FlatMappedRDD.scala
	core/src/main/scala/spark/rdd/GlommedRDD.scala
	core/src/main/scala/spark/rdd/HadoopRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsRDD.scala
	core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala
	core/src/main/scala/spark/rdd/MappedRDD.scala
	core/src/main/scala/spark/rdd/PipedRDD.scala
	core/src/main/scala/spark/rdd/SampledRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
	core/src/main/scala/spark/rdd/UnionRDD.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockManagerId.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/MetadataCleaner.scala
	core/src/main/scala/spark/util/TimeStampedHashMap.scala
	core/src/test/scala/spark/storage/BlockManagerSuite.scala
	run
2012-12-20 14:53:40 -08:00
Tathagata Das 8512dd3225 Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint
Conflicts:
	core/src/main/scala/spark/ParallelCollection.scala
	core/src/test/scala/spark/CheckpointSuite.scala
	streaming/src/main/scala/spark/streaming/DStream.scala
2012-12-20 14:24:19 -08:00
Tathagata Das fe777eb77d Fixed bugs in CheckpointRDD and spark.CheckpointSuite. 2012-12-20 13:39:27 -08:00
Tathagata Das f9c5b0a6fe Changed checkpoint writing and reading process. 2012-12-20 11:52:23 -08:00
Matei Zaharia 5e51b889fe Merge pull request #327 from rxin/spark-633
Added the ability in block manager to remove blocks.
2012-12-20 11:33:38 -08:00
Reynold Xin 9397c5014e Let the slave notify the master block removal. 2012-12-20 01:37:09 -08:00
Reynold Xin 68c52d80ec Moved BlockManager's IdGenerator into BlockManager object. Removed some
excessive debug messages.
2012-12-19 15:27:23 -08:00
Tathagata Das 5184141936 Introduced getSpits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData. 2012-12-18 13:30:53 -08:00
Patrick Wendell bfac06e1f6 SPARK-616: Logging dead workers in Web UI.
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers which die and
re-register using different actor ID's.
2012-12-17 23:09:05 -08:00
Tathagata Das 72eed2b95e Converted CheckpointState in RDDCheckpointData to use scala Enumeration. 2012-12-17 18:52:43 -08:00
Matei Zaharia b82a6dd2c7 Merge pull request #332 from JoshRosen/spark-607
Add try-finally to handle MapOutputTracker timeouts
2012-12-14 11:41:16 -08:00
Reynold Xin 06f855c24d Merge branch 'spark-633' of github.com:rxin/spark into spark-633 2012-12-14 00:27:24 -08:00
Reynold Xin 8c01295b85 Fixed conflicts from merging Charles' and TD's block manager changes. 2012-12-14 00:26:36 -08:00
Charles Reiss c528932a41 Code review cleanup. 2012-12-13 22:37:16 -08:00
Charles Reiss 0aad42b5e7 Have standalone cluster report exit codes to clients. Addresses SPARK-639. 2012-12-13 22:37:16 -08:00
Reynold Xin 0235667f73 Merge branch 'master' of github.com:mesos/spark into spark-633 2012-12-13 22:33:41 -08:00
Reynold Xin 97434f49b8 Merged TD's block manager refactoring. 2012-12-13 22:32:19 -08:00
Reynold Xin f4a9e1b9be Fixed the broken Java unit test from SPARK-635. 2012-12-13 22:22:12 -08:00
Reynold Xin 41e58a519a Merge branch 'master' of github.com:mesos/spark into spark-633 2012-12-13 22:06:47 -08:00
Josh Rosen cf52d9cade Add try-finally to handle MapOutputTracker timeouts. 2012-12-13 21:53:30 -08:00
Matei Zaharia 05e225f988 Merge pull request #329 from woggling/executor-status-codes
Executor exit status codes
2012-12-13 20:14:10 -08:00
Charles Reiss b054d3b222 ExecutorLostReason -> ExecutorLossReason 2012-12-13 18:44:07 -08:00
Charles Reiss 24d7aa2d15 Extra whitespace in ExecutorExitCode 2012-12-13 18:39:23 -08:00
Reynold Xin dc7d7fc286 Merge branch 'master' of github.com:mesos/spark into spark-633 2012-12-13 16:48:34 -08:00
Reynold Xin 4f076e105e SPARK-635: Pass a TaskContext object to compute() interface and use
that to close Hadoop input stream. Incorporated Matei's command.
2012-12-13 16:41:15 -08:00
Charles Reiss 829206f1a7 Explain slaveLost calls made by StandaloneSchedulerBackend 2012-12-13 16:23:36 -08:00
Charles Reiss a4041dd87f Log duplicate slaveLost() calls in ClusterScheduler. 2012-12-13 16:23:36 -08:00
Charles Reiss fa9df4a45d Normalize executor exit statuses and report them to the user. 2012-12-13 16:23:31 -08:00
Reynold Xin eacb98e900 SPARK-635: Pass a TaskContext object to compute() interface and use that
to close Hadoop input stream.
2012-12-13 15:41:53 -08:00
Josh Rosen 7c9e3d1c21 Return success or failure in BlockStore.remove(). 2012-12-13 15:22:27 -08:00
Reynold Xin 1b7a0451ed Added the ability in block manager to remove blocks. 2012-12-13 00:04:42 -08:00
Charles Reiss 1d8e2e6cff Call slaveLost on executor death for standalone clusters. 2012-12-12 21:15:34 -08:00
Tathagata Das 8e74fac215 Made checkpoint data in RDDs optional to further reduce serialized size. 2012-12-11 15:36:12 -08:00
Tathagata Das fa28f25619 Fixed bug in UnionRDD and CoGroupedRDD 2012-12-11 13:59:43 -08:00
Tathagata Das 2a87d816a2 Added clear property to JavaAPISuite to remove port binding errors. 2012-12-11 01:44:43 -08:00
Tathagata Das 746afc2e65 Bunch of bug fixes related to checkpointing in RDDs. RDDCheckpointData object is used to lock all serialization and dependency changes for checkpointing. ResultTask converted to Externalizable and serialized RDD is cached like ShuffleMapTask. 2012-12-10 23:36:37 -08:00
Reynold Xin 21b271f5bd Suppress shuffle block updates when a slave node comes back. 2012-12-10 20:36:03 -08:00
Matei Zaharia a1a2daa7ef Merge pull request #317 from woggling/block-manager-heartbeat
Implement block manager heartbeat
2012-12-10 11:03:55 -08:00
Charles Reiss b6b62d774f Decrease BlockManagerMaster logging verbosity 2012-12-10 00:31:55 -08:00
Charles Reiss 5d3e917d09 Use Akka scheduler for BlockManager heart beats.
Adds required ActorSystem argument to BlockManager constructors.
2012-12-10 00:31:50 -08:00
Charles Reiss b53dd28c90 Changed default block manager heartbeat interval to 5 s 2012-12-09 23:03:34 -08:00
Matei Zaharia beb440089e Merge pull request #310 from tomdz/master-mavenized
Maven build setup
2012-12-09 21:40:05 -08:00
Tathagata Das 4be1cdc8b9 Merge pull request #5 from radlab/flume-integration
Flume integration
2012-12-09 20:31:42 -08:00
Tathagata Das e427216018 Removed unnecessary testcases. 2012-12-08 12:46:59 -08:00
Matei Zaharia e1d7cd2276 Search for a non-loopback address in Utils.getLocalIpAddress 2012-12-08 00:33:11 -08:00
Patrick Wendell 3e796bdd57 Changes in response to TD's review. 2012-12-07 19:34:05 -08:00
Patrick Wendell c36ca10241 Adding locality aware parallelize 2012-12-07 16:42:36 -08:00
Tathagata Das 1f3a75ae9e Modified checkpoint testsuite to more comprehensively test checkpointing of various RDDs. Fixed checkpoint bug (splits referring to parent RDDs or parent splits) in UnionRDD and CoalescedRDD. Fixed bug in testing ShuffledRDD. Removed unnecessary and useless map-side combining step for narrow dependencies in CoGroupedRDD. Removed unncessary WeakReference stuff from many other RDDs. 2012-12-07 13:45:52 -08:00
Charles Reiss 714c8d32d5 Don't divide by milliseconds by 1000 more. 2012-12-06 18:38:34 -08:00
Charles Reiss 8f0819520c map -> foreach 2012-12-06 18:29:50 -08:00
Charles Reiss 7a033fd795 Make LocalSparkCluster use distinct IPs 2012-12-06 00:03:08 -08:00
Charles Reiss a2a94fdbc7 Tests for block manager heartbeats. 2012-12-05 23:36:05 -08:00
Charles Reiss d21ca010ac Add block manager heart beats.
Renames old message called 'HeartBeat' to 'BlockUpdate'.

The BlockManager periodically sends a heart beat message to the master.
If the manager is currently not registered. The master responds to the
heart beat by indicating whether the BlockManager is currently registered
with the master. Additionally, the master now also responds to block
updates by indicating whether the BlockManager in question is registered.
When the BlockManager detects (by heart beat or failed block update)
that it stopped being registered, it reregisters and sends block
updates for all its blocks.
2012-12-05 23:35:20 -08:00
Charles Reiss c9e54a6755 Track block managers by hostname; handle manager removal. 2012-12-05 23:35:20 -08:00
Charles Reiss 5afa2ee9e9 Actually put millis in _lastSeenMs 2012-12-05 23:35:20 -08:00
Charles Reiss 813ac71459 Don't use bogus port number in notifyADeadHost(). 2012-12-05 23:35:20 -08:00
Tathagata Das 21a0852976 Refactored RDD checkpointing to minimize extra fields in RDD class. 2012-12-04 22:10:25 -08:00
Tathagata Das a69a82be26 Added metadata cleaner to HttpBroadcast to clean up old broacast files. 2012-12-03 22:37:31 -08:00
Josh Rosen cdaa0fad51 Use external addresses in standalone WebUI on EC2. 2012-12-01 18:19:13 -08:00
Tathagata Das b4dba55f78 Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient. 2012-12-02 02:03:05 +00:00
Tathagata Das 477de94894 Minor modifications. 2012-12-01 13:15:06 -08:00
Tathagata Das 6fcd09f499 Added TimeStampedHashSet and used that to cleanup the list of registered RDD IDs in CacheTracker. 2012-11-29 02:06:33 -08:00
Tathagata Das c9789751bf Added metadata cleaner to BlockManager to remove old blocks completely. 2012-11-28 23:18:24 -08:00
Thomas Dudziak 84e584fa8c Code review feedback fix 2012-11-28 19:46:06 -08:00
Tathagata Das 9e9e9e1d89 Renamed CleanupTask to MetadataCleaner. 2012-11-28 18:48:14 -08:00
Tathagata Das e463ae4920 Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing. 2012-11-28 14:05:01 -08:00
Tathagata Das d5e7aad039 Bug fixes 2012-11-28 08:36:55 +00:00
Matei Zaharia f86960cba9 Merge pull request #313 from rxin/pde_size_compress
Added a partition preserving flag to MapPartitionsWithSplitRDD.
2012-11-27 22:39:25 -08:00
Matei Zaharia 3ebd8e1885 Added zip to Java API 2012-11-27 22:38:09 -08:00
Matei Zaharia 27e43abd19 Added a zip() operation for RDDs with the same shape (number of
partitions and number of elements in each partition)
2012-11-27 22:27:47 -08:00
Matei Zaharia f410a111ad Merge branch 'master' of github.com:mesos/spark 2012-11-27 20:51:58 -08:00
Josh Rosen 7d71b9a56a Fix NullPointerException caused by unregistered map outputs. 2012-11-27 20:51:51 -08:00
Matei Zaharia 935c468b71 Merge pull request #311 from woggling/map-output-npe
Fix NullPointerException when map output unregistered from MapOutputTracker twice
2012-11-27 20:50:48 -08:00
Reynold Xin bd6dd1a3a6 Added a partition preserving flag to MapPartitionsWithSplitRDD. 2012-11-27 19:43:30 -08:00
Reynold Xin f24bfd2dd1 For size compression, compress non zero values into non zero values. 2012-11-27 19:20:45 -08:00
Thomas Dudziak 3b643e86bc Updated versions in the pom.xml files to match current master 2012-11-27 17:50:42 -08:00
Charles Reiss cf79de425d Fix NullPointerException when unregistering a map output twice. 2012-11-27 16:12:05 -08:00
Charles Reiss 5fa868b98b Tests for MapOutputTracker. 2012-11-27 16:05:36 -08:00
Thomas Dudziak 69297c64be Addressed code review comments 2012-11-27 15:45:16 -08:00
Tathagata Das b18d70870a Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata. 2012-11-27 15:08:49 -08:00
Tathagata Das 0fe2fc4d5e Merged branch mesos/master to branch dev. 2012-11-26 13:16:59 -08:00
Thomas Dudziak 811a32257b Added maven and debian build files 2012-11-20 16:19:51 -08:00
Tathagata Das c97ebf6437 Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey. 2012-11-19 23:22:07 +00:00
Matei Zaharia 3ff6f4bdee Merge pull request #304 from mbautin/configurable_local_ip
SPARK-624: make the default local IP customizable
2012-11-19 13:23:39 -08:00
mbautin 00f4e3ff9c Addressing Matei's comment: SPARK_LOCAL_IP environment variable 2012-11-19 11:52:10 -08:00
Tathagata Das 10c1abcb6a Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits. 2012-11-17 17:27:00 -08:00
Charles Reiss 12c24e786c Set default uncaught exception handler to exit.
Among other things, should prevent OutOfMemoryErrors in some daemon threads
(such as the network manager) from causing a spark executor to enter a state
where it cannot make progress but does not report an error.
2012-11-16 20:12:31 -08:00
mbautin 1f5a7e0e64 SPARK-624: make the default local IP customizable 2012-11-15 13:57:47 -08:00
Matei Zaharia c23a74df0a Use DNS names instead of IP addresses in standalone mode, to allow
matching with data locality hints from storage systems.
2012-11-15 00:10:52 -08:00
Tathagata Das 8a25d530ed Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays. 2012-11-13 02:16:28 -08:00
Matei Zaharia 173e0354c0 Detect correctly when one has disconnected from a standalone cluster.
SPARK-617 #resolve
2012-11-11 21:06:57 -08:00
Tathagata Das 04e9e9d93c Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations. 2012-11-11 08:54:21 -08:00
root acf8272324 Fix K-means example a little 2012-11-10 23:07:21 -08:00
Tathagata Das 355c8e4b17 Fixed deadlock in BlockManager. 2012-11-09 16:28:45 -08:00
Tathagata Das 9915989bfa Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks. 2012-11-09 15:46:15 -08:00
Tathagata Das de00bc63db Fixed deadlock in BlockManager.
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks to use BlockInfo objects as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block putting threads lock on a different object (not the memory store) thus making sure putting threads minimally blocks to the getting treads.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
2012-11-09 14:09:37 -08:00
Matei Zaharia 6607f546cc Added an option to spread out jobs in the standalone mode. 2012-11-08 23:13:12 -08:00
Matei Zaharia 66cbdee941 Fix for connections not being reused (from Josh Rosen) 2012-11-08 09:53:40 -08:00
Imran Rashid 809b2bb1fe fix bug in getting slave id out of mesos 2012-11-08 00:34:28 -08:00
Matei Zaharia bb1bce7924 Various fixes to standalone mode and web UI:
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
2012-11-07 16:49:53 -08:00
Matei Zaharia e2b8477487 Made Akka timeout and message frame size configurable, and upped the defaults 2012-11-06 15:58:05 -08:00
Tathagata Das 72b2303f99 Fixed major bugs in checkpointing. 2012-11-05 11:41:36 -08:00
Tathagata Das d154238789 Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds. 2012-11-04 12:12:06 -08:00
Shivaram Venkataraman a7d967a1ca Remove unnecessary hash-map put in MemoryStore 2012-11-01 10:46:38 -07:00
Tathagata Das 34e569f40e Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that jobs running on an RDD on which checkpointing is in progress does hurt the result of the job. 2012-10-31 00:56:40 -07:00
Tathagata Das 0dcd770fdc Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them. 2012-10-30 16:09:37 -07:00
Tathagata Das ac12abc17f Modified RDD API to make dependencies a var (therefore can be changed to checkpointed hadoop rdd) and othere references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more). 2012-10-29 11:55:27 -07:00
Josh Rosen 2ccf3b6652 Fix PySpark hash partitioning bug.
A Java array's hashCode is based on its object
identify, not its elements, so this was causing
serialized keys to be hashed incorrectly.

This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
root e782187b4a Don't throw an error in the block manager when a block is cached on the master due to
a locally computed operation

Conflicts:

	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2012-10-26 00:33:45 -07:00
Matei Zaharia 863a55ae42 Merge remote-tracking branch 'public/master' into dev
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
	core/src/main/scala/spark/KryoSerializer.scala
	core/src/main/scala/spark/MapOutputTracker.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/executor/Executor.scala
	core/src/main/scala/spark/network/Connection.scala
	core/src/main/scala/spark/network/ConnectionManagerTest.scala
	core/src/main/scala/spark/rdd/BlockRDD.scala
	core/src/main/scala/spark/rdd/NewHadoopRDD.scala
	core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/BlockMessage.scala
	core/src/main/scala/spark/storage/BlockStore.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	project/SparkBuild.scala
	run
2012-10-24 23:21:00 -07:00
Matei Zaharia f63a40fd99 Strip leading mesos:// in URLs passed to Mesos 2012-10-24 21:52:13 -07:00
Matei Zaharia d290e964ea Merge pull request #281 from rxin/memreport
Added a method to report slave memory status; force serialize accumulator update in local mode.
2012-10-23 22:04:35 -07:00
Matei Zaharia 0bd20c63e2 Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev
Conflicts:
	core/src/main/scala/spark/Dependency.scala
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/ShuffledRDD.scala
2012-10-23 22:01:45 -07:00
Josh Rosen d4f2e5b0ef Remove PYTHONPATH from SparkContext's executorEnvs.
It makes more sense to pass it in the dictionary
of environment variables that is used to construct
PythonRDD.
2012-10-22 10:28:59 -07:00
Josh Rosen c23bf1aff4 Add PySpark README and run scripts. 2012-10-20 00:22:27 +00:00
Josh Rosen 52989c8a2c Update Python API for v0.6.0 compatibility. 2012-10-19 10:24:49 -07:00
Josh Rosen e21eb6e00d Merge tag 'v0.6.0' into python-api 2012-10-19 09:44:32 -07:00
Thomas Dudziak d9c2a89c57 Support for Hadoop 2 distributions such as cdh4 2012-10-18 16:08:54 -07:00
Reynold Xin 4a3fb06ac2 Updated Kryo to 2.20. 2012-10-16 01:10:01 -07:00
Reynold Xin 63fae9bc23 Serialize accumulator updates in TaskResult for local mode. 2012-10-15 21:38:28 -07:00
Reynold Xin 42d20fa8da Added a method to report slave memory status. 2012-10-14 22:30:53 -07:00
Matei Zaharia 64dbf8d372 Made ShuffleDependency automatically find a shuffle ID for itself 2012-10-14 10:00:22 -07:00
Tathagata Das e95ff45b53 Implemented checkpointing of StreamingContext and DStream graph. 2012-10-13 20:10:49 -07:00
Matei Zaharia 8815aeba0c Take executor environment vars as an arguemnt to SparkContext 2012-10-13 15:31:11 -07:00
Josh Rosen 33cd3a0c12 Remove map-side combining from ShuffleMapTask.
This separation of concerns simplifies the 
ShuffleDependency and ShuffledRDD interfaces.

Map-side combining can be performed in a
mapPartitions() call prior to shuffling the RDD.

I don't anticipate this having much of a 
performance impact: in both approaches, each tuple
is hashed twice: once in the bucket partitioning
and once in the combiner's hashtable.  The same
steps are being performed, but in a different
order and through one extra Iterator.
2012-10-13 14:59:20 -07:00
Josh Rosen 10bcd217d2 Remove mapSideCombine field from Aggregator.
Instead, the presence or absense of a ShuffleDependency's aggregator
will control whether map-side combining is performed.
2012-10-13 14:59:20 -07:00
Josh Rosen 4775c55641 Change ShuffleFetcher to return an Iterator. 2012-10-13 14:59:20 -07:00
Josh Rosen 110832e88f Add helper methods to Aggregator. 2012-10-13 14:57:56 -07:00
Denny 0700d1920a Protect from null env variables in mesos. 2012-10-13 13:57:59 -07:00
Denny 21047d923e Protect from setting null environment variables. 2012-10-13 13:44:24 -07:00
Denny fa41d50f7d Don't use system envs for Mesos. 2012-10-13 13:15:50 -07:00
Denny 67c42a41d0 Let the user specify environment variables to be passed to the Executors.
Also removed unused variables in the ExecutorRunner.
2012-10-13 13:08:44 -07:00
Matei Zaharia b4067cbad4 More doc updates, and moved Serializer to a subpackage. 2012-10-12 18:19:21 -07:00
Matei Zaharia 8d7b77bcb5 Some doc and usability improvements:
- Added a StorageLevels class for easy access to StorageLevel constants
  in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Matei Zaharia 682b2d9329 Added a test for when an RDD only partially fits in memory 2012-10-12 14:58:26 -07:00
Matei Zaharia dca496bb77 Document cartesian() operation 2012-10-12 14:46:41 -07:00
Matei Zaharia 23015ccac0 Merge pull request #271 from shivaram/block-manager-npe-fix
Change block manager to accept a ArrayBuffer
2012-10-12 14:36:28 -07:00
Shivaram Venkataraman 8577523f37 Add test to verify if RDD is computed even if block manager has insufficient
memory
2012-10-12 14:14:57 -07:00
Patrick Wendell dc8adbd359 Adding Java documentation 2012-10-11 00:49:03 -07:00
Shivaram Venkataraman 2cf40c5fd5 Change block manager to accept a ArrayBuffer instead of an iterator to ensure
that the computation can proceed even if we run out of memory to cache the
block. Update CacheTracker to use this new interface
2012-10-11 00:42:46 -07:00
Denny d3f095f904 Fixed bug when fetching Jar dependencies.
Instead of checking currentFiles check currentJars.
2012-10-10 16:09:53 -07:00
Matei Zaharia ee2fcb2ce6 Added documentation to all the *RDDFunction classes, and moved them into
the spark package to make them more visible. Also documented various
other miscellaneous things in the API.
2012-10-09 18:38:36 -07:00
Matei Zaharia bc0bc672d0 Updates to documentation:
- Edited quick start and tuning guide to simplify them a little
- Simplified top menu bar
- Made private a SparkContext constructor parameter that was left as
  public
- Various small fixes
2012-10-09 14:30:23 -07:00
Andy Konwinski 1d79ff6028 Fixes a typo, adds scaladoc comments to SparkContext constructors. 2012-10-08 22:49:17 -07:00
Patrick Wendell ac310098ef More docs in RDD class 2012-10-08 22:25:11 -07:00
Andy Konwinski bd688940a1 A start on scaladoc for the public APIs. 2012-10-08 21:13:29 -07:00
Mosharaf Chowdhury edc67bfba8 Merge branch 'dev' into bc-fix-dev 2012-10-08 16:19:13 -07:00
Matei Zaharia efc5423210 Made compression configurable separately for shuffle, broadcast and RDDs 2012-10-07 11:30:53 -07:00
Matei Zaharia 039cc6228e Merge pull request #251 from JoshRosen/docs/internals
Document Dependency classes and make minor interface improvements
2012-10-07 09:56:53 -07:00
Reynold Xin f66c0e9561 Changed the println to logInfo in Utils.fetchFile. 2012-10-07 01:53:24 -07:00
Matei Zaharia d72db3d7dc Merge pull request #250 from rxin/dev
Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created incorrectly for local mode.
2012-10-07 00:56:53 -07:00
Reynold Xin 80f59e17e2 Fixed a bug in addFile that if the file is specified as "file:///", the
symlink is created wrong for local mode.
2012-10-07 00:54:38 -07:00
Josh Rosen e10308f5a0 Make ShuffleDependency.aggregator explicitly optional.
It was confusing to be using

    new Aggregator[K, V, V](null, null, null, false)

to represent the absence of an aggregator.
2012-10-07 00:36:04 -07:00
Matei Zaharia f930fe5d81 Improve error message 2012-10-07 07:34:36 +00:00
Matei Zaharia a3bf0ce57f Don't crash on ask timeout exceptions in deploy.Client.stop() (fixes a crash in tests) 2012-10-07 07:25:41 +00:00
Matei Zaharia eca570f66a Removed the need to sleep in tests due to waiting for Akka to shut down 2012-10-07 00:17:59 -07:00
Josh Rosen 4f72066a9a Document the Dependency classes. 2012-10-07 00:05:37 -07:00
Josh Rosen 3f2571fe98 Remove unused isShuffle field from Dependency. 2012-10-07 00:03:55 -07:00
Matei Zaharia b2fc3dd902 Log message 2012-10-07 06:43:52 +00:00
Matei Zaharia ea096f7cd5 More logging 2012-10-07 06:35:48 +00:00
root 554b42cb24 Log more info in MapOutputTracker 2012-10-07 05:02:18 +00:00
root a73b25826b Made Akka thread pool and message batch sizes configurable 2012-10-07 04:19:54 +00:00
root ce915cadee Made run script add test-classes onto the classpath only if SPARK_TESTING is set; fixes #216 2012-10-07 04:19:16 +00:00
root 975009d688 Avoid acquiring locks in BlockManager when fetching shuffle outputs 2012-10-07 04:02:10 +00:00
root 0bc63f7ef1 Log initial number of fetches in reducer 2012-10-07 03:51:04 +00:00
Matei Zaharia dc28a3ac0a Modified shuffle to limit the maximum outstanding data size in bytes,
instead of the maximum number of outstanding fetches. This should make
it faster when there are many small map output files, as well as more
robust to overallocating memory on large map outputs.
2012-10-06 20:07:10 -07:00
Matei Zaharia 9a3b3f32a3 Pass sizes of map outputs back to MapOutputTracker 2012-10-06 18:46:04 -07:00
Matei Zaharia 0e42832e6a Made block store return the size of each block put in 2012-10-06 18:00:53 -07:00
Matei Zaharia b0110de5b6 Warn about user programs that try to set spark.cache.class 2012-10-06 17:27:14 -07:00
Matei Zaharia 65113b7e1b Only group elements ten at a time into SequenceFile records in
saveAsObjectFile
2012-10-06 17:14:41 -07:00
Matei Zaharia 716e10ca32 Minor formatting fixes 2012-10-05 22:03:06 -07:00
Matei Zaharia 70f02fa912 Merge branch 'dev' of github.com:mesos/spark into dev 2012-10-05 22:00:22 -07:00
Andy Konwinski a242cdd0a6 Factor subclasses of RDD out of RDD.scala into their own classes
in the rdd package.
2012-10-05 19:53:54 -07:00
Andy Konwinski d7363a6b8a Moves all files in core/src/main/scala/ that have RDD in their name
from that directory to a new core/src/main/scala/rdd directory.
2012-10-05 19:23:45 -07:00
Andy Konwinski e0067da082 Moves all files in core/src/main/scala/ that have RDD in them from
package spark to package spark.rdd and updates all references to them.
2012-10-05 19:23:45 -07:00
Matei Zaharia 69588baf65 Cleaning up code slightly 2012-10-05 19:16:09 -07:00
root f52bc09a34 Reduce some overly aggressive logging in connection manager 2012-10-06 01:54:39 +00:00
Shivaram Venkataraman b6e4f46a96 Fix SizeEstimator tests to work with String classes in JDK 6 and 7
Conflicts:

	core/src/test/scala/spark/BoundedMemoryCacheSuite.scala
2012-10-05 16:58:57 -07:00
Matei Zaharia e3ae98b54e Merge pull request #247 from squito/dev
Dev
2012-10-05 10:27:18 -07:00
Imran Rashid e0698f8f26 change tests to show utility of localValue 2012-10-04 23:05:42 -07:00
Imran Rashid 82a3327862 make accumulator.localValue public, add tests
Conflicts:
	core/src/test/scala/spark/AccumulatorSuite.scala
2012-10-04 23:05:01 -07:00
Matei Zaharia 8c82f43db3 Scaladoc documentation for some core Spark functionality 2012-10-04 22:59:36 -07:00
Matei Zaharia c04aeaf365 Merge pull request #243 from andyk/mesos-from-maven-central
Removes the included mesos-0.9.0.jar and pulls it from Maven Central instead
2012-10-03 11:10:00 -07:00
Reynold Xin 45f4b7cc7e Made Serializer and JavaSerializer non private. 2012-10-03 10:20:59 -07:00
Andy Konwinski 5897567679 Removes the included mesos-0.9.0.jar and adds a libraryDependency to
the build file so that mesos-0.9.0-incubating.jar (which contains the
same class files, but has a silightly different name) will be pulled
down from Maven Central instead.
2012-10-03 08:58:05 -07:00
Matei Zaharia 833f1d0c86 Made StorageLevel public 2012-10-03 08:27:25 -07:00
Matei Zaharia 6cf5dffc72 Make more stuff private[spark] 2012-10-02 22:28:55 -07:00
Mosharaf Chowdhury 119e50c7b9 Conflict fixed 2012-10-02 22:25:39 -07:00
Matei Zaharia 626f701931 Merge pull request #240 from dennybritz/private_classes
Package-Private Classes
2012-10-02 21:24:32 -07:00
Denny 0361353a70 Make Java API abstract wrapped functions private 2012-10-02 20:02:53 -07:00
Denny b9badcd5bd accidentially removed trait 2012-10-02 19:35:07 -07:00
Denny 18a1faedf6 Stylistic changes and Public Accumulable and Broadcast 2012-10-02 19:28:37 -07:00
Denny b7a913e1fa Make dependency classes public - used by spark 2012-10-02 19:04:23 -07:00
Denny 4d9f4b01af Make classes package private 2012-10-02 19:00:19 -07:00
Matei Zaharia 97cbd699d7 Merge branch 'dev' of github.com:mesos/spark into dev 2012-10-02 17:31:01 -07:00
Matei Zaharia 5fda59ab99 Added a test for overly large blocks in memory store 2012-10-02 17:30:40 -07:00
Matei Zaharia 6098f7e87a Fixed cache replacement behavior of BlockManager:
- Partitions that get dropped to disk will now be loaded back into RAM
  after they're accessed again
- Same-RDD rule for cache replacement is now implemented (don't drop
  partitions from an RDD to make room for other partitions from itself)
- Items stored as MEMORY_AND_DISK go into memory only first, instead of
  being eagerly written out to disk
- MemoryStore.ensureFreeSpace is called within a lock on the writer
  thread to prevent race conditions (this can still be optimized to
  allow multiple concurrent calls to it but it's a start)
- MemoryStore does not accept blocks larger than its limit
2012-10-02 17:25:38 -07:00
Reynold Xin 7997585616 Added a check to make sure SPARK_MEM <= memoryPerSlave for local cluster
mode.
2012-10-02 15:45:25 -07:00
Reynold Xin 0898a21b95 Merge branch 'dev' of https://github.com/mesos/spark into dev 2012-10-02 13:08:01 -07:00
Matei Zaharia 22684653a5 Revert "Place Spray repo ahead of Cloudera in Maven search path"
This reverts commit 42e0a68082.
2012-10-02 12:01:32 -07:00