ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Imran Rashid	d98caa0fa0	Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala	2013-01-18 18:11:26 -08:00
Patrick Wendell	ee0314c3b3	Merge branch 'streaming' into streaming-java-api	2013-01-17 18:43:00 -08:00
Patrick Wendell	d5570c7968	Adding checkpointing to Java API	2013-01-17 18:41:58 -08:00
Matei Zaharia	54c0f9f185	Fix code that assumed spark.local.dir is only a single directory	2013-01-17 17:40:55 -08:00
Fernand Pajot	742bc841ad	changed HttpBroadcast server cache to be in spark.local.dir instead of java.io.tmpdir	2013-01-17 16:56:11 -08:00
Matei Zaharia	aff1844155	Merge pull request #381 from squito/remove_threadpool remove unused thread pool	2013-01-16 16:46:42 -08:00
Tathagata Das	f466ee44bc	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/MapOutputTracker.scala	2013-01-16 12:57:11 -08:00
Imran Rashid	eae698f755	remove unused thread pool	2013-01-16 12:21:37 -08:00
Tathagata Das	a805ac4a7c	Disabled checkpoint for PairwiseRDD (pySpark).	2013-01-16 10:55:26 -08:00
Matei Zaharia	4beb084f64	Merge pull request #374 from woggling/null-mapout Generate FetchFailedException even for cached missing map outputs	2013-01-15 14:22:29 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00
Stephen Haberman	74d3b23929	Add spark.executor.memory to differentiate executor memory from spark-shell memory.	2013-01-15 14:03:28 -06:00
Stephen Haberman	dd583b7ebf	Call executeOnCompleteCallbacks in a finally block.	2013-01-15 10:52:06 -06:00
Tathagata Das	eded21925a	Merge pull request #375 from tdas/streaming Important bug fixes	2013-01-14 23:06:40 -08:00
Charles Reiss	273fb5cc10	Throw FetchFailedException for cached missing locs	2013-01-14 15:26:48 -08:00
Tathagata Das	131be5d62e	Fixed bug in RDD checkpointing.	2013-01-14 03:28:25 -08:00
folone	25c0739bad	Moved to scala 2.10.0. Notable changes are: - akka 2.0.3 → 2.1.0 - spray 1.0-M1 → 1.1-M7 For now the repl subproject is commented out, as scala reflection api changed very much since the introduction of macros.	2013-01-14 09:52:11 +01:00
Tathagata Das	82b0cc90ca	Merge pull request #370 from tdas/streaming Added more documentation and minor change in API for NetworkReceiver	2013-01-13 21:28:12 -08:00
Tathagata Das	0dbd411a56	Added documentation for PairDStreamFunctions.	2013-01-13 21:08:35 -08:00
Matei Zaharia	cb867e9ffb	Merge branch 'master' of github.com:mesos/spark	2013-01-13 19:34:32 -08:00
Matei Zaharia	72408e8dfa	Make filter preserve partitioner info, since it can	2013-01-13 19:34:07 -08:00
Matei Zaharia	9a34409810	Merge pull request #360 from rxin/cogroup-java Changed CoGroupRDD's hash map from Scala to Java.	2013-01-13 15:31:08 -08:00
Reynold Xin	be7166146b	Removed the use of getOrElse to avoid Scala wrapper for every call.	2013-01-13 15:27:28 -08:00
Ryan LeCompte	c31931af7e	switch to uppercase constants	2013-01-13 10:39:47 -08:00
Ryan LeCompte	2305a2c1d9	more code cleanup	2013-01-13 10:01:56 -08:00
Matei Zaharia	fbb3fc4143	Merge pull request #346 from JoshRosen/python-api Python API (PySpark)	2013-01-12 23:49:36 -08:00
Ryan LeCompte	addff2c466	add comment	2013-01-12 09:57:29 -08:00
Ryan LeCompte	0cfea7a2ec	add unit test	2013-01-11 23:48:07 -08:00
Ryan LeCompte	ff10b3aa09	add missing return	2013-01-11 21:03:57 -08:00
Ryan LeCompte	22445fbea9	attempt to sleep for more accurate time period, minor cleanup	2013-01-11 13:30:49 -08:00
Tyson	1731f1fed4	Added an optional format parameter for individual job queries and optimized the jobId query	2013-01-11 15:01:43 -05:00
Tyson	c063e8777e	Added implicit json writers for JobDescription and ExecutorRunner	2013-01-11 14:57:38 -05:00
Stephen Haberman	5c7a127219	Pass a new Configuration that wraps the default hadoopConfiguration.	2013-01-11 11:25:11 -06:00
Stephen Haberman	3e6519a36e	Use hadoopConfiguration for default JobConf in PairRDDFunctions.	2013-01-11 11:24:20 -06:00
Matei Zaharia	2e914d9983	Formatting	2013-01-10 19:13:08 -08:00
Matei Zaharia	3548c9c0c8	Merge branch 'master' of github.com:mesos/spark	2013-01-10 19:06:40 -08:00
Matei Zaharia	6d1c230281	Merge pull request #357 from tysonjh/master JSON support added to WebUI	2013-01-10 19:06:07 -08:00
Matei Zaharia	248995c535	Merge pull request #356 from shane-huang/master Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections	2013-01-10 17:52:23 -08:00
Reynold Xin	bd336f5f40	Changed CoGroupRDD's hash map from Scala to Java.	2013-01-10 17:13:04 -08:00
Stephen Haberman	d1864052c5	Fix invalid asInstanceOf cast.	2013-01-10 12:16:26 -06:00
Stephen Haberman	b15e851279	Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables. For custom properties, use "spark.hadoop." as a prefix instead of just "hadoop.".	2013-01-10 10:55:41 -06:00
shane-huang	9930a95d21	Modified Patch according to comments	2013-01-10 20:09:55 +08:00
Stephen Haberman	e3861ae395	Provide and expose a default Hadoop Configuration. Any "hadoop.*" system properties will be passed along into configuration.	2013-01-09 17:08:14 -06:00
Tyson	549ee388a1	Removed io.spray spray-json dependency as it is not needed.	2013-01-09 15:12:23 -05:00
Tyson	bf9d9946f9	Query parameter reformatted to be more extensible and routing more robust	2013-01-09 11:29:58 -05:00
Tyson	0da2ff102e	Added url query parameter json and handler	2013-01-09 10:40:48 -05:00
Tyson	269fe018c7	JSON object definitions	2013-01-09 10:40:43 -05:00
Matei Zaharia	9cc764f523	Code style	2013-01-08 22:29:57 -08:00
Matei Zaharia	14972141f9	Merge pull request #344 from mbautin/log_preferred_hosts Log preferred hosts	2013-01-08 22:26:34 -08:00
Josh Rosen	b57dd0f160	Add mapPartitionsWithSplit() to PySpark.	2013-01-08 16:05:02 -08:00
Stephen Haberman	8ac0f35be4	Add JavaRDDLike.keyBy.	2013-01-08 09:57:45 -06:00
Stephen Haberman	4ee6b22775	Merge branch 'master' into tupleBy Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-01-08 09:10:10 -06:00
shane-huang	e4cb72da8a	Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections.	2013-01-08 22:40:58 +08:00
Mikhail Bautin	4725b0f643	Fixing if/else coding style for preferred hosts logging	2013-01-07 20:09:26 -08:00
Mikhail Bautin	c41042c816	Log preferred hosts	2013-01-07 20:06:09 -08:00
Matei Zaharia	f7cf035b9b	Merge pull request #350 from tdas/streaming Spark Streaming	2013-01-07 17:40:11 -08:00
Shivaram Venkataraman	77d751731c	Remove unused BoundedMemoryCache file and associated test case.	2013-01-07 15:57:46 -08:00
Shivaram Venkataraman	aed368a970	Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also fix SequenceFileRDDFunctions to pick the right type conversion across Hadoop versions	2013-01-07 15:57:33 -08:00
Shivaram Venkataraman	f8d579a0c0	Remove dependencies on sun jvm classes. Instead use reflection to infer HotSpot options and total physical memory size	2013-01-07 15:57:18 -08:00
Tathagata Das	3b0a3b89ac	Added better docs for RDDCheckpointData	2013-01-07 14:55:49 -08:00
Tathagata Das	237bac36e9	Renamed examples and added documentation.	2013-01-07 14:37:21 -08:00
Matei Zaharia	1941d9602d	Merge branch 'master' of github.com:mesos/spark	2013-01-07 16:50:39 -05:00
Matei Zaharia	9c32f300fb	Add Accumulable.setValue for easier use in Java	2013-01-07 16:50:23 -05:00
Tathagata Das	1346126485	Changed cleanup to clearOldValues for TimeStampedHashMap and TimeStampedHashSet.	2013-01-07 12:11:27 -08:00
Stephen Haberman	8dc06069fe	Rename RDD.tupleBy to keyBy.	2013-01-06 15:21:45 -06:00
Matei Zaharia	8fd3a70c18	Add PairRDD.keys() and values() to Java API	2013-01-05 22:46:45 -05:00
Matei Zaharia	b1663752c6	Merge pull request #351 from stephenh/values Add PairRDDFunctions.keys and values.	2013-01-05 19:15:54 -08:00
Matei Zaharia	0982572519	Add methods called just 'accumulator' for int/double in Java API	2013-01-05 22:11:28 -05:00
Matei Zaharia	86af64b0a6	Fix Accumulators in Java, and add a test for them	2013-01-05 20:55:17 -05:00
Matei Zaharia	ecf9c08901	Fix Accumulators in Java, and add a test for them	2013-01-05 20:54:08 -05:00
Stephen Haberman	1fdb6946b5	Add RDD.tupleBy.	2013-01-05 13:07:59 -06:00
Stephen Haberman	f4e6b9361f	Add RDD.collect(PartialFunction).	2013-01-05 12:14:08 -06:00
Stephen Haberman	8d57c78c83	Add PairRDDFunctions.keys and values.	2013-01-05 12:04:01 -06:00
Josh Rosen	33beba3965	Change PySpark RDD.take() to not call iterator().	2013-01-03 14:52:21 -08:00
Tathagata Das	d34dba25c2	Merge branch 'mesos' into dev-merge	2013-01-01 15:48:39 -08:00
Josh Rosen	b58340dbd9	Rename top-level 'pyspark' directory to 'python'	2013-01-01 15:05:00 -08:00
Josh Rosen	170e451fbd	Minor documentation and style fixes for PySpark.	2013-01-01 13:52:14 -08:00
Matei Zaharia	55809fbc6d	Merge pull request #349 from woggling/cache-finally Avoid stalls when computation of cached RDD throws exception	2013-01-01 08:21:33 -08:00
Charles Reiss	58072a7340	Remove some dead comments	2013-01-01 08:07:44 -08:00
Charles Reiss	feadaf72f4	Mark key as not loading in CacheTracker even when compute() fails	2013-01-01 07:57:20 -08:00
Josh Rosen	f803953998	Raise exception when hashing Java arrays (SPARK-597)	2012-12-31 20:20:11 -08:00
Tathagata Das	7e0271b438	Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.	2012-12-30 15:19:55 -08:00
Tathagata Das	9e644402c1	Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.	2012-12-29 18:31:51 -08:00
Josh Rosen	59195c68ec	Update PySpark for compatibility with TaskContext.	2012-12-29 16:01:03 -08:00
Josh Rosen	c5cee53f20	Merge remote-tracking branch 'origin/master' into python-api Conflicts: docs/quick-start.md	2012-12-29 16:00:51 -08:00
Josh Rosen	7ec3595de2	Fix bug (introduced by batching) in PySpark take()	2012-12-28 22:21:16 -08:00
Josh Rosen	397e67103c	Change Utils.fetchFile() warning to SparkException.	2012-12-28 17:37:13 -08:00
Josh Rosen	d64fa72d2e	Add addFile() and addJar() to JavaSparkContext.	2012-12-28 17:00:57 -08:00
Josh Rosen	bd237d4a9d	Add synchronization to LocalScheduler.updateDependencies().	2012-12-28 17:00:57 -08:00
Josh Rosen	f1bf4f0385	Skip deletion of files in clearFiles(). This fixes an issue where Spark could delete original files in the current working directory that were added to the job using addFile(). There was also the potential for addFile() to overwrite local files, which is addressed by changing Utils.fetchFile() to log a warning instead of overwriting a file with new contents. This is a short-term fix; a better long-term solution would be to remove the dependence on storing files in the current working directory, since we can't change the cwd from Java.	2012-12-28 17:00:57 -08:00
Josh Rosen	fbadb1cda5	Mark api.python classes as private; echo Java output to stderr.	2012-12-28 09:06:11 -08:00
Tathagata Das	0bc0a60d30	Modifications to make sure LocalScheduler terminates cleanly without errors when SparkContext is shutdown, to minimize spurious exceptions during master failure tests.	2012-12-27 15:37:33 -08:00
Tathagata Das	7c33f76291	Merge branch 'mesos' into dev-merge	2012-12-26 19:19:07 -08:00
Tathagata Das	836042bb9f	Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/scheduler/ResultTask.scala core/src/test/scala/spark/CheckpointSuite.scala	2012-12-26 19:09:01 -08:00
Josh Rosen	1dca0c5180	Remove debug output from PythonPartitioner.	2012-12-26 18:23:06 -08:00
Josh Rosen	4608902fb8	Use filesystem to collect RDDs in PySpark. Passing large volumes of data through Py4J seems to be slow. It appears to be faster to write the data to the local filesystem and read it back from Python.	2012-12-24 17:20:10 -08:00
Mark Hamstra	903f3518df	fall back to filter-map-collect when calling lookup() on an RDD without a partitioner	2012-12-24 13:18:45 -08:00
Mark Hamstra	61be8566e2	Allow distinct() to be called without parentheses when using the default number of splits.	2012-12-24 02:36:47 -08:00
Reynold Xin	60f7338092	Remove the call to close input stream in Kryo serializer.	2012-12-21 15:49:33 -08:00
Matei Zaharia	3334b7c6b5	Merge pull request #341 from rxin/4a3fb06ac2d11125feb08acbbd4df76d1e91b677 Kryo2 update against Spark master	2012-12-21 15:31:23 -08:00
Reynold Xin	eac566a7f4	Merge branch 'master' of github.com:mesos/spark into dev Conflicts: core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/PairRDDFunctions.scala core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerId.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala core/src/main/scala/spark/util/MetadataCleaner.scala core/src/main/scala/spark/util/TimeStampedHashMap.scala core/src/test/scala/spark/storage/BlockManagerSuite.scala run	2012-12-20 14:53:40 -08:00
Tathagata Das	8512dd3225	Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/test/scala/spark/CheckpointSuite.scala streaming/src/main/scala/spark/streaming/DStream.scala	2012-12-20 14:24:19 -08:00
Tathagata Das	fe777eb77d	Fixed bugs in CheckpointRDD and spark.CheckpointSuite.	2012-12-20 13:39:27 -08:00
Tathagata Das	f9c5b0a6fe	Changed checkpoint writing and reading process.	2012-12-20 11:52:23 -08:00
Matei Zaharia	5e51b889fe	Merge pull request #327 from rxin/spark-633 Added the ability in block manager to remove blocks.	2012-12-20 11:33:38 -08:00
Reynold Xin	9397c5014e	Let the slave notify the master block removal.	2012-12-20 01:37:09 -08:00
Reynold Xin	68c52d80ec	Moved BlockManager's IdGenerator into BlockManager object. Removed some excessive debug messages.	2012-12-19 15:27:23 -08:00
Tathagata Das	5184141936	Introduced getSplits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData.	2012-12-18 13:30:53 -08:00
Patrick Wendell	bfac06e1f6	SPARK-616: Logging dead workers in Web UI. This patch keeps track of which workers have died and marks them as such in the master web UI. It also handles workers which die and re-register using different actor ID's.	2012-12-17 23:09:05 -08:00
Tathagata Das	72eed2b95e	Converted CheckpointState in RDDCheckpointData to use scala Enumeration.	2012-12-17 18:52:43 -08:00
Matei Zaharia	b82a6dd2c7	Merge pull request #332 from JoshRosen/spark-607 Add try-finally to handle MapOutputTracker timeouts	2012-12-14 11:41:16 -08:00
Reynold Xin	06f855c24d	Merge branch 'spark-633' of github.com:rxin/spark into spark-633	2012-12-14 00:27:24 -08:00
Reynold Xin	8c01295b85	Fixed conflicts from merging Charles' and TD's block manager changes.	2012-12-14 00:26:36 -08:00
Charles Reiss	c528932a41	Code review cleanup.	2012-12-13 22:37:16 -08:00
Charles Reiss	0aad42b5e7	Have standalone cluster report exit codes to clients. Addresses SPARK-639.	2012-12-13 22:37:16 -08:00
Reynold Xin	97434f49b8	Merged TD's block manager refactoring.	2012-12-13 22:32:19 -08:00
Reynold Xin	41e58a519a	Merge branch 'master' of github.com:mesos/spark into spark-633	2012-12-13 22:06:47 -08:00
Josh Rosen	cf52d9cade	Add try-finally to handle MapOutputTracker timeouts.	2012-12-13 21:53:30 -08:00
Matei Zaharia	05e225f988	Merge pull request #329 from woggling/executor-status-codes Executor exit status codes	2012-12-13 20:14:10 -08:00
Charles Reiss	b054d3b222	ExecutorLostReason -> ExecutorLossReason	2012-12-13 18:44:07 -08:00
Charles Reiss	24d7aa2d15	Extra whitespace in ExecutorExitCode	2012-12-13 18:39:23 -08:00
Reynold Xin	dc7d7fc286	Merge branch 'master' of github.com:mesos/spark into spark-633	2012-12-13 16:48:34 -08:00
Reynold Xin	4f076e105e	SPARK-635: Pass a TaskContext object to compute() interface and use that to close Hadoop input stream. Incorporated Matei's comment.	2012-12-13 16:41:15 -08:00
Charles Reiss	829206f1a7	Explain slaveLost calls made by StandaloneSchedulerBackend	2012-12-13 16:23:36 -08:00
Charles Reiss	a4041dd87f	Log duplicate slaveLost() calls in ClusterScheduler.	2012-12-13 16:23:36 -08:00
Charles Reiss	fa9df4a45d	Normalize executor exit statuses and report them to the user.	2012-12-13 16:23:31 -08:00
Reynold Xin	eacb98e900	SPARK-635: Pass a TaskContext object to compute() interface and use that to close Hadoop input stream.	2012-12-13 15:41:53 -08:00
Josh Rosen	7c9e3d1c21	Return success or failure in BlockStore.remove().	2012-12-13 15:22:27 -08:00
Reynold Xin	1b7a0451ed	Added the ability in block manager to remove blocks.	2012-12-13 00:04:42 -08:00
Charles Reiss	1d8e2e6cff	Call slaveLost on executor death for standalone clusters.	2012-12-12 21:15:34 -08:00
Tathagata Das	8e74fac215	Made checkpoint data in RDDs optional to further reduce serialized size.	2012-12-11 15:36:12 -08:00
Tathagata Das	fa28f25619	Fixed bug in UnionRDD and CoGroupedRDD	2012-12-11 13:59:43 -08:00
Tathagata Das	746afc2e65	Bunch of bug fixes related to checkpointing in RDDs. RDDCheckpointData object is used to lock all serialization and dependency changes for checkpointing. ResultTask converted to Externalizable and serialized RDD is cached like ShuffleMapTask.	2012-12-10 23:36:37 -08:00
Reynold Xin	21b271f5bd	Suppress shuffle block updates when a slave node comes back.	2012-12-10 20:36:03 -08:00
Matei Zaharia	a1a2daa7ef	Merge pull request #317 from woggling/block-manager-heartbeat Implement block manager heartbeat	2012-12-10 11:03:55 -08:00
Charles Reiss	b6b62d774f	Decrease BlockManagerMaster logging verbosity	2012-12-10 00:31:55 -08:00
Charles Reiss	5d3e917d09	Use Akka scheduler for BlockManager heart beats. Adds required ActorSystem argument to BlockManager constructors.	2012-12-10 00:31:50 -08:00
Charles Reiss	b53dd28c90	Changed default block manager heartbeat interval to 5 s	2012-12-09 23:03:34 -08:00
Matei Zaharia	e1d7cd2276	Search for a non-loopback address in Utils.getLocalIpAddress	2012-12-08 00:33:11 -08:00
Patrick Wendell	3e796bdd57	Changes in response to TD's review.	2012-12-07 19:34:05 -08:00
Patrick Wendell	c36ca10241	Adding locality aware parallelize	2012-12-07 16:42:36 -08:00
Tathagata Das	1f3a75ae9e	Modified checkpoint testsuite to more comprehensively test checkpointing of various RDDs. Fixed checkpoint bug (splits referring to parent RDDs or parent splits) in UnionRDD and CoalescedRDD. Fixed bug in testing ShuffledRDD. Removed unnecessary and useless map-side combining step for narrow dependencies in CoGroupedRDD. Removed unnecessary WeakReference stuff from many other RDDs.	2012-12-07 13:45:52 -08:00
Charles Reiss	714c8d32d5	Don't divide milliseconds by 1000 more.	2012-12-06 18:38:34 -08:00
Charles Reiss	8f0819520c	map -> foreach	2012-12-06 18:29:50 -08:00
Charles Reiss	7a033fd795	Make LocalSparkCluster use distinct IPs	2012-12-06 00:03:08 -08:00
Charles Reiss	d21ca010ac	Add block manager heart beats. Renames old message called 'HeartBeat' to 'BlockUpdate'. The BlockManager periodically sends a heart beat message to the master. If the manager is currently not registered. The master responds to the heart beat by indicating whether the BlockManager is currently registered with the master. Additionally, the master now also responds to block updates by indicating whether the BlockManager in question is registered. When the BlockManager detects (by heart beat or failed block update) that it stopped being registered, it reregisters and sends block updates for all its blocks.	2012-12-05 23:35:20 -08:00
Charles Reiss	c9e54a6755	Track block managers by hostname; handle manager removal.	2012-12-05 23:35:20 -08:00
Charles Reiss	5afa2ee9e9	Actually put millis in _lastSeenMs	2012-12-05 23:35:20 -08:00
Charles Reiss	813ac71459	Don't use bogus port number in notifyADeadHost().	2012-12-05 23:35:20 -08:00
Tathagata Das	21a0852976	Refactored RDD checkpointing to minimize extra fields in RDD class.	2012-12-04 22:10:25 -08:00
Tathagata Das	a69a82be26	Added metadata cleaner to HttpBroadcast to clean up old broadcast files.	2012-12-03 22:37:31 -08:00
Josh Rosen	cdaa0fad51	Use external addresses in standalone WebUI on EC2.	2012-12-01 18:19:13 -08:00
Tathagata Das	b4dba55f78	Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.	2012-12-02 02:03:05 +00:00
Tathagata Das	477de94894	Minor modifications.	2012-12-01 13:15:06 -08:00
Tathagata Das	6fcd09f499	Added TimeStampedHashSet and used that to cleanup the list of registered RDD IDs in CacheTracker.	2012-11-29 02:06:33 -08:00
Tathagata Das	c9789751bf	Added metadata cleaner to BlockManager to remove old blocks completely.	2012-11-28 23:18:24 -08:00
Tathagata Das	9e9e9e1d89	Renamed CleanupTask to MetadataCleaner.	2012-11-28 18:48:14 -08:00
Tathagata Das	e463ae4920	Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing.	2012-11-28 14:05:01 -08:00
Tathagata Das	d5e7aad039	Bug fixes	2012-11-28 08:36:55 +00:00
Matei Zaharia	f86960cba9	Merge pull request #313 from rxin/pde_size_compress Added a partition preserving flag to MapPartitionsWithSplitRDD.	2012-11-27 22:39:25 -08:00
Matei Zaharia	3ebd8e1885	Added zip to Java API	2012-11-27 22:38:09 -08:00
Matei Zaharia	27e43abd19	Added a zip() operation for RDDs with the same shape (number of partitions and number of elements in each partition)	2012-11-27 22:27:47 -08:00
Matei Zaharia	f410a111ad	Merge branch 'master' of github.com:mesos/spark	2012-11-27 20:51:58 -08:00
Josh Rosen	7d71b9a56a	Fix NullPointerException caused by unregistered map outputs.	2012-11-27 20:51:51 -08:00
Matei Zaharia	935c468b71	Merge pull request #311 from woggling/map-output-npe Fix NullPointerException when map output unregistered from MapOutputTracker twice	2012-11-27 20:50:48 -08:00
Reynold Xin	bd6dd1a3a6	Added a partition preserving flag to MapPartitionsWithSplitRDD.	2012-11-27 19:43:30 -08:00
Reynold Xin	f24bfd2dd1	For size compression, compress non zero values into non zero values.	2012-11-27 19:20:45 -08:00
Charles Reiss	cf79de425d	Fix NullPointerException when unregistering a map output twice.	2012-11-27 16:12:05 -08:00
Tathagata Das	b18d70870a	Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata.	2012-11-27 15:08:49 -08:00
Tathagata Das	0fe2fc4d5e	Merged branch mesos/master to branch dev.	2012-11-26 13:16:59 -08:00
Tathagata Das	c97ebf6437	Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.	2012-11-19 23:22:07 +00:00
Matei Zaharia	3ff6f4bdee	Merge pull request #304 from mbautin/configurable_local_ip SPARK-624: make the default local IP customizable	2012-11-19 13:23:39 -08:00
mbautin	00f4e3ff9c	Addressing Matei's comment: SPARK_LOCAL_IP environment variable	2012-11-19 11:52:10 -08:00
Tathagata Das	10c1abcb6a	Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.	2012-11-17 17:27:00 -08:00
Charles Reiss	12c24e786c	Set default uncaught exception handler to exit. Among other things, should prevent OutOfMemoryErrors in some daemon threads (such as the network manager) from causing a spark executor to enter a state where it cannot make progress but does not report an error.	2012-11-16 20:12:31 -08:00
mbautin	1f5a7e0e64	SPARK-624: make the default local IP customizable	2012-11-15 13:57:47 -08:00
Matei Zaharia	c23a74df0a	Use DNS names instead of IP addresses in standalone mode, to allow matching with data locality hints from storage systems.	2012-11-15 00:10:52 -08:00
Tathagata Das	8a25d530ed	Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.	2012-11-13 02:16:28 -08:00
Denny	05e3807354	Merge branch 'master' into blockmanagerUI	2012-11-12 10:56:54 -08:00
Denny	4a1be7e0db	Refactor BlockManager UI and adding worker details.	2012-11-12 10:56:35 -08:00
Matei Zaharia	173e0354c0	Detect correctly when one has disconnected from a standalone cluster. SPARK-617 #resolve	2012-11-11 21:06:57 -08:00
Denny	68e0a88282	Merge branch 'master' into blockmanagerUI	2012-11-11 14:00:02 -08:00
Denny	b829fba749	Merge branch 'master' into blockmanagerUI Conflicts: core/src/main/twirl/spark/deploy/worker/index.scala.html	2012-11-11 13:59:40 -08:00
Tathagata Das	04e9e9d93c	Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations.	2012-11-11 08:54:21 -08:00
root	acf8272324	Fix K-means example a little	2012-11-10 23:07:21 -08:00
Tathagata Das	355c8e4b17	Fixed deadlock in BlockManager.	2012-11-09 16:28:45 -08:00
Tathagata Das	9915989bfa	Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks.	2012-11-09 15:46:15 -08:00
Tathagata Das	de00bc63db	Fixed deadlock in BlockManager. 1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks to use BlockInfo objects as per-block fine-grained locks. 2. Changed the MemoryStore lock structure by making the block putting threads lock on a different object (not the memory store) thus making sure putting threads minimally block the getting threads. 3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.	2012-11-09 14:09:37 -08:00
Matei Zaharia	6607f546cc	Added an option to spread out jobs in the standalone mode.	2012-11-08 23:13:12 -08:00
Matei Zaharia	66cbdee941	Fix for connections not being reused (from Josh Rosen)	2012-11-08 09:53:40 -08:00
Imran Rashid	809b2bb1fe	fix bug in getting slave id out of mesos	2012-11-08 00:34:28 -08:00
Matei Zaharia	bb1bce7924	Various fixes to standalone mode and web UI: - Don't report a job as finishing multiple times - Don't show state of workers as LOADING when they're running - Show start and finish times in web UI - Sort web UI tables by ID and time by default	2012-11-07 16:49:53 -08:00
Matei Zaharia	e2b8477487	Made Akka timeout and message frame size configurable, and upped the defaults	2012-11-06 15:58:05 -08:00
Tathagata Das	72b2303f99	Fixed major bugs in checkpointing.	2012-11-05 11:41:36 -08:00
Tathagata Das	d154238789	Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.	2012-11-04 12:12:06 -08:00
Shivaram Venkataraman	a7d967a1ca	Remove unnecessary hash-map put in MemoryStore	2012-11-01 10:46:38 -07:00
Tathagata Das	34e569f40e	Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added two tests to ensure that jobs running on an RDD on which checkpointing is in progress do not hurt the result of the job.	2012-10-31 00:56:40 -07:00
Tathagata Das	0dcd770fdc	Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.	2012-10-30 16:09:37 -07:00
Denny	ceec1a1a6a	Nicer storage level format on RDD page	2012-10-29 15:03:01 -07:00
Denny	eb95212f4d	code Formatting	2012-10-29 14:57:32 -07:00
Denny	531ac136bf	BlockManager UI.	2012-10-29 14:53:47 -07:00
Tathagata Das	ac12abc17f	Modified RDD API to make dependencies a var (therefore can be changed to checkpointed hadoop rdd) and other references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more).	2012-10-29 11:55:27 -07:00
Josh Rosen	2ccf3b6652	Fix PySpark hash partitioning bug. A Java array's hashCode is based on its object identity, not its elements, so this was causing serialized keys to be hashed incorrectly. This commit adds a PySpark-specific workaround and adds more tests.	2012-10-28 22:30:28 -07:00
root	e782187b4a	Don't throw an error in the block manager when a block is cached on the master due to a locally computed operation Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala	2012-10-26 00:33:45 -07:00
Matei Zaharia	863a55ae42	Merge remote-tracking branch 'public/master' into dev Conflicts: core/src/main/scala/spark/BlockStoreShuffleFetcher.scala core/src/main/scala/spark/KryoSerializer.scala core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/executor/Executor.scala core/src/main/scala/spark/network/Connection.scala core/src/main/scala/spark/network/ConnectionManagerTest.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/NewHadoopRDD.scala core/src/main/scala/spark/scheduler/ShuffleMapTask.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockMessage.scala core/src/main/scala/spark/storage/BlockStore.scala core/src/main/scala/spark/storage/StorageLevel.scala core/src/main/scala/spark/util/AkkaUtils.scala project/SparkBuild.scala run	2012-10-24 23:21:00 -07:00
Matei Zaharia	f63a40fd99	Strip leading mesos:// in URLs passed to Mesos	2012-10-24 21:52:13 -07:00
Matei Zaharia	d290e964ea	Merge pull request #281 from rxin/memreport Added a method to report slave memory status; force serialize accumulator update in local mode.	2012-10-23 22:04:35 -07:00
Matei Zaharia	0bd20c63e2	Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev Conflicts: core/src/main/scala/spark/Dependency.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala	2012-10-23 22:01:45 -07:00
Josh Rosen	d4f2e5b0ef	Remove PYTHONPATH from SparkContext's executorEnvs. It makes more sense to pass it in the dictionary of environment variables that is used to construct PythonRDD.	2012-10-22 10:28:59 -07:00
Josh Rosen	c23bf1aff4	Add PySpark README and run scripts.	2012-10-20 00:22:27 +00:00
Josh Rosen	52989c8a2c	Update Python API for v0.6.0 compatibility.	2012-10-19 10:24:49 -07:00
Josh Rosen	e21eb6e00d	Merge tag 'v0.6.0' into python-api	2012-10-19 09:44:32 -07:00
Thomas Dudziak	d9c2a89c57	Support for Hadoop 2 distributions such as cdh4	2012-10-18 16:08:54 -07:00
Reynold Xin	4a3fb06ac2	Updated Kryo to 2.20.	2012-10-16 01:10:01 -07:00
Reynold Xin	63fae9bc23	Serialize accumulator updates in TaskResult for local mode.	2012-10-15 21:38:28 -07:00
Reynold Xin	42d20fa8da	Added a method to report slave memory status.	2012-10-14 22:30:53 -07:00
Matei Zaharia	64dbf8d372	Made ShuffleDependency automatically find a shuffle ID for itself	2012-10-14 10:00:22 -07:00
Tathagata Das	e95ff45b53	Implemented checkpointing of StreamingContext and DStream graph.	2012-10-13 20:10:49 -07:00
Matei Zaharia	8815aeba0c	Take executor environment vars as an argument to SparkContext	2012-10-13 15:31:11 -07:00
Josh Rosen	33cd3a0c12	Remove map-side combining from ShuffleMapTask. This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice: once in the bucket partitioning and once in the combiner's hashtable. The same steps are being performed, but in a different order and through one extra Iterator.	2012-10-13 14:59:20 -07:00
Josh Rosen	10bcd217d2	Remove mapSideCombine field from Aggregator. Instead, the presence or absence of a ShuffleDependency's aggregator will control whether map-side combining is performed.	2012-10-13 14:59:20 -07:00
Josh Rosen	4775c55641	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00
Josh Rosen	110832e88f	Add helper methods to Aggregator.	2012-10-13 14:57:56 -07:00
Denny	0700d1920a	Protect from null env variables in mesos.	2012-10-13 13:57:59 -07:00
Denny	21047d923e	Protect from setting null environment variables.	2012-10-13 13:44:24 -07:00
Denny	fa41d50f7d	Don't use system envs for Mesos.	2012-10-13 13:15:50 -07:00
Denny	67c42a41d0	Let the user specify environment variables to be passed to the Executors. Also removed unused variables in the ExecutorRunner.	2012-10-13 13:08:44 -07:00
Matei Zaharia	b4067cbad4	More doc updates, and moved Serializer to a subpackage.	2012-10-12 18:19:21 -07:00
Matei Zaharia	8d7b77bcb5	Some doc and usability improvements: - Added a StorageLevels class for easy access to StorageLevel constants in Java - Added doc comments on Function classes in Java - Updated Accumulator and HadoopWriter docs slightly	2012-10-12 17:53:20 -07:00
Matei Zaharia	dca496bb77	Document cartesian() operation	2012-10-12 14:46:41 -07:00
Matei Zaharia	23015ccac0	Merge pull request #271 from shivaram/block-manager-npe-fix Change block manager to accept an ArrayBuffer	2012-10-12 14:36:28 -07:00
Patrick Wendell	dc8adbd359	Adding Java documentation	2012-10-11 00:49:03 -07:00
Shivaram Venkataraman	2cf40c5fd5	Change block manager to accept an ArrayBuffer instead of an iterator to ensure that the computation can proceed even if we run out of memory to cache the block. Update CacheTracker to use this new interface	2012-10-11 00:42:46 -07:00
Denny	d3f095f904	Fixed bug when fetching Jar dependencies. Instead of checking currentFiles check currentJars.	2012-10-10 16:09:53 -07:00
Matei Zaharia	ee2fcb2ce6	Added documentation to all the *RDDFunction classes, and moved them into the spark package to make them more visible. Also documented various other miscellaneous things in the API.	2012-10-09 18:38:36 -07:00
Matei Zaharia	bc0bc672d0	Updates to documentation: - Edited quick start and tuning guide to simplify them a little - Simplified top menu bar - Made private a SparkContext constructor parameter that was left as public - Various small fixes	2012-10-09 14:30:23 -07:00
Andy Konwinski	1d79ff6028	Fixes a typo, adds scaladoc comments to SparkContext constructors.	2012-10-08 22:49:17 -07:00
Patrick Wendell	ac310098ef	More docs in RDD class	2012-10-08 22:25:11 -07:00
Andy Konwinski	bd688940a1	A start on scaladoc for the public APIs.	2012-10-08 21:13:29 -07:00
Mosharaf Chowdhury	edc67bfba8	Merge branch 'dev' into bc-fix-dev	2012-10-08 16:19:13 -07:00
Matei Zaharia	efc5423210	Made compression configurable separately for shuffle, broadcast and RDDs	2012-10-07 11:30:53 -07:00
Matei Zaharia	039cc6228e	Merge pull request #251 from JoshRosen/docs/internals Document Dependency classes and make minor interface improvements	2012-10-07 09:56:53 -07:00
Reynold Xin	f66c0e9561	Changed the println to logInfo in Utils.fetchFile.	2012-10-07 01:53:24 -07:00
Matei Zaharia	d72db3d7dc	Merge pull request #250 from rxin/dev Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created incorrectly for local mode.	2012-10-07 00:56:53 -07:00
Reynold Xin	80f59e17e2	Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created wrong for local mode.	2012-10-07 00:54:38 -07:00
Josh Rosen	e10308f5a0	Make ShuffleDependency.aggregator explicitly optional. It was confusing to be using new Aggregator[K, V, V](null, null, null, false) to represent the absence of an aggregator.	2012-10-07 00:36:04 -07:00
Matei Zaharia	f930fe5d81	Improve error message	2012-10-07 07:34:36 +00:00
Matei Zaharia	a3bf0ce57f	Don't crash on ask timeout exceptions in deploy.Client.stop() (fixes a crash in tests)	2012-10-07 07:25:41 +00:00
Matei Zaharia	eca570f66a	Removed the need to sleep in tests due to waiting for Akka to shut down	2012-10-07 00:17:59 -07:00
Josh Rosen	4f72066a9a	Document the Dependency classes.	2012-10-07 00:05:37 -07:00
Josh Rosen	3f2571fe98	Remove unused isShuffle field from Dependency.	2012-10-07 00:03:55 -07:00
Matei Zaharia	b2fc3dd902	Log message	2012-10-07 06:43:52 +00:00
Matei Zaharia	ea096f7cd5	More logging	2012-10-07 06:35:48 +00:00
root	554b42cb24	Log more info in MapOutputTracker	2012-10-07 05:02:18 +00:00
root	a73b25826b	Made Akka thread pool and message batch sizes configurable	2012-10-07 04:19:54 +00:00
root	ce915cadee	Made run script add test-classes onto the classpath only if SPARK_TESTING is set; fixes #216	2012-10-07 04:19:16 +00:00
root	975009d688	Avoid acquiring locks in BlockManager when fetching shuffle outputs	2012-10-07 04:02:10 +00:00
root	0bc63f7ef1	Log initial number of fetches in reducer	2012-10-07 03:51:04 +00:00
Matei Zaharia	dc28a3ac0a	Modified shuffle to limit the maximum outstanding data size in bytes, instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.	2012-10-06 20:07:10 -07:00
Matei Zaharia	9a3b3f32a3	Pass sizes of map outputs back to MapOutputTracker	2012-10-06 18:46:04 -07:00
Matei Zaharia	0e42832e6a	Made block store return the size of each block put in	2012-10-06 18:00:53 -07:00
Matei Zaharia	b0110de5b6	Warn about user programs that try to set spark.cache.class	2012-10-06 17:27:14 -07:00
Matei Zaharia	65113b7e1b	Only group elements ten at a time into SequenceFile records in saveAsObjectFile	2012-10-06 17:14:41 -07:00
Matei Zaharia	716e10ca32	Minor formatting fixes	2012-10-05 22:03:06 -07:00
Matei Zaharia	70f02fa912	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-05 22:00:22 -07:00
Andy Konwinski	a242cdd0a6	Factor subclasses of RDD out of RDD.scala into their own classes in the rdd package.	2012-10-05 19:53:54 -07:00
Andy Konwinski	d7363a6b8a	Moves all files in core/src/main/scala/ that have RDD in their name from that directory to a new core/src/main/scala/rdd directory.	2012-10-05 19:23:45 -07:00
Andy Konwinski	e0067da082	Moves all files in core/src/main/scala/ that have RDD in them from package spark to package spark.rdd and updates all references to them.	2012-10-05 19:23:45 -07:00
Matei Zaharia	69588baf65	Cleaning up code slightly	2012-10-05 19:16:09 -07:00
root	f52bc09a34	Reduce some overly aggressive logging in connection manager	2012-10-06 01:54:39 +00:00
Matei Zaharia	e3ae98b54e	Merge pull request #247 from squito/dev Dev	2012-10-05 10:27:18 -07:00
Imran Rashid	e0698f8f26	change tests to show utility of localValue	2012-10-04 23:05:42 -07:00
Imran Rashid	82a3327862	make accumulator.localValue public, add tests Conflicts: core/src/test/scala/spark/AccumulatorSuite.scala	2012-10-04 23:05:01 -07:00
Matei Zaharia	8c82f43db3	Scaladoc documentation for some core Spark functionality	2012-10-04 22:59:36 -07:00
Reynold Xin	45f4b7cc7e	Made Serializer and JavaSerializer non private.	2012-10-03 10:20:59 -07:00
Matei Zaharia	833f1d0c86	Made StorageLevel public	2012-10-03 08:27:25 -07:00
Matei Zaharia	6cf5dffc72	Make more stuff private[spark]	2012-10-02 22:28:55 -07:00
Mosharaf Chowdhury	119e50c7b9	Conflict fixed	2012-10-02 22:25:39 -07:00
Matei Zaharia	626f701931	Merge pull request #240 from dennybritz/private_classes Package-Private Classes	2012-10-02 21:24:32 -07:00
Denny	0361353a70	Make Java API abstract wrapped functions private	2012-10-02 20:02:53 -07:00
Denny	b9badcd5bd	accidentally removed trait	2012-10-02 19:35:07 -07:00
Denny	18a1faedf6	Stylistic changes and Public Accumulable and Broadcast	2012-10-02 19:28:37 -07:00
Denny	b7a913e1fa	Make dependency classes public - used by spark	2012-10-02 19:04:23 -07:00
Denny	4d9f4b01af	Make classes package private	2012-10-02 19:00:19 -07:00
Matei Zaharia	97cbd699d7	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-02 17:31:01 -07:00
Matei Zaharia	6098f7e87a	Fixed cache replacement behavior of BlockManager: - Partitions that get dropped to disk will now be loaded back into RAM after they're accessed again - Same-RDD rule for cache replacement is now implemented (don't drop partitions from an RDD to make room for other partitions from itself) - Items stored as MEMORY_AND_DISK go into memory only first, instead of being eagerly written out to disk - MemoryStore.ensureFreeSpace is called within a lock on the writer thread to prevent race conditions (this can still be optimized to allow multiple concurrent calls to it but it's a start) - MemoryStore does not accept blocks larger than its limit	2012-10-02 17:25:38 -07:00
Reynold Xin	7997585616	Added a check to make sure SPARK_MEM <= memoryPerSlave for local cluster mode.	2012-10-02 15:45:25 -07:00
Reynold Xin	0898a21b95	Merge branch 'dev' of https://github.com/mesos/spark into dev	2012-10-02 13:08:01 -07:00
Matei Zaharia	22684653a5	Revert "Place Spray repo ahead of Cloudera in Maven search path" This reverts commit `42e0a68082`.	2012-10-02 12:01:32 -07:00
Reynold Xin	b8cd681169	Allow whitespaces in cluster URL configuration for local cluster.	2012-10-02 11:52:12 -07:00
Matei Zaharia	42e0a68082	Place Spray repo ahead of Cloudera in Maven search path	2012-10-02 11:37:19 -07:00
Matei Zaharia	b9fb8d6463	Include date in folder name for Spark local dir.	2012-10-01 15:55:16 -07:00
Matei Zaharia	bc881e4798	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-01 15:21:56 -07:00
Matei Zaharia	802aa8aef9	Some bug fixes and logging fixes for broadcast.	2012-10-01 15:20:42 -07:00
Reynold Xin	f264153162	Fixed #232: DirectBuffer's cleaner was empty and Spark tried to invoke clean on it.	2012-10-01 14:07:34 -07:00
Matei Zaharia	3b348f909d	Improve log messages from BlockManager	2012-10-01 12:01:38 -07:00
Matei Zaharia	53f90d0f0e	Use underscores instead of colons in RDD IDs	2012-10-01 10:48:53 -07:00
Matei Zaharia	2314132d57	Added a (failing) test for LRU with MEMORY_AND_DISK.	2012-09-30 22:52:16 -07:00
Matei Zaharia	3128c57f90	Simplified Class / ClassLoader test	2012-09-30 21:48:27 -07:00
Matei Zaharia	83143f9a5f	Fixed several bugs that caused weird behavior with files in spark-shell: - SizeEstimator was following through a ClassLoader field of Hadoop JobConfs, which referenced the whole interpreter, Scala compiler, etc. Chaos ensued, giving an estimated size in the tens of gigabytes. - Broadcast variables in local mode were only stored as MEMORY_ONLY and never made accessible over a server, so they fell out of the cache when they were deemed too large and couldn't be reloaded.	2012-09-30 21:19:39 -07:00
Matei Zaharia	fd0374b9de	Comment	2012-09-29 21:43:06 -07:00
Matei Zaharia	5718cef2a4	Removed Logging trait from CoalescedRDD since we don't log anything	2012-09-29 21:40:43 -07:00
Matei Zaharia	143ef4f90d	Added a CoalescedRDD class for reducing the number of partitions in an RDD.	2012-09-29 21:30:52 -07:00
Matei Zaharia	ebd52347b5	Merge branch 'dev' of github.com:mesos/spark into dev	2012-09-29 20:22:31 -07:00
Matei Zaharia	9b326d01e9	Made BlockManager unmap memory-mapped files when necessary to reduce the number of open files. Also optimized sending of disk-based blocks.	2012-09-29 20:21:54 -07:00
Matei Zaharia	2f11e3c285	Merge pull request #227 from JoshRosen/fix/distinct_numsplits Allow controlling number of splits in distinct().	2012-09-28 23:57:24 -07:00
Josh Rosen	8654165e69	Use null as dummy value in distinct().	2012-09-28 23:55:17 -07:00
Josh Rosen	37c199bbb0	Allow controlling number of splits in distinct().	2012-09-28 23:44:19 -07:00
Matei Zaharia	56dcad5936	Don't create a Cache in SparkEnv because we don't use it	2012-09-28 23:40:56 -07:00
Matei Zaharia	1d44644f4f	Logging tweaks	2012-09-28 23:28:16 -07:00
Matei Zaharia	815d6bd69a	Renamed subdirs option	2012-09-28 19:02:41 -07:00
Matei Zaharia	e54e1d7043	Made subdirs per local dir configurable, and reduced lock usage a bit	2012-09-28 19:00:50 -07:00
Matei Zaharia	ae8c7d6cfa	Made disk store use multiple directories, deleted ShuffleManager	2012-09-28 18:28:13 -07:00
Matei Zaharia	3d7267999d	Print and track user call sites in more places in Spark	2012-09-28 17:42:00 -07:00
Matei Zaharia	9f6efbf06a	Merge pull request #225 from pwendell/dev Log message which records RDD origin	2012-09-28 16:28:07 -07:00
Matei Zaharia	0121a26bd1	Changed the way tasks' dependency files are sent to workers so that custom serializers or Kryo registrators can be loaded.	2012-09-28 16:14:05 -07:00
Patrick Wendell	9fc78f8f29	Fixing some whitespace issues	2012-09-28 16:05:50 -07:00
Patrick Wendell	bc909c2903	Changes based on Matei's comments	2012-09-28 16:04:36 -07:00
Patrick Wendell	c387e40fb1	Log message which records RDD origin This adds tracking to determine the "origin" of an RDD. Origin is defined by the boundary between the user's code and the spark code, during an RDD's instantiation. It is meant to help users understand where a Spark RDD is coming from in their code. This patch also logs origin data when stages are submitted to the scheduler. Finally, it adds a new log message to fix an inconsistency in the way that dependent stages (those missing parents) and independent stages (those without) are logged during submission.	2012-09-28 15:51:46 -07:00
Matei Zaharia	2a8bfbca00	Fixed a bug where isLocal was set to false when using local[K]	2012-09-28 14:50:54 -07:00
Matei Zaharia	4a138403ef	Fix a bug in JAR fetcher that made it always fetch the JAR	2012-09-27 21:32:06 -07:00
Matei Zaharia	009b0e37e7	Added an option to compress blocks in the block store	2012-09-27 18:45:44 -07:00
Matei Zaharia	7bcb08cef5	Renamed storage levels to something cleaner; fixes #223.	2012-09-27 17:50:59 -07:00
Matei Zaharia	920fab23c3	Merge pull request #222 from rxin/dev Added MapPartitionsWithSplitRDD.	2012-09-26 23:16:45 -07:00
Matei Zaharia	ea05fc130b	Updates to standalone cluster, web UI and deploy docs.	2012-09-26 22:54:39 -07:00
Matei Zaharia	1ef4f0fbd2	Allow controlling number of splits in sortByKey.	2012-09-26 19:18:47 -07:00
Reynold Xin	1ad1331a34	Added MapPartitionsWithSplitRDD.	2012-09-26 17:11:28 -07:00
Matei Zaharia	ee71fa49c1	Look for Kryo registrator using context class loader	2012-09-26 14:15:16 -07:00
Matei Zaharia	d71a358c46	Fixed a test that was getting extremely lucky before, and increased the number of samples used for sorting	2012-09-26 00:25:34 -07:00
Matei Zaharia	051785c7e6	Several fixes to sampling issues pointed out by Henry Milner: - takeSample was biased towards earlier partitions - There were some range errors in takeSample - SampledRDDs with replacement didn't produce appropriate counts across partitions (we took exactly frac of each one)	2012-09-25 21:46:58 -07:00
Matei Zaharia	4d3339a3ec	Merge pull request #217 from rxin/dev Added a method to RDD to expose the ClassManifest.	2012-09-24 23:52:32 -07:00
Reynold Xin	7a4cd92861	Renamed RDD.manifest to RDD.elementClassManifest	2012-09-24 23:42:33 -07:00
Matei Zaharia	296e24b440	Merge pull request #218 from rnpandya/dev Scripts to start Spark under windows	2012-09-24 21:10:31 -07:00
Reynold Xin	348bcbca1f	Added a method to RDD to expose the ClassManifest.	2012-09-24 16:56:27 -07:00
Ravi Pandya	39215357af	Windows command scripts for sbt and run	2012-09-24 15:43:19 -07:00
Matei Zaharia	6eeb379cf8	Fix some test issues	2012-09-24 15:39:58 -07:00
Matei Zaharia	f855e4fad2	Merge pull request #208 from rxin/dev Separated ShuffledRDD into multiple classes.	2012-09-24 12:32:01 -07:00
root	107a5ca879	Make default number of parallel fetches slightly smaller since it doesn't seem to hurt performance much and it will cause slightly less GC.	2012-09-23 06:06:12 +00:00
root	e41cab04ca	Avoid creating an extra buffer when saving a stream of values as DISK_ONLY	2012-09-23 05:56:44 +00:00
Denny	afb7ccc838	HTTP File server fixes.	2012-09-21 10:58:13 -07:00
root	6d28dde370	Rename our toIterator method into asIterator to prevent confusion with the Scala collection one, which often copies a collection.	2012-09-21 06:02:55 +00:00
root	a642051ade	Fixed a performance bug in BlockManager that was creating garbage when returning deserialized, in-memory RDDs.	2012-09-21 05:42:21 +00:00
root	8feb5caacd	Fixed an issue with ordering of classloader setup that was causing Java deserializer to break	2012-09-21 05:13:19 +00:00
Reynold Xin	6b5980da79	Set a limited number of retry in standalone deploy mode.	2012-09-19 15:41:56 -07:00
Reynold Xin	397d3816e1	Separated ShuffledRDD into multiple classes: RepartitionShuffledRDD, ShuffledSortedRDD, and ShuffledAggregatedRDD.	2012-09-19 12:31:45 -07:00
Denny	ca64d16a2d	When a file is downloaded, make it executable. That's necessary for scripts (e.g. in Shark)	2012-09-17 10:08:37 -07:00
Matei Zaharia	840cbcf849	Change default serializer to Java.. it had accidentally become Kryo.	2012-09-13 17:19:26 -07:00
Matei Zaharia	b4dfa25c8a	Store shuffle map outputs as DISK_ONLY	2012-09-12 16:05:57 -07:00
Matei Zaharia	2d761e3353	Ported performance and FT improvements from latest streaming work	2012-09-12 14:54:40 -07:00
Matei Zaharia	9b4cd1648b	Fix bugs with Connection's shutdown callback failing to get its address	2012-09-12 14:54:14 -07:00
Matei Zaharia	9199775d41	Wait for Akka to really shut down in SparkEnv.stop()	2012-09-12 14:50:37 -07:00
Denny	5e4076e3f2	Merge branch 'dev' into feature/fileserver Conflicts: core/src/main/scala/spark/SparkContext.scala	2012-09-11 16:57:17 -07:00
Denny	77873d2c8e	Formatting	2012-09-11 16:51:46 -07:00
Denny	24b9b37314	Subclass URLClassLoader instead of using reflection	2012-09-11 16:51:08 -07:00
Denny	31c53e917d	Use stageId as index for fileSet caches.	2012-09-11 16:10:45 -07:00
Matei Zaharia	943df48348	Merge branch 'dev' of github.com:mesos/spark into dev	2012-09-11 16:00:37 -07:00
Matei Zaharia	6d7f907e73	Manually merge pull request #175 by Imran Rashid	2012-09-11 16:00:06 -07:00
Reynold Xin	7af7c79ce5	Updated the logError call from the previous commit to conform to logError API.	2012-09-11 14:32:24 -07:00
Reynold Xin	38b9119c96	Log entire exception (including stack trace) in BlockManagerWorker.	2012-09-11 11:31:35 -07:00
Denny	4d3471dd07	Fix serialization bugs and added local cluster tests	2012-09-10 15:39:58 -07:00
Tathagata Das	c63a606458	Made NewHadoopRDD broadcast its job configuration (same as HadoopRDD).	2012-09-10 19:51:27 +00:00
Denny	b864c36a30	Dynamically adding jar files and caching fileSets.	2012-09-10 12:49:09 -07:00
Denny	f275fb07da	General FileServer A general fileserver for both JARs and regular files.	2012-09-10 12:48:59 -07:00
Matei Zaharia	a13780670d	Added a unit test for local-cluster mode and simplified some of the code involved in that	2012-09-10 12:48:58 -07:00
Denny	f2ac55840c	Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend	2012-09-10 12:48:58 -07:00
Denny	9ead8ab14e	Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner	2012-09-10 12:48:58 -07:00
Denny	8bb3c73977	Renamed spark-cluster to spark-local.	2012-09-10 12:48:58 -07:00
Denny	a367c20f49	Fix wrong counting	2012-09-10 12:48:57 -07:00
Denny	93fe331e6d	Delete old DeployUtils.	2012-09-10 12:48:57 -07:00
Denny	cf074f9c96	Renamed class.	2012-09-10 12:48:57 -07:00
Denny	3749f94184	Start a standalone cluster locally.	2012-09-10 12:48:57 -07:00
Matei Zaharia	995982b3c9	Added a unit test for local-cluster mode and simplified some of the code involved in that	2012-09-07 17:08:36 -07:00
Matei Zaharia	8d2fcc2832	Merge pull request #189 from dennybritz/feature/localcluster Simulating a Spark standalone cluster locally	2012-09-07 15:43:43 -07:00
Denny	7ff9311add	Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend	2012-09-07 14:09:12 -07:00
Denny	4e7b264cf7	Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner	2012-09-07 11:39:44 -07:00
haoyuan	db08a362aa	commit opt for grep scalability test.	2012-09-07 02:17:52 +00:00
root	c2da64409a	Randomize the order of block fetches in getMultiple	2012-09-06 23:16:26 +00:00
root	9ef90c95f4	Bug fix	2012-09-06 00:43:46 +00:00
root	2fa6d999fd	Tuning Akka more	2012-09-06 00:16:39 +00:00
Denny	886183e591	Renamed spark-cluster to spark-local.	2012-09-05 17:10:54 -07:00
root	215544820f	Serialize map output locations more efficiently, and only once, in MapOutputTracker	2012-09-05 23:54:04 +00:00
root	dc68febdce	Use Spark's closure serializer for the ShuffleMapTask cache	2012-09-05 23:06:59 +00:00
Reynold Xin	c308fbcb79	Removed cache add/remove log messages from CacheTracker. Added log messages on BlockManagerMaster to reflect block add/remove. Also did some minor cleanup of storage package code.	2012-09-05 15:59:48 -07:00
root	ed937a821f	Merge branch 'dev' of github.com:radlab/spark into dev	2012-09-05 22:26:49 +00:00
root	1d6b36d3c3	Further tuning for network performance	2012-09-05 22:26:37 +00:00
root	3fa0d7f0c9	Serialize BlockRDD more efficiently	2012-09-05 08:28:15 +00:00
root	4a5d0d249e	Merge branch 'dev' of github.com:radlab/spark into dev	2012-09-05 08:23:09 +00:00
root	efc7668d16	Allow serializing HttpBroadcast through Kryo	2012-09-05 08:22:57 +00:00
root	75487b2f5a	Broadcast the JobConf in HadoopRDD to reduce task sizes	2012-09-05 08:14:50 +00:00
root	b7ad291ac5	Tuning Akka for more connections	2012-09-05 07:08:07 +00:00
root	fc186dc18a	Merge branch 'dev' of github.com:radlab/spark into dev	2012-09-05 05:53:18 +00:00
root	4ea032a142	Some changes to make important log output visible even if we set the logging to WARNING	2012-09-05 05:53:07 +00:00
Denny	babbca0a2f	Fix wrong counting	2012-09-04 22:04:18 -07:00
Denny	9326509f66	Delete old DeployUtils.	2012-09-04 21:15:23 -07:00
Denny	1588d4dbe6	Renamed class.	2012-09-04 21:13:25 -07:00
Denny	22dde6e020	Start a standalone cluster locally.	2012-09-04 20:56:30 -07:00
Tathagata Das	7c09ad0e04	Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage.	2012-09-04 19:11:49 -07:00
Matei Zaharia	a842c63044	Minor formatting fixes	2012-09-03 16:24:00 -07:00
Tathagata Das	b8e9e8ea78	Merge branch 'dev' of github.com:radlab/spark into dev	2012-09-02 02:35:32 -07:00
root	ceabf71257	tweaks	2012-09-01 21:52:42 +00:00
root	6025889be0	More raw network receiver programs	2012-09-01 20:51:07 +00:00
Harvey	3076b038f4	Start fetching a remote block when a received remote block has been passed to the reduce function	2012-09-01 12:01:35 -07:00
Matei Zaharia	f84d2bbe55	Bug fixes to RateLimitedOutputStream	2012-09-01 00:31:15 -07:00
Matei Zaharia	44758aa8e2	First work towards a RawInputDStream and a sender program for it.	2012-09-01 00:17:59 -07:00
root	c42e7ac282	More block manager fixes	2012-09-01 04:31:11 +00:00
Matei Zaharia	389fb4cc54	End runJob() with a SparkException when a task fails too many times in one of the cluster schedulers.	2012-08-31 17:47:43 -07:00
root	113277549c	Really fixed the replication-3 issue. The problem was a few buffers not being rewound.	2012-08-31 05:39:35 +00:00
Mosharaf Chowdhury	31ffe8d528	Synchronization bug fix in broadcast implementations	2012-08-30 22:26:43 -07:00
Matei Zaharia	101ae493e2	Replicate serialized blocks properly, without sharing a ByteBuffer.	2012-08-30 22:24:14 -07:00
Mosharaf Chowdhury	3883532545	Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations.	2012-08-30 21:43:00 -07:00
Matei Zaharia	a480dec6b2	Deserialize multi-get results in the caller's thread. This fixes an issue with shared buffers in the KryoSerializer.	2012-08-30 20:01:06 -07:00
Matei Zaharia	1b3e3352eb	Deserialize multi-get results in the caller's thread. This fixes an issue with shared buffers with the KryoSerializer.	2012-08-30 17:59:25 -07:00
root	c4366eb764	Fixes to ShuffleFetcher	2012-08-31 00:34:24 +00:00
Reynold Xin	5945bcdcc5	Added a new flag in Aggregator to indicate applying map side combiners.	2012-08-29 23:32:08 -07:00
Reynold Xin	c68e820b2a	Merge branch 'dev' of github.com:mesos/spark into dev	2012-08-29 23:01:19 -07:00
Reynold Xin	940869dfda	Disable running combiners on map tasks when mergeCombiners function is not specified by the user.	2012-08-29 23:00:02 -07:00
Tathagata Das	4db3a96766	Made minor changes to reduce compilation errors in Eclipse. Twirl stuff still does not compile in Eclipse.	2012-08-29 13:04:01 -07:00
Matei Zaharia	bf2e9cb08e	Fault tolerance and block store fixes discovered through streaming tests.	2012-08-27 23:07:50 -07:00
Matei Zaharia	17af2df0cd	Log levels	2012-08-27 23:07:32 -07:00
Matei Zaharia	b4a2214218	More fault tolerance fixes to catch lost tasks	2012-08-27 22:49:29 -07:00
Reynold Xin	3a6a95dc24	Removed the deserialization cache for ShuffleMapTask because it was causing concurrency problems (some variables in Shark get set to null). The cost of task deserialization on slaves is trivial compared with the execution time of the task anyway.	2012-08-27 22:33:15 -07:00
Josh Rosen	bff6a46359	Add pipe(), saveAsTextFile(), sc.union() to Python API.	2012-08-27 00:24:47 -07:00
Josh Rosen	200d248dcc	Simplify Python worker; pipeline the map step of partitionBy().	2012-08-27 00:24:39 -07:00
Josh Rosen	f79a1e4d2a	Add broadcast variables to Python API.	2012-08-27 00:16:47 -07:00
Matei Zaharia	b914cd0dfa	Serialize generation correctly in ShuffleMapTask	2012-08-26 20:07:59 -07:00
Matei Zaharia	69c2ab0408	logging	2012-08-26 20:00:58 -07:00
Matei Zaharia	117e3f8c86	Fix a bug that was causing FetchFailedException not to be thrown	2012-08-26 19:52:56 -07:00
Matei Zaharia	3c9c44a8d3	More helpful log messages	2012-08-26 19:37:43 -07:00
Matei Zaharia	26dfd20c9a	Detect disconnected slaves in StandaloneScheduler	2012-08-26 18:56:56 -07:00
Matei Zaharia	29e83f39e9	Fix replication with MEMORY_ONLY_DESER_2	2012-08-26 18:16:25 -07:00
Matei Zaharia	06ef7c3d1b	Less debug info	2012-08-26 16:29:20 -07:00
Matei Zaharia	741899b21e	Fix sendMessageReliablySync	2012-08-26 16:26:06 -07:00
Matei Zaharia	5a8015d2db	Merge remote-tracking branch 'public/dev' into dev	2012-08-24 16:11:44 -07:00
Matei Zaharia	deedb9e7b7	Fix further issues with tests and broadcast. The broadcast fix is to store values as MEMORY_ONLY_DESER instead of MEMORY_ONLY, which will save substantial time on serialization.	2012-08-23 20:31:49 -07:00
Matei Zaharia	59b831b9d1	Fixed test failures due to broadcast not stopping correctly	2012-08-23 19:59:55 -07:00
Matei Zaharia	7310a6f499	Merge pull request #147 from mosharaf/dev Broadcast refactoring/cleaning up	2012-08-23 19:38:28 -07:00
Josh Rosen	607b53abfc	Use numpy in Python k-means example.	2012-08-22 00:43:55 -07:00
Josh Rosen	fd94e5443c	Use only cPickle for serialization in Python API. Objects serialized with JSON can be compared for equality, but JSON can be slow to serialize and only supports a limited range of data types.	2012-08-21 14:01:27 -07:00
Matei Zaharia	25a6a39e6d	Added other SparkContext constructors to JavaSparkContext	2012-08-19 18:59:16 -07:00
Josh Rosen	886b39de55	Add Python API.	2012-08-18 22:33:51 -07:00
Shivaram Venkataraman	1ea269110c	Move object size and pointer size initialization into a function to enable unit-testing	2012-08-13 13:31:45 -07:00
Shivaram Venkataraman	44661df9cc	If spark.test.useCompressedOops is set, use that to infer compressed oops setting. This is useful to get a deterministic test case	2012-08-13 13:31:39 -07:00
Shivaram Venkataraman	0dd8fe73ba	Use HotSpotDiagnosticMXBean to get if CompressedOops are in use or not	2012-08-13 13:31:29 -07:00
Shivaram Venkataraman	80104ce1da	Add link to Java wiki which specifies what changes with compressed oops	2012-08-13 13:31:21 -07:00
Shivaram Venkataraman	00ab5490b3	Changes to make size estimator more accurate. Fixes object size, pointer size according to architecture and also aligns objects and arrays when computing instance sizes. Verified using Eclipse Memory Analysis Tool (MAT)	2012-08-13 13:31:11 -07:00
Matei Zaharia	6ae3c375a9	Renamed apply() to call() in Java API and allowed it to throw Exceptions	2012-08-12 23:10:19 +02:00
Matei Zaharia	0141879c40	Use Promises instead of having a Future wait on a thread in ConnectionManager.	2012-08-12 22:16:32 +02:00
Matei Zaharia	845a870242	Return remotely fetched blocks in a pipelined fashion from BlockManager	2012-08-12 20:01:38 +02:00
Matei Zaharia	e17ed9a21d	Switch to Akka futures in connection manager. It's still not good because each Future ends up waiting on a lock, but it seems to work better than Scala Actors, and more importantly it allows us to use onComplete and other listeners on futures.	2012-08-12 19:40:37 +02:00
Matei Zaharia	ad8a7612a4	Changed multi-get method in BlockManager to return an iterator	2012-08-12 19:18:01 +02:00
Matei Zaharia	3c94e5c188	Merge pull request #168 from shivaram/dev Use JavaConversion to get a scala iterator	2012-08-10 00:57:33 -07:00
Matei Zaharia	e463e7a333	Merge pull request #167 from JoshRosen/piped-rdd-fixes Detect non-zero exit status from PipedRDD process	2012-08-10 00:56:42 -07:00
Josh Rosen	59c22fb444	Print exit status in PipedRDD failure exception.	2012-08-10 00:33:56 -07:00
Shivaram Venkataraman	1803cce692	Use an implicit conversion to get the scala iterator	2012-08-08 14:31:04 -07:00
Shivaram Venkataraman	674fcf56bf	Use JavaConversion to get a scala iterator	2012-08-08 14:10:23 -07:00
Shivaram Venkataraman	f4aaec7a48	Avoid a copy in ShuffleMapTask by creating an iterator that will be used by the block manager.	2012-08-08 00:47:02 -07:00
Mosharaf Chowdhury	d821dd3ccc	BroadcastManager is a class now (replaced Broadcast object)	2012-08-05 01:10:51 -07:00
Mosharaf Chowdhury	b4804119f9	Merge remote-tracking branch 'upstream/dev' into dev	2012-08-04 20:42:12 -07:00
Matei Zaharia	88b016db2a	Merge pull request #160 from dennybritz/clusterscripts Standalone cluster scripts	2012-08-04 17:45:20 -07:00
Mosharaf Chowdhury	1b0534af8f	Merge branch 'dev' into bc-bm	2012-08-04 00:30:08 -07:00
Mosharaf Chowdhury	d11b457e67	Merge remote-tracking branch 'upstream/dev' into dev	2012-08-04 00:28:10 -07:00
Mosharaf Chowdhury	24b7eb872c	Bug fixed. Broadcast now works with BlockManager.	2012-08-04 00:27:28 -07:00
Matei Zaharia	6601a6212b	Added a unit test for cross-partition balancing in sort, and changes to RangePartitioner to make it pass. It turns out that the first partition was always kind of small due to how we picked partition boundaries.	2012-08-03 16:40:45 -04:00
Harvey	1170de3757	Fix for partitioning when sorting in descending order	2012-08-03 16:40:38 -04:00
Paul Cavallaro	d05c0f97ca	Logging Throwables in Info and Debug Logging Throwables in logInfo and logDebug instead of swallowing them. Conflicts: core/src/main/scala/spark/Logging.scala	2012-08-03 16:40:21 -04:00
Denny	0008994044	merged dev branch	2012-08-02 16:00:33 -07:00
Denny	53008c2d8a	Settings variables and bugfix for stop script.	2012-08-02 15:59:39 -07:00
Matei Zaharia	71a958b0b7	Merge branch 'dev' of github.com:mesos/spark into dev Conflicts: project/SparkBuild.scala	2012-08-02 17:23:13 -04:00
Denny	7312a5c30f	Use spray's implicit Marshaller for Futures.	2012-08-02 14:11:27 -07:00
Denny	ba7e30fb5e	Mostly stylistic changes.	2012-08-02 13:55:09 -07:00
Shivaram Venkataraman	1a07bb9ba4	Avoid an extra partition copy by passing an iterator to blockManager.put	2012-08-02 12:22:33 -07:00
Shivaram Venkataraman	6790908b11	Use maxMemory to better estimate memory available for BlockManager cache	2012-08-02 12:05:05 -07:00
Denny	863c31b7c1	Moved resources into static folder	2012-08-02 09:48:36 -07:00
Tathagata Das	1c0aeee960	Merge branch 'dev' of github.com:radlab/spark into dev	2012-08-01 22:11:41 -07:00
Tathagata Das	3be54c2a8a	1. Refactored SparkStreamContext, Scheduler, InputRDS, FileInputRDS and a few other files. 2. Modified Time class to represent milliseconds (long) directly, instead of LongTime. 3. Added new files QueueInputRDS, RecurringTimer, etc. 4. Added RDDSuite as the skeleton for testcases. 5. Added two examples in spark.streaming.examples. 6. Removed all past examples and a few unnecessary files. Moved a number of files to spark.streaming.util.	2012-08-01 22:09:27 -07:00
Denny	0ee44c225e	Spark standalone mode cluster scripts. Heavily inspired by Hadoop cluster scripts ;-)	2012-08-01 20:38:52 -07:00
Denny	6c670c37dd	Webui improvements.	2012-08-01 19:47:57 -07:00
Denny	1b29e90a79	merge dev branch	2012-08-01 14:06:09 -07:00
Denny	011220fa55	Compact job page.	2012-08-01 11:26:45 -07:00
Denny	7a295fee96	Spark WebUI Implementation.	2012-08-01 11:01:09 -07:00
Mosharaf Chowdhury	f23395e8c5	Merge remote-tracking branch 'upstream/dev' into dev	2012-07-30 19:39:49 -07:00
Matei Zaharia	7814ecbd47	Merge remote-tracking branch 'public/dev' into dev	2012-07-30 15:05:49 -07:00
Matei Zaharia	3ee2530c0c	Merge branch 'block-manager-fix' into dev	2012-07-30 13:58:46 -07:00
Matei Zaharia	400221f851	Merge branch 'dev' of git://github.com/tdas/spark into dev	2012-07-30 13:54:57 -07:00
Matei Zaharia	ed1b0f8388	Made BlockManagerMaster no longer be a singleton. Also cleaned up a few formatting things throughout block manager code.	2012-07-30 13:53:47 -07:00
Matei Zaharia	f471c82558	Various reorganization and formatting fixes	2012-07-30 11:24:01 -07:00
Mosharaf Chowdhury	5932a87cac	Merge remote-tracking branch 'upstream/dev' into dev	2012-07-29 18:20:45 -07:00
Matei Zaharia	e2e71a1fb5	Merge remote-tracking branch 'public/dev' into dev	2012-07-28 20:26:59 -07:00
Imran Rashid	f7149c5e46	tasks cannot access value of accumulator	2012-07-28 20:16:17 -07:00
Imran Rashid	244cbbe33a	one more minor cleanup to scaladoc	2012-07-28 20:16:10 -07:00
Imran Rashid	3b392c67db	fix up scaladoc, naming of type parameters	2012-07-28 20:16:01 -07:00
Imran Rashid	f1face1ea9	rename addToAccum to addAccumulator	2012-07-28 20:16:01 -07:00
Imran Rashid	2d666b9d76	add some functionality to Vector, delete copy in AccumulatorSuite	2012-07-28 20:15:51 -07:00
Imran Rashid	edc6972f8e	move Vector class into core and spark.util package	2012-07-28 20:15:42 -07:00
Imran Rashid	83659af11c	Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=) Conflicts: core/src/main/scala/spark/Accumulators.scala	2012-07-28 20:13:51 -07:00
Imran Rashid	79d58ed20a	improve scaladoc	2012-07-28 20:12:41 -07:00
Imran Rashid	ae07f3864c	add Accumulatable, add corresponding docs & tests for accumulators	2012-07-28 20:12:41 -07:00
Matei Zaharia	47b7ebad12	Added the Spark Streaming code, ported to Akka 2	2012-07-28 20:03:26 -07:00
Matei Zaharia	dee8ff1b9d	Added a second version of union() without varargs.	2012-07-27 16:27:52 -07:00
Tathagata Das	cf429699e1	Updated the new checkpoint RDD to remember partitioning of the original RDD.	2012-07-27 23:16:37 +00:00
Mosharaf Chowdhury	b5be936d7c	Broadcasts using BlockManager instead of BoundedMemoryCache	2012-07-27 15:38:46 -07:00
Mosharaf Chowdhury	1f19fbb8db	Merge remote-tracking branch 'upstream/dev' into dev Conflicts: core/src/main/scala/spark/broadcast/Broadcast.scala	2012-07-27 15:18:23 -07:00
Matei Zaharia	b51d733a57	Fixed Java union methods having same erasure. Changed union() methods on lists to take a separate "first element" argument in order to differentiate them to the compiler, because Java 7 considered it an error to have them all take Lists parameterized with different types.	2012-07-27 12:23:27 -07:00
Tathagata Das	3e271c3b61	Merge branch 'dev' of github.com:tdas/spark into dev	2012-07-27 12:01:04 -07:00
Tathagata Das	024905f682	Added BlockRDD and a first-cut version of checkpoint() to RDD class.	2012-07-27 12:00:49 -07:00
Tathagata Das	d1eee44a03	Fixed more stuff in BoundedMemoryCache.	2012-07-27 18:33:32 +00:00
Tathagata Das	d1b7f41671	Fixed bug in BoundedMemoryCache.	2012-07-27 09:00:45 -07:00
Tathagata Das	435d129bec	Fixed bugs in block dropping code of MemoryStore and changed synchronized HashMap to ConcurrentHashMap in BlockManager.	2012-07-27 10:02:26 +00:00
Tathagata Das	0426769f89	Modified the block dropping code for better performance.	2012-07-26 20:53:45 -07:00
Matei Zaharia	5c5aa2ff81	Merge pull request #153 from JoshRosen/new-java-api Java API	2012-07-26 17:20:52 -07:00
Josh Rosen	c5e2810dc7	Add persist(), splits(), glom(), and mapPartitions() to Java API.	2012-07-26 12:46:47 -07:00
Josh Rosen	bf61c10072	Detect non-zero exit status from PipedRDD process.	2012-07-26 11:32:59 -07:00
Josh Rosen	6a78e88237	Minor cleanup and optimizations in Java API. - Add override keywords. - Cache RDDs and counts in TC example. - Clean up JavaRDDLike's abstract methods.	2012-07-24 09:47:00 -07:00
Denny	4f4a34c025	Stylistic changes Conflicts: core/src/test/scala/spark/MesosSchedulerSuite.scala	2012-07-23 16:32:20 -07:00
Matei Zaharia	600e99728d	Fix a bug where an input path was added to a Hadoop job configuration twice	2012-07-23 16:16:19 -07:00
Josh Rosen	042dcbde33	Add type annotations to Java API methods. Add missing Scala Map to java.util.Map conversions.	2012-07-22 17:35:29 -07:00
Josh Rosen	e23938c3be	Use mapValues() in JavaPairRDD.cogroupResultToJava().	2012-07-22 15:10:01 -07:00
Josh Rosen	01dce3f569	Add Java API Add distinct() method to RDD. Fix bug in DoubleRDDFunctions.	2012-07-18 17:34:29 -07:00
Mosharaf Chowdhury	85cd9979f2	Fix for isLocal	2012-07-13 01:13:14 -07:00
Mosharaf Chowdhury	1c83fd4b66	Merged with Upstream dev	2012-07-13 01:08:28 -07:00
Mosharaf Chowdhury	bb4ee580fa	Cleaning BitTorrentBroadcast code...	2012-07-13 01:04:01 -07:00
Mosharaf Chowdhury	8ccffe21da	Cleaned TreeBroadcast	2012-07-13 00:54:25 -07:00
Matei Zaharia	628bb5ca7f	Allow null keys in Spark's reduce and group by	2012-07-12 18:36:02 -07:00
Matei Zaharia	e2a67a8024	Fixes to coarse-grained Mesos scheduler in dealing with failed nodes	2012-07-12 18:21:52 -07:00
Matei Zaharia	be622cf867	Formatting	2012-07-11 17:31:44 -07:00
Matei Zaharia	e8ae77df24	Added more methods for loading/saving with new Hadoop API	2012-07-11 17:31:33 -07:00
Mosharaf Chowdhury	34999d97f5	Added stop() to the Broadcast subsystem	2012-07-10 01:03:47 -07:00
Mosharaf Chowdhury	d6a9680604	Slightly better check for isLocal	2012-07-10 00:16:47 -07:00
Mosharaf Chowdhury	701f49e0d9	Refactoring	2012-07-09 22:39:47 -07:00
Mosharaf Chowdhury	cf1c60a1de	Refactoring	2012-07-09 22:07:46 -07:00
Mosharaf Chowdhury	e71f69ad3d	Refactoring	2012-07-09 22:07:17 -07:00
Mosharaf Chowdhury	ca02a92332	Refactored TrackMultipleValues out.	2012-07-09 21:35:39 -07:00
Mosharaf Chowdhury	654576ef1a	Tweaks	2012-07-09 21:12:42 -07:00
Mosharaf Chowdhury	425c247269	Removed some unused stuff	2012-07-08 14:29:04 -07:00
Matei Zaharia	0a47284003	More work to allow Spark to run on the standalone deploy cluster.	2012-07-08 14:00:04 -07:00
Mosharaf Chowdhury	c7c5258e25	Compiles without Dfs	2012-07-08 13:22:12 -07:00
Mosharaf Chowdhury	178bb29f05	Removed Chained and Dfs broadcast implementations	2012-07-08 11:57:00 -07:00
Matei Zaharia	1aa63f775b	Added back coarse-grained Mesos scheduler based on StandaloneScheduler.	2012-07-08 10:52:13 -07:00
Matei Zaharia	c5cc10cda3	More work on standalone scheduler	2012-07-06 20:17:44 -07:00
Matei Zaharia	909b325243	Further refactoring, and start of a standalone scheduler backend	2012-07-06 17:56:44 -07:00
Matei Zaharia	4e2fe0bdaf	Miscellaneous bug fixes	2012-07-06 16:33:40 -07:00
Matei Zaharia	e72afdb817	Some refactoring to make cluster scheduler pluggable.	2012-07-06 15:23:26 -07:00
Matei Zaharia	5d1a887bed	Further updates to run processes on cluster.	2012-07-01 17:13:31 -07:00
Matei Zaharia	51c46eaca0	More work on standalone deploy system.	2012-07-01 01:05:59 -07:00
Matei Zaharia	a6eb9fda61	Detect connection and disconnection of slaves	2012-06-30 17:46:56 -07:00
Matei Zaharia	408b5a1332	More work on deploy code (adding Worker class)	2012-06-30 16:45:57 -07:00
Matei Zaharia	2fb6e7d71e	Initial framework to get a master and web UI up.	2012-06-30 14:45:55 -07:00
Matei Zaharia	c53670b9bf	Various code style fixes, mostly from IntelliJ IDEA	2012-06-29 18:47:12 -07:00
Matei Zaharia	c6be4ffbf9	Fixes to CoarseMesosScheduler	2012-06-29 16:18:51 -07:00
Matei Zaharia	3a58efa5a5	Allow binding to a free port and change Akka logging to use SLF4J. Also fixes various bugs in the previous code when running on Mesos.	2012-06-29 16:02:21 -07:00
Matei Zaharia	3920189932	Upgraded to Akka 2 and fixed test execution (which was still parallel across projects).	2012-06-28 23:51:28 -07:00
root	6ad3e1f1b4	Various fixes when running on Mesos	2012-06-20 06:48:26 +00:00
Tathagata Das	40536e3668	Fixed nasty corner case bug in ByteBufferInputStream. Could not add a test case for this as I could not figure out how to deterministically reproduce the bug in a short testcase.	2012-06-17 13:28:41 -07:00
Matei Zaharia	2893b30550	Various fixes to get unit tests running. In particular, shut down ConnectionManager and DAGScheduler properly, plus a fix to LocalScheduler that was not merged in from 0.5 and was actually caught by one of the tests.	2012-06-17 00:28:45 -07:00
Matei Zaharia	b3eeac55b8	Fixed HttpBroadcast to work with this branch's Serializer.	2012-06-15 23:54:38 -07:00
Matei Zaharia	f58da6164e	Merge branch 'master' into dev	2012-06-15 23:47:11 -07:00
Tathagata Das	5f54bdf98b	Added shutdown for akka to SparkContext.stop(). Helps a little, but many testsuites still fail.	2012-06-13 20:49:00 -04:00
Tathagata Das	c6156da9e2	Multiple bug fixes to pass the testsuites ShuffleSuite and BlockManagerSuite.	2012-06-13 16:26:49 -04:00
Matei Zaharia	879bc0bece	Merge branch 'master' into mesos-0.9	2012-06-09 16:24:16 -07:00
Matei Zaharia	4b05798c06	Further bug fix to HttpBroadcast	2012-06-09 16:24:03 -07:00
Matei Zaharia	587a16a7ef	Merge branch 'master' into mesos-0.9	2012-06-09 16:17:07 -07:00
Matei Zaharia	8ed662862e	Bug fix to HttpBroadcast	2012-06-09 16:16:55 -07:00
Matei Zaharia	2fd9f994ae	Merge branch 'master' into mesos-0.9	2012-06-09 15:58:35 -07:00
Matei Zaharia	e75b1b5cb4	Change the default broadcast implementation to a simple HTTP-based broadcast. Fixes #139.	2012-06-09 15:58:07 -07:00
Matei Zaharia	a96558caa3	Performance improvements to shuffle operations: in particular, preserve RDD partitioning in more cases where it's possible, and use iterators instead of materializing collections when doing joins.	2012-06-09 14:44:18 -07:00
Matei Zaharia	63051dd2bc	Merge in engine improvements from the Spark Streaming project, developed jointly with Tathagata Das and Haoyuan Li. This commit imports the changes and ports them to Mesos 0.9, but does not yet pass unit tests due to various classes not supporting a graceful stop() yet.	2012-06-07 12:45:38 -07:00
Matei Zaharia	7e1c97fc4b	Merge branch 'master' into mesos-0.9	2012-06-06 16:48:59 -07:00
Matei Zaharia	048276799a	Commit task outputs to Hadoop-supported storage systems in parallel on the cluster instead of on the master. Fixes #110.	2012-06-06 16:46:53 -07:00
Matei Zaharia	6888bc7191	Merge branch 'master' into mesos-0.9	2012-06-06 16:14:19 -07:00
Matei Zaharia	6ae2746d1e	Handle arrays that contain the same element many times better in SizeEstimator. Also added a test for SizeEstimator. Fixes #136.	2012-06-06 16:13:02 -07:00
Matei Zaharia	dbc3c86ae3	Merge branch 'master' into mesos-0.9 Conflicts: core/src/main/scala/spark/Executor.scala	2012-06-03 17:44:04 -07:00
Matei Zaharia	e141f644ca	Merge pull request #132 from Benky/rb-first-iteration Little refactoring and unit tests for CacheTrackerActor	2012-05-26 13:15:06 -07:00
Richard Benkovsky	ae64920337	MesosScheduler refactoring	2012-05-22 11:04:54 +02:00
Richard Benkovsky	3a1bcd4028	Added tests for CacheTrackerActor	2012-05-22 11:04:54 +02:00
Richard Benkovsky	8f2f736d53	Little refactoring	2012-05-22 11:04:54 +02:00
Richard Benkovsky	f162fc2beb	Formatting fixed	2012-05-22 09:45:38 +02:00
Richard Benkovsky	565245871f	BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity	2012-05-20 22:13:35 +02:00
Richard Benkovsky	822a4be37d	Utils.memoryBytesToString fixed	2012-05-19 15:13:20 +02:00
Reynold Xin	d0c6e9f639	Made some RDD dependencies transient to reduce the amount of data needed to be serialized in closure serialization. This can significantly reduce the task setup time in Shark when the query involves a large number of (Hive) partitions.	2012-05-16 14:16:55 -07:00
Reynold Xin	16461e2eda	Updated Cache's put method to use a case class for response. Previously it was pretty ugly that put() should return -1 for failures.	2012-05-15 00:31:52 -07:00
Reynold Xin	019e48833f	Added the capacity to report cache usage status back to the cache tracker. This is essential for building a dashboard to see the status of caches on all slaves.	2012-05-14 18:39:04 -07:00
Matei Zaharia	f48742683a	Made caches dataset-aware so that they won't cyclically evict partitions from the same dataset.	2012-05-06 20:14:40 -07:00
Matei Zaharia	bd2ab635a7	Fixed the way the JAR server is created after finding issue at Twitter	2012-05-05 20:05:15 -07:00
Matei Zaharia	32a4f4623c	Merge pull request #129 from mesos/rxin Force serialize/deserialize task results in local execution mode.	2012-04-24 16:18:39 -07:00
Reynold Xin	9821cd4d42	Force serialize/deserialize task results in local execution mode.	2012-04-24 14:55:28 -07:00
Antonio	3e48818993	Removed commented-out System.exit call	2012-04-23 11:42:58 -07:00
Antonio	39d99168dc	Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions	2012-04-20 14:46:43 -07:00
Reynold Xin	e601b3b9e5	Added the ability to set environmental variables in piped rdd.	2012-04-17 16:40:56 -07:00
Matei Zaharia	3b745176e0	Bug fix to pluggable closure serialization change	2012-04-12 17:53:02 +00:00
Matei Zaharia	112655f032	Merge pull request #121 from rxin/kryo-closure Added an option (spark.closure.serializer) to specify the serializer for closures.	2012-04-10 14:21:02 -07:00
Reynold Xin	d295ccb43c	Added a closureSerializer field in SparkEnv and use it to serialize tasks.	2012-04-10 13:29:46 -07:00
Reynold Xin	968f75f6af	Added an option (spark.closure.serializer) to specify the serializer for closures. This enables using Kryo as the closure serializer.	2012-04-09 21:59:56 -07:00
Matei Zaharia	a69c0738d1	Merge branch 'master' into mesos-0.9	2012-04-08 23:41:36 -07:00
Matei Zaharia	a633974143	Merge branch 'master' of github.com:mesos/spark	2012-04-08 23:41:25 -07:00
Matei Zaharia	0229d5390f	Merge branch 'master' into mesos-0.9	2012-04-08 23:39:37 -07:00
Matei Zaharia	d401e1b3e8	Fix a possible deadlock in MesosScheduler	2012-04-08 23:38:49 -07:00
Ankur Dave	7be1c7b331	Report entry dropping in BoundedMemoryCache	2012-04-06 15:49:32 -07:00
Matei Zaharia	a8bb324ed9	Merge branch 'master' into mesos-0.9	2012-04-05 14:53:22 -07:00
Matei Zaharia	816d4e5840	Pass local IP address instead of hostname in spark.master.host. Fixes #117 .	2012-04-05 14:53:17 -07:00
Matei Zaharia	335a6036ad	Converted some tabs to spaces	2012-04-05 11:58:01 -07:00
Matei Zaharia	8c95a85438	Use Runtime.maxMemory instead of Runtime.totalMemory in BoundedMemoryCache, in case the JVM was not started with its initial heap size equaling its maximum one (-Xms == -Xmx).	2012-03-30 13:39:35 -04:00
Matei Zaharia	03d5b3b48d	Use Runtime.maxMemory instead of Runtime.totalMemory in BoundedMemoryCache, in case the JVM was not started with its initial heap size equaling its maximum one (-Xms == -Xmx).	2012-03-30 13:38:19 -04:00
Matei Zaharia	dfa3b6b544	Fixes to work with the very latest Mesos 0.9 API	2012-03-29 22:12:35 -04:00
Matei Zaharia	4d52cc6738	Merge branch 'master' into mesos-0.9	2012-03-29 21:29:39 -04:00
Reynold Xin	42dcdbcb2f	Removed the extra spaces in OrderedRDDFunctions and SortedRDD.	2012-03-29 15:21:57 -07:00
Matei Zaharia	08cda89e8a	Further fixes to how Mesos is found and used	2012-03-17 13:39:14 -07:00
Matei Zaharia	3c3fdf6eca	Merge branch 'master' into mesos-0.9	2012-03-17 13:09:21 -07:00
Matei Zaharia	c7af538ac1	Some fixes to sorting for when the RDD has fewer elements than the number of partitions we ask to partition it into. Also, removed a test that was taking way too long to run.	2012-03-17 13:08:36 -07:00
Matei Zaharia	a099a63a8a	Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0	2012-03-17 12:31:34 -07:00
Matei Zaharia	a5e2b6a6bd	Merge pull request #112 from cengle/master Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection	2012-03-06 13:38:32 -08:00
Matei Zaharia	97eee50825	Fixes a nasty bug that could happen when tasks fail, because calling wait() with a timeout of 0 on a Java object means "wait forever".	2012-03-01 13:43:17 -08:00
Cliff Engle	dd68cb6099	Get key and value container from RecordReader	2012-02-29 16:33:23 -08:00
Matei Zaharia	1e10df0a46	Merge pull request #111 from alupher/master Adding sorting to RDDs	2012-02-24 15:50:14 -08:00
Matei Zaharia	aa04f87cd2	Added support for parallel execution of jobs in DAGScheduler.	2012-02-19 22:50:23 -08:00
Antonio	620798161b	Added fixes to sorting	2012-02-13 00:07:39 -08:00
Matei Zaharia	2587ce1690	Fixed a deadlock that occurred with MesosScheduler due to an earlier synchronization change	2012-02-11 21:22:45 -08:00
Antonio	e93f622665	Added sorting by key for pair RDDs	2012-02-11 00:56:28 -08:00
Matei Zaharia	98f008b721	Formatting fixes	2012-02-10 10:52:03 -08:00
Matei Zaharia	7660a8b12f	Merge branch 'formatting' Conflicts: core/src/main/scala/spark/DAGScheduler.scala core/src/main/scala/spark/SimpleShuffleFetcher.scala core/src/main/scala/spark/SparkContext.scala	2012-02-10 10:42:14 -08:00
haoyuan	194c42ab79	Code format.	2012-02-10 08:19:53 -08:00
Matei Zaharia	8f5ed51234	Delete Spark's temporary directories when the JVM exits.	2012-02-09 22:58:24 -08:00
Matei Zaharia	c0a0df3285	Made the default cache BoundedMemoryCache, and reduced its default size	2012-02-09 22:32:02 -08:00
Matei Zaharia	0e93891d3d	Replaced LocalFileShuffle with a non-singleton ShuffleManager class and made DAGScheduler automatically set SparkEnv.	2012-02-09 22:14:56 -08:00
haoyuan	445e0bb1b5	Format the code a bit more.	2012-02-09 15:50:26 -08:00
haoyuan	651932e703	Format the code as coding style agreed by Matei/TD/Haoyuan	2012-02-09 13:26:23 -08:00
Matei Zaharia	e02dc83a5b	IO optimizations	2012-02-06 20:40:39 -08:00
Matei Zaharia	c40e766368	Use java.util.HashMap in shuffles	2012-02-06 19:20:25 -08:00
Matei Zaharia	b267175ab5	Synchronization fix in case SparkContext is used from multiple threads.	2012-02-06 14:28:18 -08:00
Hiral Patel	b47952342e	Add register immutable map to kryo serializer	2012-01-26 15:24:20 -08:00
Matei Zaharia	fabcc82528	Merge pull request #103 from edisontung/master Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans	2012-01-13 19:20:03 -08:00
Matei Zaharia	fd5581a0d3	Fixed a failure recovery bug and added some tests for fault recovery.	2012-01-13 19:17:27 -08:00
Edison Tung	1ecc221f84	Fixed bugs I've fixed the bugs detailed in the diff. One of the bugs was already fixed on the local file (forgot to commit).	2012-01-09 11:59:52 -08:00
Matei Zaharia	e269f6f7ea	Register RDDs with the MapOutputTracker even if they have no partitions. Fixes #105.	2012-01-05 15:59:20 -05:00
Matei Zaharia	3034fc0d91	Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16'	2011-12-14 18:19:43 +01:00
Matei Zaharia	6a650cbbdf	Make Spark port default to 7077 so that it's not an ephemeral port that might be taken	2011-12-14 18:18:22 +01:00
Matei Zaharia	735843a049	Merge remote-tracking branch 'origin/charles-newhadoop'	2011-12-02 21:59:30 -08:00
Charles Reiss	66f05f383e	Add new Hadoop API reading support.	2011-12-01 14:02:10 -08:00
Charles Reiss	02d43e6986	Add new Hadoop API writing support.	2011-12-01 14:01:28 -08:00
Edison Tung	42f8847a21	Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD	2011-12-01 13:43:25 -08:00
Edison Tung	de01b6deaa	Fixed bug in RDD Math.min takes 2 args, not 1. This was not committed earlier for some reason	2011-12-01 13:34:37 -08:00
Matei Zaharia	22b8fcf632	Added fold() and aggregate() operations that reuse an object to merge results into rather than requiring a new object allocation for each element merged. Fixes #95.	2011-11-30 11:37:47 -08:00
Matei Zaharia	09dd58b3a7	Send SPARK_JAVA_OPTS to slave nodes.	2011-11-30 11:34:58 -08:00
Edison Tung	a3bc012af8	added takeSamples method takeSamples method takes a specified number of samples from the RDD and outputs it in an array.	2011-11-21 16:38:44 -08:00
Ankur Dave	ad4ebff42c	Deduplicate exceptions when printing them The first time they appear, exceptions are printed in full, including a stack trace. After that, they are printed in abbreviated form. They are periodically reprinted in full; the reprint interval defaults to 5 seconds and is configurable using the property spark.logging.exceptionPrintInterval.	2011-11-14 01:54:53 +00:00
Ankur Dave	35b6358a7c	Report errors in tasks to the driver via a Mesos status update When a task throws an exception, the Spark executor previously just logged it to a local file on the slave and exited. This commit causes Spark to also report the exception back to the driver using a Mesos status update, so the user doesn't have to look through a log file on the slave. Here's what the reporting currently looks like: # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050 [...] 11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1) 11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling [...] 11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s	2011-11-14 01:54:53 +00:00
Matei Zaharia	07532021fe	Bug fix: reject offers that we didn't find any tasks for	2011-11-08 23:05:54 -08:00
Matei Zaharia	f346e64637	Updates to the closure cleaner to work better with closures in classes. Before, the cleaner attempted to clone $outer objects that were classes (as opposed to nested closures) and preserve only their used fields, which was bad because it would miss fields that are accessed indirectly by methods, and in general it would confuse user code. Now we keep a reference to those objects without cloning them. This is not perfect because the user still needs to be careful of what they'll carry along into closures, but it works better in some cases that seemed confusing before. We need to improve the documentation on what variables get passed along with a closure and possibly add some debugging tools for it as well. Fixes #71 -- that code now works in the REPL.	2011-11-08 00:33:28 -08:00
Matei Zaharia	c2b7fd6899	Make parallelize() work efficiently for ranges of Long, Double, etc (splitting them into sub-ranges). Fixes #87.	2011-11-02 15:16:02 -07:00
Matei Zaharia	157279e9eb	Update Spark to work with the latest Mesos API	2011-10-30 14:10:56 -07:00
root	3a0e6c4363	Miscellaneous fixes: - Executor should initialize logging properly - groupByKey should allow custom partitioner	2011-10-17 18:07:35 +00:00
root	62aa820084	Merge branch 'ankur-master'	2011-10-14 02:14:07 +00:00
Ankur Dave	2d7057bf5d	Implement PairRDDFunctions.partitionBy	2011-10-09 15:52:09 -07:00
Ankur Dave	06637cb69e	Fix PairRDDFunctions.groupWith partitioning This commit fixes a bug in groupWith that was causing it to destroy partitioning information. It replaces a call to map with a call to mapValues, which preserves partitioning.	2011-10-09 15:48:46 -07:00
Ankur Dave	2911a783d6	Add custom partitioner support to PairRDDFunctions.combineByKey	2011-10-09 15:47:20 -07:00
Ankur Dave	6c6e47e3cd	Use BufferedOutputStream in ShuffleMapTask	2011-10-09 15:43:31 -07:00
Matei Zaharia	1069740264	Added a jarOfObject method to get the JAR of the class that an object belongs to, which seems like a more common case.	2011-08-29 23:27:10 -07:00
Matei Zaharia	0aa23bf17e	Added a convenience method for getting the JAR file that loaded a class (useful for jobs to pass their own JAR files to SparkContext).	2011-08-29 22:59:44 -07:00
Matei Zaharia	a161f00610	Made a log message slightly less ugly	2011-08-27 16:58:54 -07:00
Matei Zaharia	c22043f150	Minor fix: can use >= when checking memory	2011-08-02 19:11:17 -07:00
Ismael Juma	6ff57f5594	Use scala.math instead of Math as the latter is deprecated.	2011-08-02 10:25:47 +01:00
Ismael Juma	620de2dd1d	Change currentThread to Thread.currentThread as the former is deprecated.	2011-08-02 10:25:16 +01:00
Ismael Juma	0fba22b3d2	Fix issue #65 : Change @serializable to extends Serializable in 2.9 branch Note that we use scala.Serializable introduced in Scala 2.9 instead of java.io.Serializable. Also, case classes inherit from scala.Serializable by default.	2011-08-02 10:16:33 +01:00
Matei Zaharia	711575391d	Merge branch 'scala-2.9' Conflicts: project/build/SparkProject.scala	2011-08-01 15:25:26 -07:00
Matei Zaharia	4050d661c5	Updated to newest Mesos API, which includes better memory accounting by specifying per-executor memory.	2011-08-01 13:54:48 -07:00
Matei Zaharia	d12122502b	Various improvements to Kryo serializer: - Replaced modified Kryo version with the standard one augmented with the kryo-serializers package, which includes support for classes with no-arg constructors (that was why we had a modified Kryo before) - The kryo-serializers version also fixes issue #72. - Added a bunch of tests. - Serialize maps and a few other common types properly by default.	2011-07-21 22:09:33 -07:00
Matei Zaharia	baa72e2747	Removed a debug statement that slipped in as a println	2011-07-21 16:09:33 -07:00
Matei Zaharia	2bfd7931e8	Merge branch 'new-rdds-protobuf' Conflicts: core/src/main/scala/spark/Executor.scala core/src/main/scala/spark/RDD.scala	2011-07-21 16:08:39 -07:00
Matei Zaharia	1450fd74d9	Merge branch 'master' into scala-2.9	2011-07-14 17:37:24 -04:00
Matei Zaharia	ccf48388cd	Lowered default number of splits for files	2011-07-14 17:37:04 -04:00
Matei Zaharia	146a18c2a4	Merge branch 'master' into scala-2.9	2011-07-14 17:29:17 -04:00
Matei Zaharia	c8eb8b2b90	Set class loader for remote actors to fix a bug that happens in 2.9	2011-07-14 17:29:11 -04:00
Matei Zaharia	8ea67307b9	Merge branch 'master' into scala-2.9	2011-07-14 14:47:12 -04:00
Matei Zaharia	e4c3402d2d	Renamed ParallelArray to ParallelCollection	2011-07-14 14:47:01 -04:00
Matei Zaharia	9ac461d85d	Remove RDD.toString because it looked confusing	2011-07-14 14:39:32 -04:00
Matei Zaharia	797b4547c3	Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter	2011-07-14 14:08:34 -04:00
Matei Zaharia	3efd9e94d8	Merge branch 'master' into scala-2.9	2011-07-14 12:42:57 -04:00
Matei Zaharia	0ccfe20755	Forgot to add a file	2011-07-14 12:42:50 -04:00
Matei Zaharia	38f38dda5b	Merge branch 'master' into scala-2.9	2011-07-14 12:42:02 -04:00
Matei Zaharia	969644df8e	Cleaned up a few issues to do with default parallelism levels. Also renamed HadoopFileWriter to HadoopWriter (since it's not only for files) and fixed a bug for lookup().	2011-07-14 12:40:56 -04:00
Matei Zaharia	2fb906e8e5	Merge branch 'master' into scala-2.9	2011-07-14 00:20:14 -04:00
Matei Zaharia	2604939f64	Simplified and documented code a little and added test	2011-07-14 00:19:00 -04:00
Matei Zaharia	2439e51a03	Merge branch 'master' into implicit-sequencefile	2011-07-13 23:20:22 -04:00
Matei Zaharia	d0c7958364	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/HadoopFileWriter.scala	2011-07-13 23:09:33 -04:00
Matei Zaharia	9c0069188b	Updated save code to allow non-file-based OutputFormats and added a test for file-related stuff	2011-07-13 23:04:06 -04:00
Matei Zaharia	da8a3b8926	Increase default value of spark.locality.wait a little	2011-07-13 20:07:24 -04:00
Matei Zaharia	080869c6ef	Merge branch 'master' into scala-2.9	2011-07-13 00:20:08 -04:00
Matei Zaharia	842e14d567	Added mapPartitions operation and a bunch of tests for RDD ops	2011-07-13 00:19:52 -04:00
Matei Zaharia	9b568d37f7	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/RDD.scala	2011-07-11 22:25:53 -04:00
Matei Zaharia	d05fea24f3	Simplified parallel shuffle fetcher to use URLConnection	2011-07-11 22:12:36 -04:00
Matei Zaharia	25c3a7781c	Moved PairRDD and SequenceFileRDD functions to separate source files	2011-07-10 00:06:15 -04:00
Matei Zaharia	b7f1f62ff5	bug fix	2011-07-09 18:53:02 -04:00
Matei Zaharia	003480f374	Register byte[] with Kryo serializer	2011-07-09 18:08:07 -04:00
Matei Zaharia	aea5cb4413	Added parallel shuffle fetcher	2011-07-09 17:25:56 -04:00
Matei Zaharia	4b1646a25f	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:55 -04:00
Matei Zaharia	07a97d47c2	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:34 -04:00
Matei Zaharia	3488c386a9	Initial work to make stuff like sequenceFile[Int, Int] work without requiring the user to provide a Writable type. The approach here might not be the best but it seems to work correctly.	2011-06-28 17:07:04 -07:00
Matei Zaharia	5633299ec6	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-27 22:50:59 -07:00
Matei Zaharia	b0ecf1ee41	Don't pass a null context when running tasks locally	2011-06-27 22:50:43 -07:00
Matei Zaharia	85cad5d9dd	Fixed HadoopFileWriter to compile for Scala 2.9	2011-06-27 22:44:14 -07:00
Matei Zaharia	393607d5ef	Merge branch 'master' into scala-2.9	2011-06-27 18:08:25 -07:00
Matei Zaharia	2f652f1656	Fix a compile error	2011-06-27 18:07:16 -07:00
Tathagata Das	3f08e1129f	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/SparkContext.scala	2011-06-27 13:43:44 -07:00
Tathagata Das	ad842ac823	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/RDD.scala	2011-06-27 13:39:11 -07:00
Matei Zaharia	bae8a97968	Merge branch 'master' into scala-2.9 Conflicts: repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala	2011-06-26 19:22:27 -07:00
Matei Zaharia	c4dd68ae21	Merge branch 'mos-bt' This merge keeps only the broadcast work in mos-bt because the structure of shuffle has changed with the new RDD design. We still need some kind of parallel shuffle but that will be added later. Conflicts: core/src/main/scala/spark/BitTorrentBroadcast.scala core/src/main/scala/spark/ChainedBroadcast.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala core/src/main/scala/spark/shuffle/DfsShuffle.scala	2011-06-26 18:22:12 -07:00
Tathagata Das	38f2ba99cc	Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles. 1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type) 2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options 3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile 4> RDD.saveAsObjectFile() serializes objects and saves them to a ObjectFile 5> SparkContext.objectFile() opens the saved ObjectFiles	2011-06-24 19:51:21 -07:00
Olivier Grisel	2e3531d8bf	Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 11:00:51 +02:00
Tathagata Das	3d2befe831	Improved HadoopFileWriter (saves key and value classes to jobconf)	2011-06-23 08:11:22 -07:00
Matei Zaharia	214250016a	Added simple version of lookup	2011-06-20 11:59:16 -07:00
Matei Zaharia	23b42af70a	Merge branch 'master' into scala-2.9	2011-06-19 23:06:21 -07:00
Matei Zaharia	23b1c309fb	Added pipe() operation on RDDs for mapping through a shell command.	2011-06-19 23:05:19 -07:00
Tathagata Das	b5e6645505	Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext. 1> HadoopFileWriter works correctly with task failures 2> It can also take a user-specified JobConf object for configuration settings 3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class 4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter	2011-06-16 20:57:57 -07:00
Tathagata Das	869836a2fa	Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task	2011-06-10 19:47:28 -07:00
Tathagata Das	389e56156f	HadoopFileWriter changed to use Hadoop's OutputCommitter	2011-06-09 15:29:22 -07:00
Tathagata Das	24d845833c	First-cut implementation of RDD.SaveAsText	2011-06-05 04:14:43 -07:00
Matei Zaharia	3297706ab2	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-01 11:46:31 -07:00
Matei Zaharia	9bb448a151	Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57 .	2011-06-01 11:45:47 -07:00
Matei Zaharia	850fe3274e	Make the runJob API public. Fixes #56 .	2011-06-01 11:38:44 -07:00
Ismael Juma	82f10bd794	Remove unnecessary toStream calls.	2011-06-01 16:12:42 +01:00
Matei Zaharia	10fe324845	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-05-31 23:48:11 -07:00
Matei Zaharia	5166d76843	Ensure logging is initialized before spawning any threads to fix issue #45	2011-05-31 23:47:32 -07:00
Matei Zaharia	0afd35a8dd	Some docs in ClosureCleaner	2011-05-31 22:06:30 -07:00
Matei Zaharia	8b0390d344	Instantiate NullWritable properly in HadoopFile	2011-05-30 23:54:14 -07:00
Matei Zaharia	4096c2287e	Various fixes	2011-05-29 18:46:01 -07:00
Matei Zaharia	ef706ae959	Merge branch 'master' into new-rdds-protobuf Conflicts: run	2011-05-29 16:20:23 -07:00
Matei Zaharia	c501cff924	Executor was looking for the wrong constructor for ExecutorClassLoader	2011-05-29 16:15:59 -07:00
Ismael Juma	1396678baa	Move REPL classes to separate module.	2011-05-27 11:22:50 +01:00
Ismael Juma	164ef4c751	Use explicit asInstanceOf instead of misleading unchecked pattern matching. Also enable -unchecked warnings in SBT build file.	2011-05-27 07:57:10 +01:00
Ismael Juma	89c8ea2bb2	Replace deprecated `-` and `--` with suggested filterNot (which is uglier).	2011-05-26 22:22:37 +01:00
Ismael Juma	94f05683bd	Replace deprecated `first` with `head`.	2011-05-26 22:13:41 +01:00
Ismael Juma	0b6a862b68	Use math instead of Math as the latter is deprecated.	2011-05-26 22:06:36 +01:00
Ismael Juma	1f27d94c48	Use Array.iterator instead of Iterator.fromArray as the latter is deprecated.	2011-05-26 22:04:42 +01:00
Ismael Juma	1993a8e556	Use += instead of + for mutable sequences as the latter is deprecated.	2011-05-26 21:59:48 +01:00
root	5ef938615f	Initial work on making stuff compile with protobuf Mesos	2011-05-24 22:27:08 +00:00
Matei Zaharia	cec427e777	Fixed a bug with preferred locations having changed meaning in new RDDs	2011-05-22 17:12:29 -07:00
Matei Zaharia	4c888b2933	Fix queue type for executor	2011-05-22 16:42:05 -07:00
Matei Zaharia	bea3a33012	doc tweak	2011-05-22 16:03:41 -07:00
Matei Zaharia	9bde5a54cb	class loader fix	2011-05-22 16:00:41 -07:00
Matei Zaharia	91c07a33d9	Various fixes to serialization	2011-05-21 22:50:08 -07:00
Matei Zaharia	f61b61c4ac	Merge branch 'master' into new-rdds	2011-05-21 21:25:58 -07:00
Matei Zaharia	24a1e7f838	Scheduler can now recover from lost map outputs	2011-05-20 00:19:53 -07:00
Matei Zaharia	82329b0b28	Updated scheduler to support running on just some partitions of final RDD	2011-05-19 12:47:09 -07:00
Matei Zaharia	328e51b693	Various minor fixes	2011-05-19 11:19:25 -07:00
Matei Zaharia	fd1d255821	Stop objectifying various trackers, caches, etc.	2011-05-17 12:41:13 -07:00
Matei Zaharia	4db50e26c7	Fixed unit tests by making them clean up the SparkContext after use and thus clean up the various singletons (RDDCache, MapOutputTracker, etc). This isn't perfect yet (ideally we shouldn't use singleton objects at all) but we can fix that later.	2011-05-13 12:03:58 -07:00
Matei Zaharia	aca8150c52	Ensure that AddedToCache messages make it home before tasks finish	2011-05-13 11:43:52 -07:00
Matei Zaharia	16c886a581	Optimization for count()	2011-05-13 10:41:34 -07:00
Mosharaf Chowdhury	db7a2c4897	Issue #42 fixed.	2011-04-28 14:30:48 -07:00
Ankur Dave	a4c04f3f6f	Error handling for disk I/O in DiskSpillingCache Also renamed the property spark.DiskSpillingCache.cacheDir to spark.diskSpillingCache.cacheDir in order to follow conventions.	2011-04-27 23:23:29 -07:00
Ankur Dave	12ff0d2dc3	Bring an entry back into memory after fetching it from disk	2011-04-27 22:59:05 -07:00
Ankur Dave	e30313aa2c	Added DiskSpillingCache DiskSpillingCache is a BoundedMemoryCache that spills entries to disk when it runs out of space. Currently the implementation is very simple. In particular, it's missing the following features: - Error handling for disk I/O, including checking of disk space levels - Bringing an entry back into memory after fetching it from disk In addition, here are some features that aren't critical but should be implemented soon: - Spilling based on a user-set priority in addition to LRU - Caching into a subdirectory of spark.DiskSpillingCache.cacheDir rather than the root directory	2011-04-27 22:32:35 -07:00
Mosharaf Chowdhury	60d1121343	Refactoring: daemonThreadFactories have all been moved to the Utils object instead of having multiple copies in Broadcast and Shuffle objects.	2011-04-27 22:13:01 -07:00
Mosharaf Chowdhury	e898e108a3	Cleanup + refactoring...	2011-04-27 22:00:24 -07:00
Mosharaf Chowdhury	0567646180	Shuffle is also working from its own subpackage.	2011-04-27 21:11:41 -07:00
Mosharaf Chowdhury	2742de707a	Removed some shuffle implementations. Remaining ones all use local files to write map outputs.	2011-04-27 20:53:43 -07:00
Mosharaf Chowdhury	9d78779257	Merge branch 'mos-shuffle-tracked' into mos-bt Conflicts: core/src/main/scala/spark/Broadcast.scala	2011-04-27 20:47:07 -07:00
Mosharaf Chowdhury	ac7e066383	Merge branch 'master' into mos-shuffle-tracked Conflicts: .gitignore core/src/main/scala/spark/LocalFileShuffle.scala src/scala/spark/BasicLocalFileShuffle.scala src/scala/spark/Broadcast.scala src/scala/spark/LocalFileShuffle.scala	2011-04-27 14:35:03 -07:00
Mosharaf Chowdhury	4e4c41026c	Added support for custom classes. (from 49ea48)	2011-04-27 12:30:16 -07:00
Mosharaf Chowdhury	65848da8df	Refactoring...	2011-04-26 17:41:31 -07:00
Mosharaf Chowdhury	b8ab7862b8	Moved broadcast-related code to separate directory under spark.broadcast package.	2011-04-26 17:22:52 -07:00
Mosharaf Chowdhury	e31007248c	Merge branch 'master' into mos-bt	2011-04-26 12:04:14 -07:00
Mosharaf Chowdhury	9257a55e3a	Refactoring...	2011-04-26 11:45:36 -07:00
Mosharaf Chowdhury	9d2d533493	Temporary fix for issue #42 .	2011-04-21 17:40:26 -07:00
Timothy Hunter	5c9535228a	fixed small bug when classpath has some strange formatting	2011-04-18 17:12:29 -07:00
Mosharaf Chowdhury	a8f47a62b9	Renamed MaxRxPeers to MaxTxPeers to MaxTxSlots and MaxRxSlots respectively for clarity (most probably they were misunderstood and misused)	2011-04-13 16:24:19 -07:00
Matei Zaharia	94ba95bcb2	Added flatMapValues	2011-04-12 19:51:58 -07:00
Mosharaf Chowdhury	b67a968b5d	hasBlocks is now AtomicInteger (even though it was ok)	2011-04-02 22:03:18 -07:00
Mosharaf Chowdhury	5bf3c83b13	BroadcastSuperTracker (right now for BT) is contacted over TCP instead of direct procedure call. Need to do the same for others and consolidate all broadcast mechanisms.	2011-04-01 19:31:28 -07:00
Mosharaf Chowdhury	733a130108	Formatting...	2011-04-01 14:51:24 -07:00
Mosharaf Chowdhury	4636aea598	Formatting...	2011-04-01 14:49:59 -07:00
Mosharaf Chowdhury	addd569e52	Each broadcasted variable can have different blockSize. Corresponding logic to adapt blockSize based on network condition is not yet implemented. Formatting + consolidation.	2011-03-31 14:51:46 -07:00
Mosharaf Chowdhury	815f3411ec	Consolidated Broadcast config params.	2011-03-30 16:45:51 -07:00
Mosharaf Chowdhury	a18a28b08e	Removed gossip-related code that were already commented out. More formatting.	2011-03-30 14:22:09 -07:00
Mosharaf Chowdhury	43aceafd70	Formatting...	2011-03-30 12:18:50 -07:00
Mosharaf Chowdhury	73b165220d	Random is the default choice; rarestFirst didn't work well in experiments.	2011-03-29 13:06:43 -07:00
Matei Zaharia	d840fa8d0c	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-09 00:40:07 -08:00
root	ff5b13799a	Some tweaks to make Kryo cache work better	2011-03-09 03:31:50 -05:00
Matei Zaharia	7febdfbe29	Better reuse of buffers in Kryo serialization	2011-03-08 12:36:36 -08:00
Matei Zaharia	8ee3ec29ee	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-08 11:58:19 -08:00
Matei Zaharia	ab1216cb14	Register None and Nil properly	2011-03-08 11:52:58 -08:00
Matei Zaharia	d39f5dd15e	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-08 10:28:50 -08:00
Matei Zaharia	4f0d0a7b73	stuff	2011-03-08 10:28:26 -08:00
Matei Zaharia	8b6f3db415	Merge remote branch 'origin/custom-serialization' into new-rdds	2011-03-07 19:20:28 -08:00
Matei Zaharia	38f6bce33d	Added SerializingCache	2011-03-07 19:16:24 -08:00
Matei Zaharia	6316c7979d	Remove some logging	2011-03-07 18:56:36 -08:00
Matei Zaharia	e7b4b047a6	Added pluggable serializers and Kryo serialization	2011-03-07 18:41:53 -08:00
Matei Zaharia	467f056e29	Remove commented code	2011-03-06 23:38:41 -08:00
Matei Zaharia	bce95b8458	Finished cogroup stuff	2011-03-06 23:38:16 -08:00
Matei Zaharia	04c2d6a60c	stuff	2011-03-06 19:27:03 -08:00
Matei Zaharia	0fb691dd28	Various fixes to get MesosScheduler working with new RDDs	2011-03-06 16:16:38 -08:00
Matei Zaharia	1df5a65a01	Pass cache locations correctly to DAGScheduler.	2011-03-06 12:16:38 -08:00
Matei Zaharia	e1436f1eaa	Merge remote branch 'origin/master' into new-rdds	2011-03-06 11:11:47 -08:00
Matei Zaharia	370b95816f	Added sampling for large arrays in SizeEstimator	2011-03-06 11:11:20 -08:00
Matei Zaharia	a789e9aaea	Merge remote branch 'origin/master' into new-rdds	2011-03-01 10:33:37 -08:00
Matei Zaharia	021c50a8d4	Remove unnecessary lock which was there to work around a bug in Configuration in Hadoop 0.20.0	2011-03-01 10:28:38 -08:00
Matei Zaharia	9e59afd710	More work on new RDD design	2011-02-27 19:15:52 -08:00
Matei Zaharia	f38f86d59e	More stuff	2011-02-27 14:27:12 -08:00
Matei Zaharia	2e6023f2bf	stuff	2011-02-26 23:41:44 -08:00
Matei Zaharia	309367c477	Initial work towards new RDD design	2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury	0416cc22d2	Picking peers weighted by the number of rare blocks they have. A block is rare if there are at most 2 copies in the neighborhood. Better number can be used (some function of neighborhood size)	2011-02-15 16:27:44 -08:00
Mosharaf Chowdhury	cf81da9485	Optimization: Master sends out at least one copy of each block first regardless of whatever a client is asking for. Once one copy of each block is out, Master then responds to specific blocks from individual receivers.	2011-02-14 15:08:33 -08:00
Mosharaf Chowdhury	2b946fb2d1	pickBlockRarestFirst and gossips commented OUT for now. Problem with the rarestFirst implementation is that we are picking peers randomly first and then picking blocks from the random peer using rarestFirst. NOT the right way to do it. It should be the other way around. Problem with gossip is that peers might end up overwriting newer information by older ones. To fix that we either have to have timestamps or must match the bitVectors before overwriting.	2011-02-13 13:53:15 -08:00
Mosharaf Chowdhury	ca2895ebb0	Fix in rarestFirst implementation. If there is more than one rarest block, pick randomly between them (was deterministic before)	2011-02-10 20:37:44 -08:00
Mosharaf Chowdhury	520bbdc7e3	Peers now gossip about their neighbors when they talk.	2011-02-10 20:15:30 -08:00
Matei Zaharia	dc24aecd8f	Close record readers in HadoopFile after finishing a split	2011-02-10 12:07:48 -08:00
Mosharaf Chowdhury	441462bc7f	Fixed some warnings during compilation.	2011-02-09 12:11:43 -08:00
Mosharaf Chowdhury	1a73c0d265	Merged with master. Using sbt.	2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury	495b38658e	Merge branch 'master' into mos-bt	2011-02-09 10:40:23 -08:00
Matei Zaharia	99f3f23efa	Changed default shuffle to LocalFileShuffle because it's way faster for small files	2011-02-08 17:03:03 -08:00
Matei Zaharia	ec28b607fd	Merge branch 'master' into sbt Conflicts: Makefile core/src/main/java/spark/compress/lzf/LZF.java core/src/main/java/spark/compress/lzf/LZFInputStream.java core/src/main/java/spark/compress/lzf/LZFOutputStream.java core/src/main/native/spark_compress_lzf_LZF.c run	2011-02-02 00:25:54 -08:00
Matei Zaharia	e5c4cd8a5e	Made examples and core subprojects	2011-02-01 15:11:08 -08:00

4461 commits