ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	dc28a3ac0a	Modified shuffle to limit the maximum outstanding data size in bytes, instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.	2012-10-06 20:07:10 -07:00
Matei Zaharia	9a3b3f32a3	Pass sizes of map outputs back to MapOutputTracker	2012-10-06 18:46:04 -07:00
Matei Zaharia	0e42832e6a	Made block store return the size of each block put in	2012-10-06 18:00:53 -07:00
Matei Zaharia	b0110de5b6	Warn about user programs that try to set spark.cache.class	2012-10-06 17:27:14 -07:00
Matei Zaharia	65113b7e1b	Only group elements ten at a time into SequenceFile records in saveAsObjectFile	2012-10-06 17:14:41 -07:00
Matei Zaharia	716e10ca32	Minor formatting fixes	2012-10-05 22:03:06 -07:00
Matei Zaharia	70f02fa912	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-05 22:00:22 -07:00
Andy Konwinski	a242cdd0a6	Factor subclasses of RDD out of RDD.scala into their own classes in the rdd package.	2012-10-05 19:53:54 -07:00
Andy Konwinski	d7363a6b8a	Moves all files in core/src/main/scala/ that have RDD in their name from that directory to a new core/src/main/scala/rdd directory.	2012-10-05 19:23:45 -07:00
Andy Konwinski	e0067da082	Moves all files in core/src/main/scala/ that have RDD in them from package spark to package spark.rdd and updates all references to them.	2012-10-05 19:23:45 -07:00
Matei Zaharia	69588baf65	Cleaning up code slightly	2012-10-05 19:16:09 -07:00
root	f52bc09a34	Reduce some overly aggressive logging in connection manager	2012-10-06 01:54:39 +00:00
Matei Zaharia	e3ae98b54e	Merge pull request #247 from squito/dev Dev	2012-10-05 10:27:18 -07:00
Imran Rashid	e0698f8f26	change tests to show utility of localValue	2012-10-04 23:05:42 -07:00
Imran Rashid	82a3327862	make accumulator.localValue public, add tests Conflicts: core/src/test/scala/spark/AccumulatorSuite.scala	2012-10-04 23:05:01 -07:00
Matei Zaharia	8c82f43db3	Scaladoc documentation for some core Spark functionality	2012-10-04 22:59:36 -07:00
Reynold Xin	45f4b7cc7e	Made Serializer and JavaSerializer non private.	2012-10-03 10:20:59 -07:00
Matei Zaharia	833f1d0c86	Made StorageLevel public	2012-10-03 08:27:25 -07:00
Matei Zaharia	6cf5dffc72	Make more stuff private[spark]	2012-10-02 22:28:55 -07:00
Mosharaf Chowdhury	119e50c7b9	Conflict fixed	2012-10-02 22:25:39 -07:00
Matei Zaharia	626f701931	Merge pull request #240 from dennybritz/private_classes Package-Private Classes	2012-10-02 21:24:32 -07:00
Denny	0361353a70	Make Java API abstract wrapped functions private	2012-10-02 20:02:53 -07:00
Denny	b9badcd5bd	accidentally removed trait	2012-10-02 19:35:07 -07:00
Denny	18a1faedf6	Stylistic changes and Public Accumulable and Broadcast	2012-10-02 19:28:37 -07:00
Denny	b7a913e1fa	Make dependency classes public - used by spark	2012-10-02 19:04:23 -07:00
Denny	4d9f4b01af	Make classes package private	2012-10-02 19:00:19 -07:00
Matei Zaharia	97cbd699d7	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-02 17:31:01 -07:00
Matei Zaharia	6098f7e87a	Fixed cache replacement behavior of BlockManager: - Partitions that get dropped to disk will now be loaded back into RAM after they're accessed again - Same-RDD rule for cache replacement is now implemented (don't drop partitions from an RDD to make room for other partitions from itself) - Items stored as MEMORY_AND_DISK go into memory only first, instead of being eagerly written out to disk - MemoryStore.ensureFreeSpace is called within a lock on the writer thread to prevent race conditions (this can still be optimized to allow multiple concurrent calls to it but it's a start) - MemoryStore does not accept blocks larger than its limit	2012-10-02 17:25:38 -07:00
Reynold Xin	7997585616	Added a check to make sure SPARK_MEM <= memoryPerSlave for local cluster mode.	2012-10-02 15:45:25 -07:00
Reynold Xin	0898a21b95	Merge branch 'dev' of https://github.com/mesos/spark into dev	2012-10-02 13:08:01 -07:00
Matei Zaharia	22684653a5	Revert "Place Spray repo ahead of Cloudera in Maven search path" This reverts commit `42e0a68082`.	2012-10-02 12:01:32 -07:00
Reynold Xin	b8cd681169	Allow whitespaces in cluster URL configuration for local cluster.	2012-10-02 11:52:12 -07:00
Matei Zaharia	42e0a68082	Place Spray repo ahead of Cloudera in Maven search path	2012-10-02 11:37:19 -07:00
Matei Zaharia	b9fb8d6463	Include date in folder name for Spark local dir.	2012-10-01 15:55:16 -07:00
Matei Zaharia	bc881e4798	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-01 15:21:56 -07:00
Matei Zaharia	802aa8aef9	Some bug fixes and logging fixes for broadcast.	2012-10-01 15:20:42 -07:00
Reynold Xin	f264153162	Fixed #232 : DirectBuffer's cleaner was empty and Spark tried to invoke clean on it.	2012-10-01 14:07:34 -07:00
Matei Zaharia	3b348f909d	Improve log messages from BlockManager	2012-10-01 12:01:38 -07:00
Matei Zaharia	53f90d0f0e	Use underscores instead of colons in RDD IDs	2012-10-01 10:48:53 -07:00
Matei Zaharia	2314132d57	Added a (failing) test for LRU with MEMORY_AND_DISK.	2012-09-30 22:52:16 -07:00
Matei Zaharia	3128c57f90	Simplified Class / ClassLoader test	2012-09-30 21:48:27 -07:00
Matei Zaharia	83143f9a5f	Fixed several bugs that caused weird behavior with files in spark-shell: - SizeEstimator was following through a ClassLoader field of Hadoop JobConfs, which referenced the whole interpreter, Scala compiler, etc. Chaos ensued, giving an estimated size in the tens of gigabytes. - Broadcast variables in local mode were only stored as MEMORY_ONLY and never made accessible over a server, so they fell out of the cache when they were deemed too large and couldn't be reloaded.	2012-09-30 21:19:39 -07:00
Matei Zaharia	fd0374b9de	Comment	2012-09-29 21:43:06 -07:00
Matei Zaharia	5718cef2a4	Removed Logging trait from CoalescedRDD since we don't log anything	2012-09-29 21:40:43 -07:00
Matei Zaharia	143ef4f90d	Added a CoalescedRDD class for reducing the number of partitions in an RDD.	2012-09-29 21:30:52 -07:00
Matei Zaharia	ebd52347b5	Merge branch 'dev' of github.com:mesos/spark into dev	2012-09-29 20:22:31 -07:00
Matei Zaharia	9b326d01e9	Made BlockManager unmap memory-mapped files when necessary to reduce the number of open files. Also optimized sending of disk-based blocks.	2012-09-29 20:21:54 -07:00
Matei Zaharia	2f11e3c285	Merge pull request #227 from JoshRosen/fix/distinct_numsplits Allow controlling number of splits in distinct().	2012-09-28 23:57:24 -07:00
Josh Rosen	8654165e69	Use null as dummy value in distinct().	2012-09-28 23:55:17 -07:00
Josh Rosen	37c199bbb0	Allow controlling number of splits in distinct().	2012-09-28 23:44:19 -07:00
Matei Zaharia	56dcad5936	Don't create a Cache in SparkEnv because we don't use it	2012-09-28 23:40:56 -07:00
Matei Zaharia	1d44644f4f	Logging tweaks	2012-09-28 23:28:16 -07:00
Matei Zaharia	815d6bd69a	Renamed subdirs option	2012-09-28 19:02:41 -07:00
Matei Zaharia	e54e1d7043	Made subdirs per local dir configurable, and reduced lock usage a bit	2012-09-28 19:00:50 -07:00
Matei Zaharia	ae8c7d6cfa	Made disk store use multiple directories, deleted ShuffleManager	2012-09-28 18:28:13 -07:00
Matei Zaharia	3d7267999d	Print and track user call sites in more places in Spark	2012-09-28 17:42:00 -07:00
Matei Zaharia	9f6efbf06a	Merge pull request #225 from pwendell/dev Log message which records RDD origin	2012-09-28 16:28:07 -07:00
Matei Zaharia	0121a26bd1	Changed the way tasks' dependency files are sent to workers so that custom serializers or Kryo registrators can be loaded.	2012-09-28 16:14:05 -07:00
Patrick Wendell	9fc78f8f29	Fixing some whitespace issues	2012-09-28 16:05:50 -07:00
Patrick Wendell	bc909c2903	Changes based on Matei's comments	2012-09-28 16:04:36 -07:00
Patrick Wendell	c387e40fb1	Log message which records RDD origin This adds tracking to determine the "origin" of an RDD. Origin is defined by the boundary between the user's code and the spark code, during an RDD's instantiation. It is meant to help users understand where a Spark RDD is coming from in their code. This patch also logs origin data when stages are submitted to the scheduler. Finally, it adds a new log message to fix an inconsistency in the way that dependent stages (those missing parents) and independent stages (those without) are logged during submission.	2012-09-28 15:51:46 -07:00
Matei Zaharia	2a8bfbca00	Fixed a bug where isLocal was set to false when using local[K]	2012-09-28 14:50:54 -07:00
Matei Zaharia	4a138403ef	Fix a bug in JAR fetcher that made it always fetch the JAR	2012-09-27 21:32:06 -07:00
Matei Zaharia	009b0e37e7	Added an option to compress blocks in the block store	2012-09-27 18:45:44 -07:00
Matei Zaharia	7bcb08cef5	Renamed storage levels to something cleaner; fixes #223 .	2012-09-27 17:50:59 -07:00
Matei Zaharia	920fab23c3	Merge pull request #222 from rxin/dev Added MapPartitionsWithSplitRDD.	2012-09-26 23:16:45 -07:00
Matei Zaharia	ea05fc130b	Updates to standalone cluster, web UI and deploy docs.	2012-09-26 22:54:39 -07:00
Matei Zaharia	1ef4f0fbd2	Allow controlling number of splits in sortByKey.	2012-09-26 19:18:47 -07:00
Reynold Xin	1ad1331a34	Added MapPartitionsWithSplitRDD.	2012-09-26 17:11:28 -07:00
Matei Zaharia	ee71fa49c1	Look for Kryo registrator using context class loader	2012-09-26 14:15:16 -07:00
Matei Zaharia	d71a358c46	Fixed a test that was getting extremely lucky before, and increased the number of samples used for sorting	2012-09-26 00:25:34 -07:00
Matei Zaharia	051785c7e6	Several fixes to sampling issues pointed out by Henry Milner: - takeSample was biased towards earlier partitions - There were some range errors in takeSample - SampledRDDs with replacement didn't produce appropriate counts across partitions (we took exactly frac of each one)	2012-09-25 21:46:58 -07:00
Matei Zaharia	4d3339a3ec	Merge pull request #217 from rxin/dev Added a method to RDD to expose the ClassManifest.	2012-09-24 23:52:32 -07:00
Reynold Xin	7a4cd92861	Renamed RDD.manifest to RDD.elementClassManifest	2012-09-24 23:42:33 -07:00
Matei Zaharia	296e24b440	Merge pull request #218 from rnpandya/dev Scripts to start Spark under windows	2012-09-24 21:10:31 -07:00
Reynold Xin	348bcbca1f	Added a method to RDD to expose the ClassManifest.	2012-09-24 16:56:27 -07:00
Ravi Pandya	39215357af	Windows command scripts for sbt and run	2012-09-24 15:43:19 -07:00
Matei Zaharia	6eeb379cf8	Fix some test issues	2012-09-24 15:39:58 -07:00
Matei Zaharia	f855e4fad2	Merge pull request #208 from rxin/dev Separated ShuffledRDD into multiple classes.	2012-09-24 12:32:01 -07:00
root	107a5ca879	Make default number of parallel fetches slightly smaller since it doesn't seem to hurt performance much and it will cause slightly less GC.	2012-09-23 06:06:12 +00:00
root	e41cab04ca	Avoid creating an extra buffer when saving a stream of values as DISK_ONLY	2012-09-23 05:56:44 +00:00
Denny	afb7ccc838	HTTP File server fixes.	2012-09-21 10:58:13 -07:00
root	6d28dde370	Rename our toIterator method into asIterator to prevent confusion with the Scala collection one, which often copies a collection.	2012-09-21 06:02:55 +00:00
root	a642051ade	Fixed a performance bug in BlockManager that was creating garbage when returning deserialized, in-memory RDDs.	2012-09-21 05:42:21 +00:00
root	8feb5caacd	Fixed an issue with ordering of classloader setup that was causing Java deserializer to break	2012-09-21 05:13:19 +00:00
Reynold Xin	6b5980da79	Set a limited number of retry in standalone deploy mode.	2012-09-19 15:41:56 -07:00
Reynold Xin	397d3816e1	Separated ShuffledRDD into multiple classes: RepartitionShuffledRDD, ShuffledSortedRDD, and ShuffledAggregatedRDD.	2012-09-19 12:31:45 -07:00
Denny	ca64d16a2d	When a file is downloaded, make it executable. That's necessary for scripts (e.g. in Shark)	2012-09-17 10:08:37 -07:00
Matei Zaharia	840cbcf849	Change default serializer to Java.. it had accidentally become Kryo.	2012-09-13 17:19:26 -07:00
Matei Zaharia	b4dfa25c8a	Store shuffle map outputs as DISK_ONLY	2012-09-12 16:05:57 -07:00
Matei Zaharia	2d761e3353	Ported performance and FT improvements from latest streaming work	2012-09-12 14:54:40 -07:00
Matei Zaharia	9b4cd1648b	Fix bugs with Connection's shutdown callback failing to get its address	2012-09-12 14:54:14 -07:00
Matei Zaharia	9199775d41	Wait for Akka to really shut down in SparkEnv.stop()	2012-09-12 14:50:37 -07:00
Denny	5e4076e3f2	Merge branch 'dev' into feature/fileserver Conflicts: core/src/main/scala/spark/SparkContext.scala	2012-09-11 16:57:17 -07:00
Denny	77873d2c8e	Formatting	2012-09-11 16:51:46 -07:00
Denny	24b9b37314	Subclass URLClassLoader instead of using reflection	2012-09-11 16:51:08 -07:00
Denny	31c53e917d	Use stageId as index for fileSet caches.	2012-09-11 16:10:45 -07:00
Matei Zaharia	943df48348	Merge branch 'dev' of github.com:mesos/spark into dev	2012-09-11 16:00:37 -07:00
Matei Zaharia	6d7f907e73	Manually merge pull request #175 by Imran Rashid	2012-09-11 16:00:06 -07:00
Reynold Xin	7af7c79ce5	Updated the logError call from the previous commit to conform to logError API.	2012-09-11 14:32:24 -07:00
Reynold Xin	38b9119c96	Log entire exception (including stack trace) in BlockManagerWorker.	2012-09-11 11:31:35 -07:00
Denny	4d3471dd07	Fix serialization bugs and added local cluster tests	2012-09-10 15:39:58 -07:00
Denny	b864c36a30	Dynamically adding jar files and caching fileSets.	2012-09-10 12:49:09 -07:00
Denny	f275fb07da	General FileServer A general fileserver for both JARs and regular files.	2012-09-10 12:48:59 -07:00
Matei Zaharia	a13780670d	Added a unit test for local-cluster mode and simplified some of the code involved in that	2012-09-10 12:48:58 -07:00
Denny	f2ac55840c	Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend	2012-09-10 12:48:58 -07:00
Denny	9ead8ab14e	Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner	2012-09-10 12:48:58 -07:00
Denny	8bb3c73977	Renamed spark-cluster to spark-local.	2012-09-10 12:48:58 -07:00
Denny	a367c20f49	Fix wrong counting	2012-09-10 12:48:57 -07:00
Denny	93fe331e6d	Delete old DeployUtils.	2012-09-10 12:48:57 -07:00
Denny	cf074f9c96	Renamed class.	2012-09-10 12:48:57 -07:00
Denny	3749f94184	Start a standalone cluster locally.	2012-09-10 12:48:57 -07:00
Matei Zaharia	995982b3c9	Added a unit test for local-cluster mode and simplified some of the code involved in that	2012-09-07 17:08:36 -07:00
Matei Zaharia	8d2fcc2832	Merge pull request #189 from dennybritz/feature/localcluster Simulating a Spark standalone cluster locally	2012-09-07 15:43:43 -07:00
Denny	7ff9311add	Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend	2012-09-07 14:09:12 -07:00
Denny	4e7b264cf7	Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner	2012-09-07 11:39:44 -07:00
root	c2da64409a	Randomize the order of block fetches in getMultiple	2012-09-06 23:16:26 +00:00
Denny	886183e591	Renamed spark-cluster to spark-local.	2012-09-05 17:10:54 -07:00
Reynold Xin	c308fbcb79	Removed cache add/remove log messages from CacheTracker. Added log messages on BlockManagerMaster to reflect block add/remove. Also did some minor cleanup of storage package code.	2012-09-05 15:59:48 -07:00
Denny	babbca0a2f	Fix wrong counting	2012-09-04 22:04:18 -07:00
Denny	9326509f66	Delete old DeployUtils.	2012-09-04 21:15:23 -07:00
Denny	1588d4dbe6	Renamed class.	2012-09-04 21:13:25 -07:00
Denny	22dde6e020	Start a standalone cluster locally.	2012-09-04 20:56:30 -07:00
Matei Zaharia	a842c63044	Minor formatting fixes	2012-09-03 16:24:00 -07:00
Harvey	3076b038f4	Start fetching a remote block when a received remote block has been passed to the reduce function	2012-09-01 12:01:35 -07:00
Matei Zaharia	389fb4cc54	End runJob() with a SparkException when a task fails too many times in one of the cluster schedulers.	2012-08-31 17:47:43 -07:00
Mosharaf Chowdhury	31ffe8d528	Synchronization bug fix in broadcast implementations	2012-08-30 22:26:43 -07:00
Mosharaf Chowdhury	3883532545	Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations.	2012-08-30 21:43:00 -07:00
Matei Zaharia	a480dec6b2	Deserialize multi-get results in the caller's thread. This fixes an issue with shared buffers in the KryoSerializer.	2012-08-30 20:01:06 -07:00
Reynold Xin	5945bcdcc5	Added a new flag in Aggregator to indicate applying map side combiners.	2012-08-29 23:32:08 -07:00
Reynold Xin	c68e820b2a	Merge branch 'dev' of github.com:mesos/spark into dev	2012-08-29 23:01:19 -07:00
Reynold Xin	940869dfda	Disable running combiners on map tasks when mergeCombiners function is not specified by the user.	2012-08-29 23:00:02 -07:00
Matei Zaharia	bf2e9cb08e	Fault tolerance and block store fixes discovered through streaming tests.	2012-08-27 23:07:50 -07:00
Reynold Xin	3a6a95dc24	Removed the deserialization cache for ShuffleMapTask because it was causing concurrency problems (some variables in Shark get set to null). The cost of task deserialization on slaves is trivial compared with the execution time of the task anyway.	2012-08-27 22:33:15 -07:00
Matei Zaharia	deedb9e7b7	Fix further issues with tests and broadcast. The broadcast fix is to store values as MEMORY_ONLY_DESER instead of MEMORY_ONLY, which will save substantial time on serialization.	2012-08-23 20:31:49 -07:00
Matei Zaharia	59b831b9d1	Fixed test failures due to broadcast not stopping correctly	2012-08-23 19:59:55 -07:00
Matei Zaharia	7310a6f499	Merge pull request #147 from mosharaf/dev Broadcast refactoring/cleaning up	2012-08-23 19:38:28 -07:00
Matei Zaharia	25a6a39e6d	Added other SparkContext constructors to JavaSparkContext	2012-08-19 18:59:16 -07:00
Shivaram Venkataraman	1ea269110c	Move object size and pointer size initialization into a function to enable unit-testing	2012-08-13 13:31:45 -07:00
Shivaram Venkataraman	44661df9cc	If spark.test.useCompressedOops is set, use that to infer compressed oops setting. This is useful to get a deterministic test case	2012-08-13 13:31:39 -07:00
Shivaram Venkataraman	0dd8fe73ba	Use HotSpotDiagnosticMXBean to get if CompressedOops are in use or not	2012-08-13 13:31:29 -07:00
Shivaram Venkataraman	80104ce1da	Add link to Java wiki which specifies what changes with compressed oops	2012-08-13 13:31:21 -07:00
Shivaram Venkataraman	00ab5490b3	Changes to make size estimator more accurate. Fixes object size, pointer size according to architecture and also aligns objects and arrays when computing instance sizes. Verified using Eclipse Memory Analysis Tool (MAT)	2012-08-13 13:31:11 -07:00
Matei Zaharia	6ae3c375a9	Renamed apply() to call() in Java API and allowed it to throw Exceptions	2012-08-12 23:10:19 +02:00
Matei Zaharia	0141879c40	Use Promises instead of having a Future wait on a thread in ConnectionManager.	2012-08-12 22:16:32 +02:00
Matei Zaharia	845a870242	Return remotely fetched blocks in a pipelined fashion from BlockManager	2012-08-12 20:01:38 +02:00
Matei Zaharia	e17ed9a21d	Switch to Akka futures in connection manager. It's still not good because each Future ends up waiting on a lock, but it seems to work better than Scala Actors, and more importantly it allows us to use onComplete and other listeners on futures.	2012-08-12 19:40:37 +02:00
Matei Zaharia	ad8a7612a4	Changed multi-get method in BlockManager to return an iterator	2012-08-12 19:18:01 +02:00
Matei Zaharia	3c94e5c188	Merge pull request #168 from shivaram/dev Use JavaConversion to get a scala iterator	2012-08-10 00:57:33 -07:00
Matei Zaharia	e463e7a333	Merge pull request #167 from JoshRosen/piped-rdd-fixes Detect non-zero exit status from PipedRDD process	2012-08-10 00:56:42 -07:00

604 commits