ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Ankur Dave	362b9422e4	Soften wording about GraphX superseding Bagel	2014-01-10 23:48:32 -08:00
Ankur Dave	2d7e8d8c48	Add GC note to GraphLab	2014-01-10 23:46:02 -08:00
Reynold Xin	92ad18b00e	Merge pull request #376 from prabeesh/master Change clientId to random clientId The client identifier should be unique across all clients connecting to the same server. A convenience method is provided to generate a random client id that should satisfy this criteria - generateClientId(). Returns a randomly generated client identifier based on the current user's login name and the system time. As the client identifier is used by the server to identify a client when it reconnects, the client must use the same identifier between connections if durable subscriptions are to be used.	2014-01-10 23:25:15 -08:00
Reynold Xin	0b5ce7af17	Merge pull request #386 from pwendell/typo-fix Small typo fix	2014-01-10 23:23:21 -08:00
Andrew Or	bb8098f203	Add number of bytes spilled to Web UI	2014-01-10 21:40:55 -08:00
Ankur Dave	a696be1e01	Finish `d1d2b6d9b6`	2014-01-10 21:18:34 -08:00
Ankur Dave	d1d2b6d9b6	Remove blank lines added to Spark core	2014-01-10 21:17:32 -08:00
Matei Zaharia	1d7bef0c91	Merge pull request #381 from mateiz/default-ttl Fix default TTL for metadata cleaner It seems to have been set to 3500 in a previous commit for debugging, but it should be off by default.	2014-01-10 18:53:03 -08:00
Ankur Dave	c4fb6a87d3	Fix scaladoc warnings	2014-01-10 18:36:42 -08:00
Andrew Or	e6447152b3	Induce spilling in ExternalAppendOnlyMapSuite	2014-01-10 18:33:48 -08:00
Ankur Dave	0ca18b8b07	Revert GraphX changes to SparkILoopInit The changes were to support a custom banner in spark-shell for use by graphx-shell, but once GraphX is merged into Spark, a separate shell will be unnecessary.	2014-01-10 18:05:11 -08:00
Ankur Dave	41d6586e8e	Revert changes to Spark's (PrimitiveKey)OpenHashMap; copy PKOHM to graphx	2014-01-10 18:00:54 -08:00
Patrick Wendell	44d6a8e3d8	Merge pull request #382 from RongGu/master Fix a type error in comment lines Fix a type error in comment lines	2014-01-10 17:51:50 -08:00
Patrick Wendell	08370a52b8	Small typo fix	2014-01-10 17:47:15 -08:00
Patrick Wendell	88faa30a42	Merge pull request #385 from shivaram/add-i2-instances Add i2 instance types to Spark EC2. Using data from http://aws.amazon.com/amazon-linux-ami/instance-type-matrix/ and http://www.ec2instances.info/	2014-01-10 17:14:22 -08:00
Matei Zaharia	942c80b34c	Fix one unit test that was not setting spark.cleaner.ttl	2014-01-10 16:32:36 -08:00
Patrick Wendell	f26553102c	Merge pull request #383 from tdas/driver-test API for automatic driver recovery for streaming programs and other bug fixes 1. Added Scala and Java API for automatically loading checkpoint if it exists in the provided checkpoint directory. Scala API: `StreamingContext.getOrCreate(<checkpoint dir>, <function to create new StreamingContext>)` returns a StreamingContext Java API: `JavaStreamingContext.getOrCreate(<checkpoint dir>, <factory obj of type JavaStreamingContextFactory>)`, returns a JavaStreamingContext See the RecoverableNetworkWordCount below as an example of how to use it. 2. Refactored streaming.Checkpoint*** code to fix bugs and make the DStream metadata checkpoint writing and reading more robust. Specifically, it fixes and improves the logic behind backing up and writing metadata checkpoint files. Also, it ensures that spark.driver.* and spark.hostPort are cleared from SparkConf before being written to checkpoint. 3. Fixed bug in cleaning up of checkpointed RDDs created by DStream. Specifically, this fix ensures that checkpointed RDD's files are not prematurely cleaned up, thus ensuring reliable recovery. 4. TimeStampedHashMap is upgraded to optionally update the timestamp on map.get(key). This allows clearing of data based on access time (i.e., clear records that were last accessed before a threshold timestamp). 5. Added caching for file modification time in FileInputDStream using the updated TimeStampedHashMap. Without the caching, enumerating the mod times to find new files can take seconds if there are 1000s of files. This cache is automatically cleared. This PR is not entirely final as I may make some minor additions - a Java example, and adding StreamingContext.getOrCreate to unit test. Edit: Java example to be added later, unit test added.	2014-01-10 16:25:44 -08:00
Patrick Wendell	d37408f39c	Merge pull request #377 from andrewor14/master External Sorting for Aggregator and CoGroupedRDDs (Revisited) (This pull request is re-opened from https://github.com/apache/incubator-spark/pull/303, which was closed because Jenkins / github was misbehaving) The target issue for this patch is the out-of-memory exceptions triggered by aggregate operations such as reduce, groupBy, join, and cogroup. The existing AppendOnlyMap used by these operations resides purely in memory, and grows with the size of the input data until the amount of allocated memory is exceeded. Under large workloads, this problem is aggravated by the fact that OOM frequently occurs only after a very long (> 1 hour) map phase, in which case the entire job must be restarted. The solution is to spill the contents of this map to disk once a certain memory threshold is exceeded. This functionality is provided by ExternalAppendOnlyMap, which additionally sorts this buffer before writing it out to disk, and later merges these buffers back in sorted order. Under normal circumstances in which OOM is not triggered, ExternalAppendOnlyMap is simply a wrapper around AppendOnlyMap and incurs little overhead. Only when the memory usage is expected to exceed the given threshold does ExternalAppendOnlyMap spill to disk.	2014-01-10 16:25:01 -08:00
Ankur Dave	85a6645d31	Add doc for Algorithms	2014-01-10 16:08:58 -08:00
Ankur Dave	04c20e7f4f	Minor cleanup to docs	2014-01-10 15:58:30 -08:00
Ankur Dave	1788729273	Move VertexIdToIndexMap into impl	2014-01-10 15:58:18 -08:00
Ankur Dave	57d7487d3d	Improve docs for VertexRDD	2014-01-10 15:48:20 -08:00
Tathagata Das	4f39e79c23	Merge remote-tracking branch 'apache/master' into driver-test Conflicts: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala	2014-01-10 15:47:01 -08:00
Andrew Or	2e393cd5fd	Update documentation for externalSorting	2014-01-10 15:45:38 -08:00
Tathagata Das	82f07deeda	Modified streaming.FailureSuite tests to test StreamingContext.getOrCreate.	2014-01-10 15:37:05 -08:00
Reynold Xin	0eaf01c5ed	Merge pull request #369 from pillis/master SPARK-961 Add a Vector.random() method Added method and testcases	2014-01-10 15:32:19 -08:00
Ankur Dave	11dd35c28b	Clean up GraphGenerators	2014-01-10 15:23:32 -08:00
Ankur Dave	9e48af6dba	Remove unused HashUtils class	2014-01-10 15:22:57 -08:00
Ankur Dave	b437ed62a8	graph -> graphx in pom.xml	2014-01-10 15:22:31 -08:00
Andrew Or	e4c51d2113	Address Patrick's and Reynold's comments Aside from trivial formatting changes, use nulls instead of Options for DiskMapIterator, and add documentation for spark.shuffle.externalSorting and spark.shuffle.memoryFraction. Also, set spark.shuffle.memoryFraction to 0.3, and spark.storage.memoryFraction = 0.6.	2014-01-10 15:09:51 -08:00
RongGu	94776f753f	fix a type error in comment lines	2014-01-11 05:43:56 +08:00
Thomas Graves	7cef8435d7	Merge pull request #371 from tgravescs/yarn_client_addjar_misc_fixes Yarn client addjar and misc fixes Fix the addJar functionality in yarn-client mode, add support for the other options supported in yarn-standalone mode, set the application type on yarn in hadoop 2.X, add documentation, change heartbeat interval to be same code as the yarn-standalone so it doesn't take so long to get containers and exit.	2014-01-10 15:34:15 -06:00
Ankur Dave	7bda997785	Improve docs for PartitionStrategy	2014-01-10 13:00:28 -08:00
Patrick Wendell	7b58f116e5	Merge pull request #384 from pwendell/debug-logs Make DEBUG-level logs consumable. Removes two things that caused issues with the debug logs: (a) Internal polling in the DAGScheduler was polluting the logs. (b) The Scala REPL logs were really noisy.	2014-01-10 12:47:46 -08:00
Ankur Dave	eb4b46f8d1	Improve docs for GraphOps	2014-01-10 12:46:00 -08:00
Shivaram Venkataraman	7c4e6e1bf1	Add i2 instance types to Spark EC2.	2014-01-10 12:44:55 -08:00
Ankur Dave	9454fa1f6c	Remove duplicate method in GraphLoader and improve docs	2014-01-10 12:37:20 -08:00
Ankur Dave	37611e57f6	Improve docs for EdgeRDD, EdgeTriplet, and GraphLab	2014-01-10 12:37:03 -08:00
Ankur Dave	eee9bc0958	Remove commented-out perf files	2014-01-10 12:36:15 -08:00
Ankur Dave	c39ec3017f	Remove some commented code	2014-01-10 12:17:17 -08:00
Tathagata Das	e4bb845238	Updated docs based on Patrick's comments in PR 383.	2014-01-10 12:17:09 -08:00
Ankur Dave	5fcd2a61b4	Finish cleaning up Graph docs	2014-01-10 12:17:04 -08:00
Ankur Dave	4c114a7556	Start cleaning up Scaladocs in Graph and EdgeRDD	2014-01-10 11:37:54 -08:00
Ankur Dave	3eb83191cb	Generate GraphX docs	2014-01-10 11:37:28 -08:00
Ankur Dave	6bd9a78e78	Add back Bagel links to docs, but mark them superseded	2014-01-10 11:37:10 -08:00
Ankur Dave	cfc10c74a3	Remove EdgeTriplet.{src,dst}Stale, which were unused	2014-01-10 10:43:23 -08:00
Ankur Dave	bf50e8c6cd	Remove commented code from Analytics	2014-01-10 10:37:04 -08:00
Ankur Dave	1b2aad918c	Update graphx/pom.xml to mirror mllib/pom.xml	2014-01-10 10:34:40 -08:00
Patrick Wendell	e9ed2d9e82	Make DEBUG-level logs consumable. Removes two things that caused issues with the debug logs: (a) Internal polling in the DAGScheduler was polluting the logs. (b) The Scala REPL logs were really noisy.	2014-01-10 10:33:24 -08:00
Ankur Dave	23d2995116	Merge pull request #1 from jegonzal/graphx ProgrammingGuide	2014-01-10 10:20:02 -08:00

6166 commits