ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Ankur Dave	59e4384e19	Fix Pregel SSSP example in programming guide	2014-01-13 21:02:38 -08:00
Ankur Dave	c6023bee60	Fix infinite loop in GraphGenerators.generateRandomEdges The loop occurred when numEdges < numVertices. This commit fixes it by allowing generateRandomEdges to generate a multigraph.	2014-01-13 21:02:37 -08:00
Ankur Dave	84d6af8021	Make Graph{,Impl,Ops} serializable to work around capture	2014-01-13 21:02:37 -08:00
Ankur Dave	d4d9ece1af	Remove Graph.statistics and GraphImpl.printLineage	2014-01-13 21:02:37 -08:00
Andrew Or	839934140f	Wording changes per Patrick	2014-01-13 20:51:38 -08:00
Matei Zaharia	cc93c2abb1	Disable MLlib tests for now while Jenkins is still on Python 2.6	2014-01-13 20:46:46 -08:00
Patrick Wendell	b07bc02a00	Merge pull request #412 from harveyfeng/master Add default value for HadoopRDD's `cloneRecords` constructor arg Small mend to https://github.com/apache/incubator-spark/pull/359/files#diff-1 for backwards compatibility	2014-01-13 20:45:22 -08:00
Reynold Xin	33022d6656	Adjusted visibility of various components.	2014-01-13 19:58:53 -08:00
Patrick Wendell	a2fee38ee0	Merge pull request #411 from tdas/filestream-fix Improved logic of finding new files in FileInputDStream Earlier, if HDFS had a hiccup and reported the existence of a new file (mod time T sec) at time T + 1 sec, then fileStream could have missed that file. With this change, it should be able to find files that are delayed by up to <batch size> seconds. That is, even if a file is reported at T + <batch time> sec, the file stream should be able to catch it. The new logic, at a high level, is as follows. It keeps track of the new files it found in the previous interval and the mod time of the oldest of those files (let's call it X). Then, in the current interval, it ignores files that were seen in the previous interval and files whose mod time is older than X. So if a new file is reported by HDFS in the current interval but has a mod time in the previous interval, it will be considered. However, if its mod time is earlier than the previous interval (that is, earlier than X), it will be ignored. This is the current limitation, and a future version should improve this behavior further. Also reduced line lengths in DStream to <=100 chars.	2014-01-13 19:45:26 -08:00
Harvey	9e84e70509	Add default value for HadoopRDD's `cloneRecords` constructor arg, to maintain backwards compatibility.	2014-01-13 19:43:40 -08:00
Joseph E. Gonzalez	ee8931d2c6	Finished documenting VertexRDD.	2014-01-13 19:30:35 -08:00
Patrick Wendell	d4cd5debf4	Fix for Kryo Serializer	2014-01-13 19:03:59 -08:00
Reynold Xin	0fbc0b0561	Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx	2014-01-13 18:51:22 -08:00
Reynold Xin	0b18bfba1a	Updated doc for PageRank.	2014-01-13 18:51:04 -08:00
Reynold Xin	9317286b72	More cleanup.	2014-01-13 18:45:35 -08:00
Reynold Xin	8e5c732430	Moved SVDPlusPlusConf into SVDPlusPlus object itself.	2014-01-13 18:45:20 -08:00
Raymond Liu	4c22c55ad6	Address comments to fix code formats	2014-01-14 10:41:42 +08:00
Joseph E. Gonzalez	552de5d42e	Finished second pass on pregel docs.	2014-01-13 18:40:43 -08:00
Joseph E. Gonzalez	622b7f7d39	Minor changes in graphx programming guide.	2014-01-13 18:40:43 -08:00
Raymond Liu	161ab93989	Yarn workerRunnable refactor	2014-01-14 10:36:00 +08:00
Raymond Liu	79a5ba3497	Yarn Client refactor	2014-01-14 10:33:48 +08:00
Reynold Xin	1dce9ce446	Moved PartitionStrategy's into an object.	2014-01-13 18:32:04 -08:00
Reynold Xin	ae06d2c22f	Updated GraphGenerator.	2014-01-13 18:31:49 -08:00
Reynold Xin	87f335db78	Made more things private.	2014-01-13 18:30:26 -08:00
Reynold Xin	a4e12af7aa	Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx Conflicts: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala	2014-01-13 17:42:59 -08:00
Reynold Xin	02a8f54bfa	Miscellaneous doc update.	2014-01-13 17:40:36 -08:00
Tathagata Das	1233b3de01	Merge remote-tracking branch 'apache/master' into filestream-fix	2014-01-13 17:29:19 -08:00
Joseph E. Gonzalez	cfe4a29dcb	Improvements in example code for the programming guide as well as adding serialization support for GraphImpl to address issues with failed closure capture.	2014-01-13 17:18:31 -08:00
Ankur Dave	ae4b75d94a	Add EdgeDirection.Either and use it to fix CC bug The bug was due to a misunderstanding of the activeSetOpt parameter to Graph.mapReduceTriplets. Passing EdgeDirection.Both causes mapReduceTriplets to run only on edges with both vertices in the active set. This commit adds EdgeDirection.Either, which causes mapReduceTriplets to run on edges with either vertex in the active set. This is what connected components needed.	2014-01-13 17:03:03 -08:00
Ankur Dave	1bd5cefcae	Remove aggregateNeighbors	2014-01-13 17:03:03 -08:00
Tathagata Das	c0bb38e8aa	Improved file input stream further.	2014-01-13 16:54:52 -08:00
Reynold Xin	dc041cd3b6	Merge branch 'scaladoc1' of github.com:rxin/incubator-spark into graphx	2014-01-13 16:25:21 -08:00
Reynold Xin	01c0d72b32	Merge pull request #410 from rxin/scaladoc1 Updated JavaStreamingContext to make scaladoc compile. `sbt/sbt doc` used to fail. This fixed it.	2014-01-13 16:24:30 -08:00
Reynold Xin	e2d25d2dfe	Merge branch 'master' into graphx	2014-01-13 16:21:26 -08:00
Reynold Xin	30328c347b	Updated JavaStreamingContext to make scaladoc compile. `sbt/sbt doc` used to fail. This fixed it.	2014-01-13 15:58:39 -08:00
Ankur Dave	8038da2328	Merge pull request #2 from jegonzal/GraphXCCIssue Improving documentation and identifying potential bug in CC calculation.	2014-01-13 14:59:30 -08:00
Tathagata Das	27311b1332	Added unpersisting and modified testsuite to better test out metadata cleaning.	2014-01-13 14:57:07 -08:00
Ankur Dave	97cd27e31b	Add graph loader links to doc	2014-01-13 14:54:48 -08:00
Ankur Dave	15ca89b11e	Fix mapReduceTriplets links in doc	2014-01-13 14:54:33 -08:00
Joseph E. Gonzalez	80e4d98dc6	Improving documentation and identifying potential bug in CC calculation.	2014-01-13 13:40:16 -08:00
Patrick Wendell	c3816de504	Changing option wording per discussion with Andrew	2014-01-13 13:25:06 -08:00
Ankur Dave	9fe88627b5	Improve EdgeRDD scaladoc	2014-01-13 13:16:41 -08:00
Ankur Dave	ea69cff711	Further improve VertexRDD scaladocs	2014-01-13 12:52:52 -08:00
Patrick Wendell	5d61e051c2	Improvements to external sorting 1. Adds the option of compressing outputs. 2. Adds batching to the serialization to prevent OOM on the read side. 3. Slight renaming of config options. 4. Use Spark's buffer size for reads in addition to writes.	2014-01-13 12:21:39 -08:00
Patrick Wendell	b93f9d42f2	Merge pull request #400 from tdas/dstream-move Moved DStream and PairDStream to org.apache.spark.streaming.dstream Similar to the package location of `org.apache.spark.rdd.RDD`, `DStream` has been moved from `org.apache.spark.streaming.DStream` to `org.apache.spark.streaming.dstream.DStream`. I know that the package name is a little long, but I think it's better to keep it consistent with Spark's structure. Also fixed persistence of windowed DStream. The RDDs generated by a windowed DStream are essentially unions of underlying RDDs, and persisting these union RDDs would store numerous copies of the underlying data. Instead, setting the persistence level on the windowed DStream now sets the persistence level of the underlying DStream.	2014-01-13 12:18:05 -08:00
Ankur Dave	8ca9773974	Add LiveJournalPageRank example	2014-01-13 12:17:58 -08:00
Saurabh Rawat	e922973373	Modifications as suggested in PR feedback: mapPartitions and foreachPartition moved to JavaRDDLike; call Scala RDD's setGenerator instead of setting it directly in JavaRDD	2014-01-13 23:40:04 +05:30
eklavya	fa42951e3b	Remove default param from mapPartitions	2014-01-13 18:13:22 +05:30
eklavya	8fe562c0fa	Remove classtag from mapPartitions.	2014-01-13 18:09:58 +05:30
eklavya	6a65feebc7	Added foreachPartition method to JavaRDD.	2014-01-13 17:56:47 +05:30
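
Note on commit ae4b75d94a: the message distinguishes two ways of restricting mapReduceTriplets to an active vertex set. The following is a minimal Python sketch of that distinction only, not GraphX code. The function name `active_edges`, the string modes `"both"` and `"either"`, and the tuple representation of edges are all hypothetical and chosen for illustration.

```python
def active_edges(edges, active, mode):
    """Return the edges a triplet operation would visit.

    edges  -- iterable of (src, dst) vertex-id pairs (hypothetical representation)
    active -- set of vertex ids currently in the active set
    mode   -- "both": visit an edge only if both endpoints are active
              "either": visit an edge if at least one endpoint is active
    """
    if mode == "both":
        return [(s, d) for s, d in edges if s in active and d in active]
    if mode == "either":
        return [(s, d) for s, d in edges if s in active or d in active]
    raise ValueError(f"unknown mode: {mode!r}")


# A path graph 1 -> 2 -> 3 -> 4 in which only vertex 2 changed last round.
edges = [(1, 2), (2, 3), (3, 4)]
visited_both = active_edges(edges, {2}, "both")
visited_either = active_edges(edges, {2}, "either")
```

With a single active vertex, the "both" rule visits no edges at all, so the change at vertex 2 could never reach its neighbors. The "either" rule visits the two edges incident to vertex 2, which matches the behavior the commit message says connected components needed.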


6336 commits