ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	4f80dd22bd	Fixed a bug that variable encoding doesn't work for ints that use all 64 bits.	2013-12-05 16:19:37 -08:00
Ankur Dave	e0bcaa0942	Merge pull request #86 from ankurdave/vid-varenc Finish work on #85	2013-12-05 12:37:02 -08:00
Ankur Dave	6a7b396e5d	Finish work on #85	2013-12-05 12:35:03 -08:00
Reynold Xin	3e96b9ab1e	Merge pull request #85 from ankurdave/vid-varenc Always write Vids using variable encoding	2013-12-05 12:07:36 -08:00
Ankur Dave	a3bb98b88a	Always write Vids using variable encoding Also, autoformat Serializers.scala.	2013-12-05 12:06:07 -08:00
Joey	e0347ba6c7	Merge pull request #83 from ankurdave/fix-tests Fix compile errors in GraphSuite and SerializerSuite	2013-12-04 17:38:06 -08:00
Ankur Dave	2e583d2de4	Declare Vids explicitly to avoid ClassCastException	2013-12-04 17:34:14 -08:00
Ankur Dave	92e96f727e	Fix compile errors in GraphSuite and SerializerSuite	2013-12-04 17:29:52 -08:00
Reynold Xin	cbd3b7571f	Merge pull request #81 from amplab/clean1 Codebase refactoring	2013-12-04 15:35:26 -08:00
Reynold Xin	8701cb55e6	Use specialized shuffler for aggregation.	2013-12-01 21:55:50 -08:00
Reynold Xin	55edbb4209	Created an algorithms package and put all algorithms there.	2013-12-01 20:17:26 -08:00
Reynold Xin	e36fe55a03	Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion.	2013-11-30 18:07:36 -08:00
Reynold Xin	583a389e3f	Removed PartitionStrategy from GraphImpl.	2013-11-30 17:00:54 -08:00
Reynold Xin	6eeadb667d	Created EdgeRDD.	2013-11-30 16:53:54 -08:00
Reynold Xin	34ee81415e	Merged Ankur's pull request #80 and fixed subgraph.	2013-11-30 15:10:30 -08:00
Reynold Xin	8e790b7f7a	Merge branch 'subgraph-test' of github.com:ankurdave/graphx into clean1 Conflicts: graph/src/main/scala/org/apache/spark/graph/impl/VertexPartition.scala	2013-11-30 14:48:43 -08:00
Reynold Xin	229022891f	Made all VertexPartition internal data structures private.	2013-11-30 14:45:56 -08:00
Reynold Xin	b30e0ae035	Added an optimized count to VertexSetRDD.	2013-11-30 14:24:18 -08:00
Reynold Xin	689f757f7a	Merge branch 'clean1' of github.com:amplab/graphx into clean1	2013-11-30 14:16:06 -08:00
Reynold Xin	4d3d68b8fb	Minor update to tests.	2013-11-30 14:15:47 -08:00
Ankur Dave	3292cb0f9c	Revert "Fix join error by caching vTable in mapReduceTriplets" This reverts commit `dee1318d3d`, which is unnecessary due to `7528e6d5f1`.	2013-11-30 14:05:32 -08:00
Reynold Xin	e72bd91590	Merge branch 'clean1' of github.com:amplab/graphx into clean1	2013-11-30 14:04:45 -08:00
Reynold Xin	7528e6d5f1	Enable joining arbitrary VertexPartitions (with different indexes).	2013-11-30 14:04:16 -08:00
Ankur Dave	eed3195038	Fix VertexSetRDD test by enabling index reuse	2013-11-30 13:50:37 -08:00
Ankur Dave	dee1318d3d	Fix join error by caching vTable in mapReduceTriplets	2013-11-30 13:37:19 -08:00
Reynold Xin	10c0f9b0bb	Added a log4j properties file for graphx unit tests.	2013-11-30 13:18:43 -08:00
Reynold Xin	95e83af209	More, bigger cleaning for better encapsulation of VertexSetRDD and VertexPartition. This is work in progress as stuff doesn't really run.	2013-11-27 00:30:26 -08:00
Reynold Xin	caba162861	Added join and aggregateUsingIndex to VertexPartition.	2013-11-26 21:02:39 -08:00
Ankur Dave	9e896be375	Test edge filtering in subgraph (test fails)	2013-11-26 15:58:55 -08:00
Ankur Dave	137294e2ab	Test GraphImpl.subgraph and fix bug	2013-11-26 15:32:47 -08:00
Reynold Xin	2d19d0381b	Merge branch 'simplify' into clean	2013-11-26 13:55:26 -08:00
Reynold Xin	d58bfa8573	Code cleaning to improve readability.	2013-11-26 13:54:46 -08:00
Reynold Xin	d074e4c6ab	Bring PrimitiveVector up to date.	2013-11-26 02:49:41 -08:00
Reynold Xin	088995f917	Merge pull request #77 from amplab/upgrade Sync with Spark master	2013-11-25 00:57:51 -08:00
Reynold Xin	6bcac986b2	Merge branch 'master' of github.com:apache/incubator-spark	2013-11-25 15:47:47 +08:00
Reynold Xin	62889c419c	Merge pull request #203 from witgo/master Fix Maven build for metrics-graphite	2013-11-25 11:27:45 +08:00
LiGuoqiang	989203604e	Fix Maven build for metrics-graphite	2013-11-25 11:23:11 +08:00
Ankur Dave	6af03edcf1	Merge pull request #76 from dcrankshaw/fix_partitioners Actually use partitioner command line args in Analytics.	2013-11-24 16:42:37 -08:00
Dan Crankshaw	4b6b15dadd	Actually use partitioner command line args in Analytics.	2013-11-24 16:38:38 -08:00
Matei Zaharia	859d62dc2a	Merge pull request #151 from russellcardullo/add-graphite-sink Add graphite sink for metrics This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.	2013-11-24 16:19:51 -08:00
Matei Zaharia	65de73c7f8	Merge pull request #185 from mkolod/random-number-generator XORShift RNG with unit tests and benchmark This patch was introduced to address SPARK-950 - the discussion below the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950 To run unit test, start SBT console and type: compile test-only org.apache.spark.util.XORShiftRandomSuite To run benchmark, type: project core console Once the Scala console starts, type: org.apache.spark.util.XORShiftRandom.benchmark(100000000) XORShiftRandom is also an object with a main method taking the number of iterations as an argument, so you can also run it from the command line.	2013-11-24 15:52:33 -08:00
Reynold Xin	972171b9d9	Merge pull request #197 from aarondav/patrick-fix Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed.	2013-11-25 07:50:46 +08:00
Reynold Xin	a1a7e3627c	Merge pull request #75 from amplab/simplify Simplify GraphImpl internals	2013-11-24 05:15:09 -08:00
Reynold Xin	718cc803f7	Merge pull request #200 from mateiz/hash-fix AppendOnlyMap fixes - Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`) - Some other small optimizations throughout (see commit comments)	2013-11-24 11:02:02 +08:00
Matei Zaharia	9837a60234	Some other optimizations to AppendOnlyMap: - Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique - Remember the grow threshold instead of recomputing it on each insert	2013-11-23 17:38:29 -08:00
Matei Zaharia	7535d7fbcb	Fixes to AppendOnlyMap: - Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil. - Use Object.equals() instead of Scala's == to compare keys, because the latter does extra casts for numeric types (see the equals method in https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)	2013-11-23 17:21:37 -08:00
Reynold Xin	51aa9d6e99	Merge pull request #198 from ankurdave/zipPartitions-preservesPartitioning Support preservesPartitioning in RDD.zipPartitions In `RDD.zipPartitions`, add support for a `preservesPartitioning` option (similar to `RDD.mapPartitions`) that reuses the first RDD's partitioner.	2013-11-23 19:46:46 +08:00
Ankur Dave	c1507afc6c	Support preservesPartitioning in RDD.zipPartitions	2013-11-23 03:03:31 -08:00
Ankur Dave	fad6e70add	Simplify GraphImpl internals	2013-11-23 02:59:56 -08:00
Ankur Dave	ad56ae7bfd	Support preservesPartitioning in RDD.zipPartitions	2013-11-23 02:32:37 -08:00
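
The AppendOnlyMap commits listed above (`7535d7fbcb`, merged via `718cc803f7`) describe scrambling `hashCode` values with the Murmur Hash 3 finalization step so that consecutive integers do not chain in the table. The following is a minimal Java sketch of that general technique, not the code from those commits: the class name `HashScramble`, the method names, and the power-of-two table size are assumptions made for illustration.

```java
// Illustrative sketch only; names and table layout are hypothetical,
// not taken from Spark's AppendOnlyMap.
final class HashScramble {

    // MurmurHash3 32-bit finalization step (often called fmix32).
    // Each step (xor with a right-shifted copy, multiply by an odd constant)
    // is invertible, so distinct inputs always map to distinct outputs.
    static int murmur3Finalize(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Maps a scrambled hash to a slot; capacity is assumed to be a power of two.
    static int slot(Object key, int capacity) {
        return murmur3Finalize(key.hashCode()) & (capacity - 1);
    }

    public static void main(String[] args) {
        // Print where a few consecutive integer keys land in a 16-slot table.
        for (int k = 0; k < 8; k++) {
            System.out.println(k + " -> slot " + slot(k, 16));
        }
    }
}
```

The mask in `slot` only looks at the low bits of the hash, which is why the scrambling step matters: it spreads differences in any input bit across the low bits that select the slot.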

4947 commits
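
The most recent commits in this log (`a3bb98b88a`, `4f80dd22bd`) concern writing vertex ids with a variable-length encoding and a bug affecting values that use all 64 bits. The sketch below shows one common variable-length scheme (7 payload bits per byte, low-order bits first) in Java. It is an assumption-laden illustration, not the code in `Serializers.scala`: the class name `VarLong` and its methods are hypothetical. It shows why an unsigned shift matters for values with the top bit set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Illustrative sketch only; not the GraphX serializer implementation.
final class VarLong {

    // Writes value as 1 to 10 bytes: 7 payload bits per byte, low bits first,
    // with the high bit of each byte set when more bytes follow.
    static void write(OutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            // Unsigned shift: a signed shift (>>) would keep refilling the
            // top bits for negative values and the loop would never end.
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Reads a value written by write(); at most 10 bytes are consumed.
    static long read(InputStream in) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a varint");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("varint longer than 10 bytes");
    }

    // Convenience wrapper: encode to a byte array.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            write(out, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Convenience wrapper: decode from a byte array.
    static long decode(byte[] bytes) {
        try {
            return read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Small ids take a single byte under this scheme, while a value with the top bit set takes the full ten bytes, so the encoding pays off when most ids are small.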