Commit graph

7129 commits

Author SHA1 Message Date
Ankur Dave 11dd35c28b Clean up GraphGenerators 2014-01-10 15:23:32 -08:00
Ankur Dave 9e48af6dba Remove unused HashUtils class 2014-01-10 15:22:57 -08:00
Ankur Dave b437ed62a8 graph -> graphx in pom.xml 2014-01-10 15:22:31 -08:00
Andrew Or e4c51d2113 Address Patrick's and Reynold's comments
Aside from trivial formatting changes, use nulls instead of Options for
DiskMapIterator, and add documentation for spark.shuffle.externalSorting
and spark.shuffle.memoryFraction.

Also, set spark.shuffle.memoryFraction to 0.3 and spark.storage.memoryFraction to 0.6.
2014-01-10 15:09:51 -08:00
RongGu 94776f753f fix a type error in comment lines 2014-01-11 05:43:56 +08:00
Thomas Graves 7cef8435d7 Merge pull request #371 from tgravescs/yarn_client_addjar_misc_fixes
Yarn client addjar and misc fixes

Fix the addJar functionality in yarn-client mode, add support for the other options supported in yarn-standalone mode, set the application type on YARN in Hadoop 2.X, add documentation, and change the heartbeat interval to use the same code as yarn-standalone mode so it doesn't take so long to get containers and exit.
2014-01-10 15:34:15 -06:00
Ankur Dave 7bda997785 Improve docs for PartitionStrategy 2014-01-10 13:00:28 -08:00
Patrick Wendell 7b58f116e5 Merge pull request #384 from pwendell/debug-logs
Make DEBUG-level logs consumable.

Removes two things that caused issues with the debug logs:

(a) Internal polling in the DAGScheduler was polluting the logs.
(b) The Scala REPL logs were really noisy.
2014-01-10 12:47:46 -08:00
Ankur Dave eb4b46f8d1 Improve docs for GraphOps 2014-01-10 12:46:00 -08:00
Shivaram Venkataraman 7c4e6e1bf1 Add i2 instance types to Spark EC2. 2014-01-10 12:44:55 -08:00
Ankur Dave 9454fa1f6c Remove duplicate method in GraphLoader and improve docs 2014-01-10 12:37:20 -08:00
Ankur Dave 37611e57f6 Improve docs for EdgeRDD, EdgeTriplet, and GraphLab 2014-01-10 12:37:03 -08:00
Ankur Dave eee9bc0958 Remove commented-out perf files 2014-01-10 12:36:15 -08:00
Ankur Dave c39ec3017f Remove some commented code 2014-01-10 12:17:17 -08:00
Tathagata Das e4bb845238 Updated docs based on Patrick's comments in PR 383. 2014-01-10 12:17:09 -08:00
Ankur Dave 5fcd2a61b4 Finish cleaning up Graph docs 2014-01-10 12:17:04 -08:00
Ankur Dave 4c114a7556 Start cleaning up Scaladocs in Graph and EdgeRDD 2014-01-10 11:37:54 -08:00
Ankur Dave 3eb83191cb Generate GraphX docs 2014-01-10 11:37:28 -08:00
Ankur Dave 6bd9a78e78 Add back Bagel links to docs, but mark them superseded 2014-01-10 11:37:10 -08:00
Ankur Dave cfc10c74a3 Remove EdgeTriplet.{src,dst}Stale, which were unused 2014-01-10 10:43:23 -08:00
Ankur Dave bf50e8c6cd Remove commented code from Analytics 2014-01-10 10:37:04 -08:00
Ankur Dave 1b2aad918c Update graphx/pom.xml to mirror mllib/pom.xml 2014-01-10 10:34:40 -08:00
Patrick Wendell e9ed2d9e82 Make DEBUG-level logs consumable.
Removes two things that caused issues with the debug logs:

(a) Internal polling in the DAGScheduler was polluting the logs.
(b) The Scala REPL logs were really noisy.
2014-01-10 10:33:24 -08:00
Ankur Dave 23d2995116 Merge pull request #1 from jegonzal/graphx
ProgrammingGuide
2014-01-10 10:20:02 -08:00
Tathagata Das 2213a5a47f Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test 2014-01-10 05:06:22 -08:00
Tathagata Das 740730a179 Fixed conf/slaves and updated docs. 2014-01-10 05:06:15 -08:00
Tathagata Das 4f609f7901 Removed spark.hostPort and other settings from SparkConf before saving to checkpoint. 2014-01-10 12:58:07 +00:00
Tathagata Das d7ec73ac76 Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test 2014-01-10 11:44:17 +00:00
Tathagata Das 9d3d9c8251 Refactored graph checkpoint file reading and writing code to make it cleaner and easily debuggable. 2014-01-10 11:44:02 +00:00
Ankur Dave 729277ebc4 Undo 8b6b8ac87f
Getting unpersist right in GraphLab is tricky.
2014-01-10 01:53:28 -08:00
Ankur Dave 4cc550909a graph -> graphx in log4j.properties 2014-01-10 00:59:59 -08:00
Joseph E. Gonzalez b1eeefb401 WIP. Updating figures and cleaning up initial skeleton for GraphX Programming guide. 2014-01-10 00:39:08 -08:00
Ankur Dave ba511f890e Avoid recomputation by caching all multiply-used RDDs 2014-01-10 00:35:02 -08:00
Ankur Dave 8b6b8ac87f Unpersist previous iterations in GraphLab 2014-01-10 00:34:08 -08:00
Matei Zaharia 669ba4caa9 Fix default TTL for metadata cleaner
It seems to have been set to 3500 in a previous commit for debugging,
but it should be off by default
2014-01-10 00:21:36 -08:00
Pillis 8d021b42bc SPARK-961. Add a Vector.random() method - update 1 2014-01-10 00:07:36 -08:00
Matei Zaharia 0ebc97305a Merge pull request #375 from mateiz/option-fix
Fix bug added when we changed AppDescription.maxCores to an Option

The Scala compiler warned about this -- we were comparing an Option against an integer now.
2014-01-09 23:58:49 -08:00
Patrick Wendell dd03cea02a Merge pull request #378 from pwendell/consolidate_on
Enable shuffle consolidation by default.

Bump this to being enabled for 0.9.0.
2014-01-09 23:38:03 -08:00
Ankur Dave 2578332f97 Add Graph.unpersistVertices() 2014-01-09 23:34:35 -08:00
Ankur Dave 8ae108f6c4 Unpersist previous iterations in Pregel 2014-01-09 23:25:35 -08:00
Reza Zadeh 21c8a54c08 Merge remote-tracking branch 'upstream/master' into sparsesvd
Conflicts:
	docs/mllib-guide.md
2014-01-09 22:45:32 -08:00
Patrick Wendell 460f655cc6 Enable shuffle consolidation by default.
Bump this to being enabled for 0.9.0.
2014-01-09 22:42:50 -08:00
Reza Zadeh cf5bd4ab2e fix example 2014-01-09 22:39:41 -08:00
Patrick Wendell 997c830e0b Merge pull request #363 from pwendell/streaming-logs
Set default logging to WARN for Spark streaming examples.

This programmatically sets the log level to WARN by default for streaming
tests. If the user has already specified a log4j.properties file,
the user's file will take precedence over this default.
2014-01-09 22:22:20 -08:00
Andrew Or 372a533a6c Fix wonky imports from merge 2014-01-09 21:47:49 -08:00
Ankur Dave 210f2dd84f graph -> graphx in bin/compute-classpath.sh 2014-01-09 21:47:40 -08:00
Andrew Or aa5002bb96 Defensively allocate memory from global pool
This is an alternative to the existing approach, which evenly distributes the
collective shuffle memory among all running tasks. In the new approach, each
thread requests a chunk of memory whenever its map is about to multiplicatively
grow. If there is sufficient memory in the global pool, the thread allocates it
and grows its map. Otherwise, it spills.

A danger with the previous approach is that a new task may quickly fill up its
map before old tasks finish spilling, potentially causing an OOM. This approach
prevents this scenario as it favors existing tasks over new tasks; any thread
that may step over the boundary of other threads defensively backs off and
starts spilling.

Testing through spark-perf reveals: (1) When no spills have occurred, the
performance of external sorting using this memory management approach is
essentially the same as without external sorting. (2) When one or more spills
have occurred, the performance of external sorting is a small multiple (3x) worse
2014-01-09 21:43:58 -08:00
Andrew Or d76e1f90a8 Merge github.com:apache/incubator-spark
Conflicts:
	core/src/main/scala/org/apache/spark/SparkEnv.scala
	streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
2014-01-09 21:38:48 -08:00
Ankur Dave b7c92dded3 Add implicit algorithm methods for Graph; remove standalone PageRank 2014-01-09 20:44:28 -08:00
Patrick Wendell 7b748b83a1 Minor clean-up 2014-01-09 20:42:48 -08:00