Commit graph

4655 commits

Author SHA1 Message Date
Reynold Xin 944f6b8048 Merge pull request #43 from amplab/FixBitSetCastException
Fix BitSet cast exception
2013-10-31 09:40:35 -07:00
Dan Crankshaw c430d2e21d Added bitset to kryo register 2013-10-31 01:01:59 -07:00
Joseph E. Gonzalez a3ce484a2c Adding additional type constraints to VertexSetRDD to help diagnose issues with recent benchmarks. 2013-10-30 21:02:21 -07:00
Ankur Dave 5064f9b2d2 Merge remote-tracking branch 'spark-upstream/master'
Conflicts:
	project/SparkBuild.scala
2013-10-30 15:59:09 -07:00
Dan Crankshaw a0c86c3689 Merge pull request #38 from jegonzal/Documentation
Improving Documentation
2013-10-30 15:34:39 -07:00
Matei Zaharia 618c1f6cf3 Merge pull request #125 from velvia/2013-10/local-jar-uri
Add support for local:// URI scheme for addJars()

This PR adds support for a new URI scheme for SparkContext.addJars():  `local://file/path`.
The *local* scheme indicates that the `/file/path` exists on every worker node.    The reason for its existence is for big library JARs, which would be really expensive to serve using the standard HTTP fileserver distribution method, especially for big clusters.  Today the only inexpensive method (assuming such a file is on every host, via say NFS, rsync, etc.) of doing this is to add the JAR to the SPARK_CLASSPATH, but we want a method where the user does not need to modify the Spark configuration.

I would add something to the docs, but it's not obvious where to add it.

Oh, and it would be great if this could be merged in time for 0.8.1.
2013-10-30 12:03:44 -07:00
Evan Chan de0285556a Add support for local:// URI scheme for addJars()
This indicates that a jar is available locally on each worker node.
2013-10-30 09:41:35 -07:00
Matei Zaharia 745dc42908 Merge pull request #118 from JoshRosen/blockinfo-memory-usage
Reduce the memory footprint of BlockInfo objects

This pull request reduces the memory footprint of all BlockInfo objects and makes additional optimizations for shuffle blocks.  For all BlockInfo objects, these changes remove two boolean fields and one Object field.  For shuffle blocks, we additionally remove an Object field and a boolean field.

When storing tens of thousands of these objects, this may add up to significant memory savings.  A ShuffleBlockInfo now only needs to wrap a single long.

This was motivated by a [report of high blockInfo memory usage during shuffles](https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C20131026134353.202b2b9b%40sh9%3E).

I haven't run benchmarks to measure the exact memory savings.

/cc @aarondav
2013-10-29 23:47:10 -07:00
Joey 4f63b5e17f Adding code example 2013-10-29 21:31:12 -07:00
Joey 1a20ba9b70 Updating images so they render correctly. 2013-10-29 21:06:29 -07:00
Joseph E. Gonzalez 41b3122120 Strating to improve README. 2013-10-29 20:57:55 -07:00
Josh Rosen cb9c8a922f Extract BlockInfo classes from BlockManager.
This saves space, since the inner classes needed
to keep a reference to the enclosing BlockManager.
2013-10-29 18:06:51 -07:00
Joey 06adf636c5 Merge pull request #33 from kellrott/master
Fixing graph/pom.xml
2013-10-29 16:43:46 -07:00
Joseph E. Gonzalez 38ec0baf5c fixing a typo in the VertexSetRDD docs 2013-10-29 16:27:55 -07:00
Joseph E. Gonzalez d8c8256e52 merging upstream changes 2013-10-29 16:23:26 -07:00
Josh Rosen 846b1cf5ab Store fewer BlockInfo fields for shuffle blocks. 2013-10-29 15:14:29 -07:00
Ankur Dave 098768e0b9 Merge pull request #37 from jegonzal/AnalyticsCleanup
Updated Connected Components and Pregel Docs
2013-10-29 15:08:36 -07:00
Joseph E. Gonzalez 08c7b040d6 Documented the VertexSetRDD 2013-10-29 15:03:13 -07:00
Joseph E. Gonzalez ede329336d Fixing a scaladoc bug in graph generators. 2013-10-29 14:50:12 -07:00
Joseph E. Gonzalez 15958ca65a Reindenting documentation. 2013-10-29 14:01:24 -07:00
Joseph E. Gonzalez d316cad9b1 Documented Graph.appy functions. 2013-10-29 13:58:04 -07:00
Joseph E. Gonzalez 19da8820fc Minor modifications to documentation. 2013-10-29 11:06:06 -07:00
Joseph E. Gonzalez 77626d1507 Adding collect neighbors and documenting GraphOps. 2013-10-29 11:05:42 -07:00
Joseph E. Gonzalez 942de98433 Making suggested changes. 2013-10-29 10:19:49 -07:00
Reynold Xin f0e23a023c Merge pull request #119 from soulmachine/master
A little revise for the document
2013-10-29 01:41:44 -04:00
Joseph E. Gonzalez d6a902f309 Finished updating connected components to used Pregel like abstraction and created a series of tests in the AnalyticsSuite. 2013-10-28 11:52:26 -07:00
soulmachine a197137fde A little revise for the document 2013-10-29 00:28:56 +08:00
Josh Rosen 2d7cf6a271 Restructure BlockInfo fields to reduce memory use. 2013-10-27 23:01:03 -07:00
Matei Zaharia aec9bf9060 Merge pull request #112 from kayousterhout/ui_task_attempt_id
Display both task ID and task attempt ID in UI, and rename taskId to taskAttemptId

Previously only the task attempt ID was shown in the UI; this was confusing because the job can be shown as complete while there are tasks still running.  Showing the task ID in addition to the attempt ID makes it clear which tasks are redundant.

This commit also renames taskId to taskAttemptId in TaskInfo and in the local/cluster schedulers.  This identifier was used to uniquely identify attempts, not tasks, so the current naming was confusing.  The new naming is also more consistent with map reduce.
2013-10-27 19:32:00 -07:00
Reynold Xin d4df4749a8 Merge pull request #115 from aarondav/shuffle-fix
Eliminate extra memory usage when shuffle file consolidation is disabled

Otherwise, we see SPARK-946 even when shuffle file consolidation is disabled.
Fixing SPARK-946 is still forthcoming.
2013-10-27 22:11:21 -04:00
Joseph E. Gonzalez a2287ae138 Implementing connected components on top of pregel like abstraction. 2013-10-27 10:42:11 -07:00
Aaron Davidson 4261e834cb Use flag instead of name check. 2013-10-26 23:53:38 -07:00
Aaron Davidson 596f18479e Eliminate extra memory usage when shuffle file consolidation is disabled
Otherwise, we see SPARK-946 even when shuffle file consolidation is disabled.
Fixing SPARK-946 is still forthcoming.
2013-10-26 22:35:01 -07:00
Kay Ousterhout ae22b4dd99 Display both task ID and task index in UI 2013-10-26 22:18:39 -07:00
Joseph E. Gonzalez 6a0fbc0374 Updating the GraphLab API to match the changes made to the Pregel API. 2013-10-26 15:44:19 -07:00
Joseph E. Gonzalez 08024c938c Adding more documentation to the Pregel API as well as additional functionality including the ability to specify the edge direction along which messages are computed. 2013-10-26 15:42:51 -07:00
Joseph E. Gonzalez 00e73833cc Fixing a bug in reverse edge direction. 2013-10-26 15:10:30 -07:00
Patrick Wendell e018f2d0ae Merge pull request #113 from pwendell/master
Improve error message when multiple assembly jars are present.

This can happen easily if building different hadoop versions. Right now it gives a class not found exception.
2013-10-26 11:39:15 -07:00
Reynold Xin 662ee9f321 Merge pull request #114 from soulmachine/master
A little revise for the document
2013-10-26 11:35:59 -07:00
soulmachine 2eed6bbd10 A little revise for the document 2013-10-26 15:13:57 +08:00
Patrick Wendell 4ba32678e0 Adding improved error message when multiple assembly jars are present.
This can happen easily if building different hadoop versions.
2013-10-25 19:01:15 -07:00
Matei Zaharia bab496c120 Merge pull request #108 from alig/master
Changes to enable executing by using HDFS as a synchronization point between driver and executors, as well as ensuring executors exit properly.
2013-10-25 18:28:43 -07:00
Matei Zaharia d307db6e55 Merge pull request #102 from tdas/transform
Added new Spark Streaming operations

New operations
- transformWith which allows arbitrary 2-to-1 DStream transform, added to Scala and Java API
- StreamingContext.transform to allow arbitrary n-to-1 DStream
- leftOuterJoin and rightOuterJoin between 2 DStreams, added to Scala and Java API
- missing variations of join and cogroup added to Scala Java API
- missing JavaStreamingContext.union

Updated a number of Java and Scala API docs
2013-10-25 17:26:06 -07:00
Ali Ghodsi eef261c892 fixing comments on PR 2013-10-25 16:48:33 -07:00
Kyle Ellrott 8236d5dcc4 More changes to the graph/pom.xml to make it match the other subprojects 2013-10-25 15:52:44 -07:00
Matei Zaharia 85e2cab6f6 Merge pull request #111 from kayousterhout/ui_name
Properly display the name of a stage in the UI.

This fixes a bug introduced by the fix for SPARK-940, which
changed the UI to display the RDD name rather than the stage
name. As a result, no name for the stage was shown when
using the Spark shell, which meant that there was no way to
click on the stage to see more details (e.g., the running
tasks). This commit changes the UI back to using the
stage name.

@pwendell -- let me know if this change was intentional
2013-10-25 14:46:06 -07:00
Tathagata Das dc9570782a Merge branch 'apache-master' into transform 2013-10-25 14:22:23 -07:00
Kyle Ellrott d39ac2eb40 Merge https://github.com/amplab/graphx 2013-10-25 13:16:05 -07:00
Kay Ousterhout a9c8d83aaf Properly display the name of a stage in the UI.
This fixes a bug introduced by the fix for SPARK-940, which
changed the UI to display the RDD name rather than the stage
name. As a result, no name for the stage was shown when
using the Spark shell, which meant that there was no way to
click on the stage to see more details (e.g., the running
tasks). This commit changes the UI back to using the
stage name.
2013-10-25 12:00:09 -07:00
Reynold Xin ab35ec4f0f Merge pull request #110 from pwendell/master
Exclude jopt from kafka dependency.

Kafka uses an older version of jopt that causes bad conflicts with the version
used by spark-perf. It's not easy to remove this downstream because of the way
that spark-perf uses Spark (by including a spark assembly as an unmanaged jar).
This fixes the problem at its source by just never including it.
2013-10-25 10:16:18 -07:00