ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Prashant Sharma	94b7a7fe37	run-example -> bin/run-example	2014-01-02 18:41:21 +05:30
Matei Zaharia	0fa5809768	Updated docs for SparkConf and handled review comments	2013-12-30 22:17:28 -05:00
Patrick Wendell	41c60b337a	Various broken links in documentation	2013-12-07 22:31:44 -08:00
Patrick Wendell	08c1a42d7d	Add a `repartition` operator. This patch adds an operator called `repartition` with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful: 1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call `rdd.coalesce(numSplits, shuffle=true)` - that's super confusing. 2. If a user has input data where the number of partitions is not known. E.g. `sc.textFile("some file").coalesce(50)`.... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions. The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed. I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.	2013-10-24 14:31:33 -07:00
Aaron Davidson	4ea8ee468f	Add docs for standalone scheduler fault tolerance Also fix a couple HTML/Markdown issues in other files.	2013-10-08 14:18:31 -07:00
Matei Zaharia	5b4dea2143	More fixes	2013-09-01 14:13:16 -07:00
Matei Zaharia	d27cd03f30	Fix more URLs in docs	2013-09-01 14:13:16 -07:00
Matei Zaharia	4f422032e5	Update docs for new package	2013-09-01 14:13:15 -07:00
Matei Zaharia	53cd50c069	Change build and run instructions to use assemblies This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.	2013-08-29 21:19:04 -07:00
Prashant Sharma	2bc348e92c	Linking custom receiver guide	2013-08-23 09:44:02 +05:30
Andy Konwinski	cd7259b4b8	Fixes typos in Spark Streaming Programming Guide These typos were reported on the spark-users mailing list, see: https://groups.google.com/d/msg/spark-users/SyLGgJlKCrI/LpeBypOkSMUJ	2013-07-12 11:51:14 -07:00
Andrew Ash	6efc8cae8f	Typos: cluser -> cluster	2013-04-10 13:44:10 -03:00
Matei Zaharia	fadeb1ddea	More doc tweaks	2013-02-26 23:20:49 -08:00
Tathagata Das	6a78ef0578	Merge pull request #500 from pwendell/streaming-docs Minor changes based on feedback	2013-02-25 15:28:45 -08:00
Patrick Wendell	8316534eef	meta-data	2013-02-25 15:27:04 -08:00
Patrick Wendell	918ee25867	One more change done with TD	2013-02-25 15:24:17 -08:00
Matei Zaharia	2ae15353a1	Merge branch 'master' of github.com:mesos/spark	2013-02-25 15:14:16 -08:00
Matei Zaharia	490f056cdd	Allow passing sparkHome and JARs to StreamingContext constructor Also warns if spark.cleaner.ttl is not set in the version where you pass your own SparkContext.	2013-02-25 15:13:30 -08:00
Patrick Wendell	07f2618769	Minor changes based on feedback	2013-02-25 15:09:59 -08:00
Patrick Wendell	50ce0516e6	Some changes to streaming failure docs. TD gave me the go-ahead to just make these changes: - Define stateful dstream - Some minor wording fixes	2013-02-25 14:38:39 -08:00
Matei Zaharia	5d4a0ac794	Some tweaks to docs	2013-02-25 14:23:03 -08:00
Tathagata Das	5ab37be983	Fixed class paths and dependencies based on Matei's comments.	2013-02-24 16:24:52 -08:00
Tathagata Das	b4eb24de96	Updated streaming programming guide with Java API info, and comments from Patrick.	2013-02-23 23:59:45 -08:00
Tathagata Das	d853aa9658	Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs.	2013-02-23 17:42:26 -08:00
Tathagata Das	12ea14c211	Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver.	2013-02-18 15:18:34 -08:00
Tathagata Das	8ad561dc7d	Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs.	2013-02-18 02:12:41 -08:00
Tathagata Das	0dbd411a56	Added documentation for PairDStreamFunctions.	2013-01-13 21:08:35 -08:00
Tathagata Das	237bac36e9	Renamed examples and added documentation.	2013-01-07 14:37:21 -08:00
Tathagata Das	02497f0cd4	Updated Streaming Programming Guide.	2013-01-01 12:21:32 -08:00
Tathagata Das	9e644402c1	Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.	2012-12-29 18:31:51 -08:00
Patrick Wendell	d39ac5fbc1	Streaming programming guide. STREAMING-2 #resolve	2012-11-13 21:19:58 -08:00

31 commits