ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Patrick Wendell	0984647aae	Enable compression by default for spills	2014-01-13 23:25:25 -08:00
Patrick Wendell	c3816de504	Changing option wording per discussion with Andrew	2014-01-13 13:25:06 -08:00
Patrick Wendell	5d61e051c2	Improvements to external sorting 1. Adds the option of compressing outputs. 2. Adds batching to the serialization to prevent OOM on the read side. 3. Slight renaming of config options. 4. Use Spark's buffer size for reads in addition to writes.	2014-01-13 12:21:39 -08:00
Patrick Wendell	2802cc80bc	Disable shuffle file consolidation by default	2014-01-12 19:16:43 -08:00
Patrick Wendell	d37408f39c	Merge pull request #377 from andrewor14/master External Sorting for Aggregator and CoGroupedRDDs (Revisited) (This pull request is re-opened from https://github.com/apache/incubator-spark/pull/303, which was closed because Jenkins / github was misbehaving) The target issue for this patch is the out-of-memory exceptions triggered by aggregate operations such as reduce, groupBy, join, and cogroup. The existing AppendOnlyMap used by these operations resides purely in memory, and grows with the size of the input data until the amount of allocated memory is exceeded. Under large workloads, this problem is aggravated by the fact that OOM frequently occurs only after a very long (> 1 hour) map phase, in which case the entire job must be restarted. The solution is to spill the contents of this map to disk once a certain memory threshold is exceeded. This functionality is provided by ExternalAppendOnlyMap, which additionally sorts this buffer before writing it out to disk, and later merges these buffers back in sorted order. Under normal circumstances in which OOM is not triggered, ExternalAppendOnlyMap is simply a wrapper around AppendOnlyMap and incurs little overhead. Only when the memory usage is expected to exceed the given threshold does ExternalAppendOnlyMap spill to disk.	2014-01-10 16:25:01 -08:00
Andrew Or	2e393cd5fd	Update documentation for externalSorting	2014-01-10 15:45:38 -08:00
Andrew Or	e4c51d2113	Address Patrick's and Reynold's comments Aside from trivial formatting changes, use nulls instead of Options for DiskMapIterator, and add documentation for spark.shuffle.externalSorting and spark.shuffle.memoryFraction. Also, set spark.shuffle.memoryFraction to 0.3, and spark.storage.memoryFraction = 0.6.	2014-01-10 15:09:51 -08:00
Patrick Wendell	460f655cc6	Enable shuffle consolidation by default. Bump this to being enabled for 0.9.0.	2014-01-09 22:42:50 -08:00
Patrick Wendell	112c0a1776	Fixing config option "retained_stages" => "retainedStages". This is a very esoteric option and it's out of sync with the style we use. So it seems fitting to fix it for 0.9.0.	2014-01-08 21:16:16 -08:00
Matei Zaharia	2c421749ea	Address review comments	2014-01-07 19:30:23 -05:00
Matei Zaharia	d8bcc8e9a0	Add way to limit default # of cores used by applications on standalone mode Also documents the spark.deploy.spreadOut option.	2014-01-07 14:35:52 -05:00
Prashant Sharma	c729fa7c8e	formatting related fixes suggested by Patrick.	2014-01-07 13:08:16 +05:30
Prashant Sharma	b84dc780d3	Allow configuration to be printed in logs for diagnosis.	2014-01-07 13:01:43 +05:30
Prashant Sharma	b3018811e1	Allow users to set arbitrary akka configurations via spark conf.	2014-01-07 13:01:43 +05:30
Andrew Ash	2dd4fb5698	Clarify spark.cores.max It controls the count of cores across the cluster, not on a per-machine basis.	2014-01-06 09:01:46 -08:00
Matei Zaharia	0fa5809768	Updated docs for SparkConf and handled review comments	2013-12-30 22:17:28 -05:00
Prashant Sharma	d3090b79a5	A few corrections to documentation.	2013-12-12 10:12:06 +05:30
Prashant Sharma	603af51bb5	Merge branch 'master' into akka-bug-fix Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala	2013-12-11 10:21:53 +05:30
Aaron Davidson	cb6ac8aafb	Correct spellling error in configuration.md	2013-12-07 01:40:01 -08:00
Patrick Wendell	7a1d1c93b8	Minor formatting fix in config file	2013-12-06 20:28:22 -08:00
Patrick Wendell	b9451acdf4	Adding disclaimer for shuffle file consolidation	2013-12-06 19:25:28 -08:00
Patrick Wendell	1450b8ef87	Small changes from Matei review	2013-12-04 18:49:32 -08:00
Patrick Wendell	b1c6fa1584	Document missing configs and set shuffle consolidation to false.	2013-12-04 18:39:34 -08:00
Prashant Sharma	54862af5ee	Improvements from the review comments and followed Boy Scout Rule.	2013-11-27 14:26:28 +05:30
Prashant Sharma	dca946ff67	Documenting the newly added spark properties.	2013-11-26 20:47:38 +05:30
Reynold Xin	f628804c02	Merge pull request #76 from pwendell/master Clarify compression property. Clarifies that this governs compression of internal data, not input data or output data.	2013-10-18 23:19:42 -07:00
Patrick Wendell	6b62836285	Clarify compression property. Clarifies that this governs compression of internal data, not input data or output data.	2013-10-18 23:08:44 -07:00
Mosharaf Chowdhury	35b2415fb3	Code styling. Updated doc.	2013-10-17 13:14:12 -07:00
Patrick Wendell	bddf135670	Change port from 3030 to 4040	2013-09-11 10:01:38 -07:00
Matei Zaharia	98fb69822c	Work in progress: - Add job scheduling docs - Rename some fair scheduler properties - Organize intro page better - Link to Apache wiki for "contributing to Spark"	2013-09-08 00:29:11 -07:00
Matei Zaharia	9329a7d4cd	Fix spark.io.compression.codec and change default codec to LZF	2013-09-02 10:15:22 -07:00
Matei Zaharia	9ee1e9db2e	Doc improvements	2013-09-01 22:12:03 -07:00
Matei Zaharia	0a8cc30921	Move some classes to more appropriate packages: * RDD, RDDFunctions -> org.apache.spark.rdd Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util * JavaSerializer, KryoSerializer -> org.apache.spark.serializer	2013-09-01 14:13:16 -07:00
Matei Zaharia	4f422032e5	Update docs for new package	2013-09-01 14:13:15 -07:00
Matei Zaharia	4819baa658	More updates, describing changes to recommended use of environment vars and new Python stuff	2013-08-31 14:21:10 -07:00
Matei Zaharia	53b1c30607	Update docs for Spark UI port	2013-08-20 22:57:11 -07:00
Matei Zaharia	2a4ed10210	Address some review comments: - When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers - Simplify ClusterScheduler.prioritizeContainers - Add docs on the new configuration options	2013-08-18 19:51:07 -07:00
Matei Zaharia	3097d75d6f	Merge remote-tracking branch 'dlyubimov/SPARK-827' Conflicts: docs/configuration.md	2013-07-31 18:36:43 -07:00
Reynold Xin	5227043f84	Documentation update for compression codec.	2013-07-30 17:12:16 -07:00
Dmitriy Lyubimov	0862494d44	typo	2013-07-27 23:16:20 -07:00
Dmitriy Lyubimov	f5067abe85	changes per comments.	2013-07-27 23:08:00 -07:00
Matei Zaharia	d47c16f78d	Add an option to disable reference tracking in Kryo	2013-07-15 01:55:54 +00:00
Matei Zaharia	1ffadb2d9e	Merge remote-tracking branch 'pwendell/ui-updates' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml	2013-07-06 15:51:41 -07:00
Matei Zaharia	5bbd0eec84	Update docs on SCALA_LIBRARY_PATH	2013-06-30 17:00:40 -07:00
Matei Zaharia	03d0b858c8	Made use of spark.executor.memory setting consistent and documented it Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-06-30 15:46:46 -07:00
Patrick Wendell	a59c15a37e	Adding config option for retained stages	2013-06-26 08:54:57 -07:00
Tathagata Das	c89af0a7f9	Merge branch 'master' into streaming Conflicts: .gitignore	2013-06-24 23:57:47 -07:00
seanm	ab0f834dbb	adding spark.streaming.blockInterval property	2013-04-16 11:57:05 -06:00
Matei Zaharia	22334eafd9	Some tweaks to docs	2013-02-26 22:52:38 -08:00
Tathagata Das	d853aa9658	Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs.	2013-02-23 17:42:26 -08:00
Matei Zaharia	05d2e94838	Use a separate memory setting for standalone cluster daemons Conflicts: docs/_config.yml	2013-02-10 21:59:41 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Matei Zaharia	76d7c0ce2b	Add more Akka settings to docs	2013-01-21 13:10:33 -08:00
Tathagata Das	02497f0cd4	Updated Streaming Programming Guide.	2013-01-01 12:21:32 -08:00
Matei Zaharia	19910c00c3	tweaks	2012-10-13 16:22:39 -07:00
Matei Zaharia	4a3e9cf69c	Document how to configure SPARK_MEM & co on a per-job basis	2012-10-13 16:20:25 -07:00
Andy Konwinski	45d03231d0	Adds liquid variables to docs templating system so that they can be used throughout the docs: SPARK_VERSION, SCALA_VERSION, and MESOS_VERSION. To use them, e.g. use {{site.SPARK_VERSION}}. Also removes uses of {{HOME_PATH}} which were being resolved to "" by the templating system anyway.	2012-10-08 10:30:38 -07:00
Matei Zaharia	efc5423210	Made compression configurable separately for shuffle, broadcast and RDDs	2012-10-07 11:30:53 -07:00
Matei Zaharia	dc28a3ac0a	Modified shuffle to limit the maximum outstanding data size in bytes, instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.	2012-10-06 20:07:10 -07:00
Matei Zaharia	802aa8aef9	Some bug fixes and logging fixes for broadcast.	2012-10-01 15:20:42 -07:00
Matei Zaharia	009b0e37e7	Added an option to compress blocks in the block store	2012-09-27 18:45:44 -07:00
Matei Zaharia	a4093f7563	Minor doc fixes	2012-09-26 23:22:15 -07:00
Matei Zaharia	ea05fc130b	Updates to standalone cluster, web UI and deploy docs.	2012-09-26 22:54:39 -07:00
Matei Zaharia	874a9fd407	More updates to docs, including tuning guide	2012-09-26 19:17:58 -07:00
Andy Konwinski	52c29071a4	- Add docs/api to .gitignore - Rework/expand the nav bar with more of the docs site - Removing parts of docs about EC2 and Mesos that differentiate between running 0.5 and before - Merged subheadings from running-on-amazon-ec2.html that are still relevant (i.e., "Using a newer version of Spark" and "Accessing Data in S3") into ec2-scripts.html and deleted running-on-amazon-ec2.html - Added some TODO comments to a few docs - Updated the blurb about AMP Camp - Renamed programming-guide to spark-programming-guide - Fixing typos/etc. in Standalone Spark doc	2012-09-16 15:28:52 -07:00
Andy Konwinski	4d3a17c8d7	Fixing lots of broken links.	2012-09-12 16:06:18 -07:00
Andy Konwinski	16da942d66	Adding docs directory containing documentation currently on the wiki which can be compiled via jekyll, using the command `jekyll`. To compile and run a local webserver to serve the doc as a website, run `jekyll --server`.	2012-09-12 13:03:43 -07:00

267 commits