ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Thomas Graves	7edbea41b4	SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config wherever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets	2014-03-06 18:27:50 -06:00
Kyle Ellrott	40566e10aa	SPARK-942: Do not materialize partitions when DISK_ONLY storage level is used This is a port of a pull request originally targeted at incubator-spark: https://github.com/apache/incubator-spark/pull/180 Essentially if a user returns a generative iterator (from a flatMap operation), when trying to persist the data, Spark would first unroll the iterator into an ArrayBuffer, and then try to figure out if it could store the data. In cases where the user provided an iterator that generated more data than available memory, this would cause a crash. With this patch, if the user requests a persist with a 'StorageLevel.DISK_ONLY', the iterator will be unrolled as it is fed into the serializer. To do this, two changes were made: 1) The type of the 'values' argument in the putValues method of the BlockStore interface was changed from ArrayBuffer to Iterator (and all code interfacing with this method was modified to connect correctly). 2) The JavaSerializer now calls the ObjectOutputStream 'reset' method every 1000 objects. This was done because the ObjectOutputStream caches objects (thus preventing them from being GC'd) to write more compact serialization. If reset is never called, the memory eventually fills up; if it is called too often, the serialization streams become much larger because of redundant class descriptions. Author: Kyle Ellrott <kellrott@gmail.com> Closes #50 from kellrott/iterator-to-disk and squashes the following commits: 9ef7cb8 [Kyle Ellrott] Fixing formatting issues. 60e0c57 [Kyle Ellrott] Fixing issues (formatting, variable names, etc.) from review comments 8aa31cd [Kyle Ellrott] Merge ../incubator-spark into iterator-to-disk 33ac390 [Kyle Ellrott] Merge branch 'iterator-to-disk' of github.com:kellrott/incubator-spark into iterator-to-disk 2f684ea [Kyle Ellrott] Refactoring the BlockManager to replace the Either[Either[A,B]] usage. Now using trait 'Values'. 
Also modified BlockStore.putBytes call to return PutResult, so that it behaves like putValues. f70d069 [Kyle Ellrott] Adding docs for spark.serializer.objectStreamReset configuration 7ccc74b [Kyle Ellrott] Moving the 'LargeIteratorSuite' to simply test persistence of iterators. It doesn't try to invoke an OOM error any more 16a4cea [Kyle Ellrott] Streamlined the LargeIteratorSuite unit test. It should now run in ~25 seconds. Confirmed that it still crashes an unpatched copy of Spark. c2fb430 [Kyle Ellrott] Removing more un-needed array-buffer to iterator conversions 627a8b7 [Kyle Ellrott] Wrapping a few long lines 0f28ec7 [Kyle Ellrott] Adding second putValues to BlockStore interface that accepts an ArrayBuffer (rather than an Iterator). This will allow BlockStores to have slightly different behaviors depending on whether they get an Iterator or ArrayBuffer. In the case of the MemoryStore, it needs to duplicate and cache an Iterator into an ArrayBuffer, but if handed an ArrayBuffer, it can skip the duplication. 656c33e [Kyle Ellrott] Fixing the JavaSerializer to read from the SparkConf rather than the System property. 8644ee8 [Kyle Ellrott] Merge branch 'master' into iterator-to-disk 00c98e0 [Kyle Ellrott] Making the Java ObjectStreamSerializer reset rate configurable by the system variable 'spark.serializer.objectStreamReset', default is now 10000. 40fe1d7 [Kyle Ellrott] Removing rogue space 31fe08e [Kyle Ellrott] Removing un-needed semi-colons 9df0276 [Kyle Ellrott] Added check to make sure that streamed-to-disk RDD actually returns good data in the LargeIteratorSuite a6424ba [Kyle Ellrott] Wrapping long line 2eeda75 [Kyle Ellrott] Fixing dumb mistake ("||" instead of "&&") 0e6f808 [Kyle Ellrott] Deleting temp output directory when done 95c7f67 [Kyle Ellrott] Simplifying StorageLevel checks 56f71cd [Kyle Ellrott] Merge branch 'master' into iterator-to-disk 44ec35a [Kyle Ellrott] Adding some comments. 
5eb2b7e [Kyle Ellrott] Changing the JavaSerializer reset to occur every 1000 objects. f403826 [Kyle Ellrott] Merge branch 'master' into iterator-to-disk 81d670c [Kyle Ellrott] Adding unit test for straight to disk iterator methods. d32992f [Kyle Ellrott] Merge remote-tracking branch 'origin/master' into iterator-to-disk cac1fad [Kyle Ellrott] Fixing MemoryStore, so that it converts incoming iterators to ArrayBuffer objects. This was previously done higher up the stack. efe1102 [Kyle Ellrott] Changing CacheManager and BlockManager to pass iterators directly to the serializer when a 'DISK_ONLY' persist is called. This is in response to SPARK-942.	2014-03-06 14:51:19 -08:00
CodingCat	1865dd681b	SPARK-1178: missing document of spark.scheduler.revive.interval https://spark-project.atlassian.net/browse/SPARK-1178 The configuration on spark.scheduler.revive.interval is undocumented but actually used https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L64 Author: CodingCat <zhunansjtu@gmail.com> Closes #74 from CodingCat/SPARK-1178 and squashes the following commits: 783ec69 [CodingCat] missing document of spark.scheduler.revive.interval	2014-03-04 10:28:17 -08:00
Andrew Or	1896c6e7c9	Merge pull request #533 from andrewor14/master. Closes #533 . External spilling - generalize batching logic The existing implementation consists of a hack for Kryo specifically and only works for LZF compression. Introducing an intermediate batch-level stream takes care of pre-fetching and other arbitrary behavior of higher level streams in a more general way. Author: Andrew Or <andrewor14@gmail.com> == Merge branch commits == commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82 Author: Andrew Or <andrewor14@gmail.com> Date: Wed Feb 5 12:09:32 2014 -0800 Also privatize fields commit 090544a87a0767effd0c835a53952f72fc8d24f0 Author: Andrew Or <andrewor14@gmail.com> Date: Wed Feb 5 10:58:23 2014 -0800 Privatize methods commit 13920c918efe22e66a1760b14beceb17a61fd8cc Author: Andrew Or <andrewor14@gmail.com> Date: Tue Feb 4 16:34:15 2014 -0800 Update docs commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3 Author: Andrew Or <andrewor14@gmail.com> Date: Tue Feb 4 13:44:24 2014 -0800 Typo: phyiscal -> physical commit 287ef44e593ad72f7434b759be3170d9ee2723d2 Author: Andrew Or <andrewor14@gmail.com> Date: Tue Feb 4 13:38:32 2014 -0800 Avoid reading the entire batch into memory; also simplify streaming logic Additionally, address formatting comments. commit 3df700509955f7074821e9aab1e74cb53c58b5a5 Merge: a531d2e 164489d Author: Andrew Or <andrewor14@gmail.com> Date: Mon Feb 3 18:27:49 2014 -0800 Merge branch 'master' of github.com:andrewor14/incubator-spark commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8 Author: Andrew Or <andrewor14@gmail.com> Date: Mon Feb 3 18:18:04 2014 -0800 Relax assumptions on compressors and serializers when batching This commit introduces an intermediate layer of an input stream on the batch level. This guards against interference from higher level streams (i.e. compression and deserialization streams), especially pre-fetching, without specifically targeting particular libraries (Kryo) and forcing shuffle spill compression to use LZF. 
commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af Author: Andrew Or <andrewor14@gmail.com> Date: Mon Feb 3 18:18:04 2014 -0800 Relax assumptions on compressors and serializers when batching This commit introduces an intermediate layer of an input stream on the batch level. This guards against interference from higher level streams (i.e. compression and deserialization streams), especially pre-fetching, without specifically targeting particular libraries (Kryo) and forcing shuffle spill compression to use LZF.	2014-02-06 22:05:53 -08:00
Reynold Xin	ac712e48af	Merge pull request #524 from rxin/doc Added spark.shuffle.file.buffer.kb to configuration doc. Author: Reynold Xin <rxin@apache.org> == Merge branch commits == commit 0eea1d761ff772ff89be234e1e28035d54e5a7de Author: Reynold Xin <rxin@apache.org> Date: Wed Jan 29 14:40:48 2014 -0800 Added spark.shuffle.file.buffer.kb to configuration doc.	2014-01-30 09:33:18 -08:00
Tathagata Das	7930209614	Merge pull request #497 from tdas/docs-update Updated Spark Streaming Programming Guide Here is the updated version of the Spark Streaming Programming Guide. This is still a work in progress, but the major changes are in place. So feedback is most welcome. In general, I have tried to make the guide easier to understand even if the reader does not know much about Spark. The updated website is hosted here - http://www.eecs.berkeley.edu/~tdas/spark_docs/streaming-programming-guide.html The major changes are: - Overview illustrates the use cases of Spark Streaming - various input sources and various output sources - An example right after overview to quickly give an idea of what a Spark Streaming program looks like - Made Java API and examples a first class citizen like Scala by using tabs to show both Scala and Java examples (similar to AMPCamp tutorial's code tabs) - Highlighted the DStream operations updateStateByKey and transform because of their powerful nature - Updated driver node failure recovery text to highlight automatic recovery in Spark standalone mode - Added information about linking and using the external input sources like Kafka and Flume - In general, reorganized the sections to better show the Basic section and the more advanced sections like Tuning and Recovery. Todos: - Links to the docs of external Kafka, Flume, etc - Illustrate window operation with figure as well as example. Author: Tathagata Das <tathagata.das1565@gmail.com> == Merge branch commits == commit 18ff10556570b39d672beeb0a32075215cfcc944 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Tue Jan 28 21:49:30 2014 -0800 Fixed a lot of broken links. commit 34a5a6008dac2e107624c7ff0db0824ee5bae45f Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Tue Jan 28 18:02:28 2014 -0800 Updated github url to use SPARK_GITHUB_URL variable. 
commit f338a60ae8069e0a382d2cb170227e5757cc0b7a Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Mon Jan 27 22:42:42 2014 -0800 More updates based on Patrick and Harvey's comments. commit 89a81ff25726bf6d26163e0dd938290a79582c0f Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Mon Jan 27 13:08:34 2014 -0800 Updated docs based on Patricks PR comments. commit d5b6196b532b5746e019b959a79ea0cc013a8fc3 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Sun Jan 26 20:15:58 2014 -0800 Added spark.streaming.unpersist config and info on StreamingListener interface. commit e3dcb46ab83d7071f611d9b5008ba6bc16c9f951 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Sun Jan 26 18:41:12 2014 -0800 Fixed docs on StreamingContext.getOrCreate. commit 6c29524639463f11eec721e4d17a9d7159f2944b Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Thu Jan 23 18:49:39 2014 -0800 Added example and figure for window operations, and links to Kafka and Flume API docs. commit f06b964a51bb3b21cde2ff8bdea7d9785f6ce3a9 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Wed Jan 22 22:49:12 2014 -0800 Fixed missing endhighlight tag in the MLlib guide. commit 036a7d46187ea3f2a0fb8349ef78f10d6c0b43a9 Merge: eab351d `a1cd185` Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Wed Jan 22 22:17:42 2014 -0800 Merge remote-tracking branch 'apache/master' into docs-update commit eab351d05c0baef1d4b549e1581310087158d78d Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Wed Jan 22 22:17:15 2014 -0800 Update Spark Streaming Programming Guide.	2014-01-28 21:51:05 -08:00
Reynold Xin	84670f2715	Merge pull request #466 from liyinan926/file-overwrite-new Allow files added through SparkContext.addFile() to be overwritten This is useful for the cases when a file needs to be refreshed and downloaded by the executors periodically. For example, a possible use case is: the driver periodically renews a Hadoop delegation token and writes it to a token file. The token file needs to be downloaded by the executors whenever it gets renewed. However, the current implementation throws an exception when the target file exists and its contents do not match those of the new source. This PR adds an option to allow files to be overwritten to support use cases similar to the above.	2014-01-27 17:08:35 -08:00
Andrew Ash	069bb94206	Clarify spark.default.parallelism It's the task count across the cluster, not per worker, per machine, per core, or anything else.	2014-01-21 14:49:35 -08:00
Patrick Wendell	c324ac10ee	Force use of LZF when spilling data	2014-01-20 19:00:48 -08:00
Patrick Wendell	cdb003e376	Removing docs on akka options	2014-01-20 16:40:58 -08:00
Yinan Li	584323c6b1	Addressed comments from Reynold Signed-off-by: Yinan Li <liyinan926@gmail.com>	2014-01-18 21:28:17 -08:00
Patrick Wendell	bf5699543b	Merge pull request #462 from mateiz/conf-file-fix Remove Typesafe Config usage and conf files to fix nested property names With Typesafe Config we had the subtle problem of no longer allowing nested property names, which are used for a few of our properties: http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html This PR is for branch 0.9 but should be added into master too. (cherry picked from commit `34e911ce9a`) Signed-off-by: Patrick Wendell <pwendell@gmail.com>	2014-01-18 16:20:00 -08:00
Yinan Li	fd833e7ab1	Allow files added through SparkContext.addFile() to be overwritten This is useful for the cases when a file needs to be refreshed and downloaded by the executors periodically. Signed-off-by: Yinan Li <liyinan926@gmail.com>	2014-01-18 15:26:59 -08:00
Patrick Wendell	0984647aae	Enable compression by default for spills	2014-01-13 23:25:25 -08:00
Patrick Wendell	c3816de504	Changing option wording per discussion with Andrew	2014-01-13 13:25:06 -08:00
Patrick Wendell	5d61e051c2	Improvements to external sorting 1. Adds the option of compressing outputs. 2. Adds batching to the serialization to prevent OOM on the read side. 3. Slight renaming of config options. 4. Use Spark's buffer size for reads in addition to writes.	2014-01-13 12:21:39 -08:00
Patrick Wendell	2802cc80bc	Disable shuffle file consolidation by default	2014-01-12 19:16:43 -08:00
Patrick Wendell	d37408f39c	Merge pull request #377 from andrewor14/master External Sorting for Aggregator and CoGroupedRDDs (Revisited) (This pull request is re-opened from https://github.com/apache/incubator-spark/pull/303, which was closed because Jenkins / github was misbehaving) The target issue for this patch is the out-of-memory exceptions triggered by aggregate operations such as reduce, groupBy, join, and cogroup. The existing AppendOnlyMap used by these operations resides purely in memory, and grows with the size of the input data until the amount of allocated memory is exceeded. Under large workloads, this problem is aggravated by the fact that OOM frequently occurs only after a very long (> 1 hour) map phase, in which case the entire job must be restarted. The solution is to spill the contents of this map to disk once a certain memory threshold is exceeded. This functionality is provided by ExternalAppendOnlyMap, which additionally sorts this buffer before writing it out to disk, and later merges these buffers back in sorted order. Under normal circumstances in which OOM is not triggered, ExternalAppendOnlyMap is simply a wrapper around AppendOnlyMap and incurs little overhead. Only when the memory usage is expected to exceed the given threshold does ExternalAppendOnlyMap spill to disk.	2014-01-10 16:25:01 -08:00
Andrew Or	2e393cd5fd	Update documentation for externalSorting	2014-01-10 15:45:38 -08:00
Andrew Or	e4c51d2113	Address Patrick's and Reynold's comments Aside from trivial formatting changes, use nulls instead of Options for DiskMapIterator, and add documentation for spark.shuffle.externalSorting and spark.shuffle.memoryFraction. Also, set spark.shuffle.memoryFraction to 0.3, and spark.storage.memoryFraction = 0.6.	2014-01-10 15:09:51 -08:00
Patrick Wendell	460f655cc6	Enable shuffle consolidation by default. Bump this to being enabled for 0.9.0.	2014-01-09 22:42:50 -08:00
Patrick Wendell	112c0a1776	Fixing config option "retained_stages" => "retainedStages". This is a very esoteric option and it's out of sync with the style we use. So it seems fitting to fix it for 0.9.0.	2014-01-08 21:16:16 -08:00
Matei Zaharia	2c421749ea	Address review comments	2014-01-07 19:30:23 -05:00
Matei Zaharia	d8bcc8e9a0	Add way to limit default # of cores used by applications on standalone mode Also documents the spark.deploy.spreadOut option.	2014-01-07 14:35:52 -05:00
Prashant Sharma	c729fa7c8e	formatting related fixes suggested by Patrick.	2014-01-07 13:08:16 +05:30
Prashant Sharma	b84dc780d3	Allow configuration to be printed in logs for diagnosis.	2014-01-07 13:01:43 +05:30
Prashant Sharma	b3018811e1	Allow users to set arbitrary akka configurations via spark conf.	2014-01-07 13:01:43 +05:30
Andrew Ash	2dd4fb5698	Clarify spark.cores.max It controls the count of cores across the cluster, not on a per-machine basis.	2014-01-06 09:01:46 -08:00
Matei Zaharia	0fa5809768	Updated docs for SparkConf and handled review comments	2013-12-30 22:17:28 -05:00
Prashant Sharma	d3090b79a5	A few corrections to documentation.	2013-12-12 10:12:06 +05:30
Prashant Sharma	603af51bb5	Merge branch 'master' into akka-bug-fix Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala	2013-12-11 10:21:53 +05:30
Aaron Davidson	cb6ac8aafb	Correct spelling error in configuration.md	2013-12-07 01:40:01 -08:00
Patrick Wendell	7a1d1c93b8	Minor formatting fix in config file	2013-12-06 20:28:22 -08:00
Patrick Wendell	b9451acdf4	Adding disclaimer for shuffle file consolidation	2013-12-06 19:25:28 -08:00
Patrick Wendell	1450b8ef87	Small changes from Matei review	2013-12-04 18:49:32 -08:00
Patrick Wendell	b1c6fa1584	Document missing configs and set shuffle consolidation to false.	2013-12-04 18:39:34 -08:00
Prashant Sharma	54862af5ee	Improvements from the review comments and followed Boy Scout Rule.	2013-11-27 14:26:28 +05:30
Prashant Sharma	dca946ff67	Documenting the newly added spark properties.	2013-11-26 20:47:38 +05:30
Reynold Xin	f628804c02	Merge pull request #76 from pwendell/master Clarify compression property. Clarifies that this governs compression of internal data, not input data or output data.	2013-10-18 23:19:42 -07:00
Patrick Wendell	6b62836285	Clarify compression property. Clarifies that this governs compression of internal data, not input data or output data.	2013-10-18 23:08:44 -07:00
Mosharaf Chowdhury	35b2415fb3	Code styling. Updated doc.	2013-10-17 13:14:12 -07:00
Patrick Wendell	bddf135670	Change port from 3030 to 4040	2013-09-11 10:01:38 -07:00
Matei Zaharia	98fb69822c	Work in progress: - Add job scheduling docs - Rename some fair scheduler properties - Organize intro page better - Link to Apache wiki for "contributing to Spark"	2013-09-08 00:29:11 -07:00
Matei Zaharia	9329a7d4cd	Fix spark.io.compression.codec and change default codec to LZF	2013-09-02 10:15:22 -07:00
Matei Zaharia	9ee1e9db2e	Doc improvements	2013-09-01 22:12:03 -07:00
Matei Zaharia	0a8cc30921	Move some classes to more appropriate packages: * RDD, RDDFunctions -> org.apache.spark.rdd Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util * JavaSerializer, KryoSerializer -> org.apache.spark.serializer	2013-09-01 14:13:16 -07:00
Matei Zaharia	4f422032e5	Update docs for new package	2013-09-01 14:13:15 -07:00
Matei Zaharia	4819baa658	More updates, describing changes to recommended use of environment vars and new Python stuff	2013-08-31 14:21:10 -07:00
Matei Zaharia	53b1c30607	Update docs for Spark UI port	2013-08-20 22:57:11 -07:00
Matei Zaharia	2a4ed10210	Address some review comments: - When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers - Simplify ClusterScheduler.prioritizeContainers - Add docs on the new configuration options	2013-08-18 19:51:07 -07:00
Matei Zaharia	3097d75d6f	Merge remote-tracking branch 'dlyubimov/SPARK-827' Conflicts: docs/configuration.md	2013-07-31 18:36:43 -07:00
Reynold Xin	5227043f84	Documentation update for compression codec.	2013-07-30 17:12:16 -07:00
Dmitriy Lyubimov	0862494d44	typo	2013-07-27 23:16:20 -07:00
Dmitriy Lyubimov	f5067abe85	changes per comments.	2013-07-27 23:08:00 -07:00
Matei Zaharia	d47c16f78d	Add an option to disable reference tracking in Kryo	2013-07-15 01:55:54 +00:00
Matei Zaharia	1ffadb2d9e	Merge remote-tracking branch 'pwendell/ui-updates' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml	2013-07-06 15:51:41 -07:00
Matei Zaharia	5bbd0eec84	Update docs on SCALA_LIBRARY_PATH	2013-06-30 17:00:40 -07:00
Matei Zaharia	03d0b858c8	Made use of spark.executor.memory setting consistent and documented it Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-06-30 15:46:46 -07:00
Patrick Wendell	a59c15a37e	Adding config option for retained stages	2013-06-26 08:54:57 -07:00
Tathagata Das	c89af0a7f9	Merge branch 'master' into streaming Conflicts: .gitignore	2013-06-24 23:57:47 -07:00
seanm	ab0f834dbb	adding spark.streaming.blockInterval property	2013-04-16 11:57:05 -06:00
Matei Zaharia	22334eafd9	Some tweaks to docs	2013-02-26 22:52:38 -08:00
Tathagata Das	d853aa9658	Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs.	2013-02-23 17:42:26 -08:00
Matei Zaharia	05d2e94838	Use a separate memory setting for standalone cluster daemons Conflicts: docs/_config.yml	2013-02-10 21:59:41 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Matei Zaharia	76d7c0ce2b	Add more Akka settings to docs	2013-01-21 13:10:33 -08:00
Tathagata Das	02497f0cd4	Updated Streaming Programming Guide.	2013-01-01 12:21:32 -08:00
Matei Zaharia	19910c00c3	tweaks	2012-10-13 16:22:39 -07:00
Matei Zaharia	4a3e9cf69c	Document how to configure SPARK_MEM & co on a per-job basis	2012-10-13 16:20:25 -07:00
Andy Konwinski	45d03231d0	Adds liquid variables to docs templating system so that they can be used throughout the docs: SPARK_VERSION, SCALA_VERSION, and MESOS_VERSION. To use them, e.g. use {{site.SPARK_VERSION}}. Also removes uses of {{HOME_PATH}} which were being resolved to "" by the templating system anyway.	2012-10-08 10:30:38 -07:00
Matei Zaharia	efc5423210	Made compression configurable separately for shuffle, broadcast and RDDs	2012-10-07 11:30:53 -07:00
Matei Zaharia	dc28a3ac0a	Modified shuffle to limit the maximum outstanding data size in bytes, instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.	2012-10-06 20:07:10 -07:00
Matei Zaharia	802aa8aef9	Some bug fixes and logging fixes for broadcast.	2012-10-01 15:20:42 -07:00
Matei Zaharia	009b0e37e7	Added an option to compress blocks in the block store	2012-09-27 18:45:44 -07:00
Matei Zaharia	a4093f7563	Minor doc fixes	2012-09-26 23:22:15 -07:00
Matei Zaharia	ea05fc130b	Updates to standalone cluster, web UI and deploy docs.	2012-09-26 22:54:39 -07:00
Matei Zaharia	874a9fd407	More updates to docs, including tuning guide	2012-09-26 19:17:58 -07:00
Andy Konwinski	52c29071a4	- Add docs/api to .gitignore - Rework/expand the nav bar with more of the docs site - Removing parts of docs about EC2 and Mesos that differentiate between running 0.5 and before - Merged subheadings from running-on-amazon-ec2.html that are still relevant (i.e., "Using a newer version of Spark" and "Accessing Data in S3") into ec2-scripts.html and deleted running-on-amazon-ec2.html - Added some TODO comments to a few docs - Updated the blurb about AMP Camp - Renamed programming-guide to spark-programming-guide - Fixing typos/etc. in Standalone Spark doc	2012-09-16 15:28:52 -07:00
Andy Konwinski	4d3a17c8d7	Fixing lots of broken links.	2012-09-12 16:06:18 -07:00
Andy Konwinski	16da942d66	Adding docs directory containing documentation currently on the wiki which can be compiled via jekyll, using the command `jekyll`. To compile and run a local webserver to serve the doc as a website, run `jekyll --server`.	2012-09-12 13:03:43 -07:00


230 commits