ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Andrew Or	b3ffac5178	[SPARK-10983] Unified memory manager This patch unifies the memory management of the storage and execution regions such that either side can borrow memory from each other. When memory pressure arises, storage will be evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced: - spark.memory.fraction (default 0.75): fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records. - spark.memory.storageFraction (default 0.5): size of the storage region within the space set aside by `spark.memory.fraction`. Cached data may only be evicted if total storage exceeds this region. - spark.memory.useLegacyMode (default false): whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility. For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000). This patch builds on top of the `MemoryManager` interface introduced in #9000. Author: Andrew Or <andrew@databricks.com> Closes #9084 from andrewor14/unified-memory-manager.	2015-10-13 13:49:59 -07:00
jerryshao	f97e9323b5	[SPARK-10739] [YARN] Add application attempt window for Spark on Yarn Add application attempt window for Spark on Yarn to ignore old out of window failures, this is useful for long running applications to recover from failures. Author: jerryshao <sshao@hortonworks.com> Closes #8857 from jerryshao/SPARK-10739 and squashes the following commits: 36eabdc [jerryshao] change the doc 7f9b77d [jerryshao] Style change 1c9afd0 [jerryshao] Address the comments caca695 [jerryshao] Add application attempt window for Spark on Yarn	2015-10-12 18:18:19 -07:00
Kay Ousterhout	091c2c3ecd	[SPARK-11056] Improve documentation of SBT build. This commit improves the documentation around building Spark to (1) recommend using SBT interactive mode to avoid the overhead of launching SBT and (2) refer to the wiki page that documents using SPARK_PREPEND_CLASSES to avoid creating the assembly jar for each compile. cc srowen Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #9068 from kayousterhout/SPARK-11056.	2015-10-12 14:23:29 -07:00
Jean-Baptiste Onofré	60150cf00a	[SPARK-10883] Add a note about how to build Spark sub-modules (reactor) Author: Jean-Baptiste Onofré <jbonofre@apache.org> Closes #8993 from jbonofre/SPARK-10883-2.	2015-10-08 11:38:39 +01:00
admackin	cd28139c9b	Akka framesize units should be specified 1.4 docs noted that the units were MB - i have assumed this is still the case Author: admackin <admackin@users.noreply.github.com> Closes #9025 from admackin/master.	2015-10-08 00:01:23 -07:00
Xin Ren	27cdde2ff8	[SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in `64743870f2/docs/mllib-feature-extraction.md` This JIRA is just for spark.mllib, not spark.ml. Please let me know if more work is needed, thanks a lot. Author: Xin Ren <iamshrek@126.com> Closes #8977 from keypointt/SPARK-10669.	2015-10-07 15:00:19 +01:00
Sean Owen	82bbc2a5f2	[SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'. Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs. Follow-on to https://github.com/apache/spark/pull/8385 CC nssalian Author: Sean Owen <sowen@cloudera.com> Closes #8968 from srowen/SPARK-9570.	2015-10-04 09:31:52 +01:00
Yuhao Yang	9b9fe5f7bf	[SPARK-10670] [ML] [Doc] add api reference for ml doc jira: https://issues.apache.org/jira/browse/SPARK-10670 In the Markdown docs for the spark.ml Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "Word2Vec" section in `64743870f2/docs/ml-features.md` This JIRA is just for spark.ml, not spark.mllib Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8901 from hhbyyh/docAPI.	2015-09-28 22:40:02 -07:00
David Martin	b58249930d	Fix two mistakes in programming-guide page seperate -> separate sees -> see Author: David Martin <dmartinpro@users.noreply.github.com> Closes #8928 from dmartinpro/patch-1.	2015-09-28 10:41:39 +01:00
Bin Wang	fb4c7be747	add doc for spark.streaming.stopGracefullyOnShutdown Author: Bin Wang <wbin00@gmail.com> Closes #8898 from wb14123/doc.	2015-09-27 21:26:54 +01:00
Matt Hagen	558e9c7e60	[SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method. The Scala example under the "Example: Pipeline" heading in this document initializes the "test" variable to a DataFrame. Because test is already a DF, there is not need to call test.toDF as the example does in a subsequent line: model.transform(test.toDF). So, I removed the extraneous toDF invocation. Author: Matt Hagen <anonz3000@gmail.com> Closes #8875 from hagenhaus/SPARK-10663.	2015-09-22 21:14:25 -07:00
Akash Mishra	0bd0e5bed2	[SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value informati… …on for spark.mesos.constraints parameter. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8816 from SleepyThread/constraint-fix.	2015-09-22 00:14:27 -07:00
Marcelo Vanzin	97a99dde6e	[SPARK-10676] [DOCS] Add documentation for SASL encryption options. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8803 from vanzin/SPARK-10676.	2015-09-21 13:15:44 -07:00
Jacek Laskowski	ca9fe540fe	[SPARK-10662] [DOCS] Code snippets are not properly formatted in tables * Backticks are processed properly in Spark Properties table * Removed unnecessary spaces * See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8795 from jaceklaskowski/docs-yarn-formatting.	2015-09-21 19:46:39 +01:00
Josh Rosen	2117eea71e	[SPARK-10710] Remove ability to disable spilling in core and SQL It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling. Author: Josh Rosen <joshrosen@databricks.com> Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.	2015-09-19 21:40:21 -07:00
Alexis Seigneurin	d83b6aae8b	Fixed links to the API Submitting this change on the master branch as requested in https://github.com/apache/spark/pull/8819#issuecomment-141505941 Author: Alexis Seigneurin <alexis.seigneurin@gmail.com> Closes #8838 from aseigneurin/patch-2.	2015-09-19 12:01:22 +01:00
Kousuke Saruta	d507f9c0b7	[SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong. In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong. /CC yhuai Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8776 from sarutak/SPARK-10584-2.	2015-09-19 01:59:36 -07:00
Reynold Xin	348d7c9a93	[SPARK-9808] Remove hash shuffle file consolidation. Author: Reynold Xin <rxin@databricks.com> Closes #8812 from rxin/SPARK-9808-1.	2015-09-18 13:48:41 -07:00
Reynold Xin	74d8f7dda8	Added <code> tag to documentation.	2015-09-17 22:46:13 -07:00
Felix Bechstein	9a56dcdf7f	docs/running-on-mesos.md: state default values in default column This PR simply uses the default value column for defaults. Author: Felix Bechstein <felix.bechstein@otto.de> Closes #8810 from felixb/fix_mesos_doc.	2015-09-17 22:42:46 -07:00
Michael Armbrust	e0dc2bc232	[SPARK-10650] Clean before building docs The [published docs for 1.5.0](http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/) have a bunch of test classes in them. The only way I can reproduce this is to `test:compile` before running `unidoc`. To prevent this from happening again, I've added a clean before doc generation. Author: Michael Armbrust <michael@databricks.com> Closes #8787 from marmbrus/testsInDocs.	2015-09-17 11:05:30 -07:00
yangping.wu	c88bb5df94	[SPARK-10660] Doc describe error in the "Running Spark on YARN" page In the Configuration section, the spark.yarn.driver.memoryOverhead and spark.yarn.am.memoryOverhead‘s default value should be "driverMemory * 0.10, with minimum of 384" and "AM memory * 0.10, with minimum of 384" respectively. Because from Spark 1.4.0, the MEMORY_OVERHEAD_FACTOR is set to 0.1.0, not 0.07. Author: yangping.wu <wyphao.2007@163.com> Closes #8797 from 397090770/SparkOnYarnDocError.	2015-09-17 09:52:40 -07:00
Joseph K. Bradley	b921fe4dc0	[SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups Various ML guide cleanups. * ml-guide.md: Make it easier to access the algorithm-specific guides. * LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically. E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics. * mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec” * Clean up Binarizer user guide a little. * Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place. * spark.ml Word2Vec user guide: clean up grammar/writing * Chi Sq Feature Selector docs: Improve text in doc. CC: mengxr feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8752 from jkbradley/mlguide-fixes-1.5.	2015-09-15 19:43:26 -07:00
Jacek Laskowski	416003b264	[DOCS] Small fixes to Spark on Yarn doc * a follow-up to `16b6d18613` as `--num-executors` flag is not suppported. * links + formatting Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8762 from jaceklaskowski/docs-spark-on-yarn.	2015-09-15 20:42:33 +01:00
Reynold Xin	09b7e7c198	Update version to 1.6.0-SNAPSHOT. Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.	2015-09-15 00:54:20 -07:00
Jacek Laskowski	833be73314	Small fixes to docs Links work now properly + consistent use of Spark standalone cluster (Spark uppercase + lowercase the rest -- seems agreed in the other places in the docs). Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8759 from jaceklaskowski/docs-submitting-apps.	2015-09-14 23:40:29 -07:00
Kousuke Saruta	cf2821ef5f	[SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong. The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1. Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8739 from sarutak/SPARK-10584.	2015-09-14 12:06:23 -07:00
Sean Owen	1dc614b874	[SPARK-10222] [GRAPHX] [DOCS] More thoroughly deprecate Bagel in favor of GraphX Finish deprecating Bagel; remove reference to nonexistent example Author: Sean Owen <sowen@cloudera.com> Closes #8731 from srowen/SPARK-10222.	2015-09-13 08:36:46 +01:00
y-shimizu	c268ca4ddd	[SPARK-10518] [DOCS] Update code examples in spark.ml user guide to use LIBSVM data source instead of MLUtils I fixed to use LIBSVM data source in the example code in spark.ml instead of MLUtils Author: y-shimizu <y.shimizu0429@gmail.com> Closes #8697 from y-shimizu/SPARK-10518.	2015-09-11 08:27:30 -07:00
Akash Mishra	a5ef2d0600	[SPARK-10514] [MESOS] waiting for min no of total cores acquired by Spark by implementing the sufficientResourcesRegistered method spark.scheduler.minRegisteredResourcesRatio configuration parameter works for YARN mode but not for Mesos Coarse grained mode. If the parameter specified default value of 0 will be set for spark.scheduler.minRegisteredResourcesRatio in base class and this method will always return true. There are no existing test for YARN mode too. Hence not added test for the same. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8672 from SleepyThread/master.	2015-09-10 12:04:02 -07:00
Holden Karau	a76bde9dae	[SPARK-10469] [DOC] Try and document the three options From JIRA: Add documentation for tungsten-sort. From the mailing list "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its corresponding description in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html(Currenlty there are only 'sort' and 'hash' two options)." Author: Holden Karau <holden@pigscanfly.ca> Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.	2015-09-10 11:49:53 -07:00
Sean Paradiso	1dc7548c59	[MINOR] [MLLIB] [ML] [DOC] fixed typo: label for negative result should be 0.0 (original: 1.0) Small typo in the example for `LabelledPoint` in the MLLib docs. Author: Sean Paradiso <seanparadiso@gmail.com> Closes #8680 from sparadiso/docs_mllib_smalltypo.	2015-09-09 22:09:33 -07:00
Yuhao Yang	91a577d277	[SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide jira: https://issues.apache.org/jira/browse/SPARK-10249 update user guide since python support added. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8620 from hhbyyh/swPyDocExample.	2015-09-08 22:33:23 -07:00
Tathagata Das	52b24a602a	[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8656 from tdas/SPARK-10492 and squashes the following commits: 986cdd6 [Tathagata Das] Added information on backpressure	2015-09-08 14:54:43 -07:00
Jacek Laskowski	6ceed852ab	Docs small fixes Author: Jacek Laskowski <jacek@japila.pl> Closes #8629 from jaceklaskowski/docs-fixes.	2015-09-08 14:38:10 +01:00
Stephen Hopper	9d8e838d88	[DOC] Added R to the list of languages with "high-level API" support in the… … main README. Author: Stephen Hopper <shopper@shopper-osx.local> Closes #8646 from enragedginger/master.	2015-09-08 14:36:34 +01:00
Reynold Xin	5ffe752b59	[SPARK-9767] Remove ConnectionManager. We introduced the Netty network module for shuffle in Spark 1.2, and has turned it on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, by the time it is released, it would be 1 yr for which ConnectionManager is off by default. It's time to remove it. Author: Reynold Xin <rxin@databricks.com> Closes #8161 from rxin/SPARK-9767.	2015-09-07 10:42:30 -10:00
Tathagata Das	7a4f326c00	[SPARK-10440] [STREAMING] [DOCS] Update python API stuff in the programming guides and python docs - Fixed information around Python API tags in streaming programming guides - Added missing stuff in python docs Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8595 from tdas/SPARK-10440.	2015-09-04 23:16:39 -10:00
Timothy Chen	b087d23e28	[SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode. Support running pyspark with cluster mode on Mesos! This doesn't upload any scripts, so if running in a remote Mesos requires the user to specify the script from a available URI. Author: Timothy Chen <tnachen@gmail.com> Closes #8349 from tnachen/mesos_python.	2015-09-04 15:21:31 -07:00
Tom Graves	49aff7b9ad	[SPARK-10432] spark.port.maxRetries documentation is unclear Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8585 from tgravescs/SPARK-10432.	2015-09-03 13:46:16 -07:00
zhuol	ec01280533	[SPARK-4223] [CORE] Support * in acls. SPARK-4223. Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. Manual tests to verify that: "*" works for any user in: a. Spark ui: view and kill stage. Done. b. Spark history server. Done. c. Yarn application killing. Done. Author: zhuol <zhuol@yahoo-inc.com> Closes #8398 from zhuoliu/4223.	2015-09-01 11:14:59 -10:00
Sean Owen	3f63bd6023	[SPARK-10398] [DOCS] Migrate Spark download page to use new lua mirroring scripts Migrate Apache download closer.cgi refs to new closer.lua This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately. Author: Sean Owen <sowen@cloudera.com> Closes #8557 from srowen/SPARK-10398.	2015-09-01 20:06:01 +01:00
Xiangrui Meng	ca69fc8efd	[SPARK-10331] [MLLIB] Update example code in ml-guide * The example code was added in 1.2, before `createDataFrame`. This PR switches to `createDataFrame`. Java code still uses JavaBean. * assume `sqlContext` is available * fix some minor issues from previous code review jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8518 from mengxr/SPARK-10331.	2015-08-29 23:57:09 -07:00
Xiangrui Meng	905fbe498b	[SPARK-10348] [MLLIB] updates ml-guide * replace `ML Dataset` by `DataFrame` to unify the abstraction * ML algorithms -> pipeline components to describe the main concept * remove Scala API doc links from the main guide * `Section Title` -> `Section tile` to be consistent with other section titles in MLlib guide * modified lines break at 100 chars or periods jkbradley feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8517 from mengxr/SPARK-10348.	2015-08-29 23:26:23 -07:00
GuoQiang Li	5369be8068	[SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide Author: GuoQiang Li <witgo@qq.com> Closes #8520 from witgo/SPARK-10350.	2015-08-29 13:20:27 -07:00
martinzapletal	e8ea5bafee	[SPARK-9910] [ML] User guide for train validation split Author: martinzapletal <zapletal-martin@email.cz> Closes #8377 from zapletal-martin/SPARK-9910.	2015-08-28 21:03:48 -07:00
Xiangrui Meng	88032ecaf0	[SPARK-9671] [MLLIB] re-org user guide and add migration guide This PR updates the MLlib user guide and adds migration guide for 1.4->1.5. * merge migration guide for `spark.mllib` and `spark.ml` packages * remove dependency section from `spark.ml` guide * move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml` * move Sam's talk to footnote to make the section focus on dependencies Minor changes to code examples and other wording will be in a separate PR. jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8498 from mengxr/SPARK-9671.	2015-08-28 13:53:31 -07:00
Yuhao Yang	e2a843090c	[SPARK-9890] [DOC] [ML] User guide for CountVectorizer jira: https://issues.apache.org/jira/browse/SPARK-9890 document with Scala and java examples Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8487 from hhbyyh/cvDoc.	2015-08-28 08:00:44 -07:00
Keiji Yoshida	18294cd871	Fix DynamodDB/DynamoDB typo in Kinesis Integration doc Fix DynamodDB/DynamoDB typo in Kinesis Integration doc Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8501 from yosssi/patch-1.	2015-08-28 09:36:50 +01:00
Feynman Liang	af0e1249b1	[SPARK-9905] [ML] [DOC] Adds LinearRegressionSummary user guide * Adds user guide for `LinearRegressionSummary` * Fixes unresolved issues in #8197 CC jkbradley mengxr Author: Feynman Liang <fliang@databricks.com> Closes #8491 from feynmanliang/SPARK-9905.	2015-08-27 21:55:20 -07:00

1 2 3 4 5 ...

1287 commits