[SPARK-12534][DOC] update documentation to list command line equivalent to properties
Several Spark properties equivalent to Spark submit command line options are missing. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10491 from felixcheung/sparksubmitdoc.
This commit is contained in: parent 1b2a918e59, commit 85200c09ad
@@ -173,7 +173,7 @@ of the most common options to set are:
     stored on disk. This should be on a fast, local disk in your system. It can also be a
     comma-separated list of multiple directories on different disks.

-    NOTE: In Spark 1.0 and later this will be overriden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
+    NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
     LOCAL_DIRS (YARN) environment variables set by the cluster manager.
   </td>
 </tr>
@@ -687,10 +687,10 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.rdd.compress</code></td>
   <td>false</td>
   <td>
-    Whether to compress serialized RDD partitions (e.g. for
-    <code>StorageLevel.MEMORY_ONLY_SER</code> in Java
-    and Scala or <code>StorageLevel.MEMORY_ONLY</code> in Python).
-    Can save substantial space at the cost of some extra CPU time.
+    Whether to compress serialized RDD partitions (e.g. for
+    <code>StorageLevel.MEMORY_ONLY_SER</code> in Java
+    and Scala or <code>StorageLevel.MEMORY_ONLY</code> in Python).
+    Can save substantial space at the cost of some extra CPU time.
   </td>
 </tr>
 <tr>
@@ -39,7 +39,10 @@ Resource allocation can be configured as follows, based on the cluster type:
   and optionally set `spark.cores.max` to limit each application's resource share as in the standalone mode.
   You should also set `spark.executor.memory` to control the executor memory.
 * **YARN:** The `--num-executors` option to the Spark YARN client controls how many executors it will allocate
-  on the cluster, while `--executor-memory` and `--executor-cores` control the resources per executor.
+  on the cluster (`spark.executor.instances` as configuration property), while `--executor-memory`
+  (`spark.executor.memory` configuration property) and `--executor-cores` (`spark.executor.cores` configuration
+  property) control the resources per executor. For more information, see the
+  [YARN Spark Properties](running-on-yarn.html).

 A second option available on Mesos is _dynamic sharing_ of CPU cores. In this mode, each Spark application
 still has a fixed and independent memory allocation (set by `spark.executor.memory`), but when the
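The hunk above documents the property equivalent for each YARN submission flag. As a hedged illustration (the class name, jar, and values here are made up, not from the commit), the two spark-submit invocations below request the same resources:

```shell
# Resource flags and their configuration-property equivalents.
# org.example.App and app.jar are placeholder names for this sketch.
spark-submit --class org.example.App \
  --num-executors 4 --executor-memory 2g --executor-cores 2 \
  app.jar

# The same allocation expressed via --conf properties:
spark-submit --class org.example.App \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=2g \
  --conf spark.executor.cores=2 \
  app.jar
```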
@@ -113,6 +113,19 @@ If you need a reference to the proper location to put log files in the YARN so t
     Use lower-case suffixes, e.g. <code>k</code>, <code>m</code>, <code>g</code>, <code>t</code>, and <code>p</code>, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively.
   </td>
 </tr>
+<tr>
+  <td><code>spark.driver.memory</code></td>
+  <td>1g</td>
+  <td>
+    Amount of memory to use for the driver process, i.e. where SparkContext is initialized.
+    (e.g. <code>1g</code>, <code>2g</code>).
+
+    <br /><em>Note:</em> In client mode, this config must not be set through the <code>SparkConf</code>
+    directly in your application, because the driver JVM has already started at that point.
+    Instead, please set this through the <code>--driver-memory</code> command line option
+    or in your default properties file.
+  </td>
+</tr>
 <tr>
   <td><code>spark.driver.cores</code></td>
   <td><code>1</code></td>
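The client-mode caveat in this note exists because the driver JVM is already running by the time application code builds a SparkConf, so a heap size set there comes too late. A minimal sketch of the two workable alternatives (the 4g value and app.jar name are placeholders):

```shell
# In client mode, size the driver heap before the JVM starts:
# either on the command line ...
spark-submit --driver-memory 4g app.jar

# ... or in the default properties file (conf/spark-defaults.conf):
# spark.driver.memory  4g
```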
@@ -202,6 +215,13 @@ If you need a reference to the proper location to put log files in the YARN so t
     Comma-separated list of files to be placed in the working directory of each executor.
   </td>
 </tr>
+<tr>
+  <td><code>spark.executor.cores</code></td>
+  <td>1 in YARN mode, all the available cores on the worker in standalone mode.</td>
+  <td>
+    The number of cores to use on each executor. For YARN and standalone mode only.
+  </td>
+</tr>
 <tr>
   <td><code>spark.executor.instances</code></td>
   <td><code>2</code></td>
@@ -209,6 +229,13 @@ If you need a reference to the proper location to put log files in the YARN so t
     The number of executors. Note that this property is incompatible with <code>spark.dynamicAllocation.enabled</code>. If both <code>spark.dynamicAllocation.enabled</code> and <code>spark.executor.instances</code> are specified, dynamic allocation is turned off and the specified number of <code>spark.executor.instances</code> is used.
   </td>
 </tr>
+<tr>
+  <td><code>spark.executor.memory</code></td>
+  <td>1g</td>
+  <td>
+    Amount of memory to use per executor process (e.g. <code>2g</code>, <code>8g</code>).
+  </td>
+</tr>
 <tr>
   <td><code>spark.yarn.executor.memoryOverhead</code></td>
   <td>executorMemory * 0.10, with minimum of 384 </td>
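The `spark.yarn.executor.memoryOverhead` default shown above is `executorMemory * 0.10, with minimum of 384` (MiB). A quick sketch of that arithmetic, assuming a hypothetical 4096 MiB executor (the variable names here are illustrative, not Spark APIs):

```shell
# Default YARN executor memory overhead: max(executorMemory * 0.10, 384) MiB.
# executor_mem_mb is a made-up input for this sketch.
executor_mem_mb=4096
overhead=$(( executor_mem_mb / 10 ))      # 10% of executor memory
if [ "$overhead" -lt 384 ]; then
  overhead=384                            # floor at 384 MiB
fi
echo "$overhead"                          # prints 409 for a 4096 MiB executor
```

So any executor smaller than 3840 MiB gets the 384 MiB floor rather than the 10% share.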