[SPARK-15894][SQL][DOC] Update docs for controlling #partitions

## What changes were proposed in this pull request?
Update docs for two parameters, `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`, in Other Configuration Options.
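
To illustrate how the two settings might interact, here is a minimal sketch in plain Python. It is a simplified model, not Spark's actual implementation: the function name `pack_files` is hypothetical, and the sketch assumes files are packed greedily, largest first, with each file charged its size plus the open cost. It also ignores the splitting of files larger than the partition cap.

```python
from typing import List

# Defaults taken from the documented table entries.
MAX_PARTITION_BYTES = 134217728  # spark.sql.files.maxPartitionBytes (128 MB)
OPEN_COST_IN_BYTES = 4194304     # spark.sql.files.openCostInBytes (4 MB)


def pack_files(file_sizes: List[int],
               max_partition_bytes: int = MAX_PARTITION_BYTES,
               open_cost_in_bytes: int = OPEN_COST_IN_BYTES) -> List[List[int]]:
    """Greedily pack file sizes into partitions (hypothetical, simplified model)."""
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    # Assumption: largest files are considered first, so the partitions
    # holding bigger files are formed (and scheduled) first.
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would push it past the cap.
        if current and current_size + size > max_partition_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(size)
        # Each file is charged its length plus the estimated open cost.
        current_size += size + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions
```

Under this model, forty 1 MB files pack into two partitions (26 and 14 files) at the default values, and into five partitions of eight files each when the open cost is raised to 16 MB. A higher open cost therefore yields more, smaller partitions for small files.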

## How was this patch tested?
N/A

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.
Takeshi YAMAMURO 2016-06-21 14:27:16 +08:00 committed by Cheng Lian
parent 58f6e27dd7
commit 41e0ffb19f


@@ -2015,6 +2015,23 @@ that these options will be deprecated in future release as more optimizations ar
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.sql.files.maxPartitionBytes</code></td>
<td>134217728 (128 MB)</td>
<td>
The maximum number of bytes to pack into a single partition when reading files.
</td>
</tr>
<tr>
<td><code>spark.sql.files.openCostInBytes</code></td>
<td>4194304 (4 MB)</td>
<td>
The estimated cost to open a file, measured by the number of bytes that could be scanned in the same
time. This is used when putting multiple files into a partition. It is better to over-estimate it;
then the partitions with small files will be faster than partitions with bigger files (which are
scheduled first).
</td>
</tr>
<tr>
<td><code>spark.sql.autoBroadcastJoinThreshold</code></td>
<td>10485760 (10 MB)</td>