[SPARK-15894][SQL][DOC] Update docs for controlling #partitions

## What changes were proposed in this pull request?

Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` in Other Configuration Options.

## How was this patch tested?

N/A

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.
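To make the interaction between the two parameters concrete, here is a minimal sketch in plain Python of how a per-partition byte budget and a per-file open cost could combine when files are packed into read partitions. It is a simplified model and not Spark's code: the function `plan_partitions`, the `default_parallelism` argument, and the exact packing rule are assumptions made for illustration. Only the two default values come from the documented options.

```python
MB = 1024 * 1024


def plan_partitions(file_sizes,
                    max_partition_bytes=128 * MB,   # spark.sql.files.maxPartitionBytes default
                    open_cost_in_bytes=4 * MB,      # spark.sql.files.openCostInBytes default
                    default_parallelism=8):         # hypothetical number of cores
    """Return (max_split_bytes, partitions) for a list of file sizes in bytes.

    Illustrative model only; the real planner may differ in detail.
    """
    # Charge every file its length plus the estimated cost of opening it.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism

    # Never exceed the configured maximum, never go below the open cost.
    max_split_bytes = min(max_partition_bytes,
                          max(open_cost_in_bytes, bytes_per_core))

    # Cut each file into chunks no larger than max_split_bytes.
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            chunks.append(min(max_split_bytes, size - offset))
            offset += max_split_bytes

    # Pack the largest chunks first; each chunk also "costs" one file open.
    chunks.sort(reverse=True)
    partitions, current, current_size = [], [], 0
    for chunk in chunks:
        if current and current_size + chunk > max_split_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(chunk)
        current_size += chunk + open_cost_in_bytes
    if current:
        partitions.append(current)
    return max_split_bytes, partitions


# 64 small files of 1 MB each: the open cost dominates the packing.
split, parts = plan_partitions([1 * MB] * 64)
print(split // MB, len(parts))

# One 300 MB file on 2 cores: the file is cut at the 128 MB maximum.
split_big, parts_big = plan_partitions([300 * MB], default_parallelism=2)
print(split_big // MB, [sum(p) // MB for p in parts_big])
```

In this model, a larger open cost makes each small file "weigh" more, so fewer small files land in one partition, while the maximum partition size caps how much of a large file a single task reads.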
2016-06-21 14:27:16 +08:00 · 41e0ffb19f
parent 58f6e27dd7
commit 41e0ffb19f
1 changed files with 17 additions and 0 deletions
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2015,6 +2015,23 @@ that these options will be deprecated in future release as more optimizations ar

 <table class="table">
  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr>
+    <td><code>spark.sql.files.maxPartitionBytes</code></td>
+    <td>134217728 (128 MB)</td>
+    <td>
+      The maximum number of bytes to pack into a single partition when reading files.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.files.openCostInBytes</code></td>
+    <td>4194304 (4 MB)</td>
+    <td>
+      The estimated cost to open a file, measured by the number of bytes that could be scanned in the
+      same time. This is used when putting multiple files into a partition. It is better to
+      over-estimate this value, so that partitions with small files will be faster than partitions
+      with bigger files (which are scheduled first).
+    </td>
+  </tr>
  <tr>
    <td><code>spark.sql.autoBroadcastJoinThreshold</code></td>
    <td>10485760 (10 MB)</td>