[SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'

Author: Sean Owen <sowen@cloudera.com>

Closes #7036 from srowen/SPARK-8437 and squashes the following commits:

0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
This commit is contained in:
Sean Owen 2015-06-29 17:21:35 -07:00 committed by Andrew Or
parent fbf75738fe
commit 5d30eae560

View file

@ -831,6 +831,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
* }}}
*
* @note Small files are preferred, large file is also allowable, but may cause bad performance.
* @note On some filesystems, `.../path/*` can be a more efficient way to read all files in a directory
* rather than `.../path/` or `.../path`
*
* @param minPartitions A suggestion value of the minimal splitting number for input data.
*/
@ -878,9 +880,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
* (a-hdfs-path/part-nnnnn, its content)
* }}}
*
* @param minPartitions A suggestion value of the minimal splitting number for input data.
*
* @note Small files are preferred; very large files may cause bad performance.
* @note On some filesystems, `.../path/*` can be a more efficient way to read all files in a directory
* rather than `.../path/` or `.../path`
*
* @param minPartitions A suggestion value of the minimal splitting number for input data.
*/
@Experimental
def binaryFiles(