[SPARK-16302][SQL] Set the right number of partitions for reading data from a local collection.
Follow-up to #13137. This PR sets the right number of partitions when reading data from a local collection. A query such as `val df = Seq((1, 2)).toDF("key", "value").count` always used `defaultParallelism` tasks, so it ran many empty or small tasks. Manually tested and checked.

Author: Lianhui Wang <lianhuiwang09@gmail.com>

Closes #13979 from lianhuiwang/localTable-Parallel.
This commit is contained in:
parent 5bea8757cc
commit 06e33985c6
```diff
@@ -42,7 +42,10 @@ case class LocalTableScanExec(
     }
   }

-  private lazy val rdd = sqlContext.sparkContext.parallelize(unsafeRows)
+  private lazy val numParallelism: Int = math.min(math.max(unsafeRows.length, 1),
+    sqlContext.sparkContext.defaultParallelism)
+
+  private lazy val rdd = sqlContext.sparkContext.parallelize(unsafeRows, numParallelism)

   protected override def doExecute(): RDD[InternalRow] = {
     val numOutputRows = longMetric("numOutputRows")
```
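The core of the change is the clamping expression for the partition count: at least 1 partition (so an empty local table still produces a valid RDD), and at most `defaultParallelism` (so a tiny table does not fan out into many empty tasks). A minimal standalone sketch of that logic, with `numParallelism` factored into a plain function and the parallelism values chosen here for illustration (not taken from the commit):

```scala
// Hedged sketch, not the actual Spark source: the partition-count
// clamp added by this commit, isolated as a pure function.
object NumParallelismSketch {
  // At least 1 partition, at most defaultParallelism.
  def numParallelism(numRows: Int, defaultParallelism: Int): Int =
    math.min(math.max(numRows, 1), defaultParallelism)

  def main(args: Array[String]): Unit = {
    // A 2-row local table on a cluster whose defaultParallelism is 8
    // now gets 2 partitions instead of 8 mostly-empty ones.
    println(numParallelism(2, 8))   // 2
    println(numParallelism(0, 8))   // 1: an empty table still gets one partition
    println(numParallelism(100, 8)) // 8: capped at defaultParallelism
  }
}
```

Before the fix, `parallelize(unsafeRows)` used `defaultParallelism` unconditionally, which is where the empty tasks came from.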