[SPARK-15230][SQL] distinct() does not handle column name with dot properly

## What changes were proposed in this pull request?

When a table is created with a column name containing a dot, distinct() fails to run. For example, given
```scala
val rowRDD = sparkContext.parallelize(Seq(Row(1), Row(1), Row(2)))
val schema = StructType(Array(StructField("column.with.dot", IntegerType, nullable = false)))
val df = spark.createDataFrame(rowRDD, schema)
```
running the following works without error:
```scala
df.select(new Column("`column.with.dot`"))
```
but adding distinct() to the same query throws an exception:
```scala
df.select(new Column("`column.with.dot`")).distinct()
```

The issue is that distinct() tries to resolve each column name, but the column names in the schema are not wrapped in backticks, so a name containing dots is not treated as a single column name. The solution is to add the backticks before passing the column name to resolve().
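
The failure mode described above can be illustrated with a small, Spark-free sketch. It is a simplified model under stated assumptions, not Spark's actual code: `parseNameParts` and `resolveByParsing` are hypothetical helpers that mimic a resolver which splits a name on dots unless the name is wrapped in backticks.

```scala
// Simplified model of dotted-name parsing; NOT Spark's implementation.
object DottedNameSketch {
  // Hypothetical helper: split a name on dots, keeping any
  // backtick-quoted segment together as a single part.
  def parseNameParts(name: String): List[String] = {
    val parts = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var inBackticks = false
    name.foreach {
      case '`' => inBackticks = !inBackticks
      case '.' if !inBackticks =>
        parts += current.toString
        current.clear()
      case c => current.append(c)
    }
    parts += current.toString
    parts.toList
  }

  // Hypothetical resolver: a name matches a top-level field only when it
  // parses to exactly one part. Multi-part names are rejected here, where
  // a real resolver would try qualifier or nested-field lookup instead.
  def resolveByParsing(fields: Seq[String], name: String): Option[String] =
    parseNameParts(name) match {
      case List(single) => fields.find(_ == single)
      case _            => None
    }

  def main(args: Array[String]): Unit = {
    val fields = Seq("column.with.dot")
    // The raw schema name splits into three parts, so the lookup fails.
    println(resolveByParsing(fields, "column.with.dot"))
    // The backtick-quoted name stays in one piece, so the lookup succeeds.
    println(resolveByParsing(fields, "`column.with.dot`"))
  }
}
```

In this model the unquoted name fails because it is read as a three-part path, which matches the symptom reported for distinct().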

## How was this patch tested?

Added a new test case.

Author: bomeng <bmeng@us.ibm.com>

Closes #13140 from bomeng/SPARK-15230.
bomeng 2016-06-23 11:06:19 +08:00 committed by Wenchen Fan
parent 37f3be5d29
commit 925884a612
2 changed files with 12 additions and 1 deletions

@@ -1812,7 +1812,13 @@ class Dataset[T] private[sql](
    * @since 2.0.0
    */
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
-    val groupCols = colNames.map(resolve)
+    val resolver = sparkSession.sessionState.analyzer.resolver
+    val allColumns = queryExecution.analyzed.output
+    val groupCols = colNames.map { colName =>
+      allColumns.find(col => resolver(col.name, colName)).getOrElse(
+        throw new AnalysisException(
+          s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})"""))
+    }
     val groupColExprIds = groupCols.map(_.exprId)
     val aggCols = logicalPlan.output.map { attr =>
       if (groupColExprIds.contains(attr.exprId)) {
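
The lookup in this hunk compares each requested name against the plan's output attributes as a whole string, so dots are never interpreted. A minimal, Spark-free sketch of that idea follows. `Attr`, `findColumns`, and the two resolver values are hypothetical stand-ins, and `IllegalArgumentException` is swapped in for `AnalysisException` so the example runs without Spark on the classpath.

```scala
// Sketch of direct name matching; the types here are illustrative only.
object DirectMatchSketch {
  // Hypothetical stand-in for an output attribute of the analyzed plan.
  final case class Attr(name: String, exprId: Long)

  // A resolver is modeled as a function that decides whether two names match.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Compare each requested name with every attribute name as a whole
  // string; fail loudly when nothing matches.
  def findColumns(output: Seq[Attr], colNames: Seq[String], resolver: Resolver): Seq[Attr] =
    colNames.map { colName =>
      output.find(attr => resolver(attr.name, colName)).getOrElse(
        throw new IllegalArgumentException(
          s"""Cannot resolve column name "$colName" among (${output.map(_.name).mkString(", ")})"""))
    }

  def main(args: Array[String]): Unit = {
    val output = Seq(Attr("column.with.dot", 1L), Attr("other", 2L))
    // The dotted name is matched as-is, with no backticks required.
    println(findColumns(output, Seq("column.with.dot"), caseSensitive))
  }
}
```

Because the name is never parsed, the caller does not have to quote it, and case sensitivity is decided entirely by the resolver that is passed in.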

@@ -1536,4 +1536,9 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       Utils.deleteRecursively(baseDir)
     }
   }
+
+  test("SPARK-15230: distinct() does not handle column name with dot properly") {
+    val df = Seq(1, 1, 2).toDF("column.with.dot")
+    checkAnswer(df.distinct(), Row(1) :: Row(2) :: Nil)
+  }
 }