[SPARK-17104][SQL] LogicalRelation.newInstance should follow the semantics of MultiInstanceRelation

## What changes were proposed in this pull request?

Currently, `LogicalRelation.newInstance()` simply creates another `LogicalRelation` object with the same parameters. However, the `newInstance()` method inherited from `MultiInstanceRelation` should return a copy of the object with unique expression ids. The current `LogicalRelation.newInstance()` can therefore cause failures in self-joins, where both sides of the join end up exposing the same expression ids.
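
The difference between the two behaviours can be illustrated with a minimal, self-contained sketch. It does not use Spark: `ExprId`, `Attr`, `Relation`, `sameParamsCopy`, and `conflictingIds` are hypothetical, simplified stand-ins for the Catalyst classes, and the column names `a` and `b` are made up for illustration.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy stand-in for an expression id (hypothetical, not Catalyst's class).
final case class ExprId(id: Long)

object ExprId {
  private val counter = new AtomicLong(0L)
  // Hands out a process-wide unique id on every call.
  def fresh(): ExprId = ExprId(counter.getAndIncrement())
}

// Toy stand-in for an output attribute: a name plus an expression id.
final case class Attr(name: String, exprId: ExprId = ExprId.fresh()) {
  // Same name, fresh expression id.
  def newInstance(): Attr = copy(exprId = ExprId.fresh())
}

// Toy stand-in for a relation with a fixed list of output attributes.
final case class Relation(table: String, output: Seq[Attr]) {
  // Old behaviour: copy with the same parameters, so the same expression ids.
  def sameParamsCopy(): Relation = Relation(table, output)

  // Behaviour after the fix: every output attribute gets a fresh expression id.
  def newInstance(): Relation = Relation(table, output.map(_.newInstance()))
}

object SelfJoinSketch {
  // Expression ids exposed by both sides of a join, i.e. the ambiguous ones.
  def conflictingIds(left: Relation, right: Relation): Set[ExprId] =
    left.output.map(_.exprId).toSet.intersect(right.output.map(_.exprId).toSet)

  def main(args: Array[String]): Unit = {
    val t = Relation("normal_parquet", Seq(Attr("a"), Attr("b")))
    println(conflictingIds(t, t.sameParamsCopy()).size) // 2: both ids clash
    println(conflictingIds(t, t.newInstance()).size)    // 0: no shared ids
  }
}
```

In this toy model, a copy made with the same parameters shares every expression id with the original, which is the situation a self-join cannot resolve, while `newInstance()` keeps the attribute names and replaces only the ids.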

## How was this patch tested?

Jenkins tests.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>

Closes #14682 from viirya/fix-localrelation.
Liang-Chi Hsieh 2016-08-20 23:29:48 +08:00 committed by Wenchen Fan
parent 3e5fdeb3fb
commit 31a0155720
2 changed files with 16 additions and 2 deletions

@@ -79,11 +79,18 @@ case class LogicalRelation(
   /** Used to lookup original attribute capitalization */
   val attributeMap: AttributeMap[AttributeReference] = AttributeMap(output.map(o => (o, o)))
-  def newInstance(): this.type =
+  /**
+   * Returns a new instance of this LogicalRelation. According to the semantics of
+   * MultiInstanceRelation, this method returns a copy of this object with
+   * unique expression ids. We respect the `expectedOutputAttributes` and create
+   * new instances of attributes in it.
+   */
+  override def newInstance(): this.type = {
     LogicalRelation(
       relation,
-      expectedOutputAttributes,
+      expectedOutputAttributes.map(_.map(_.newInstance())),
       metastoreTableIdentifier).asInstanceOf[this.type]
+  }
   override def refresh(): Unit = relation match {
     case fs: HadoopFsRelation => fs.refresh()

@@ -589,6 +589,13 @@ class ParquetMetastoreSuite extends ParquetPartitioningTest {
       }
     }
   }
+  test("self-join") {
+    val table = spark.table("normal_parquet")
+    val selfJoin = table.as("t1").join(table.as("t2"))
+    checkAnswer(selfJoin,
+      sql("SELECT * FROM normal_parquet x JOIN normal_parquet y"))
+  }
 }
 /**