spark-instrumented-optimizer/sql/core
Max Gekk b7cabc80e6 [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"
### What changes were proposed in this pull request?
This reverts commit 5631a96367.

Closes #28328

### Why are the changes needed?
The PR https://github.com/apache/spark/pull/25754 introduced a bug in `isInCollection`. For example, if the SQL config `spark.sql.optimizer.inSetConversionThreshold` is set to 10 (the default value):
```scala
val set = (0 to 20).map(_.toString).toSet
val data = Seq("1").toDF("x")
data.select($"x".isInCollection(set).as("isInCollection")).show()
```
The function must return **true** because "1" is in the set of "0" ... "20", but it returns **false**:
```
+--------------+
|isInCollection|
+--------------+
|         false|
+--------------+
```
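
The reproduction above can also be phrased as a small self-checking program. The following is only a sketch, assuming a local Spark session is available on the classpath; the object name `IsInCollectionCheck` and the comparison against `isin` are illustrative choices, not part of this patch or of `ColumnExpressionSuite`:
```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-alone check, not taken from the Spark test suite.
object IsInCollectionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isInCollection-check")
      .getOrCreate()
    import spark.implicits._

    try {
      // 21 elements, i.e. more than the inSetConversionThreshold of 10
      val set = (0 to 20).map(_.toString).toSet
      val data = Seq("1").toDF("x")

      // Result of the method under discussion
      val viaIsInCollection =
        data.select($"x".isInCollection(set)).head().getBoolean(0)
      // Same membership test expressed through the varargs `isin`
      val viaIsin =
        data.select($"x".isin(set.toSeq: _*)).head().getBoolean(0)

      assert(viaIsInCollection == viaIsin, "isInCollection and isin disagree")
      assert(viaIsInCollection, "\"1\" should be reported as a member of the set")
    } finally {
      spark.stop()
    }
  }
}
```
On a build that contains the bug, the assertions are expected to fail; with the revert applied they should pass.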

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
```
$ ./build/sbt "test:testOnly *ColumnExpressionSuite"
```

Closes #28388 from MaxGekk/fix-isInCollection-revert.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-28 14:10:50 +00:00

| Name | Last commit | Date |
|------|-------------|------|
| benchmarks | [SPARK-31364][SQL][TESTS] Benchmark Parquet Nested Field Predicate Pushdown | 2020-04-24 22:10:58 +00:00 |
| src | [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection" | 2020-04-28 14:10:50 +00:00 |
| v1.2/src | [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC | 2020-04-26 15:49:00 -07:00 |
| v2.3/src | [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC | 2020-04-26 15:49:00 -07:00 |
| pom.xml | [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector | 2020-04-22 17:10:30 -07:00 |