spark-instrumented-optimizer

Max Gekk b7cabc80e6 [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"	2020-04-28 14:10:50 +00:00

### What changes were proposed in this pull request?
This reverts commit `5631a96367`.

Closes #28328

### Why are the changes needed?
The PR https://github.com/apache/spark/pull/25754 introduced a bug in `isInCollection`. For example, if the SQL config `spark.sql.optimizer.inSetConversionThreshold` is set to 10 (by default):
```scala
val set = (0 to 20).map(_.toString).toSet
val data = Seq("1").toDF("x")
data.select($"x".isInCollection(set).as("isInCollection")).show()
```
The function must return 'true' because "1" is in the set of "0" ... "20" but it returns "false":
```
+--------------+
|isInCollection|
+--------------+
|         false|
+--------------+
```

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
```
$ ./build/sbt "test:testOnly *ColumnExpressionSuite"
```

Closes #28388 from MaxGekk/fix-isInCollection-revert.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

benchmarks	[SPARK-31364][SQL][TESTS] Benchmark Parquet Nested Field Predicate Pushdown	2020-04-24 22:10:58 +00:00
src	[SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"	2020-04-28 14:10:50 +00:00
v1.2/src	[SPARK-31489][SQL] Fix pushing down filters with `java.time.LocalDate` values in ORC	2020-04-26 15:49:00 -07:00
v2.3/src	[SPARK-31489][SQL] Fix pushing down filters with `java.time.LocalDate` values in ORC	2020-04-26 15:49:00 -07:00
pom.xml	[SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector	2020-04-22 17:10:30 -07:00