ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Bruce Robbins	09b05487b7	[SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection ## What changes were proposed in this pull request? When creating some unsafe projections, Spark rebuilds the map of schema attributes once for each expression in the projection. Some file format readers create one unsafe projection per input file, others create one per task. ProjectExec also creates one unsafe projection per task. As a result, for wide queries on wide tables, Spark might build the map of schema attributes hundreds of thousands of times. This PR changes two functions to reuse the same AttributeSeq instance when creating BoundReference objects for each expression in the projection. This avoids the repeated rebuilding of the map of schema attributes. ### Benchmarks The time saved by this PR depends on size of the schema, size of the projection, number of input files (or number of file splits), number of tasks, and file format. I chose a couple of example cases. In the following tests, I ran the query ```sql select * from table where id1 = 1 ``` Matching rows are about 0.2% of the table. #### Orc table 6000 columns, 500K rows, 34 input files baseline \| pr \| improvement ----\|----\|---- 1.772306 min \| 1.487267 min \| 16.082943% #### Orc table 6000 columns, 500K rows, 17 input files baseline \| pr \| improvement ----\|----\|---- 1.656400 min \| 1.423550 min \| 14.057595% #### Orc table 60 columns, 50M rows, 34 input files baseline \| pr \| improvement ----\|----\|---- 0.299878 min \| 0.290339 min \| 3.180926% #### Parquet table 6000 columns, 500K rows, 34 input files baseline \| pr \| improvement ----\|----\|---- 1.478306 min \| 1.373728 min \| 7.074165% Note: The parquet reader does not create an unsafe projection. However, the filter operation in the query causes the planner to add a ProjectExec, which does create an unsafe projection for each task. So these results have nothing to do with Parquet itself. #### Parquet table 60 columns, 50M rows, 34 input files baseline \| pr \| improvement ----\|----\|---- 0.245006 min \| 0.242200 min \| 1.145099% #### CSV table 6000 columns, 500K rows, 34 input files baseline \| pr \| improvement ----\|----\|---- 2.390117 min \| 2.182778 min \| 8.674844% #### CSV table 60 columns, 50M rows, 34 input files baseline \| pr \| improvement ----\|----\|---- 1.520911 min \| 1.510211 min \| 0.703526% ## How was this patch tested? SQL unit tests Python core and SQL test Closes #23392 from bersprockets/norebuild. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>	2019-01-13 23:54:19 +01:00
Maxim Gekk	4ff2b94a7c	[SPARK-26503][CORE][DOC][FOLLOWUP] Get rid of spark.sql.legacy.timeParser.enabled ## What changes were proposed in this pull request? The SQL config `spark.sql.legacy.timeParser.enabled` was removed by https://github.com/apache/spark/pull/23495. The PR cleans up the SQL migration guide and the comment for `UnixTimestamp`. Closes #23529 from MaxGekk/get-rid-off-legacy-parser-followup. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-13 11:20:22 +08:00
Kengo Seki	3bd77aa9f6	[SPARK-26564] Fix wrong assertions and error messages for parameter checking ## What changes were proposed in this pull request? If users set equivalent values to spark.network.timeout and spark.executor.heartbeatInterval, they get the following message: ``` java.lang.IllegalArgumentException: requirement failed: The value of spark.network.timeout=120s must be no less than the value of spark.executor.heartbeatInterval=120s. ``` But it's misleading since it can be read as they could be equal. So this PR replaces "no less than" with "greater than". Also, it fixes similar inconsistencies found in MLlib and SQL components. ## How was this patch tested? Ran Spark with equivalent values for them manually and confirmed that the revised message was displayed. Closes #23488 from sekikn/SPARK-26564. Authored-by: Kengo Seki <sekikn@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-12 14:53:33 -06:00
Oleksii Shkarupin	5b37092311	[SPARK-26538][SQL] Set default precision and scale for elements of postgres numeric array ## What changes were proposed in this pull request? When determining CatalystType for postgres columns with type `numeric[]` set the type of array element to `DecimalType(38, 18)` instead of `DecimalType(0,0)`. ## How was this patch tested? Tested with modified `org.apache.spark.sql.jdbc.JDBCSuite`. Ran the `PostgresIntegrationSuite` manually. Closes #23456 from a-shkarupin/postgres_numeric_array. Lead-authored-by: Oleksii Shkarupin <a.shkarupin@gmail.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-12 11:06:39 -08:00
Dongjoon Hyun	3587a9a227	[SPARK-26607][SQL][TEST] Remove Spark 2.2.x testing from HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? The vote of final release of `branch-2.2` passed and the branch goes EOL. This PR removes Spark 2.2.x from the testing coverage. ## How was this patch tested? Pass the Jenkins. Closes #23526 from dongjoon-hyun/SPARK-26607. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-11 22:53:58 -08:00
Mukul Murthy	ae382c94dd	[SPARK-26586][SS] Fix race condition that causes streams to run with unexpected confs ## What changes were proposed in this pull request? Fix race condition where streams can have unexpected conf values. New streaming queries should run with isolated SparkSessions so that they aren't affected by conf updates after they are started. In StreamExecution, the parent SparkSession is cloned and used to run each batch, but this cloning happens in a separate thread and may happen after DataStreamWriter.start() returns. If a stream is started and a conf key is set immediately after, the stream is likely to have the new value. ## How was this patch tested? New unit test that fails prior to the production change and passes with it. Please review http://spark.apache.org/contributing.html before opening a pull request. Closes #23513 from mukulmurthy/26586. Authored-by: Mukul Murthy <mukul.murthy@gmail.com> Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>	2019-01-11 11:46:14 -08:00
Liang-Chi Hsieh	50ebf3a43b	[SPARK-26551][SQL] Fix schema pruning error when selecting one complex field and having is not null predicate on another one ## What changes were proposed in this pull request? Schema pruning has errors when selecting one complex field and having is not null predicate on another one: ```scala val query = sql("select * from contacts") .where("name.middle is not null") .select( "id", "name.first", "name.middle", "name.last" ) .where("last = 'Jones'") .select(count("id")) ``` ``` java.lang.IllegalArgumentException: middle does not exist. Available: last [info] at org.apache.spark.sql.types.StructType.$anonfun$fieldIndex$1(StructType.scala:303) [info] at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:119) [info] at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:302) [info] at org.apache.spark.sql.execution.ProjectionOverSchema.$anonfun$getProjection$6(ProjectionOverSchema.scala:58) [info] at scala.Option.map(Option.scala:163) [info] at org.apache.spark.sql.execution.ProjectionOverSchema.getProjection(ProjectionOverSchema.scala:56) [info] at org.apache.spark.sql.execution.ProjectionOverSchema.unapply(ProjectionOverSchema.scala:32) [info] at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaPruning$$anonfun$$nestedInanonfun$buildNewProjection$1$1.applyOrElse(Parque tSchemaPruning.scala:153) ``` ## How was this patch tested? Added tests. Closes #23474 from viirya/SPARK-26551. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-01-11 19:23:32 +00:00
Jungtaek Lim (HeartSaVioR)	d9e4cf67c0	[SPARK-26482][CORE] Use ConfigEntry for hardcoded configs for ui categories ## What changes were proposed in this pull request? The PR makes hardcoded configs below to use `ConfigEntry`. * spark.ui * spark.ssl * spark.authenticate * spark.master.rest * spark.master.ui * spark.metrics * spark.admin * spark.modify.acl This patch doesn't change configs which are not relevant to SparkConf (e.g. system properties). ## How was this patch tested? Existing tests. Closes #23423 from HeartSaVioR/SPARK-26466. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-01-11 10:18:07 -08:00
Sean Owen	51a6ba0181	[SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled ## What changes were proposed in this pull request? Per discussion in #23391 (comment) this proposes to just remove the old pre-Spark-3 time parsing behavior. This is a rebase of https://github.com/apache/spark/pull/23411 ## How was this patch tested? Existing tests. Closes #23495 from srowen/SPARK-26503.2. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-11 08:53:12 -06:00
Wenchen Fan	1f1d98c6fa	[SPARK-26580][SQL] remove Scala 2.11 hack for Scala UDF ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/22732 , we tried our best to keep the behavior of Scala UDF unchanged in Spark 2.4. However, since Spark 3.0, Scala 2.12 is the default. The trick that was used to keep the behavior unchanged doesn't work with Scala 2.12. This PR proposes to remove the Scala 2.11 hack, as it's not useful. ## How was this patch tested? existing tests. Closes #23498 from cloud-fan/udf. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-01-11 14:52:13 +08:00
Dongjoon Hyun	270916f8cd	[SPARK-26584][SQL] Remove `spark.sql.orc.copyBatchToSpark` internal conf ## What changes were proposed in this pull request? This PR aims to remove internal ORC configuration to simplify the code path for Spark 3.0.0. This removes the configuration `spark.sql.orc.copyBatchToSpark` and related ORC codes including tests and benchmarks. ## How was this patch tested? Pass the Jenkins with the reduced test coverage. Closes #23503 from dongjoon-hyun/SPARK-26584. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-10 08:42:23 -08:00
Sean Owen	2f8a938805	[SPARK-26539][CORE] Remove spark.memory.useLegacyMode and StaticMemoryManager ## What changes were proposed in this pull request? Remove spark.memory.useLegacyMode and StaticMemoryManager. Update tests that used the StaticMemoryManager to equivalent use of UnifiedMemoryManager. ## How was this patch tested? Existing tests, with modifications to make them work with a different mem manager. Closes #23457 from srowen/SPARK-26539. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-10 08:57:44 -06:00
Wenchen Fan	6955638eae	[SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/18576 The newly added rule `UpdateNullabilityInAttributeReferences` does the same thing the `FixNullability` does, we only need to keep one of them. This PR removes `UpdateNullabilityInAttributeReferences`, and use `FixNullability` to replace it. Also rename it to `UpdateAttributeNullability` ## How was this patch tested? existing tests Closes #23390 from cloud-fan/nullable. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2019-01-10 20:15:25 +09:00
Maxim Gekk	73c7b126c6	[SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter ## What changes were proposed in this pull request? Added a cache for java.time.format.DateTimeFormatter instances with keys consist of pattern and locale. This should allow to avoid parsing of timestamp/date patterns each time when new instance of `TimestampFormatter`/`DateFormatter` is created. ## How was this patch tested? By existing test suites `TimestampFormatterSuite`/`DateFormatterSuite` and `JsonFunctionsSuite`/`JsonSuite`. Closes #23462 from MaxGekk/time-formatter-caching. Lead-authored-by: Maxim Gekk <max.gekk@gmail.com> Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-10 10:32:20 +08:00
Jamison Bennett	1a47233f99	[SPARK-26493][SQL] Allow multiple spark.sql.extensions ## What changes were proposed in this pull request? Allow multiple spark.sql.extensions to be specified in the configuration. ## How was this patch tested? New tests are added. Closes #23398 from jamisonbennett/SPARK-26493. Authored-by: Jamison Bennett <jamison.bennett@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-10 10:23:03 +08:00
maryannxue	2d01bccbd4	[SPARK-26065][FOLLOW-UP][SQL] Fix the Failure when having two Consecutive Hints ## What changes were proposed in this pull request? This is to fix a bug in https://github.com/apache/spark/pull/23036, which would lead to an exception in case of two consecutive hints. ## How was this patch tested? Added a new test. Closes #23501 from maryannxue/query-hint-followup. Authored-by: maryannxue <maryannxue@apache.org> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-09 14:31:26 -08:00
Wenchen Fan	e853afb416	[SPARK-26448][SQL] retain the difference between 0.0 and -0.0 ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/23043 , we introduced a behavior change: Spark users are not able to distinguish 0.0 and -0.0 anymore. This PR proposes an alternative fix to the original bug, to retain the difference between 0.0 and -0.0 inside Spark. The idea is, we can rewrite the window partition key, join key and grouping key during logical phase, to normalize the special floating numbers. Thus only operators care about special floating numbers need to pay the perf overhead, and end users can distinguish -0.0. ## How was this patch tested? existing test Closes #23388 from cloud-fan/minor. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-09 13:50:32 -08:00
Peter Toth	49c062b2e0	[SPARK-25484][SQL][TEST] Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark ## What changes were proposed in this pull request? Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark to use main method. ## How was this patch tested? Manually tested and regenerated results. Please note that `spark.memory.debugFill` setting has a huge impact on this benchmark. Since it is set to true by default when running the benchmark from SBT, we need to disable it: ``` SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt ";project sql;set javaOptions in Test += \"-Dspark.memory.debugFill=false\";test:runMain org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark" ``` Closes #22617 from peter-toth/SPARK-25484. Lead-authored-by: Peter Toth <peter.toth@gmail.com> Co-authored-by: Peter Toth <ptoth@hortonworks.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-09 09:54:21 -08:00
Gengliang Wang	311f32f37f	[SPARK-26571][SQL] Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat ## What changes were proposed in this pull request? Currently Spark table maintains Hive catalog storage format, so that Hive client can read it. In `HiveSerDe.scala`, Spark uses a mapping from its data source to HiveSerde. The mapping is old, we need to update with latest canonical name of Parquet and Orc FileFormat. Otherwise the following queries will result in wrong Serde value in Hive table(default value `org.apache.hadoop.mapred.SequenceFileInputFormat`), and Hive client will fail to read the output table: ``` df.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").saveAsTable(..) ``` ``` df.write.format("org.apache.spark.sql.execution.datasources.orc.OrcFileFormat").saveAsTable(..) ``` This minor PR is to fix the mapping. ## How was this patch tested? Unit test. Closes #23491 from gengliangwang/fixHiveSerdeMap. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-01-09 10:18:33 +08:00
Marcelo Vanzin	2783e4c45f	[SPARK-24522][UI] Create filter to apply HTTP security checks consistently. Currently there is code scattered in a bunch of places to do different things related to HTTP security, such as access control, setting security-related headers, and filtering out bad content. This makes it really easy to miss these things when writing new UI code. This change creates a new filter that does all of those things, and makes sure that all servlet handlers that are attached to the UI get the new filter and any user-defined filters consistently. The extent of the actual features should be the same as before. The new filter is added at the end of the filter chain, because authentication is done by custom filters and thus needs to happen first. This means that custom filters see unfiltered HTTP requests - which is actually the current behavior anyway. As a side-effect of some of the code refactoring, handlers added after the initial set also get wrapped with a GzipHandler, which didn't happen before. Tested with added unit tests and in a history server with SPNEGO auth configured. Closes #23302 from vanzin/SPARK-24522. Authored-by: Marcelo Vanzin <vanzin@cloudera.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2019-01-08 11:25:33 -06:00
“attilapiros”	c101182b10	[SPARK-26002][SQL] Fix day of year calculation for Julian calendar days ## What changes were proposed in this pull request? Fixing leap year calculations for date operators (year/month/dayOfYear) where the Julian calendars are used (before 1582-10-04). In a Julian calendar every years which are multiples of 4 are leap years (there is no extra exception for years multiples of 100). ## How was this patch tested? With a unit test ("SPARK-26002: correct day of year calculations for Julian calendar years") which focuses to these corner cases. Manually: ``` scala> sql("select year('1500-01-01')").show() +------------------------------+ \|year(CAST(1500-01-01 AS DATE))\| +------------------------------+ \| 1500\| +------------------------------+ scala> sql("select dayOfYear('1100-01-01')").show() +-----------------------------------+ \|dayofyear(CAST(1100-01-01 AS DATE))\| +-----------------------------------+ \| 1\| +-----------------------------------+ ``` Closes #23000 from attilapiros/julianOffByDays. Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-01-09 01:24:47 +08:00
Wenchen Fan	72a572ffd6	[SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any ## What changes were proposed in this pull request? For Scala UDF, when checking input nullability, we will skip inputs with type `Any`, and only check the inputs that provide nullability info. We should do the same for checking input types. ## How was this patch tested? new tests Closes #23275 from cloud-fan/udf. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-08 22:44:33 +08:00
Yuming Wang	29a7d2da44	[SPARK-24196][SQL] Implement Spark's own GetSchemasOperation ## What changes were proposed in this pull request? This PR fix SQL Client tools can't show DBs by implementing Spark's own `GetSchemasOperation`. ## How was this patch tested? unit tests and manual tests ![image](https://user-images.githubusercontent.com/5399861/47782885-3dd5d400-dd3c-11e8-8586-59a8c15c7020.png) ![image](https://user-images.githubusercontent.com/5399861/47782899-4928ff80-dd3c-11e8-9d2d-ba9580ba4301.png) Closes #22903 from wangyum/SPARK-24196. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-07 18:59:43 -08:00
Hyukjin Kwon	5102ccc4ab	[SPARK-26339][SQL][FOLLOW-UP] Issue warning instead of throwing an exception for underscore files ## What changes were proposed in this pull request? The PR https://github.com/apache/spark/pull/23446 happened to introduce a behaviour change - empty dataframes can't be read anymore from underscore files. It looks controversial to allow or disallow this case so this PR targets to fix to issue warning instead of throwing an exception to be more conservative. Before ```scala scala> spark.read.schema("a int").parquet("_tmp").show() org.apache.spark.sql.AnalysisException: All paths were ignored: file:/.../_tmp file:/.../_tmp1; at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:651) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:635) ... 49 elided scala> spark.read.text("_tmp").show() org.apache.spark.sql.AnalysisException: All paths were ignored: file:/.../_tmp file:/.../_tmp1; at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219) at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:723) at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:695) ... 49 elided ``` After ```scala scala> spark.read.schema("a int").parquet("_tmp").show() 19/01/07 15:14:43 WARN DataSource: All paths were ignored: file:/.../_tmp file:/.../_tmp1 +---+ \| a\| +---+ +---+ scala> spark.read.text("_tmp").show() 19/01/07 15:14:51 WARN DataSource: All paths were ignored: file:/.../_tmp file:/.../_tmp1 +-----+ \|value\| +-----+ +-----+ ``` ## How was this patch tested? Manually tested as above. Closes #23481 from HyukjinKwon/SPARK-26339. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-07 15:48:54 -08:00
Marco Gaido	1a641525e6	[SPARK-26491][CORE][TEST] Use ConfigEntry for hardcoded configs for test categories ## What changes were proposed in this pull request? The PR makes hardcoded `spark.test` and `spark.testing` configs to use `ConfigEntry` and put them in the config package. ## How was this patch tested? existing UTs Closes #23413 from mgaido91/SPARK-26491. Authored-by: Marco Gaido <marcogaido91@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-01-07 15:35:33 -08:00
maryannxue	98be8953c7	[SPARK-26065][SQL] Change query hint from a `LogicalPlan` to a field ## What changes were proposed in this pull request? The existing query hint implementation relies on a logical plan node `ResolvedHint` to store query hints in logical plans, and on `Statistics` in physical plans. Since `ResolvedHint` is not really a logical operator and can break the pattern matching for existing and future optimization rules, it is a issue to the Optimizer as the old `AnalysisBarrier` was to the Analyzer. Given the fact that all our query hints are either 1) a join hint, i.e., broadcast hint; or 2) a re-partition hint, which is indeed an operator, we only need to add a hint field on the Join plan and that will be a good enough solution for the current hint usage. This PR is to let `Join` node have a hint for its left sub-tree and another hint for its right sub-tree and each hint is a merged result of all the effective hints specified in the corresponding sub-tree. The "effectiveness" of a hint, i.e., whether that hint should be propagated to the `Join` node, is currently consistent with the hint propagation rules originally implemented in the `Statistics` approach. Note that the `ResolvedHint` node still has to live through the analysis stage because of the `Dataset` interface, but it will be got rid of and moved to the `Join` node in the "pre-optimization" stage. This PR also introduces a change in how hints work with join reordering. Before this PR, hints would stop join reordering. For example, in "a.join(b).join(c).hint("broadcast").join(d)", the broadcast hint would stop d from participating in the cost-based join reordering while still allowing reordering from under the hint node. After this PR, though, the broadcast hint will not interfere with join reordering at all, and after reordering if a relation associated with a hint stays unchanged or equivalent to the original relation, the hint will be retained, otherwise will be discarded. For example, the original plan is like "a.join(b).hint("broadcast").join(c).hint("broadcast").join(d)", thus the join order is "a JOIN b JOIN c JOIN d". So if after reordering the join order becomes "a JOIN b JOIN (c JOIN d)", the plan will be like "a.join(b).hint("broadcast").join(c.join(d))"; but if after reordering the join order becomes "a JOIN c JOIN b JOIN d", the plan will be like "a.join(c).join(b).hint("broadcast").join(d)". ## How was this patch tested? Added new tests. Closes #23036 from maryannxue/query-hint. Authored-by: maryannxue <maryannxue@apache.org> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-07 13:59:40 -08:00
ayudovin	868e02533d	[SPARK-26383][CORE] NPE when use DataFrameReader.jdbc with wrong URL ### What changes were proposed in this pull request? When passing wrong url to jdbc then It would throw IllegalArgumentException instead of NPE. ### How was this patch tested? Adding test case to Existing tests in JDBCSuite Closes #23464 from ayudovin/fixing-npe. Authored-by: ayudovin <a.yudovin6695@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-07 08:58:33 -06:00
Dongjoon Hyun	61133cb8a6	[SPARK-26536][BUILD][FOLLOWUP][TEST-MAVEN] Make StreamingReadSupport public for maven testing ## What changes were proposed in this pull request? `StreamingReadSupport` is designed to be a `package` interface. Mockito seems to complain during `Maven` testing. This doesn't fail in `sbt` and IntelliJ. For mock-testing purpose, this PR makes it `public` interface and adds explicit comments like `public interface ReadSupport` ```scala EpochCoordinatorSuite: * RUN ABORTED * java.lang.IllegalAccessError: tried to access class org.apache.spark.sql.sources.v2.reader.streaming.StreamingReadSupport from class org.apache.spark.sql.sources.v2.reader.streaming.ContinuousReadSupport$MockitoMock$58628338 at org.apache.spark.sql.sources.v2.reader.streaming.ContinuousReadSupport$MockitoMock$58628338.<clinit>(Unknown Source) at sun.reflect.GeneratedSerializationConstructorAccessor632.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.objenesis.instantiator.sun.SunReflectionFactoryInstantiator.newInstance(SunReflectionFactoryInstantiator.java:48) at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) at org.mockito.internal.creation.instance.ObjenesisInstantiator.newInstance(ObjenesisInstantiator.java:19) at org.mockito.internal.creation.bytebuddy.SubclassByteBuddyMockMaker.createMock(SubclassByteBuddyMockMaker.java:47) at org.mockito.internal.creation.bytebuddy.ByteBuddyMockMaker.createMock(ByteBuddyMockMaker.java:25) at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:35) at org.mockito.internal.MockitoCore.mock(MockitoCore.java:69) ``` ## How was this patch tested? Pass the Jenkins with Maven build Closes #23463 from dongjoon-hyun/SPARK-26536-2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-06 21:00:10 -08:00
Maxim Gekk	b305d71625	[SPARK-26547][SQL] Remove duplicate toHiveString from HiveUtils ## What changes were proposed in this pull request? The `toHiveString()` and `toHiveStructString` methods were removed from `HiveUtils` because they have been already implemented in `HiveResult`. One related test was moved to `HiveResultSuite`. ## How was this patch tested? By tests from `hive-thriftserver`. Closes #23466 from MaxGekk/dedup-hive-result-string. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-06 17:36:06 -08:00
Hirobe Keiichi	9d8e9b394b	[SPARK-26339][SQL] Throws better exception when reading files that start with underscore ## What changes were proposed in this pull request? My pull request #23288 was resolved and merged to master, but it turned out later that my change breaks another regression test. Because we cannot reopen pull request, I create a new pull request here. Commit 92934b4 is only change after pull request #23288. `CheckFileExist` was avoided at 239cfa4 after discussing #23288 (comment). But, that change turned out to be wrong because we should not check if argument checkFileExist is false. Test `27e42c1de5/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (L2555)` failed when we avoided checkFileExist, but now successed after commit 92934b4 . ## How was this patch tested? Both of below tests were passed. ``` testOnly org.apache.spark.sql.execution.datasources.csv.CSVSuite testOnly org.apache.spark.sql.SQLQuerySuite ``` Closes #23446 from KeiichiHirobe/SPARK-26339. Authored-by: Hirobe Keiichi <keiichi_hirobe@forcia.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-06 08:52:09 -06:00
Dave DeCaprio	a17851cb95	[SPARK-26548][SQL] Don't hold CacheManager write lock while computing executedPlan ## What changes were proposed in this pull request? Address SPARK-26548, in Spark 2.4.0, the CacheManager holds a write lock while computing the executedPlan for a cached logicalPlan. In some cases with very large query plans this can be an expensive operation, taking minutes to run. The entire cache is blocked during this time. This PR changes that so the writeLock is only obtained after the executedPlan is generated, this reduces the time the lock is held to just the necessary time when the shared data structure is being updated. gatorsmile and cloud-fan - You can committed patches in this area before. This is a small incremental change. ## How was this patch tested? Has been tested on a live system where the blocking was causing major issues and it is working well. CacheManager has no explicit unit test but is used in many places internally as part of the SharedState. Closes #23469 from DaveDeCaprio/optimizer-unblocked. Lead-authored-by: Dave DeCaprio <daved@alum.mit.edu> Co-authored-by: David DeCaprio <daved@alum.mit.edu> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-05 19:20:35 -08:00
Kris Mok	4ab5b5b918	[SPARK-26545] Fix typo in EqualNullSafe's truth table comment ## What changes were proposed in this pull request? The truth table comment in EqualNullSafe incorrectly marked FALSE results as UNKNOWN. ## How was this patch tested? N/A Closes #23461 from rednaxelafx/fix-typo. Authored-by: Kris Mok <kris.mok@databricks.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-05 14:37:04 -08:00
Maxim Gekk	980e6bcd1c	[SPARK-26246][SQL][FOLLOWUP] Inferring TimestampType from JSON ## What changes were proposed in this pull request? Added new JSON option `inferTimestamp` (`true` by default) to control inferring of `TimestampType` from string values. ## How was this patch tested? Add new UT to `JsonInferSchemaSuite`. Closes #23455 from MaxGekk/json-infer-time-followup. Authored-by: Maxim Gekk <maxim.gekk@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-05 21:50:27 +08:00
Marco Gaido	1af1190bee	[SPARK-26078][SQL][FOLLOWUP] Remove useless import ## What changes were proposed in this pull request? While backporting the patch to 2.4/2.3, I realized that the patch introduces unneeded imports (probably leftovers from intermediate changes). This PR removes the useless import. ## How was this patch tested? NA Closes #23451 from mgaido91/SPARK-26078_FOLLOWUP. Authored-by: Marco Gaido <marcogaido91@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-05 01:14:58 -08:00
Dongjoon Hyun	e15a319ccd	[SPARK-26536][BUILD][TEST] Upgrade Mockito to 2.23.4 ## What changes were proposed in this pull request? This PR upgrades Mockito from 1.10.19 to 2.23.4. The following changes are required. - Replace `org.mockito.Matchers` with `org.mockito.ArgumentMatchers` - Replace `anyObject` with `any` - Replace `getArgumentAt` with `getArgument` and add type annotation. - Use `isNull` matcher in case of `null` is invoked. ```scala saslHandler.channelInactive(null); - verify(handler).channelInactive(any(TransportClient.class)); + verify(handler).channelInactive(isNull()); ``` - Make and use `doReturn` wrapper to avoid [SI-4775](https://issues.scala-lang.org/browse/SI-4775) ```scala private def doReturn(value: Any) = org.mockito.Mockito.doReturn(value, Seq.empty: _*) ``` ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #23452 from dongjoon-hyun/SPARK-26536. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-04 19:23:38 -08:00
Sean Owen	36440e6447	[SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite ## What changes were proposed in this pull request? Increase test memory to avoid OOM in TimSort-related tests. ## How was this patch tested? Existing tests. Closes #23425 from srowen/SPARK-26306. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-04 15:35:23 -06:00
Liu,Linhong	f65dc9593e	[SPARK-26526][SQL][TEST] Fix invalid test case about non-deterministic expression ## What changes were proposed in this pull request? Test case in SPARK-10316 is used to make sure non-deterministic `Filter` won't be pushed through `Project` But in current code base this test case can't cover this purpose. Change LogicalRDD to HadoopFsRelation can fix this issue. ## How was this patch tested? Modified test pass. Closes #23440 from LinhongLiu/fix-test. Authored-by: Liu,Linhong <liulinhong@baidu.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-01-04 10:51:33 +08:00
Gengliang Wang	e2dbafdbc5	[SPARK-26447][SQL] Allow OrcColumnarBatchReader to return less partition columns ## What changes were proposed in this pull request? Currently OrcColumnarBatchReader returns all the partition column values in the batch read. In data source V2, we can improve it by returning the required partition column values only. This PR is part of https://github.com/apache/spark/pull/23383 . As cloud-fan suggested, create a new PR to make review easier. Also, this PR doesn't improve `OrcFileFormat`, since in the method `buildReaderWithPartitionValues`, the `requiredSchema` filter out all the partition columns, so we can't know which partition column is required. ## How was this patch tested? Unit test Closes #23387 from gengliangwang/refactorOrcColumnarBatch. Lead-authored-by: Gengliang Wang <gengliang.wang@databricks.com> Co-authored-by: Gengliang Wang <ltnwgl@gmail.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2019-01-04 00:37:03 +08:00
Liang-Chi Hsieh	40711eef16	[SPARK-26517][SQL][TEST] Avoid duplicate test in ParquetSchemaPruningSuite ## What changes were proposed in this pull request? `testExactCaseQueryPruning` and `testMixedCaseQueryPruning` don't need to set up `PARQUET_VECTORIZED_READER_ENABLED` config. Because `withMixedCaseData` will run against both Spark vectorized reader and Parquet-mr reader. ## How was this patch tested? Existing test. Closes #23427 from viirya/fix-parquet-schema-pruning-test. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-03 10:30:47 -06:00
Maxim Gekk	2a30deb85a	[SPARK-26502][SQL] Move hiveResultString() from QueryExecution to HiveResult ## What changes were proposed in this pull request? In the PR, I propose to move `hiveResultString()` out of `QueryExecution` and put it to a separate object. Closes #23409 from MaxGekk/hive-result-string. Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>	2019-01-03 11:27:40 +01:00
Hyukjin Kwon	56967b7e28	[SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API ## What changes were proposed in this pull request? This PR fixes `pivot(Column)` can accepts `collection.mutable.WrappedArray`. Note that we return `collection.mutable.WrappedArray` from `ArrayType`, and `Literal.apply` doesn't support this. We can unwrap the array and use it for type dispatch. ```scala val df = Seq( (2, Seq.empty[String]), (2, Seq("a", "x")), (3, Seq.empty[String]), (3, Seq("a", "x"))).toDF("x", "s") df.groupBy("x").pivot("s").count().show() ``` Before: ``` Unsupported literal type class scala.collection.mutable.WrappedArray$ofRef WrappedArray() java.lang.RuntimeException: Unsupported literal type class scala.collection.mutable.WrappedArray$ofRef WrappedArray() at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:80) at org.apache.spark.sql.RelationalGroupedDataset.$anonfun$pivot$2(RelationalGroupedDataset.scala:427) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:39) at scala.collection.TraversableLike.map(TraversableLike.scala:237) at scala.collection.TraversableLike.map$(TraversableLike.scala:230) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:425) at org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:406) at org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:317) at org.apache.spark.sql.DataFramePivotSuite.$anonfun$new$1(DataFramePivotSuite.scala:341) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ``` After: ``` +---+---+------+ \| x\| []\|[a, x]\| +---+---+------+ \| 3\| 1\| 1\| \| 2\| 1\| 1\| +---+---+------+ ``` ## How was this patch tested? Manually tested and unittests were added. Closes #23349 from HyukjinKwon/SPARK-26403. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-03 11:01:54 +08:00
Maxim Gekk	8be4d24a27	[SPARK-26023][SQL][FOLLOWUP] Dumping truncated plans and generated code to a file ## What changes were proposed in this pull request? `DataSourceScanExec` overrides "wrong" `treeString` method without `append`. In the PR, I propose to make `treeString`s final to prevent such mistakes in the future. And removed the `treeString` and `verboseString` since they both use `simpleString` with reduction. ## How was this patch tested? It was tested by `DataSourceScanExecRedactionSuite` Closes #23431 from MaxGekk/datasource-scan-exec-followup. Authored-by: Maxim Gekk <maxim.gekk@databricks.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-01-02 16:57:10 -08:00
seancxmao	d40654861b	[SPARK-26277][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled ## What changes were proposed in this pull request? In `org.apache.spark.sql.execution.metric.SQLMetricsSuite`, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes this by enable whole-stage codegen for this test case. ## How was this patch tested? Tested locally using exiting test cases. Closes #23224 from seancxmao/codegen-metrics. Authored-by: seancxmao <seancxmao@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-02 15:45:14 -06:00
Kazuaki Ishizaki	79b05481a2	[SPARK-26508][CORE][SQL] Address warning messages in Java reported at lgtm.com ## What changes were proposed in this pull request? This PR addresses warning messages in Java files reported at [lgtm.com](https://lgtm.com). [lgtm.com](https://lgtm.com) provides automated code review of Java/Python/JavaScript files for OSS projects. [Here](https://lgtm.com/projects/g/apache/spark/alerts/?mode=list&severity=warning) are warning messages regarding Apache Spark project. This PR addresses the following warnings: - Result of multiplication cast to wider type - Implicit narrowing conversion in compound assignment - Boxed variable is never null - Useless null check NOTE: `Potential input resource leak` looks false positive for now. ## How was this patch tested? Existing UTs Closes #23420 from kiszk/SPARK-26508. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-01 22:37:28 -06:00
Hyukjin Kwon	39a0493387	[SPARK-26227][R] from_[csv\|json] should accept schema_of_[csv\|json] in R API ## What changes were proposed in this pull request? 1. Document `from_csv(..., schema_of_csv(...))` support: ```R csv <- "Amsterdam,2018" df <- sql(paste0("SELECT '", csv, "' as csv")) head(select(df, from_csv(df$csv, schema_of_csv(csv)))) ``` ``` from_csv(csv) 1 Amsterdam, 2018 ``` 2. Allow `from_json(..., schema_of_json(...))` Before: ```R df2 <- sql("SELECT named_struct('name', 'Bob') as people") df2 <- mutate(df2, people_json = to_json(df2$people)) head(select(df2, from_json(df2$people_json, schema_of_json(head(df2)$people_json)))) ``` ``` Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘from_json’ for signature ‘"Column", "Column"’ ``` After: ```R df2 <- sql("SELECT named_struct('name', 'Bob') as people") df2 <- mutate(df2, people_json = to_json(df2$people)) head(select(df2, from_json(df2$people_json, schema_of_json(head(df2)$people_json)))) ``` ``` from_json(people_json) 1 Bob ``` 3. (While I'm here) Allow `structType` as schema for `from_csv` support to match with `from_json`. Before: ```R csv <- "Amsterdam,2018" df <- sql(paste0("SELECT '", csv, "' as csv")) head(select(df, from_csv(df$csv, structType("city STRING, year INT")))) ``` ``` Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘from_csv’ for signature ‘"Column", "structType"’ ``` After: ```R csv <- "Amsterdam,2018" df <- sql(paste0("SELECT '", csv, "' as csv")) head(select(df, from_csv(df$csv, structType("city STRING, year INT")))) ``` ``` from_csv(csv) 1 Amsterdam, 2018 ``` ## How was this patch tested? Manually tested and unittests were added. Closes #23184 from HyukjinKwon/SPARK-26227-1. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-02 08:01:34 +08:00
Maxim Gekk	5da55873fa	[SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest ## What changes were proposed in this pull request? Default timestamp pattern defined in `JSONOptions` doesn't allow saving/loading timestamps with time zones of seconds precision. Because of that, the round trip test failed for timestamps before 1582. In the PR, I propose to extend zone offset section from `XXX` to `XXXXX` which should allow to save/load zone offsets like `-07:52:48`. ## How was this patch tested? It was tested by `JsonHadoopFsRelationSuite` and `TimestampFormatterSuite`. Closes #23417 from MaxGekk/hadoopfsrelationtest-new-formatter. Lead-authored-by: Maxim Gekk <max.gekk@gmail.com> Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-02 07:59:32 +08:00
zhoukang	2bf4d97118	[SPARK-24544][SQL] Print actual failure cause when look up function failed ## What changes were proposed in this pull request? When we operate as below: ` 0: jdbc:hive2://xxx/> create function funnel_analysis as 'com.xxx.hive.extend.udf.UapFunnelAnalysis'; ` ` 0: jdbc:hive2://xxx/> select funnel_analysis(1,",",1,''); Error: org.apache.spark.sql.AnalysisException: Undefined function: 'funnel_analysis'. This function is neither a registered temporary function nor a permanent function registered in the database 'xxx'.; line 1 pos 7 (state=,code=0) ` ` 0: jdbc:hive2://xxx/> describe function funnel_analysis; +-----------------------------------------------------------+--+ \| function_desc \| +-----------------------------------------------------------+--+ \| Function: xxx.funnel_analysis \| \| Class: com.xxx.hive.extend.udf.UapFunnelAnalysis \| \| Usage: N/A. \| +-----------------------------------------------------------+--+ ` We can see describe funtion will get right information,but when we actually use this funtion,we will get an undefined exception. Which is really misleading,the real cause is below: ` No handler for Hive UDF 'com.xxx.xxx.hive.extend.udf.UapFunnelAnalysis': java.lang.IllegalStateException: Should not be called directly; at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.initialize(GenericUDTF.java:72) at org.apache.spark.sql.hive.HiveGenericUDTF.outputInspector$lzycompute(hiveUDFs.scala:204) at org.apache.spark.sql.hive.HiveGenericUDTF.outputInspector(hiveUDFs.scala:204) at org.apache.spark.sql.hive.HiveGenericUDTF.elementSchema$lzycompute(hiveUDFs.scala:212) at org.apache.spark.sql.hive.HiveGenericUDTF.elementSchema(hiveUDFs.scala:212) ` This patch print the actual failure for quick debugging. ## How was this patch tested? UT Closes #21790 from caneGuy/zhoukang/print-warning1. Authored-by: zhoukang <zhoukang199191@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-01 09:13:13 -06:00
Thomas D'Silva	5f0ddd2d6e	[SPARK-26499][SQL] JdbcUtils.makeGetter does not handle ByteType …Type ## What changes were proposed in this pull request? Modifed JdbcUtils.makeGetter to handle ByteType. ## How was this patch tested? Added a new test to JDBCSuite that maps ```TINYINT``` to ```ByteType```. Closes #23400 from twdsilva/tiny_int_support. Authored-by: Thomas D'Silva <tdsilva@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-01 14:11:14 +08:00
Hyukjin Kwon	f7455618ce	Revert "[SPARK-26339][SQL] Throws better exception when reading files that start with underscore" This reverts commit `c0b9db120d`.	2019-01-01 09:29:28 +08:00
Herman van Hovell	c0368363f8	[SPARK-26495][SQL] Simplify the SelectedField extractor. ## What changes were proposed in this pull request? The current `SelectedField` extractor is somewhat complicated and it seems to be handling cases that should be handled automatically: - `GetArrayItem(child: GetStructFieldObject())` - `GetArrayStructFields(child: GetArrayStructFields())` - `GetMap(value: GetStructFieldObject())` This PR removes those cases and simplifies the extractor by passing down the data type instead of a field. ## How was this patch tested? Existing tests. Closes #23397 from hvanhovell/SPARK-26495. Authored-by: Herman van Hovell <hvanhovell@databricks.com> Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>	2018-12-31 17:46:06 +01:00

1 2 3 4 5 ...

7443 commits