### What changes were proposed in this pull request?
In SPARK-24994 we implemented unwrapping cast for **integral types**. This extends it to support **numeric types** such as float/double/decimal, so that filters involving these types can be better pushed down to data sources.
Unlike the integral-type cases, conversions between numeric types can result in rounding up or down. Consider the following case:
```sql
cast(e as double) < 1.9
```
Assume the type of `e` is short. Since 1.9 is not representable in that type, the cast will either truncate or round. Now suppose the literal is truncated; we cannot convert the expression to:
```sql
e < cast(1.9 as short)
```
as in the previous implementation, since if `e` is 1, the original expression evaluates to true, but the converted expression evaluates to false.
To resolve the above, this PR first finds out whether casting from the wider type to the narrower type will result in truncation or rounding, by comparing a _roundtrip value_ derived from **converting the literal first to the narrower type, and then back to the wider type**, against the original literal value. For instance, in the example above, we first obtain a roundtrip value via the conversion (double) 1.9 -> (short) 1 -> (double) 1.0, and then compare it against 1.9.
<img width="1153" alt="Screen Shot 2020-09-28 at 3 30 27 PM" src="https://user-images.githubusercontent.com/506679/94492719-bd29e780-019f-11eb-9111-71d6e3d157f7.png">
Now, in the truncation case, we convert the original expression to:
```sql
e <= cast(1.9 as short)
```
instead, so that the converted expression is also valid when `e` is 1.
For more details, please check [this blog post](https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html) by Presto which offers a very good explanation on how it works.
### Why are the changes needed?
For queries such as:
```sql
SELECT * FROM tbl WHERE short_col < 100.5
```
The predicate `short_col < 100.5` can't be pushed down to data sources because it involves casts. This eliminates the cast so these queries can run more efficiently.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests
Closes#29792 from sunchao/SPARK-32858.
Lead-authored-by: Chao Sun <sunchao@apple.com>
Co-authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Fix Javadoc generation errors in `kvstore` and `unsafe` modules according to error message hints.
### Why are the changes needed?
Fixes `doc` task failures that prevented other tasks from executing successfully (e.g., the `publishLocal` task depends on the `doc` task).
### Does this PR introduce _any_ user-facing change?
No.
The meaning of the Javadoc text stays the same.
### How was this patch tested?
Ran `build/sbt kvstore/Compile/doc`, `build/sbt unsafe/Compile/doc`, and `build/sbt doc` without errors.
Closes#30007 from gemelen/feature/doc-task-fix.
Authored-by: Denis Pyshev <git@gemelen.net>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add an `And(IsNotNull(e), GreaterThan(Size(e), Literal(0)))` filter before Explode, PosExplode and Inline generators, when `outer = false`.
Removed the unused `InferFiltersFromConstraints` from `operatorOptimizationRuleSet` to avoid the confusion that came up during the review process.
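As a rough illustration of the effect (table and column names below are hypothetical, and this is not the rule's code; a `SparkSession` named `spark` is assumed), a query over a non-outer generator behaves as if the inferred predicate had been written explicitly:
```scala
import org.apache.spark.sql.functions._

// Hypothetical example: `t` is assumed to have an array column `arr`.
val exploded = spark.table("t")
  .select(explode(col("arr")).as("elem"))

// A non-outer explode yields no rows for null or empty arrays, so after the new rule
// the plan is effectively the same as if the predicate had been written by hand:
val withInferredFilter = spark.table("t")
  .where(col("arr").isNotNull && size(col("arr")) > 0)
  .select(explode(col("arr")).as("elem"))
```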
### Why are the changes needed?
Predicate pushdown will be able to move this new filter down through joins and into data sources for performance improvement.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
Closes#29092 from tanelk/SPARK-32295.
Lead-authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Co-authored-by: Tanel Kiis <tanel.kiis@reach-u.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
`ScalarSubquery` should return only the first two rows: one row is the scalar result, and the second is needed only to detect the error case where the subquery returns more than one row.
### Why are the changes needed?
To avoid Driver OOM.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test: d6f3138352/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala (L147-L154)
Closes#30016 from wangyum/SPARK-33119.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Use lazy arrays instead of `var`s for the auxiliary variables in binary logistic regression.
### Why are the changes needed?
In https://github.com/apache/spark/pull/29255, I made a mistake:
the `private var _threshold` and `_rawThreshold` were initialized with the default value of `threshold`. That is because:
1. the param `threshold` is set to its default value first;
2. `_threshold` and `_rawThreshold` are initialized based on that default value;
3. the param `threshold` is then updated with the value from the estimator by the `copyValues` method:
```
if (map.contains(param) && to.hasParam(param.name)) {
to.set(param.name, map(param))
}
```
We can update `_threshold` and `_rawThreshold` in `setThreshold` and `setThresholds`, but we cannot update them in `set`/`copyValues`, so their stale values are kept until `setThreshold` or `setThresholds` is called.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested in the REPL.
Closes#30013 from zhengruifeng/lor_threshold_init.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
### What changes were proposed in this pull request?
There is a problem when using CREATE TEMPORARY TABLE with LOCATION:
```scala
spark.range(3).write.parquet("/tmp/testspark1")
sql("CREATE TEMPORARY TABLE t USING parquet OPTIONS (path '/tmp/testspark1')")
sql("CREATE TEMPORARY TABLE t USING parquet LOCATION '/tmp/testspark1'")
```
```scala
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:200)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:200)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:408)
at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:94)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
```
This bug was introduced by SPARK-30507.
In the SQL parser, the path `visitCreateTable` --> `visitCreateTableClauses` --> `cleanTableOptions` extracts the path from the options, but in this case `CreateTempViewUsing` needs the path to stay in the options map.
### Why are the changes needed?
To fix the problem
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit testing and manual testing
Closes#30014 from planga82/bugfix/SPARK-33118_create_temp_table_location.
Authored-by: Pablo <pablo.langa@stratio.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
In the following scenario, when AQE is on, SQLMetrics can be incorrect:
1. Stage A and B are created, and UI updated thru event onAdaptiveExecutionUpdate.
2. Stage A and B are running. Subquery in stage A keep updating metrics thru event onAdaptiveSQLMetricUpdate.
3. Stage B completes, while stage A's subquery is still running, updating metrics.
4. Completion of stage B triggers new stage creation and UI update thru event onAdaptiveExecutionUpdate again (just like step 1).
When step 4 replaces the plan, the metrics reported in steps 2-3 can be lost, making the displayed values incorrect. So we decided to make a trade-off: keep the (possibly duplicate) SQLMetrics without deleting them when AQE updates the plan.
### Why are the changes needed?
Make SQLMetrics behavior 100% correct.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Updated SQLAppStatusListenerSuite.
Closes#29965 from leanken/leanken-SPARK-33016.
Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. When `predictionCol` and `quantilesCol` are both set, we only need one prediction for each row: the prediction is just the variable `lambda` in `predictQuantiles`.
2. In the computation of the variable `quantiles` in `predictQuantiles`, a pre-computed vector `val baseQuantiles = $(quantileProbabilities).map(q => math.exp(math.log(-math.log1p(-q)) * scale))` can be reused for each row (see the sketch below).
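A simplified sketch of the reuse (derived from the description above; the surrounding model code is omitted and the parameter values are examples):
```scala
// Example parameter values; in the model these come from params and training.
val quantileProbabilities = Array(0.1, 0.5, 0.9)
val scale = 1.0

// Depends only on the params, so it can be computed once per model...
val baseQuantiles: Array[Double] =
  quantileProbabilities.map(q => math.exp(math.log(-math.log1p(-q)) * scale))

// ...and each row's quantiles reduce to scaling by that row's lambda.
def predictQuantiles(lambda: Double): Array[Double] =
  baseQuantiles.map(_ * lambda)
```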
### Why are the changes needed?
Avoid redundant computation in `transform`, as was done in `ProbabilisticClassificationModel`, `GaussianMixtureModel`, etc.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing test suites.
Closes#30000 from zhengruifeng/aft_predict_transform_opt.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR intends to correct version values (`3.0.0` -> `3.1.0`) of three configs below in `SQLConf`:
- spark.sql.planChangeLog.level
- spark.sql.planChangeLog.rules
- spark.sql.planChangeLog.batches
This PR comes from https://github.com/apache/spark/pull/29544#discussion_r503049350.
### Why are the changes needed?
Bugfix.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
Closes#30015 from maropu/pr29544-FOLLOWUP.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZStandard library for Apache Spark 3.1.0.
### Why are the changes needed?
This will bring the latest bug fixes.
- 2662fbdc32
- bbe140b758
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI.
Closes#30010 from dongjoon-hyun/SPARK-33117.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This pr removes `com.twitter:parquet-hadoop-bundle:1.6.0` and `orc.classifier`.
### Why are the changes needed?
To make the code clearer and more readable.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test.
Closes#30005 from wangyum/SPARK-33107.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
At the moment there is no way to turn off JDBC authentication providers that exist on the classpath. This can be problematic because service providers are loaded with the service loader. In this PR I've added the `spark.sql.sources.disabledJdbcConnProviderList` configuration option (default: empty).
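For illustration, disabling providers by name could look like this (the provider names here are hypothetical examples, not an authoritative list):
```scala
// Hypothetical usage sketch of the new option; provider names are examples only.
spark.conf.set("spark.sql.sources.disabledJdbcConnProviderList", "db2,mssql")
```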
### Why are the changes needed?
There is currently no way to turn off JDBC authentication providers.
### Does this PR introduce _any_ user-facing change?
Yes, it introduces a new configuration option.
### How was this patch tested?
* Existing + newly added unit tests.
* Existing integration tests.
Closes#29964 from gaborgsomogyi/SPARK-32047.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to rename `hdpVersion` to `versionValue`.
### Why are the changes needed?
Use the general variable name.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI.
Closes#30008 from williamhyun/sbt.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Rename manually added resolver for local Ivy repo.
Create configuration to publish to local Ivy repo similar to Maven one.
Use `publishLocal` to publish both to the local Maven and Ivy repos, instead of the custom task `localPublish` (renamed from `publish-local` of sbt 0.13.x).
### Why are the changes needed?
There are two resolvers (the bootResolvers' "local" and the manually added "local") that point to the same local Ivy repo but have different configurations, which led to excessive warnings in the logs and, potentially, resolution issues.
Changeset fixes that case, observable in sbt output as
```
[warn] Multiple resolvers having different access mechanism configured with same name 'local'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Executed the `build/sbt` `publishLocal` task on an individual module and on the root project.
Closes#30006 from gemelen/feature/local-resolvers.
Authored-by: Denis Pyshev <git@gemelen.net>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Remove unused `typing.Optional` import from `pyspark.resource.profile` stub.
### Why are the changes needed?
Since SPARK-32319 we don't allow unused imports. However, this one slipped past both local and CI tests for some reason.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests and mypy check.
Closes#30002 from zero323/SPARK-33086-FOLLOWUP.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR removes `hive-2.3` workaround code.
### Why are the changes needed?
To make the code clearer and more readable.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit tests.
Closes#29996 from wangyum/SPARK-33107.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Decommissioning can run out of time, resulting in race conditions. These race conditions produce confusing error messages but have no negative impact.
### Why are the changes needed?
The NPE and missing-element errors in the log can be misleading.
### Does this PR introduce _any_ user-facing change?
Logs change.
### How was this patch tested?
Existing tests pass.
Closes#29992 from holdenk/SPARK-32881-error-messaging-on-decom-race-messages.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to remove `sbt-dependency-graph` SBT plugin.
### Why are the changes needed?
`sbt-dependency-graph` officially doesn't support SBT 1.3.x and is broken due to a `NoSuchMethodError`. This cannot be fixed on the `sbt-dependency-graph` side for SBT 1.3.x.
- https://github.com/sbt/sbt-dependency-graph
> Note: Under sbt >= 1.3.x some features might currently not work as expected or not at all (like dependencyLicenses).
```
$ build/sbt dependencyTree
Launching sbt from build/sbt-launch-1.3.13.jar
[info] welcome to sbt 1.3.13 (AdoptOpenJDK Java 1.8.0_252)
...
[error] java.lang.NoSuchMethodError: sbt.internal.LibraryManagement$.cachedUpdate(Lsbt/librarymanagement/DependencyResolution;Lsbt/librarymanagement/ModuleDescriptor;Lsbt/util/CacheStoreFactory;Ljava/lang/String;Lsbt/librarymanagement/UpdateConfiguration;Lscala/Function1;ZZZLsbt/librarymanagement/UnresolvedWarningConfiguration;Lsbt/librarymanagement/EvictionWarningOptions;ZLsbt/internal/librarymanagement/CompatibilityWarningOptions;Lsbt/util/Logger;)Lsbt/librarymanagement/UpdateReport;
```
**ALTERNATIVES**
- One alternative is `coursier`, but it requires `coursier-based sbt launcher` which is more intrusive.
- https://get-coursier.io/docs/sbt-coursier.html#sbt-13x
> you'll have to use the coursier-based sbt launcher, via its custom sbt-extras launcher for example.
- Another alternative is moving to `SBT 1.4.0`, which includes `sbt-dependency-graph` as a built-in, but it's still new and would require many changes.
So, this PR simply removes the broken plugin.
### Does this PR introduce _any_ user-facing change?
No. This is a dev-only change.
### How was this patch tested?
Manual.
```
$ build/sbt dependencyTree
...
[error] Not a valid command: dependencyTree
[error] Not a valid project ID: dependencyTree
[error] Not a valid key: dependencyTree (similar: dependencyOverrides, sbtDependency, dependencyResolution)
[error] dependencyTree
[error] ^
```
Closes#29997 from dongjoon-hyun/remove_depedencyTree.
Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
While implementing the JDBC provider disable functionality, it was pointed out [here](https://github.com/apache/spark/pull/29964#discussion_r501786746) that `Utils.stringToSeq` must be used when handling String-list-type SQL parameters. In this PR I've fixed the problematic parameters.
### Why are the changes needed?
`Utils.stringToSeq` must be used when handling String-list-type SQL parameters.
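For context, a rough sketch of what the helper does (illustrative only; `Utils` is a Spark-internal object, and the example values are hypothetical):
```scala
import org.apache.spark.util.Utils

// Illustrative only: Utils.stringToSeq splits a comma-separated conf value,
// trims each entry, and drops empty ones — which a plain String#split(",") does not do.
val disabled: Seq[String] = Utils.stringToSeq("db2, mssql, ")  // Seq("db2", "mssql")
```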
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit tests.
Closes#29989 from gaborgsomogyi/SPARK-33102.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
- Change default R `arch` from `i386` to `x64`, to match Rtools version.
- Parameterize `BINPREF` with `WIN` (https://stackoverflow.com/a/44035904)
Reported on dev:
http://apache-spark-developers-list.1001551.n3.nabble.com/Broken-rlang-installation-on-AppVeyor-td30294.html
### Why are the changes needed?
It seems the update from rlang 0.4.7 to 0.4.8 exposed an issue where the build fails because of an incompatible DLL:
```
c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
skipping incompatible C:/R/bin/i386/R.dll when searching for -lR
[00:01:52]
c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
skipping incompatible C:/R/bin/i386/R.dll when searching for -lR
[00:01:52]
c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
cannot find -lR
[00:01:52] collect2.exe: error: ld returned 1 exit status
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes#29991 from zero323/APPVEYOR-DEAFAULT-ARCH.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR removes the leftover of Hive 1.2 workarounds and Hive 1.2 profile in Jenkins script.
- `test-hive1.2` title is not used anymore in Jenkins
- Remove some comments related to Hive 1.2
- Remove unused code in the Hive `OrcFilters.scala`
- Test `spark.sql.hive.convertMetastoreOrc` disabled case for the tests added at SPARK-19809 and SPARK-22267
### Why are the changes needed?
To remove unused code and improve test coverage.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually ran the unit tests. It will also be tested by CI in this PR.
Closes#29973 from HyukjinKwon/SPARK-33082-SPARK-20202.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to add `DataStreamWriter.table` to specify the output "table" to write from the streaming query.
### Why are the changes needed?
Currently there is no way to write to a table (especially a catalog table) even if the table is capable of handling streaming writes. So even with Spark 3, writing to a catalog table via Structured Streaming has to go through `DataStreamWriter.format(provider)` and hope that the provider handles it the same way a catalog table would.
With the new API, we can directly point to a catalog table which supports streaming writes. Some of the usages are covered by tests; simply put, end users can do the following:
```scala
// assuming `testcat` is a custom catalog, and `ns` is a namespace in the catalog
spark.sql("CREATE TABLE testcat.ns.table1 (id bigint, data string) USING foo")
val query = inputDF
.writeStream
.table("testcat.ns.table1")
.option(...)
.start()
```
### Does this PR introduce _any_ user-facing change?
Yes, as this adds a new public API in `DataStreamWriter`. It doesn't introduce any backward-incompatible change.
### How was this patch tested?
New unit tests.
Closes#29767 from HeartSaVioR/SPARK-32896.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to protect the executor pod request or pending pod during executor idle timeout.
### Why are the changes needed?
In the case of dynamic allocation, the Apache Spark K8s `ExecutorPodsAllocator` cancels pod requests or pending pods too eagerly. In the following example, `ExecutorPodsAllocator` received new total-executor adjustment requests rapidly within two minutes; sometimes it is called 3 times in a single second. It repeats `request` and `delete` on the same pod request or pending pod frequently. This PR reuses `spark.dynamicAllocation.executorIdleTimeout` (default: 60s) to keep the pod request or pending pod for that period.
```
20/10/08 05:58:08 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 3
20/10/08 05:58:08 INFO ExecutorPodsAllocator: Going to request 3 executors from Kubernetes.
20/10/08 05:58:09 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 3
20/10/08 05:58:43 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 1
20/10/08 05:58:47 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 0
20/10/08 05:59:26 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 3
20/10/08 05:59:30 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 2
20/10/08 05:59:31 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 3
20/10/08 05:59:44 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 2
20/10/08 05:59:44 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 0
20/10/08 05:59:45 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 3
20/10/08 05:59:50 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 2
20/10/08 05:59:50 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 1
20/10/08 05:59:50 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 0
20/10/08 05:59:54 INFO ExecutorPodsAllocator: Set totalExpectedExecutors to 3
20/10/08 05:59:54 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the newly added test case.
Closes#29981 from dongjoon-hyun/SPARK-K8S-INITIAL.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Propagate LibSVM options to Hadoop configs in the LibSVM datasource.
### Why are the changes needed?
There is a bug that when running:
```scala
spark.read.format("libsvm").options(conf).load(path)
```
The underlying file system will not receive the `conf` options.
### Does this PR introduce _any_ user-facing change?
Yes. After the changes, for example, users can read files from Azure Data Lake successfully:
```scala
def hadoopConf1() = Map[String, String](
s"fs.adl.oauth2.access.token.provider.type" -> "ClientCredential",
s"fs.adl.oauth2.client.id" -> dbutils.secrets.get(scope = "...", key = "..."),
s"fs.adl.oauth2.credential" -> dbutils.secrets.get(scope = "...", key = "..."),
s"fs.adl.oauth2.refresh.url" -> s"https://login.microsoftonline.com/.../oauth2/token")
val df = spark.read.format("libsvm").options(hadoopConf1).load("adl://....azuredatalakestore.net/foldersp1/...")
```
instead of getting the following exception, which was caused by the settings above not being propagated to the file system:
```java
java.lang.IllegalArgumentException: No value for fs.adl.oauth2.access.token.provider found in conf file.
at ....adl.AdlFileSystem.getNonEmptyVal(AdlFileSystem.java:820)
at ....adl.AdlFileSystem.getCustomAccessTokenProvider(AdlFileSystem.java:220)
at ....adl.AdlFileSystem.getAccessTokenProvider(AdlFileSystem.java:257)
at ....adl.AdlFileSystem.initialize(AdlFileSystem.java:164)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
```
### How was this patch tested?
Added UT to `LibSVMRelationSuite`.
Closes#29984 from MaxGekk/ml-option-propagation.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
- Annotated return types of `assert_true` and `raise_error` as discussed [here](https://github.com/apache/spark/pull/29947#pullrequestreview-504495801).
- Add `assert_true` and `raise_error` to SparkR NAMESPACE.
- Validating message vector size in SparkR as discussed [here](https://github.com/apache/spark/pull/29947#pullrequestreview-504539004).
### Why are the changes needed?
As discussed in review for #29947.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- Existing tests.
- Validation of annotations using MyPy
Closes#29978 from zero323/SPARK-32793-FOLLOW-UP.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add distinct info to `UnresolvedFunction.toString`.
### Why are the changes needed?
Make `UnresolvedFunction` info complete.
```
create table test (c1 int, c2 int);
explain extended select sum(distinct c1) from test;
-- before this pr
== Parsed Logical Plan ==
'Project [unresolvedalias('sum('c1), None)]
+- 'UnresolvedRelation [test]
-- after this pr
== Parsed Logical Plan ==
'Project [unresolvedalias('sum(distinct 'c1), None)]
+- 'UnresolvedRelation [test]
```
### Does this PR introduce _any_ user-facing change?
Yes, users now see the distinct info in the parsed logical plan output.
### How was this patch tested?
Manual test.
Closes#29586 from ulysses-you/SPARK-32743.
Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
Propagate ORC options to Hadoop configs in Hive `OrcFileFormat` and in the regular ORC datasource.
### Why are the changes needed?
There is a bug that when running:
```scala
spark.read.format("orc").options(conf).load(path)
```
The underlying file system will not receive the `conf` options.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Added UT to `OrcSourceSuite`.
Closes#29976 from MaxGekk/orc-option-propagation.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to use `LinkedHashMap` instead of `Map` for `newlyCreatedExecutors`.
### Why are the changes needed?
This makes log messages (INFO/DEBUG) more readable. This is helpful when `spark.kubernetes.allocation.batch.size` is large and especially when K8s dynamic allocation is used.
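A tiny sketch of why this helps (illustrative only; not the allocator's code):
```scala
import scala.collection.mutable

// LinkedHashMap preserves insertion order, so iterating executor ids for a log
// message yields them in the order they were requested; HashMap does not.
val unordered = mutable.HashMap[Long, Long]()
val ordered   = mutable.LinkedHashMap[Long, Long]()
(1L to 10L).foreach { id =>
  unordered(id) = System.currentTimeMillis()   // executorId -> creation time (assumed shape)
  ordered(id) = System.currentTimeMillis()
}
println(unordered.keys.mkString(","))  // arbitrary order, e.g. 5,10,6,9,...
println(ordered.keys.mkString(","))    // 1,2,3,4,5,6,7,8,9,10
```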
**BEFORE**
```
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 8 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 2 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 5 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 4 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 7 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 10 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 9 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 3 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 6 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:24:21 INFO ExecutorPodsAllocator: Deleting 9 excess pod requests (5,10,6,9,2,7,3,8,4).
```
**AFTER**
```
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 2 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 3 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 4 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 5 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 6 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 7 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 8 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 9 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 10 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
20/10/08 10:25:17 INFO ExecutorPodsAllocator: Deleting 9 excess pod requests (2,3,4,5,6,7,8,9,10).
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI or `build/sbt -Pkubernetes "kubernetes/test"`
Closes#29979 from dongjoon-hyun/SPARK-K8S-LOG.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
While explaining PySpark usage, the use of repeated synonymous words was causing confusion.
Removed the redundant "instead of a JAR" wording to keep it more readable.
### Why are the changes needed?
To keep the docs more readable and easy to understand.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No code changes, minor documentation change only. No tests added.
Closes#29956 from manubatham20/patch-1.
Authored-by: manubatham20 <manubatham2006@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This is kind of a followup of SPARK-32646. A new JIRA was filed to control the fixed versions properly.
When `map` is used, it might be lazily evaluated and never executed. To avoid this, we should use `foreach` instead. See also SPARK-16694. The current code does not appear to cause any bug for now, but it is best to fix it to avoid potential issues.
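A small sketch of the pitfall (the collections here are hypothetical, not code touched by this PR):
```scala
// On a lazy collection such as an Iterator, map defers its body until the result
// is consumed, so side effects may never run; foreach executes eagerly.
Iterator(1, 2, 3).map(println)      // lazy: nothing is printed
Iterator(1, 2, 3).foreach(println)  // eager: prints 1, 2, 3
```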
### Why are the changes needed?
To avoid potential issues from `map` being lazy and not executed.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Ran related tests. CI in this PR should verify.
Closes#29974 from HyukjinKwon/SPARK-32646.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
1. Add new method to the `JdbcDialect` class - `classifyException()`. It converts dialect specific exception to Spark's `AnalysisException` or its sub-classes.
2. Replace H2 exception `org.h2.jdbc.JdbcSQLException` in `JDBCTableCatalogSuite` by `AnalysisException`.
3. Add `H2Dialect`
### Why are the changes needed?
Currently the JDBC v2 Table Catalog implementation throws dialect-specific exceptions and ignores the exceptions defined in the `TableCatalog` interface. This PR adds a new method for converting dialect-specific exceptions, and assumes that follow-up PRs will implement `classifyException()` for other dialects.
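A rough sketch of the shape of the new hook (the trait name and signature here are assumed for illustration, not copied from the PR):
```scala
import org.apache.spark.sql.AnalysisException

// Illustrative only: a dialect-level hook that maps driver-specific exceptions
// (e.g. org.h2.jdbc.JdbcSQLException) to Spark's AnalysisException hierarchy,
// so that catalog callers see consistent, dialect-independent errors.
trait DialectExceptionClassifier {
  def classifyException(message: String, e: Throwable): AnalysisException
}
```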
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
By running existing test suites `JDBCTableCatalogSuite` and `JDBCV2Suite`.
Closes#29952 from MaxGekk/jdbcv2-classify-exception.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In `AvroUtils`'s `inferSchema()`, propagate Hadoop config from DS options to underlying HDFS file system.
### Why are the changes needed?
There is a bug that when running:
```scala
spark.read.format("avro").options(conf).load(path)
```
The underlying file system will not receive the `conf` options.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test added.
Closes#29971 from yuningzh-db/avro_options.
Authored-by: Yuning Zhang <yuning.zhang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Adds a SQL function `raise_error` which underlies the refactored `assert_true` function. `assert_true` now also (optionally) accepts a custom error message field.
`raise_error` is exposed in SQL, Python, Scala, and R.
`assert_true` was previously only exposed in SQL; it is now also exposed in Python, Scala, and R.
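A short usage sketch under the semantics described above (illustrative, not taken from the PR's tests; a `SparkSession` named `spark` is assumed):
```scala
// assert_true returns NULL when the condition holds, and fails the query with the
// (optional) custom message otherwise.
spark.sql("SELECT assert_true(x > 0, 'x must be positive') FROM VALUES (1) AS t(x)").show()

// raise_error unconditionally fails the query at runtime with the given message:
// spark.sql("SELECT raise_error('unexpected state')").collect()
```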
### Why are the changes needed?
Improves usability of `assert_true` by clarifying error messaging, and adds the useful helper function `raise_error`.
### Does this PR introduce _any_ user-facing change?
Yes:
- Adds `raise_error` function to the SQL, Python, Scala, and R APIs.
- Adds the `assert_true` function to the Python, Scala, and R APIs (it was previously SQL-only).
### How was this patch tested?
Adds unit tests in SQL, Python, Scala, and R for `assert_true` and `raise_error`.
Closes#29947 from karenfeng/spark-32793.
Lead-authored-by: Karen Feng <karen.feng@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds `dropFields` method to:
- PySpark `Column`
- SparkR `Column`
### Why are the changes needed?
Feature parity.
### Does this PR introduce _any_ user-facing change?
No, new API.
### How was this patch tested?
- New unit tests.
- Manual verification of examples / doctests.
- Manual run of MyPy tests
Closes#29967 from zero323/SPARK-32511-FOLLOW-UP-PYSPARK-SPARKR.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR replaces dynamically generated annotations for the following modules:
- `pyspark.resource.information`
- `pyspark.resource.profile`
- `pyspark.resource.requests`
### Why are the changes needed?
These modules were not manually annotated in `pyspark-stubs`, but they are part of the public API, and we should provide more precise annotations.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
MyPy tests:
```
mypy --no-incremental --config python/mypy.ini python/pyspark
```
Closes#29969 from zero323/SPARK-32714-FOLLOW-UP-RESOURCE.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
- Migrate the sbt-launcher URL to the download location for sbt 1.x.
- Update plugin versions where required by the sbt update.
- Change the sbt version to the latest released at the moment, 1.3.13.
- Adjust build settings according to the plugin and sbt changes.
### Why are the changes needed?
Migration to sbt 1.x:
1. enhances the development experience
2. updates build plugins to bring in their new features and bug fixes
3. improves build performance on the sbt side
4. eases the move to Scala 3 / Dotty
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
All existing tests passed, both on Jenkins and via GitHub Actions, and also manually with the Scala 2.13 profile.
Closes#29286 from gemelen/feature/sbt-1.x.
Authored-by: Denis Pyshev <git@gemelen.net>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
1. Don't print the exception in the error message while loading a built-in provider.
2. Print the exception starting from the INFO level.
At log levels less verbose than INFO, the output is only:
```
17:48:32.342 ERROR org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Failed to load built in provider.
```
and starting from the INFO level:
```
17:48:32.342 ERROR org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Failed to load built in provider.
17:48:32.342 INFO org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Loading of the provider failed with the exception:
java.util.ServiceConfigurationError: org.apache.spark.sql.jdbc.JdbcConnectionProvider: Provider org.apache.spark.sql.execution.datasources.jdbc.connection.IntentionallyFaultyConnectionProvider could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.loadProviders(ConnectionProvider.scala:41)
```
### Why are the changes needed?
To avoid "noise" in logs while running tests. Currently, logs are blown up:
```
org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Loading of the provider failed with the exception:
java.util.ServiceConfigurationError: org.apache.spark.sql.jdbc.JdbcConnectionProvider: Provider org.apache.spark.sql.execution.datasources.jdbc.connection.IntentionallyFaultyConnectionProvider could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.loadProviders(ConnectionProvider.scala:41)
...
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Intentional Exception
at org.apache.spark.sql.execution.datasources.jdbc.connection.IntentionallyFaultyConnectionProvider.<init>(IntentionallyFaultyConnectionProvider.scala:26)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running:
```
$ build/sbt "sql/test:testOnly org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalogSuite"
```
Closes#29968 from MaxGekk/gaborgsomogyi-SPARK-32001-followup.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR removes old Hive-1.2 profile related workaround code.
### Why are the changes needed?
To simplify the code.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI.
Closes#29961 from dongjoon-hyun/SPARK-HIVE12.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
The pod template configmap always had the same name. This PR makes it unique.
### Why are the changes needed?
If you schedule two Spark jobs, they will both use the same configmap name, which results in conflicts. This PR fixes that.
**BEFORE**
```
$ kubectl get cm --all-namespaces -w | grep podspec
podspec-configmap 1 65s
```
**AFTER**
```
$ kubectl get cm --all-namespaces -w | grep podspec
aaece65ef82e4a30b7b7800aad600d4f spark-test-app-aac9f37502b2ca55-driver-podspec-conf-map 1 0s
```
This can be seen when running the integration tests
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests; the integration tests also verify that this works.
Closes#29934 from stijndehaes/bugfix/SPARK-32067-unique-name-for-template-configmap.
Authored-by: Stijn De Haes <stijndehaes@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR intends to refactor code in `RewriteCorrelatedScalarSubquery` for replacing `ExprId`s in a bottom-up manner instead of doing in a top-down one.
This PR comes from the talk with cloud-fan in https://github.com/apache/spark/pull/29585#discussion_r490371252.
### Why are the changes needed?
To improve code.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes#29913 from maropu/RefactorRewriteCorrelatedScalarSubquery.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This PR:
- removes annotations for modules which are not part of the public API.
- removes `__init__.pyi` files, if no annotations, beyond exports, are present.
### Why are the changes needed?
Primarily to reduce maintenance overhead and as requested in the comments to https://github.com/apache/spark/pull/29591
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests and additional MyPy checks:
```
mypy --no-incremental --config python/mypy.ini python/pyspark
MYPYPATH=python/ mypy --no-incremental --config python/mypy.ini examples/src/main/python/ml examples/src/main/python/sql examples/src/main/python/sql/streaming
```
Closes#29879 from zero323/SPARK-33002.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to migrate `DESCRIBE tbl colname` to use `UnresolvedTableOrView` to resolve the table/view identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).
### Why are the changes needed?
The current behavior is not consistent between v1 and v2 commands when resolving a temp view.
In v2, the `t` in the following example is resolved to a table:
```scala
sql("CREATE TABLE testcat.ns.t (id bigint) USING foo")
sql("CREATE TEMPORARY VIEW t AS SELECT 2 as i")
sql("USE testcat.ns")
sql("DESCRIBE t i") // 't' is resolved to testcat.ns.t
Describing columns is not supported for v2 tables.;
org.apache.spark.sql.AnalysisException: Describing columns is not supported for v2 tables.;
```
whereas in v1, the `t` is resolved to a temp view:
```scala
sql("CREATE DATABASE test")
sql("CREATE TABLE spark_catalog.test.t (id bigint) USING csv")
sql("CREATE TEMPORARY VIEW t AS SELECT 2 as i")
sql("USE spark_catalog.test")
sql("DESCRIBE t i").show // 't' is resolved to a temp view
+---------+----------+
|info_name|info_value|
+---------+----------+
| col_name| i|
|data_type| int|
| comment| NULL|
+---------+----------+
```
### Does this PR introduce _any_ user-facing change?
After this PR, `DESCRIBE t i` is resolved to a temp view `t` instead of `testcat.ns.t`.
### How was this patch tested?
Added a new test
Closes#29880 from imback82/describe_column_consistent.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Override the default SQL strings in the Oracle Dialect for:
- ALTER TABLE ADD COLUMN
- ALTER TABLE UPDATE COLUMN TYPE
- ALTER TABLE UPDATE COLUMN NULLABILITY
2. Add new docker integration test suite `jdbc/v2/OracleIntegrationSuite.scala`
### Why are the changes needed?
In SPARK-24907, we implemented the JDBC v2 Table Catalog, but it doesn't support some `ALTER TABLE` statements at the moment. This PR adds support for Oracle-specific `ALTER TABLE` SQL.
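As a hedged illustration of the dialect-specific DDL (method names and exact SQL templates here are assumed, not copied from the PR), Oracle uses `MODIFY` for both type and nullability changes:
```scala
// Illustrative sketch only; method names/signatures are assumed.
object OracleAlterTableSql {
  def updateColumnType(table: String, column: String, newType: String): String =
    s"ALTER TABLE $table MODIFY $column $newType"

  def updateColumnNullability(table: String, column: String, nullable: Boolean): String =
    s"ALTER TABLE $table MODIFY $column ${if (nullable) "NULL" else "NOT NULL"}"
}
```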
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
By running new integration test suite:
```
$ ./build/sbt -Pdocker-integration-tests "test-only *.OracleIntegrationSuite"
```
Closes#29912 from MaxGekk/jdbcv2-oracle-alter-table.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/29918. We should add it into the documentation as well.
### Why are the changes needed?
To show users new APIs.
### Does this PR introduce _any_ user-facing change?
Yes, `SparkContext.getCheckpointDir` will be documented.
### How was this patch tested?
Manually built the PySpark documentation:
```bash
cd python/docs
make clean html
cd build/html
open index.html
```
Closes#29960 from HyukjinKwon/SPARK-33017.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Get the error messages from the expected exceptions, and check that they are reasonable.
### Why are the changes needed?
To improve tests by expecting particular error messages.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running `JDBCTableCatalogSuite`.
Closes#29957 from MaxGekk/jdbcv2-negative-tests-followup.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to simplify the named_struct + get-struct-field + from_json expression chain from `struct(from_json.col1, from_json.col2, from_json.col3...)` to `struct(from_json)`.
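A hedged illustration of the expression shape this targets (the JSON column, schema, and field names are hypothetical):
```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Hypothetical example: each field of the same from_json result is extracted and
// immediately re-packed into a struct — the redundancy the new simplification removes.
val schema = new StructType()
  .add("col1", IntegerType).add("col2", IntegerType).add("col3", IntegerType)
val parsed = from_json(col("json"), schema)

val repacked = struct(
  parsed.getField("col1"),
  parsed.getField("col2"),
  parsed.getField("col3"))
```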
### Why are the changes needed?
Simplify complex expression trees that could be produced by query optimization or by the user.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test.
Closes#29942 from viirya/SPARK-33007.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Fix the flaky test.
### Why are the changes needed?
The test is flaky: `Expected exception org.apache.spark.SparkException to be thrown, but no exception was thrown`.
Check the full error stack [here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128548/testReport/org.apache.spark.scheduler/BarrierTaskContextSuite/throw_exception_if_the_number_of_barrier___calls_are_not_the_same_on_every_task/).
By analyzing the log below, I found that task 0 hadn't reached the second `context.barrier()` when the other three tasks had already raised sync timeout exceptions from the first `context.barrier()`. The timeout exceptions were caught by the `try...catch...`. Then, each task started another round of barrier sync from the second `context.barrier()` and completed that sync successfully.
```
20/09/10 20:54:48.821 dispatcher-event-loop-10 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:48.822 dispatcher-event-loop-10 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 2, current progress: 1/4.
20/09/10 20:54:48.826 dispatcher-BlockManagerMaster INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:38420 (size: 2.2 KiB, free: 546.3 MiB)
20/09/10 20:54:48.908 dispatcher-event-loop-12 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:48.909 dispatcher-event-loop-12 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 1, current progress: 2/4.
20/09/10 20:54:48.959 dispatcher-event-loop-11 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:48.960 dispatcher-event-loop-11 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 3, current progress: 3/4.
20/09/10 20:54:49.616 dispatcher-CoarseGrainedScheduler INFO TaskSchedulerImpl: Skip current round of resource offers for barrier stage 0 because the barrier taskSet requires 4 slots, while the total number of available slots is 0.
20/09/10 20:54:49.899 dispatcher-event-loop-15 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:49.900 dispatcher-event-loop-15 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 1, current progress: 1/4.
20/09/10 20:54:49.965 dispatcher-event-loop-13 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:49.966 dispatcher-event-loop-13 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 3, current progress: 2/4.
20/09/10 20:54:50.112 dispatcher-event-loop-16 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:50.113 dispatcher-event-loop-16 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 0, current progress: 3/4.
20/09/10 20:54:50.609 dispatcher-CoarseGrainedScheduler INFO TaskSchedulerImpl: Skip current round of resource offers for barrier stage 0 because the barrier taskSet requires 4 slots, while the total number of available slots is 0.
20/09/10 20:54:50.826 dispatcher-event-loop-17 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:50.827 dispatcher-event-loop-17 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 2, current progress: 4/4.
20/09/10 20:54:50.827 dispatcher-event-loop-17 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received all updates from tasks, finished successfully.
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Updated the test and ran it a hundred times without failure (previously, there could be several failures).
Closes#29732 from Ngone51/fix-flaky-throw-exception.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>