spark-instrumented-optimizer/mllib
Max Gekk aa87b0aba3 [SPARK-32815][ML] Fix LibSVM data source loading error on file paths with glob metacharacters
### What changes were proposed in this pull request?
In the PR, I propose to fix an issue with LibSVM datasource when both of the following are true:
* no user specified schema
* some file paths contain escaped glob metacharacters, such as `[``]`, `{``}`, `*` etc.

The fix is based on another bug fix for CSV/JSON datasources https://github.com/apache/spark/pull/29659.

### Why are the changes needed?
To fix the issue when the follow two queries try to read from paths `[abc]`:
```scala
spark.read.format("libsvm").load("""/tmp/\[abc\].csv""").show
```
but would end up hitting an exception:
```
Path does not exist: file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-00000-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm;
org.apache.spark.sql.AnalysisException: Path does not exist: file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-00000-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm;
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:770)
	at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:373)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
```

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added UT to `LibSVMRelationSuite`.

Closes #29670 from MaxGekk/globbing-paths-when-inferring-schema-ml.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-09-08 14:15:16 +00:00
..
benchmarks [SPARK-29297][TESTS] Compare core/mllib module benchmarks in JDK8/11 2019-09-29 21:43:58 -07:00
src [SPARK-32815][ML] Fix LibSVM data source loading error on file paths with glob metacharacters 2020-09-08 14:15:16 +00:00
pom.xml [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00