## What changes were proposed in this pull request?
This PR cleans up a few Java linter errors for the Apache Spark 2.2 release.
## How was this patch tested?
```bash
$ dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```
We can check the result at Travis CI, [here](https://travis-ci.org/dongjoon-hyun/spark/builds/244297894).
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #18345 from dongjoon-hyun/fix_lint_java_2.
The original code can't visit the last element of the "parts" array,
so v[v.length-1] always equals 0.
## What changes were proposed in this pull request?
Change the loop range from (1 to parts.length-1) to (1 to parts.length).
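The off-by-one described above can be reproduced in isolation. The sketch below is a minimal illustration, not the patched example itself: the class name, the assumption that `parts[0]` is a label, and the assumption that the remaining elements are copied into `v` are all hypothetical.

```java
import java.util.Arrays;

final class LoopBoundDemo {
    // Hypothetical parser: parts[0] is a label, parts[1..] are values copied into v.
    static double[] parseBuggy(String[] parts) {
        double[] v = new double[parts.length - 1];
        // Upper bound is one too small, so the last element of parts is never read.
        for (int i = 1; i < parts.length - 1; i++) {
            v[i - 1] = Double.parseDouble(parts[i]);
        }
        return v;
    }

    static double[] parseFixed(String[] parts) {
        double[] v = new double[parts.length - 1];
        // Iterate up to and including the last index of parts.
        for (int i = 1; i < parts.length; i++) {
            v[i - 1] = Double.parseDouble(parts[i]);
        }
        return v;
    }

    public static void main(String[] args) {
        String[] parts = "1 0.5 2.0 3.5".split(" ");
        System.out.println(Arrays.toString(parseBuggy(parts))); // [0.5, 2.0, 0.0]
        System.out.println(Arrays.toString(parseFixed(parts))); // [0.5, 2.0, 3.5]
    }
}
```

With the narrower bound, the final slot of `v` keeps its default value of 0, which matches the symptom reported above.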
## How was this patch tested?
Debugged it in Eclipse (´〜`*) zzz.
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: junzhi lu <452756565@qq.com>
Closes #18237 from masterwugui/patch-1.
## What changes were proposed in this pull request?
- Add Scala, Python and Java examples for `partitionBy`, `sortBy` and `bucketBy`.
- Add _Bucketing, Sorting and Partitioning_ section to SQL Programming Guide.
- Remove bucketing from Unsupported Hive Functionalities.
## How was this patch tested?
Manual tests, docs build.
Author: zero323 <zero323@users.noreply.github.com>
Closes #17938 from zero323/DOCS-BUCKETING-AND-PARTITIONING.
## What changes were proposed in this pull request?
1. Add an example for SparkR `decisionTree`.
2. Document it in the user guide.
## How was this patch tested?
local submit
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #18067 from zhengruifeng/dt_example.
## What changes were proposed in this pull request?
Add Structured Streaming Kafka Source to the `examples` project so that people can run `bin/run-example StructuredKafkaWordCount ...`.
## How was this patch tested?
manually tested it.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #18101 from zsxwing/add-missing-example-dep.
## What changes were proposed in this pull request?
The license is not at the top of some files. It would be best to move the ASF header in these files so that they are consistent with the others.
## How was this patch tested?
manual tests
Author: zuotingbing <zuo.tingbing9@zte.com.cn>
Closes #18012 from zuotingbing/spark-license.
## What changes were proposed in this pull request?
Add docs and examples for ```ml.stat.Correlation``` and ```ml.stat.ChiSquareTest```.
## How was this patch tested?
Generate docs and run examples manually, successfully.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #17994 from yanboliang/spark-20505.
Update ALS examples illustrating use of "recommendForAllX" methods.
## How was this patch tested?
Built and ran examples locally
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #17950 from MLnick/SPARK-20553-update-als-examples.
## What changes were proposed in this pull request?
Remove uses of scala.language.reflectiveCalls that are either unnecessary or probably resulting in more complex code. This turned out to be less significant than I thought, but, still worth a touch-up.
## How was this patch tested?
Existing tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #17949 from srowen/SPARK-20554.
## What changes were proposed in this pull request?
Any Dataset/DataFrame batch query with the operation `withWatermark` does not execute because the batch planner does not have any rule to explicitly handle the EventTimeWatermark logical plan.
The right solution is to simply remove the plan node, as the watermark should not affect any batch query in any way.
Changes:
- In this PR, we add a new rule `EliminateEventTimeWatermark` to check if we need to ignore the event time watermark. We will ignore watermark in any batch query.
Depends upon:
- [SPARK-20672](https://issues.apache.org/jira/browse/SPARK-20672). We cannot add this rule to the analyzer directly, because a streaming query is copied to `triggerLogicalPlan` in every trigger, and the rule would be applied to `triggerLogicalPlan` by mistake.
Others:
- A typo fix in example.
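The idea behind the new rule can be pictured with a toy plan tree. The sketch below is only an illustration under stated assumptions: `Plan`, `Source`, `Watermark`, and `eliminateWatermark` are hypothetical stand-ins, not Spark's `LogicalPlan` classes or the actual `EliminateEventTimeWatermark` implementation, and it only walks a chain of watermark nodes at the root rather than transforming a whole tree.

```java
final class EliminateWatermarkSketch {
    // Toy logical plan node; illustrative only.
    interface Plan {
        boolean isStreaming();
    }

    // Leaf node: a data source that is either batch or streaming.
    static final class Source implements Plan {
        final String name;
        final boolean streaming;

        Source(String name, boolean streaming) {
            this.name = name;
            this.streaming = streaming;
        }

        @Override
        public boolean isStreaming() {
            return streaming;
        }
    }

    // Node that attaches an event-time watermark to its child.
    static final class Watermark implements Plan {
        final String eventTimeColumn;
        final Plan child;

        Watermark(String eventTimeColumn, Plan child) {
            this.eventTimeColumn = eventTimeColumn;
            this.child = child;
        }

        @Override
        public boolean isStreaming() {
            return child.isStreaming();
        }
    }

    // Drops watermark nodes sitting on top of a batch plan; streaming plans are returned unchanged.
    static Plan eliminateWatermark(Plan plan) {
        if (plan instanceof Watermark && !plan.isStreaming()) {
            return eliminateWatermark(((Watermark) plan).child);
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan batch = new Watermark("eventTime", new Source("events", false));
        Plan stream = new Watermark("eventTime", new Source("events", true));
        System.out.println(eliminateWatermark(batch) instanceof Source);     // true
        System.out.println(eliminateWatermark(stream) instanceof Watermark); // true
    }
}
```

Keeping the check on whether the child is streaming is what lets a rule of this shape leave streaming queries alone while making the watermark a no-op for batch queries.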
## How was this patch tested?
Added a new unit test.
Author: uncleGen <hustyugm@gmail.com>
Closes #17896 from uncleGen/SPARK-20373.
## What changes were proposed in this pull request?
Add
- R vignettes
- R programming guide
- SS programming guide
- R example
Also disable spark.als in vignettes for now since it's failing (SPARK-20402)
## How was this patch tested?
manually
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #17814 from felixcheung/rdocss.
## What changes were proposed in this pull request?
Fix build warnings primarily related to Breeze 0.13 operator changes, Java style problems
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes #17803 from srowen/SPARK-20523.
Add PCA and SVD to PySpark's wrappers for `RowMatrix` and `IndexedRowMatrix` (SVD only).
Based on #7963, updated.
## How was this patch tested?
New doc tests and unit tests. Ran all examples locally.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #17621 from MLnick/SPARK-6227-pyspark-svd-pca.
## What changes were proposed in this pull request?
Add a new section for fpm.
Add an example for FPGrowth in Scala and Java.
Updated: rewrite transform to be more compact.
## How was this patch tested?
local doc generation.
Author: Yuhao Yang <yuhao.yang@intel.com>
Closes #17130 from hhbyyh/fpmdoc.
## What changes were proposed in this pull request?
Document fpGrowth in:
- vignettes
- programming guide
- code example
## How was this patch tested?
Manual tests.
Author: zero323 <zero323@users.noreply.github.com>
Closes #17557 from zero323/SPARK-20208.
## What changes were proposed in this pull request?
Extra accessors in the Java bean class caused incorrect encoder generation, which corrupted the state when using timeouts.
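Why an extra accessor matters can be seen with plain JavaBean introspection. The sketch below assumes that encoder inference discovers bean fields through getter-style accessors; `SessionInfo` and its members are hypothetical and are not the classes from the example that was fixed.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

final class BeanAccessorDemo {
    // Hypothetical state bean: one real field plus an extra derived accessor.
    public static class SessionInfo {
        private int numEvents;

        public int getNumEvents() {
            return numEvents;
        }

        public void setNumEvents(int numEvents) {
            this.numEvents = numEvents;
        }

        // No backing field, but the is-prefix makes introspection treat it as a property.
        public boolean isEmpty() {
            return numEvents == 0;
        }
    }

    // Lists the property names that JavaBean introspection reports for a class.
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            List<String> names = new ArrayList<>();
            // Stop at Object.class so the implicit "class" property is excluded.
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reports both "numEvents" and the derived "empty" property.
        System.out.println(propertyNames(SessionInfo.class));
    }
}
```

A schema derived from such a property list would contain a column for the derived accessor as well, so keeping helper methods free of `get`/`is` prefixes is one way to keep them out of the inferred fields.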
## How was this patch tested?
manually ran the example
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #17676 from tdas/SPARK-20377.
## What changes were proposed in this pull request?
This PR proposes corrections related to JSON APIs as below:
- Rendering links in Python documentation
- Replacing `RDD` with `Dataset` in the programming guide
- Adding missing description about JSON Lines consistently in `DataFrameReader.json` in Python API
- De-duplicating a little bit of `DataFrameReader.json` in Scala/Java API
## How was this patch tested?
Manually built the documentation via `jekyll build`. Corresponding snapshots will be left on the code.
Note that currently there are Javadoc8 breaks in several places. These are proposed to be handled in https://github.com/apache/spark/pull/17477. So, this PR does not fix those.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #17602 from HyukjinKwon/minor-json-documentation.
## What changes were proposed in this pull request?
Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
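The locale sensitivity is easy to reproduce with the standard library alone. The sketch below is a minimal illustration; the class and the `normalize` helper are hypothetical names, not code from the patch.

```java
import java.util.Locale;

final class LocaleRootDemo {
    // Hypothetical helper: lower-case an internal identifier independently of the default locale.
    static String normalize(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under Turkish casing rules, 'I' lower-cases to dotless 'ı' (U+0131), not 'i'.
        System.out.println("TITLE".toLowerCase(turkish).equals("title")); // false
        // Locale.ROOT gives the same result regardless of where the JVM runs.
        System.out.println(normalize("TITLE").equals("title"));           // true
    }
}
```

The no-argument `toLowerCase()` uses the JVM default locale, so the first behaviour would appear on any machine configured for Turkish unless a locale is passed explicitly.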
## How was this patch tested?
Existing tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #17527 from srowen/SPARK-20156.
## What changes were proposed in this pull request?
Add Tweedie example for SparkR in programming guide.
The doc was already updated in #17103.
Author: actuaryzhang <actuaryzhang10@gmail.com>
Closes #17553 from actuaryzhang/programGuide.
## What changes were proposed in this pull request?
Fix typo in hive examples from "DaraFrames" to "DataFrames"
## How was this patch tested?
N/A
Author: Dustin Koupal <dkoupal@blizzard.com>
Closes #17554 from cooper6581/typo-daraframes.
## What changes were proposed in this pull request?
- Fixed bug in Java API not passing timeout conf to Scala API
- Updated markdown docs
- Updated Scala docs
- Added Scala and Java examples
## How was this patch tested?
Manually ran examples.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #17539 from tdas/SPARK-20224.
## What changes were proposed in this pull request?
Add docs and examples for spark.ml.feature.Imputer. Currently Scala and Java examples are included; a Python example will be added after https://github.com/apache/spark/pull/17316
## How was this patch tested?
local doc generation and example execution
Author: Yuhao Yang <yuhao.yang@intel.com>
Closes #17324 from hhbyyh/imputerdoc.
## What changes were proposed in this pull request?
Use recommended values for row boundaries in Window's scaladoc, i.e. `Window.unboundedPreceding`, `Window.unboundedFollowing`, and `Window.currentRow` (that were introduced in 2.1.0).
## How was this patch tested?
Local build
Author: Jacek Laskowski <jacek@japila.pl>
Closes #17417 from jaceklaskowski/window-expression-scaladoc.
## What changes were proposed in this pull request?
Two examples in the `r` folder are missing their run commands.
This PR adds the missing comments, consistent with the other examples.
## How was this patch tested?
Manual test.
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #17474 from wangmiao1981/stat.
## What changes were proposed in this pull request?
Currently the JDBC data source creates tables in the target database using the default type mapping and the JDBC dialect mechanism. If users want to specify a different database data type for only some of the columns, there is no option available. In scenarios where the default mapping does not work, users are forced to create tables on the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR provides a user-defined type mapping for specific columns.
The solution is to allow users to specify the database column data types for the created table as a JDBC data source option (`createTableColumnTypes`) on write. Data type information can be specified in the same format as table schema DDL (e.g. `name CHAR(64), comments VARCHAR(1024)`).
Not all supported target database types can be specified; the data types must also be valid Spark SQL data types. For example, users cannot specify the target database CLOB data type. This will be supported in a follow-up PR.
Example:
```Scala
df.write
  .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)")
  .jdbc(url, "TEST.DBCOLTYPETEST", properties)
```
## How was this patch tested?
Added new test cases to the JDBCWriteSuite
Author: sureshthalamati <suresh.thalamati@gmail.com>
Closes #16209 from sureshthalamati/jdbc_custom_dbtype_option_json-spark-10849.
[SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489) added the ability to skip `NaN` predictions during `ALSModel.transform`. This PR adds documentation for the `coldStartStrategy` param to the ALS user guide, and adds code to the examples to illustrate usage.
## How was this patch tested?
Doc and example change only. Built the HTML doc locally and verified that the example code builds and runs in the shell for Scala/Python.
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #17102 from MLnick/SPARK-19345-coldstart-doc.
## What changes were proposed in this pull request?
Remove `org.apache.spark.examples.` in
Add a slash in one of the Python docs.
## How was this patch tested?
Run examples using the commands in the comments.
Author: Yun Ni <yunn@uber.com>
Closes #17104 from Yunni/yunn_minor.
## What changes were proposed in this pull request?
Replace `iris` dataset with `Titanic` or other dataset in example and document.
## How was this patch tested?
Manual and existing test
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #17032 from wangmiao1981/example.
## What changes were proposed in this pull request?
This PR proposes to replace the deprecated `json(RDD[String])` usage with `json(Dataset[String])`.
The deprecated usage currently produces many warnings.
## How was this patch tested?
Fixed tests.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #17071 from HyukjinKwon/SPARK-15615-followup.
## What changes were proposed in this pull request?
This PR proposes to fix the lint-breaks as below:
```
[ERROR] src/test/java/org/apache/spark/network/TransportResponseHandlerSuite.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.network.buffer.ManagedBuffer.
[ERROR] src/main/java/org/apache/spark/unsafe/types/UTF8String.java:[156,10] (modifier) ModifierOrder: 'Nonnull' annotation modifier does not precede non-annotation modifiers.
[ERROR] src/main/java/org/apache/spark/SparkFirehoseListener.java:[122] (sizes) LineLength: Line is longer than 100 characters (found 105).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[164,78] (coding) OneStatementPerLine: Only one statement per line allowed.
[ERROR] src/test/java/test/org/apache/spark/JavaAPISuite.java:[1157] (sizes) LineLength: Line is longer than 100 characters (found 121).
[ERROR] src/test/java/org/apache/spark/streaming/JavaMapWithStateSuite.java:[149] (sizes) LineLength: Line is longer than 100 characters (found 113).
[ERROR] src/test/java/test/org/apache/spark/streaming/Java8APISuite.java:[146] (sizes) LineLength: Line is longer than 100 characters (found 122).
[ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[32,8] (imports) UnusedImports: Unused import - org.apache.spark.streaming.Time.
[ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[611] (sizes) LineLength: Line is longer than 100 characters (found 101).
[ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[1317] (sizes) LineLength: Line is longer than 100 characters (found 102).
[ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java:[91] (sizes) LineLength: Line is longer than 100 characters (found 102).
[ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[113] (sizes) LineLength: Line is longer than 100 characters (found 101).
[ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[164] (sizes) LineLength: Line is longer than 100 characters (found 110).
[ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[212] (sizes) LineLength: Line is longer than 100 characters (found 114).
[ERROR] src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java:[36] (sizes) LineLength: Line is longer than 100 characters (found 101).
[ERROR] src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java:[26,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
[ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[20,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
[ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[94] (sizes) LineLength: Line is longer than 100 characters (found 103).
[ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[30,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.api.java.UDF1.
[ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[72] (sizes) LineLength: Line is longer than 100 characters (found 104).
[ERROR] src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java:[121] (sizes) LineLength: Line is longer than 100 characters (found 101).
[ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[28,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaRDD.
[ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaSparkContext.
```
## How was this patch tested?
Manually via
```bash
./dev/lint-java
```
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #17072 from HyukjinKwon/java-lint.
## What changes were proposed in this pull request?
Removed duplicated lines in the SQL Python example and fixed a typo found along the way.
## How was this patch tested?
Searched for other typos in the page to minimize the number of PRs.
Author: Boaz Mohar <boazmohar@gmail.com>
Closes #17066 from boazmohar/doc-fix.
## What changes were proposed in this pull request?
Documentation and examples (Java, Scala, Python, R) for LinearSVC.
## How was this patch tested?
local doc generation
Author: Yuhao Yang <yuhao.yang@intel.com>
Closes #16968 from hhbyyh/mlsvmdoc.
## What changes were proposed in this pull request?
Convert Java tests to use lambdas, Java 8 features.
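The kind of conversion involved looks roughly like the following. This is a generic sketch using `java.util.function.Function` rather than Spark's Java function interfaces, and the class and constant names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class LambdaConversionDemo {
    // Pre-Java 8 style: an anonymous inner class implementing a single-method interface.
    static final Function<String, Integer> LENGTH_ANONYMOUS = new Function<String, Integer>() {
        @Override
        public Integer apply(String s) {
            return s.length();
        }
    };

    // Java 8 style: the same behaviour expressed as a method reference.
    static final Function<String, Integer> LENGTH_LAMBDA = String::length;

    // Applies f to every word and collects the results in order.
    static List<Integer> lengths(List<String> words, Function<String, Integer> f) {
        return words.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java", "lambda");
        System.out.println(lengths(words, LENGTH_ANONYMOUS)); // [5, 4, 6]
        System.out.println(lengths(words, LENGTH_LAMBDA));    // [5, 4, 6]
    }
}
```

Both forms behave identically; the lambda or method reference simply removes the boilerplate around the single method body.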
## How was this patch tested?
Jenkins tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #16961 from srowen/SPARK-19533.
## What changes were proposed in this pull request?
We recently added the `spark.svmLinear` API for SparkR. We need to add an example and update the vignettes.
## How was this patch tested?
Manually ran the example.
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #16969 from wangmiao1981/example.
## What changes were proposed in this pull request?
stop session at end of example
## How was this patch tested?
manual
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #16973 from felixcheung/rexamples.
- Move external/java8-tests tests into core, streaming, sql and remove
- Remove MaxPermGen and related options
- Fix some reflection / TODOs around Java 8+ methods
- Update doc references to 1.7/1.8 differences
- Remove Java 7/8 related build profiles
- Update some plugins for better Java 8 compatibility
- Fix a few Java-related warnings
For the future:
- Update Java 8 examples to fully use Java 8
- Update Java tests to use lambdas for simplicity
- Update Java internal implementations to use lambdas
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes #16871 from srowen/SPARK-19493.
## What changes were proposed in this pull request?
This pull request includes the Python API and examples for LSH. The API changes are based on yanboliang's PR #15768, with conflicts resolved and the later Scala API changes incorporated. The examples are consistent with the Scala examples of MinHashLSH and BucketedRandomProjectionLSH.
## How was this patch tested?
API and examples are tested using spark-submit:
`bin/spark-submit examples/src/main/python/ml/min_hash_lsh.py`
`bin/spark-submit examples/src/main/python/ml/bucketed_random_projection_lsh.py`
User guide changes are generated and manually inspected:
`SKIP_API=1 jekyll build`
Author: Yun Ni <yunn@uber.com>
Author: Yanbo Liang <ybliang8@gmail.com>
Author: Yunni <Euler57721@gmail.com>
Closes #16715 from Yunni/spark-18080.
## What changes were proposed in this pull request?
```
Liquid Exception: Start indices amount is not equal to end indices amount, see /Users/xiao/IdeaProjects/sparkDelivery/docs/../examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java. in ml-features.md
```
The docs build has been broken since merging https://github.com/apache/spark/pull/16789.
This PR fixes it.
## How was this patch tested?
Manual
Author: Xiao Li <gatorsmile@gmail.com>
Closes #16908 from gatorsmile/docMLFix.
## What changes were proposed in this pull request?
SPARK-19444 imports not being present in documentation
## How was this patch tested?
Manual
## Disclaimer
Contribution is original work and I license the work to the project under the project’s open source license
Author: Aseem Bansal <anshbansal@users.noreply.github.com>
Closes #16789 from anshbansal/patch-1.
## What changes were proposed in this pull request?
Update programming guide, example and vignette with Bisecting k-means.
Author: krishnakalyan3 <krishnakalyan3@gmail.com>
Closes #16767 from krishnakalyan3/bisecting-kmeans.
## What changes were proposed in this pull request?
- A separate subsection for Aggregations under “Getting Started” in the Spark SQL programming guide. It mentions which aggregate functions are predefined and how users can create their own.
- Examples of using the `UserDefinedAggregateFunction` abstract class for untyped aggregations in Java and Scala.
- Examples of using the `Aggregator` abstract class for type-safe aggregations in Java and Scala.
- Python is not covered.
- The PR might not resolve the ticket since I do not know what exactly was planned by the author.
In total, there are four new standalone examples that can be executed via `spark-submit` or `run-example`. The updated Spark SQL programming guide references these examples and does not contain hard-coded snippets.
## How was this patch tested?
The patch was tested locally by building the docs. The examples were run as well.
![image](https://cloud.githubusercontent.com/assets/6235869/21292915/04d9d084-c515-11e6-811a-999d598dffba.png)
Author: aokolnychyi <okolnychyyanton@gmail.com>
Closes #16329 from aokolnychyi/SPARK-16046.
## What changes were proposed in this pull request?
Remove unused imports and outdated comments, and fix some minor code style issues.
## How was this patch tested?
Existing unit tests.
Author: uncleGen <hustyugm@gmail.com>
Closes #16591 from uncleGen/SPARK-19227.
## What changes were proposed in this pull request?
The ```ml.R``` example depends on the ```e1071``` package; if it is not available in the user's environment, the example fails. I think the example should not depend on third-party packages, so I updated it to remove the dependency.
## How was this patch tested?
Manual test.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #16548 from yanboliang/spark-19158.
## What changes were proposed in this pull request?
**binary_classification_metrics_example.py**
The LibSVM data source loads `ml.linalg.SparseVector`, whereas the example requires `mllib.linalg.SparseVector`. The equivalent Scala example, `BinaryClassificationMetricsExample.scala`, seems fine.
```
./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
```
```
File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda>
.rdd.map(lambda row: LabeledPoint(row[0], row[1]))
File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
self.features = _convert_to_vector(features)
File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector
raise TypeError("Cannot convert type %s into Vector" % type(l))
TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector
```
**status_api_demo.py** (this one does not work on Python 3.4.6)
The module is named `queue` in Python 3+.
```
PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
```
```
Traceback (most recent call last):
File ".../spark/examples/src/main/python/status_api_demo.py", line 22, in <module>
import Queue
ImportError: No module named 'Queue'
```
**bisecting_k_means_example.py**
`BisectingKMeansModel` does not implement `save` and `load` in Python.
```bash
./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
```
```
Traceback (most recent call last):
File ".../spark/examples/src/main/python/mllib/bisecting_k_means_example.py", line 46, in <module>
model.save(sc, path)
AttributeError: 'BisectingKMeansModel' object has no attribute 'save'
```
**elementwise_product_example.py**
It calls `collect` from the vector.
```bash
./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
```
```
Traceback (most recent call last):
File ".../spark/examples/src/main/python/mllib/elementwise_product_example.py", line 48, in <module>
for each in transformedData2.collect():
File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 478, in __getattr__
return getattr(self.array, item)
AttributeError: 'numpy.ndarray' object has no attribute 'collect'
```
**These three examples appear to throw an exception because of a relative path set in `spark.sql.warehouse.dir`.**
**hive.py**
```
./bin/spark-submit examples/src/main/python/sql/hive.py
```
```
Traceback (most recent call last):
File ".../spark/examples/src/main/python/sql/hive.py", line 47, in <module>
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
File ".../spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File ".../spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse);'
```
**SparkHiveExample.scala**
```
./bin/run-example sql.hive.SparkHiveExample
```
```
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
```
**JavaSparkHiveExample.java**
```
./bin/run-example sql.hive.JavaSparkHiveExample
```
```
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
```
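The `Relative path in absolute URI` message in these traces comes from `java.net.URI` itself and can be reproduced without Spark or Hive. The sketch below is a minimal illustration; the class and helper names are hypothetical, and resolving the path against the working directory is just one possible remedy, not necessarily the change made to the examples.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Paths;

final class WarehouseUriDemo {
    // Builds a file: URI from its components and returns the error message, or null on success.
    static String componentUriError(String path) {
        try {
            // The multi-argument constructor rejects a scheme combined with a relative path.
            new URI("file", null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    // One possible remedy: resolve the path against the working directory first.
    static URI absoluteFileUri(String path) {
        return Paths.get(path).toAbsolutePath().normalize().toUri();
    }

    public static void main(String[] args) {
        // prints "Relative path in absolute URI: file:./spark-warehouse"
        System.out.println(componentUriError("./spark-warehouse"));
        // prints a file: URI whose path starts at the filesystem root
        System.out.println(absoluteFileUri("./spark-warehouse"));
    }
}
```

An absolute warehouse location avoids the check entirely, because the path component then starts with `/`.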
## How was this patch tested?
Manually via
```
./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
```
```
PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
```
```
./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
```
```
./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
```
```
./bin/spark-submit examples/src/main/python/sql/hive.py
```
```
./bin/run-example sql.hive.JavaSparkHiveExample
```
```
./bin/run-example sql.hive.SparkHiveExample
```
These were found via
```bash
find ./examples/src/main/python -name "*.py" -exec spark-submit {} \;
```
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #16515 from HyukjinKwon/minor-example-fix.
## What changes were proposed in this pull request?
Today we have different syntaxes for creating data source tables and Hive serde tables. We should unify them to avoid confusing users and as a step toward making Hive a data source.
Please read https://issues.apache.org/jira/secure/attachment/12843835/CREATE-TABLE.pdf for details.
TODO (for follow-up PRs):
1. TBLPROPERTIES is not added to the new syntax; we should decide whether to add it later.
2. `SHOW CREATE TABLE` should be updated to use the new syntax.
3. We should decide whether to change the behavior of `SET LOCATION`.
## How was this patch tested?
new tests
Author: Wenchen Fan <wenchen@databricks.com>
Closes #16296 from cloud-fan/create-table.
## What changes were proposed in this pull request?
There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.
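Candidates of this kind can be located mechanically and then reviewed by hand. The sketch below shows one way to do that with a back-referencing regular expression; it is an illustration only, with hypothetical names, and is not necessarily how the occurrences in this PR were found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedWordDemo {
    // A word, whitespace, then the same word again (case-insensitive).
    private static final Pattern REPEATED =
        Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    // Returns each word that occurs twice in a row in the given text.
    static List<String> repeatedWords(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(repeatedWords("returns the the result"));   // [the]
        System.out.println(repeatedWords("I know that that is fine")); // [that]
    }
}
```

The second call shows why each hit still needs a human decision: "that that" is flagged even though it is grammatical.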
## How was this patch tested?
N/A since only docs or comments were updated.
Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>
Closes #16455 from neurons/np.structure_streaming_doc.
## What changes were proposed in this pull request?
Add `finally` clause for `sc.stop()` in the `test("register and deregister Spark listener from SparkContext")`.
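The pattern is the usual try/finally cleanup. The sketch below illustrates it in Java with a hypothetical `FakeContext` standing in for a `SparkContext`; the actual test is not reproduced here.

```java
final class StopInFinallyDemo {
    // Hypothetical stand-in for a SparkContext: only tracks whether stop() was called.
    static final class FakeContext {
        boolean stopped = false;

        void stop() {
            stopped = true;
        }
    }

    // Runs a test body and always stops the context, even if the body throws.
    static void runWithContext(FakeContext sc, Runnable body) {
        try {
            body.run();
        } finally {
            sc.stop(); // executes on normal return and on exception alike
        }
    }

    public static void main(String[] args) {
        FakeContext sc = new FakeContext();
        try {
            runWithContext(sc, () -> {
                throw new IllegalStateException("assertion failed");
            });
        } catch (IllegalStateException e) {
            System.out.println("test failed, context stopped: " + sc.stopped);
        }
    }
}
```

Without the `finally` block, a failing assertion would skip the `stop()` call and leave the context running for subsequent tests.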
## How was this patch tested?
Pass the build and unit tests.
Author: Weiqing Yang <yangweiqing001@gmail.com>
Closes #16426 from weiqingy/testIssue.