Commit graph

2500 commits

Author SHA1 Message Date
WeichenXu d8b0914c2e [SPARK-28957][SQL] Copy any "spark.hive.foo=bar" spark properties into hadoop conf as "hive.foo=bar"
### What changes were proposed in this pull request?

Copy any "spark.hive.foo=bar" spark properties into hadoop conf as "hive.foo=bar"

### Why are the changes needed?
Providing spark side config entry for hive configurations.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
UT.

Closes #25661 from WeichenXu123/add_hive_conf.

Authored-by: WeichenXu <weichen.xu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-09-25 15:54:44 +08:00
Jungtaek Lim (HeartSaVioR) 4513f1c0dc [SPARK-26848][SQL][SS] Introduce new option to Kafka source: offset by timestamp (starting/ending)
## What changes were proposed in this pull request?

This patch introduces new options "startingOffsetsByTimestamp" and "endingOffsetsByTimestamp" to set specific timestamp per topic (since we're unlikely to set the different value per partition) to let source starts reading from offsets which have equal of greater timestamp, and ends reading until offsets which have equal of greater timestamp.

The new option would be optional of course, and take preference over existing offset options.

## How was this patch tested?

New unit tests added. Also manually tested basic functionality with Kafka 2.0.0 server.

Running query below

```
val df = spark.read.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "spark_26848_test_v1,spark_26848_test_2_v1")
  .option("startingOffsetsByTimestamp", """{"spark_26848_test_v1": 1549669142193, "spark_26848_test_2_v1": 1549669240965}""")
  .option("endingOffsetsByTimestamp", """{"spark_26848_test_v1": 1549669265676, "spark_26848_test_2_v1": 1549699265676}""")
  .load().selectExpr("CAST(value AS STRING)")

df.show()
```

with below records (one string which number part remarks when they're put after such timestamp) in

topic `spark_26848_test_v1`
```
hello1 1549669142193
world1 1549669142193
hellow1 1549669240965
world1 1549669240965
hello1 1549669265676
world1 1549669265676
```

topic `spark_26848_test_2_v1`

```
hello2 1549669142193
world2 1549669142193
hello2 1549669240965
world2 1549669240965
hello2 1549669265676
world2 1549669265676
```

the result of `df.show()` follows:
```
+--------------------+
|               value|
+--------------------+
|world1 1549669240965|
|world1 1549669142193|
|world2 1549669240965|
|hello2 1549669240965|
|hellow1 154966924...|
|hello2 1549669265676|
|hello1 1549669142193|
|world2 1549669265676|
+--------------------+
```

Note that endingOffsets (as well as endingOffsetsByTimestamp) are exclusive.

Closes #23747 from HeartSaVioR/SPARK-26848.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-23 19:25:36 -05:00
xy_xin 655356e825 [SPARK-28892][SQL] support UPDATE in the parser and add the corresponding logical plan
### What changes were proposed in this pull request?

This PR supports UPDATE in the parser and add the corresponding logical plan. The SQL syntax is a standard UPDATE statement:
```
UPDATE tableName tableAlias SET colName=value [, colName=value]+ WHERE predicate?
```

### Why are the changes needed?

With this change, we can start to implement UPDATE in builtin sources and think about how to design the update API in DS v2.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New test cases added.

Closes #25626 from xianyinxin/SPARK-28892.

Authored-by: xy_xin <xianyin.xxy@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-09-23 19:25:56 +08:00
Jungtaek Lim (HeartSaVioR) 81b6f11a3a [SPARK-29160][CORE] Use UTF-8 explicitly for reading/writing event log file
### What changes were proposed in this pull request?

Credit to vanzin as he found and commented on this while reviewing #25670 - [comment](https://github.com/apache/spark/pull/25670#discussion_r325383512).

This patch proposes to specify UTF-8 explicitly while reading/writer event log file.

### Why are the changes needed?

The event log file is being read/written as default character set of JVM process which may open the chance to bring some problems on reading event log files from another machines. Spark's de facto standard character set is UTF-8, so it should be explicitly set to.

### Does this PR introduce any user-facing change?

Yes, if end users have been running Spark process with different default charset than "UTF-8", especially their driver JVM processes. No otherwise.

### How was this patch tested?

Existing UTs, as ReplayListenerSuite contains "end-to-end" event logging/reading tests (both uncompressed/compressed).

Closes #25845 from HeartSaVioR/SPARK-29160.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-09-21 23:59:37 +09:00
Dongjoon Hyun 4a89fa1cd1 [SPARK-29196][DOCS] Add JDK11 support to the document
### What changes were proposed in this pull request?

This PRs add Java 11 version to the document.

### Why are the changes needed?

Apache Spark 3.0.0 starts to support JDK11 officially.

### Does this PR introduce any user-facing change?

Yes.

![jdk11](https://user-images.githubusercontent.com/9700541/65364063-39204580-dbc4-11e9-982b-fc1552be2ec5.png)

### How was this patch tested?

Manually. Doc generation.

Closes #25875 from dongjoon-hyun/SPARK-29196.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-09-21 08:40:49 +09:00
shivusondur d3eb4c94cc [SPARK-28822][DOC][SQL] Document USE DATABASE in SQL Reference
### What changes were proposed in this pull request?
Added document reference for USE databse sql command

### Why are the changes needed?
For USE database command usage

### Does this PR introduce any user-facing change?
It is adding the USE database sql command refernce information in the doc

### How was this patch tested?
Attached the test snap
![image](https://user-images.githubusercontent.com/7912929/65170499-7242a380-da66-11e9-819c-76df62c86c5a.png)

Closes #25572 from shivusondur/jiraUSEDaBa1.

Lead-authored-by: shivusondur <shivusondur@gmail.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-19 13:04:17 -07:00
Dongjoon Hyun 3bf43fb60d [SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G
### What changes were proposed in this pull request?

This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G.

### Why are the changes needed?

After upgrading to `Scala 2.12.10`, the following is observed during building.
```
2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb
2019-09-18T20:49:23.5035472Z  bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000]
2019-09-18T20:49:23.5035781Z  total_blobs=156549 nmethods=155863 adapters=592
2019-09-18T20:49:23.5036090Z  compilation: disabled (not enough contiguous free space left)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually check the Jenkins or GitHub Action build log (which should not have the above).

Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-19 00:24:15 -07:00
Gengliang Wang b917a6593d [SPARK-28989][SQL] Add a SQLConf spark.sql.ansi.enabled
### What changes were proposed in this pull request?
Currently, there are new configurations for compatibility with ANSI SQL:

* `spark.sql.parser.ansi.enabled`
* `spark.sql.decimalOperations.nullOnOverflow`
* `spark.sql.failOnIntegralTypeOverflow`
This PR is to add new configuration `spark.sql.ansi.enabled` and remove the 3 options above. When the configuration is true, Spark tries to conform to the ANSI SQL specification. It will be disabled by default.

### Why are the changes needed?

Make it simple and straightforward.

### Does this PR introduce any user-facing change?

The new features for ANSI compatibility will be set via one configuration `spark.sql.ansi.enabled`.

### How was this patch tested?

Existing unit tests.

Closes #25693 from gengliangwang/ansiEnabled.

Lead-authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-18 22:30:28 -07:00
Yuming Wang 8c3f27ceb4 [SPARK-28683][BUILD] Upgrade Scala to 2.12.10
## What changes were proposed in this pull request?

This PR upgrade Scala to **2.12.10**.

Release notes:
- Fix regression in large string interpolations with non-String typed splices
- Revert "Generate shallower ASTs in pattern translation"
- Fix regression in classpath when JARs have 'a.b' entries beside 'a/b'

- Faster compiler: 5–10% faster since 2.12.8
- Improved compatibility with JDK 11, 12, and 13
- Experimental support for build pipelining and outline type checking

More details:
https://github.com/scala/scala/releases/tag/v2.12.10
https://github.com/scala/scala/releases/tag/v2.12.9

## How was this patch tested?

Existing tests

Closes #25404 from wangyum/SPARK-28683.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-18 13:30:36 -07:00
Luca Canali cd481773c3 [SPARK-28091][CORE] Extend Spark metrics system with user-defined metrics using executor plugins
## What changes were proposed in this pull request?

This proposes to improve Spark instrumentation by adding a hook for user-defined metrics, extending Spark’s Dropwizard/Codahale metrics system.
The original motivation of this work was to add instrumentation for S3 filesystem access metrics by Spark job. Currently, [[ExecutorSource]] instruments HDFS and local filesystem metrics. Rather than extending the code there, we proposes with this JIRA to add a metrics plugin system which is of more flexible and general use.
Context: The Spark metrics system provides a large variety of metrics, see also , useful to  monitor and troubleshoot Spark workloads. A typical workflow is to sink the metrics to a storage system and build dashboards on top of that.
Highlights:
-	The metric plugin system makes it easy to implement instrumentation for S3 access by Spark jobs.
-	The metrics plugin system allows for easy extensions of how Spark collects HDFS-related workload metrics. This is currently done using the Hadoop Filesystem GetAllStatistics method, which is deprecated in recent versions of Hadoop. Recent versions of Hadoop Filesystem recommend using method GetGlobalStorageStatistics, which also provides several additional metrics. GetGlobalStorageStatistics is not available in Hadoop 2.7 (had been introduced in Hadoop 2.8). Using a metric plugin for Spark would allow an easy way to “opt in” using such new API calls for those deploying suitable Hadoop versions.
-	We also have the use case of adding Hadoop filesystem monitoring for a custom Hadoop compliant filesystem in use in our organization (EOS using the XRootD protocol). The metrics plugin infrastructure makes this easy to do. Others may have similar use cases.
-	More generally, this method makes it straightforward to plug in Filesystem and other metrics to the Spark monitoring system. Future work on plugin implementation can address extending monitoring to measure usage of external resources (OS, filesystem, network, accelerator cards, etc), that maybe would not normally be considered general enough for inclusion in Apache Spark code, but that can be nevertheless useful for specialized use cases, tests or troubleshooting.

Implementation:
The proposed implementation extends and modifies the work on Executor Plugin of SPARK-24918. Additionally, this is related to recent work on extending Spark executor metrics, such as SPARK-25228.
As discussed during the review, the implementaiton of this feature modifies the Developer API for Executor Plugins, such that the new version is incompatible with the original version in Spark 2.4.

## How was this patch tested?

This modifies existing tests for ExecutorPluginSuite to adapt them to the API changes. In addition, the new funtionality for registering pluginMetrics has been manually tested running Spark on YARN and K8S clusters, in particular for monitoring S3 and for extending HDFS instrumentation with the Hadoop Filesystem “GetGlobalStorageStatistics” metrics. Executor metric plugin example and code used for testing are available, for example at: https://github.com/cerndb/SparkExecutorPlugins

Closes #24901 from LucaCanali/executorMetricsPlugin.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-09-18 10:32:10 -07:00
Marcelo Vanzin 276aaaae8d [SPARK-29105][CORE] Keep driver log file size up to date in HDFS
HDFS doesn't update the file size reported by the NM if you just keep
writing to the file; this makes the SHS believe the file is inactive,
and so it may delete it after the configured max age for log files.

This change uses hsync to keep the log file as up to date as possible
when using HDFS. It also disables erasure coding by default for these
logs, since hsync (& friends) does not work with EC.

Tested with a SHS configured to aggressively clean up logs; verified
a spark-shell session kept updating the log, which was not deleted by
the SHS.

Closes #25819 from vanzin/SPARK-29105.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-09-18 09:11:55 -07:00
Pavithra Ramachandran 600a2a4ae5 [SPARK-28972][DOCS] Updating unit description in configurations, to maintain consistency
### What changes were proposed in this pull request?
Updating unit description in configurations, inorder to maintain consistency across configurations.

### Why are the changes needed?
the description does not mention about suffix that can be mentioned while configuring this value.
For better user understanding

### Does this PR introduce any user-facing change?
yes. Doc description

### How was this patch tested?
generated document and checked.
![Screenshot from 2019-09-05 11-09-17](https://user-images.githubusercontent.com/51401130/64314853-07a55880-cfce-11e9-8af0-6416a50b0188.png)

Closes #25689 from PavithraRamachandran/heapsize_config.

Authored-by: Pavithra Ramachandran <pavi.rams@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-18 09:11:15 -05:00
Pavithra Ramachandran b48ef7a9fb [SPARK-28799][DOC] Documentation for Truncate command
### What changes were proposed in this pull request?

Document TRUNCATE  statement in SQL Reference Guide.

### Why are the changes needed?
Adding documentation for SQL reference.

### Does this PR introduce any user-facing change?
yes

Before:
There was no documentation for this.

After.
![image (4)](https://user-images.githubusercontent.com/51401130/64956929-5e057780-d8a9-11e9-89a3-2d02c942b9ad.png)
![image (5)](https://user-images.githubusercontent.com/51401130/64956942-61006800-d8a9-11e9-9767-6164eabfdc2c.png)

### How was this patch tested?

Used jekyll build and serve to verify.

Closes #25557 from PavithraRamachandran/truncate_doc.

Lead-authored-by: Pavithra Ramachandran <pavi.rams@gmail.com>
Co-authored-by: pavithra <pavi.rams@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-18 08:44:44 -05:00
sharangk dd32476a82 [SPARK-28792][SQL][DOC] Document CREATE DATABASE statement in SQL Reference
### What changes were proposed in this pull request?
Document CREATE DATABASE statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

### Before:
There was no documentation for this.
### After:
![image](https://user-images.githubusercontent.com/29914590/65037831-290e2900-d96c-11e9-8563-92e5379c3ad1.png)
![image](https://user-images.githubusercontent.com/29914590/64858915-55f9cd80-d646-11e9-91a9-16c52b1daa56.png)

### How was this patch tested?
Manual Review and Tested using jykyll build --serve

Closes #25595 from sharangk/createDbDoc.

Lead-authored-by: sharangk <sharan.gk@gmail.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-17 14:40:08 -07:00
sharangk c6ca66113f [SPARK-28814][SQL][DOC] Document SET/RESET in SQL Reference
### What changes were proposed in this pull request?
Document SET/REST statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

#### Before:
There was no documentation for this.

#### After:

**SET**
![image](https://user-images.githubusercontent.com/29914590/65037551-94a3c680-d96b-11e9-9d59-9f7af5185e06.png)
![image](https://user-images.githubusercontent.com/29914590/64858792-fb607180-d645-11e9-8a53-8cf87a166fc1.png)

**RESET**
![image](https://user-images.githubusercontent.com/29914590/64859019-b12bc000-d646-11e9-8cb4-73dc21830067.png)

### How was this patch tested?
Manual Review and Tested using jykyll build --serve

Closes #25606 from sharangk/resetDoc.

Authored-by: sharangk <sharan.gk@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-17 14:36:56 -07:00
HyukjinKwon 7d4eb38bbc [SPARK-29052][DOCS][ML][PYTHON][CORE][R][SQL][SS] Create a Migration Guide tap in Spark documentation
### What changes were proposed in this pull request?

Currently, there is no migration section for PySpark, SparkCore and Structured Streaming.
It is difficult for users to know what to do when they upgrade.

This PR proposes to create create a "Migration Guide" tap at Spark documentation.

![Screen Shot 2019-09-11 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png)

![Screen Shot 2019-09-11 at 7 27 15 PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png)

This page will contain migration guides for Spark SQL, PySpark, SparkR, MLlib, Structured Streaming and Core. Basically it is a refactoring.

There are some new information added, which I will leave a comment inlined for easier review.

1. **MLlib**
  Merge [ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide) and [ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html)

    ```
    'docs/ml-guide.md'
            ↓ Merge new/old migration guides
    'docs/ml-migration-guide.md'
    ```

2. **PySpark**
  Extract PySpark specific items from https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html

    ```
    'docs/sql-migration-guide-upgrade.md'
           ↓ Extract PySpark specific items
    'docs/pyspark-migration-guide.md'
    ```

3. **SparkR**
  Move [sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide) into a separate file, and extract from [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html)

    ```
    'docs/sparkr.md'                     'docs/sql-migration-guide-upgrade.md'
     Move migration guide section ↘     ↙ Extract SparkR specific items
                     docs/sparkr-migration-guide.md
    ```

4. **Core**
  Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.

5. **Structured Streaming**
  Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.

6. **SQL**
  Merged [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) and [sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
    ```
    'docs/sql-migration-guide-hive-compatibility.md'     'docs/sql-migration-guide-upgrade.md'
     Move Hive compatibility section ↘                   ↙ Left over after filtering PySpark and SparkR items
                                  'docs/sql-migration-guide.md'
    ```

### Why are the changes needed?

In order for users in production to effectively migrate to higher versions, and detect behaviour or breaking changes before upgrading and/or migrating.

### Does this PR introduce any user-facing change?
Yes, this changes Spark's documentation at https://spark.apache.org/docs/latest/index.html.

### How was this patch tested?

Manually build the doc. This can be verified as below:

```bash
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```

Closes #25757 from HyukjinKwon/migration-doc.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-15 11:17:30 -07:00
Pablo Langa d334fee502 [SPARK-28373][DOCS][WEBUI] JDBC/ODBC Server Tab
### What changes were proposed in this pull request?
New documentation to explain in detail JDBC/ODBC server tab. New images are included to better explanation.

![image](https://user-images.githubusercontent.com/12819544/64735402-c4287e00-d4e8-11e9-9366-c8ac0fbfc058.png)
![image](https://user-images.githubusercontent.com/12819544/64735429-cee31300-d4e8-11e9-83f1-0b662037e194.png)

### Does this PR introduce any user-facing change?
Only documentation

### How was this patch tested?
I have generated it using "jekyll build" to ensure that it's ok

Closes #25718 from planga82/SPARK-28373_JDBCServerPage.

Lead-authored-by: Pablo Langa <soypab@gmail.com>
Co-authored-by: Unknown <soypab@gmail.com>
Co-authored-by: Pablo <soypab@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-14 10:18:52 -07:00
Lee Dongjin 1675d5114e [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
## What changes were proposed in this pull request?

This update adds support for Kafka Headers functionality in Structured Streaming.

## How was this patch tested?

With following unit tests:

- KafkaRelationSuite: "default starting and ending offsets with headers" (new)
- KafkaSinkSuite: "batch - write to kafka" (updated)

Closes #22282 from dongjinleekr/feature/SPARK-23539.

Lead-authored-by: Lee Dongjin <dongjin@apache.org>
Co-authored-by: Jungtaek Lim <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-13 12:31:28 -05:00
aman_omer d59980783e [SPARK-28795][DOC][SQL] Document CREATE VIEW statement in SQL Reference
### What changes were proposed in this pull request?
Added document for CREATE VIEW command.

### Why are the changes needed?
As a reference to syntax and examples of CREATE VIEW command.

### How was this patch tested?
Documentation update. Verified manually.

Closes #25543 from amanomer/spark-28795.

Lead-authored-by: aman_omer <amanomer1996@gmail.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Co-authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-12 23:44:23 -07:00
sandeep katta b83304fb01 [SPARK-28796][DOC] Document DROP DATABASE statement in SQL Reference
### What changes were proposed in this pull request?
Document DROP DATABASE statement in SQL Reference

### Why are the changes needed?
Currently from spark there is no complete sql guide is present, so it is better to document all the sql commands, this jira is sub part of this task.

### Does this PR introduce any user-facing change?
Yes, Before there was no documentation about drop database syntax

After Fix
![image](https://user-images.githubusercontent.com/35216143/64787097-977a7200-d58d-11e9-911c-d2ff6f3ccff5.png)
![image](https://user-images.githubusercontent.com/35216143/64787122-a6612480-d58d-11e9-978c-9455baff007f.png)

### How was this patch tested?
tested with jenkyll build

Closes #25554 from sandeep-katta/dropDbDoc.

Authored-by: sandeep katta <sandeep.katta2007@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-12 23:10:50 -07:00
Kevin Yu ee63031270 [SPARK-28828][DOC] Document REFRESH TABLE command
### What changes were proposed in this pull request?
Document REFRESH TABLE statement in the SQL Reference Guide.

### Why are the changes needed?
Currently there is no documentation in the SPARK SQL to describe how to use this command, it is to address this issue.

### Does this PR introduce any user-facing change?
Yes.
#### Before:
There is no documentation for this.

#### After:
<img width="826" alt="Screen Shot 2019-09-12 at 11 39 21 AM" src="https://user-images.githubusercontent.com/7550280/64811385-01752600-d552-11e9-876d-91ebb005b851.png">

### How was this patch tested?
Using jykll build --serve

Closes #25549 from kevinyu98/spark-28828-refreshTable.

Authored-by: Kevin Yu <qyu@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-12 23:00:42 -07:00
dengziming 8f632d7045 [MINOR][DOCS] Fix few typos in the java docs
JIRA :https://issues.apache.org/jira/browse/SPARK-29050
'a hdfs' change into  'an hdfs'
'an unique' change into 'a unique'
'an url' change into 'a url'
'a error' change into 'an error'

Closes #25756 from dengziming/feature_fix_typos.

Authored-by: dengziming <dengziming@growingio.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-09-12 09:30:03 +09:00
Thomas Graves b425f8ee65 [SPARK-27492][DOC][YARN][K8S][CORE] Resource scheduling high level user docs
### What changes were proposed in this pull request?

Document the resource scheduling feature - https://issues.apache.org/jira/browse/SPARK-24615
Add general docs, yarn, kubernetes, and standalone cluster specific ones.

### Why are the changes needed?
Help users understand the feature

### Does this PR introduce any user-facing change?
docs

### How was this patch tested?
N/A

Closes #25698 from tgravescs/SPARK-27492-gpu-sched-docs.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-09-11 08:22:36 -05:00
Dilip Biswal 7309e021ec [SPARK-29028][DOCS] Add links to IBM Cloud Object Storage connector in cloud-integration.md
### What changes were proposed in this pull request?
Add links to IBM Cloud Storage connector in cloud-integration.md

### Why are the changes needed?
This page mentions the connectors to cloud providers.  Currently connector to
IBM cloud storage is not specified. This PR adds the necessary links for
completeness.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
<img width="1234" alt="Screen Shot 2019-09-09 at 3 52 44 PM" src="https://user-images.githubusercontent.com/14225158/64571863-11a2c080-d31a-11e9-82e3-78c02675adb9.png">

**After.**

<img width="1234" alt="Screen Shot 2019-09-10 at 8 16 49 AM" src="https://user-images.githubusercontent.com/14225158/64626857-663e4e00-d3a3-11e9-8fa3-15ebf52ea832.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25737 from dilipbiswal/ibm-cloud-storage.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-10 11:19:55 -05:00
Terry Kim bf43541c92 [SPARK-28856][SQL] Implement SHOW DATABASES for Data Source V2 Tables
### What changes were proposed in this pull request?
Implement the SHOW DATABASES logical and physical plans for data source v2 tables.

### Why are the changes needed?
To support `SHOW DATABASES` SQL commands for v2 tables.

### Does this PR introduce any user-facing change?
`spark.sql("SHOW DATABASES")` will return namespaces if the default catalog is set:
```
+---------------+
|      namespace|
+---------------+
|            ns1|
|      ns1.ns1_1|
|ns1.ns1_1.ns1_2|
+---------------+
```

### How was this patch tested?
Added unit tests to `DataSourceV2SQLSuite`.

Closes #25601 from imback82/show_databases.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-09-10 21:23:57 +08:00
Gabor Somogyi e516f7e09e [SPARK-28928][SS] Use Kafka delegation token protocol on sources/sinks
### What changes were proposed in this pull request?
At the moment there are 3 places where communication protocol with Kafka cluster has to be set when delegation token used:
* On delegation token
* On source
* On sink

Most of the time users are using the same protocol on all these places (within one Kafka cluster). It would be better to declare it in one place (delegation token side) and Kafka sources/sinks can take this config over.

In this PR I've I've modified the code in a way that Kafka sources/sinks are taking over delegation token side `security.protocol` configuration when the token and the source/sink matches in `bootstrap.servers` configuration. This default configuration can be overwritten on each source/sink independently by using `kafka.security.protocol` configuration.

### Why are the changes needed?
The actual configuration's default behavior represents the minority of the use-cases and inconvenient.

### Does this PR introduce any user-facing change?
Yes, with this change users need to provide less configuration parameters by default.

### How was this patch tested?
Existing + additional unit tests.

Closes #25631 from gaborgsomogyi/SPARK-28928.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-09-09 15:41:51 -07:00
Huaxin Gao 125af78d32 [SPARK-28831][DOC][SQL] Document CLEAR CACHE statement in SQL Reference
### What changes were proposed in this pull request?
Document CLEAR CACHE statement in SQL Reference

### Why are the changes needed?
To complete SQL Reference

### Does this PR introduce any user-facing change?
Yes

After change:
![image](https://user-images.githubusercontent.com/13592258/64565512-caf89a80-d308-11e9-99ea-88e966d1b1a1.png)

### How was this patch tested?
Tested using jykyll build --serve

Closes #25541 from huaxingao/spark-28831-n.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-09 14:28:55 -07:00
Dilip Biswal c839d09789 [SPARK-28773][DOC][SQL] Handling of NULL data in Spark SQL
### What changes were proposed in this pull request?
Document ```NULL``` semantics in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on how `NULL` data is handled in various expressions and operators. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.
**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-09-08 at 11 24 41 PM" src="https://user-images.githubusercontent.com/14225158/64507782-83362c80-d290-11e9-8295-70de412ea1f4.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 24 56 PM" src="https://user-images.githubusercontent.com/14225158/64507784-83362c80-d290-11e9-8f85-fbaf6116905f.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 08 PM" src="https://user-images.githubusercontent.com/14225158/64507785-83362c80-d290-11e9-9f9a-1dbafbc33bba.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 24 PM" src="https://user-images.githubusercontent.com/14225158/64507787-83362c80-d290-11e9-99b0-fcaa4a1f9a2d.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 34 PM" src="https://user-images.githubusercontent.com/14225158/64507789-83cec300-d290-11e9-94e7-feb8cf65d7ce.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 49 PM" src="https://user-images.githubusercontent.com/14225158/64507790-83cec300-d290-11e9-8c68-d745e7e9e4ca.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 00 PM" src="https://user-images.githubusercontent.com/14225158/64507791-83cec300-d290-11e9-9590-1e4c7ae28dac.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 09 PM" src="https://user-images.githubusercontent.com/14225158/64507792-83cec300-d290-11e9-885a-58752633ee71.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 20 PM" src="https://user-images.githubusercontent.com/14225158/64507793-83cec300-d290-11e9-8af8-9ef17034accb.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 32 PM" src="https://user-images.githubusercontent.com/14225158/64507794-83cec300-d290-11e9-874b-0d419cadbf75.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 47 PM" src="https://user-images.githubusercontent.com/14225158/64507795-84675980-d290-11e9-9ce6-870b46b060bc.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 59 PM" src="https://user-images.githubusercontent.com/14225158/64507796-84675980-d290-11e9-91cc-d6ffc5e3374d.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 10 PM" src="https://user-images.githubusercontent.com/14225158/64507797-84675980-d290-11e9-9d36-dcc6b1e75f38.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 18 PM" src="https://user-images.githubusercontent.com/14225158/64507798-84675980-d290-11e9-842c-8d57877b4389.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 27 PM" src="https://user-images.githubusercontent.com/14225158/64507799-84675980-d290-11e9-881d-16a24c6f5acd.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 37 PM" src="https://user-images.githubusercontent.com/14225158/64507801-84675980-d290-11e9-8f52-875a7a3c92c1.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 48 PM" src="https://user-images.githubusercontent.com/14225158/64507802-84675980-d290-11e9-9586-1d66fc07c069.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 59 PM" src="https://user-images.githubusercontent.com/14225158/64507804-84fff000-d290-11e9-8378-2d1a6cfa76d2.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 28 08 PM" src="https://user-images.githubusercontent.com/14225158/64507805-84fff000-d290-11e9-81ec-abeec2842922.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 28 20 PM" src="https://user-images.githubusercontent.com/14225158/64507806-84fff000-d290-11e9-900f-1debb28f8f93.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25726 from dilipbiswal/sql-ref-null-data.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-09 13:41:17 -07:00
Sean Owen 6378d4bc06 [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3
### What changes were proposed in this pull request?

- Remove SQLContext.createExternalTable and Catalog.createExternalTable, deprecated in favor of createTable since 2.2.0, plus tests of deprecated methods
- Remove HiveContext, deprecated in 2.0.0, in favor of `SparkSession.builder.enableHiveSupport`
- Remove deprecated KinesisUtils.createStream methods, plus tests of deprecated methods, deprecate in 2.2.0
- Remove deprecated MLlib (not Spark ML) linear method support, mostly utility constructors and 'train' methods, and associated docs. This includes methods in LinearRegression, LogisticRegression, Lasso, RidgeRegression. These have been deprecated since 2.0.0
- Remove deprecated Pyspark MLlib linear method support, including LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD
- Remove 'runs' argument in KMeans.train() method, which has been a no-op since 2.0.0
- Remove deprecated ChiSqSelector isSorted protected method
- Remove deprecated 'yarn-cluster' and 'yarn-client' master argument in favor of 'yarn' and deploy mode 'cluster', etc

Notes:

- I was not able to remove deprecated DataFrameReader.json(RDD) in favor of DataFrameReader.json(Dataset); the former was deprecated in 2.2.0, but, it is still needed to support Pyspark's .json() method, which can't use a Dataset.
- Looks like SQLContext.createExternalTable was not actually deprecated in Pyspark, but, almost certainly was meant to be? Catalog.createExternalTable was.
- I afterwards noted that the toDegrees, toRadians functions were almost removed fully in SPARK-25908, but Felix suggested keeping just the R version as they hadn't been technically deprecated. I'd like to revisit that. Do we really want the inconsistency? I'm not against reverting it again, but then that implies leaving SQLContext.createExternalTable just in Pyspark too, which seems weird.
- I *kept* LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD, RidgeRegressionWithSGD in Pyspark, though deprecated, as it is hard to remove them (still used by StreamingLogisticRegressionWithSGD?) and they are not fully removed in Scala. Maybe should not have been deprecated.

### Why are the changes needed?

Deprecated items are easiest to remove in a major release, so we should do so as much as possible for Spark 3. This does not target items deprecated 'recently' as of Spark 2.3, which is still 18 months old.

### Does this PR introduce any user-facing change?

Yes, in that deprecated items are removed from some public APIs.

### How was this patch tested?

Existing tests.

Closes #25684 from srowen/SPARK-28980.

Lead-authored-by: Sean Owen <sean.owen@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-09 10:19:40 -05:00
Kengo Seki 1f056eb313 [SPARK-27420][DSTREAMS][KINESIS] KinesisInputDStream should expose a way to configure CloudWatch metrics
## What changes were proposed in this pull request?

KinesisInputDStream currently does not provide a way to disable
CloudWatch metrics push. Its default level is "DETAILED" which pushes
10s of metrics every 10 seconds. When dealing with multiple streaming
jobs this add up pretty quickly, leading to thousands of dollars in cost.
To address this problem, this PR adds interfaces for accessing
KinesisClientLibConfiguration's `withMetrics` and
`withMetricsEnabledDimensions` methods to KinesisInputDStream
so that users can configure KCL's metrics levels and dimensions.

## How was this patch tested?

By running updated unit tests in KinesisInputDStreamBuilderSuite.
In addition, I ran a Streaming job with MetricsLevel.NONE and confirmed:

* there's no data point for the "Operation", "Operation, ShardId" and "WorkerIdentifier" dimensions on the AWS management console
* there's no DEBUG level message from Amazon KCL, such as "Successfully published xx datums."

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #24651 from sekikn/SPARK-27420.

Authored-by: Kengo Seki <sekikn@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-08 19:48:53 -05:00
Liang-Chi Hsieh 89aba69378 [SPARK-28935][SQL][DOCS] Document SQL metrics for Details for Query Plan
### What changes were proposed in this pull request?

This patch adds the description of common SQL metrics in web ui document.

### Why are the changes needed?

The current web ui document describes query plan but does not describe the meaning SQL metrics. For end users, they might not understand the meaning of the metrics.

### Does this PR introduce any user-facing change?

No. This is just documentation change.

### How was this patch tested?

Built the docs locally.

![image](https://user-images.githubusercontent.com/11567269/64463485-1583d800-d0b9-11e9-9916-141f5c09f009.png)

Closes #25658 from viirya/SPARK-28935.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-06 15:56:50 -07:00
Kevin Yu 36f8e53cfa [SPARK-28802][DOC][SQL] Document DESCRIBE DATABASE statement in SQL Reference
### What changes were proposed in this pull request?
Document DESCRIBE DATABASE statement in SQL Reference

### Why are the changes needed?

To complete the SQL Reference

### Does this PR introduce any user-facing change?
Yes

#### Before
There is no documentation for this command in sql reference

#### After
![Screen Shot 2019-09-05 at 12 59 32 PM](https://user-images.githubusercontent.com/7550280/64379235-53aec800-cfe3-11e9-8a51-ea55f0455c47.png)
![Screen Shot 2019-09-05 at 12 59 45 PM](https://user-images.githubusercontent.com/7550280/64379247-58737c00-cfe3-11e9-9a51-f12c5c5bc26a.png)

### How was this patch tested?
Used jekyll build and serve to verify

Closes #25528 from kevinyu98/sql-ref-describe.

Lead-authored-by: Kevin Yu <qyu@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-05 16:23:08 -07:00
Huaxin Gao e4f70023ad [SPARK-28830][DOC][SQL] Document UNCACHE TABLE statement in SQL Reference
### What changes were proposed in this pull request?
Document UNCACHE TABLE statement in SQL Reference
### Why are the changes needed?
To complete SQL Reference

### Does this PR introduce any user-facing change?
Yes.

After change:

![image](https://user-images.githubusercontent.com/13592258/64299133-e04a7f00-cf2c-11e9-8f39-9b288e46c995.png)

### How was this patch tested?
Tested using jykyll build --serve

Closes #25540 from huaxingao/spark-28830.

Lead-authored-by: Huaxin Gao <huaxing@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-04 21:42:01 -07:00
Dilip Biswal f96486b4aa [SPARK-28808][DOCS][SQL] Document SHOW FUNCTIONS in SQL Reference
### What changes were proposed in this pull request?
Document SHOW FUNCTIONS statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**

![image](https://user-images.githubusercontent.com/11567269/64281840-e3cc0f00-cf08-11e9-9784-f01392276130.png)

<img width="589" alt="Screen Shot 2019-09-04 at 11 41 44 AM" src="https://user-images.githubusercontent.com/11567269/64281911-0fe79000-cf09-11e9-955f-21b44590707c.png">

<img width="572" alt="Screen Shot 2019-09-04 at 11 41 54 AM" src="https://user-images.githubusercontent.com/11567269/64281916-12e28080-cf09-11e9-9187-688c2c751559.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25539 from dilipbiswal/ref-doc-show-functions.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-04 11:47:10 -07:00
Dilip Biswal b992160eae [SPARK-28811][DOCS][SQL] Document SHOW TBLPROPERTIES in SQL Reference
### What changes were proposed in this pull request?
Document SHOW TBLPROPERTIES statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**
![image](https://user-images.githubusercontent.com/11567269/64281442-fdb92200-cf07-11e9-90ba-4699b6e93e23.png)
![Screen Shot 2019-09-04 at 11 32 11 AM](https://user-images.githubusercontent.com/11567269/64281484-188b9680-cf08-11e9-8e42-f130751ca495.png)

### How was this patch tested?
Tested using jykyll build --serve

Closes #25571 from dilipbiswal/ref-show-tblproperties.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-04 11:36:45 -07:00
Jungtaek Lim (HeartSaVioR) 594c9c5a3e [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
## What changes were proposed in this pull request?

This patch does pooling for both kafka consumers as well as fetched data. The overall benefits of the patch are following:

* Both pools support eviction on idle objects, which will help closing invalid idle objects which topic or partition are no longer be assigned to any tasks.
* It also enables applying different policies on pool, which helps optimization of pooling for each pool.
* We concerned about multiple tasks pointing same topic partition as well as same group id, and existing code can't handle this hence excess seek and fetch could happen. This patch properly handles the case.
* It also makes the code always safe to leverage cache, hence no need to maintain reuseCache parameter.

Moreover, pooling kafka consumers is implemented based on Apache Commons Pool, which also gives couple of benefits:

* We can get rid of synchronization of KafkaDataConsumer object while acquiring and returning InternalKafkaConsumer.
* We can extract the feature of object pool to outside of the class, so that the behaviors of the pool can be tested easily.
* We can get various statistics for the object pool, and also be able to enable JMX for the pool.

FetchedData instances are pooled by custom implementation of pool instead of leveraging Apache Commons Pool, because they have CacheKey as first key and "desired offset" as second key which "desired offset" is changing - I haven't found any general pool implementations supporting this.

This patch brings additional dependency, Apache Commons Pool 2.6.0 into `spark-sql-kafka-0-10` module.

## How was this patch tested?

Existing unit tests as well as new tests for object pool.

Also did some experiment regarding proving concurrent access of consumers for same topic partition.

* Made change on both sides (master and patch) to log when creating Kafka consumer or fetching records from Kafka is happening.
* branches
  * master: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-master-ref-debugging
  * patch: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-debugging
* Test query (doing self-join)
  * https://gist.github.com/HeartSaVioR/d831974c3f25c02846f4b15b8d232cc2
* Ran query from spark-shell, with using `local[*]` to maximize the chance to have concurrent access
* Collected the count of fetch requests on Kafka via command: `grep "creating new Kafka consumer" logfile | wc -l`
* Collected the count of creating Kafka consumers via command: `grep "fetching data from Kafka consumer" logfile | wc -l`

Topic and data distribution is follow:

```
truck_speed_events_stream_spark_25151_v1:0:99440
truck_speed_events_stream_spark_25151_v1:1:99489
truck_speed_events_stream_spark_25151_v1:2:397759
truck_speed_events_stream_spark_25151_v1:3:198917
truck_speed_events_stream_spark_25151_v1:4:99484
truck_speed_events_stream_spark_25151_v1:5:497320
truck_speed_events_stream_spark_25151_v1:6:99430
truck_speed_events_stream_spark_25151_v1:7:397887
truck_speed_events_stream_spark_25151_v1:8:397813
truck_speed_events_stream_spark_25151_v1:9:0
```

The experiment only used smallest 4 partitions (0, 1, 4, 6) from these partitions to finish the query earlier.

The result of experiment is below:

branch | create Kafka consumer | fetch request
-- | -- | --
master | 1986 | 2837
patch | 8 | 1706

Closes #22138 from HeartSaVioR/SPARK-25151.

Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Co-authored-by: Jungtaek Lim <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-09-04 10:17:38 -07:00
yangjie01 a07f795aea [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize
## What changes were proposed in this pull request?

If MEMORY_OFFHEAP_ENABLED is true, add MEMORY_OFFHEAP_SIZE to resource requested for executor to ensure instance has enough memory to use.

In this pr add a helper method `executorOffHeapMemorySizeAsMb` in `YarnSparkHadoopUtil`.

## How was this patch tested?
Add 3 new test suite to test `YarnSparkHadoopUtil#executorOffHeapMemorySizeAsMb`

Closes #25309 from LuciferYang/spark-28577.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-09-04 09:00:12 -05:00
Huaxin Gao 56f2887dc8 [SPARK-28788][DOC][SQL] Document ANALYZE TABLE statement in SQL Reference
### What changes were proposed in this pull request?
Document ANALYZE TABLE statement in SQL Reference

### Why are the changes needed?
To complete SQL reference

### Does this PR introduce any user-facing change?
Yes

***Before***:
There was no documentation for this.

***After***:
![image](https://user-images.githubusercontent.com/13592258/64046883-f8339480-cb21-11e9-85da-6617d5c96412.png)

![image](https://user-images.githubusercontent.com/13592258/64209526-9a6eb780-ce55-11e9-9004-53c5c5d24567.png)

![image](https://user-images.githubusercontent.com/13592258/64209542-a2c6f280-ce55-11e9-8624-e7349204ec8e.png)

### How was this patch tested?
Tested using jykyll build --serve

Closes #25524 from huaxingao/spark-28788.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-03 15:26:12 -07:00
Xiao Li 2856398de9 [SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2
### What changes were proposed in this pull request?
This PR is to upgrade the maven dependence from 3.6.1 to 3.6.2.

### Why are the changes needed?
All the builds are broken because 3.6.1 is not available.  http://ftp.wayne.edu/apache//maven/maven-3/

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/485/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10536/

![image](https://user-images.githubusercontent.com/11567269/64196667-36d69100-ce39-11e9-8f93-40eb333d595d.png)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #25665 from gatorsmile/upgradeMVN.

Authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-03 11:06:57 -07:00
Dilip Biswal 94e66744a7 [SPARK-28805][DOCS][SQL] Document DESCRIBE FUNCTION in SQL Reference
### What changes were proposed in this pull request?
Document DESCRIBE FUNCTION statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-09-02 at 11 14 09 PM" src="https://user-images.githubusercontent.com/14225158/64148193-85534380-cdd7-11e9-9c07-5956b5e8276e.png">
<img width="1234" alt="Screen Shot 2019-09-02 at 11 14 29 PM" src="https://user-images.githubusercontent.com/14225158/64148201-8a17f780-cdd7-11e9-93d8-10ad9932977c.png">
<img width="1234" alt="Screen Shot 2019-09-02 at 11 14 42 PM" src="https://user-images.githubusercontent.com/14225158/64148208-8dab7e80-cdd7-11e9-97c5-3a4ce12cac7a.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25530 from dilipbiswal/ref-doc-desc-function.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-03 09:45:58 -07:00
Dilip Biswal 92ae271081 [SPARK-28806][DOCS][SQL] Document SHOW COLUMNS in SQL Reference
### What changes were proposed in this pull request?
Document SHOW COLUMNS statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-09-02 at 11 07 48 PM" src="https://user-images.githubusercontent.com/14225158/64148033-0fe77300-cdd7-11e9-93ee-e5951c7ed33c.png">
<img width="1234" alt="Screen Shot 2019-09-02 at 11 08 08 PM" src="https://user-images.githubusercontent.com/14225158/64148039-137afa00-cdd7-11e9-8bec-634ea9d2594c.png">
<img width="1234" alt="Screen Shot 2019-09-02 at 11 11 45 PM" src="https://user-images.githubusercontent.com/14225158/64148046-17a71780-cdd7-11e9-91c3-95a9c97e7a77.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25531 from dilipbiswal/ref-doc-show-columns.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-03 09:39:26 -07:00
Huaxin Gao 585954dbed [SPARK-28790][DOC][SQL] Document CACHE TABLE statement in SQL Reference
### What changes were proposed in this pull request?
Document CACHE TABLE statement in SQL Reference

### Why are the changes needed?
To complete SQL Reference

### Does this PR introduce any user-facing change?
Yes.

Here is the screen shot:

![image](https://user-images.githubusercontent.com/13592258/64072307-26f45c80-cc41-11e9-8ab3-dc56fe8ff45f.png)

![image](https://user-images.githubusercontent.com/13592258/64072309-2cea3d80-cc41-11e9-9a4d-8cb9eb63569f.png)

### How was this patch tested?
Tested using jykyll build --serve

Closes #25532 from huaxingao/spark-28790.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-01 17:08:09 -07:00
Huaxin Gao b85a554487 [SPARK-28786][DOC][SQL][FOLLOW-UP] Change "Related Statements" to bold
### What changes were proposed in this pull request?
Change "Related Statements" to bold

### Why are the changes needed?
To make doc look nice and consistent.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Tested using jykyll build --serve

Before the change:
![image](https://user-images.githubusercontent.com/13592258/63965303-ae797a00-ca4d-11e9-8a85-71fbfdeaaccb.png)

After the change:
![image](https://user-images.githubusercontent.com/13592258/63965316-b76a4b80-ca4d-11e9-9a85-48d7a909f0ef.png)

Before the change:
![image](https://user-images.githubusercontent.com/13592258/63988989-7c8b0680-ca93-11e9-9352-a9ec5457b279.png)

After the change:
![image](https://user-images.githubusercontent.com/13592258/63988996-87459b80-ca93-11e9-9e51-8cb36a632436.png)

Closes #25623 from huaxingao/spark-28786-n.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-08-31 14:58:41 -07:00
Dilip Biswal b4d7b30aa6 [SPARK-28803][DOCS][SQL] Document DESCRIBE TABLE in SQL Reference
### What changes were proposed in this pull request?
Document DESCRIBE TABLE statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-08-31 at 1 53 35 PM" src="https://user-images.githubusercontent.com/14225158/64069071-f556a380-cbf6-11e9-985d-13dd37a32bbb.png">
<img width="1234" alt="Screen Shot 2019-08-31 at 1 53 50 PM" src="https://user-images.githubusercontent.com/14225158/64069073-f982c100-cbf6-11e9-925b-eb2fc85c3341.png">
<img width="1234" alt="Screen Shot 2019-08-31 at 1 54 02 PM" src="https://user-images.githubusercontent.com/14225158/64069076-0ef7eb00-cbf7-11e9-8062-9a9fb8700bb3.png">
<img width="1234" alt="Screen Shot 2019-08-31 at 1 54 15 PM" src="https://user-images.githubusercontent.com/14225158/64069077-0f908180-cbf7-11e9-9a31-9b7f122db2d3.png">
<img width="1234" alt="Screen Shot 2019-08-31 at 1 54 30 PM" src="https://user-images.githubusercontent.com/14225158/64069078-0f908180-cbf7-11e9-96ee-438a7b64c961.png">
<img width="1234" alt="Screen Shot 2019-08-31 at 1 54 42 PM" src="https://user-images.githubusercontent.com/14225158/64069079-0f908180-cbf7-11e9-9bae-734a1994f936.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25527 from dilipbiswal/ref-doc-desc-table.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-08-31 14:46:55 -07:00
Unknown d573e4c482 [SPARK-28542][DOCS][WEBUI] Stages Tab
### What changes were proposed in this pull request?
New documentation to explain in detail Web UI Stages page. New images are included to better explanation.
![image](https://user-images.githubusercontent.com/12819544/63807320-c05bff80-c91d-11e9-986f-e09d0b8d4bbb.png)
![image](https://user-images.githubusercontent.com/12819544/63807343-cd78ee80-c91d-11e9-9e4a-2cef3ff70577.png)
![image](https://user-images.githubusercontent.com/12819544/63807363-d9fd4700-c91d-11e9-9691-1d39b0e2c69e.png)
![image](https://user-images.githubusercontent.com/12819544/63807384-e41f4580-c91d-11e9-92bd-cb01aced3752.png)

### Does this PR introduce any user-facing change?
Only documentation

### How was this patch tested?
I have generated it using "jekyll build" to ensure that it's ok

Closes #25598 from planga82/feature/SPARK-28542_ImproveWebUIStagesPage.

Lead-authored-by: Unknown <soypab@gmail.com>
Co-authored-by: Pablo <soypab@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-31 13:33:44 -05:00
Dilip Biswal a08f33be68 [SPARK-28804][DOCS][SQL] Document DESCRIBE QUERY in SQL Reference
### What changes were proposed in this pull request?
Document DESCRIBE QUERY statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-08-29 at 5 47 51 PM" src="https://user-images.githubusercontent.com/14225158/63985609-43e43080-ca85-11e9-8a1a-c9c15d988e24.png">
<img width="1234" alt="Screen Shot 2019-08-29 at 5 48 06 PM" src="https://user-images.githubusercontent.com/14225158/63985610-46468a80-ca85-11e9-882a-7163784f72c6.png">
<img width="1234" alt="Screen Shot 2019-08-29 at 5 48 18 PM" src="https://user-images.githubusercontent.com/14225158/63985617-49da1180-ca85-11e9-9e77-a6d6c7042a85.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25529 from dilipbiswal/ref-doc-desc-query.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-08-30 16:05:16 -07:00
Dilip Biswal fb1053d14a [SPARK-28807][DOCS][SQL] Document SHOW DATABASES in SQL Reference
### What changes were proposed in this pull request?
Document SHOW DATABASES statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-08-28 at 11 43 36 PM" src="https://user-images.githubusercontent.com/14225158/63916727-dd600380-c9ed-11e9-8372-789110c9d2dc.png">
<img width="1234" alt="Screen Shot 2019-08-28 at 11 43 57 PM" src="https://user-images.githubusercontent.com/14225158/63916734-e0f38a80-c9ed-11e9-8ad4-d854febeaab8.png">
<img width="1234" alt="Screen Shot 2019-08-28 at 11 44 13 PM" src="https://user-images.githubusercontent.com/14225158/63916740-e4871180-c9ed-11e9-9cfc-199cd8a64852.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25526 from dilipbiswal/ref-doc-show-db.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-08-29 09:04:27 -07:00
Huaxin Gao 3e09a0fce9 [SPARK-28786][DOC][SQL] Document INSERT statement in SQL Reference
### What changes were proposed in this pull request?
Document INSERT statement in SQL Reference

### Why are the changes needed?
To complete SQL reference.

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Manually checked newly added doc.

Here are the screen shots:

![image](https://user-images.githubusercontent.com/13592258/63490232-0a01a180-c469-11e9-82de-cfdc7c2343e7.png)

![image](https://user-images.githubusercontent.com/13592258/63903006-cce56400-c9c0-11e9-9f24-badd586227a2.png)

<img width="1100" alt="Screen Shot 2019-08-27 at 5 01 48 PM" src="https://user-images.githubusercontent.com/13592258/63816303-845c7680-c8ec-11e9-8c36-1b8e4d3e6286.png">

<img width="1100" alt="Screen Shot 2019-08-27 at 5 03 22 PM" src="https://user-images.githubusercontent.com/13592258/63816347-ac4bda00-c8ec-11e9-9470-fa99522e6f14.png">

![image](https://user-images.githubusercontent.com/13592258/63817393-fc2ca000-c8f0-11e9-9d66-dd9b22a9d900.png)

<img width="1102" alt="Screen Shot 2019-08-27 at 5 05 13 PM" src="https://user-images.githubusercontent.com/13592258/63816423-ea48fe00-c8ec-11e9-8f66-5b226a1ff693.png">

![image](https://user-images.githubusercontent.com/13592258/63903080-0e760f00-c9c1-11e9-966a-f45b0b1c1ea6.png)

<img width="1100" alt="Screen Shot 2019-08-27 at 5 07 19 PM" src="https://user-images.githubusercontent.com/13592258/63816494-37c56b00-c8ed-11e9-88e1-27a9101eb09d.png">

![image](https://user-images.githubusercontent.com/13592258/63816712-131dc300-c8ee-11e9-8ee7-d83b8ad07bf2.png)

![image](https://user-images.githubusercontent.com/13592258/63817479-5a598300-c8f1-11e9-8789-adae7df5535a.png)

![image](https://user-images.githubusercontent.com/13592258/63817900-4adb3980-c8f3-11e9-94fe-d60f7d61c4b4.png)

![image](https://user-images.githubusercontent.com/13592258/63903155-4da46000-c9c1-11e9-88dd-609d4fe685a9.png)

![image](https://user-images.githubusercontent.com/13592258/63817157-d652cb80-c8ef-11e9-944c-99391cf2fb0a.png)

![image](https://user-images.githubusercontent.com/13592258/63903259-aa077f80-c9c1-11e9-982f-b8590ce0270d.png)

![image](https://user-images.githubusercontent.com/13592258/63903270-b1c72400-c9c1-11e9-85c6-6d8e8cd7f006.png)

Closes #25525 from huaxingao/spark-28786.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-08-29 09:00:42 -07:00
Dilip Biswal 74527868b2 [SPARK-28789][DOCS][SQL] Document ALTER DATABASE command
### What changes were proposed in this pull request?
Document ALTER DATABSE statement in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-08-28 at 1 51 13 PM" src="https://user-images.githubusercontent.com/14225158/63891854-fc817580-c99a-11e9-918e-6b305edf92e6.png">
<img width="1234" alt="Screen Shot 2019-08-28 at 1 51 27 PM" src="https://user-images.githubusercontent.com/14225158/63891869-0acf9180-c99b-11e9-91a4-04d870474a40.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25523 from dilipbiswal/ref-doc-alterdb.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-08-28 15:30:38 -07:00
Yuming Wang 1b404b9b99 [SPARK-28890][SQL] Upgrade Hive Metastore Client to the 3.1.2 for Hive 3.1
### What changes were proposed in this pull request?

Hive 3.1.2 has been released. This PR upgrades the Hive Metastore Client to 3.1.2 for Hive 3.1.

Hive 3.1.2 release notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12344397&styleName=Html&projectId=12310843

### Why are the changes needed?

This is an improvement to support a newly release 3.1.2. Otherwise, it will throws `UnsupportedOperationException` if user `set spark.sql.hive.metastore.version=3.1.2`:
```scala
Exception in thread "main" java.lang.UnsupportedOperationException: Unsupported Hive Metastore version (3.1.2). Please set spark.sql.hive.metastore.version with a valid version.
	at org.apache.spark.sql.hive.client.IsolatedClientLoader$.hiveVersion(IsolatedClientLoader.scala:109)
```

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing UT

Closes #25604 from wangyum/SPARK-28890.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-08-28 09:16:54 -07:00