Commit graph

21 commits

Author SHA1 Message Date
root1 ce7a49f7fa [SPARK-30363][SQL][DOC] Add Documentation for refresh resources
### What changes were proposed in this pull request?
Documentation added for refresh resources command in spark-sql.

### Why are the changes needed?
Previously, only refresh table command was documented.

### Does this PR introduce any user-facing change?
Yes. Now users can access documentation for refresh resources command.

### How was this patch tested?
Manually.

Closes #27023 from iRakson/SPARK-30363.

Authored-by: root1 <raksonrakesh@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2019-12-31 09:36:41 -06:00
Yuanjian Li 2acae975aa [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes
### What changes were proposed in this pull request?
Update the Spark SQL document menu and join strategy hints.

### Why are the changes needed?
- Several new changes in the Spark SQL document didn't change the menu-sql.yaml correspondingly.
- Update the demo code for join strategy hints.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Document change only.

Closes #26917 from xuanyuanking/SPARK-30278.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-27 13:22:26 +08:00
Kazuaki Ishizaki f31d9a629b [MINOR][DOC][SQL][CORE] Fix typo in document and comments
### What changes were proposed in this pull request?

Fixed typo in `docs` directory and in other directories

1. Find typo in `docs` and apply fixes to files in all directories
2. Fix `the the` -> `the`

### Why are the changes needed?

Better readability of documents

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

No test needed

Closes #26976 from kiszk/typo_20191221.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-21 14:08:58 -08:00
Huaxin Gao f0534fb9e5 [SPARK-28816][DOC][SQL] Document ADD JAR statement in SQL Reference
### What changes were proposed in this pull request?
document ADD JAR statement in SQL Reference

### Why are the changes needed?
To complete SQL reference

### Does this PR introduce any user-facing change?
yes

after change:
![image](https://user-images.githubusercontent.com/13592258/66337691-80147780-e8f4-11e9-9d7c-7c1e7ff5379a.png)

![image](https://user-images.githubusercontent.com/13592258/66337704-860a5880-e8f4-11e9-93fa-789695de29d7.png)

![image](https://user-images.githubusercontent.com/13592258/66337721-8b67a300-e8f4-11e9-9056-998187a16c7b.png)

![image](https://user-images.githubusercontent.com/13592258/66337736-928eb100-e8f4-11e9-91c5-d8935a7b93b5.png)

### How was this patch tested?
Tested using jykyll build --serve

Closes #25895 from huaxingao/spark_28816.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-07 13:39:03 -05:00
shivusondur d3eb4c94cc [SPARK-28822][DOC][SQL] Document USE DATABASE in SQL Reference
### What changes were proposed in this pull request?
Added document reference for USE databse sql command

### Why are the changes needed?
For USE database command usage

### Does this PR introduce any user-facing change?
It is adding the USE database sql command refernce information in the doc

### How was this patch tested?
Attached the test snap
![image](https://user-images.githubusercontent.com/7912929/65170499-7242a380-da66-11e9-819c-76df62c86c5a.png)

Closes #25572 from shivusondur/jiraUSEDaBa1.

Lead-authored-by: shivusondur <shivusondur@gmail.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-19 13:04:17 -07:00
HyukjinKwon 7d4eb38bbc [SPARK-29052][DOCS][ML][PYTHON][CORE][R][SQL][SS] Create a Migration Guide tap in Spark documentation
### What changes were proposed in this pull request?

Currently, there is no migration section for PySpark, SparkCore and Structured Streaming.
It is difficult for users to know what to do when they upgrade.

This PR proposes to create create a "Migration Guide" tap at Spark documentation.

![Screen Shot 2019-09-11 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png)

![Screen Shot 2019-09-11 at 7 27 15 PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png)

This page will contain migration guides for Spark SQL, PySpark, SparkR, MLlib, Structured Streaming and Core. Basically it is a refactoring.

There are some new information added, which I will leave a comment inlined for easier review.

1. **MLlib**
  Merge [ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide) and [ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html)

    ```
    'docs/ml-guide.md'
            ↓ Merge new/old migration guides
    'docs/ml-migration-guide.md'
    ```

2. **PySpark**
  Extract PySpark specific items from https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html

    ```
    'docs/sql-migration-guide-upgrade.md'
           ↓ Extract PySpark specific items
    'docs/pyspark-migration-guide.md'
    ```

3. **SparkR**
  Move [sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide) into a separate file, and extract from [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html)

    ```
    'docs/sparkr.md'                     'docs/sql-migration-guide-upgrade.md'
     Move migration guide section ↘     ↙ Extract SparkR specific items
                     docs/sparkr-migration-guide.md
    ```

4. **Core**
  Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.

5. **Structured Streaming**
  Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.

6. **SQL**
  Merged [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) and [sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
    ```
    'docs/sql-migration-guide-hive-compatibility.md'     'docs/sql-migration-guide-upgrade.md'
     Move Hive compatibility section ↘                   ↙ Left over after filtering PySpark and SparkR items
                                  'docs/sql-migration-guide.md'
    ```

### Why are the changes needed?

In order for users in production to effectively migrate to higher versions, and detect behaviour or breaking changes before upgrading and/or migrating.

### Does this PR introduce any user-facing change?
Yes, this changes Spark's documentation at https://spark.apache.org/docs/latest/index.html.

### How was this patch tested?

Manually build the doc. This can be verified as below:

```bash
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```

Closes #25757 from HyukjinKwon/migration-doc.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-15 11:17:30 -07:00
Kevin Yu ee63031270 [SPARK-28828][DOC] Document REFRESH TABLE command
### What changes were proposed in this pull request?
Document REFRESH TABLE statement in the SQL Reference Guide.

### Why are the changes needed?
Currently there is no documentation in the SPARK SQL to describe how to use this command, it is to address this issue.

### Does this PR introduce any user-facing change?
Yes.
#### Before:
There is no documentation for this.

#### After:
<img width="826" alt="Screen Shot 2019-09-12 at 11 39 21 AM" src="https://user-images.githubusercontent.com/7550280/64811385-01752600-d552-11e9-876d-91ebb005b851.png">

### How was this patch tested?
Using jykll build --serve

Closes #25549 from kevinyu98/spark-28828-refreshTable.

Authored-by: Kevin Yu <qyu@us.ibm.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-12 23:00:42 -07:00
Dilip Biswal c839d09789 [SPARK-28773][DOC][SQL] Handling of NULL data in Spark SQL
### What changes were proposed in this pull request?
Document ```NULL``` semantics in SQL Reference Guide.

### Why are the changes needed?
Currently Spark lacks documentation on how `NULL` data is handled in various expressions and operators. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.
**Before:**
There was no documentation for this.

**After.**
<img width="1234" alt="Screen Shot 2019-09-08 at 11 24 41 PM" src="https://user-images.githubusercontent.com/14225158/64507782-83362c80-d290-11e9-8295-70de412ea1f4.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 24 56 PM" src="https://user-images.githubusercontent.com/14225158/64507784-83362c80-d290-11e9-8f85-fbaf6116905f.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 08 PM" src="https://user-images.githubusercontent.com/14225158/64507785-83362c80-d290-11e9-9f9a-1dbafbc33bba.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 24 PM" src="https://user-images.githubusercontent.com/14225158/64507787-83362c80-d290-11e9-99b0-fcaa4a1f9a2d.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 34 PM" src="https://user-images.githubusercontent.com/14225158/64507789-83cec300-d290-11e9-94e7-feb8cf65d7ce.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 25 49 PM" src="https://user-images.githubusercontent.com/14225158/64507790-83cec300-d290-11e9-8c68-d745e7e9e4ca.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 00 PM" src="https://user-images.githubusercontent.com/14225158/64507791-83cec300-d290-11e9-9590-1e4c7ae28dac.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 09 PM" src="https://user-images.githubusercontent.com/14225158/64507792-83cec300-d290-11e9-885a-58752633ee71.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 20 PM" src="https://user-images.githubusercontent.com/14225158/64507793-83cec300-d290-11e9-8af8-9ef17034accb.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 32 PM" src="https://user-images.githubusercontent.com/14225158/64507794-83cec300-d290-11e9-874b-0d419cadbf75.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 47 PM" src="https://user-images.githubusercontent.com/14225158/64507795-84675980-d290-11e9-9ce6-870b46b060bc.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 26 59 PM" src="https://user-images.githubusercontent.com/14225158/64507796-84675980-d290-11e9-91cc-d6ffc5e3374d.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 10 PM" src="https://user-images.githubusercontent.com/14225158/64507797-84675980-d290-11e9-9d36-dcc6b1e75f38.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 18 PM" src="https://user-images.githubusercontent.com/14225158/64507798-84675980-d290-11e9-842c-8d57877b4389.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 27 PM" src="https://user-images.githubusercontent.com/14225158/64507799-84675980-d290-11e9-881d-16a24c6f5acd.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 37 PM" src="https://user-images.githubusercontent.com/14225158/64507801-84675980-d290-11e9-8f52-875a7a3c92c1.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 48 PM" src="https://user-images.githubusercontent.com/14225158/64507802-84675980-d290-11e9-9586-1d66fc07c069.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 27 59 PM" src="https://user-images.githubusercontent.com/14225158/64507804-84fff000-d290-11e9-8378-2d1a6cfa76d2.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 28 08 PM" src="https://user-images.githubusercontent.com/14225158/64507805-84fff000-d290-11e9-81ec-abeec2842922.png">
<img width="1234" alt="Screen Shot 2019-09-08 at 11 28 20 PM" src="https://user-images.githubusercontent.com/14225158/64507806-84fff000-d290-11e9-900f-1debb28f8f93.png">

### How was this patch tested?
Tested using jykyll build --serve

Closes #25726 from dilipbiswal/sql-ref-null-data.

Lead-authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Co-authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-09 13:41:17 -07:00
Dilip Biswal a5df5ff0fd [SPARK-28734][DOC] Initial table of content in the left hand side bar for SQL doc
## What changes were proposed in this pull request?
This is a initial PR that creates the table of content for SQL reference guide. The left side bar will displays additional menu items corresponding to supported SQL constructs. One this PR is merged, we will fill in the content incrementally.  Additionally this PR contains a minor change to make the left sidebar scrollable. Currently it is not possible to scroll in the left hand side window.

## How was this patch tested?
Used jekyll build and serve to verify.

Closes #25459 from dilipbiswal/ref-doc.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-08-18 23:17:50 -07:00
zhengruifeng b29829e2ab [SPARK-25584][ML][DOC] datasource for libsvm user guide
## What changes were proposed in this pull request?
it seems that doc for libsvm datasource is not added in https://github.com/apache/spark/pull/22675.
This pr is to add it.

## How was this patch tested?
doc built locally
![图片](https://user-images.githubusercontent.com/7322292/62044350-4ad51480-b235-11e9-8f09-cbcbe9d3b7f9.png)

Closes #25286 from zhengruifeng/doc_libsvm_data_source.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-01 09:15:42 -05:00
Takeshi Yamamuro 967e4cb011 [SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard
## What changes were proposed in this pull request?
This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`.

NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities.

References:
 - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf
 - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html

## How was this patch tested?
Added tests in `TableIdentifierParserSuite`.

Closes #23259 from maropu/SPARK-26215-WIP.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-02-23 08:38:47 +09:00
WeichenXu 6540c2f8f3 [SPARK-25347][ML][DOC] Spark datasource for image/libsvm user guide
## What changes were proposed in this pull request?

Spark datasource for image/libsvm user guide

## How was this patch tested?
Scala:
<img width="1022" alt="1" src="https://user-images.githubusercontent.com/19235986/47330111-a4f2e900-d6a9-11e8-9a6f-609fb8cd0f8a.png">

Java:
<img width="1019" alt="2" src="https://user-images.githubusercontent.com/19235986/47330114-a9b79d00-d6a9-11e8-97fe-c7e4b8dd5086.png">

Python:
<img width="1022" alt="3" src="https://user-images.githubusercontent.com/19235986/47330120-afad7e00-d6a9-11e8-8a0c-4340c2af727b.png">

R:
<img width="1024" alt="4" src="https://user-images.githubusercontent.com/19235986/47330126-b3410500-d6a9-11e8-9329-5e6217718edd.png">

Closes #22675 from WeichenXu123/add_image_source_doc.

Authored-by: WeichenXu <weichen.xu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-10-25 23:03:16 +08:00
Kazuaki Ishizaki c391dc65ef [SPARK-24499][SQL][DOC][FOLLOW-UP] Fix spelling in doc
## What changes were proposed in this pull request?

This PR replaces `turing` with `tuning` in files and a file name. Currently, in the left side menu, `Turing` is shown. [This page](https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/_site/sql-performance-turing.html) is one of examples.
![image](https://user-images.githubusercontent.com/1315079/47332714-20a96180-d6bb-11e8-9a5a-0a8dad292626.png)

## How was this patch tested?

`grep -rin turing docs` && `find docs -name "*turing*"`

Closes #22800 from kiszk/SPARK-24499-follow.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-10-23 12:19:31 +08:00
Yuanjian Li 987f386588 [SPARK-24499][SQL][DOC] Split the page of sql-programming-guide.html to multiple separate pages
## What changes were proposed in this pull request?

1. Split the main page of sql-programming-guide into 7 parts:

- Getting Started
- Data Sources
- Performance Turing
- Distributed SQL Engine
- PySpark Usage Guide for Pandas with Apache Arrow
- Migration Guide
- Reference

2. Add left menu for sql-programming-guide, keep first level index for each part in the menu.
![image](https://user-images.githubusercontent.com/4833765/47016859-6332e180-d183-11e8-92e8-ce62518a83c4.png)

## How was this patch tested?

Local test with jekyll build/serve.

Closes #22746 from xuanyuanking/SPARK-24499.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2018-10-18 11:59:06 -07:00
Yanbo Liang 697a5e5517 [SPARK-20505][ML] Add docs and examples for ml.stat.Correlation and ml.stat.ChiSquareTest.
## What changes were proposed in this pull request?
Add docs and examples for ```ml.stat.Correlation``` and ```ml.stat.ChiSquareTest```.

## How was this patch tested?
Generate docs and run examples manually, successfully.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #17994 from yanboliang/spark-20505.
2017-05-18 11:54:09 +08:00
Yuhao Yang add9d1bba5 [SPARK-19791][ML] Add doc and example for fpgrowth
## What changes were proposed in this pull request?

Add a new section for fpm
Add Example for FPGrowth in scala and Java

updated: Rewrite transform to be more compact.

## How was this patch tested?

local doc generation.

Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #17130 from hhbyyh/fpmdoc.
2017-04-29 10:51:45 -07:00
Joseph K. Bradley 5ffd5d3838 [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide
## What changes were proposed in this pull request?

Made DataFrame-based API primary
* Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
* mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
* ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
  * **Reviewers: please check this carefully**
* (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles for RDD API have "- RDD-based API" suffix
* Moved migration guide to ml-guide from mllib-guide
  * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
  * **Reviewers**: I did not change any of the content of the migration guides.

Reorganized DataFrame-based guide:
* ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
* Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
  * **Reviewers**: I did not change the content of these guides, except some intro text.
* Sidebar remains the same, but with pipeline and tuning sections added

Other:
* ml-classification-regression.html: Moved text about linear methods to new section in page

## How was this patch tested?

Generated docs locally

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #14213 from jkbradley/ml-guide-2.0.
2016-07-15 13:38:23 -07:00
BenFradet 00c72d27bf [SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative filtering in general
This documents the implementation of ALS in `spark.ml` with example code in scala, java and python.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10411 from BenFradet/SPARK-12247.
2016-02-16 13:03:28 +00:00
Timothy Hunter 2ecbe02d5b [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10234 from thunterdb/12212.
2015-12-10 12:50:46 -08:00
Timothy Hunter 765c67f5f2 [SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide
This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested.

<img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png">

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10207 from thunterdb/spark-8517.
2015-12-08 18:40:21 -08:00
Timothy Hunter fc4b792d28 [SPARK-11835] Adds a sidebar menu to MLlib's documentation
This PR adds a sidebar menu when browsing the user guide of MLlib. It uses a YAML file to describe the structure of the documentation. It should be trivial to adapt this to the other projects.

![screen shot 2015-11-18 at 4 46 12 pm](https://cloud.githubusercontent.com/assets/7594753/11259591/a55173f4-8e17-11e5-9340-0aed79d66262.png)

Author: Timothy Hunter <timhunter@databricks.com>

Closes #9826 from thunterdb/spark-11835.
2015-11-22 21:51:42 -08:00