Commit graph

44 commits

Author SHA1 Message Date
Leona Yoda aeb3da2798 [SPARK-36541][DOCS][PYTHON] Replace the word Koalas to pandas-on-Spark
### What changes were proposed in this pull request?

Replace images in the pandas-on-Spark documentation because those images use the word Koalas.

### Why are the changes needed?

Images in the "Transform and apply a function" documentation still use the word Koalas, although the word was replaced with pandas-on-Spark by this PR:
https://github.com/apache/spark/pull/32835

I think the wording in those images should match.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`make html`

Screen shots
![130179112-8485fdde-b422-4834-8b23-fe69e7402118](https://user-images.githubusercontent.com/14937752/130186051-d6ff65f0-c121-40bd-b4f1-2fbc10e76f3e.png)
![130179239-8dae7812-4d81-4f8c-8558-b75e4eae3787](https://user-images.githubusercontent.com/14937752/130186063-17d4a95f-0b9d-49d3-85c7-13ea07e4b6bb.png)
![130179273-10f9fbc3-0a62-4e1a-ab6e-7049d75653a1](https://user-images.githubusercontent.com/14937752/130186074-7d684669-b9ef-4a4e-8a2d-c63bb9800ddb.png)
![130179311-616545af-dde2-4dec-807f-dde0a0d4bfbe](https://user-images.githubusercontent.com/14937752/130186095-20669673-b1d3-4552-97bf-86bbc1a5d43b.png)
Environment
- Windows 10
- Google Chrome 92.0.4515.159

[images.pptx](https://github.com/apache/spark/files/7029087/images.pptx)

Closes #33786 from yoda-mon/replace-pyspark-doc-images.

Authored-by: Leona Yoda <yodal@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-08-26 19:03:02 +09:00
Gengliang Wang df98d5b5f1 [SPARK-34249][DOCS] Add documentation for ANSI implicit cast rules
### What changes were proposed in this pull request?

Add documentation for the ANSI implicit cast rules, which were introduced in https://github.com/apache/spark/pull/31349

### Why are the changes needed?

Better documentation.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Build and preview in local:
![image](https://user-images.githubusercontent.com/1097932/127149039-f0cc4766-8eca-4061-bc35-c8e67f009544.png)
![image](https://user-images.githubusercontent.com/1097932/127149072-1b65ef56-65ff-4327-9a5e-450d44719073.png)

![image](https://user-images.githubusercontent.com/1097932/127033375-b4536854-ca72-42fa-8ea9-dde158264aa5.png)
![image](https://user-images.githubusercontent.com/1097932/126950445-435ba521-92b8-44d1-8f2c-250b9efb4b98.png)
![image](https://user-images.githubusercontent.com/1097932/126950495-9aa4e960-60cd-4b20-88d9-b697ff57a7f7.png)

Closes #33516 from gengliangwang/addDoc.

Lead-authored-by: Gengliang Wang <gengliang@apache.org>
Co-authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-07-27 20:48:49 +08:00
Jungtaek Lim 0eb31a06d6 [SPARK-36172][SS] Document session window into Structured Streaming guide doc
### What changes were proposed in this pull request?

This PR documents the new feature "native support of session window" in the Structured Streaming guide.

Screenshots follow:

![Screenshot 2021-07-20 at 5 04 20 PM](https://user-images.githubusercontent.com/1317309/126284848-526ec056-1028-4a70-a1f4-ae275d4b5437.png)

![Screenshot 2021-07-20 at 3 34 38 PM](https://user-images.githubusercontent.com/1317309/126276763-763cf841-aef7-412a-aa03-d93273f0c850.png)
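
To make the idea of a session window concrete, here is a minimal plain-Python sketch. It is not the Spark API: the function name `sessionize`, the `(key, timestamp)` event shape, and the rule that an event arriving exactly `gap` after the previous one opens a new session are all assumptions made for illustration.

```python
from collections import defaultdict


def sessionize(events, gap):
    """Group (key, timestamp) events into per-key sessions.

    Hypothetical helper: each event opens a window [ts, ts + gap), and
    overlapping windows of the same key are merged into one session.
    Returns (key, session_start, session_end, event_count) tuples.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = []
    for key, stamps in by_key.items():
        stamps.sort()
        start = end = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts < end + gap:
                # The event falls inside the current session, so extend it.
                end = ts
                count += 1
            else:
                # The gap elapsed: close the session and open a new one.
                sessions.append((key, start, end + gap, count))
                start = end = ts
                count = 1
        sessions.append((key, start, end + gap, count))
    return sessions
```

Unlike a fixed-size tumbling or sliding window, the length of each session here depends on the data: a session keeps growing for as long as events keep arriving within `gap` of the previous one.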

### Why are the changes needed?

This change is needed to explain a new feature to the end users.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Documentation changes.

Closes #33433 from HeartSaVioR/SPARK-36172.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
2021-07-21 10:45:31 +09:00
Gabor Somogyi b6a0a7ea53 [SPARK-35311][SS][UI][DOCS] Structured Streaming Web UI state information documentation
### What changes were proposed in this pull request?
In this PR I'm adding Structured Streaming Web UI state information documentation.

### Why are the changes needed?
Missing documentation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
```
cd docs/
SKIP_API=1 bundle exec jekyll build
```
Manual webpage check.

Closes #32433 from gaborgsomogyi/SPARK-35311.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
2021-05-14 10:40:12 +09:00
itholic 9c653c957f [SPARK-32189][DOCS][PYTHON] Development - Setting up IDEs
### What changes were proposed in this pull request?

This PR proposes to document how to set up IDEs.

![Screenshot 2020-09-21 at 10 43 12 AM](https://user-images.githubusercontent.com/44108233/93727715-5c2a6e80-fbf7-11ea-821b-555723b00bc8.png)
![Screenshot 2020-09-21 at 10 43 45 AM](https://user-images.githubusercontent.com/44108233/93727716-5f255f00-fbf7-11ea-9c6c-7b8a973bc511.png)

### Why are the changes needed?

To let users know how to set up IDEs.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new page in the documentation about setting up IDEs.

### How was this patch tested?

Manually built the doc.

Closes #29781 from itholic/SPARK-32189.

Authored-by: itholic <haejoon309@naver.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-09-21 12:29:17 +09:00
HyukjinKwon c336ae39cd [SPARK-32186][DOCS][PYTHON] Development - Debugging
### What changes were proposed in this pull request?

This PR proposes to document how to debug PySpark. The content is largely self-explanatory.

I made a demo site to review it more effectively: https://hyukjin-spark.readthedocs.io/en/stable/development/debugging.html

### Why are the changes needed?

To let users know how to debug PySpark applications.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new page in the documentation about debugging PySpark.

### How was this patch tested?

Manually built the doc.

Closes #29639 from HyukjinKwon/SPARK-32186.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-09-08 10:32:22 +09:00
HyukjinKwon 15b73339d9 [SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation
### What changes were proposed in this pull request?

This PR proposes to write the main page of PySpark documentation. The base work is finished at https://github.com/apache/spark/pull/29188.

### Why are the changes needed?

For better usability and readability in PySpark documentation.

### Does this PR introduce _any_ user-facing change?

Yes, it creates a new main page as below:

![Screen Shot 2020-07-31 at 10 02 44 PM](https://user-images.githubusercontent.com/6477701/89037618-d2d68880-d379-11ea-9a44-562f2aa0e3fd.png)

### How was this patch tested?

Manually built the PySpark documentation.

```bash
cd python
make clean html
```

Closes #29320 from HyukjinKwon/SPARK-32507.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-05 11:14:14 +09:00
HyukjinKwon 6ab29b37cf [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
### What changes were proposed in this pull request?

This PR proposes to redesign the PySpark documentation.

I made a demo site to make it easier to review: https://hyukjin-spark.readthedocs.io/en/stable/reference/index.html.

Here is the initial draft for the final PySpark docs shape: https://hyukjin-spark.readthedocs.io/en/latest/index.html.

In more detail, this PR proposes:
1. Use [pydata_sphinx_theme](https://github.com/pandas-dev/pydata-sphinx-theme) theme - [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/) use this theme. The CSS overwrite is ported from Koalas. The colours in the CSS were actually chosen by designers to use in Spark.
2. Use the Sphinx option to separate `source` and `build` directories as the documentation pages will likely grow.
3. Port current API documentation into the new style. It mimics Koalas and pandas to use the theme most effectively.

    One disadvantage of this approach is that the APIs or classes have to be listed explicitly; however, I think this isn't a big issue in PySpark since we're being conservative about adding APIs. I also intentionally listed only classes instead of functions in ML and MLlib to make it relatively easier to manage.

### Why are the changes needed?

I often hear complaints from users that the current PySpark documentation (https://spark.apache.org/docs/latest/api/python/index.html) is pretty messy to read compared to other projects such as [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/).

It would be nicer if we can make it more organised instead of just listing all classes, methods and attributes to make it easier to navigate.

Also, the documentation has been there from almost the very first version of PySpark. Maybe it's time to update it.

### Does this PR introduce _any_ user-facing change?

Yes, PySpark API documentation will be redesigned.

### How was this patch tested?

Manually tested, and the demo site was made to show.

Closes #29188 from HyukjinKwon/SPARK-32179.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-27 17:49:21 +09:00
Xingcan Cui 8ba2b47737 [SPARK-31792][SS][DOCS] Introduce the structured streaming UI in the Web UI doc
### What changes were proposed in this pull request?
This PR adds the structured streaming UI introduction to the Web UI doc.

![image](https://user-images.githubusercontent.com/1452518/82642209-92b99380-9bdb-11ea-9a0d-cbb26040b0ef.png)

### Why are the changes needed?
The structured streaming web UI introduced before was missing from the Web UI documentation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N.A.

Closes #28609 from xccui/ss-ui-doc.

Authored-by: Xingcan Cui <xccui@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-05-26 14:27:42 +09:00
Luca Canali fd308ade52 [SPARK-30041][SQL][WEBUI] Add Codegen Stage Id to Stage DAG visualization in Web UI
### What changes were proposed in this pull request?
SPARK-29894 provides information on the Codegen Stage Id in WEBUI for SQL Plan graphs. Similarly, this proposes to add Codegen Stage Id in the DAG visualization for Stage execution. DAGs for Stage execution are available in the WEBUI under the Jobs and Stages tabs.

### Why are the changes needed?
This is proposed as an aid for drill-down analysis of complex SQL statement execution, as it is not always easy to match parts of the SQL Plan graph with the corresponding Stage DAG execution graph. Adding Codegen Stage Id for WholeStageCodegen operations makes this task easier.

### Does this PR introduce any user-facing change?
Stage DAG visualization in the WEBUI will show codegen stage id for WholeStageCodegen operations, as in the example snippet from the WEBUI, Jobs tab  (the query used in the example is TPCDS 2.4 q14a):
![](https://issues.apache.org/jira/secure/attachment/12987461/Snippet_StagesDags_with_CodegenId%20_annotated.png)

### How was this patch tested?
Manually tested, see also example snippet.

Closes #26675 from LucaCanali/addCodegenStageIdtoStageGraph.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-18 01:00:45 +08:00
Luca Canali b5df40bd87 [SPARK-29894][SQL][WEBUI] Add Codegen Stage Id to Spark plan graphs in Web UI SQL Tab
### What changes were proposed in this pull request?
The Web UI SQL Tab provides information on the executed SQL using plan graphs and by reporting SQL execution plans. Both sources provide useful information. Physical execution plans report Codegen Stage Ids. This PR adds Codegen Stage Ids to the plan graphs.

### Why are the changes needed?
It is useful to have Codegen Stage Id information also reported in plan graphs. This makes it easier to match physical plans and graphs with metrics when troubleshooting SQL execution.
Example snippet to show the proposed change:

![](https://issues.apache.org/jira/secure/attachment/12985837/snippet__plan_graph_with_Codegen_Stage_Id_Annotated.png)

Example of the current state:
![](https://issues.apache.org/jira/secure/attachment/12985838/snippet_plan_graph_before_patch.png)

Physical plan:
![](https://issues.apache.org/jira/secure/attachment/12985932/Physical_plan_Annotated.png)

### Does this PR introduce any user-facing change?
This PR adds Codegen Stage Id information to SQL plan graphs in the Web UI/SQL Tab.

### How was this patch tested?
Added a test + manually tested

Closes #26519 from LucaCanali/addCodegenStageIdtoWEBUIGraphs.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-20 23:20:33 +08:00
Pablo Langa d334fee502 [SPARK-28373][DOCS][WEBUI] JDBC/ODBC Server Tab
### What changes were proposed in this pull request?
New documentation explaining the JDBC/ODBC Server tab in detail. New images are included for better explanation.

![image](https://user-images.githubusercontent.com/12819544/64735402-c4287e00-d4e8-11e9-9366-c8ac0fbfc058.png)
![image](https://user-images.githubusercontent.com/12819544/64735429-cee31300-d4e8-11e9-83f1-0b662037e194.png)

### Does this PR introduce any user-facing change?
Only documentation

### How was this patch tested?
I have generated it using "jekyll build" to ensure that it's ok

Closes #25718 from planga82/SPARK-28373_JDBCServerPage.

Lead-authored-by: Pablo Langa <soypab@gmail.com>
Co-authored-by: Unknown <soypab@gmail.com>
Co-authored-by: Pablo <soypab@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-09-14 10:18:52 -07:00
Unknown d573e4c482 [SPARK-28542][DOCS][WEBUI] Stages Tab
### What changes were proposed in this pull request?
New documentation explaining the Web UI Stages page in detail. New images are included for better explanation.
![image](https://user-images.githubusercontent.com/12819544/63807320-c05bff80-c91d-11e9-986f-e09d0b8d4bbb.png)
![image](https://user-images.githubusercontent.com/12819544/63807343-cd78ee80-c91d-11e9-9e4a-2cef3ff70577.png)
![image](https://user-images.githubusercontent.com/12819544/63807363-d9fd4700-c91d-11e9-9691-1d39b0e2c69e.png)
![image](https://user-images.githubusercontent.com/12819544/63807384-e41f4580-c91d-11e9-92bd-cb01aced3752.png)

### Does this PR introduce any user-facing change?
Only documentation

### How was this patch tested?
I have generated it using "jekyll build" to ensure that it's ok

Closes #25598 from planga82/feature/SPARK-28542_ImproveWebUIStagesPage.

Lead-authored-by: Unknown <soypab@gmail.com>
Co-authored-by: Pablo <soypab@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-31 13:33:44 -05:00
zhengruifeng 3e7b0e1dd6 [SPARK-28539][WEBUI][DOC] Document Executors page
### What changes were proposed in this pull request?
1. Add a basic doc for the Executors page.
2. Also move the version number in the document of the SQL page outside.

### Why are the changes needed?
Spark web UIs are being used to monitor the status and resource consumption of your Spark applications and clusters. However, we do not have the corresponding document. It is hard for end users to use and understand them.

### Does this PR introduce any user-facing change?
yes, the doc is changed

### How was this patch tested?
Built locally.

<img width="468" alt="image" src="https://user-images.githubusercontent.com/7322292/63758724-d2727980-c8ee-11e9-8380-cbae51453629.png">

Closes #25596 from zhengruifeng/doc_ui_exe.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-28 08:34:24 -05:00
zhengruifeng bdef7125b7 [SPARK-28540][WEBUI] Document Environment page
## What changes were proposed in this pull request?
Document Environment page

## How was this patch tested?
Built locally.

![image](https://user-images.githubusercontent.com/7322292/63237759-e3c7e000-c275-11e9-8e1f-57ed1b0e86e8.png)

Closes #25430 from zhengruifeng/doc_ui_conf.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-21 10:48:48 -05:00
zhengruifeng c4257b18a1 [SPARK-28541][WEBUI] Document Storage page
## What changes were proposed in this pull request?
add an example for storage tab

## How was this patch tested?
Built locally.

Closes #25445 from zhengruifeng/doc_ui_storage.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-20 20:05:13 -05:00
Unknown 3f35440304 [SPARK-28543][DOCS][WEBUI] Document Spark Jobs page
## What changes were proposed in this pull request?

New documentation explaining the Web UI Jobs page in detail and linking it to the monitoring page. New images are included for better explanation.

![image](https://user-images.githubusercontent.com/12819544/62898145-2741bc00-bd55-11e9-89f7-175a4fd81009.png)
![image](https://user-images.githubusercontent.com/12819544/62898187-39235f00-bd55-11e9-9f03-a4d179e197fe.png)

## How was this patch tested?

This pull request contains only documentation. I have generated it using "jekyll build" to ensure that it's ok

Closes #25424 from planga82/feature/SPARK-28543_ImproveWebUIDocs.

Lead-authored-by: Unknown <soypab@gmail.com>
Co-authored-by: Pablo <soypab@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-15 08:52:23 -05:00
zhengruifeng ae4edd5489 [SPARK-28538][UI] Document SQL page
## What changes were proposed in this pull request?
1. Add a basic doc for each page.
2. Document the SQL page with an example.

## How was this patch tested?
locally built

![image](https://user-images.githubusercontent.com/7322292/62421626-86f5f280-b6d7-11e9-8057-8be3a4afb611.png)

![image](https://user-images.githubusercontent.com/7322292/62421634-9d9c4980-b6d7-11e9-8e31-1e6ba9b402e8.png)

Closes #25349 from zhengruifeng/doc_ui_sql.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-12 08:36:01 -05:00
foxish 7ab165b706 [SPARK-22648][K8S] Spark on Kubernetes - Documentation
What changes were proposed in this pull request?

This PR contains documentation on the usage of Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build docker images required to use the integration. The changes detailed here are covered by https://github.com/apache/spark/pull/19717 and https://github.com/apache/spark/pull/19468 which have merged already.

How was this patch tested?
The script has been in use for releases on our fork. Rest is documentation.

cc rxin mateiz (shepherd)
k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko
reviewers: vanzin felixcheung jiangxb1987 mridulm

TODO:
- [x] Add dockerfiles directory to built distribution. (https://github.com/apache/spark/pull/20007)
- [x] Change references to docker to instead say "container" (https://github.com/apache/spark/pull/19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int (#20032)

Author: foxish <ramanathana@google.com>

Closes #19946 from foxish/update-k8s-docs.
2017-12-21 17:21:11 -08:00
Tathagata Das b59cddaba0 [SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options
## What changes were proposed in this pull request?

Updates
- Updated Late Data Handling section by adding a figure for Update Mode. It's more intuitive to explain late data handling with Update Mode, so I added the new figure before the Append Mode figure.
- Updated Output Modes section with Update mode
- Added options for all the sources and sinks

---------------------------
---------------------------

![image](https://cloud.githubusercontent.com/assets/663212/21665176/f150b224-d29f-11e6-8372-14d32da21db9.png)

---------------------------
---------------------------
<img width="931" alt="screen shot 2017-01-03 at 6 09 11 pm" src="https://cloud.githubusercontent.com/assets/663212/21629740/d21c9bb8-d1df-11e6-915b-488a59589fa6.png">
<img width="933" alt="screen shot 2017-01-03 at 6 10 00 pm" src="https://cloud.githubusercontent.com/assets/663212/21629749/e22bdabe-d1df-11e6-86d3-7e51d2f28dbc.png">

---------------------------
---------------------------
![image](https://cloud.githubusercontent.com/assets/663212/21665200/108e18fc-d2a0-11e6-8640-af598cab090b.png)
![image](https://cloud.githubusercontent.com/assets/663212/21665148/cfe414fa-d29f-11e6-9baa-4124ccbab093.png)
![image](https://cloud.githubusercontent.com/assets/663212/21665226/2e8f39e4-d2a0-11e6-85b1-7657e2df5491.png)

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16468 from tdas/SPARK-19074.
2017-01-06 11:29:01 -08:00
Tathagata Das 092c6725bf [SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status
## What changes were proposed in this pull request?

- Extended the Window operation section with code snippet and explanation of watermarking
- Extended the Output Mode section with a table showing the compatibility between query type and output mode
- Rewrote the Monitoring section with updated jsons generated by StreamingQuery.progress/status
- Updated API changes in the StreamingQueryListener example
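
As a rough illustration of the watermarking idea mentioned in the first bullet, the sketch below tracks the maximum event time seen so far and treats anything older than `max_event_time - delay` as too late. This is a simplified plain-Python model, not Spark code: the name `apply_watermark` is hypothetical, and updating the watermark after every single event (rather than at trigger boundaries) is an assumption made to keep the example short.

```python
def apply_watermark(event_times, delay):
    """Split event times, given in arrival order, into accepted and dropped.

    Hypothetical helper: the watermark is (max event time seen) - delay,
    and an event older than the current watermark is considered too late.
    """
    max_event_time = None
    accepted, dropped = [], []
    for ts in event_times:
        watermark = None if max_event_time is None else max_event_time - delay
        if watermark is not None and ts < watermark:
            # Older than the watermark: too late to be counted.
            dropped.append(ts)
        else:
            accepted.append(ts)
            if max_event_time is None or ts > max_event_time:
                # Only accepted events advance the maximum event time.
                max_event_time = ts
    return accepted, dropped
```

A larger `delay` tolerates more out-of-order data at the cost of keeping state around for longer; a smaller one frees state sooner but drops more late events.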

TODO
- [x] Figure showing the watermarking

## How was this patch tested?

N/A

## Screenshots
### Section: Windowed Aggregation with Event Time

<img width="927" alt="screen shot 2016-12-15 at 3 33 10 pm" src="https://cloud.githubusercontent.com/assets/663212/21246197/0e02cb1a-c2dc-11e6-8816-0cd28d8201d7.png">

![image](https://cloud.githubusercontent.com/assets/663212/21246241/45b0f87a-c2dc-11e6-9c29-d0a89e07bf8d.png)

<img width="929" alt="screen shot 2016-12-15 at 3 33 46 pm" src="https://cloud.githubusercontent.com/assets/663212/21246202/1652cefa-c2dc-11e6-8c64-3c05977fb3fc.png">

----------------------------
### Section: Output Modes
![image](https://cloud.githubusercontent.com/assets/663212/21246276/8ee44948-c2dc-11e6-9fa2-30502fcf9a55.png)

----------------------------
### Section: Monitoring
![image](https://cloud.githubusercontent.com/assets/663212/21246535/3c5baeb2-c2de-11e6-88cd-ca71db7c5cf9.png)
![image](https://cloud.githubusercontent.com/assets/663212/21246574/789492c2-c2de-11e6-8471-7bef884e1837.png)

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16294 from tdas/SPARK-18669.
2016-12-28 12:11:25 -08:00
Sean Owen 18fb57f58a [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure
## What changes were proposed in this pull request?

Coincidentally, I discovered that a couple images were unused in `docs/`, and then searched and found more, and then realized some PNGs were pretty big and could be crushed, and before I knew it, had done the same for the ASF site (not committed yet).

No functional change at all, just less superfluous image data.

## How was this patch tested?

`jekyll serve`

Author: Sean Owen <sowen@cloudera.com>

Closes #14029 from srowen/RemoveCompressImages.
2016-07-04 09:21:58 +01:00
Tathagata Das 5d00a7bc19 [SPARK-16256][DOCS] Fix window operation diagram
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #14001 from tdas/SPARK-16256-2.
2016-06-30 14:01:34 -07:00
Tathagata Das 64132a14fb [SPARK-16256][SQL][STREAMING] Added Structured Streaming Programming Guide
Title defines all.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13945 from tdas/SPARK-16256.
2016-06-29 11:45:57 -07:00
Sean Owen 3761330dd0 [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"
## What changes were proposed in this pull request?

Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files.

## How was this patch tested?

Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used.

Author: Sean Owen <sowen@cloudera.com>

Closes #13609 from srowen/SPARK-15879.
2016-06-11 12:46:07 +01:00
Jacek Laskowski 8df8a81825 [DOCS][MINOR] Screenshot + minor fixes to improve reading for accumulators
## What changes were proposed in this pull request?

Added screenshot + minor fixes to improve reading

## How was this patch tested?

Manual

Author: Jacek Laskowski <jacek@japila.pl>

Closes #12569 from jaceklaskowski/docs-accumulators.
2016-04-24 10:36:33 +01:00
Peter Parente b9c51c0493 [SPARK-6343] Doc driver-worker network reqs
Attempt at making the driver-worker networking requirement more explicit and up-front in the documentation (see https://issues.apache.org/jira/browse/SPARK-6343).

Update cluster overview diagram to show connections from workers to driver. Add a bullet below about how driver listens / accepts connections from workers.

Author: Peter Parente <pparent@us.ibm.com>

Closes #5382 from parente/SPARK-6343 and squashes the following commits:

0b2fb9d [Peter Parente] [SPARK-6343] Doc driver-worker network reqs
2015-04-09 06:37:20 -04:00
Xiangrui Meng d12d2ad76e [SPARK-5879][MLLIB] update PIC user guide and add a Java example
Updated PIC user guide to reflect API changes and added a simple Java example. The API is still not very Java-friendly. I created SPARK-5990 for this issue.

Author: Xiangrui Meng <meng@databricks.com>

Closes #4680 from mengxr/SPARK-5897 and squashes the following commits:

847d216 [Xiangrui Meng] apache header
87719a2 [Xiangrui Meng] remove PIC image
2dd921f [Xiangrui Meng] update PIC user guide and add a Java example
2015-02-18 16:29:32 -08:00
sboeschhuawei f377431a57 [SPARK-4259][MLlib]: Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
Add single pseudo-eigenvector PIC
Includes documentation and an updated pom.xml, along with the following code:
mllib/src/main/scala/org/apache/spark/mllib/clustering/PIClustering.scala
mllib/src/test/scala/org/apache/spark/mllib/clustering/PIClusteringSuite.scala
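
For readers unfamiliar with the technique named in the title, here is a small plain-Python sketch of a Gaussian similarity function and the row-normalised power iteration that Power Iteration Clustering builds on. It is not the Scala code added by this commit: the function names, the `sigma` default, and the fixed number of steps are assumptions, and the final k-means step that turns the resulting vector into cluster labels is omitted.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian (RBF) similarity: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def affinity_matrix(points, sigma=1.0):
    """Pairwise similarities, with zeros on the diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else gaussian_similarity(points[i], points[j], sigma)
         for j in range(n)]
        for i in range(n)
    ]


def power_iterate(affinity, steps=10):
    """Run `steps` rounds of v <- W v / ||W v||_1, where W = D^-1 A.

    Assumes every row of `affinity` has a positive sum.
    """
    row_sums = [sum(row) for row in affinity]
    total = sum(row_sums)
    # Row-normalise the affinity matrix.
    w = [[a / s for a in row] for row, s in zip(affinity, row_sums)]
    # Degree-based starting vector; a uniform one would be a fixed point.
    v = [s / total for s in row_sums]
    for _ in range(steps):
        v = [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]
    return v
```

The iteration is stopped early on purpose: run to convergence, the vector would become uniform and carry no cluster information, so the idea is to cluster the entries of an intermediate vector instead.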

Author: sboeschhuawei <stephen.boesch@huawei.com>
Author: Fan Jiang <fanjiang.sc@huawei.com>
Author: Jiang Fan <fjiang6@gmail.com>
Author: Stephen Boesch <stephen.boesch@huawei.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #4254 from fjiang6/PIC and squashes the following commits:

4550850 [sboeschhuawei] Removed pic test data
f292f31 [Stephen Boesch] Merge pull request #44 from mengxr/SPARK-4259
4b78aaf [Xiangrui Meng] refactor PIC
24fbf52 [sboeschhuawei] Updated API to be similar to KMeans plus other changes requested by Xiangrui on the PR
c12dfc8 [sboeschhuawei] Removed examples files and added pic_data.txt. Revamped testcases yet to come
92d4752 [sboeschhuawei] Move the Gaussian/Affinity matrix calcs out of PIC. Presently in the test suite
7ebd149 [sboeschhuawei] Incorporate Xiangrui's first set of PR comments except restructure PIC.run to take Graph but do not remove Gaussian
121e4d5 [sboeschhuawei] Remove unused testing data files
1c3a62e [sboeschhuawei] removed matplot.py and reordered all private methods to bottom of PIC
218a49d [sboeschhuawei] Applied Xiangrui's comments - especially removing RDD/PICLinalg classes and making noncritical methods private
43ab10b [sboeschhuawei] Change last two println's to log4j logger
88aacc8 [sboeschhuawei] Add assert to testcase on cluster sizes
24f438e [sboeschhuawei] fixed incorrect markdown in clustering doc
060e6bf [sboeschhuawei] Added link to PIC doc from the main clustering md doc
be659e3 [sboeschhuawei] Added mllib specific log4j
90e7fa4 [sboeschhuawei] Converted from custom Linalg routines to Breeze: added JavaDoc comments; added Markdown documentation
bea48ea [sboeschhuawei] Converted custom Linear Algebra datatypes/routines to use Breeze.
b29c0db [Fan Jiang] Update PIClustering.scala
ace9749 [Fan Jiang] Update PIClustering.scala
a112f38 [sboeschhuawei] Added graphx main and test jars as dependencies to mllib/pom.xml
f656c34 [sboeschhuawei] Added iris dataset
b7dbcbe [sboeschhuawei] Added axes and combined into single plot for matplotlib
a2b1e57 [sboeschhuawei] Revert inadvertent update to KMeans
9294263 [sboeschhuawei] Added visualization/plotting of input/output data
e5df2b8 [sboeschhuawei] First end to end working PIC
0700335 [sboeschhuawei] First end to end working version: but has bad performance issue
32a90dc [sboeschhuawei] Update circles test data values
0ef163f [sboeschhuawei] Added ConcentricCircles data generation and KMeans clustering
3fd5bc8 [sboeschhuawei] PIClustering is running in new branch (up to the pseudo-eigenvector convergence step)
d5aae20 [Jiang Fan] Adding Power Iteration Clustering and Suite test
a3c5fbe [Jiang Fan] Adding Power Iteration Clustering
2015-01-30 14:09:49 -08:00
Joseph K. Bradley 469a6e5f3b [SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes
Documentation:
* Added ml-guide.md, linked from mllib-guide.md
* Updated mllib-guide.md with small section pointing to ml-guide.md

Examples:
* CrossValidatorExample
* SimpleParamsExample
* (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md)

Bug fixes:
* PipelineModel: did not use ParamMaps correctly
* UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!)

CC: mengxr shivaram  etrain  Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: jkbradley <joseph.kurata.bradley@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #3588 from jkbradley/ml-package-docs and squashes the following commits:

d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit).  updated examples for CV and Params for spark.ml
c38469c [Joseph K. Bradley] Updated ml-guide with CV examples
99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params.  Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold.
ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs
3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype
41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version.  CrossValidatorExample not working yet.  Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 17:00:06 +08:00
Reynold Xin 28fdc6f682 [Doc][GraphX] Remove unused png files. 2014-11-21 00:30:58 -08:00
Tathagata Das baff7e9361 [SPARK-2419][Streaming][Docs] More updates to the streaming programming guide
- Improvements to the kinesis integration guide from @cfregly
- More information about unified input dstreams in main guide

Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Chris Fregly <chris@fregly.com>

Closes #2307 from tdas/streaming-doc-fix1 and squashes the following commits:

ec40b5d [Tathagata Das] Updated figure with kinesis
fdb9c5e [Tathagata Das] Fixed style issues with kinesis guide
036d219 [Chris Fregly] updated kinesis docs and added an arch diagram
24f622a [Tathagata Das] More modifications.
2014-09-06 14:46:43 -07:00
Tathagata Das 7930209614 Merge pull request #497 from tdas/docs-update
Updated Spark Streaming Programming Guide

Here is the updated version of the Spark Streaming Programming Guide. This is still a work in progress, but the major changes are in place. So feedback is most welcome.

In general, I have tried to make the guide easier to understand even if the reader does not know much about Spark. The updated website is hosted here -

http://www.eecs.berkeley.edu/~tdas/spark_docs/streaming-programming-guide.html

The major changes are:
- Overview illustrates the use cases of Spark Streaming - various input sources and various output sources
- An example right after the overview to quickly give an idea of what a Spark Streaming program looks like
- Made the Java API and examples first-class citizens like Scala by using tabs to show both Scala and Java examples (similar to AMPCamp tutorial's code tabs)
- Highlighted the DStream operations updateStateByKey and transform because of their powerful nature
- Updated driver node failure recovery text to highlight automatic recovery in Spark standalone mode
- Added information about linking and using the external input sources like Kafka and Flume
- In general, reorganized the sections to better show the Basic section and the more advanced sections like Tuning and Recovery.

Todos:
- Links to the docs of external Kafka, Flume, etc
- Illustrate window operation with figure as well as example.

Author: Tathagata Das <tathagata.das1565@gmail.com>

== Merge branch commits ==

commit 18ff10556570b39d672beeb0a32075215cfcc944
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Tue Jan 28 21:49:30 2014 -0800

    Fixed a lot of broken links.

commit 34a5a6008dac2e107624c7ff0db0824ee5bae45f
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Tue Jan 28 18:02:28 2014 -0800

    Updated github url to use SPARK_GITHUB_URL variable.

commit f338a60ae8069e0a382d2cb170227e5757cc0b7a
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Mon Jan 27 22:42:42 2014 -0800

    More updates based on Patrick and Harvey's comments.

commit 89a81ff25726bf6d26163e0dd938290a79582c0f
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Mon Jan 27 13:08:34 2014 -0800

    Updated docs based on Patrick's PR comments.

commit d5b6196b532b5746e019b959a79ea0cc013a8fc3
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Sun Jan 26 20:15:58 2014 -0800

    Added spark.streaming.unpersist config and info on StreamingListener interface.

commit e3dcb46ab83d7071f611d9b5008ba6bc16c9f951
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Sun Jan 26 18:41:12 2014 -0800

    Fixed docs on StreamingContext.getOrCreate.

commit 6c29524639463f11eec721e4d17a9d7159f2944b
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Thu Jan 23 18:49:39 2014 -0800

    Added example and figure for window operations, and links to Kafka and Flume API docs.

commit f06b964a51bb3b21cde2ff8bdea7d9785f6ce3a9
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Wed Jan 22 22:49:12 2014 -0800

    Fixed missing endhighlight tag in the MLlib guide.

commit 036a7d46187ea3f2a0fb8349ef78f10d6c0b43a9
Merge: eab351d a1cd185
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Wed Jan 22 22:17:42 2014 -0800

    Merge remote-tracking branch 'apache/master' into docs-update

commit eab351d05c0baef1d4b549e1581310087158d78d
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   Wed Jan 22 22:17:15 2014 -0800

    Update Spark Streaming Programming Guide.
2014-01-28 21:51:05 -08:00
Joseph E. Gonzalez 64c4593586 Finished documenting join operators and revised some of the initial presentation. 2014-01-11 13:48:35 -08:00
Joseph E. Gonzalez b8a44f12a5 More edits. 2014-01-10 23:52:24 -08:00
Joseph E. Gonzalez b1eeefb401 WIP. Updating figures and cleaning up initial skeleton for GraphX Programming guide. 2014-01-10 00:39:08 -08:00
Joseph E. Gonzalez 41b3122120 Starting to improve README. 2013-10-29 20:57:55 -07:00
Matei Zaharia 5a587fb98d Updated cluster diagram to show caches 2013-09-08 13:51:57 -07:00
Matei Zaharia f261d2a60f Added cluster overview doc, made logo higher-resolution, and added more
details on monitoring
2013-09-08 00:29:11 -07:00
Matei Zaharia f3a964848d More doc improvements + better warnings when you haven't built Spark 2013-08-30 12:41:25 -07:00
Matei Zaharia f1246cc7c1 Various enhancements to the programming guide and HTML/CSS 2012-09-25 23:26:56 -07:00
Andy Konwinski 5ec7a6665b Crisper logo created from vector source (ai) and disabled
responsive css (so nav menu doesn't switch to collapsed version
for narrow viewports).
2012-09-13 15:27:33 -07:00
Andy Konwinski b0207e2bfd Replaces "Spark" word in nav bar with logo. 2012-09-13 12:08:12 -07:00
Andy Konwinski 16da942d66 Adding docs directory containing documentation currently on the wiki
which can be compiled via jekyll, using the command `jekyll`. To compile
and run a local webserver to serve the doc as a website, run
`jekyll --server`.
2012-09-12 13:03:43 -07:00