
HyukjinKwon b54103016a [SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation

### What changes were proposed in this pull request?

This PR proposes to:

- add a notebook with a Binder integration which allows users to try PySpark in a live notebook. Please [try this here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb).
- reuse this notebook as a quickstart guide in PySpark documentation.

Note that Binder turns a Git repo into a collection of interactive notebooks. It works based on a Docker image. Once somebody builds it, other people can reuse the image against a specific commit. Therefore, if we run Binder with the images based on released tags in Spark, virtually all users can instantly launch the Jupyter notebooks.

I made a simple demo to make it easier to review. Please see:

- [Main page](https://hyukjin-spark.readthedocs.io/en/stable/). Note that the link ("Live Notebook") in the main page wouldn't work since this PR is not merged yet.
- [Quickstart page](https://hyukjin-spark.readthedocs.io/en/stable/getting_started/quickstart.html)

When reviewing the notebook file itself, please give me direct feedback, which I will appreciate and address. Another way might be:

- open [here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb).
- edit / change / update the notebook. Please feel free to change whatever you want. I can apply the changes as they are or update them slightly when I apply them to this PR.
- download it as a `.ipynb` file: ![Screen Shot 2020-08-20 at 10 12 19 PM](https://user-images.githubusercontent.com/6477701/90774311-3e38c800-e332-11ea-8476-699a653984db.png)
- upload the `.ipynb` file here in a GitHub comment. Then, I will push a commit with that file, crediting it correctly, of course.
- alternatively, push a commit into this PR right away if that's easier for you (if you're a committer).

References:

- https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
- https://databricks.com/jp/blog/2020/03/31/10-minutes-from-pandas-to-koalas-on-apache-spark.html - my own blog post .. :-) and https://koalas.readthedocs.io/en/latest/getting_started/10min.html

### Why are the changes needed?

To improve PySpark's usability. The current quickstart for Python users is very friendly.

### Does this PR introduce _any_ user-facing change?

Yes, it will add a documentation page, and expose a live notebook to PySpark users.

### How was this patch tested?

Manually tested, and GitHub Actions builds will test.

Closes #29491 from HyukjinKwon/SPARK-32204.

Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

2020-08-26 12:23:24 +09:00
create-release	[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation	2020-08-26 12:23:24 +09:00
deps	[SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV	2020-08-26 00:25:58 +09:00
sparktestsupport	[SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature	2020-07-15 11:40:55 -05:00
tests	[MINOR] Fix typos in dev/* scripts.	2018-01-31 07:37:25 +09:00
.gitignore	[SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file.	2018-01-31 00:51:00 +09:00
.rat-excludes	[SPARK-23431][CORE] Expose stage level peak executor metrics via REST API	2020-08-04 21:11:00 +08:00
.scalafmt.conf	[SPARK-26177] Config change followup to [] Automated formatting for Scala code	2018-12-03 10:03:51 -06:00
appveyor-guide.md	[SPARK-26918][DOCS] All .md should have ASF license header	2019-03-30 19:49:45 -05:00
appveyor-install-dependencies.ps1	[SPARK-32231][R][INFRA] Use Hadoop 3.2 winutils in AppVeyor build	2020-07-09 17:18:39 +09:00
change-scala-version.sh	[SPARK-30012][CORE][SQL] Change classes extending scala collection classes to work with 2.13	2019-12-03 08:59:43 -08:00
check-license	[MINOR][BUILD] Upgrade apache-rat to 0.13	2019-04-01 16:44:42 +09:00
checkstyle-suppressions.xml	[SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+	2019-11-03 15:13:06 -08:00
checkstyle.xml	[MINOR] Fix google style guide address	2019-12-12 11:04:01 -06:00
github_jira_sync.py	[SPARK-29802][BUILD] Use python3 in build scripts	2020-07-19 11:02:37 +09:00
lint-java	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
lint-python	[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation	2020-08-26 12:23:24 +09:00
lint-r	[SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors	2019-11-17 10:09:46 -08:00
lint-r.R	[MINOR][R] small tidying of sh scripts for R	2020-04-30 16:58:05 -07:00
lint-scala	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
make-distribution.sh	[SPARK-31041][BUILD] Show Maven errors from within make-distribution.sh	2020-03-11 08:22:02 -05:00
merge_spark_pr.py	[SPARK-29802][BUILD] Use python3 in build scripts	2020-07-19 11:02:37 +09:00
mima	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
pip-sanity-check.py	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00
README.md	Merge pull request #565 from pwendell/dev-scripts. Closes #565 .	2014-02-08 23:13:34 -08:00
requirements.txt	[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation	2020-08-26 12:23:24 +09:00
run-pip-tests	[SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test	2020-07-25 13:09:23 +09:00
run-tests	[SPARK-29672][PYSPARK] update spark testing framework to use python3	2019-11-14 10:18:55 -08:00
run-tests-jenkins	[SPARK-29672][PYSPARK] update spark testing framework to use python3	2019-11-14 10:18:55 -08:00
run-tests-jenkins.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
run-tests.py	[SPARK-32682][INFRA] Use workflow_dispatch to enable manual test triggers	2020-08-21 21:23:41 +09:00
sbt-checkstyle	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
scalafmt	[SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature	2020-01-23 12:44:43 -08:00
scalastyle	Revert "[SPARK-30534][INFRA] Use mvn in `dev/scalastyle`"	2020-01-21 18:23:03 +09:00
test-dependencies.sh	[SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES	2020-07-17 11:59:19 -05:00
tox.ini	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00

README.md

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.