be9089731a
### What changes were proposed in this pull request?
This PR proposes to:
- fix the Binder integration of pandas API on Spark, and merge it with the existing PySpark one.
- update the quickstart of pandas API on Spark, and make it work.
The notebooks can be easily reviewed here:
https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-35588-3?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb
Original page in Koalas: https://koalas.readthedocs.io/en/latest/getting_started/10min.html
### Why are the changes needed?
- To show working examples of the quickstart to end users.
- To allow users to easily try out the examples without installation.
### Does this PR introduce _any_ user-facing change?
No for end users, because the existing quickstart of pandas API on Spark has not been released yet.
### How was this patch tested?
I manually tested it by uploading a built Spark distribution to Binder. See 3bc15310a0
Closes #33041 from HyukjinKwon/SPARK-35588-2.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
25 lines
1 KiB
Bash
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This file is used for Binder integration to install PySpark available in
# Jupyter notebook.

VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"
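For context, the `VERSION=$(python -c ...)` step in the script reads the PySpark version by exec-ing `python/pyspark/version.py` and printing the `__version__` it defines. A minimal, self-contained Python sketch of the same trick (using a temporary stand-in for `version.py`, since the real path only exists inside a Spark checkout):

```python
import os
import tempfile

# Create a stand-in version.py resembling python/pyspark/version.py
# in the Spark repo (which defines __version__ at module level).
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "version.py")
    with open(path, "w") as f:
        f.write('__version__ = "3.2.0.dev0"\n')

    # Same approach as the shell one-liner in the script:
    # exec the file's source and read __version__ from the namespace.
    ns = {}
    with open(path) as f:
        exec(f.read(), ns)
    version = ns["__version__"]
    print(version)  # 3.2.0.dev0
```

This avoids importing `pyspark` itself (which may not be importable before installation), which is presumably why the script execs the version file directly instead of doing `import pyspark`.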