[SPARK-35755][PYTHON][INFRA] Use higher PyArrow versions in GitHub Actions build

### What changes were proposed in this pull request?

This PR proposes to use higher PyArrow versions, which are more widely used, in the GitHub Actions build.

Without this PR, the testing matrix is as follows:

- (Python 3.8) Use PyArrow **2.x** in [pandas UDF tests in SQL side](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala)
- (Python 3.6) Use PyArrow **2.x** in PySpark tests
- (Python 3.9) Use PyArrow 4.x in PySpark tests (no change)
- (Python 3.6) Use PyArrow **2.x** in PySpark documentation generation (it runs Spark jobs to generate the images used in the PySpark API docs)

After this PR, the testing matrix is as follows:

- (Python 3.8) Use PyArrow **4.x** in [pandas UDF tests in SQL side](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala)
- (Python 3.6) Use PyArrow **3.x** in PySpark tests
- (Python 3.9) Use PyArrow 4.x in PySpark tests (no change)
- (Python 3.6) Use PyArrow **4.x** in PySpark documentation generation (it runs Spark jobs to generate the images used in the PySpark API docs)
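Every job in the diff bounds PyArrow only from above (for example `'pyarrow<5.0.0'`), so pip installs the newest available release under that bound. That is how a `<5.0.0` pin yields the 4.x entries above and a `<4.0.0` pin yields the 3.x one. A minimal sketch of that selection rule follows. The release list `AVAILABLE` and the helper names are illustrative assumptions, versions are assumed to be plain `X.Y.Z` strings, and real pip additionally applies PEP 440 rules, skips pre-releases by default and checks wheel compatibility with the interpreter, none of which this models.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of PyArrow releases, for illustration only (not a PyPI snapshot).
AVAILABLE = ["1.0.1", "2.0.0", "3.0.0", "4.0.0", "4.0.1"]


def parse(version: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def resolve_below(upper: str, available: List[str]) -> Optional[str]:
    """Return the highest version strictly below `upper`, approximating how
    pip picks the newest release that satisfies a '<upper' specifier."""
    bound = parse(upper)
    candidates = [v for v in available if parse(v) < bound]
    return max(candidates, key=parse) if candidates else None


# (before, after) upper bounds, taken from the diff in this commit.
PINS = {
    "pandas UDF tests in SQL (Python 3.8)": ("3.0.0", "5.0.0"),
    "PySpark tests (Python 3.6)": ("3.0.0", "4.0.0"),
    "PySpark docs (Python 3.6)": ("3.0.0", "5.0.0"),
}

for job, (before, after) in PINS.items():
    print(
        f"{job}: pyarrow<{before} -> {resolve_below(before, AVAILABLE)}, "
        f"pyarrow<{after} -> {resolve_below(after, AVAILABLE)}"
    )
```

With this hypothetical release list, every `<3.0.0` pin resolves to 2.0.0, the `<5.0.0` pins resolve to 4.0.1 and the `<4.0.0` pin resolves to 3.0.0, mirroring the move from 2.x to 3.x/4.x in the matrix.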

### Why are the changes needed?

To test against the PyArrow versions that more users actually use.

### Does this PR introduce _any_ user-facing change?

No, dev and testing only.

### How was this patch tested?

GitHub Actions in this PR should test it out.

Closes #32906 from HyukjinKwon/SPARK-35755.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Hyukjin Kwon 2021-06-15 09:59:38 +09:00
parent ef7545b788
commit 2d47fb7683


@@ -126,7 +126,7 @@ jobs:
- name: Install Python packages (Python 3.8)
if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
run: |
- python3.8 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
+ python3.8 -m pip install numpy 'pyarrow<5.0.0' pandas scipy xmlrunner
python3.8 -m pip list
# Run the tests.
- name: Run tests
@@ -217,7 +217,7 @@ jobs:
# Ubuntu 20.04. See also SPARK-33162.
- name: Install Python packages (Python 3.6)
run: |
- python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner 'plotly>=4.8'
+ python3.6 -m pip install numpy 'pyarrow<4.0.0' pandas scipy xmlrunner 'plotly>=4.8'
python3.6 -m pip list
- name: List Python packages (Python 3.9)
run: |
@@ -388,7 +388,7 @@ jobs:
# Jinja2 3.0.0+ causes error when building with Sphinx.
# See also https://issues.apache.org/jira/browse/SPARK-35375.
python3.6 -m pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsphinx numpydoc 'jinja2<3.0.0'
- python3.6 -m pip install sphinx_plotly_directive 'pyarrow<3.0.0' pandas 'plotly>=4.8'
+ python3.6 -m pip install sphinx_plotly_directive 'pyarrow<5.0.0' pandas 'plotly>=4.8'
apt-get update -y
apt-get install -y ruby ruby-dev
Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"