From 7e1651e3152c1af59393cb1ca16701fc2500e181 Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun
Date: Tue, 19 Jan 2021 19:09:14 -0800
Subject: [PATCH] [SPARK-34162][DOCS][PYSPARK] Add PyArrow compatibility note
 for Python 3.9

### What changes were proposed in this pull request?

This PR aims to add a note about the Apache Arrow project's `PyArrow` compatibility with Python 3.9.

### Why are the changes needed?

Although the Apache Spark documentation claims `Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+.`, Apache Arrow's `PyArrow` is not yet compatible with Python 3.9.x. Without the `PyArrow` library installed, the PySpark unit tests pass without any problem, so it is enough to add a note about this limitation together with a link to the compatibility page on the Apache Arrow website:

- https://arrow.apache.org/docs/python/install.html#python-compatibility

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

**BEFORE**

(screenshot: Screen Shot 2021-01-19 at 1 45 07 PM)

**AFTER**

(screenshot: Screen Shot 2021-01-19 at 7 06 41 PM)

Closes #31251 from dongjoon-hyun/SPARK-34162.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 docs/index.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/index.md b/docs/index.md
index c4c2d722f9..84f760fe67 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -50,6 +50,7 @@ For the Scala API, Spark {{site.SPARK_VERSION}}
 uses Scala {{site.SCALA_BINARY_VERSION}}. You will need to use a compatible Scala version
 ({{site.SCALA_BINARY_VERSION}}.x).
 
+For Python 3.9, Arrow optimization and pandas UDFs might not work due to the supported Python versions in Apache Arrow. Please refer to the latest [Python Compatibility](https://arrow.apache.org/docs/python/install.html#python-compatibility) page.
 For Java 11, `-Dio.netty.tryReflectionSetAccessible=true` is required additionally for Apache Arrow library. This prevents `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available` when Apache Arrow uses Netty internally.
 
 # Running the Examples and Shell
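
For context (this example is not part of the patch itself): a minimal sketch of the kind of PySpark code the new documentation note is about. pandas UDFs exchange data between the JVM and Python via Apache Arrow, so they require a `PyArrow` release that supports the running interpreter; the UDF name `plus_one` is purely illustrative.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.master("local[1]").getOrCreate()

@pandas_udf("long")  # pandas UDFs serialize data via Apache Arrow (PyArrow)
def plus_one(v: pd.Series) -> pd.Series:
    return v + 1

# On Python 3.9 without a compatible PyArrow release, evaluating the UDF
# raises an error indicating that PyArrow must be installed, which is why
# the docs now point readers to Arrow's Python Compatibility page.
spark.range(3).select(plus_one("id")).show()
```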