ea9e8f365a
### What changes were proposed in this pull request? This PR aims to upgrade PySpark's embedded cloudpickle to the latest cloudpickle v1.5.0 (See https://github.com/cloudpipe/cloudpickle/blob/v1.5.0/cloudpickle/cloudpickle.py) ### Why are the changes needed? There are many bug fixes. For example, the bug described in the JIRA: dill unpickling fails because they define `types.ClassType`, which is undefined in dill. This results in the following error: ``` Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 279, in loads return dill.loads(s) File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads return load(file, ignore) File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load obj = pik.load() File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type return _reverse_typemap[name] KeyError: 'ClassType' ``` See also https://github.com/cloudpipe/cloudpickle/issues/82. This was fixed for cloudpickle 1.3.0+ (https://github.com/cloudpipe/cloudpickle/pull/337), but PySpark's cloudpickle.py doesn't have this change yet. More notably, now it supports C pickle implementation with Python 3.8 which hugely improve performance. This is already adopted in another project such as Ray. ### Does this PR introduce _any_ user-facing change? Yes, as described above, the bug fixes. Internally, users also could leverage the fast cloudpickle backed by C pickle. ### How was this patch tested? Jenkins will test it out. Closes #29114 from HyukjinKwon/SPARK-32094. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org> |
||
---|---|---|
.. | ||
create-release | ||
deps | ||
sparktestsupport | ||
tests | ||
.gitignore | ||
.rat-excludes | ||
.scalafmt.conf | ||
appveyor-guide.md | ||
appveyor-install-dependencies.ps1 | ||
change-scala-version.sh | ||
check-license | ||
checkstyle-suppressions.xml | ||
checkstyle.xml | ||
github_jira_sync.py | ||
lint-java | ||
lint-python | ||
lint-r | ||
lint-r.R | ||
lint-scala | ||
make-distribution.sh | ||
merge_spark_pr.py | ||
mima | ||
pip-sanity-check.py | ||
README.md | ||
requirements.txt | ||
run-pip-tests | ||
run-tests | ||
run-tests-jenkins | ||
run-tests-jenkins.py | ||
run-tests.py | ||
sbt-checkstyle | ||
scalafmt | ||
scalastyle | ||
test-dependencies.sh | ||
tox.ini |
Spark Developer Scripts
This directory contains scripts useful to developers when packaging, testing, or committing to Spark.
Many of these scripts require Apache credentials to work correctly.