Bump required Py4J version and add test for large broadcast variables.

Josh Rosen 2012-10-28 16:46:31 -07:00
parent d4f2e5b0ef
commit 7859879aaa
3 changed files with 4 additions and 2 deletions


@@ -36,7 +36,7 @@ examples.
PySpark requires a development version of Py4J, a Python library for
interacting with Java processes. It can be installed from
https://github.com/bartdag/py4j; make sure to install a version that
-contains at least the commits through 3dbf380d3d.
+contains at least the commits through b7924aabe9.
PySpark uses the `PYTHONPATH` environment variable to search for Python
classes; Py4J should be on this path, along with any libraries used by


@@ -13,6 +13,8 @@
>>> sc.parallelize([0, 0]).flatMap(lambda x: b.value).collect()
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
+>>> large_broadcast = sc.broadcast(list(range(10000)))
"""
# Holds broadcasted data received from Java, keyed by its id.
_broadcastRegistry = {}
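
The registry comment in the hunk above says broadcast data is keyed by its id. A minimal standalone sketch of that idea follows, assuming only the standard library. It is not PySpark's implementation: `registry`, `register`, `dumps_handle`, and `loads_handle` are hypothetical names chosen for illustration. The sketch shows why a handle to a large value (such as the 10000-element list in the new doctest) can stay small when serialized: only the id is pickled, and the receiver looks the value up locally.

```python
import pickle

# Hypothetical stand-in for a registry of broadcast values, keyed by id.
registry = {}


def register(bid, value):
    """Store a broadcast value under its id (hypothetical helper)."""
    registry[bid] = value


def dumps_handle(bid):
    """Serialize only the id, not the value it refers to."""
    return pickle.dumps(bid)


def loads_handle(data):
    """Resolve a serialized id back to the registered value."""
    bid = pickle.loads(data)
    if bid not in registry:
        raise KeyError("broadcast %r not loaded" % (bid,))
    return registry[bid]


# Same payload shape as the doctest line added in this commit.
large_value = list(range(10000))
register(0, large_value)

handle_bytes = dumps_handle(0)
restored = loads_handle(handle_bytes)
```

Here `handle_bytes` holds just the pickled id, so its size does not grow with the size of `large_value`.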


@@ -3,4 +3,4 @@
# package is not at the root of the git repository. It may be possible to
# install Py4J from git once https://github.com/pypa/pip/pull/526 is merged.
-# git+git://github.com/bartdag/py4j.git@3dbf380d3d2cdeb9aab394454ea74d80c4aba1ea
+# git+git://github.com/bartdag/py4j.git@b7924aabe9c5e63f0a4d8bbd17019534c7ec014e