diff --git a/pyspark/README b/pyspark/README
index 63a1def141..55490e1a83 100644
--- a/pyspark/README
+++ b/pyspark/README
@@ -36,7 +36,7 @@ examples.
 PySpark requires a development version of Py4J, a Python library for
 interacting with Java processes. It can be installed from
 https://github.com/bartdag/py4j; make sure to install a version that
-contains at least the commits through 3dbf380d3d.
+contains at least the commits through b7924aabe9.
 
 PySpark uses the `PYTHONPATH` environment variable to search for Python
 classes; Py4J should be on this path, along with any libraries used by
diff --git a/pyspark/pyspark/broadcast.py b/pyspark/pyspark/broadcast.py
index 4cff02b36d..93876fa738 100644
--- a/pyspark/pyspark/broadcast.py
+++ b/pyspark/pyspark/broadcast.py
@@ -13,6 +13,8 @@
 
 >>> sc.parallelize([0, 0]).flatMap(lambda x: b.value).collect()
 [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
+
+>>> large_broadcast = sc.broadcast(list(range(10000)))
 """
 # Holds broadcasted data received from Java, keyed by its id.
 _broadcastRegistry = {}
diff --git a/pyspark/requirements.txt b/pyspark/requirements.txt
index 71e2bc2b89..48fa2ab105 100644
--- a/pyspark/requirements.txt
+++ b/pyspark/requirements.txt
@@ -3,4 +3,4 @@
 # package is not at the root of the git repository. It may be possible to
 # install Py4J from git once https://github.com/pypa/pip/pull/526 is merged.
 
-# git+git://github.com/bartdag/py4j.git@3dbf380d3d2cdeb9aab394454ea74d80c4aba1ea
+# git+git://github.com/bartdag/py4j.git@b7924aabe9c5e63f0a4d8bbd17019534c7ec014e