9b23be2e95
## What changes were proposed in this pull request? Python with rpc and disk encryption enabled along with a python broadcast variable and just read the value back on the driver side the job failed with: Traceback (most recent call last): File "broadcast.py", line 37, in <module> words_new.value File "/pyspark.zip/pyspark/broadcast.py", line 137, in value File "pyspark.zip/pyspark/broadcast.py", line 122, in load_from_path File "pyspark.zip/pyspark/broadcast.py", line 128, in load EOFError: Ran out of input To reproduce use configs: --conf spark.network.crypto.enabled=true --conf spark.io.encryption.enabled=true Code: words_new = sc.broadcast(["scala", "java", "hadoop", "spark", "akka"]) words_new.value print(words_new.value) ## How was this patch tested? words_new = sc.broadcast([“scala”, “java”, “hadoop”, “spark”, “akka”]) textFile = sc.textFile(“README.md”) wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word + words_new.value[1], 1)).reduceByKey(lambda a, b: a+b) count = wordCounts.count() print(count) words_new.value print(words_new.value) Closes #23166 from redsanket/SPARK-26201. Authored-by: schintap <schintap@oath.com> Signed-off-by: Thomas Graves <tgraves@apache.org> |
||
---|---|---|
.. | ||
__init__.py | ||
test_appsubmit.py | ||
test_broadcast.py | ||
test_conf.py | ||
test_context.py | ||
test_daemon.py | ||
test_join.py | ||
test_profiler.py | ||
test_rdd.py | ||
test_readwrite.py | ||
test_serializers.py | ||
test_shuffle.py | ||
test_taskcontext.py | ||
test_util.py | ||
test_worker.py |