[SPARK-8821] [EC2] Switched to binary mode for file reading

Otherwise, the script crashes with:

    - Downloading boto...
    Traceback (most recent call last):
      File "ec2/spark_ec2.py", line 148, in <module>
        setup_external_libs(external_libs)
      File "ec2/spark_ec2.py", line 128, in setup_external_libs
        if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
      File "/usr/lib/python3.4/codecs.py", line 319, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

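This happens under Python 3 when the environment's locale is UTF-8: the file is opened in text mode, and decoding the gzip-compressed bytes fails.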

Author: Simon Hafner <hafnersimon@gmail.com>

Closes #7215 from reactormonk/branch-1.4 and squashes the following commits:

e86957a [Simon Hafner] [SPARK-8821] [EC2] Switched to binary mode
Simon Hafner 2015-07-07 09:42:59 -07:00 committed by Shivaram Venkataraman
parent bf8b47d17b
commit 83a621a5a8

    @@ -127,7 +127,7 @@ def setup_external_libs(libs):
             )
             with open(tgz_file_path, "wb") as tgz_file:
                 tgz_file.write(download_stream.read())
    -        with open(tgz_file_path) as tar:
    +        with open(tgz_file_path, "rb") as tar:
                 if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
                     print("ERROR: Got wrong md5sum for {lib}.".format(lib=lib["name"]), file=stderr)
                     sys.exit(1)
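
The failure and the fix can be reproduced outside of spark_ec2.py. The following is a minimal sketch, not the script's actual code: the helper `md5_of_file`, the temporary file, and the payload are hypothetical stand-ins for the downloaded tarball. It relies only on the fact that gzip data begins with the bytes 0x1f 0x8b, the second of which is not a valid UTF-8 start byte.

```python
import gzip
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Hash a file's raw bytes (hypothetical helper, not part of spark_ec2.py)."""
    # Binary mode hands the bytes to hashlib without any decoding step.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


# Stand-in for the downloaded archive: gzip output starts with 0x1f 0x8b.
payload = gzip.compress(b"example payload")
fd, tmp_path = tempfile.mkstemp(suffix=".tar.gz")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(payload)

try:
    # Binary read: the digest matches the digest of the original bytes.
    binary_digest = md5_of_file(tmp_path)

    # Text read with UTF-8 forced explicitly, mimicking a UTF-8 locale:
    # decoding the gzip header fails before hashing is ever reached.
    try:
        with open(tmp_path, encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
finally:
    os.remove(tmp_path)
```

Passing `encoding="utf-8"` here makes the sketch independent of the machine's locale; in the original script the encoding was picked up implicitly from the environment, which is why the crash only appeared under some settings.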