spark-instrumented-optimizer

History

HyukjinKwon eb9966b700 [SPARK-33190][INFRA][TESTS] Set upper bound of PyArrow version in GitHub Actions ### What changes were proposed in this pull request? PyArrow is uploaded into PyPI today (https://pypi.org/project/pyarrow/), and some tests fail with PyArrow 2.0.0+: ``` ====================================================================== ERROR [0.774s]: test_grouped_over_window_with_key (pyspark.sql.tests.test_pandas_grouped_map.GroupedMapInPandasTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 595, in test_grouped_over_window_with_key .select('id', 'result').collect() File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 588, in collect sock_info = self._jdf.collectToPython() File "/__w/spark/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__ answer, self.gateway_client, self.target_id, self.name) File "/__w/spark/spark/python/pyspark/sql/utils.py", line 117, in deco raise converted from None pyspark.sql.utils.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. Traceback (most recent call last): File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main process() File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 593, in process serializer.dump_stream(out_iter, outfile) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 255, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 81, in dump_stream for batch in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 248, in init_stream_yield_batches for series in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 426, in mapper return f(keys, vals) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 170, in <lambda> return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 158, in wrapped result = f(key, pd.concat(value_series, axis=1)) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 68, in wrapper return f(args, *kwargs) File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 590, in f "{} != {}".format(expected_key[i][1], window_range) AssertionError: {'start': datetime.datetime(2018, 3, 15, 0, 0), 'end': datetime.datetime(2018, 3, 20, 0, 0)} != {'start': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>), 'end': datetime.datetime(2018, 3, 20, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>)} ``` https://github.com/apache/spark/runs/1278917457 This PR proposes to set the upper bound of PyArrow in GitHub Actions build. This should be removed when we properly support PyArrow 2.0.0+ (SPARK-33189). ### Why are the changes needed? To make build pass. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GitHub Actions in this build will test it out. Closes #30098 from HyukjinKwon/hot-fix-test. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>		2020-10-20 17:35:09 +09:00
..
workflows	[SPARK-33190][INFRA][TESTS] Set upper bound of PyArrow version in GitHub Actions	2020-10-20 17:35:09 +09:00
autolabeler.yml	[SPARK-31330][INFRA][FOLLOW-UP] Exclude 'ui' and 'UI.scala' in CORE and 'dev/.rat-excludes' in BUILD autolabeller	2020-04-16 10:16:58 +09:00
PULL_REQUEST_TEMPLATE	[MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template	2020-04-30 09:22:07 +09:00