spark-instrumented-optimizer/hadoop-cloud
Sean Owen d5b7eed12f [SPARK-28903][STREAMING][PYSPARK][TESTS] Fix AWS JDK version conflict that breaks Pyspark Kinesis tests
The Pyspark Kinesis tests are failing, at least in master:

```
======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
    kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
  File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
	at scala.collection.Iterator.find(Iterator.scala:993)
	at scala.collection.Iterator.find$(Iterator.scala:990)
	at scala.collection.AbstractIterator.find(Iterator.scala:1429)
	at scala.collection.IterableLike.find(IterableLike.scala:81)
	at scala.collection.IterableLike.find$(IterableLike.scala:80)
	at scala.collection.AbstractIterable.find(Iterable.scala:56)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...
```

The non-Python Kinesis tests pass, though. It turns out this happens because the Pyspark tests run against the Spark assembly, which includes the `hadoop-cloud` module, and that module in turn pulls in an old version of the AWS Java SDK.
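For anyone debugging a similar `NoSuchMethodError`, a quick way to confirm which jar actually supplied the conflicting class is to inspect its code source. This is a generic, hypothetical snippet (e.g. pasted into `spark-shell` running against the assembly), not part of the fix itself:

```scala
// Hypothetical debugging snippet: print which jar on the classpath actually
// provides com.amazonaws.regions.Region. With a stale aws-java-sdk on the
// assembly classpath, the printed location points at the old jar.
val region = Class.forName("com.amazonaws.regions.Region")
Option(region.getProtectionDomain.getCodeSource)
  .map(_.getLocation)        // code source can be null for bootstrap classes
  .foreach(println)
```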

Per Steve Loughran (see the comment linked below), it seems we can resolve this simply by excluding the aws-java-sdk dependency. See the linked PR for more detail on the debugging and the other options considered.

See https://github.com/apache/spark/pull/25558#issuecomment-524042709
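As a rough illustration of the kind of exclusion being described, here is an sbt-style sketch; the actual change is an exclusion in `hadoop-cloud/pom.xml`, and the assumption that the stale SDK arrives transitively via `hadoop-aws` as `com.amazonaws:aws-java-sdk` should be checked against the linked PR:

```scala
// Illustrative only: drop the transitive AWS SDK that hadoop-aws drags in,
// so the Kinesis module's own (newer) SDK wins on the assembly classpath.
// Artifact names and the version below are assumptions, not copied from the patch.
libraryDependencies += ("org.apache.hadoop" % "hadoop-aws" % "2.7.4")
  .exclude("com.amazonaws", "aws-java-sdk")
```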

Closes #25559 from srowen/KinesisTest.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-31 10:29:46 -05:00
src [SPARK-23977][SQL] Support High Performance S3A committers [test-hadoop3.2] 2019-08-15 09:39:26 -07:00
pom.xml [SPARK-28903][STREAMING][PYSPARK][TESTS] Fix AWS JDK version conflict that breaks Pyspark Kinesis tests 2019-08-31 10:29:46 -05:00