ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Cheng Lian	b73744a147	[SPARK-27611][BUILD] Exclude jakarta.activation:jakarta.activation-api from org.glassfish.jaxb:jaxb-runtime:2.3.2 PR #23890 introduced `org.glassfish.jaxb:jaxb-runtime:2.3.2` as a runtime dependency. As an unexpected side effect, `jakarta.activation:jakarta.activation-api:1.2.1` was also pulled in as a transitive dependency. As a result, for the Maven build, both of the following two jars can be found under `assembly/target/scala-2.12/jars/`: ``` activation-1.1.1.jar jakarta.activation-api-1.2.1.jar ``` This PR exludes the Jakarta one. Manually built Spark using Maven and checked files under `assembly/target/scala-2.12/jars/`. After this change, only `activation-1.1.1.jar` is there. Closes #24507 from liancheng/spark-27611. Authored-by: Cheng Lian <lian@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-01 20:12:17 -07:00
HyukjinKwon	d8db7db50b	Revert "[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc" This reverts commit `bde30bc57c`.	2019-04-28 11:03:15 +09:00
Yuming Wang	bde30bc57c	[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc ## What changes were proposed in this pull request? Update the `docs/building-spark.md`. Otherwise: ``` mvn package -DskipTests=true ... [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) spark-parent_2.12 --- [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.6.0 is not in the allowed range 3.6.1. ... [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1] [ERROR] ... ``` ## How was this patch tested? Just test `https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.zip` is avilable. Closes #24477 from wangyum/SPARK-27467. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-27 09:09:47 -07:00
Yuming Wang	fe99305101	[SPARK-27556][BUILD] Exclude com.zaxxer:HikariCP-java7 from hadoop-yarn-server-web-proxy ## What changes were proposed in this pull request? There are two HikariCP packages in classpath when building with `-Phive -Pyarn -Phadoop-3.2`. The HikariCP dependency tree: ``` [INFO] \| +- org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.2.0:compile [INFO] \| \| \- org.apache.hadoop:hadoop-yarn-server-common:jar:3.2.0:compile [INFO] \| \| +- org.apache.hadoop:hadoop-yarn-registry:jar:3.2.0:compile [INFO] \| \| \| \- commons-daemon:commons-daemon:jar:1.0.13:compile [INFO] \| \| +- org.apache.geronimo.specs:geronimo-jcache_1.0_spec🫙1.0-alpha-1:compile [INFO] \| \| +- org.ehcache:ehcache:jar:3.3.1:compile [INFO] \| \| +- com.zaxxer:HikariCP-java7:jar:2.4.12:compile ``` ``` [INFO] +- org.apache.hive:hive-metastore:jar:2.3.4:compile [INFO] \| +- javolution:javolution:jar:5.5.1:compile [INFO] \| +- com.google.protobuf:protobuf-java:jar:2.5.0:compile [INFO] \| +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile [INFO] \| +- com.zaxxer:HikariCP:jar:2.5.1:compile ``` This pr exclude `com.zaxxer:HikariCP-java7` from `hadoop-yarn-server-web-proxy`. ## How was this patch tested? manual tests Closes #24450 from wangyum/SPARK-27556. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-04-26 12:15:39 -05:00
Yuming Wang	f82ed5e8e0	[MINOR][TEST] Remove out-dated hive version in run-tests.py ## What changes were proposed in this pull request? ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark (w/Hive 1.2.1) using SBT with these arguments: -Phadoop-3.2 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly ``` `(w/Hive 1.2.1)` is incorrect when testing hadoop-3.2, It's should be (w/Hive 2.3.4). This pr removes `(w/Hive 1.2.1)` in run-tests.py. ## How was this patch tested? N/A Closes #24451 from wangyum/run-tests-invalid-info. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-24 21:22:15 -07:00
Yuming Wang	777b4502b2	[SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2 ## What changes were proposed in this pull request? When we compile and test Hadoop 3.2, we will hint the following two issues: 1. JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381)(Parquet 1.9.0) 2. java.lang.NoSuchFieldError: BROTLI at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31). Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143)(Parquet 1.10.0) The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1. I think it would be safe to upgrade Hive's parquet to 1.10.1 to workaround this issue. This is what Hive did when upgrading Parquet 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectors, and vectors are disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723). This pr removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle) , so Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189). ## How was this patch tested? 1. manual tests 2. [upgrade Hive Parquet to 1.10.1 annd run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962) Closes #24346 from wangyum/SPARK-27176. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-04-19 08:59:08 -07:00
shane knapp	e1ece6a319	[SPARK-25079][PYTHON] update python3 executable to 3.6.x ## What changes were proposed in this pull request? have jenkins test against python3.6 (instead of 3.4). ## How was this patch tested? extensive testing on both the centos and ubuntu jenkins workers. NOTE: this will need to be backported to all active branches. Closes #24266 from shaneknapp/updating-python3-executable. Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-19 10:03:50 +09:00
Dongjoon Hyun	f93460dae9	[SPARK-27493][BUILD] Upgrade ASM to 7.1 ## What changes were proposed in this pull request? [SPARK-25946](https://issues.apache.org/jira/browse/SPARK-25946) upgraded ASM to 7.0 to support JDK11. This PR aims to update ASM to 7.1 to bring the bug fixes. - https://asm.ow2.io/versions.html - https://issues.apache.org/jira/browse/XBEAN-316 ## How was this patch tested? Pass the Jenkins. Closes #24395 from dongjoon-hyun/SPARK-27493. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-18 13:36:52 +09:00
Dongjoon Hyun	a8f20c95ab	[SPARK-27452][BUILD] Update zstd-jni to 1.3.8-9 ## What changes were proposed in this pull request? This PR aims to update `zstd-jni` from 1.3.2-2 to 1.3.8-9 to be aligned with the latest Zstd 1.3.8 in Apache Spark 3.0.0. Currently, Apache Spark is aligned with the old Zstd used in the first PR and there are many bugfix and improvement updates in `zstd-jni` until now. - https://github.com/facebook/zstd/releases/tag/v1.3.8 - https://github.com/facebook/zstd/releases/tag/v1.3.7 - https://github.com/facebook/zstd/releases/tag/v1.3.6 - https://github.com/facebook/zstd/releases/tag/v1.3.4 - https://github.com/facebook/zstd/releases/tag/v1.3.3 ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24364 from dongjoon-hyun/SPARK-ZSTD. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-16 08:54:16 -07:00
Sean Owen	8718367e2e	[SPARK-27470][PYSPARK] Update pyrolite to 4.23 ## What changes were proposed in this pull request? Update pyrolite to 4.23 to pick up bug and security fixes. ## How was this patch tested? Existing tests. Closes #24381 from srowen/SPARK-27470. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-04-16 19:41:40 +09:00
Sean Owen	a4cf1a4f4e	[SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3 ## What changes were proposed in this pull request? Unify commons-beanutils deps to latest 1.9.3. This resolves the version inconsistency in Hadoop 2.7's build and also picks up security and bug fixes. ## How was this patch tested? Existing tests. Closes #24378 from srowen/SPARK-27469. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-15 19:18:37 -07:00
Dongjoon Hyun	0881f648cf	[SPARK-27451][BUILD] Upgrade lz4-java to 1.5.1 ## What changes were proposed in this pull request? This PR upgrades `lz4-java` to 1.5.1 in order to get a patch for avoiding racing with GC. - https://github.com/lz4/lz4-java/blob/master/CHANGES.md#151 ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24363 from dongjoon-hyun/SPARK-LZ4. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-12 19:21:43 -07:00
Yuming Wang	33f3c48cac	[SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 ## What changes were proposed in this pull request? This PR mainly contains: 1. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4. 2. Resolve compatibility issues between Hive 1.2.1 and Hive 2.3.4 in the `sql/hive` module. ## How was this patch tested? jenkins test hadoop-2.7 manual test hadoop-3: ```shell build/sbt clean package -Phadoop-3.2 -Phive export SPARK_PREPEND_CLASSES=true # rm -rf metastore_db cat <<EOF > test_hadoop3.scala spark.range(10).write.saveAsTable("test_hadoop3") spark.table("test_hadoop3").show EOF bin/spark-shell --conf spark.hadoop.hive.metastore.schema.verification=false --conf spark.hadoop.datanucleus.schema.autoCreateAll=true -i test_hadoop3.scala ``` Closes #23788 from wangyum/SPARK-23710-hadoop3. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-04-08 08:42:21 -07:00
Sean Owen	23bde44797	[SPARK-27358][UI] Update jquery to 1.12.x to pick up security fixes ## What changes were proposed in this pull request? Update jquery -> 1.12.4, datatables -> 1.10.18, mustache -> 2.3.12. Add missing mustache license ## How was this patch tested? I manually tested the UI locally with the javascript console open and didn't observe any problems or JS errors. The only 'risky' change seems to be mustache, but on reading its release notes, don't think the changes from 0.8.1 to 2.x would affect Spark's simple usage. Closes #24288 from srowen/SPARK-27358. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-04-05 12:54:01 -05:00
LantaoJin	69dd44af19	[SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue ## What changes were proposed in this pull request? HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer. It's a bug of RoaringBitmap-0.5.11 and fixed in latest version. This is an update of #24157 ## How was this patch tested? Add a UT Closes #24264 from LantaoJin/SPARK-27216. Lead-authored-by: LantaoJin <jinlantao@gmail.com> Co-authored-by: Lantao Jin <jinlantao@gmail.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2019-04-03 20:09:50 -05:00
Yuming Wang	13c5c1fb4b	[SPARK-27180][BUILD][YARN] Fix testing issues with yarn module in Hadoop-3 ## What changes were proposed in this pull request? Fix testing issues with `yarn` module in Hadoop-3: 1. Upgrade jersey-1 to `1.19` to fix ```Cause: java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer```. 2. Copy `ServerSocketUtil` from hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/ServerSocketUtil.java to fix ```java.lang.NoClassDefFoundError: org/apache/hadoop/net/ServerSocketUtil```. 3. Adapte `SessionHandler` from jetty-9.3.25.v20180904/jetty-server/src/main/java/org/eclipse/jetty/server/session/SessionHandler.java to fix ```java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.getSessionManager()Lorg/eclipse/jetty/server/SessionManager```. ## How was this patch tested? manual tests: ```shell build/sbt yarn/test -Pyarn build/sbt yarn/test -Phadoop-3.2 -Pyarn build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn -Phadoop-3.2 ``` Closes #24115 from wangyum/hadoop3-yarn. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-04-02 15:38:26 -05:00
Yuming Wang	f799e34962	[MINOR][BUILD] Upgrade apache-rat to 0.13 ## What changes were proposed in this pull request? This PR upgrade `apache-rat` to 0.13. Issues fixed by 0.13: https://issues.apache.org/jira/issues/?jql=project%20%3D%20RAT%20AND%20fixVersion%20%3D%200.13 ## How was this patch tested? manual tests Closes #24262 from wangyum/apache-rat. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-04-01 16:44:42 +09:00
Sean Owen	754f820035	[SPARK-26918][DOCS] All .md should have ASF license header ## What changes were proposed in this pull request? Add AL2 license to metadata of all .md files. This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing. ## How was this patch tested? Doc build Closes #24243 from srowen/SPARK-26918. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-30 19:49:45 -05:00
Sean Owen	2ec650d843	[SPARK-27267][CORE] Update snappy to avoid error when decompressing empty serialized data ## What changes were proposed in this pull request? (See JIRA for problem statement) Update snappy 1.1.7.1 -> 1.1.7.3 to pick up an empty-stream and Java 9 fix. There appear to be no other changes of consequence: https://github.com/xerial/snappy-java/blob/master/Milestone.md ## How was this patch tested? Existing tests Closes #24242 from srowen/SPARK-27267. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-30 02:41:24 -05:00
Hyukjin Kwon	0e16a6f5b0	[SPARK-27277][INFRA] Recover from setting fix version failure in merge script ## What changes were proposed in this pull request? I happened to meet this case few times before: ``` Enter comma-separated fix version(s) [3.0.0]: 3.0,0 Restoring head pointer to master git checkout master Already on 'master' git branch Traceback (most recent call last): File "./dev/merge_spark_pr_jira.py", line 537, in <module> main() File "./dev/merge_spark_pr_jira.py", line 523, in main resolve_jira_issues(title, merged_refs, jira_comment) File "./dev/merge_spark_pr_jira.py", line 359, in resolve_jira_issues resolve_jira_issue(merge_branches, comment, jira_id) File "./dev/merge_spark_pr_jira.py", line 302, in resolve_jira_issue jira_fix_versions = map(lambda v: get_version_json(v), fix_versions) File "./dev/merge_spark_pr_jira.py", line 302, in <lambda> jira_fix_versions = map(lambda v: get_version_json(v), fix_versions) File "./dev/merge_spark_pr_jira.py", line 300, in get_version_json return filter(lambda v: v.name == version_str, versions)[0].raw IndexError: list index out of range ``` I typed the fix version wrongly (there's comma in `3.0,0`) and it ended the loop in the merge script. Not a big deal but it bugged me few times. Finally I met this today again, and decided to fix. This PR proposes to recover from wrongly set fix versions. ## How was this patch tested? I manually copied and pasted the specific codes and tested separately in both Python 2 and Python 3. Positive cases: ``` Enter comma-separated fix version(s) [3.0.0]: # blank test (to use default) ['3.0.0'] ``` ``` Enter comma-separated fix version(s) [3.0.0,2.4.2]: # multiple default versions ['3.0.0', '2.4.2'] ``` ``` Enter comma-separated fix version(s) [3.0.0]: 2.4.1 # valid version ['2.4.1'] ``` ``` Enter comma-separated fix version(s) [3.0.0]: 3.0.0,2.4.2 # multiple valid versions ['3.0.0', '2.4.2'] ``` Keyboard interrupt(Ctrl + c): ``` Enter comma-separated fix version(s) [3.0.0]: ^CTraceback (most recent call last): # keyboard interrupt File "test_merge_script.py", line 45, in <module> test() File "test_merge_script.py", line 26, in test fix_versions = input("Enter comma-separated fix version(s) [%s]: " % default_fix_versions) KeyboardInterrupt ``` Wrongly typed versions (recovered): ``` Enter comma-separated fix version(s) [3.0.0]: 3.1 Specified version(s) [3.1] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: 123 Specified version(s) [123] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: 3.0,0 Specified version(s) [3.0, 0] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: damn Specified version(s) [damn] not found in the available versions, try again (or leave blank and fix manually). Enter comma-separated fix version(s) [3.0.0]: 3.0.0,2.5.2 # one invalid versions in multiple versions Specified version(s) [3.0.0, 2.5.2] not found in the available versions, try again (or leave blank and fix manually). ``` Arbitrary exceptions in fix version parsing (recovered) ``` Enter comma-separated fix version(s) [3.0.0]: Traceback (most recent call last): File "tmp.py", line 11, in <module> raise Exception("arbitrary exception") Exception: arbitrary exception Error setting fix version(s), try again (or leave blank and fix manually) Enter comma-separated fix version(s) [3.0.0]: Traceback (most recent call last): File "tmp.py", line 10, in <module> raise Exception("arbitrary exception") Exception: arbitrary exception Error setting fix version(s), try again (or leave blank and fix manually) Enter comma-separated fix version(s) [3.0.0]: ``` Closes #24213 from HyukjinKwon/merge_script_fix_version. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-26 21:14:07 +09:00
Sean Owen	8bc304f97e	[SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 ## What changes were proposed in this pull request? Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below. ## How was this patch tested? Existing tests. Closes #23098 from srowen/SPARK-26132. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-25 10:46:42 -05:00
Yuming Wang	9c0af746e5	[SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0 ## What changes were proposed in this pull request? This PR upgrade `hadoop-3` to `3.2.0` to workaround [HADOOP-16086](https://issues.apache.org/jira/browse/HADOOP-16086). Otherwise some test case will throw IllegalArgumentException: ```java 02:44:34.707 ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)' java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:369) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:730) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266) at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:719) at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:709) at org.apache.spark.sql.hive.StatisticsSuite.createNonPartitionedTable(StatisticsSuite.scala:719) at org.apache.spark.sql.hive.StatisticsSuite.$anonfun$testAlterTableProperties$2(StatisticsSuite.scala:822) ``` ## How was this patch tested? manual tests Closes #24106 from wangyum/SPARK-27175. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-16 19:42:05 -05:00
Dongjoon Hyun	f26a1f3d37	[SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 ## What changes were proposed in this pull request? This PR aims to update Apache ORC dependency to fix [SPARK-27107](https://issues.apache.org/jira/browse/SPARK-27107) . ``` [ORC-452] Support converting MAP column from JSON to ORC Improvement [ORC-447] Change the docker scripts to keep a persistent m2 cache [ORC-463] Add `version` command [ORC-475] ORC reader should lazily get filesystem [ORC-476] Make SearchAgument kryo buffer size configurable ``` ## How was this patch tested? Pass the Jenkins with the existing tests. Closes #24096 from dongjoon-hyun/SPARK-27165. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-14 20:14:31 -07:00
Yuming Wang	f0b6245ea4	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles ## What changes were proposed in this pull request? `dev/mima` and `dev/scalastyle` support dynamic reading profiles from `modules.py`. ## How was this patch tested? manual tests Closes #24089 from wangyum/SPARK-27158. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-15 08:20:42 +09:00
Jiaxin Shan	2d0b7cfe44	[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2 ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/23814 was reverted because of Jenkins integration tests failure. After minikube upgrade, Kubernetes client SDK v1.4.2 work with kubernetes v1.13. We can bring this change back. Reference: [Bump Kubernetes Client Version to 4.1.2](https://issues.apache.org/jira/browse/SPARK-26742) [Original PR against master](https://github.com/apache/spark/pull/23814) [Kubernetes client upgrade for Spark 2.4](https://github.com/apache/spark/pull/23993) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Unit Tests: ``` All tests passed. [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: [INFO] [INFO] Spark Project Parent POM ........................... SUCCESS [ 2.343 s] [INFO] Spark Project Tags ................................. SUCCESS [ 2.039 s] [INFO] Spark Project Sketch ............................... SUCCESS [ 12.714 s] [INFO] Spark Project Local DB ............................. SUCCESS [ 2.185 s] [INFO] Spark Project Networking ........................... SUCCESS [ 38.154 s] [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 7.989 s] [INFO] Spark Project Unsafe ............................... SUCCESS [ 2.297 s] [INFO] Spark Project Launcher ............................. SUCCESS [ 2.813 s] [INFO] Spark Project Core ................................. SUCCESS [38:03 min] [INFO] Spark Project ML Local Library ..................... SUCCESS [ 3.848 s] [INFO] Spark Project GraphX ............................... SUCCESS [ 56.084 s] [INFO] Spark Project Streaming ............................ SUCCESS [04:58 min] [INFO] Spark Project Catalyst ............................. SUCCESS [06:39 min] [INFO] Spark Project SQL .................................. SUCCESS [37:12 min] [INFO] Spark Project ML Library ........................... SUCCESS [18:59 min] [INFO] Spark Project Tools ................................ SUCCESS [ 0.767 s] [INFO] Spark Project Hive ................................. SUCCESS [33:45 min] [INFO] Spark Project REPL ................................. SUCCESS [01:14 min] [INFO] Spark Project Assembly ............................. SUCCESS [ 1.444 s] [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:12 min] [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 6.719 s] [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [07:00 min] [INFO] Spark Project Examples ............................. SUCCESS [ 21.805 s] [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 0.906 s] [INFO] Spark Avro ......................................... SUCCESS [ 50.486 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 02:32 h [INFO] Finished at: 2019-03-07T08:39:34Z [INFO] ------------------------------------------------------------------------ ``` Please review http://spark.apache.org/contributing.html before opening a pull request. Closes #24002 from Jeffwan/update_k8s_sdk_master. Authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-03-13 15:04:27 -07:00
DB Tsai	2b9ad2516e	[MINOR][BUILD] Add Scala 2.12 profile back for branch-2.4 build Closes #24074 from dbtsai/scala-2.12. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-12 20:08:52 -07:00
Yuming Wang	dccf6615c3	[SPARK-27130][BUILD] Automatically select profile when executing sbt-checkstyle ## What changes were proposed in this pull request? This PR makes it automatically select profile when executing `sbt-checkstyle`. The reason for this is that `hadoop-2.7` and `hadoop-3.1` may have different `hive-thriftserver` module in the future. ## How was this patch tested? manual tests: ``` Update AbstractService.java file. export HADOOP_PROFILE=hadoop2.7 ./dev/run-tests ``` The result: ![image](https://user-images.githubusercontent.com/5399861/54197992-5337e780-4500-11e9-930c-722982cdcd45.png) Closes #24065 from wangyum/SPARK-27130. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-13 08:03:46 +09:00
Yuming Wang	8ab13065f6	[SPARK-23807][FOLLOW-UP][BUILD][TEST-HADOOP3.1] Add test-hadoop3.1 phrase ## What changes were proposed in this pull request? Add `test-hadoop3.1` phrase to test Spark against Spark’s Hadoop 3.1 profile. ## How was this patch tested? Tested on jenkins. This is output: ``` [info] Using build tool sbt with Hadoop profile hadoop3.1 under environment amplab_jenkins ... [info] Building Spark (w/Hive 1.2.1) using SBT with these arguments: -Phadoop-3.1 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly ``` https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103282/console Closes #24045 from wangyum/SPARK-23807. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-11 11:15:58 +09:00
Gabor Somogyi	3729efb4d0	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs ## What changes were proposed in this pull request? Avro is built-in but external data source module since Spark 2.4 but `from_avro` and `to_avro` APIs not yet supported in pyspark. In this PR I've made them available from pyspark. ## How was this patch tested? Please see the python API examples what I've added. cd docs/ SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll build Manual webpage check. Closes #23797 from gaborgsomogyi/SPARK-26856. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-11 10:15:07 +09:00
Yuming Wang	eed3091a60	[SPARK-27120][BUILD][TEST] Upgrade scalatest version to 3.0.5 ## What changes were proposed in this pull request? ScalaTest 3.0.5 Release Notes Bug Fixes - Fixed the implicit view not available problem when used with compile macro. - Fixed a stack depth problem in RefSpecLike and fixture.SpecLike under Scala 2.13. - Changed Framework and ScalaTestFramework to set spanScaleFactor for Runner object instances for different Runners using different class loaders. This fixed a problem whereby an incorrect Runner.spanScaleFactor could be used when the tests for multiple sbt project's were run concurrently. - Fixed a bug in endsWith regex matcher. Improvements - Removed duplicated parsing code for -C in ArgsParser. - Improved performance in WebBrowser. - Documentation typo rectification. - Improve validity of Junit XML reports. - Improved performance by replacing all .size == 0 and .length == 0 to .isEmpty. Enhancements - Added 'C' option to -P, which will tell -P to use cached thread pool. - External Dependencies Update - Bumped up scala-js version to 0.6.22. - Changed to depend on mockito-core, not mockito-all. - Bumped up jmock version to 2.8.3. - Bumped up junit version to 4.12. - Removed dependency to scala-parser-combinators. More details: http://www.scalatest.org/release_notes/3.0.5 ## How was this patch tested? manual tests on local machine: ``` nohup build/sbt clean -Djline.terminal=jline.UnsupportedTerminal -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Phive -Pkinesis-asl -Pmesos test > run.scalatest.log & ``` Closes #24042 from wangyum/SPARK-27120. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-10 15:22:52 -07:00
Yuming Wang	f732647ae4	[SPARK-27054][BUILD][SQL] Remove the Calcite dependency ## What changes were proposed in this pull request? Calcite is only used for [runSqlHive](`02bbe977ab/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (L699-L705)`) when `hive.cbo.enable=true`([SemanticAnalyzer](https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280)). So we can disable `hive.cbo.enable` and remove Calcite dependency. ## How was this patch tested? Exist tests Closes #23970 from wangyum/SPARK-27054. Lead-authored-by: Yuming Wang <yumwang@ebay.com> Co-authored-by: Yuming Wang <wgyumg@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-09 16:34:24 -08:00
DB Tsai	b6375097bc	[SPARK-27026][BUILD] Upgrade Docker image for release build to Ubuntu 18.04 LTS ## What changes were proposed in this pull request? Upgrade Docker image for release build to Ubuntu 18.04LTS ## How was this patch tested? Manually tested. Closes #23932 from dbtsai/ubuntu18.04. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-06 13:58:21 -08:00
Yanbo Liang	7857c6d633	[SPARK-27051][CORE] Bump Jackson version to 2.9.8 ## What changes were proposed in this pull request? Fasterxml Jackson version before 2.9.8 is affected by multiple [CVEs](https://github.com/FasterXML/jackson-databind/issues/2186), we need to fix bump the dependent Jackson to 2.9.8. ## How was this patch tested? Existing tests and offline benchmark. I have run ```SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"``` to check there is no performance degradation for this upgrade. Closes #23965 from yanboliang/SPARK-27051. Authored-by: Yanbo Liang <ybliang8@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-03-05 11:46:51 +09:00
LantaoJin	e5c502c596	[SPARK-25865][CORE] Add GC information to ExecutorMetrics ## What changes were proposed in this pull request? Only memory usage without GC information could not help us to determinate the proper settings of memory. We need the GC metrics about frequency of major & minor GC. For example, two cases, their configured memory for executor are all 10GB and their usages are all near 10GB. So should we increase or decrease the configured memory for them? This metrics may be helpful. We can increase configured memory for the first one if it has very frequency major GC and decrease the second one if only some minor GC and none major GC. GC metrics are only useful in entire lifetime of executors instead of separated stages. ## How was this patch tested? Adding UT. Closes #22874 from LantaoJin/SPARK-25865. Authored-by: LantaoJin <jinlantao@gmail.com> Signed-off-by: Imran Rashid <irashid@cloudera.com>	2019-03-04 14:26:02 -06:00
Sean Owen	d8754df2bf	[SPARK-27029][BUILD] Update Thrift to 0.12.0 ## What changes were proposed in this pull request? Update Thrift to 0.12.0 to pick up bug and security fixes. Changes: https://github.com/apache/thrift/blob/master/CHANGES.md The important one is for https://issues.apache.org/jira/browse/THRIFT-4506 ## How was this patch tested? Existing tests. A quick local test suggests this works. Closes #23935 from srowen/SPARK-27029. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-02 17:28:37 -08:00
Marcelo Vanzin	d00eca75b3	[SPARK-26048][BUILD] Enable flume profile when creating 2.x releases. Closes #23931 from vanzin/SPARK-26048. Authored-by: Marcelo Vanzin <vanzin@cloudera.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-02 08:14:06 -08:00
Sean Owen	131b464d0c	[SPARK-26986][ML][FOLLOWUP] Add JAXB reference impl to build for Java 9+ ## What changes were proposed in this pull request? Remove a few new JAXB dependencies that shouldn't be necessary now. See https://github.com/apache/spark/pull/23890#issuecomment-468299922 ## How was this patch tested? Existing tests Closes #23923 from srowen/SPARK-26986.2. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-01 11:23:40 -06:00
Sean Owen	9c283662c6	[SPARK-26986][ML] Add JAXB reference impl to build for Java 9+ ## What changes were proposed in this pull request? Add reference JAXB impl for Java 9+ from Glassfish. Right now it's only apparently necessary in MLlib but can be expanded later. ## How was this patch tested? Existing tests particularly PMML-related ones, which use JAXB. This works on Java 11. Closes #23890 from srowen/SPARK-26986. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-26 18:26:49 -06:00
Marcelo Vanzin	afbff6446f	Revert "[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2" This reverts commit `a3192d966a`.	2019-02-26 13:42:07 -08:00
Jungtaek Lim (HeartSaVioR)	c5de804093	[MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org " ## What changes were proposed in this pull request? Below build failed with Java checkstyle test, but instead of violation it shows FileNotFound on dtd file. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102751/ Looks like the link of dtd file is dead `http://www.puppycrawl.com/dtds/configuration_1_3.dtd`. This patch updates the dtd link to "https://checkstyle.org/dtds/" given checkstyle repository also updated the URL path. https://github.com/checkstyle/checkstyle/issues/5601 ## How was this patch tested? Checked the new links. Closes #23887 from HeartSaVioR/java-checkstyle-dtd-change-url. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-02-25 11:25:53 -08:00
Jiaxin Shan	a3192d966a	[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2 ## What changes were proposed in this pull request? Changed the `kubernetes-client` version to 4.1.2. Latest version fix error with exec credentials (used by aws eks) and this will be used to talk with kubernetes API server. Users can submit spark job to EKS api endpoint now with this patch. ## How was this patch tested? unit tests and manual tests. Closes #23814 from Jeffwan/update_k8s_sdk. Authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-25 04:56:04 -06:00
Holden Karau	6b3c832dac	[SPARK-26882] Check the Kubernetes integration tests scalatyle ## What changes were proposed in this pull request? Add the kubernetes integration tests to the scalastyle profiles. ## How was this patch tested? Run ./dev/scalastyle with a bad change manually ## Follow on work See SPARK-26898 to add scalastyle for k8s integration to the CI Closes #23792 from holdenk/SPARK-26882-check-k8s-integration-tests-when-linting. Authored-by: Holden Karau <holden@pigscanfly.ca> Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>	2019-02-19 13:49:47 -08:00
cchung100m	dc46fb77ba	[SPARK-26822] Upgrade the deprecated module 'optparse' Follow the [official document](https://docs.python.org/2/library/argparse.html#upgrading-optparse-code) to upgrade the deprecated module 'optparse' to 'argparse'. ## What changes were proposed in this pull request? This PR proposes to replace 'optparse' module with 'argparse' module. ## How was this patch tested? Follow the [previous testing](`7e3eb3cd20`), manually tested and negative tests were also done. My [test results](https://gist.github.com/cchung100m/1661e7df6e8b66940a6e52a20861f61d) Closes #23730 from cchung100m/solve_deprecated_module_optparse. Authored-by: cchung100m <cchung100m@cs.ccu.edu.tw> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-10 00:36:22 -06:00
Ryan Blue	f72d217788	[SPARK-26677][BUILD] Update Parquet to 1.10.1 with notEq pushdown fix. ## What changes were proposed in this pull request? Update to Parquet Java 1.10.1. ## How was this patch tested? Added a test from HyukjinKwon that validates the notEq case from SPARK-26677. Closes #23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug. Lead-authored-by: Ryan Blue <blue@apache.org> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: Ryan Blue <rdblue@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-02-02 09:17:52 -08:00
Hyukjin Kwon	cdd694c52b	[SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins ## What changes were proposed in this pull request? ### Background For the current status, the test script that generates coverage information was merged into Spark, https://github.com/apache/spark/pull/20204 So, we can generate the coverage report and site by, for example: ``` run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql ``` like `run-tests` script in `./python`. ### Proposed change The next step is to host this coverage report via `github.io` automatically by Jenkins (see https://spark-test.github.io/pyspark-coverage-site/). This uses my testing account for Spark, spark-test, which is shared to Felix and Shivaram a long time ago for testing purpose including AppVeyor. To cut this short, this PR targets to run the coverage in [spark-master-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/) In the specific job, it will clone the page, and rebase the up-to-date PySpark test coverage from the latest commit. For instance as below: ```bash # Clone PySpark coverage site. git clone https://github.com/spark-test/pyspark-coverage-site.git # Remove existing HTMLs. rm -fr pyspark-coverage-site/* # Copy generated coverage HTMLs. cp -r .../python/test_coverage/htmlcov/* pyspark-coverage-site/ # Check out to a temporary branch. git symbolic-ref HEAD refs/heads/latest_branch # Add all the files. git add -A # Commit current HTMLs. git commit -am "Coverage report at latest commit in Apache Spark" # Delete the old branch. git branch -D gh-pages # Rename the temporary branch to master. git branch -m gh-pages # Finally, force update to our repository. git push -f origin gh-pages ``` So, it is a one single up-to-date coverage can be shown in the `github-io` page. The commands above were manually tested. ### TODOs - [x] Write a draft HyukjinKwon - [x] `pip install coverage` to all python implementations (pypy, python2, python3) in Jenkins workers - shaneknapp - [x] Set hidden `SPARK_TEST_KEY` for spark-test's password in Jenkins via Jenkins's feature This should be set in both PR builder and `spark-master-test-sbt-hadoop-2.7` so that later other PRs can test and fix the bugs - shaneknapp - [x] Set an environment variable that indicates `spark-master-test-sbt-hadoop-2.7` so that that specific build can report and update the coverage site - shaneknapp - [x] Make PR builder's test passed HyukjinKwon - [x] Fix flaky test related with coverage HyukjinKwon - 6 consecutive passes out of 7 runs This PR will be co-authored with me and shaneknapp ## How was this patch tested? It will be tested via Jenkins. Closes #23117 from HyukjinKwon/SPARK-7721. Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: hyukjinkwon <gurwls223@apache.org> Co-authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-02-01 10:18:08 +08:00
Bryan Cutler	16990f9299	[SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0 ## What changes were proposed in this pull request? Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0 Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users: * Safe cast fails from numpy float64 array with nans to integer, ARROW-4258 * Java, Reduce heap usage for variable width vectors, ARROW-4147 * Binary identity cast not implemented, ARROW-4101 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 * conversion to date object no longer needed, ARROW-3910 * Error reading IPC file with no record batches, ARROW-3894 * Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790 * from_pandas gives incorrect results when converting floating point to bool, ARROW-3428 * Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048 * Java update to official Flatbuffers version 1.9.0, ARROW-3175 complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0) PySpark requires the following fixes to work with PyArrow 0.12.0 * Encrypted pyspark worker fails due to ChunkedStream missing closed property * pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64 * ArrowTests fails due to difference in raised error message * pyarrow.open_stream deprecated * tests fail because groupby adds index column with duplicate name ## How was this patch tested? Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0 Closes #23657 from BryanCutler/arrow-upgrade-012. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2019-01-29 14:18:45 +08:00
Sean Owen	c2d0d700b5	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis ## What changes were proposed in this pull request? Misc code cleanup from lgtm.com analysis. See comments below for details. ## How was this patch tested? Existing tests. Closes #23571 from srowen/SPARK-26640. Lead-authored-by: Sean Owen <sean.owen@databricks.com> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-17 19:40:39 -06:00
wright	4915cb3adf	[MINOR][BUILD] ensure call to translate_component has correct number of arguments ## What changes were proposed in this pull request? The call to `translate_component` only supplied 2 out of the 3 required arguments. I added a default empty list for the missing argument to avoid a run-time error. I work for Semmle, and noticed the bug with our LGTM code analyzer: `0655f1624f/files/dev/create-release/releaseutils.py`?sort=name&dir=ASC&mode=heatmap#x1434915b6576fb40:1 ## How was this patch tested? I checked that `./dev/run-tests` pass OK. Closes #23567 from ipwright/wrong-number-of-arguments-fix. Authored-by: wright <wright@semmle.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-16 21:00:58 -06:00
Takeshi Yamamuro	abc937b247	[MINOR][BUILD] Remove binary license/notice files in a source release for branch-2.4+ only ## What changes were proposed in this pull request? To skip some steps to remove binary license/notice files in a source release for branch2.3 (these files only exist in master/branch-2.4 now), this pr checked a Spark release version in `dev/create-release/release-build.sh`. ## How was this patch tested? Manually checked. Closes #23538 from maropu/FixReleaseScript. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-14 19:17:39 -06:00
Dongjoon Hyun	6f35ede31c	[SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER ## What changes were proposed in this pull request? This PR uses GitHub repository instead of GitBox because GitHub repo returns HTTP header status correctly. ## How was this patch tested? Manual. ``` $ ./do-release-docker.sh -d /tmp/test -n Branch [branch-2.4]: Current branch version is 2.4.1-SNAPSHOT. Release [2.4.1]: RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [v2.4.1-rc1]: ``` Closes #23482 from dongjoon-hyun/SPARK-26554-2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2019-01-07 17:54:05 -08:00

1 2 3 4 5 ...

678 commits