ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
yangjie01	2289389821	[SPARK-33441][BUILD][FOLLOWUP] Make unused-imports check for SBT specific ### What changes were proposed in this pull request? Move "unused-imports" check config to `SparkBuild.scala` and make it SBT specific. ### Why are the changes needed? Make unused-imports check for SBT specific. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the Jenkins or GitHub Action Closes #30441 from LuciferYang/SPARK-33441-FOLLOWUP. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-11-20 21:27:41 +09:00
HyukjinKwon	3af1651e50	[SPARK-33297][BUILD] Switch to use flat class loader strategy in SBT ### What changes were proposed in this pull request? This PR proposes to switch the class loader strategy from `ScalaLibrary` to `Flat` (see https://www.scala-sbt.org/1.x/docs/In-Process-Classloaders.html): https://github.com/apache/spark/runs/1314691686 ``` Error: java.util.MissingResourceException: Can't find bundle for base name org.scalactic.ScalacticBundle, locale en Error: at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1581) Error: at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396) Error: at java.util.ResourceBundle.getBundle(ResourceBundle.java:782) Error: at org.scalactic.Resources$.resourceBundle$lzycompute(Resources.scala:8) Error: at org.scalactic.Resources$.resourceBundle(Resources.scala:8) Error: at org.scalactic.Resources$.pleaseDefineScalacticFillFilePathnameEnvVar(Resources.scala:256) Error: at org.scalactic.source.PositionMacro$PositionMacroImpl.apply(PositionMacro.scala:65) Error: at org.scalactic.source.PositionMacro$.genPosition(PositionMacro.scala:85) Error: at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) Error: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Error: at java.lang.reflect.Method.invoke(Method.java:498) ``` See also https://github.com/sbt/sbt/issues/5736 ### Why are the changes needed? To make the build unflaky. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GitHub Actions build in this test. Closes #30198 from HyukjinKwon/SPARK-33297. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-30 17:53:30 +09:00
Denis Pyshev	ba69d68d91	[SPARK-33080][BUILD] Replace fatal warnings snippet ### What changes were proposed in this pull request? Current solution in build file to enable build failure on compilation warnings with exclusion of deprecation ones is not portable after SBT version 1.3.13 (build import fails with compilation error with SBT 1.4) and could be replaced with more robust and maintainable, especially since Scala 2.13.2 with similar built-in functionality. Additionally, warnings were fixed to pass the build, with as few changes as possible: warnings in 2.12 compilation fixed in code, warnings in 2.13 compilation covered by configuration to be addressed separately ### Why are the changes needed? Unblocks upgrade to SBT after 1.3.13. Enhances build file maintainability. Allows fine tune of warnings configuration in scope of Scala 2.13 compilation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? `build/sbt`'s `compile` and `Test/compile` for both Scala 2.12 and 2.13 profiles. Closes #29995 from gemelen/feature/warnings-reporter. Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-10-15 14:49:43 -05:00
William Hyun	50b2a497f3	[SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue This PR aims to rename hdpVersion to versionValue. Use the general variable name. No. Pass the CI. Closes #30008 from williamhyun/sbt. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-11 20:06:26 -07:00
Denis Pyshev	c78971b1c7	[SPARK-33106][BUILD] Fix resolvers clash in SBT ### What changes were proposed in this pull request? Rename manually added resolver for local Ivy repo. Create configuration to publish to local Ivy repo similar to Maven one. Use `publishLocal` to publish both to local Maven and Ivy repos instead of custom task `localPublish` (renamed from `publish-local` of sbt 0.13.x). ### Why are the changes needed? There are two resolvers (bootResolvers's "local" and manually added "local") that point to the same local Ivy repo, but have different configurations, which led to excessive warnings in logs and, potentially, resolution issues. Changeset fixes that case, observable in sbt output as ``` [warn] Multiple resolvers having different access mechanism configured with same name 'local'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolve r (`publishTo`). ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Executed `build/sbt`'s `publishLocal` task on individual module and on root project. Closes #30006 from gemelen/feature/local-resolvers. Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-11 18:51:00 -07:00
Denis Pyshev	6daa2aeb01	[SPARK-21708][BUILD] Migrate build to sbt 1.x ### What changes were proposed in this pull request? Migrate sbt-launcher URL to download one for sbt 1.x. Update plugins versions where required by sbt update. Change sbt version to be used to latest released at the moment, 1.3.13 Adjust build settings according to plugins and sbt changes. ### Why are the changes needed? Migration to sbt 1.x: 1. enhances dev experience in development 2. updates build plugins to bring there new features/to fix bugs in them 3. enhances build performance on sbt side 4. eases movement to Scala 3 / dotty ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? All existing tests passed, both on Jenkins and via Github Actions, also manually for Scala 2.13 profile. Closes #29286 from gemelen/feature/sbt-1.x. Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-07 15:28:00 -07:00
Kousuke Saruta	f7ba95264d	[SPARK-33048][BUILD] Fix SparkBuild.scala to recognize build settings for Scala 2.13 ### What changes were proposed in this pull request? This PR fixes `SparkBuild.scala` to recognize build settings for Scala 2.13. In `SparkBuild.scala`, a variable `scalaBinaryVersion` is hardcoded as `2.12`. So, an environment variable `SPARK_SCALA_VERSION` is also to be `2.12`. This issue causes some test suites (e.g. `SparkSubmitSuite`) to be error. ``` ===== TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' ===== 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load m ain class org.apache.spark.launcher.Main 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala- 2.13/bin/spark-class: line 96: CMD: bad array subscript ``` The reason of this error is that environment variables `SPARK_JARS_DIR` and `LAUNCH_CLASSPATH` is defined in `bin/spark-class` as follows. ``` SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars" LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH" ``` ### Why are the changes needed? To build for Scala 2.13 successfully. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tests for `core` module finish successfully. ``` build/sbt -Pscala-2.13 clean "core/test" ``` Closes #29927 from sarutak/fix-sparkbuild-for-scala-2.13. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-02 15:17:44 +09:00
Kousuke Saruta	fbb0f37685	[SPARK-32869][BUILD] Ignore deprecation warnings for build with Scala 2.13 and sbt ### What changes were proposed in this pull request? This PR changes SparkBuild.scala to ignore deprecation warnings for build with Scala 2.13 and sbt. Actually, deprecation warnings are already ignored for Scala 2.12 but string matching logic for deprecation warnings should be changed for Scala 2.13. Currently, if a warning message contains `is deprecated`, it's ignored but some warnings contain "are deprecated` and `will be deprecated`. ``` [error] [warn] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:656: multiarg infix syntax looks\ like a tuple and will be deprecated [error] [warn] if (opt.clOption != null) { childArgs += (opt.clOption, opt.value) } ``` ``` [error] [warn] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:35: view bounds are de\ precated; use an implicit parameter instead. [error] example: instead of `def f[A <% Int](a: A)` use `def f[A](a: A)(implicit ev: A => Int)` [error] [warn] class SequenceFileRDDFunctions[K <% Writable: ClassTag, V <% Writable : ClassTag]( ``` ### Why are the changes needed? Enable to build Spark with Scala 2.13 and sbt. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Build with the following command and confirmed deprecation warnings are not treated as fatal ( Build itself doesn't pass due to another problem). `build/sbt -Pscala-2.13 package` Closes #29741 from sarutak/scala-2.13-deprecated-warning. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-09-14 10:00:15 +09:00
Gengliang Wang	de141a3271	[SPARK-32660][SQL][DOC] Show Avro related API in documentation ### What changes were proposed in this pull request? Currently, the Avro related APIs are missing in the documentation https://spark.apache.org/docs/latest/api/scala/org/apache/spark/index.html . This PR is to: 1. Mark internal Avro related classes as private 2. Show Avro related API in Spark official API documentation ### Why are the changes needed? Better documentation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Build doc and preview: ![image](https://user-images.githubusercontent.com/1097932/90623042-d156ee00-e1ca-11ea-9edd-2c45b3001fd8.png) ![image](https://user-images.githubusercontent.com/1097932/90623047-d451de80-e1ca-11ea-94ba-02921b64d6f1.png) ![image](https://user-images.githubusercontent.com/1097932/90623058-d6b43880-e1ca-11ea-849a-b9ea9efe6527.png) Closes #29476 from gengliangwang/avroAPIDoc. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-08-21 13:12:43 +08:00
Holden Karau	548ac7c4af	[SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling ### What changes were proposed in this pull request? If graceful decommissioning is enabled, Spark's dynamic scaling uses this instead of directly killing executors. ### Why are the changes needed? When scaling down Spark we should avoid triggering recomputes as much as possible. ### Does this PR introduce _any_ user-facing change? Hopefully their jobs run faster or at the same speed. It also enables experimental shuffle service free dynamic scaling when graceful decommissioning is enabled (using the same code as the shuffle tracking dynamic scaling). ### How was this patch tested? For now I've extended the ExecutorAllocationManagerSuite for both core & streaming. Closes #29367 from holdenk/SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling. Lead-authored-by: Holden Karau <hkarau@apple.com> Co-authored-by: Holden Karau <holden@pigscanfly.ca> Signed-off-by: Holden Karau <hkarau@apple.com>	2020-08-12 17:07:18 -07:00
HyukjinKwon	fa184c3308	[SPARK-32408][BUILD] Enable crossPaths back to prevent side effects ### What changes were proposed in this pull request? This PR proposes to enable `corssPaths` back for now to match with the build as it was. It still indeterministically doesn't run JUnit tests given my observation, and this PR basically reverts the partial fix from https://github.com/apache/spark/pull/29057. See also https://github.com/apache/spark/pull/29205 for the full context. ### Why are the changes needed? To prevent the side effects from crossPaths such as SPARK_PREPEND_CLASSES or tests that run conditionally if the test classes are present in PySpark. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested: ```bash build/sbt -Phadoop-2.7 -Phive -Phive-2.3 -Phive-thriftserver -DskipTests clean test:package ./python/run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_dataframe QueryExecutionListenerTests" ``` Closes #29218 from HyukjinKwon/SPARK-32408-1. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-24 08:52:30 -07:00
Gengliang Wang	8896f4af87	Revert "[SPARK-32253][INFRA] Show errors only for the sbt tests of github actions" ### What changes were proposed in this pull request? This reverts commit `026b0b926d`. ### Why are the changes needed? As HyukjinKwon pointed out in https://github.com/apache/spark/pull/29133#issuecomment-663339240, there is no JUnit test report after https://github.com/apache/spark/pull/29133. Let's revert https://github.com/apache/spark/pull/29133 for now and find a better solution to improve the log output later. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GitHub Actions build Closes #29219 from gengliangwang/revertErrorOnly. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-07-24 18:14:19 +08:00
Yuanjian Li	a71233f89d	[SPARK-32389][TESTS] Add all hive.execution suite in the parallel test group ### What changes were proposed in this pull request? Add a new parallel test group for all `hive.execution` suites. ### Why are the changes needed? Base on the tests, it can reduce the Jenkins testing time. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. Closes #28977 from xuanyuanking/parallelTest. Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-23 21:14:36 +09:00
Gengliang Wang	026b0b926d	[SPARK-32253][INFRA] Show errors only for the sbt tests of github actions ### What changes were proposed in this pull request? Make the test result log of github action more readable by showing errors from SBT only. 1. Add "--error" flag to sbt in github action to set the log level as "ERROR" 2. Show only failed test cases in stderr output of github action. According to https://www.scalatest.org/user_guide/using_the_runner, with SBT option `-eNCXEHLOPQMDF ` we can drop all the following events: ``` N - drop TestStarting events C - drop TestSucceeded events X - drop TestIgnored events E - drop TestPending events H - drop SuiteStarting events L - drop SuiteCompleted events O - drop InfoProvided events P - drop ScopeOpened events Q - drop ScopeClosed events R - drop ScopePending events M - drop MarkupProvided events ``` and enable the following two mode: ``` D - show all durations F - show full stack traces ``` ### Why are the changes needed? Currently, the output of github action is very long and we have to scroll down to find the failed test cases. Even more, the log may be truncated. In such a case, we will have to wait until all the jobs are completed and then download all the raw logs. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Before changes, all the warnings in compiling are shown: ![image](https://user-images.githubusercontent.com/1097932/87846810-98ec8900-c887-11ea-913b-164b84df62cd.png) as well as all the passed and ignored test cases: ![image](https://user-images.githubusercontent.com/1097932/87846834-ca655480-c887-11ea-9c29-977f802e4c82.png) After changes, sbt test only shows the summary for a successful job: ![image](https://user-images.githubusercontent.com/1097932/87846961-e74e5780-c888-11ea-82d5-cf1da1740181.png) ![image](https://user-images.githubusercontent.com/1097932/87745273-5735e280-c7a2-11ea-8ac9-b4b0e3cb458d.png) If there is a test failure, a full stack track is shown as well as a test failure summary at the end of test log: ![image](https://user-images.githubusercontent.com/1097932/87751143-3aa1a680-c7b2-11ea-9d09-52637a322270.png) ![image](https://user-images.githubusercontent.com/1097932/87752404-1f846600-c7b5-11ea-8106-8ddaf3cc3f7e.png) Closes #29133 from gengliangwang/shortLog. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-19 12:00:23 +09:00
Erik Krogen	cf22d947fb	[SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature ### What changes were proposed in this pull request? This PR will remove references to these "blacklist" and "whitelist" terms besides the blacklisting feature as a whole, which can be handled in a separate JIRA/PR. This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and most quite self-contained. ### Why are the changes needed? As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. ### Does this PR introduce _any_ user-facing change? In the test file `HiveQueryFileTest`, a developer has the ability to specify the system property `spark.hive.whitelist` to specify a list of Hive query files that should be tested. This system property has been renamed to `spark.hive.includelist`. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests. ### How was this patch tested? Existing tests should be suitable since no behavior changes are expected as a result of this PR. Closes #28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists. Authored-by: Erik Krogen <ekrogen@linkedin.com> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-07-15 11:40:55 -05:00
HyukjinKwon	b84ed4146d	[SPARK-32245][INFRA] Run Spark tests in Github Actions ### What changes were proposed in this pull request? This PR aims to run the Spark tests in Github Actions. To briefly explain the main idea: - Reuse `dev/run-tests.py` with SBT build - Reuse the modules in `dev/sparktestsupport/modules.py` to test each module - Pass the modules to test into `dev/run-tests.py` directly via `TEST_ONLY_MODULES` environment variable. For example, `pyspark-sql,core,sql,hive`. - `dev/run-tests.py` _does not_ take the dependent modules into account but solely the specified modules to test. Another thing to note might be `SlowHiveTest` annotation. Running the tests in Hive modules takes too much so the slow tests are extracted and it runs as a separate job. It was extracted from the actual elapsed time in Jenkins: ![Screen Shot 2020-07-09 at 7 48 13 PM](https://user-images.githubusercontent.com/6477701/87050238-f6098e80-c238-11ea-9c4a-ab505af61381.png) So, Hive tests are separated into to jobs. One is slow test cases, and the other one is the other test cases. _Note that_ the current GitHub Actions build virtually copies what the default PR builder on Jenkins does (without other profiles such as JDK 11, Hadoop 2, etc.). The only exception is Kinesis https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR23 ### Why are the changes needed? Last week and onwards, the Jenkins machines became very unstable for many reasons: - Apparently, the machines became extremely slow. Almost all tests can't pass. - One machine (worker 4) started to have the corrupt `.m2` which fails the build. - Documentation build fails time to time for an unknown reason in Jenkins machine specifically. This is disabled for now at https://github.com/apache/spark/pull/29017. - Almost all PRs are basically blocked by this instability currently. The advantages of using Github Actions: - To avoid depending on few persons who can access to the cluster. - To reduce the elapsed time in the build - we could split the tests (e.g., SQL, ML, CORE), and run them in parallel so the total build time will significantly reduce. - To control the environment more flexibly. - Other contributors can test and propose to fix Github Actions configurations so we can distribute this build management cost. Note that: - The current build in Jenkins takes _more than 7 hours_. With Github actions it takes _less than 2 hours_ - We can now control the environments especially for Python easily. - The test and build look more stable than the Jenkins'. ### Does this PR introduce _any_ user-facing change? No, dev-only change. ### How was this patch tested? Tested at https://github.com/HyukjinKwon/spark/pull/4 Closes #29057 from HyukjinKwon/migrate-to-github-actions. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-11 13:09:06 -07:00
Shixiong Zhu	c8779d9dfc	[SPARK-32256][SQL][TEST-HADOOP2.7] Force to initialize Hadoop VersionInfo in HiveExternalCatalog ### What changes were proposed in this pull request? Force to initialize Hadoop VersionInfo in HiveExternalCatalog to make sure Hive can get the Hadoop version when using the isolated classloader. ### Why are the changes needed? This is a regression in Spark 3.0.0 because we switched the default Hive execution version from 1.2.1 to 2.3.7. Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars to access Hive Metastore. These jars are loaded by the isolated classloader. Because we also share Hadoop classes with the isolated classloader, the user doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which means when we are using the isolated classloader, hadoop-common jar is not available in this case. If Hadoop VersionInfo is not initialized before we switch to the isolated classloader, and we try to initialize it using the isolated classloader (the current thread context classloader), it will fail and report `Unknown` which causes Hive to throw the following exception: ``` java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format) at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:219) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:67) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217) at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204) at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:331) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292) at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:128) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71) at org.apache.spark.sql.hive.client.HadoopVersionInfoSuite.$anonfun$new$1(HadoopVersionInfoSuite.scala:63) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) ``` Technically, This is indeed an issue of Hadoop VersionInfo which has been fixed: https://issues.apache.org/jira/browse/HADOOP-14067. But since we are still supporting old Hadoop versions, we should fix it. Why this issue starts to happen in Spark 3.0.0? In Spark 2.4.x, we use Hive 1.2.1 by default. It will trigger `VersionInfo` initialization in the static codes of `Hive` class. This will happen when we load `HiveClientImpl` class because `HiveClientImpl.clent` method refers to `Hive` class. At this moment, the thread context classloader is not using the isolcated classloader, so it can access hadoop-common jar on the classpath and initialize it correctly. In Spark 3.0.0, we use Hive 2.3.7. The static codes of `Hive` class are not accessing `VersionInfo` because of the change in https://issues.apache.org/jira/browse/HIVE-11657. Instead, accessing `VersionInfo` happens when creating a `Hive` object (See the above stack trace). This happens here https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L260. But we switch to the isolated classloader before calling `HiveClientImpl.client` (See https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L283). This is exactly what I mentioned above: `If Hadoop VersionInfo is not initialized before we switch to the isolated classloader, and we try to initialize it using the isolated classloader (the current thread context classloader), it will fail` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The new regression test added in this PR. Note that the new UT doesn't fail with the default profiles (-Phadoop-3.2) because it's already fixed at Hadoop 3.1. Please use the following to verify this. ``` build/sbt -Phadoop-2.7 -Phive "hive/testOnly *.HadoopVersionInfoSuite" ``` Closes #29059 from zsxwing/SPARK-32256. Authored-by: Shixiong Zhu <zsxwing@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-10 21:14:29 +09:00
Ali Smesseim	8b0a54e6ff	[SPARK-32057][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] ExecuteStatement: cancel and close should not transiently ERROR ### What changes were proposed in this pull request? #28671 introduced a change where the order in which CANCELED state for SparkExecuteStatementOperation is set was changed. Before setting the state to CANCELED, `cleanup()` was called which kills the jobs, causing an exception to be thrown inside `execute()`. This causes the state to transiently become ERROR before being set to CANCELED. This PR fixes the order. ### Why are the changes needed? Bug: wrong operation state is set. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test in SparkExecuteStatementOperationSuite.scala. Closes #28912 from alismess-db/execute-statement-operation-cleanup-order. Authored-by: Ali Smesseim <ali.smesseim@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-08 09:28:16 +09:00
Kent Yao	abc8ccc37b	[SPARK-31926][SQL][TESTS][FOLLOWUP][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber ### What changes were proposed in this pull request? This PR brings https://github.com/apache/spark/pull/28751 back - It once reverted by `4a25200` because of inevitable maven test failure - See related updates in this followup `a0187cd6b5` - And reverted again because of the flakiness of the added unit tests - In this PR, The flakiness reason found is caused by the hive metastore connection that the SparkSQLCLIService trying to create which turns out is unnecessary at all. This metastore client points to a dummy metastore server only. - Also, add some cleanups for SharedThriftServer trait in before and after to prevent its configurations being polluted or polluting others ### Why are the changes needed? fix flaky test ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? passing sbt and maven tests Closes #28835 from yaooqinn/SPARK-31926-F. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-19 05:58:54 +00:00
Dongjoon Hyun	75afd88904	Revert "[SPARK-31926][SQL][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber" This reverts commit `a0187cd6b5`.	2020-06-15 19:04:23 -07:00
Kent Yao	a0187cd6b5	[SPARK-31926][SQL][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber ### What changes were proposed in this pull request? This PR brings `02f32cfae4` back which reverted by `4a25200cd7` because of maven test failure diffs newly made: 1. add a missing log4j file to test resources 2. Call `SessionState.detachSession()` to clean the thread local one in `afterAll`. 3. Not use dedicated JVMs for sbt test runner too ### Why are the changes needed? fix the maven test ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add new tests Closes #28797 from yaooqinn/SPARK-31926-NEW. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-15 06:10:24 +00:00
Kousuke Saruta	610acb2fe4	[SPARK-31644][BUILD][FOLLOWUP] Make Spark's guava version configurable from the command line for sbt ### What changes were proposed in this pull request? This PR proposes to support guava version configurable from command line for sbt. ### Why are the changes needed? #28455 added the configurability for Maven but not for sbt. sbt is usually faster than Maven so it's useful for developers. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I confirmed the guava version is changed with the following commands. ``` $ build/sbt "inspect tree clean" \| grep guava [info] +-spark/:dependencyOverrides = Set(com.google.guava:guava:14.0.1, xerces:xercesImpl:2.12.0, jline:jline:2.14.6, org.apache.avro:avro:1.8.2) ``` ``` $ build/sbt -Dguava.version=25.0-jre "inspect tree clean" \| grep guava [info] +-spark/:dependencyOverrides = Set(com.google.guava:guava:25.0-jre, xerces:xercesImpl:2.12.0, jline:jline:2.14.6, org.apache.avro:avro:1.8.2) ``` Closes #28822 from sarutak/guava-version-for-sbt. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-13 19:04:33 -07:00
Dongjoon Hyun	4a25200cd7	Revert "[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" This reverts commit `02f32cfae4`.	2020-06-10 17:21:03 -07:00
Kent Yao	02f32cfae4	[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber ### What changes were proposed in this pull request? When` org.apache.spark.sql.hive.thriftserver.HiveThriftServer2#startWithContext` called, it starts `ThriftCLIService` in the background with a new Thread, at the same time we call `ThriftCLIService.getPortNumber,` we might not get the bound port if it's configured with 0. This PR moves the TServer/HttpServer initialization code out of that new Thread. ### Why are the changes needed? Fix concurrency issue, improve test robustness. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? add new tests Closes #28751 from yaooqinn/SPARK-31926. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-09 16:49:40 +00:00
Kousuke Saruta	d3eba5bc8c	[SPARK-31756][WEBUI] Add real headless browser support for UI test ### What changes were proposed in this pull request? This PR mainly adds two things. 1. Real headless browser support for UI test 2. A test suite using headless Chrome as one instance of those browsers. Also, for environment where Chrome and Chrome driver is not installed, `ChromeUITest` tag is added to filter out the test suite. By default, test suites with `ChromeUITest` is disabled. ### Why are the changes needed? In the current master, there are two problems for UI test. 1. Lots of tests especially JavaScript related ones are done manually. Appearance is better to be confirmed by our eyes but logic should be tested by test cases ideally. 2. Compared to the real web browsers, HtmlUnit doesn't seem to support JavaScript enough. I added a JavaScript related test before for SPARK-31534 using HtmlUnit which is simple library based headless browser for test. The test I added works somehow but some JavaScript related error is shown in unit-tests.log. ``` ======= EXCEPTION START ======== Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException] com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null\|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807) at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426) at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157) at java.lang.Thread.run(Thread.java:748) Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null\|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889) ... 10 more JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null\|function)". == CALLING JAVASCRIPT == function () { throw e; } ======= EXCEPTION END ======== ``` I tried to upgrade HtmlUnit to 2.40.0 but what is worse, the test become not working even though it works on real browsers like Chrome, Safari and Firefox without error. ``` [info] UISeleniumSuite: [info] - SPARK-31534: text for tooltip should be escaped * FAILED * (17 seconds, 745 milliseconds) [info] The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1) ``` To resolve those problems, it's better to support headless browser for UI test. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I tested with following patterns. Both Chrome and Chrome driver should be installed to test. 1. sbt / with default excluded tags (ChromeUISeleniumSuite is expected to be skipped and SQLQueryTestSuite is expected to succeed) `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite" 2. sbt / overwrite default excluded tags as empty string (Both suites are expected to succeed) `build/sbt -Dtest.default.exclude.tags= -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite" 3. sbt / set `test.exclude.tags` to `org.apache.spark.tags.ExtendedSQLTest` (Both suites are expected to be skipped) `build/sbt -Dtest.exclude.tags=org.apache.spark.tags.ExtendedSQLTest -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite org.apache.spark.sql.SQLQueryTestSuite" 4. Maven / with default excluded tags (ChromeUISeleniumSuite is expected to be skipped and SQLQueryTestSuite is expected to succeed) `build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test` 5. Maven / overwrite default excluded tags as empty string (Both suites are expected to succeed) `build/mvn -Dtest.default.exclude.tags= -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test` 6. Maven / set `test.exclude.tags` to `org.apache.spark.tags.ExtendedSQLTest` (Both suites are expected to be skipped) `build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ExtendedSQLTest -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite,org.apache.spark.sql.SQLQueryTestSuite test` Closes #28627 from sarutak/real-headless-browser-support-take2. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 10:41:29 -07:00
Dongjoon Hyun	625abca9db	[SPARK-31858][BUILD] Upgrade commons-io to 2.5 in Hadoop 3.2 profile ### What changes were proposed in this pull request? This PR aims to upgrade `commons-io` from 2.4 to 2.5 for Apache Spark 3.1. ### Why are the changes needed? Since Hadoop 3.1, `commons-io` 2.5 is used. - https://issues.apache.org/jira/browse/HADOOP-15261 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with Hadoop-3.2 profile. Maven dependency is verified via `test-dependencies.sh` automatically. SBT dependency can be verified like the following manually. ``` build/sbt -Phadoop-3.2 "core/dependencyTree" \| grep commons-io:commons-io \| head -n1 [info] \| \| +-commons-io:commons-io:2.5 ``` Closes #28665 from dongjoon-hyun/SPARK-31858. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 07:46:53 -07:00
Dongjoon Hyun	6180028a37	[SPARK-31547][BUILD] Upgrade Genjavadoc to 0.16 ### What changes were proposed in this pull request? This PR aims to upgrade Genjavadoc to 0.16. ### Why are the changes needed? Although we skipped Scala 2.12.11, this brings 2.12.11 official support and better 2.12.12 compatibility. - https://github.com/lightbend/genjavadoc/commits/v0.16 ### Does this PR introduce any user-facing change? No. (The generated doc is the same) ### How was this patch tested? Build with 0.15 and 0.16. ``` $ SKIP_PYTHONDOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll build ``` Compare the result. The generated doc is identical. ``` $ diff -r _site_0.15 _site_0.16 \| grep -v '^diff -r' \| grep -v 'Generated by javadoc' \| sort \| uniq --- 5c5 ``` Closes #28321 from dongjoon-hyun/SPARK-31547. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>	2020-04-24 12:13:10 +09:00
Kent Yao	336621e277	[SPARK-31258][BUILD] Pin the avro version in SBT ### What changes were proposed in this pull request? add arvo dep in SparkBuild ### Why are the changes needed? fix sbt unidoc like https://github.com/apache/spark/pull/28017#issuecomment-603828597 ```scala [warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list [warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list [info] Main Scala API documentation to /home/jenkins/workspace/SparkPullRequestBuilder6/target/scala-2.12/unidoc... [info] Main Java API documentation to /home/jenkins/workspace/SparkPullRequestBuilder6/target/javaunidoc... [error] /home/jenkins/workspace/SparkPullRequestBuilder6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData [error] writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema)) [error] ^ [info] No documentation generated with unsuccessful compiler run [error] one error found ``` ### Does this PR introduce any user-facing change? no ### How was this patch tested? pass jenkins and verify manually with `sbt dependencyTree` ```scala kentyaohulk  ~/spark   dep  build/sbt dependencyTree \| grep avro \| grep -v Resolving [info] +-org.apache.avro:avro-mapred:1.8.2 [info] \| +-org.apache.avro:avro-ipc:1.8.2 [info] \| \| +-org.apache.avro:avro:1.8.2 [info] +-org.apache.avro:avro:1.8.2 [info] \| \| +-org.apache.avro:avro:1.8.2 [info] org.apache.spark:spark-avro_2.12:3.1.0-SNAPSHOT [S] [info] \| \| \| +-org.apache.avro:avro-mapred:1.8.2 [info] \| \| \| \| +-org.apache.avro:avro-ipc:1.8.2 [info] \| \| \| \| \| +-org.apache.avro:avro:1.8.2 [info] \| \| \| +-org.apache.avro:avro:1.8.2 [info] \| \| \| \| \| +-org.apache.avro:avro:1.8.2 ``` Closes #28020 from yaooqinn/dep. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-26 10:48:11 +09:00
Gabor Somogyi	bf342bafa8	[SPARK-30541][TESTS] Implement KafkaDelegationTokenSuite with testRetry ### What changes were proposed in this pull request? `KafkaDelegationTokenSuite` has been ignored because showed flaky behaviour. In this PR I've changed the approach how the test executed and turning it on again. This PR contains the following: * The test runs in separate JVM in order to avoid modified security context * The body of the test runs in `testRetry` which reties if failed * Additional logs to analyse possible failures * Enhanced clean-up code ### Why are the changes needed? `KafkaDelegationTokenSuite ` is ignored. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Executed the test in loop 1k+ times in jenkins (locally much harder to reproduce). Closes #27877 from gaborgsomogyi/SPARK-30541. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-21 18:59:29 -07:00
Prashant Sharma	3b6da36cd6	[SPARK-31120][BUILD] Support enabling maven profiles for importing vi… …a sbt on Intellij IDEA. ### What changes were proposed in this pull request? Read from java property "sbt.maven.profiles", the maven profiles to be enabled while importing to intellij IDEA via SBT. ### Why are the changes needed? Without this change one needs to set an os-wide environment variable `SBT_MAVEN_PROFILES`, on mac it is even trickier (I have not figured out, what can be done). ### Does this PR introduce any user-facing change? None ### How was this patch tested? Manually tested by applying multiple profiles or a single profile. Please see the attached images to see the steps. <img width="802" alt="Screenshot 2020-03-11 at 4 09 57 PM" src="https://user-images.githubusercontent.com/992952/76411667-46223280-63b8-11ea-9a77-dc014b66d48b.png"> <img width="867" alt="Screenshot 2020-03-11 at 4 18 09 PM" src="https://user-images.githubusercontent.com/992952/76411676-4ae6e680-63b8-11ea-895d-ed9d6cc223c5.png"> Closes #27878 from ScrapCodes/SPARK-31120/idea-load-maven-profiles. Authored-by: Prashant Sharma <prashsh1@in.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-15 12:39:46 -05:00
Dongjoon Hyun	972e23d181	[SPARK-31130][BUILD] Use the same version of `commons-io` in SBT ### What changes were proposed in this pull request? This PR (SPARK-31130) aims to pin `Commons IO` version to `2.4` in SBT build like Maven build. ### Why are the changes needed? [HADOOP-15261](https://issues.apache.org/jira/browse/HADOOP-15261) upgraded `commons-io` from 2.4 to 2.5 at Apache Hadoop 3.1. In `Maven`, Apache Spark always uses `Commons IO 2.4` based on `pom.xml`. ``` $ git grep commons-io.version pom.xml: <commons-io.version>2.4</commons-io.version> pom.xml: <version>${commons-io.version}</version> ``` However, `SBT` choose `2.5`. branch-3.0 ``` $ build/sbt -Phadoop-3.2 "core/dependencyTree" \| grep commons-io:commons-io \| head -n1 [info] \| \| +-commons-io:commons-io:2.5 ``` branch-2.4 ``` $ build/sbt -Phadoop-3.1 "core/dependencyTree" \| grep commons-io:commons-io \| head -n1 [info] \| \| +-commons-io:commons-io:2.5 ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with `[test-hadoop3.2]` (the default PR Builder is `SBT`) and manually do the following locally. ``` build/sbt -Phadoop-3.2 "core/dependencyTree" \| grep commons-io:commons-io \| head -n1 ``` Closes #27886 from dongjoon-hyun/SPARK-31130. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-12 09:06:29 -07:00
HyukjinKwon	5b3277f4fc	[SPARK-30994][BUILD][FOLLOW-UP] Change scope of xml-apis to include it and add xerces in SBT as dependency override ### What changes were proposed in this pull request? This PR propose 1. Explicitly include xml-apis. xml-apis is already the part of xerces 2.12.0 (https://repo1.maven.org/maven2/xerces/xercesImpl/2.12.0/xercesImpl-2.12.0.pom). However, we're excluding it by setting `scope` to `test`. This seems causing `spark-shell`, built from Maven, to fail. Seems like previously xml-apis wasn't reached for some reasons but after we upgrade, it seems requiring. Therefore, this PR proposes to include it. 2. Pins `xerces` version in SBT as well. Seems this dependency is resolved differently from Maven. Note that Hadoop 3 does not looks requiring this as they replaced xerces as of [HDFS-12221](https://issues.apache.org/jira/browse/HDFS-12221). ### Why are the changes needed? To make `spark-shell` working from Maven build, and uses the same xerces version. ### Does this PR introduce any user-facing change? No, it's master only. ### How was this patch tested? 1. ```bash ./build/mvn -DskipTests -Psparkr -Phive clean package ./bin/spark-shell ``` Before: ``` Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source) at org.apache.xerces.xinclude.XIncludeHandler.startDocument(Unknown Source) at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source) at org.apache.xerces.impl.XMLDocumentScannerImpl.startEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.startDocumentParsing(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2482) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2470) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2541) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2494) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2407) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115) at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 42 more ``` After: ``` ... Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_202) Type in expressions to have them evaluated. Type :help for more information. scala> ``` 2. ``` ./build/sbt dependencyTree -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Phive ./build/sbt dependencyTree -Phadoop-3.2 -Phive-2.3 -Phive-thriftserver -Phive ``` Closes #27808 from HyukjinKwon/SPARK-30994. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-06 09:39:02 +09:00
Josh Rosen	f152d2a0a8	[SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central ### What changes were proposed in this pull request? This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html This patch updates our build files to use these new URLs. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing build + tests. Closes #27688 from JoshRosen/update-gcs-mirror-url. Authored-by: Josh Rosen <joshrosen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-02-25 17:04:13 +09:00
Yuanjian Li	a5efbb284e	[SPARK-30809][SQL] Review and fix issues in SQL API docs ### What changes were proposed in this pull request? - Add missing `since` annotation. - Don't show classes under `org.apache.spark.sql.dynamicpruning` package in API docs. - Fix the scope of `xxxExactNumeric` to remove it from the API docs. ### Why are the changes needed? Avoid leaking APIs unintentionally in Spark 3.0.0. ### Does this PR introduce any user-facing change? No. All these changes are to avoid leaking APIs unintentionally in Spark 3.0.0. ### How was this patch tested? Manually generated the API docs and verified the above issues have been fixed. Closes #27560 from xuanyuanking/SPARK-30809. Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-02-21 17:03:22 +08:00
HyukjinKwon	2bc765a831	[SPARK-30756][SQL] Fix `ThriftServerWithSparkContextSuite` on spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3 ### What changes were proposed in this pull request? This PR tries #26710 (comment) way to fix the test. ### Why are the changes needed? To make the tests pass. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Jenkins will test first, and then `on spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3` will test it out. Closes #27513 from HyukjinKwon/test-SPARK-30756. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org> (cherry picked from commit `8efe367a4e`) Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-02-11 15:50:16 +09:00
Shixiong Zhu	e2ebca733c	[SPARK-30779][SS] Fix some API issues found when reviewing Structured Streaming API docs ### What changes were proposed in this pull request? - Fix the scope of `Logging.initializeForcefully` so that it doesn't appear in subclasses' public methods. Right now, `sc.initializeForcefully(false, false)` is allowed to called. - Don't show classes under `org.apache.spark.internal` package in API docs. - Add missing `since` annotation. - Fix the scope of `ArrowUtils` to remove it from the API docs. ### Why are the changes needed? Avoid leaking APIs unintentionally in Spark 3.0.0. ### Does this PR introduce any user-facing change? No. All these changes are to avoid leaking APIs unintentionally in Spark 3.0.0. ### How was this patch tested? Manually generated the API docs and verified the above issues have been fixed. Closes #27528 from zsxwing/audit-ss-apis. Authored-by: Shixiong Zhu <zsxwing@gmail.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2020-02-10 14:26:14 -08:00
HyukjinKwon	6f4703e22e	[SPARK-30690][DOCS][BUILD] Add CalendarInterval into API documentation ### What changes were proposed in this pull request? We should also expose it in documentation as we marked it as unstable API as of SPARK-30547 Note that, seems Javadoc -> Scaladoc doesn't work but this PR does not target to fix. ### Why are the changes needed? To show the documentation of API. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually built the docs via `jykill serve` under `docs` directory: ![Screen Shot 2020-01-31 at 4 04 15 PM](https://user-images.githubusercontent.com/6477701/73519315-12143300-4444-11ea-9260-070c9f672dde.png) Closes #27412 from HyukjinKwon/SPARK-30547. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-31 22:50:01 +09:00
HyukjinKwon	cd9ccdc0ac	[SPARK-30601][BUILD] Add a Google Maven Central as a primary repository ### What changes were proposed in this pull request? This PR proposes to address four things. Three issues and fixes were a bit mixed so this PR sorts it out. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Adding-Maven-Central-mirror-from-Google-to-the-build-td28728.html for the discussion in the mailing list. 1. Add the Google Maven Central mirror (GCS) as a primary repository. This will not only help development more stable but also in order to make Github Actions build (where it is always required to download jars) stable. In case of Jenkins PR builder, it wouldn't be affected too much as it uses the pre-downloaded jars under `.m2`. - Google Maven Central seems stable for heavy workload but not synced very quickly (e.g., new release is missing) - Maven Central (default) seems less stable but synced quickly. We already added this GCS mirror as a default additional remote repository at SPARK-29175. So I don't see an issue to add it as a repo. `abf759a91e/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (L2111-L2118)` 2. Currently, we have the hard-corded repository in [`sbt-pom-reader`](https://github.com/JoshRosen/sbt-pom-reader/blob/v1.0.0-spark/src/main/scala/com/typesafe/sbt/pom/MavenPomResolver.scala#L32) and this seems overwriting Maven's existing resolver by the same ID `central` with `http://` when initially the pom file is ported into SBT instance. This uses `http://` which latently Maven Central disallowed (see https://github.com/apache/spark/pull/27242) My speculation is that we just need to be able to load plugin and let it convert POM to SBT instance with another fallback repo. After that, it _seems_ using `central` with `https` properly. See also https://github.com/apache/spark/pull/27307#issuecomment-576720395. I double checked that we use `https` properly from the SBT build as well: ``` [debug] downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom ... [debug] public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom [debug] public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom.sha1 ``` This was fixed by adding the same repo (https://github.com/apache/spark/pull/27281), `central_without_mirror`, which is a bit awkward. Instead, this PR adds GCS as a main repo, and community Maven central as a fallback repo. So, presumably the community Maven central repo is used when the plugin is loaded as a fallback. 3. While I am here, I fix another issue. Github Action at https://github.com/apache/spark/pull/27279 is being failed. The reason seems to be scalafmt 1.0.3 is in Maven central but not in GCS. ``` org.apache.maven.plugin.PluginResolutionException: Plugin org.antipathy:mvn-scalafmt_2.12:1.0.3 or one of its dependencies could not be resolved: Could not find artifact org.antipathy:mvn-scalafmt_2.12🫙1.0.3 in google-maven-central (https://maven-central.storage-download.googleapis.com/repos/central/data/) at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve (DefaultPluginDependenciesResolver.java:131) ``` `mvn-scalafmt` exists in Maven central: ```bash $ curl https://repo.maven.apache.org/maven2/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom ``` ```xml <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> ... ``` whereas not in GCS mirror: ```bash $ curl https://maven-central.storage-download.googleapis.com/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom ``` ```xml <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: maven-central/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom</Details></Error>% ``` In this PR, simply make both repos accessible by adding to `pluginRepositories`. 4. Remove the workarounds in Github Actions to switch mirrors because now we have same repos in the same order (Google Maven Central first, and Maven Central second) ### Why are the changes needed? To make the build and Github Action more stable. ### Does this PR introduce any user-facing change? No, dev only change. ### How was this patch tested? I roughly checked local and PR against my fork (https://github.com/HyukjinKwon/spark/pull/2 and https://github.com/HyukjinKwon/spark/pull/3). Closes #27307 from HyukjinKwon/SPARK-30572. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-23 16:00:21 +09:00
Kousuke Saruta	a3357dfcca	[SPARK-30544][BUILD] Upgrade the version of Genjavadoc to 0.15 ### What changes were proposed in this pull request? Upgrade the version of Genjavadoc from 0.14 to 0.15. ### Why are the changes needed? To enable to build for Scala 2.13.1. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? I confirmed there is no dependency error related to genjavadoc by manual build. Also, I generated javadoc by `LANG=C build/sbt -Pkinesis-asl -Pyarn -Pkubernetes -Phive-thriftserver unidoc` for both code with/without this change and did `diff -r` target/javadoc. Closes #27255 from sarutak/upgrade-genjavadoc. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-18 00:15:49 -08:00
Sean Owen	fac6b9bde8	Revert [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies This reverts commit `709387d660`. See https://issues.apache.org/jira/browse/SPARK-27300?focusedCommentId=16990048&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16990048 and previous mailing list discussions. ### What changes were proposed in this pull request? Revert the addition of skeleton graph API modules for Spark 3.0. ### Why are the changes needed? It does not appear that content will be added to the module for Spark 3, so I propose avoiding committing to the modules, which are no-ops now, in the upcoming major 3.0 release. ### Does this PR introduce any user-facing change? No, the modules were not released. ### How was this patch tested? Existing tests, but mostly N/A. Closes #26928 from srowen/Revert27300. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-17 09:06:23 -08:00
HyukjinKwon	a57bbf2ee0	[SPARK-30164][TESTS][DOCS] Exclude Hive domain in Unidoc build explicitly ### What changes were proposed in this pull request? This PR proposes to exclude Unidoc checking in Hive domain. We don't publish this as a part of Spark documentation (see also https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30) and most of them are copy of Hive thrift server so that we can officially use Hive 2.3 release. It doesn't much make sense to check the documentation generation against another domain, and that we don't use in documentation publish. ### Why are the changes needed? To avoid unnecessary computation. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? By Jenkins: ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive -Pmesos -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl -Pyarn test:package streaming-kinesis-asl-assembly/assembly ... ======================================================================== Building Unidoc API Documentation ======================================================================== [info] Building Spark unidoc using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive -Pmesos -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl -Pyarn unidoc ... [info] Main Java API documentation successful. ... [info] Main Scala API documentation successful. ``` Closes #26800 from HyukjinKwon/do-not-merge. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-09 13:15:49 +09:00
shahid	91b83de417	[SPARK-30086][SQL][TESTS] Run HiveThriftServer2ListenerSuite on a dedicated JVM to fix flakiness ### What changes were proposed in this pull request? This PR tries to fix flakiness in `HiveThriftServer2ListenerSuite` by using a dedicated JVM (after we switch to Hive 2.3 by default in PR builders). Likewise in `4a73bed318`, there's no explicit evidence for this fix. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114653/testReport/org.apache.spark.sql.hive.thriftserver.ui/HiveThriftServer2ListenerSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ ``` sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.LinkageError: loader constraint violation: loader (instance of net/bytebuddy/dynamic/loading/MultipleParentClassLoader) previously initiated loading for a different type with name "org/apache/hive/service/ServiceStateChangeListener" at org.mockito.codegen.HiveThriftServer2$MockitoMock$1974707245.<clinit>(Unknown Source) at sun.reflect.GeneratedSerializationConstructorAccessor164.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.objenesis.instantiator.sun.SunReflectionFactoryInstantiator.newInstance(SunReflectionFactoryInstantiator.java:48) at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) at org.mockito.internal.creation.instance.ObjenesisInstantiator.newInstance(ObjenesisInstantiator.java:19) at org.mockito.internal.creation.bytebuddy.SubclassByteBuddyMockMaker.createMock(SubclassByteBuddyMockMaker.java:47) at org.mockito.internal.creation.bytebuddy.ByteBuddyMockMaker.createMock(ByteBuddyMockMaker.java:25) at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:35) at org.mockito.internal.MockitoCore.mock(MockitoCore.java:62) at org.mockito.Mockito.mock(Mockito.java:1908) at org.mockito.Mockito.mock(Mockito.java:1880) at org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite.createAppStatusStore(HiveThriftServer2ListenerSuite.scala:156) at org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite.$anonfun$new$3(HiveThriftServer2ListenerSuite.scala:47) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) ``` ### Why are the changes needed? To make test cases more robust. ### Does this PR introduce any user-facing change? No (dev only). ### How was this patch tested? Jenkins build. Closes #26720 from shahidki31/mock. Authored-by: shahid <shahidki31@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-30 20:30:04 +09:00
HyukjinKwon	4a73bed318	[SPARK-29991][INFRA] Support Hive 1.2 and Hive 2.3 (default) in PR builder ### What changes were proposed in this pull request? Currently, Apache Spark PR Builder using `hive-1.2` for `hadoop-2.7` and `hive-2.3` for `hadoop-3.2`. This PR aims to support - `[test-hive1.2]` in PR builder - `[test-hive2.3]` in PR builder to be consistent and independent of the default profile - After this PR, all PR builders will use Hive 2.3 by default (because Spark uses Hive 2.3 by default as of `c98e5eb339`) - Use default profile in AppVeyor build. Note that this was reverted due to unexpected test failure at `ThriftServerPageSuite`, which was investigated in https://github.com/apache/spark/pull/26706 . This PR fixed it by letting it use their own forked JVM. There is no explicit evidence for this fix and it was just my speculation, and thankfully it fixed at least. ### Why are the changes needed? This new tag allows us more flexibility. ### Does this PR introduce any user-facing change? No. (This is a dev-only change.) ### How was this patch tested? Check the Jenkins triggers in this PR. Default: ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Pmesos -Pspark-ganglia-lgpl -Phadoop-cloud -Phive -Pkubernetes -Pkinesis-asl -Pyarn test:package streaming-kinesis-asl-assembly/assembly ``` `[test-hive1.2][test-hadoop3.2]`: ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using SBT with these arguments: -Phadoop-3.2 -Phive-1.2 -Phadoop-cloud -Pyarn -Pspark-ganglia-lgpl -Phive -Phive-thriftserver -Pmesos -Pkubernetes -Pkinesis-asl test:package streaming-kinesis-asl-assembly/assembly ``` `[test-maven][test-hive-2.3]`: ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using Maven with these arguments: -Phadoop-2.7 -Phive-2.3 -Pspark-ganglia-lgpl -Pyarn -Phive -Phadoop-cloud -Pkinesis-asl -Pmesos -Pkubernetes -Phive-thriftserver clean package -DskipTests ``` Closes #26710 from HyukjinKwon/SPARK-29991. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-30 12:48:15 +09:00
Dongjoon Hyun	f77c10de38	[SPARK-29923][SQL][TESTS] Set io.netty.tryReflectionSetAccessible for Arrow on JDK9+ ### What changes were proposed in this pull request? This PR aims to add `io.netty.tryReflectionSetAccessible=true` to the testing configuration for JDK11 because this is an officially documented requirement of Apache Arrow. Apache Arrow community documented this requirement at `0.15.0` ([ARROW-6206](https://github.com/apache/arrow/pull/5078)). > #### For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true". > This fixes `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available`. thrown by netty. ### Why are the changes needed? After ARROW-3191, Arrow Java library requires the property `io.netty.tryReflectionSetAccessible` to be set to true for JDK >= 9. After https://github.com/apache/spark/pull/26133, JDK11 Jenkins job seem to fail. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/676/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/677/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/678/ ```scala Previous exception in task: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473) io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243) io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233) io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245) org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with JDK11. Closes #26552 from dongjoon-hyun/SPARK-ARROW-JDK11. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-15 23:58:15 -08:00
Jungtaek Lim (HeartSaVioR)	121510cb7b	[SPARK-29604][SQL][FOLLOWUP][test-hadoop3.2] Let SparkSQLEnvSuite to be run in dedicated JVM ### What changes were proposed in this pull request? This patch addresses CI build issue on sbt Hadoop-3.2 Jenkins job: SparkSQLEnvSuite are failing. Looks like the reason of test failure is the test checks registered listeners from active SparkSession which could be interfered with other test suites running concurrently. If we isolate test suite the problem should be gone. ### Why are the changes needed? CI builds for "spark-master-test-sbt-hadoop-3.2" are failing. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? I've run the single test suite with below command and it passed 3 times sequentially: ``` build/sbt "hive-thriftserver/testOnly *.SparkSQLEnvSuite" -Phadoop-3.2 -Phive-thriftserver ``` so we expect the test suite will pass if we isolate the test suite. Closes #26342 from HeartSaVioR/SPARK-29604-FOLLOWUP. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-31 08:34:39 -07:00
Dongjoon Hyun	df28671800	[SPARK-29282][TESTS] Use the same VM configurations for test/benchmark ### What changes were proposed in this pull request? This PR aims to specify the JDK8 default configurations `-XX:+UseParallelGC -XX:-UseDynamicNumberOfGCThreads` explicitly. As we see in this PR [here](https://github.com/apache/spark/pull/25966/files#diff-12b89b7ee67c63c2254b749c8f8d0694R10), this will make the comparison between JDK8 and JDK11 easier by removing a misleading regression. NOTE THAT THESE JVM CONFS ARE ONLY FOR BENCHMARK COMPARISON, NOT FOR A PRODUCTION ### Why are the changes needed? There exists many JVM-level changes between JDK8 and JDK11. For example, the followings are notable changes and it turns out that especially (1) and (2) shows a misleading regression in our micro-benchmark environment because our microbenchmark uses small VM memory. 1. [JEP 248: Make G1 the Default Garbage Collector](https://bugs.openjdk.java.net/browse/JDK-8073273) JDK9+ 2. [Enable UseDynamicNumberOfGCThreads by default](https://bugs.openjdk.java.net/browse/JDK-8198547) JDK11+ 3. [Change default value of HeapSizePerGCThread](https://bugs.openjdk.java.net/browse/JDK-8200417) JDK11+ ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only JVM configuration change. Manually, run the benchmark. Closes #25966 from dongjoon-hyun/SPARK-29282. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-29 15:11:46 -07:00
Yuming Wang	8c3f27ceb4	[SPARK-28683][BUILD] Upgrade Scala to 2.12.10 ## What changes were proposed in this pull request? This PR upgrade Scala to 2.12.10. Release notes: - Fix regression in large string interpolations with non-String typed splices - Revert "Generate shallower ASTs in pattern translation" - Fix regression in classpath when JARs have 'a.b' entries beside 'a/b' - Faster compiler: 5–10% faster since 2.12.8 - Improved compatibility with JDK 11, 12, and 13 - Experimental support for build pipelining and outline type checking More details: https://github.com/scala/scala/releases/tag/v2.12.10 https://github.com/scala/scala/releases/tag/v2.12.9 ## How was this patch tested? Existing tests Closes #25404 from wangyum/SPARK-28683. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-18 13:30:36 -07:00
Yuming Wang	6e12b585a9	[SPARK-28527][SQL][TEST] Re-run all the tests in SQLQueryTestSuite via Thrift Server ### What changes were proposed in this pull request? This PR build a test framework that directly re-run all the tests in `SQLQueryTestSuite` via Thrift Server. But it's a little different from `SQLQueryTestSuite`: 1. Can not support [UDF testing](`44e607e921/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (L293-L297)`). 2. Can not support `DESC` command and `SHOW` command because `SQLQueryTestSuite` [formatted the output](`1882912cca/sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala (L38-L50)`.). When building this framework, found two bug: [SPARK-28624](https://issues.apache.org/jira/browse/SPARK-28624): `make_date` is inconsistent when reading from table [SPARK-28611](https://issues.apache.org/jira/browse/SPARK-28611): Histogram's height is different found two features that ThriftServer can not support: [SPARK-28636](https://issues.apache.org/jira/browse/SPARK-28636): ThriftServer can not support decimal type with negative scale [SPARK-28637](https://issues.apache.org/jira/browse/SPARK-28637): ThriftServer can not support interval type Also, found two inconsistent behavior: [SPARK-28620](https://issues.apache.org/jira/browse/SPARK-28620): Double type returned for float type in Beeline/JDBC [SPARK-28619](https://issues.apache.org/jira/browse/SPARK-28619): The golden result file is different when tested by `bin/spark-sql` ### Why are the changes needed? Improve the overall test coverage for Thrift Server. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25567 from wangyum/SPARK-28527. Lead-authored-by: Yuming Wang <yumwang@ebay.com> Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-08-26 22:39:57 +09:00
Dongjoon Hyun	f0834d3a7f	Revert "[SPARK-28527][SQL][TEST] Re-run all the tests in SQLQueryTestSuite via Thrift Server" This reverts commit `efbb035902`.	2019-08-18 16:54:24 -07:00
Yuming Wang	efbb035902	[SPARK-28527][SQL][TEST] Re-run all the tests in SQLQueryTestSuite via Thrift Server ## What changes were proposed in this pull request? This PR build a test framework that directly re-run all the tests in `SQLQueryTestSuite` via Thrift Server. But it's a little different from `SQLQueryTestSuite`: 1. Can not support [UDF testing](`44e607e921/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (L293-L297)`). 2. Can not support `DESC` command and `SHOW` command because `SQLQueryTestSuite` [formatted the output](`1882912cca/sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala (L38-L50)`.). When building this framework, found two bug: [SPARK-28624](https://issues.apache.org/jira/browse/SPARK-28624): `make_date` is inconsistent when reading from table [SPARK-28611](https://issues.apache.org/jira/browse/SPARK-28611): Histogram's height is different found two features that ThriftServer can not support: [SPARK-28636](https://issues.apache.org/jira/browse/SPARK-28636): ThriftServer can not support decimal type with negative scale [SPARK-28637](https://issues.apache.org/jira/browse/SPARK-28637): ThriftServer can not support interval type Also, found two inconsistent behavior: [SPARK-28620](https://issues.apache.org/jira/browse/SPARK-28620): Double type returned for float type in Beeline/JDBC [SPARK-28619](https://issues.apache.org/jira/browse/SPARK-28619): The golden result file is different when tested by `bin/spark-sql` ## How was this patch tested? N/A Closes #25373 from wangyum/SPARK-28527. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2019-08-17 19:12:50 -07:00

1 2 3 4 5 ...

702 commits