ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Gengliang Wang	eea7d0037e	[SPARK-36557][DOCS] Update the MAVEN_OPTS in Spark build docs ### What changes were proposed in this pull request? As Jacek Laskowski pointed out in the dev list, there is a StackOverflowError when compiling Spark with the current MAVEN_OPTS in the Spark documentation. We should update it with `-Xss64m` to avoid it. ### Why are the changes needed? Correct the documentation ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual test. The MAVEN_OPTS is consistent with our GitHub Actions build. Closes #33804 from gengliangwang/updateBuildDoc. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `3da0e9500f`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-08-23 09:46:41 +09:00
Kousuke Saruta	d54edf0bde	[SPARK-35758][DOCS] Update the document about building Spark with Hadoop for Hadoop 2.x and 3.x ### What changes were proposed in this pull request? This PR updates the document about building Spark with Hadoop for Hadoop 3.x and Hadoop 3.2. ### Why are the changes needed? The document describes how to build as follows: ``` ./build/mvn -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package ``` But this command fails because the default build settings are for Hadoop 3.x. So, we need to modify the command example. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I confirmed that both of these commands finished successfully. ``` ./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests package ./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -DskipTests package ``` I also built the document and confirmed the result. This is before: ![hadoop-version-before](https://user-images.githubusercontent.com/4736016/122016157-bf020c80-cdfb-11eb-8e74-4840861f8541.png) And this is after: ![hadoop-version-after](https://user-images.githubusercontent.com/4736016/122016188-c75a4780-cdfb-11eb-8427-2f0765e6ff7a.png) Closes #32917 from sarutak/fix-build-doc-with-hadoop. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-06-15 20:19:50 +09:00
Yuming Wang	463daabd5a	[SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9 ### What changes were proposed in this pull request? This pr upgrades built-in Hive to 2.3.9. Hive 2.3.9 changes: - [HIVE-17155] - findConfFile() in HiveConf.java has some issues with the conf path - [HIVE-24797] - Disable validate default values when parsing Avro schemas - [HIVE-24608] - Switch back to get_table in HMS client for Hive 2.3.x - [HIVE-21200] - Vectorization: date column throwing java.lang.UnsupportedOperationException for parquet - [HIVE-21563] - Improve Table#getEmptyTable performance by disabling registerAllFunctionsOnce - [HIVE-19228] - Remove commons-httpclient 3.x usage ### Why are the changes needed? Fix regression caused by AVRO-2035. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test. Closes #32750 from wangyum/SPARK-34512. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2021-06-10 20:44:35 -07:00
Yikun Jiang	85b50d4258	[SPARK-34539][BUILD][INFRA] Remove stand-alone version Zinc server ### What changes were proposed in this pull request? Clean up all Zinc standalone server code and related configuration. ### Why are the changes needed? ![image](https://user-images.githubusercontent.com/1736354/109154790-c1d3e580-77a9-11eb-8cde-835deed6e10e.png) - Zinc is the incremental compiler used to speed up builds. - The scala-maven-plugin is the Maven plugin used by Spark; one of its functions is to integrate Zinc to enable incremental compilation. - Since Spark v3.0.0 ([SPARK-28759](https://issues.apache.org/jira/browse/SPARK-28759)), the scala-maven-plugin has been upgraded to v4.X, which means the Zinc v0.3.13 standalone server is no longer needed. However, we still download, install, and start the standalone Zinc server. We should remove all Zinc standalone server code and all related configuration. See more in [SPARK-34539](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-34539) or the doc [Zinc standalone server is useless after scala-maven-plugin 4.x](https://docs.google.com/document/d/1u4kCHDx7KjVlHGerfmbcKSB0cZo6AD4cBdHSse-SBsM). ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Run any mvn build: ./build/mvn -DskipTests clean package -pl core You can see that incremental compilation is still working: the "scala-maven-plugin:4.3.0:compile (scala-compile-first)" stage prints incremental compilation info, like: ``` [INFO] --- scala-maven-plugin:4.3.0:testCompile (scala-test-compile-first) spark-core_2.12 --- [INFO] Using incremental compilation using Mixed compile order [INFO] Compiler bridge file: /root/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar [INFO] compiler plugin: BasicArtifact(com.github.ghik,silencer-plugin_2.12.10,1.6.0,null) [INFO] Compiling 303 Scala sources and 27 Java sources to /root/spark/core/target/scala-2.12/test-classes ... ``` Closes #31647 from Yikun/cleanup-zinc. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2021-03-01 08:39:38 -06:00
Yuming Wang	c87b0085c9	[SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8 ### What changes were proposed in this pull request? Hive 2.3.8 changes: HIVE-19662: Upgrade Avro to 1.8.2 HIVE-24324: Remove deprecated API usage from Avro HIVE-23980: Shade Guava from hive-exec in Hive 2.3 HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue HIVE-24512: Exclude calcite in packaging. HIVE-22708: Fix for HttpTransport to replace String.equals HIVE-24551: Hive should include transitive dependencies from calcite after shading it HIVE-24553: Exclude calcite from test-jar dependency of hive-exec ### Why are the changes needed? Upgrade Avro and Parquet to latest version. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test add test try to upgrade Parquet to 1.11.1 and Avro to 1.10.1: https://github.com/apache/spark/pull/30517 Closes #30657 from wangyum/SPARK-33696. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2021-01-17 21:54:35 -08:00
Josh Soref	485145326a	[MINOR] Spelling bin core docs external mllib repl ### What changes were proposed in this pull request? This PR intends to fix typos in the sub-modules: * `bin` * `core` * `docs` * `external` * `mllib` * `repl` * `pom.xml` Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618 NOTE: The misspellings have been reported at `706a726f87 (commitcomment-44064356)` ### Why are the changes needed? Misspelled words make it harder to read / understand content. ### Does this PR introduce _any_ user-facing change? There are various fixes to documentation, etc... ### How was this patch tested? No testing was performed Closes #30530 from jsoref/spelling-bin-core-docs-external-mllib-repl. Authored-by: Josh Soref <jsoref@users.noreply.github.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-11-30 13:59:51 +09:00
Kousuke Saruta	005999721f	[SPARK-33046][DOCS] Update how to build doc for Scala 2.13 with sbt ### What changes were proposed in this pull request? This PR fixes the description of how to build Spark for Scala 2.13 with sbt. In the current doc, how to build Spark for Scala 2.13 with sbt is described like: ![scala-2 13-build-before](https://user-images.githubusercontent.com/4736016/94816248-80c3e900-0436-11eb-9bc2-99af5786971a.png) But the build fails with this command because the scala-2.13 profile is not enabled and scala-parallel-collections is absent. ``` [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala:23: object parallel is not a member of package collection ``` The correct command should be: ``` build/sbt -Pscala-2.13 compile ``` ### Why are the changes needed? The build command is wrong. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I checked that `sbt -Pscala-2.13` is correct with the following command: ``` build/sbt -Dscala.version=2.13.3 -Phive -Phive-thriftserver -Pyarn -Pkubernetes compile ``` I also built the modified doc and checked the generated html: ![spark-scala-2 13-build-doc-after](https://user-images.githubusercontent.com/4736016/94869259-f2745500-047f-11eb-89e5-20816f3ed24d.png) Closes #29921 from sarutak/fix-scala-2.13-build-doc. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-10-01 18:01:23 -05:00
Yuming Wang	b11e42663b	[SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7 ### What changes were proposed in this pull request? Hive 2.3.7 fixed these issues: - HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer - HIVE-21980:Parsing time can be high in case of deeply nested subqueries - HIVE-22249: Support Parquet through HCatalog ### Why are the changes needed? Fix CCE during creating HiveMetaStoreClient in JDK11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? - [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840) - [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353) - [x] Manual test with remote hive metastore. Hive side: ``` export JAVA_HOME=/usr/lib/jdk1.8.0_221 export PATH=$JAVA_HOME/bin:$PATH cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6 bin/schematool -dbType derby -initSchema --verbose bin/hive --service metastore ``` Spark side: ``` export JAVA_HOME=/usr/lib/jdk-11.0.3 export PATH=$JAVA_HOME/bin:$PATH build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver export SPARK_PREPEND_CLASSES=true bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083 ``` Closes #28148 from wangyum/SPARK-31381. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-20 13:38:24 -07:00
zero323	298d0a5102	[SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 ### What changes were proposed in this pull request? - Update `testthat` to >= 2.0.0 - Replace `testthat:::run_tests` with `testthat:::test_package_dir` - Add trivial assertions for tests, without any expectations, to avoid skipping. - Update related docs. ### Why are the changes needed? `testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / https://github.com/apache/spark/pull/20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? - Existing CI pipeline: - Windows build on AppVeyor, R 3.6.2, testthat 2.3.1 - Linux build on Jenkins, R 3.1.x, testthat 1.0.2 - Additional builds with testthat 2.3.1 using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64af9e697b3619779857dd820832176b3be3 R 3.4.4 (image digest ec9032f8cf98) ``` docker pull zero323/sparkr-build-sandbox:3.4.4 docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc ``` 3.5.3 (image digest 0b1759ee4d1d) ``` docker pull zero323/sparkr-build-sandbox:3.5.3 docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc ``` and 3.6.2 (image digest 6594c8ceb72f) ``` docker pull zero323/sparkr-build-sandbox:3.6.2 docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc ``` Corresponding [asciicast](https://asciinema.org/) are available as 10.5281/zenodo.3629431 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3629431.svg)](https://doi.org/10.5281/zenodo.3629431) (a bit too large to burden asciinema.org, but can be run locally via `asciinema play`). ---------------------------- Continued from #27328 Closes #27359 from zero323/SPARK-23435. Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-29 10:37:08 +09:00
Yuming Wang	fa47b7faf7	[SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default ### What changes were proposed in this pull request? This PR updates the documentation to reflect that the Hive 2.3 dependency is now the default. ### Why are the changes needed? The documentation is incorrect. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #26919 from wangyum/SPARK-30280. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-21 10:51:28 -08:00
Yuming Wang	e1ee3fb72f	[SPARK-30216][INFRA] Use python3 in Docker release image ### What changes were proposed in this pull request? - Reverts commits `1f94bf4` and `d6be46e` - Switches python to python3 in Docker release image. ### Why are the changes needed? `dev/make-distribution.sh` and `python/setup.py` use python3. https://github.com/apache/spark/pull/26844/files#diff-ba2c046d92a1d2b5b417788bfb5cb5f8L236 https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test: ``` yumwangubuntu-3513086:~/spark$ dev/create-release/do-release-docker.sh -n -d /home/yumwang/spark-release Output directory already exists. Overwrite and continue? [y/n] y Branch [branch-2.4]: master Current branch version is 3.0.0-SNAPSHOT. Release [3.0.0]: 3.0.0-preview2 RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [master]: ASF user [yumwang]: Full name [Yuming Wang]: GPG key [yumwangapache.org]: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338 ================ Release details: BRANCH: master VERSION: 3.0.0-preview2 TAG: v3.0.0-preview2-rc1 NEXT: 3.0.1-SNAPSHOT ASF USER: yumwang GPG KEY: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338 FULL NAME: Yuming Wang E-MAIL: yumwangapache.org ================ Is this info correct [y/n]? y GPG passphrase: ======================== = Building spark-rm image with tag latest... Command: docker build -t spark-rm:latest --build-arg UID=110302528 /home/yumwang/spark/dev/create-release/spark-rm Log file: docker-build.log Building v3.0.0-preview2-rc1; output will be at /home/yumwang/spark-release/output gpg: directory '/home/spark-rm/.gnupg' created gpg: keybox '/home/spark-rm/.gnupg/pubring.kbx' created gpg: /home/spark-rm/.gnupg/trustdb.gpg: trustdb created gpg: key 6E1B4122F6A3A338: public key "Yuming Wang <yumwangapache.org>" imported gpg: key 6E1B4122F6A3A338: secret key imported gpg: Total number processed: 1 gpg: imported: 1 gpg: secret keys read: 1 gpg: secret keys imported: 1 ======================== = Creating release tag v3.0.0-preview2-rc1... Command: /opt/spark-rm/release-tag.sh Log file: tag.log It may take some time for the tag to be synchronized to github. Press enter when you've verified that the new tag (v3.0.0-preview2-rc1) is available. ======================== = Building Spark... Command: /opt/spark-rm/release-build.sh package Log file: build.log ======================== = Building documentation... Command: /opt/spark-rm/release-build.sh docs Log file: docs.log ======================== = Publishing release Command: /opt/spark-rm/release-build.sh publish-release Log file: publish.log ``` Generated doc: ![image](https://user-images.githubusercontent.com/5399861/70693075-a7723100-1cf7-11ea-9f88-9356a02349a1.png) Closes #26848 from wangyum/SPARK-30216. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-13 11:31:31 -08:00
Yuming Wang	eb509968a7	[SPARK-30211][INFRA] Use python3 in make-distribution.sh ### What changes were proposed in this pull request? This PR switches python to python3 in `make-distribution.sh`. ### Why are the changes needed? SPARK-29672 changed this - https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966 ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #26844 from wangyum/SPARK-30211. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-10 23:30:12 -08:00
Dongjoon Hyun	1595e46a4e	[SPARK-30142][TEST-MAVEN][BUILD] Upgrade Maven to 3.6.3 ### What changes were proposed in this pull request? This PR aims to upgrade Maven from 3.6.2 to 3.6.3. ### Why are the changes needed? This will bring bug fixes like the following. - MNG-6759 Maven fails to use <repositories> section from dependency when resolving transitive dependencies in some cases - MNG-6760 ExclusionArtifactFilter result invalid when wildcard exclusion is followed by other exclusions The following is the full release note. - https://maven.apache.org/docs/3.6.3/release-notes.html ### Does this PR introduce any user-facing change? No. (This is a dev-environment change.) ### How was this patch tested? Pass the Jenkins with both SBT and Maven. Closes #26770 from dongjoon-hyun/SPARK-30142. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-06 23:41:59 +09:00
Tomoko Komiyama	8beb736a00	[SPARK-29256][DOCS] Fix typo in building document ### What changes were proposed in this pull request? Changed 'Phive-thriftserver' to ' -Phive-thriftserver'. ### Why are the changes needed? Typo ### Does this PR introduce any user-facing change? Yes. ### How was this patch tested? Manually tested. Closes #25937 from TomokoKomiyama/fix-build-doc. Authored-by: Tomoko Komiyama <btkomiyamatm@oss.nttdata.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-26 08:23:43 -05:00
Dongjoon Hyun	3bf43fb60d	[SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G ### What changes were proposed in this pull request? This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G. ### Why are the changes needed? After upgrading to `Scala 2.12.10`, the following is observed during building. ``` 2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. 2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= 2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb 2019-09-18T20:49:23.5035472Z bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000] 2019-09-18T20:49:23.5035781Z total_blobs=156549 nmethods=155863 adapters=592 2019-09-18T20:49:23.5036090Z compilation: disabled (not enough contiguous free space left) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually check the Jenkins or GitHub Action build log (which should not have the above). Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-19 00:24:15 -07:00
Xiao Li	2856398de9	[SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2 ### What changes were proposed in this pull request? This PR upgrades the Maven dependency from 3.6.1 to 3.6.2. ### Why are the changes needed? All the builds are broken because 3.6.1 is not available. http://ftp.wayne.edu/apache//maven/maven-3/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/485/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10536/ ![image](https://user-images.githubusercontent.com/11567269/64196667-36d69100-ce39-11e9-8f93-40eb333d595d.png) ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25665 from gatorsmile/upgradeMVN. Authored-by: Xiao Li <gatorsmile@gmail.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-09-03 11:06:57 -07:00
Yuming Wang	02a0cdea13	[SPARK-28723][SQL] Upgrade to Hive 2.3.6 for HiveMetastore Client and Hadoop-3.2 profile ### What changes were proposed in this pull request? This PR upgrades the built-in Hive to 2.3.6 for `hadoop-3.2`. Hive 2.3.6 release notes: - [HIVE-22096](https://issues.apache.org/jira/browse/HIVE-22096): Backport [HIVE-21584](https://issues.apache.org/jira/browse/HIVE-21584) (Java 11 preparation: system class loader is not URLClassLoader) - [HIVE-21859](https://issues.apache.org/jira/browse/HIVE-21859): Backport [HIVE-17466](https://issues.apache.org/jira/browse/HIVE-17466) (Metastore API to list unique partition-key-value combinations) - [HIVE-21786](https://issues.apache.org/jira/browse/HIVE-21786): Update repo URLs in poms branch 2.3 version ### Why are the changes needed? Make Spark support JDK 11. ### Does this PR introduce any user-facing change? Yes. Please see [SPARK-28684](https://issues.apache.org/jira/browse/SPARK-28684) and [SPARK-24417](https://issues.apache.org/jira/browse/SPARK-24417) for more details. ### How was this patch tested? Existing unit test and manual test. Closes #25443 from wangyum/test-on-jenkins. Lead-authored-by: Yuming Wang <yumwang@ebay.com> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-23 21:34:30 -07:00
Dongjoon Hyun	4856c0e33a	[SPARK-28609][DOC] Fix broken styles/links and make up-to-date ## What changes were proposed in this pull request? This PR aims to fix the broken styles/links and make the doc up-to-date for Apache Spark 2.4.4 and 3.0.0 release. - `building-spark.md` ![Screen Shot 2019-08-02 at 10 33 51 PM](https://user-images.githubusercontent.com/9700541/62407962-a248ec80-b575-11e9-8a16-532e9bc421f8.png) - `configuration.md` ![Screen Shot 2019-08-02 at 10 34 52 PM](https://user-images.githubusercontent.com/9700541/62407969-c7d5f600-b575-11e9-9b1a-a76c6cc095c5.png) - `sql-pyspark-pandas-with-arrow.md` ![Screen Shot 2019-08-02 at 10 36 14 PM](https://user-images.githubusercontent.com/9700541/62407979-18e5ea00-b576-11e9-99af-7ad9264656ae.png) - `streaming-programming-guide.md` ![Screen Shot 2019-08-02 at 10 37 11 PM](https://user-images.githubusercontent.com/9700541/62407981-213e2500-b576-11e9-8bc5-a925df7e98a7.png) - `structured-streaming-programming-guide.md` (1/2) ![Screen Shot 2019-08-02 at 10 38 20 PM](https://user-images.githubusercontent.com/9700541/62408001-49c61f00-b576-11e9-9519-f699775ceecd.png) - `structured-streaming-programming-guide.md` (2/2) ![Screen Shot 2019-08-02 at 10 40 05 PM](https://user-images.githubusercontent.com/9700541/62408017-7f6b0800-b576-11e9-9341-52664ba6b460.png) - `submitting-applications.md` ![Screen Shot 2019-08-02 at 10 41 13 PM](https://user-images.githubusercontent.com/9700541/62408027-b2ad9700-b576-11e9-910e-8f22173e1251.png) ## How was this patch tested? Manual. Build the doc. ``` SKIP_API=1 jekyll build ``` Closes #25345 from dongjoon-hyun/SPARK-28609. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-04 09:42:47 -07:00
Yuming Wang	90c64ea419	[SPARK-28267][DOC] Update building-spark.md(support build with hadoop-3.2) ## What changes were proposed in this pull request? Since [SPARK-23710](https://issues.apache.org/jira/browse/SPARK-23710), Hadoop 3.x can support Hive. This PR adds _build with `hadoop-3.2`_ to building-spark.md. ## How was this patch tested? manual tests ``` cd docs SKIP_API=1 jekyll build ``` ![image](https://user-images.githubusercontent.com/5399861/60942057-cf5a0480-a313-11e9-9534-4765520e799f.png) Closes #25063 from wangyum/SPARK-28267. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-07-10 08:51:08 -05:00
Sean Owen	eed6de1a65	[MINOR][DOCS] Tighten up some key links to the project and download pages to use HTTPS ## What changes were proposed in this pull request? Tighten up some key links to the project and download pages to use HTTPS ## How was this patch tested? N/A Closes #24665 from srowen/HTTPSURLs. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-21 10:56:42 -07:00
Dongjoon Hyun	375cfa3d89	[SPARK-27467][BUILD] Upgrade Maven to 3.6.1 ## What changes were proposed in this pull request? This PR aims to upgrade Maven to 3.6.1 to bring JDK9+ related patches like [MNG-6506](https://issues.apache.org/jira/browse/MNG-6506). For the full release note, please see the following. - https://maven.apache.org/docs/3.6.1/release-notes.html This was committed and reverted due to an AppVeyor failure. It turns out that the root cause is a `PATH` issue. With the updated AppVeyor script, it passed. https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/24273412 ## How was this patch tested? Pass the Jenkins and AppVeyor Closes #24481 from dongjoon-hyun/SPARK-R. Lead-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-02 20:01:17 -07:00
HyukjinKwon	d8db7db50b	Revert "[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc" This reverts commit `bde30bc57c`.	2019-04-28 11:03:15 +09:00
Yuming Wang	bde30bc57c	[SPARK-27467][FOLLOW-UP][BUILD] Upgrade Maven to 3.6.1 in AppVeyor and Doc ## What changes were proposed in this pull request? Update the `docs/building-spark.md`. Otherwise: ``` mvn package -DskipTests=true ... [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) spark-parent_2.12 --- [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.6.0 is not in the allowed range 3.6.1. ... [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1] [ERROR] ... ``` ## How was this patch tested? Just tested that `https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.zip` is available. Closes #24477 from wangyum/SPARK-27467. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-04-27 09:09:47 -07:00
Sean Owen	754f820035	[SPARK-26918][DOCS] All .md should have ASF license header ## What changes were proposed in this pull request? Add AL2 license to metadata of all .md files. This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing. ## How was this patch tested? Doc build Closes #24243 from srowen/SPARK-26918. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-30 19:49:45 -05:00
Sean Owen	8bc304f97e	[SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 ## What changes were proposed in this pull request? Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below. ## How was this patch tested? Existing tests. Closes #23098 from srowen/SPARK-26132. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-25 10:46:42 -05:00
Darcy Shen	9d2a11554b	[MINOR][DOC] Documentation on JVM options for SBT ## What changes were proposed in this pull request? Documentation and .gitignore ## How was this patch tested? Manual test that SBT honors the settings in .jvmopts if present Closes #23615 from sadhen/impr/gitignore. Authored-by: Darcy Shen <sadhen@zoho.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-22 18:27:24 -06:00
Kazuaki Ishizaki	1abfbda7eb	[SPARK-26212][BUILD][TEST-MAVEN] Upgrade maven version to 3.6.0 ## What changes were proposed in this pull request? This PR updates the Maven version from 3.5.4 to 3.6.0. The release notes for 3.6.0 are [here](https://maven.apache.org/docs/3.6.0/release-notes.html). From [the 3.6.0 release notes](https://maven.apache.org/docs/3.6.0/release-notes.html), the following are new features: 1. There had been issues related to the project discovery time, which had increased in the previous version and affected some users. 1. The output in the reactor summary has been improved. 1. There was an issue related to the classpath ordering. ## How was this patch tested? Existing tests Closes #23177 from kiszk/SPARK-26212. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-12-01 07:06:18 -06:00
DB Tsai	ad853c5678	[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 ## What changes were proposed in this pull request? This PR makes 2.12 Spark's default Scala version, and Scala 2.11 will be the alternative version. This implies that Scala 2.12 will be used by our CI builds including pull request builds. We'll update Jenkins to include a new compile-only job for Scala 2.11 to ensure the code can still be compiled with Scala 2.11. ## How was this patch tested? existing tests Closes #22967 from dbtsai/scala2.12. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-11-14 16:22:23 -08:00
Dongjoon Hyun	fc9ba9dcc6	[MINOR][DOC] Update the building doc to use Maven 3.5.4 and Java 8 only ## What changes were proposed in this pull request? Since we didn't test Java 9 ~ 11 up to now in the community, fix the document to describe Java 8 only. ## How was this patch tested? N/A (This is a document only change.) Closes #22781 from dongjoon-hyun/SPARK-JDK-DOC. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-10-19 23:56:40 -07:00
Sean Owen	703e6da1ec	[SPARK-25705][BUILD][STREAMING][TEST-MAVEN] Remove Kafka 0.8 integration ## What changes were proposed in this pull request? Remove Kafka 0.8 integration ## How was this patch tested? Existing tests, build scripts Closes #22703 from srowen/SPARK-25705. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-10-16 09:10:24 -05:00
lajin	541d7e1e4b	[SPARK-25685][BUILD] Allow running tests in Jenkins in enterprise Git repository ## What changes were proposed in this pull request? Many companies have their own enterprise GitHub to manage Spark code. Building and testing in those repositories with Jenkins requires modifying this script. So I suggest adding some environment variables to allow regression testing in an enterprise Jenkins instead of the default Spark repository on GitHub. ## How was this patch tested? Manually test. Closes #22678 from LantaoJin/SPARK-25685. Lead-authored-by: lajin <lajin@ebay.com> Co-authored-by: LantaoJin <jinlantao@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-10-12 12:41:33 -05:00
Sean Owen	a001814189	[SPARK-25598][STREAMING][BUILD][TEST-MAVEN] Remove flume connector in Spark 3 ## What changes were proposed in this pull request? Removes all vestiges of Flume in the build, for Spark 3. I don't think this needs Jenkins config changes. ## How was this patch tested? Existing tests. Closes #22692 from srowen/SPARK-25598. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-10-11 14:28:06 -07:00
Sean Owen	80813e1980	[SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6 ## What changes were proposed in this pull request? Remove Hadoop 2.6 references and make 2.7 the default. Obviously, this is for master/3.0.0 only. After this we can also get rid of the separate test jobs for Hadoop 2.6. ## How was this patch tested? Existing tests Closes #22615 from srowen/SPARK-25016. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-10-10 12:07:53 -07:00
Yuming Wang	b0ada7dce0	[SPARK-25330][BUILD][BRANCH-2.3] Revert Hadoop 2.7 to 2.7.3 ## What changes were proposed in this pull request? How to reproduce the permission issue: ```sh # build spark ./dev/make-distribution.sh --name SPARK-25330 --tgz -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn tar -zxf spark-2.4.0-SNAPSHOT-bin-SPARK-25330.tar && cd spark-2.4.0-SNAPSHOT-bin-SPARK-25330 export HADOOP_PROXY_USER=user_a bin/spark-sql export HADOOP_PROXY_USER=user_b bin/spark-sql ``` ```java Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=user_b, access=EXECUTE, inode="/tmp/hive-$%7Buser.name%7D/user_b/668748f2-f6c5-4325-a797-fd0a7ee7f4d4":user_b:hadoop:drwx------ at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) ``` The issue was introduced in this commit: `feb886f209`. This PR reverts Hadoop 2.7 to 2.7.3 to avoid this issue. ## How was this patch tested? Unit tests and manual tests. Closes #22327 from wangyum/SPARK-25330. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-09-06 21:41:13 -07:00
Darcy Shen	546683c21a	[SPARK-25298][BUILD] Improve build definition for Scala 2.12 ## What changes were proposed in this pull request? Improve build for Scala 2.12. Current build for sbt fails on the subproject `repl`: ``` [info] Compiling 6 Scala sources to /Users/rendong/wdi/spark/repl/target/scala-2.12/classes... [error] /Users/rendong/wdi/spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoopInterpreter.scala:80: overriding lazy value importableSymbolsWithRenames in class ImportHandler of type List[(this.intp.global.Symbol, this.intp.global.Name)]; [error] lazy value importableSymbolsWithRenames needs `override' modifier [error] lazy val importableSymbolsWithRenames: List[(Symbol, Name)] = { [error] ^ [warn] /Users/rendong/wdi/spark/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:53: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path [warn] if (addedClasspath != "") { [warn] ^ [warn] /Users/rendong/wdi/spark/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala:54: variable addedClasspath in class ILoop is deprecated (since 2.11.0): use reset, replay or require to update class path [warn] settings.classpath append addedClasspath [warn] ^ [warn] two warnings found [error] one error found [error] (repl/compile:compileIncremental) Compilation failed [error] Total time: 93 s, completed 2018-9-3 10:07:26 ``` ## How was this patch tested? ``` ./dev/change-scala-version.sh 2.12 ## For Maven ./build/mvn -Pscala-2.12 [mvn commands] ## For SBT sbt -Dscala.version=2.12.6 ``` Closes #22310 from sadhen/SPARK-25298. Authored-by: Darcy Shen <sadhen@zoho.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-09-03 07:36:04 -05:00
Maxim Gekk	3c67cb0b52	[SPARK-25273][DOC] How to install testthat 1.0.2 ## What changes were proposed in this pull request? R tests require `testthat` v1.0.2. In the PR, I described how to install the version in the section http://spark.apache.org/docs/latest/building-spark.html#running-r-tests. Closes #22272 from MaxGekk/r-testthat-doc. Authored-by: Maxim Gekk <maxim.gekk@databricks.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org>	2018-08-30 20:25:26 +08:00
Sean Owen	35f7f5ce83	[DOCS][MINOR] Fix a few broken links and typos, and, nit, use HTTPS more consistently ## What changes were proposed in this pull request? Fix a few broken links and typos, and, nit, use HTTPS more consistently esp. on scripts and Apache links ## How was this patch tested? Doc build Closes #22172 from srowen/DocTypo. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org>	2018-08-22 01:02:17 +08:00
Sean Owen	5f9633dc97	[SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7 ## What changes were proposed in this pull request? Update Hadoop 2.7 to 2.7.7 to pull in bug and security fixes. ## How was this patch tested? Existing tests. Author: Sean Owen <srowen@gmail.com> Closes #21987 from srowen/SPARK-25015.	2018-08-04 14:59:13 -05:00
Bruce Robbins	4c059ebc60	[SPARK-23776][DOC] Update instructions for running PySpark after building with SBT ## What changes were proposed in this pull request? This update tells the reader how to build Spark with SBT such that pyspark-sql tests will succeed. If you follow the current instructions for building Spark with SBT, pyspark/sql/udf.py fails with: <pre> AnalysisException: u'Can not load class test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;' </pre> ## How was this patch tested? I ran the doc build command (SKIP_API=1 jekyll build) and eyeballed the result. Author: Bruce Robbins <bersprockets@gmail.com> Closes #21628 from bersprockets/SPARK-23776_doc.	2018-06-26 09:48:15 +08:00
Daniel Sakuma	6ade5cbb49	[MINOR][DOC] Fix some typos and grammar issues ## What changes were proposed in this pull request? Easy fix in the documentation. ## How was this patch tested? N/A Closes #20948 Author: Daniel Sakuma <dsakuma@gmail.com> Closes #20928 from dsakuma/fix_typo_configuration_docs.	2018-04-06 13:37:08 +08:00
foxish	7ab165b706	[SPARK-22648][K8S] Spark on Kubernetes - Documentation What changes were proposed in this pull request? This PR contains documentation on the usage of Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build docker images required to use the integration. The changes detailed here are covered by https://github.com/apache/spark/pull/19717 and https://github.com/apache/spark/pull/19468 which have merged already. How was this patch tested? The script has been in use for releases on our fork. Rest is documentation. cc rxin mateiz (shepherd) k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko reviewers: vanzin felixcheung jiangxb1987 mridulm TODO: - [x] Add dockerfiles directory to built distribution. (https://github.com/apache/spark/pull/20007) - [x] Change references to docker to instead say "container" (https://github.com/apache/spark/pull/19995) - [x] Update configuration table. - [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int (#20032) Author: foxish <ramanathana@google.com> Closes #19946 from foxish/update-k8s-docs.	2017-12-21 17:21:11 -08:00
Sean Owen	0c03297bf0	[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile, take 2 ## What changes were proposed in this pull request? Move flume behind a profile, take 2. See https://github.com/apache/spark/pull/19365 for most of the back-story. This change should fix the problem by removing the examples module dependency and moving Flume examples to the module itself. It also adds deprecation messages, per a discussion on dev about deprecating for 2.3.0. ## How was this patch tested? Existing tests, which still enable flume integration. Author: Sean Owen <sowen@cloudera.com> Closes #19412 from srowen/SPARK-22142.2.	2017-10-06 15:08:28 +01:00
gatorsmile	472864014c	Revert "[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile" This reverts commit `a2516f41ae`.	2017-09-29 11:45:58 -07:00
Sean Owen	a2516f41ae	[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile ## What changes were proposed in this pull request? Add 'flume' profile to enable Flume-related integration modules ## How was this patch tested? Existing tests; no functional change Author: Sean Owen <sowen@cloudera.com> Closes #19365 from srowen/SPARK-22142.	2017-09-29 08:26:53 +01:00
Sean Owen	4fbf748bf8	[SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile ## What changes were proposed in this pull request? Put Kafka 0.8 support behind a kafka-0-8 profile. ## How was this patch tested? Existing tests, but, until PR builder and Jenkins configs are updated the effect here is to not build or test Kafka 0.8 support at all. Author: Sean Owen <sowen@cloudera.com> Closes #19134 from srowen/SPARK-21893.	2017-09-13 10:10:40 +01:00
Kousuke Saruta	957558235b	[DOCS] Fix unreachable links in the document ## What changes were proposed in this pull request? Recently, I found two unreachable links in the document and fixed them. Because the changes are small and limited to the document, I did not file a JIRA issue; please let me know if you think I should. ## How was this patch tested? Tested manually. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #19195 from sarutak/fix-unreachable-link.	2017-09-12 15:07:04 +01:00
Sean Owen	425c4ada4c	[SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 ## What changes were proposed in this pull request? - Remove Scala 2.10 build profiles and support - Replace some 2.10 support in scripts with commented placeholders for 2.12 later - Remove deprecated API calls from 2.10 support - Remove usages of deprecated context bounds where possible - Remove Scala 2.10 workarounds like ScalaReflectionLock - Other minor Scala warning fixes ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #17150 from srowen/SPARK-19810.	2017-07-13 17:06:24 +08:00
liuzhaokun	24367f23f7	[SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong. [https://issues.apache.org/jira/browse/SPARK-21382](https://issues.apache.org/jira/browse/SPARK-21382) The note should read "Note that support for Scala 2.10 is deprecated as of Spark 2.1.0 and may be removed in Spark 2.3.0", right? Author: liuzhaokun <liu.zhaokun@zte.com.cn> Closes #18606 from liu-zhaokun/new07120923.	2017-07-11 23:02:20 -07:00
Yuming Wang	45824fb608	[MINOR][DOCS] Improve Running R Tests docs ## What changes were proposed in this pull request? Update the dependency packages in the Running R Tests section to: ```bash R -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'e1071', 'survival'), repos='http://cran.us.r-project.org')" ``` ## How was this patch tested? Manual tests. Author: Yuming Wang <wgyumg@gmail.com> Closes #18271 from wangyum/building-spark.	2017-06-16 11:03:54 +01:00
Armin Braun	c8f1219510	[SPARK-20455][DOCS] Fix Broken Docker IT Docs ## What changes were proposed in this pull request? Just added the Maven `test` goal. ## How was this patch tested? No test needed, just a trivial documentation fix. Author: Armin Braun <me@obrown.io> Closes #17756 from original-brownbear/SPARK-20455.	2017-04-25 09:13:50 +01:00

113 commits