### What changes were proposed in this pull request?
Some of the GitHub Actions workflow files are missing the Apache license header. This PR adds it.
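For reference, the standard ASF source header, in YAML comment syntax, looks like this (this is the usual wording used across Apache projects; the exact copy added by this PR is in the diff):

```yaml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
```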
### Why are the changes needed?
To comply with the Apache license.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
Closes #33862 from HyukjinKwon/minor-lisence.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 22c492a6b8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
While running the benchmark in GitHub Actions, I hit the error below.
https://github.com/pingsutw/spark/runs/2867617238?check_suite_focus=true
```
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
21/06/20 07:40:02 ERROR SparkContext: Error initializing SparkContext.
java.lang.AssertionError: assertion failed: spark.test.home is not set!
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
    at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
    at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:68)
    at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2954)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:559)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
```
Set `spark.test.home` in the benchmark workflow.
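One way to wire this up (a sketch only; the step name and sbt invocation below are assumptions, not the actual diff) is to point `spark.test.home` at the checkout directory when launching the benchmark JVM:

```yaml
# Hypothetical step in .github/workflows/benchmark.yml:
- name: Run benchmarks
  run: |
    # GITHUB_WORKSPACE is the repository checkout root in GitHub Actions;
    # spark.test.home must point at the Spark source tree so workers can
    # locate it when LocalSparkCluster starts.
    ./build/sbt -Dspark.test.home=$GITHUB_WORKSPACE \
      "core/Test/runMain org.apache.spark.serializer.KryoSerializerBenchmark"
```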
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Reran the benchmark in my fork:
https://github.com/pingsutw/spark/actions/runs/996067851
Closes #33203 from pingsutw/SPARK-36007.
Lead-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 11fcbc73cb)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This patch uses the `concurrency` syntax to replace the "cancel job" workflow for:
- .github/workflows/benchmark.yml
- .github/workflows/labeler.yml
- .github/workflows/notify_test_workflow.yml
- .github/workflows/test_report.yml
It also removes .github/workflows/cancel_duplicate_workflow_runs.yml.
Note that the push/schedule-based jobs are left unchanged to keep the same config as in a4b70758d3:
- .github/workflows/build_and_test.yml
- .github/workflows/publish_snapshot.yml
- .github/workflows/stale.yml
- .github/workflows/update_build_status.yml
### Why are the changes needed?
We are using the [cancel_duplicate_workflow_runs](a70e66ecfa/.github/workflows/cancel_duplicate_workflow_runs.yml (L1)) job to cancel previous runs when a new run is queued. GitHub Actions now supports this natively through the ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency) syntax, which ensures that only a single job or workflow in the same concurrency group runs at a time.
Related: https://github.com/apache/arrow/pull/10416 and https://github.com/potiuk/cancel-workflow-runs
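A typical concurrency stanza looks like the following (the group key here is illustrative; the actual workflows may key the group differently):

```yaml
concurrency:
  # One group per workflow and PR (falling back to the ref for non-PR events);
  # a newly queued run cancels any in-progress run in the same group.
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```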
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Triggered the PR workflow manually.
Closes #32806 from Yikun/SPARK-X.
Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Currently, the workflow fails at `git diff --name-only` when new benchmarks are added; see https://github.com/HyukjinKwon/spark/actions/runs/808870999
We should include untracked files (new benchmark result files) in the upload so developers can download the results.
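The underlying git behavior can be demonstrated in isolation (file names below are hypothetical): a plain `git diff --name-only` only reports modified tracked files, so brand-new result files are missed until they are staged.

```shell
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m init
echo old > OldBenchmark-results.txt
git add OldBenchmark-results.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm "existing results"
echo changed > OldBenchmark-results.txt   # a modified, tracked result file
echo new > NewBenchmark-results.txt       # an untracked, new result file
git diff --name-only                      # lists only OldBenchmark-results.txt
git add -A                                # stage untracked files as well
git diff --name-only --cached             # now lists both files
```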
### Why are the changes needed?
So the new benchmark results can be added and uploaded.
### Does this PR introduce _any_ user-facing change?
No, dev-only
### How was this patch tested?
Tested at:
https://github.com/HyukjinKwon/spark/actions/runs/808867285
Closes #32428 from HyukjinKwon/include-new-benchmarks.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR replaces `127.0.0.1` with `localhost`.
### Why are the changes needed?
- https://github.com/apache/spark/pull/32096#discussion_r610349269
- https://github.com/apache/spark/pull/32096#issuecomment-816442481
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
I didn't test it locally because it is a CI-specific issue; it is exercised by the GitHub Actions build in this PR.
Closes #32102 from HyukjinKwon/SPARK-35002.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
### What changes were proposed in this pull request?
This PR tries to fix the `java.net.BindException` when testing with GitHub Actions:
```
[info] org.apache.spark.sql.kafka010.producer.InternalKafkaProducerPoolSuite *** ABORTED *** (282 milliseconds)
[info] java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
[info] at sun.nio.ch.Net.bind0(Native Method)
[info] at sun.nio.ch.Net.bind(Net.java:461)
[info] at sun.nio.ch.Net.bind(Net.java:453)
```
https://github.com/apache/spark/pull/32090/checks?check_run_id=2295418529
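Judging from the branch name, the fix pins the driver bind address via an environment variable. A sketch of a workflow-level setting (the exact placement and scope in the workflow file are assumptions):

```yaml
# Hypothetical top-level env block in the test workflow:
env:
  # Give the 'sparkDriver' service a resolvable bind address so it does not
  # fail with BindException after exhausting its port retries.
  SPARK_LOCAL_IP: localhost
```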
### Why are the changes needed?
Fix test framework.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested via GitHub Actions.
Closes #32096 from wangyum/SPARK_LOCAL_IP=localhost.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to add a workflow that allows developers to run benchmarks and download the results files. After this PR, developers can run benchmarks in GitHub Actions in their fork.
### Why are the changes needed?
1. Very easy to use.
2. We can use the (almost) same environment to run the benchmarks. Given my few experiments and observations, the CPU, cores, and memory are the same.
3. Does not burden ASF's GitHub Actions resources.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested in https://github.com/HyukjinKwon/spark/pull/31.
Entire benchmarks are being run as below:
- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)
### How do developers use it in their fork?
1. **Go to Actions in your fork, and click "Run benchmarks"**
![Screen Shot 2021-03-31 at 10 15 13 PM](https://user-images.githubusercontent.com/6477701/113150018-99d71680-926e-11eb-8647-4ecf062c55f2.png)
2. **Run the benchmarks with JDK 8 or 11, specifying the benchmark classes to run. Glob patterns are supported, just like `testOnly` in SBT**
![Screen Shot 2021-04-02 at 8 35 02 PM](https://user-images.githubusercontent.com/6477701/113412599-ab95f680-93f3-11eb-9a15-c6ed54587b9d.png)
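For instance, input values for the benchmark-class field might look like the following (hypothetical patterns; anything `testOnly` accepts should work):

```
org.apache.spark.serializer.KryoSerializerBenchmark
org.apache.spark.sql.execution.benchmark.*
```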
3. **After the jobs finish, the benchmark results are available at the top of the underlying workflow run:**
![Screen Shot 2021-03-31 at 10 17 21 PM](https://user-images.githubusercontent.com/6477701/113150332-ede1fb00-926e-11eb-9c0e-97d195070508.png)
4. **After downloading it, unzip and untar it at the Spark git root directory:**
```bash
cd .../spark
mv ~/Downloads/benchmark-results-8.zip .
unzip benchmark-results-8.zip
tar -xvf benchmark-results-8.tar
```
5. **Check the results:**
```bash
git status
```
```
...
modified: core/benchmarks/MapStatusesSerDeserBenchmark-results.txt
```
Closes #32015 from HyukjinKwon/SPARK-34821-pr.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>