### What changes were proposed in this pull request?
Some of the GitHub Actions workflow files are missing the Apache license header. This PR adds it.
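For reference, the standard ASF source header, in YAML comment syntax, looks like this (this is the usual wording used across Apache projects; the exact copy added by this PR is in the diff):

```yaml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
```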
### Why are the changes needed?
To comply with the Apache license.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
Closes #33862 from HyukjinKwon/minor-lisence.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 22c492a6b8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
While running the benchmark in GitHub Actions, I hit the error below.
https://github.com/pingsutw/spark/runs/2867617238?check_suite_focus=true
```
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
21/06/20 07:40:02 ERROR SparkContext: Error initializing SparkContext.
java.lang.AssertionError: assertion failed: spark.test.home is not set!
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
    at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
    at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:68)
    at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2954)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:559)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
    at org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
```
Set `spark.test.home` in the benchmark workflow.
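One way to wire this up (a sketch only; the step name and sbt invocation below are assumptions, not the actual diff) is to point `spark.test.home` at the checkout directory when launching the benchmark JVM:

```yaml
# Hypothetical step in .github/workflows/benchmark.yml:
- name: Run benchmarks
  run: |
    # GITHUB_WORKSPACE is the repository checkout root in GitHub Actions;
    # spark.test.home must point at the Spark source tree so workers can
    # locate it when LocalSparkCluster starts.
    ./build/sbt -Dspark.test.home=$GITHUB_WORKSPACE \
      "core/Test/runMain org.apache.spark.serializer.KryoSerializerBenchmark"
```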
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Reran the benchmark in my fork:
https://github.com/pingsutw/spark/actions/runs/996067851
Closes #33203 from pingsutw/SPARK-36007.
Lead-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 11fcbc73cb)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This patch uses the `concurrency` syntax to replace the "cancel job" workflow for:
- .github/workflows/benchmark.yml
- .github/workflows/labeler.yml
- .github/workflows/notify_test_workflow.yml
- .github/workflows/test_report.yml
It also removes .github/workflows/cancel_duplicate_workflow_runs.yml.
Note that the push/schedule-based jobs are left unchanged to keep the same config as in a4b70758d3:
- .github/workflows/build_and_test.yml
- .github/workflows/publish_snapshot.yml
- .github/workflows/stale.yml
- .github/workflows/update_build_status.yml
### Why are the changes needed?
We are using the [cancel_duplicate_workflow_runs](a70e66ecfa/.github/workflows/cancel_duplicate_workflow_runs.yml (L1)) job to cancel previous runs when a new run is queued. GitHub Actions now supports this natively through the ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency) syntax, which ensures that only a single job or workflow in the same concurrency group runs at a time.
Related: https://github.com/apache/arrow/pull/10416 and https://github.com/potiuk/cancel-workflow-runs
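A typical concurrency stanza looks like the following (the group key here is illustrative; the actual workflows may key the group differently):

```yaml
concurrency:
  # One group per workflow and PR (falling back to the ref for non-PR events);
  # a newly queued run cancels any in-progress run in the same group.
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```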
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Triggered the PR workflow manually.
Closes #32806 from Yikun/SPARK-X.
Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Currently, the workflow fails at `git diff --name-only` when new benchmarks are added; see https://github.com/HyukjinKwon/spark/actions/runs/808870999
We should include untracked files (new benchmark result files) in the upload so developers can download the results.
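The underlying git behavior can be demonstrated in isolation (file names below are hypothetical): a plain `git diff --name-only` only reports modified tracked files, so brand-new result files are missed until they are staged.

```shell
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m init
echo old > OldBenchmark-results.txt
git add OldBenchmark-results.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm "existing results"
echo changed > OldBenchmark-results.txt   # a modified, tracked result file
echo new > NewBenchmark-results.txt       # an untracked, new result file
git diff --name-only                      # lists only OldBenchmark-results.txt
git add -A                                # stage untracked files as well
git diff --name-only --cached             # now lists both files
```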
### Why are the changes needed?
So the new benchmark results can be added and uploaded.
### Does this PR introduce _any_ user-facing change?
No, dev-only
### How was this patch tested?
Tested at:
https://github.com/HyukjinKwon/spark/actions/runs/808867285
Closes #32428 from HyukjinKwon/include-new-benchmarks.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR replaces `127.0.0.1` with `localhost`.
### Why are the changes needed?
- https://github.com/apache/spark/pull/32096#discussion_r610349269
- https://github.com/apache/spark/pull/32096#issuecomment-816442481
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
I didn't test it locally because it is a CI-specific issue; it is exercised by the GitHub Actions build in this PR.
Closes #32102 from HyukjinKwon/SPARK-35002.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
### What changes were proposed in this pull request?
This PR tries to fix the `java.net.BindException` when testing with GitHub Actions:
```
[info] org.apache.spark.sql.kafka010.producer.InternalKafkaProducerPoolSuite *** ABORTED *** (282 milliseconds)
[info] java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
[info] at sun.nio.ch.Net.bind0(Native Method)
[info] at sun.nio.ch.Net.bind(Net.java:461)
[info] at sun.nio.ch.Net.bind(Net.java:453)
```
https://github.com/apache/spark/pull/32090/checks?check_run_id=2295418529
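Judging from the branch name, the fix pins the driver bind address via an environment variable. A sketch of a workflow-level setting (the exact placement and scope in the workflow file are assumptions):

```yaml
# Hypothetical top-level env block in the test workflow:
env:
  # Give the 'sparkDriver' service a resolvable bind address so it does not
  # fail with BindException after exhausting its port retries.
  SPARK_LOCAL_IP: localhost
```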
### Why are the changes needed?
Fix test framework.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested via GitHub Actions.
Closes #32096 from wangyum/SPARK_LOCAL_IP=localhost.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to add a workflow that allows developers to run benchmarks and download the results files. After this PR, developers can run benchmarks in GitHub Actions in their fork.
### Why are the changes needed?
1. Very easy to use.
2. We can use the (almost) same environment to run the benchmarks. Given my few experiments and observations, the CPU, cores, and memory are the same.
3. Does not burden ASF's GitHub Actions resources.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested in https://github.com/HyukjinKwon/spark/pull/31.
Entire benchmarks are being run as below:
- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)
### How do developers use it in their fork?
1. **Go to Actions in your fork, and click "Run benchmarks"**
![Screen Shot 2021-03-31 at 10 15 13 PM](https://user-images.githubusercontent.com/6477701/113150018-99d71680-926e-11eb-8647-4ecf062c55f2.png)
2. **Run the benchmarks with JDK 8 or 11, specifying the benchmark classes to run. Glob patterns are supported, just like `testOnly` in SBT**
![Screen Shot 2021-04-02 at 8 35 02 PM](https://user-images.githubusercontent.com/6477701/113412599-ab95f680-93f3-11eb-9a15-c6ed54587b9d.png)
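For instance, input values for the benchmark-class field might look like the following (hypothetical patterns; anything `testOnly` accepts should work):

```
org.apache.spark.serializer.KryoSerializerBenchmark
org.apache.spark.sql.execution.benchmark.*
```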
3. **After the jobs finish, the benchmark results are available at the top of the underlying workflow run:**
![Screen Shot 2021-03-31 at 10 17 21 PM](https://user-images.githubusercontent.com/6477701/113150332-ede1fb00-926e-11eb-9c0e-97d195070508.png)
4. **After downloading it, unzip and untar it at the Spark git root directory:**
```bash
cd .../spark
mv ~/Downloads/benchmark-results-8.zip .
unzip benchmark-results-8.zip
tar -xvf benchmark-results-8.tar
```
5. **Check the results:**
```bash
git status
```
```
...
modified: core/benchmarks/MapStatusesSerDeserBenchmark-results.txt
```
Closes #32015 from HyukjinKwon/SPARK-34821-pr.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>