Commit graph

338 commits

Author SHA1 Message Date
Gengliang Wang 69be513c5e Preparing development version 3.2.1-SNAPSHOT 2021-08-20 12:40:47 +00:00
Gengliang Wang 6bb3523d8e Preparing Spark release v3.2.0-rc1 2021-08-20 12:40:40 +00:00
Gengliang Wang fafdc1482b Revert "Preparing Spark release v3.2.0-rc1"
This reverts commit 8e58fafb05.
2021-08-20 20:07:02 +08:00
Gengliang Wang c829ed53ff Revert "Preparing development version 3.2.1-SNAPSHOT"
This reverts commit 4f1d21571d.
2021-08-20 20:07:01 +08:00
Gengliang Wang 4f1d21571d Preparing development version 3.2.1-SNAPSHOT 2021-08-19 14:08:32 +00:00
Gengliang Wang 8e58fafb05 Preparing Spark release v3.2.0-rc1 2021-08-19 14:08:26 +00:00
Dongjoon Hyun 7e7028282c [SPARK-35928][BUILD] Upgrade ASM to 9.1
### What changes were proposed in this pull request?

This PR aims to upgrade ASM to 9.1

### Why are the changes needed?

The latest `xbean-asm9-shaded` is built with ASM 9.1.

- https://mvnrepository.com/artifact/org.apache.xbean/xbean-asm9-shaded/4.20
- 5e0e3c0c64/pom.xml (L67)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33130 from dongjoon-hyun/SPARK-35928.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-29 10:27:51 -07:00
zengruios 1cf18a2277 [SPARK-35851][GRAPHX] Modify the wrong variable used in GraphGenerators.sampleLogNormal
### What changes were proposed in this pull request?
modify the wrong variable used in GraphGenerators.sampleLogNormal

### Why are the changes needed?
wrong variable used

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #33010 from zengruios/SPARK-35851.

Authored-by: zengruios <578395184@qq.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-06-23 18:20:33 -05:00
Enzo Bonnal 402375b59e [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities
### What changes were proposed in this pull request?

Overload methods `PageRank.runWithOptions` and  `PageRank.runWithOptionsWithPreviousPageRank` (not to break any user-facing signature) with a `normalized` parameter that describes "whether or not to normalize the rank sum".

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-35357

When dealing with a non negligible proportion of sinks in a graph, algorithm based on incremental update of ranks can get a **precision gain for free** if they are allowed to manipulate non normalized ranks.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

By adding a unit test that verifies that (even when dealing with a graph containing a sink) we end up with the same result for both these scenarios:
a)
  - Run **6 iterations** of pagerank in a row using `PageRank.runWithOptions` with **normalization enabled**

b)
  - Run **2 iterations** using `PageRank.runWithOptions` with **normalization disabled**
  - Resume from the `preRankGraph1` and run **2 more iterations** using `PageRank.runWithOptionsWithPreviousPageRank` with **normalization disabled**
  - Finally resume from the `preRankGraph2` and run **2 more iterations** using `PageRank.runWithOptionsWithPreviousPageRank` with **normalization enabled**

Closes #32485 from bonnal-enzo/make-pagerank-normalization-optional.

Authored-by: Enzo Bonnal <enzobonnal@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-05-12 08:56:22 -05:00
Ludovic Henry 5b77ebb57b [SPARK-35150][ML] Accelerate fallback BLAS with dev.ludovic.netlib
### What changes were proposed in this pull request?

Following https://github.com/apache/spark/pull/30810, I've continued looking for ways to accelerate the usage of BLAS in Spark. With this PR, I integrate work done in the [`dev.ludovic.netlib`](https://github.com/luhenry/netlib/) Maven package.

The `dev.ludovic.netlib` library wraps the original `com.github.fommil.netlib` library and focus on accelerating the linear algebra routines in use in Spark. When running the `org.apache.spark.ml.linalg.BLASBenchmark` benchmarking suite, I get the results at [1] on an Intel machine. Moreover, this library is thoroughly tested to return the exact same results as the reference implementation.

Under the hood, it reimplements the necessary algorithms in pure autovectorization-friendly Java 8, as well as takes advantage of the Vector API and Foreign Linker API introduced in JDK 16 when available.

A table summarising which version gets loaded in which case:

```
|                       | BLAS.nativeBLAS                                    | BLAS.javaBLAS                                      |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
| with -Pnetlib-lgpl    | 1. dev.ludovic.netlib.blas.NetlibNativeBLAS, a     | 1. dev.ludovic.netlib.blas.VectorizedBLAS          |
|                       |     wrapper for com.github.fommil:all              |    (JDK16+, relies on the Vector API, requires     |
|                       | 2. dev.ludovic.netlib.blas.ForeignBLAS (JDK16+,    |     `--add-modules=jdk.incubator.vector` on JDK16) |
|                       |    relies on the Foreign Linker API, requires      | 2. dev.ludovic.netlib.blas.Java11BLAS (JDK11+)     |
|                       |    `--add-modules=jdk.incubator.foreign            | 3. dev.ludovic.netlib.blas.JavaBLAS                |
|                       |     -Dforeign.restricted=warn`)                    | 4. dev.ludovic.netlib.blas.NetlibF2jBLAS, a        |
|                       | 3. fails to load, falls back to BLAS.javaBLAS in   |     wrapper for com.github.fommil:core             |
|                       |     org.apache.spark.ml.linalg.BLAS                |                                                    |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
| without -Pnetlib-lgpl | 1. dev.ludovic.netlib.blas.ForeignBLAS (JDK16+,    | 1. dev.ludovic.netlib.blas.VectorizedBLAS          |
|                       |    relies on the Foreign Linker API, requires      |    (JDK16+, relies on the Vector API, requires     |
|                       |    `--add-modules=jdk.incubator.foreign            |     `--add-modules=jdk.incubator.vector` on JDK16) |
|                       |     -Dforeign.restricted=warn`)                    | 2. dev.ludovic.netlib.blas.Java11BLAS (JDK11+)     |
|                       | 2. fails to load, falls back to BLAS.javaBLAS in   | 3. dev.ludovic.netlib.blas.JavaBLAS                |
|                       |     org.apache.spark.ml.linalg.BLAS                | 4. dev.ludovic.netlib.blas.NetlibF2jBLAS, a        |
|                       |                                                    |     wrapper for com.github.fommil:core             |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
```

### Why are the changes needed?

Accelerates linear algebra operations when the pure-java fallback method is in use. Transparently falls back to native implementation (OpenBLAS, MKL) when available.

### Does this PR introduce _any_ user-facing change?

No, all changes are transparent to the user.

### How was this patch tested?

The `dev.ludovic.netlib` library has its own test suite [2]. It has also been validated by running the Spark test suite and benchmarking suite.

[1] Results for `org.apache.spark.ml.linalg.BLASBenchmark`:
#### JDK8:
```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java8BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.Java8BLAS
[info]
[info] daxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 223            232           8        448.0           2.2       1.0X
[info] java                                                221            228           7        453.0           2.2       1.0X
[info]
[info] saxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 122            128           4        821.2           1.2       1.0X
[info] java                                                122            128           4        822.3           1.2       1.0X
[info]
[info] ddot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 109            112           2        921.4           1.1       1.0X
[info] java                                                 70             74           3       1423.5           0.7       1.5X
[info]
[info] sdot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  96             98           2       1046.1           1.0       1.0X
[info] java                                                 47             49           2       2121.7           0.5       2.0X
[info]
[info] dscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 184            195           8        544.3           1.8       1.0X
[info] java                                                185            196           7        539.5           1.9       1.0X
[info]
[info] sscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  99            104           4       1011.9           1.0       1.0X
[info] java                                                 99            104           4       1010.4           1.0       1.0X
[info]
[info] dspmv[U]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        947.2           1.1       1.0X
[info] java                                                  0              0           0       1584.8           0.6       1.7X
[info]
[info] dspr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        867.4           1.2       1.0X
[info] java                                                  1              1           0        865.0           1.2       1.0X
[info]
[info] dsyr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        485.9           2.1       1.0X
[info] java                                                  1              1           0        486.8           2.1       1.0X
[info]
[info] dgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1843.0           0.5       1.0X
[info] java                                                  0              0           0       2690.6           0.4       1.5X
[info]
[info] dgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1214.7           0.8       1.0X
[info] java                                                  0              0           0       2536.8           0.4       2.1X
[info]
[info] sgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1895.9           0.5       1.0X
[info] java                                                  0              0           0       2961.1           0.3       1.6X
[info]
[info] sgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1223.4           0.8       1.0X
[info] java                                                  0              0           0       3091.4           0.3       2.5X
[info]
[info] dgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 560            575          20       1787.1           0.6       1.0X
[info] java                                                226            232           5       4432.4           0.2       2.5X
[info]
[info] dgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 570            586          23       1755.2           0.6       1.0X
[info] java                                                227            232           4       4410.1           0.2       2.5X
[info]
[info] dgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 863            879          17       1158.4           0.9       1.0X
[info] java                                                227            231           3       4407.9           0.2       3.8X
[info]
[info] dgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                1282           1305          23        780.0           1.3       1.0X
[info] java                                                227            232           4       4413.4           0.2       5.7X
[info]
[info] sgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 538            548           8       1858.6           0.5       1.0X
[info] java                                                221            226           3       4521.1           0.2       2.4X
[info]
[info] sgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 549            558          10       1819.9           0.5       1.0X
[info] java                                                222            229           7       4503.5           0.2       2.5X
[info]
[info] sgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 838            852          12       1193.0           0.8       1.0X
[info] java                                                222            229           5       4500.5           0.2       3.8X
[info]
[info] sgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 905            919          18       1104.8           0.9       1.0X
[info] java                                                221            228           5       4521.3           0.2       4.1X
```

#### JDK11:
```
[info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java11BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.Java11BLAS
[info]
[info] daxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 195            204          10        512.7           2.0       1.0X
[info] java                                                195            202           7        512.4           2.0       1.0X
[info]
[info] saxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 108            113           4        923.3           1.1       1.0X
[info] java                                                102            107           4        984.4           1.0       1.1X
[info]
[info] ddot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 107            110           3        938.1           1.1       1.0X
[info] java                                                 69             72           3       1447.1           0.7       1.5X
[info]
[info] sdot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  96             98           2       1046.5           1.0       1.0X
[info] java                                                 43             45           2       2317.1           0.4       2.2X
[info]
[info] dscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 155            168           8        644.2           1.6       1.0X
[info] java                                                158            169           8        632.8           1.6       1.0X
[info]
[info] sscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  85             90           4       1178.1           0.8       1.0X
[info] java                                                 86             90           4       1167.7           0.9       1.0X
[info]
[info] dspmv[U]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       1182.1           0.8       1.0X
[info] java                                                  0              0           0       1432.1           0.7       1.2X
[info]
[info] dspr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        898.7           1.1       1.0X
[info] java                                                  1              1           0        891.5           1.1       1.0X
[info]
[info] dsyr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        495.4           2.0       1.0X
[info] java                                                  1              1           0        495.7           2.0       1.0X
[info]
[info] dgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2271.6           0.4       1.0X
[info] java                                                  0              0           0       3648.1           0.3       1.6X
[info]
[info] dgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1229.3           0.8       1.0X
[info] java                                                  0              0           0       2711.3           0.4       2.2X
[info]
[info] sgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2677.5           0.4       1.0X
[info] java                                                  0              0           0       3288.2           0.3       1.2X
[info]
[info] sgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1233.0           0.8       1.0X
[info] java                                                  0              0           0       2766.3           0.4       2.2X
[info]
[info] dgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 520            536          16       1923.6           0.5       1.0X
[info] java                                                214            221           7       4669.5           0.2       2.4X
[info]
[info] dgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 593            612          17       1686.5           0.6       1.0X
[info] java                                                215            219           3       4643.3           0.2       2.8X
[info]
[info] dgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 853            870          16       1172.8           0.9       1.0X
[info] java                                                215            218           3       4659.7           0.2       4.0X
[info]
[info] dgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                1350           1370          23        740.8           1.3       1.0X
[info] java                                                215            219           4       4656.6           0.2       6.3X
[info]
[info] sgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 460            468           6       2173.2           0.5       1.0X
[info] java                                                210            213           2       4752.7           0.2       2.2X
[info]
[info] sgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 535            544           8       1869.3           0.5       1.0X
[info] java                                                210            215           5       4761.8           0.2       2.5X
[info]
[info] sgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 843            853          11       1186.8           0.8       1.0X
[info] java                                                209            214           4       4793.4           0.2       4.0X
[info]
[info] sgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 891            904          15       1122.0           0.9       1.0X
[info] java                                                209            214           4       4777.2           0.2       4.3X
```

#### JDK16:
```
[info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.VectorizedBLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.VectorizedBLAS
[info]
[info] daxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 194            199           7        515.7           1.9       1.0X
[info] java                                                181            186           3        551.1           1.8       1.1X
[info]
[info] saxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 109            115           4        915.0           1.1       1.0X
[info] java                                                 88             92           3       1138.8           0.9       1.2X
[info]
[info] ddot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 108            110           2        922.6           1.1       1.0X
[info] java                                                 54             56           2       1839.2           0.5       2.0X
[info]
[info] sdot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  96             97           2       1046.1           1.0       1.0X
[info] java                                                 29             30           1       3393.4           0.3       3.2X
[info]
[info] dscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 156            165           5        643.0           1.6       1.0X
[info] java                                                150            159           5        667.1           1.5       1.0X
[info]
[info] sscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  85             91           6       1171.0           0.9       1.0X
[info] java                                                 75             79           3       1340.6           0.7       1.1X
[info]
[info] dspmv[U]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        917.0           1.1       1.0X
[info] java                                                  0              0           0       8147.2           0.1       8.9X
[info]
[info] dspr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        859.3           1.2       1.0X
[info] java                                                  1              1           0        859.3           1.2       1.0X
[info]
[info] dsyr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        482.1           2.1       1.0X
[info] java                                                  1              1           0        482.6           2.1       1.0X
[info]
[info] dgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2214.2           0.5       1.0X
[info] java                                                  0              0           0       7975.8           0.1       3.6X
[info]
[info] dgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1231.4           0.8       1.0X
[info] java                                                  0              0           0       8680.9           0.1       7.0X
[info]
[info] sgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2684.3           0.4       1.0X
[info] java                                                  0              0           0      18527.1           0.1       6.9X
[info]
[info] sgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1235.4           0.8       1.0X
[info] java                                                  0              0           0      17347.9           0.1      14.0X
[info]
[info] dgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 530            552          18       1887.5           0.5       1.0X
[info] java                                                 58             64           3      17143.9           0.1       9.1X
[info]
[info] dgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 598            620          17       1671.1           0.6       1.0X
[info] java                                                 58             64           3      17196.6           0.1      10.3X
[info]
[info] dgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 834            847          14       1199.4           0.8       1.0X
[info] java                                                 57             63           4      17486.9           0.1      14.6X
[info]
[info] dgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                1338           1366          22        747.3           1.3       1.0X
[info] java                                                 58             63           3      17356.6           0.1      23.2X
[info]
[info] sgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 489            501           9       2045.5           0.5       1.0X
[info] java                                                 36             38           2      27721.9           0.0      13.6X
[info]
[info] sgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 478            488           9       2094.0           0.5       1.0X
[info] java                                                 36             38           2      27813.2           0.0      13.3X
[info]
[info] sgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 825            837          10       1211.6           0.8       1.0X
[info] java                                                 35             38           2      28433.1           0.0      23.5X
[info]
[info] sgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 900            918          15       1111.6           0.9       1.0X
[info] java                                                 36             38           2      28073.0           0.0      25.3X
```

[2] https://github.com/luhenry/netlib/tree/master/blas/src/test/java/dev/ludovic/netlib/blas

Closes #32253 from luhenry/master.

Authored-by: Ludovic Henry <git@ludovic.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-04-27 14:00:59 -05:00
yangjie01 8b1ba233f1 [SPARK-34068][CORE][SQL][MLLIB][GRAPHX] Remove redundant collection conversion
### What changes were proposed in this pull request?
There are some redundant collection conversion can be removed, for version compatibility, clean up these with Scala-2.13 profile.

### Why are the changes needed?
Remove redundant collection conversion

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass the Jenkins or GitHub  Action
- Manual test `core`, `graphx`, `mllib`, `mllib-local`, `sql`, `yarn`,`kafka-0-10` in Scala 2.13 passed

Closes #31125 from LuciferYang/SPARK-34068.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-01-13 18:07:02 -06:00
Dongjoon Hyun de9818f043
[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT
### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #30606 from dongjoon-hyun/SPARK-3.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-04 14:10:42 -08:00
Josh Soref 9d58a2f0f0 [MINOR][GRAPHX] Correct typos in the sub-modules: graphx, external, and examples
### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules: graphx, external, and examples.
Split per holdenk https://github.com/apache/spark/pull/30323#issuecomment-725159710

NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No testing was performed

Closes #30326 from jsoref/spelling-graphx.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-12 08:29:22 +09:00
Sean Owen be2eca22e9 [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
### What changes were proposed in this pull request?

Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the new location of scalatest classes.

### Why are the changes needed?

3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility.

### Does this PR introduce _any_ user-facing change?

No, only affects tests.

### How was this patch tested?

Existing tests.

Closes #29196 from srowen/SPARK-32398.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-23 16:20:17 -07:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
Wu, Xiaochang f5238ea6cb [GRAPHX][MINOR] Fix typo setRest => setDest
### What changes were proposed in this pull request?
Fix typo

def setRest(dstId: VertexId, localDstId: Int, dstAttr: VD, attr: ED)

to

def setDest(dstId: VertexId, localDstId: Int, dstAttr: VD, attr: ED)

### Why are the changes needed?
Typo

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27594 from xwu99/fix-graphx-setDest.

Authored-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-02-16 09:51:02 -06:00
Yuming Wang 696288f623 [INFRA] Reverts commit 56dcd79 and c216ef1
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79

2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1

### Why are the changes needed?
Shouldn't change master.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master

Closes #26915 from wangyum/revert-master.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 19:57:44 -07:00
Yuming Wang 56dcd79992 Preparing development version 3.0.1-SNAPSHOT 2019-12-17 01:57:27 +00:00
Yuming Wang c216ef1d03 Preparing Spark release v3.0.0-preview2-rc2 2019-12-17 01:57:21 +00:00
joanfontanals 43556e46e9 [SPARK-29877][GRAPHX] static PageRank allow checkPoint from previous computations
### What changes were proposed in this pull request?
Add an optional parameter to the staticPageRank computation with the result of a previous PageRank computation. This would make the algorithm start from a different starting point closer to the convergence configuration

### Why are the changes needed?
https://issues.apache.org/jira/browse/SPARK-29877

It would be really helpful to have the possibility, when computing staticPageRank to use a previous computation as a checkpoint to continue the iterations.

### Does this PR introduce any user-facing change?
Yes, it allows to start the static  page Rank computation from the point where an earlier one finished.

Example: Compute 10 iteration first, and continue for 3 more iterations
```scala
val partialPageRank = graph.ops.staticPageRank(numIter=10, resetProb=0.15)
val continuationPageRank = graph.ops.staticPageRank(numIter=3,  resetProb=0.15, Some(partialPageRank))
 ```

### How was this patch tested?
Yes, some tests were added.
Testing was done as follow:
- Check how many iterations it takes for a static Page Rank computation to converge
- Run the static Page Rank computation for half of these iterations and take result as checkpoint
- Restart computation and check that number of iterations it takes to converge. It never has to be larger than the original one and in most of the cases it is much smaller.

Due to the presence of sinks and the normalization done in [[SPARK-18847]] it is not exactly equivalent to compute static page rank for 2 iterations, take the result at checkpoint and run for 2 more iterations than to compute directly for 4 iterations.

However this checkpointing can give the algorithm a hint about the true distribution of pageRanks in the graph

Closes #26608 from JoanFM/pageRank_checkPoint.

Authored-by: joanfontanals <jfontanals@ntent.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-28 08:36:54 -06:00
Sean Owen 1febd373ea [MINOR][TESTS] Replace JVM assert with JUnit Assert in tests
### What changes were proposed in this pull request?

Use JUnit assertions in tests uniformly, not JVM assert() statements.

### Why are the changes needed?

assert() statements do not produce as useful errors when they fail, and, if they were somehow disabled, would fail to test anything.

### Does this PR introduce any user-facing change?

No. The assertion logic should be identical.

### How was this patch tested?

Existing tests.

Closes #26581 from srowen/assertToJUnit.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-20 14:04:15 -06:00
Xingbo Jiang 8207c835b4 Revert "Prepare Spark release v3.0.0-preview-rc2"
This reverts commit 007c873ae3.
2019-10-30 17:45:44 -07:00
Xingbo Jiang 007c873ae3 Prepare Spark release v3.0.0-preview-rc2
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the sparkR version number check logic to allow jvm version like `3.0.0-preview`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A
2019-10-30 17:42:59 -07:00
Xingbo Jiang b33a58c0c6 Revert "Prepare Spark release v3.0.0-preview-rc1"
This reverts commit 5eddbb5f1d.
2019-10-28 22:32:34 -07:00
Xingbo Jiang 5eddbb5f1d Prepare Spark release v3.0.0-preview-rc1
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the PySpark version from `3.0.0.dev0` to `3.0.0`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A

Closes #26243 from jiangxb1987/3.0.0-preview-prepare.

Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-10-28 22:31:29 -07:00
Sean Owen 3b0bca42ac [SPARK-29401][FOLLOWUP] Additional cases where a .parallelize call with Array is ambiguous in 2.13
This is just a followup on https://github.com/apache/spark/pull/26062 -- see it for more detail.

I think we will eventually find more cases of this. It's hard to get them all at once as there are many different types of compile errors in earlier modules. I'm trying to address them in as a big a chunk as possible.

Closes #26074 from srowen/SPARK-29401.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-09 10:27:05 -07:00
Sean Owen ee83d09b53 [SPARK-29401][CORE][ML][SQL][GRAPHX][TESTS] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
### What changes were proposed in this pull request?

Invocations like `sc.parallelize(Array((1,2)))` cause a compile error in 2.13, like:
```
[ERROR] [Error] /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/ShuffleSuite.scala:47: overloaded method value apply with alternatives:
  (x: Unit,xs: Unit*)Array[Unit] <and>
  (x: Double,xs: Double*)Array[Double] <and>
  (x: Float,xs: Float*)Array[Float] <and>
  (x: Long,xs: Long*)Array[Long] <and>
  (x: Int,xs: Int*)Array[Int] <and>
  (x: Char,xs: Char*)Array[Char] <and>
  (x: Short,xs: Short*)Array[Short] <and>
  (x: Byte,xs: Byte*)Array[Byte] <and>
  (x: Boolean,xs: Boolean*)Array[Boolean]
 cannot be applied to ((Int, Int), (Int, Int), (Int, Int), (Int, Int))
```
Using a `Seq` instead appears to resolve it, and is effectively equivalent.

### Why are the changes needed?

To better cross-build for 2.13.

### Does this PR introduce any user-facing change?

None.

### How was this patch tested?

Existing tests.

Closes #26062 from srowen/SPARK-29401.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-08 20:22:02 -07:00
Sean Owen e1ea806b30 [SPARK-29291][CORE][SQL][STREAMING][MLLIB] Change procedure-like declaration to function + Unit for 2.13
### What changes were proposed in this pull request?

Scala 2.13 emits a deprecation warning for procedure-like declarations:

```
def foo() {
 ...
```

This is equivalent to the following, so should be changed to avoid a warning:

```
def foo(): Unit = {
  ...
```

### Why are the changes needed?

It will avoid about a thousand compiler warnings when we start to support Scala 2.13. I wanted to make the change in 3.0 as there are less likely to be back-ports from 3.0 to 2.4 than 3.1 to 3.0, for example, minimizing that downside to touching so many files.

Unfortunately, that makes this quite a big change.

### Does this PR introduce any user-facing change?

No behavior change at all.

### How was this patch tested?

Existing tests.

Closes #25968 from srowen/SPARK-29291.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-30 10:03:23 -07:00
Liang-Chi Hsieh c610de6952 [SPARK-29042][CORE] Sampling-based RDD with unordered input should be INDETERMINATE
### What changes were proposed in this pull request?

We already have found and fixed the correctness issue before when RDD output is INDETERMINATE. One missing part is sampling-based RDD. This kind of RDDs is order sensitive to its input. A sampling-based RDD with unordered input, should be INDETERMINATE.

### Why are the changes needed?

A sampling-based RDD with unordered input is just like MapPartitionsRDD with isOrderSensitive parameter as true. The RDD output can be different after a rerun.

It is a problem in ML applications.

In ML, sample is used to prepare training data. ML algorithm fits the model based on the sampled data. If rerun tasks of sample produce different output during model fitting, ML results will be unreliable and also buggy.

Each sample is random output, but once you sampled, the output should be determinate.

### Does this PR introduce any user-facing change?

Previously, a sampling-based RDD can possibly come with different output after a rerun.
After this patch, sampling-based RDD is INDETERMINATE. For an INDETERMINATE map stage, currently Spark scheduler will re-try all the tasks of the failed stage.

### How was this patch tested?

Added test.

Closes #25751 from viirya/sample-order-sensitive.

Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
2019-09-13 14:07:00 -07:00
Sean Owen bfb3ffe9b3 [SPARK-27682][CORE][GRAPHX][MLLIB] Replace use of collections and methods that will be removed in Scala 2.13 with work-alikes
## What changes were proposed in this pull request?

This replaces use of collection classes like `MutableList` and `ArrayStack` with workalikes that are available in 2.12, as they will be removed in 2.13. It also removes use of `.to[Collection]` as its uses was superfluous anyway. Removing `collection.breakOut` will have to wait until 2.13

## How was this patch tested?

Existing tests

Closes #24586 from srowen/SPARK-27682.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-15 09:29:12 -05:00
Sean Owen a10608cb82 [SPARK-27680][CORE][SQL][GRAPHX] Remove usage of Traversable
## What changes were proposed in this pull request?

This removes usage of `Traversable`, which is removed in Scala 2.13. This is mostly an internal change, except for the change in the `SparkConf.setAll` method. See additional comments below.

## How was this patch tested?

Existing tests.

Closes #24584 from srowen/SPARK-27680.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-05-14 09:14:56 -05:00
Adi Muraru 8ef4da753d [SPARK-27610][YARN] Shade netty native libraries
## What changes were proposed in this pull request?

Fixed the `spark-<version>-yarn-shuffle.jar` artifact packaging to shade the native netty libraries:
- shade the `META-INF/native/libnetty_*` native libraries when packagin
the yarn shuffle service jar. This is required as netty library loader
derives that based on shaded package name.
- updated the `org/spark_project` shade package prefix to `org/sparkproject`
(i.e. removed underscore) as the former breaks the netty native lib loading.

This was causing the yarn external shuffle service to fail
when spark.shuffle.io.mode=EPOLL

## How was this patch tested?
Manual tests

Closes #24502 from amuraru/SPARK-27610_master.

Authored-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-05-07 10:47:36 -07:00
Sean Owen 8a17d26784 [SPARK-27536][CORE][ML][SQL][STREAMING] Remove most use of scala.language.existentials
## What changes were proposed in this pull request?

I want to get rid of as much use of `scala.language.existentials` as possible for 3.0. It's a complicated language feature that generates warnings unless this value is imported. It might even be on the way out of Scala: https://contributors.scala-lang.org/t/proposal-to-remove-existential-types-from-the-language/2785

For Spark, it comes up mostly where the code plays fast and loose with generic types, not the advanced situations you'll often see referenced where this feature is explained. For example, it comes up in cases where a function returns something like `(String, Class[_])`. Scala doesn't like matching this to any other instance of `(String, Class[_])` because doing so requires inferring the existence of some type that satisfies both. Seems obvious if the generic type is a wildcard, but, not technically something Scala likes to let you get away with.

This is a large PR, and it only gets rid of _most_ instances of `scala.language.existentials`. The change should be all compile-time and shouldn't affect APIs or logic.

Many of the changes simply touch up sloppiness about generic types, making the known correct value explicit in the code.

Some fixes involve being more explicit about the existence of generic types in methods. For instance, `def foo(arg: Class[_])` seems innocent enough but should really be declared `def foo[T](arg: Class[T])` to let Scala select and fix a single type when evaluating calls to `foo`.

For kind of surprising reasons, this comes up in places where code evaluates a tuple of things that involve a generic type, but is OK if the two parts of the tuple are evaluated separately.

One key change was altering `Utils.classForName(...): Class[_]` to the more correct `Utils.classForName[T](...): Class[T]`. This caused a number of small but positive changes to callers that otherwise had to cast the result.

In several tests, `Dataset[_]` was used where `DataFrame` seems to be the clear intent.

Finally, in a few cases in MLlib, the return type `this.type` was used where there are no subclasses of the class that uses it. This really isn't needed and causes issues for Scala reasoning about the return type. These are just changed to be concrete classes as return types.

After this change, we have only a few classes that still import `scala.language.existentials` (because modifying them would require extensive rewrites to fix) and no build warnings.

## How was this patch tested?

Existing tests.

Closes #24431 from srowen/SPARK-27536.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-29 11:02:01 -05:00
Sean Owen d4420b455a [SPARK-27323][CORE][SQL][STREAMING] Use Single-Abstract-Method support in Scala 2.12 to simplify code
## What changes were proposed in this pull request?

Use Single Abstract Method syntax where possible (and minor related cleanup). Comments below. No logic should change here.

## How was this patch tested?

Existing tests.

Closes #24241 from srowen/SPARK-27323.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-02 07:37:05 -07:00
Sean Owen 8bc304f97e [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0
## What changes were proposed in this pull request?

Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.

## How was this patch tested?

Existing tests.

Closes #23098 from srowen/SPARK-26132.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-25 10:46:42 -05:00
Maxim Gekk a829234df3 [SPARK-26817][CORE] Use System.nanoTime to measure time intervals
## What changes were proposed in this pull request?

In the PR, I propose to use `System.nanoTime()` instead of `System.currentTimeMillis()` in measurements of time intervals.

`System.currentTimeMillis()` returns current wallclock time and will follow changes to the system clock. Thus, negative wallclock adjustments can cause timeouts to "hang" for a long time (until wallclock time has caught up to its previous value again). This can happen when ntpd does a "step" after the network has been disconnected for some time. The most canonical example is during system bootup when DHCP takes longer than usual. This can lead to failures that are really hard to understand/reproduce. `System.nanoTime()` is guaranteed to be monotonically increasing irrespective of wallclock changes.

## How was this patch tested?

By existing test suites.

Closes #23727 from MaxGekk/system-nanotime.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-13 13:12:16 -06:00
Sean Owen 8171b156eb [SPARK-26771][CORE][GRAPHX] Make .unpersist(), .destroy() consistently non-blocking by default
## What changes were proposed in this pull request?

Make .unpersist(), .destroy() non-blocking by default and adjust callers to request blocking only where important.

This also adds an optional blocking argument to Pyspark's RDD.unpersist(), which never had one.

## How was this patch tested?

Existing tests.

Closes #23685 from srowen/SPARK-26771.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-02-01 18:29:55 -06:00
Huon Wilson da526985c7 [SPARK-26757][GRAPHX] Return 0 for count on empty Edge/Vertex RDDs
## What changes were proposed in this pull request?

Previously a "java.lang.UnsupportedOperationException: empty
collection" exception would be thrown due to using `reduce`, rather
than `fold` or similar that can tolerate empty RDDs.

This behaviour has existed for the Vertex RDDs since it was introduced
in b30e0ae035. It seems this behaviour
was inherited by the Edge RDDs via copy-paste in
ee29ef3800.

## How was this patch tested?

Two new unit tests.

Closes #23681 from huonw/empty-graphx.

Authored-by: Huon Wilson <Huon.Wilson@data61.csiro.au>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-01-31 17:27:11 -06:00
Huon Wilson aeda76e2b7 [GRAPHX] Remove unused variables left over by previous refactoring.
## What changes were proposed in this pull request?

Some variables were previously used for indexing the routing table's backing
array, but that indexing now happens elsewhere, and so the variables aren't needed.

## How was this patch tested?

Unit tests.

(This contribution is my original work and I license the work to Spark under its open source license.)

Closes #23112 from huonw/remove-unused-variables.

Authored-by: Huon Wilson <Huon.Wilson@data61.csiro.au>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-11-22 15:43:04 -06:00
DB Tsai ad853c5678
[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0
## What changes were proposed in this pull request?

This PR makes Spark's default Scala version as 2.12, and Scala 2.11 will be the alternative version. This implies that Scala 2.12 will be used by our CI builds including pull request builds.

We'll update the Jenkins to include a new compile-only jobs for Scala 2.11 to ensure the code can be still compiled with Scala 2.11.

## How was this patch tested?

existing tests

Closes #22967 from dbtsai/scala2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2018-11-14 16:22:23 -08:00
DB Tsai 3ed91c9b89
[SPARK-25946][BUILD] Upgrade ASM to 7.x to support JDK11
## What changes were proposed in this pull request?

Upgrade ASM to 7.x to support JDK11

## How was this patch tested?

Existing tests.

Closes #22953 from dbtsai/asm7.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2018-11-06 05:38:59 +00:00
gatorsmile 9bf397c0e4 [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
## What changes were proposed in this pull request?

This patch is to bump the master branch version to 3.0.0-SNAPSHOT.

## How was this patch tested?
N/A

Closes #22606 from gatorsmile/bump3.0.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2018-10-02 08:48:24 -07:00
gatorsmile bb2f069cf2 [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT
## What changes were proposed in this pull request?
In the dev list, we can still discuss whether the next version is 2.5.0 or 3.0.0. Let us first bump the master branch version to `2.5.0-SNAPSHOT`.

## How was this patch tested?
N/A

Closes #22426 from gatorsmile/bumpVersionMaster.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2018-09-15 16:24:02 -07:00
Shahid 3b6591b0b0 [SPARK-25268][GRAPHX] run Parallel Personalized PageRank throws serialization Exception
## What changes were proposed in this pull request?
mapValues in scala is currently not serializable. To avoid the serialization issue while running pageRank, we need to use map instead of mapValues.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22271 from shahidki31/master_latest.

Authored-by: Shahid <shahidki31@gmail.com>
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2018-09-06 09:52:58 -07:00
Bago Amirbekian 72ecfd0950 [SPARK-25149][GRAPHX] Update Parallel Personalized Page Rank to test with large vertexIds
## What changes were proposed in this pull request?

runParallelPersonalizedPageRank in graphx checks that `sources` are <= Int.MaxValue.toLong, but this is not actually required. This check seems to have been added because we use sparse vectors in the implementation and sparse vectors cannot be indexed by values > MAX_INT. However we do not ever index the sparse vector by the source vertexIds so this isn't an issue. I've added a test with large vertexIds to confirm this works as expected.

## How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22139 from MrBago/remove-veretexId-check-pppr.

Authored-by: Bago Amirbekian <bago@databricks.com>
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2018-08-21 15:21:55 -07:00
Xiao Li a9928277da
[SPARK-24420][BUILD][FOLLOW-UP] Upgrade ASM6 APIs
## What changes were proposed in this pull request?
Use ASM 6 APIs after we upgrading it to ASM6.

## How was this patch tested?
N/A

Closes #22082 from gatorsmile/asm6.

Authored-by: Xiao Li <gatorsmile@gmail.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2018-08-13 05:59:08 +00:00
Sean Owen 66699c5c30 [SPARK-25029][TESTS] Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors
## What changes were proposed in this pull request?

Fixes for test issues that arose after Scala 2.12 support was added -- ones that only affect the 2.12 build.

## How was this patch tested?

Existing tests.

Closes #22004 from srowen/SPARK-25029.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2018-08-07 17:30:37 -05:00
DB Tsai 5585c5765f
[SPARK-24420][BUILD] Upgrade ASM to 6.1 to support JDK9+
## What changes were proposed in this pull request?

Upgrade ASM to 6.1 to support JDK9+

## How was this patch tested?

Existing tests.

Author: DB Tsai <d_tsai@apple.com>

Closes #21459 from dbtsai/asm.
2018-07-03 10:13:48 -07:00
gatorsmile 651f76153f [SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT
## What changes were proposed in this pull request?
This patch bumps the master branch version to `2.4.0-SNAPSHOT`.

## How was this patch tested?
N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes #20222 from gatorsmile/bump24.
2018-01-13 00:37:59 +08:00
FanDonglai 6d230dccf6 Update PageRank.scala
## What changes were proposed in this pull request?

Hi, acording to code below,
"if (id == src) (0.0, Double.NegativeInfinity) else (0.0, 0.0)"
I think the comment can be wrong

## How was this patch tested?
Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: FanDonglai <ddna_1022@163.com>

Closes #20220 from ddna1021/master.
2018-01-11 09:06:40 -06:00