01cf6f4c6b
### What changes were proposed in this pull request?

Guava Cache is used in Spark code in 3 ways:

1. `LoadingCache` is the main way Guava Cache is used in Spark code. The key usages are:
   - `LoadingCache` with the `maximumSize` eviction policy, such as `appCache` in `ApplicationCache` and `cache` in `CodeGenerator`
   - `LoadingCache` with the `maximumWeight` eviction policy, such as `shuffleIndexCache` in `ExternalShuffleBlockResolver`
   - `LoadingCache` with the `expireAfterWrite` eviction policy, such as `tableRelationCache` in `SessionCatalog`
2. `ManualCache` is the second way. The key usage is `cache` in `SharedInMemoryCache`, which caches partition file statuses in memory.
3. The last usage is `hadoopJobMetadata` in `SparkEnv`, which uses Guava Cache to build a soft-reference map.

The goal of this PR is to use `Caffeine` instead of `Guava Cache`, because benchmarks show that `Caffeine` is faster. The main changes are as follows:

1. Add the `Caffeine` dependency to the Maven `pom.xml`
2. Use `Caffeine` instead of Guava `LoadingCache`, `ManualCache` and the soft-reference map in `SparkEnv`
3. Add `LocalCacheBenchmark` to compare the performance of `LoadingCache` between `Guava Cache` and `Caffeine`

### Why are the changes needed?

Benchmarks show that `Caffeine` is faster than `Guava Cache`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass the Jenkins or GitHub Action
- Add `LocalCacheBenchmark` to compare the performance of `LoadingCache` between `Guava Cache` and `Caffeine`

Closes #31517 from LuciferYang/guava-cache-to-caffeine.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Holden Karau <hkarau@netflix.com>

| Name |
|---|
| .. |
| benchmarks |
| src |
| pom.xml |
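
The first of the main changes in the commit message above is adding `Caffeine` to the Maven `pom.xml`. A minimal sketch of what such a dependency entry might look like follows. The group and artifact IDs are Caffeine's published Maven coordinates. The `caffeine.version` property is a hypothetical placeholder, since the commit message does not say which version was added or how it is managed in the build.

```xml
<!-- Sketch only, not the actual diff from this commit. -->
<!-- ${caffeine.version} is a hypothetical property; the commit message does not state a version. -->
<dependency>
  <groupId>com.github.ben-manes.caffeine</groupId>
  <artifactId>caffeine</artifactId>
  <version>${caffeine.version}</version>
</dependency>
```

In a multi-module build, the version would typically be declared once in the parent `pom.xml` under `dependencyManagement`, with child modules omitting the `<version>` element. Whether this commit does so is not stated in the message.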