spark-instrumented-optimizer/dev
Sean Owen 6378d4bc06 [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3
### What changes were proposed in this pull request?

- Remove SQLContext.createExternalTable and Catalog.createExternalTable, deprecated in favor of createTable since 2.2.0, plus tests of deprecated methods
- Remove HiveContext, deprecated in 2.0.0, in favor of `SparkSession.builder.enableHiveSupport`
- Remove deprecated KinesisUtils.createStream methods, plus tests of deprecated methods, deprecate in 2.2.0
- Remove deprecated MLlib (not Spark ML) linear method support, mostly utility constructors and 'train' methods, and associated docs. This includes methods in LinearRegression, LogisticRegression, Lasso, RidgeRegression. These have been deprecated since 2.0.0
- Remove deprecated Pyspark MLlib linear method support, including LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD
- Remove 'runs' argument in KMeans.train() method, which has been a no-op since 2.0.0
- Remove deprecated ChiSqSelector isSorted protected method
- Remove deprecated 'yarn-cluster' and 'yarn-client' master argument in favor of 'yarn' and deploy mode 'cluster', etc

Notes:

- I was not able to remove deprecated DataFrameReader.json(RDD) in favor of DataFrameReader.json(Dataset); the former was deprecated in 2.2.0, but, it is still needed to support Pyspark's .json() method, which can't use a Dataset.
- Looks like SQLContext.createExternalTable was not actually deprecated in Pyspark, but, almost certainly was meant to be? Catalog.createExternalTable was.
- I afterwards noted that the toDegrees, toRadians functions were almost removed fully in SPARK-25908, but Felix suggested keeping just the R version as they hadn't been technically deprecated. I'd like to revisit that. Do we really want the inconsistency? I'm not against reverting it again, but then that implies leaving SQLContext.createExternalTable just in Pyspark too, which seems weird.
- I *kept* LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD, RidgeRegressionWithSGD in Pyspark, though deprecated, as it is hard to remove them (still used by StreamingLogisticRegressionWithSGD?) and they are not fully removed in Scala. Maybe should not have been deprecated.

### Why are the changes needed?

Deprecated items are easiest to remove in a major release, so we should do so as much as possible for Spark 3. This does not target items deprecated 'recently' as of Spark 2.3, which is still 18 months old.

### Does this PR introduce any user-facing change?

Yes, in that deprecated items are removed from some public APIs.

### How was this patch tested?

Existing tests.

Closes #25684 from srowen/SPARK-28980.

Lead-authored-by: Sean Owen <sean.owen@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-09 10:19:40 -05:00
..
create-release [SPARK-28951][INFRA] Add release announce template 2019-09-02 14:55:05 -07:00
deps [SPARK-29011][BUILD] Update netty-all from 4.1.30-Final to 4.1.39-Final 2019-09-06 17:48:53 -07:00
sparktestsupport [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
tests [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
.gitignore [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. 2018-01-31 00:51:00 +09:00
.rat-excludes [SPARK-27489][WEBUI] UI updates to show executor resource information 2019-09-04 09:45:44 +08:00
.scalafmt.conf [SPARK-26177] Config change followup to [] Automated formatting for Scala code 2018-12-03 10:03:51 -06:00
appveyor-guide.md [SPARK-26918][DOCS] All .md should have ASF license header 2019-03-30 19:49:45 -05:00
appveyor-install-dependencies.ps1 [SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2 2019-09-03 11:06:57 -07:00
change-scala-version.sh [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 2019-03-25 10:46:42 -05:00
check-license [MINOR][BUILD] Upgrade apache-rat to 0.13 2019-04-01 16:44:42 +09:00
checkstyle-suppressions.xml [MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org" 2019-02-25 11:25:53 -08:00
checkstyle.xml [MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org" 2019-02-25 11:25:53 -08:00
github_jira_sync.py [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 2019-08-09 18:55:48 +09:00
lint-java [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
lint-python [BUILD] refactor dev/lint-python in to something readable 2018-11-20 12:38:40 -08:00
lint-r [SPARK-10328] [SPARKR] Fix generic for na.omit 2015-08-28 00:37:50 -07:00
lint-r.R [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r 2017-10-01 18:42:45 +09:00
lint-scala [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
make-distribution.sh [SPARK-26095][BUILD] Disable parallelization in make-distibution.sh. 2018-11-16 15:57:38 -08:00
merge_spark_pr.py [SPARK-28857][INFRA] Clean up the comments of PR template during merging 2019-08-23 18:08:10 +09:00
mima [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
pip-sanity-check.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
README.md Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
requirements.txt [SPARK-25270] lint-python: Add flake8 to find syntax errors and undefined names 2018-09-07 09:35:25 -07:00
run-pip-tests Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
run-tests [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7 2017-10-22 02:22:35 +09:00
run-tests-jenkins [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
run-tests-jenkins.py [SPARK-28701][TEST-HADOOP3.2][TEST-JAVA11][K8S] adding java11 support for pull request builds 2019-08-27 00:48:01 +09:00
run-tests.py [SPARK-28701][INFRA][FOLLOWUP] Fix the key error when looking in os.environ 2019-08-26 12:40:31 -07:00
sbt-checkstyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
scalafmt [SPARK-26177] Automated formatting for Scala code 2018-11-29 08:54:31 -06:00
scalastyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
test-dependencies.sh [SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0 2019-03-16 19:42:05 -05:00
tox.ini [SPARK-23367][BUILD] Include python document style checking 2018-10-27 08:20:42 -05:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.