## What changes were proposed in this pull request?
Update Jetty 8.1 to the latest 2016/02 release, from a 2013/10 release, for security and bug fixes. This does not resolve the JIRA necessarily, as it's still worth considering an update to 9.3.
## How was this patch tested?
Jenkins tests
Author: Sean Owen <sowen@cloudera.com>
Closes#12842 from srowen/SPARK-14897.
## What changes were proposed in this pull request?
This PR copy the thrift-server from hive-service-1.2 (including TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements.
## How was this patch tested?
Existing tests.
Author: Davies Liu <davies@databricks.com>
Closes#12764 from davies/thrift_server.
## What changes were proposed in this pull request?
This PR adds `since` tag into the matrix and vector classes in spark-mllib-local.
## How was this patch tested?
Scala-style checks passed.
Author: Pravin Gadakh <prgadakh@in.ibm.com>
Closes#12416 from pravingadakh/SPARK-14613.
Spark's published POMs are supposed to be flattened and not contain variable substitution (see SPARK-3812), but the dummy dependency that was required for this was accidentally removed. We should re-introduce this dependency in order to fix an issue where the un-flattened POMs cause the wrong dependencies to be included in Scala 2.10 published POMs.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#12706 from JoshRosen/SPARK-14925-published-poms-should-be-flattened.
First, make all dependencies in the examples module provided, and explicitly
list a couple of ones that somehow are promoted to compile by maven. This
means that to run streaming examples, the streaming connector package needs
to be provided to run-examples using --packages or --jars, just like regular
apps.
Also, remove a couple of outdated examples. HBase has had Spark bindings for
a while and is even including them in the HBase distribution in the next
version, making the examples obsolete. The same applies to Cassandra, which
seems to have a proper Spark binding library already.
I just tested the build, which passes, and ran SparkPi. The examples jars
directory now has only two jars:
```
$ ls -1 examples/target/scala-2.11/jars/
scopt_2.11-3.3.0.jar
spark-examples_2.11-2.0.0-SNAPSHOT.jar
```
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#12544 from vanzin/SPARK-14744.
## What changes were proposed in this pull request?
This PR creates a compatibility module in sql (called `hive-1-x-compatibility`), which will host HiveContext in Spark 2.0 (moving HiveContext to here will be done separately). This module is not included in assembly because only users who still want to access HiveContext need it.
## How was this patch tested?
I manually tested `sbt/sbt -Phive package` and `mvn -Phive package -DskipTests`.
Author: Yin Huai <yhuai@databricks.com>
Closes#12580 from yhuai/compatibility.
Spark SQL's POM hardcodes a dependency on `spark-sketch_2.11`, which causes Scala 2.10 builds to include the `_2.11` dependency. This is harmless since `spark-sketch` is a pure-Java module (see #12334 for a discussion of dropping the Scala version suffixes from these modules' artifactIds), but it's confusing to people looking at the published POMs.
This patch fixes this by using `${scala.binary.version}` to substitute the correct suffix, and also adds a set of Maven Enforcer rules to ensure that `_2.11` artifacts are not used in 2.10 builds (and vice-versa).
/cc ahirreddy, who spotted this issue.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#12563 from JoshRosen/fix-sketch-scala-version.
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-14787
The possible problems are described in the JIRA above. Please refer this if you are wondering the purpose of this PR.
This PR upgrades Joda-Time library from 2.9 to 2.9.3.
## How was this patch tested?
`sbt scalastyle` and Jenkins tests in this PR.
closes#11847
Author: hyukjinkwon <gurwls223@gmail.com>
Closes#12552 from HyukjinKwon/SPARK-14787.
## What changes were proposed in this pull request?
Move json4s, breeze dependency declaration into parent
## How was this patch tested?
Should be no functional change, but Jenkins tests will test that.
Author: Sean Owen <sowen@cloudera.com>
Closes#12390 from srowen/SPARK-14612.
Add integration tests based on docker to test DB2 JDBC dialect support
Author: Luciano Resende <lresende@apache.org>
Closes#9893 from lresende/SPARK-10521.
## What changes were proposed in this pull request?
In order to separate the linear algebra, and vector matrix classes into a standalone jar, we need to setup the build first. This PR will create a new jar called mllib-local with minimal dependencies.
The previous PR was failing the build because of `spark-core:test` dependency, and that was reverted. In this PR, `FunSuite` with `// scalastyle:ignore funsuite` in mllib-local test was used, similar to sketch.
Thanks.
## How was this patch tested?
Unit tests
mengxr tedyu holdenk
Author: DB Tsai <dbt@netflix.com>
Closes#12298 from dbtsai/dbtsai-mllib-local-build-fix.
## What changes were proposed in this pull request?
Currently, `checkstyle` is configured to check the files under `src/main/java`. However, Spark has Java files in `src/main/scala`, too. This PR fixes the following configuration in `pom.xml` and the unchecked-so-far violations on those files.
```xml
-<sourceDirectory>${basedir}/src/main/java</sourceDirectory>
+<sourceDirectories>${basedir}/src/main/java,${basedir}/src/main/scala</sourceDirectories>
```
## How was this patch tested?
After passing the Jenkins build and manually `dev/lint-java`. (Note that Jenkins does not run `lint-java`)
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes#12242 from dongjoon-hyun/SPARK-14465.
## What changes were proposed in this pull request?
In order to separate the linear algebra, and vector matrix classes into a standalone jar, we need to setup the build first. This PR will create a new jar called mllib-local with minimal dependencies. The test scope will still depend on spark-core and spark-core-test in order to use the common utilities, but the runtime will avoid any platform dependency. Couple platform independent classes will be moved to this package to demonstrate how this work.
## How was this patch tested?
Unit tests
Author: DB Tsai <dbt@netflix.com>
Closes#12241 from dbtsai/dbtsai-mllib-local-build.
This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / https://github.com/twitter/chill/issues/252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#12076 from JoshRosen/kryo3.
## What changes were proposed in this pull request?
This splits commons.httpclient.version from commons.httpcore.version, since these two versions do not necessarily have to be the same. This change may follow up with an up-to-date version of the httpclient/httpcore libraries.
The latest 4.3.x httpclient version as of writing is 4.3.6 and the latest 4.3.x httpcore version as of writing is 4.3.3. This change would be a prerequisite for potentially moving to this new bugfix version.
## How was this patch tested?
no version change was made for httpclient/httpcore versions
mvn package
Author: Aaron Tokhy <tokaaron@amazon.com>
Closes#12245 from atokhy/pull-request.
The current package name uses a dash, which is a little weird but seemed
to work. That is, until a new test tried to mock a class that references
one of those shaded types, and then things started failing.
Most changes are just noise to fix the logging configs.
For reference, SPARK-8815 also raised this issue, although at the time it
did not cause any issues in Spark, so it was not addressed.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#11941 from vanzin/SPARK-14134.
This change modifies the "assembly/" module to just copy needed
dependencies to its build directory, and modifies the packaging
script to pick those up (and remove duplicate jars packages in the
examples module).
I also made some minor adjustments to dependencies to remove some
test jars from the final packaging, and remove jars that conflict with each
other when packaged separately (e.g. servlet api).
Also note that this change restores guava in applications' classpaths, even
though it's still shaded inside Spark. This is now needed for the Hadoop
libraries that are packaged with Spark, which now are not processed by
the shade plugin.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#11796 from vanzin/SPARK-13579.
## What changes were proposed in this pull request?
Upgrade to 2.11.8 (from the current 2.11.7)
## How was this patch tested?
A manual build
Author: Jacek Laskowski <jacek@japila.pl>
Closes#11681 from jaceklaskowski/SPARK-13825-scala-2_11_8.
## What changes were proposed in this pull request?
Upgrade snappy to 1.1.2.4 to improve snappy read/write performance.
## How was this patch tested?
Tested by running a job on the cluster and saw 7.5% cpu savings after this change.
Author: Sital Kedia <skedia@fb.com>
Closes#12096 from sitalkedia/snappyRelease.
This patch fixes a compilation / build break in Spark's `java8-tests` and refactors their POM to simplify the build. See individual commit messages for more details.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#12073 from JoshRosen/fix-java8-tests.
## What changes were proposed in this pull request?
Exclude jline from curator-recipes since it conflicts with scala 2.11 when running spark-shell. Should not affect scala 2.10 since it is builtin.
## How was this patch tested?
Ran spark-shell manually.
Author: Michel Lemay <mlemay@gmail.com>
Closes#12043 from michellemay/spark-13710-fix-jline-on-windows.
### What changes were proposed in this pull request?
This PR removes the ANTLR3 based parser, and moves the new ANTLR4 based parser into the `org.apache.spark.sql.catalyst.parser package`.
### How was this patch tested?
Existing unit tests.
cc rxin andrewor14 yhuai
Author: Herman van Hovell <hvanhovell@questtec.nl>
Closes#12071 from hvanhovell/SPARK-14211.
### What changes were proposed in this pull request?
The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4.
This parser is based on the [Presto's SQL parser](https://github.com/facebook/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4). The current implementation can parse and create Catalyst and SQL plans. Large parts of the HiveQl DDL and some of the DML functionality is currently missing, the plan is to add this in follow-up PRs.
This PR is a work in progress, and work needs to be done in the following area's:
- [x] Error handling should be improved.
- [x] Documentation should be improved.
- [x] Multi-Insert needs to be tested.
- [ ] Naming and package locations.
### How was this patch tested?
Catalyst and SQL unit tests.
Author: Herman van Hovell <hvanhovell@questtec.nl>
Closes#11557 from hvanhovell/ngParser.
## What changes were proposed in this pull request?
This PR moves flume back to Spark as per the discussion in the dev mail-list.
## How was this patch tested?
Existing Jenkins tests.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes#11895 from zsxwing/move-flume-back.
As part of the goal to stop creating assemblies in Spark, this change
modifies the mvn and sbt builds to not create an assembly for examples.
Instead, dependencies are copied to the build directory (under
target/scala-xx/jars), and in the final archive, into the "examples/jars"
directory.
To avoid having to deal too much with Windows batch files, I made examples
run through the launcher library; the spark-submit launcher now has a
special mode to run examples, which adds all the necessary jars to the
spark-submit command line, and replaces the bash and batch scripts that
were used to run examples. The scripts are now just a thin wrapper around
spark-submit; another advantage is that now all spark-submit options are
supported.
There are a few glitches; in the mvn build, a lot of duplicated dependencies
get copied, because they are promoted to "compile" scope due to extra
dependencies in the examples module (such as HBase). In the sbt build,
all dependencies are copied, because there doesn't seem to be an easy
way to filter things.
I plan to clean some of this up when the rest of the tasks are finished.
When the main assembly is replaced with jars, we can remove duplicate jars
from the examples directory during packaging.
Tested by running SparkPi in: maven build, sbt build, dist created by
make-distribution.sh.
Finally: note that running the "assembly" target in sbt doesn't build
the examples anymore. You need to run "package" for that.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#11452 from vanzin/SPARK-13576.
## What changes were proposed in this pull request?
Currently there are a few sub-projects, each for integrating with different external sources for Streaming. Now that we have better ability to include external libraries (spark packages) and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages
- streaming-flume
- streaming-akka
- streaming-mqtt
- streaming-zeromq
- streaming-twitter
They are just some ancillary packages and considering the overhead of maintenance, running tests and PR failures, it's better to maintain them out of Spark. In addition, these projects can have their different release cycles and we can release them faster.
I have already copied these projects to https://github.com/spark-packages
## How was this patch tested?
Jenkins tests
Author: Shixiong Zhu <shixiong@databricks.com>
Closes#11672 from zsxwing/remove-external-pkg.
## What changes were proposed in this pull request?
Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around.
Supersedes https://github.com/apache/spark/pull/11524
## How was this patch tested?
Jenkins tests.
Author: Sean Owen <sowen@cloudera.com>
Closes#11631 from srowen/SPARK-13663.
## What changes were proposed in this pull request?
Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`
## How was this patch tested?
This is tested with Jenkins tests.
Author: Sean Owen <sowen@cloudera.com>
Closes#11523 from srowen/SPARK-13595.
## What changes were proposed in this pull request?
Remove last usage of jblas, in tests
## How was this patch tested?
Jenkins tests -- the same ones that are being modified.
Author: Sean Owen <sowen@cloudera.com>
Closes#11560 from srowen/SPARK-13715.
## What changes were proposed in this pull request?
This PR fixes `dev/lint-java` and `mvn checkstyle:check` failures due the recent file location change.
The following is the error message of current master.
```
Checkstyle checks failed at following occurrences:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default-cli) on project spark-parent_2.11: Failed during checkstyle configuration: cannot initialize module SuppressionFilter - Cannot set property 'file' to 'checkstyle-suppressions.xml' in module SuppressionFilter: InvocationTargetException: Unable to find: checkstyle-suppressions.xml -> [Help 1]
```
## How was this patch tested?
Manual. The following command should run correctly.
```
./dev/lint-java
mvn checkstyle:check
```
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes#11567 from dongjoon-hyun/hotfix_checkstyle_suppression.
## What changes were proposed in this pull request?
Move many top-level files in dev/ or other appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`.
I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope.
## How was this patch tested?
`./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
Author: Sean Owen <sowen@cloudera.com>
Closes#11522 from srowen/SPARK-13596.
## What changes were proposed in this pull request?
Modifies the dependency declarations of the all the hive artifacts, to explicitly exclude the groovy-all JAR.
This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR.
## How was this patch tested?
1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
1. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
1. A maven dependency tree was created `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
1. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
1. Patch applied
1. Repeated step1: clean build of project with ` -Pyarn,hive,hive-thriftserver` set
1. Examined created spark-assembly, verified no org.codehaus packages
1. Verified that the maven dependency tree no longer references groovy
Note also that the size of the assembly JAR was 181628646 bytes before this patch, 166318515 after —15MB smaller. That's a good metric of things being excluded
Author: Steve Loughran <stevel@hortonworks.com>
Closes#11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.
## What changes were proposed in this pull request?
This patch moves tags and unsafe modules into common directory to remove 2 top level non-user-facing directories.
## How was this patch tested?
Jenkins should suffice.
Author: Reynold Xin <rxin@databricks.com>
Closes#11426 from rxin/SPARK-13548.
## What changes were proposed in this pull request?
As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder.
## How was this patch tested?
Compilation and existing tests. We should run both SBT and Maven.
Author: Reynold Xin <rxin@databricks.com>
Closes#11409 from rxin/SPARK-13529.
It registers more Scala classes, including ListBuffer to support Kryo with FPGrowth.
See https://github.com/twitter/chill/releases for Chill's change log.
Author: mark800 <yky800@126.com>
Closes#11041 from mark800/master.
Phase 1: update plugin versions, test dependencies, some example and third-party versions
Author: Sean Owen <sowen@cloudera.com>
Closes#11206 from srowen/SPARK-13324.
This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#10608 from JoshRosen/SPARK-6363.
This PR adds an initial implementation of count min sketch, contained in a new module spark-sketch under `common/sketch`. The implementation is based on the [`CountMinSketch` class in stream-lib][1].
As required by the [design doc][2], spark-sketch should have no external dependency.
Two classes, `Murmur3_x86_32` and `Platform` are copied to spark-sketch from spark-unsafe for hashing facilities. They'll also be used in the upcoming bloom filter implementation.
The following features will be added in future follow-up PRs:
- Serialization support
- DataFrame API integration
[1]: aac6b4d23a/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java
[2]: https://issues.apache.org/jira/secure/attachment/12782378/BloomFilterandCount-MinSketchinSpark2.0.pdf
Author: Cheng Lian <lian@databricks.com>
Closes#10851 from liancheng/count-min-sketch.
- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult` depends on it.
- Update comments and docs
Author: Shixiong Zhu <shixiong@databricks.com>
Closes#10854 from zsxwing/remove-akka.
Include the following changes:
1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
3. Update the ActorWordCount example and add the JavaActorWordCount example
4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly
Author: Shixiong Zhu <shixiong@databricks.com>
Closes#10744 from zsxwing/streaming-akka-2.
This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.
/cc rxin srowen
Author: Josh Rosen <joshrosen@databricks.com>
Closes#10775 from JoshRosen/add-hadoop-2.7-profile.
The current Spark Streaming kinesis connector references a quite old version 1.9.40 of the AWS Java SDK (1.10.40 is current). Numerous AWS features including Kinesis Firehose are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively) such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.
Author: BrianLondon <brian@seatgeek.com>
Closes#10256 from BrianLondon/master.
This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.
While the diff here looks somewhat large, note that this is only a revert of a some of the changes from #10672. You can see the net changes in pom.xml at 3119206b71...5211ab8 (diff-600376dffeb79835ede4a0b285078036)
Author: Josh Rosen <joshrosen@databricks.com>
Closes#10693 from JoshRosen/netty-hotfix.
Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).
This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.
/cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.
Author: Josh Rosen <rosenville@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>
Closes#10672 from JoshRosen/enforce-netty-exclusions.
This patch removes all non-Maven-central repositories from Spark's build, thereby avoiding any risk of future build-breaks due to us accidentally depending on an artifact which is not present in an immutable public Maven repository.
I tested this by running
```
build/mvn \
-Phive \
-Phive-thriftserver \
-Pkinesis-asl \
-Pspark-ganglia-lgpl \
-Pyarn \
dependency:go-offline
```
inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT).
Author: Josh Rosen <joshrosen@databricks.com>
Closes#10659 from JoshRosen/SPARK-4628.