Commit graph

775 commits

Author SHA1 Message Date
Cheng Lian 1c690ddafa [SPARK-12933][SQL] Initial implementation of Count-Min sketch
This PR adds an initial implementation of count min sketch, contained in a new module spark-sketch under `common/sketch`. The implementation is based on the [`CountMinSketch` class in stream-lib][1].

As required by the [design doc][2], spark-sketch should have no external dependency.
Two classes, `Murmur3_x86_32` and `Platform` are copied to spark-sketch from spark-unsafe for hashing facilities. They'll also be used in the upcoming bloom filter implementation.

The following features will be added in future follow-up PRs:

- Serialization support
- DataFrame API integration

[1]: aac6b4d23a/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java
[2]: https://issues.apache.org/jira/secure/attachment/12782378/BloomFilterandCount-MinSketchinSpark2.0.pdf

Author: Cheng Lian <lian@databricks.com>

Closes #10851 from liancheng/count-min-sketch.
2016-01-23 00:34:55 -08:00
Shixiong Zhu bc1babd63d [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming
- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult`  depends on it.
- Update comments and docs

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10854 from zsxwing/remove-akka.
2016-01-22 21:20:04 -08:00
Shixiong Zhu b7d74a602f [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project
Include the following changes:

1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
3. Update the ActorWordCount example and add the JavaActorWordCount example
4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10744 from zsxwing/streaming-akka-2.
2016-01-20 13:55:41 -08:00
Shixiong Zhu 944fdadf77 [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events
Including the following changes:

1. Add StreamingListenerForwardingBus to WrappedStreamingListenerEvent process events in `onOtherEvent` to StreamingListener
2. Remove StreamingListenerBus
3. Merge AsynchronousListenerBus and LiveListenerBus to the same class LiveListenerBus
4. Add `logEvent` method to SparkListenerEvent so that EventLoggingListener can use it to ignore WrappedStreamingListenerEvents

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10779 from zsxwing/streaming-listener.
2016-01-20 11:57:53 -08:00
Michael Armbrust efd7eed322 [BUILD] Runner for spark packages
This is a convenience method added to the SBT build for developers, though if people think its useful we could consider adding a official script that runs using the assembly instead of compiling on demand.  It simply compiles spark (without requiring an assembly), and invokes Spark Submit to download / run the package.

Example Usage:
```
$ build/sbt
> sparkPackage com.databricks:spark-sql-perf_2.10:0.2.4 com.databricks.spark.sql.perf.RunBenchmark --help
```

Author: Michael Armbrust <michael@databricks.com>

Closes #10834 from marmbrus/sparkPackageRunner.
2016-01-19 14:28:00 -08:00
Reynold Xin 38c3c0e31a [SPARK-12855][SQL] Remove parser dialect developer API
This pull request removes the public developer parser API for external parsers. Given everything a parser depends on (e.g. logical plans and expressions) are internal and not stable, external parsers will break with every release of Spark. It is a bad idea to create the illusion that Spark actually supports pluggable parsers. In addition, this also reduces incentives for 3rd party projects to contribute parse improvements back to Spark.

Author: Reynold Xin <rxin@databricks.com>

Closes #10801 from rxin/SPARK-12855.
2016-01-18 13:55:42 -08:00
Reynold Xin ad1503f92e [SPARK-12667] Remove block manager's internal "external block store" API
This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems.

There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.

Author: Reynold Xin <rxin@databricks.com>

Closes #10752 from rxin/remove-offheap.
2016-01-15 12:03:28 -08:00
Kousuke Saruta 39ae04e6b7 [SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space before "," or ":")
Fix the style violation (space before , and :).
This PR is a followup for #10643.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #10685 from sarutak/SPARK-12692-followup-streaming.
2016-01-11 21:06:22 -08:00
Kousuke Saruta 008a558285 [SPARK-4628][BUILD] Add a resolver to MiMaBuild.scala for mqttv3(1.0.1).
#10659 removed the repository `https://repo.eclipse.org/content/repositories/paho-releases` but it's needed by MiMa because `spark-streaming-mqtt(1.6.0)` depends on `mqttv3(1.0.1)` and it is provided by the removed repository and maven-central provide only `mqttv3(1.0.2)` for now.
Otherwise, if `mqttv3(1.0.1)` is absent from the local repository, dev/mima should fail.

JoshRosen Do you have any other better idea?

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #10688 from sarutak/SPARK-4628-followup.
2016-01-10 23:33:57 -08:00
Josh Rosen 090d691323 [SPARK-4628][BUILD] Remove all non-Maven-Central repositories from build
This patch removes all non-Maven-central repositories from Spark's build, thereby avoiding any risk of future build-breaks due to us accidentally depending on an artifact which is not present in an immutable public Maven repository.

I tested this by running

```
build/mvn \
        -Phive \
        -Phive-thriftserver \
        -Pkinesis-asl \
        -Pspark-ganglia-lgpl \
        -Pyarn \
        dependency:go-offline
```

inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10659 from JoshRosen/SPARK-4628.
2016-01-08 20:58:53 -08:00
Sean Owen 659fd9d04b [SPARK-4819] Remove Guava's "Optional" from public API
Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)

See also https://github.com/apache/spark/pull/10512

Author: Sean Owen <sowen@cloudera.com>

Closes #10513 from srowen/SPARK-4819.
2016-01-08 13:02:30 -08:00
Shixiong Zhu 28e0e500a2 [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo
The default serializer in Kryo is FieldSerializer and it ignores transient fields and never calls `writeObject` or `readObject`. So we should register OpenHashMapBasedStateMap using `DefaultSerializer` to make it work with Kryo.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10609 from zsxwing/SPARK-12591.
2016-01-07 17:46:24 -08:00
Shixiong Zhu c0c397509b [SPARK-12510][STREAMING] Refactor ActorReceiver to support Java
This PR includes the following changes:

1. Rename `ActorReceiver` to `ActorReceiverSupervisor`
2. Remove `ActorHelper`
3. Add a new `ActorReceiver` for Scala and `JavaActorReceiver` for Java
4. Add `JavaActorWordCount` example

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10457 from zsxwing/java-actor-stream.
2016-01-07 15:26:55 -08:00
Herman van Hovell ea489f14f1 [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst
This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made:

The ANTLR Parser & Supporting classes have been moved to the Catalyst project. They are now part of the ```org.apache.spark.sql.catalyst.parser``` package. These classes contained quite a bit of code that was originally from the Hive project, I have added aknowledgements whenever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean-up the ```ASTNode``` class, and to improve the error handling.

The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
- ```CatalystQl```: This implements Query and Expression parsing functionality.
- ```SparkQl```: This is a subclass of CatalystQL and provides SQL/Core only functionality such as Explain and Describe.
- ```HiveQl```: This is a subclass of ```SparkQl``` and this adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.

cc rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #10583 from hvanhovell/SPARK-12575.
2016-01-06 11:16:53 -08:00
Kousuke Saruta 94c202c7d2 [SPARK-12665][CORE][GRAPHX] Remove Vector, VectorSuite and GraphKryoRegistrator which are deprecated and no longer used
Whole code of Vector.scala, VectorSuite.scala and GraphKryoRegistrator.scala  are no longer used so it's time to remove them in Spark 2.0.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #10613 from sarutak/SPARK-12665.
2016-01-06 10:19:41 -08:00
Davies Liu 70fe6ce52f [SPARK-12659] fix NPE in UnsafeExternalSorter (used by cartesian product)
Cartesian product use UnsafeExternalSorter without comparator to do spilling, it will NPE if spilling happens.

This bug also hitted by #10605

cc JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #10606 from davies/fix_spilling.
2016-01-05 18:46:52 -08:00
Reynold Xin 8ce645d4ee [SPARK-12615] Remove some deprecated APIs in RDD/SparkContext
I looked at each case individually and it looks like they can all be removed. The only one that I had to think twice was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray returning java.util.List).

Author: Reynold Xin <rxin@databricks.com>

Closes #10569 from rxin/SPARK-12615.
2016-01-05 11:10:14 -08:00
Liang-Chi Hsieh 9a6ba7e2c5 [SPARK-12643][BUILD] Set lib directory for antlr
JIRA: https://issues.apache.org/jira/browse/SPARK-12643

Without setting lib directory for antlr, the updates of imported grammar files can not be detected. So SparkSqlParser.g will not be rebuilt automatically.

Since it is a minor update, no JIRA ticket is opened. Let me know if it is needed. Thanks.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #10571 from viirya/antlr-build.
2016-01-05 10:21:47 -08:00
Reynold Xin 77ab49b857 [SPARK-12600][SQL] Remove deprecated methods in Spark SQL
Author: Reynold Xin <rxin@databricks.com>

Closes #10559 from rxin/remove-deprecated-sql.
2016-01-04 18:02:38 -08:00
Reynold Xin 7b92922f7f Update MimaExcludes now Spark 1.6 is in Maven.
Author: Reynold Xin <rxin@databricks.com>

Closes #10561 from rxin/update-mima.
2016-01-03 16:58:01 -08:00
Sean Owen 15bd73627e [SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs and reflection that supported 1.x
Remove use of deprecated Hadoop APIs now that 2.2+ is required

Author: Sean Owen <sowen@cloudera.com>

Closes #10446 from srowen/SPARK-12481.
2016-01-02 13:15:53 +00:00
Herman van Hovell 970635a9f8 [SPARK-12362][SQL][WIP] Inline Hive Parser
This PR inlines the Hive SQL parser in Spark SQL.

The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by a the fact that - for some reason - in some cases the ANTLR generated code is not included in the compilation fase.

This PR is a WIP and should not be merged until we have sorted out the build issues.

Author: Herman van Hovell <hvanhovell@questtec.nl>
Author: Nong Li <nong@databricks.com>
Author: Nong Li <nongli@gmail.com>

Closes #10525 from hvanhovell/SPARK-12362.
2016-01-01 23:22:50 -08:00
Shixiong Zhu 4f5a24d7e7 [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from setupEndpointRef
### Remove AkkaRpcEnv

Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming actorStream API.

### Remove systemName
There are 2 places using `systemName`:
* `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name to output the log `Successfully started service *** on port ***`. Since the service name in log is useful, I keep `RpcEnvConfig.name`.
* `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`. Akka requires `systemName` in its URI and will refuse a connection if `systemName` is not matched. However, `NettyRpcEnv` doesn't use it. So we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.

### Remove RpcEnv.uriOf

`uriOf` exists because Akka uses different URI formats for with and without authentication, e.g., `akka.ssl.tcp...` and `akka.tcp://...`. But `NettyRpcEnv` uses the same format. So it's not necessary after removing `AkkaRpcEnv`.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10459 from zsxwing/remove-akka-rpc-env.
2015-12-31 00:15:55 -08:00
Reynold Xin ee8f8d3184 [SPARK-12588] Remove HttpBroadcast in Spark 2.0.
We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0.

Author: Reynold Xin <rxin@databricks.com>

Closes #10531 from rxin/SPARK-12588.
2015-12-30 18:07:07 -08:00
Reynold Xin 27af6157f9 Revert "[SPARK-12362][SQL][WIP] Inline Hive Parser"
This reverts commit b600bccf41 due to non-deterministic build breaks.
2015-12-30 00:08:44 -08:00
Nong Li b600bccf41 [SPARK-12362][SQL][WIP] Inline Hive Parser
This is a WIP. The PR has been taken over from nongli (see https://github.com/apache/spark/pull/10420). I have removed some additional dead code, and fixed a few issues which were caused by the fact that the inlined Hive parser is newer than the Hive parser we currently use in Spark.

I am submitting this PR in order to get some feedback and testing done. There is quite a bit of work to do:
- [ ] Get it to pass jenkins build/test.
- [ ] Aknowledge Hive-project for using their parser.
- [ ] Refactorings between HiveQl and the java classes.
  - [ ] Create our own ASTNode and integrate the current implicit extentions.
  - [ ] Move remaining ```SemanticAnalyzer``` and ```ParseUtils``` functionality to ```HiveQl```.
- [ ] Removing Hive dependencies from the parser. This will require some edits in the grammar files.
  - [ ] Introduce our own context which needs to contain a ```TokenRewriteStream```.
  - [ ] Add ```useSQL11ReservedKeywordsForIdentifier``` and ```allowQuotedId``` to the catalyst or sql configuration.
  - [ ] Remove ```HiveConf``` from grammar files &HiveQl, and pass in our own configuration.
- [ ] Moving the parser into sql/core.

cc nongli rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>
Author: Nong Li <nong@databricks.com>
Author: Nong Li <nongli@gmail.com>

Closes #10509 from hvanhovell/SPARK-12362.
2015-12-29 18:47:41 -08:00
Reynold Xin a820ca19de [SPARK-2331] SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T]
Author: Reynold Xin <rxin@databricks.com>

Closes #10394 from rxin/SPARK-2331.
2015-12-21 14:07:48 -08:00
Reynold Xin 284e29a870 [SPARK-11808] Remove Bagel.
Author: Reynold Xin <rxin@databricks.com>

Closes #10395 from rxin/SPARK-11808.
2015-12-19 22:40:35 -08:00
Reynold Xin f496031bd2 Bump master version to 2.0.0-SNAPSHOT.
Author: Reynold Xin <rxin@databricks.com>

Closes #10387 from rxin/version-bump.
2015-12-19 15:13:05 -08:00
Sean Owen 21b3d2a75f [SPARK-11530][MLLIB] Return eigenvalues with PCA model
Add `computePrincipalComponentsAndVariance` to also compute PCA's explained variance.

CC mengxr

Author: Sean Owen <sowen@cloudera.com>

Closes #9736 from srowen/SPARK-11530.
2015-12-10 14:05:45 +00:00
Xin Ren 6cb06e8711 [SPARK-11155][WEB UI] Stage summary json should include stage duration
The json endpoint for stages doesn't include information on the stage duration that is present in the UI. This looks like a simple oversight, they should be included. eg., the metrics should be included at api/v1/applications/<appId>/stages.

Metrics I've added are: submissionTime, firstTaskLaunchedTime and completionTime

Author: Xin Ren <iamshrek@126.com>

Closes #10107 from keypointt/SPARK-11155.
2015-12-08 11:46:46 -06:00
Josh Rosen b7204e1d41 [SPARK-12112][BUILD] Upgrade to SBT 0.13.9
We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin).

I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
2015-12-05 08:15:30 +08:00
Marcelo Vanzin d64806b373 [SPARK-11314][BUILD][HOTFIX] Add exclusion for moved YARN classes.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #10147 from vanzin/SPARK-11314.
2015-12-04 13:05:07 -08:00
Jeroen Schot 128c29035b [SPARK-3580][CORE] Add Consistent Method To Get Number of RDD Partitions Across Different Languages
I have tried to address all the comments in pull request https://github.com/apache/spark/pull/2447.

Note that the second commit (using the new method in all internal code of all components) is quite intrusive and could be omitted.

Author: Jeroen Schot <jeroen.schot@surfsara.nl>

Closes #9767 from schot/master.
2015-12-02 09:40:07 +00:00
Cheng Lian 69dbe6b40d [SPARK-12046][DOC] Fixes various ScalaDoc/JavaDoc issues
This PR backports PR #10039 to master

Author: Cheng Lian <lian@databricks.com>

Closes #10063 from liancheng/spark-12046.doc-fix.master.
2015-12-01 10:21:31 -08:00
Josh Rosen d3ca8cfac2 [SPARK-12000] Fix API doc generation issues
This pull request fixes multiple issues with API doc generation.

- Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures.
- Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation.
- Incorporate doc changes from thunterdb (in #10048).

Closes #10048.

Author: Josh Rosen <joshrosen@databricks.com>
Author: Timothy Hunter <timhunter@databricks.com>

Closes #10049 from JoshRosen/fix-doc-build.
2015-11-30 16:37:27 -08:00
Prashant Sharma 953e8e6dcb [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only.
Author: Prashant Sharma <scrapcodes@gmail.com>

Closes #10012 from ScrapCodes/minor-build-comment.
2015-11-30 09:30:58 +00:00
Shixiong Zhu 0c1e72e7f7 [SPARK-11996][CORE] Make the executor thread dump work again
In the previous implementation, the driver needs to know the executor listening address to send the thread dump request. However, in Netty RPC, the executor doesn't listen to any port, so the executor thread dump feature is broken.

This patch makes the driver use the endpointRef stored in BlockManagerMasterEndpoint to send the thread dump request to fix it.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #9976 from zsxwing/executor-thread-dump.
2015-11-26 18:56:22 -08:00
Reynold Xin 4d6bbbc03d [SPARK-11947][SQL] Mark deprecated methods with "This will be removed in Spark 2.0."
Also fixed some documentation as I saw them.

Author: Reynold Xin <rxin@databricks.com>

Closes #9930 from rxin/SPARK-11947.
2015-11-24 18:58:55 -08:00
Josh Rosen 1b6e938be8 [SPARK-4424] Remove spark.driver.allowMultipleContexts override in tests
This patch removes `spark.driver.allowMultipleContexts=true` from our test configuration. The multiple SparkContexts check was originally disabled because certain tests suites in SQL needed to create multiple contexts. As far as I know, this configuration change is no longer necessary, so we should remove it in order to make it easier to find test cleanup bugs.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9865 from JoshRosen/SPARK-4424.
2015-11-23 13:19:10 -08:00
Bryan Cutler 31921e0f0b [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...>
Currently streaming foreachRDD Java API uses a function prototype requiring a return value of null.  This PR deprecates the old method and uses VoidFunction to allow for more concise declaration.  Also added VoidFunction2 to Java API in order to use in Streaming methods.  Unit test is added for using foreachRDD with VoidFunction, and changes have been tested with Java 7 and Java 8 using lambdas.

Author: Bryan Cutler <bjcutler@us.ibm.com>

Closes #9488 from BryanCutler/foreachRDD-VoidFunction-SPARK-4557.
2015-11-18 12:09:54 -08:00
jerryshao 75a2922910 [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API
Fixed the merge conflicts in #7410

Closes #7410

Author: Shixiong Zhu <shixiong@databricks.com>
Author: jerryshao <saisai.shao@intel.com>
Author: jerryshao <sshao@hortonworks.com>

Closes #9742 from zsxwing/pr7410.
2015-11-17 16:57:52 -08:00
Timothy Hunter fa603e08de [SPARK-11732] Removes some MiMa false positives
This adds an extra filter for private or protected classes. We only filter for package private right now.

Author: Timothy Hunter <timhunter@databricks.com>

Closes #9697 from thunterdb/spark-11732.
2015-11-17 20:51:20 +00:00
Xiangrui Meng 21fac54341 [SPARK-11766][MLLIB] add toJson/fromJson to Vector/Vectors
This is to support JSON serialization of Param[Vector] in the pipeline API. It could be used for other purposes too. The schema is the same as `VectorUDT`. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #9751 from mengxr/SPARK-11766.
2015-11-17 10:17:16 -08:00
jerryshao 08660a0bc9 [BUILD][MINOR] Remove non-exist yarnStable module in Sbt project
Remove some old yarn related building codes, please review, thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #9625 from jerryshao/remove-old-module.
2015-11-12 17:23:24 +01:00
Josh Rosen 529a1d3380 [SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes
This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8.

In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml.

http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes.

I also added a new regression tests in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code which was compiled targeting Java 8.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9512 from JoshRosen/SPARK-6152.
2015-11-11 11:16:39 -08:00
Josh Rosen 1dde39d796 [SPARK-9818] Re-enable Docker tests for JDBC data source
This patch re-enables tests for the Docker JDBC data source. These tests were reverted in #4872 due to transitive dependency conflicts introduced by the `docker-client` library. This patch should avoid those problems by using a version of `docker-client` which shades its transitive dependencies and by performing some build-magic to work around problems with that shaded JAR.

In addition, I significantly refactored the tests to simplify the setup and teardown code and to fix several Docker networking issues which caused problems when running in `boot2docker`.

Closes #8101.

Author: Josh Rosen <joshrosen@databricks.com>
Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #9503 from JoshRosen/docker-jdbc-tests.
2015-11-10 15:58:30 -08:00
Josh Rosen 689386b1c6 [SPARK-7841][BUILD] Stop using retrieveManaged to retrieve dependencies in SBT
This patch modifies Spark's SBT build so that it no longer uses `retrieveManaged` / `lib_managed` to store its dependencies. The motivations for this change are nicely described on the JIRA ticket ([SPARK-7841](https://issues.apache.org/jira/browse/SPARK-7841)); my personal interest in doing this stems from the fact that `lib_managed` has caused me some pain while debugging dependency issues in another PR of mine.

Removing our use of `lib_managed` would be trivial except for one snag: the Datanucleus JARs, required by Spark SQL's Hive integration, cannot be included in assembly JARs due to problems with merging OSGI `plugin.xml` files. As a result, several places in the packaging and deployment pipeline assume that these Datanucleus JARs are copied to `lib_managed/jars`. In the interest of maintaining compatibility, I have chosen to retain the `lib_managed/jars` directory _only_ for these Datanucleus JARs and have added custom code to `SparkBuild.scala` to automatically copy those JARs to that folder as part of the `assembly` task.

`dev/mima` also depended on `lib_managed` in a hacky way in order to set classpaths when generating MiMa excludes; I've updated this to obtain the classpaths directly from SBT instead.

/cc dragos marmbrus pwendell srowen

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9575 from JoshRosen/SPARK-7841.
2015-11-10 10:14:19 -08:00
Charles Yeh 08a7a836c3 [SPARK-10565][CORE] add missing web UI stats to /api/v1/applications JSON
I looked at the other endpoints, and they don't seem to be missing any fields.
Added fields:
![image](https://cloud.githubusercontent.com/assets/613879/10948801/58159982-82e4-11e5-86dc-62da201af910.png)

Author: Charles Yeh <charlesyeh@dropbox.com>

Closes #9472 from CharlesYeh/api_vars.
2015-11-09 11:59:32 -06:00
Reynold Xin bc5d6c0389 [SPARK-11541][SQL] Break JdbcDialects.scala into multiple files and mark various dialects as private.
Author: Reynold Xin <rxin@databricks.com>

Closes #9511 from rxin/SPARK-11541.
2015-11-05 22:03:26 -08:00