PR #3737 changed `spark-ec2` to automatically download boto from PyPI. This PR tells git to ignore those downloaded library files.
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes #3770 from nchammas/ignore-ec2-lib and squashes the following commits:
5c440d3 [Nicholas Chammas] gitignore downloaded EC2 libs
This commit involves three main changes:
(1) It separates the translation of contributor names from the
generation of the contributors list. This is largely motivated
by the Github API limit; even if we exceed this limit, we should
at least be able to proceed manually as before. This is why the
translation logic is abstracted into its own script
translate-contributors.py.
(2) When we look for candidate replacements for invalid author
names, we should look for the assignees of the associated JIRAs
too. As a result, the intermediate file must keep track of these.
(3) This provides an interactive mode with which the user can
sit at the terminal and manually pick the candidate replacement
that he/she thinks makes the most sense. As before, there is a
non-interactive mode that picks the first candidate that the
script considers "valid."
TODO: We should have a known_contributors file that stores
known mappings so we don't have to go through all of this
translation every time. This is also valuable because some
contributors simply cannot be automatically translated.
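The selection logic described in (2), (3) and the TODO can be sketched roughly as below. This is only an illustration, not the code in translate-contributors.py: the function names, the `known` mapping, the validity rule (a name with at least two words) and the sample names are all assumptions made for the example.

```python
import re


def is_valid_name(name):
    """Hypothetical validity rule: a full name has at least two words."""
    return bool(re.fullmatch(r"[A-Za-z.'-]+( [A-Za-z.'-]+)+", name.strip()))


def pick_replacement(author, candidates, known=None, interactive=False, ask=input):
    """Return a display name for `author`.

    `candidates` would hold names gathered from GitHub and from the
    assignees of the associated JIRAs; `known` stands in for the proposed
    known_contributors file. Both are assumptions of this sketch.
    """
    known = known or {}
    if author in known:
        return known[author]  # a known mapping skips translation entirely
    if is_valid_name(author) or not candidates:
        return author
    if interactive:
        # Interactive mode: the user picks the candidate by index.
        for i, candidate in enumerate(candidates):
            print(f"  [{i}] {candidate}")
        reply = ask(f"Replacement for {author!r} (index, blank to keep): ").strip()
        if reply.isdigit() and int(reply) < len(candidates):
            return candidates[int(reply)]
        return author
    # Non-interactive mode: take the first candidate considered valid.
    for candidate in candidates:
        if is_valid_name(candidate):
            return candidate
    return author
```

For instance, `pick_replacement("jdoe", ["jdoe", "Jane Doe"])` returns `"Jane Doe"` in non-interactive mode, while an author with no valid candidate is left unchanged for manual handling.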
In .gitignore, there is an entry for spark-*-bin.tar.gz, but given what make-distribution.sh produces, the name pattern should be spark-*-bin-*.tgz.
This change is really small, so I didn't open a JIRA issue. If one is needed, please let me know.
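A quick way to see the difference between the two patterns is to test them against a sample file name. The sketch below uses Python's `fnmatch`, which behaves like gitignore globbing for names without slashes. The tarball name is only an assumed example of what make-distribution.sh might produce.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "spark-*-bin.tar.gz"  # entry previously in .gitignore
NEW_PATTERN = "spark-*-bin-*.tgz"   # entry proposed by this change


def is_ignored(filename, pattern):
    """Approximate a gitignore glob for a slash-free file name."""
    return fnmatchcase(filename, pattern)


# Assumed example of a distribution tarball name.
tarball = "spark-1.2.0-bin-hadoop2.4.tgz"
print(is_ignored(tarball, OLD_PATTERN))
print(is_ignored(tarball, NEW_PATTERN))
```

With this sample name, only the new pattern matches.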
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3529 from sarutak/fix-wrong-tgz-pattern and squashes the following commits:
de3c70a [Kousuke Saruta] Fixed wrong file name pattern in .gitignore
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2444 from sarutak/slaves-scripts-modification and squashes the following commits:
eff7394 [Kousuke Saruta] Improve the description about Cluster Launch Script in docs/spark-standalone.md
7858225 [Kousuke Saruta] Modified sbin/slaves to use the environment variable "SPARK_SSH_FOREGROUND" as a flag
53d7121 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
e570431 [Kousuke Saruta] Added a description for SPARK_SSH_FOREGROUND variable
7120a0c [Kousuke Saruta] Added a description about default host for sbin/slaves
1bba8a9 [Kousuke Saruta] Added SPARK_SSH_FOREGROUND flag to sbin/slaves
88e2f17 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
297e75d [Kousuke Saruta] Modified sbin/slaves not to export HOSTLIST
Ignore .idea_modules; ```sbt/sbt gen-idea``` generates this directory.
Author: wangfei <wangfei1@huawei.com>
Closes #2476 from scwf/patch-4 and squashes the following commits:
e6ab88a [wangfei] ignore .idea_modules
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2426 from sarutak/emacs-metafiles-ignore and squashes the following commits:
a306020 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into emacs-metafiles-ignore
6a0a5eb [Kousuke Saruta] Added cmd file entry to .rat-excludes and .gitignore
897da63 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into emacs-metafiles-ignore
8cade06 [Kousuke Saruta] Modified .gitignore to ignore emacs lock file and backup file
Some config files in ```conf``` should be ignored, such as
conf/fairscheduler.xml
conf/hive-log4j.properties
conf/metrics.properties
...
So ignore all ```sh```/```properties```/```conf```/```xml``` files
Author: wangfei <wangfei1@huawei.com>
Closes #2395 from scwf/patch-2 and squashes the following commits:
3dc53f2 [wangfei] duplicate ```conf/*.conf```
3c2986f [wangfei] ignore all config files
JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus every pull request, even one that doesn't touch SQL code, also executes the test suites defined in `hive-thriftserver`, but the tests fail because the related .class files are not included in the assembly jar.
In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
Can be run as: "mvn scalastyle:check"
Author: Rahul Singhal <rahul.singhal@guavus.com>
Closes #1550 from rahulsinghaliitd/SPARK-2651 and squashes the following commits:
53748dd [Rahul Singhal] SPARK-2651: Add maven scalastyle plugin
(This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
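One common way to avoid a port collision between parallel builds is to let the operating system pick an unused port. The sketch below illustrates that general technique only; it is not necessarily how the `HiveThriftServer2` test suite chooses its port, and `find_free_port` is a name made up for this example.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an IPv4 socket.
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```

The port is released when the socket closes, so another process could still grab it before the server starts. The approach reduces collisions but does not rule them out.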
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #1600 from liancheng/jdbc and squashes the following commits:
ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
1083e9d [Cheng Lian] Fixed failed test suites
7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
9cc0f06 [Cheng Lian] Starts beeline with spark-submit
cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
061880f [Cheng Lian] Addressed all comments by @pwendell
7755062 [Cheng Lian] Adapts test suites to spark-submit settings
40bafef [Cheng Lian] Fixed more license header issues
e214aab [Cheng Lian] Added missing license headers
b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
This reverts commit 06dc0d2c6b.
#1399 is making Jenkins fail. We should investigate and put this back after it's passing tests.
Author: Michael Armbrust <michael@databricks.com>
Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
JIRA issue:
- Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
- Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
(Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
TODO
- [x] Use `spark-submit` to launch the server, the CLI and beeline
- [x] Migration guideline draft for Shark users
----
Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
```bash
$ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
```
This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
**UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes for this bug since it involves more subtle considerations and is worth a separate PR.
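The "stolen options" behaviour can be reproduced with a toy argument splitter. This is a simplified sketch, not the actual `SparkSubmitArguments` code: `split_args`, `LAUNCHER_OPTIONS` and the `stop_at_resource` flag are hypothetical, and stopping at the primary resource is just one possible remedy, not necessarily what SPARK-2678 settles on.

```python
# Hypothetical launcher options: name -> whether the option consumes a value.
LAUNCHER_OPTIONS = {"--class": True, "--master": True, "--help": False}


def split_args(args, stop_at_resource):
    """Split args into (launcher options, primary resource, application args)."""
    launcher, resource, app = {}, None, []
    i = 0
    while i < len(args):
        arg = args[i]
        if stop_at_resource and resource is not None:
            # Everything after the primary resource belongs to the application.
            app = list(args[i:])
            break
        if arg in LAUNCHER_OPTIONS:
            if LAUNCHER_OPTIONS[arg]:
                launcher[arg] = args[i + 1]
                i += 2
            else:
                launcher[arg] = True
                i += 1
        elif resource is None:
            resource = arg
            i += 1
        else:
            app.append(arg)
            i += 1
    return launcher, resource, app


argv = ["--class", "org.apache.hive.beeline.BeeLine", "spark-internal", "--help"]
print(split_args(argv, stop_at_resource=False))  # greedy: --help goes to the launcher
print(split_args(argv, stop_at_resource=True))   # --help is passed to the application
```

In the greedy variant the trailing `--help` is treated as a launcher option, which mirrors the behaviour described above. In the second variant it reaches the application untouched.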
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #1399 from liancheng/thriftserver and squashes the following commits:
090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
1083e9d [Cheng Lian] Fixed failed test suites
7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
9cc0f06 [Cheng Lian] Starts beeline with spark-submit
cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
061880f [Cheng Lian] Addressed all comments by @pwendell
7755062 [Cheng Lian] Adapts test suites to spark-submit settings
40bafef [Cheng Lian] Fixed more license header issues
e214aab [Cheng Lian] Added missing license headers
b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
Fixes SPARK-2070 and SPARK-2071
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #1021 from ScrapCodes/SPARK-2070/package-private-methods and squashes the following commits:
7979a57 [Prashant Sharma] addressed code review comments
558546d [Prashant Sharma] A little fancy error message.
59275ab [Prashant Sharma] SPARK-2071 Mima ignores classes and its members from previous versions too.
0c4ff2b [Prashant Sharma] SPARK-2070 Ignore methods along with annotated classes.
This patch does a few things:
1. We have a file MimaExcludes.scala exclusively for excludes.
2. The test runner tells users about that file if a test fails.
3. I've added back the excludes used from 0.9->1.0. We should keep
these in the project as an official audit trail of times where
we decided to make exceptions.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #937 from pwendell/mima and squashes the following commits:
7ee0db2 [Patrick Wendell] Better explanation for how to use MIMA excludes.
Commit for initial feedback. Basically, I am curious whether we should prompt the user to provide args, especially when they are mandatory, and whether we can skip the prompt when they are not.
There are also a few other things that did not work, like
`bin/spark-submit examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar --class org.apache.spark.examples.SparkALS --arg 100 500 10 5 2`
Not all the args get passed properly. Maybe I have messed something up; I will try to sort it out.
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #552 from ScrapCodes/SPARK-1565/update-examples and squashes the following commits:
669dd23 [Prashant Sharma] Review comments
2727e70 [Prashant Sharma] SPARK-1565, update examples to be used with spark-submit script.
This is ready when Jenkins is.
Author: Michael Armbrust <michael@databricks.com>
Closes #596 from marmbrus/moreTests and squashes the following commits:
85be703 [Michael Armbrust] Blacklist MR required tests.
35bc311 [Michael Armbrust] Add hive golden answers.
ede98fd [Michael Armbrust] More hive gitignore
da096ea [Michael Armbrust] update whitelist
The JIRA in question is actually reporting a bug with Shark, but I wanted to make sure Spark SQL did not have similar problems. This fixes a bug in our parsing code that was preventing the test from executing, but it looks like the RegexSerDe is working in Spark SQL.
Author: Michael Armbrust <michael@databricks.com>
Closes #595 from marmbrus/fixRegexSerdeTest and squashes the following commits:
a4dc612 [Michael Armbrust] Add files created by hive to gitignore.
efa6402 [Michael Armbrust] Fix Hive serde_regex test.
This simplifies the shell a bunch and passes all arguments through to spark-submit.
There is a tiny incompatibility from 0.9.1 which is that you can't put `-c` _or_ `--cores`, only `--cores`. However, spark-submit will give a good error message in this case, I don't think many people used this, and it's a trivial change for users.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #542 from pwendell/spark-shell and squashes the following commits:
9eb3e6f [Patrick Wendell] Updating Spark docs
b552459 [Patrick Wendell] Andrew's feedback
97720fa [Patrick Wendell] Review feedback
aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Prashant Sharma <scrapcodes@gmail.com>
Closes #262 from ScrapCodes/SPARK-1336/ReduceVerbosity and squashes the following commits:
87dfa54 [Prashant Sharma] Further reduction in noise and made pyspark tests to fail fast.
811170f [Prashant Sharma] Reducing the ouput of run-tests script.
This adds some changes on top of the initial work by @scrapcodes in #20:
The goal here is to do automated checking of Spark commits to determine whether they break binary compatibility.
1. Special case for inner classes of package-private objects.
2. Made tools classes accessible when running `spark-class`.
3. Made some declared types in MLLib more general.
4. Various other improvements to exclude-generation script.
5. In-code documentation.
Author: Patrick Wendell <pwendell@gmail.com>
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Prashant Sharma <scrapcodes@gmail.com>
Closes #207 from pwendell/mima and squashes the following commits:
22ae267 [Patrick Wendell] New binary changes after upmerge
6c2030d [Patrick Wendell] Merge remote-tracking branch 'apache/master' into mima
3666cf1 [Patrick Wendell] Minor style change
0e0f570 [Patrick Wendell] Small fix and removing directory listings
647c547 [Patrick Wendell] Reveiw feedback.
c39f3b5 [Patrick Wendell] Some enhancements to binary checking.
4c771e0 [Prashant Sharma] Added a tool to generate mima excludes and also adapted build to pick automatically.
b551519 [Prashant Sharma] adding a new exclude after rebasing with master
651844c [Prashant Sharma] Support MiMa for reporting binary compatibility accross versions.
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #125 from ScrapCodes/rat-integration and squashes the following commits:
64f7c7d [Prashant Sharma] added license headers.
fcf28b1 [Prashant Sharma] Review feedback.
c0648db [Prashant Sharma] SPARK-1144 Added license and RAT to check licenses.
- Rework/expand the nav bar with more of the docs site
- Removing parts of docs about EC2 and Mesos that differentiate between
running 0.5 and before
- Merged subheadings from running-on-amazon-ec2.html that are still relevant
(i.e., "Using a newer version of Spark" and "Accessing Data in S3") into
ec2-scripts.html and deleted running-on-amazon-ec2.html
- Added some TODO comments to a few docs
- Updated the blurb about AMP Camp
- Renamed programming-guide to spark-programming-guide
- Fixing typos/etc. in Standalone Spark doc
which can be compiled via jekyll, using the command `jekyll`. To compile
and run a local webserver to serve the doc as a website, run
`jekyll --server`.