20 commits

Author SHA1 Message Date
Michael Armbrust 3a845d3c04 [SQL] Update Hive test harness for Hive 12 and 13
As part of the upgrade I also copy the newest version of the query tests, and whitelist a bunch of new ones that are now passing.

Author: Michael Armbrust <michael@databricks.com>

Closes #2936 from marmbrus/fix13tests and squashes the following commits:

d9cbdab [Michael Armbrust] Remove user specific tests
65801cd [Michael Armbrust] style and rat
8f6b09a [Michael Armbrust] Update test harness to work with both Hive 12 and 13.
f044843 [Michael Armbrust] Update Hive query tests and golden files to 0.13
2014-10-24 18:36:35 -07:00
Kousuke Saruta 0dc868e787 [SPARK-3584] sbin/slaves doesn't work when we use password authentication for SSH
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2444 from sarutak/slaves-scripts-modification and squashes the following commits:

eff7394 [Kousuke Saruta] Improve the description about Cluster Launch Script in docs/spark-standalone.md
7858225 [Kousuke Saruta] Modified sbin/slaves to use the environment variable "SPARK_SSH_FOREGROUND" as a flag
53d7121 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
e570431 [Kousuke Saruta] Added a description for SPARK_SSH_FOREGROUND variable
7120a0c [Kousuke Saruta] Added a description about default host for sbin/slaves
1bba8a9 [Kousuke Saruta] Added SPARK_SSH_FOREGROUND flag to sbin/slaves
88e2f17 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
297e75d [Kousuke Saruta] Modified sbin/slaves not to export HOSTLIST
2014-09-25 16:49:15 -07:00
Kousuke Saruta 6cab838b98 [SPARK-3566] [BUILD] .gitignore and .rat-excludes should consider Windows cmd file and Emacs' backup files
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2426 from sarutak/emacs-metafiles-ignore and squashes the following commits:

a306020 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into emacs-metafiles-ignore
6a0a5eb [Kousuke Saruta] Added cmd file entry to .rat-excludes and .gitignore
897da63 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into emacs-metafiles-ignore
8cade06 [Kousuke Saruta] Modified .gitignore to ignore emacs lock file and backup file
2014-09-18 12:04:32 -07:00
GuoQiang Li 6772afec2f [Minor] rat exclude dependency-reduced-pom.xml
Author: GuoQiang Li <witgo@qq.com>

Closes #2326 from witgo/rat-excludes and squashes the following commits:

860904e [GuoQiang Li] rat exclude dependency-reduced-pom.xml
2014-09-17 22:54:34 -07:00
Davies Liu f1e71d4c3b [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()
Use external sort to support sorting large datasets in the reduce stage.

Author: Davies Liu <davies.liu@gmail.com>

Closes #1978 from davies/sort and squashes the following commits:

bbcd9ba [Davies Liu] check spilled bytes in tests
b125d2f [Davies Liu] add test for external sort in rdd
eae0176 [Davies Liu] choose different disks from different processes and instances
1f075ed [Davies Liu] Merge branch 'master' into sort
eb53ca6 [Davies Liu] Merge branch 'master' into sort
644abaf [Davies Liu] add license in LICENSE
19f7873 [Davies Liu] improve tests
55602ee [Davies Liu] use external sort in sortBy() and sortByKey()
2014-08-26 16:57:40 -07:00
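
The external sort referred to in SPARK-3073 can be illustrated with a minimal sketch. This is not the PySpark implementation: the function name `external_sort`, the `chunk_size` parameter, and the use of `pickle` with temporary files are all assumptions made for illustration. It shows only the general technique of sorting bounded chunks in memory, spilling each sorted run to disk, and merging the runs lazily.

```python
import heapq
import pickle
import tempfile
from itertools import islice


def external_sort(items, chunk_size=1000, key=None):
    """Sort an iterable that may not fit in memory (hypothetical helper).

    Sorts fixed-size chunks in memory, spills each sorted chunk to a
    temporary file, then lazily merges the spilled runs.
    """
    iterator = iter(items)
    runs = []
    while True:
        # Only chunk_size items are held in memory at a time.
        chunk = sorted(islice(iterator, chunk_size), key=key)
        if not chunk:
            break
        spill = tempfile.TemporaryFile()
        for item in chunk:
            pickle.dump(item, spill)
        spill.seek(0)
        runs.append(spill)

    def read_run(f):
        # Stream items back from one spilled run, one at a time.
        try:
            while True:
                yield pickle.load(f)
        except EOFError:
            f.close()

    # heapq.merge performs a k-way merge over the already-sorted runs.
    return heapq.merge(*(read_run(f) for f in runs), key=key)
```

A real implementation would likely spill based on memory usage rather than a fixed item count; the fixed `chunk_size` here just keeps the sketch short.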
Kan Zhang 9422a9b084 [SPARK-2736] PySpark converter and example script for reading Avro files
JIRA: https://issues.apache.org/jira/browse/SPARK-2736

This patch includes:
1. An Avro converter that converts Avro data types to Python. It handles all 3 Avro data mappings (Generic, Specific and Reflect).
2. An example Python script for reading Avro files using AvroKeyInputFormat and the converter.
3. Fixing a classloading issue.

cc @MLnick @JoshRosen @mateiz

Author: Kan Zhang <kzhang@apache.org>

Closes #1916 from kanzhang/SPARK-2736 and squashes the following commits:

02443f8 [Kan Zhang] [SPARK-2736] Adding .avsc files to .rat-excludes
f74e9a9 [Kan Zhang] [SPARK-2736] nit: clazz -> className
82cc505 [Kan Zhang] [SPARK-2736] Update data sample
0be7761 [Kan Zhang] [SPARK-2736] Example pyspark script and data files
c8e5881 [Kan Zhang] [SPARK-2736] Trying to work with all 3 Avro data models
2271a5b [Kan Zhang] [SPARK-2736] Using the right class loader to find Avro classes
536876b [Kan Zhang] [SPARK-2736] Adding Avro to Java converter
2014-08-14 19:03:51 -07:00
GuoQiang Li a38d3c9efc [SPARK-2800]: Exclude scalastyle-output.xml from Apache RAT checks
Author: GuoQiang Li <witgo@qq.com>

Closes #1729 from witgo/SPARK-2800 and squashes the following commits:

13ca966 [GuoQiang Li] Add scalastyle-output.xml to .rat-excludes file
2014-08-01 19:35:16 -07:00
Kay Ousterhout 339441f545 [SPARK-2384] Add tooltips to UI.
This patch adds tooltips to clarify some points of confusion in the UI. When users mouse over some of the table headers (shuffle read, write, and input size) as well as over the "scheduler delay" metric shown for each stage, a black tooltip (see image below) pops up describing the metric in more detail. After the tooltip mechanism is added by this commit, I imagine others may want to add more tooltips for other things in the UI, but I think this is a good starting point.

![tooltip](https://cloud.githubusercontent.com/assets/1108612/3491905/994e179e-059f-11e4-92f2-c6c12d248d81.jpg)

This looks scary-big but much of it is adding the Bootstrap tooltip JavaScript.

Also, I have no idea what to put for the license in the tooltip file (I left it the same, the Twitter Apache header) or for jQuery (left it as nothing). @mateiz what's the right thing here?

cc @pwendell @andrewor14 @rxin

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1314 from kayousterhout/tooltips and squashes the following commits:

19981b5 [Kay Ousterhout] Exclude non-licensed javascript files from style check
d9ab5a9 [Kay Ousterhout] Response to Andrew's review
7752449 [Kay Ousterhout] [SPARK-2384] Add tooltips to UI.
2014-07-08 22:57:21 -07:00
Patrick Wendell 3e0b078001 HOTFIX: Add excludes for new MIMA files
2014-06-21 15:20:15 -07:00
Yin Huai d2f4f30b12 [SPARK-2060][SQL] Querying JSON Datasets with SQL and DSL in Spark SQL
JIRA: https://issues.apache.org/jira/browse/SPARK-2060

Programming guide: http://yhuai.github.io/site/sql-programming-guide.html

Scala doc of SQLContext: http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.sql.SQLContext

Author: Yin Huai <huai@cse.ohio-state.edu>

Closes #999 from yhuai/newJson and squashes the following commits:

227e89e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
ce8eedd [Yin Huai] rxin's comments.
bc9ac51 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
94ffdaa [Yin Huai] Remove "get" from method names.
ce31c81 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
e2773a6 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
79ea9ba [Yin Huai] Fix typos.
5428451 [Yin Huai] Newline
1f908ce [Yin Huai] Remove extra line.
d7a005c [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
7ea750e [Yin Huai] marmbrus's comments.
6a5f5ef [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
83013fb [Yin Huai] Update Java Example.
e7a6c19 [Yin Huai] SchemaRDD.javaToPython should convert a field with the StructType to a Map.
6d20b85 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
4fbddf0 [Yin Huai] Programming guide.
9df8c5a [Yin Huai] Python API.
7027634 [Yin Huai] Java API.
cff84cc [Yin Huai] Use a SchemaRDD for a JSON dataset.
d0bd412 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
ab810b0 [Yin Huai] Make JsonRDD private.
6df0891 [Yin Huai] Apache header.
8347f2e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
66f9e76 [Yin Huai] Update docs and use the entire dataset to infer the schema.
8ffed79 [Yin Huai] Update the example.
a5a4b52 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
4325475 [Yin Huai] If a sampled dataset is used for schema inferring, update the schema of the JsonTable after first execution.
65b87f0 [Yin Huai] Fix sampling...
8846af5 [Yin Huai] API doc.
52a2275 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
0387523 [Yin Huai] Address PR comments.
666b957 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
a2313a6 [Yin Huai] Address PR comments.
f3ce176 [Yin Huai] After type conflict resolution, if a NullType is found, StringType is used.
0576406 [Yin Huai] Add Apache license header.
af91b23 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
f45583b [Yin Huai] Infer the schema of a JSON dataset (a text file with one JSON object per line or an RDD[String] with one JSON object per string) and return a SchemaRDD.
f31065f [Yin Huai] A query plan or a SchemaRDD can print out its schema.
2014-06-17 19:14:59 -07:00
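
The schema inference described in the SPARK-2060 commits (one JSON object per line, conflict resolution across records, and falling back to a string type when only nulls were seen) can be sketched roughly as follows. This is an illustrative approximation, not Spark SQL's code: the helper names and the type labels are hypothetical, and the widening rules in `merge_types` are assumptions. Only the null-to-string fallback comes from the commit messages above.

```python
import json


def type_name(value):
    """Map a decoded JSON value to a simple type label (labels are hypothetical)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "struct"


def merge_types(a, b):
    """Resolve a conflict between two observed types (assumed rules)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "double"}:
        return "double"
    return "string"  # assumed catch-all for incompatible types


def infer_schema(lines):
    """Infer a flat field -> type mapping from JSON text, one object per line."""
    schema = {}
    for line in lines:
        for field, value in json.loads(line).items():
            schema[field] = merge_types(schema.get(field, "null"), type_name(value))
    # Per the commit message: a field that is still null after resolution becomes a string.
    return {field: ("string" if t == "null" else t) for field, t in schema.items()}
```

For example, `infer_schema(['{"a": 1, "b": null}', '{"a": 2.5, "c": "x"}'])` yields `{"a": "double", "b": "string", "c": "string"}` under these assumed rules.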
Patrick Wendell d17d221487 Better explanation for how to use MIMA excludes.
This patch does a few things:
1. We have a file MimaExcludes.scala exclusively for excludes.
2. The test runner tells users about that file if a test fails.
3. I've added back the excludes used from 0.9->1.0. We should keep
   these in the project as an official audit trail of times where
   we decided to make exceptions.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #937 from pwendell/mima and squashes the following commits:

7ee0db2 [Patrick Wendell] Better explanation for how to use MIMA excludes.
2014-06-01 17:27:05 -07:00
Andrew Ash 3abe2b734a SPARK-1846 Ignore logs directory in RAT checks
https://issues.apache.org/jira/browse/SPARK-1846

Author: Andrew Ash <andrew@andrewash.com>

Closes #793 from ash211/SPARK-1846 and squashes the following commits:

3f50db5 [Andrew Ash] SPARK-1846 Ignore logs directory in RAT checks
2014-05-15 11:05:39 -07:00
witgo fde82c1549 Fix: sbt test throws a java.lang.OutOfMemoryError: PermGen space
Author: witgo <witgo@qq.com>

Closes #773 from witgo/sbt_javaOptions and squashes the following commits:

26c7d38 [witgo] Improve sbt configuration
2014-05-14 11:19:26 -07:00
Patrick Wendell fb98488fc8 Clean up and simplify Spark configuration
Over time, as we've added more deployment modes, things have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:

1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
3. Adds ability to set these same variables for the driver using `spark-submit`.
4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #299 from pwendell/config-cleanup and squashes the following commits:

127f301 [Patrick Wendell] Improvements to testing
a006464 [Patrick Wendell] Moving properties file template.
b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
0086939 [Patrick Wendell] Minor style fixes
af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
af0adf7 [Patrick Wendell] Automatically add user jar
a56b125 [Patrick Wendell] Responses to Tom's review
d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
a762901 [Patrick Wendell] Fixing test failures
ffa00fe [Patrick Wendell] Review feedback
fda0301 [Patrick Wendell] Note
308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
c2a2909 [Patrick Wendell] Test compile fixes
4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
b08893b [Patrick Wendell] Additional improvements.
ace4ead [Patrick Wendell] Responses to review feedback.
b72d183 [Patrick Wendell] Review feedback for spark env file
46555c1 [Patrick Wendell] Review feedback and import clean-ups
437aed1 [Patrick Wendell] Small fix
761ebcd [Patrick Wendell] Library path and classpath for drivers
7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
5b0ba8e [Patrick Wendell] Don't ship executor envs
84cc5e5 [Patrick Wendell] Small clean-up
1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
6eaf7d0 [Patrick Wendell] executorJavaOpts
0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
2014-04-21 10:26:33 -07:00
Patrick Wendell 268b53567c HOTFIX: Use file name and not paths for excludes
2014-04-14 15:51:54 -07:00
Patrick Wendell 6a0f8e35ce HOTFIX: Ignore python metastore files in RAT checks.
This was causing some errors with pull request tests.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #393 from pwendell/hotfix and squashes the following commits:

6201dd3 [Patrick Wendell] HOTFIX: Ignore python metastore files in RAT checks.
2014-04-11 13:23:21 -07:00
Holden Karau fa0524fd02 Spark-939: allow user jars to take precedence over spark jars
I still need to do a small bit of refactoring (mostly, I'll switch the one Java file back to a Scala file and use it in both class loaders), but comments on other things I should do would be great.

Author: Holden Karau <holden@pigscanfly.ca>

Closes #217 from holdenk/spark-939-allow-user-jars-to-take-precedence-over-spark-jars and squashes the following commits:

cf0cac9 [Holden Karau] Fix the executorclassloader
1955232 [Holden Karau] Fix long line in TestUtils
8f89965 [Holden Karau] Fix tests for new class name
7546549 [Holden Karau] CR feedback, merge some of the testutils methods down, rename the classloader
644719f [Holden Karau] Use the class generator for the repl class loader tests too
f0b7114 [Holden Karau] Fix the core/src/test/scala/org/apache/spark/executor/ExecutorURLClassLoaderSuite.scala tests
204b199 [Holden Karau] Fix the generated classes
9f68f10 [Holden Karau] Start rewriting the ExecutorURLClassLoaderSuite to not use the hard coded classes
858aba2 [Holden Karau] Remove a bunch of test junk
261aaee [Holden Karau] simplify executorurlclassloader a bit
7a7bf5f [Holden Karau] CR feedback
d4ae848 [Holden Karau] rewrite component into scala
aa95083 [Holden Karau] CR feedback
7752594 [Holden Karau] re-add https comment
a0ef85a [Holden Karau] Fix style issues
125ea7f [Holden Karau] Easier to just remove those files, we don't need them
bb8d179 [Holden Karau] Fix issues with the repl class loader
241b03d [Holden Karau] fix my rat excludes
a343350 [Holden Karau] Update rat-excludes and remove a useless file
d90d217 [Holden Karau] Fix fall back with custom class loader and add a test for it
4919bf9 [Holden Karau] Fix parent calling class loader issue
8a67302 [Holden Karau] Tests are good
9e2d236 [Holden Karau] It works comrade
691ee00 [Holden Karau] It works ish
dc4fe44 [Holden Karau] Does not depend on being in my home directory
47046ff [Holden Karau] Remove bad import
22d83cb [Holden Karau] Add a test suite for the executor url class loader suite
7ef4628 [Holden Karau] Clean up
792d961 [Holden Karau] Almost works
16aecd1 [Holden Karau] Doesn't quite work
8d2241e [Holden Karau] Add a FakeClass for testing ClassLoader precedence options
648b559 [Holden Karau] Both class loaders compile. Now for testing
e1d9f71 [Holden Karau] One loader works.
2014-04-08 22:30:03 -07:00
Prashant Sharma 7c18428fac HOTFIX for broken CI, by SPARK-1336
Learnt that `set -o pipefail` is very useful.

Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Prashant Sharma <scrapcodes@gmail.com>

Closes #321 from ScrapCodes/hf-SPARK-1336 and squashes the following commits:

9d22bc2 [Prashant Sharma] added comment why echo -e q exists.
f865951 [Prashant Sharma] Made "error" match on a word boundary so that "errors" does not match. This makes sure the build fails if the provided SparkBuild has compile errors.
7fffdf2 [Prashant Sharma] Removed a stray line.
97379d8 [Prashant Sharma] HOTFIX for broken CI, by SPARK-1336
2014-04-04 22:49:19 -07:00
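
The `set -o pipefail` behavior mentioned in the SPARK-1336 hotfix can be demonstrated with a small sketch. It assumes `bash` is available on the PATH; the helper name `pipeline_status` is hypothetical and is not part of the Spark scripts. Without `pipefail`, a pipeline's exit status is that of its last command, so a failure earlier in the pipe is silently lost.

```python
import subprocess


def pipeline_status(script, pipefail):
    """Run a shell pipeline under bash and return its exit status (hypothetical helper)."""
    prefix = "set -o pipefail; " if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + script]).returncode


# `false` fails, but `true` is the last command in the pipe.
print(pipeline_status("false | true", pipefail=False))  # status of `true`
print(pipeline_status("false | true", pipefail=True))   # status of the failing `false`
```

This is why a CI script that pipes build output through another command can report success even when the build itself failed, unless `pipefail` is set.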
Patrick Wendell 56db8a2f05 HOT FIX: Exclude test files from RAT
2014-03-24 13:38:28 -07:00
Prashant Sharma 21109fbab0 SPARK-1144 Added license and RAT to check licenses.
Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #125 from ScrapCodes/rat-integration and squashes the following commits:

64f7c7d [Prashant Sharma] added license headers.
fcf28b1 [Prashant Sharma] Review feedback.
c0648db [Prashant Sharma] SPARK-1144 Added license and RAT to check licenses.
2014-03-24 08:44:20 -07:00