Commit graph

134 commits

Author SHA1 Message Date
Patrick Wendell c3f5e07533 SPARK-1121: Include avro for yarn-alpha builds
This lets us explicitly include Avro based on a profile for 0.23.X
builds. It makes me sad how convoluted it is to express this logic
in Maven. @tgraves and @sryza curious if this works for you.

I'm also considering just reverting to how it was before. The only
real problem was that Spark advertised a dependency on Avro
even though it only really depends transitively on Avro through
other deps.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #49 from pwendell/avro-build-fix and squashes the following commits:

8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile
2014-03-02 15:18:19 -08:00
Patrick Wendell 1fd2bfd3dd Remove remaining references to incubation
This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra this updates the version as you mentioned earlier.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #51 from pwendell/tlp and squashes the following commits:

d553b1b [Patrick Wendell] Remove remaining references to incubation
2014-03-02 01:00:16 -08:00
Sean Owen 12bbca2065 SPARK 1084.1 (resubmitted)
(Ported from https://github.com/apache/incubator-spark/pull/637 )

Author: Sean Owen <sowen@cloudera.com>

Closes #31 from srowen/SPARK-1084.1 and squashes the following commits:

6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it
f35b833 [Sean Owen] Fix two misc javadoc problems
254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit
5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates
007762b [Sean Owen] Remove dead scaladoc links
b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>
2014-02-27 11:12:21 -08:00
CodingCat 345df5f4a9 [SPARK-1089] fix the regression problem on ADD_JARS in 0.9
https://spark-project.atlassian.net/browse/SPARK-1089

copied from JIRA, reported by @ash211

"Using the ADD_JARS environment variable with spark-shell used to add the jar to both the shell and the various workers. Now it only adds to the workers and importing a custom class in the shell is broken.
The workaround is to add custom jars to both ADD_JARS and SPARK_CLASSPATH.
We should fix ADD_JARS so it works properly again.
See various threads on the user list:
https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201402.mbox/%3CCAJbo4neMLiTrnm1XbyqomWmp0m+EUcg4yE-txuRGSVKOb5KLeA@mail.gmail.com%3E
(another one that doesn't appear in the archives yet titled "ADD_JARS not working on 0.9")"

The reason of this bug is two-folds

in the current implementation of SparkILoop.scala, the settings.classpath is not set properly when the process() method is invoked

the weird behaviour of Scala 2.10, (I personally thought it is a bug)

if we simply set value of a PathSettings object (like settings.classpath), the isDefault is not set to true (this is a flag showing if the variable is modified), so it makes the PathResolver loads the default CLASSPATH environment variable value to calculated the path (see https://github.com/scala/scala/blob/2.10.x/src/compiler/scala/tools/util/PathResolver.scala#L215)

what we have to do is to manually make this flag set, (e3991d97dd/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala (L884))

Author: CodingCat <zhunansjtu@gmail.com>

Closes #13 from CodingCat/SPARK-1089 and squashes the following commits:

8af81e7 [CodingCat] impose non-null settings
9aa2125 [CodingCat] code cleaning
ce36676 [CodingCat] code cleaning
e045582 [CodingCat] fix the regression problem on ADD_JARS in 0.9
2014-02-26 23:42:15 -08:00
Sean Owen c0ef3afa82 SPARK-1071: Tidy logging strategy and use of log4j
Prompted by a recent thread on the mailing list, I tried and failed to see if Spark can be made independent of log4j. There are a few cases where control of the underlying logging is pretty useful, and to do that, you have to bind to a specific logger.

Instead I propose some tidying that leaves Spark's use of log4j, but gets rid of warnings and should still enable downstream users to switch. The idea is to pipe everything (except log4j) through SLF4J, and have Spark use SLF4J directly when logging, and where Spark needs to output info (REPL and tests), bind from SLF4J to log4j.

This leaves the same behavior in Spark. It means that downstream users who want to use something except log4j should:

- Exclude dependencies on log4j, slf4j-log4j12 from Spark
- Include dependency on log4j-over-slf4j
- Include dependency on another logger X, and another slf4j-X
- Recreate any log config that Spark does, that is needed, in the other logger's config

That sounds about right.

Here are the key changes:

- Include the jcl-over-slf4j shim everywhere by depending on it in core.
- Exclude dependencies on commons-logging from third-party libraries.
- Include the jul-to-slf4j shim everywhere by depending on it in core.
- Exclude slf4j-* dependencies from third-party libraries to prevent collision or warnings
- Added missing slf4j-log4j12 binding to GraphX, Bagel module tests

And minor/incidental changes:

- Update to SLF4J 1.7.5, which happily matches Hadoop 2’s version and is a recommended update over 1.7.2
- (Remove a duplicate HBase dependency declaration in SparkBuild.scala)
- (Remove a duplicate mockito dependency declaration that was causing warnings and bugging me)

Author: Sean Owen <sowen@cloudera.com>

Closes #570 from srowen/SPARK-1071 and squashes the following commits:

52eac9f [Sean Owen] Add slf4j-over-log4j12 dependency to core (non-test) and remove it from things that depend on core.
77a7fa9 [Sean Owen] SPARK-1071: Tidy logging strategy and use of log4j
2014-02-23 11:40:55 -08:00
CodingCat e0d49ad229 [SPARK-1090] improvement on spark_shell (help information, configure memory)
https://spark-project.atlassian.net/browse/SPARK-1090

spark-shell should print help information about parameters and should allow user to configure exe memory
there is no document about hot to set --cores/-c in spark-shell

and also

users should be able to set executor memory through command line options

In this PR I also check the format of the options passed by the user

Author: CodingCat <zhunansjtu@gmail.com>

Closes #599 from CodingCat/spark_shell_improve and squashes the following commits:

de5aa38 [CodingCat] add parameter to set driver memory
915cbf8 [CodingCat] improvement on spark_shell (help information, configure memory)
2014-02-17 15:12:52 -08:00
Patrick Wendell b69f8b2a01 Merge pull request #557 from ScrapCodes/style. Closes #557.
SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build.

Author: Patrick Wendell <pwendell@gmail.com>
Author: Prashant Sharma <scrapcodes@gmail.com>

== Merge branch commits ==

commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4
Author: Prashant Sharma <scrapcodes@gmail.com>
Date:   Sun Feb 9 17:39:07 2014 +0530

    scala style fixes

commit f91709887a8e0b608c5c2b282db19b8a44d53a43
Author: Patrick Wendell <pwendell@gmail.com>
Date:   Fri Jan 24 11:22:53 2014 -0800

    Adding scalastyle snapshot
2014-02-09 10:09:19 -08:00
Mark Hamstra c2341c92bb Merge pull request #542 from markhamstra/versionBump. Closes #542.
Version number to 1.0.0-SNAPSHOT

Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.

@pwendell

Author: Mark Hamstra <markhamstra@gmail.com>

== Merge branch commits ==

commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <markhamstra@gmail.com>
Date:   Wed Feb 5 09:30:32 2014 -0800

    Version number to 1.0.0-SNAPSHOT
2014-02-08 16:00:43 -08:00
Patrick Wendell 23034798d7 Add missing header files 2014-01-14 01:17:13 -08:00
Patrick Wendell 0bb33076e2 Removing mentions in tests 2014-01-12 16:53:58 -08:00
Matei Zaharia 11891e68c3 Merge pull request #327 from lucarosellini/master
Added ‘-i’ command line option to Spark REPL

We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark.
Any new Spark specific command line option could now be added to org.apache.spark.repl.SparkRunnerSettings class.

Since the behavior of loading a script from the command line should be the same as loading it using the “:load” command inside the shell, the script should be loaded when the SparkContext is available, that’s why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.
2014-01-08 00:32:18 -05:00
Luca Rosellini 4689ce29fd Added license header and removed @author tag 2014-01-07 09:44:24 +01:00
Patrick Wendell 604fad9c39 Merge remote-tracking branch 'apache-github/master' into remove-binaries
Conflicts:
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	docs/python-programming-guide.md
2014-01-03 21:29:33 -08:00
Luca Rosellini 0b6db8c186 Added ‘-i’ command line option to spark REPL.
We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark.
Any new Spark specific command line option could now be added to org.apache.spark.repl.SparkRunnerSettings class.

Since the behavior of loading a script from the command line should be the same as loading it using the “:load” command inside the shell, the script should be loaded when the SparkContext is available, that’s why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.
2014-01-03 12:57:06 +01:00
Prashant Sharma 94f2fffa23 fixed review comments 2014-01-03 14:43:37 +05:30
Prashant Sharma 980afd280a Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts
Conflicts:
	bin/spark-shell
	core/pom.xml
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
	core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	python/run-tests
	sbin/compute-classpath.sh
	sbin/spark-class
	sbin/stop-slaves.sh
2014-01-02 17:55:21 +05:30
Matei Zaharia e2c68642c6 Miscellaneous fixes from code review.
Also replaced SparkConf.getOrElse with just a "get" that takes a default
value, and added getInt, getLong, etc to make code that uses this
simpler later on.
2014-01-01 22:03:39 -05:00
Matei Zaharia 642029e7f4 Various fixes to configuration code
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing

This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Prashant Sharma 2573add94c spark-544, introducing SparkConf and related configuration overhaul. 2013-12-25 00:09:36 +05:30
Mark Hamstra 09ed7ddfa0 Use scala.binary.version in POMs 2013-12-15 12:39:58 -08:00
Prashant Sharma a854cc536d Review comments on the PR for scala 2.10 migration. 2013-12-13 15:19:51 +05:30
Prashant Sharma 17db6a9041 Style fixes and addressed review comments at #221 2013-12-10 11:47:16 +05:30
Prashant Sharma 7ad6921ae0 Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution atleast a bit faster. 2013-12-07 12:45:57 +05:30
Prashant Sharma d092a8cc6a Fixed compile time warnings and formatting post merge. 2013-11-26 15:21:50 +05:30
Aaron Davidson f629ba95b6 Various merge corrections
I've diff'd this patch against my own -- since they were both created
independently, this means that two sets of eyes have gone over all the
merge conflicts that were created, so I'm feeling significantly more
confident in the resulting PR.

@rxin has looked at the changes to the repl and is resoundingly
confident that they are correct.
2013-11-14 22:13:09 -08:00
Raymond Liu a60620b76a Merge branch 'master' into scala-2.10 2013-11-14 12:44:19 +08:00
Raymond Liu 0f2e3c6e31 Merge branch 'master' into scala-2.10 2013-11-13 16:55:11 +08:00
Reynold Xin 319299941d Propagate the SparkContext local property from the thread that calls the spark-repl to the actual execution thread. 2013-11-09 00:32:14 -08:00
Ali Ghodsi 05a0df2b9e Makes Spark SIMR ready. 2013-10-24 11:59:51 -07:00
Aaron Davidson 74737264c4 Spark shell exits if it cannot create SparkContext
Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one
of our regexes). Previously, we would default to Mesos, fail, and then start the shell
anyway, except that any Spark command would fail.
2013-10-17 18:51:19 -07:00
Andrew xia 52ccf4f859 deprecate "spark" script and SPAKR_CLASSPATH environment variable 2013-10-12 14:34:14 +08:00
Prashant Sharma 7be75682b9 Merge branch 'master' into wip-merge-master
Conflicts:
	bagel/pom.xml
	core/pom.xml
	core/src/test/scala/org/apache/spark/ui/UISuite.scala
	examples/pom.xml
	mllib/pom.xml
	pom.xml
	project/SparkBuild.scala
	repl/pom.xml
	streaming/pom.xml
	tools/pom.xml

In scala 2.10, a shorter representation is used for naming artifacts
 so changed to shorter scala version for artifacts and made it a property in pom.
2013-10-08 11:29:40 +05:30
Patrick Wendell aa9fb84994 Merging build changes in from 0.8 2013-10-05 22:07:00 -07:00
Prashant Sharma 5829692885 Merge branch 'master' into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
	docs/_config.yml
	project/SparkBuild.scala
	repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Patrick Wendell e2ff59af72 Bug fix in master build 2013-09-26 13:06:51 -07:00
Prashant Sharma 7ff4c2d399 fixed maven build for scala 2.10 2013-09-26 10:48:24 +05:30
Patrick Wendell 6079721fa1 Update build version in master 2013-09-24 11:41:51 -07:00
Prashant Sharma 69fd42aee3 ported repl improvements from master 2013-09-15 15:51:02 +05:30
Prashant Sharma 383e151fd7 Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Prashant Sharma 20c65bc334 Fixed repl suite 2013-09-15 10:43:06 +05:30
Prashant Sharma 6fcfefcb27 Few more fixes to tests broken during merge 2013-09-10 10:57:47 +05:30
Jey Kottalam 30a32c8335 Minor YARN build cleanups 2013-09-06 11:31:16 -07:00
Prashant Sharma 4106ae9fbf Merged with master 2013-09-06 17:53:01 +05:30
Matei Zaharia f586c8ef38 Updated LICENSE with third-party licenses 2013-09-02 16:43:06 -07:00
Matei Zaharia 0a8cc30921 Move some classes to more appropriate packages:
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia 5701eb92c7 Fix some URLs 2013-09-01 14:13:16 -07:00
Matei Zaharia 46eecd110a Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
Mark Hamstra ff6f1b0500 Synced sbt and maven builds 2013-08-21 13:50:24 -07:00
Jey Kottalam 23f4622aff Remove redundant dependencies from POMs 2013-08-18 18:53:57 -07:00
Jey Kottalam c1e547bb7f Updates to repl and example POMs to match SBT build 2013-08-16 13:50:12 -07:00