4013 commits

Author SHA1 Message Date
Matei Zaharia 53cd50c069 Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
jerryshao f3dbe6b215 Fix removed block zero size log reporting 2013-08-30 09:39:01 +08:00
Patrick Wendell abdbacf252 Merge pull request #871 from pwendell/expose-local
Expose `isLocal` in SparkContext.
2013-08-28 21:11:31 -07:00
Matei Zaharia afcade3ca8 Merge pull request #873 from pwendell/master
Hot fix for command runner
2013-08-28 20:15:40 -07:00
Patrick Wendell 1798e69e71 Adding extra args 2013-08-28 19:56:46 -07:00
Patrick Wendell 30d2421112 Make local variable public 2013-08-28 19:53:31 -07:00
Patrick Wendell 2fc9a028f2 Hot fix for command runner 2013-08-28 19:03:06 -07:00
Andre Schumacher a511c5379e RDD sample() and takeSample() prototypes for PySpark 2013-08-28 16:46:13 -07:00
Josh Rosen 742c44eae6 Don't send SIGINT to Py4J gateway subprocess.
This addresses SPARK-885, a usability issue where PySpark's
Java gateway process would be killed if the user hit ctrl-c.

Note that SIGINT still won't cancel the running s

This fix is based on http://stackoverflow.com/questions/5045771
2013-08-28 16:39:44 -07:00
Andre Schumacher 457bcd3343 PySpark: implementing subtractByKey(), subtract() and keyBy() 2013-08-28 16:14:22 -07:00
Matei Zaharia baa84e7e4c Merge pull request #865 from tgravescs/fixtmpdir
Spark on Yarn should use yarn approved directories for spark.local.dir and tmp
2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves aac1214ee4 Change Executor to only look at the env variable SPARK_YARN_MODE 2013-08-28 13:26:26 -05:00
Matei Zaharia cd043cf922 Merge pull request #867 from tgravescs/yarnenvconfigs
Spark on Yarn allow users to specify environment variables
2013-08-27 19:50:32 -07:00
Y.CORP.YAHOO.COM\tgraves 3f206bf0b5 Updated based on review comments. 2013-08-27 14:34:27 -05:00
Y.CORP.YAHOO.COM\tgraves cf52a3cba6 Allow for Executors to have different directories than the Spark Master for Yarn 2013-08-27 11:00:21 -05:00
Matei Zaharia 898da7e422 Merge pull request #859 from ianbuss/sbt_opts
Pass SBT_OPTS environment through to sbt_launcher
2013-08-26 20:40:49 -07:00
Y.CORP.YAHOO.COM\tgraves 63dc635de6 fix typos 2013-08-26 17:06:20 -05:00
Y.CORP.YAHOO.COM\tgraves c9464c74a1 Add ability for user to specify environment variables 2013-08-26 16:44:27 -05:00
Y.CORP.YAHOO.COM\tgraves 6dd64e8bb2 Update docs and remove old reference to --user option 2013-08-26 14:29:24 -05:00
shivaram 17bafeab39 Merge pull request #864 from rxin/json1
Revert json library change
2013-08-26 11:59:32 -07:00
Y.CORP.YAHOO.COM\tgraves dfb4c697bc Throw exception if the yarn local dirs isn't set 2013-08-26 13:57:01 -05:00
Reynold Xin a77e0abb96 Added worker state to the cluster master JSON ui. 2013-08-26 11:21:03 -07:00
Reynold Xin 9db1e50344 Revert "Merge pull request #841 from rxin/json"
This reverts commit 1fb1b09928, reversing
changes made to c69c48947d.
2013-08-26 11:05:14 -07:00
Y.CORP.YAHOO.COM\tgraves c0b4095ee8 Change to use Yarn appropriate directories rather than /tmp or the user specified spark.local.dir 2013-08-26 12:48:37 -05:00
Shivaram Venkataraman dc06b52879 Add an option to turn off data validation, test it.
Also moves addIntercept to have default true to make it similar
to validateData option
2013-08-25 23:14:35 -07:00
Shivaram Venkataraman b8c50a0642 Center & scale variables in Ridge, Lasso.
Also add a unit test that checks if ridge regression lowers
cross-validation error.
2013-08-25 22:24:27 -07:00
Patrick Wendell f9fc5c160a Merge pull request #603 from pwendell/ec2-updates
Several Improvements to EC2 Scripts
2013-08-24 15:19:56 -07:00
Patrick Wendell 2cfe52ef55 Version bump for ec2 docs 2013-08-24 15:16:53 -07:00
Patrick Wendell 4879685910 Merge remote-tracking branch 'mesos/master' into ec2-updates 2013-08-24 14:50:58 -07:00
Matei Zaharia d282c1ebbb Merge pull request #860 from jey/sbt-ide-fixes
Fix IDE project generation under SBT
2013-08-23 11:20:20 -07:00
Jey Kottalam a9db1b7b6e Upgrade SBT IDE project generators 2013-08-23 10:27:18 -07:00
Jey Kottalam b7f9e6374a Fix SBT generation of IDE project files 2013-08-23 10:26:37 -07:00
Ian Buss d7f18e3d27 Pass SBT_OPTS environment through to sbt_launcher 2013-08-23 09:50:23 +01:00
Matei Zaharia 5a6ac12840 Merge pull request #701 from ScrapCodes/documentation-suggestions
Documentation suggestions for spark streaming.
2013-08-22 22:08:03 -07:00
Prashant Sharma 2bc348e92c Linking custom receiver guide 2013-08-23 09:44:02 +05:30
Prashant Sharma 3049415e24 Corrections in documentation comment 2013-08-23 09:40:28 +05:30
Prashant Sharma 39a1d58da4 Improved documentation for spark custom receiver 2013-08-23 09:38:50 +05:30
Matei Zaharia 215c13dd41 Fix code style and a nondeterministic RDD issue in ALS 2013-08-22 16:13:46 -07:00
Matei Zaharia 46ea0c1b47 Merge pull request #814 from holdenk/master
Create fewer instances of the random class during ALS initialization.
2013-08-22 15:57:28 -07:00
Matei Zaharia 9ac3d62cac Merge pull request #856 from jey/sbt-fix-hadoop-0.23.9
Re-add removed dependency to fix build under Hadoop 0.23.9
2013-08-22 15:51:10 -07:00
Jey Kottalam 281b6c5f28 Re-add removed dependency on 'commons-daemon'
Fixes SBT build under Hadoop 0.23.9 and 2.0.4
2013-08-22 15:45:45 -07:00
Matei Zaharia ae8ba83ef2 Merge pull request #855 from jey/update-build-docs
Update build docs
2013-08-22 10:14:54 -07:00
Matei Zaharia 8a36fd09dd Merge pull request #854 from markhamstra/pomUpdate
Synced sbt and maven builds to use the same dependencies, etc.
2013-08-22 10:13:35 -07:00
Matei Zaharia c2d00f12e2 Merge pull request #832 from alig/coalesce
Coalesced RDD with locality
2013-08-22 10:13:03 -07:00
Jey Kottalam 9a90667d09 Increase ReservedCodeCacheSize to 256m 2013-08-21 21:15:28 -07:00
Jey Kottalam 0087b43e9c Use Hadoop 1.2.1 in application example 2013-08-21 21:15:00 -07:00
Jey Kottalam 54e9379de2 Revert "Allow build configuration to be set in conf/spark-env.sh"
This reverts commit 66e7a38a32.
2013-08-21 21:13:34 -07:00
Matei Zaharia e6d66c8abd Merge pull request #853 from AndreSchumacher/double_rdd
Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark
2013-08-21 17:44:31 -07:00
Jey Kottalam f9cc1fbf27 Remove references to unsupported Hadoop versions 2013-08-21 17:14:36 -07:00
Andre Schumacher 76077bf9f4 Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark 2013-08-21 17:05:58 -07:00