Commit graph

3959 commits

Author SHA1 Message Date
Patrick Wendell 1798e69e71 Adding extra args 2013-08-28 19:56:46 -07:00
Patrick Wendell 30d2421112 Make local variable public 2013-08-28 19:53:31 -07:00
Patrick Wendell 2fc9a028f2 Hot fix for command runner 2013-08-28 19:03:06 -07:00
Andre Schumacher a511c5379e RDD sample() and takeSample() prototypes for PySpark 2013-08-28 16:46:13 -07:00
Josh Rosen 742c44eae6 Don't send SIGINT to Py4J gateway subprocess.
This addresses SPARK-885, a usability issue where PySpark's
Java gateway process would be killed if the user hit ctrl-c.

Note that SIGINT still won't cancel the running s

This fix is based on http://stackoverflow.com/questions/5045771
2013-08-28 16:39:44 -07:00
Andre Schumacher 457bcd3343 PySpark: implementing subtractByKey(), subtract() and keyBy() 2013-08-28 16:14:22 -07:00
Matei Zaharia baa84e7e4c Merge pull request #865 from tgravescs/fixtmpdir
Spark on Yarn should use yarn approved directories for spark.local.dir and tmp
2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves aac1214ee4 Change Executor to only look at the env variable SPARK_YARN_MODE 2013-08-28 13:26:26 -05:00
Matei Zaharia cd043cf922 Merge pull request #867 from tgravescs/yarnenvconfigs
Spark on Yarn allow users to specify environment variables
2013-08-27 19:50:32 -07:00
Y.CORP.YAHOO.COM\tgraves 3f206bf0b5 Updated based on review comments. 2013-08-27 14:34:27 -05:00
Y.CORP.YAHOO.COM\tgraves cf52a3cba6 Allow for Executors to have different directories than the Spark Master for Yarn 2013-08-27 11:00:21 -05:00
Matei Zaharia 898da7e422 Merge pull request #859 from ianbuss/sbt_opts
Pass SBT_OPTS environment through to sbt_launcher
2013-08-26 20:40:49 -07:00
Y.CORP.YAHOO.COM\tgraves 63dc635de6 fix typos 2013-08-26 17:06:20 -05:00
Y.CORP.YAHOO.COM\tgraves c9464c74a1 Add ability for user to specify environment variables 2013-08-26 16:44:27 -05:00
Y.CORP.YAHOO.COM\tgraves 6dd64e8bb2 Update docs and remove old reference to --user option 2013-08-26 14:29:24 -05:00
shivaram 17bafeab39 Merge pull request #864 from rxin/json1
Revert json library change
2013-08-26 11:59:32 -07:00
Y.CORP.YAHOO.COM\tgraves dfb4c697bc Throw exception if the yarn local dirs isn't set 2013-08-26 13:57:01 -05:00
Reynold Xin a77e0abb96 Added worker state to the cluster master JSON ui. 2013-08-26 11:21:03 -07:00
Reynold Xin 9db1e50344 Revert "Merge pull request #841 from rxin/json"
This reverts commit 1fb1b09928, reversing
changes made to c69c48947d.
2013-08-26 11:05:14 -07:00
Y.CORP.YAHOO.COM\tgraves c0b4095ee8 Change to use Yarn appropriate directories rather than /tmp or the user specified spark.local.dir 2013-08-26 12:48:37 -05:00
Shivaram Venkataraman dc06b52879 Add an option to turn off data validation, test it.
Also moves addIntercept to have default true to make it similar
to validateData option
2013-08-25 23:14:35 -07:00
Shivaram Venkataraman b8c50a0642 Center & scale variables in Ridge, Lasso.
Also add a unit test that checks if ridge regression lowers
cross-validation error.
2013-08-25 22:24:27 -07:00
Patrick Wendell f9fc5c160a Merge pull request #603 from pwendell/ec2-updates
Several Improvements to EC2 Scripts
2013-08-24 15:19:56 -07:00
Patrick Wendell 2cfe52ef55 Version bump for ec2 docs 2013-08-24 15:16:53 -07:00
Patrick Wendell 4879685910 Merge remote-tracking branch 'mesos/master' into ec2-updates 2013-08-24 14:50:58 -07:00
Matei Zaharia d282c1ebbb Merge pull request #860 from jey/sbt-ide-fixes
Fix IDE project generation under SBT
2013-08-23 11:20:20 -07:00
Jey Kottalam a9db1b7b6e Upgrade SBT IDE project generators 2013-08-23 10:27:18 -07:00
Jey Kottalam b7f9e6374a Fix SBT generation of IDE project files 2013-08-23 10:26:37 -07:00
Ian Buss d7f18e3d27 Pass SBT_OPTS environment through to sbt_launcher 2013-08-23 09:50:23 +01:00
Matei Zaharia 5a6ac12840 Merge pull request #701 from ScrapCodes/documentation-suggestions
Documentation suggestions for spark streaming.
2013-08-22 22:08:03 -07:00
Prashant Sharma 2bc348e92c Linking custom receiver guide 2013-08-23 09:44:02 +05:30
Prashant Sharma 3049415e24 Corrections in documentation comment 2013-08-23 09:40:28 +05:30
Prashant Sharma 39a1d58da4 Improved documentation for spark custom receiver 2013-08-23 09:38:50 +05:30
Matei Zaharia 215c13dd41 Fix code style and a nondeterministic RDD issue in ALS 2013-08-22 16:13:46 -07:00
Matei Zaharia 46ea0c1b47 Merge pull request #814 from holdenk/master
Create less instances of the random class during ALS initialization.
2013-08-22 15:57:28 -07:00
Matei Zaharia 9ac3d62cac Merge pull request #856 from jey/sbt-fix-hadoop-0.23.9
Re-add removed dependency to fix build under Hadoop 0.23.9
2013-08-22 15:51:10 -07:00
Jey Kottalam 281b6c5f28 Re-add removed dependency on 'commons-daemon'
Fixes SBT build under Hadoop 0.23.9 and 2.0.4
2013-08-22 15:45:45 -07:00
Matei Zaharia ae8ba83ef2 Merge pull request #855 from jey/update-build-docs
Update build docs
2013-08-22 10:14:54 -07:00
Matei Zaharia 8a36fd09dd Merge pull request #854 from markhamstra/pomUpdate
Synced sbt and maven builds to use the same dependencies, etc.
2013-08-22 10:13:35 -07:00
Matei Zaharia c2d00f12e2 Merge pull request #832 from alig/coalesce
Coalesced RDD with locality
2013-08-22 10:13:03 -07:00
Jey Kottalam 9a90667d09 Increase ReservedCodeCacheSize to 256m 2013-08-21 21:15:28 -07:00
Jey Kottalam 0087b43e9c Use Hadoop 1.2.1 in application example 2013-08-21 21:15:00 -07:00
Jey Kottalam 54e9379de2 Revert "Allow build configuration to be set in conf/spark-env.sh"
This reverts commit 66e7a38a32.
2013-08-21 21:13:34 -07:00
Matei Zaharia e6d66c8abd Merge pull request #853 from AndreSchumacher/double_rdd
Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark
2013-08-21 17:44:31 -07:00
Jey Kottalam f9cc1fbf27 Remove references to unsupported Hadoop versions 2013-08-21 17:14:36 -07:00
Andre Schumacher 76077bf9f4 Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark 2013-08-21 17:05:58 -07:00
Patrick Wendell c02585ea13 Make initial connection failure message less daunting.
Right now it seems like something has gone wrong when this message is printed out.
Instead, this is a normal condition. So I changed the message a bit.
2013-08-21 15:45:45 -07:00
Patrick Wendell 6be6b71c8c Merge branch 'master' into ec2-updates
Conflicts:
	ec2/spark_ec2.py
2013-08-21 15:34:31 -07:00
Jey Kottalam 4d737b6d32 Example should make sense 2013-08-21 15:03:37 -07:00
Jey Kottalam 6585f49841 Update build docs 2013-08-21 14:51:56 -07:00