ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	6edef9c833	Merge pull request #861 from AndreSchumacher/pyspark_sampling_function Pyspark sampling function	2013-08-31 13:39:24 -07:00
Matei Zaharia	fd89835965	Merge pull request #870 from JoshRosen/spark-885 Don't send SIGINT / ctrl-c to Py4J gateway subprocess	2013-08-31 13:18:12 -07:00
Matei Zaharia	618f0ecb43	Merge pull request #869 from AndreSchumacher/subtract PySpark: implementing subtractByKey(), subtract() and keyBy()	2013-08-30 18:17:13 -07:00
Reynold Xin	94bb7fd46e	Merge pull request #876 from mbautin/master_hadoop_rdd_conf Make HadoopRDD's configuration accessible	2013-08-30 12:05:13 -07:00
Mikhail Bautin	35090958b3	Also add getConf to NewHadoopRDD	2013-08-30 11:03:57 -07:00
Mikhail Bautin	5e30172f70	Make HadoopRDD's configuration accessible	2013-08-30 11:01:06 -07:00
Reynold Xin	9e17e456d2	Merge pull request #875 from shivaram/build-fix Fix broken build by removing addIntercept	2013-08-30 00:22:53 -07:00
Shivaram Venkataraman	adc700582b	Fix broken build by removing addIntercept	2013-08-30 00:16:32 -07:00
Evan Sparks	016787de32	Merge pull request #863 from shivaram/etrain-ridge Adding linear regression and refactoring Ridge regression to use SGD	2013-08-29 22:15:14 -07:00
Evan Sparks	852d810787	Merge pull request #819 from shivaram/sgd-cleanup Change SVM to use {0,1} labels	2013-08-29 22:13:15 -07:00
Matei Zaharia	ca71620950	Merge pull request #857 from mateiz/assembly Change build and run instructions to use assemblies	2013-08-29 21:51:14 -07:00
Reynold Xin	1528776628	Merge pull request #874 from jerryshao/fix-report-bug Fix removed block zero size log reporting	2013-08-29 21:30:47 -07:00
Matei Zaharia	e11bc18294	Update Maven docs	2013-08-29 21:19:07 -07:00
Matei Zaharia	d8a4008685	Fix path to assembly in make-distribution.sh	2013-08-29 21:19:07 -07:00
Matei Zaharia	2de756ff19	Update some build instructions because only sbt assembly and mvn package are now needed	2013-08-29 21:19:06 -07:00
Matei Zaharia	666d93c294	Update Maven build to create assemblies expected by new scripts This includes the following changes: - The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client - There is now a bigtop-dist package to build the old BigTop assembly - The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin - Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it - run-example now adds the original Spark classpath as well because the Maven examples assembly lists spark-core and such as provided - The various Maven projects add a spark-yarn dependency correctly	2013-08-29 21:19:06 -07:00
Matei Zaharia	d7dec938e5	Don't use SPARK_LAUNCH_WITH_SCALA in pyspark	2013-08-29 21:19:06 -07:00
Matei Zaharia	3ff105f87d	Find assembly correctly in pyspark	2013-08-29 21:19:06 -07:00
Matei Zaharia	aab345c463	Fix finding of assembly JAR, as well as some pointers to ./run	2013-08-29 21:19:06 -07:00
Matei Zaharia	8d81358a05	Provide more memory for tests	2013-08-29 21:19:06 -07:00
Matei Zaharia	ab0e625d9e	Fix PySpark for assembly run and include it in dist	2013-08-29 21:19:06 -07:00
Matei Zaharia	53cd50c069	Change build and run instructions to use assemblies This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.	2013-08-29 21:19:04 -07:00
jerryshao	f3dbe6b215	Fix removed block zero size log reporting	2013-08-30 09:39:01 +08:00
Patrick Wendell	abdbacf252	Merge pull request #871 from pwendell/expose-local Expose `isLocal` in SparkContext.	2013-08-28 21:11:31 -07:00
Matei Zaharia	afcade3ca8	Merge pull request #873 from pwendell/master Hot fix for command runner	2013-08-28 20:15:40 -07:00
Patrick Wendell	1798e69e71	Adding extra args	2013-08-28 19:56:46 -07:00
Patrick Wendell	30d2421112	Make local variable public	2013-08-28 19:53:31 -07:00
Patrick Wendell	2fc9a028f2	Hot fix for command runner	2013-08-28 19:03:06 -07:00
Andre Schumacher	a511c5379e	RDD sample() and takeSample() prototypes for PySpark	2013-08-28 16:46:13 -07:00
Josh Rosen	742c44eae6	Don't send SIGINT to Py4J gateway subprocess. This addresses SPARK-885, a usability issue where PySpark's Java gateway process would be killed if the user hit ctrl-c. Note that SIGINT still won't cancel the running s This fix is based on http://stackoverflow.com/questions/5045771	2013-08-28 16:39:44 -07:00
Andre Schumacher	457bcd3343	PySpark: implementing subtractByKey(), subtract() and keyBy()	2013-08-28 16:14:22 -07:00
Matei Zaharia	baa84e7e4c	Merge pull request #865 from tgravescs/fixtmpdir Spark on Yarn should use yarn approved directories for spark.local.dir and tmp	2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves	aac1214ee4	Change Executor to only look at the env variable SPARK_YARN_MODE	2013-08-28 13:26:26 -05:00
Matei Zaharia	cd043cf922	Merge pull request #867 from tgravescs/yarnenvconfigs Spark on Yarn allow users to specify environment variables	2013-08-27 19:50:32 -07:00
Y.CORP.YAHOO.COM\tgraves	3f206bf0b5	Updated based on review comments.	2013-08-27 14:34:27 -05:00
Y.CORP.YAHOO.COM\tgraves	cf52a3cba6	Allow for Executors to have different directories than the Spark Master for Yarn	2013-08-27 11:00:21 -05:00
Matei Zaharia	898da7e422	Merge pull request #859 from ianbuss/sbt_opts Pass SBT_OPTS environment through to sbt_launcher	2013-08-26 20:40:49 -07:00
Y.CORP.YAHOO.COM\tgraves	63dc635de6	fix typos	2013-08-26 17:06:20 -05:00
Y.CORP.YAHOO.COM\tgraves	c9464c74a1	Add ability for user to specify environment variables	2013-08-26 16:44:27 -05:00
Y.CORP.YAHOO.COM\tgraves	6dd64e8bb2	Update docs and remove old reference to --user option	2013-08-26 14:29:24 -05:00
shivaram	17bafeab39	Merge pull request #864 from rxin/json1 Revert json library change	2013-08-26 11:59:32 -07:00
Y.CORP.YAHOO.COM\tgraves	dfb4c697bc	Throw exception if the yarn local dirs isn't set	2013-08-26 13:57:01 -05:00
Reynold Xin	a77e0abb96	Added worker state to the cluster master JSON ui.	2013-08-26 11:21:03 -07:00
Reynold Xin	9db1e50344	Revert "Merge pull request #841 from rxin/json" This reverts commit `1fb1b09928`, reversing changes made to `c69c48947d`.	2013-08-26 11:05:14 -07:00
Y.CORP.YAHOO.COM\tgraves	c0b4095ee8	Change to use Yarn appropriate directories rather than /tmp or the user specified spark.local.dir	2013-08-26 12:48:37 -05:00
Shivaram Venkataraman	dc06b52879	Add an option to turn off data validation, test it. Also moves addIntercept to have default true to make it similar to validateData option	2013-08-25 23:14:35 -07:00
Shivaram Venkataraman	b8c50a0642	Center & scale variables in Ridge, Lasso. Also add a unit test that checks if ridge regression lowers cross-validation error.	2013-08-25 22:24:27 -07:00
Patrick Wendell	f9fc5c160a	Merge pull request #603 from pwendell/ec2-updates Several Improvements to EC2 Scripts	2013-08-24 15:19:56 -07:00
Patrick Wendell	2cfe52ef55	Version bump for ec2 docs	2013-08-24 15:16:53 -07:00
Patrick Wendell	4879685910	Merge remote-tracking branch 'mesos/master' into ec2-updates	2013-08-24 14:50:58 -07:00

3884 commits