Commit graph

3119 commits

Author SHA1 Message Date
Shivaram Venkataraman fd137bd7c6 Address Reynold's comments. Also use a builder pattern to construct the regression classes. 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 48770419bd Add random data used for LR testing.
Verified that results match with glm in R
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 282c8ed788 Add LogisticRegression using StochasticGradientDescent.
Also refactor RidgeRegression and LogisticRegression to re-use code
and update the test as well
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman b9d9b6f981 Add a unit test for Ridge Regression 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 4dc13bf5be Revert back to closed form CV error 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman c8169c0a33 Add LPSA data set.
Data from
http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman c070decb8e Add methods to normalize the data before training
Also update the model appropriately after training.
2013-07-05 11:13:45 -07:00
Reynold Xin 6a9a9a364c Minor clean up of the RidgeRegression code. I am not even sure why I did
this :s.
2013-07-05 11:13:45 -07:00
Matei Zaharia 729e463f64 Import RidgeRegression example
Conflicts:
	run
2013-07-05 11:13:41 -07:00
Matei Zaharia 6ad85d0918 Merge pull request #677 from jerryshao/fix_stage_clean
Clean StageToInfos periodically when spark.cleaner.ttl is enabled
2013-07-04 21:32:29 -07:00
jerryshao e4ff544a8d Clean StageToInfos periodically when spark.cleaner.ttl is enabled 2013-07-05 10:34:45 +08:00
Konstantin Boudnik 7687ed5292 Use standard ASF published avro module instead of a proprietary built one 2013-07-04 13:48:33 -07:00
Lian Cheng c0c3155c3c Bug fix: SPARK-789
https://spark-project.atlassian.net/browse/SPARK-789
2013-07-05 00:54:10 +08:00
Andrew xia 6ccfb73ca9 Add fair scheduler config template file 2013-07-04 19:19:44 +08:00
Holden Karau 0f06d6217d s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning 2013-07-04 01:05:39 -07:00
Mingfei 04567a1771 update guava version from 11.0.1 to 14.0.1 2013-07-03 17:43:37 +08:00
Y.CORP.YAHOO.COM\tgraves 923cf92900 Rework from pull request. Removed --user option from Spark on Yarn Client, made the use of the JAVA_HOME environment
variable conditional on whether it is set, and created addCredentials in each of the SparkHadoopUtil classes
to only add the credentials when the profile is hadoop2-yarn.
2013-07-02 21:18:59 -05:00
Patrick Wendell 39e2325675 Removing dead code 2013-07-02 16:28:40 -07:00
Patrick Wendell 8ca1cc1786 Adding truncation for log files 2013-07-02 16:10:50 -07:00
Matei Zaharia 6d60fe571a Merge pull request #666 from c0s/master
hbase dependency is missed in hadoop2-yarn profile of examples module
2013-07-01 18:24:03 -07:00
Konstantin Boudnik 6fdbc68f2c Fixing missed hbase dependency in examples hadoop2-yarn profile 2013-07-01 17:45:07 -07:00
Patrick Wendell 9a42d04efa Throw exception for missing resource 2013-07-01 14:43:13 -07:00
Patrick Wendell 1025d7d1ef Package refactoring 2013-07-01 14:40:53 -07:00
Patrick Wendell 30b9034241 Fixing bug where logs aren't shown 2013-07-01 13:48:01 -07:00
Patrick Wendell 8688689387 Various formatting changes 2013-07-01 13:40:12 -07:00
Patrick Wendell 735c951a09 Adding test script 2013-07-01 09:33:22 -07:00
Patrick Wendell 5de326db7d Print exception message 2013-07-01 09:19:45 -07:00
root 7cd490ef5b Clarify that PySpark is not supported on Windows 2013-07-01 06:26:43 +00:00
root ec31e68d5d Fixed PySpark perf regression by not using socket.makefile(), and improved
debuggability by letting "print" statements show up in the executor's stderr
Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
root 3296d132b6 Fix performance bug with new Python code not using buffered streams 2013-07-01 06:25:43 +00:00
Matei Zaharia 39ae073b5c Increase SLF4j version in Maven too 2013-06-30 17:11:14 -07:00
Matei Zaharia 5bbd0eec84 Update docs on SCALA_LIBRARY_PATH 2013-06-30 17:00:40 -07:00
Matei Zaharia 03d0b858c8 Made use of spark.executor.memory setting consistent and documented it
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2013-06-30 15:46:46 -07:00
Matei Zaharia ccfe953a4d Merge pull request #577 from skumargithub/master
Example of cumulative counting using updateStateByKey
2013-06-29 17:57:53 -07:00
Matei Zaharia 5cfcd3c336 Remove Twitter4J specific repo since it's in Maven central 2013-06-29 15:37:27 -07:00
Matei Zaharia 4358acfe07 Initialize Twitter4J OAuth from system properties instead of prompting 2013-06-29 15:25:06 -07:00
Matei Zaharia 1667158544 Merge remote-tracking branch 'mrpotes/master' 2013-06-29 14:36:09 -07:00
Patrick Wendell e721ff7e5a Allowing details for failed stages 2013-06-29 11:26:30 -07:00
Patrick Wendell 473961d82e Styling for progress bar 2013-06-29 08:38:04 -07:00
Patrick Wendell 249f0e54ba Minor changes from Matei's review 2013-06-28 13:25:26 -07:00
Matei Zaharia 50ca17635a Merge pull request #664 from pwendell/test-fix
Removing incorrect test statement
2013-06-27 22:24:52 -07:00
Matei Zaharia 4974b658ed Look at JAVA_HOME before PATH to determine Java executable 2013-06-27 22:16:40 -07:00
Patrick Wendell c537e869f3 Missing logo file 2013-06-27 22:02:03 -07:00
Patrick Wendell c767e74370 Removing incorrect test statement 2013-06-27 21:48:58 -07:00
Patrick Wendell 62c2c6b856 Forcing Jetty to run as daemon 2013-06-27 21:47:22 -07:00
Patrick Wendell a55190d314 Adding better tabs for UI headers. 2013-06-27 19:14:51 -07:00
Patrick Wendell 362d996c81 Handful of changes based on Matei's review
- Avoid exception when no tasks have finished for a stage
- Adding DOCTYPE so css renders properly
- Adding progress slider
2013-06-27 19:14:28 -07:00
Patrick Wendell 92a4c2a5f6 Fixing bug in local scheduler time recording 2013-06-27 12:33:06 -07:00
Matei Zaharia aea727f68d Simplify Python docs a little to do substring search 2013-06-26 21:15:09 -07:00
Matei Zaharia 03906f7f0a Fixes to compute-classpath on Windows 2013-06-26 17:40:22 -07:00