Commit graph

3368 commits

Author SHA1 Message Date
Karen Feng e04a37a332 Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update
2013-07-29 14:32:48 -07:00
Reynold Xin fe7298b587 Merge pull request #741 from pwendell/usability
Fix two small usability issues
2013-07-29 14:01:00 -07:00
Karen Feng 43a2cc15c0 Use Bootstrap progress bars in web UI 2013-07-29 13:37:24 -07:00
shivaram c34c0f6a7c Merge pull request #731 from pxinghao/master
Adding SVM and Lasso
2013-07-29 13:18:10 -07:00
Xinghao 2b2630ba3c Style fix
Lines shortened to < 100 characters
2013-07-29 09:22:49 -07:00
Xinghao 07f17439a5 Fix validatePrediction functions for Classification models
Classifiers return categorical (Int) values that should be compared
directly
2013-07-29 09:22:31 -07:00
Xinghao 3a8d07df8c Deleting extra LogisticRegressionGenerator and RidgeRegressionGenerator 2013-07-29 09:20:26 -07:00
Xinghao 75f3757300 Fix rounding error in LogisticRegression.scala 2013-07-29 09:19:56 -07:00
Matei Zaharia d8158ced12 Merge branch 'master' of github.com:mesos/spark 2013-07-29 02:52:02 -04:00
Matei Zaharia 497f55755f Add docs about ipython 2013-07-29 02:51:43 -04:00
Matei Zaharia feba7ee540 SPARK-815. Python parallelize() should split lists before batching
One unfortunate consequence of this fix is that we materialize any
collections that are given to us as generators, but this seems necessary
to get reasonable behavior on small collections. We could add a
batchSize parameter later to bypass auto-computation of batch size if
this becomes a problem (e.g. if users really want to parallelize big
generators nicely)
2013-07-29 02:51:43 -04:00
Matei Zaharia d75c308695 Use None instead of empty string as it's slightly smaller/faster 2013-07-29 02:51:43 -04:00
Matei Zaharia 96b50e82dc Allow python/run-tests to run from any directory 2013-07-29 02:51:43 -04:00
Matei Zaharia b5ec355622 Optimize Python foreach() to not return as many objects 2013-07-29 02:51:43 -04:00
Matei Zaharia b9d6783f36 Optimize Python take() to not compute entire first partition 2013-07-29 02:51:43 -04:00
Xinghao c823ee1e2b Replace map-reduce with dot operator using DoubleMatrix 2013-07-28 22:17:53 -07:00
Xinghao 96e04f4cb7 Fixed SVM and LR train functions to take Int instead of Double for Classification 2013-07-28 22:12:39 -07:00
Xinghao 9398dced03 Changed Classification to return Int instead of Double
Also minor changes to formatting and comments
2013-07-28 21:39:19 -07:00
Xinghao 67de051bbb SVMSuite and LassoSuite rewritten to follow closely with LogisticRegressionSuite 2013-07-28 21:09:56 -07:00
Xinghao 29e042940a Move data generators to util 2013-07-28 20:39:52 -07:00
Matei Zaharia 72ff62a37c Two fixes to IPython support:
- Don't attempt to run worker processes with ipython (that can cause
  some crashes as ipython prints things to standard out)
- Allow passing some IPYTHON_OPTS to launch things like the notebook
2013-07-28 22:23:13 -04:00
Xinghao ccfa362dde Change *_LocalRandomSGD to *LocalRandomSGD 2013-07-28 10:33:57 -07:00
Matei Zaharia f11ad72d4e Some fixes to Python examples (style and package name for LR) 2013-07-27 21:12:22 -04:00
Patrick Wendell bcafb36c1e Slight wording change 2013-07-27 16:03:50 -07:00
Patrick Wendell 8177165ac4 Log executor on finish 2013-07-27 16:02:06 -07:00
Patrick Wendell c2223e6801 Improve catch scope and logging for client stop()
This does two things:
1. Catches the more general `TimeoutException`, since those can be thrown.
2. Logs at info level when a timeout is detected.
2013-07-27 16:02:06 -07:00
Xinghao b0bbc7f6a8 Resolve conflicts with master, removed regParam for LogisticRegression 2013-07-26 18:57:39 -07:00
Xinghao 071afe2a33 New files from merge with master 2013-07-26 18:21:20 -07:00
Xinghao 10fd3949e6 Making ClassificationModel serializable 2013-07-26 17:49:11 -07:00
Xinghao f0a1f95228 Rename LogisticRegression, SVM and Lasso to *_LocalRandomSGD 2013-07-26 17:36:14 -07:00
Xinghao f74a03c6d8 Multiple changes
- Changed LogisticRegression regularization parameter to 0
- Removed println from SVM predict function
- Fixed "Lasso" -> "SVM" in SVMGenerator
- Added comment in Updater.scala to indicate L1 regularization leads to
soft thresholding proximal function
2013-07-26 17:29:44 -07:00
Reynold Xin f3d72ff2fe Merge pull request #739 from markhamstra/toolsPom
Missing tools/pom.xml scalatest dependency
2013-07-26 17:19:27 -07:00
Reynold Xin cb366774c8 Merge pull request #738 from harsha2010/pruning
Fix bug in Partition Pruning.
2013-07-26 16:59:30 -07:00
Mark Hamstra 3fc6408903 Added missing scalatest dependency 2013-07-26 16:10:20 -07:00
harshars 392d7474fd Code review 2013-07-26 15:23:15 -07:00
harshars 72cf7ec0e5 Indentation 2013-07-26 15:16:41 -07:00
harshars 822aac8f5a Indentation 2013-07-26 15:10:32 -07:00
harshars 743fc4e7aa Fix Bug in Partition Pruning, index of Pruned Partitions should inherit from parent 2013-07-26 14:35:17 -07:00
Matei Zaharia f3cf09491a Merge pull request #734 from woggle/executor-env2
Get more env vars from driver rather than worker
2013-07-25 14:53:21 -07:00
Charles Reiss a6de90c927 For standalone mode, get JAVA_HOME, SPARK_JAVA_OPTS, SPARK_LIBRARY_PATH from application env, not worker env 2013-07-25 12:42:30 -07:00
Matei Zaharia 8eb8b52997 Fix Chill version in Maven 2013-07-25 08:58:02 -07:00
Matei Zaharia e2421c1311 Update Chill reference in pom.xml too 2013-07-25 00:05:43 -07:00
Matei Zaharia 51c2427618 Merge pull request #732 from ryanlecompte/master
Refactor Kryo serializer support to use chill/chill-java
2013-07-25 00:03:11 -07:00
ryanlecompte e56aa75de0 fix wrapping 2013-07-24 22:08:09 -07:00
ryanlecompte 30a369a808 update pom.xml 2013-07-24 20:55:48 -07:00
ryanlecompte fc4b025314 add test 2013-07-24 20:53:15 -07:00
ryanlecompte a1c515fb02 add copyright back in 2013-07-24 20:50:32 -07:00
ryanlecompte 8e0939f5a9 refactor Kryo serializer support to use chill/chill-java 2013-07-24 20:43:57 -07:00
Matei Zaharia c258718606 Fix Maven build errors after previous commits 2013-07-24 16:12:32 -07:00
Xinghao eef678703e Adding SVM and Lasso, moving LogisticRegression to classification from regression
Also, add regularization parameter to SGD
2013-07-24 15:32:50 -07:00