Commit graph

3390 commits

Author SHA1 Message Date
cdshines fefb03cbd7 Eliminated code duplication, refactored to pattern-matching style Partitioner and PythonPartitioner 2013-07-31 13:19:42 +03:00
shivaram 8aee118216 Merge pull request #748 from atalwalkar/master
made SimpleUpdater consistent with other updaters
2013-07-30 10:27:54 -07:00
Ameet Talwalkar e4387ddf5d made SimpleUpdater consistent with other updaters 2013-07-29 22:21:50 -07:00
Josh Rosen 49be084ed3 Use File.pathSeparator instead of hardcoding ':'. 2013-07-29 22:08:57 -07:00
Josh Rosen b95732632b Do not inherit master's PYTHONPATH on workers.
This fixes SPARK-832, an issue where PySpark
would not work when the master and workers used
different SPARK_HOME paths.

This change may potentially break code that relied
on the master's PYTHONPATH being used on workers.
To have custom PYTHONPATH additions used on the
workers, users should set a custom PYTHONPATH in
spark-env.sh rather than setting it in the shell.
2013-07-29 22:08:57 -07:00
Matei Zaharia 468a36c005 Merge pull request #746 from rxin/cleanup
Internal cleanup
2013-07-29 19:44:33 -07:00
atalwalkar 1e1ffb192a Merge pull request #745 from shivaram/loss-update-fix
Remove duplicate loss history in Gradient Descent
2013-07-29 19:26:19 -07:00
Shivaram Venkataraman 3ca9faa341 Clarify how regVal is computed in Updater docs 2013-07-29 18:37:28 -07:00
Reynold Xin 81720e13fc Moved all StandaloneClusterMessage's into StandaloneClusterMessages object. 2013-07-29 17:53:01 -07:00
Reynold Xin 23b5da14ed Moved block manager messages into BlockManagerMessages object. 2013-07-29 17:42:05 -07:00
Reynold Xin 105f4d22e9 Removed Cache and SoftReferenceCache since they are no longer used. 2013-07-29 17:30:38 -07:00
Matei Zaharia 207548b67b Open up Job UI ports (33000-33010) on EC2 clusters 2013-07-29 17:19:33 -07:00
Reynold Xin 17e62113d4 Moved DeployMessage's into its own DeployMessages object.
Also renamed MasterState to MasterStateResponse and WorkerState to WorkerStateResponse for clarity.
2013-07-29 17:14:44 -07:00
Patrick Wendell c99b674405 Merge pull request #735 from karenfeng/ui-807
Totals for shuffle data and CPU time
2013-07-29 16:32:55 -07:00
Shivaram Venkataraman 07da72b451 Remove duplicate loss history and clarify why.
Also some minor style fixes.
2013-07-29 16:25:17 -07:00
Karen Feng 2d6da9195a Alphabetized imports 2013-07-29 15:50:52 -07:00
Reynold Xin fe7298b587 Merge pull request #741 from pwendell/usability
Fix two small usability issues
2013-07-29 14:01:00 -07:00
shivaram c34c0f6a7c Merge pull request #731 from pxinghao/master
Adding SVM and Lasso
2013-07-29 13:18:10 -07:00
Xinghao 2b2630ba3c Style fix
Lines shortened to < 100 characters
2013-07-29 09:22:49 -07:00
Xinghao 07f17439a5 Fix validatePrediction functions for Classification models
Classifiers return categorical (Int) values that should be compared
directly
2013-07-29 09:22:31 -07:00
Xinghao 3a8d07df8c Deleting extra LogisticRegressionGenerator and RidgeRegressionGenerator 2013-07-29 09:20:26 -07:00
Xinghao 75f3757300 Fix rounding error in LogisticRegression.scala 2013-07-29 09:19:56 -07:00
Matei Zaharia d8158ced12 Merge branch 'master' of github.com:mesos/spark 2013-07-29 02:52:02 -04:00
Matei Zaharia 497f55755f Add docs about ipython 2013-07-29 02:51:43 -04:00
Matei Zaharia feba7ee540 SPARK-815. Python parallelize() should split lists before batching
One unfortunate consequence of this fix is that we materialize any
collections that are given to us as generators, but this seems necessary
to get reasonable behavior on small collections. We could add a
batchSize parameter later to bypass auto-computation of batch size if
this becomes a problem (e.g. if users really want to parallelize big
generators nicely)
2013-07-29 02:51:43 -04:00
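The SPARK-815 message says parallelize() should split the input into slices before grouping elements into batches. Below is a minimal sketch of why the order matters. It is not PySpark's implementation. The functions batch_then_split and split_then_batch are hypothetical, and batch_then_split is only a guess at the failure mode, not the old code. With a large batch size and a small list, batching first leaves one batch, so every element lands in a single slice. Splitting first spreads the elements across all slices. Like the fix described, it materializes generator input with list().

```python
from typing import List, TypeVar

T = TypeVar("T")


def _slices(items: List[T], num_slices: int) -> List[List[T]]:
    """Cut a list into num_slices contiguous, nearly equal pieces."""
    n = len(items)
    return [items[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]


def _batches(items: List[T], batch_size: int) -> List[List[T]]:
    """Group consecutive items into lists of at most batch_size elements."""
    return [items[j:j + batch_size] for j in range(0, len(items), batch_size)]


def batch_then_split(data, num_slices: int, batch_size: int):
    """Hypothetical 'before' ordering: batch first, then hand whole batches to slices."""
    items = list(data)
    return _slices(_batches(items, batch_size), num_slices)


def split_then_batch(data, num_slices: int, batch_size: int):
    """Hypothetical 'after' ordering: split into slices first, then batch each slice."""
    items = list(data)  # materializes generators, as the commit message notes
    return [_batches(s, batch_size) for s in _slices(items, num_slices)]


# A 10-element list, 4 slices, batch size 1024:
# batch_then_split puts all ten elements in the last slice;
# split_then_batch gives every slice some data.
print(batch_then_split(range(10), 4, 1024))
print(split_then_batch(range(10), 4, 1024))
```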
Matei Zaharia d75c308695 Use None instead of empty string as it's slightly smaller/faster 2013-07-29 02:51:43 -04:00
Matei Zaharia 96b50e82dc Allow python/run-tests to run from any directory 2013-07-29 02:51:43 -04:00
Matei Zaharia b5ec355622 Optimize Python foreach() to not return as many objects 2013-07-29 02:51:43 -04:00
Matei Zaharia b9d6783f36 Optimize Python take() to not compute entire first partition 2013-07-29 02:51:43 -04:00
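The take() commit avoids computing an entire partition when only a few elements are needed. Below is a minimal sketch of that idea using lazy partition iterators and itertools.islice. It is not PySpark's code. The names take, partition, and computed are hypothetical. The computed list records every element a partition actually produces, so you can check that nothing beyond the requested count is evaluated.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

computed: List[int] = []  # hypothetical log of elements actually produced


def partition(start: int, size: int) -> Iterator[int]:
    """A lazy stand-in for one partition; records each element it yields."""
    for i in range(start, start + size):
        computed.append(i)
        yield i


def take(partitions: Iterable[Iterator[T]], n: int) -> List[T]:
    """Collect the first n items, pulling lazily from each partition in turn."""
    taken: List[T] = []
    for part in partitions:
        if len(taken) >= n:
            break  # later partitions are never started
        # islice stops after the remaining count, so the rest of this
        # partition is never computed
        taken.extend(islice(part, n - len(taken)))
    return taken


print(take([partition(0, 1000), partition(1000, 1000)], 5))
print(computed)  # only the five elements that were needed
```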
Xinghao c823ee1e2b Replace map-reduce with dot operator using DoubleMatrix 2013-07-28 22:17:53 -07:00
Xinghao 96e04f4cb7 Fixed SVM and LR train functions to take Int instead of Double for Classification 2013-07-28 22:12:39 -07:00
Xinghao 9398dced03 Changed Classification to return Int instead of Double
Also minor changes to formatting and comments
2013-07-28 21:39:19 -07:00
Xinghao 67de051bbb SVMSuite and LassoSuite rewritten to follow closely with LogisticRegressionSuite 2013-07-28 21:09:56 -07:00
Xinghao 29e042940a Move data generators to util 2013-07-28 20:39:52 -07:00
Matei Zaharia 72ff62a37c Two fixes to IPython support:
- Don't attempt to run worker processes with ipython (that can cause
  some crashes as ipython prints things to standard out)
- Allow passing some IPYTHON_OPTS to launch things like the notebook
2013-07-28 22:23:13 -04:00
Xinghao ccfa362dde Change *_LocalRandomSGD to *LocalRandomSGD 2013-07-28 10:33:57 -07:00
Matei Zaharia f11ad72d4e Some fixes to Python examples (style and package name for LR) 2013-07-27 21:12:22 -04:00
Karen Feng 077f2dad22 Fixed outdated bugs 2013-07-27 16:39:36 -07:00
Patrick Wendell bcafb36c1e Slight wording change 2013-07-27 16:03:50 -07:00
Patrick Wendell 8177165ac4 Log executor on finish 2013-07-27 16:02:06 -07:00
Patrick Wendell c2223e6801 Improve catch scope and logging for client stop()
This does two things:
1. Catches the more general `TimeoutException`, since those can be thrown.
2. Logs at info level when a timeout is detected.
2013-07-27 16:02:06 -07:00
Karen Feng 5a93e3c58c Cleaned up code based on pwendell's suggestions 2013-07-27 15:55:26 -07:00
Karen Feng dcc4743a95 Moved val now to render 2013-07-27 12:52:53 -07:00
Karen Feng 1714693324 Current time called once with value now 2013-07-27 12:24:41 -07:00
Xinghao b0bbc7f6a8 Resolve conflicts with master, removed regParam for LogisticRegression 2013-07-26 18:57:39 -07:00
Xinghao 071afe2a33 New files from merge with master 2013-07-26 18:21:20 -07:00
Xinghao 10fd3949e6 Making ClassificationModel serializable 2013-07-26 17:49:11 -07:00
Xinghao f0a1f95228 Rename LogisticRegression, SVM and Lasso to *_LocalRandomSGD 2013-07-26 17:36:14 -07:00
Xinghao f74a03c6d8 Multiple changes
- Changed LogisticRegression regularization parameter to 0
- Removed println from SVM predict function
- Fixed "Lasso" -> "SVM" in SVMGenerator
- Added comment in Updater.scala to indicate L1 regularization leads to
soft thresholding proximal function
2013-07-26 17:29:44 -07:00
Karen Feng bd4cc52e30 Made metrics Option instead of Some, fixed NullPointerException 2013-07-26 17:23:18 -07:00