Commit graph

3413 commits

Author SHA1 Message Date
Reynold Xin c61843a69f Changed other LZF uses to use the compression codec interface. 2013-07-31 10:32:13 -07:00
Reynold Xin 98024eadc3 Renamed compressionOutputStream and compressionInputStream to compressedOutputStream and compressedInputStream. 2013-07-30 18:28:46 -07:00
Reynold Xin dae12fef9e Updated the configuration option for Snappy block size to be consistent with the documentation. 2013-07-30 17:49:31 -07:00
Reynold Xin 311aae76a2 Added Snappy dependency to Maven build files. 2013-07-30 17:25:42 -07:00
Reynold Xin 3b1ced83fb Exclude older version of Snappy in streaming and examples. 2013-07-30 17:25:36 -07:00
Reynold Xin 56774b176e Added unit test for compression codecs. 2013-07-30 17:12:33 -07:00
Reynold Xin 5227043f84 Documentation update for compression codec. 2013-07-30 17:12:16 -07:00
Reynold Xin ad7e9d0d64 CompressionCodec cleanup. Moved it to spark.io package. 2013-07-30 17:11:54 -07:00
Reynold Xin 368c58eac5 Merge branch 'lazy_file_open' of github.com:lyogavin/spark into compression
Conflicts:
	project/SparkBuild.scala
2013-07-30 16:04:18 -07:00
Patrick Wendell e87de037d6 Merge pull request #744 from karenfeng/bootstrap-update
Use Bootstrap progress bars in web UI
2013-07-30 15:00:08 -07:00
shivaram ae57020598 Merge pull request #752 from rxin/master
Minor mllib cleanup
2013-07-30 14:56:41 -07:00
Reynold Xin 366f7735eb Minor style cleanup of mllib. 2013-07-30 13:59:32 -07:00
Reynold Xin 47011e6854 Use a tighter bound in logistic regression unit test's prediction validation. 2013-07-30 13:58:23 -07:00
Reynold Xin e35966ae9a Renamed Classification.scala to ClassificationModel.scala and Regression.scala to RegressionModel.scala 2013-07-30 13:28:31 -07:00
Karen Feng 26144c400f Fixed wrap style 2013-07-30 12:40:41 -07:00
Karen Feng 218d7c4ed8 Fixed style, lowered height of progress bars 2013-07-30 12:39:17 -07:00
Karen Feng f1cab31b73 Removed intermediate set for activeTasks, removed progress bar margin 2013-07-30 11:06:47 -07:00
shivaram 8aee118216 Merge pull request #748 from atalwalkar/master
made SimpleUpdater consistent with other updaters
2013-07-30 10:27:54 -07:00
Ameet Talwalkar e4387ddf5d made SimpleUpdater consistent with other updaters 2013-07-29 22:21:50 -07:00
Josh Rosen 49be084ed3 Use File.pathSeparator instead of hardcoding ':'. 2013-07-29 22:08:57 -07:00
Josh Rosen b95732632b Do not inherit master's PYTHONPATH on workers.
This fixes SPARK-832, an issue where PySpark
would not work when the master and workers used
different SPARK_HOME paths.

This change may potentially break code that relied
on the master's PYTHONPATH being used on workers.
To have custom PYTHONPATH additions used on the
workers, users should set a custom PYTHONPATH in
spark-env.sh rather than setting it in the shell.
2013-07-29 22:08:57 -07:00
Matei Zaharia 468a36c005 Merge pull request #746 from rxin/cleanup
Internal cleanup
2013-07-29 19:44:33 -07:00
atalwalkar 1e1ffb192a Merge pull request #745 from shivaram/loss-update-fix
Remove duplicate loss history in Gradient Descent
2013-07-29 19:26:19 -07:00
Shivaram Venkataraman 3ca9faa341 Clarify how regVal is computed in Updater docs 2013-07-29 18:37:28 -07:00
Reynold Xin 81720e13fc Moved all StandaloneClusterMessage's into StandaloneClusterMessages object. 2013-07-29 17:53:01 -07:00
Reynold Xin 23b5da14ed Moved block manager messages into BlockManagerMessages object. 2013-07-29 17:42:05 -07:00
Reynold Xin 105f4d22e9 Removed Cache and SoftReferenceCache since they are no longer used. 2013-07-29 17:30:38 -07:00
Matei Zaharia 207548b67b Open up Job UI ports (33000-33010) on EC2 clusters 2013-07-29 17:19:33 -07:00
Reynold Xin 17e62113d4 Moved DeployMessage's into its own DeployMessages object.
Also renamed MasterState to MasterStateResponse and WorkerState to WorkerStateResponse for clarity.
2013-07-29 17:14:44 -07:00
Karen Feng 87b821dc39 Fixed continuity of executorToTasksActive, changed color of progress bars 2013-07-29 16:50:51 -07:00
Karen Feng c7b2788948 Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update
Conflicts:
	core/src/main/scala/spark/ui/jobs/IndexPage.scala
2013-07-29 16:36:07 -07:00
Patrick Wendell c99b674405 Merge pull request #735 from karenfeng/ui-807
Totals for shuffle data and CPU time
2013-07-29 16:32:55 -07:00
Shivaram Venkataraman 07da72b451 Remove duplicate loss history and clarify why.
Also some minor style fixes.
2013-07-29 16:25:17 -07:00
Karen Feng 2d6da9195a Alphabetized imports 2013-07-29 15:50:52 -07:00
Karen Feng 478a2886d9 Added started tasks to progress bar 2013-07-29 14:51:07 -07:00
Karen Feng e04a37a332 Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update
cially if it merges an updated upstream into a topic branch.
2013-07-29 14:32:48 -07:00
Reynold Xin fe7298b587 Merge pull request #741 from pwendell/usability
Fix two small usability issues
2013-07-29 14:01:00 -07:00
Karen Feng 43a2cc15c0 Use Bootstrap progress bars in web UI 2013-07-29 13:37:24 -07:00
shivaram c34c0f6a7c Merge pull request #731 from pxinghao/master
Adding SVM and Lasso
2013-07-29 13:18:10 -07:00
Xinghao 2b2630ba3c Style fix
Lines shortened to < 100 characters
2013-07-29 09:22:49 -07:00
Xinghao 07f17439a5 Fix validatePrediction functions for Classification models
Classifiers return categorical (Int) values that should be compared
directly
2013-07-29 09:22:31 -07:00
Xinghao 3a8d07df8c Deleting extra LogisticRegressionGenerator and RidgeRegressionGenerator 2013-07-29 09:20:26 -07:00
Xinghao 75f3757300 Fix rounding error in LogisticRegression.scala 2013-07-29 09:19:56 -07:00
Matei Zaharia d8158ced12 Merge branch 'master' of github.com:mesos/spark 2013-07-29 02:52:02 -04:00
Matei Zaharia 497f55755f Add docs about ipython 2013-07-29 02:51:43 -04:00
Matei Zaharia feba7ee540 SPARK-815. Python parallelize() should split lists before batching
One unfortunate consequence of this fix is that we materialize any
collections that are given to us as generators, but this seems necessary
to get reasonable behavior on small collections. We could add a
batchSize parameter later to bypass auto-computation of batch size if
this becomes a problem (e.g. if users really want to parallelize big
generators nicely)
2013-07-29 02:51:43 -04:00
Matei Zaharia d75c308695 Use None instead of empty string as it's slightly smaller/faster 2013-07-29 02:51:43 -04:00
Matei Zaharia 96b50e82dc Allow python/run-tests to run from any directory 2013-07-29 02:51:43 -04:00
Matei Zaharia b5ec355622 Optimize Python foreach() to not return as many objects 2013-07-29 02:51:43 -04:00
Matei Zaharia b9d6783f36 Optimize Python take() to not compute entire first partition 2013-07-29 02:51:43 -04:00