spark-instrumented-optimizer

Xiangrui Meng 32218307ed [SPARK-4372][MLLIB] Make LR and SVM's default parameters consistent in Scala and Python	2014-11-13 13:54:16 -08:00

The current default regParam is 1.0 and regType is claimed to be none in Python (but actually it is l2), while regParam = 0.0 and regType is L2 in Scala. We should make the default values consistent. This PR sets the default regType to L2 and regParam to 0.01.

Note that the default regParam value in LIBLINEAR (and hence scikit-learn) is 1.0. However, we use average loss instead of total loss in our formulation. Hence regParam=1.0 is definitely too heavy.

In LinearRegression, we set regParam=0.0 and regType=None, because we have separate classes for Lasso and Ridge, both of which use regParam=0.01 as the default.

davies atalwalkar

Author: Xiangrui Meng <meng@databricks.com>

Closes #3232 from mengxr/SPARK-4372 and squashes the following commits:

9979837 [Xiangrui Meng] update Ridge/Lasso to use default regParam 0.01 cast input arguments
d3ba096 [Xiangrui Meng] change 'none' back to None
1909a6e [Xiangrui Meng] change default regParam to 0.01 and regType to L2 in LR and SVM
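
The average-loss versus total-loss remark in the commit message can be illustrated with a small sketch. This is not MLlib code: the function names `fit_average_loss` and `fit_total_loss` are hypothetical, and a one-dimensional squared loss stands in for the logistic and hinge losses of LR and SVM only because its L2-regularized minimizer has a closed form. Under those assumptions, the same regParam shrinks the weight more when the loss is averaged over n examples, and matches a total-loss fit whose regParam is n times larger.

```python
# Illustrative sketch only; not MLlib code. Squared loss is a stand-in
# chosen because its L2-regularized minimizer has a closed form.

def fit_average_loss(xs, ys, reg_param):
    """Minimize (1/n) * sum(0.5 * (w*x - y)**2) + 0.5 * reg_param * w**2 over scalar w."""
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_xx = sum(x * x for x in xs) / n
    return mean_xy / (mean_xx + reg_param)


def fit_total_loss(xs, ys, reg_param):
    """Minimize sum(0.5 * (w*x - y)**2) + 0.5 * reg_param * w**2 over scalar w."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + reg_param)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # the unregularized fit is w = 2
n = len(xs)

w_avg = fit_average_loss(xs, ys, 1.0)             # average loss, regParam = 1.0
w_total_same = fit_total_loss(xs, ys, 1.0)        # total loss, same regParam
w_total_scaled = fit_total_loss(xs, ys, 1.0 * n)  # total loss, regParam scaled by n

print(w_avg, w_total_same, w_total_scaled)
```

With the averaged loss, the penalty is effectively multiplied by the number of examples relative to a total-loss formulation, which is consistent with the commit's choice of a much smaller default than 1.0.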
mllib	[SPARK-4372][MLLIB] Make LR and SVM's default parameters consistent in Scala and Python	2014-11-13 13:54:16 -08:00
streaming	replace awaitTransformation with awaitTermination in scaladoc/javadoc	2014-10-21 09:37:17 -07:00
__init__.py	[SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py	2014-11-13 10:24:54 -08:00
accumulators.py	[SPARK-3478] [PySpark] Profile the Python tasks	2014-09-30 18:24:57 -07:00
broadcast.py	[SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx	2014-09-16 12:51:58 -07:00
cloudpickle.py	[SPARK-3679] [PySpark] pickle the exact globals of functions	2014-09-24 13:00:05 -07:00
conf.py	[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs	2014-10-07 18:09:27 -07:00
context.py	[SPARK-4186] add binaryFiles and binaryRecords in Python	2014-11-06 00:22:19 -08:00
daemon.py	[SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM	2014-10-25 01:20:39 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()	2014-08-26 16:57:40 -07:00
java_gateway.py	[SPARK-3167] Handle special driver configs in Windows	2014-08-26 22:52:16 -07:00
join.py	[SPARK-546] Add full outer join to RDD and DStream.	2014-09-24 20:39:09 -07:00
rdd.py	[SPARK-4304] [PySpark] Fix sort on empty RDD	2014-11-07 20:53:03 -08:00
rddsampler.py	[SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample	2014-11-03 12:24:24 -08:00
resultiterable.py	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
serializers.py	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.	2014-11-03 23:56:14 -08:00
shell.py	[SPARK-3273][SPARK-3301]We should read the version information from the same place	2014-09-06 15:08:43 -07:00
shuffle.py	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.	2014-11-03 23:56:14 -08:00
sql.py	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.	2014-11-03 23:56:14 -08:00
statcounter.py	StatCounter on NumPy arrays [PYSPARK][SPARK-2012]	2014-08-01 22:33:25 -07:00
storagelevel.py	[SPARK-3417] Use new-style classes in PySpark	2014-09-08 15:45:36 -07:00
tests.py	[SPARK-4304] [PySpark] Fix sort on empty RDD	2014-11-07 20:53:03 -08:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-3993] [PySpark] fix bug while reuse worker after take()	2014-10-23 17:20:00 -07:00