spark-instrumented-optimizer

Joseph K. Bradley 7bf6cc9701 [SPARK-3751] [mllib] DecisionTree: example update + print options	2014-10-01 01:03:24 -07:00

DecisionTreeRunner functionality additions:
* Allow user to pass in a test dataset
* Do not print full model if the model is too large.

As part of this, modify DecisionTreeModel and RandomForestModel to allow printing less info. Proposed updates:
* toString: prints model summary
* toDebugString: prints full model (named after RDD.toDebugString)

Similar update to Python API:
* __repr__() now prints a model summary
* toDebugString() now prints the full model

CC: mengxr chouqin manishamde codedeft

Small update (whomever can take a look). Thanks!

Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes #2604 from jkbradley/dtrunner-update and squashes the following commits:

b2b3c60 [Joseph K. Bradley] re-added python sql doc test, temporarily removed before
07b1fae [Joseph K. Bradley] repr() now prints a model summary
        toDebugString() now prints the full model
1d0d93d [Joseph K. Bradley] Updated DT and RF to print less when toString is called. Added toDebugString for verbose printing.
22eac8c [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dtrunner-update
e007a95 [Joseph K. Bradley] Updated DecisionTreeRunner to accept a test dataset.

docs	[SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx	2014-09-16 12:51:58 -07:00
lib	[SPARK-2305] [PySpark] Update Py4J to version 0.8.2.1	2014-07-29 19:02:06 -07:00
pyspark	[SPARK-3751] [mllib] DecisionTree: example update + print options	2014-10-01 01:03:24 -07:00
test_support	[SPARK-3634] [PySpark] User's module should take precedence over system modules	2014-09-24 12:10:09 -07:00
.gitignore	SPARK-1004. PySpark on YARN	2014-04-29 23:24:34 -07:00
epydoc.conf	[SPARK-3491] [MLlib] [PySpark] use pickle to serialize data in MLlib	2014-09-19 15:01:11 -07:00
run-tests	[SPARK-3491] [MLlib] [PySpark] use pickle to serialize data in MLlib	2014-09-19 15:01:11 -07:00