spark-instrumented-optimizer/mllib
Sean Owen f67ce3e229 Merge pull request #460 from srowen/RandomInitialALSVectors
Choose initial user/item vectors uniformly on the unit sphere

...rather than within the unit square to possibly avoid bias in the initial state and improve convergence.

The current implementation picks the N vector elements uniformly at random from [0,1). This means they all point into one quadrant of the vector space. As N gets just a little large, the vector tend strongly to point into the "corner", towards (1,1,1...,1). The vectors are not unit vectors either.

I suggest choosing the elements as Gaussian ~ N(0,1) and normalizing. This gets you uniform random choices on the unit sphere which is more what's of interest here. It has worked a little better for me in the past.

This is pretty minor but wanted to warm up suggesting a few tweaks to ALS.
Please excuse my Scala, pretty new to it.

Author: Sean Owen <sowen@cloudera.com>

== Merge branch commits ==

commit 492b13a7469e5a4ed7591ee8e56d8bd7570dfab6
Author: Sean Owen <sowen@cloudera.com>
Date:   Mon Jan 27 08:05:25 2014 +0000

    Style: spaces around binary operators

commit ce2b5b5a4fefa0356875701f668f01f02ba4d87e
Author: Sean Owen <sowen@cloudera.com>
Date:   Sun Jan 19 22:50:03 2014 +0000

    Generate factors with all positive components, per discussion in https://github.com/apache/incubator-spark/pull/460

commit b6f7a8a61643a8209e8bc662e8e81f2d15c710c7
Author: Sean Owen <sowen@cloudera.com>
Date:   Sat Jan 18 15:54:42 2014 +0000

    Choose initial user/item vectors uniformly on the unit sphere rather than within the unit square to possibly avoid bias in the initial state and improve convergence
2014-01-27 11:15:51 -08:00
..
data Added Java unit test, data, and main method for Naive Bayes 2014-01-11 22:30:48 -08:00
src Merge pull request #460 from srowen/RandomInitialALSVectors 2014-01-27 11:15:51 -08:00
pom.xml Use scala.binary.version in POMs 2013-12-15 12:39:58 -08:00