spark-instrumented-optimizer/mllib
Shuo Xiang 5e6ad24ff6 [MLlib] SPARK-5954: Top by key
This PR implements two functions:

- `topByKey(num: Int): RDD[(K, Array[V])]` finds the top-k values for each key in a pair RDD. This can be used, e.g., to compute top recommendations.
- `takeOrderedByKey(num: Int): RDD[(K, Array[V])]` does the opposite of `topByKey`: it finds the k smallest values for each key.

`sorted` is applied to the result because the `toArray` method of the PriorityQueue does not necessarily return a sorted array.
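
To illustrate the idea, here is a minimal sketch of top-k per key on a local collection, using only the Scala standard library. It is not the code from this PR: the object name `TopByKeySketch`, the `Seq`-based input, and the `List` result type are assumptions made for the example, and the real functions operate on a pair RDD. The sketch keeps a bounded min-heap per key and sorts at the end, since a heap's internal layout is not guaranteed to be ordered.

```scala
import scala.collection.mutable

// Hypothetical local analogue of topByKey (not the MLlib implementation).
object TopByKeySketch {
  def topByKey[K, V](pairs: Seq[(K, V)], num: Int)(implicit ord: Ordering[V]): Map[K, List[V]] = {
    val heaps = mutable.Map.empty[K, mutable.PriorityQueue[V]]
    for ((k, v) <- pairs) {
      // Reversed ordering turns the queue into a min-heap:
      // dequeue() removes the smallest value retained so far.
      val heap = heaps.getOrElseUpdate(k, new mutable.PriorityQueue[V]()(ord.reverse))
      heap.enqueue(v)
      if (heap.size > num) heap.dequeue() // keep at most `num` values per key
    }
    // Converting a heap to a list does not yield sorted output,
    // so sort explicitly, largest first.
    heaps.map { case (k, heap) => k -> heap.toList.sorted(ord.reverse) }.toMap
  }
}

val ratings = Seq(("alice", 3.0), ("alice", 5.0), ("alice", 1.0), ("bob", 2.0))
val top2 = TopByKeySketch.topByKey(ratings, 2)
// top2("alice") == List(5.0, 3.0); top2("bob") == List(2.0)
```

Bounding each heap at `num` elements keeps memory per key at O(num) regardless of how many values the key has, which is the usual reason to prefer a heap over sorting every group in full.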

Author: Shuo Xiang <shuoxiangpub@gmail.com>

Closes #5075 from coderxiang/topByKey and squashes the following commits:

1611c37 [Shuo Xiang] code clean up
6f565c0 [Shuo Xiang] naming
a80e0ec [Shuo Xiang] typo and warning
82dded9 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
d202745 [Shuo Xiang] move to MLPairRDDFunctions
901b0af [Shuo Xiang] style check
70c6e35 [Shuo Xiang] remove takeOrderedByKey, update doc and test
0895c17 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
b10e325 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
debccad [Shuo Xiang] topByKey
2015-03-20 14:45:44 -04:00
..
src [MLlib] SPARK-5954: Top by key 2015-03-20 14:45:44 -04:00
pom.xml [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. 2015-03-20 18:43:57 +00:00