spark-instrumented-optimizer

History

Xiangrui Meng e733d655df Merge pull request #578 from mengxr/rank.		2014-02-12 00:42:42 -08:00

SPARK-1076: zipWithIndex and zipWithUniqueId to RDD

Assigning ranks to an ordered or unordered data set is a common operation. This could be done by first counting records in each partition and then assigning ranks in parallel.

The purpose of assigning ranks to an unordered set is usually to get a unique id for each item, e.g., to map feature names to feature indices. In such cases, the assignment could be done without counting records, saving one Spark job.

https://spark-project.atlassian.net/browse/SPARK-1076

== update ==
Because assigning ranks is very similar to Scala's zipWithIndex, I changed the method name to zipWithIndex and put the index in the value field.

Author: Xiangrui Meng <meng@databricks.com>

Closes #578 and squashes the following commits:

52a05e1 [Xiangrui Meng] changed assignRanks to zipWithIndex; changed assignUniqueIds to zipWithUniqueId; minor updates
756881c [Xiangrui Meng] simplified RankedRDD by implementing assignUniqueIds separately; moved counting iterator size to Utils; do not count items in the last partition and skip counting if there is only one partition
630868c [Xiangrui Meng] newline
21b434b [Xiangrui Meng] add assignRanks and assignUniqueIds to RDD
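
The two strategies described in the commit message can be sketched without Spark by treating an RDD as a list of partitions. This is a minimal illustration in plain Python, not the code from the pull request: the function names `zip_with_index` and `zip_with_unique_id` and the list-of-lists representation are assumptions made for the example. The unique-id scheme shown (item `i` of partition `k` out of `n` partitions gets id `i * n + k`) is one way to avoid the counting pass; it follows the scheme that Spark's documentation gives for `zipWithUniqueId`.

```python
from itertools import accumulate
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def zip_with_index(partitions: List[List[T]]) -> List[List[Tuple[T, int]]]:
    """Pair each item with its global rank (hypothetical stand-in for an RDD method)."""
    # Counting pass: the size of the last partition is never needed,
    # because no partition starts after it.
    counts = [len(part) for part in partitions[:-1]]
    # Exclusive prefix sum gives the starting index of each partition.
    starts = [0] + list(accumulate(counts))
    # Assignment pass: each partition can be processed independently.
    return [
        [(item, start + i) for i, item in enumerate(part)]
        for start, part in zip(starts, partitions)
    ]


def zip_with_unique_id(partitions: List[List[T]]) -> List[List[Tuple[T, int]]]:
    """Pair each item with a unique id without counting any partition."""
    n = len(partitions)
    # Item i of partition k gets id i * n + k: unique, but not consecutive.
    return [
        [(item, i * n + k) for i, item in enumerate(part)]
        for k, part in enumerate(partitions)
    ]
```

With three partitions `[["a", "b"], ["c"], ["d", "e", "f"]]`, the first function yields consecutive ranks 0 through 5, while the second yields ids 0, 3, 1, 2, 5, 8, which are unique but leave gaps. That trade-off is what lets the second variant skip the counting job.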
java/org/apache/spark/network/netty	Suggested small changes to Java code for slightly more standard style, encapsulation and in some cases performance	2014-01-02 16:17:57 +00:00
resources/org/apache/spark	Make DEBUG-level logs consumable.	2014-01-10 10:33:24 -08:00
scala/org/apache	Merge pull request #578 from mengxr/rank.	2014-02-12 00:42:42 -08:00