spark-instrumented-optimizer/core
Liang-Chi Hsieh 68c0c460bf [SPARK-13742] [CORE] Add non-iterator interface to RandomSampler
JIRA: https://issues.apache.org/jira/browse/SPARK-13742

## What changes were proposed in this pull request?

`RandomSampler.sample` currently accepts iterator as input and output another iterator. This makes it inappropriate to use in wholestage codegen of `Sampler` operator #11517. This change is to add non-iterator interface to `RandomSampler`.

This change adds a new method `def sample(): Int` to the trait `RandomSampler`. As we don't need to know the actual values of the sampling items, so this new method takes no arguments.

This method will decide whether to sample the next item or not. It returns how many times the next item will be sampled.

For `BernoulliSampler` and `BernoulliCellSampler`, the returned sampling times can only be 0 or 1. It simply means whether to sample the next item or not.

For `PoissonSampler`, the returned value can be more than 1, meaning the next item will be sampled multiple times.

## How was this patch tested?

Tests are added into `RandomSamplerSuite`.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
Author: Liang-Chi Hsieh <viirya@appier.com>
Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #11578 from viirya/random-sampler-no-iterator.
2016-03-28 09:58:47 -07:00
..
src [SPARK-13742] [CORE] Add non-iterator interface to RandomSampler 2016-03-28 09:58:47 -07:00
pom.xml [SPARK-13848][SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue 2016-03-14 12:22:02 -07:00