With the current seed distribution, the random sequences generated for partitions i and i+1 are the same sequence offset by 1:
~~~
In [14]: import random
In [15]: r1 = random.Random(10)
In [16]: r1.randint(0, 1)
Out[16]: 1
In [17]: r1.random()
Out[17]: 0.4288890546751146
In [18]: r1.random()
Out[18]: 0.5780913011344704
In [19]: r2 = random.Random(10)
In [20]: r2.randint(0, 1)
Out[20]: 1
In [21]: r2.randint(0, 1)
Out[21]: 0
In [22]: r2.random()
Out[22]: 0.5780913011344704
~~~
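The session above can be reduced to the following sketch, which is an illustration and not the code touched by this patch. `offset_stream` and `per_partition_stream` are hypothetical helper names. The first assumes every partition starts from the same seed and discards one draw per preceding partition. The second assumes one possible remedy, mixing the partition index into the seed with XOR. Both use only `random()` so the behaviour does not depend on how `randint` consumes generator state.

```python
import random


def offset_stream(seed, partition, n):
    """Hypothetical scheme: same seed everywhere, skip `partition` draws."""
    rng = random.Random(seed)
    for _ in range(partition):
        rng.random()  # discard one value per preceding partition
    return [rng.random() for _ in range(n)]


def per_partition_stream(seed, partition, n):
    """Assumed alternative: derive a distinct seed for each partition."""
    rng = random.Random(seed ^ partition)
    return [rng.random() for _ in range(n)]


# Partitions 0 and 1 share all but one value, shifted by one position.
a = offset_stream(10, 0, 5)
b = offset_stream(10, 1, 5)
print(a[1:] == b[:-1])

# With a per-partition seed the two streams are generated independently.
c = per_partition_stream(10, 0, 5)
d = per_partition_stream(10, 1, 5)
print(c[1:] == d[:-1])
```

With the offset scheme, a sample drawn in partition i+1 reuses almost exactly the random values used in partition i, so the per-partition samples are strongly correlated.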
Note: The new tests are not for this bug fix.
Author: Xiangrui Meng <meng@databricks.com>
Closes #3010 from mengxr/SPARK-4148 and squashes the following commits:
869ae4b [Xiangrui Meng] move tests to tests.py
c1bacd9 [Xiangrui Meng] fix seed distribution and add some tests for rdd.sample