spark-instrumented-optimizer/python/pyspark
Doris Xin 2f75a4a30e [SPARK-2656] Python version of stratified sampling
Exact sample size is not supported for now.

Author: Doris Xin <doris.s.xin@gmail.com>

Closes #1554 from dorx/pystratified and squashes the following commits:

4ba927a [Doris Xin] use rel diff (+- 50%) instead of abs diff (+- 50)
bdc3f8b [Doris Xin] updated unit to check sample holistically
7713c7b [Doris Xin] Python version of stratified sampling
2014-07-24 23:42:08 -07:00
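
The commit above notes that exact sample size is not supported and that the unit test checks sample sizes within a relative tolerance of 50%. A minimal pure-Python sketch of why that is the case for per-key Bernoulli sampling follows. It is an illustration only, not the code from `rdd.py` or `rddsampler.py`; the function name `sample_by_key`, the `fractions` mapping, and the data are all hypothetical.

```python
import random
from collections import Counter


def sample_by_key(pairs, fractions, seed=None):
    """Keep each (key, value) pair independently with probability fractions[key].

    Hypothetical helper: keys missing from `fractions` are never sampled.
    """
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions.get(k, 0.0)]


# Two strata of 1000 elements each, sampled at different rates.
stratum_size = 1000
pairs = [("a", i) for i in range(stratum_size)] + [("b", i) for i in range(stratum_size)]
fractions = {"a": 0.2, "b": 0.1}

sample = sample_by_key(pairs, fractions, seed=42)
counts = Counter(k for k, _ in sample)

# Each element is an independent coin flip, so the per-key sample size is
# only approximately fraction * stratum_size. Check it with a relative
# tolerance (+- 50%) rather than an absolute one.
for key, frac in fractions.items():
    expected = frac * stratum_size
    assert abs(counts[key] - expected) <= 0.5 * expected
```

Because every element is kept or dropped independently, the size of each stratum's sample varies from run to run, which is why a relative-difference check suits this kind of test better than a fixed absolute bound.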
mllib [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
__init__.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
accumulators.py Add custom serializer support to PySpark. 2013-11-10 16:45:38 -08:00
broadcast.py Fix some Python docs and make sure to unset SPARK_TESTING in Python 2013-12-29 20:15:07 -05:00
cloudpickle.py follow pep8 None should be compared using is or is not 2014-07-15 21:34:05 -07:00
conf.py [SPARK-2014] Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default 2014-07-24 18:15:37 -07:00
context.py [SPARK-2014] Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default 2014-07-24 18:15:37 -07:00
daemon.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
files.py Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
java_gateway.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
join.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
rdd.py [SPARK-2656] Python version of stratified sampling 2014-07-24 23:42:08 -07:00
rddsampler.py [SPARK-2656] Python version of stratified sampling 2014-07-24 23:42:08 -07:00
resultiterable.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
serializers.py [SPARK-2538] [PySpark] Hash based disk spilling aggregation 2014-07-24 22:53:47 -07:00
shell.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
shuffle.py [SPARK-2538] [PySpark] Hash based disk spilling aggregation 2014-07-24 22:53:47 -07:00
sql.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
statcounter.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
storagelevel.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
tests.py [SPARK-2538] [PySpark] Hash based disk spilling aggregation 2014-07-24 22:53:47 -07:00
worker.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00