spark-instrumented-optimizer/python/pyspark
CodingCat dc9654638f SPARK-1240: handle the case of empty RDD when takeSample
https://spark-project.atlassian.net/browse/SPARK-1240

It seems that the current implementation does not handle the empty RDD case when running takeSample.

In this patch, I add a check before the call to sample() inside the takeSample API that returns an empty Array when the RDD is empty. I also add a check in sample() for an invalid fraction value.

I also add several lines to the test cases to cover this case.
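
The two checks described above can be illustrated with a minimal, Spark-free sketch. This is not the code from the patch: it works on a plain Python list instead of an RDD, and the function names `sample` and `take_sample`, the oversampling multiplier of 3.0, and the error message are all assumptions made for illustration. In this sketch, the empty-input check matters because the sampling fraction is computed by dividing by the element count.

```python
import random


def sample(items, fraction, seed):
    """Keep each element independently with probability `fraction`."""
    # Check for an invalid fraction value before sampling (illustrative message).
    if fraction < 0.0:
        raise ValueError("Invalid fraction value: %s" % fraction)
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


def take_sample(items, num, seed):
    """Return up to `num` randomly chosen elements, without replacement."""
    count = len(items)
    # Check for the empty case first: return early instead of dividing by zero below.
    if count == 0 or num <= 0:
        return []
    num = min(num, count)
    # Oversample so one pass usually yields enough elements (hypothetical multiplier).
    fraction = min(1.0, 3.0 * num / count)
    rng = random.Random(seed)
    picked = sample(items, fraction, rng.getrandbits(32))
    # Retry with a fresh seed in the unlikely case that too few elements were kept.
    while len(picked) < num:
        picked = sample(items, fraction, rng.getrandbits(32))
    rng.shuffle(picked)
    return picked[:num]
```

Calling `take_sample([], 5, 1)` returns an empty list, and `sample([1, 2, 3], -0.5, 1)` raises `ValueError` rather than silently returning nothing.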

Author: CodingCat <zhunansjtu@gmail.com>

Closes #135 from CodingCat/SPARK-1240 and squashes the following commits:

fef57d4 [CodingCat] fix the same problem in PySpark
36db06b [CodingCat] create new test cases for takeSample from an empty RDD
810948d [CodingCat] further fix
a40e8fb [CodingCat] replace if with require
ad483fd [CodingCat] handle the case with empty RDD when take sample
2014-03-16 22:14:59 -07:00
mllib Complain if Python and NumPy versions are too old for MLlib 2014-01-14 12:27:58 -08:00
__init__.py Changes on top of Prashant's patch. 2014-01-03 18:30:17 -08:00
accumulators.py Add custom serializer support to PySpark. 2013-11-10 16:45:38 -08:00
broadcast.py Fix some Python docs and make sure to unset SPARK_TESTING in Python 2013-12-29 20:15:07 -05:00
cloudpickle.py Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
conf.py SPARK-1114: Allow PySpark to use existing JVM and Gateway 2014-02-20 21:20:39 -08:00
context.py [SPARK-972] Added detailed callsite info for ValueError in context.py (resubmitted) 2014-03-10 13:34:49 -07:00
daemon.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
files.py Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
java_gateway.py SPARK-929: Fully deprecate usage of SPARK_MEM 2014-03-09 11:08:39 -07:00
join.py Change numSplits to numPartitions in PySpark. 2013-02-24 13:25:09 -08:00
rdd.py SPARK-1240: handle the case of empty RDD when takeSample 2014-03-16 22:14:59 -07:00
rddsampler.py RDD sample() and takeSample() prototypes for PySpark 2013-08-28 16:46:13 -07:00
serializers.py SPARK-977 Added Python RDD.zip function 2014-03-10 13:27:00 -07:00
shell.py Merge pull request #542 from markhamstra/versionBump. Closes #542. 2014-02-08 16:00:43 -08:00
statcounter.py Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark 2013-08-21 17:05:58 -07:00
storagelevel.py Spark-1163, Added missing Python RDD functions 2014-03-11 23:57:05 -07:00
tests.py Fix for SPARK-1025: PySpark hang on missing files. 2014-01-23 18:24:51 -08:00
worker.py SPARK-1115: Catch depickling errors 2014-02-26 14:51:21 -08:00