spark-instrumented-optimizer/python/pyspark
freeman 97cf19f64e Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK]
Change maximum value for default seed during RDD sampling so that it is strictly less than 2 ** 32. This prevents a bug in the most recent version of NumPy, which cannot accept random seeds above this bound.

Adds an extra test that uses the default seed (instead of setting it manually, as in the docstrings).

mengxr

Author: freeman <the.freeman.lab@gmail.com>

Closes #2889 from freeman-lab/pyspark-sampling and squashes the following commits:

dc385ef [freeman] Change maximum value for default seed
2014-10-22 09:33:12 -07:00
..
mllib SPARK-3770: Make userFeatures accessible from python 2014-10-21 11:49:39 -07:00
streaming replace awaitTransformation with awaitTermination in scaladoc/javadoc 2014-10-21 09:37:17 -07:00
__init__.py [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
accumulators.py [SPARK-3478] [PySpark] Profile the Python tasks 2014-09-30 18:24:57 -07:00
broadcast.py [SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx 2014-09-16 12:51:58 -07:00
cloudpickle.py [SPARK-3679] [PySpark] pickle the exact globals of functions 2014-09-24 13:00:05 -07:00
conf.py [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
context.py [SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work in cluster mode 2014-10-16 14:56:50 -07:00
daemon.py [SPARK-3030] [PySpark] Reuse Python worker 2014-09-13 16:22:04 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
java_gateway.py [SPARK-3167] Handle special driver configs in Windows 2014-08-26 22:52:16 -07:00
join.py [SPARK-546] Add full outer join to RDD and DStream. 2014-09-24 20:39:09 -07:00
rdd.py [Spark] RDD take() method: overestimate too much 2014-10-13 13:11:55 -07:00
rddsampler.py Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK] 2014-10-22 09:33:12 -07:00
resultiterable.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
serializers.py [SPARK-2377] Python API for Streaming 2014-10-12 02:46:56 -07:00
shell.py [SPARK-3273][SPARK-3301]We should read the version information from the same place 2014-09-06 15:08:43 -07:00
shuffle.py [SPARK-3786] [PySpark] speedup tests 2014-10-06 14:07:53 -07:00
sql.py [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings 2014-10-11 11:51:59 -07:00
statcounter.py StatCounter on NumPy arrays [PYSPARK][SPARK-2012] 2014-08-01 22:33:25 -07:00
storagelevel.py [SPARK-3417] Use new-style classes in PySpark 2014-09-08 15:45:36 -07:00
tests.py Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK] 2014-10-22 09:33:12 -07:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
worker.py [SPARK-3478] [PySpark] Profile the Python tasks 2014-09-30 18:24:57 -07:00