Tor Myklebust
02208a175c
Initial weights in Scala are ones; do that too. Also fix some errors.
2013-12-25 00:53:48 -05:00
Tor Myklebust
05163057a1
Split the mllib bindings into a whole bunch of modules and rename some things.
2013-12-25 00:08:05 -05:00
Andrew Ash
3665c722b5
Typo: avaiable -> available
2013-12-24 17:25:04 -08:00
Tathagata Das
d4dfab503a
Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.
2013-12-24 14:01:13 -08:00
Tor Myklebust
86e38c4942
Remove useless line from test stub.
2013-12-24 16:49:31 -05:00
Tor Myklebust
4efec6eb94
Python change for move of PythonMLLibAPI.
2013-12-24 16:49:03 -05:00
Tor Myklebust
cbb2811189
Release JVM reference to the ALSModel when done.
2013-12-22 15:03:58 -05:00
Tor Myklebust
076fc16221
Python stubs for ALSModel.
2013-12-21 14:54:01 -05:00
Tor Myklebust
0b494c2167
Un-semicolon mllib.py.
2013-12-20 02:05:55 -05:00
Tor Myklebust
0a5cacb961
Change some docstrings and add some others.
2013-12-20 02:05:15 -05:00
Tor Myklebust
b835ddf3df
Licence notice.
2013-12-20 01:55:03 -05:00
Tor Myklebust
d89cc1e28a
Whitespace.
2013-12-20 01:50:42 -05:00
Tor Myklebust
319520b9bb
Remove gigantic endian-specific test and exception tests.
2013-12-20 01:48:44 -05:00
Tor Myklebust
2940201ad8
Tests for the Python side of the mllib bindings.
2013-12-20 01:33:32 -05:00
Tor Myklebust
73e17064c6
Python stubs for classification and clustering.
2013-12-20 00:12:48 -05:00
Tor Myklebust
2328bdd00f
Python side of python bindings for linear, Lasso, and ridge regression
2013-12-19 22:45:16 -05:00
Reynold Xin
7990c56375
Merge pull request #276 from shivaram/collectPartition
...
Add collectPartition to JavaRDD interface.
This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py.
Thanks @concretevitamin for the original change and tests.
2013-12-19 13:35:09 -08:00
Shivaram Venkataraman
d3234f9726
Make collectPartitions take an array of partitions
...
Change the implementation to use runJob instead of PartitionPruningRDD.
Also update the unit tests and the python take implementation
to use the new interface.
2013-12-19 11:40:34 -08:00
Nick Pentreath
a76f53416c
Add toString to Java RDD, and __repr__ to Python RDD
2013-12-19 14:38:20 +02:00
Tor Myklebust
bf20591a00
Incorporate most of Josh's style suggestions. I don't want to deal with the type and length checking errors until we've got at least one working stub that we're all happy with.
2013-12-19 03:40:57 -05:00
Tor Myklebust
bf491bb3c0
The rest of the Python side of those bindings.
2013-12-19 01:29:51 -05:00
Tor Myklebust
95915f8b3b
First cut at python mllib bindings. Only LinearRegression is supported.
2013-12-19 01:29:09 -05:00
Shivaram Venkataraman
af0cd6bd27
Add collectPartition to JavaRDD interface.
...
Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
2013-12-18 11:40:07 -08:00
Prashant Sharma
603af51bb5
Merge branch 'master' into akka-bug-fix
...
Conflicts:
core/pom.xml
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
pom.xml
project/SparkBuild.scala
streaming/pom.xml
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
2013-12-11 10:21:53 +05:30
Patrick Wendell
5b74609d97
License headers
2013-12-09 16:41:01 -08:00
Josh Rosen
3787f514d9
Fix UnicodeEncodeError in PySpark saveAsTextFile().
...
Fixes SPARK-970.
2013-11-28 23:44:56 -08:00
Prashant Sharma
17987778da
Merge branch 'master' into wip-scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
core/src/main/scala/org/apache/spark/rdd/RDD.scala
python/pyspark/rdd.py
2013-11-27 14:44:12 +05:30
Josh Rosen
1b74a27da0
Removed unused basestring case from dump_stream.
2013-11-26 14:35:12 -08:00
Raymond Liu
0f2e3c6e31
Merge branch 'master' into scala-2.10
2013-11-13 16:55:11 +08:00
Josh Rosen
13122ceb8c
FramedSerializer: _dumps => dumps, _loads => loads.
2013-11-10 17:53:25 -08:00
Josh Rosen
ffa5bedf46
Send PySpark commands as bytes instead of strings.
2013-11-10 16:46:00 -08:00
Josh Rosen
cbb7f04aef
Add custom serializer support to PySpark.
...
For now, this only adds MarshalSerializer, but it lays the groundwork
for supporting other custom serializers. Many of these mechanisms
can also be used to support deserialization of different data formats
sent by Java, such as data encoded by MsgPack.
This also fixes a bug in SparkContext.union().
2013-11-10 16:45:38 -08:00
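The serializer layer this commit introduces can be sketched as a length-prefixed ("framed") stream of marshalled objects. This is an illustrative reconstruction of the idea, not PySpark's actual code; class and method names follow the commits but the details are assumed:

```python
import io
import marshal
import struct

class FramedSerializer:
    """Writes each object as a length-prefixed frame on a byte stream."""

    def dump_stream(self, iterator, stream):
        for obj in iterator:
            data = self.dumps(obj)
            stream.write(struct.pack(">i", len(data)))  # 4-byte big-endian length
            stream.write(data)

    def load_stream(self, stream):
        while True:
            header = stream.read(4)
            if len(header) < 4:  # end of stream
                return
            (length,) = struct.unpack(">i", header)
            yield self.loads(stream.read(length))

class MarshalSerializer(FramedSerializer):
    """Fast serializer limited to simple built-in Python types."""
    def dumps(self, obj):
        return marshal.dumps(obj)

    def loads(self, data):
        return marshal.loads(data)

# Round-trip a few records through an in-memory stream.
buf = io.BytesIO()
MarshalSerializer().dump_stream([1, "two", [3.0]], buf)
buf.seek(0)
print(list(MarshalSerializer().load_stream(buf)))  # [1, 'two', [3.0]]
```

The same framing lets the Java side skip over Python payloads it cannot deserialize itself.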
Josh Rosen
7d68a81a8e
Remove Pickle-wrapping of Java objects in PySpark.
...
If we support custom serializers, the Python
worker will know what type of input to expect,
so we won't need to wrap Tuple2 and Strings into
pickled tuples and strings.
2013-11-03 11:03:02 -08:00
Josh Rosen
a48d88d206
Replace magic lengths with constants in PySpark.
...
Write the length of the accumulators section up-front rather
than terminating it with a negative length. I find this
easier to read.
2013-11-03 10:54:24 -08:00
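The framing change this commit describes, writing a count up-front instead of ending the accumulator section with a magic negative length, can be sketched like this (function names are hypothetical, for illustration only):

```python
import io
import struct

def write_updates(stream, updates):
    # Write the number of accumulator updates up-front, rather than
    # terminating the section with a sentinel negative length.
    stream.write(struct.pack(">i", len(updates)))
    for payload in updates:
        stream.write(struct.pack(">i", len(payload)))
        stream.write(payload)

def read_updates(stream):
    # The reader knows exactly how many records follow.
    (count,) = struct.unpack(">i", stream.read(4))
    updates = []
    for _ in range(count):
        (length,) = struct.unpack(">i", stream.read(4))
        updates.append(stream.read(length))
    return updates

buf = io.BytesIO()
write_updates(buf, [b"update-1", b"update-2"])
buf.seek(0)
print(read_updates(buf))  # [b'update-1', b'update-2']
```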
Ewen Cheslack-Postava
317a9eb1ce
Pass self to SparkContext._ensure_initialized.
...
The constructor for SparkContext should pass in self so that we track
the current context and produce errors if another one is created. Add
a doctest to make sure creating multiple contexts triggers the
exception.
2013-10-22 11:26:49 -07:00
Ewen Cheslack-Postava
56d230e614
Add classmethod to SparkContext to set system properties.
...
Add a new classmethod to SparkContext to set system properties, as is
possible in Scala/Java. Unlike the Java/Scala implementations, there's
no access to System until the JVM bridge is created. Since
SparkContext handles that, move the initialization of the JVM
connection to a separate classmethod that can safely be called
repeatedly as long as the same instance (or no instance) is provided.
2013-10-22 00:22:37 -07:00
Ewen Cheslack-Postava
7eaa56de7f
Add an add() method to pyspark accumulators.
...
Add a regular method for adding a term to accumulators in
pyspark. Currently if you have a non-global accumulator, adding to it
is awkward. The += operator can't be used for non-global accumulators
captured via closure because it involves an assignment. The only way
to do it is using __iadd__ directly.
Adding this method lets you write code like this:
def main():
    sc = SparkContext()
    accum = sc.accumulator(0)
    rdd = sc.parallelize([1,2,3])
    def f(x):
        accum.add(x)
    rdd.foreach(f)
    print accum.value
where using accum += x instead would have caused UnboundLocalError
exceptions in workers. Currently it would have to be written as
accum.__iadd__(x).
2013-10-19 19:55:39 -07:00
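The pitfall this commit body describes is plain Python closure behavior, independent of Spark: an augmented assignment makes the name local to the inner function. A minimal stand-in (the `Accumulator` class here is a hypothetical sketch, not PySpark's) demonstrates both forms:

```python
class Accumulator:
    """Minimal stand-in for a PySpark accumulator (illustrative only)."""
    def __init__(self, value):
        self.value = value

    def __iadd__(self, term):
        self.value += term
        return self

    def add(self, term):
        # Plain method call: no assignment, so closures can use it freely.
        self.value += term

def sum_with_add(values):
    accum = Accumulator(0)
    def f(x):
        accum.add(x)          # works inside a closure
    for x in values:
        f(x)
    return accum.value

def sum_with_iadd(values):
    accum = Accumulator(0)
    def f(x):
        accum += x            # assignment makes `accum` local to f
    try:
        for x in values:
            f(x)
    except UnboundLocalError:
        return "UnboundLocalError"
    return accum.value

print(sum_with_add([1, 2, 3]))   # 6
print(sum_with_iadd([1, 2, 3]))  # UnboundLocalError
```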
Prashant Sharma
026ab75661
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10
2013-10-10 09:42:55 +05:30
Matei Zaharia
478b2b7edc
Fix PySpark docs and an overly long line of code after fdbae41e
2013-10-09 12:08:04 -07:00
Prashant Sharma
7be75682b9
Merge branch 'master' into wip-merge-master
...
Conflicts:
bagel/pom.xml
core/pom.xml
core/src/test/scala/org/apache/spark/ui/UISuite.scala
examples/pom.xml
mllib/pom.xml
pom.xml
project/SparkBuild.scala
repl/pom.xml
streaming/pom.xml
tools/pom.xml
In Scala 2.10, a shorter representation is used for naming artifacts,
so I changed to the shorter Scala version for artifacts and made it a property in the pom.
2013-10-08 11:29:40 +05:30
Andre Schumacher
fdbae41e88
SPARK-705: implement sortByKey() in PySpark
2013-10-07 12:16:33 -07:00
Andre Schumacher
c84946fe21
Fixing SPARK-602: PythonPartitioner
...
Currently PythonPartitioner determines partition ID by hashing a
byte-array representation of PySpark's key. This PR lets
PythonPartitioner use the actual partition ID, which is required e.g.
for sorting via PySpark.
2013-10-04 11:56:47 -07:00
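The idea behind the SPARK-602 fix, computing the partition ID on the Python side and shipping it with each pair instead of having the JVM hash an opaque serialized key, can be sketched as follows (names and the sample partitioner are hypothetical):

```python
def tag_with_partition_id(pairs, partition_func, num_partitions):
    # Compute the partition ID in Python and attach it to the pair, so
    # the JVM partitioner can route on the ID directly instead of
    # re-hashing an opaque byte-array key.
    for key, value in pairs:
        yield partition_func(key, num_partitions), (key, value)

def range_partition(key, n):
    # A sort-friendly partitioner over integer keys in [0, 100):
    # contiguous key ranges land in the same partition, which is what
    # sortByKey needs and what byte-array hashing could not provide.
    return min(key * n // 100, n - 1)

pairs = [(10, "a"), (55, "b"), (99, "c")]
print(list(tag_with_partition_id(pairs, range_partition, 2)))
# [(0, (10, 'a')), (1, (55, 'b')), (1, (99, 'c'))]
```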
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
shane-huang
84849baf88
Merge branch 'reorgscripts' into scripts-reorg
2013-09-27 09:28:33 +08:00
shane-huang
e8b1ee04fc
fix paths and change spark to use APP_MEM as application driver memory instead of SPARK_MEM, user should add application jars to SPARK_CLASSPATH
...
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-26 17:08:47 +08:00
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
shane-huang
1d53792a0a
add scripts in bin
...
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-23 16:13:46 +08:00
shane-huang
dfbdc9ddb7
added spark-class and spark-executor to sbin
...
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-23 11:28:58 +08:00
Prashant Sharma
383e151fd7
Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Aaron Davidson
a3868544be
Whoopsy daisy
2013-09-08 00:30:47 -07:00
Aaron Davidson
c1cc8c4da2
Export StorageLevel and refactor
2013-09-07 14:41:31 -07:00
Aaron Davidson
8001687af5
Remove reflection, hard-code StorageLevels
...
The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise
the shell would have to call a private method of SparkContext. Having
StorageLevel available in sc also doesn't seem like the end of the world.
There may be a better solution, though.
As for creating the StorageLevel object itself, this seems to be the best
way in Python 2 for creating singleton, enum-like objects:
http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
2013-09-07 09:34:07 -07:00
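The Stack Overflow pattern this commit body links to, singleton enum-like objects as class attributes, looks roughly like the sketch below. The attribute names mirror Spark's StorageLevels, but the fields and constructor are assumptions for illustration:

```python
class StorageLevel:
    """Enum-like singletons attached as class attributes (the classic
    Python 2 pattern for enums; shown here in modern Python)."""

    def __init__(self, use_disk, use_memory, deserialized, replication=1):
        self.use_disk = use_disk
        self.use_memory = use_memory
        self.deserialized = deserialized
        self.replication = replication

    def __repr__(self):
        return "StorageLevel(%s, %s, %s, %s)" % (
            self.use_disk, self.use_memory,
            self.deserialized, self.replication)

# Hard-coded levels, replacing the earlier reflection-based lookup.
StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)

print(StorageLevel.MEMORY_ONLY)  # StorageLevel(False, True, True, 1)
```

Because every reference to `StorageLevel.DISK_ONLY` is the same object, identity comparisons behave like a real enum.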
Aaron Davidson
b8a0b6ea5e
Memoize StorageLevels read from JVM
2013-09-06 15:36:04 -07:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Aaron Davidson
a63d4c7dc2
SPARK-660: Add StorageLevel support in Python
...
It uses reflection... I am not proud of that fact, but it at least ensures
compatibility (sans refactoring of the StorageLevel stuff).
2013-09-05 23:36:27 -07:00
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00
Matei Zaharia
2ba695292a
Exclude some private modules in epydoc
2013-09-02 12:22:52 -07:00
Matei Zaharia
141f54279e
Further fixes to get PySpark to work on Windows
2013-09-02 01:19:29 +00:00
Matei Zaharia
6550e5e60c
Allow PySpark to launch worker.py directly on Windows
2013-09-01 18:06:15 -07:00
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
...
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
bbaa9d7d6e
Add banner to PySpark and make wordcount output nicer
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Matei Zaharia
6edef9c833
Merge pull request #861 from AndreSchumacher/pyspark_sampling_function
...
Pyspark sampling function
2013-08-31 13:39:24 -07:00
Matei Zaharia
fd89835965
Merge pull request #870 from JoshRosen/spark-885
...
Don't send SIGINT / ctrl-c to Py4J gateway subprocess
2013-08-31 13:18:12 -07:00
Matei Zaharia
618f0ecb43
Merge pull request #869 from AndreSchumacher/subtract
...
PySpark: implementing subtractByKey(), subtract() and keyBy()
2013-08-30 18:17:13 -07:00
Andre Schumacher
96571c2524
PySpark: replacing class manifest by class tag for Scala 2.10.2 inside rdd.py
2013-08-30 15:00:42 -07:00
Matei Zaharia
ab0e625d9e
Fix PySpark for assembly run and include it in dist
2013-08-29 21:19:06 -07:00
Matei Zaharia
53cd50c069
Change build and run instructions to use assemblies
...
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.
As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Andre Schumacher
a511c5379e
RDD sample() and takeSample() prototypes for PySpark
2013-08-28 16:46:13 -07:00
Josh Rosen
742c44eae6
Don't send SIGINT to Py4J gateway subprocess.
...
This addresses SPARK-885, a usability issue where PySpark's
Java gateway process would be killed if the user hit ctrl-c.
Note that SIGINT still won't cancel the running s
This fix is based on http://stackoverflow.com/questions/5045771
2013-08-28 16:39:44 -07:00
Andre Schumacher
457bcd3343
PySpark: implementing subtractByKey(), subtract() and keyBy()
2013-08-28 16:14:22 -07:00
Andre Schumacher
76077bf9f4
Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark
2013-08-21 17:05:58 -07:00
Andre Schumacher
c7e348faec
Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path
2013-08-16 11:58:20 -07:00
Josh Rosen
7a9abb9ddc
Fix PySpark unit tests on Python 2.6.
2013-08-14 15:12:12 -07:00
Matei Zaharia
e2fdac60da
Merge pull request #802 from stayhf/SPARK-760-Python
...
Simple PageRank algorithm implementation in Python for SPARK-760
2013-08-12 21:26:59 -07:00
Matei Zaharia
d3525babee
Merge pull request #813 from AndreSchumacher/add_files_pyspark
...
Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark
2013-08-12 21:02:39 -07:00
Andre Schumacher
8fd5c7bc00
Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark
...
Now ADD_FILES uses a comma as file name separator.
2013-08-12 20:22:52 -07:00
stayhf
24f02082c7
Code update for Matei's suggestions
2013-08-11 22:54:05 +00:00
stayhf
55d9bde2fa
Simple PageRank algorithm implementation in Python for SPARK-760
2013-08-10 23:48:51 +00:00
Matei Zaharia
3c8478e1fb
Merge pull request #747 from mateiz/improved-lr
...
Update the Python logistic regression example
2013-08-06 23:25:03 -07:00
Matei Zaharia
5ac548397d
Fix string parsing and style in LR
2013-07-31 23:12:30 -07:00
Josh Rosen
b95732632b
Do not inherit master's PYTHONPATH on workers.
...
This fixes SPARK-832, an issue where PySpark
would not work when the master and workers used
different SPARK_HOME paths.
This change may potentially break code that relied
on the master's PYTHONPATH being used on workers.
To have custom PYTHONPATH additions used on the
workers, users should set a custom PYTHONPATH in
spark-env.sh rather than setting it in the shell.
2013-07-29 22:08:57 -07:00
Matei Zaharia
01f94931d5
Update the Python logistic regression example to read from a file and
...
batch input records for more efficient NumPy computations
2013-07-29 19:23:41 -07:00
Matei Zaharia
d8158ced12
Merge branch 'master' of github.com:mesos/spark
2013-07-29 02:52:02 -04:00
Matei Zaharia
feba7ee540
SPARK-815. Python parallelize() should split lists before batching
...
One unfortunate consequence of this fix is that we materialize any
collections that are given to us as generators, but this seems necessary
to get reasonable behavior on small collections. We could add a
batchSize parameter later to bypass auto-computation of batch size if
this becomes a problem (e.g. if users really want to parallelize big
generators nicely)
2013-07-29 02:51:43 -04:00
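The split-before-batch behavior this commit describes can be sketched as below. The batch-size heuristic here is an assumption for illustration, not Spark's actual formula; the point is the ordering, splitting into slices first so small collections still spread across all slices:

```python
def split_then_batch(collection, num_slices, batch_size=None):
    # Materialize generators first (the trade-off the commit notes),
    # then split across slices *before* grouping into serialization
    # batches.
    items = list(collection)
    step = max(1, -(-len(items) // num_slices))  # ceiling division
    slices = [items[i:i + step] for i in range(0, len(items), step)]
    if batch_size is None:
        # Assumed auto-computed batch size, purely illustrative.
        batch_size = max(1, step // 10)
    return [[s[j:j + batch_size] for j in range(0, len(s), batch_size)]
            for s in slices]

print(split_then_batch(range(6), num_slices=3))
# [[[0], [1]], [[2], [3]], [[4], [5]]]
```

Batching the whole list first would have packed a small collection into a single batch, leaving most slices empty.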
Matei Zaharia
d75c308695
Use None instead of empty string as it's slightly smaller/faster
2013-07-29 02:51:43 -04:00
Matei Zaharia
96b50e82dc
Allow python/run-tests to run from any directory
2013-07-29 02:51:43 -04:00
Matei Zaharia
b5ec355622
Optimize Python foreach() to not return as many objects
2013-07-29 02:51:43 -04:00
Matei Zaharia
b9d6783f36
Optimize Python take() to not compute entire first partition
2013-07-29 02:51:43 -04:00
Matei Zaharia
f11ad72d4e
Some fixes to Python examples (style and package name for LR)
2013-07-27 21:12:22 -04:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
root
ec31e68d5d
Fixed PySpark perf regression by not using socket.makefile(), and improved
...
debuggability by letting "print" statements show up in the executor's stderr
Conflicts:
core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
Jey Kottalam
c75bed0eeb
Fix reporting of PySpark exceptions
2013-06-21 12:14:16 -04:00
Jey Kottalam
7c5ff733ee
PySpark daemon: fix deadlock, improve error handling
2013-06-21 12:14:16 -04:00
Jey Kottalam
62c4781400
Add tests and fixes for Python daemon shutdown
2013-06-21 12:14:16 -04:00
Jey Kottalam
c79a6078c3
Prefork Python worker processes
2013-06-21 12:14:16 -04:00
Jey Kottalam
40afe0d2a5
Add Python timing instrumentation
2013-06-21 12:14:16 -04:00
Jey Kottalam
9a731f5a6d
Fix Python saveAsTextFile doctest to not expect order to be preserved
2013-04-02 11:59:20 -07:00
Jey Kottalam
20604001e2
Fix argv handling in Python transitive closure example
2013-04-02 11:59:07 -07:00
Josh Rosen
2c966c98fb
Change numSplits to numPartitions in PySpark.
2013-02-24 13:25:09 -08:00
Mark Hamstra
b7a1fb5c5d
Add commutative requirement for 'reduce' to Python docstring.
2013-02-09 12:14:11 -08:00
Josh Rosen
e61729113d
Remove unnecessary doctest __main__ methods.
2013-02-03 21:29:40 -08:00
Josh Rosen
8fbd5380b7
Fetch fewer objects in PySpark's take() method.
2013-02-03 06:44:49 +00:00
Josh Rosen
2415c18f48
Fix reporting of PySpark doctest failures.
2013-02-03 06:44:11 +00:00
Josh Rosen
e211f405bc
Use spark.local.dir for PySpark temp files (SPARK-580).
2013-02-01 11:50:27 -08:00
Josh Rosen
9cc6ff9c4e
Do not launch JavaGateways on workers (SPARK-674).
...
The problem was that the gateway was being initialized whenever the
pyspark.context module was loaded. The fix uses lazy initialization
that occurs only when SparkContext instances are actually constructed.
I also made the gateway and jvm variables private.
This change results in ~3-4x performance improvement when running the
PySpark unit tests.
2013-02-01 11:13:10 -08:00
Josh Rosen
57b64d0d19
Fix stdout redirection in PySpark.
2013-02-01 00:25:19 -08:00
Patrick Wendell
3446d5c8d6
SPARK-673: Capture and re-throw Python exceptions
...
This patch alters the Python <-> executor protocol to pass on
exception data when they occur in user Python code.
2013-01-31 18:06:11 -08:00
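The capture-and-re-throw protocol this patch describes can be sketched as a tagged result passed from worker to driver; the function names are hypothetical and the real protocol runs over sockets, but the shape is the same:

```python
import traceback

def run_in_worker(func, value):
    # Worker side: catch failures in user Python code and return the
    # formatted traceback over the protocol instead of dying silently.
    try:
        return ("ok", func(value))
    except Exception:
        return ("error", traceback.format_exc())

def receive_on_driver(result):
    # Driver side: re-throw with the worker's traceback text attached,
    # so the user sees where their code failed.
    tag, payload = result
    if tag == "error":
        raise RuntimeError("Python worker raised:\n" + payload)
    return payload

print(receive_on_driver(run_in_worker(lambda x: x * 2, 21)))  # 42
try:
    receive_on_driver(run_in_worker(lambda x: x / 0, 21))
except RuntimeError as e:
    print("ZeroDivisionError" in str(e))  # True
```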
Matei Zaharia
55327a283e
Merge pull request #430 from pwendell/pyspark-guide
...
Minor improvements to PySpark docs
2013-01-30 15:35:29 -08:00
Patrick Wendell
3f945e3b83
Make module help available in python shell.
...
Also, adds a line in doc explaining how to use.
2013-01-30 15:04:06 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Matei Zaharia
a2f4891d1d
Merge pull request #396 from JoshRosen/spark-653
...
Make PySpark AccumulatorParam an abstract base class
2013-01-24 13:05:03 -08:00
Josh Rosen
b47d054cfc
Remove use of abc.ABCMeta due to cloudpickle issue.
...
cloudpickle runs into issues while pickling subclasses of AccumulatorParam,
which may be related to this Python issue:
http://bugs.python.org/issue7689
This seems hard to fix and the ABCMeta wasn't necessary, so I removed it.
2013-01-23 11:47:27 -08:00
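The replacement this commit describes, abstract-by-convention instead of `abc.ABCMeta`, can be sketched as a base class whose methods simply raise; the concrete subclass here is hypothetical:

```python
class AccumulatorParam:
    """Abstract by convention: plain NotImplementedError instead of
    abc.ABCMeta, which cloudpickle had trouble pickling at the time."""
    def zero(self, value):
        raise NotImplementedError

    def addInPlace(self, value1, value2):
        raise NotImplementedError

class IntParam(AccumulatorParam):
    """Hypothetical concrete subclass for plain integer accumulators."""
    def zero(self, value):
        return 0

    def addInPlace(self, value1, value2):
        return value1 + value2

print(IntParam().addInPlace(2, 3))  # 5
```

Subclasses that forget to override a method fail at call time rather than at instantiation, which is the trade-off of dropping the metaclass.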
Josh Rosen
ae2ed2947d
Allow PySpark's SparkFiles to be used from driver
...
Fix minor documentation formatting issues.
2013-01-23 10:58:50 -08:00
Josh Rosen
35168d9c89
Fix sys.path bug in PySpark SparkContext.addPyFile
2013-01-22 17:54:11 -08:00
Josh Rosen
c75ae3622e
Make AccumulatorParam an abstract base class.
2013-01-21 22:32:57 -08:00
Josh Rosen
ef711902c1
Don't download files to master's working directory.
...
This should avoid exceptions caused by existing
files with different contents.
I also removed some unused code.
2013-01-21 17:34:17 -08:00
Matei Zaharia
c7b5e5f1ec
Merge pull request #389 from JoshRosen/python_rdd_checkpointing
...
Add checkpointing to the Python API
2013-01-20 17:10:44 -08:00
Josh Rosen
9f211dd3f0
Fix PythonPartitioner equality; see SPARK-654.
...
PythonPartitioner did not take the Python-side partitioning function
into account when checking for equality, which might cause problems
in the future.
2013-01-20 15:41:42 -08:00
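The equality fix for SPARK-654 amounts to including the Python-side partitioning function's identity in `__eq__`; a sketch with assumed field names (the real class lives on the Scala side):

```python
class PythonPartitioner:
    """Sketch of partitioner equality per SPARK-654 (names assumed)."""
    def __init__(self, num_partitions, py_partition_func_id):
        self.num_partitions = num_partitions
        # Identity of the Python partitioning function: two partitioners
        # that place keys differently must never compare equal, or Spark
        # could skip a shuffle it actually needs.
        self.py_partition_func_id = py_partition_func_id

    def __eq__(self, other):
        return (isinstance(other, PythonPartitioner)
                and self.num_partitions == other.num_partitions
                and self.py_partition_func_id == other.py_partition_func_id)

    def __hash__(self):
        return hash((self.num_partitions, self.py_partition_func_id))

a = PythonPartitioner(4, 111)
b = PythonPartitioner(4, 111)
c = PythonPartitioner(4, 222)  # same count, different function
print(a == b, a == c)  # True False
```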
Josh Rosen
00d70cd660
Clean up setup code in PySpark checkpointing tests
2013-01-20 15:38:11 -08:00
Josh Rosen
5b6ea9e9a0
Update checkpointing API docs in Python/Java.
2013-01-20 15:31:41 -08:00
Josh Rosen
d0ba80dc72
Add checkpointFile() and more tests to PySpark.
2013-01-20 13:59:45 -08:00
Josh Rosen
7ed1bf4b48
Add RDD checkpointing to Python API.
2013-01-20 13:19:19 -08:00
Josh Rosen
17035db159
Add __repr__ to Accumulator; fix bug in sc.accumulator
2013-01-20 11:58:57 -08:00
Josh Rosen
9f54d7e1f5
Merge pull request #387 from mateiz/python-accumulators
...
Add accumulators to PySpark
2013-01-20 11:00:36 -08:00
Matei Zaharia
2a8c2a6790
Minor formatting fixes
2013-01-20 10:24:53 -08:00
Matei Zaharia
a23ed25f3c
Add a class comment to Accumulator
2013-01-20 02:10:25 -08:00
Matei Zaharia
61b6382a35
Launch accumulator tests in run-tests
2013-01-20 01:59:07 -08:00
Matei Zaharia
8e7f098a2c
Added accumulators to PySpark
2013-01-20 01:57:44 -08:00
Nick Pentreath
b77f7390a5
Python ALS example
2013-01-15 09:04:32 +02:00
Josh Rosen
49c74ba2af
Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON.
2013-01-10 08:10:59 -08:00
Josh Rosen
d55f2b9882
Use take() instead of takeSample() in PySpark kmeans example.
...
This is a temporary change until we port takeSample().
2013-01-09 21:21:23 -08:00
Josh Rosen
1a64432ba5
Indicate success/failure in PySpark test script.
2013-01-09 20:30:36 -08:00
Josh Rosen
b57dd0f160
Add mapPartitionsWithSplit() to PySpark.
2013-01-08 16:05:02 -08:00
Josh Rosen
33beba3965
Change PySpark RDD.take() to not call iterator().
2013-01-03 14:52:21 -08:00
Josh Rosen
ce9f1bbe20
Add pyspark script to replace the other scripts.
...
Expand the PySpark programming guide.
2013-01-01 21:25:49 -08:00
Josh Rosen
b58340dbd9
Rename top-level 'pyspark' directory to 'python'
2013-01-01 15:05:00 -08:00
Josh Rosen
9abdfa6633
Fix Python 2.6 compatibility in Python API.
2012-09-17 00:09:16 -07:00
Josh Rosen
886b39de55
Add Python API.
2012-08-18 22:33:51 -07:00