Josh Rosen
9abdfa6633
Fix Python 2.6 compatibility in Python API.
2012-09-17 00:09:16 -07:00
Josh Rosen
4143678509
Fix minor bugs in Python API examples.
2012-08-27 00:24:47 -07:00
Josh Rosen
bff6a46359
Add pipe(), saveAsTextFile(), sc.union() to Python API.
2012-08-27 00:24:47 -07:00
Josh Rosen
200d248dcc
Simplify Python worker; pipeline the map step of partitionBy().
2012-08-27 00:24:39 -07:00
Josh Rosen
6904cb77d4
Use local combiners in Python API combineByKey().
2012-08-27 00:19:26 -07:00
Josh Rosen
8b64b7ecd8
Add countByKey(), reduceByKeyLocally() to Python API
2012-08-27 00:19:22 -07:00
Josh Rosen
08b201d810
Add mapPartitions(), glom(), countByValue() to Python API.
2012-08-27 00:19:14 -07:00
Josh Rosen
f79a1e4d2a
Add broadcast variables to Python API.
2012-08-27 00:16:47 -07:00
Josh Rosen
65e8406029
Implement fold() in Python API.
2012-08-27 00:16:47 -07:00
Josh Rosen
f3b852ce66
Refactor Python MappedRDD to use iterator pipelines.
2012-08-24 19:44:14 -07:00
Josh Rosen
4b52300487
Fix options parsing in Python pi example.
2012-08-24 19:42:47 -07:00
Josh Rosen
607b53abfc
Use numpy in Python k-means example.
2012-08-22 00:43:55 -07:00
Josh Rosen
fd94e5443c
Use only cPickle for serialization in Python API.
Objects serialized with JSON can be compared for equality, but JSON can be slow
to serialize and only supports a limited range of data types.
2012-08-21 14:01:27 -07:00
Josh Rosen
13b9514966
Bundle cloudpickle with pyspark.
2012-08-19 17:17:42 -07:00
Josh Rosen
886b39de55
Add Python API.
2012-08-18 22:33:51 -07:00
Matei Zaharia
9a0c128fec
Merge pull request #172 from dennybritz/dev
Rsync root directory in EC2 script
2012-08-14 13:05:22 -07:00
Denny
8dc7242544
Use root login in standalone AMI
2012-08-14 10:18:24 -07:00
Denny
7152c7c12d
rsync root directory in EC2 script
2012-08-14 09:26:47 -07:00
Matei Zaharia
942e604c62
Merge pull request #171 from shivaram/for-size-estimator-pull
Size estimator changes for dev
2012-08-13 15:29:40 -07:00
Shivaram Venkataraman
0f4fbb057b
Change BlockManagerSuite test cases to use a deterministic size estimator and
update the results to match the new estimates
2012-08-13 13:32:23 -07:00
Shivaram Venkataraman
22ba3a3f77
Add test-cases for 32-bit and no-compressed oops scenarios.
2012-08-13 13:32:10 -07:00
Shivaram Venkataraman
1f68c4b03b
Update test cases to match the new size estimates. Uses 64-bit and compressed
oops setting to get deterministic results
2012-08-13 13:31:54 -07:00
Shivaram Venkataraman
1ea269110c
Move object size and pointer size initialization into a function to enable unit-testing
2012-08-13 13:31:45 -07:00
Shivaram Venkataraman
44661df9cc
If spark.test.useCompressedOops is set, use that to infer compressed oops
setting. This is useful to get a deterministic test case
2012-08-13 13:31:39 -07:00
Shivaram Venkataraman
0dd8fe73ba
Use HotSpotDiagnosticMXBean to get if CompressedOops are in use or not
2012-08-13 13:31:29 -07:00
Shivaram Venkataraman
80104ce1da
Add link to Java wiki which specifies what changes with compressed oops
2012-08-13 13:31:21 -07:00
Shivaram Venkataraman
00ab5490b3
Changes to make size estimator more accurate. Fixes object size, pointer size
according to architecture and also aligns objects and arrays when computing
instance sizes. Verified using Eclipse Memory Analysis Tool (MAT)
2012-08-13 13:31:11 -07:00
Matei Zaharia
6ae3c375a9
Renamed apply() to call() in Java API and allowed it to throw Exceptions
2012-08-12 23:10:19 +02:00
Matei Zaharia
0141879c40
Use Promises instead of having a Future wait on a thread in
ConnectionManager.
2012-08-12 22:16:32 +02:00
Matei Zaharia
845a870242
Return remotely fetched blocks in a pipelined fashion from BlockManager
2012-08-12 20:01:38 +02:00
Matei Zaharia
e17ed9a21d
Switch to Akka futures in connection manager.
It's still not good because each Future ends up waiting on a lock, but
it seems to work better than Scala Actors, and more importantly it
allows us to use onComplete and other listeners on futures.
2012-08-12 19:40:37 +02:00
Matei Zaharia
ad8a7612a4
Changed multi-get method in BlockManager to return an iterator
2012-08-12 19:18:01 +02:00
Matei Zaharia
3c94e5c188
Merge pull request #168 from shivaram/dev
Use JavaConversion to get a scala iterator
2012-08-10 00:57:33 -07:00
Matei Zaharia
e463e7a333
Merge pull request #167 from JoshRosen/piped-rdd-fixes
Detect non-zero exit status from PipedRDD process
2012-08-10 00:56:42 -07:00
Josh Rosen
59c22fb444
Print exit status in PipedRDD failure exception.
2012-08-10 00:33:56 -07:00
Matei Zaharia
8069bd5b41
Removed separate launcher for EC2 standalone cluster
2012-08-09 22:45:24 +02:00
Shivaram Venkataraman
1803cce692
Use an implicit conversion to get the scala iterator
2012-08-08 14:31:04 -07:00
Shivaram Venkataraman
674fcf56bf
Use JavaConversion to get a scala iterator
2012-08-08 14:10:23 -07:00
Matei Zaharia
bec4d362c8
Merge pull request #166 from shivaram/dev
Avoid a copy in ShuffleMapTask
2012-08-08 09:11:19 -07:00
Shivaram Venkataraman
f4aaec7a48
Avoid a copy in ShuffleMapTask by creating an iterator that will be used by the
block manager.
2012-08-08 00:47:02 -07:00
Matei Zaharia
88b016db2a
Merge pull request #160 from dennybritz/clusterscripts
Standalone cluster scripts
2012-08-04 17:45:20 -07:00
Denny
8fb955fd40
Add Apache license to non-trivial scripts taken from Hadoop.
2012-08-04 17:04:33 -07:00
Matei Zaharia
5cefda9984
Merge pull request #165 from shivaram/dev
Fix test checkpoint to reuse spark context defined in the class
2012-08-03 19:17:50 -07:00
Shivaram Venkataraman
ce3444d2cb
Fix testcheckpoint to reuse spark context defined in the class
2012-08-03 18:52:26 -07:00
Matei Zaharia
62898b631f
Made range partition balance tests more aggressive.
This is because we pull out such a large sample (10x the number of
partitions) that we should expect pretty good balance. The tests are
also deterministic so there's no worry about them failing irreproducibly.
2012-08-03 16:46:48 -04:00
Matei Zaharia
6601a6212b
Added a unit test for cross-partition balancing in sort, and changes to
RangePartitioner to make it pass. It turns out that the first partition
was always kind of small due to how we picked partition boundaries.
2012-08-03 16:40:45 -04:00
Harvey
1170de3757
Fix for partitioning when sorting in descending order
2012-08-03 16:40:38 -04:00
Paul Cavallaro
d05c0f97ca
Logging Throwables in Info and Debug
Logging Throwables in logInfo and logDebug instead of swallowing them.
Conflicts:
core/src/main/scala/spark/Logging.scala
2012-08-03 16:40:21 -04:00
Denny
c90c9ec208
Read config variables first to get the master port
2012-08-02 16:12:40 -07:00
Denny
0008994044
merged dev branch
2012-08-02 16:00:33 -07:00