Tathagata Das
d808e1026a
Merge branch 'dev' into dev-merge
2013-01-07 16:41:11 -08:00
Tathagata Das
1d8b1c9bec
Merge branch 'dev-merge' of github.com:radlab/spark into dev-merge
2013-01-07 16:14:11 -08:00
Tathagata Das
4719e6d8fe
Changed locations for unit test logs.
2013-01-07 16:06:07 -08:00
Shivaram Venkataraman
55c66d365f
Use a dummy string class in Size Estimator tests to make it resistant to jdk versions
2013-01-07 15:58:00 -08:00
Shivaram Venkataraman
77d751731c
Remove unused BoundedMemoryCache file and associated test case.
2013-01-07 15:57:46 -08:00
Shivaram Venkataraman
aed368a970
Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also fix SequenceFileRDDFunctions to pick the right type conversion across Hadoop versions
2013-01-07 15:57:33 -08:00
Shivaram Venkataraman
f8d579a0c0
Remove dependencies on sun jvm classes. Instead use reflection to infer HotSpot options and total physical memory size
2013-01-07 15:57:18 -08:00
Tathagata Das
e60514d79e
Fixed bug
2013-01-07 15:16:16 -08:00
Tathagata Das
3b0a3b89ac
Added better docs for RDDCheckpointData
2013-01-07 14:55:49 -08:00
Tathagata Das
237bac36e9
Renamed examples and added documentation.
2013-01-07 14:37:21 -08:00
Matei Zaharia
1941d9602d
Merge branch 'master' of github.com:mesos/spark
2013-01-07 16:50:39 -05:00
Matei Zaharia
9c32f300fb
Add Accumulable.setValue for easier use in Java
2013-01-07 16:50:23 -05:00
Tathagata Das
1346126485
Changed cleanup to clearOldValues for TimeStampedHashMap and TimeStampedHashSet.
2013-01-07 12:11:27 -08:00
Tathagata Das
af8738dfb5
Moved Spark Streaming examples to examples sub-project.
2013-01-06 19:31:54 -08:00
Tathagata Das
934ecc829a
Removed streaming-env.sh.template
2013-01-06 14:15:07 -08:00
Stephen Haberman
8dc06069fe
Rename RDD.tupleBy to keyBy.
2013-01-06 15:21:45 -06:00
Matei Zaharia
8fd3a70c18
Add PairRDD.keys() and values() to Java API
2013-01-05 22:46:45 -05:00
Matei Zaharia
b1663752c6
Merge pull request #351 from stephenh/values
Add PairRDDFunctions.keys and values.
2013-01-05 19:15:54 -08:00
Matei Zaharia
0982572519
Add methods called just 'accumulator' for int/double in Java API
2013-01-05 22:11:28 -05:00
Matei Zaharia
86af64b0a6
Fix Accumulators in Java, and add a test for them
2013-01-05 20:55:17 -05:00
Matei Zaharia
ecf9c08901
Fix Accumulators in Java, and add a test for them
2013-01-05 20:54:08 -05:00
Stephen Haberman
1fdb6946b5
Add RDD.tupleBy.
2013-01-05 13:07:59 -06:00
Stephen Haberman
6a0db3b449
Fix typo.
2013-01-05 12:56:17 -06:00
Matei Zaharia
7ab9f09140
Merge pull request #352 from stephenh/collect
Add RDD.collect(PartialFunction).
2013-01-05 10:17:20 -08:00
Stephen Haberman
f4e6b9361f
Add RDD.collect(PartialFunction).
2013-01-05 12:14:08 -06:00
Stephen Haberman
8d57c78c83
Add PairRDDFunctions.keys and values.
2013-01-05 12:04:01 -06:00
Josh Rosen
33beba3965
Change PySpark RDD.take() to not call iterator().
2013-01-03 14:52:21 -08:00
Patrick Wendell
c438faeac4
Merge pull request #10 from radlab/datahandler-fix
Several code-quality improvements to DataHandler.
2013-01-02 17:07:12 -08:00
Patrick Wendell
2ef993d159
BufferingBlockCreator -> NetworkReceiver.BlockGenerator
2013-01-02 14:19:51 -08:00
Patrick Wendell
96a6ff0b09
Merge branch 'dev-merge' into datahandler-fix
Conflicts:
streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell
493d65ce65
Several code-quality improvements to DataHandler.
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Josh Rosen
ce9f1bbe20
Add pyspark script to replace the other scripts.
Expand the PySpark programming guide.
2013-01-01 21:25:49 -08:00
Tathagata Das
3dc87dd923
Fixed compilation bug in RDDSuite created during merge for mesos/master.
2013-01-01 16:38:04 -08:00
Tathagata Das
d34dba25c2
Merge branch 'mesos' into dev-merge
2013-01-01 15:48:39 -08:00
Josh Rosen
b58340dbd9
Rename top-level 'pyspark' directory to 'python'
2013-01-01 15:05:00 -08:00
Josh Rosen
170e451fbd
Minor documentation and style fixes for PySpark.
2013-01-01 13:52:14 -08:00
Tathagata Das
02497f0cd4
Updated Streaming Programming Guide.
2013-01-01 12:21:32 -08:00
Matei Zaharia
55809fbc6d
Merge pull request #349 from woggling/cache-finally
Avoid stalls when computation of cached RDD throws exception
2013-01-01 08:21:33 -08:00
Matei Zaharia
c593f6329e
Merge pull request #348 from JoshRosen/spark-597
Raise exception when hashing Java arrays (SPARK-597)
2013-01-01 08:20:06 -08:00
Charles Reiss
58072a7340
Remove some dead comments
2013-01-01 08:07:44 -08:00
Charles Reiss
21636ee4fa
Test with exception while computing cached RDD.
2013-01-01 08:07:40 -08:00
Charles Reiss
feadaf72f4
Mark key as not loading in CacheTracker even when compute() fails
2013-01-01 07:57:20 -08:00
Josh Rosen
f803953998
Raise exception when hashing Java arrays (SPARK-597)
2012-12-31 20:20:11 -08:00
Josh Rosen
6f6a6b79c4
Launch with scala by default in run-pyspark
2012-12-31 14:57:18 -08:00
Tathagata Das
18b9b3b99f
More classes made private[streaming] to hide from scala docs.
2012-12-30 20:00:42 -08:00
Tathagata Das
7e0271b438
Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.
2012-12-30 15:19:55 -08:00
Tathagata Das
9e644402c1
Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.
2012-12-29 18:31:51 -08:00
Josh Rosen
099898b439
Port LR example to PySpark using numpy.
This version of the example crashes after the first iteration with
"OverflowError: math range error" because Python's math.exp()
behaves differently than Scala's; see SPARK-646.
2012-12-29 18:00:28 -08:00
Josh Rosen
39dd953fd8
Add test for pyspark.RDD.saveAsTextFile().
2012-12-29 17:06:50 -08:00
Josh Rosen
59195c68ec
Update PySpark for compatibility with TaskContext.
2012-12-29 16:01:03 -08:00