Imran Rashid
244cbbe33a
one more minor cleanup to scaladoc
2012-07-28 20:16:10 -07:00
Imran Rashid
3b392c67db
fix up scaladoc, naming of type parameters
2012-07-28 20:16:01 -07:00
Imran Rashid
f1face1ea9
rename addToAccum to addAccumulator
2012-07-28 20:16:01 -07:00
Imran Rashid
2d666b9d76
add some functionality to Vector, delete copy in AccumulatorSuite
2012-07-28 20:15:51 -07:00
Imran Rashid
edc6972f8e
move Vector class into core and spark.util package
2012-07-28 20:15:42 -07:00
Imran Rashid
83659af11c
Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=)
...
Conflicts:
core/src/main/scala/spark/Accumulators.scala
2012-07-28 20:13:51 -07:00
Imran Rashid
79d58ed20a
improve scaladoc
2012-07-28 20:12:41 -07:00
Imran Rashid
ae07f3864c
add Accumulatable, add corresponding docs & tests for accumulators
2012-07-28 20:12:41 -07:00
Matei Zaharia
f6f917bd00
Add a sleep to prevent a failing test.
...
The BlockManager's put seems to be slightly asynchronous, which can
cause it to fail this test by not removing stuff from the cache before
we put the next value. We should probably change the semantics of put()
in this case but it's hard right now. It will also be hard for
asynchronously replicated puts.
2012-07-27 16:59:36 -07:00
Matei Zaharia
c0c78d2119
Renamed test more descriptively
2012-07-27 16:28:18 -07:00
Matei Zaharia
dee8ff1b9d
Added a second version of union() without varargs.
2012-07-27 16:27:52 -07:00
Tathagata Das
cf429699e1
Updated the new checkpoint RDD to remember partitioning of the original RDD.
2012-07-27 23:16:37 +00:00
Matei Zaharia
b51d733a57
Fixed Java union methods having same erasure.
...
Changed union() methods on lists to take a separate "first element"
argument in order to differentiate them to the compiler, because Java 7
considered it an error to have them all take Lists parameterized with
different types.
2012-07-27 12:23:27 -07:00
Tathagata Das
3e271c3b61
Merge branch 'dev' of github.com:tdas/spark into dev
2012-07-27 12:01:04 -07:00
Tathagata Das
024905f682
Added BlockRDD and a first-cut version of checkpoint() to RDD class.
2012-07-27 12:00:49 -07:00
Tathagata Das
d1eee44a03
Fixed more stuff in BoundedMemoryCache.
2012-07-27 18:33:32 +00:00
Tathagata Das
d1b7f41671
Fixed bug in BoundedMemoryCache.
2012-07-27 09:00:45 -07:00
Tathagata Das
435d129bec
Fixed bugs in block dropping code of MemoryStore and changed synchronized HashMap to ConcurrentHashMap in BlockManager.
2012-07-27 10:02:26 +00:00
Tathagata Das
0426769f89
Modified the block dropping code for better performance.
2012-07-26 20:53:45 -07:00
Matei Zaharia
5c5aa2ff81
Merge pull request #153 from JoshRosen/new-java-api
...
Java API
2012-07-26 17:20:52 -07:00
Josh Rosen
c5e2810dc7
Add persist(), splits(), glom(), and mapPartitions() to Java API.
2012-07-26 12:46:47 -07:00
Josh Rosen
bf61c10072
Detect non-zero exit status from PipedRDD process.
2012-07-26 11:32:59 -07:00
Josh Rosen
6a78e88237
Minor cleanup and optimizations in Java API.
...
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
2012-07-24 09:47:00 -07:00
Denny
4f4a34c025
Stylistic changes
...
Conflicts:
core/src/test/scala/spark/MesosSchedulerSuite.scala
2012-07-23 16:32:20 -07:00
Denny
866e6949df
Always destroy SparkContext in after block for the unit tests.
...
Conflicts:
core/src/test/scala/spark/ShuffleSuite.scala
2012-07-23 16:29:17 -07:00
Matei Zaharia
600e99728d
Fix a bug where an input path was added to a Hadoop job configuration twice
2012-07-23 16:16:19 -07:00
Josh Rosen
042dcbde33
Add type annotations to Java API methods.
...
Add missing Scala Map to java.util.Map conversions.
2012-07-22 17:35:29 -07:00
Josh Rosen
e23938c3be
Use mapValues() in JavaPairRDD.cogroupResultToJava().
2012-07-22 15:10:01 -07:00
Josh Rosen
01dce3f569
Add Java API
...
Add distinct() method to RDD.
Fix bug in DoubleRDDFunctions.
2012-07-18 17:34:29 -07:00
Matei Zaharia
628bb5ca7f
Allow null keys in Spark's reduce and group by
2012-07-12 18:36:02 -07:00
Matei Zaharia
e2a67a8024
Fixes to coarse-grained Mesos scheduler in dealing with failed nodes
2012-07-12 18:21:52 -07:00
Matei Zaharia
be622cf867
Formatting
2012-07-11 17:31:44 -07:00
Matei Zaharia
e8ae77df24
Added more methods for loading/saving with new Hadoop API
2012-07-11 17:31:33 -07:00
Matei Zaharia
0a47284003
More work to allow Spark to run on the standalone deploy cluster.
2012-07-08 14:00:04 -07:00
Matei Zaharia
1aa63f775b
Added back coarse-grained Mesos scheduler based on StandaloneScheduler.
2012-07-08 10:52:13 -07:00
Matei Zaharia
c5cc10cda3
More work on standalone scheduler
2012-07-06 20:17:44 -07:00
Matei Zaharia
909b325243
Further refactoring, and start of a standalone scheduler backend
2012-07-06 17:56:44 -07:00
Matei Zaharia
4e2fe0bdaf
Miscellaneous bug fixes
2012-07-06 16:33:40 -07:00
Matei Zaharia
e72afdb817
Some refactoring to make cluster scheduler pluggable.
2012-07-06 15:23:26 -07:00
Matei Zaharia
5d1a887bed
Further updates to run processes on cluster.
2012-07-01 17:13:31 -07:00
Matei Zaharia
51c46eaca0
More work on standalone deploy system.
2012-07-01 01:05:59 -07:00
Matei Zaharia
a6eb9fda61
Detect connection and disconnection of slaves
2012-06-30 17:46:56 -07:00
Matei Zaharia
408b5a1332
More work on deploy code (adding Worker class)
2012-06-30 16:45:57 -07:00
Matei Zaharia
2fb6e7d71e
Initial framework to get a master and web UI up.
2012-06-30 14:45:55 -07:00
Matei Zaharia
c53670b9bf
Various code style fixes, mostly from IntelliJ IDEA
2012-06-29 18:47:12 -07:00
Matei Zaharia
c6be4ffbf9
Fixes to CoarseMesosScheduler
2012-06-29 16:18:51 -07:00
Matei Zaharia
3a58efa5a5
Allow binding to a free port and change Akka logging to use SLF4J. Also
...
fixes various bugs in the previous code when running on Mesos.
2012-06-29 16:02:21 -07:00
Matei Zaharia
3920189932
Upgraded to Akka 2 and fixed test execution (which was still parallel
...
across projects).
2012-06-28 23:51:28 -07:00
root
6ad3e1f1b4
Various fixes when running on Mesos
2012-06-20 06:48:26 +00:00
Tathagata Das
e896a505e2
Added testcase for ByteBufferInputStream bugs.
2012-06-17 16:11:12 -07:00
Tathagata Das
40536e3668
Fixed nasty corner case bug in ByteBufferInputStream. Could not add a test case for this as I could not figure out how to deterministically reproduce the bug in a short testcase.
2012-06-17 13:28:41 -07:00
Matei Zaharia
2893b30550
Various fixes to get unit tests running. In particular, shut down
...
ConnectionManager and DAGScheduler properly, plus a fix to
LocalScheduler that was not merged in from 0.5 and was actually caught
by one of the tests.
2012-06-17 00:28:45 -07:00
Matei Zaharia
b3eeac55b8
Fixed HttpBroadcast to work with this branch's Serializer.
2012-06-15 23:54:38 -07:00
Matei Zaharia
f58da6164e
Merge branch 'master' into dev
2012-06-15 23:47:11 -07:00
Tathagata Das
5f54bdf98b
Added shutdown for akka to SparkContext.stop(). Helps a little, but many testsuites still fail.
2012-06-13 20:49:00 -04:00
Tathagata Das
c6156da9e2
Multiple bug fixes to pass the testsuites ShuffleSuite and BlockManagerSuite.
2012-06-13 16:26:49 -04:00
Matei Zaharia
879bc0bece
Merge branch 'master' into mesos-0.9
2012-06-09 16:24:16 -07:00
Matei Zaharia
4b05798c06
Further bug fix to HttpBroadcast
2012-06-09 16:24:03 -07:00
Matei Zaharia
587a16a7ef
Merge branch 'master' into mesos-0.9
2012-06-09 16:17:07 -07:00
Matei Zaharia
8ed662862e
Bug fix to HttpBroadcast
2012-06-09 16:16:55 -07:00
Matei Zaharia
2fd9f994ae
Merge branch 'master' into mesos-0.9
2012-06-09 15:58:35 -07:00
Matei Zaharia
e75b1b5cb4
Change the default broadcast implementation to a simple HTTP-based
...
broadcast. Fixes #139.
2012-06-09 15:58:07 -07:00
Matei Zaharia
a96558caa3
Performance improvements to shuffle operations: in particular, preserve
...
RDD partitioning in more cases where it's possible, and use iterators
instead of materializing collections when doing joins.
2012-06-09 14:44:18 -07:00
Matei Zaharia
c2c7299d7a
Added BlockManagerSuite, which I'd forgotten to merge.
2012-06-07 13:47:10 -07:00
Matei Zaharia
63051dd2bc
Merge in engine improvements from the Spark Streaming project, developed
...
jointly with Tathagata Das and Haoyuan Li. This commit imports the changes
and ports them to Mesos 0.9, but does not yet pass unit tests due to
various classes not supporting a graceful stop() yet.
2012-06-07 12:45:38 -07:00
Matei Zaharia
7e1c97fc4b
Merge branch 'master' into mesos-0.9
2012-06-06 16:48:59 -07:00
Matei Zaharia
048276799a
Commit task outputs to Hadoop-supported storage systems in parallel on the
...
cluster instead of on the master. Fixes #110.
2012-06-06 16:46:53 -07:00
Matei Zaharia
6888bc7191
Merge branch 'master' into mesos-0.9
2012-06-06 16:14:19 -07:00
Matei Zaharia
6ae2746d1e
Handle arrays that contain the same element many times better in
...
SizeEstimator. Also added a test for SizeEstimator. Fixes #136.
2012-06-06 16:13:02 -07:00
Matei Zaharia
0a617958d1
Some refactoring to make BoundedMemoryCache test similar to others
2012-06-06 16:12:08 -07:00
Matei Zaharia
dbc3c86ae3
Merge branch 'master' into mesos-0.9
...
Conflicts:
core/src/main/scala/spark/Executor.scala
2012-06-03 17:44:04 -07:00
Matei Zaharia
e141f644ca
Merge pull request #132 from Benky/rb-first-iteration
...
Little refactoring and unit tests for CacheTrackerActor
2012-05-26 13:15:06 -07:00
Richard Benkovsky
ae64920337
MesosScheduler refactoring
2012-05-22 11:04:54 +02:00
Richard Benkovsky
3a1bcd4028
Added tests for CacheTrackerActor
2012-05-22 11:04:54 +02:00
Richard Benkovsky
8f2f736d53
Little refactoring
2012-05-22 11:04:54 +02:00
Richard Benkovsky
518506a7c5
Added tests for Utils.copyStream
2012-05-22 11:04:51 +02:00
Richard Benkovsky
f162fc2beb
Formatting fixed
2012-05-22 09:45:38 +02:00
Richard Benkovsky
565245871f
BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity
2012-05-20 22:13:35 +02:00
Richard Benkovsky
822a4be37d
Utils.memoryBytesToString fixed
2012-05-19 15:13:20 +02:00
Reynold Xin
d0c6e9f639
Made some RDD dependencies transient to reduce the amount of data needed
...
to be serialized in closure serialization. This can significantly reduce
the task setup time in Shark when the query involves a large number of
(Hive) partitions.
2012-05-16 14:16:55 -07:00
Reynold Xin
16461e2eda
Updated Cache's put method to use a case class for response. Previously
...
it was pretty ugly that put() should return -1 for failures.
2012-05-15 00:31:52 -07:00
Reynold Xin
019e48833f
Added the capacity to report cache usage status back to the cache
...
tracker. This is essential for building a dashboard to see the status of
caches on all slaves.
2012-05-14 18:39:04 -07:00
Matei Zaharia
f48742683a
Made caches dataset-aware so that they won't cyclically evict partitions
...
from the same dataset.
2012-05-06 20:14:40 -07:00
Matei Zaharia
bd2ab635a7
Fixed the way the JAR server is created after finding issue at Twitter
2012-05-05 20:05:15 -07:00
Matei Zaharia
32a4f4623c
Merge pull request #129 from mesos/rxin
...
Force serialize/deserialize task results in local execution mode.
2012-04-24 16:18:39 -07:00
Reynold Xin
761ea65a98
Added a test for the previous commit (failing to serialize task results
...
would throw an exception for local tasks).
2012-04-24 15:14:35 -07:00
Reynold Xin
9821cd4d42
Force serialize/deserialize task results in local execution mode.
2012-04-24 14:55:28 -07:00
Antonio
3e48818993
Removed commented-out System.exit call
2012-04-23 11:42:58 -07:00
Antonio
39d99168dc
Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions
2012-04-20 14:46:43 -07:00
Reynold Xin
e601b3b9e5
Added the ability to set environment variables in piped RDD.
2012-04-17 16:40:56 -07:00
Matei Zaharia
3b745176e0
Bug fix to pluggable closure serialization change
2012-04-12 17:53:02 +00:00
Matei Zaharia
112655f032
Merge pull request #121 from rxin/kryo-closure
...
Added an option (spark.closure.serializer) to specify the serializer for closures.
2012-04-10 14:21:02 -07:00
Reynold Xin
d295ccb43c
Added a closureSerializer field in SparkEnv and use it to serialize
...
tasks.
2012-04-10 13:29:46 -07:00
Reynold Xin
968f75f6af
Added an option (spark.closure.serializer) to specify the serializer for
...
closures. This enables using Kryo as the closure serializer.
2012-04-09 21:59:56 -07:00
Matei Zaharia
a69c0738d1
Merge branch 'master' into mesos-0.9
2012-04-08 23:41:36 -07:00
Matei Zaharia
a633974143
Merge branch 'master' of github.com:mesos/spark
2012-04-08 23:41:25 -07:00
Matei Zaharia
0229d5390f
Merge branch 'master' into mesos-0.9
2012-04-08 23:39:37 -07:00
Matei Zaharia
d401e1b3e8
Fix a possible deadlock in MesosScheduler
2012-04-08 23:38:49 -07:00
Ankur Dave
7be1c7b331
Report entry dropping in BoundedMemoryCache
2012-04-06 15:49:32 -07:00
Matei Zaharia
a8bb324ed9
Merge branch 'master' into mesos-0.9
2012-04-05 14:53:22 -07:00