haoyuan
db08a362aa
commit opt for grep scalability test.
2012-09-07 02:17:52 +00:00
root
c2da64409a
Randomize the order of block fetches in getMultiple
2012-09-06 23:16:26 +00:00
root
9ef90c95f4
Bug fix
2012-09-06 00:43:46 +00:00
root
2fa6d999fd
Tuning Akka more
2012-09-06 00:16:39 +00:00
Denny
886183e591
Renamed spark-cluster to spark-local.
2012-09-05 17:10:54 -07:00
root
215544820f
Serialize map output locations more efficiently, and only once, in MapOutputTracker
2012-09-05 23:54:04 +00:00
root
dc68febdce
Use Spark's closure serializer for the ShuffleMapTask cache
2012-09-05 23:06:59 +00:00
Reynold Xin
c308fbcb79
Removed cache add/remove log messages from CacheTracker.
...
Added log messages on BlockManagerMaster to reflect block add/remove.
Also did some minor cleanup of storage package code.
2012-09-05 15:59:48 -07:00
root
ed937a821f
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 22:26:49 +00:00
root
1d6b36d3c3
Further tuning for network performance
2012-09-05 22:26:37 +00:00
root
3fa0d7f0c9
Serialize BlockRDD more efficiently
2012-09-05 08:28:15 +00:00
root
4a5d0d249e
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 08:23:09 +00:00
root
efc7668d16
Allow serializing HttpBroadcast through Kryo
2012-09-05 08:22:57 +00:00
root
75487b2f5a
Broadcast the JobConf in HadoopRDD to reduce task sizes
2012-09-05 08:14:50 +00:00
root
b7ad291ac5
Tuning Akka for more connections
2012-09-05 07:08:07 +00:00
root
fc186dc18a
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 05:53:18 +00:00
root
4ea032a142
Some changes to make important log output visible even if we set the logging to WARNING
2012-09-05 05:53:07 +00:00
Denny
babbca0a2f
Fix wrong counting
2012-09-04 22:04:18 -07:00
Denny
9326509f66
Delete old DeployUtils.
2012-09-04 21:15:23 -07:00
Denny
1588d4dbe6
Renamed class.
2012-09-04 21:13:25 -07:00
Denny
22dde6e020
Start a standalone cluster locally.
2012-09-04 20:56:30 -07:00
Tathagata Das
7c09ad0e04
Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage.
2012-09-04 19:11:49 -07:00
Matei Zaharia
a842c63044
Minor formatting fixes
2012-09-03 16:24:00 -07:00
Tathagata Das
b8e9e8ea78
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-02 02:35:32 -07:00
root
ceabf71257
tweaks
2012-09-01 21:52:42 +00:00
root
6025889be0
More raw network receiver programs
2012-09-01 20:51:07 +00:00
Harvey
3076b038f4
Start fetching a remote block when a received remote block has been passed
...
to the reduce function
2012-09-01 12:01:35 -07:00
Matei Zaharia
f84d2bbe55
Bug fixes to RateLimitedOutputStream
2012-09-01 00:31:15 -07:00
Matei Zaharia
44758aa8e2
First work towards a RawInputDStream and a sender program for it.
2012-09-01 00:17:59 -07:00
root
c42e7ac282
More block manager fixes
2012-09-01 04:31:11 +00:00
Matei Zaharia
389fb4cc54
End runJob() with a SparkException when a task fails too many times in
...
one of the cluster schedulers.
2012-08-31 17:47:43 -07:00
root
113277549c
Really fixed the replication-3 issue. The problem was a few buffers not being rewound.
2012-08-31 05:39:35 +00:00
Mosharaf Chowdhury
31ffe8d528
Synchronization bug fix in broadcast implementations
2012-08-30 22:26:43 -07:00
Matei Zaharia
101ae493e2
Replicate serialized blocks properly, without sharing a ByteBuffer.
2012-08-30 22:24:14 -07:00
Mosharaf Chowdhury
3883532545
Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations.
2012-08-30 21:43:00 -07:00
Matei Zaharia
a480dec6b2
Deserialize multi-get results in the caller's thread. This fixes an
...
issue with shared buffers in the KryoSerializer.
2012-08-30 20:01:06 -07:00
Matei Zaharia
1b3e3352eb
Deserialize multi-get results in the caller's thread. This fixes an
...
issue with shared buffers with the KryoSerializer.
2012-08-30 17:59:25 -07:00
root
c4366eb764
Fixes to ShuffleFetcher
2012-08-31 00:34:24 +00:00
Reynold Xin
5945bcdcc5
Added a new flag in Aggregator to indicate applying map side combiners.
2012-08-29 23:32:08 -07:00
Reynold Xin
c68e820b2a
Merge branch 'dev' of github.com:mesos/spark into dev
2012-08-29 23:01:19 -07:00
Reynold Xin
940869dfda
Disable running combiners on map tasks when mergeCombiners function is
...
not specified by the user.
2012-08-29 23:00:02 -07:00
Tathagata Das
4db3a96766
Made minor changes to reduce compilation errors in Eclipse. Twirl stuff still does not compile in Eclipse.
2012-08-29 13:04:01 -07:00
Matei Zaharia
bf2e9cb08e
Fault tolerance and block store fixes discovered through streaming tests.
2012-08-27 23:07:50 -07:00
Matei Zaharia
17af2df0cd
Log levels
2012-08-27 23:07:32 -07:00
Matei Zaharia
b4a2214218
More fault tolerance fixes to catch lost tasks
2012-08-27 22:49:29 -07:00
Reynold Xin
3a6a95dc24
Removed the deserialization cache for ShuffleMapTask because it was
...
causing concurrency problems (some variables in Shark get set to null).
The cost of task deserialization on slaves is trivial compared with the
execution time of the task anyway.
2012-08-27 22:33:15 -07:00
Matei Zaharia
b914cd0dfa
Serialize generation correctly in ShuffleMapTask
2012-08-26 20:07:59 -07:00
Matei Zaharia
69c2ab0408
logging
2012-08-26 20:00:58 -07:00
Matei Zaharia
117e3f8c86
Fix a bug that was causing FetchFailedException not to be thrown
2012-08-26 19:52:56 -07:00
Matei Zaharia
3c9c44a8d3
More helpful log messages
2012-08-26 19:37:43 -07:00
Matei Zaharia
26dfd20c9a
Detect disconnected slaves in StandaloneScheduler
2012-08-26 18:56:56 -07:00
Matei Zaharia
29e83f39e9
Fix replication with MEMORY_ONLY_DESER_2
2012-08-26 18:16:25 -07:00
Matei Zaharia
06ef7c3d1b
Less debug info
2012-08-26 16:29:20 -07:00
Matei Zaharia
741899b21e
Fix sendMessageReliablySync
2012-08-26 16:26:06 -07:00
Matei Zaharia
5a8015d2db
Merge remote-tracking branch 'public/dev' into dev
2012-08-24 16:11:44 -07:00
Matei Zaharia
deedb9e7b7
Fix further issues with tests and broadcast.
...
The broadcast fix is to store values as MEMORY_ONLY_DESER instead of
MEMORY_ONLY, which will save substantial time on serialization.
2012-08-23 20:31:49 -07:00
Matei Zaharia
59b831b9d1
Fixed test failures due to broadcast not stopping correctly
2012-08-23 19:59:55 -07:00
Matei Zaharia
7310a6f499
Merge pull request #147 from mosharaf/dev
...
Broadcast refactoring/cleaning up
2012-08-23 19:38:28 -07:00
Matei Zaharia
25a6a39e6d
Added other SparkContext constructors to JavaSparkContext
2012-08-19 18:59:16 -07:00
Shivaram Venkataraman
1ea269110c
Move object size and pointer size initialization into a function to enable unit-testing
2012-08-13 13:31:45 -07:00
Shivaram Venkataraman
44661df9cc
If spark.test.useCompressedOops is set, use that to infer compressed oops
...
setting. This is useful to get a deterministic test case
2012-08-13 13:31:39 -07:00
Shivaram Venkataraman
0dd8fe73ba
Use HotSpotDiagnosticMXBean to get if CompressedOops are in use or not
2012-08-13 13:31:29 -07:00
Shivaram Venkataraman
80104ce1da
Add link to Java wiki which specifies what changes with compressed oops
2012-08-13 13:31:21 -07:00
Shivaram Venkataraman
00ab5490b3
Changes to make size estimator more accurate. Fixes object size, pointer size
...
according to architecture and also aligns objects and arrays when computing
instance sizes. Verified using Eclipse Memory Analysis Tool (MAT)
2012-08-13 13:31:11 -07:00
Matei Zaharia
6ae3c375a9
Renamed apply() to call() in Java API and allowed it to throw Exceptions
2012-08-12 23:10:19 +02:00
Matei Zaharia
0141879c40
Use Promises instead of having a Future wait on a thread in
...
ConnectionManager.
2012-08-12 22:16:32 +02:00
Matei Zaharia
845a870242
Return remotely fetched blocks in a pipelined fashion from BlockManager
2012-08-12 20:01:38 +02:00
Matei Zaharia
e17ed9a21d
Switch to Akka futures in connection manager.
...
It's still not good because each Future ends up waiting on a lock, but
it seems to work better than Scala Actors, and more importantly it
allows us to use onComplete and other listeners on futures.
2012-08-12 19:40:37 +02:00
Matei Zaharia
ad8a7612a4
Changed multi-get method in BlockManager to return an iterator
2012-08-12 19:18:01 +02:00
Matei Zaharia
3c94e5c188
Merge pull request #168 from shivaram/dev
...
Use JavaConversion to get a scala iterator
2012-08-10 00:57:33 -07:00
Matei Zaharia
e463e7a333
Merge pull request #167 from JoshRosen/piped-rdd-fixes
...
Detect non-zero exit status from PipedRDD process
2012-08-10 00:56:42 -07:00
Josh Rosen
59c22fb444
Print exit status in PipedRDD failure exception.
2012-08-10 00:33:56 -07:00
Shivaram Venkataraman
1803cce692
Use an implicit conversion to get the scala iterator
2012-08-08 14:31:04 -07:00
Shivaram Venkataraman
674fcf56bf
Use JavaConversion to get a scala iterator
2012-08-08 14:10:23 -07:00
Shivaram Venkataraman
f4aaec7a48
Avoid a copy in ShuffleMapTask by creating an iterator that will be used by the
...
block manager.
2012-08-08 00:47:02 -07:00
Mosharaf Chowdhury
d821dd3ccc
BroadcastManager is a class now (replaced Broadcast object)
2012-08-05 01:10:51 -07:00
Mosharaf Chowdhury
b4804119f9
Merge remote-tracking branch 'upstream/dev' into dev
2012-08-04 20:42:12 -07:00
Matei Zaharia
88b016db2a
Merge pull request #160 from dennybritz/clusterscripts
...
Standalone cluster scripts
2012-08-04 17:45:20 -07:00
Mosharaf Chowdhury
1b0534af8f
Merge branch 'dev' into bc-bm
2012-08-04 00:30:08 -07:00
Mosharaf Chowdhury
d11b457e67
Merge remote-tracking branch 'upstream/dev' into dev
2012-08-04 00:28:10 -07:00
Mosharaf Chowdhury
24b7eb872c
Bug fixed. Broadcast now works with BlockManager.
2012-08-04 00:27:28 -07:00
Matei Zaharia
6601a6212b
Added a unit test for cross-partition balancing in sort, and changes to
...
RangePartitioner to make it pass. It turns out that the first partition
was always kind of small due to how we picked partition boundaries.
2012-08-03 16:40:45 -04:00
Harvey
1170de3757
Fix for partitioning when sorting in descending order
2012-08-03 16:40:38 -04:00
Paul Cavallaro
d05c0f97ca
Logging Throwables in Info and Debug
...
Logging Throwables in logInfo and logDebug instead of swallowing them.
Conflicts:
core/src/main/scala/spark/Logging.scala
2012-08-03 16:40:21 -04:00
Denny
0008994044
merged dev branch
2012-08-02 16:00:33 -07:00
Denny
53008c2d8a
Settings variables and bugfix for stop script.
2012-08-02 15:59:39 -07:00
Matei Zaharia
71a958b0b7
Merge branch 'dev' of github.com:mesos/spark into dev
...
Conflicts:
project/SparkBuild.scala
2012-08-02 17:23:13 -04:00
Denny
7312a5c30f
Use spray's implicit Marshaller for Futures.
2012-08-02 14:11:27 -07:00
Denny
ba7e30fb5e
Mostly stylistic changes.
2012-08-02 13:55:09 -07:00
Shivaram Venkataraman
1a07bb9ba4
Avoid an extra partition copy by passing an iterator to blockManager.put
2012-08-02 12:22:33 -07:00
Shivaram Venkataraman
6790908b11
Use maxMemory to better estimate memory available for BlockManager cache
2012-08-02 12:05:05 -07:00
Denny
863c31b7c1
Moved resources into static folder
2012-08-02 09:48:36 -07:00
Tathagata Das
1c0aeee960
Merge branch 'dev' of github.com:radlab/spark into dev
2012-08-01 22:11:41 -07:00
Tathagata Das
3be54c2a8a
1. Refactored SparkStreamContext, Scheduler, InputRDS, FileInputRDS and a few other files.
...
2. Modified Time class to represent milliseconds (long) directly, instead of LongTime.
3. Added new files QueueInputRDS, RecurringTimer, etc.
4. Added RDDSuite as the skeleton for testcases.
5. Added two examples in spark.streaming.examples.
6. Removed all past examples and a few unnecessary files. Moved a number of files to spark.streaming.util.
2012-08-01 22:09:27 -07:00
Denny
0ee44c225e
Spark standalone mode cluster scripts.
...
Heavily inspired by Hadoop cluster scripts ;-)
2012-08-01 20:38:52 -07:00
Denny
6c670c37dd
Webui improvements.
2012-08-01 19:47:57 -07:00
Denny
1b29e90a79
merge dev branch
2012-08-01 14:06:09 -07:00
Denny
011220fa55
Compact job page.
2012-08-01 11:26:45 -07:00
Denny
7a295fee96
Spark WebUI Implementation.
2012-08-01 11:01:09 -07:00
Mosharaf Chowdhury
f23395e8c5
Merge remote-tracking branch 'upstream/dev' into dev
2012-07-30 19:39:49 -07:00
Matei Zaharia
7814ecbd47
Merge remote-tracking branch 'public/dev' into dev
2012-07-30 15:05:49 -07:00
Matei Zaharia
3ee2530c0c
Merge branch 'block-manager-fix' into dev
2012-07-30 13:58:46 -07:00
Matei Zaharia
400221f851
Merge branch 'dev' of git://github.com/tdas/spark into dev
2012-07-30 13:54:57 -07:00
Matei Zaharia
ed1b0f8388
Made BlockManagerMaster no longer be a singleton.
...
Also cleaned up a few formatting things throughout block manager code.
2012-07-30 13:53:47 -07:00
Matei Zaharia
f471c82558
Various reorganization and formatting fixes
2012-07-30 11:24:01 -07:00
Mosharaf Chowdhury
5932a87cac
Merge remote-tracking branch 'upstream/dev' into dev
2012-07-29 18:20:45 -07:00
Matei Zaharia
e2e71a1fb5
Merge remote-tracking branch 'public/dev' into dev
2012-07-28 20:26:59 -07:00
Imran Rashid
f7149c5e46
tasks cannot access value of accumulator
2012-07-28 20:16:17 -07:00
Imran Rashid
244cbbe33a
one more minor cleanup to scaladoc
2012-07-28 20:16:10 -07:00
Imran Rashid
3b392c67db
fix up scaladoc, naming of type parameters
2012-07-28 20:16:01 -07:00
Imran Rashid
f1face1ea9
rename addToAccum to addAccumulator
2012-07-28 20:16:01 -07:00
Imran Rashid
2d666b9d76
add some functionality to Vector, delete copy in AccumulatorSuite
2012-07-28 20:15:51 -07:00
Imran Rashid
edc6972f8e
move Vector class into core and spark.util package
2012-07-28 20:15:42 -07:00
Imran Rashid
83659af11c
Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=)
...
Conflicts:
core/src/main/scala/spark/Accumulators.scala
2012-07-28 20:13:51 -07:00
Imran Rashid
79d58ed20a
improve scaladoc
2012-07-28 20:12:41 -07:00
Imran Rashid
ae07f3864c
add Accumulatable, add corresponding docs & tests for accumulators
2012-07-28 20:12:41 -07:00
Matei Zaharia
47b7ebad12
Added the Spark Streaming code, ported to Akka 2
2012-07-28 20:03:26 -07:00
Matei Zaharia
dee8ff1b9d
Added a second version of union() without varargs.
2012-07-27 16:27:52 -07:00
Tathagata Das
cf429699e1
Updated the new checkpoint RDD to remember partitioning of the original RDD.
2012-07-27 23:16:37 +00:00
Mosharaf Chowdhury
b5be936d7c
Broadcasts using BlockManager instead of BoundedMemoryCache
2012-07-27 15:38:46 -07:00
Mosharaf Chowdhury
1f19fbb8db
Merge remote-tracking branch 'upstream/dev' into dev
...
Conflicts:
core/src/main/scala/spark/broadcast/Broadcast.scala
2012-07-27 15:18:23 -07:00
Matei Zaharia
b51d733a57
Fixed Java union methods having same erasure.
...
Changed union() methods on lists to take a separate "first element"
argument in order to differentiate them to the compiler, because Java 7
considered it an error to have them all take Lists parameterized with
different types.
2012-07-27 12:23:27 -07:00
Tathagata Das
3e271c3b61
Merge branch 'dev' of github.com:tdas/spark into dev
2012-07-27 12:01:04 -07:00
Tathagata Das
024905f682
Added BlockRDD and a first-cut version of checkpoint() to RDD class.
2012-07-27 12:00:49 -07:00
Tathagata Das
d1eee44a03
Fixed more stuff in BoundedMemoryCache.
2012-07-27 18:33:32 +00:00
Tathagata Das
d1b7f41671
Fixed bug in BoundedMemoryCache.
2012-07-27 09:00:45 -07:00
Tathagata Das
435d129bec
Fixed bugs in block dropping code of MemoryStore and changed synchronized HashMap to ConcurrentHashMap in BlockManager.
2012-07-27 10:02:26 +00:00
Tathagata Das
0426769f89
Modified the block dropping code for better performance.
2012-07-26 20:53:45 -07:00
Matei Zaharia
5c5aa2ff81
Merge pull request #153 from JoshRosen/new-java-api
...
Java API
2012-07-26 17:20:52 -07:00
Josh Rosen
c5e2810dc7
Add persist(), splits(), glom(), and mapPartitions() to Java API.
2012-07-26 12:46:47 -07:00
Josh Rosen
bf61c10072
Detect non-zero exit status from PipedRDD process.
2012-07-26 11:32:59 -07:00
Josh Rosen
6a78e88237
Minor cleanup and optimizations in Java API.
...
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
2012-07-24 09:47:00 -07:00
Denny
4f4a34c025
Stylistic changes
...
Conflicts:
core/src/test/scala/spark/MesosSchedulerSuite.scala
2012-07-23 16:32:20 -07:00
Matei Zaharia
600e99728d
Fix a bug where an input path was added to a Hadoop job configuration twice
2012-07-23 16:16:19 -07:00
Josh Rosen
042dcbde33
Add type annotations to Java API methods.
...
Add missing Scala Map to java.util.Map conversions.
2012-07-22 17:35:29 -07:00
Josh Rosen
e23938c3be
Use mapValues() in JavaPairRDD.cogroupResultToJava().
2012-07-22 15:10:01 -07:00
Josh Rosen
01dce3f569
Add Java API
...
Add distinct() method to RDD.
Fix bug in DoubleRDDFunctions.
2012-07-18 17:34:29 -07:00
Mosharaf Chowdhury
85cd9979f2
Fix for isLocal
2012-07-13 01:13:14 -07:00
Mosharaf Chowdhury
1c83fd4b66
Merged with Upstream dev
2012-07-13 01:08:28 -07:00
Mosharaf Chowdhury
bb4ee580fa
Cleaning BitTorrentBroadcast code...
2012-07-13 01:04:01 -07:00
Mosharaf Chowdhury
8ccffe21da
Cleaned TreeBroadcast
2012-07-13 00:54:25 -07:00
Matei Zaharia
628bb5ca7f
Allow null keys in Spark's reduce and group by
2012-07-12 18:36:02 -07:00
Matei Zaharia
e2a67a8024
Fixes to coarse-grained Mesos scheduler in dealing with failed nodes
2012-07-12 18:21:52 -07:00
Matei Zaharia
be622cf867
Formatting
2012-07-11 17:31:44 -07:00
Matei Zaharia
e8ae77df24
Added more methods for loading/saving with new Hadoop API
2012-07-11 17:31:33 -07:00
Mosharaf Chowdhury
34999d97f5
Added stop() to the Broadcast subsystem
2012-07-10 01:03:47 -07:00
Mosharaf Chowdhury
d6a9680604
Slightly better check for isLocal
2012-07-10 00:16:47 -07:00
Mosharaf Chowdhury
701f49e0d9
Refactoring
2012-07-09 22:39:47 -07:00
Mosharaf Chowdhury
cf1c60a1de
Refactoring
2012-07-09 22:07:46 -07:00
Mosharaf Chowdhury
e71f69ad3d
Refactoring
2012-07-09 22:07:17 -07:00
Mosharaf Chowdhury
ca02a92332
Refactored TrackMultipleValues out.
2012-07-09 21:35:39 -07:00
Mosharaf Chowdhury
654576ef1a
Tweaks
2012-07-09 21:12:42 -07:00
Mosharaf Chowdhury
425c247269
Removed some unused stuff
2012-07-08 14:29:04 -07:00
Matei Zaharia
0a47284003
More work to allow Spark to run on the standalone deploy cluster.
2012-07-08 14:00:04 -07:00
Mosharaf Chowdhury
c7c5258e25
Compiles without Dfs
2012-07-08 13:22:12 -07:00
Mosharaf Chowdhury
178bb29f05
Removed Chained and Dfs broadcast implementations
2012-07-08 11:57:00 -07:00
Matei Zaharia
1aa63f775b
Added back coarse-grained Mesos scheduler based on StandaloneScheduler.
2012-07-08 10:52:13 -07:00
Matei Zaharia
c5cc10cda3
More work on standalone scheduler
2012-07-06 20:17:44 -07:00
Matei Zaharia
909b325243
Further refactoring, and start of a standalone scheduler backend
2012-07-06 17:56:44 -07:00
Matei Zaharia
4e2fe0bdaf
Miscellaneous bug fixes
2012-07-06 16:33:40 -07:00
Matei Zaharia
e72afdb817
Some refactoring to make cluster scheduler pluggable.
2012-07-06 15:23:26 -07:00
Matei Zaharia
5d1a887bed
Further updates to run processes on cluster.
2012-07-01 17:13:31 -07:00
Matei Zaharia
51c46eaca0
More work on standalone deploy system.
2012-07-01 01:05:59 -07:00
Matei Zaharia
a6eb9fda61
Detect connection and disconnection of slaves
2012-06-30 17:46:56 -07:00
Matei Zaharia
408b5a1332
More work on deploy code (adding Worker class)
2012-06-30 16:45:57 -07:00
Matei Zaharia
2fb6e7d71e
Initial framework to get a master and web UI up.
2012-06-30 14:45:55 -07:00
Matei Zaharia
c53670b9bf
Various code style fixes, mostly from IntelliJ IDEA
2012-06-29 18:47:12 -07:00
Matei Zaharia
c6be4ffbf9
Fixes to CoarseMesosScheduler
2012-06-29 16:18:51 -07:00
Matei Zaharia
3a58efa5a5
Allow binding to a free port and change Akka logging to use SLF4J. Also
...
fixes various bugs in the previous code when running on Mesos.
2012-06-29 16:02:21 -07:00
Matei Zaharia
3920189932
Upgraded to Akka 2 and fixed test execution (which was still parallel
...
across projects).
2012-06-28 23:51:28 -07:00
root
6ad3e1f1b4
Various fixes when running on Mesos
2012-06-20 06:48:26 +00:00
Tathagata Das
40536e3668
Fixed nasty corner case bug in ByteBufferInputStream. Could not add a test case for this as I could not figure out how to deterministically reproduce the bug in a short testcase.
2012-06-17 13:28:41 -07:00
Matei Zaharia
2893b30550
Various fixes to get unit tests running. In particular, shut down
...
ConnectionManager and DAGScheduler properly, plus a fix to
LocalScheduler that was not merged in from 0.5 and was actually caught
by one of the tests.
2012-06-17 00:28:45 -07:00
Matei Zaharia
b3eeac55b8
Fixed HttpBroadcast to work with this branch's Serializer.
2012-06-15 23:54:38 -07:00
Matei Zaharia
f58da6164e
Merge branch 'master' into dev
2012-06-15 23:47:11 -07:00
Tathagata Das
5f54bdf98b
Added shutdown for akka to SparkContext.stop(). Helps a little, but many testsuites still fail.
2012-06-13 20:49:00 -04:00
Tathagata Das
c6156da9e2
Multiple bug fixes to pass the testsuites ShuffleSuite and BlockManagerSuite.
2012-06-13 16:26:49 -04:00
Matei Zaharia
879bc0bece
Merge branch 'master' into mesos-0.9
2012-06-09 16:24:16 -07:00
Matei Zaharia
4b05798c06
Further bug fix to HttpBroadcast
2012-06-09 16:24:03 -07:00
Matei Zaharia
587a16a7ef
Merge branch 'master' into mesos-0.9
2012-06-09 16:17:07 -07:00
Matei Zaharia
8ed662862e
Bug fix to HttpBroadcast
2012-06-09 16:16:55 -07:00
Matei Zaharia
2fd9f994ae
Merge branch 'master' into mesos-0.9
2012-06-09 15:58:35 -07:00
Matei Zaharia
e75b1b5cb4
Change the default broadcast implementation to a simple HTTP-based
...
broadcast. Fixes #139.
2012-06-09 15:58:07 -07:00
Matei Zaharia
a96558caa3
Performance improvements to shuffle operations: in particular, preserve
...
RDD partitioning in more cases where it's possible, and use iterators
instead of materializing collections when doing joins.
2012-06-09 14:44:18 -07:00
Matei Zaharia
63051dd2bc
Merge in engine improvements from the Spark Streaming project, developed
...
jointly with Tathagata Das and Haoyuan Li. This commit imports the changes
and ports them to Mesos 0.9, but does not yet pass unit tests due to
various classes not supporting a graceful stop() yet.
2012-06-07 12:45:38 -07:00
Matei Zaharia
7e1c97fc4b
Merge branch 'master' into mesos-0.9
2012-06-06 16:48:59 -07:00
Matei Zaharia
048276799a
Commit task outputs to Hadoop-supported storage systems in parallel on the
...
cluster instead of on the master. Fixes #110.
2012-06-06 16:46:53 -07:00
Matei Zaharia
6888bc7191
Merge branch 'master' into mesos-0.9
2012-06-06 16:14:19 -07:00
Matei Zaharia
6ae2746d1e
Handle arrays that contain the same element many times better in
...
SizeEstimator. Also added a test for SizeEstimator. Fixes #136.
2012-06-06 16:13:02 -07:00
Matei Zaharia
dbc3c86ae3
Merge branch 'master' into mesos-0.9
...
Conflicts:
core/src/main/scala/spark/Executor.scala
2012-06-03 17:44:04 -07:00
Matei Zaharia
e141f644ca
Merge pull request #132 from Benky/rb-first-iteration
...
Little refactoring and unit tests for CacheTrackerActor
2012-05-26 13:15:06 -07:00
Richard Benkovsky
ae64920337
MesosScheduler refactoring
2012-05-22 11:04:54 +02:00
Richard Benkovsky
3a1bcd4028
Added tests for CacheTrackerActor
2012-05-22 11:04:54 +02:00
Richard Benkovsky
8f2f736d53
Little refactoring
2012-05-22 11:04:54 +02:00
Richard Benkovsky
f162fc2beb
Formatting fixed
2012-05-22 09:45:38 +02:00
Richard Benkovsky
565245871f
BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity
2012-05-20 22:13:35 +02:00
Richard Benkovsky
822a4be37d
Utils.memoryBytesToString fixed
2012-05-19 15:13:20 +02:00
Reynold Xin
d0c6e9f639
Made some RDD dependencies transient to reduce the amount of data needed
...
to be serialized in closure serialization. This can significantly reduce
the task setup time in Shark when the query involves a large number of
(Hive) partitions.
2012-05-16 14:16:55 -07:00
Reynold Xin
16461e2eda
Updated Cache's put method to use a case class for response. Previously
...
it was pretty ugly that put() should return -1 for failures.
2012-05-15 00:31:52 -07:00
Reynold Xin
019e48833f
Added the capacity to report cache usage status back to the cache
...
tracker. This is essential for building a dashboard to see the status of
caches on all slaves.
2012-05-14 18:39:04 -07:00
Matei Zaharia
f48742683a
Made caches dataset-aware so that they won't cyclically evict partitions
...
from the same dataset.
2012-05-06 20:14:40 -07:00
Matei Zaharia
bd2ab635a7
Fixed the way the JAR server is created after finding issue at Twitter
2012-05-05 20:05:15 -07:00
Matei Zaharia
32a4f4623c
Merge pull request #129 from mesos/rxin
...
Force serialize/deserialize task results in local execution mode.
2012-04-24 16:18:39 -07:00
Reynold Xin
9821cd4d42
Force serialize/deserialize task results in local execution mode.
2012-04-24 14:55:28 -07:00
Antonio
3e48818993
Removed commented-out System.exit call
2012-04-23 11:42:58 -07:00
Antonio
39d99168dc
Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions
2012-04-20 14:46:43 -07:00
Reynold Xin
e601b3b9e5
Added the ability to set environment variables in piped RDD.
2012-04-17 16:40:56 -07:00
Matei Zaharia
3b745176e0
Bug fix to pluggable closure serialization change
2012-04-12 17:53:02 +00:00
Matei Zaharia
112655f032
Merge pull request #121 from rxin/kryo-closure
...
Added an option (spark.closure.serializer) to specify the serializer for closures.
2012-04-10 14:21:02 -07:00
Reynold Xin
d295ccb43c
Added a closureSerializer field in SparkEnv and use it to serialize
...
tasks.
2012-04-10 13:29:46 -07:00
Reynold Xin
968f75f6af
Added an option (spark.closure.serializer) to specify the serializer for
...
closures. This enables using Kryo as the closure serializer.
2012-04-09 21:59:56 -07:00
Matei Zaharia
a69c0738d1
Merge branch 'master' into mesos-0.9
2012-04-08 23:41:36 -07:00
Matei Zaharia
a633974143
Merge branch 'master' of github.com:mesos/spark
2012-04-08 23:41:25 -07:00
Matei Zaharia
0229d5390f
Merge branch 'master' into mesos-0.9
2012-04-08 23:39:37 -07:00
Matei Zaharia
d401e1b3e8
Fix a possible deadlock in MesosScheduler
2012-04-08 23:38:49 -07:00
Ankur Dave
7be1c7b331
Report entry dropping in BoundedMemoryCache
2012-04-06 15:49:32 -07:00
Matei Zaharia
a8bb324ed9
Merge branch 'master' into mesos-0.9
2012-04-05 14:53:22 -07:00
Matei Zaharia
816d4e5840
Pass local IP address instead of hostname in spark.master.host. Fixes #117.
2012-04-05 14:53:17 -07:00
Matei Zaharia
335a6036ad
Converted some tabs to spaces
2012-04-05 11:58:01 -07:00
Matei Zaharia
8c95a85438
Use Runtime.maxMemory instead of Runtime.totalMemory in
...
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:39:35 -04:00
Matei Zaharia
03d5b3b48d
Use Runtime.maxMemory instead of Runtime.totalMemory in
...
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:38:19 -04:00
Matei Zaharia
dfa3b6b544
Fixes to work with the very latest Mesos 0.9 API
2012-03-29 22:12:35 -04:00
Matei Zaharia
4d52cc6738
Merge branch 'master' into mesos-0.9
2012-03-29 21:29:39 -04:00
Reynold Xin
42dcdbcb2f
Removed the extra spaces in OrderedRDDFunctions and SortedRDD.
2012-03-29 15:21:57 -07:00
Matei Zaharia
08cda89e8a
Further fixes to how Mesos is found and used
2012-03-17 13:39:14 -07:00
Matei Zaharia
3c3fdf6eca
Merge branch 'master' into mesos-0.9
2012-03-17 13:09:21 -07:00
Matei Zaharia
c7af538ac1
Some fixes to sorting for when the RDD has fewer elements than the
...
number of partitions we ask to partition it into. Also, removed a test
that was taking way too long to run.
2012-03-17 13:08:36 -07:00
Matei Zaharia
a099a63a8a
Initial work to make Spark compile with Mesos 0.9 and Hadoop 1.0
2012-03-17 12:31:34 -07:00
Matei Zaharia
a5e2b6a6bd
Merge pull request #112 from cengle/master
...
Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection
2012-03-06 13:38:32 -08:00
Matei Zaharia
97eee50825
Fixes a nasty bug that could happen when tasks fail, because calling
...
wait() with a timeout of 0 on a Java object means "wait forever".
2012-03-01 13:43:17 -08:00
Cliff Engle
dd68cb6099
Get key and value container from RecordReader
2012-02-29 16:33:23 -08:00
Matei Zaharia
1e10df0a46
Merge pull request #111 from alupher/master
...
Adding sorting to RDDs
2012-02-24 15:50:14 -08:00
Matei Zaharia
aa04f87cd2
Added support for parallel execution of jobs in DAGScheduler.
2012-02-19 22:50:23 -08:00
Antonio
620798161b
Added fixes to sorting
2012-02-13 00:07:39 -08:00
Matei Zaharia
2587ce1690
Fixed a deadlock that occurred with MesosScheduler due to an earlier
...
synchronization change
2012-02-11 21:22:45 -08:00
Antonio
e93f622665
Added sorting by key for pair RDDs
2012-02-11 00:56:28 -08:00
Matei Zaharia
98f008b721
Formatting fixes
2012-02-10 10:52:03 -08:00
Matei Zaharia
7660a8b12f
Merge branch 'formatting'
...
Conflicts:
core/src/main/scala/spark/DAGScheduler.scala
core/src/main/scala/spark/SimpleShuffleFetcher.scala
core/src/main/scala/spark/SparkContext.scala
2012-02-10 10:42:14 -08:00
haoyuan
194c42ab79
Code format.
2012-02-10 08:19:53 -08:00
Matei Zaharia
8f5ed51234
Delete Spark's temporary directories when the JVM exits.
2012-02-09 22:58:24 -08:00
Matei Zaharia
c0a0df3285
Made the default cache BoundedMemoryCache, and reduced its default size
2012-02-09 22:32:02 -08:00
Matei Zaharia
0e93891d3d
Replaced LocalFileShuffle with a non-singleton ShuffleManager class
...
and made DAGScheduler automatically set SparkEnv.
2012-02-09 22:14:56 -08:00
haoyuan
445e0bb1b5
Format the code a bit more.
2012-02-09 15:50:26 -08:00
haoyuan
651932e703
Format the code per the coding style agreed on by Matei/TD/Haoyuan
2012-02-09 13:26:23 -08:00
Matei Zaharia
e02dc83a5b
IO optimizations
2012-02-06 20:40:39 -08:00
Matei Zaharia
c40e766368
Use java.util.HashMap in shuffles
2012-02-06 19:20:25 -08:00
Matei Zaharia
b267175ab5
Synchronization fix in case SparkContext is used from multiple threads.
2012-02-06 14:28:18 -08:00
Hiral Patel
b47952342e
Add register immutable map to kryo serializer
2012-01-26 15:24:20 -08:00
Matei Zaharia
fabcc82528
Merge pull request #103 from edisontung/master
...
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia
fd5581a0d3
Fixed a failure recovery bug and added some tests for fault recovery.
2012-01-13 19:17:27 -08:00