Matei Zaharia
d71a358c46
Fixed a test that was getting extremely lucky before, and increased the
number of samples used for sorting
2012-09-26 00:25:34 -07:00
Matei Zaharia
051785c7e6
Several fixes to sampling issues pointed out by Henry Milner:
- takeSample was biased towards earlier partitions
- There were some range errors in takeSample
- SampledRDDs with replacement didn't produce appropriate counts
across partitions (we took exactly frac of each one)
2012-09-25 21:46:58 -07:00
Matei Zaharia
4d3339a3ec
Merge pull request #217 from rxin/dev
Added a method to RDD to expose the ClassManifest.
2012-09-24 23:52:32 -07:00
Reynold Xin
7a4cd92861
Renamed RDD.manifest to RDD.elementClassManifest
2012-09-24 23:42:33 -07:00
Matei Zaharia
296e24b440
Merge pull request #218 from rnpandya/dev
Scripts to start Spark under Windows
2012-09-24 21:10:31 -07:00
Reynold Xin
348bcbca1f
Added a method to RDD to expose the ClassManifest.
2012-09-24 16:56:27 -07:00
Ravi Pandya
39215357af
Windows command scripts for sbt and run
2012-09-24 15:43:19 -07:00
Matei Zaharia
6eeb379cf8
Fix some test issues
2012-09-24 15:39:58 -07:00
Matei Zaharia
f855e4fad2
Merge pull request #208 from rxin/dev
Separated ShuffledRDD into multiple classes.
2012-09-24 12:32:01 -07:00
root
107a5ca879
Make default number of parallel fetches slightly smaller since it doesn't seem to hurt performance much and it will cause slightly less GC.
2012-09-23 06:06:12 +00:00
root
e41cab04ca
Avoid creating an extra buffer when saving a stream of values as DISK_ONLY
2012-09-23 05:56:44 +00:00
Denny
afb7ccc838
HTTP File server fixes.
2012-09-21 10:58:13 -07:00
root
6d28dde370
Rename our toIterator method into asIterator to prevent confusion with the
Scala collection one, which often *copies* a collection.
2012-09-21 06:02:55 +00:00
root
a642051ade
Fixed a performance bug in BlockManager that was creating garbage when
returning deserialized, in-memory RDDs.
2012-09-21 05:42:21 +00:00
root
8feb5caacd
Fixed an issue with ordering of classloader setup that was causing Java deserializer to break
2012-09-21 05:13:19 +00:00
Reynold Xin
6b5980da79
Set a limited number of retries in standalone deploy mode.
2012-09-19 15:41:56 -07:00
Reynold Xin
397d3816e1
Separated ShuffledRDD into multiple classes: RepartitionShuffledRDD,
ShuffledSortedRDD, and ShuffledAggregatedRDD.
2012-09-19 12:31:45 -07:00
Denny
ca64d16a2d
When a file is downloaded, make it executable. That's necessary for scripts (e.g. in Shark)
2012-09-17 10:08:37 -07:00
Matei Zaharia
840cbcf849
Change default serializer to Java; it had accidentally become Kryo.
2012-09-13 17:19:26 -07:00
Matei Zaharia
b4dfa25c8a
Store shuffle map outputs as DISK_ONLY
2012-09-12 16:05:57 -07:00
Matei Zaharia
2d761e3353
Ported performance and FT improvements from latest streaming work
2012-09-12 14:54:40 -07:00
Matei Zaharia
9b4cd1648b
Fix bugs with Connection's shutdown callback failing to get its address
2012-09-12 14:54:14 -07:00
Matei Zaharia
9199775d41
Wait for Akka to really shut down in SparkEnv.stop()
2012-09-12 14:50:37 -07:00
Denny
5e4076e3f2
Merge branch 'dev' into feature/fileserver
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2012-09-11 16:57:17 -07:00
Denny
77873d2c8e
Formatting
2012-09-11 16:51:46 -07:00
Denny
24b9b37314
Subclass URLClassLoader instead of using reflection
2012-09-11 16:51:08 -07:00
Denny
31c53e917d
Use stageId as index for fileSet caches.
2012-09-11 16:10:45 -07:00
Matei Zaharia
943df48348
Merge branch 'dev' of github.com:mesos/spark into dev
2012-09-11 16:00:37 -07:00
Matei Zaharia
6d7f907e73
Manually merge pull request #175 by Imran Rashid
2012-09-11 16:00:06 -07:00
Reynold Xin
7af7c79ce5
Updated the logError call from the previous commit to conform to
logError API.
2012-09-11 14:32:24 -07:00
Reynold Xin
38b9119c96
Log entire exception (including stack trace) in BlockManagerWorker.
2012-09-11 11:31:35 -07:00
Denny
4d3471dd07
Fix serialization bugs and add local cluster tests
2012-09-10 15:39:58 -07:00
Denny
b864c36a30
Dynamically adding jar files and caching fileSets.
2012-09-10 12:49:09 -07:00
Denny
f275fb07da
General FileServer
A general fileserver for both JARs and regular files.
2012-09-10 12:48:59 -07:00
Matei Zaharia
a13780670d
Added a unit test for local-cluster mode and simplified some of the code involved in that
2012-09-10 12:48:58 -07:00
Denny
f2ac55840c
Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend
2012-09-10 12:48:58 -07:00
Denny
9ead8ab14e
Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner
2012-09-10 12:48:58 -07:00
Denny
8bb3c73977
Renamed spark-cluster to spark-local.
2012-09-10 12:48:58 -07:00
Denny
a367c20f49
Fix wrong counting
2012-09-10 12:48:57 -07:00
Denny
93fe331e6d
Delete old DeployUtils.
2012-09-10 12:48:57 -07:00
Denny
cf074f9c96
Renamed class.
2012-09-10 12:48:57 -07:00
Denny
3749f94184
Start a standalone cluster locally.
2012-09-10 12:48:57 -07:00
Matei Zaharia
995982b3c9
Added a unit test for local-cluster mode and simplified some of the code involved in that
2012-09-07 17:08:36 -07:00
Matei Zaharia
8d2fcc2832
Merge pull request #189 from dennybritz/feature/localcluster
Simulating a Spark standalone cluster locally
2012-09-07 15:43:43 -07:00
Denny
7ff9311add
Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend
2012-09-07 14:09:12 -07:00
Denny
4e7b264cf7
Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner
2012-09-07 11:39:44 -07:00
root
c2da64409a
Randomize the order of block fetches in getMultiple
2012-09-06 23:16:26 +00:00
Denny
886183e591
Renamed spark-cluster to spark-local.
2012-09-05 17:10:54 -07:00
Reynold Xin
c308fbcb79
Removed cache add/remove log messages from CacheTracker.
Added log messages on BlockManagerMaster to reflect block add/remove.
Also did some minor cleanup of storage package code.
2012-09-05 15:59:48 -07:00
Denny
babbca0a2f
Fix wrong counting
2012-09-04 22:04:18 -07:00
Denny
9326509f66
Delete old DeployUtils.
2012-09-04 21:15:23 -07:00
Denny
1588d4dbe6
Renamed class.
2012-09-04 21:13:25 -07:00
Denny
22dde6e020
Start a standalone cluster locally.
2012-09-04 20:56:30 -07:00
Matei Zaharia
a842c63044
Minor formatting fixes
2012-09-03 16:24:00 -07:00
Harvey
3076b038f4
Start fetching a remote block when a received remote block has been passed
to the reduce function
2012-09-01 12:01:35 -07:00
Matei Zaharia
389fb4cc54
End runJob() with a SparkException when a task fails too many times in
one of the cluster schedulers.
2012-08-31 17:47:43 -07:00
Mosharaf Chowdhury
31ffe8d528
Synchronization bug fix in broadcast implementations
2012-08-30 22:26:43 -07:00
Mosharaf Chowdhury
3883532545
Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations.
2012-08-30 21:43:00 -07:00
Matei Zaharia
a480dec6b2
Deserialize multi-get results in the caller's thread. This fixes an
issue with shared buffers in the KryoSerializer.
2012-08-30 20:01:06 -07:00
Reynold Xin
5945bcdcc5
Added a new flag in Aggregator to indicate applying map side combiners.
2012-08-29 23:32:08 -07:00
Reynold Xin
c68e820b2a
Merge branch 'dev' of github.com:mesos/spark into dev
2012-08-29 23:01:19 -07:00
Reynold Xin
940869dfda
Disable running combiners on map tasks when mergeCombiners function is
not specified by the user.
2012-08-29 23:00:02 -07:00
Matei Zaharia
bf2e9cb08e
Fault tolerance and block store fixes discovered through streaming tests.
2012-08-27 23:07:50 -07:00
Reynold Xin
3a6a95dc24
Removed the deserialization cache for ShuffleMapTask because it was
causing concurrency problems (some variables in Shark get set to null).
The cost of task deserialization on slaves is trivial compared with the
execution time of the task anyway.
2012-08-27 22:33:15 -07:00
Matei Zaharia
deedb9e7b7
Fix further issues with tests and broadcast.
The broadcast fix is to store values as MEMORY_ONLY_DESER instead of
MEMORY_ONLY, which will save substantial time on serialization.
2012-08-23 20:31:49 -07:00
Matei Zaharia
59b831b9d1
Fixed test failures due to broadcast not stopping correctly
2012-08-23 19:59:55 -07:00
Matei Zaharia
7310a6f499
Merge pull request #147 from mosharaf/dev
Broadcast refactoring/cleaning up
2012-08-23 19:38:28 -07:00
Matei Zaharia
25a6a39e6d
Added other SparkContext constructors to JavaSparkContext
2012-08-19 18:59:16 -07:00
Shivaram Venkataraman
1ea269110c
Move object size and pointer size initialization into a function to enable unit-testing
2012-08-13 13:31:45 -07:00
Shivaram Venkataraman
44661df9cc
If spark.test.useCompressedOops is set, use that to infer compressed oops
setting. This is useful to get a deterministic test case
2012-08-13 13:31:39 -07:00
Shivaram Venkataraman
0dd8fe73ba
Use HotSpotDiagnosticMXBean to check whether CompressedOops are in use
2012-08-13 13:31:29 -07:00
Shivaram Venkataraman
80104ce1da
Add link to Java wiki which specifies what changes with compressed oops
2012-08-13 13:31:21 -07:00
Shivaram Venkataraman
00ab5490b3
Changes to make size estimator more accurate. Fixes object size, pointer size
according to architecture and also aligns objects and arrays when computing
instance sizes. Verified using Eclipse Memory Analysis Tool (MAT)
2012-08-13 13:31:11 -07:00
Matei Zaharia
6ae3c375a9
Renamed apply() to call() in Java API and allowed it to throw Exceptions
2012-08-12 23:10:19 +02:00
Matei Zaharia
0141879c40
Use Promises instead of having a Future wait on a thread in
ConnectionManager.
2012-08-12 22:16:32 +02:00
Matei Zaharia
845a870242
Return remotely fetched blocks in a pipelined fashion from BlockManager
2012-08-12 20:01:38 +02:00
Matei Zaharia
e17ed9a21d
Switch to Akka futures in connection manager.
It's still not good because each Future ends up waiting on a lock, but
it seems to work better than Scala Actors, and more importantly it
allows us to use onComplete and other listeners on futures.
2012-08-12 19:40:37 +02:00
Matei Zaharia
ad8a7612a4
Changed multi-get method in BlockManager to return an iterator
2012-08-12 19:18:01 +02:00
Matei Zaharia
3c94e5c188
Merge pull request #168 from shivaram/dev
Use JavaConversion to get a scala iterator
2012-08-10 00:57:33 -07:00
Matei Zaharia
e463e7a333
Merge pull request #167 from JoshRosen/piped-rdd-fixes
Detect non-zero exit status from PipedRDD process
2012-08-10 00:56:42 -07:00
Josh Rosen
59c22fb444
Print exit status in PipedRDD failure exception.
2012-08-10 00:33:56 -07:00
Shivaram Venkataraman
1803cce692
Use an implicit conversion to get the scala iterator
2012-08-08 14:31:04 -07:00
Shivaram Venkataraman
674fcf56bf
Use JavaConversion to get a scala iterator
2012-08-08 14:10:23 -07:00
Shivaram Venkataraman
f4aaec7a48
Avoid a copy in ShuffleMapTask by creating an iterator that will be used by the
block manager.
2012-08-08 00:47:02 -07:00
Mosharaf Chowdhury
d821dd3ccc
BroadcastManager is a class now (replaced Broadcast object)
2012-08-05 01:10:51 -07:00
Mosharaf Chowdhury
b4804119f9
Merge remote-tracking branch 'upstream/dev' into dev
2012-08-04 20:42:12 -07:00
Matei Zaharia
88b016db2a
Merge pull request #160 from dennybritz/clusterscripts
Standalone cluster scripts
2012-08-04 17:45:20 -07:00
Mosharaf Chowdhury
1b0534af8f
Merge branch 'dev' into bc-bm
2012-08-04 00:30:08 -07:00
Mosharaf Chowdhury
d11b457e67
Merge remote-tracking branch 'upstream/dev' into dev
2012-08-04 00:28:10 -07:00
Mosharaf Chowdhury
24b7eb872c
Bug fixed. Broadcast now works with BlockManager.
2012-08-04 00:27:28 -07:00
Matei Zaharia
6601a6212b
Added a unit test for cross-partition balancing in sort, and changes to
RangePartitioner to make it pass. It turns out that the first partition
was always kind of small due to how we picked partition boundaries.
2012-08-03 16:40:45 -04:00
Harvey
1170de3757
Fix for partitioning when sorting in descending order
2012-08-03 16:40:38 -04:00
Paul Cavallaro
d05c0f97ca
Logging Throwables in Info and Debug
Logging Throwables in logInfo and logDebug instead of swallowing them.
Conflicts:
core/src/main/scala/spark/Logging.scala
2012-08-03 16:40:21 -04:00
Denny
0008994044
merged dev branch
2012-08-02 16:00:33 -07:00
Denny
53008c2d8a
Settings variables and bugfix for stop script.
2012-08-02 15:59:39 -07:00
Matei Zaharia
71a958b0b7
Merge branch 'dev' of github.com:mesos/spark into dev
Conflicts:
project/SparkBuild.scala
2012-08-02 17:23:13 -04:00
Denny
7312a5c30f
Use spray's implicit Marshaller for Futures.
2012-08-02 14:11:27 -07:00
Denny
ba7e30fb5e
Mostly stylistic changes.
2012-08-02 13:55:09 -07:00
Shivaram Venkataraman
1a07bb9ba4
Avoid an extra partition copy by passing an iterator to blockManager.put
2012-08-02 12:22:33 -07:00
Shivaram Venkataraman
6790908b11
Use maxMemory to better estimate memory available for BlockManager cache
2012-08-02 12:05:05 -07:00
Denny
863c31b7c1
Moved resources into static folder
2012-08-02 09:48:36 -07:00
Denny
0ee44c225e
Spark standalone mode cluster scripts.
Heavily inspired by Hadoop cluster scripts ;-)
2012-08-01 20:38:52 -07:00
Denny
6c670c37dd
Webui improvements.
2012-08-01 19:47:57 -07:00
Denny
1b29e90a79
merge dev branch
2012-08-01 14:06:09 -07:00
Denny
011220fa55
Compact job page.
2012-08-01 11:26:45 -07:00
Denny
7a295fee96
Spark WebUI Implementation.
2012-08-01 11:01:09 -07:00
Mosharaf Chowdhury
f23395e8c5
Merge remote-tracking branch 'upstream/dev' into dev
2012-07-30 19:39:49 -07:00
Matei Zaharia
3ee2530c0c
Merge branch 'block-manager-fix' into dev
2012-07-30 13:58:46 -07:00
Matei Zaharia
400221f851
Merge branch 'dev' of git://github.com/tdas/spark into dev
2012-07-30 13:54:57 -07:00
Matei Zaharia
ed1b0f8388
Made BlockManagerMaster no longer be a singleton.
Also cleaned up a few formatting things throughout block manager code.
2012-07-30 13:53:47 -07:00
Matei Zaharia
f471c82558
Various reorganization and formatting fixes
2012-07-30 11:24:01 -07:00
Mosharaf Chowdhury
5932a87cac
Merge remote-tracking branch 'upstream/dev' into dev
2012-07-29 18:20:45 -07:00
Imran Rashid
f7149c5e46
tasks cannot access value of accumulator
2012-07-28 20:16:17 -07:00
Imran Rashid
244cbbe33a
one more minor cleanup to scaladoc
2012-07-28 20:16:10 -07:00
Imran Rashid
3b392c67db
fix up scaladoc, naming of type parameters
2012-07-28 20:16:01 -07:00
Imran Rashid
f1face1ea9
rename addToAccum to addAccumulator
2012-07-28 20:16:01 -07:00
Imran Rashid
2d666b9d76
add some functionality to Vector, delete copy in AccumulatorSuite
2012-07-28 20:15:51 -07:00
Imran Rashid
edc6972f8e
move Vector class into core and spark.util package
2012-07-28 20:15:42 -07:00
Imran Rashid
83659af11c
Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=)
Conflicts:
core/src/main/scala/spark/Accumulators.scala
2012-07-28 20:13:51 -07:00
Imran Rashid
79d58ed20a
improve scaladoc
2012-07-28 20:12:41 -07:00
Imran Rashid
ae07f3864c
add Accumulatable, add corresponding docs & tests for accumulators
2012-07-28 20:12:41 -07:00
Matei Zaharia
dee8ff1b9d
Added a second version of union() without varargs.
2012-07-27 16:27:52 -07:00
Tathagata Das
cf429699e1
Updated the new checkpoint RDD to remember partitioning of the original RDD.
2012-07-27 23:16:37 +00:00
Mosharaf Chowdhury
b5be936d7c
Broadcasts using BlockManager instead of BoundedMemoryCache
2012-07-27 15:38:46 -07:00
Mosharaf Chowdhury
1f19fbb8db
Merge remote-tracking branch 'upstream/dev' into dev
Conflicts:
core/src/main/scala/spark/broadcast/Broadcast.scala
2012-07-27 15:18:23 -07:00
Matei Zaharia
b51d733a57
Fixed Java union methods having same erasure.
Changed union() methods on lists to take a separate "first element"
argument in order to differentiate them to the compiler, because Java 7
considered it an error to have them all take Lists parameterized with
different types.
2012-07-27 12:23:27 -07:00
Tathagata Das
3e271c3b61
Merge branch 'dev' of github.com:tdas/spark into dev
2012-07-27 12:01:04 -07:00
Tathagata Das
024905f682
Added BlockRDD and a first-cut version of checkpoint() to RDD class.
2012-07-27 12:00:49 -07:00
Tathagata Das
d1eee44a03
Fixed more stuff in BoundedMemoryCache.
2012-07-27 18:33:32 +00:00
Tathagata Das
d1b7f41671
Fixed bug in BoundedMemoryCache.
2012-07-27 09:00:45 -07:00
Tathagata Das
435d129bec
Fixed bugs in block dropping code of MemoryStore and changed synchronized HashMap to ConcurrentHashMap in BlockManager.
2012-07-27 10:02:26 +00:00
Tathagata Das
0426769f89
Modified the block dropping code for better performance.
2012-07-26 20:53:45 -07:00
Matei Zaharia
5c5aa2ff81
Merge pull request #153 from JoshRosen/new-java-api
Java API
2012-07-26 17:20:52 -07:00
Josh Rosen
c5e2810dc7
Add persist(), splits(), glom(), and mapPartitions() to Java API.
2012-07-26 12:46:47 -07:00
Josh Rosen
bf61c10072
Detect non-zero exit status from PipedRDD process.
2012-07-26 11:32:59 -07:00
Josh Rosen
6a78e88237
Minor cleanup and optimizations in Java API.
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
2012-07-24 09:47:00 -07:00
Denny
4f4a34c025
Stylistic changes
Conflicts:
core/src/test/scala/spark/MesosSchedulerSuite.scala
2012-07-23 16:32:20 -07:00
Matei Zaharia
600e99728d
Fix a bug where an input path was added to a Hadoop job configuration twice
2012-07-23 16:16:19 -07:00
Josh Rosen
042dcbde33
Add type annotations to Java API methods.
...
Add missing Scala Map to java.util.Map conversions.
2012-07-22 17:35:29 -07:00
Josh Rosen
e23938c3be
Use mapValues() in JavaPairRDD.cogroupResultToJava().
2012-07-22 15:10:01 -07:00
Josh Rosen
01dce3f569
Add Java API
Add distinct() method to RDD.
Fix bug in DoubleRDDFunctions.
2012-07-18 17:34:29 -07:00
Mosharaf Chowdhury
85cd9979f2
Fix for isLocal
2012-07-13 01:13:14 -07:00
Mosharaf Chowdhury
1c83fd4b66
Merged with Upstream dev
2012-07-13 01:08:28 -07:00
Mosharaf Chowdhury
bb4ee580fa
Cleaning BitTorrentBroadcast code...
2012-07-13 01:04:01 -07:00
Mosharaf Chowdhury
8ccffe21da
Cleaned TreeBroadcast
2012-07-13 00:54:25 -07:00
Matei Zaharia
628bb5ca7f
Allow null keys in Spark's reduce and group by
2012-07-12 18:36:02 -07:00
Matei Zaharia
e2a67a8024
Fixes to coarse-grained Mesos scheduler in dealing with failed nodes
2012-07-12 18:21:52 -07:00
Matei Zaharia
be622cf867
Formatting
2012-07-11 17:31:44 -07:00
Matei Zaharia
e8ae77df24
Added more methods for loading/saving with new Hadoop API
2012-07-11 17:31:33 -07:00
Mosharaf Chowdhury
34999d97f5
Added stop() to the Broadcast subsystem
2012-07-10 01:03:47 -07:00