Matei Zaharia
fd0374b9de
Comment
2012-09-29 21:43:06 -07:00
Matei Zaharia
5718cef2a4
Removed Logging trait from CoalescedRDD since we don't log anything
2012-09-29 21:40:43 -07:00
Matei Zaharia
143ef4f90d
Added a CoalescedRDD class for reducing the number of partitions in an RDD.
2012-09-29 21:30:52 -07:00
Matei Zaharia
c45758ddde
Comment
2012-09-29 20:27:54 -07:00
Matei Zaharia
ebd52347b5
Merge branch 'dev' of github.com:mesos/spark into dev
2012-09-29 20:22:31 -07:00
Matei Zaharia
9b326d01e9
Made BlockManager unmap memory-mapped files when necessary to reduce the
...
number of open files. Also optimized sending of disk-based blocks.
2012-09-29 20:21:54 -07:00
Matei Zaharia
2f11e3c285
Merge pull request #227 from JoshRosen/fix/distinct_numsplits
...
Allow controlling number of splits in distinct().
2012-09-28 23:57:24 -07:00
Josh Rosen
8654165e69
Use null as dummy value in distinct().
2012-09-28 23:55:17 -07:00
Josh Rosen
37c199bbb0
Allow controlling number of splits in distinct().
2012-09-28 23:44:19 -07:00
Matei Zaharia
56dcad5936
Don't create a Cache in SparkEnv because we don't use it
2012-09-28 23:40:56 -07:00
Matei Zaharia
1d44644f4f
Logging tweaks
2012-09-28 23:28:16 -07:00
Matei Zaharia
815d6bd69a
Renamed subdirs option
2012-09-28 19:02:41 -07:00
Matei Zaharia
e54e1d7043
Made subdirs per local dir configurable, and reduced lock usage a bit
2012-09-28 19:00:50 -07:00
Matei Zaharia
ae8c7d6cfa
Made disk store use multiple directories, deleted ShuffleManager
2012-09-28 18:28:13 -07:00
Matei Zaharia
3d7267999d
Print and track user call sites in more places in Spark
2012-09-28 17:42:00 -07:00
Matei Zaharia
9f6efbf06a
Merge pull request #225 from pwendell/dev
...
Log message which records RDD origin
2012-09-28 16:28:07 -07:00
Matei Zaharia
0121a26bd1
Changed the way tasks' dependency files are sent to workers so that
...
custom serializers or Kryo registrators can be loaded.
2012-09-28 16:14:05 -07:00
Patrick Wendell
9fc78f8f29
Fixing some whitespace issues
2012-09-28 16:05:50 -07:00
Patrick Wendell
bc909c2903
Changes based on Matei's comments
2012-09-28 16:04:36 -07:00
Patrick Wendell
c387e40fb1
Log message which records RDD origin
...
This adds tracking to determine the "origin" of an RDD. Origin is defined by
the boundary between the user's code and the spark code, during an RDD's
instantiation. It is meant to help users understand where a Spark RDD is
coming from in their code.
This patch also logs origin data when stages are submitted to the scheduler.
Finally, it adds a new log message to fix an inconsitency in the way that
dependent stages (those missing parents) and independent stages (those
without) are logged during submission.
2012-09-28 15:51:46 -07:00
Matei Zaharia
2a8bfbca00
Fixed a bug where isLocal was set to false when using local[K]
2012-09-28 14:50:54 -07:00
Matei Zaharia
4a138403ef
Fix a bug in JAR fetcher that made it always fetch the JAR
2012-09-27 21:32:06 -07:00
Matei Zaharia
009b0e37e7
Added an option to compress blocks in the block store
2012-09-27 18:45:44 -07:00
Matei Zaharia
7bcb08cef5
Renamed storage levels to something cleaner; fixes #223 .
2012-09-27 17:50:59 -07:00
Matei Zaharia
920fab23c3
Merge pull request #222 from rxin/dev
...
Added MapPartitionsWithSplitRDD.
2012-09-26 23:16:45 -07:00
Matei Zaharia
ea05fc130b
Updates to standalone cluster, web UI and deploy docs.
2012-09-26 22:54:39 -07:00
Matei Zaharia
1ef4f0fbd2
Allow controlling number of splits in sortByKey.
2012-09-26 19:18:47 -07:00
Reynold Xin
1ad1331a34
Added MapPartitionsWithSplitRDD.
2012-09-26 17:11:28 -07:00
Matei Zaharia
ee71fa49c1
Look for Kryo registrator using context class loader
2012-09-26 14:15:16 -07:00
Matei Zaharia
d71a358c46
Fixed a test that was getting extremely lucky before, and increased the
...
number of samples used for sorting
2012-09-26 00:25:34 -07:00
Matei Zaharia
051785c7e6
Several fixes to sampling issues pointed out by Henry Milner:
...
- takeSample was biased towards earlier partitions
- There were some range errors in takeSample
- SampledRDDs with replacement didn't produce appropriate counts
across partitions (we took exactly frac of each one)
2012-09-25 21:46:58 -07:00
Matei Zaharia
4d3339a3ec
Merge pull request #217 from rxin/dev
...
Added a method to RDD to expose the ClassManifest.
2012-09-24 23:52:32 -07:00
Reynold Xin
7a4cd92861
Renamed RDD.manifest to RDD.elementClassManifest
2012-09-24 23:42:33 -07:00
Matei Zaharia
296e24b440
Merge pull request #218 from rnpandya/dev
...
Scripts to start Spark under windows
2012-09-24 21:10:31 -07:00
Reynold Xin
348bcbca1f
Added a method to RDD to expose the ClassManifest.
2012-09-24 16:56:27 -07:00
Ravi Pandya
39215357af
Windows command scripts for sbt and run
2012-09-24 15:43:19 -07:00
Matei Zaharia
6eeb379cf8
Fix some test issues
2012-09-24 15:39:58 -07:00
Matei Zaharia
f855e4fad2
Merge pull request #208 from rxin/dev
...
Separated ShuffledRDD into multiple classes.
2012-09-24 12:32:01 -07:00
root
107a5ca879
Make default number of parallel fetches slightly smaller since it doesn't seem to hurt performance much and it will cause slightly less GC.
2012-09-23 06:06:12 +00:00
root
e41cab04ca
Avoid creating an extra buffer when saving a stream of values as DISK_ONLY
2012-09-23 05:56:44 +00:00
Denny
afb7ccc838
HTTP File server fixes.
2012-09-21 10:58:13 -07:00
root
6d28dde370
Rename our toIterator method into asIterator to prevent confusion with the
...
Scala collection one, which often *copies* a collection.
2012-09-21 06:02:55 +00:00
root
a642051ade
Fixed a performance bug in BlockManager that was creating garbage when
...
returning deserialized, in-memory RDDs.
2012-09-21 05:42:21 +00:00
root
8feb5caacd
Fixed an issue with ordering of classloader setup that was causing Java deserializer to break
2012-09-21 05:13:19 +00:00
Reynold Xin
6b5980da79
Set a limited number of retry in standalone deploy mode.
2012-09-19 15:41:56 -07:00
Reynold Xin
397d3816e1
Separated ShuffledRDD into multiple classes: RepartitionShuffledRDD,
...
ShuffledSortedRDD, and ShuffledAggregatedRDD.
2012-09-19 12:31:45 -07:00
Denny
ca64d16a2d
When a file is downloaded, make it executable. That's neccsary for scripts (e.g. in Shark)
2012-09-17 10:08:37 -07:00
Matei Zaharia
840cbcf849
Change default serializer to Java.. it had accidentally become Kryo.
2012-09-13 17:19:26 -07:00
Matei Zaharia
b4dfa25c8a
Store shuffle map outputs as DISK_ONLY
2012-09-12 16:05:57 -07:00
Matei Zaharia
2d761e3353
Ported performance and FT improvements from latest streaming work
2012-09-12 14:54:40 -07:00
Matei Zaharia
9b4cd1648b
Fix bugs with Connection's shutdown callback failing to get its address
2012-09-12 14:54:14 -07:00
Matei Zaharia
9199775d41
Wait for Akka to really shut down in SparkEnv.stop()
2012-09-12 14:50:37 -07:00
Denny
5e4076e3f2
Merge branch 'dev' into feature/fileserver
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2012-09-11 16:57:17 -07:00
Denny
77873d2c8e
Formatting
2012-09-11 16:51:46 -07:00
Denny
24b9b37314
Subclass URLClassLoader instead of using reflection
2012-09-11 16:51:08 -07:00
Denny
31c53e917d
Use stageId as index for fileSet caches.
2012-09-11 16:10:45 -07:00
Matei Zaharia
943df48348
Merge branch 'dev' of github.com:mesos/spark into dev
2012-09-11 16:00:37 -07:00
Matei Zaharia
6d7f907e73
Manually merge pull request #175 by Imran Rashid
2012-09-11 16:00:06 -07:00
Reynold Xin
7af7c79ce5
Updated the logError call from the previous commit to conform to
...
logError API.
2012-09-11 14:32:24 -07:00
Reynold Xin
38b9119c96
Log entire exception (including stack trace) in BlockManagerWorker.
2012-09-11 11:31:35 -07:00
Denny
4d3471dd07
Fix serialization bugs and added local cluster tests
2012-09-10 15:39:58 -07:00
Tathagata Das
c63a606458
Made NewHadoopRDD broadcast its job configuration (same as HadoopRDD).
2012-09-10 19:51:27 +00:00
Denny
b864c36a30
Dynamically adding jar files and caching fileSets.
2012-09-10 12:49:09 -07:00
Denny
f275fb07da
General FileServer
...
A general fileserver for both JARs and regular files.
2012-09-10 12:48:59 -07:00
Matei Zaharia
a13780670d
Added a unit test for local-cluster mode and simplified some of the code involved in that
2012-09-10 12:48:58 -07:00
Denny
f2ac55840c
Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend
2012-09-10 12:48:58 -07:00
Denny
9ead8ab14e
Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner
2012-09-10 12:48:58 -07:00
Denny
8bb3c73977
Renamed spark-cluster to spark-local.
2012-09-10 12:48:58 -07:00
Denny
a367c20f49
Fix wrong counting
2012-09-10 12:48:57 -07:00
Denny
93fe331e6d
Delete old DeployUtils.
2012-09-10 12:48:57 -07:00
Denny
cf074f9c96
Renamed class.
2012-09-10 12:48:57 -07:00
Denny
3749f94184
Start a standalone cluster locally.
2012-09-10 12:48:57 -07:00
Matei Zaharia
995982b3c9
Added a unit test for local-cluster mode and simplified some of the code involved in that
2012-09-07 17:08:36 -07:00
Matei Zaharia
8d2fcc2832
Merge pull request #189 from dennybritz/feature/localcluster
...
Simulating a Spark standalone cluster locally
2012-09-07 15:43:43 -07:00
Denny
7ff9311add
Add shutdown hook to Executor Runner and execute code to shutdown local cluster in Scheduler Backend
2012-09-07 14:09:12 -07:00
Denny
4e7b264cf7
Set SPARK_LAUNCH_WITH_SCALA=0 in Executor Runner
2012-09-07 11:39:44 -07:00
haoyuan
db08a362aa
commit opt for grep scalibility test.
2012-09-07 02:17:52 +00:00
root
c2da64409a
Randomize the order of block fetches in getMultiple
2012-09-06 23:16:26 +00:00
root
9ef90c95f4
Bug fix
2012-09-06 00:43:46 +00:00
root
2fa6d999fd
Tuning Akka more
2012-09-06 00:16:39 +00:00
Denny
886183e591
Renamed spark-cluster to spark-local.
2012-09-05 17:10:54 -07:00
root
215544820f
Serialize map output locations more efficiently, and only once, in MapOutputTracker
2012-09-05 23:54:04 +00:00
root
dc68febdce
User Spark's closure serializer for the ShuffleMapTask cache
2012-09-05 23:06:59 +00:00
Reynold Xin
c308fbcb79
Removed cache add/remove log messages from CacheTracker.
...
Added log messages on BlockManagerMaster to reflect block add/remove.
Also did some minor cleanup of storage package code.
2012-09-05 15:59:48 -07:00
root
ed937a821f
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 22:26:49 +00:00
root
1d6b36d3c3
Further tuning for network performance
2012-09-05 22:26:37 +00:00
root
3fa0d7f0c9
Serialize BlockRDD more efficiently
2012-09-05 08:28:15 +00:00
root
4a5d0d249e
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 08:23:09 +00:00
root
efc7668d16
Allow serializing HttpBroadcast through Kryo
2012-09-05 08:22:57 +00:00
root
75487b2f5a
Broadcast the JobConf in HadoopRDD to reduce task sizes
2012-09-05 08:14:50 +00:00
root
b7ad291ac5
Tuning Akka for more connections
2012-09-05 07:08:07 +00:00
root
fc186dc18a
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-05 05:53:18 +00:00
root
4ea032a142
Some changes to make important log output visible even if we set the logging to WARNING
2012-09-05 05:53:07 +00:00
Denny
babbca0a2f
Fix wrong counting
2012-09-04 22:04:18 -07:00
Denny
9326509f66
Delete old DeployUtils.
2012-09-04 21:15:23 -07:00
Denny
1588d4dbe6
Renamed class.
2012-09-04 21:13:25 -07:00
Denny
22dde6e020
Start a standalone cluster locally.
2012-09-04 20:56:30 -07:00
Tathagata Das
7c09ad0e04
Changed DStream member access permissions from private to protected. Updated StateDStream to checkpoint RDDs and forget lineage.
2012-09-04 19:11:49 -07:00
Matei Zaharia
a842c63044
Minor formatting fixes
2012-09-03 16:24:00 -07:00
Tathagata Das
b8e9e8ea78
Merge branch 'dev' of github.com:radlab/spark into dev
2012-09-02 02:35:32 -07:00
root
ceabf71257
tweaks
2012-09-01 21:52:42 +00:00
root
6025889be0
More raw network receiver programs
2012-09-01 20:51:07 +00:00
Harvey
3076b038f4
Start fetching a remote block when a received remote block has been passed
...
to the reduce function
2012-09-01 12:01:35 -07:00
Matei Zaharia
f84d2bbe55
Bug fixes to RateLimitedOutputStream
2012-09-01 00:31:15 -07:00
Matei Zaharia
44758aa8e2
First work towards a RawInputDStream and a sender program for it.
2012-09-01 00:17:59 -07:00
root
c42e7ac282
More block manager fixes
2012-09-01 04:31:11 +00:00
Matei Zaharia
389fb4cc54
End runJob() with a SparkException when a task fails too many times in
...
one of the cluster schedulers.
2012-08-31 17:47:43 -07:00
root
113277549c
Really fixed the replication-3 issue. The problem was a few buffers not being rewound.
2012-08-31 05:39:35 +00:00
Mosharaf Chowdhury
31ffe8d528
Synchronization bug fix in broadcast implementations
2012-08-30 22:26:43 -07:00
Matei Zaharia
101ae493e2
Replicate serialized blocks properly, without sharing a ByteBuffer.
2012-08-30 22:24:14 -07:00
Mosharaf Chowdhury
3883532545
Bug fix. Fixed log messages. Updated BroadcastTest example to have iterations.
2012-08-30 21:43:00 -07:00
Matei Zaharia
a480dec6b2
Deserialize multi-get results in the caller's thread. This fixes an
...
issue with shared buffers in the KryoSerializer.
2012-08-30 20:01:06 -07:00
Matei Zaharia
1b3e3352eb
Deserialize multi-get results in the caller's thread. This fixes an
...
issue with shared buffers with the KryoSerializer.
2012-08-30 17:59:25 -07:00
root
c4366eb764
Fixes to ShuffleFetcher
2012-08-31 00:34:24 +00:00
Reynold Xin
a8a2a08a1a
Added a test for testing map-side combine on/off switch.
2012-08-30 12:34:28 -07:00
Reynold Xin
5945bcdcc5
Added a new flag in Aggregator to indicate applying map side combiners.
2012-08-29 23:32:08 -07:00
Reynold Xin
c68e820b2a
Merge branch 'dev' of github.com:mesos/spark into dev
2012-08-29 23:01:19 -07:00
Reynold Xin
940869dfda
Disable running combiners on map tasks when mergeCombiners function is
...
not specified by the user.
2012-08-29 23:00:02 -07:00
Tathagata Das
4db3a96766
Made minor changes to reduce compilation errors in Eclipse. Twirl stuff still does not compile in Eclipse.
2012-08-29 13:04:01 -07:00
Matei Zaharia
bf2e9cb08e
Fault tolerance and block store fixes discovered through streaming tests.
2012-08-27 23:07:50 -07:00
Matei Zaharia
17af2df0cd
Log levels
2012-08-27 23:07:32 -07:00
Matei Zaharia
b4a2214218
More fault tolerance fixes to catch lost tasks
2012-08-27 22:49:29 -07:00
Reynold Xin
3a6a95dc24
Removed the deserialization cache for ShuffleMapTask because it was
...
causing concurrency problems (some variables in Shark get set to null).
The cost of task deserialization on slaves is trivial compared with the
execution time of the task anyway.
2012-08-27 22:33:15 -07:00
Matei Zaharia
b914cd0dfa
Serialize generation correctly in ShuffleMapTask
2012-08-26 20:07:59 -07:00
Matei Zaharia
69c2ab0408
logging
2012-08-26 20:00:58 -07:00
Matei Zaharia
117e3f8c86
Fix a bug that was causing FetchFailedException not to be thrown
2012-08-26 19:52:56 -07:00
Matei Zaharia
3c9c44a8d3
More helpful log messages
2012-08-26 19:37:43 -07:00
Matei Zaharia
26dfd20c9a
Detect disconnected slaves in StandaloneScheduler
2012-08-26 18:56:56 -07:00
Matei Zaharia
29e83f39e9
Fix replication with MEMORY_ONLY_DESER_2
2012-08-26 18:16:25 -07:00
Matei Zaharia
06ef7c3d1b
Less debug info
2012-08-26 16:29:20 -07:00
Matei Zaharia
741899b21e
Fix sendMessageReliablySync
2012-08-26 16:26:06 -07:00
Matei Zaharia
5a8015d2db
Merge remote-tracking branch 'public/dev' into dev
2012-08-24 16:11:44 -07:00
Matei Zaharia
2c16ae36d7
Set log level in tests to WARN
2012-08-23 20:38:14 -07:00
Matei Zaharia
deedb9e7b7
Fix further issues with tests and broadcast.
...
The broadcast fix is to store values as MEMORY_ONLY_DESER instead of
MEMORY_ONLY, which will save substantial time on serialization.
2012-08-23 20:31:49 -07:00
Matei Zaharia
59b831b9d1
Fixed test failures due to broadcast not stopping correctly
2012-08-23 19:59:55 -07:00
Matei Zaharia
7310a6f499
Merge pull request #147 from mosharaf/dev
...
Broadcast refactoring/cleaning up
2012-08-23 19:38:28 -07:00
Matei Zaharia
25a6a39e6d
Added other SparkContext constructors to JavaSparkContext
2012-08-19 18:59:16 -07:00
Shivaram Venkataraman
0f4fbb057b
Change BlockManagerSuite test cases to use a deterministic size estimator and
...
update the results to match the new estimates
2012-08-13 13:32:23 -07:00
Shivaram Venkataraman
22ba3a3f77
Add test-cases for 32-bit and no-compressed oops scenarios.
2012-08-13 13:32:10 -07:00
Shivaram Venkataraman
1f68c4b03b
Update test cases to match the new size estimates. Uses 64-bit and compressed
...
oops setting to get deterministic results
2012-08-13 13:31:54 -07:00
Shivaram Venkataraman
1ea269110c
Move object size and pointer size initialization into a function to enable unit-testing
2012-08-13 13:31:45 -07:00
Shivaram Venkataraman
44661df9cc
If spark.test.useCompressedOops is set, use that to infer compressed oops
...
setting. This is useful to get a deterministic test case
2012-08-13 13:31:39 -07:00
Shivaram Venkataraman
0dd8fe73ba
Use HotSpotDiagnosticMXBean to get if CompressedOops are in use or not
2012-08-13 13:31:29 -07:00
Shivaram Venkataraman
80104ce1da
Add link to Java wiki which specifies what changes with compressed oops
2012-08-13 13:31:21 -07:00
Shivaram Venkataraman
00ab5490b3
Changes to make size estimator more accurate. Fixes object size, pointer size
...
according to architecture and also aligns objects and arrays when computing
instance sizes. Verified using Eclipse Memory Analysis Tool (MAT)
2012-08-13 13:31:11 -07:00
Matei Zaharia
6ae3c375a9
Renamed apply() to call() in Java API and allowed it to throw Exceptions
2012-08-12 23:10:19 +02:00
Matei Zaharia
0141879c40
Use Promises instead of having a Future wait on a thread in
...
ConnectionManager.
2012-08-12 22:16:32 +02:00
Matei Zaharia
845a870242
Return remotely fetched blocks in a pipelined fashion from BlockManager
2012-08-12 20:01:38 +02:00
Matei Zaharia
e17ed9a21d
Switch to Akka futures in connection manager.
...
It's still not good because each Future ends up waiting on a lock, but
it seems to work better than Scala Actors, and more importantly it
allows us to use onComplete and other listeners on futures.
2012-08-12 19:40:37 +02:00
Matei Zaharia
ad8a7612a4
Changed multi-get method in BlockManager to return an iterator
2012-08-12 19:18:01 +02:00