Matei Zaharia
3db1e17baa
Merge pull request #620 from jerryshao/master
...
Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations
2013-05-27 21:31:43 -07:00
Matei Zaharia
e8d4b6c296
Merge pull request #529 from xiajunluan/master
...
[SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler
2013-05-25 21:09:03 -07:00
Reynold Xin
26962c9340
Automatically configure Netty port. This makes unit tests using local-cluster pass. Previously they were failing because Netty was trying to bind to the same port for all processes.
Pair programmed with @shivaram.
2013-05-24 16:39:33 -07:00
Reynold Xin
6ea085169d
Fixed the bug that shuffle serializer is ignored by the new shuffle block iterators for local blocks. Also added a unit test for that.
2013-05-24 14:08:37 -07:00
jerryshao
bd3ea8f2a6
fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException
2013-05-24 14:26:19 +08:00
Charles Reiss
f350f14084
Use ARRAY_SAMPLE_SIZE constant instead of 100.0
2013-05-21 18:11:33 -07:00
Andrew xia
ecd6d75c6a
fix bug of unit tests
2013-05-21 06:49:23 +08:00
Reynold Xin
5912cc4967
Merge pull request #610 from JoshRosen/spark-747
...
Throw exception if TaskResult exceeds Akka frame size
2013-05-17 19:58:40 -07:00
Reynold Xin
8d78c5f89f
Changed the logging level from info to warning when addJar(null) is called.
2013-05-17 18:51:35 -07:00
Andrew xia
3d4672eaa9
Merge branch 'master' into xiajunluan
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala
core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-05-18 07:28:03 +08:00
Andrew xia
d19753b9c7
expose TaskSetManager type to resourceOffer function in ClusterScheduler
2013-05-18 06:45:19 +08:00
Andrew xia
c6e2770bfe
Fix ClusterScheduler bug to avoid allocating tasks to same slave
2013-05-17 05:10:38 +08:00
Mridul Muralidharan
f0881f8d48
Hope this does not turn into a bike shed change
2013-05-17 01:58:50 +05:30
Mridul Muralidharan
feddd2530d
Filter out nulls - prevent NPE
2013-05-16 17:49:14 +05:30
Josh Rosen
b8e46b6074
Abort job if result exceeds Akka frame size; add test.
2013-05-16 01:57:57 -07:00
Matei Zaharia
2f576aba8f
Merge pull request #602 from rxin/shufflemerge
...
Manual merge & cleanup of Shane's Shuffle Performance Optimization
2013-05-15 18:06:24 -07:00
Reynold Xin
203d7b7c14
Merge pull request #593 from squito/driver_ui_link
...
Master UI has link to Application UI
2013-05-15 00:47:20 -07:00
Reynold Xin
f3491cb89b
Merge branch 'master' of github.com:mesos/spark into shufflemerge
...
Conflicts:
core/src/main/scala/spark/storage/BlockManager.scala
core/src/test/scala/spark/DistributedSuite.scala
project/SparkBuild.scala
2013-05-15 00:31:52 -07:00
Reynold Xin
f9d40a5848
Added a comment in JdbcRDD for example usage.
2013-05-14 23:29:57 -07:00
Reynold Xin
81ad2fa331
Merge branch 'jdbc' of github.com:koeninger/spark
...
Conflicts:
project/SparkBuild.scala
2013-05-14 23:12:00 -07:00
Imran Rashid
38d4b97c6d
use thread's classloader when deserializing task results; ClassNotFoundException includes classloader
2013-05-14 22:32:14 -07:00
Imran Rashid
d7d1da79d3
when Akka starts, use Akka's default classloader (current thread)
2013-05-14 22:32:09 -07:00
Matei Zaharia
016ac86830
Merge pull request #601 from rxin/emptyrdd-master
...
EmptyRDD (master branch 0.8)
2013-05-13 21:45:36 -07:00
Matei Zaharia
4b354e0a08
Merge pull request #589 from mridulm/master
...
Add support for instance local scheduling
2013-05-13 17:39:19 -07:00
Patrick Wendell
7f0833647b
Capturing class name
2013-05-12 07:54:03 -07:00
Patrick Wendell
72b9c4cb6e
Small fix
2013-05-11 23:53:50 -07:00
Patrick Wendell
1c15b85051
Removing import
2013-05-11 23:52:53 -07:00
Patrick Wendell
059ab88754
Changing technique to use same code path in all cases
2013-05-11 23:50:54 -07:00
Cody Koeninger
3da2305ed0
code cleanup per rxin comments
2013-05-11 23:59:07 -05:00
Josh Rosen
440719109e
Throw exception if task result exceeds Akka frame size.
...
This partially addresses SPARK-747.
2013-05-11 19:17:13 -07:00
Patrick Wendell
0345954530
SPARK-738: Spark should detect and squash nonserializable exceptions
2013-05-11 14:17:09 -07:00
Mark Hamstra
6e6b3e0d7e
Actually use the cleaned closure in foreachPartition
2013-05-10 13:02:34 -07:00
Imran Rashid
0ab818d508
fix linebreak
2013-05-09 00:38:59 -07:00
Reynold Xin
5d70ee4663
Cleaned up connection manager (moved many classes to their own files).
2013-05-07 22:42:15 -07:00
Reynold Xin
8388e8dd7a
Minor style fix in DiskStore...
2013-05-07 18:40:35 -07:00
Reynold Xin
547dcbe494
Cleaned up Scala files in network/netty from Shane's PR.
2013-05-07 18:39:33 -07:00
Reynold Xin
9e64396ca4
Cleaned up the Java files from Shane's PR.
2013-05-07 18:30:54 -07:00
Reynold Xin
0e5cc30868
Cleaned up BlockManager and BlockFetcherIterator from Shane's PR.
2013-05-07 18:18:24 -07:00
Reynold Xin
8b79485171
Moved BlockFetcherIterator to its own file.
2013-05-07 17:02:32 -07:00
Reynold Xin
90577ada69
Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
...
Conflicts:
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/DiskStore.scala
project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Reynold Xin
0fd84965f6
Added EmptyRDD.
2013-05-06 15:40:34 -07:00
Imran Rashid
22a5063ae4
switch from separating appUI host & port to combining into just appUiUrl
2013-05-05 12:19:11 -07:00
Matei Zaharia
7af92f248b
Merge pull request #597 from JoshRosen/webui-fixes
...
Two minor bug fixes for Spark Web UI
2013-05-04 22:29:17 -07:00
Josh Rosen
42b1953c53
Fix SPARK-630: app details page shows finished executors as running.
2013-05-04 18:34:47 -07:00
Josh Rosen
c0688451a6
Fix wrong closing tags in web UI HTML.
2013-05-04 18:34:46 -07:00
Josh Rosen
d48e9fde01
Fix SPARK-629: weird number of cores in job details page.
2013-05-04 18:34:45 -07:00
Mridul Muralidharan
25198d7e9e
Merge branch 'master' of github.com:mridulm/spark
2013-05-04 20:45:56 +05:30
Mridul Muralidharan
5b011d18d7
Merge from master
2013-05-04 20:41:27 +05:30
Mridul Muralidharan
edb57c8331
Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach
2013-05-04 19:47:45 +05:30
Matei Zaharia
3bf2c868c3
Merge pull request #594 from shivaram/master
...
Add zip partitions to Java API
2013-05-03 18:27:30 -07:00
Shivaram Venkataraman
bb8a434f9d
Add zipPartitions to Java API.
2013-05-03 15:14:02 -07:00
Imran Rashid
6fae936088
applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui
2013-05-03 12:59:10 -07:00
Mridul Muralidharan
ea2a6f91d3
pull from master
2013-05-04 00:35:59 +05:30
Reynold Xin
93091f6936
Merge branch 'master' of github.com:mesos/spark into blockmanager
2013-05-03 01:02:32 -07:00
Reynold Xin
2bc895a829
Updated according to Matei's code review comment.
2013-05-03 01:02:16 -07:00
Mridul Muralidharan
11589c39d9
Fix ZippedRDD as per Matei's suggestion
2013-05-03 12:23:30 +05:30
Matei Zaharia
6fe9d4e61e
Merge pull request #592 from woggling/localdir-fix
...
Don't accept generated local directory names that can't be created
2013-05-02 21:33:56 -07:00
Matei Zaharia
538ee755b4
Merge pull request #581 from jerryshao/master
...
fix [SPARK-740] block manager UI throws exception when enabling Spark Streaming
2013-05-02 09:01:42 -07:00
Charles Reiss
c847dd3da2
Don't accept generated temp directory names that can't be created successfully.
2013-05-01 23:19:10 -07:00
Reynold Xin
4a31877408
Added the unpersist api to JavaRDD.
2013-05-01 20:31:54 -07:00
Reynold Xin
98df9d2853
Added removeRdd function in BlockManager.
2013-05-01 20:17:09 -07:00
Mridul Muralidharan
dfde9ce9dd
comment out debug versions of checkHost, etc from Utils - which were used to test
2013-05-02 07:41:33 +05:30
Mridul Muralidharan
1b5aaeadc7
Integrate review comments 2
2013-05-02 07:30:06 +05:30
jerryshao
c047f0e3ad
filter out Spark streaming block RDD and sort RDDInfo with id
2013-05-02 09:48:32 +08:00
Mridul Muralidharan
609a817f52
Integrate review comments on pull request
2013-05-02 06:44:33 +05:30
Reynold Xin
204eb32e14
Changed the type of the persistentRdds hashmap back to TimeStampedHashMap.
2013-05-01 16:14:58 -07:00
Reynold Xin
34637b97ec
Added SparkContext.cleanup back. Not sure why it was removed before ...
2013-05-01 16:12:37 -07:00
Reynold Xin
3227ec8edd
Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist.
...
Also updated unit tests to make sure they are properly testing for
concurrency.
2013-05-01 16:07:44 -07:00
harshars
8481562731
Merged Ram's commit on removing RDDs.
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2013-05-01 14:42:17 -07:00
Mridul Muralidharan
27764a00f4
Fix some npe introduced accidentally
2013-05-01 20:56:05 +05:30
Mridul Muralidharan
d960e7e0f8
a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling.
...
b) Add some fixes to test code to ensure it passes (and fixes some other issues).
c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.
2013-05-01 20:24:00 +05:30
Matei Zaharia
aa8fe1a209
Merge pull request #586 from mridulm/master
...
Pull request to address issues Reynold Xin reported
2013-04-30 22:30:18 -07:00
Reynold Xin
dd7bef3147
Two minor fixes according to Ryan LeCompte's review.
2013-04-30 15:02:32 -07:00
Reynold Xin
cea6174573
Merge branch 'master' of github.com:mesos/spark into blockmanager
...
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
2013-04-30 13:28:35 -07:00
Mridul Muralidharan
60cabb35cb
Add additional catch block for exception too
2013-05-01 01:17:14 +05:30
Mridul Muralidharan
3b748ced22
Be more aggressive and defensive in all uses of SelectionKey in select loop
2013-05-01 00:30:30 +05:30
Mridul Muralidharan
0f45477be1
Change indentation
2013-05-01 00:10:02 +05:30
Mridul Muralidharan
538614acfe
Be more aggressive and defensive in select also
2013-05-01 00:05:32 +05:30
Mridul Muralidharan
48854e1dbf
If key is not valid, close connection
2013-04-30 23:59:33 +05:30
Matei Zaharia
f708dda81e
Merge pull request #585 from pwendell/listener-perf
...
[Fix SPARK-742] Task Metrics should not employ per-record timing by default
2013-04-30 07:51:40 -07:00
Mridul Muralidharan
e46d547ccd
Fix issues reported by Reynold
2013-04-30 16:15:56 +05:30
Reynold Xin
1055785a83
Allow specifying the shuffle write file buffer size. The default buffer size is 8KB in FastBufferedOutputStream, which is too small and would cause a lot of disk seeks.
2013-04-29 23:33:56 -07:00
Reynold Xin
7007201201
Added a shuffle block manager so it is easier in the future to consolidate shuffle output files.
2013-04-29 23:07:03 -07:00
Reynold Xin
d3586ef438
Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager
...
Conflicts:
core/src/main/scala/spark/storage/DiskStore.scala
2013-04-29 15:44:18 -07:00
Patrick Wendell
016ce1fa9c
Using full package name for util
2013-04-29 12:02:27 -07:00
Patrick Wendell
540be6b154
Modified version of the fix which just removes all per-record tracking.
2013-04-29 11:32:07 -07:00
Patrick Wendell
224fbac061
Spark-742: TaskMetrics should not employ per-record timing.
...
This patch does three things:
1. Makes TimedIterator a trait with two implementations (one a no-op)
2. Makes the default behavior to use the no-op implementation
3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like the trait doesn't really reduce complexity in any way.
In the future we can add other implementations, e.g. ones which perform sampling.
2013-04-29 11:13:43 -07:00
Shivaram Venkataraman
604d3bf56c
Rename partition class and add scala doc
2013-04-28 16:31:07 -07:00
Shivaram Venkataraman
15acd49f07
Actually rename classes to ZippedPartitions*
...
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman
6e84635ab9
Rename classes from MapZipped* to Zipped*
2013-04-28 15:58:40 -07:00
Shivaram Venkataraman
0cc6642b7c
Rename to zipPartitions and style changes
2013-04-28 05:11:03 -07:00
Shivaram Venkataraman
c9c4954d99
Add an interface to zip iterators of multiple RDDs
...
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
Matei Zaharia
6e6b5204ea
Create an empty directory when checkpointing a 0-partition RDD (fixes a test failure on Hadoop 2.0)
2013-04-25 00:42:37 -07:00
Reynold Xin
ba6ffa6a5f
Allow the specification of a shuffle serializer in the read path (for local block reads).
2013-04-24 17:38:07 -07:00
Reynold Xin
aa618ed2a2
Allow changing the serializer on a per shuffle basis.
2013-04-24 14:52:49 -07:00
Mridul Muralidharan
dd515ca3ee
Attempt at fixing merge conflict
2013-04-24 09:24:17 +05:30
Reynold Xin
31ce6c66d6
Added a BlockObjectWriter interface in block manager so ShuffleMapTask doesn't need to build up an array buffer for each shuffle bucket.
2013-04-23 17:48:59 -07:00
koeninger
dfac0aa5c2
prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement.
2013-04-22 21:12:52 -05:00
Mridul Muralidharan
7acab3ab45
Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo
2013-04-22 08:01:13 +05:30
koeninger
b2a3f24dde
first attempt at an RDD to pull data from JDBC sources
2013-04-21 00:29:37 -05:00
Andrew xia
8436bd5d4a
remove TaskSetQueueManager and update code style
2013-04-19 02:17:22 +08:00
Andrew xia
e0603d7e8b
refactor the Schedulable interface and add unit test for SchedulingAlgorithm
2013-04-18 13:13:54 +08:00
Mridul Muralidharan
5ee2f5c483
Cache pattern, add (commented out) alternatives for check* apis
2013-04-17 23:13:34 +05:30
Mridul Muralidharan
f07961060d
Add a small note on spark.tasks.schedule.aggression
2013-04-17 23:13:02 +05:30
Mridul Muralidharan
02dffd2eb0
Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though default of 10 is maintained
2013-04-17 05:52:57 +05:30
Mridul Muralidharan
ad80f68eb5
remove spurious debug statements
2013-04-16 22:15:34 +05:30
Mridul Muralidharan
f7969f72ee
Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example)
2013-04-16 21:51:38 +05:30
Mridul Muralidharan
323ab8ff3b
Scala does not prevent variable shadowing ! Sick error due to it ...
2013-04-16 17:05:10 +05:30
shane-huang
b493f55a4f
fix a bug in netty Block Fetcher
...
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-16 10:01:01 +08:00
Mridul Muralidharan
59c380d69a
Fix npe
2013-04-16 03:29:38 +05:30
Mridul Muralidharan
dd2b64ec97
Fix bug with atomic update
2013-04-16 03:19:24 +05:30
Mridul Muralidharan
5540ab8243
Use hostname instead of hostport for executor, fix creation of workdir
2013-04-16 02:57:43 +05:30
Mridul Muralidharan
eb7e95e833
Commit job to persist files
2013-04-16 02:56:36 +05:30
Matei Zaharia
a64c107449
Make ShuffledRDD.prev transient
2013-04-15 16:41:51 -04:00
Mridul Muralidharan
19652a44be
Fix issue with FileSuite failing
2013-04-15 19:16:36 +05:30
Mridul Muralidharan
54b3d45b81
Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues
2013-04-15 18:26:50 +05:30
Mridul Muralidharan
d90d2af103
Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues
2013-04-15 18:12:11 +05:30
Matei Zaharia
c35d530bcf
Fix compile error
2013-04-13 12:43:12 -04:00
Andrew Ash
29d3440efb
Add details when BlockManager heartbeats time out
...
Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs
Before:
WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats
After:
WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms
2013-04-11 01:54:02 -03:00
Andrew xia
2f883c515f
Continue to update code for Scala code style
...
1. refactor braces for "class" "if" "while" "for" "match"
2. make code lines less than 100
3. refactor class parameter and extends definition
2013-04-09 13:02:50 +08:00
Matei Zaharia
054feb6448
Fixed a bug with zip
2013-04-07 21:15:21 -04:00
Matei Zaharia
b5900d47b1
Fix compile warning
2013-04-07 20:55:42 -04:00
Matei Zaharia
6962d40b44
Fix deprecated warning
2013-04-07 20:27:33 -04:00
Mridul Muralidharan
6798a09df8
Add support for building against hadoop2-yarn : adding new maven profile for it
2013-04-07 17:47:38 +05:30
shane-huang
df47b40b76
Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager
...
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
Andrew xia
2b373dd07a
add properties default value null to fix sbt/sbt test errors
2013-04-02 12:11:14 +08:00
Mark Hamstra
e215f67923
Correct sense of 'filter out' in comment.
2013-03-31 08:00:13 -07:00
Mark Hamstra
8bcdc64005
Fixed broken filter in getWritableClass[T]
2013-03-30 22:09:52 -07:00
Matei Zaharia
9831bc1a09
Merge pull request #539 from cgrothaus/fix-webui-workdirpath
...
Bugfix: WorkerWebUI must respect workDirPath from Worker
2013-03-29 22:16:22 -07:00
Matei Zaharia
3cc8ab6e29
Merge pull request #541 from stephenh/shufflecoalesce
...
Add a shuffle parameter to coalesce.
2013-03-29 22:14:07 -07:00
Andrew xia
1a28f92711
fix some typos and spacing
2013-03-29 08:34:28 +08:00
Andrew xia
def3d1c84a
1.remove redundant spacing in source code
...
2.replace get/set functions with val and var definition
2013-03-29 08:20:35 +08:00
Holden Karau
f5df729b12
Explicitly catch all throwables (warning in 2.10)
2013-03-24 16:15:32 -07:00
Stephen Haberman
dd854d5b9f
Use Boolean in the Java API, and != for assert.
2013-03-23 11:49:45 -05:00
Stephen Haberman
4ca273edc4
Merge branch 'master' into shufflecoalesce
...
Conflicts:
core/src/test/scala/spark/RDDSuite.scala
2013-03-23 11:45:45 -05:00
Matei Zaharia
b8949cab88
Merge pull request #505 from stephenh/volatile
...
Make Executor fields volatile since they're read from the thread pool.
2013-03-23 07:19:34 -07:00
Matei Zaharia
fd53f2fc7b
Merge pull request #510 from markhamstra/WithThing
...
mapWith, flatMapWith and filterWith
2013-03-23 07:13:21 -07:00
Andrew xia
d1d9bdaabe
Just update typo and comments
2013-03-23 07:25:30 +08:00
Stephen Haberman
00170eb0b9
Fix are/our typo.
2013-03-22 12:59:08 -05:00
Stephen Haberman
1c67c7dfd1
Add a shuffle parameter to coalesce.
...
This is useful for when you want just 1 output file (part-00000) but
still want the upstream RDD to be computed in parallel.
2013-03-22 08:54:44 -05:00
Christoph Grothaus
445f387ef4
Bugfix: WorkerWebUI must respect workDirPath from Worker
2013-03-22 11:08:40 +01:00
Matei Zaharia
35588490cb
Merge pull request #538 from rxin/cogroup
...
Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.
2013-03-20 19:27:47 -07:00
Stephen Haberman
4f4215311a
Merge branch 'master' into volatile
2013-03-20 15:37:10 -05:00
Matei Zaharia
b812e6b7bb
Merge pull request #526 from markhamstra/foldByKey
...
Add foldByKey
2013-03-20 11:21:02 -07:00
Reynold Xin
d48ee7e55e
Merge branch 'master' of github.com:mesos/spark into cogroup
2013-03-20 14:00:28 +08:00
Reynold Xin
00a11304fd
Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.
2013-03-20 13:49:51 +08:00
Matei Zaharia
945d1e720e
Merge pull request #536 from sasurfer/master
...
CoalescedRDD for many partitions
2013-03-19 21:59:06 -07:00
Matei Zaharia
1cbbe94ac1
Merge pull request #534 from stephenh/removetrycatch
...
Remove try/catch block that can't be hit.
2013-03-19 21:34:34 -07:00
Andrey Kouznetsov
bd167f83b0
call setConf from input format if it is Configurable
2013-03-19 17:15:15 +04:00
Giovanni Delussu
aceae029f7
CoalescedRDD changed to work with a big number of partitions both in the original and the new coalesced RDD.
...
The limitation was in the range that Scala.Int can represent.
2013-03-19 11:25:45 +01:00