jerryshao
e4ff544a8d
Clean StageToInfos periodically when spark.cleaner.ttl is enabled
2013-07-05 10:34:45 +08:00
Lian Cheng
c0c3155c3c
Bug fix: SPARK-789
https://spark-project.atlassian.net/browse/SPARK-789
2013-07-05 00:54:10 +08:00
Holden Karau
0f06d6217d
s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning
2013-07-04 01:05:39 -07:00
Gavin Li
94238aae57
fix dependencies
2013-07-03 18:08:38 +00:00
Gavin Li
96130c30d9
add compression codec trait and snappy compression
2013-07-03 05:49:04 +00:00
Y.CORP.YAHOO.COM\tgraves
923cf92900
Rework from pull request. Removed --user option from Spark on Yarn Client, made the use of JAVA_HOME environment
variable conditional on if it's set, and created addCredentials in each of the SparkHadoopUtil classes
to only add the credentials when the profile is hadoop2-yarn.
2013-07-02 21:18:59 -05:00
Patrick Wendell
39e2325675
Removing dead code
2013-07-02 16:28:40 -07:00
Patrick Wendell
8ca1cc1786
Adding truncation for log files
2013-07-02 16:10:50 -07:00
Patrick Wendell
9a42d04efa
Throw exception for missing resource
2013-07-01 14:43:13 -07:00
Patrick Wendell
1025d7d1ef
Package refactoring
2013-07-01 14:40:53 -07:00
Patrick Wendell
30b9034241
Fixing bug where logs aren't shown
2013-07-01 13:48:01 -07:00
Patrick Wendell
8688689387
Various formatting changes
2013-07-01 13:40:12 -07:00
Patrick Wendell
735c951a09
Adding test script
2013-07-01 09:33:22 -07:00
Patrick Wendell
5de326db7d
Print exception message
2013-07-01 09:19:45 -07:00
root
ec31e68d5d
Fixed PySpark perf regression by not using socket.makefile(), and improved
debuggability by letting "print" statements show up in the executor's stderr
Conflicts:
core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
root
3296d132b6
Fix performance bug with new Python code not using buffered streams
2013-07-01 06:25:43 +00:00
Matei Zaharia
03d0b858c8
Made use of spark.executor.memory setting consistent and documented it
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2013-06-30 15:46:46 -07:00
Patrick Wendell
e721ff7e5a
Allowing details for failed stages
2013-06-29 11:26:30 -07:00
Patrick Wendell
473961d82e
Styling for progress bar
2013-06-29 08:38:04 -07:00
Patrick Wendell
249f0e54ba
Minor changes from Matei's review
2013-06-28 13:25:26 -07:00
Patrick Wendell
62c2c6b856
Forcing Jetty to run as daemon
2013-06-27 21:47:22 -07:00
Patrick Wendell
a55190d314
Adding better tabs for UI headers.
2013-06-27 19:14:51 -07:00
Patrick Wendell
362d996c81
Handful of changes based on matei's review
- Avoid exception when no tasks have finished for a stage
- Adding DOCTYPE so css renders properly
- Adding progress slider
2013-06-27 19:14:28 -07:00
Patrick Wendell
92a4c2a5f6
Fixing bug in local scheduler time recording
2013-06-27 12:33:06 -07:00
Stephen Haberman
d7011632d1
Wrap lines.
2013-06-26 12:35:57 -05:00
Patrick Wendell
ee692482a6
One more private class
2013-06-26 09:07:32 -07:00
Patrick Wendell
a59c15a37e
Adding config option for retained stages
2013-06-26 08:54:57 -07:00
Patrick Wendell
274193664a
Bumping timeouts
2013-06-26 08:51:28 -07:00
Patrick Wendell
b14ad509ba
Moving static ui package
2013-06-26 08:46:51 -07:00
Patrick Wendell
2cbaa0734b
Making all new classes package private
2013-06-26 08:44:55 -07:00
Stephen Haberman
d11025dc6a
Be cute with Option and getenv.
2013-06-26 09:53:35 -05:00
Matei Zaharia
6c8d1b2ca6
Fix computation of classpath when we launch java directly
The previous version assumed that a CLASSPATH environment variable was
set by the "run" script when launching the process that starts the
ExecutorRunner, but unfortunately this is not true in tests. Instead, we
factor the classpath calculation into an external script and call that.
NOTE: This includes a Windows version but hasn't yet been tested there.
2013-06-25 18:21:00 -04:00
Matei Zaharia
15b00914c5
Some fixes to the launch-java-directly change:
- Split SPARK_JAVA_OPTS into multiple command-line arguments if it
contains spaces; this splitting follows quoting rules in bash
- Add the Scala JARs to the classpath if they're not in the CLASSPATH
variable because the ExecutorRunner is launched with "scala" (this can
happen when using local-cluster URLs in spark-shell)
2013-06-25 17:17:27 -04:00
Matei Zaharia
7e0191c6ea
Merge remote-tracking branch 'cgrothaus/SPARK-698'
Conflicts:
run
2013-06-25 15:47:40 -04:00
Patrick Wendell
d66bd6f885
Adding another unit test to Web UI suite
2013-06-24 17:12:55 -07:00
Patrick Wendell
f7389330c3
Allowing for requested port on construction
2013-06-24 16:51:52 -07:00
Patrick Wendell
42157027f2
A few bug fixes and a unit test
2013-06-24 16:25:05 -07:00
Patrick Wendell
a4248138b4
Minor style cleanup
2013-06-24 14:22:28 -07:00
Patrick Wendell
b5e6e8bcc8
Cleaning up some code for Job Progress
2013-06-24 14:13:24 -07:00
Patrick Wendell
93e8ed85aa
Workaround for initialization issue
2013-06-24 13:11:18 -07:00
Patrick Wendell
f6e64b5cd6
Updating based on changes to JobLogger (and one small change to JobLogger)
2013-06-24 12:40:41 -07:00
Matei Zaharia
78ffe164b3
Clone the zero value for each key in foldByKey
The old version reused the object within each task, leading to
overwriting of the object when a mutable type is used, which is expected
to be common in fold.
Conflicts:
core/src/test/scala/spark/ShuffleSuite.scala
2013-06-23 10:26:53 -07:00
Matei Zaharia
0e0f9d3069
Fix search path for REPL class loader to really find added JARs
2013-06-22 17:44:04 -07:00
Matei Zaharia
3e61beff7b
Merge pull request #648 from shivaram/netty-dbg
Shuffle fixes and cleanup
2013-06-22 16:22:47 -07:00
Patrick Wendell
7e9f1ed0de
Some cleanup of styling
2013-06-22 10:31:37 -07:00
Patrick Wendell
3b7ebdeeb8
Handling entirely failed stages
2013-06-22 10:31:37 -07:00
Patrick Wendell
be6107ce44
Some tweaking with shared page header
2013-06-22 10:31:37 -07:00
Patrick Wendell
9a24d1a2d0
Using scala in XML imports
2013-06-22 10:31:37 -07:00
Patrick Wendell
f91e1c4822
Linking RDD information when available in stages
2013-06-22 10:31:37 -07:00
Patrick Wendell
a86bb459e2
Showing shuffle status and purging old stages
2013-06-22 10:31:37 -07:00
Patrick Wendell
3485e73376
Style cleanup
2013-06-22 10:31:37 -07:00
Patrick Wendell
dd696f3a3d
Some renaming and comments
2013-06-22 10:31:37 -07:00
Patrick Wendell
5c872e9ef5
Documentation and some refactoring
2013-06-22 10:31:37 -07:00
Patrick Wendell
17776323a6
More work on percentile data:
2013-06-22 10:31:37 -07:00
Patrick Wendell
dcf6a68177
Refactoring into different modules
2013-06-22 10:31:36 -07:00
Patrick Wendell
ce81c320ac
Adding helper function to make listing tables
2013-06-22 10:31:36 -07:00
Patrick Wendell
9fd5dc3ea9
Initial steps towards job progress UI
2013-06-22 10:31:36 -07:00
Patrick Wendell
bc4a811c57
Stash
2013-06-22 10:31:36 -07:00
Patrick Wendell
77c53f7868
Refactoring UI packages
2013-06-22 10:31:36 -07:00
Patrick Wendell
8b5c7e71c4
Import cleanup
2013-06-22 10:31:36 -07:00
Patrick Wendell
32a45d01b1
Removing twirl files
2013-06-22 10:31:36 -07:00
Patrick Wendell
4e1f202481
Removing dead code
2013-06-22 10:31:36 -07:00
Patrick Wendell
d6fde4ffe4
Some JSON cleanup
2013-06-22 10:31:36 -07:00
Patrick Wendell
91ec5a1a04
Changing JSON protocol and removing spray code
2013-06-22 10:31:36 -07:00
Patrick Wendell
fc94576ece
Adding worker version of UI
2013-06-22 10:31:36 -07:00
Patrick Wendell
ee73c09ac9
Some comments
2013-06-22 10:31:36 -07:00
Patrick Wendell
9161db5478
Cleaning up master web UI
2013-06-22 10:31:36 -07:00
Patrick Wendell
e55cf0245f
Adding WebUI file
2013-06-22 10:31:35 -07:00
Patrick Wendell
f85fd7a793
Commenting unfinished part
2013-06-22 10:31:35 -07:00
Patrick Wendell
2c36a514aa
Spray refactoring for master web UI
2013-06-22 10:31:35 -07:00
Patrick Wendell
7e6977b6c5
Fix in storage status page
2013-06-22 10:31:35 -07:00
Patrick Wendell
950f83535a
Adding deterministic port
2013-06-22 10:31:35 -07:00
Patrick Wendell
7cd70dc2c1
Minor cleanup
2013-06-22 10:31:35 -07:00
Patrick Wendell
e66f570194
Completely hacked version of block manager UI in jetty
2013-06-22 10:31:35 -07:00
Patrick Wendell
60fbf7e461
Partially working checkpoint
2013-06-22 10:31:35 -07:00
Matei Zaharia
1ef5d0d2c9
Merge pull request #644 from shimingfei/joblogger
add Joblogger to Spark (on new Spark code)
2013-06-22 09:35:57 -07:00
Jey Kottalam
1ba3c17303
use parens when calling method with side-effects
2013-06-21 12:14:16 -04:00
Jey Kottalam
edb18ca928
Rename PythonWorker to PythonWorkerFactory
2013-06-21 12:14:16 -04:00
Jey Kottalam
62c4781400
Add tests and fixes for Python daemon shutdown
2013-06-21 12:14:16 -04:00
Jey Kottalam
c79a6078c3
Prefork Python worker processes
2013-06-21 12:14:16 -04:00
Jey Kottalam
40afe0d2a5
Add Python timing instrumentation
2013-06-21 12:14:16 -04:00
Mingfei
2fc794a6c7
small modify in DAGScheduler
2013-06-21 18:21:35 +08:00
Mingfei
4b9862ac9c
small format modification
2013-06-21 17:55:32 +08:00
Mingfei
aa7aa587be
some format modification
2013-06-21 17:48:41 +08:00
Mingfei
5240795154
edit according to comments
2013-06-21 17:38:23 +08:00
Matei Zaharia
71030ba3eb
Merge pull request #654 from lyogavin/enhance_pipe
fix typo and coding style in #638
2013-06-19 15:21:03 -07:00
Thomas Graves
bad51c7cb4
upmerge with latest mesos/spark master and fix hbase compile with hadoop2-yarn profile
2013-06-19 14:39:13 -05:00
Thomas Graves
75d78c7ac9
Add support for Spark on Yarn on a secure Hadoop cluster
2013-06-19 11:18:42 -05:00
Matei Zaharia
7902baddc7
Update ASM to version 4.0
2013-06-19 13:34:30 +02:00
Gavin Li
0a2a9bce1e
fix typo and coding style
2013-06-18 21:30:13 +00:00
jerryshao
1e9269c3ee
reduce ZippedPartitionsRDD's getPreferredLocations complexity
2013-06-18 09:49:06 +08:00
Matei Zaharia
db42451a52
Merge pull request #643 from adatao/master
Bug fix: Zero-length partitions result in NaN for overall mean & variance
2013-06-17 15:26:36 -07:00
Matei Zaharia
e82a2ffcc9
Merge pull request #653 from rxin/logging
SPARK-781: Log the temp directory path when Spark says "Failed to create temp directory."
2013-06-17 15:13:15 -07:00
Matei Zaharia
ec193c7d89
Merge remote-tracking branch 'xiajunluan/xiajunluan'
Conflicts:
core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-06-18 00:11:50 +02:00
Reynold Xin
be3c406edf
Fixed the typo pointed out by Matei.
2013-06-17 17:07:51 -04:00
Reynold Xin
1450296797
SPARK-781: Log the temp directory path when Spark says "Failed to create
temp directory".
2013-06-17 16:58:23 -04:00
Gavin Li
4508089fc3
refine comments and add sc.clean
2013-06-17 05:23:46 +00:00
Gavin Li
e6ae049283
Merge remote-tracking branch 'upstream1/master' into enhance_pipe
2013-06-16 22:53:39 +00:00
Gavin Li
fb6d733fa8
update according to comments
2013-06-16 22:32:55 +00:00
Matei Zaharia
f961aac8b2
Merge pull request #649 from ryanlecompte/master
Add top K method to RDD using a bounded priority queue
2013-06-15 00:53:41 -07:00
ryanlecompte
e8801d4490
use delegation for BoundedPriorityQueue, add Java API
2013-06-14 23:39:05 -07:00
Reynold Xin
2cc188fd54
SPARK-774: cogroup should also disable map side combine by default
2013-06-14 00:10:54 -07:00
Reynold Xin
6738178d0d
SPARK-772: groupByKey should disable map side combine.
2013-06-13 23:59:42 -07:00
ryanlecompte
93b3f5e535
drop unneeded ClassManifest implicit
2013-06-13 16:26:35 -07:00
ryanlecompte
44b8dbaede
use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs
2013-06-13 16:23:15 -07:00
Shivaram Venkataraman
1d9f0df065
Fix some comments and style
2013-06-13 14:46:25 -07:00
Mingfei
967a6a699d
modify sparklister function interface according to comments
2013-06-13 14:36:07 +08:00
Shivaram Venkataraman
5da4287b1d
Merge branch 'netty-dbg' of github.com:shivaram/spark into netty-dbg
2013-06-12 16:38:37 -07:00
Shivaram Venkataraman
5e9a9317c5
Merge branch 'master' of git://github.com/mesos/spark into netty-dbg
2013-06-12 16:38:01 -07:00
ryanlecompte
db5bca08ff
add a new top K method to RDD using a bounded priority queue
2013-06-12 10:54:16 -07:00
Andrew xia
190ec61799
change code style and debug info
2013-06-10 15:27:02 +08:00
Patrick Wendell
ef14dc2e77
Adding Java-API version of compression codec
2013-06-09 18:09:46 -07:00
Patrick Wendell
df592192e7
Monads FTW
2013-06-09 18:09:24 -07:00
Patrick Wendell
d1bbcebae5
Adding compression to Hadoop save functions
2013-06-09 11:39:35 -07:00
Mingfei
ade822011d
not check return value of eventQueue.take
2013-06-08 16:26:45 +08:00
Mingfei
4fd86e0e10
delete test code for joblogger in SparkContext
2013-06-08 15:45:47 +08:00
Mingfei
362f0f93ac
Merge branch 'master' of https://github.com/mesos/spark
2013-06-08 15:20:13 +08:00
Mingfei
1a4d93c025
modify to pass job annotation by localProperties and use daemon thread to do joblogger's work
2013-06-08 14:23:39 +08:00
Matei Zaharia
b58a29295b
Small formatting and style fixes
2013-06-07 22:51:28 -07:00
Matei Zaharia
c8fc423bc2
Merge pull request #631 from jerryshao/master
Fix block manager UI display issue when spark.cleaner.ttl is enabled
2013-06-07 22:43:18 -07:00
Matei Zaharia
c9ca0a4a58
Small code style fix to SchedulingAlgorithm.scala
2013-06-07 22:40:44 -07:00
Matei Zaharia
1ae60bcb36
Merge pull request #634 from xiajunluan/master
[Spark-753] Fix ClusterSchedulSuite unit test failed
2013-06-07 22:39:06 -07:00
Shivaram Venkataraman
ac480fd977
Clean up variables and counters in BlockFetcherIterator
2013-06-06 16:34:27 -07:00
Gavin Li
e179ff8a32
update according to comments
2013-06-05 22:41:05 +00:00
Shivaram Venkataraman
cb2f5046ee
Pass in bufferSize to BufferedOutputStream
2013-06-05 15:09:02 -07:00
Shivaram Venkataraman
c851957fe4
Don't write zero block files with java serializer
2013-06-05 14:28:38 -07:00
Christopher Nguyen
9d35904357
In the current code, when both partitions happen to have zero-length, the return mean will be NaN.
Consequently, the result of mean after reducing over all partitions will also be NaN,
which is not correct if there are partitions with non-zero length. This patch fixes this issue.
2013-06-04 22:12:47 -07:00
Matei Zaharia
fff3728552
Merge pull request #640 from pwendell/timeout-update
Fixing bug in BlockManager timeout
2013-06-04 16:09:50 -07:00
Patrick Wendell
061fd3ae36
Fixing bug in BlockManager timeout
2013-06-04 19:02:44 -04:00
Matei Zaharia
f420d4f228
Merge pull request #639 from pwendell/timeout-update
Bump akka and blockmanager timeouts to 60 seconds
2013-06-04 15:25:58 -07:00
Patrick Wendell
8bd4e12104
Bump akka and blockmanager timeouts to 60 seconds
2013-06-04 18:14:24 -04:00
Shivaram Venkataraman
96943a1cc0
var to val
2013-06-03 12:29:38 -07:00
Shivaram Venkataraman
cd347f547a
Reuse the file object as it is valid after delete
2013-06-03 12:27:51 -07:00
Shivaram Venkataraman
a058b0acf3
Delete a file for a block if it already exists.
2013-06-03 12:10:00 -07:00
Andrew xia
606bb1b450
Fix schedulingAlgorithm bugs for unit test
2013-06-03 10:29:23 +08:00
Shivaram Venkataraman
038cfc1a9a
Make connect timeout configurable
2013-05-31 23:32:18 -07:00
Shivaram Venkataraman
91aca92249
Another round of Netty fixes.
1. Avoid race condition between stop and copier completion
2. Handle socket exceptions by reporting them and filling in a failed
FetchResult
2013-05-31 23:21:38 -07:00
Gavin Li
9f84315c05
enhance pipe to support what we can do in hadoop streaming
2013-06-01 00:26:10 +00:00
Reynold Xin
de1167bf2c
Incorporated Charles' feedback to put rdd metadata removal in
BlockManagerMasterActor.
2013-05-31 15:54:57 -07:00
Reynold Xin
ba5e544461
More block manager cleanup.
Implemented a removeRdd method in BlockManager, and use that to
implement RDD.unpersist. Previously, unpersist needs to send B akka
messages, where B = number of blocks. Now unpersist only needs to send W
akka messages, where W = the number of workers.
2013-05-31 01:48:16 -07:00
jerryshao
926f41cc52
fix block manager UI display issue when spark.cleaner.ttl is enabled
2013-05-31 09:32:52 +08:00
Reynold Xin
bed1b08169
Do not create symlink for local add file. Instead, copy the file.
This prevents Spark from changing the original file's permission, and
also allows add file to work on non-POSIX operating systems.
2013-05-30 16:21:49 -07:00
Shivaram Venkataraman
3b0cd17343
Merge branch 'master' of git://github.com/mesos/spark
Conflicts:
core/src/test/scala/spark/ShuffleSuite.scala
2013-05-30 14:36:24 -07:00
Andrew xia
c3db3ea554
1. Add unit test for local scheduler
2. Move localTaskSetManager to a new file
2013-05-30 20:49:40 +08:00
Andrew xia
ecceb101d3
implement FIFO and fair scheduler for spark local mode
2013-05-30 10:43:01 +08:00
Shivaram Venkataraman
19fd6d54c0
Also flush serializer in revertPartialWrites
2013-05-29 17:29:34 -07:00
Shivaram Venkataraman
618c8cae1e
Skip fetching zero-sized blocks in OIO.
Also unify splitLocalRemoteBlocks for netty/nio and add a test case
2013-05-29 13:18:54 -07:00
Matei Zaharia
6ed71390d9
Merge pull request #626 from stephenh/remove-add-if-no-port
Remove unused addIfNoPort.
2013-05-29 10:14:22 -07:00
Shivaram Venkataraman
b79b10a6d6
Flush serializer to fix zero-size kryo blocks bug.
Also convert the local-cluster test case to check for non-zero block sizes
2013-05-29 00:52:55 -07:00
Matei Zaharia
41d230ccb0
Merge pull request #611 from squito/classloader
Use default classloaders for akka & deserializing task results
2013-05-28 23:35:24 -07:00
Stephen Haberman
4fe1fbdd51
Remove unused addIfNoPort.
2013-05-28 16:26:32 -05:00
Matei Zaharia
3db1e17baa
Merge pull request #620 from jerryshao/master
Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations
2013-05-27 21:31:43 -07:00
Matei Zaharia
e8d4b6c296
Merge pull request #529 from xiajunluan/master
[SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler
2013-05-25 21:09:03 -07:00
Reynold Xin
26962c9340
Automatically configure Netty port. This makes unit tests using
local-cluster pass. Previously they were failing because Netty was
trying to bind to the same port for all processes.
Pair programmed with @shivaram.
2013-05-24 16:39:33 -07:00
Reynold Xin
6ea085169d
Fixed the bug that shuffle serializer is ignored by the new shuffle
block iterators for local blocks. Also added a unit test for that.
2013-05-24 14:08:37 -07:00
jerryshao
bd3ea8f2a6
fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException
2013-05-24 14:26:19 +08:00
Charles Reiss
f350f14084
Use ARRAY_SAMPLE_SIZE constant instead of 100.0
2013-05-21 18:11:33 -07:00
Andrew xia
ecd6d75c6a
fix bug of unit tests
2013-05-21 06:49:23 +08:00
Reynold Xin
5912cc4967
Merge pull request #610 from JoshRosen/spark-747
Throw exception if TaskResult exceeds Akka frame size
2013-05-17 19:58:40 -07:00
Reynold Xin
8d78c5f89f
Changed the logging level from info to warning when addJar(null) is
called.
2013-05-17 18:51:35 -07:00
Andrew xia
3d4672eaa9
Merge branch 'master' into xiajunluan
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala
core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-05-18 07:28:03 +08:00
Andrew xia
d19753b9c7
expose TaskSetManager type to resourceOffer function in ClusterScheduler
2013-05-18 06:45:19 +08:00
Andrew xia
c6e2770bfe
Fix ClusterScheduler bug to avoid allocating tasks to same slave
2013-05-17 05:10:38 +08:00
Mridul Muralidharan
f0881f8d48
Hope this does not turn into a bike shed change
2013-05-17 01:58:50 +05:30
Mridul Muralidharan
feddd2530d
Filter out nulls - prevent NPE
2013-05-16 17:49:14 +05:30
Josh Rosen
b8e46b6074
Abort job if result exceeds Akka frame size; add test.
2013-05-16 01:57:57 -07:00
Matei Zaharia
2f576aba8f
Merge pull request #602 from rxin/shufflemerge
Manual merge & cleanup of Shane's Shuffle Performance Optimization
2013-05-15 18:06:24 -07:00
Reynold Xin
203d7b7c14
Merge pull request #593 from squito/driver_ui_link
Master UI has link to Application UI
2013-05-15 00:47:20 -07:00
Reynold Xin
f3491cb89b
Merge branch 'master' of github.com:mesos/spark into shufflemerge
Conflicts:
core/src/main/scala/spark/storage/BlockManager.scala
core/src/test/scala/spark/DistributedSuite.scala
project/SparkBuild.scala
2013-05-15 00:31:52 -07:00
Reynold Xin
f9d40a5848
Added a comment in JdbcRDD for example usage.
2013-05-14 23:29:57 -07:00
Reynold Xin
81ad2fa331
Merge branch 'jdbc' of github.com:koeninger/spark
Conflicts:
project/SparkBuild.scala
2013-05-14 23:12:00 -07:00
Imran Rashid
38d4b97c6d
use threads classloader when deserializing task results; classnotfoundexception includes classloader
2013-05-14 22:32:14 -07:00
Imran Rashid
d7d1da79d3
when akka starts, use akkas default classloader (current thread)
2013-05-14 22:32:09 -07:00
Matei Zaharia
016ac86830
Merge pull request #601 from rxin/emptyrdd-master
EmptyRDD (master branch 0.8)
2013-05-13 21:45:36 -07:00
Matei Zaharia
4b354e0a08
Merge pull request #589 from mridulm/master
Add support for instance local scheduling
2013-05-13 17:39:19 -07:00
Patrick Wendell
7f0833647b
Capturing class name
2013-05-12 07:54:03 -07:00
Patrick Wendell
72b9c4cb6e
Small fix
2013-05-11 23:53:50 -07:00
Patrick Wendell
1c15b85051
Removing import
2013-05-11 23:52:53 -07:00
Patrick Wendell
059ab88754
Changing technique to use same code path in all cases
2013-05-11 23:50:54 -07:00
Cody Koeninger
3da2305ed0
code cleanup per rxin comments
2013-05-11 23:59:07 -05:00
Josh Rosen
440719109e
Throw exception if task result exceeds Akka frame size.
This partially addresses SPARK-747.
2013-05-11 19:17:13 -07:00
Patrick Wendell
0345954530
SPARK-738: Spark should detect and squash nonserializable exceptions
2013-05-11 14:17:09 -07:00
Mark Hamstra
6e6b3e0d7e
Actually use the cleaned closure in foreachPartition
2013-05-10 13:02:34 -07:00
Imran Rashid
0ab818d508
fix linebreak
2013-05-09 00:38:59 -07:00
Reynold Xin
5d70ee4663
Cleaned up connection manager (moved many classes to their own files).
2013-05-07 22:42:15 -07:00
Reynold Xin
8388e8dd7a
Minor style fix in DiskStore...
2013-05-07 18:40:35 -07:00
Reynold Xin
547dcbe494
Cleaned up Scala files in network/netty from Shane's PR.
2013-05-07 18:39:33 -07:00
Reynold Xin
9e64396ca4
Cleaned up the Java files from Shane's PR.
2013-05-07 18:30:54 -07:00
Reynold Xin
0e5cc30868
Cleaned up BlockManager and BlockFetcherIterator from Shane's PR.
2013-05-07 18:18:24 -07:00
Reynold Xin
8b79485171
Moved BlockFetcherIterator to its own file.
2013-05-07 17:02:32 -07:00
Reynold Xin
90577ada69
Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
Conflicts:
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/DiskStore.scala
project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Reynold Xin
0fd84965f6
Added EmptyRDD.
2013-05-06 15:40:34 -07:00
Imran Rashid
22a5063ae4
switch from separating appUI host & port to combining into just appUiUrl
2013-05-05 12:19:11 -07:00
Matei Zaharia
7af92f248b
Merge pull request #597 from JoshRosen/webui-fixes
Two minor bug fixes for Spark Web UI
2013-05-04 22:29:17 -07:00
Josh Rosen
42b1953c53
Fix SPARK-630: app details page shows finished executors as running.
2013-05-04 18:34:47 -07:00
Josh Rosen
d48e9fde01
Fix SPARK-629: weird number of cores in job details page.
2013-05-04 18:34:45 -07:00
Mridul Muralidharan
25198d7e9e
Merge branch 'master' of github.com:mridulm/spark
2013-05-04 20:45:56 +05:30
Mridul Muralidharan
5b011d18d7
Merge from master
2013-05-04 20:41:27 +05:30
Mridul Muralidharan
edb57c8331
Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach
2013-05-04 19:47:45 +05:30
Matei Zaharia
3bf2c868c3
Merge pull request #594 from shivaram/master
Add zip partitions to Java API
2013-05-03 18:27:30 -07:00
Shivaram Venkataraman
bb8a434f9d
Add zipPartitions to Java API.
2013-05-03 15:14:02 -07:00
Imran Rashid
6fae936088
applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui
2013-05-03 12:59:10 -07:00
Mridul Muralidharan
ea2a6f91d3
pull from master
2013-05-04 00:35:59 +05:30
Reynold Xin
93091f6936
Merge branch 'master' of github.com:mesos/spark into blockmanager
2013-05-03 01:02:32 -07:00
Reynold Xin
2bc895a829
Updated according to Matei's code review comment.
2013-05-03 01:02:16 -07:00
Mridul Muralidharan
11589c39d9
Fix ZippedRDD as per Matei's suggestion
2013-05-03 12:23:30 +05:30
Matei Zaharia
6fe9d4e61e
Merge pull request #592 from woggling/localdir-fix
Don't accept generated local directory names that can't be created
2013-05-02 21:33:56 -07:00
Matei Zaharia
538ee755b4
Merge pull request #581 from jerryshao/master
fix [SPARK-740] block manager UI throws exception when enabling Spark Streaming
2013-05-02 09:01:42 -07:00
Charles Reiss
c847dd3da2
Don't accept generated temp directory names that can't be created successfully.
2013-05-01 23:19:10 -07:00
Reynold Xin
4a31877408
Added the unpersist api to JavaRDD.
2013-05-01 20:31:54 -07:00
Reynold Xin
98df9d2853
Added removeRdd function in BlockManager.
2013-05-01 20:17:09 -07:00
Mridul Muralidharan
dfde9ce9dd
comment out debug versions of checkHost, etc from Utils - which were used to test
2013-05-02 07:41:33 +05:30
Mridul Muralidharan
1b5aaeadc7
Integrate review comments 2
2013-05-02 07:30:06 +05:30
jerryshao
c047f0e3ad
filter out Spark streaming block RDD and sort RDDInfo with id
2013-05-02 09:48:32 +08:00
Mridul Muralidharan
609a817f52
Integrate review comments on pull request
2013-05-02 06:44:33 +05:30
Reynold Xin
204eb32e14
Changed the type of the persistentRdds hashmap back to
TimeStampedHashMap.
2013-05-01 16:14:58 -07:00
Reynold Xin
34637b97ec
Added SparkContext.cleanup back. Not sure why it was removed before ...
2013-05-01 16:12:37 -07:00
Reynold Xin
3227ec8edd
Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist.
Also updated unit tests to make sure they are properly testing for
concurrency.
2013-05-01 16:07:44 -07:00
harshars
8481562731
Merged Ram's commit on removing RDDs.
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2013-05-01 14:42:17 -07:00
Mridul Muralidharan
27764a00f4
Fix some npe introduced accidentally
2013-05-01 20:56:05 +05:30
Mridul Muralidharan
d960e7e0f8
a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling.
b) Add some fixes to test code to ensure it passes (and fixes some other issues).
c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.
2013-05-01 20:24:00 +05:30
Matei Zaharia
aa8fe1a209
Merge pull request #586 from mridulm/master
Pull request to address issues Reynold Xin reported
2013-04-30 22:30:18 -07:00
Reynold Xin
dd7bef3147
Two minor fixes according to Ryan LeCompte's review.
2013-04-30 15:02:32 -07:00
Reynold Xin
cea6174573
Merge branch 'master' of github.com:mesos/spark into blockmanager
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
2013-04-30 13:28:35 -07:00
Mridul Muralidharan
60cabb35cb
Add additional catch block for exception too
2013-05-01 01:17:14 +05:30
Mridul Muralidharan
3b748ced22
Be more aggressive and defensive in all uses of SelectionKey in select loop
2013-05-01 00:30:30 +05:30
Mridul Muralidharan
0f45477be1
Change indentation
2013-05-01 00:10:02 +05:30
Mridul Muralidharan
538614acfe
Be more aggressive and defensive in select also
2013-05-01 00:05:32 +05:30
Mridul Muralidharan
48854e1dbf
If key is not valid, close connection
2013-04-30 23:59:33 +05:30
Matei Zaharia
f708dda81e
Merge pull request #585 from pwendell/listener-perf
[Fix SPARK-742] Task Metrics should not employ per-record timing by default
2013-04-30 07:51:40 -07:00
Mridul Muralidharan
e46d547ccd
Fix issues reported by Reynold
2013-04-30 16:15:56 +05:30
Reynold Xin
1055785a83
Allow specifying the shuffle write file buffer size. The default buffer
size is 8KB in FastBufferedOutputStream, which is too small and would
cause a lot of disk seeks.
2013-04-29 23:33:56 -07:00
Reynold Xin
7007201201
Added a shuffle block manager so it is easier in the future to
consolidate shuffle output files.
2013-04-29 23:07:03 -07:00
Reynold Xin
d3586ef438
Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager
Conflicts:
core/src/main/scala/spark/storage/DiskStore.scala
2013-04-29 15:44:18 -07:00
Patrick Wendell
016ce1fa9c
Using full package name for util
2013-04-29 12:02:27 -07:00
Patrick Wendell
540be6b154
Modified version of the fix which just removes all per-record tracking.
2013-04-29 11:32:07 -07:00
Patrick Wendell
224fbac061
Spark-742: TaskMetrics should not employ per-record timing.
This patch does three things:
1. Makes TimedIterator a trait with two implementations (one a no-op)
2. Makes the default behavior to use the no-op implementation
3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like
the trait doesn't really reduce complexity in any way.
In the future we can add other implementations, e.g. ones which perform sampling.
2013-04-29 11:13:43 -07:00
Shivaram Venkataraman
604d3bf56c
Rename partition class and add scala doc
2013-04-28 16:31:07 -07:00
Shivaram Venkataraman
15acd49f07
Actually rename classes to ZippedPartitions*
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman
6e84635ab9
Rename classes from MapZipped* to Zipped*
2013-04-28 15:58:40 -07:00
Shivaram Venkataraman
0cc6642b7c
Rename to zipPartitions and style changes
2013-04-28 05:11:03 -07:00
Shivaram Venkataraman
c9c4954d99
Add an interface to zip iterators of multiple RDDs
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
Matei Zaharia
6e6b5204ea
Create an empty directory when checkpointing a 0-partition RDD (fixes a
test failure on Hadoop 2.0)
2013-04-25 00:42:37 -07:00
Reynold Xin
ba6ffa6a5f
Allow the specification of a shuffle serializer in the read path (for
local block reads).
2013-04-24 17:38:07 -07:00
Reynold Xin
aa618ed2a2
Allow changing the serializer on a per shuffle basis.
2013-04-24 14:52:49 -07:00
Mridul Muralidharan
dd515ca3ee
Attempt at fixing merge conflict
2013-04-24 09:24:17 +05:30
Reynold Xin
31ce6c66d6
Added a BlockObjectWriter interface in block manager so ShuffleMapTask
doesn't need to build up an array buffer for each shuffle bucket.
2013-04-23 17:48:59 -07:00
koeninger
dfac0aa5c2
prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement.
2013-04-22 21:12:52 -05:00
Mridul Muralidharan
7acab3ab45
Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo
2013-04-22 08:01:13 +05:30
koeninger
b2a3f24dde
first attempt at an RDD to pull data from JDBC sources
2013-04-21 00:29:37 -05:00