1380 commits

Author SHA1 Message Date
Reynold Xin 2bc895a829 Updated according to Matei's code review comment. 2013-05-03 01:02:16 -07:00
Mridul Muralidharan 11589c39d9 Fix ZippedRDD as part of Matei's suggestion 2013-05-03 12:23:30 +05:30
Matei Zaharia 6fe9d4e61e Merge pull request #592 from woggling/localdir-fix
Don't accept generated local directory names that can't be created
2013-05-02 21:33:56 -07:00
Matei Zaharia 538ee755b4 Merge pull request #581 from jerryshao/master
fix [SPARK-740] block manager UI throws exception when enabling Spark Streaming
2013-05-02 09:01:42 -07:00
Charles Reiss c847dd3da2 Don't accept generated temp directory names that can't be created successfully. 2013-05-01 23:19:10 -07:00
Reynold Xin 4a31877408 Added the unpersist api to JavaRDD. 2013-05-01 20:31:54 -07:00
Reynold Xin 98df9d2853 Added removeRdd function in BlockManager. 2013-05-01 20:17:09 -07:00
Mridul Muralidharan dfde9ce9dd comment out debug versions of checkHost, etc from Utils - which were used to test 2013-05-02 07:41:33 +05:30
Mridul Muralidharan 1b5aaeadc7 Integrate review comments 2 2013-05-02 07:30:06 +05:30
jerryshao c047f0e3ad filter out Spark streaming block RDD and sort RDDInfo with id 2013-05-02 09:48:32 +08:00
Mridul Muralidharan 609a817f52 Integrate review comments on pull request 2013-05-02 06:44:33 +05:30
Reynold Xin 204eb32e14 Changed the type of the persistentRdds hashmap back to
TimeStampedHashMap.
2013-05-01 16:14:58 -07:00
Reynold Xin 34637b97ec Added SparkContext.cleanup back. Not sure why it was removed before ... 2013-05-01 16:12:37 -07:00
Reynold Xin 3227ec8edd Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist.
Also updated unit tests to make sure they are properly testing for
concurrency.
2013-05-01 16:07:44 -07:00
harshars 8481562731 Merged Ram's commit on removing RDDs.
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2013-05-01 14:42:17 -07:00
Mridul Muralidharan 27764a00f4 Fix some npe introduced accidentally 2013-05-01 20:56:05 +05:30
Mridul Muralidharan d960e7e0f8 a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling.
b) Add some fixes to test code to ensure it passes (and fixes some other issues).

c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.
2013-05-01 20:24:00 +05:30
Matei Zaharia aa8fe1a209 Merge pull request #586 from mridulm/master
Pull request to address issues Reynold Xin reported
2013-04-30 22:30:18 -07:00
Reynold Xin dd7bef3147 Two minor fixes according to Ryan LeCompte's review. 2013-04-30 15:02:32 -07:00
Reynold Xin cea6174573 Merge branch 'master' of github.com:mesos/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
2013-04-30 13:28:35 -07:00
Mridul Muralidharan 60cabb35cb Add additional catch block for exception too 2013-05-01 01:17:14 +05:30
Mridul Muralidharan 3b748ced22 Be more aggressive and defensive in all uses of SelectionKey in select loop 2013-05-01 00:30:30 +05:30
Mridul Muralidharan 0f45477be1 Change indentation 2013-05-01 00:10:02 +05:30
Mridul Muralidharan 538614acfe Be more aggressive and defensive in select also 2013-05-01 00:05:32 +05:30
Mridul Muralidharan 48854e1dbf If key is not valid, close connection 2013-04-30 23:59:33 +05:30
Matei Zaharia f708dda81e Merge pull request #585 from pwendell/listener-perf
[Fix SPARK-742] Task Metrics should not employ per-record timing by default
2013-04-30 07:51:40 -07:00
Mridul Muralidharan e46d547ccd Fix issues reported by Reynold 2013-04-30 16:15:56 +05:30
Reynold Xin 1055785a83 Allow specifying the shuffle write file buffer size. The default buffer
size is 8KB in FastBufferedOutputStream, which is too small and would
cause a lot of disk seeks.
2013-04-29 23:33:56 -07:00
Reynold Xin 7007201201 Added a shuffle block manager so it is easier in the future to
consolidate shuffle output files.
2013-04-29 23:07:03 -07:00
Reynold Xin d3586ef438 Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/storage/DiskStore.scala
2013-04-29 15:44:18 -07:00
Patrick Wendell 016ce1fa9c Using full package name for util 2013-04-29 12:02:27 -07:00
Patrick Wendell 540be6b154 Modified version of the fix which just removes all per-record tracking. 2013-04-29 11:32:07 -07:00
Patrick Wendell 224fbac061 Spark-742: TaskMetrics should not employ per-record timing.
This patch does three things:

1. Makes TimedIterator a trait with two implementations (one a no-op)
2. Makes the default behavior to use the no-op implementation
3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like
   the trait doesn't really reduce complexity in any way.

In the future we can add other implementations, e.g. ones which perform sampling.
2013-04-29 11:13:43 -07:00
Matei Zaharia 0f45347c7b More unit test fixes 2013-04-28 22:29:27 -07:00
Matei Zaharia bce4089f22 Fix BlockManagerSuite to deal with clearing spark.hostPort 2013-04-28 22:23:48 -07:00
Matei Zaharia 68c07ea198 Merge pull request #582 from shivaram/master
Add zip partitions interface
2013-04-28 20:19:33 -07:00
Shivaram Venkataraman 604d3bf56c Rename partition class and add scala doc 2013-04-28 16:31:07 -07:00
Shivaram Venkataraman 15acd49f07 Actually rename classes to ZippedPartitions*
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman 6e84635ab9 Rename classes from MapZipped* to Zipped* 2013-04-28 15:58:40 -07:00
Mridul Muralidharan afee902443 Attempt to fix streaming test failures after yarn branch merge 2013-04-28 22:26:45 +05:30
Shivaram Venkataraman 0cc6642b7c Rename to zipPartitions and style changes 2013-04-28 05:11:03 -07:00
Shivaram Venkataraman c9c4954d99 Add an interface to zip iterators of multiple RDDs
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
Matei Zaharia 6e6b5204ea Create an empty directory when checkpointing a 0-partition RDD (fixes a
test failure on Hadoop 2.0)
2013-04-25 00:42:37 -07:00
Reynold Xin ba6ffa6a5f Allow the specification of a shuffle serializer in the read path (for
local block reads).
2013-04-24 17:38:07 -07:00
Reynold Xin aa618ed2a2 Allow changing the serializer on a per shuffle basis. 2013-04-24 14:52:49 -07:00
Mridul Muralidharan dd515ca3ee Attempt at fixing merge conflict 2013-04-24 09:24:17 +05:30
Reynold Xin 31ce6c66d6 Added a BlockObjectWriter interface in block manager so ShuffleMapTask
doesn't need to build up an array buffer for each shuffle bucket.
2013-04-23 17:48:59 -07:00
Mridul Muralidharan 8faf5c51c3 Patch from Thomas Graves to improve the YARN Client, and move to more production ready hadoop yarn branch 2013-04-24 02:31:57 +05:30
koeninger dfac0aa5c2 prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement. 2013-04-22 21:12:52 -05:00
Mridul Muralidharan 7acab3ab45 Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo 2013-04-22 08:01:13 +05:30
koeninger b2a3f24dde first attempt at an RDD to pull data from JDBC sources 2013-04-21 00:29:37 -05:00
Mridul Muralidharan ac2e8e8720 Add some basic documentation 2013-04-19 00:13:19 +05:30
Mridul Muralidharan 5ee2f5c483 Cache pattern, add (commented out) alternatives for check* apis 2013-04-17 23:13:34 +05:30
Mridul Muralidharan f07961060d Add a small note on spark.tasks.schedule.aggression 2013-04-17 23:13:02 +05:30
Mridul Muralidharan 02dffd2eb0 Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though default of 10 is maintained 2013-04-17 05:52:57 +05:30
Mridul Muralidharan a402b23bcd Fudge order of classpath - so that our jars take precedence over what is in CLASSPATH variable. Sounds logical, hope there is no issue cos of it 2013-04-17 05:52:00 +05:30
Mridul Muralidharan bcdde331c3 Move from master to driver 2013-04-17 04:12:18 +05:30
Mridul Muralidharan ad80f68eb5 remove spurious debug statements 2013-04-16 22:15:34 +05:30
Mridul Muralidharan f7969f72ee Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example) 2013-04-16 21:51:38 +05:30
Mridul Muralidharan 323ab8ff3b Scala does not prevent variable shadowing ! Sick error due to it ... 2013-04-16 17:05:10 +05:30
shane-huang b493f55a4f fix a bug in netty Block Fetcher
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-16 10:01:01 +08:00
Mridul Muralidharan 59c380d69a Fix npe 2013-04-16 03:29:38 +05:30
Mridul Muralidharan dd2b64ec97 Fix bug with atomic update 2013-04-16 03:19:24 +05:30
Mridul Muralidharan 5540ab8243 Use hostname instead of hostport for executor, fix creation of workdir 2013-04-16 02:57:43 +05:30
Mridul Muralidharan eb7e95e833 Commit job to persist files 2013-04-16 02:56:36 +05:30
Matei Zaharia a64c107449 Make ShuffledRDD.prev transient 2013-04-15 16:41:51 -04:00
Mridul Muralidharan 19652a44be Fix issue with FileSuite failing 2013-04-15 19:16:36 +05:30
Mridul Muralidharan 54b3d45b81 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:26:50 +05:30
Mridul Muralidharan d90d2af103 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:12:11 +05:30
Matei Zaharia c35d530bcf Fix compile error 2013-04-13 12:43:12 -04:00
Andrew Ash 29d3440efb Add details when BlockManager heartbeats time out
Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs

Before:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats

After:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms
2013-04-11 01:54:02 -03:00
Matei Zaharia 65caa8f711 Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
	docs/_config.yml
	project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Matei Zaharia 054feb6448 Fixed a bug with zip 2013-04-07 21:15:21 -04:00
Matei Zaharia b5900d47b1 Fix compile warning 2013-04-07 20:55:42 -04:00
Matei Zaharia 6962d40b44 Fix deprecated warning 2013-04-07 20:27:33 -04:00
Mridul Muralidharan 6798a09df8 Add support for building against hadoop2-yarn : adding new maven profile for it 2013-04-07 17:47:38 +05:30
shane-huang df47b40b76 Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package

Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
Mark Hamstra e215f67923 Correct sense of 'filter out' in comment. 2013-03-31 08:00:13 -07:00
Mark Hamstra 8bcdc64005 Fixed broken filter in getWritableClass[T] 2013-03-30 22:09:52 -07:00
Matei Zaharia 9831bc1a09 Merge pull request #539 from cgrothaus/fix-webui-workdirpath
Bugfix: WorkerWebUI must respect workDirPath from Worker
2013-03-29 22:16:22 -07:00
Matei Zaharia 3cc8ab6e29 Merge pull request #541 from stephenh/shufflecoalesce
Add a shuffle parameter to coalesce.
2013-03-29 22:14:07 -07:00
Jey Kottalam bc8ba222ff Bump development version to 0.8.0 2013-03-28 15:42:01 -07:00
Holden Karau f5df729b12 Explicitly catch all throwables (warning in 2.10) 2013-03-24 16:15:32 -07:00
Stephen Haberman dd854d5b9f Use Boolean in the Java API, and != for assert. 2013-03-23 11:49:45 -05:00
Stephen Haberman 4ca273edc4 Merge branch 'master' into shufflecoalesce
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-03-23 11:45:45 -05:00
Matei Zaharia b8949cab88 Merge pull request #505 from stephenh/volatile
Make Executor fields volatile since they're read from the thread pool.
2013-03-23 07:19:34 -07:00
Matei Zaharia fd53f2fc7b Merge pull request #510 from markhamstra/WithThing
mapWith, flatMapWith and filterWith
2013-03-23 07:13:21 -07:00
Stephen Haberman 00170eb0b9 Fix are/our typo. 2013-03-22 12:59:08 -05:00
Stephen Haberman 1c67c7dfd1 Add a shuffle parameter to coalesce.
This is useful for when you want just 1 output file (part-00000) but
still want the upstream RDD to be computed in parallel.
2013-03-22 08:54:44 -05:00
Christoph Grothaus 445f387ef4 Bugfix: WorkerWebUI must respect workDirPath from Worker 2013-03-22 11:08:40 +01:00
Matei Zaharia 35588490cb Merge pull request #538 from rxin/cogroup
Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.
2013-03-20 19:27:47 -07:00
Stephen Haberman 4f4215311a Merge branch 'master' into volatile 2013-03-20 15:37:10 -05:00
Matei Zaharia b812e6b7bb Merge pull request #526 from markhamstra/foldByKey
Add foldByKey
2013-03-20 11:21:02 -07:00
Reynold Xin d48ee7e55e Merge branch 'master' of github.com:mesos/spark into cogroup 2013-03-20 14:00:28 +08:00
Reynold Xin 00a11304fd Added mapSideCombine flag to CoGroupedRDD. Added unit test for
CoGroupedRDD.
2013-03-20 13:49:51 +08:00
Matei Zaharia 945d1e720e Merge pull request #536 from sasurfer/master
CoalescedRDD for many partitions
2013-03-19 21:59:06 -07:00
Matei Zaharia 1cbbe94ac1 Merge pull request #534 from stephenh/removetrycatch
Remove try/catch block that can't be hit.
2013-03-19 21:34:34 -07:00
Andrey Kouznetsov bd167f83b0 call setConf from input format if it is Configurable 2013-03-19 17:15:15 +04:00
Giovanni Delussu aceae029f7 CoalescedRDD changed to work with a big number of partitions both in the original and the new coalesced RDD.
The limitation was in the range that Scala.Int can represent.
2013-03-19 11:25:45 +01:00
Stephen Haberman fb34967815 Remove try/catch block that can't be hit. 2013-03-18 01:55:50 -05:00