Commit graph

1489 commits

Author SHA1 Message Date
Reynold Xin f9d40a5848 Added a comment in JdbcRDD for example usage. 2013-05-14 23:29:57 -07:00
Reynold Xin 404f9ff617 Added derby dependency to Maven pom files for the JDBC Java test. 2013-05-14 23:28:34 -07:00
Reynold Xin 81ad2fa331 Merge branch 'jdbc' of github.com:koeninger/spark
Conflicts:
	project/SparkBuild.scala
2013-05-14 23:12:00 -07:00
Imran Rashid 38d4b97c6d use threads classloader when deserializing task results; classnotfoundexception includes classloader 2013-05-14 22:32:14 -07:00
Imran Rashid d7d1da79d3 when akka starts, use akkas default classloader (current thread) 2013-05-14 22:32:09 -07:00
Cody Koeninger b16c4896f6 add test for JdbcRDD using embedded derby, per rxin suggestion 2013-05-14 23:44:04 -05:00
Matei Zaharia 016ac86830 Merge pull request #601 from rxin/emptyrdd-master
EmptyRDD (master branch 0.8)
2013-05-13 21:45:36 -07:00
Matei Zaharia 4b354e0a08 Merge pull request #589 from mridulm/master
Add support for instance local scheduling
2013-05-13 17:39:19 -07:00
Patrick Wendell 7f0833647b Capturing class name 2013-05-12 07:54:03 -07:00
Patrick Wendell 72b9c4cb6e Small fix 2013-05-11 23:53:50 -07:00
Patrick Wendell 1c15b85051 Removing import 2013-05-11 23:52:53 -07:00
Patrick Wendell 059ab88754 Changing technique to use same code path in all cases 2013-05-11 23:50:54 -07:00
Cody Koeninger 3da2305ed0 code cleanup per rxin comments 2013-05-11 23:59:07 -05:00
Josh Rosen 440719109e Throw exception if task result exceeds Akka frame size.
This partially addresses SPARK-747.
2013-05-11 19:17:13 -07:00
Patrick Wendell a5c28bb888 Removing unnecessary map 2013-05-11 14:20:39 -07:00
Patrick Wendell 0345954530 SPARK-738: Spark should detect and squash nonserializable exceptions 2013-05-11 14:17:09 -07:00
Mark Hamstra 6e6b3e0d7e Actually use the cleaned closure in foreachPartition 2013-05-10 13:02:34 -07:00
Mridul Muralidharan b05c9d22d7 Remove explicit hardcoding of yarn-standalone as args(0) if it is missing. 2013-05-09 18:49:12 +05:30
Imran Rashid 0ab818d508 fix linebreak 2013-05-09 00:38:59 -07:00
Reynold Xin 9cafacf32d Added test for Netty suite. 2013-05-07 22:42:37 -07:00
Reynold Xin 5d70ee4663 Cleaned up connection manager (moved many classes to their own files). 2013-05-07 22:42:15 -07:00
Reynold Xin 8388e8dd7a Minor style fix in DiskStore... 2013-05-07 18:40:35 -07:00
Reynold Xin 547dcbe494 Cleaned up Scala files in network/netty from Shane's PR. 2013-05-07 18:39:33 -07:00
Reynold Xin 9e64396ca4 Cleaned up the Java files from Shane's PR. 2013-05-07 18:30:54 -07:00
Reynold Xin 0e5cc30868 Cleaned up BlockManager and BlockFetcherIterator from Shane's PR. 2013-05-07 18:18:24 -07:00
Reynold Xin 8b79485171 Moved BlockFetcherIterator to its own file. 2013-05-07 17:02:32 -07:00
Reynold Xin 90577ada69 Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/DiskStore.scala
	project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Jey Kottalam aacca1b8a8 Update Maven build to Scala 2.9.3 2013-05-07 14:39:44 -07:00
Reynold Xin 64d4d2b036 Added tests for joins, cogroups, and unions for EmptyRDD. 2013-05-06 16:30:46 -07:00
Reynold Xin 0fd84965f6 Added EmptyRDD. 2013-05-06 15:40:34 -07:00
Imran Rashid 22a5063ae4 switch from separating appUI host & port to combining into just appUiUrl 2013-05-05 12:19:11 -07:00
Matei Zaharia 7af92f248b Merge pull request #597 from JoshRosen/webui-fixes
Two minor bug fixes for Spark Web UI
2013-05-04 22:29:17 -07:00
Reynold Xin 0a2bed356b Fixed flaky unpersist test in DistributedSuite. 2013-05-04 21:50:08 -07:00
Reynold Xin 62a077cd08 Merge branch 'unpersist-test' of github.com:shivaram/spark into blockmanager 2013-05-04 21:49:50 -07:00
Josh Rosen 42b1953c53 Fix SPARK-630: app details page shows finished executors as running. 2013-05-04 18:34:47 -07:00
Josh Rosen c0688451a6 Fix wrong closing tags in web UI HTML. 2013-05-04 18:34:46 -07:00
Josh Rosen d48e9fde01 Fix SPARK-629: weird number of cores in job details page. 2013-05-04 18:34:45 -07:00
Mridul Muralidharan 25198d7e9e Merge branch 'master' of github.com:mridulm/spark 2013-05-04 20:45:56 +05:30
Mridul Muralidharan 5b011d18d7 Merge from master 2013-05-04 20:41:27 +05:30
Mridul Muralidharan edb57c8331 Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach 2013-05-04 19:47:45 +05:30
Matei Zaharia 3bf2c868c3 Merge pull request #594 from shivaram/master
Add zip partitions to Java API
2013-05-03 18:27:30 -07:00
Shivaram Venkataraman 2274ad0786 Fix flaky test by changing catch and adding sleep 2013-05-03 16:35:35 -07:00
Shivaram Venkataraman bb8a434f9d Add zipPartitions to Java API. 2013-05-03 15:14:02 -07:00
Imran Rashid 6fae936088 applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui 2013-05-03 12:59:10 -07:00
Mridul Muralidharan ea2a6f91d3 pull from master 2013-05-04 00:35:59 +05:30
Reynold Xin 93091f6936 Merge branch 'master' of github.com:mesos/spark into blockmanager 2013-05-03 01:02:32 -07:00
Reynold Xin 2bc895a829 Updated according to Matei's code review comment. 2013-05-03 01:02:16 -07:00
Mridul Muralidharan 11589c39d9 Fix ZippedRDD as per Matei's suggestion 2013-05-03 12:23:30 +05:30
Matei Zaharia 6fe9d4e61e Merge pull request #592 from woggling/localdir-fix
Don't accept generated local directory names that can't be created
2013-05-02 21:33:56 -07:00
Matei Zaharia 538ee755b4 Merge pull request #581 from jerryshao/master
fix [SPARK-740] block manage UI throws exception when enabling Spark Streaming
2013-05-02 09:01:42 -07:00
Charles Reiss c847dd3da2 Don't accept generated temp directory names that can't be created successfully. 2013-05-01 23:19:10 -07:00
Reynold Xin 4a31877408 Added the unpersist api to JavaRDD. 2013-05-01 20:31:54 -07:00
Reynold Xin 98df9d2853 Added removeRdd function in BlockManager. 2013-05-01 20:17:09 -07:00
Mridul Muralidharan dfde9ce9dd comment out debug versions of checkHost, etc from Utils - which were used to test 2013-05-02 07:41:33 +05:30
Mridul Muralidharan 1b5aaeadc7 Integrate review comments 2 2013-05-02 07:30:06 +05:30
jerryshao c047f0e3ad filter out Spark streaming block RDD and sort RDDInfo with id 2013-05-02 09:48:32 +08:00
Mridul Muralidharan 609a817f52 Integrate review comments on pull request 2013-05-02 06:44:33 +05:30
Reynold Xin 204eb32e14 Changed the type of the persistentRdds hashmap back to
TimeStampedHashMap.
2013-05-01 16:14:58 -07:00
Reynold Xin 34637b97ec Added SparkContext.cleanup back. Not sure why it was removed before ... 2013-05-01 16:12:37 -07:00
Reynold Xin 3227ec8edd Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist.
Also updated unit tests to make sure they are properly testing for
concurrency.
2013-05-01 16:07:44 -07:00
harshars 8481562731 Merged Ram's commit on removing RDDs.
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2013-05-01 14:42:17 -07:00
Mridul Muralidharan 27764a00f4 Fix some npe introduced accidentally 2013-05-01 20:56:05 +05:30
Mridul Muralidharan d960e7e0f8 a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling.
b) Add some fixes to test code to ensure it passes (and fixes some other issues).

c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.
2013-05-01 20:24:00 +05:30
Matei Zaharia aa8fe1a209 Merge pull request #586 from mridulm/master
Pull request to address issues Reynold Xin reported
2013-04-30 22:30:18 -07:00
Reynold Xin dd7bef3147 Two minor fixes according to Ryan LeCompte's review. 2013-04-30 15:02:32 -07:00
Reynold Xin cea6174573 Merge branch 'master' of github.com:mesos/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
2013-04-30 13:28:35 -07:00
Mridul Muralidharan 60cabb35cb Add additional catch block for exception too 2013-05-01 01:17:14 +05:30
Mridul Muralidharan 3b748ced22 Be more aggressive and defensive in all uses of SelectionKey in select loop 2013-05-01 00:30:30 +05:30
Mridul Muralidharan 0f45477be1 Change indentation 2013-05-01 00:10:02 +05:30
Mridul Muralidharan 538614acfe Be more aggressive and defensive in select also 2013-05-01 00:05:32 +05:30
Mridul Muralidharan 48854e1dbf If key is not valid, close connection 2013-04-30 23:59:33 +05:30
Matei Zaharia f708dda81e Merge pull request #585 from pwendell/listener-perf
[Fix SPARK-742] Task Metrics should not employ per-record timing by default
2013-04-30 07:51:40 -07:00
Mridul Muralidharan e46d547ccd Fix issues reported by Reynold 2013-04-30 16:15:56 +05:30
Reynold Xin 1055785a83 Allow specifying the shuffle write file buffer size. The default buffer
size is 8KB in FastBufferedOutputStream, which is too small and would
cause a lot of disk seeks.
2013-04-29 23:33:56 -07:00
Reynold Xin 7007201201 Added a shuffle block manager so it is easier in the future to
consolidate shuffle output files.
2013-04-29 23:07:03 -07:00
Reynold Xin d3586ef438 Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/storage/DiskStore.scala
2013-04-29 15:44:18 -07:00
Patrick Wendell 016ce1fa9c Using full package name for util 2013-04-29 12:02:27 -07:00
Patrick Wendell 540be6b154 Modified version of the fix which just removes all per-record tracking. 2013-04-29 11:32:07 -07:00
Patrick Wendell 224fbac061 Spark-742: TaskMetrics should not employ per-record timing.
This patch does three things:

1. Makes TimedIterator a trait with two implementations (one a no-op)
2. Makes the default behavior to use the no-op implementation
3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like
   the trait doesn't really reduce complexity in any way.

In the future we can add other implementations, e.g. ones which perform sampling.
2013-04-29 11:13:43 -07:00
Matei Zaharia 0f45347c7b More unit test fixes 2013-04-28 22:29:27 -07:00
Matei Zaharia bce4089f22 Fix BlockManagerSuite to deal with clearing spark.hostPort 2013-04-28 22:23:48 -07:00
Matei Zaharia 68c07ea198 Merge pull request #582 from shivaram/master
Add zip partitions interface
2013-04-28 20:19:33 -07:00
Shivaram Venkataraman 604d3bf56c Rename partition class and add scala doc 2013-04-28 16:31:07 -07:00
Shivaram Venkataraman 15acd49f07 Actually rename classes to ZippedPartitions*
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman 6e84635ab9 Rename classes from MapZipped* to Zipped* 2013-04-28 15:58:40 -07:00
Mridul Muralidharan afee902443 Attempt to fix streaming test failures after yarn branch merge 2013-04-28 22:26:45 +05:30
Shivaram Venkataraman 0cc6642b7c Rename to zipPartitions and style changes 2013-04-28 05:11:03 -07:00
Shivaram Venkataraman c9c4954d99 Add an interface to zip iterators of multiple RDDs
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
Matei Zaharia 6e6b5204ea Create an empty directory when checkpointing a 0-partition RDD (fixes a
test failure on Hadoop 2.0)
2013-04-25 00:42:37 -07:00
Reynold Xin ba6ffa6a5f Allow the specification of a shuffle serializer in the read path (for
local block reads).
2013-04-24 17:38:07 -07:00
Reynold Xin aa618ed2a2 Allow changing the serializer on a per shuffle basis. 2013-04-24 14:52:49 -07:00
Mridul Muralidharan dd515ca3ee Attempt at fixing merge conflict 2013-04-24 09:24:17 +05:30
Reynold Xin 31ce6c66d6 Added a BlockObjectWriter interface in block manager so ShuffleMapTask
doesn't need to build up an array buffer for each shuffle bucket.
2013-04-23 17:48:59 -07:00
Mridul Muralidharan 8faf5c51c3 Patch from Thomas Graves to improve the YARN Client, and move to more production ready hadoop yarn branch 2013-04-24 02:31:57 +05:30
koeninger dfac0aa5c2 prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement. 2013-04-22 21:12:52 -05:00
Mridul Muralidharan 7acab3ab45 Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo 2013-04-22 08:01:13 +05:30
koeninger b2a3f24dde first attempt at an RDD to pull data from JDBC sources 2013-04-21 00:29:37 -05:00
Mridul Muralidharan ac2e8e8720 Add some basic documentation 2013-04-19 00:13:19 +05:30
Andrew xia 8436bd5d4a remove TaskSetQueueManager and update code style 2013-04-19 02:17:22 +08:00
Andrew xia e0603d7e8b refactor the Schedulable interface and add unit test for SchedulingAlgorithm 2013-04-18 13:13:54 +08:00
Mridul Muralidharan 5ee2f5c483 Cache pattern, add (commented out) alternatives for check* apis 2013-04-17 23:13:34 +05:30
Mridul Muralidharan f07961060d Add a small note on spark.tasks.schedule.aggression 2013-04-17 23:13:02 +05:30
Mridul Muralidharan 02dffd2eb0 Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though default of 10 is maintained 2013-04-17 05:52:57 +05:30
Mridul Muralidharan a402b23bcd Fudge order of classpath - so that our jars take precedence over what is in CLASSPATH variable. Sounds logical, hope there is no issue cos of it 2013-04-17 05:52:00 +05:30
Mridul Muralidharan bcdde331c3 Move from master to driver 2013-04-17 04:12:18 +05:30
Mridul Muralidharan ad80f68eb5 remove spurious debug statements 2013-04-16 22:15:34 +05:30
Mridul Muralidharan f7969f72ee Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example) 2013-04-16 21:51:38 +05:30
Mridul Muralidharan 323ab8ff3b Scala does not prevent variable shadowing ! Sick error due to it ... 2013-04-16 17:05:10 +05:30
shane-huang b493f55a4f fix a bug in netty Block Fetcher
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-16 10:01:01 +08:00
Mridul Muralidharan 59c380d69a Fix npe 2013-04-16 03:29:38 +05:30
Mridul Muralidharan dd2b64ec97 Fix bug with atomic update 2013-04-16 03:19:24 +05:30
Mridul Muralidharan 5540ab8243 Use hostname instead of hostport for executor, fix creation of workdir 2013-04-16 02:57:43 +05:30
Mridul Muralidharan eb7e95e833 Commit job to persist files 2013-04-16 02:56:36 +05:30
Matei Zaharia a64c107449 Make ShuffledRDD.prev transient 2013-04-15 16:41:51 -04:00
Mridul Muralidharan 19652a44be Fix issue with FileSuite failing 2013-04-15 19:16:36 +05:30
Mridul Muralidharan 54b3d45b81 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:26:50 +05:30
Mridul Muralidharan d90d2af103 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:12:11 +05:30
Matei Zaharia c35d530bcf Fix compile error 2013-04-13 12:43:12 -04:00
Andrew Ash 29d3440efb Add details when BlockManager heartbeats time out
Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs

Before:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats

After:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms
2013-04-11 01:54:02 -03:00
Andrew xia 2f883c515f Continue to update code for Scala code style
1.refactor braces for "class" "if" "while" "for" "match"
2.make code lines less than 100
3.refactor class parameter and extends definition
2013-04-09 13:02:50 +08:00
Matei Zaharia 65caa8f711 Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
	docs/_config.yml
	project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Matei Zaharia 054feb6448 Fixed a bug with zip 2013-04-07 21:15:21 -04:00
Matei Zaharia b5900d47b1 Fix compile warning 2013-04-07 20:55:42 -04:00
Matei Zaharia 6962d40b44 Fix deprecated warning 2013-04-07 20:27:33 -04:00
Mridul Muralidharan 6798a09df8 Add support for building against hadoop2-yarn : adding new maven profile for it 2013-04-07 17:47:38 +05:30
shane-huang df47b40b76 Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package

Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
Andrew xia 2b373dd07a add properties default value null to fix sbt/sbt test errors 2013-04-02 12:11:14 +08:00
Mark Hamstra e215f67923 Correct sense of 'filter out' in comment. 2013-03-31 08:00:13 -07:00
Mark Hamstra 8bcdc64005 Fixed broken filter in getWritableClass[T] 2013-03-30 22:09:52 -07:00
Matei Zaharia 9831bc1a09 Merge pull request #539 from cgrothaus/fix-webui-workdirpath
Bugfix: WorkerWebUI must respect workDirPath from Worker
2013-03-29 22:16:22 -07:00
Matei Zaharia 3cc8ab6e29 Merge pull request #541 from stephenh/shufflecoalesce
Add a shuffle parameter to coalesce.
2013-03-29 22:14:07 -07:00
Andrew xia 1a28f92711 change some typo and some spacing 2013-03-29 08:34:28 +08:00
Andrew xia def3d1c84a 1.remove redundant spacing in source code
2.replace get/set functions with val and var definition
2013-03-29 08:20:35 +08:00
Jey Kottalam bc8ba222ff Bump development version to 0.8.0 2013-03-28 15:42:01 -07:00
Holden Karau f5df729b12 Explicitly catch all throwables (warning in 2.10) 2013-03-24 16:15:32 -07:00
Stephen Haberman dd854d5b9f Use Boolean in the Java API, and != for assert. 2013-03-23 11:49:45 -05:00
Stephen Haberman 4ca273edc4 Merge branch 'master' into shufflecoalesce
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-03-23 11:45:45 -05:00
Matei Zaharia b8949cab88 Merge pull request #505 from stephenh/volatile
Make Executor fields volatile since they're read from the thread pool.
2013-03-23 07:19:34 -07:00
Matei Zaharia fd53f2fc7b Merge pull request #510 from markhamstra/WithThing
mapWith, flatMapWith and filterWith
2013-03-23 07:13:21 -07:00
Andrew xia d1d9bdaabe Just update typo and comments 2013-03-23 07:25:30 +08:00
Stephen Haberman 00170eb0b9 Fix are/our typo. 2013-03-22 12:59:08 -05:00
Stephen Haberman 1c67c7dfd1 Add a shuffle parameter to coalesce.
This is useful for when you want just 1 output file (part-00000) but
still want the upstream RDD to be computed in parallel.
2013-03-22 08:54:44 -05:00
Christoph Grothaus 445f387ef4 Bugfix: WorkerWebUI must respect workDirPath from Worker 2013-03-22 11:08:40 +01:00
Matei Zaharia 35588490cb Merge pull request #538 from rxin/cogroup
Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.
2013-03-20 19:27:47 -07:00
Stephen Haberman 4f4215311a Merge branch 'master' into volatile 2013-03-20 15:37:10 -05:00
Matei Zaharia b812e6b7bb Merge pull request #526 from markhamstra/foldByKey
Add foldByKey
2013-03-20 11:21:02 -07:00
Reynold Xin d48ee7e55e Merge branch 'master' of github.com:mesos/spark into cogroup 2013-03-20 14:00:28 +08:00
Reynold Xin 00a11304fd Added mapSideCombine flag to CoGroupedRDD. Added unit test for
CoGroupedRDD.
2013-03-20 13:49:51 +08:00
Matei Zaharia 945d1e720e Merge pull request #536 from sasurfer/master
CoalescedRDD for many partitions
2013-03-19 21:59:06 -07:00
Matei Zaharia 1cbbe94ac1 Merge pull request #534 from stephenh/removetrycatch
Remove try/catch block that can't be hit.
2013-03-19 21:34:34 -07:00