ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Marcelo Vanzin	aa3a1276f9	[SPARK-23103][CORE] Ensure correct sort order for negative values in LevelDB. The code was sorting "0" as "less than" negative values, which is a little wrong. Fix is simple, most of the changes are the added test and related cleanup. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #20284 from vanzin/SPARK-23103.	2018-01-19 13:32:20 -06:00
gatorsmile	651f76153f	[SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT ## What changes were proposed in this pull request? This patch bumps the master branch version to `2.4.0-SNAPSHOT`. ## How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> Closes #20222 from gatorsmile/bump24.	2018-01-13 00:37:59 +08:00
Marcelo Vanzin	1c70da3bfb	[SPARK-20657][CORE] Speed up rendering of the stages page. There are two main changes to speed up rendering of the tasks list when rendering the stage page. The first one makes the code only load the tasks being shown in the current page of the tasks table, and information related to only those tasks. One side-effect of this change is that the graph that shows task-related events now only shows events for the tasks in the current page, instead of the previously hardcoded limit of "events for the first 1000 tasks". That ends up helping with readability, though. To make sorting efficient when using a disk store, the task wrapper was extended to include many new indices, one for each of the sortable columns in the UI, and metrics for which quantiles are calculated. The second changes the way metric quantiles are calculated for stages. Instead of using the "Distribution" class to process data for all task metrics, which requires scanning all tasks of a stage, the code now uses the KVStore "skip()" functionality to only read tasks that contain interesting information for the quantiles that are desired. This is still not cheap; because there are many metrics that the UI and API track, the code needs to scan the index for each metric to gather the information. Savings come mainly from skipping deserialization when using the disk store, but the in-memory code also seems to be faster than before (most probably because of other changes in this patch). To make subsequent calls faster, some quantiles are cached in the status store. This makes UIs much faster after the first time a stage has been loaded. With the above changes, a lot of code in the UI layer could be simplified. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #20013 from vanzin/SPARK-20657.	2018-01-11 19:41:48 +08:00
Sean Owen	c284c4e1f6	[MINOR] Fix a bunch of typos	2018-01-02 07:10:19 +09:00
Marcelo Vanzin	0e9a750a8d	[SPARK-20643][CORE] Add listener implementation to collect app state. The initial listener code is based on the existing JobProgressListener (and others), and tries to mimic their behavior as much as possible. The change also includes some minor code movement so that some types and methods from the initial history server code code can be reused. The code introduces a few mutable versions of public API types, used internally, to make it easier to update information without ugly copy methods, and also to make certain updates cheaper. Note the code here is not 100% correct. This is meant as a building ground for the UI integration in the next milestones. As different parts of the UI are ported, fixes will be made to the different parts of this code to account for the needed behavior. I also added annotations to API types so that Jackson is able to correctly deserialize options, sequences and maps that store primitive types. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #19383 from vanzin/SPARK-20643.	2017-10-26 11:05:16 -05:00
Marcelo Vanzin	74daf622de	[SPARK-20642][CORE] Store FsHistoryProvider listing data in a KVStore. The application listing is still generated from event logs, but is now stored in a KVStore instance. By default an in-memory store is used, but a new config allows setting a local disk path to store the data, in which case a LevelDB store will be created. The provider stores things internally using the public REST API types; I believe this is better going forward since it will make it easier to get rid of the internal history server API which is mostly redundant at this point. I also added a finalizer to LevelDBIterator, to make sure that resources are eventually released. This helps when code iterates but does not exhaust the iterator, thus not triggering the auto-close code. HistoryServerSuite was modified to not re-start the history server unnecessarily; this makes the json validation tests run more quickly. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #18887 from vanzin/SPARK-20642.	2017-09-27 20:33:41 +08:00
Armin	b6ef1f57bc	[SPARK-21970][CORE] Fix Redundant Throws Declarations in Java Codebase ## What changes were proposed in this pull request? 1. Removing all redundant throws declarations from Java codebase. 2. Removing dead code made visible by this from `ShuffleExternalSorter#closeAndGetSpills` ## How was this patch tested? Build still passes. Author: Armin <me@obrown.io> Closes #19182 from original-brownbear/SPARK-21970.	2017-09-13 14:04:26 +01:00
Sean Owen	de7af295c2	[MINOR][BUILD] Fix build warnings and Java lint errors ## What changes were proposed in this pull request? Fix build warnings and Java lint errors. This just helps a bit in evaluating (new) warnings in another PR I have open. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #19051 from srowen/JavaWarnings.	2017-08-25 16:07:13 +01:00
Marcelo Vanzin	2c1bfb497f	[SPARK-21671][CORE] Move kvstore to "util" sub-package, add private annotation. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #18886 from vanzin/SPARK-21671.	2017-08-08 14:33:27 -07:00
Marcelo Vanzin	979bf946d5	[SPARK-20655][CORE] In-memory KVStore implementation. This change adds an in-memory implementation of KVStore that can be used by the live UI. The implementation is not fully optimized, neither for speed nor space, but should be fast enough for using in the listener bus. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #18395 from vanzin/SPARK-20655.	2017-08-08 11:02:54 -07:00
Dongjoon Hyun	ecc5631351	[MINOR][BUILD] Fix Java linter errors ## What changes were proposed in this pull request? This PR cleans up a few Java linter errors for Apache Spark 2.2 release. ## How was this patch tested? ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` We can check the result at Travis CI, [here](https://travis-ci.org/dongjoon-hyun/spark/builds/244297894). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #18345 from dongjoon-hyun/fix_lint_java_2.	2017-06-19 20:17:54 +01:00
Marcelo Vanzin	0cba495120	[SPARK-20641][CORE] Add key-value store abstraction and LevelDB implementation. This change adds an abstraction and LevelDB implementation for a key-value store that will be used to store UI and SHS data. The interface is described in KVStore.java (see javadoc). Specifics of the LevelDB implementation are discussed in the javadocs of both LevelDB.java and LevelDBTypeInfo.java. Included also are a few small benchmarks just to get some idea of latency. Because they're too slow for regular unit test runs, they're disabled by default. Tested with the included unit tests, and also as part of the overall feature implementation (including running SHS with hundreds of apps). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #17902 from vanzin/shs-ng/M1.	2017-06-06 13:39:10 -05:00

12 commits