ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Cheng Hao	27bccc5ea9	[SPARK-5202] [SQL] Add hql variable substitution support https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution This is a blocking issue for CLI users; it affects existing hql scripts from Hive. Author: Cheng Hao <hao.cheng@intel.com> Closes #4003 from chenghao-intel/substitution and squashes the following commits: bb41fd6 [Cheng Hao] revert the removed the implicit conversion af7c31a [Cheng Hao] add hql variable substitution support	2015-01-21 17:34:18 -08:00
Davies Liu	9bad062268	[SPARK-5355] make SparkConf thread-safe SparkConf is not thread-safe, but it is accessed by many threads. getAll() could return only part of the configs if another thread is accessing it. This PR changes SparkConf.settings to a thread-safe TrieMap. Author: Davies Liu <davies@databricks.com> Closes #4143 from davies/safe-conf and squashes the following commits: f8fa1cf [Davies Liu] change to TrieMap a1d769a [Davies Liu] make SparkConf thread-safe	2015-01-21 16:51:54 -08:00
wangfei	3be2a887bf	[SPARK-4984][CORE][WEBUI] Adding a pop-up containing the full job description when it is very long In some cases the job description is very long, for example a long SQL statement (see #3718). This PR adds a pop-up for the job description when it is long. ![image](https://cloud.githubusercontent.com/assets/7018048/5847400/c757cbbc-a207-11e4-891f-528821c2e68d.png) ![image](https://cloud.githubusercontent.com/assets/7018048/5847409/d434b2b4-a207-11e4-8813-03a74b43d766.png) Author: wangfei <wangfei1@huawei.com> Closes #3819 from scwf/popup-descrip-ui and squashes the following commits: ba02b83 [wangfei] address comments a7c5e7b [wangfei] spot that it's been truncated fbf6162 [wangfei] Merge branch 'master' into popup-descrip-ui 0bca96d [wangfei] remove no use val 4b55c3b [wangfei] fix style issue 353c6f4 [wangfei] pop up the description of job with a styled read-only text form field	2015-01-21 15:27:42 -08:00
Cheng Lian	ba19689fe7	[SQL] [Minor] Remove deprecated parquet tests This PR removes the deprecated `ParquetQuerySuite`, renames `ParquetQuerySuite2` to `ParquetQuerySuite`, and refactors changes introduced in #4115 to `ParquetFilterSuite`. It is a follow-up of #3644. Notice that test cases in the old `ParquetQuerySuite` have already been well covered by other test suites introduced in #3644. Author: Cheng Lian <lian@databricks.com> Closes #4116 from liancheng/remove-deprecated-parquet-tests and squashes the following commits: f73b8f9 [Cheng Lian] Removes deprecated Parquet test suite	2015-01-21 14:38:10 -08:00
Josh Rosen	b328ac6c8c	Revert "[SPARK-5244] [SQL] add coalesce() in sql parser" This reverts commit `812d3679f5`.	2015-01-21 14:27:43 -08:00
Cheng Hao	8361078efa	[SPARK-5009] [SQL] Long keyword support in SQL Parsers * `SqlLexical.allCaseVersions` causes a `StackOverflowException` if the keyword is too long; the patch fixes that by normalizing all of the keywords in `SqlLexical`. * It also makes a unified SparkSQLParser for sharing the common code. Author: Cheng Hao <hao.cheng@intel.com> Closes #3926 from chenghao-intel/long_keyword and squashes the following commits: 686660f [Cheng Hao] Support Long Keyword and Refactor the SQLParsers	2015-01-21 13:05:56 -08:00
Daoyuan Wang	812d3679f5	[SPARK-5244] [SQL] add coalesce() in sql parser Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #4040 from adrian-wang/coalesce and squashes the following commits: 0ac8e8f [Daoyuan Wang] add coalesce() in sql parser	2015-01-21 12:59:41 -08:00
Kenji Kikushima	3ee3ab592e	[SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop I looked into GraphGenerators#chooseCell, and found that chooseCell can't generate more edges than pow(2, (2 * (log2(numVertices)-1))) to make a Power-law graph. (Ex. numVertices:4 upperbound:4, numVertices:8 upperbound:16, numVertices:16 upperbound:64) If we request more edges than the upper bound, rmatGraph falls into an infinite loop. So, how about adding an argument validation? Author: Kenji Kikushima <kikushima.kenji@lab.ntt.co.jp> Closes #3950 from kj-ki/SPARK-5064 and squashes the following commits: 4ee18c7 [Ankur Dave] Reword error message and add unit test d760bc7 [Kenji Kikushima] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop.	2015-01-21 12:36:03 -08:00
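The upper bound quoted in SPARK-5064 is plain arithmetic, so it can be checked outside Spark. The following is a minimal Python sketch, not the GraphX code: the function names `rmat_edge_upper_bound` and `check_num_edges` are hypothetical, and the sketch assumes `numVertices` is a power of two, as in the examples given in the commit message.

```python
def rmat_edge_upper_bound(num_vertices: int) -> int:
    """Bound from the commit message: 2 ** (2 * (log2(numVertices) - 1))."""
    # Assumption of this sketch: numVertices is a power of two, as in the
    # examples quoted in the commit message (4, 8, 16).
    if num_vertices < 2 or (num_vertices & (num_vertices - 1)) != 0:
        raise ValueError("expected a power of two >= 2")
    log2_vertices = num_vertices.bit_length() - 1  # exact log2 for powers of two
    return 2 ** (2 * (log2_vertices - 1))


def check_num_edges(num_vertices: int, num_edges: int) -> None:
    """Reject a request above the bound up front instead of looping forever."""
    bound = rmat_edge_upper_bound(num_vertices)
    if num_edges > bound:
        raise ValueError(
            f"numEdges must be <= {bound} for numVertices={num_vertices}, "
            f"got {num_edges}"
        )
```

Validating the argument before generation starts turns a silent hang into an immediate, explainable error, which is the behaviour the commit message asks for.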
nate.crosswhite	7450a992b3	[SPARK-4749] [mllib]: Allow initializing KMeans clusters using a seed This implements the functionality for SPARK-4749 and provides unit tests in Scala and PySpark Author: nate.crosswhite <nate.crosswhite@stresearch.com> Author: nxwhite-str <nxwhite-str@users.noreply.github.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3610 from nxwhite-str/master and squashes the following commits: a2ebbd3 [nxwhite-str] Merge pull request #1 from mengxr/SPARK-4749-kmeans-seed 7668124 [Xiangrui Meng] minor updates f8d5928 [nate.crosswhite] Addressing PR issues 277d367 [nate.crosswhite] Merge remote-tracking branch 'upstream/master' 9156a57 [nate.crosswhite] Merge remote-tracking branch 'upstream/master' 5d087b4 [nate.crosswhite] Adding KMeans train with seed and Scala unit test 616d111 [nate.crosswhite] Merge remote-tracking branch 'upstream/master' 35c1884 [nate.crosswhite] Add kmeans initial seed to pyspark API	2015-01-21 10:32:10 -08:00
Reza Zadeh	aa1e22b17b	[MLlib] [SPARK-5301] Missing conversions and operations on IndexedRowMatrix and CoordinateMatrix * Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should be there) * IndexedRowMatrix should be convertible to CoordinateMatrix (conversion added) Tests for both added. Author: Reza Zadeh <reza@databricks.com> Closes #4089 from rezazadeh/matutils and squashes the following commits: ec5238b [Reza Zadeh] Array -> Iterator to avoid temp array 3ce0b5d [Reza Zadeh] Array -> Iterator bbc907a [Reza Zadeh] Use 'i' for index, and zipWithIndex cb10ae5 [Reza Zadeh] remove unnecessary import a7ae048 [Reza Zadeh] Missing linear algebra utilities	2015-01-21 09:48:38 -08:00
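Why a transpose is cheap in coordinate format can be seen with a small sketch. This is plain Python over tuples, not the MLlib API: the `Entry` alias and both function names are hypothetical, and emitting every cell of an indexed row (zeros included) is an assumption of the sketch rather than something the commit message states.

```python
from typing import Iterable, Iterator, Sequence, Tuple

Entry = Tuple[int, int, float]  # (row index, column index, value)


def transpose_entries(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Transpose a coordinate-format matrix by swapping the two indices."""
    for i, j, value in entries:
        yield (j, i, value)


def indexed_rows_to_entries(
    rows: Iterable[Tuple[int, Sequence[float]]]
) -> Iterator[Entry]:
    """Convert (row index, row values) pairs to coordinate entries lazily."""
    for i, values in rows:
        # enumerate plays the role of zipWithIndex; a generator avoids
        # building a temporary array per row.
        for j, value in enumerate(values):
            yield (i, j, value)
```

Each entry is touched once and no data has to be regrouped, so the cost is a single pass over the stored entries.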
Sandy Ryza	2eeada373e	SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnA... ...llocator The goal of this PR is to simplify YarnAllocator as much as possible and get it up to the level of code quality we see in the rest of Spark. In service of this, it does a few things: * Uses AMRMClient APIs for matching containers to requests. * Adds calls to AMRMClient.removeContainerRequest so that, when we use a container, we don't end up requesting it again. * Removes YarnAllocator's host->rack cache. YARN's RackResolver already does this caching, so this is redundant. * Adds tests for basic YarnAllocator functionality. * Breaks up the allocateResources method, which was previously nearly 300 lines. * A little bit of stylistic cleanup. * Fixes a bug that causes three times the requests to be filed when preferred host locations are given. The patch is lossy. In particular, it loses the logic for trying to avoid containers bunching up on nodes. As I understand it, the logic that's gone is: * If, in a single response from the RM, we receive a set of containers on a node, and prefer some number of containers on that node greater than 0 but less than the number we received, give back the delta between what we preferred and what we received. This seems like a weird way to avoid bunching, e.g. it does nothing to avoid bunching when we don't request containers on particular nodes. Author: Sandy Ryza <sandy@cloudera.com> Closes #3765 from sryza/sandy-spark-1714 and squashes the following commits: 32a5942 [Sandy Ryza] Muffle RackResolver logs 74f56dd [Sandy Ryza] Fix a couple comments and simplify requestTotalExecutors 60ea4bd [Sandy Ryza] Fix scalastyle ca35b53 [Sandy Ryza] Simplify further e9cf8a6 [Sandy Ryza] Fix YarnClusterSuite 257acf3 [Sandy Ryza] Remove locality stuff and more cleanup 59a3c5e [Sandy Ryza] Take out rack stuff 5f72fd5 [Sandy Ryza] Further documentation and cleanup 89edd68 [Sandy Ryza] SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator	2015-01-21 10:31:54 -06:00
WangTao	8c06a5faac	[SPARK-5336][YARN]spark.executor.cores must not be less than spark.task.cpus https://issues.apache.org/jira/browse/SPARK-5336 Author: WangTao <barneystinson@aliyun.com> Author: WangTaoTheTonic <barneystinson@aliyun.com> Closes #4123 from WangTaoTheTonic/SPARK-5336 and squashes the following commits: 6c9676a [WangTao] Update ClientArguments.scala 9632d3a [WangTaoTheTonic] minor comment fix d03d6fa [WangTaoTheTonic] import ordering should be alphabetical' 3112af9 [WangTao] spark.executor.cores must not be less than spark.task.cpus	2015-01-21 09:42:30 -06:00
jerryshao	424d8c6fff	[SPARK-5297][Streaming] Fix Java file stream type erasure problem The current Java file stream doesn't support custom key/value types because of the loss of type information; details can be seen in [SPARK-5297](https://issues.apache.org/jira/browse/SPARK-5297). Fix this problem by getting the correct `ClassTag` from `Class[_]`. Author: jerryshao <saisai.shao@intel.com> Closes #4101 from jerryshao/SPARK-5297 and squashes the following commits: e022ca3 [jerryshao] Add Mima exclusion ecd61b8 [jerryshao] Fix Java fileInputStream type erasure problem	2015-01-20 23:37:47 -08:00
Kannan Rajah	ec5b0f2cef	[HOTFIX] Update pom.xml to pull MapR's Hadoop version 2.4.1. Author: Kannan Rajah <rkannan82@gmail.com> Closes #4108 from rkannan82/master and squashes the following commits: eca095b [Kannan Rajah] Update pom.xml to pull MapR's Hadoop version 2.4.1.	2015-01-20 23:34:23 -08:00
Davies Liu	bad6c57211	[SPARK-5275] [Streaming] include python source code Include the python source code into assembly jar. cc mengxr pwendell Author: Davies Liu <davies@databricks.com> Closes #4128 from davies/build_streaming2 and squashes the following commits: 546af4c [Davies Liu] fix indent 48859b2 [Davies Liu] include python source code	2015-01-20 22:44:58 -08:00
Kousuke Saruta	9a151ce58b	[SPARK-5294][WebUI] Hide tables in AllStagePages for "Active Stages, Completed Stages and Failed Stages" when they are empty Related to SPARK-5228 and #4028, `AllStagesPage` also should hide the table for `ActiveStages`, `CompleteStages` and `FailedStages` when they are empty. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4083 from sarutak/SPARK-5294 and squashes the following commits: a7625c1 [Kousuke Saruta] Fixed conflicts	2015-01-20 16:40:46 -08:00
Yuhao Yang	2f82c841fa	[SPARK-5186] [MLLIB] Vector.equals and Vector.hashCode are very inefficient JIRA Issue: https://issues.apache.org/jira/browse/SPARK-5186 Currently SparseVector uses the equals inherited from Vector, which creates a full-size array even for a sparse vector. The pull request contains a specialized equals optimization that improves on both time and space. 1. The implementation will be consistent with the original. In particular, it will keep equality comparison between SparseVector and DenseVector. Author: Yuhao Yang <hhbyyh@gmail.com> Author: Yuhao Yang <yuhao@yuhaodevbox.sh.intel.com> Closes #3997 from hhbyyh/master and squashes the following commits: 0d9d130 [Yuhao Yang] function name change and ut update 93f0d46 [Yuhao Yang] unify sparse vs dense vectors 985e160 [Yuhao Yang] improve locality for equals bdf8789 [Yuhao Yang] improve equals and rewrite hashCode for Vector a6952c3 [Yuhao Yang] fix scala style for comments 50abef3 [Yuhao Yang] fix ut for sparse vector with explicit 0 f41b135 [Yuhao Yang] iterative equals for sparse vector 5741144 [Yuhao Yang] Specialized equals for SparseVector	2015-01-20 15:20:20 -08:00
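The idea behind SPARK-5186, comparing vectors through their stored entries instead of materializing a dense array, can be sketched as follows. This is an illustration in plain Python and not the MLlib implementation: the names `nonzeros` and `vectors_equal` are hypothetical, and the sketch assumes indices are sorted in ascending order.

```python
from typing import Iterator, Sequence, Tuple


def nonzeros(indices: Sequence[int], values: Sequence[float]) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) pairs, skipping explicitly stored zeros."""
    for index, value in zip(indices, values):
        if value != 0.0:
            yield index, value


def vectors_equal(size_a, indices_a, values_a, size_b, indices_b, values_b) -> bool:
    """Compare two vectors entry by entry without allocating a dense array.

    A dense vector can be passed as indices=range(size), values=<its values>.
    Assumes indices are sorted ascending (an assumption of this sketch).
    """
    if size_a != size_b:
        return False
    it_a = nonzeros(indices_a, values_a)
    it_b = nonzeros(indices_b, values_b)
    done = object()  # sentinel marking an exhausted iterator
    while True:
        a = next(it_a, done)
        b = next(it_b, done)
        if a is done or b is done:
            return a is b  # equal only if both ran out together
        if a != b:
            return False
```

Skipping explicit zeros keeps a sparse vector with a stored `0.0` equal to one without it, and treating a dense vector as "all indices present" preserves sparse-versus-dense equality, the two cases the squashed commits mention.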
Reynold Xin	d181c2a1fc	[SPARK-5323][SQL] Remove Row's Seq inheritance. Author: Reynold Xin <rxin@databricks.com> Closes #4115 from rxin/row-seq and squashes the following commits: e33abd8 [Reynold Xin] Fixed compilation error. cceb650 [Reynold Xin] Python test fixes, and removal of WrapDynamic. 0334a52 [Reynold Xin] mkString. 9cdeb7d [Reynold Xin] Hive tests. 15681c2 [Reynold Xin] Fix more test cases. ea9023a [Reynold Xin] Fixed a catalyst test. c5e2cb5 [Reynold Xin] Minor patch up. b9cab7c [Reynold Xin] [SPARK-5323][SQL] Remove Row's Seq inheritance.	2015-01-20 15:16:14 -08:00
Yin Huai	bc20a52b34	[SPARK-5287][SQL] Add defaultSizeOf to every data type. JIRA: https://issues.apache.org/jira/browse/SPARK-5287 This PR only adds `defaultSizeOf` to data types and makes those internal type classes `protected[sql]`. I will use another PR to clean up the type hierarchy of data types. Author: Yin Huai <yhuai@databricks.com> Closes #4081 from yhuai/SPARK-5287 and squashes the following commits: 90cec75 [Yin Huai] Update unit test. e1c600c [Yin Huai] Make internal classes protected[sql]. 7eaba68 [Yin Huai] Add `defaultSize` method to data types. fd425e0 [Yin Huai] Add all native types to NativeType.defaultSizeOf.	2015-01-20 13:26:36 -08:00
Travis Galoppo	23e25543be	SPARK-5019 [MLlib] - GaussianMixtureModel exposes instances of MultivariateGauss... This PR modifies GaussianMixtureModel to expose instances of MultivariateGaussian rather than separate mean and covariance arrays. Author: Travis Galoppo <tjg2107@columbia.edu> Closes #4088 from tgaloppo/spark-5019 and squashes the following commits: 3ef6c7f [Travis Galoppo] In GaussianMixtureModel: Changed name of weight, gaussian to weights, gaussians. Other sources modified accordingly. 091e8da [Travis Galoppo] SPARK-5019 - GaussianMixtureModel exposes instances of MultivariateGaussian rather than mean/covariance matrices	2015-01-20 12:58:11 -08:00
Kousuke Saruta	769aced9e7	[SPARK-5329][WebUI] UIWorkloadGenerator should stop SparkContext. UIWorkloadGenerator doesn't stop SparkContext. I ran UIWorkloadGenerator and tried to watch the result in the WebUI, but Jobs are marked as finished. It's because SparkContext is not stopped. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4112 from sarutak/SPARK-5329 and squashes the following commits: bcc0fa9 [Kousuke Saruta] Disabled scalastyle for a bock comment 86a3b95 [Kousuke Saruta] Fixed UIWorkloadGenerator to stop SparkContext in it	2015-01-20 12:40:55 -08:00
Jacek Lewandowski	c93a57f0d6	SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840... ... by Piotr Kolaczkowski) Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4113 from jacek-lewandowski/SPARK-4660-master and squashes the following commits: a5e84ca [Jacek Lewandowski] SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840 by Piotr Kolaczkowski)	2015-01-20 12:38:01 -08:00
Cheng Lian	8140802786	[SQL][Minor] Refactors deeply nested FP style code in BooleanSimplification This is a follow-up of #4090. The original deeply nested `reduceOption` code is hard to grasp. Author: Cheng Lian <lian@databricks.com> Closes #4091 from liancheng/refactor-boolean-simplification and squashes the following commits: cd8860b [Cheng Lian] Improves `compareConditions` to handle more subtle cases 1bf3258 [Cheng Lian] Avoids converting predicate sets to lists e833ca4 [Cheng Lian] Refactors deeply nested FP style code	2015-01-20 11:20:14 -08:00
Jongyoul Lee	9d9294aebf	[SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Rewind ByteBuffer before making ByteString (This fixes a bug introduced in #3849 / SPARK-4014) Author: Jongyoul Lee <jongyoul@gmail.com> Closes #4119 from jongyoul/SPARK-5333 and squashes the following commits: c6693a8 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - changed logDebug location 4141f58 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Added license information 2190606 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Adjusted imported libraries b7f5517 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Rewind ByteBuffer before making ByteString	2015-01-20 10:18:10 -08:00
Ilayaperumal Gopinathan	4afad9c770	[SPARK-4803] [streaming] Remove duplicate RegisterReceiver message - The ReceiverTracker receives `RegisterReceiver` messages two times 1) When the actor at `ReceiverSupervisorImpl`'s preStart is invoked 2) After the receiver is started at the executor `onReceiverStart()` at `ReceiverSupervisorImpl` Though the RegisterReceiver message uses the same streamId and the receiverInfo gets updated every time the message is processed at the `ReceiverTracker`, it makes sense to call register receiver only after the receiver is started. Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io> Closes #3648 from ilayaperumalg/RTActor-remove-prestart and squashes the following commits: 868efab [Ilayaperumal Gopinathan] Increase receiverInfo collector timeout to 2 secs 3118e5e [Ilayaperumal Gopinathan] Fix StreamingListenerSuite's startedReceiverStreamIds size 634abde [Ilayaperumal Gopinathan] Remove duplicate RegisterReceiver message	2015-01-20 01:41:10 -08:00
Reynold Xin	debc031953	[SQL][minor] Add a log4j file for catalyst test. Author: Reynold Xin <rxin@databricks.com> Closes #4117 from rxin/catalyst-test-log4j and squashes the following commits: 8ad610b [Reynold Xin] [SQL][minor] Add a log4j file for catalyst test.	2015-01-20 00:55:25 -08:00
Sean Owen	306ff187af	SPARK-5270 [CORE] Provide isEmpty() function in RDD API Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know. Author: Sean Owen <sowen@cloudera.com> Closes #4074 from srowen/SPARK-5270 and squashes the following commits: 66885b8 [Sean Owen] Add note that JavaRDDLike should not be implemented by user code 2e9b490 [Sean Owen] More tests, and Mima-exclude the new isEmpty method in JavaRDDLike 28395ff [Sean Owen] Add isEmpty to Java, Python 7dd04b7 [Sean Owen] Add efficient RDD.isEmpty()	2015-01-19 22:50:45 -08:00
zsxwing	e69fb8c75a	[SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop This PR adds a simple `EventLoop` and uses it to replace Actor in DAGScheduler. `EventLoop` is a general class that supports posting events from multiple threads and handling them in a single event thread. Author: zsxwing <zsxwing@gmail.com> Closes #4016 from zsxwing/event-loop and squashes the following commits: aefa1ce [zsxwing] Add protected to on*** methods 5cfac83 [zsxwing] Remove null check of eventProcessLoop dba35b2 [zsxwing] Add a test that onReceive swallows InterruptException 460f7b3 [zsxwing] Use volatile instead of Atomic things in unit tests 227bf33 [zsxwing] Add a stop flag and some tests 37f79c6 [zsxwing] Fix docs 55fb6f6 [zsxwing] Add private[spark] to EventLoop 1f73eac [zsxwing] Fix the import order 3b2e59c [zsxwing] Add EventLoop and change DAGScheduler to an EventLoop	2015-01-19 18:15:51 -08:00
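The pattern described in SPARK-5214, many threads posting events and one dedicated thread handling them, can be sketched in a few lines. This is a Python illustration of the general technique and not Spark's `EventLoop`: the class layout, the `on_error` callback, and the thread name are assumptions of the sketch.

```python
import queue
import threading


class EventLoop:
    """Any thread may post events; a single thread handles them in order."""

    _STOP = object()  # sentinel that tells the event thread to exit

    def __init__(self, on_receive, on_error=None):
        self._queue = queue.Queue()  # thread-safe FIFO
        self._on_receive = on_receive
        self._on_error = on_error  # hypothetical hook for handler failures
        self._thread = threading.Thread(
            target=self._run, name="event-loop", daemon=True
        )

    def start(self):
        self._thread.start()

    def post(self, event):
        self._queue.put(event)  # safe to call from any thread

    def stop(self):
        self._queue.put(self._STOP)  # handled after already queued events
        self._thread.join()

    def _run(self):
        while True:
            event = self._queue.get()
            if event is self._STOP:
                return
            try:
                self._on_receive(event)
            except Exception as exc:  # keep the loop alive if a handler fails
                if self._on_error is not None:
                    self._on_error(exc)


# Usage: eight producer threads post, one event thread handles.
handled = []
loop = EventLoop(
    lambda event: handled.append((threading.current_thread().name, event))
)
loop.start()
producers = [threading.Thread(target=loop.post, args=(n,)) for n in range(8)]
for producer in producers:
    producer.start()
for producer in producers:
    producer.join()
loop.stop()
```

Because only the event thread touches handler state, the handler itself needs no locking; the queue is the only shared structure.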
Venkata Ramana Gollamudi	74de94ea6d	[SPARK-4504][Examples] fix run-example failure if multiple assembly jars exist Fix run-example script to fail fast with useful error message if multiple example assembly JARs are present. Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com> Closes #3377 from gvramana/run-example_fails and squashes the following commits: fa7f481 [Venkata Ramana Gollamudi] Fixed review comments, avoiding ls output scanning. 6aa1ab7 [Venkata Ramana Gollamudi] Fix run-examples script error during multiple jars	2015-01-19 12:00:33 -08:00
Yin Huai	2604bc35d7	[SPARK-5286][SQL] Fail to drop an invalid table when using the data source API JIRA: https://issues.apache.org/jira/browse/SPARK-5286 Author: Yin Huai <yhuai@databricks.com> Closes #4076 from yhuai/SPARK-5286 and squashes the following commits: 6b69ed1 [Yin Huai] Catch all exception when we try to uncache a query.	2015-01-19 10:45:29 -08:00
Yin Huai	cd5da42853	[SPARK-5284][SQL] Insert into Hive throws NPE when a inner complex type field has a null value JIRA: https://issues.apache.org/jira/browse/SPARK-5284 Author: Yin Huai <yhuai@databricks.com> Closes #4077 from yhuai/SPARK-5284 and squashes the following commits: fceacd6 [Yin Huai] Check if a value is null when the field has a complex type.	2015-01-19 10:44:12 -08:00
Yuhao Yang	4432568aac	[SPARK-5282][mllib]: RowMatrix easily gets int overflow in the memory size warning JIRA: https://issues.apache.org/jira/browse/SPARK-5282 fix the possible int overflow in the memory computation warning Author: Yuhao Yang <hhbyyh@gmail.com> Closes #4069 from hhbyyh/addscStop and squashes the following commits: e54e5c8 [Yuhao Yang] change to MB based number 7afac23 [Yuhao Yang] 5282: fix int overflow in the warning	2015-01-19 10:10:15 -08:00
Patrick Wendell	1ac1c1dc1b	MAINTENANCE: Automated closing of pull requests. This commit exists to close the following pull requests on Github: Closes #3584 (close requested by 'pwendell') Closes #2433 (close requested by 'pwendell') Closes #1697 (close requested by 'pwendell') Closes #4042 (close requested by 'pwendell') Closes #3723 (close requested by 'pwendell') Closes #1560 (close requested by 'pwendell') Closes #3515 (close requested by 'pwendell') Closes #1386 (close requested by 'pwendell')	2015-01-19 02:05:24 -08:00
Jongyoul Lee	4a4f9ccba2	[SPARK-5088] Use spark-class for running executors directly Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits: 8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases 932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master 613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import 97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement	2015-01-19 02:01:56 -08:00
Ilya Ganelin	3453d578ad	[SPARK-3288] All fields in TaskMetrics should be private and use getters/setters I've updated the fields and all usages of these fields in the Spark code. I've verified that this did not break anything on my local repo. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #4020 from ilganeli/SPARK-3288 and squashes the following commits: 39f3810 [Ilya Ganelin] resolved merge issues e446287 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288 b8c05cb [Ilya Ganelin] Missed making a variable private 6444391 [Ilya Ganelin] Made inc/dec functions private[spark] 1149e78 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288 26b312b [Ilya Ganelin] Debugging tests 17146c2 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288 5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private	2015-01-19 01:32:36 -08:00
Prashant Sharma	851b6a9bba	SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage. ![screenshot from 2015-01-16 13 43 25](https://cloud.githubusercontent.com/assets/992952/5773256/d61df300-9d85-11e4-9b5a-6730058839fa.png) This is a first step towards having time remaining estimates for queued and running jobs. See SPARK-5216 Author: Prashant Sharma <prashant.s@imaginea.com> Closes #4043 from ScrapCodes/SPARK-5216/5217-show-waiting-stages and squashes the following commits: 3b11803 [Prashant Sharma] Review feedback. 0992842 [Prashant Sharma] Switched to Linked hashmap, changed the order to active->pending->completed->failed. And changed pending stages to not reverse sort. c19d82a [Prashant Sharma] SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage.	2015-01-19 01:28:42 -08:00
Jacky Li	7dbf1fdb81	[SQL] fix typo in class description Author: Jacky Li <jacky.likun@gmail.com> Closes #4100 from jackylk/patch-9 and squashes the following commits: b13b9d6 [Jacky Li] Update SQLConf.scala 4d3f83d [Jacky Li] Update SQLConf.scala fcc8c85 [Jacky Li] [SQL] fix typo in class description	2015-01-18 23:59:08 -08:00
Reynold Xin	1955645488	[SQL][minor] Put DataTypes.java in java dir. Author: Reynold Xin <rxin@databricks.com> Closes #4097 from rxin/javarename and squashes the following commits: c5ce96a [Reynold Xin] [SQL][minor] Put DataTypes.java in java dir.	2015-01-18 16:35:40 -08:00
scwf	1a200a3eea	[SQL][Minor] Update sql doc according to data type APIs changes Follow up of #3925 /cc rxin Author: scwf <wangfei1@huawei.com> Closes #4095 from scwf/sql-doc and squashes the following commits: 97e311b [scwf] update sql doc since now expose only one version of the data type APIs	2015-01-18 11:03:13 -08:00
Reynold Xin	1727e0841c	[SPARK-5279][SQL] Use java.math.BigDecimal as the exposed Decimal type. Author: Reynold Xin <rxin@databricks.com> Closes #4092 from rxin/bigdecimal and squashes the following commits: 27b08c9 [Reynold Xin] Fixed test. 10cb496 [Reynold Xin] [SPARK-5279][SQL] Use java.math.BigDecimal as the exposed Decimal type.	2015-01-18 11:01:42 -08:00
Patrick Wendell	ad16da1bcc	[HOTFIX]: Minor clean up regarding skipped artifacts in build files. There are two relevant 'skip' configurations in the build: the first is for "mvn install" and the second is for "mvn deploy". As of 1.2, we actually use "mvn install" to generate our deployed artifacts, because we have some customization of the nexus upload due to having to cross compile for Scala 2.10 and 2.11. There is no reason to have different settings for these values; this patch simply cleans this up for the repl/ and yarn/ projects. Author: Patrick Wendell <patrick@databricks.com> Closes #4080 from pwendell/master and squashes the following commits: e21b78b [Patrick Wendell] [HOTFIX]: Minor clean up regarding skipped artifacts in build files.	2015-01-17 23:15:12 -08:00
Patrick Wendell	e12b5b61c1	MAINTENANCE: Automated closing of pull requests. This commit exists to close the following pull requests on Github: Closes #681 (close requested by 'pwendell') Closes #3682 (close requested by 'pwendell') Closes #4035 (close requested by 'JoshRosen') Closes #4084 (close requested by 'pwendell') Closes #2310 (close requested by 'pwendell')	2015-01-17 20:39:54 -08:00
Reynold Xin	e7884bc950	[SQL][Minor] Added comments and examples to explain BooleanSimplification Author: Reynold Xin <rxin@databricks.com> Closes #4090 from rxin/booleanSimplification and squashes the following commits: 68c8986 [Reynold Xin] [SQL][Minor] Added comments and examples to explain BooleanSimplification.	2015-01-17 17:35:53 -08:00
Michael Armbrust	6999910b0c	[SPARK-5096] Use sbt tasks instead of vals to get hadoop version This makes it possible to compile spark as an external `ProjectRef`, whereas now we throw a `FileNotFoundException` Author: Michael Armbrust <michael@databricks.com> Closes #3905 from marmbrus/effectivePom and squashes the following commits: fd63aae [Michael Armbrust] Use sbt tasks instead of vals to get hadoop version.	2015-01-17 17:03:07 -08:00
scwf	c1f3c27f22	[SPARK-4937][SQL] Comment for the newly optimization rules in `BooleanSimplification` Follow up of #3778 /cc rxin Author: scwf <wangfei1@huawei.com> Closes #4086 from scwf/commentforspark-4937 and squashes the following commits: aaf89f6 [scwf] code style issue 2d3406e [scwf] added comment for spark-4937	2015-01-17 15:51:24 -08:00
Reynold Xin	f3bfc768d4	[SQL][minor] Improved Row documentation. Author: Reynold Xin <rxin@databricks.com> Closes #4085 from rxin/row-doc and squashes the following commits: f77cb27 [Reynold Xin] [SQL][minor] Improved Row documentation.	2015-01-17 00:11:08 -08:00
Reynold Xin	61b427d4b1	[SPARK-5193][SQL] Remove Spark SQL Java-specific API. After the following patches, the main (Scala) API is now usable for Java users directly. https://github.com/apache/spark/pull/4056 https://github.com/apache/spark/pull/4054 https://github.com/apache/spark/pull/4049 https://github.com/apache/spark/pull/4030 https://github.com/apache/spark/pull/3965 https://github.com/apache/spark/pull/3958 Author: Reynold Xin <rxin@databricks.com> Closes #4065 from rxin/sql-java-api and squashes the following commits: b1fd860 [Reynold Xin] Fix Mima 6d86578 [Reynold Xin] Ok one more attempt in fixing Python... e8f1455 [Reynold Xin] Fix Python again... 3e53f91 [Reynold Xin] Fixed Python. 83735da [Reynold Xin] Fix BigDecimal test. e9f1de3 [Reynold Xin] Use scala BigDecimal. 500d2c4 [Reynold Xin] Fix Decimal. ba3bfa2 [Reynold Xin] Updated javadoc for RowFactory. c4ae1c5 [Reynold Xin] [SPARK-5193][SQL] Remove Spark SQL Java-specific API.	2015-01-16 21:09:06 -08:00
scwf	ee1c1f3a04	[SPARK-4937][SQL] Adding optimization to simplify the And, Or condition in spark sql Adding optimization to simplify the And/Or condition in spark sql. There are two kinds of optimization: 1. Numeric condition optimization, such as: a < 3 && a > 5 ---- False a < 1 || a > 0 ---- True a > 3 && a > 5 => a > 5 (a < 2 || b > 5) && a < 2 => a < 2 2. Optimizing some queries from a cartesian product into an equi-join, such as this sql (one of hive-testbench): ``` select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 15 and l_quantity <= 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) ``` It has a repeated expression in Or, so we can optimize it by ``` (a && b) || (a && c) = a && (b || c)``` Before optimization, this sql hung in my local test, and the physical plan is: ![image](https://cloud.githubusercontent.com/assets/7018048/5539175/31cf38e8-8af9-11e4-95e3-336f9b3da4a4.png) After optimization, this sql runs successfully in 20+ seconds, and its physical plan is: ![image](https://cloud.githubusercontent.com/assets/7018048/5539176/39a558e0-8af9-11e4-912b-93de94b20075.png) This PR focuses on the second optimization and some simple ones of the first. For complex Numeric condition optimization, I will make a follow up PR. Author: scwf <wangfei1@huawei.com> Author: wangfei <wangfei1@huawei.com> Closes #3778 from scwf/filter1 and squashes the following commits: 58bcbc2 [scwf] minor format fix 9570211 [scwf] conflicts fix 527e6ce [scwf] minor comment improvements 5c6f134 [scwf] remove numeric optimizations and move to BooleanSimplification 546a82b [wangfei] style fix 825fa69 [wangfei] adding more tests a001e8c [wangfei] revert pom changes 32a595b [scwf] improvement and test fix e99a26c [wangfei] refactory And/Or optimization to make it more readable and clean	2015-01-16 14:01:22 -08:00
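The rewrite `(a && b) || (a && c) = a && (b || c)` from SPARK-4937 amounts to pulling out the conjuncts shared by every branch of an OR. The following Python sketch illustrates that step on sets of predicate strings; it is not the Catalyst rule, and the function name and the string representation of predicates are assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple


def factor_common_conjuncts(
    disjuncts: List[FrozenSet[str]],
) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Rewrite OR-of-ANDs as common AND (OR of remainders).

    Each disjunct is the set of predicates ANDed together in one OR branch.
    Returns (common, remainders); an empty remainder list means the whole
    expression reduces to the common conjuncts alone.
    """
    if not disjuncts:
        raise ValueError("expected at least one disjunct")
    common = frozenset.intersection(*disjuncts)
    remainders = [branch - common for branch in disjuncts]
    # If one branch consists only of the common part, e.g. a || (a && c),
    # that branch already implies the others and the result is just `a`.
    if any(not remainder for remainder in remainders):
        remainders = []
    return common, remainders


# The shape of the hive-testbench query above, with one brand predicate
# standing in for each branch's remaining conditions.
branches = [
    frozenset({"p_partkey = l_partkey", "p_brand = 'Brand#32'"}),
    frozenset({"p_partkey = l_partkey", "p_brand = 'Brand#35'"}),
    frozenset({"p_partkey = l_partkey", "p_brand = 'Brand#24'"}),
]
common, remainders = factor_common_conjuncts(branches)
```

Once `p_partkey = l_partkey` sits at the top level of the condition instead of inside each OR branch, it is available as a join key, which is what allows the move from a cartesian product to an equi-join described in the commit message.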
Ilya Ganelin	fd3a8a1d15	[SPARK-733] Add documentation on use of accumulators in lazy transformation I've added documentation clarifying the particular lack of clarity highlighted in the relevant JIRA. I've also added code examples for this issue to clarify the explanation. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #4022 from ilganeli/SPARK-733 and squashes the following commits: 587def5 [Ilya Ganelin] Updated to clarify verbage df3afd7 [Ilya Ganelin] Revert "Partially updated task metrics to make some vars private" 3f6c512 [Ilya Ganelin] Revert "Completed refactoring to make vars in TaskMetrics class private" 58034fb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733 4dc2cdb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733 3a38db1 [Ilya Ganelin] Verified documentation update by building via jekyll 33b5a2d [Ilya Ganelin] Added code examples for java and python 1fd59b2 [Ilya Ganelin] Updated documentation for accumulators to highlight lazy evaluation issue 5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private	2015-01-16 13:25:17 -08:00
Chip Senkbeil	d05c9ee6e8	[SPARK-4923][REPL] Add Developer API to REPL to allow re-publishing the REPL jar As requested in [SPARK-4923](https://issues.apache.org/jira/browse/SPARK-4923), I've provided a rough DeveloperApi for the repl. I've only done this for Scala 2.10 because it does not appear that Scala 2.11 is implemented. The Scala 2.11 repl still has the old `scala.tools.nsc` package and the SparkIMain does not appear to have the class server needed for shipping code over (unless this functionality has been moved elsewhere?). I also left alone the `ExecutorClassLoader` and `ConstructorCleaner` as I have no experience working with those classes. This marks the majority of methods in `SparkIMain` as _private_ with a few special cases being _private[repl]_ as other classes within the same package access them. Any public method has been marked with `DeveloperApi` as suggested by pwendell and I took the liberty of writing up a Scaladoc for each one to further elaborate their usage. As the Scala 2.11 REPL [conforms](https://github.com/scala/scala/pull/2206) to [JSR-223](http://docs.oracle.com/javase/8/docs/technotes/guides/scripting/), the [Spark Kernel](https://github.com/ibm-et/spark-kernel) uses the SparkIMain of Scala 2.10 in the same manner. So, I've taken care to expose methods predominantly related to the functionality needed for a JSR-223 scripting engine implementation. 1. The ability to _get_ variables from the interpreter (and other information like class/symbol/type) 2. The ability to _put_ variables into the interpreter 3. The ability to _compile_ code 4. The ability to _execute_ code 5. The ability to get contextual information regarding the scripting environment Additional functionality that I marked as exposed included the following: 1. The blocking initialization method (needed to actually start SparkIMain instance) 2. 
The class server uri (needed to set the _spark.repl.class.uri_ property after initialization), reduced from the entire class server 3. The class output directory (beneficial for tools like ours that need to inspect and use the directory where class files are served) 4. Suppression (quiet/silence) mechanics for output 5. Ability to add a jar to the compile/runtime classpath 6. The reset/close functionality 7. Metric information (last variable assignment, "needed" for extracting results from last execution, real variable name for better debugging) 8. Execution wrapper (useful to have, but debatable) Aside from `SparkIMain`, I updated other classes/traits and their methods in the _repl_ package to be private/package protected where possible. A few odd cases (like the SparkHelper being in the scala.tools.nsc package to expose a private variable) still exist, but I did my best at labelling them. `SparkCommandLine` has proven useful to extract settings and `SparkJLineCompletion` has proven to be useful in implementing auto-completion in the [Spark Kernel](https://github.com/ibm-et/spark-kernel) project. Other than those - and `SparkIMain` - my experience has yielded that other classes/methods are not necessary for interactive applications taking advantage of the REPL API. Tested via the following: $ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" $ mvn -Phadoop-2.3 -DskipTests clean package && mvn -Phadoop-2.3 test Also did a quick verification that I could start the shell and execute some code: $ ./bin/spark-shell ... scala> val x = 3 x: Int = 3 scala> sc.parallelize(1 to 10).reduce(_+_) ... 
res1: Int = 55 Author: Chip Senkbeil <rcsenkbe@us.ibm.com> Author: Chip Senkbeil <chip.senkbeil@gmail.com> Closes #4034 from rcsenkbeil/AddDeveloperApiToRepl and squashes the following commits: 053ca75 [Chip Senkbeil] Fixed failed build by adding missing DeveloperApi import c1b88aa [Chip Senkbeil] Added DeveloperApi to public classes in repl 6dc1ee2 [Chip Senkbeil] Added missing method to expose error reporting flag 26fd286 [Chip Senkbeil] Refactored other Scala 2.10 classes and methods to be private/package protected where possible 925c112 [Chip Senkbeil] Added DeveloperApi and Scaladocs to SparkIMain for Scala 2.10	2015-01-16 12:56:40 -08:00


10074 commits