ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Zheng RuiFeng	6ce7c481dc	[SPARK-13386][GRAPHX] ConnectedComponents should support maxIteration option JIRA: https://issues.apache.org/jira/browse/SPARK-13386 ## What changes were proposed in this pull request? add maxIteration option for ConnectedComponents algorithm ## How was the this patch tested? unit tests passed Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11268 from zhengruifeng/ccwithmax.	2016-02-20 12:24:10 -08:00
Holden Karau	9ca79c1ece	[SPARK-13302][PYSPARK][TESTS] Move the temp file creation and cleanup outside of the doctests Some of the new doctests in ml/clustering.py have a lot of setup code, move the setup code to the general test init to keep the doctest more example-style looking. In part this is a follow up to https://github.com/apache/spark/pull/10999 Note that the same pattern is followed in regression & recommendation - might as well clean up all three at the same time. Author: Holden Karau <holden@us.ibm.com> Closes #11197 from holdenk/SPARK-13302-cleanup-doctests-in-ml-clustering.	2016-02-20 09:07:19 +00:00
Shixiong Zhu	dfb2ae2f14	[SPARK-13408] [CORE] Ignore errors when it's already reported in JobWaiter ## What changes were proposed in this pull request? `JobWaiter.taskSucceeded` will be called for each task. When `resultHandler` throws an exception, `taskSucceeded` will also throw it for each task. DAGScheduler just catches it and reports it like this: ```Scala try { job.listener.taskSucceeded(rt.outputId, event.result) } catch { case e: Exception => // TODO: Perhaps we want to mark the resultStage as failed? job.listener.jobFailed(new SparkDriverExecutionException(e)) } ``` Therefore `JobWaiter.jobFailed` may be called multiple times. So `JobWaiter.jobFailed` should use `Promise.tryFailure` instead of `Promise.failure` because the latter one doesn't support calling multiple times. ## How was the this patch tested? Jenkins tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11280 from zsxwing/SPARK-13408.	2016-02-19 23:00:08 -08:00
Reynold Xin	6624a588c1	Revert "[SPARK-12567] [SQL] Add aes_{encrypt,decrypt} UDFs" This reverts commit `4f9a664818`.	2016-02-19 22:44:20 -08:00
Kai Jiang	4f9a664818	[SPARK-12567] [SQL] Add aes_{encrypt,decrypt} UDFs Author: Kai Jiang <jiangkai@gmail.com> Closes #10527 from vectorijk/spark-12567.	2016-02-19 22:28:47 -08:00
gatorsmile	ec7a1d6e42	[SPARK-12594] [SQL] Outer Join Elimination by Filter Conditions Conversion of outer joins, if the predicates in filter conditions can restrict the result sets so that all null-supplying rows are eliminated. - `full outer` -> `inner` if both sides have such predicates - `left outer` -> `inner` if the right side has such predicates - `right outer` -> `inner` if the left side has such predicates - `full outer` -> `left outer` if only the left side has such predicates - `full outer` -> `right outer` if only the right side has such predicates If applicable, this can greatly improve the performance, since outer join is much slower than inner join, full outer join is much slower than left/right outer join. The original PR is https://github.com/apache/spark/pull/10542 Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes #10567 from gatorsmile/outerJoinEliminationByFilterCond.	2016-02-19 22:27:10 -08:00
Josh Rosen	983fa2d620	[SPARK-13407] Guard against garbage-collected accumulators in TaskMetrics.fromAccumulatorUpdates `TaskMetrics.fromAccumulatorUpdates()` can fail if accumulators have been garbage-collected on the driver. To guard against this, this patch introduces `ListenerTaskMetrics`, a subclass of `TaskMetrics` which is used only in `TaskMetrics.fromAccumulatorUpdates()` and which eliminates the need to access the original accumulators on the driver. Author: Josh Rosen <joshrosen@databricks.com> Closes #11276 from JoshRosen/accum-updates-fix.	2016-02-19 15:57:23 -08:00
Sameer Agarwal	091f6a7830	[SPARK-13091][SQL] Rewrite/Propagate constraints for Aliases This PR adds support for rewriting constraints if there are aliases in the query plan. For e.g., if there is a query of form `SELECT a, a AS b`, any constraints on `a` now also apply to `b`. JIRA: https://issues.apache.org/jira/browse/SPARK-13091 cc marmbrus Author: Sameer Agarwal <sameer@databricks.com> Closes #11144 from sameeragarwal/alias.	2016-02-19 14:48:34 -08:00
Hossein	14844118b5	[SPARK-13261][SQL] Expose maxCharactersPerColumn as a user configurable option This patch expose `maxCharactersPerColumn` and `maxColumns` to user in CSV data source. Author: Hossein <hossein@databricks.com> Closes #11147 from falaki/SPARK-13261.	2016-02-19 14:46:56 -08:00
Brandon Bradley	dbb08cdd5a	[SPARK-12966][SQL] ArrayType(DecimalType) support in Postgres JDBC Fixes error `org.postgresql.util.PSQLException: Unable to find server array type for provided name decimal(38,18)`. * Passes scale metadata to JDBC dialect for usage in type conversions. * Removes unused length/scale/precision parameters from `createArrayOf` parameter `typeName` (for writing). * Adds configurable precision and scale to Postgres `DecimalType` (for reading). * Adds a new kind of test that verifies the schema written by `DataFrame.write.jdbc`. Author: Brandon Bradley <bradleytastic@gmail.com> Closes #10928 from blbradley/spark-12966.	2016-02-19 14:43:21 -08:00
Liang-Chi Hsieh	c7c55637bf	[SPARK-13384][SQL] Keep attribute qualifiers after dedup in Analyzer JIRA: https://issues.apache.org/jira/browse/SPARK-13384 ## What changes were proposed in this pull request? When we de-duplicate attributes in Analyzer, we create new attributes. However, we don't keep original qualifiers. Some plans will be failed to analysed. We should keep original qualifiers in new attributes. ## How was the this patch tested? Unit test is added. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11261 from viirya/keep-attr-qualifiers.	2016-02-19 12:22:22 -08:00
Iulian Dragos	6915cc23b3	[MINOR][DOCS][MESOS] Clarify that Mesos version is a lower bound. ## What changes were proposed in this pull request? Clarify that 0.21 is only a minimum requirement. ## How was the this patch tested? It's a doc change, so no tests. Author: Iulian Dragos <jaguarul@gmail.com> Closes #11271 from dragos/patch-1.	2016-02-19 11:47:36 -08:00
Sean Owen	fb7e21797e	[SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold Clarify that reduce functions need to be commutative, and fold functions do not See https://github.com/apache/spark/pull/11091 Author: Sean Owen <sowen@cloudera.com> Closes #11217 from srowen/SPARK-13339.	2016-02-19 10:26:38 +00:00
gatorsmile	c776fce99b	[SPARK-13380][SQL][DOCUMENT] Document Rand(seed) and Randn(seed) Return Indeterministic Results When Data Partitions are not fixed. `rand` and `randn` functions with a `seed` argument are commonly used. Based on the common sense, the results of `rand` and `randn` should be deterministic if the `seed` parameter value is provided. For example, in MS SQL Server, it also has a function `rand`. Regarding the parameter `seed`, the description is like: ```Seed is an integer expression (tinyint, smallint, or int) that gives the seed value. If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.``` Update: the current implementation is unable to generate deterministic results when the partitions are not fixed. This PR documents this issue in the function descriptions. jkbradley hit an issue and provided an example in the following JIRA: https://issues.apache.org/jira/browse/SPARK-13333 Author: gatorsmile <gatorsmile@gmail.com> Closes #11232 from gatorsmile/randSeed.	2016-02-18 21:19:36 -08:00
Davies Liu	95e1ab223e	[SPARK-13237] [SQL] generated broadcast outer join This PR support codegen for broadcast outer join. In order to reduce the duplicated codes, this PR merge HashJoin and HashOuterJoin together (also BroadcastHashJoin and BroadcastHashOuterJoin). Author: Davies Liu <davies@databricks.com> Closes #11130 from davies/gen_out.	2016-02-18 15:15:06 -08:00
Davies Liu	26f38bb83c	[SPARK-13351][SQL] fix column pruning on Expand Currently, the columns in projects of Expand that are not used by Aggregate are not pruned, this PR fix that. Author: Davies Liu <davies@databricks.com> Closes #11225 from davies/fix_pruning_expand.	2016-02-18 13:07:41 -08:00
Sean Owen	78562535fe	[SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask compares Option and String directly. ## What changes were proposed in this pull request? Fix some comparisons between unequal types that cause IJ warnings and in at least one case a likely bug (TaskSetManager) ## How was the this patch tested? Running Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11253 from srowen/SPARK-13371.	2016-02-18 12:14:30 -08:00
Reynold Xin	892b2dd6dd	Add github pull request template	2016-02-17 22:14:45 -05:00
Sean Owen	b84404865b	[SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x Phase 1: update plugin versions, test dependencies, some example and third-party versions Author: Sean Owen <sowen@cloudera.com> Closes #11206 from srowen/SPARK-13324.	2016-02-17 19:03:29 -08:00
Xiangrui Meng	0088b252bf	[MINOR][MLLIB] fix mllib compile warnings This PR fixes some warnings found by `build/sbt mllib/test:compile`. Author: Xiangrui Meng <meng@databricks.com> Closes #11227 from mengxr/fix-mllib-warnings-201602.	2016-02-17 18:56:19 -08:00
Andrew Or	9451fed52c	[SPARK-13344][TEST] Fix harmless accumulator not found exceptions See [JIRA](https://issues.apache.org/jira/browse/SPARK-13344) for more detail. This was caused by #10835. Author: Andrew Or <andrew@databricks.com> Closes #11222 from andrewor14/fix-test-accum-exceptions.	2016-02-17 16:17:20 -08:00
shijinkui	97ee85daf6	[SPARK-12953][EXAMPLES] RDDRelation writer set overwrite mode https://issues.apache.org/jira/browse/SPARK-12953 fix error when run RDDRelation.main(): "path file:/Users/sjk/pair.parquet already exists" Set DataFrameWriter's mode to SaveMode.Overwrite Author: shijinkui <shijinkui666@163.com> Closes #10864 from shijinkui/set_mode.	2016-02-17 15:08:22 -08:00
jerryshao	1eac380008	[SPARK-13109][BUILD] Fix SBT publishLocal issue Add local ivy repo to the SBT build file to fix this. Scaladoc compile error is fixed. Author: jerryshao <sshao@hortonworks.com> Closes #11001 from jerryshao/SPARK-13109.	2016-02-17 15:05:40 -08:00
Christopher C. Aycock	a7c74d7563	[SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's default is "python2.7" Author: Christopher C. Aycock <chris@chrisaycock.com> Closes #11239 from chrisaycock/master.	2016-02-17 11:24:18 -08:00
Takuya UESHIN	04e8afe362	[SPARK-13357][SQL] Use generated projection and ordering for TakeOrderedAndProjectNode `TakeOrderedAndProjectNode` should use generated projection and ordering like other `LocalNode`s. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #11230 from ueshin/issues/SPARK-13357.	2016-02-17 00:21:15 -08:00
Sital Kedia	1e1e31e03d	[SPARK-13279] Remove O(n^2) operation from scheduler. This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2. Author: Sital Kedia <skedia@fb.com> Closes #11175 from sitalkedia/fix_stuck_driver.	2016-02-16 22:27:39 -08:00
junhao	7218c0eba9	[SPARK-11627] Add initial input rate limit for spark streaming backpressure mechanism. https://issues.apache.org/jira/browse/SPARK-11627 Spark Streaming backpressure mechanism has no initial input rate limit, it might cause OOM exception. In the firest batch task ,receivers receive data at the maximum speed they can reach,it might exhaust executors memory resources. Add a initial input rate limit value can make sure the Streaming job execute success in the first batch,then the backpressure mechanism can adjust receiving rate adaptively. Author: junhao <junhao@mogujie.com> Closes #9593 from junhaoMg/junhao-dev.	2016-02-16 19:43:17 -08:00
Josh Rosen	5f37aad48c	[SPARK-13308] ManagedBuffers passed to OneToOneStreamManager need to be freed in non-error cases ManagedBuffers that are passed to `OneToOneStreamManager.registerStream` need to be freed by the manager once it's done using them. However, the current code only frees them in certain error-cases and not during typical operation. This isn't a major problem today, but it will cause memory leaks after we implement better locking / pinning in the BlockManager (see #10705). This patch modifies the relevant network code so that the ManagedBuffers are freed as soon as the messages containing them are processed by the lower-level Netty message sending code. /cc zsxwing for review. Author: Josh Rosen <joshrosen@databricks.com> Closes #11193 from JoshRosen/add-missing-release-calls-in-network-layer.	2016-02-16 12:06:30 -08:00
Marcelo Vanzin	c7d00a24da	[SPARK-13280][STREAMING] Use a better logger name for FileBasedWriteAheadLog. The new logger name is under the org.apache.spark namespace. The detection of the caller name was also enhanced a bit to ignore some common things that show up in the call stack. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11165 from vanzin/SPARK-13280.	2016-02-16 11:25:43 -08:00
Takuya UESHIN	19dc69de79	[SPARK-12976][SQL] Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange. Add `LazilyGenerateOrdering` to support generated ordering for `RangePartitioner` of `Exchange` instead of `InterpretedOrdering`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #10894 from ueshin/issues/SPARK-12976.	2016-02-16 10:54:44 -08:00
BenFradet	00c72d27bf	[SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.	2016-02-16 13:03:28 +00:00
Miles Yucht	827ed1c067	Correct SparseVector.parse documentation There's a small typo in the SparseVector.parse docstring (which says that it returns a DenseVector rather than a SparseVector), which seems to be incorrect. Author: Miles Yucht <miles@databricks.com> Closes #11213 from mgyucht/fix-sparsevector-docs.	2016-02-16 13:01:21 +00:00
gatorsmile	fee739f07b	[SPARK-13221] [SQL] Fixing GroupingSets when Aggregate Functions Containing GroupBy Columns Using GroupingSets will generate a wrong result when Aggregate Functions containing GroupBy columns. This PR is to fix it. Since the code changes are very small. Maybe we also can merge it to 1.6 For example, the following query returns a wrong result: ```scala sql("select course, sum(earnings) as sum from courseSales group by course, earnings" + " grouping sets((), (course), (course, earnings))" + " order by course, sum").show() ``` Before the fix, the results are like ``` [null,null] [Java,null] [Java,20000.0] [Java,30000.0] [dotNET,null] [dotNET,5000.0] [dotNET,10000.0] [dotNET,48000.0] ``` After the fix, the results become correct: ``` [null,113000.0] [Java,20000.0] [Java,30000.0] [Java,50000.0] [dotNET,5000.0] [dotNET,10000.0] [dotNET,48000.0] [dotNET,63000.0] ``` UPDATE: This PR also deprecated the external column: GROUPING__ID. Author: gatorsmile <gatorsmile@gmail.com> Closes #11100 from gatorsmile/groupingSets.	2016-02-15 23:16:58 -08:00
Xin Ren	e4675c2402	[SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using include_example Replace example code in mllib-pmml-model-export.md using include_example https://issues.apache.org/jira/browse/SPARK-13018 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11126 from keypointt/SPARK-13018.	2016-02-15 20:17:21 -08:00
seddonm1	cbeb006f23	[SPARK-13097][ML] Binarizer allowing Double AND Vector input types This enhancement extends the existing SparkML Binarizer [SPARK-5891] to allow Vector in addition to the existing Double input column type. A use case for this enhancement is for when a user wants to Binarize many similar feature columns at once using the same threshold value (for example a binary threshold applied to many pixels in an image). This contribution is my original work and I license the work to the project under the project's open source license. viirya mengxr Author: seddonm1 <seddonm1@gmail.com> Closes #10976 from seddonm1/master.	2016-02-15 20:15:27 -08:00
JeremyNixon	adb5483650	[SPARK-13312][MLLIB] Update java train-validation-split example in ml-guide Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312. This contribution is my original work and I license the work to this project. Author: JeremyNixon <jnixon2@gmail.com> Closes #11199 from JeremyNixon/update_train_val_split_example.	2016-02-15 09:25:13 +00:00
Takeshi YAMAMURO	56d49397e0	[SPARK-12995][GRAPHX] Remove deprecate APIs from Pregel Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10918 from maropu/RemoveDeprecateInPregel.	2016-02-15 09:20:49 +00:00
Josh Rosen	a8bbc4f50e	[SPARK-12503][SPARK-12505] Limit pushdown in UNION ALL and OUTER JOIN This patch adds a new optimizer rule for performing limit pushdown. Limits will now be pushed down in two cases: - If a limit is on top of a `UNION ALL` operator, then a partition-local limit operator will be pushed to each of the union operator's children. - If a limit is on top of an `OUTER JOIN` then a partition-local limit will be pushed to one side of the join. For `LEFT OUTER` and `RIGHT OUTER` joins, the limit will be pushed to the left and right side, respectively. For `FULL OUTER` join, we will only push limits when at most one of the inputs is already limited: if one input is limited we will push a smaller limit on top of it and if neither input is limited then we will limit the input which is estimated to be larger. These optimizations were proposed previously by gatorsmile in #10451 and #10454, but those earlier PRs were closed and deferred for later because at that time Spark's physical `Limit` operator would trigger a full shuffle to perform global limits so there was a chance that pushdowns could actually harm performance by causing additional shuffles/stages. In #7334, we split the `Limit` operator into separate `LocalLimit` and `GlobalLimit` operators, so we can now push down only local limits (which don't require extra shuffles). This patch is based on both of gatorsmile's patches, with changes and simplifications due to partition-local-limiting. When we push down the limit, we still keep the original limit in place, so we need a mechanism to ensure that the optimizer rule doesn't keep pattern-matching once the limit has been pushed down. In order to handle this, this patch adds a `maxRows` method to `SparkPlan` which returns the maximum number of rows that the plan can compute, then defines the pushdown rules to only push limits to children if the children's maxRows are greater than the limit's maxRows. This idea is carried over from #10451; see that patch for additional discussion. Author: Josh Rosen <joshrosen@databricks.com> Closes #11121 from JoshRosen/limit-pushdown-2.	2016-02-14 17:32:21 -08:00
Carson Wang	7cb4d74c98	[SPARK-13185][SQL] Reuse Calendar object in DateTimeUtils.StringToDate method to improve performance The java `Calendar` object is expensive to create. I have a sub query like this `SELECT a, b, c FROM table UV WHERE (datediff(UV.visitDate, '1997-01-01')>=0 AND datediff(UV.visitDate, '2015-01-01')<=0))` The table stores `visitDate` as String type and has 3 billion records. A `Calendar` object is created every time `DateTimeUtils.stringToDate` is called. By reusing the `Calendar` object, I saw about 20 seconds performance improvement for this stage. Author: Carson Wang <carson.wang@intel.com> Closes #11090 from carsonwang/SPARK-13185.	2016-02-14 16:00:20 -08:00
Claes Redestad	22e9723d62	[SPARK-13278][CORE] Launcher fails to start with JDK 9 EA See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme. Author: Claes Redestad <claes.redestad@gmail.com> Closes #11160 from cl4es/master.	2016-02-14 11:49:37 +00:00
Amit Dev	331293c302	[SPARK-13300][DOCUMENTATION] Added pygments.rb dependancy Looks like pygments.rb gem is also required for jekyll build to work. At least on Ubuntu/RHEL I could not do build without this dependency. So added this to steps. Author: Amit Dev <amitdev@gmail.com> Closes #11180 from amitdev/master.	2016-02-14 11:41:27 +00:00
Reynold Xin	354d4c24be	[SPARK-13296][SQL] Move UserDefinedFunction into sql.expressions. This pull request has the following changes: 1. Moved UserDefinedFunction into expressions package. This is more consistent with how we structure the packages for window functions and UDAFs. 2. Moved UserDefinedPythonFunction into execution.python package, so we don't have a random private class in the top level sql package. 3. Move everything in execution/python.scala into the newly created execution.python package. Most of the diffs are just straight copy-paste. Author: Reynold Xin <rxin@databricks.com> Closes #11181 from rxin/SPARK-13296.	2016-02-13 21:06:31 -08:00
Sean Owen	388cd9ea8d	[SPARK-13172][CORE][SQL] Stop using RichException.getStackTrace it is deprecated Replace `getStackTraceString` with `Utils.exceptionString` Author: Sean Owen <sowen@cloudera.com> Closes #11182 from srowen/SPARK-13172.	2016-02-13 21:05:48 -08:00
Reynold Xin	610196f93a	Closes #11185	2016-02-13 18:03:53 -08:00
Liang-Chi Hsieh	e3441e3f68	[SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering failed test JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue is pointed by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests will be failed. I found that some `dstAttr`s of the normalized graph are not correct values but 0.0. By setting `TripletFields.All` in `mapTriplets` it can work. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #10539 from viirya/fix-poweriter.	2016-02-13 15:56:20 -08:00
markpavey	374c4b2869	[SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft Windows Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK. Is it worth considering also including this fix in any future 1.5.x releases (if any)? I confirm this is my own original work and license it to the Spark project under its open source license. Author: markpavey <mark.pavey@thefilter.com> Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.	2016-02-13 08:39:43 +00:00
Davies Liu	2228f074e1	[SPARK-13293][SQL] generate Expand Expand suffer from create the UnsafeRow from same input multiple times, with codegen, it only need to copy some of the columns. After this, we can see 3X improvements (from 43 seconds to 13 seconds) on a TPCDS query (Q67) that have eight columns in Rollup. Ideally, we could mask some of the columns based on bitmask, I'd leave that in the future, because currently Aggregation (50 ns) is much slower than that just copy the variables (1-2 ns). Author: Davies Liu <davies@databricks.com> Closes #11177 from davies/gen_expand.	2016-02-12 17:32:15 -08:00
Michael Gummelt	62b1c07e7e	[SPARK-5095] remove flaky test Overrode the start() method, which was previously starting a thread causing a race condition. I believe this should fix the flaky test. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11164 from mgummelt/fix_mesos_tests.	2016-02-12 15:00:39 -08:00
Michael Gummelt	38bc6018e9	[SPARK-5095] Fix style in mesos coarse grained scheduler code andrewor14 This addressed your style comments from #10993 Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11187 from mgummelt/fix_mesos_style.	2016-02-12 14:57:31 -08:00
vijaykiran	42d656814f	[SPARK-12630][PYSPARK] [DOC] PySpark classification parameter desc to consistent format Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the classification module. Author: vijaykiran <mail@vijaykiran.com> Author: Bryan Cutler <cutlerb@gmail.com> Closes #11183 from BryanCutler/pyspark-consistent-param-classification-SPARK-12630.	2016-02-12 14:24:24 -08:00

... 6 7 8 9 10 ...

15138 commits