ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Davies Liu	0e5ebac3c1	[SPARK-12950] [SQL] Improve lookup of BytesToBytesMap in aggregate This PR improves the lookup of BytesToBytesMap by: 1. Generating code to calculate the hash code of grouping keys. 2. Not using MemoryLocation; instead, fetching the baseObject and offset for key and value directly (removing the indirection). Author: Davies Liu <davies@databricks.com> Closes #11010 from davies/gen_map.	2016-02-09 16:41:21 -08:00
Shixiong Zhu	fae830d158	[SPARK-13245][CORE] Call shuffleMetrics methods only in one thread for ShuffleBlockFetcherIterator Call shuffleMetrics's incRemoteBytesRead and incRemoteBlocksFetched when polling FetchResult from `results` so as to always use shuffleMetrics in one thread. Also fix a race condition that could cause memory leak. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11138 from zsxwing/SPARK-13245.	2016-02-09 16:31:00 -08:00
Wenchen Fan	7fe4fe630a	[SPARK-12888] [SQL] [FOLLOW-UP] benchmark the new hash expression Adds the benchmark results as comments. The codegen version is slower than the interpreted version for the `simple` case for 3 reasons: 1. The codegen version uses a more complex hash algorithm than the interpreted version, i.e. `Murmur3_x86_32.hashInt` vs [simple multiplication and addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153). 2. The codegen version writes the hash value to a row first and then reads it out. I tried to create a `GenerateHasher` that can generate code to return the hash value directly and got about a 60% speed-up for the `simple` case; is it worth it? 3. The row in the `simple` case only has one int field, so the cost of runtime reflection may be eliminated by branch prediction, which makes the interpreted version faster. The `array` case is also slow for similar reasons, e.g. array elements are of the same type, so the interpreted version can probably get rid of runtime reflection by branch prediction. Author: Wenchen Fan <wenchen@databricks.com> Closes #10917 from cloud-fan/hash-benchmark.	2016-02-09 13:06:36 -08:00
Luciano Resende	2dbb916440	[SPARK-13189] Cleanup build references to Scala 2.10 Author: Luciano Resende <lresende@apache.org> Closes #11092 from lresende/SPARK-13189.	2016-02-09 11:56:25 -08:00
Steve Loughran	34d0b70b30	[SPARK-12807][YARN] Spark External Shuffle not working in Hadoop clusters with Jackson 2.2.3 Patch to 1. Shade jackson 2.x in spark-yarn-shuffle JAR: core, databind, annotation 2. Use maven antrun to verify the JAR has the renamed classes. Being Maven-based, I don't know if the verification phase kicks in on an SBT/jenkins build. It will on a `mvn install`. Author: Steve Loughran <stevel@hortonworks.com> Closes #10780 from steveloughran/stevel/patches/SPARK-12807-master-shuffle.	2016-02-09 11:01:47 -08:00
Sean Owen	68ed3632c5	[SPARK-13170][STREAMING] Investigate replacing SynchronizedQueue as it is deprecated Replace SynchronizedQueue with synchronized access to a Queue Author: Sean Owen <sowen@cloudera.com> Closes #11111 from srowen/SPARK-13170.	2016-02-09 11:23:29 +00:00
Iulian Dragos	e30121afac	[SPARK-13086][SHELL] Use the Scala REPL settings, to enable things like `-i file`. Now: ``` $ bin/spark-shell -i test.scala NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly. Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). 16/01/29 17:37:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 16/01/29 17:37:39 INFO Main: Created spark context.. Spark context available as sc (master = local[*], app id = local-1454085459000). 16/01/29 17:37:39 INFO Main: Created sql context.. SQL context available as sqlContext. Loading test.scala... hello Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45) Type in expressions to have them evaluated. Type :help for more information. ``` Author: Iulian Dragos <jaguarul@gmail.com> Closes #10984 from dragos/issue/repl-eval-file.	2016-02-09 09:05:22 +00:00
sachin aggarwal	d9ba4d27f4	[SPARK-13177][EXAMPLES] Update ActorWordCount example to not directly use low level linked list as it is deprecated. Author: sachin aggarwal <different.sachin@gmail.com> Closes #11113 from agsachin/master.	2016-02-09 08:52:58 +00:00
Sebastián Ramírez	c882ec57de	[SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation Update JDBC documentation based on http://stackoverflow.com/a/30947090/219530 as SPARK_CLASSPATH is deprecated. Also, that is the approach that worked; it didn't work with SPARK_CLASSPATH or --jars alone. This should resolve issue: https://issues.apache.org/jira/browse/SPARK-13040 Author: Sebastián Ramírez <tiangolo@gmail.com> Closes #10948 from tiangolo/patch-docs-jdbc.	2016-02-09 08:49:34 +00:00
Holden Karau	ce83fe9756	[SPARK-13201][SPARK-13200] Deprecation warning cleanups: KMeans & MFDataGenerator KMeans: Make a private non-deprecated version of setRuns API so that we can call it from the PythonAPI without deprecation warnings in our own build. Also use it internally when being called from train. Add a logWarning for non-1 values MFDataGenerator: Apparently we are calling round on an integer which now in Scala 2.11 results in a warning (it didn't make any sense before either). Figure out if this is a mistake we can just remove or if we got the types wrong somewhere. I put these two together since they are both deprecation fixes in MLlib and pretty small, but I can split them up if we would prefer it that way. Author: Holden Karau <holden@us.ibm.com> Closes #11112 from holdenk/SPARK-13201-non-deprecated-setRuns-SPARK-mathround-integer.	2016-02-09 08:47:28 +00:00
Holden Karau	159198eff6	[SPARK-13165][STREAMING] Replace deprecated synchronizedBuffer in streaming Building with Scala 2.11 results in the warning trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative - we already use ConcurrentLinkedQueue elsewhere so let's replace it. Notes for reviewers on how the behaviour differs: The Seq from a SynchronizedBuffer that was implicitly converted would continue to receive updates; however, when we do the same conversion explicitly on the ConcurrentLinkedQueue, this isn't the case. Hence some of the (internal & test) APIs were changed to pass an Iterable. toSeq is safe to use if there are no more updates. Author: Holden Karau <holden@us.ibm.com> Author: tedyu <yuzhihong@gmail.com> Closes #11067 from holdenk/SPARK-13165-replace-deprecated-synchronizedBuffer-in-streaming.	2016-02-09 08:44:56 +00:00
Jakob Odersky	f9307d8fc5	[SPARK-13176][CORE] Use native file linking instead of external process ln Since Spark requires at least JRE 1.7, it is safe to use built-in java.nio.Files. Author: Jakob Odersky <jakob@odersky.com> Closes #11098 from jodersky/SPARK-13176.	2016-02-09 08:43:46 +00:00
Nong Li	3708d13f1a	[SPARK-12992] [SQL] Support vectorized decoding in UnsafeRowParquetRecordReader. WIP: running tests. Code needs a bit of clean up. This patch completes the vectorized decoding with the goal of passing the existing tests. More patches are still needed to support the rest of the format spec, even just for flat schemas. This patch adds a new flag to enable the vectorized decoding. Tests were updated to try both modes where applicable. Once this is working well, we can remove the previous code path. Author: Nong Li <nong@databricks.com> Closes #11055 from nongli/spark-12992-2.	2016-02-08 22:21:26 -08:00
Andrew Or	eeaf45b926	[SPARK-10620][SPARK-13054] Minor addendum to #10835 Additional changes to #10835, mainly related to style and visibility. This patch also adds back a few deprecated methods for backward compatibility. Author: Andrew Or <andrew@databricks.com> Closes #10958 from andrewor14/task-metrics-to-accums-followups.	2016-02-08 17:23:33 -08:00
Davies Liu	ff0af0ddfa	[SPARK-13095] [SQL] improve performance for broadcast join with dimension table This PR improves the performance of broadcast joins with dimension tables, which are common in data warehouses. If the join key can fit in a long, we will use a special API `get(Long)` to get the rows from HashedRelation. If the HashedRelation only has unique keys, we will use a special API `getValue(Long)` or `getValue(InternalRow)`. If the keys fit within a long and are also dense, we will use an array of UnsafeRow instead of a hash map. TODO: will do cleanup Author: Davies Liu <davies@databricks.com> Closes #11065 from davies/gen_dim.	2016-02-08 14:09:14 -08:00
Davies Liu	37bc203c8d	[SPARK-13210][SQL] catch OOM when allocate memory and expand array There is a bug when we try to grow the buffer: the OOM is wrongly ignored (the assert is also skipped by the JVM), and then we try to grow the array again. This triggers spilling, which frees the current page, so the record we just inserted becomes invalid. The root cause is that the JVM has less free memory than the MemoryManager thinks, so allocating a page can OOM without triggering spilling. We should catch the OOM and acquire memory again to trigger spilling. Also, we should not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing). Author: Davies Liu <davies@databricks.com> Closes #11095 from davies/fix_expand.	2016-02-08 12:09:20 -08:00
Wenchen Fan	8e4d15f707	[SPARK-13101][SQL] nullability of array type element should not fail analysis of encoder nullability should only be considered as an optimization rather than part of the type system, so instead of failing analysis for mismatched nullability, we should pass analysis and add a runtime null check. Author: Wenchen Fan <wenchen@databricks.com> Closes #11035 from cloud-fan/ignore-nullability.	2016-02-08 12:06:00 -08:00
Josh Rosen	06f0df6df2	[SPARK-8964] [SQL] Use Exchange to perform shuffle in Limit This patch changes the implementation of the physical `Limit` operator so that it relies on the `Exchange` operator to perform data movement rather than directly using `ShuffledRDD`. In addition to improving efficiency, this lays the necessary groundwork for further optimization of limit, such as limit pushdown or whole-stage codegen. At a high level, this replaces the old physical `Limit` operator with two new operators, `LocalLimit` and `GlobalLimit`. `LocalLimit` performs per-partition limits, while `GlobalLimit` applies the final limit to a single partition; `GlobalLimit` declares that its `requiredInputDistribution` is `SinglePartition`, which will cause the planner to use an `Exchange` to perform the appropriate shuffles. Thus, a logical `Limit` appearing in the middle of a query plan will be expanded into `LocalLimit -> Exchange to one partition -> GlobalLimit`. In the old code, calling `someDataFrame.limit(100).collect()` or `someDataFrame.take(100)` would actually skip the shuffle and use a fast-path which used `executeTake()` in order to avoid computing all partitions in case only a small number of rows were requested. This patch preserves this optimization by treating logical `Limit` operators specially when they appear as the terminal operator in a query plan: if a `Limit` is the final operator, then we will plan a special `CollectLimit` physical operator which implements the old `take()`-based logic. In order to be able to match on operators only at the root of the query plan, this patch introduces a special `ReturnAnswer` logical operator which functions similarly to `BroadcastHint`: this dummy operator is inserted at the root of the optimized logical plan before invoking the physical planner, allowing the planner to pattern-match on it. Author: Josh Rosen <joshrosen@databricks.com> Closes #7334 from JoshRosen/remove-copy-in-limit.	2016-02-08 11:38:21 -08:00
Nam Pham	edf4a0e62e	[SPARK-12986][DOC] Fix pydoc warnings in mllib/regression.py I have fixed the warnings reported when running "make html" under "python/docs/". They are caused by missing blank lines around indented paragraphs. Author: Nam Pham <phamducnam@gmail.com> Closes #11025 from nampham2/SPARK-12986.	2016-02-08 11:06:41 -08:00
cody koeninger	140ddef373	[SPARK-10963][STREAMING][KAFKA] make KafkaCluster public Author: cody koeninger <cody@koeninger.org> Closes #9007 from koeninger/SPARK-10963.	2016-02-07 12:52:00 +00:00
Gary King	bc8890b357	[SPARK-13132][MLLIB] cache standardization param value in LogisticRegression cache the value of the standardization Param in LogisticRegression, rather than re-fetching it from the ParamMap for every index and every optimization step in the quasi-newton optimizer also, fix Param#toString to cache the stringified representation, rather than re-interpolating it on every call, so any other implementations that have similar repeated access patterns will see a benefit. this change improves training times for one of my test sets from ~7m30s to ~4m30s Author: Gary King <gary@idibon.com> Closes #11027 from idigary/spark-13132-optimize-logistic-regression.	2016-02-07 09:13:28 +00:00
Tommy YU	81da3bee66	[SPARK-5865][API DOC] Add doc warnings for methods that return local data structures rxin srowen I have written a note message for the rdd.take function; please help review it. If it's fine, I can apply it to all other functions later. Author: Tommy YU <tummyyu@163.com> Closes #10874 from Wenpei/spark-5865-add-warning-for-localdatastructure.	2016-02-06 17:29:09 +00:00
Davies Liu	4f28291f85	[HOTFIX] fix float part of avgRate	2016-02-05 22:40:40 -08:00
Jakob Odersky	6883a5120c	[SPARK-13171][CORE] Replace future calls with Future Trivial search-and-replace to eliminate deprecation warnings in Scala 2.11. Also works with 2.10 Author: Jakob Odersky <jakob@odersky.com> Closes #11085 from jodersky/SPARK-13171.	2016-02-05 19:00:12 -08:00
Davies Liu	875f507929	[SPARK-13215] [SQL] remove fallback in codegen Since we removed the configuration for codegen, we rely heavily on codegen (TungstenAggregate also requires the generated MutableProjection to update UnsafeRow), so we should remove the fallback, which could confuse users; see the discussion in SPARK-13116. Author: Davies Liu <davies@databricks.com> Closes #11097 from davies/remove_fallback.	2016-02-05 15:07:43 -08:00
Luc Bourlier	0bb5b73387	[SPARK-13002][MESOS] Send initial request of executors for dyn allocation Fix for [SPARK-13002](https://issues.apache.org/jira/browse/SPARK-13002) about the initial number of executors when running with dynamic allocation on Mesos. Instead of fixing it just for the Mesos case, made the change in `ExecutorAllocationManager`. It is already driving the number of executors running on Mesos, just not the initial value. The `None` and `Some(0)` are internal details of the computation of resources to reserve in the Mesos backend scheduler. `executorLimitOption` has to be initialized correctly; otherwise the Mesos backend scheduler will either create too many executors at launch, or not create any executors and not be able to recover from this state. Removed the 'special case' description in the doc. It was not totally accurate, and is not needed anymore. This doesn't fix the same problem visible with Spark standalone. There is no straightforward way to send the initial value in standalone mode. Somebody who knows this part of the YARN support should review this change. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #11047 from skyluc/issue/initial-dyn-alloc-2.	2016-02-05 14:37:42 -08:00
Bill Chambers	66e1383de2	[SPARK-13214][DOCS] update dynamicAllocation documentation Author: Bill Chambers <bill@databricks.com> Closes #11094 from anabranch/dynamic-docs.	2016-02-05 14:35:39 -08:00
Wenchen Fan	1ed354a536	[SPARK-12939][SQL] migrate encoder resolution logic to Analyzer https://issues.apache.org/jira/browse/SPARK-12939 Now we will catch `ObjectOperator` in `Analyzer` and resolve the `fromRowExpression/deserializer` inside it. Also update the `MapGroups` and `CoGroup` to pass in `dataAttributes`, so that we can correctly resolve the value deserializer (the `child.output` contains both grouping key and values, which may mess things up if they have same-name attributes). End-to-end tests are added. follow-ups: * remove encoders from typed aggregate expression. * completely remove resolve/bind in `ExpressionEncoder` Author: Wenchen Fan <wenchen@databricks.com> Closes #10852 from cloud-fan/bug.	2016-02-05 14:34:12 -08:00
Shixiong Zhu	7b73f1719c	[SPARK-13166][SQL] Rename DataStreamReaderWriterSuite to DataFrameReaderWriterSuite A follow up PR for #11062 because it didn't rename the test suite. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11096 from zsxwing/rename.	2016-02-05 13:44:34 -08:00
Reynold Xin	82d84ff2dd	[SPARK-13187][SQL] Add boolean/long/double options in DataFrameReader/Writer This patch adds option functions for boolean, long, and double types. This makes it slightly easier for Spark users to specify options without turning them into strings. Using the JSON data source as an example: Before this patch: ```scala sqlContext.read.option("primitivesAsString", "true").json("/path/to/json") ``` After this patch: ```scala sqlContext.read.option("primitivesAsString", true).json("/path/to/json") ``` Author: Reynold Xin <rxin@databricks.com> Closes #11072 from rxin/SPARK-13187.	2016-02-04 22:43:44 -08:00
Jakob Odersky	352102ed0b	[SPARK-13208][CORE] Replace use of Pairs with Tuple2s Another trivial deprecation fix for Scala 2.11 Author: Jakob Odersky <jakob@odersky.com> Closes #11089 from jodersky/SPARK-13208.	2016-02-04 22:22:41 -08:00
gatorsmile	e3c75c6398	[SPARK-12850][SQL] Support Bucket Pruning (Predicate Pushdown for Bucketed Tables) JIRA: https://issues.apache.org/jira/browse/SPARK-12850 This PR is to support bucket pruning when the predicates are `EqualTo`, `EqualNullSafe`, `IsNull`, `In`, and `InSet`. Like Hive, in this PR, the bucket pruning works when the bucketing key has one and only one column. So far, I have not found a way to verify how many buckets are actually scanned. However, I did verify it when debugging. Could you provide a suggestion on how to do it properly? Thank you! cloud-fan yhuai rxin marmbrus BTW, we can add more cases to support complex predicates including `Or` and `And`. Please let me know if I should do it in this PR. Maybe we also need to add test cases to verify if bucket pruning works well for each data type. Author: gatorsmile <gatorsmile@gmail.com> Closes #10942 from gatorsmile/pruningBuckets.	2016-02-04 18:37:58 -08:00
Raafat Akkad	6dbfc40776	[SPARK-13052] waitingApps metric doesn't show the number of apps currently in the WAITING state Author: Raafat Akkad <raafat.akkad@gmail.com> Closes #10959 from RaafatAkkad/master.	2016-02-04 16:09:31 -08:00
Andrew Or	7a4b37f02c	[HOTFIX] Fix style violation caused by `c756bda`	2016-02-04 12:47:32 -08:00
Shixiong Zhu	8e2f296306	[SPARK-13195][STREAMING] Fix NoSuchElementException when a state is not set but timeoutThreshold is defined Check that the state exists before calling get. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11081 from zsxwing/SPARK-13195.	2016-02-04 12:43:16 -08:00
Andrew Or	bd38dd6f75	[SPARK-13079][SQL] InMemoryCatalog follow-ups This patch incorporates review feedback from #11069, which is already merged. Author: Andrew Or <andrew@databricks.com> Closes #11080 from andrewor14/catalog-follow-ups.	2016-02-04 12:20:18 -08:00
Andrew Or	c756bda477	[SPARK-12330][MESOS][HOTFIX] Rename timeout config The config already describes time and accepts a general format that is not restricted to ms. This commit renames the internal config to use a format that's consistent in Spark.	2016-02-04 12:04:54 -08:00
Josh Rosen	33212cb9a1	[SPARK-13168][SQL] Collapse adjacent repartition operators Spark SQL should collapse adjacent `Repartition` operators and only keep the last one. Author: Josh Rosen <joshrosen@databricks.com> Closes #11064 from JoshRosen/collapse-repartition.	2016-02-04 11:08:50 -08:00
Andrew Or	085f510ae5	MAINTENANCE: Automated closing of pull requests. This commit exists to close the following pull requests on Github: Closes #7971 (requested by yhuai) Closes #8539 (requested by srowen) Closes #8746 (requested by yhuai) Closes #9288 (requested by andrewor14) Closes #9321 (requested by andrewor14) Closes #9935 (requested by JoshRosen) Closes #10442 (requested by andrewor14) Closes #10585 (requested by srowen) Closes #10785 (requested by srowen) Closes #10832 (requested by andrewor14) Closes #10941 (requested by marmbrus) Closes #11024 (requested by andrewor14)	2016-02-04 11:07:06 -08:00
Andrew Or	15205da817	[SPARK-13053][TEST] Unignore tests in InternalAccumulatorSuite These were ignored because they are incorrectly written; they don't actually trigger stage retries, which is what the tests are testing. These tests are now rewritten to induce stage retries through fetch failures. Note: there were 2 tests before and now there's only 1. What happened? It turns out that the case where we only resubmit a subset of the original missing partitions is very difficult to simulate in tests without potentially introducing flakiness. This is because the `DAGScheduler` removes all map outputs associated with a given executor when this happens, and we will need multiple executors to trigger this case, and sometimes the scheduler still removes map outputs from all executors. Author: Andrew Or <andrew@databricks.com> Closes #10969 from andrewor14/unignore-accum-test.	2016-02-04 10:34:43 -08:00
Andrew Or	4120bcbaff	[SPARK-13162] Standalone mode does not respect initial executors Currently the Master would always set an application's initial executor limit to infinity. If the user specified `spark.dynamicAllocation.initialExecutors`, the config would not take effect. This is similar to #11047 but for standalone mode. Author: Andrew Or <andrew@databricks.com> Closes #11054 from andrewor14/standalone-da-initial.	2016-02-04 10:32:32 -08:00
Holden Karau	62a7c28388	[SPARK-13164][CORE] Replace deprecated synchronized buffer in core Building with scala 2.11 results in the warning trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative. Investigation shows we are already using ConcurrentLinkedQueue in other locations so switch our uses of SynchronizedBuffer to ConcurrentLinkedQueue. Author: Holden Karau <holden@us.ibm.com> Closes #11059 from holdenk/SPARK-13164-replace-deprecated-synchronized-buffer-in-core.	2016-02-04 10:29:38 -08:00
Charles Allen	2eaeafe8a2	[SPARK-12330][MESOS] Fix mesos coarse mode cleanup In the current implementation the mesos coarse scheduler does not wait for the mesos tasks to complete before ending the driver. This causes a race where the task has to finish cleaning up before the mesos driver terminates it with a SIGINT (and SIGKILL after 3 seconds if the SIGINT doesn't work). This PR causes the mesos coarse scheduler to wait for the mesos tasks to finish (with a timeout defined by `spark.mesos.coarse.shutdown.ms`) This PR also fixes a regression caused by [SPARK-10987] whereby submitting a shutdown causes a race between the local shutdown procedure and the notification of the scheduler driver disconnection. If the scheduler driver disconnection wins the race, the coarse executor incorrectly exits with status 1 (instead of the proper status 0) With this patch the mesos coarse scheduler terminates properly, the executors clean up, and the tasks are reported as `FINISHED` in the Mesos console (as opposed to `KILLED` in < 1.6 or `FAILED` in 1.6 and later) Author: Charles Allen <charles@allen-net.com> Closes #10319 from drcrallen/SPARK-12330.	2016-02-04 10:27:25 -08:00
Reynold Xin	dee801adb7	[SPARK-12828][SQL] Natural join follow-up This is a small addendum to #10762 to make the code more robust against future changes. Author: Reynold Xin <rxin@databricks.com> Closes #11070 from rxin/SPARK-12828-natural-join.	2016-02-03 23:43:48 -08:00
Liang-Chi Hsieh	d39087147f	[SPARK-13113] [CORE] Remove unnecessary bit operation when decoding page number JIRA: https://issues.apache.org/jira/browse/SPARK-13113 Since we shift the bits right, the bitwise AND operation appears to be unnecessary. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11002 from viirya/improve-decodepagenumber.	2016-02-03 23:17:51 -08:00
Yuhao Yang	c2c956bcd1	[ML][DOC] fix wrong api link in ml onevsrest minor fix for api link in ml onevsrest Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11068 from hhbyyh/onevsrestDoc.	2016-02-03 21:19:44 -08:00
Daoyuan Wang	0f81318ae2	[SPARK-12828][SQL] add natural join support Jira: https://issues.apache.org/jira/browse/SPARK-12828 Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #10762 from adrian-wang/naturaljoin.	2016-02-03 21:05:53 -08:00
Andrew Or	a64831124c	[SPARK-13079][SQL] Extend and implement InMemoryCatalog This is a step towards consolidating `SQLContext` and `HiveContext`. This patch extends the existing Catalog API added in #10982 to include methods for handling table partitions. In particular, a partition is identified by `PartitionSpec`, which is just a `Map[String, String]`. The Catalog is still not used by anything yet, but its API is now more or less complete and an implementation is fully tested. About 200 lines are test code. Author: Andrew Or <andrew@databricks.com> Closes #11069 from andrewor14/catalog.	2016-02-03 19:32:41 -08:00
Holden Karau	a8e2ba776b	[SPARK-13152][CORE] Fix task metrics deprecation warning Make an internal non-deprecated version of incBytesRead and incRecordsRead so we don't have unnecessary deprecation warnings in our build. Right now incBytesRead and incRecordsRead are marked as deprecated and for internal use only. We should make private[spark] versions which are not deprecated and switch to those internally so as to not clutter up the warning messages when building. cc andrewor14 who did the initial deprecation Author: Holden Karau <holden@us.ibm.com> Closes #11056 from holdenk/SPARK-13152-fix-task-metrics-deprecation-warnings.	2016-02-03 17:43:14 -08:00
Davies Liu	de0914522f	[SPARK-13131] [SQL] Use best and average time in benchmark Best time is more stable than average time. Also added a column for nanoseconds per row (which could be used to estimate the contribution of each component in a query). Showing best time and average time together gives more information (we can see some of the variance). Rate, time per row and relative speed are all calculated using best time. The result looks like this: ``` Intel(R) Core(TM) i7-4558U CPU 2.80GHz rang/filter/sum: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- rang/filter/sum codegen=false 14332 / 16646 36.0 27.8 1.0X rang/filter/sum codegen=true 845 / 940 620.0 1.6 17.0X ``` Author: Davies Liu <davies@databricks.com> Closes #11018 from davies/gen_bench.	2016-02-03 17:07:27 -08:00
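The typed `option` overloads described in SPARK-13187 can be sketched as thin wrappers that stringify the value and delegate to the `String` overload. This is a minimal, self-contained sketch under stated assumptions: `OptionBuilder` is a hypothetical stand-in, not Spark's `DataFrameReader`, and the delegate-to-`String` design is an assumption about how such overloads are typically written.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a reader's option builder (NOT Spark's DataFrameReader).
class OptionBuilder {
  // Options are ultimately stored as strings, as a data source would receive them.
  private val extraOptions = mutable.LinkedHashMap.empty[String, String]

  def option(key: String, value: String): OptionBuilder = {
    extraOptions += (key -> value)
    this
  }

  // Typed overloads: convert to String and delegate, so callers can skip "true"/"0.5".
  def option(key: String, value: Boolean): OptionBuilder = option(key, value.toString)
  def option(key: String, value: Long): OptionBuilder = option(key, value.toString)
  def option(key: String, value: Double): OptionBuilder = option(key, value.toString)

  def options: Map[String, String] = extraOptions.toMap
}

object OptionBuilderDemo {
  def main(args: Array[String]): Unit = {
    val opts = new OptionBuilder()
      .option("primitivesAsString", true) // Boolean overload
      .option("samplingRatio", 0.5)       // Double overload
      .option("maxRecords", 100L)         // Long overload
      .options
    println(opts)
  }
}
```

Keeping a single `String`-based store means the data source side does not change; only the call sites get the convenience of passing primitive values directly.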

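The reasoning in SPARK-13113 (the right shift makes the AND mask redundant) can be checked with a small sketch. It assumes a 64-bit address whose upper 13 bits hold the page number and whose lower 51 bits hold the offset; that layout and the `PageAddress` object are hypothetical, chosen for illustration. Because `>>>` is an unsigned shift that fills the vacated high bits with zeros, the bits the mask would clear are discarded by the shift anyway, so both decoders agree for every input.

```scala
// Hypothetical helper; the 13-bit page / 51-bit offset layout is an assumption.
object PageAddress {
  val PageNumberBits: Int = 13
  val OffsetBits: Int = 64 - PageNumberBits // 51

  val MaskLower51Bits: Long = (1L << OffsetBits) - 1
  val MaskUpper13Bits: Long = ~MaskLower51Bits

  // Pack a page number and an offset into one Long.
  def encode(pageNumber: Int, offset: Long): Long =
    (pageNumber.toLong << OffsetBits) | (offset & MaskLower51Bits)

  // Before: clear the offset bits, then shift.
  def decodePageNumberWithMask(address: Long): Int =
    ((address & MaskUpper13Bits) >>> OffsetBits).toInt

  // After: the unsigned shift alone drops the 51 offset bits and zero-fills
  // the top, so the mask contributes nothing.
  def decodePageNumber(address: Long): Int =
    (address >>> OffsetBits).toInt

  def decodeOffset(address: Long): Long = address & MaskLower51Bits
}
```

Note that a signed shift (`>>`) would not be equivalent: for page numbers with the top bit set it would sign-extend, which is why the unsigned shift matters here.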

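The `LocalLimit -> Exchange to one partition -> GlobalLimit` expansion from SPARK-8964 can be modeled on plain Scala collections, treating a dataset as a sequence of partitions. `LimitSketch` and its methods are hypothetical illustrations, not Spark operators; this sketch concatenates partitions in order, whereas a real shuffle makes no such ordering guarantee.

```scala
// Hypothetical model: a dataset is a sequence of partitions (NOT Spark's operators).
object LimitSketch {
  type Partitions[A] = Seq[Seq[A]]

  // LocalLimit: applied independently within each partition; no data movement.
  def localLimit[A](n: Int, parts: Partitions[A]): Partitions[A] =
    parts.map(_.take(n))

  // Exchange to SinglePartition: gather every partition into one.
  def exchangeToSinglePartition[A](parts: Partitions[A]): Partitions[A] =
    Seq(parts.flatten)

  // GlobalLimit: requires single-partition input and applies the final limit.
  def globalLimit[A](n: Int, parts: Partitions[A]): Partitions[A] = {
    require(parts.size == 1, "GlobalLimit expects SinglePartition input")
    Seq(parts.head.take(n))
  }

  // Logical Limit(n) expanded into the three physical steps.
  def limit[A](n: Int, parts: Partitions[A]): Seq[A] =
    globalLimit(n, exchangeToSinglePartition(localLimit(n, parts))).head
}
```

Because each partition is truncated to `n` rows before the exchange, at most `n` rows per partition are moved into the single output partition.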
15138 commits