ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Cheng Hao	770d8153a5	[SPARK-4375] [SQL] Add 0 argument support for udf Author: Cheng Hao <hao.cheng@intel.com> Closes #3595 from chenghao-intel/udf0 and squashes the following commits: a858973 [Cheng Hao] Add 0 arguments support for udf	2014-12-16 21:21:11 -08:00
Takuya UESHIN	ddc7ba31cb	[SPARK-4720][SQL] Remainder should also return null if the divider is 0. This is a follow-up of SPARK-4593 (#3443). Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #3581 from ueshin/issues/SPARK-4720 and squashes the following commits: `c3959d4` [Takuya UESHIN] Make Remainder return null if the divider is 0.	2014-12-16 21:19:57 -08:00
Cheng Hao	0aa834adea	[SPARK-4744] [SQL] Short circuit evaluation for AND & OR in CodeGen Author: Cheng Hao <hao.cheng@intel.com> Closes #3606 from chenghao-intel/codegen_short_circuit and squashes the following commits: f466303 [Cheng Hao] short circuit for AND & OR	2014-12-16 21:18:39 -08:00
Cheng Lian	3b395e1051	[SPARK-4798][SQL] A new set of Parquet testing API and test suites This PR provides a set Parquet testing API (see trait `ParquetTest`) that enables developers to write more concise test cases. A new set of Parquet test suites built upon this API are added and aim to replace the old `ParquetQuerySuite`. To avoid potential merge conflicts, old testing code are not removed yet. The following classes can be safely removed after most Parquet related PRs are handled: - `ParquetQuerySuite` - `ParquetTestData` <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3644) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3644 from liancheng/parquet-tests and squashes the following commits: 800e745 [Cheng Lian] Enforces ordering of test output 3bb8731 [Cheng Lian] Refactors HiveParquetSuite aa2cb2e [Cheng Lian] Decouples ParquetTest and TestSQLContext 7b43a68 [Cheng Lian] Updates ParquetTest Scaladoc 7f07af0 [Cheng Lian] Adds a new set of Parquet test suites	2014-12-16 21:16:03 -08:00
Jacky Li	fa66ef6c97	[SPARK-4269][SQL] make wait time configurable in BroadcastHashJoin In BroadcastHashJoin, currently it is using a hard coded value (5 minutes) to wait for the execution and broadcast of the small table. In my opinion, it should be a configurable value since broadcast may exceed 5 minutes in some case, like in a busy/congested network environment. Author: Jacky Li <jacky.likun@huawei.com> Closes #3133 from jackylk/timeout-config and squashes the following commits: 733ac08 [Jacky Li] add spark.sql.broadcastTimeout in SQLConf.scala 557acd4 [Jacky Li] switch to sqlContext.getConf 81a5e20 [Jacky Li] make wait time configurable in BroadcastHashJoin	2014-12-16 15:34:59 -08:00
Michael Armbrust	a66c23e134	[SPARK-4827][SQL] Fix resolution of deeply nested Project(attr, Project(Star,...)). Since `AttributeReference` resolution and `*` expansion are currently in separate rules, each pair requires a full iteration instead of being able to resolve in a single pass. Since its pretty easy to construct queries that have many of these in a row, I combine them into a single rule in this PR. Author: Michael Armbrust <michael@databricks.com> Closes #3674 from marmbrus/projectStars and squashes the following commits: d83d6a1 [Michael Armbrust] Fix resolution of deeply nested Project(attr, Project(Star,...)).	2014-12-16 15:31:19 -08:00
tianyi	30f6b85c81	[SPARK-4483][SQL]Optimization about reduce memory costs during the HashOuterJoin In `HashOuterJoin.scala`, spark read data from both side of join operation before zip them together. It is a waste for memory. We are trying to read data from only one side, put them into a hashmap, and then generate the `JoinedRow` with data from other side one by one. Currently, we could only do this optimization for `left outer join` and `right outer join`. For `full outer join`, we will do something in another issue. for table test_csv contains 1 million records table dim_csv contains 10 thousand records SQL: `select * from test_csv a left outer join dim_csv b on a.key = b.key` the result is: master: ``` CSV: 12671 ms CSV: 9021 ms CSV: 9200 ms Current Mem Usage:787788984 ``` after patch： ``` CSV: 10382 ms CSV: 7543 ms CSV: 7469 ms Current Mem Usage:208145728 ``` Author: tianyi <tianyi@asiainfo-linkage.com> Author: tianyi <tianyi.asiainfo@gmail.com> Closes #3375 from tianyi/SPARK-4483 and squashes the following commits: 72a8aec [tianyi] avoid having mutable state stored inside of the task 99c5c97 [tianyi] performance optimization d2f94d7 [tianyi] fix bug: missing output when the join-key is null. 2be45d1 [tianyi] fix spell bug 1f2c6f1 [tianyi] remove commented codes a676de6 [tianyi] optimize some codes 9e7d5b5 [tianyi] remove commented old codes 838707d [tianyi] Optimization about reduce memory costs during the HashOuterJoin	2014-12-16 15:22:29 -08:00
wangxiaojing	ea1315e3e2	[SPARK-4527][SQl]Add BroadcastNestedLoopJoin operator selection testsuite In `JoinSuite` add BroadcastNestedLoopJoin operator selection testsuite Author: wangxiaojing <u9jing@gmail.com> Closes #3395 from wangxiaojing/SPARK-4527 and squashes the following commits: ea0e495 [wangxiaojing] change style 53c3952 [wangxiaojing] Add BroadcastNestedLoopJoin operator selection testsuite	2014-12-16 14:45:56 -08:00
zsxwing	6530243a52	[SPARK-4812][SQL] Fix the initialization issue of 'codegenEnabled' The problem is `codegenEnabled` is `val`, but it uses a `val` `sqlContext`, which can be override by subclasses. Here is a simple example to show this issue. ```Scala scala> :paste // Entering paste mode (ctrl-D to finish) abstract class Foo { protected val sqlContext = "Foo" val codegenEnabled: Boolean = { println(sqlContext) // it will call subclass's `sqlContext` which has not yet been initialized. if (sqlContext != null) { true } else { false } } } class Bar extends Foo { override val sqlContext = "Bar" } println(new Bar().codegenEnabled) // Exiting paste mode, now interpreting. null false defined class Foo defined class Bar ``` We should make `sqlContext` `final` to prevent subclasses from overriding it incorrectly. Author: zsxwing <zsxwing@gmail.com> Closes #3660 from zsxwing/SPARK-4812 and squashes the following commits: 1cbb623 [zsxwing] Make `sqlContext` final to prevent subclasses from overriding it incorrectly	2014-12-16 14:13:40 -08:00
jerryshao	dc8280dcca	[SPARK-4847][SQL]Fix "extraStrategies cannot take effect in SQLContext" issue Author: jerryshao <saisai.shao@intel.com> Closes #3698 from jerryshao/SPARK-4847 and squashes the following commits: 4741130 [jerryshao] Make later added extraStrategies effect when calling strategies	2014-12-16 14:08:28 -08:00
Judy Nash	17688d1429	[SQL] SPARK-4700: Add HTTP protocol spark thrift server Add HTTP protocol support and test cases to spark thrift server, so users can deploy thrift server in both TCP and http mode. Author: Judy Nash <judynash@microsoft.com> Author: judynash <judynash@microsoft.com> Closes #3672 from judynash/master and squashes the following commits: 526315d [Judy Nash] correct spacing on startThriftServer method 31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue 47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length 2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide 1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master' 2b1d312 [Judy Nash] updated http thrift server support based on feedback 377532c [judynash] add HTTP protocol spark thrift server	2014-12-16 12:37:26 -08:00
Sean Owen	81112e4b57	SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger This enables assertions for the Maven and SBT build, but overrides the Hive module to not enable assertions. Author: Sean Owen <sowen@cloudera.com> Closes #3692 from srowen/SPARK-4814 and squashes the following commits: caca704 [Sean Owen] Disable assertions just for Hive f71e783 [Sean Owen] Enable assertions for SBT and Maven build	2014-12-15 17:12:05 -08:00
Daoyuan Wang	41a3f93438	[SPARK-4829] [SQL] add rule to fold count(expr) if expr is not null Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3676 from adrian-wang/countexpr and squashes the following commits: dc5765b [Daoyuan Wang] add rule to fold count(expr) if expr is not null	2014-12-11 22:56:42 -08:00
Sasaki Toru	8091dd62ea	[SPARK-4742][SQL] The name of Parquet File generated by AppendingParquetOutputFormat should be zero padded When I use Parquet File as a output file using ParquetOutputFormat#getDefaultWorkFile, the file name is not zero padded while RDD#saveAsText does zero padding. Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #3602 from sasakitoa/parquet-zeroPadding and squashes the following commits: 6b0e58f [Sasaki Toru] Merge branch 'master' of git://github.com/apache/spark into parquet-zeroPadding 20dc79d [Sasaki Toru] Fixed the name of Parquet File generated by AppendingParquetOutputFormat	2014-12-11 22:54:21 -08:00
Cheng Hao	0abbff2862	[SPARK-4825] [SQL] CTAS fails to resolve when created using saveAsTable Fix bug when query like: ``` test("save join to table") { val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString)) sql("CREATE TABLE test1 (key INT, value STRING)") testData.insertInto("test1") sql("CREATE TABLE test2 (key INT, value STRING)") testData.insertInto("test2") testData.insertInto("test2") sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test") checkAnswer( table("test"), sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq) } ``` Author: Cheng Hao <hao.cheng@intel.com> Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits: e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS e004895 [Cheng Hao] fix bug	2014-12-11 22:51:49 -08:00
Daoyuan Wang	cbb634ae69	[SQL] enable empty aggr test case This is fixed by SPARK-4318 #3184 Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3445 from adrian-wang/emptyaggr and squashes the following commits: 982575e [Daoyuan Wang] enable empty aggr test case	2014-12-11 22:50:18 -08:00
Daoyuan Wang	acb3be6bc5	[SPARK-4828] [SQL] sum and avg on empty table should always return null So the optimizations are not valid. Also I think the optimization here is rarely encounter, so removing them will not have influence on performance. Can we merge #3445 before I add a comparison test case from this? Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3675 from adrian-wang/sumempty and squashes the following commits: 42df763 [Daoyuan Wang] sum and avg on empty table should always return null	2014-12-11 22:49:27 -08:00
scwf	d8cf678589	[SQL] Remove unnecessary case in HiveContext.toHiveString a follow up of #3547 /cc marmbrus Author: scwf <wangfei1@huawei.com> Closes #3563 from scwf/rnc and squashes the following commits: 9395661 [scwf] remove unnecessary condition	2014-12-11 22:48:03 -08:00
Takuya UESHIN	334480362b	[SPARK-4293][SQL] Make Cast be able to handle complex types. Inserting data of type including `ArrayType.containsNull == false` or `MapType.valueContainsNull == false` or `StructType.fields.exists(_.nullable == false)` into Hive table will fail because `Cast` inserted by `HiveMetastoreCatalog.PreInsertionCasts` rule of `Analyzer` can't handle these types correctly. Complex type cast rule proposal: - Cast for non-complex types should be able to cast the same as before. - Cast for `ArrayType` can evaluate if - Element type can cast - Nullability rule doesn't break - Cast for `MapType` can evaluate if - Key type can cast - Nullability for casted key type is `false` - Value type can cast - Nullability rule for value type doesn't break - Cast for `StructType` can evaluate if - The field size is the same - Each field can cast - Nullability rule for each field doesn't break - The nested structure should be the same. Nullability rule: - If the casted type is `nullable == true`, the target nullability should be `true` Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #3150 from ueshin/issues/SPARK-4293 and squashes the following commits: e935939 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293 ba14003 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293 8999868 [Takuya UESHIN] Fix a test title. f677c30 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293 287f410 [Takuya UESHIN] Add tests to insert data of types ArrayType / MapType / StructType with nullability is false into Hive table. 4f71bb8 [Takuya UESHIN] Make Cast be able to handle complex types.	2014-12-11 22:45:25 -08:00
Jacky Li	c152dde78f	[SPARK-4639] [SQL] Pass maxIterations in as a parameter in Analyzer fix a TODO in Analyzer: // TODO: pass this in as a parameter val fixedPoint = FixedPoint(100) Author: Jacky Li <jacky.likun@huawei.com> Closes #3499 from jackylk/config and squashes the following commits: 4c1252c [Jacky Li] fix scalastyle 820f460 [Jacky Li] pass maxIterations in as a parameter	2014-12-11 22:44:27 -08:00
Cheng Hao	a7f07f511c	[SPARK-4662] [SQL] Whitelist more unittest Whitelist more hive unit test: "create_like_tbl_props" "udf5" "udf_java_method" "decimal_1" "udf_pmod" "udf_to_double" "udf_to_float" "udf7" (this will fail in Hive 0.12) Author: Cheng Hao <hao.cheng@intel.com> Closes #3522 from chenghao-intel/unittest and squashes the following commits: f54e4c7 [Cheng Hao] work around to clean up the hive.table.parameters.default in reset 16fee22 [Cheng Hao] Whitelist more unittest	2014-12-11 22:43:02 -08:00
Cheng Hao	bf40cf89e3	[SPARK-4713] [SQL] SchemaRDD.unpersist() should not raise exception if it is not persisted Unpersist a uncached RDD, will not raise exception, for example: ``` val data = Array(1, 2, 3, 4, 5) val distData = sc.parallelize(data) distData.unpersist(true) ``` But the `SchemaRDD` will raise exception if the `SchemaRDD` is not cached. Since `SchemaRDD` is the subclasses of the `RDD`, we should follow the same behavior. Author: Cheng Hao <hao.cheng@intel.com> Closes #3572 from chenghao-intel/try_uncache and squashes the following commits: 50a7a89 [Cheng Hao] SchemaRDD.unpersist() should not raise exception if it is not persisted	2014-12-11 22:41:36 -08:00
Joseph K. Bradley	2a5b5fd4cc	[SPARK-4791] [sql] Infer schema from case class with multiple constructors Modified ScalaReflection.schemaFor to take primary constructor of Product when there are multiple constructors. Added test to suite which failed before but works now. Needed for [https://github.com/apache/spark/pull/3637] CC: marmbrus Author: Joseph K. Bradley <joseph@databricks.com> Closes #3646 from jkbradley/sql-reflection and squashes the following commits: 796b2e4 [Joseph K. Bradley] Modified ScalaReflection.schemaFor to take primary constructor of Product when there are multiple constructors. Added test to suite which failed before but works now.	2014-12-10 23:41:15 -08:00
Cheng Hao	383c5555c9	[SPARK-4785][SQL] Initilize Hive UDFs on the driver and serialize them with a wrapper Different from Hive 0.12.0, in Hive 0.13.1 UDF/UDAF/UDTF (aka Hive function) objects should only be initialized once on the driver side and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with Kryo or XML serializer. Several utility ser/de methods are provided in class o.a.h.h.q.e.Utilities for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is created in Hive. Spark Kryo serializer wasn't used because there's no available SparkConf instance. Author: Cheng Hao <hao.cheng@intel.com> Author: Cheng Lian <lian@databricks.com> Closes #3640 from chenghao-intel/udf_serde and squashes the following commits: 8e13756 [Cheng Hao] Update the comment 74466a3 [Cheng Hao] refactor as feedbacks 396c0e1 [Cheng Hao] avoid Simple UDF to be serialized e9c3212 [Cheng Hao] update the comment 19cbd46 [Cheng Hao] support udf instance ser/de after initialization	2014-12-09 10:28:33 -08:00
Cheng Hao	51b1fe1426	[SPARK-4769] [SQL] CTAS does not work when reading from temporary tables This is the code refactor and follow ups for #2570 Author: Cheng Hao <hao.cheng@intel.com> Closes #3336 from chenghao-intel/createtbl and squashes the following commits: 3563142 [Cheng Hao] remove the unused variable e215187 [Cheng Hao] eliminate the compiling warning 4f97f14 [Cheng Hao] fix bug in unittest 5d58812 [Cheng Hao] revert the API changes b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS	2014-12-08 17:39:12 -08:00
Jacky Li	944384363d	[SQL] remove unnecessary import in spark-sql Author: Jacky Li <jacky.likun@huawei.com> Closes #3630 from jackylk/remove and squashes the following commits: 150e7e0 [Jacky Li] remove unnecessary import	2014-12-08 17:27:46 -08:00
Cheng Lian	6f61e1f961	[SPARK-4761][SQL] Enables Kryo by default in Spark SQL Thrift server Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected (the Thrift server is started by `spark-submit`, which handles configuration properties properly). <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3621) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3621 from liancheng/kryo-by-default and squashes the following commits: 70c2775 [Cheng Lian] Enables Kryo by default in Spark SQL Thrift server	2014-12-05 10:27:40 -08:00
Michael Armbrust	f5801e813f	[SPARK-4753][SQL] Use catalyst for partition pruning in newParquet. Author: Michael Armbrust <michael@databricks.com> Closes #3613 from marmbrus/parquetPartitionPruning and squashes the following commits: 4f138f8 [Michael Armbrust] Use catalyst for partition pruning in newParquet.	2014-12-04 22:25:21 -08:00
Aaron Davidson	c6c7165e7e	[SQL] Minor: Avoid calling Seq#size in a loop Just found this instance while doing some jstack-based profiling of a Spark SQL job. It is very unlikely that this is causing much of a perf issue anywhere, but it is unnecessarily suboptimal. Author: Aaron Davidson <aaron@databricks.com> Closes #3593 from aarondav/seq-opt and squashes the following commits: 962cdfc [Aaron Davidson] [SQL] Minor: Avoid calling Seq#size in a loop	2014-12-04 00:58:42 -08:00
Jacky Li	ed88db4cb2	[SQL] remove unnecessary import Author: Jacky Li <jacky.likun@huawei.com> Closes #3585 from jackylk/remove and squashes the following commits: 045423d [Jacky Li] remove unnecessary import	2014-12-04 00:43:55 -08:00
Michael Armbrust	513ef82e85	[SPARK-4552][SQL] Avoid exception when reading empty parquet data through Hive This is a very small fix that catches one specific exception and returns an empty table. #3441 will address this in a more principled way. Author: Michael Armbrust <michael@databricks.com> Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits: 2781d9f [Michael Armbrust] Handle empty lists for newParquet 04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive	2014-12-03 14:13:35 -08:00
wangfei	3ae0cda83c	[SPARK-4695][SQL] Get result using executeCollect Using ```executeCollect``` to collect the result, because executeCollect is a custom implementation of collect in spark sql which better than rdd's collect Author: wangfei <wangfei1@huawei.com> Closes #3547 from scwf/executeCollect and squashes the following commits: a5ab68e [wangfei] Revert "adding debug info" a60d680 [wangfei] fix test failure 0db7ce8 [wangfei] adding debug info 184c594 [wangfei] using executeCollect instead collect	2014-12-02 14:30:44 -08:00
Daoyuan Wang	1f5ddf17e8	[SPARK-4670] [SQL] wrong symbol for bitwise not We should use `~` instead of `-` for bitwise NOT. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3528 from adrian-wang/symbol and squashes the following commits: affd4ad [Daoyuan Wang] fix code gen test case 56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type f55fbae [Daoyuan Wang] wrong symbol for bitwise not	2014-12-02 14:25:12 -08:00
Daoyuan Wang	f6df609dcc	[SPARK-4593][SQL] Return null when denominator is 0 SELECT max(1/0) FROM src would return a very large number, which is obviously not right. For hive-0.12, hive would return `Infinity` for 1/0, while for hive-0.13.1, it is `NULL` for 1/0. I think it is better to keep our behavior with newer Hive version. This PR ensures that when the divider is 0, the result of expression should be NULL, same with hive-0.13.1 Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3443 from adrian-wang/div and squashes the following commits: 2e98677 [Daoyuan Wang] fix code gen for divide 0 85c28ba [Daoyuan Wang] temp 36236a5 [Daoyuan Wang] add test cases 6f5716f [Daoyuan Wang] fix comments cee92bd [Daoyuan Wang] avoid evaluation 2 times 22ecd9a [Daoyuan Wang] fix style cf28c58 [Daoyuan Wang] divide fix 2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type	2014-12-02 14:21:47 -08:00
YanTangZhai	1066427600	[SPARK-4676][SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null val jsc = new org.apache.spark.api.java.JavaSparkContext(sc) val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc) val nrdd = jhc.hql("select null from spark_test.for_test") println(nrdd.schema) Then the error is thrown as follows: scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$) at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43) Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Author: Michael Armbrust <michael@databricks.com> Closes #3538 from YanTangZhai/MatchNullType and squashes the following commits: e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null 4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null 896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null 6e643f8 [YanTangZhai] Merge pull request #11 from apache/master e249846 [YanTangZhai] Merge pull request #10 from apache/master d26d982 [YanTangZhai] Merge pull request #9 from apache/master 76d4027 [YanTangZhai] Merge pull request #8 from apache/master 03b62b0 [YanTangZhai] Merge pull request #7 from apache/master 8a00106 [YanTangZhai] Merge pull request #6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master	2014-12-02 14:15:12 -08:00
baishuo	69b6fed206	[SPARK-4663][sql]add finally to avoid resource leak Author: baishuo <vc_java@hotmail.com> Closes #3526 from baishuo/master-trycatch and squashes the following commits: d446e14 [baishuo] correct the code style b36bf96 [baishuo] correct the code style ae0e447 [baishuo] add finally to avoid resource leak	2014-12-02 12:12:03 -08:00
Kousuke Saruta	e75e04f980	[SPARK-4536][SQL] Add sqrt and abs to Spark SQL DSL Spark SQL has embeded sqrt and abs but DSL doesn't support those functions. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits: 07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite 8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator 1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator 0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL	2014-12-02 12:07:52 -08:00
Reynold Xin	b1f8fe316a	Indent license header properly for interfaces.scala. A very small nit update. Author: Reynold Xin <rxin@databricks.com> Closes #3552 from rxin/license-header and squashes the following commits: df8d1a4 [Reynold Xin] Indent license header properly for interfaces.scala.	2014-12-02 11:59:15 -08:00
zsxwing	d3e02dddf0	[SPARK-4268][SQL] Use #::: to get benefit from Stream in SqlLexical.allCaseVersions In addition, using `s.isEmpty` to eliminate the string comparison. Author: zsxwing <zsxwing@gmail.com> Closes #3132 from zsxwing/SPARK-4268 and squashes the following commits: 358e235 [zsxwing] Improvement of allCaseVersions	2014-12-01 16:39:54 -08:00
Daoyuan Wang	4df60a8cbc	[SPARK-4529] [SQL] support view with column alias Support view definition like CREATE VIEW view3(valoo) TBLPROPERTIES ("fear" = "factor") AS SELECT upper(value) FROM src WHERE key=86; [valoo as the alias of upper(value)]. This is missing part of SPARK-4239, for a fully view support. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3396 from adrian-wang/viewcolumn and squashes the following commits: 4d001d0 [Daoyuan Wang] support view with column alias	2014-12-01 16:08:51 -08:00
wangfei	7b79957879	[SQL] Minor fix for doc and comment Author: wangfei <wangfei1@huawei.com> Closes #3533 from scwf/sql-doc1 and squashes the following commits: 962910b [wangfei] doc and comment fix	2014-12-01 14:02:02 -08:00
ravipesala	bc353819cc	[SPARK-4658][SQL] Code documentation issue in DDL of datasource API Author: ravipesala <ravindra.pesala@huawei.com> Closes #3516 from ravipesala/ddl_doc and squashes the following commits: d101fdf [ravipesala] Style issues fixed d2238cd [ravipesala] Corrected documentation	2014-12-01 13:31:27 -08:00
ravipesala	6a9ff19dc0	[SPARK-4650][SQL] Supporting multi column support in countDistinct function like count(distinct c1,c2..) in Spark SQL Supporting multi column support in countDistinct function like count(distinct c1,c2..) in Spark SQL Author: ravipesala <ravindra.pesala@huawei.com> Author: Michael Armbrust <michael@databricks.com> Closes #3511 from ravipesala/countdistinct and squashes the following commits: cc4dbb1 [ravipesala] style 070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL	2014-12-01 13:28:04 -08:00
Liang-Chi Hsieh	b57365a1ec	[SPARK-4358][SQL] Let BigDecimal do checking type compatibility Remove hardcoding max and min values for types. Let BigDecimal do checking type compatibility. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3208 from viirya/more_numericLit and squashes the following commits: e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal. 1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer. cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast. 91fe489 [Liang-Chi Hsieh] add Byte and Short. 1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.	2014-12-01 13:17:56 -08:00
Jacky Li	bafee67eba	[SQL] add @group tab in limit() and count() group tab is missing for scaladoc Author: Jacky Li <jacky.likun@gmail.com> Closes #3458 from jackylk/patch-7 and squashes the following commits: 0121a70 [Jacky Li] add @group tab in limit() and count()	2014-12-01 13:12:30 -08:00
zsxwing	30a86acdef	[SPARK-4661][Core] Minor code and docs cleanup Author: zsxwing <zsxwing@gmail.com> Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits: 03cbe3f [zsxwing] Minor code and docs cleanup	2014-12-01 00:35:01 -08:00
Cheng Lian	5b99bf243e	[SPARK-4645][SQL] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2 This PR disables HiveThriftServer2 asynchronous execution by setting `runInBackground` argument in `ExecuteStatementOperation` to `false`, and reverting `SparkExecuteStatementOperation.run` in Hive 13 shim to Hive 12 version. This change makes Simba ODBC driver v1.0.0.1000 work. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3506) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3506 from liancheng/disable-async-exec and squashes the following commits: 593804d [Cheng Lian] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2	2014-11-28 11:42:40 -05:00
w00228970	723be60e23	[SQL] Compute timeTaken correctly ```timeTaken``` should not count the time of printing result. Author: w00228970 <wangfei1@huawei.com> Closes #3423 from scwf/time-taken-bug and squashes the following commits: da7e102 [w00228970] compute time taken correctly	2014-11-24 21:17:24 -08:00
Davies Liu	6cf507685e	[SPARK-4548] []SPARK-4517] improve performance of python broadcast Re-implement the Python broadcast using file: 1) serialize the python object using cPickle, write into disks. 2) Create a wrapper in JVM (for the dumped file), it read data from during serialization 3) Using TorrentBroadcast or HttpBroadcast to transfer the data (compressed) into executors 4) During deserialization, writing the data into disk. 5) Passing the path into Python worker, read data from disk and unpickle it into python object, until the first access. It fixes the performance regression introduced in #2659, has similar performance as 1.1, but support object larger than 2G, also improve the memory efficiency (only one compressed copy in driver and executor). Testing with a 500M broadcast and 4 tasks (excluding the benefit from reused worker in 1.2): name \| 1.1 \| 1.2 with this patch \| improvement ---------\|--------\|---------\|-------- python-broadcast-w-bytes \| 25.20 \| 9.33 \| 170.13% \| python-broadcast-w-set \| 4.13 \| 4.50 \| -8.35% \| Testing with 100 tasks (16 CPUs): name \| 1.1 \| 1.2 with this patch \| improvement ---------\|--------\|---------\|-------- python-broadcast-w-bytes \| 38.16 \| 8.40 \| 353.98% python-broadcast-w-set \| 23.29 \| 9.59 \| 142.80% Author: Davies Liu <davies@databricks.com> Closes #3417 from davies/pybroadcast and squashes the following commits: 50a58e0 [Davies Liu] address comments b98de1d [Davies Liu] disable gc while unpickle e5ee6b9 [Davies Liu] support large string 09303b8 [Davies Liu] read all data into memory dde02dd [Davies Liu] improve performance of python broadcast	2014-11-24 17:17:03 -08:00
Kousuke Saruta	dd1c9cb36c	[SPARK-4487][SQL] Fix attribute reference resolution error when using ORDER BY. When we use ORDER BY clause, at first, attributes referenced by projection are resolved (1). And then, attributes referenced at ORDER BY clause are resolved (2). But when resolving attributes referenced at ORDER BY clause, the resolution result generated in (1) is discarded so for example, following query fails. SELECT c1 + c2 FROM mytable ORDER BY c1; The query above fails because when resolving the attribute reference 'c1', the resolution result of 'c2' is discarded. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3363 from sarutak/SPARK-4487 and squashes the following commits: fd314f3 [Kousuke Saruta] Fixed attribute resolution logic in Analyzer 6e60c20 [Kousuke Saruta] Fixed conflicts cb5b7e9 [Kousuke Saruta] Added test case for SPARK-4487 282d529 [Kousuke Saruta] Fixed attributes reference resolution error b6123e6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into concat-feature 317b7fb [Kousuke Saruta] WIP	2014-11-24 12:54:37 -08:00

1 2 3 4 5 ...

662 commits