ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	3888ee2f38	[SPARK-3748] Log thread name in unit test logs Thread names are useful for correlating failures. Author: Reynold Xin <rxin@apache.org> Closes #2600 from rxin/log4j and squashes the following commits: 83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs	2014-10-01 01:03:49 -07:00
Patrick Wendell	b64fcbd2dc	Revert "[SPARK-3007][SQL]Add Dynamic Partition support to Spark Sql hive" This reverts commit `0bbe7faeff`.	2014-09-30 09:43:46 -07:00
baishuo(白硕)	0bbe7faeff	[SPARK-3007][SQL]Add Dynamic Partition support to Spark Sql hive a new PR base on new master. changes are the same as https://github.com/apache/spark/pull/1919 Author: baishuo(白硕) <vc_java@hotmail.com> Author: baishuo <vc_java@hotmail.com> Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2226 from baishuo/patch-3007 and squashes the following commits: e69ce88 [Cheng Lian] Adds tests to verify dynamic partitioning folder layout b20a3dc [Cheng Lian] Addresses @yhuai's comments 096bbbc [baishuo(白硕)] Merge pull request #1 from liancheng/refactor-dp 1093c20 [Cheng Lian] Adds more tests 5004542 [Cheng Lian] Minor refactoring fae9eff [Cheng Lian] Refactors InsertIntoHiveTable to a Command 528e84c [Cheng Lian] Fixes typo in test name, regenerated golden answer files c464b26 [Cheng Lian] Refactors dynamic partitioning support 5033928 [baishuo] pass check style 2201c75 [baishuo] use HiveConf.DEFAULTPARTITIONNAME to replace hive.exec.default.partition.name b47c9bf [baishuo] modify according micheal's advice c3ab36d [baishuo] modify for some bad indentation 7ce2d9f [baishuo] modify code to pass scala style checks 37c1c43 [baishuo] delete a empty else branch 66e33fc [baishuo] do a little modify 88d0110 [baishuo] update file after test a3961d9 [baishuo(白硕)] Update Cast.scala f7467d0 [baishuo(白硕)] Update InsertIntoHiveTable.scala c1a59dd [baishuo(白硕)] Update Cast.scala 0e18496 [baishuo(白硕)] Update HiveQuerySuite.scala 60f70aa [baishuo(白硕)] Update InsertIntoHiveTable.scala 0a50db9 [baishuo(白硕)] Update HiveCompatibilitySuite.scala 491c7d0 [baishuo(白硕)] Update InsertIntoHiveTable.scala a2374a8 [baishuo(白硕)] Update InsertIntoHiveTable.scala 701a814 [baishuo(白硕)] Update SparkHadoopWriter.scala dc24c41 [baishuo(白硕)] Update HiveQl.scala	2014-09-29 15:51:55 -07:00
Michael Armbrust	f0c7e19550	[SPARK-3680][SQL] Fix bug caused by eager typing of HiveGenericUDFs Typing of UDFs should be lazy as it is often not valid to call `dataType` on an expression until after all of its children are `resolved`. Author: Michael Armbrust <michael@databricks.com> Closes #2525 from marmbrus/concatBug and squashes the following commits: 5b8efe7 [Michael Armbrust] fix bug with eager typing of udfs	2014-09-27 12:10:16 -07:00
w00228970	0800881051	[SPARK-3676][SQL] Fix hive test suite failure due to diffs in JDK 1.6/1.7 This is a bug in JDK6: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4428022 this is because jdk get different result to operate ```double```, ```System.out.println(1/500d)``` in different jdk get different result jdk 1.6.0(_31) ---- 0.0020 jdk 1.7.0(_05) ---- 0.002 this leads to HiveQuerySuite failed when generate golden answer in jdk 1.7 and run tests in jdk 1.6, result did not match Author: w00228970 <wangfei1@huawei.com> Closes #2517 from scwf/HiveQuerySuite and squashes the following commits: 0cb5e8d [w00228970] delete golden answer of division-0 and timestamp cast #1 1df3964 [w00228970] Jdk version leads to different query output for Double, this make HiveQuerySuite failed	2014-09-27 12:06:16 -07:00
Daoyuan Wang	0ec2d2e8f0	[SPARK-3531][SQL]select null from table would throw a MatchError Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #2396 from adrian-wang/selectnull and squashes the following commits: 2458229 [Daoyuan Wang] rebase solution	2014-09-26 12:04:37 -07:00
Venkata Ramana Gollamudi	1c62f97e94	[SPARK-3268][SQL] DoubleType, FloatType and DecimalType modulus support Supported modulus operation using % operator on fractional datatypes FloatType, DoubleType and DecimalType Example: SELECT 1388632775.0 % 60 from tablename LIMIT 1 Author : Venkata Ramana Gollamudi ramana.gollamudihuawei.com Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com> Closes #2457 from gvramana/double_modulus_support and squashes the following commits: 79172a8 [Venkata Ramana Gollamudi] Add hive cache to testcase c09bd5b [Venkata Ramana Gollamudi] Added a HiveQuerySuite testcase 193fa81 [Venkata Ramana Gollamudi] corrected testcase 3624471 [Venkata Ramana Gollamudi] modified testcase e112c09 [Venkata Ramana Gollamudi] corrected the testcase 513d0e0 [Venkata Ramana Gollamudi] modified to add modulus support to fractional types float,double,decimal 296d253 [Venkata Ramana Gollamudi] modified to add modulus support to fractional types float,double,decimal	2014-09-23 12:17:47 -07:00
wangfei	ae60f8fb2d	[SPARK-3481][SQL] removes the evil MINOR HACK a follow up of https://github.com/apache/spark/pull/2377 and https://github.com/apache/spark/pull/2352, see detail there. Author: wangfei <wangfei1@huawei.com> Closes #2505 from scwf/patch-6 and squashes the following commits: 4874ec8 [wangfei] removes the evil MINOR HACK	2014-09-23 11:59:44 -07:00
Daoyuan Wang	116016b481	[SPARK-3582][SQL] not limit argument type for hive simple udf Since we have moved to `ConventionHelper`, it is quite easy to avoid call `javaClassToDataType` in hive simple udf. This will solve SPARK-3582. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #2506 from adrian-wang/spark3582 and squashes the following commits: 450c28e [Daoyuan Wang] not limit argument type for hive simple udf	2014-09-23 11:47:53 -07:00
Daoyuan Wang	66bc0f2d67	[SPARK-3598][SQL]cast to timestamp should be the same as hive this patch fixes timestamp smaller than 0 and cast int as timestamp select cast(1000 as timestamp) from src limit 1; should return 1970-01-01 00:00:01, but we now take it as 1000 seconds. also, current implementation has bug when the time is before 1970-01-01 00:00:00. rxin marmbrus chenghao-intel Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #2458 from adrian-wang/timestamp and squashes the following commits: 4274b1d [Daoyuan Wang] set test not related to timezone 1234f66 [Daoyuan Wang] fix timestamp smaller than 0 and cast int as timestamp	2014-09-23 11:45:44 -07:00
Michael Armbrust	293ce85145	[SPARK-3414][SQL] Replace LowerCaseSchema with Resolver This PR introduces a subtle change in semantics for HiveContext when using the results in Python or Scala. Specifically, while resolution remains case insensitive, it is now case preserving. _This PR is a follow up to #2293 (and to a lesser extent #2262 #2334)._ In #2293 the catalog was changed to store analyzed logical plans instead of unresolved ones. While this change fixed the reported bug (which was caused by yet another instance of us forgetting to put in a `LowerCaseSchema` operator) it had the consequence of breaking assumptions made by `MultiInstanceRelation`. Specifically, we can't replace swap out leaf operators in a tree without rewriting changed expression ids (which happens when you self join the same RDD that has been registered as a temp table). In this PR, I instead remove the need to insert `LowerCaseSchema` operators at all, by moving the concern of matching up identifiers completely into analysis. Doing so allows the test cases from both #2293 and #2262 to pass at the same time (and likely fixes a slew of other "unknown unknown" bugs). While it is rolled back in this PR, storing the analyzed plan might actually be a good idea. For instance, it is kind of confusing if you register a temporary table, change the case sensitivity of resolution and now you can't query that table anymore. This can be addressed in a follow up PR. Follow-ups: - Configurable case sensitivity - Consider storing analyzed plans for temp tables Author: Michael Armbrust <michael@databricks.com> Closes #2382 from marmbrus/lowercase and squashes the following commits: c21171e [Michael Armbrust] Ensure the resolver is used for field lookups and ensure that case insensitive resolution is still case preserving. d4320f1 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into lowercase 2de881e [Michael Armbrust] Address comments. 219805a [Michael Armbrust] style 5b93711 [Michael Armbrust] Replace LowerCaseSchema with Resolver.	2014-09-20 16:41:14 -07:00
Daoyuan Wang	ba68a51c40	[SPARK-3485][SQL] Use GenericUDFUtils.ConversionHelper for Simple UDF type conversions This is just another solution to SPARK-3485, in addition to PR #2355 In this patch, we will use ConventionHelper and FunctionRegistry to invoke a simple udf evaluation, which rely more on hive, but much cleaner and safer. We can discuss which one is better. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #2407 from adrian-wang/simpleudf and squashes the following commits: 15762d2 [Daoyuan Wang] add posmod test which would fail the test but now ok 0d69eb4 [Daoyuan Wang] another way to pass to hive simple udf	2014-09-19 15:39:31 -07:00
ravipesala	5522151eb1	[SPARK-2594][SQL] Support CACHE TABLE <name> AS SELECT ... This feature allows user to add cache table from the select query. Example : ```CACHE TABLE testCacheTable AS SELECT * FROM TEST_TABLE``` Spark takes this type of SQL as command and it does lazy caching just like ```SQLContext.cacheTable```, ```CACHE TABLE <name>``` does. It can be executed from both SQLContext and HiveContext. Recreated the pull request after rebasing with master.And fixed all the comments raised in previous pull requests. https://github.com/apache/spark/pull/2381 https://github.com/apache/spark/pull/2390 Author : ravipesala ravindra.pesalahuawei.com Author: ravipesala <ravindra.pesala@huawei.com> Closes #2397 from ravipesala/SPARK-2594 and squashes the following commits: a5f0beb [ravipesala] Simplified the code as per Admin comment. 8059cd2 [ravipesala] Changed the behaviour from eager caching to lazy caching. d6e469d [ravipesala] Code review comments by Admin are handled. c18aa38 [ravipesala] Merge remote-tracking branch 'remotes/ravipesala/Add-Cache-table-as' into SPARK-2594 394d5ca [ravipesala] Changed style fb1759b [ravipesala] Updated as per Admin comments 8c9993c [ravipesala] Changed the style d8b37b2 [ravipesala] Updated as per the comments by Admin bc0bffc [ravipesala] Merge remote-tracking branch 'ravipesala/Add-Cache-table-as' into Add-Cache-table-as e3265d0 [ravipesala] Updated the code as per the comments by Admin in pull request. 724b9db [ravipesala] Changed style aaf5b59 [ravipesala] Added comment dc33895 [ravipesala] Updated parser to support add cache table command b5276b2 [ravipesala] Updated parser to support add cache table command eebc0c1 [ravipesala] Add CACHE TABLE <name> AS SELECT ... 6758f80 [ravipesala] Changed style 7459ce3 [ravipesala] Added comment 13c8e27 [ravipesala] Updated parser to support add cache table command 4e858d8 [ravipesala] Updated parser to support add cache table command b803fc8 [ravipesala] Add CACHE TABLE <name> AS SELECT ...	2014-09-19 15:31:57 -07:00
Cheng Hao	2c3cc7641d	[SPARK-3501] [SQL] Fix the bug of Hive SimpleUDF creates unnecessary type cast When do the query like: ``` select datediff(cast(value as timestamp), cast('2002-03-21 00:00:00' as timestamp)) from src; ``` SparkSQL will raise exception: ``` [info] scala.MatchError: TimestampType (of class org.apache.spark.sql.catalyst.types.TimestampType$) [info] at org.apache.spark.sql.catalyst.expressions.Cast.castToTimestamp(Cast.scala:77) [info] at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:251) [info] at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247) [info] at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263) [info] at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$5$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:217) [info] at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$5$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:210) [info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) [info] at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$2.apply(TreeNode.scala:180) [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ``` Author: Cheng Hao <hao.cheng@intel.com> Closes #2368 from chenghao-intel/cast_exception and squashes the following commits: 5c9c3a5 [Cheng Hao] make more clear code 49dfc50 [Cheng Hao] Add no-op for Cast and revert the position of SimplifyCasts b804abd [Cheng Hao] Add unit test to show the failure in identical data type casting 330a5c8 [Cheng Hao] Update Code based on comments b834ed4 [Cheng Hao] Fix bug of HiveSimpleUDF with unnecessary type cast which cause exception in constant folding	2014-09-19 15:29:22 -07:00
Michael Armbrust	30f288ae34	[SPARK-2890][SQL] Allow reading of data when case insensitive resolution could cause possible ambiguity. Throwing an error in the constructor makes it possible to run queries, even when there is no actual ambiguity. Remove this check in favor of throwing an error in analysis when they query is actually is ambiguous. Also took the opportunity to add test cases that would have caught a subtle bug in my first attempt at fixing this and refactor some other test code. Author: Michael Armbrust <michael@databricks.com> Closes #2209 from marmbrus/sameNameStruct and squashes the following commits: 729cca4 [Michael Armbrust] Better tests. a003aeb [Michael Armbrust] Remove error (it'll be caught in analysis).	2014-09-16 11:42:26 -07:00
Bertrand Bossy	c243b21a8b	SPARK-3039: Allow spark to be built using avro-mapred for hadoop2 SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles. I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this. Author: Bertrand Bossy <bertrandbossy@gmail.com> Closes #1945 from bbossy/SPARK-3039 and squashes the following commits: c32ce59 [Bertrand Bossy] SPARK-3039: Allow spark to be built using avro-mapred for hadoop2	2014-09-14 21:10:17 -07:00
Michael Armbrust	0f8c4edf4e	[SQL] Decrease partitions when testing Author: Michael Armbrust <michael@databricks.com> Closes #2164 from marmbrus/shufflePartitions and squashes the following commits: 0da1e8c [Michael Armbrust] test hax ef2d985 [Michael Armbrust] more test hacks. 2dabae3 [Michael Armbrust] more test fixes 0bdbf21 [Michael Armbrust] Make parquet tests less order dependent b42eeab [Michael Armbrust] increase test parallelism 80453d5 [Michael Armbrust] Decrease partitions when testing	2014-09-13 16:08:04 -07:00
Cheng Lian	74049249ab	[SPARK-3294][SQL] Eliminates boxing costs from in-memory columnar storage This is a major refactoring of the in-memory columnar storage implementation, aims to eliminate boxing costs from critical paths (building/accessing column buffers) as much as possible. The basic idea is to refactor all major interfaces into a row-based form and use them together with `SpecificMutableRow`. The difficult part is how to adapt all compression schemes, esp. `RunLengthEncoding` and `DictionaryEncoding`, to this design. Since in-memory compression is disabled by default for now, and this PR should be strictly better than before no matter in-memory compression is enabled or not, maybe I'll finish that part in another PR. UPDATE This PR also took the chance to optimize `HiveTableScan` by 1. leveraging `SpecificMutableRow` to avoid boxing cost, and 1. building specific `Writable` unwrapper functions a head of time to avoid per row pattern matching and branching costs. TODO - [x] Benchmark - [ ] ~~Eliminate boxing costs in `RunLengthEncoding`~~ (left to future PRs) - [ ] ~~Eliminate boxing costs in `DictionaryEncoding` (seems not easy to do without specializing `DictionaryEncoding` for every supported column type)~~ (left to future PRs) ## Micro benchmark The benchmark uses a 10 million line CSV table consists of bytes, shorts, integers, longs, floats and doubles, measures the time to build the in-memory version of this table, and the time to scan the whole in-memory table. Benchmark code can be found [here](https://gist.github.com/liancheng/fe70a148de82e77bd2c8#file-hivetablescanbenchmark-scala). Script used to generate the input table can be found [here](https://gist.github.com/liancheng/fe70a148de82e77bd2c8#file-tablegen-scala). Speedup: - Hive table scanning + column buffer building: 18.74% The original benchmark uses 1K as in-memory batch size, when increased to 10K, it can be 28.32% faster. - In-memory table scanning: 7.95% Before: \| Building \| Scanning ------- \| -------- \| -------- 1 \| 16472 \| 525 2 \| 16168 \| 530 3 \| 16386 \| 529 4 \| 16184 \| 538 5 \| 16209 \| 521 Average \| 16283.8 \| 528.6 After: \| Building \| Scanning ------- \| -------- \| -------- 1 \| 13124 \| 458 2 \| 13260 \| 529 3 \| 12981 \| 463 4 \| 13214 \| 483 5 \| 13583 \| 500 Average \| 13232.4 \| 486.6 Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2327 from liancheng/prevent-boxing/unboxing and squashes the following commits: 4419fe4 [Cheng Lian] Addressing comments e5d2cf2 [Cheng Lian] Bug fix: should call setNullAt when field value is null to avoid NPE 8b8552b [Cheng Lian] Only checks for partition batch pruning flag once 489f97b [Cheng Lian] Bug fix: TableReader.fillObject uses wrong ordinals 97bbc4e [Cheng Lian] Optimizes hive.TableReader by by providing specific Writable unwrappers a head of time 3dc1f94 [Cheng Lian] Minor changes to eliminate row object creation 5b39cb9 [Cheng Lian] Lowers log level of compression scheme details f2a7890 [Cheng Lian] Use SpecificMutableRow in InMemoryColumnarTableScan to avoid boxing 9cf30b0 [Cheng Lian] Added row based ColumnType.append/extract 456c366 [Cheng Lian] Made compression decoder row based edac3cd [Cheng Lian] Makes ColumnAccessor.extractSingle row based 8216936 [Cheng Lian] Removes boxing cost in IntDelta and LongDelta by providing specialized implementations b70d519 [Cheng Lian] Made some in-memory columnar storage interfaces row-based	2014-09-13 15:08:30 -07:00
Cheng Lian	184cd51c42	[SPARK-3481][SQL] Removes the evil MINOR HACK This is a follow up of #2352. Now we can finally remove the evil "MINOR HACK", which covered up the eldest bug in the history of Spark SQL (see details [here](https://github.com/apache/spark/pull/2352#issuecomment-55440621)). Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2377 from liancheng/remove-evil-minor-hack and squashes the following commits: 0869c78 [Cheng Lian] Removes the evil MINOR HACK	2014-09-13 12:35:40 -07:00
Cheng Lian	6d887db789	[SPARK-3515][SQL] Moves test suite setup code to beforeAll rather than in constructor Please refer to the JIRA ticket for details. NOTE We should check all test suites that do similar initialization-like side effects in their constructors. This PR only fixes `ParquetMetastoreSuite` because it breaks our Jenkins Maven build. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2375 from liancheng/say-no-to-constructor and squashes the following commits: 0ceb75b [Cheng Lian] Moves test suite setup code to beforeAll rather than in constructor	2014-09-12 20:14:09 -07:00
Cheng Hao	8194fc662c	[SPARK-3481] [SQL] Eliminate the error log in local Hive comparison test Logically, we should remove the Hive Table/Database first and then reset the Hive configuration, repoint to the new data warehouse directory etc. Otherwise it raised exceptions like "Database doesn't not exists: default" in the local testing. Author: Cheng Hao <hao.cheng@intel.com> Closes #2352 from chenghao-intel/test_hive and squashes the following commits: 74fd76b [Cheng Hao] eliminate the error log	2014-09-12 11:29:30 -07:00
Cheng Hao	ca83f1e2c4	[SPARK-2917] [SQL] Avoid table creation in logical plan analyzing for CTAS Author: Cheng Hao <hao.cheng@intel.com> Closes #1846 from chenghao-intel/ctas and squashes the following commits: 56a0578 [Cheng Hao] remove the unused imports 9a57abc [Cheng Hao] Avoid table creation in logical plan analyzing	2014-09-11 11:57:01 -07:00
Aaron Staple	c27718f376	[SPARK-2781][SQL] Check resolution of LogicalPlans in Analyzer. LogicalPlan contains a ‘resolved’ attribute indicating that all of its execution requirements have been resolved. This attribute is not checked before query execution. The analyzer contains a step to check that all Expressions are resolved, but this is not equivalent to checking all LogicalPlans. In particular, the Union plan’s implementation of ‘resolved’ verifies that the types of its children’s columns are compatible. Because the analyzer does not check that a Union plan is resolved, it is possible to execute a Union plan that outputs different types in the same column. See SPARK-2781 for an example. This patch adds two checks to the analyzer’s CheckResolution rule. First, each logical plan is checked to see if it is not resolved despite its children being resolved. This allows the ‘problem’ unresolved plan to be included in the TreeNodeException for reporting. Then as a backstop the root plan is checked to see if it is resolved, which recursively checks that the entire plan tree is resolved. Note that the resolved attribute is implemented recursively, and this patch also explicitly checks the resolved attribute on each logical plan in the tree. I assume the query plan trees will not be large enough for this redundant checking to meaningfully impact performance. Because this patch starts validating that LogicalPlans are resolved before execution, I had to fix some cases where unresolved plans were passing through the analyzer as part of the implementation of the hive query system. In particular, HiveContext applies the CreateTables and PreInsertionCasts, and ExtractPythonUdfs rules manually after the analyzer runs. I moved these rules to the analyzer stage (for hive queries only), in the process completing a code TODO indicating the rules should be moved to the analyzer. It’s worth noting that moving the CreateTables rule means introducing an analyzer rule with a significant side effect - in this case the side effect is creating a hive table. The rule will only attempt to create a table once even if its batch is executed multiple times, because it converts the InsertIntoCreatedTable plan it matches against into an InsertIntoTable. Additionally, these hive rules must be added to the Resolution batch rather than as a separate batch because hive rules rules may be needed to resolve non-root nodes, leaving the root to be resolved on a subsequent batch iteration. For example, the hive compatibility test auto_smb_mapjoin_14, and others, make use of a query plan where the root is a Union and its children are each a hive InsertIntoTable. Mixing the custom hive rules with standard analyzer rules initially resulted in an additional failure because of policy differences between spark sql and hive when casting a boolean to a string. Hive casts booleans to strings as “true” / “false” while spark sql casts booleans to strings as “1” / “0” (causing the cast1.q test to fail). This behavior is a result of the BooleanCasts rule in HiveTypeCoercion.scala, and from looking at the implementation of BooleanCasts I think converting to to “1”/“0” is potentially a programming mistake. (If the BooleanCasts rule is disabled, casting produces “true”/“false” instead.) I believe “true” / “false” should be the behavior for spark sql - I changed the behavior so bools are converted to “true”/“false” to be consistent with hive, and none of the existing spark tests failed. Finally, in some initial testing with hive it appears that an implicit type coercion of boolean to string results in a lowercase string, e.g. CONCAT( TRUE, “” ) -> “true” while an explicit cast produces an all caps string, e.g. CAST( TRUE AS STRING ) -> “TRUE”. The change I’ve made just converts to lowercase strings in all cases. I believe it is at least more correct than the existing spark sql implementation where all Cast expressions become “1” / “0”. Author: Aaron Staple <aaron.staple@gmail.com> Closes #1706 from staple/SPARK-2781 and squashes the following commits: 32683c4 [Aaron Staple] Fix compilation failure due to merge. 7c77fda [Aaron Staple] Move ExtractPythonUdfs to Analyzer's extendedRules in HiveContext. d49bfb3 [Aaron Staple] Address review comments. 915b690 [Aaron Staple] Fix merge issue causing compilation failure. 701dcd2 [Aaron Staple] [SPARK-2781][SQL] Check resolution of LogicalPlans in Analyzer.	2014-09-10 21:01:53 -07:00
Michael Armbrust	84e2c8bfe4	[SQL] Add test case with workaround for reading partitioned Avro files In order to read from partitioned Avro files we need to also set the `SERDEPROPERTIES` since `TBLPROPERTIES` are not passed to the initialization. This PR simply adds a test to make sure we don't break this workaround. Author: Michael Armbrust <michael@databricks.com> Closes #2340 from marmbrus/avroPartitioned and squashes the following commits: 6b969d6 [Michael Armbrust] fix style fea2124 [Michael Armbrust] Add test case with workaround for reading partitioned avro files.	2014-09-10 20:57:38 -07:00
Wenchen Fan	e4f4886d71	[SPARK-2096][SQL] Correctly parse dot notations First let me write down the current `projections` grammar of spark sql: expression : orExpression orExpression : andExpression {"or" andExpression} andExpression : comparisonExpression {"and" comparisonExpression} comparisonExpression : termExpression \| termExpression "=" termExpression \| termExpression ">" termExpression \| ... termExpression : productExpression {"+"\|"-" productExpression} productExpression : baseExpression {"*"\|"/"\|"%" baseExpression} baseExpression : expression "[" expression "]" \| ... \| ident \| ... ident : identChar {identChar \| digit} \| delimiters \| ... identChar : letter \| "_" \| "." delimiters : "," \| ";" \| "(" \| ")" \| "[" \| "]" \| ... projection : expression [["AS"] ident] projections : projection { "," projection} For something like `a.b.c[1]`, it will be parsed as: <img src="http://img51.imgspice.com/i/03008/4iltjsnqgmtt_t.jpg" border=0> But for something like `a[1].b`, the current grammar can't parse it correctly. A simple solution is written in `ParquetQuerySuite#NestedSqlParser`, changed grammars are: delimiters : "." \| "," \| ";" \| "(" \| ")" \| "[" \| "]" \| ... identChar : letter \| "_" baseExpression : expression "[" expression "]" \| expression "." ident \| ... \| ident \| ... This works well, but can't cover some corner case like `select t.a.b from table as t`: <img src="http://img51.imgspice.com/i/03008/v2iau3hoxoxg_t.jpg" border=0> `t.a.b` parsed as `GetField(GetField(UnResolved("t"), "a"), "b")` instead of `GetField(UnResolved("t.a"), "b")` using this new grammar. However, we can't resolve `t` as it's not a filed, but the whole table.(if we could do this, then `select t from table as t` is legal, which is unexpected) My solution is: dotExpressionHeader : ident "." ident baseExpression : expression "[" expression "]" \| expression "." ident \| ... \| dotExpressionHeader \| ident \| ... I passed all test cases under sql locally and add a more complex case. "arrayOfStruct.field1 to access all values of field1" is not supported yet. Since this PR has changed a lot of code, I will open another PR for it. I'm not familiar with the latter optimize phase, please correct me if I missed something. Author: Wenchen Fan <cloud0fan@163.com> Author: Michael Armbrust <michael@databricks.com> Closes #2230 from cloud-fan/dot and squashes the following commits: e1a8898 [Wenchen Fan] remove support for arbitrary nested arrays ee8a724 [Wenchen Fan] rollback LogicalPlan, support dot operation on nested array type a58df40 [Michael Armbrust] add regression test for doubly nested data 16bc4c6 [Wenchen Fan] some enhance 95d733f [Wenchen Fan] split long line dc31698 [Wenchen Fan] SPARK-2096 Correctly parse dot notations	2014-09-10 12:56:59 -07:00
Daoyuan Wang	a0283300c4	[SPARK-3362][SQL] Fix resolution for casewhen with nulls. Current implementation will ignore else val type. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #2245 from adrian-wang/casewhenbug and squashes the following commits: 3332f6e [Daoyuan Wang] remove wrong comment 83b536c [Daoyuan Wang] a comment to trigger retest d7315b3 [Daoyuan Wang] code improve eed35fc [Daoyuan Wang] bug in casewhen resolve	2014-09-10 10:45:24 -07:00
William Benton	2b7ab814f9	[SPARK-3329][SQL] Don't depend on Hive SET pair ordering in tests. This fixes some possible spurious test failures in `HiveQuerySuite` by comparing sets of key-value pairs as sets, rather than as lists. Author: William Benton <willb@redhat.com> Author: Aaron Davidson <aaron@databricks.com> Closes #2220 from willb/spark-3329 and squashes the following commits: 3b3e205 [William Benton] Collapse collectResults case match in HiveQuerySuite 6525d8e [William Benton] Handle cases where SET returns Rows of (single) strings cf11b0e [Aaron Davidson] Fix flakey HiveQuerySuite test	2014-09-08 19:29:23 -07:00
Cheng Lian	dc1dbf206e	[SPARK-3414][SQL] Stores analyzed logical plan when registering a temp table Case insensitivity breaks when unresolved relation contains attributes with uppercase letters in their names, because we store unanalyzed logical plan when registering temp tables while the `CaseInsensitivityAttributeReferences` batch runs before the `Resolution` batch. To fix this issue, we need to store analyzed logical plan. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2293 from liancheng/spark-3414 and squashes the following commits: d9fa1d6 [Cheng Lian] Stores analyzed logical plan when registering a temp table	2014-09-08 19:08:05 -07:00
GuoQiang Li	607ae39c22	[SPARK-3397] Bump pom.xml version number of master branch to 1.2.0-SNAPSHOT Author: GuoQiang Li <witgo@qq.com> Closes #2268 from witgo/SPARK-3397 and squashes the following commits: eaf913f [GuoQiang Li] Bump pom.xml version number of master branch to 1.2.0-SNAPSHOT	2014-09-06 15:04:50 -07:00
Cheng Lian	ee575f12f2	[SPARK-2219][SQL] Added support for the "add jar" command Adds logical and physical command classes for the "add jar" command. Note that this PR conflicts with and should be merged after #2215. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2242 from liancheng/add-jar and squashes the following commits: e43a2f1 [Cheng Lian] Updates AddJar according to conventions introduced in #2215 b99107f [Cheng Lian] Added test case for ADD JAR command 095b2c7 [Cheng Lian] Also forward ADD JAR command to Hive 9be031b [Cheng Lian] Trims Jar path string 8195056 [Cheng Lian] Added support for the "add jar" command	2014-09-04 18:47:45 -07:00
Kousuke Saruta	dc1ba9e9fc	[SPARK-3378] [DOCS] Replace the word "SparkSQL" with right word "Spark SQL" Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2251 from sarutak/SPARK-3378 and squashes the following commits: 0bfe234 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3378 bb5938f [Kousuke Saruta] Replaced rest of "SparkSQL" with "Spark SQL" 6df66de [Kousuke Saruta] Replaced "SparkSQL" with "Spark SQL"	2014-09-04 15:06:08 -07:00
Cheng Lian	248067adbe	[SPARK-2961][SQL] Use statistics to prune batches within cached partitions This PR is based on #1883 authored by marmbrus. Key differences: 1. Batch pruning instead of partition pruning When #1883 was authored, batched column buffer building (#1880) hadn't been introduced. This PR combines these two and provide partition batch level pruning, which leads to smaller memory footprints and can generally skip more elements. The cost is that the pruning predicates are evaluated more frequently (partition number multiplies batch number per partition). 1. More filters are supported Filter predicates consist of `=`, `<`, `<=`, `>`, `>=` and their conjunctions and disjunctions are supported. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2188 from liancheng/in-mem-batch-pruning and squashes the following commits: 68cf019 [Cheng Lian] Marked sqlContext as @transient 4254f6c [Cheng Lian] Enables in-memory partition pruning in PartitionBatchPruningSuite 3784105 [Cheng Lian] Overrides InMemoryColumnarTableScan.sqlContext d2a1d66 [Cheng Lian] Disables in-memory partition pruning by default 062c315 [Cheng Lian] HiveCompatibilitySuite code cleanup 16b77bf [Cheng Lian] Fixed pruning predication conjunctions and disjunctions 16195c5 [Cheng Lian] Enabled both disjunction and conjunction 89950d0 [Cheng Lian] Worked around Scala style check 9c167f6 [Cheng Lian] Minor code cleanup 3c4d5c7 [Cheng Lian] Minor code cleanup ea59ee5 [Cheng Lian] Renamed PartitionSkippingSuite to PartitionBatchPruningSuite fc517d0 [Cheng Lian] More test cases 1868c18 [Cheng Lian] Code cleanup, bugfix, and adding tests cb76da4 [Cheng Lian] Added more predicate filters, fixed table scan stats for testing purposes 385474a [Cheng Lian] Merge branch 'inMemStats' into in-mem-batch-pruning	2014-09-03 18:59:26 -07:00
Cheng Lian	f48420fde5	[SPARK-2973][SQL] Lightweight SQL commands without distributed jobs when calling .collect() By overriding `executeCollect()` in physical plan classes of all commands, we can avoid to kick off a distributed job when collecting result of a SQL command, e.g. `sql("SET").collect()`. Previously, `Command.sideEffectResult` returns a `Seq[Any]`, and the `execute()` method in sub-classes of `Command` typically convert that to a `Seq[Row]` then parallelize it to an RDD. Now with this PR, `sideEffectResult` is required to return a `Seq[Row]` directly, so that `executeCollect()` can directly leverage that and be factored to the `Command` parent class. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2215 from liancheng/lightweight-commands and squashes the following commits: 3fbef60 [Cheng Lian] Factored execute() method of physical commands to parent class Command 5a0e16c [Cheng Lian] Passes test suites e0e12e9 [Cheng Lian] Refactored Command.sideEffectResult and Command.executeCollect 995bdd8 [Cheng Lian] Cleaned up DescribeHiveTableCommand 542977c [Cheng Lian] Avoids confusion between logical and physical plan by adding package prefixes 55b2aa5 [Cheng Lian] Avoids distributed jobs when execution SQL commands	2014-09-03 18:57:20 -07:00
qiping.lqp	634d04b87c	[SPARK-3291][SQL]TestcaseName in createQueryTest should not contain ":" ":" is not allowed to appear in a file name of Windows system. If file name contains ":", this file can't be checked out in a Windows system and developers using Windows must be careful to not commit the deletion of such files, Which is very inconvenient. Author: qiping.lqp <qiping.lqp@alibaba-inc.com> Closes #2191 from chouqin/querytest and squashes the following commits: 0e943a1 [qiping.lqp] rename golden file 60a863f [qiping.lqp] TestcaseName in createQueryTest should not contain ":"	2014-08-29 15:37:43 -07:00
Cheng Lian	b1eccfc88a	[SQL] Turns on in-memory columnar compression in HiveCompatibilitySuite `HiveCompatibilitySuite` already turns on in-memory columnar caching, it would be good to also enable compression to improve test coverage. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2190 from liancheng/compression-on and squashes the following commits: 88b536c [Cheng Lian] Code cleanup, narrowed field visibility d13efd2 [Cheng Lian] Turns on in-memory columnar compression in HiveCompatibilitySuite	2014-08-29 15:34:59 -07:00
William Benton	2f1519defa	SPARK-2813: [SQL] Implement SQRT() directly in Spark SQL This PR adds a native implementation for SQL SQRT() and thus avoids delegating this function to Hive. Author: William Benton <willb@redhat.com> Closes #1750 from willb/spark-2813 and squashes the following commits: 22c8a79 [William Benton] Fixed missed newline from rebase d673861 [William Benton] Added string coercions for SQRT and associated test case e125df4 [William Benton] Added ExpressionEvaluationSuite test cases for SQRT 7b84bcd [William Benton] SQL SQRT now properly returns NULL for NULL inputs 8256971 [William Benton] added SQRT test to SqlQuerySuite 504d2e5 [William Benton] Added native SQRT implementation	2014-08-29 15:26:59 -07:00
luogankun	65253502b9	[SPARK-3065][SQL] Add locale setting to fix results do not match for udf_unix_timestamp format "yyyy MMM dd h:mm:ss a" run with not "America/Los_Angeles" TimeZone in HiveCompatibilitySuite When run the udf_unix_timestamp of org.apache.spark.sql.hive.execution.HiveCompatibilitySuite testcase with not "America/Los_Angeles" TimeZone throws error. [https://issues.apache.org/jira/browse/SPARK-3065] add locale setting on beforeAll and afterAll method to fix the bug of HiveCompatibilitySuite testcase Author: luogankun <luogankun@gmail.com> Closes #1968 from luogankun/SPARK-3065 and squashes the following commits: c167832 [luogankun] [SPARK-3065][SQL] Add Locale setting to HiveCompatibilitySuite 0a25e3a [luogankun] [SPARK-3065][SQL] Add Locale setting to HiveCompatibilitySuite	2014-08-27 15:08:22 -07:00
Aaron Davidson	cc275f4b79	[SQL] [SPARK-3236] Reading Parquet tables from Metastore mangles location Currently we do `relation.hiveQlTable.getDataLocation.getPath`, which returns the path-part of the URI (e.g., "s3n://my-bucket/my-path" => "/my-path"). We should do `relation.hiveQlTable.getDataLocation.toString` instead, as a URI's toString returns a faithful representation of the full URI, which can later be passed into a Hadoop Path. Author: Aaron Davidson <aaron@databricks.com> Closes #2150 from aarondav/parquet-location and squashes the following commits: 459f72c [Aaron Davidson] [SQL] [SPARK-3236] Reading Parquet tables from Metastore mangles location	2014-08-27 15:05:47 -07:00
viirya	28d41d6279	[SPARK-3252][SQL] Add missing condition for test According to the text message, both relations should be tested. So add the missing condition. Author: viirya <viirya@gmail.com> Closes #2159 from viirya/fix_test and squashes the following commits: b1c0f52 [viirya] add missing condition.	2014-08-27 14:55:05 -07:00
u0jing	3b5eb7083d	[SPARK-3118][SQL]add "SHOW TBLPROPERTIES tblname;" and "SHOW COLUMNS (FROM\|IN) table_name [(FROM\|IN) db_name]" support JIRA issue: [SPARK-3118] https://issues.apache.org/jira/browse/SPARK-3118 eg: > SHOW TBLPROPERTIES test; SHOW TBLPROPERTIES test; numPartitions 0 numFiles 1 transient_lastDdlTime 1407923642 numRows 0 totalSize 82 rawDataSize 0 eg: > SHOW COLUMNS in test; SHOW COLUMNS in test; OK Time taken: 0.304 seconds id stid bo Author: u0jing <u9jing@gmail.com> Closes #2034 from u0jing/spark-3118 and squashes the following commits: b231d87 [u0jing] add golden answer files 35f4885 [u0jing] add 'show columns' and 'show tblproperties' support	2014-08-27 12:47:14 -07:00
Michael Armbrust	c4787a3690	[SPARK-3194][SQL] Add AttributeSet to fix bugs with invalid comparisons of AttributeReferences It is common to want to describe sets of attributes that are in various parts of a query plan. However, the semantics of putting `AttributeReference` objects into a standard Scala `Set` result in subtle bugs when references differ cosmetically. For example, with case insensitive resolution it is possible to have two references to the same attribute whose names are not equal. In this PR I introduce a new abstraction, an `AttributeSet`, which performs all comparisons using the globally unique `ExpressionId` instead of case class equality. (There is already a related class, [`AttributeMap`](https://github.com/marmbrus/spark/blob/inMemStats/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala#L32)) This new type of set is used to fix a bug in the optimizer where needed attributes were getting projected away underneath join operators. I also took this opportunity to refactor the expression and query plan base classes. In all but one instance the logic for computing the `references` of an `Expression` were the same. Thus, I moved this logic into the base class. For query plans the semantics of the `references` method were ill defined (is it the references output? or is it those used by expression evaluation? or what?). As a result, this method wasn't really used very much. So, I removed it. TODO: - [x] Finish scala doc for `AttributeSet` - [x] Scan the code for other instances of `Set[Attribute]` and refactor them. - [x] Finish removing `references` from `QueryPlan` Author: Michael Armbrust <michael@databricks.com> Closes #2109 from marmbrus/attributeSets and squashes the following commits: 1c0dae5 [Michael Armbrust] work on serialization bug. 9ba868d [Michael Armbrust] Merge remote-tracking branch 'origin/master' into attributeSets 3ae5288 [Michael Armbrust] review comments 40ce7f6 [Michael Armbrust] style d577cc7 [Michael Armbrust] Scaladoc cae5d22 [Michael Armbrust] remove more references implementations d6e16be [Michael Armbrust] Remove more instances of "def references" and normal sets of attributes. fc26b49 [Michael Armbrust] Add AttributeSet class, remove references from Expression.	2014-08-26 16:29:14 -07:00
Daoyuan Wang	52fbdc2ded	[Spark-3222] [SQL] Cross join support in HiveQL We can simple treat cross join as inner join without join conditions. Author: Daoyuan Wang <daoyuan.wang@intel.com> Author: adrian-wang <daoyuanwong@gmail.com> Closes #2124 from adrian-wang/crossjoin and squashes the following commits: 8c9b7c5 [Daoyuan Wang] add a test 7d47bbb [adrian-wang] add cross join support for hql	2014-08-25 22:56:35 -07:00
Cheng Hao	156eb39661	[SPARK-3058] [SQL] Support EXTENDED for EXPLAIN Provide `extended` keyword support for `explain` command in SQL. e.g. ``` explain extended select key as a1, value as a2 from src where key=1; == Parsed Logical Plan == Project ['key AS a1#3,'value AS a2#4] Filter ('key = 1) UnresolvedRelation None, src, None == Analyzed Logical Plan == Project [key#8 AS a1#3,value#9 AS a2#4] Filter (CAST(key#8, DoubleType) = CAST(1, DoubleType)) MetastoreRelation default, src, None == Optimized Logical Plan == Project [key#8 AS a1#3,value#9 AS a2#4] Filter (CAST(key#8, DoubleType) = 1.0) MetastoreRelation default, src, None == Physical Plan == Project [key#8 AS a1#3,value#9 AS a2#4] Filter (CAST(key#8, DoubleType) = 1.0) HiveTableScan [key#8,value#9], (MetastoreRelation default, src, None), None Code Generation: false == RDD == (2) MappedRDD[14] at map at HiveContext.scala:350 MapPartitionsRDD[13] at mapPartitions at basicOperators.scala:42 MapPartitionsRDD[12] at mapPartitions at basicOperators.scala:57 MapPartitionsRDD[11] at mapPartitions at TableReader.scala:112 MappedRDD[10] at map at TableReader.scala:240 HadoopRDD[9] at HadoopRDD at TableReader.scala:230 ``` It's the sub task of #1847. But can go without any dependency. Author: Cheng Hao <hao.cheng@intel.com> Closes #1962 from chenghao-intel/explain_extended and squashes the following commits: 295db74 [Cheng Hao] Fix bug in printing the simple execution plan 48bc989 [Cheng Hao] Support EXTENDED for EXPLAIN	2014-08-25 17:43:56 -07:00
Michael Armbrust	7e191fe29b	[SPARK-2554][SQL] CountDistinct partial aggregation and object allocation improvements Author: Michael Armbrust <michael@databricks.com> Author: Gregory Owen <greowen@gmail.com> Closes #1935 from marmbrus/countDistinctPartial and squashes the following commits: 5c7848d [Michael Armbrust] turn off caching in the constructor 8074a80 [Michael Armbrust] fix tests 32d216f [Michael Armbrust] reynolds comments c122cca [Michael Armbrust] Address comments, add tests b2e8ef3 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial fae38f4 [Michael Armbrust] Fix style fdca896 [Michael Armbrust] cleanup 93d0f64 [Michael Armbrust] metastore concurrency fix. db44a30 [Michael Armbrust] JIT hax. 3868f6c [Michael Armbrust] Merge pull request #9 from GregOwen/countDistinctPartial c9e67de [Gregory Owen] Made SpecificRow and types serializable by Kryo 2b46c4b [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial 8ff6402 [Michael Armbrust] Add specific row. 58d15f1 [Michael Armbrust] disable codegen logging 87d101d [Michael Armbrust] Fix isNullAt bug abee26d [Michael Armbrust] WIP 27984d0 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial 57ae3b1 [Michael Armbrust] Fix order dependent test b3d0f64 [Michael Armbrust] Add golden files. c1f7114 [Michael Armbrust] Improve tests / fix serialization. f31b8ad [Michael Armbrust] more fixes 38c7449 [Michael Armbrust] comments and style 9153652 [Michael Armbrust] better toString d494598 [Michael Armbrust] Fix tests now that the planner is better 41fbd1d [Michael Armbrust] Never try and create an empty hash set. 050bb97 [Michael Armbrust] Skip no-arg constructors for kryo, bd08239 [Michael Armbrust] WIP 213ada8 [Michael Armbrust] First draft of partially aggregated and code generated count distinct / max	2014-08-23 16:19:10 -07:00
Yin Huai	2fb1c72ea2	[SQL] Make functionRegistry in HiveContext transient. Seems we missed `transient` for the `functionRegistry` in `HiveContext`. cc: marmbrus Author: Yin Huai <huaiyin.thu@gmail.com> Closes #2074 from yhuai/makeFunctionRegistryTransient and squashes the following commits: 6534e7d [Yin Huai] Make functionRegistry transient.	2014-08-23 12:46:41 -07:00
Alex Liu	d9e94146a6	[SPARK-2846][SQL] Add configureInputJobPropertiesForStorageHandler to initialization of job conf ...al job conf Author: Alex Liu <alex_liu68@yahoo.com> Closes #1927 from alexliu68/SPARK-SQL-2846 and squashes the following commits: e4bdc4c [Alex Liu] SPARK-SQL-2846 add configureInputJobPropertiesForStorageHandler to initial job conf	2014-08-20 16:14:06 -07:00
Michael Armbrust	3abd0c1cda	[SPARK-2406][SQL] Initial support for using ParquetTableScan to read HiveMetaStore tables. This PR adds an experimental flag `spark.sql.hive.convertMetastoreParquet` that when true causes the planner to detects tables that use Hive's Parquet SerDe and instead plans them using Spark SQL's native `ParquetTableScan`. Author: Michael Armbrust <michael@databricks.com> Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1819 from marmbrus/parquetMetastore and squashes the following commits: 1620079 [Michael Armbrust] Revert "remove hive parquet bundle" cc30430 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into parquetMetastore 4f3d54f [Michael Armbrust] fix style 41ebc5f [Michael Armbrust] remove hive parquet bundle a43e0da [Michael Armbrust] Merge remote-tracking branch 'origin/master' into parquetMetastore 4c4dc19 [Michael Armbrust] Fix bug with tree splicing. ebb267e [Michael Armbrust] include parquet hive to tests pass (Remove this later). c0d9b72 [Michael Armbrust] Avoid creating a HadoopRDD per partition. Add dirty hacks to retrieve partition values from the InputSplit. 8cdc93c [Michael Armbrust] Merge pull request #8 from yhuai/parquetMetastore a0baec7 [Yin Huai] Partitioning columns can be resolved. 1161338 [Michael Armbrust] Add a test to make sure conversion is actually happening 212d5cd [Michael Armbrust] Initial support for using ParquetTableScan to read HiveMetaStore tables.	2014-08-18 13:17:10 -07:00
Patrick Wendell	7ae28d1247	SPARK-3096: Include parquet hive serde by default in build A small change - we should just add this dependency. It doesn't have any recursive deps and it's needed for reading have parquet tables. Author: Patrick Wendell <pwendell@gmail.com> Closes #2009 from pwendell/parquet and squashes the following commits: e411f9f [Patrick Wendell] SPARk-309: Include parquet hive serde by default in build	2014-08-18 10:00:46 -07:00
Michael Armbrust	9256d4a9c8	[SPARK-2994][SQL] Support for udfs that take complex types Author: Michael Armbrust <michael@databricks.com> Closes #1915 from marmbrus/arrayUDF and squashes the following commits: a1c503d [Michael Armbrust] Support for udfs that take complex types	2014-08-13 17:35:38 -07:00
tianyi	13f54e2b97	[SPARK-2817] [SQL] add "show create table" support In spark sql component, the "show create table" syntax had been disabled. We thought it is a useful funciton to describe a hive table. Author: tianyi <tianyi@asiainfo-linkage.com> Author: tianyi <tianyi@asiainfo.com> Author: tianyi <tianyi.asiainfo@gmail.com> Closes #1760 from tianyi/spark-2817 and squashes the following commits: 7d28b15 [tianyi] [SPARK-2817] fix too short prefix problem cbffe8b [tianyi] [SPARK-2817] fix the case problem 565ec14 [tianyi] [SPARK-2817] fix the case problem 60d48a9 [tianyi] [SPARK-2817] use system temporary folder instead of temporary files in the source tree, and also clean some empty line dbe1031 [tianyi] [SPARK-2817] move some code out of function rewritePaths, as it may be called multiple times 9b2ba11 [tianyi] [SPARK-2817] fix the line length problem 9f97586 [tianyi] [SPARK-2817] remove test.tmp.dir from pom.xml bfc2999 [tianyi] [SPARK-2817] add "File.separator" support, create a "testTmpDir" outside the rewritePaths bde800a [tianyi] [SPARK-2817] add "${system:test.tmp.dir}" support add "last_modified_by" to nonDeterministicLineIndicators in HiveComparisonTest bb82726 [tianyi] [SPARK-2817] remove test which requires a system from the whitelist. bbf6b42 [tianyi] [SPARK-2817] add a systemProperties named "test.tmp.dir" to pass the test which contains "${system:test.tmp.dir}" a337bd6 [tianyi] [SPARK-2817] add "show create table" support a03db77 [tianyi] [SPARK-2817] add "show create table" support	2014-08-13 16:50:02 -07:00

1 2 3 4

197 commits