ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
navis.ryu	5c05b5c0d2	[SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first `case cs @ CombineSum(expr) => val calcType = expr.dataType; expr.dataType match { case DecimalType.Fixed(_, _) => DecimalType.Unlimited; case _ => expr.dataType }` calcType is always expr.dataType. credits are all belong to IntelliJ Author: navis.ryu <navis@apache.org> Closes #6736 from navis/SPARK-8285 and squashes the following commits: 20382c1 [navis.ryu] [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first (cherry picked from commit `6a47114bc2`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-10 18:19:24 -07:00
Cheng Lian	69197c3e38	[SPARK-8121] [SQL] Fixes InsertIntoHadoopFsRelation job initialization for Hadoop 1.x (branch 1.4 backport based on https://github.com/apache/spark/pull/6669 )	2015-06-08 11:36:42 -07:00
Reynold Xin	b9c046f6d7	[SPARK-8004][SQL] Quote identifier in JDBC data source. This is a follow-up patch to #6577 to replace columnEnclosing to quoteIdentifier. I also did some minor cleanup to the JdbcDialect file. Author: Reynold Xin <rxin@databricks.com> Closes #6689 from rxin/jdbc-quote and squashes the following commits: bad365f [Reynold Xin] Fixed test compilation... e39e14e [Reynold Xin] Fixed compilation. db9a8e0 [Reynold Xin] [SPARK-8004][SQL] Quote identifier in JDBC data source. (cherry picked from commit `d6d601a07b`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-07 10:52:18 -07:00
Liang-Chi Hsieh	b4d54417e5	[SPARK-8141] [SQL] Precompute datatypes for partition columns and reuse it JIRA: https://issues.apache.org/jira/browse/SPARK-8141 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6687 from viirya/reuse_partition_column_types and squashes the following commits: dab0688 [Liang-Chi Hsieh] Reuse partitionColumnTypes. (cherry picked from commit `26d07f1ece`) Signed-off-by: Cheng Lian <lian@databricks.com>	2015-06-07 15:35:43 +08:00
Liang-Chi Hsieh	b6fdc6cf11	[SPARK-8004][SQL] Enclose column names by JDBC Dialect JIRA: https://issues.apache.org/jira/browse/SPARK-8004 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6577 from viirya/enclose_jdbc_columns and squashes the following commits: 614606a [Liang-Chi Hsieh] For comment. bc50182 [Liang-Chi Hsieh] Enclose column names by JDBC Dialect. (cherry picked from commit `901a552c5e`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-06 23:00:18 -07:00
Cheng Lian	d8a53fb806	[SPARK-8079] [SQL] Makes InsertIntoHadoopFsRelation job/task abortion more robust As described in SPARK-8079, when writing a DataFrame to a `HadoopFsRelation`, if `HadoopFsRelation.prepareForWriteJob` throws exception, an unexpected NPE will be thrown during job abortion. (This issue doesn't bring much damage since the job is failing anyway.) This PR makes the job/task abortion logic in `InsertIntoHadoopFsRelation` more robust to avoid such confusing exceptions. Author: Cheng Lian <lian@databricks.com> Closes #6612 from liancheng/spark-8079 and squashes the following commits: 87cd81e [Cheng Lian] Addresses @rxin's comment 1864c75 [Cheng Lian] Addresses review comments 9e6dbb3 [Cheng Lian] Makes InsertIntoHadoopFsRelation job/task abortion more robust (cherry picked from commit `16fc49617e`) Signed-off-by: Cheng Lian <lian@databricks.com>	2015-06-06 17:23:46 +08:00
Shivaram Venkataraman	3e3151e755	[SPARK-8085] [SPARKR] Support user-specified schema in read.df cc davies sun-rui Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6620 from shivaram/sparkr-read-schema and squashes the following commits: 16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df (cherry picked from commit `12f5eaeee1`) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>	2015-06-05 10:19:15 -07:00
Mike Dusenberry	81ff7a9012	[SPARK-7969] [SQL] Added a DataFrame.drop function that accepts a Column reference. Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests. Basically iterates through the `DataFrame` to find a column with an expression that is equivalent to that of the `Column` argument supplied to the function. Author: Mike Dusenberry <dusenberrymw@gmail.com> Closes #6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits: 514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0. 2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over. 6bf7c0e [Mike Dusenberry] Minor code formatting change. e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name. 5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names. 4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference. 986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests. Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function. (cherry picked from commit `df7da07a86`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-04 11:30:25 -07:00
Andrew Or	bfe74b34a6	[SPARK-7558] Demarcate tests in unit-tests.log (1.4) This includes the following commits: original: `9eb222c` hotfix1: `8c99793` hotfix2: `a4f2412` scalastyle check: `609c492` --- Original patch #6441 Branch-1.3 patch #6602 Author: Andrew Or <andrew@databricks.com> Closes #6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits: 4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4 e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike 46d4361 [Andrew Or] Various whitespace changes (minor) 3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite eaa520e [Andrew Or] Fix tests? b4d93de [Andrew Or] Fix tests 634a777 [Andrew Or] Fix log message a932e8d [Andrew Or] Fix manual things that cannot be covered through automation 8bc355d [Andrew Or] Add core tests as dependencies in all modules 75d361f [Andrew Or] Introduce base abstract class for all test suites	2015-06-03 20:46:44 -07:00
Reynold Xin	1f90a06bda	[SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures. Author: Reynold Xin <rxin@databricks.com> Closes #6608 from rxin/parquet-analysis and squashes the following commits: b5dc8e2 [Reynold Xin] Code review feedback. 5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures. (cherry picked from commit `939e4f3d8d`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-03 13:58:15 -07:00
animesh	0a1dad6cd4	[SPARK-7980] [SQL] Support SQLContext.range(end) 1. range() overloaded in SQLContext.scala 2. range() modified in python sql context.py 3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py Author: animesh <animesh@apache.spark> Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits: 935899c [animesh] SPARK-7980:python+scala changes (cherry picked from commit `d053a31be9`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-03 11:28:38 -07:00
Patrick Wendell	ab713af564	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 18:06:41 -07:00
Patrick Wendell	22596c534a	Preparing Spark release v1.4.0-rc4	2015-06-02 18:06:35 -07:00
Patrick Wendell	e3c35b217c	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 17:01:15 -07:00
Patrick Wendell	a14fad11ef	Preparing Spark release v1.4.0-rc4	2015-06-02 17:01:10 -07:00
Patrick Wendell	92ccc5ba39	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 14:02:19 -07:00
Patrick Wendell	d630f4d697	Preparing Spark release v1.4.0-rc4	2015-06-02 14:02:14 -07:00
Cheng Lian	cbaf595447	[SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append The current code references the schema of the DataFrame to be written before checking save mode. This triggers expensive metadata discovery prematurely. For save mode other than `Append`, this metadata discovery is useless since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later. This PR fixes this issue by deferring metadata discovery after save mode checking. Author: Cheng Lian <lian@databricks.com> Closes #6583 from liancheng/spark-8014 and squashes the following commits: 1aafabd [Cheng Lian] Updates comments 088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined 8fbd93f [Cheng Lian] Fixes SPARK-8014 (cherry picked from commit `686a45f0b9`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-06-02 13:32:34 -07:00
Cheng Lian	f71a09de6e	[SPARK-8037] [SQL] Ignores files whose name starts with dot in HadoopFsRelation Author: Cheng Lian <lian@databricks.com> Closes #6581 from liancheng/spark-8037 and squashes the following commits: d08e97b [Cheng Lian] Ignores files whose name starts with dot in HadoopFsRelation (cherry picked from commit `1bb5d716c0`) Signed-off-by: Cheng Lian <lian@databricks.com>	2015-06-03 01:09:19 +08:00
Patrick Wendell	92a677891c	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 08:41:15 -07:00
Patrick Wendell	48c506724a	Preparing Spark release v1.4.0-rc4	2015-06-02 08:41:10 -07:00
Yin Huai	87941ff8c4	[SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid collapsing nondeterministic projects. This closes #6570. Author: Yin Huai <yhuai@databricks.com> Author: Reynold Xin <rxin@databricks.com> Closes #6573 from rxin/deterministic and squashes the following commits: 356cd22 [Reynold Xin] Added unit test for the optimizer. da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023 da56200 [Yin Huai] Comments. e38f264 [Yin Huai] Comment. f9d6a73 [Yin Huai] Add a deterministic method to Expression. (cherry picked from commit `0f80990bfa`) Signed-off-by: Reynold Xin <rxin@databricks.com> Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/random.scala	2015-06-02 00:21:27 -07:00
Yin Huai	4940630f56	[SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early https://issues.apache.org/jira/browse/SPARK-8020 Author: Yin Huai <yhuai@databricks.com> Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits: 0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive. (cherry picked from commit `7b7f7b6c6f`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-06-02 00:17:09 -07:00
Davies Liu	9d6475b93d	[SPARK-6917] [SQL] DecimalType is not read back when non-native type exists cc yhuai Author: Davies Liu <davies@databricks.com> Closes #6558 from davies/decimalType and squashes the following commits: c877ca8 [Davies Liu] Update ParquetConverter.scala 48cc57c [Davies Liu] Update ParquetConverter.scala b43845c [Davies Liu] add test 3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists (cherry picked from commit `bcb47ad771`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-01 23:12:37 -07:00
Reynold Xin	3af4c0b4e8	[minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API Author: Reynold Xin <rxin@databricks.com> Closes #6569 from rxin/freqItemsWarning and squashes the following commits: 7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API. (cherry picked from commit `4c868b9943`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-01 21:29:46 -07:00
Reynold Xin	8ac23762ec	[SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API Author: Reynold Xin <rxin@databricks.com> Closes #6565 from rxin/alias and squashes the following commits: 286d880 [Reynold Xin] [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API (cherry picked from commit `89f642a0e8`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-01 21:13:21 -07:00
Reynold Xin	efc0e05323	[SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear Author: Reynold Xin <rxin@databricks.com> Closes #6566 from rxin/crosstab and squashes the following commits: e0ace1c [Reynold Xin] [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear (cherry picked from commit `6396cc0303`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-01 21:11:26 -07:00
Reynold Xin	bab0fab68f	[SPARK-3850] Turn style checker on for trailing whitespaces. Author: Reynold Xin <rxin@databricks.com> Closes #6541 from rxin/trailing-whitespace-on and squashes the following commits: f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing whitespaces. (cherry picked from commit `866652c903`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-31 14:23:48 -07:00
Reynold Xin	a1904fa79e	[SPARK-3850] Trim trailing spaces for SQL. Author: Reynold Xin <rxin@databricks.com> Closes #6535 from rxin/whitespace-sql and squashes the following commits: de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL. (cherry picked from commit `63a50be13d`) Signed-off-by: Reynold Xin <rxin@databricks.com> Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala	2015-05-31 00:52:02 -07:00
Reynold Xin	2016927f70	[SPARK-7975] Add style checker to disallow overriding equals covariantly. Author: Reynold Xin <rxin@databricks.com> This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com> Closes #6527 from rxin/covariant-equals and squashes the following commits: e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker (cherry picked from commit `7896e99b2a`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-31 00:06:02 -07:00
Cheng Lian	0d093d6e78	[SQL] [MINOR] Adds @deprecated Scaladoc entry for SchemaRDD Author: Cheng Lian <lian@databricks.com> Closes #6529 from liancheng/schemardd-deprecation-fix and squashes the following commits: 49765c2 [Cheng Lian] Adds @deprecated Scaladoc entry for SchemaRDD (cherry picked from commit `8764dccebd`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-30 23:49:47 -07:00
Reynold Xin	e74ea78276	[SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods Scala deprecated annotation actually doesn't show up in JavaDoc. Author: Reynold Xin <rxin@databricks.com> Closes #6523 from rxin/df-deprecated-javadoc and squashes the following commits: 26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods. (cherry picked from commit `c63e1a742b`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-30 19:51:58 -07:00
Reynold Xin	dc58e688ab	[SQL] Tighten up visibility for JavaDoc. I went through all the JavaDocs and tightened up visibility. Author: Reynold Xin <rxin@databricks.com> Closes #6526 from rxin/sql-1.4-visibility-for-docs and squashes the following commits: bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc. (cherry picked from commit `14b314dc2c`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-30 19:51:17 -07:00
Reynold Xin	f40605f064	[SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker. … Author: Reynold Xin <rxin@databricks.com> Closes #6491 from rxin/more-whitespace and squashes the following commits: f6e63dc [Reynold Xin] [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker. (cherry picked from commit `94f62a4979`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-29 13:39:02 -07:00
Patrick Wendell	e549874c33	Preparing development version 1.4.0-SNAPSHOT	2015-05-29 13:07:07 -07:00
Patrick Wendell	dd109a8746	Preparing Spark release v1.4.0-rc3	2015-05-29 13:06:59 -07:00
Patrick Wendell	c68abaa34e	Preparing development version 1.4.0-SNAPSHOT	2015-05-29 12:15:18 -07:00
Patrick Wendell	fb60503ff2	Preparing Spark release v1.4.0-rc3	2015-05-29 12:15:13 -07:00
Patrick Wendell	6bf5a42084	Preparing development version 1.4.0-SNAPSHOT	2015-05-28 23:40:27 -07:00
Patrick Wendell	f2796816be	Preparing Spark release v1.4.0-rc3	2015-05-28 23:40:22 -07:00
Patrick Wendell	119c93af9c	Preparing development version 1.4.0-SNAPSHOT	2015-05-28 22:57:31 -07:00
Patrick Wendell	2d97d7a0aa	Preparing Spark release v1.4.0-rc3	2015-05-28 22:57:26 -07:00
Reynold Xin	9b97e95e86	[SPARK-7927] whitespace fixes for SQL core. So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6477 from rxin/whitespace-sql-core and squashes the following commits: ce6e369 [Reynold Xin] Fixed tests. 6095fed [Reynold Xin] [SPARK-7927] whitespace fixes for SQL core. (cherry picked from commit `ff44c711ab`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 20:10:28 -07:00
Patrick Wendell	7c342bdd93	Preparing development version 1.4.0-SNAPSHOT	2015-05-27 22:36:30 -07:00
Patrick Wendell	4983dfc878	Preparing Spark release v1.4.0-rc3	2015-05-27 22:36:23 -07:00
Liang-Chi Hsieh	b4ecbce65c	[SPARK-7897][SQL] Use DecimalType to represent unsigned bigint in JDBCRDD JIRA: https://issues.apache.org/jira/browse/SPARK-7897 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6438 from viirya/jdbc_unsigned_bigint and squashes the following commits: ccb3c3f [Liang-Chi Hsieh] Use DecimalType to represent unsigned bigint. (cherry picked from commit `a1e092eae5`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-27 18:51:42 -07:00
Cheng Lian	89fe93fc3b	[SPARK-7684] [SQL] Refactoring MetastoreDataSourcesSuite to workaround SPARK-7684 As stated in SPARK-7684, currently `TestHive.reset` has some execution order specific bug, which makes running specific test suites locally pretty frustrating. This PR refactors `MetastoreDataSourcesSuite` (which relies on `TestHive.reset` heavily) using various `withXxx` utility methods in `SQLTestUtils` to ask each test case to cleanup their own mess so that we can avoid calling `TestHive.reset`. Author: Cheng Lian <lian@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #6353 from liancheng/workaround-spark-7684 and squashes the following commits: 26939aa [Yin Huai] Move the initialization of jsonFilePath to beforeAll. a423d48 [Cheng Lian] Fixes Scala style issue dfe45d0 [Cheng Lian] Refactors MetastoreDataSourcesSuite to workaround SPARK-7684 92a116d [Cheng Lian] Fixes minor styling issues (cherry picked from commit `b97ddff000`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-05-27 13:09:42 -07:00
Reynold Xin	0468d57a6f	Removed Guava dependency from JavaTypeInference's type signature. This should also close #6243. Author: Reynold Xin <rxin@databricks.com> Closes #6431 from rxin/JavaTypeInference-guava and squashes the following commits: e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature. (cherry picked from commit `6fec1a9409`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-27 11:54:42 -07:00
Cheng Lian	a25ce91f96	[SPARK-7847] [SQL] Fixes dynamic partition directory escaping Please refer to [SPARK-7847] [1] for details. [1]: https://issues.apache.org/jira/browse/SPARK-7847 Author: Cheng Lian <lian@databricks.com> Closes #6389 from liancheng/spark-7847 and squashes the following commits: 935c652 [Cheng Lian] Adds test case for writing various data types as dynamic partition value f4fc398 [Cheng Lian] Converts partition columns to Scala type when writing dynamic partitions d0aeca0 [Cheng Lian] Fixes dynamic partition directory escaping (cherry picked from commit `15459db4f6`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-05-27 10:09:20 -07:00
Liang-Chi Hsieh	01c3ef536d	[SPARK-7697][SQL] Use LongType for unsigned int in JDBCRDD JIRA: https://issues.apache.org/jira/browse/SPARK-7697 The reported problem case is mysql. But for h2 db, there is no unsigned int. So it is not able to add corresponding test. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6229 from viirya/unsignedint_as_long and squashes the following commits: dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long 608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD. (cherry picked from commit `4f98d7a7f1`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-27 00:27:44 -07:00

847 commits
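The SPARK-8285 entry quotes a Scala fragment whose bug is easy to miss once it is flattened onto one line. The value of the `match` expression is never assigned, so `calcType` is always `expr.dataType`.

The sketch below reproduces that pattern with a small, hypothetical type hierarchy. `FixedDecimal`, `UnlimitedDecimal` and `LongType` stand in for Spark's `DecimalType.Fixed`, `DecimalType.Unlimited` and `LongType`; they are not Spark classes. It contrasts the buggy form with a version that binds the match result, which is presumably what the fix intended.

```scala
// Hypothetical stand-ins for Spark data types; these are NOT Spark classes.
sealed trait DataType
case object LongType extends DataType
case class FixedDecimal(precision: Int, scale: Int) extends DataType
case object UnlimitedDecimal extends DataType

object CombineSumTypeDemo {
  // Buggy pattern: the match is evaluated in statement position and its
  // value is discarded, so the input type is always returned unchanged.
  def calcTypeBuggy(dataType: DataType): DataType = {
    val calcType = dataType
    dataType match {
      case FixedDecimal(_, _) => UnlimitedDecimal
      case _                  => dataType
    }
    calcType
  }

  // Fixed pattern: the result of the match is what gets bound to calcType.
  def calcTypeFixed(dataType: DataType): DataType = {
    val calcType = dataType match {
      case FixedDecimal(_, _) => UnlimitedDecimal
      case _                  => dataType
    }
    calcType
  }

  def main(args: Array[String]): Unit = {
    val t = FixedDecimal(10, 2)
    println(calcTypeBuggy(t)) // FixedDecimal(10,2)
    println(calcTypeFixed(t)) // UnlimitedDecimal
  }
}
```

The Scala compiler accepts the buggy form, typically with at most a warning about a discarded value. That is why a static-analysis hint is what surfaced it.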
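The two SPARK-8004 entries move column-name quoting into the JDBC dialect, first as column enclosing and then as `quoteIdentifier`. The following is a self-contained sketch of that idea, not Spark's `JdbcDialect` code.

- **Hypothetical names:** the trait `SqlDialect`, the objects `AnsiDialect` and `MySqlStyleDialect`, and the helper `selectSql` are all invented for this sketch.
- **Assumed quoting:** ANSI-style double quotes are the default, and MySQL-style backticks are used for `jdbc:mysql` URLs.
- **Extra escaping:** doubling an embedded quote character is an addition of this sketch.

```scala
// Hypothetical dialect abstraction; not Spark's JdbcDialect.
trait SqlDialect {
  def canHandle(url: String): Boolean
  // ANSI SQL default: wrap in double quotes, doubling any embedded double quote.
  def quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""
}

object AnsiDialect extends SqlDialect {
  def canHandle(url: String): Boolean = true // fallback for any URL
}

object MySqlStyleDialect extends SqlDialect {
  def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
  // MySQL quotes identifiers with backticks; an embedded backtick is doubled.
  override def quoteIdentifier(name: String): String =
    "`" + name.replace("`", "``") + "`"
}

object QuoteDemo {
  // More specific dialects first; AnsiDialect always matches, so .get is safe.
  val dialects: Seq[SqlDialect] = Seq(MySqlStyleDialect, AnsiDialect)
  def dialectFor(url: String): SqlDialect = dialects.find(_.canHandle(url)).get

  // Builds a SELECT with every column name quoted for the target database.
  def selectSql(url: String, table: String, columns: Seq[String]): String = {
    val d = dialectFor(url)
    columns.map(d.quoteIdentifier).mkString("SELECT ", ", ", s" FROM $table")
  }

  def main(args: Array[String]): Unit = {
    println(selectSql("jdbc:mysql://host/db", "t", Seq("order", "my col")))
    // SELECT `order`, `my col` FROM t
    println(selectSql("jdbc:postgresql://host/db", "t", Seq("order", "my col")))
    // SELECT "order", "my col" FROM t
  }
}
```

Quoting matters for column names that are reserved words (such as `order`) or that contain spaces. Either case would otherwise produce invalid SQL.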
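SPARK-7697 and SPARK-7897 widen unsigned JDBC integer types because their ranges do not fit the matching signed Catalyst types:

- An unsigned INT reaches 2^32 - 1 = 4294967295, above `Int.MaxValue`.
- An unsigned BIGINT reaches 2^64 - 1 = 18446744073709551615, above `Long.MaxValue`.

The sketch below checks that arithmetic and encodes the mapping as plain strings. Two parts are assumptions of this sketch:

- `catalystTypeName` is a hypothetical helper.
- The precision in `DecimalType(20, 0)` is the smallest that holds the 20-digit maximum. The commit title says only `DecimalType`.

```scala
object UnsignedRanges {
  val unsignedIntMax: Long = (1L << 32) - 1              // 4294967295
  val unsignedBigIntMax: BigInt = (BigInt(1) << 64) - 1  // 18446744073709551615

  // Hypothetical mapping from (JDBC type name, signedness) to a Catalyst
  // type name, mirroring the two commits; precision 20 is an assumption.
  def catalystTypeName(sqlType: String, signed: Boolean): String = (sqlType, signed) match {
    case ("INTEGER", true)  => "IntegerType"
    case ("INTEGER", false) => "LongType"
    case ("BIGINT", true)   => "LongType"
    case ("BIGINT", false)  => "DecimalType(20, 0)"
    case (other, _)         => sys.error(s"unsupported type in this sketch: $other")
  }

  def main(args: Array[String]): Unit = {
    println(unsignedIntMax > Int.MaxValue)             // true: overflows Int
    println(unsignedBigIntMax > BigInt(Long.MaxValue)) // true: overflows Long
    println(unsignedBigIntMax.toString.length)         // 20
  }
}
```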
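SPARK-7982 changes `DataFrame.stat.crosstab` to report 0 rather than null for value pairs that never occur. The plain-Scala sketch below illustrates that behavior without any Spark dependency. `CrosstabDemo.crosstab` is hypothetical and returns:

- the sorted column keys, and
- one count row per distinct first value.

Missing combinations are filled with `getOrElse(..., 0L)` instead of being left empty.

```scala
object CrosstabDemo {
  // Builds a contingency table from (col1, col2) pairs. Combinations that
  // never occur get a count of 0 rather than a missing (null) entry.
  def crosstab(pairs: Seq[(String, String)]): (Seq[String], Seq[(String, Seq[Long])]) = {
    val counts: Map[(String, String), Long] =
      pairs.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
    val rowKeys = pairs.map(_._1).distinct.sorted
    val colKeys = pairs.map(_._2).distinct.sorted
    val rows = rowKeys.map { r =>
      r -> colKeys.map(c => counts.getOrElse((r, c), 0L))
    }
    (colKeys, rows)
  }

  def main(args: Array[String]): Unit = {
    val (cols, rows) = crosstab(Seq("a" -> "x", "a" -> "x", "b" -> "y"))
    println(cols) // List(x, y)
    rows.foreach { case (r, cs) => println(s"$r ${cs.mkString(" ")}") }
    // a 2 0
    // b 0 1
  }
}
```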
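SPARK-8023 adds a `deterministic` attribute to `Expression` so the optimizer does not collapse projections over nondeterministic expressions. The reason is that collapsing inlines the inner expression at every reference. If the inner projection computes `r = rand()` and the outer one computes `r + r`, the merged projection would evaluate `rand()` twice and yield two different values.

The toy expression tree and `collapse` rule below are hypothetical and far simpler than Catalyst's. They only show the guard.

```scala
object CollapseDemo {
  // Toy expression tree with a deterministic flag (hypothetical, not Catalyst).
  sealed trait Expr { def deterministic: Boolean }
  case class Lit(v: Double) extends Expr { def deterministic = true }
  case object Rand extends Expr { def deterministic = false }
  case class Ref(name: String) extends Expr { def deterministic = true }
  case class Add(l: Expr, r: Expr) extends Expr {
    def deterministic: Boolean = l.deterministic && r.deterministic
  }

  // A projection maps output names to expressions over its input.
  type Project = Seq[(String, Expr)]

  // Replaces references with the inner projection's expressions.
  def substitute(e: Expr, env: Map[String, Expr]): Expr = e match {
    case Ref(n)    => env.getOrElse(n, e)
    case Add(l, r) => Add(substitute(l, env), substitute(r, env))
    case other     => other
  }

  // Merges outer(inner(x)) into one projection, unless the inner projection
  // contains a nondeterministic expression that inlining could duplicate.
  def collapse(outer: Project, inner: Project): Option[Project] =
    if (inner.forall(_._2.deterministic)) {
      val env = inner.toMap
      Some(outer.map { case (n, e) => n -> substitute(e, env) })
    } else None

  def main(args: Array[String]): Unit = {
    val outer: Project = Seq("y" -> Add(Ref("r"), Ref("r")))
    println(collapse(outer, Seq("r" -> Lit(1.0)))) // Some(List((y,Add(Lit(1.0),Lit(1.0)))))
    println(collapse(outer, Seq("r" -> Rand)))     // None
  }
}
```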
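SPARK-8014 defers expensive metadata discovery until after the save mode has been checked, because only `Append` needs the existing schema. The other modes either ignore the result (`Ignore`, `ErrorIfExists`) or delete the existing files (`Overwrite`).

Below is a minimal sketch of that ordering. `discoverExistingSchema` is a hypothetical stand-in, and the `SaveMode` objects here are local to the sketch rather than Spark's enum. A counter shows that discovery never runs for `ErrorIfExists`, `Ignore` or `Overwrite`.

```scala
object SaveModeDemo {
  // Local stand-ins for the save modes; not Spark's SaveMode enum.
  sealed trait SaveMode
  case object Append extends SaveMode
  case object Overwrite extends SaveMode
  case object ErrorIfExists extends SaveMode
  case object Ignore extends SaveMode

  // Counts how many times the (expensive) discovery step actually ran.
  var discoveryCount = 0

  // Hypothetical stand-in for schema merging / partition discovery.
  def discoverExistingSchema(): String = {
    discoveryCount += 1
    "existing-schema"
  }

  // The save mode is checked first; discovery happens only in the Append branch.
  def write(mode: SaveMode, pathExists: Boolean): String = (mode, pathExists) match {
    case (_, false)            => "write new data"
    case (ErrorIfExists, true) => "fail: path already exists"
    case (Ignore, true)        => "skip write"
    case (Overwrite, true)     => "delete existing files, then write"
    case (Append, true)        => s"append using ${discoverExistingSchema()}"
  }

  def main(args: Array[String]): Unit = {
    Seq(ErrorIfExists, Ignore, Overwrite).foreach(m => println(write(m, pathExists = true)))
    println(discoveryCount) // 0
    println(write(Append, pathExists = true))
    println(discoveryCount) // 1
  }
}
```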