ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Dongjoon Hyun	2639c3ed03	[SPARK-19910][SQL] `stack` should not reject NULL values due to type mismatch ## What changes were proposed in this pull request? Since the `stack` function generates a table with nullable columns, it should allow mixed null values. ```scala scala> sql("select stack(3, 1, 2, 3)").printSchema root |-- col0: integer (nullable = true) scala> sql("select stack(3, 1, 2, null)").printSchema org.apache.spark.sql.AnalysisException: cannot resolve 'stack(3, 1, 2, NULL)' due to data type mismatch: Argument 1 (IntegerType) != Argument 3 (NullType); line 1 pos 7; ``` ## How was this patch tested? Passes Jenkins with a new test case. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #17251 from dongjoon-hyun/SPARK-19910.	2017-06-12 21:18:43 -07:00
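The type check described in SPARK-19910 can be illustrated with a minimal Java sketch. It is not Spark's implementation: `StackTypeCheck`, `DataType`, and `resolveColumnType` are hypothetical names, and the model assumes only that a NULL argument should never conflict with the other values in the same output column.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a per-column type check; not Spark's code.
final class StackTypeCheck {
    enum DataType { INTEGER, STRING, NULL }

    /** Resolves the type of one output column, treating NULL as compatible with any type. */
    static DataType resolveColumnType(List<DataType> argTypes) {
        DataType resolved = DataType.NULL;
        for (DataType t : argTypes) {
            if (t == DataType.NULL) {
                continue; // a null literal never causes a mismatch
            }
            if (resolved == DataType.NULL) {
                resolved = t; // first concrete type decides the column type
            } else if (resolved != t) {
                throw new IllegalArgumentException("data type mismatch: " + resolved + " != " + t);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Models stack(3, 1, 2, null): three rows, one column holding 1, 2, null.
        System.out.println(resolveColumnType(
            Arrays.asList(DataType.INTEGER, DataType.INTEGER, DataType.NULL)));
    }
}
```

Two different concrete types in the same column still raise an error in this sketch; only NULL is exempt.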
Reynold Xin	b1436c7496	[SPARK-21059][SQL] LikeSimplification can NPE on null pattern ## What changes were proposed in this pull request? This patch fixes a bug that can cause NullPointerException in LikeSimplification, when the pattern for like is null. ## How was this patch tested? Added a new unit test case in LikeSimplificationSuite. Author: Reynold Xin <rxin@databricks.com> Closes #18273 from rxin/SPARK-21059.	2017-06-12 14:07:51 -07:00
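A null guard of the kind SPARK-21059 describes might look like the following Java sketch. It is a toy model, not the Catalyst rule: `LikeSimplifier` and its string output are hypothetical, escape characters are ignored, and the only point illustrated is that the pattern is checked for null before any matching is attempted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy version of a LIKE-simplification rule; not Spark's code.
final class LikeSimplifier {
    // Simplified pattern shapes; escape characters are deliberately not handled here.
    private static final Pattern STARTS_WITH = Pattern.compile("([^_%]+)%");
    private static final Pattern ENDS_WITH = Pattern.compile("%([^_%]+)");
    private static final Pattern CONTAINS = Pattern.compile("%([^_%]+)%");
    private static final Pattern EQUAL_TO = Pattern.compile("([^_%]*)");

    /** Rewrites simple LIKE patterns into cheaper string predicates. */
    static String simplify(String column, String pattern) {
        if (pattern == null) {
            // Guard first: leave the expression unchanged instead of matching on null.
            return column + " LIKE NULL";
        }
        Matcher m = STARTS_WITH.matcher(pattern);
        if (m.matches()) return "startsWith(" + column + ", '" + m.group(1) + "')";
        m = ENDS_WITH.matcher(pattern);
        if (m.matches()) return "endsWith(" + column + ", '" + m.group(1) + "')";
        m = CONTAINS.matcher(pattern);
        if (m.matches()) return "contains(" + column + ", '" + m.group(1) + "')";
        m = EQUAL_TO.matcher(pattern);
        if (m.matches()) return column + " = '" + m.group(1) + "'";
        return column + " LIKE '" + pattern + "'"; // anything else is left as-is
    }

    public static void main(String[] args) {
        System.out.println(simplify("name", "abc%"));
        System.out.println(simplify("name", null));
    }
}
```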
aokolnychyi	ca4e960aec	[SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds The PR contains a tiny change to fix the way Spark parses string literals into timestamps. Currently, some timestamps that contain nanoseconds are corrupted during the conversion from internal UTF8Strings into the internal representation of timestamps. Consider the following example: ``` spark.sql("SELECT cast('2015-01-02 00:00:00.000000001' as TIMESTAMP)").show(false) +------------------------------------------------+ |CAST(2015-01-02 00:00:00.000000001 AS TIMESTAMP)| +------------------------------------------------+ |2015-01-02 00:00:00.000001 | +------------------------------------------------+ ``` The fix was tested with existing tests. Also, there is a new test to cover cases that did not work previously. Author: aokolnychyi <anton.okolnychyi@sap.com> Closes #18252 from aokolnychyi/spark-17914.	2017-06-12 13:06:14 -07:00
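One way to avoid the corruption shown in SPARK-17914 is to consume only the first six fractional digits when converting to microseconds. The Java sketch below illustrates that idea under stated assumptions: the internal representation has microsecond precision (as the six-digit output above suggests), and excess digits are truncated rather than rounded. `FractionToMicros` is a hypothetical helper, and Spark's actual fix may differ.

```java
// Hypothetical helper; a sketch of truncating a fractional-second string to microseconds.
final class FractionToMicros {
    /** Converts the digits after the decimal point of the seconds field into microseconds. */
    static int fractionToMicros(String fractionDigits) {
        int micros = 0;
        int used = 0;
        for (int i = 0; i < fractionDigits.length(); i++) {
            char c = fractionDigits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a digit: " + c);
            }
            if (used < 6) {
                // Only the first six digits contribute; later digits are validated but dropped.
                micros = micros * 10 + (c - '0');
                used++;
            }
        }
        // Pad short fractions so that ".5" means 500000 microseconds.
        while (used < 6) {
            micros *= 10;
            used++;
        }
        return micros;
    }

    public static void main(String[] args) {
        System.out.println(fractionToMicros("000000001"));
        System.out.println(fractionToMicros("5"));
    }
}
```

Under these assumptions, the nanosecond digit in `000000001` is dropped instead of being read as one microsecond.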
Michal Senkyr	f48273c13c	[SPARK-18891][SQL] Support for specific Java List subtypes ## What changes were proposed in this pull request? Add support for specific Java `List` subtypes in deserialization as well as a generic implicit encoder. All `List` subtypes are supported by using either the size-specifying constructor (one `int` parameter) or the default constructor. Interfaces/abstract classes use the following implementations: * `java.util.List`, `java.util.AbstractList` or `java.util.AbstractSequentialList` => `java.util.ArrayList` ## How was this patch tested? ```bash build/mvn -DskipTests clean package && dev/run-tests ``` Additionally in Spark shell: ``` scala> val jlist = new java.util.LinkedList[Int]; jlist.add(1) jlist: java.util.LinkedList[Int] = [1] res0: Boolean = true scala> Seq(jlist).toDS().map(_.element()).collect() res1: Array[Int] = Array(1) ``` Author: Michal Senkyr <mike.senkyr@gmail.com> Closes #18009 from michalsenkyr/dataset-java-lists.	2017-06-12 08:53:23 +08:00
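The instantiation strategy described in SPARK-18891 for Java lists (size-specifying constructor, else default constructor, with `java.util.ArrayList` standing in for interfaces and abstract classes) can be sketched with plain reflection. `ListInstantiation` and `newList` are hypothetical names; the real change generates code rather than calling reflection at runtime, and this sketch maps every interface or abstract class to `ArrayList`, which is broader than the three types listed above.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical reflection-based sketch; not the generated code Spark produces.
final class ListInstantiation {
    /** Creates an empty list of the requested type, using a size hint when the type accepts one. */
    @SuppressWarnings("unchecked")
    static List<Object> newList(Class<?> listClass, int size) {
        if (!List.class.isAssignableFrom(listClass)) {
            throw new IllegalArgumentException(listClass.getName() + " is not a java.util.List");
        }
        if (listClass.isInterface() || Modifier.isAbstract(listClass.getModifiers())) {
            // Interfaces and abstract classes fall back to ArrayList.
            return new ArrayList<>(size);
        }
        try {
            try {
                // Prefer a constructor taking a single int (assumed to be a size hint).
                return (List<Object>) listClass.getConstructor(int.class).newInstance(size);
            } catch (NoSuchMethodException e) {
                // Otherwise use the default constructor.
                return (List<Object>) listClass.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + listClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        List<Object> list = newList(LinkedList.class, 1);
        list.add(1);
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```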
Michal Senkyr	0538f3b0ae	[SPARK-18891][SQL] Support for Scala Map collection types ## What changes were proposed in this pull request? Add support for arbitrary Scala `Map` types in deserialization as well as a generic implicit encoder. Used the builder approach as in #16541 to construct any provided `Map` type upon deserialization. Please note that this PR also adds (ignored) tests for issue [SPARK-19104 CompileException with Map and Case Class in Spark 2.1.0](https://issues.apache.org/jira/browse/SPARK-19104) but doesn't solve it. Added support for Java Maps in codegen code (encoders will be added in a different PR) with the following default implementations for interfaces/abstract classes: * `java.util.Map`, `java.util.AbstractMap` => `java.util.HashMap` * `java.util.SortedMap`, `java.util.NavigableMap` => `java.util.TreeMap` * `java.util.concurrent.ConcurrentMap` => `java.util.concurrent.ConcurrentHashMap` * `java.util.concurrent.ConcurrentNavigableMap` => `java.util.concurrent.ConcurrentSkipListMap` Resulting codegen for `Seq(Map(1 -> 2)).toDS().map(identity).queryExecution.debug.codegen`: ``` /* 001 / public Object generate(Object[] references) { / 002 / return new GeneratedIterator(references); / 003 / } / 004 / / 005 / final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator { / 006 / private Object[] references; / 007 / private scala.collection.Iterator[] inputs; / 008 / private scala.collection.Iterator inputadapter_input; / 009 / private boolean CollectObjectsToMap_loopIsNull1; / 010 / private int CollectObjectsToMap_loopValue0; / 011 / private boolean CollectObjectsToMap_loopIsNull3; / 012 / private int CollectObjectsToMap_loopValue2; / 013 / private UnsafeRow deserializetoobject_result; / 014 / private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder deserializetoobject_holder; / 015 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter deserializetoobject_rowWriter; / 016 / private 
scala.collection.immutable.Map mapelements_argValue; / 017 / private UnsafeRow mapelements_result; / 018 / private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder mapelements_holder; / 019 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter mapelements_rowWriter; / 020 / private UnsafeRow serializefromobject_result; / 021 / private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder serializefromobject_holder; / 022 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter serializefromobject_rowWriter; / 023 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter serializefromobject_arrayWriter; / 024 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter serializefromobject_arrayWriter1; / 025 / / 026 / public GeneratedIterator(Object[] references) { / 027 / this.references = references; / 028 / } / 029 / / 030 / public void init(int index, scala.collection.Iterator[] inputs) { / 031 / partitionIndex = index; / 032 / this.inputs = inputs; / 033 / wholestagecodegen_init_0(); / 034 / wholestagecodegen_init_1(); / 035 / / 036 / } / 037 / / 038 / private void wholestagecodegen_init_0() { / 039 / inputadapter_input = inputs[0]; / 040 / / 041 / deserializetoobject_result = new UnsafeRow(1); / 042 / this.deserializetoobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(deserializetoobject_result, 32); / 043 / this.deserializetoobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(deserializetoobject_holder, 1); / 044 / / 045 / mapelements_result = new UnsafeRow(1); / 046 / this.mapelements_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(mapelements_result, 32); / 047 / this.mapelements_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(mapelements_holder, 1); / 048 / serializefromobject_result = new UnsafeRow(1); / 049 / 
this.serializefromobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(serializefromobject_result, 32); / 050 / this.serializefromobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(serializefromobject_holder, 1); / 051 / this.serializefromobject_arrayWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); / 052 / / 053 / } / 054 / / 055 / private void wholestagecodegen_init_1() { / 056 / this.serializefromobject_arrayWriter1 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); / 057 / / 058 / } / 059 / / 060 / protected void processNext() throws java.io.IOException { / 061 / while (inputadapter_input.hasNext() && !stopEarly()) { / 062 / InternalRow inputadapter_row = (InternalRow) inputadapter_input.next(); / 063 / boolean inputadapter_isNull = inputadapter_row.isNullAt(0); / 064 / MapData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getMap(0)); / 065 / / 066 / boolean deserializetoobject_isNull1 = true; / 067 / ArrayData deserializetoobject_value1 = null; / 068 / if (!inputadapter_isNull) { / 069 / deserializetoobject_isNull1 = false; / 070 / if (!deserializetoobject_isNull1) { / 071 / Object deserializetoobject_funcResult = null; / 072 / deserializetoobject_funcResult = inputadapter_value.keyArray(); / 073 / if (deserializetoobject_funcResult == null) { / 074 / deserializetoobject_isNull1 = true; / 075 / } else { / 076 / deserializetoobject_value1 = (ArrayData) deserializetoobject_funcResult; / 077 / } / 078 / / 079 / } / 080 / deserializetoobject_isNull1 = deserializetoobject_value1 == null; / 081 / } / 082 / / 083 / boolean deserializetoobject_isNull3 = true; / 084 / ArrayData deserializetoobject_value3 = null; / 085 / if (!inputadapter_isNull) { / 086 / deserializetoobject_isNull3 = false; / 087 / if (!deserializetoobject_isNull3) { / 088 / Object deserializetoobject_funcResult1 = null; / 089 / 
deserializetoobject_funcResult1 = inputadapter_value.valueArray(); / 090 / if (deserializetoobject_funcResult1 == null) { / 091 / deserializetoobject_isNull3 = true; / 092 / } else { / 093 / deserializetoobject_value3 = (ArrayData) deserializetoobject_funcResult1; / 094 / } / 095 / / 096 / } / 097 / deserializetoobject_isNull3 = deserializetoobject_value3 == null; / 098 / } / 099 / scala.collection.immutable.Map deserializetoobject_value = null; / 100 / / 101 / if ((deserializetoobject_isNull1 && !deserializetoobject_isNull3) \|\| / 102 / (!deserializetoobject_isNull1 && deserializetoobject_isNull3)) { / 103 / throw new RuntimeException("Invalid state: Inconsistent nullability of key-value"); / 104 / } / 105 / / 106 / if (!deserializetoobject_isNull1) { / 107 / if (deserializetoobject_value1.numElements() != deserializetoobject_value3.numElements()) { / 108 / throw new RuntimeException("Invalid state: Inconsistent lengths of key-value arrays"); / 109 / } / 110 / int deserializetoobject_dataLength = deserializetoobject_value1.numElements(); / 111 / / 112 / scala.collection.mutable.Builder CollectObjectsToMap_builderValue5 = scala.collection.immutable.Map$.MODULE$.newBuilder(); / 113 / CollectObjectsToMap_builderValue5.sizeHint(deserializetoobject_dataLength); / 114 / / 115 / int deserializetoobject_loopIndex = 0; / 116 / while (deserializetoobject_loopIndex < deserializetoobject_dataLength) { / 117 / CollectObjectsToMap_loopValue0 = (int) (deserializetoobject_value1.getInt(deserializetoobject_loopIndex)); / 118 / CollectObjectsToMap_loopValue2 = (int) (deserializetoobject_value3.getInt(deserializetoobject_loopIndex)); / 119 / CollectObjectsToMap_loopIsNull1 = deserializetoobject_value1.isNullAt(deserializetoobject_loopIndex); / 120 / CollectObjectsToMap_loopIsNull3 = deserializetoobject_value3.isNullAt(deserializetoobject_loopIndex); / 121 / / 122 / if (CollectObjectsToMap_loopIsNull1) { / 123 / throw new RuntimeException("Found null in map key!"); / 124 / } / 125 / 
/ 126 / scala.Tuple2 CollectObjectsToMap_loopValue4; / 127 / / 128 / if (CollectObjectsToMap_loopIsNull3) { / 129 / CollectObjectsToMap_loopValue4 = new scala.Tuple2(CollectObjectsToMap_loopValue0, null); / 130 / } else { / 131 / CollectObjectsToMap_loopValue4 = new scala.Tuple2(CollectObjectsToMap_loopValue0, CollectObjectsToMap_loopValue2); / 132 / } / 133 / / 134 / CollectObjectsToMap_builderValue5.$plus$eq(CollectObjectsToMap_loopValue4); / 135 / / 136 / deserializetoobject_loopIndex += 1; / 137 / } / 138 / / 139 / deserializetoobject_value = (scala.collection.immutable.Map) CollectObjectsToMap_builderValue5.result(); / 140 / } / 141 / / 142 / boolean mapelements_isNull = true; / 143 / scala.collection.immutable.Map mapelements_value = null; / 144 / if (!false) { / 145 / mapelements_argValue = deserializetoobject_value; / 146 / / 147 / mapelements_isNull = false; / 148 / if (!mapelements_isNull) { / 149 / Object mapelements_funcResult = null; / 150 / mapelements_funcResult = ((scala.Function1) references[0]).apply(mapelements_argValue); / 151 / if (mapelements_funcResult == null) { / 152 / mapelements_isNull = true; / 153 / } else { / 154 / mapelements_value = (scala.collection.immutable.Map) mapelements_funcResult; / 155 / } / 156 / / 157 / } / 158 / mapelements_isNull = mapelements_value == null; / 159 / } / 160 / / 161 / MapData serializefromobject_value = null; / 162 / if (!mapelements_isNull) { / 163 / final int serializefromobject_length = mapelements_value.size(); / 164 / final Object[] serializefromobject_convertedKeys = new Object[serializefromobject_length]; / 165 / final Object[] serializefromobject_convertedValues = new Object[serializefromobject_length]; / 166 / int serializefromobject_index = 0; / 167 / final scala.collection.Iterator serializefromobject_entries = mapelements_value.iterator(); / 168 / while(serializefromobject_entries.hasNext()) { / 169 / final scala.Tuple2 serializefromobject_entry = (scala.Tuple2) 
serializefromobject_entries.next(); / 170 / int ExternalMapToCatalyst_key1 = (Integer) serializefromobject_entry._1(); / 171 / int ExternalMapToCatalyst_value1 = (Integer) serializefromobject_entry._2(); / 172 / / 173 / boolean ExternalMapToCatalyst_value_isNull1 = false; / 174 / / 175 / if (false) { / 176 / throw new RuntimeException("Cannot use null as map key!"); / 177 / } else { / 178 / serializefromobject_convertedKeys[serializefromobject_index] = (Integer) ExternalMapToCatalyst_key1; / 179 / } / 180 / / 181 / if (false) { / 182 / serializefromobject_convertedValues[serializefromobject_index] = null; / 183 / } else { / 184 / serializefromobject_convertedValues[serializefromobject_index] = (Integer) ExternalMapToCatalyst_value1; / 185 / } / 186 / / 187 / serializefromobject_index++; / 188 / } / 189 / / 190 / serializefromobject_value = new org.apache.spark.sql.catalyst.util.ArrayBasedMapData(new org.apache.spark.sql.catalyst.util.GenericArrayData(serializefromobject_convertedKeys), new org.apache.spark.sql.catalyst.util.GenericArrayData(serializefromobject_convertedValues)); / 191 / } / 192 / serializefromobject_holder.reset(); / 193 / / 194 / serializefromobject_rowWriter.zeroOutNullBytes(); / 195 / / 196 / if (mapelements_isNull) { / 197 / serializefromobject_rowWriter.setNullAt(0); / 198 / } else { / 199 / // Remember the current cursor so that we can calculate how many bytes are / 200 / // written later. / 201 / final int serializefromobject_tmpCursor = serializefromobject_holder.cursor; / 202 / / 203 / if (serializefromobject_value instanceof UnsafeMapData) { / 204 / final int serializefromobject_sizeInBytes = ((UnsafeMapData) serializefromobject_value).getSizeInBytes(); / 205 / // grow the global buffer before writing data. 
/ 206 / serializefromobject_holder.grow(serializefromobject_sizeInBytes); / 207 / ((UnsafeMapData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor); / 208 / serializefromobject_holder.cursor += serializefromobject_sizeInBytes; / 209 / / 210 / } else { / 211 / final ArrayData serializefromobject_keys = serializefromobject_value.keyArray(); / 212 / final ArrayData serializefromobject_values = serializefromobject_value.valueArray(); / 213 / / 214 / // preserve 8 bytes to write the key array numBytes later. / 215 / serializefromobject_holder.grow(8); / 216 / serializefromobject_holder.cursor += 8; / 217 / / 218 / // Remember the current cursor so that we can write numBytes of key array later. / 219 / final int serializefromobject_tmpCursor1 = serializefromobject_holder.cursor; / 220 / / 221 / if (serializefromobject_keys instanceof UnsafeArrayData) { / 222 / final int serializefromobject_sizeInBytes1 = ((UnsafeArrayData) serializefromobject_keys).getSizeInBytes(); / 223 / // grow the global buffer before writing data. 
/ 224 / serializefromobject_holder.grow(serializefromobject_sizeInBytes1); / 225 / ((UnsafeArrayData) serializefromobject_keys).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor); / 226 / serializefromobject_holder.cursor += serializefromobject_sizeInBytes1; / 227 / / 228 / } else { / 229 / final int serializefromobject_numElements = serializefromobject_keys.numElements(); / 230 / serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 4); / 231 / / 232 / for (int serializefromobject_index1 = 0; serializefromobject_index1 < serializefromobject_numElements; serializefromobject_index1++) { / 233 / if (serializefromobject_keys.isNullAt(serializefromobject_index1)) { / 234 / serializefromobject_arrayWriter.setNullInt(serializefromobject_index1); / 235 / } else { / 236 / final int serializefromobject_element = serializefromobject_keys.getInt(serializefromobject_index1); / 237 / serializefromobject_arrayWriter.write(serializefromobject_index1, serializefromobject_element); / 238 / } / 239 / } / 240 / } / 241 / / 242 / // Write the numBytes of key array into the first 8 bytes. / 243 / Platform.putLong(serializefromobject_holder.buffer, serializefromobject_tmpCursor1 - 8, serializefromobject_holder.cursor - serializefromobject_tmpCursor1); / 244 / / 245 / if (serializefromobject_values instanceof UnsafeArrayData) { / 246 / final int serializefromobject_sizeInBytes2 = ((UnsafeArrayData) serializefromobject_values).getSizeInBytes(); / 247 / // grow the global buffer before writing data. 
/ 248 / serializefromobject_holder.grow(serializefromobject_sizeInBytes2); / 249 / ((UnsafeArrayData) serializefromobject_values).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor); / 250 / serializefromobject_holder.cursor += serializefromobject_sizeInBytes2; / 251 / / 252 / } else { / 253 / final int serializefromobject_numElements1 = serializefromobject_values.numElements(); / 254 / serializefromobject_arrayWriter1.initialize(serializefromobject_holder, serializefromobject_numElements1, 4); / 255 / / 256 / for (int serializefromobject_index2 = 0; serializefromobject_index2 < serializefromobject_numElements1; serializefromobject_index2++) { / 257 / if (serializefromobject_values.isNullAt(serializefromobject_index2)) { / 258 / serializefromobject_arrayWriter1.setNullInt(serializefromobject_index2); / 259 / } else { / 260 / final int serializefromobject_element1 = serializefromobject_values.getInt(serializefromobject_index2); / 261 / serializefromobject_arrayWriter1.write(serializefromobject_index2, serializefromobject_element1); / 262 / } / 263 / } / 264 / } / 265 / / 266 / } / 267 / / 268 / serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor); / 269 / } / 270 / serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize()); / 271 / append(serializefromobject_result); / 272 / if (shouldStop()) return; / 273 / } / 274 / } / 275 / } ``` Codegen for `java.util.Map`: ``` / 001 / public Object generate(Object[] references) { / 002 / return new GeneratedIterator(references); / 003 / } / 004 / / 005 / final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator { / 006 / private Object[] references; / 007 / private scala.collection.Iterator[] inputs; / 008 / private scala.collection.Iterator inputadapter_input; / 009 / private boolean CollectObjectsToMap_loopIsNull1; / 010 / private int 
CollectObjectsToMap_loopValue0; / 011 / private boolean CollectObjectsToMap_loopIsNull3; / 012 / private int CollectObjectsToMap_loopValue2; / 013 / private UnsafeRow deserializetoobject_result; / 014 / private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder deserializetoobject_holder; / 015 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter deserializetoobject_rowWriter; / 016 / private java.util.HashMap mapelements_argValue; / 017 / private UnsafeRow mapelements_result; / 018 / private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder mapelements_holder; / 019 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter mapelements_rowWriter; / 020 / private UnsafeRow serializefromobject_result; / 021 / private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder serializefromobject_holder; / 022 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter serializefromobject_rowWriter; / 023 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter serializefromobject_arrayWriter; / 024 / private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter serializefromobject_arrayWriter1; / 025 / / 026 / public GeneratedIterator(Object[] references) { / 027 / this.references = references; / 028 / } / 029 / / 030 / public void init(int index, scala.collection.Iterator[] inputs) { / 031 / partitionIndex = index; / 032 / this.inputs = inputs; / 033 / wholestagecodegen_init_0(); / 034 / wholestagecodegen_init_1(); / 035 / / 036 / } / 037 / / 038 / private void wholestagecodegen_init_0() { / 039 / inputadapter_input = inputs[0]; / 040 / / 041 / deserializetoobject_result = new UnsafeRow(1); / 042 / this.deserializetoobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(deserializetoobject_result, 32); / 043 / this.deserializetoobject_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(deserializetoobject_holder, 1); / 044 / / 045 / mapelements_result = new UnsafeRow(1); / 046 / this.mapelements_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(mapelements_result, 32); / 047 / this.mapelements_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(mapelements_holder, 1); / 048 / serializefromobject_result = new UnsafeRow(1); / 049 / this.serializefromobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(serializefromobject_result, 32); / 050 / this.serializefromobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(serializefromobject_holder, 1); / 051 / this.serializefromobject_arrayWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); / 052 / / 053 / } / 054 / / 055 / private void wholestagecodegen_init_1() { / 056 / this.serializefromobject_arrayWriter1 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); / 057 / / 058 / } / 059 / / 060 / protected void processNext() throws java.io.IOException { / 061 / while (inputadapter_input.hasNext() && !stopEarly()) { / 062 / InternalRow inputadapter_row = (InternalRow) inputadapter_input.next(); / 063 / boolean inputadapter_isNull = inputadapter_row.isNullAt(0); / 064 / MapData inputadapter_value = inputadapter_isNull ? 
null : (inputadapter_row.getMap(0)); / 065 / / 066 / boolean deserializetoobject_isNull1 = true; / 067 / ArrayData deserializetoobject_value1 = null; / 068 / if (!inputadapter_isNull) { / 069 / deserializetoobject_isNull1 = false; / 070 / if (!deserializetoobject_isNull1) { / 071 / Object deserializetoobject_funcResult = null; / 072 / deserializetoobject_funcResult = inputadapter_value.keyArray(); / 073 / if (deserializetoobject_funcResult == null) { / 074 / deserializetoobject_isNull1 = true; / 075 / } else { / 076 / deserializetoobject_value1 = (ArrayData) deserializetoobject_funcResult; / 077 / } / 078 / / 079 / } / 080 / deserializetoobject_isNull1 = deserializetoobject_value1 == null; / 081 / } / 082 / / 083 / boolean deserializetoobject_isNull3 = true; / 084 / ArrayData deserializetoobject_value3 = null; / 085 / if (!inputadapter_isNull) { / 086 / deserializetoobject_isNull3 = false; / 087 / if (!deserializetoobject_isNull3) { / 088 / Object deserializetoobject_funcResult1 = null; / 089 / deserializetoobject_funcResult1 = inputadapter_value.valueArray(); / 090 / if (deserializetoobject_funcResult1 == null) { / 091 / deserializetoobject_isNull3 = true; / 092 / } else { / 093 / deserializetoobject_value3 = (ArrayData) deserializetoobject_funcResult1; / 094 / } / 095 / / 096 / } / 097 / deserializetoobject_isNull3 = deserializetoobject_value3 == null; / 098 / } / 099 / java.util.HashMap deserializetoobject_value = null; / 100 / / 101 / if ((deserializetoobject_isNull1 && !deserializetoobject_isNull3) \|\| / 102 / (!deserializetoobject_isNull1 && deserializetoobject_isNull3)) { / 103 / throw new RuntimeException("Invalid state: Inconsistent nullability of key-value"); / 104 / } / 105 / / 106 / if (!deserializetoobject_isNull1) { / 107 / if (deserializetoobject_value1.numElements() != deserializetoobject_value3.numElements()) { / 108 / throw new RuntimeException("Invalid state: Inconsistent lengths of key-value arrays"); / 109 / } / 110 / int 
deserializetoobject_dataLength = deserializetoobject_value1.numElements(); / 111 / java.util.Map CollectObjectsToMap_builderValue5 = new java.util.HashMap(deserializetoobject_dataLength); / 112 / / 113 / int deserializetoobject_loopIndex = 0; / 114 / while (deserializetoobject_loopIndex < deserializetoobject_dataLength) { / 115 / CollectObjectsToMap_loopValue0 = (int) (deserializetoobject_value1.getInt(deserializetoobject_loopIndex)); / 116 / CollectObjectsToMap_loopValue2 = (int) (deserializetoobject_value3.getInt(deserializetoobject_loopIndex)); / 117 / CollectObjectsToMap_loopIsNull1 = deserializetoobject_value1.isNullAt(deserializetoobject_loopIndex); / 118 / CollectObjectsToMap_loopIsNull3 = deserializetoobject_value3.isNullAt(deserializetoobject_loopIndex); / 119 / / 120 / if (CollectObjectsToMap_loopIsNull1) { / 121 / throw new RuntimeException("Found null in map key!"); / 122 / } / 123 / / 124 / CollectObjectsToMap_builderValue5.put(CollectObjectsToMap_loopValue0, CollectObjectsToMap_loopValue2); / 125 / / 126 / deserializetoobject_loopIndex += 1; / 127 / } / 128 / / 129 / deserializetoobject_value = (java.util.HashMap) CollectObjectsToMap_builderValue5; / 130 / } / 131 / / 132 / boolean mapelements_isNull = true; / 133 / java.util.HashMap mapelements_value = null; / 134 / if (!false) { / 135 / mapelements_argValue = deserializetoobject_value; / 136 / / 137 / mapelements_isNull = false; / 138 / if (!mapelements_isNull) { / 139 / Object mapelements_funcResult = null; / 140 / mapelements_funcResult = ((scala.Function1) references[0]).apply(mapelements_argValue); / 141 / if (mapelements_funcResult == null) { / 142 / mapelements_isNull = true; / 143 / } else { / 144 / mapelements_value = (java.util.HashMap) mapelements_funcResult; / 145 / } / 146 / / 147 / } / 148 / mapelements_isNull = mapelements_value == null; / 149 / } / 150 / / 151 / MapData serializefromobject_value = null; / 152 / if (!mapelements_isNull) { / 153 / final int serializefromobject_length = 
mapelements_value.size(); / 154 / final Object[] serializefromobject_convertedKeys = new Object[serializefromobject_length]; / 155 / final Object[] serializefromobject_convertedValues = new Object[serializefromobject_length]; / 156 / int serializefromobject_index = 0; / 157 / final java.util.Iterator serializefromobject_entries = mapelements_value.entrySet().iterator(); / 158 / while(serializefromobject_entries.hasNext()) { / 159 / final java.util.Map$Entry serializefromobject_entry = (java.util.Map$Entry) serializefromobject_entries.next(); / 160 / int ExternalMapToCatalyst_key1 = (Integer) serializefromobject_entry.getKey(); / 161 / int ExternalMapToCatalyst_value1 = (Integer) serializefromobject_entry.getValue(); / 162 / / 163 / boolean ExternalMapToCatalyst_value_isNull1 = false; / 164 / / 165 / if (false) { / 166 / throw new RuntimeException("Cannot use null as map key!"); / 167 / } else { / 168 / serializefromobject_convertedKeys[serializefromobject_index] = (Integer) ExternalMapToCatalyst_key1; / 169 / } / 170 / / 171 / if (false) { / 172 / serializefromobject_convertedValues[serializefromobject_index] = null; / 173 / } else { / 174 / serializefromobject_convertedValues[serializefromobject_index] = (Integer) ExternalMapToCatalyst_value1; / 175 / } / 176 / / 177 / serializefromobject_index++; / 178 / } / 179 / / 180 / serializefromobject_value = new org.apache.spark.sql.catalyst.util.ArrayBasedMapData(new org.apache.spark.sql.catalyst.util.GenericArrayData(serializefromobject_convertedKeys), new org.apache.spark.sql.catalyst.util.GenericArrayData(serializefromobject_convertedValues)); / 181 / } / 182 / serializefromobject_holder.reset(); / 183 / / 184 / serializefromobject_rowWriter.zeroOutNullBytes(); / 185 / / 186 / if (mapelements_isNull) { / 187 / serializefromobject_rowWriter.setNullAt(0); / 188 / } else { / 189 / // Remember the current cursor so that we can calculate how many bytes are / 190 / // written later. 
/ 191 / final int serializefromobject_tmpCursor = serializefromobject_holder.cursor; / 192 / / 193 / if (serializefromobject_value instanceof UnsafeMapData) { / 194 / final int serializefromobject_sizeInBytes = ((UnsafeMapData) serializefromobject_value).getSizeInBytes(); / 195 / // grow the global buffer before writing data. / 196 / serializefromobject_holder.grow(serializefromobject_sizeInBytes); / 197 / ((UnsafeMapData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor); / 198 / serializefromobject_holder.cursor += serializefromobject_sizeInBytes; / 199 / / 200 / } else { / 201 / final ArrayData serializefromobject_keys = serializefromobject_value.keyArray(); / 202 / final ArrayData serializefromobject_values = serializefromobject_value.valueArray(); / 203 / / 204 / // preserve 8 bytes to write the key array numBytes later. / 205 / serializefromobject_holder.grow(8); / 206 / serializefromobject_holder.cursor += 8; / 207 / / 208 / // Remember the current cursor so that we can write numBytes of key array later. / 209 / final int serializefromobject_tmpCursor1 = serializefromobject_holder.cursor; / 210 / / 211 / if (serializefromobject_keys instanceof UnsafeArrayData) { / 212 / final int serializefromobject_sizeInBytes1 = ((UnsafeArrayData) serializefromobject_keys).getSizeInBytes(); / 213 / // grow the global buffer before writing data. 
/ 214 / serializefromobject_holder.grow(serializefromobject_sizeInBytes1); / 215 / ((UnsafeArrayData) serializefromobject_keys).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor); / 216 / serializefromobject_holder.cursor += serializefromobject_sizeInBytes1; / 217 / / 218 / } else { / 219 / final int serializefromobject_numElements = serializefromobject_keys.numElements(); / 220 / serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 4); / 221 / / 222 / for (int serializefromobject_index1 = 0; serializefromobject_index1 < serializefromobject_numElements; serializefromobject_index1++) { / 223 / if (serializefromobject_keys.isNullAt(serializefromobject_index1)) { / 224 / serializefromobject_arrayWriter.setNullInt(serializefromobject_index1); / 225 / } else { / 226 / final int serializefromobject_element = serializefromobject_keys.getInt(serializefromobject_index1); / 227 / serializefromobject_arrayWriter.write(serializefromobject_index1, serializefromobject_element); / 228 / } / 229 / } / 230 / } / 231 / / 232 / // Write the numBytes of key array into the first 8 bytes. / 233 / Platform.putLong(serializefromobject_holder.buffer, serializefromobject_tmpCursor1 - 8, serializefromobject_holder.cursor - serializefromobject_tmpCursor1); / 234 / / 235 / if (serializefromobject_values instanceof UnsafeArrayData) { / 236 / final int serializefromobject_sizeInBytes2 = ((UnsafeArrayData) serializefromobject_values).getSizeInBytes(); / 237 / // grow the global buffer before writing data. 
/ 238 / serializefromobject_holder.grow(serializefromobject_sizeInBytes2); / 239 / ((UnsafeArrayData) serializefromobject_values).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor); / 240 / serializefromobject_holder.cursor += serializefromobject_sizeInBytes2; / 241 / / 242 / } else { / 243 / final int serializefromobject_numElements1 = serializefromobject_values.numElements(); / 244 / serializefromobject_arrayWriter1.initialize(serializefromobject_holder, serializefromobject_numElements1, 4); / 245 / / 246 / for (int serializefromobject_index2 = 0; serializefromobject_index2 < serializefromobject_numElements1; serializefromobject_index2++) { / 247 / if (serializefromobject_values.isNullAt(serializefromobject_index2)) { / 248 / serializefromobject_arrayWriter1.setNullInt(serializefromobject_index2); / 249 / } else { / 250 / final int serializefromobject_element1 = serializefromobject_values.getInt(serializefromobject_index2); / 251 / serializefromobject_arrayWriter1.write(serializefromobject_index2, serializefromobject_element1); / 252 / } / 253 / } / 254 / } / 255 / / 256 / } / 257 / / 258 / serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor); / 259 / } / 260 / serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize()); / 261 / append(serializefromobject_result); / 262 / if (shouldStop()) return; / 263 / } / 264 / } / 265 */ } ``` ## How was this patch tested? ``` build/mvn -DskipTests clean package && dev/run-tests ``` Additionally in Spark shell: ``` scala> Seq(collection.mutable.HashMap(1 -> 2, 2 -> 3)).toDS().map(_ += (3 -> 4)).collect() res0: Array[scala.collection.mutable.HashMap[Int,Int]] = Array(Map(2 -> 3, 1 -> 2, 3 -> 4)) ``` Author: Michal Senkyr <mike.senkyr@gmail.com> Author: Michal Šenkýř <mike.senkyr@gmail.com> Closes #16986 from michalsenkyr/dataset-map-builder.	2017-06-12 08:47:01 +08:00
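The default implementations listed for Java map interfaces, together with the key/value loop visible in the `java.util.Map` codegen above, can be condensed into a small Java sketch. `DefaultMapImpls`, `newMapFor`, and `fromArrays` are hypothetical names; the lookup table restates the mapping from the commit message, and the error messages mirror those in the generated code, but nothing else here is taken from Spark.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Supplier;

// Hypothetical sketch of choosing a concrete map for a declared interface type.
final class DefaultMapImpls {
    private static final Map<Class<?>, Supplier<Map<Object, Object>>> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put(Map.class, HashMap::new);
        DEFAULTS.put(AbstractMap.class, HashMap::new);
        DEFAULTS.put(SortedMap.class, TreeMap::new);
        DEFAULTS.put(NavigableMap.class, TreeMap::new);
        DEFAULTS.put(ConcurrentMap.class, ConcurrentHashMap::new);
        DEFAULTS.put(ConcurrentNavigableMap.class, ConcurrentSkipListMap::new);
    }

    /** Returns an empty map of the default implementation for the declared type. */
    static Map<Object, Object> newMapFor(Class<?> declaredType) {
        Supplier<Map<Object, Object>> factory = DEFAULTS.get(declaredType);
        if (factory == null) {
            throw new IllegalArgumentException("no default implementation for " + declaredType.getName());
        }
        return factory.get();
    }

    /** Builds a map from parallel key and value arrays, as the generated loop does. */
    static Map<Object, Object> fromArrays(Class<?> declaredType, Object[] keys, Object[] values) {
        if (keys.length != values.length) {
            throw new RuntimeException("Invalid state: Inconsistent lengths of key-value arrays");
        }
        Map<Object, Object> result = newMapFor(declaredType);
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == null) {
                throw new RuntimeException("Found null in map key!");
            }
            result.put(keys[i], values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Object, Object> m = fromArrays(SortedMap.class, new Object[] {2, 1}, new Object[] {"b", "a"});
        System.out.println(m.getClass().getName() + " " + m);
    }
}
```

The lookup uses exact class equality rather than `isAssignableFrom`, so `ConcurrentNavigableMap`, which extends both `ConcurrentMap` and `NavigableMap`, cannot resolve to the wrong default.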
Zhenhua Wang	a7c61c100b	[SPARK-21031][SQL] Add `alterTableStats` to store spark's stats and let `alterTable` keep existing stats ## What changes were proposed in this pull request? Currently, hive's stats are read into `CatalogStatistics`, while spark's stats are also persisted through `CatalogStatistics`. As a result, hive's stats can be unexpectedly propagated into spark's stats. For example, for a catalog table, we read stats from hive, e.g. "totalSize", and put them into `CatalogStatistics`. Then, by using the "ALTER TABLE" command, we will store the stats in `CatalogStatistics` into the metastore as spark's stats (because we don't know whether they are from spark or not). But spark's stats should only be generated by the "ANALYZE" command, so this side effect of "ALTER TABLE" is unexpected. Secondly, now that we have spark's stats in the metastore, after inserting new data, although hive updated "totalSize" in the metastore, we still cannot get the right `sizeInBytes` in `CatalogStatistics`, because we respect spark's stats (which should not exist) over hive's stats. A running example is shown in [JIRA](https://issues.apache.org/jira/browse/SPARK-21031). To fix this, we add a new method `alterTableStats` to store spark's stats, and let `alterTable` keep existing stats. ## How was this patch tested? Added new tests. Author: Zhenhua Wang <wzh_zju@163.com> Closes #18248 from wzhfy/separateHiveStats.	2017-06-12 08:23:04 +08:00
Xiao Li	8e96acf71c	[SPARK-20211][SQL] Fix the Precision and Scale of Decimal Values when the Input is BigDecimal between -1.0 and 1.0 ### What changes were proposed in this pull request? The precision and scale of decimal values are wrong when the input is a BigDecimal between -1.0 and 1.0. A BigDecimal's precision is the digit count starting from the leftmost nonzero digit, per [Java's BigDecimal definition](https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html). However, our Decimal precision follows the database decimal standard, which is the total number of digits, including those both to the left and to the right of the decimal point. Thus, this PR fixes the issue by doing the conversion. Before this PR, the following queries failed: ```SQL select 1 > 0.0001 select floor(0.0001) select ceil(0.0001) ``` ### How was this patch tested? Added test cases. Author: Xiao Li <gatorsmile@gmail.com> Closes #18244 from gatorsmile/bigdecimal.	2017-06-10 10:28:14 -07:00
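The precision mismatch described in SPARK-20211 can be reproduced with `java.math.BigDecimal` alone. The following is a minimal sketch, not the patch itself: the class `DecimalPrecisionSketch` and the method `sqlPrecision` are hypothetical names, and taking the larger of precision and scale is an assumed adjustment for illustration, not necessarily what Spark's `Decimal` does.

```java
import java.math.BigDecimal;

// Hypothetical illustration; not code from the Spark patch.
final class DecimalPrecisionSketch {

    // Assumed adjustment: a database-style precision must cover every digit
    // of the fraction, so it can never be smaller than the scale.
    static int sqlPrecision(BigDecimal d) {
        return Math.max(d.precision(), d.scale());
    }

    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("0.0001");
        // BigDecimal counts digits from the leftmost nonzero digit only.
        System.out.println("BigDecimal precision: " + d.precision());
        System.out.println("BigDecimal scale:     " + d.scale());
        System.out.println("SQL-style precision:  " + sqlPrecision(d));
    }
}
```

For `0.0001`, `BigDecimal` reports a precision smaller than its scale, which is the situation the commit message describes for inputs between -1.0 and 1.0.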
Xiao Li	571635488d	[SPARK-20918][SQL] Use FunctionIdentifier as function identifiers in FunctionRegistry ### What changes were proposed in this pull request? Currently, the unquoted string of a function identifier is being used as the function identifier in the function registry. This could cause incorrect behavior when users use `.` in function names. This PR uses `FunctionIdentifier` as the identifier in the function registry. - Add one new function `createOrReplaceTempFunction` to `FunctionRegistry` ```Scala final def createOrReplaceTempFunction(name: String, builder: FunctionBuilder): Unit ``` ### How was this patch tested? Add extra test cases to verify the included bug fixes. Author: Xiao Li <gatorsmile@gmail.com> Author: gatorsmile <gatorsmile@gmail.com> Closes #18142 from gatorsmile/fuctionRegistry.	2017-06-09 10:16:30 -07:00
Xiao Li	1a527bde49	[SPARK-20976][SQL] Unify Error Messages for FAILFAST mode ### What changes were proposed in this pull request? Before 2.2, we indicate the job was terminated because of `FAILFAST` mode. ``` Malformed line in FAILFAST mode: {"a":{, b:3} ``` If possible, we should keep it. This PR is to unify the error messages. ### How was this patch tested? Modified the existing messages. Author: Xiao Li <gatorsmile@gmail.com> Closes #18196 from gatorsmile/messFailFast.	2017-06-08 12:10:31 -07:00
Wenchen Fan	c92949ac23	[SPARK-20972][SQL] rename HintInfo.isBroadcastable to broadcast ## What changes were proposed in this pull request? `HintInfo.isBroadcastable` is actually not an accurate name, it's used to force the planner to broadcast a plan no matter what the data size is, via the hint mechanism. I think `forceBroadcast` is a better name. And `isBroadcastable` only have 2 possible values: `Some(true)` and `None`, so we can just use boolean type for it. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #18189 from cloud-fan/stats.	2017-06-06 22:50:06 -07:00
Reza Safi	b61a401da8	[SPARK-20926][SQL] Removing exposures to guava library caused by directly accessing SessionCatalog's tableRelationCache There could be test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to the guava library by directly accessing SessionCatalog's tableRelationCache. These failures occur when guava shading is in place. ## What changes were proposed in this pull request? This change removes those guava exposures by introducing new methods in SessionCatalog and also changing DataStorageStrategy, HiveMetastoreCatalog and HiveSchemaInferenceSuite so that they use those proxy methods. ## How was this patch tested? Unit tests passed after applying these changes. Author: Reza Safi <rezasafi@cloudera.com> Closes #18148 from rezasafi/branch-2.2. (cherry picked from commit `1388fdd707`)	2017-06-06 09:54:13 -07:00
Feng Liu	88a23d3de0	[SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a TimeoutConf ## What changes were proposed in this pull request? The construction of BROADCAST_TIMEOUT conf should take the TimeUnit argument as a TimeoutConf. Author: Feng Liu <fengliu@databricks.com> Closes #18208 from liufengdb/fix_timeout.	2017-06-05 17:48:28 -07:00
Wieland Hoffmann	c70c38eb93	[DOCS] Fix a typo in Encoder.clsTag ## What changes were proposed in this pull request? Fixes a typo: `and` -> `an` ## How was this patch tested? Not at all. Author: Wieland Hoffmann <mineo@users.noreply.github.com> Closes #17759 from mineo/patch-1.	2017-06-03 10:12:37 +01:00
Xiao Li	2a780ac7fe	[MINOR][SQL] Update the description of spark.sql.files.ignoreCorruptFiles and spark.sql.columnNameOfCorruptRecord ### What changes were proposed in this pull request? 1. The description of `spark.sql.files.ignoreCorruptFiles` is not accurate. When the file does not exist, we will issue the error message. ``` org.apache.spark.sql.AnalysisException: Path does not exist: file:/nonexist/path; ``` 2. `spark.sql.columnNameOfCorruptRecord` also affects the CSV format. The current description only mentions JSON format. ### How was this patch tested? N/A Author: Xiao Li <gatorsmile@gmail.com> Closes #18184 from gatorsmile/updateMessage.	2017-06-02 12:58:29 -07:00
Bogdan Raducanu	2134196a9c	[SPARK-20854][SQL] Extend hint syntax to support expressions ## What changes were proposed in this pull request? SQL hint syntax: * support expressions such as strings, numbers, etc. instead of only identifiers as it is currently. * support multiple hints, which was missing compared to the DataFrame syntax. DataFrame API: * support any parameters in DataFrame.hint instead of just strings ## How was this patch tested? Existing tests. New tests in PlanParserSuite. New suite DataFrameHintSuite. Author: Bogdan Raducanu <bogdan@databricks.com> Closes #18086 from bogdanrdc/SPARK-20854.	2017-06-01 15:50:40 -07:00
Xiao Li	f7cf2096fd	[SPARK-20941][SQL] Fix SubqueryExec Reuse ### What changes were proposed in this pull request? Before this PR, Subquery reuse does not work. Below are three issues: - Subquery reuse does not work. - It is sharing the same `SQLConf` (`spark.sql.exchange.reuse`) with the one for Exchange Reuse. - No test case covers the rule Subquery reuse. This PR is to fix the above three issues. - Ignored the physical operator `SubqueryExec` when comparing two plans. - Added a dedicated conf `spark.sql.subqueries.reuse` for controlling Subquery Reuse - Added a test case for verifying the behavior ### How was this patch tested? N/A Author: Xiao Li <gatorsmile@gmail.com> Closes #18169 from gatorsmile/subqueryReuse.	2017-06-01 09:52:18 -07:00
Yuming Wang	6d05c1c1da	[SPARK-20910][SQL] Add built-in SQL function - UUID ## What changes were proposed in this pull request? Add built-in SQL function - UUID. ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18136 from wangyum/SPARK-20910.	2017-06-01 16:15:24 +09:00
Yuming Wang	c8045f8b48	[MINOR][SQL] Fix a few function description error. ## What changes were proposed in this pull request? Fix a few function description error. ## How was this patch tested? manual tests ![descissues](https://cloud.githubusercontent.com/assets/5399861/26619392/d547736c-4610-11e7-85d7-aeeb09c02cc8.gif) Author: Yuming Wang <wgyumg@gmail.com> Closes #18157 from wangyum/DescIssues.	2017-05-31 23:17:15 -07:00
Jacek Laskowski	beed5e20af	[DOCS][MINOR] Scaladoc fixes (aka typo hunting) ## What changes were proposed in this pull request? Minor changes to scaladoc ## How was this patch tested? Local build Author: Jacek Laskowski <jacek@japila.pl> Closes #18074 from jaceklaskowski/scaladoc-fixes.	2017-05-31 11:24:37 +01:00
Wenchen Fan	1f5dddffa3	Revert "[SPARK-20392][SQL] Set barrier to prevent re-entering a tree" This reverts commit `8ce0d8ffb6`.	2017-05-30 21:14:55 -07:00
Wenchen Fan	10e526e7e6	[SPARK-20213][SQL] Fix DataFrameWriter operations in SQL UI tab ## What changes were proposed in this pull request? Currently the `DataFrameWriter` operations have several problems: 1. non-file-format data source writing action doesn't show up in the SQL tab in Spark UI 2. file-format data source writing action shows a scan node in the SQL tab, without saying anything about writing. (streaming also have this issue, but not fixed in this PR) 3. Spark SQL CLI actions don't show up in the SQL tab. This PR fixes all of them, by refactoring the `ExecuteCommandExec` to make it have children. close https://github.com/apache/spark/pull/17540 ## How was this patch tested? existing tests. Also test the UI manually. For a simple command: `Seq(1 -> "a").toDF("i", "j").write.parquet("/tmp/qwe")` before this PR: <img width="266" alt="qq20170523-035840 2x" src="https://cloud.githubusercontent.com/assets/3182036/26326050/24e18ba2-3f6c-11e7-8817-6dd275bf6ac5.png"> after this PR: <img width="287" alt="qq20170523-035708 2x" src="https://cloud.githubusercontent.com/assets/3182036/26326054/2ad7f460-3f6c-11e7-8053-d68325beb28f.png"> Author: Wenchen Fan <wenchen@databricks.com> Closes #18064 from cloud-fan/execution.	2017-05-30 20:12:32 -07:00
Tathagata Das	fa757ee1d4	[SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation ## What changes were proposed in this pull request? A bunch of changes to the StateStore APIs and implementation. Current state store API has a bunch of problems that causes too many transient objects causing memory pressure. - `StateStore.get(): Option` forces creation of Some/None objects for every get. Changed this to return the row or null. - `StateStore.iterator(): (UnsafeRow, UnsafeRow)` forces creation of new tuple for each record returned. Changed this to return a UnsafeRowTuple which can be reused across records. - `StateStore.updates()` requires the implementation to keep track of updates, while this is used minimally (only by Append mode in streaming aggregations). Removed updates() and updated StateStoreSaveExec accordingly. - `StateStore.filter(condition)` and `StateStore.remove(condition)` has been merge into a single API `getRange(start, end)` which allows a state store to do optimized range queries (i.e. avoid full scans). Stateful operators have been updated accordingly. - Removed a lot of unnecessary row copies Each operator copied rows before calling StateStore.put() even if the implementation does not require it to be copied. It is left up to the implementation on whether to copy the row or not. Additionally, - Added a name to the StateStoreId so that each operator+partition can use multiple state stores (different names) - Added a configuration that allows the user to specify which implementation to use. - Added new metrics to understand the time taken to update keys, remove keys and commit all changes to the state store. These metrics will be visible on the plan diagram in the SQL tab of the UI. - Refactored unit tests such that they can be reused to test any implementation of StateStore. ## How was this patch tested? 
Old and new unit tests Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #18107 from tdas/SPARK-20376.	2017-05-30 15:33:06 -07:00
Xiao Li	4bb6a53ebd	[SPARK-20924][SQL] Unable to call the function registered in the not-current database ### What changes were proposed in this pull request? We are unable to call the function registered in the not-current database. ```Scala sql("CREATE DATABASE dAtABaSe1") sql(s"CREATE FUNCTION dAtABaSe1.test_avg AS '${classOf[GenericUDAFAverage].getName}'") sql("SELECT dAtABaSe1.test_avg(1)") ``` The above code returns an error: ``` Undefined function: 'dAtABaSe1.test_avg'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 ``` This PR is to fix the above issue. ### How was this patch tested? Added test cases. Author: Xiao Li <gatorsmile@gmail.com> Closes #18146 from gatorsmile/qualifiedFunction.	2017-05-30 14:06:19 -07:00
Liang-Chi Hsieh	35b644bd03	[SPARK-20916][SQL] Improve error message for unaliased subqueries in FROM clause ## What changes were proposed in this pull request? We changed the parser to reject unaliased subqueries in the FROM clause in SPARK-20690. However, the error message that we now give isn't very helpful: scala> sql("""SELECT x FROM (SELECT 1 AS x)""") org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'FROM' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 9) We should modify the parser to throw a more clear error for such queries: scala> sql("""SELECT x FROM (SELECT 1 AS x)""") org.apache.spark.sql.catalyst.parser.ParseException: The unaliased subqueries in the FROM clause are not supported.(line 1, pos 14) ## How was this patch tested? Modified existing tests to reflect this change. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #18141 from viirya/SPARK-20916.	2017-05-30 06:28:43 -07:00
Yuming Wang	80fb24b85d	[MINOR] Fix some indent issues. ## What changes were proposed in this pull request? Fix some indent issues. ## How was this patch tested? existing tests. Author: Yuming Wang <wgyumg@gmail.com> Closes #18133 from wangyum/IndentIssues.	2017-05-30 12:15:54 +01:00
Yuming Wang	d797ed0ef1	[SPARK-20909][SQL] Add built-in SQL function - DAYOFWEEK ## What changes were proposed in this pull request? Add built-in SQL function - DAYOFWEEK ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18134 from wangyum/SPARK-20909.	2017-05-30 15:40:50 +09:00
Yuming Wang	1c7db00c74	[SPARK-8184][SQL] Add additional function description for weekofyear ## What changes were proposed in this pull request? Add additional function description for weekofyear. ## How was this patch tested? manual tests ![weekofyear](https://cloud.githubusercontent.com/assets/5399861/26525752/08a1c278-4394-11e7-8988-7cbf82c3a999.gif) Author: Yuming Wang <wgyumg@gmail.com> Closes #18132 from wangyum/SPARK-8184.	2017-05-29 16:10:22 -07:00
Kazuaki Ishizaki	ef9fd920c3	[SPARK-20750][SQL] Built-in SQL Function Support - REPLACE ## What changes were proposed in this pull request? This PR adds the built-in SQL function `REPLACE(<string_expression>, <search_string> [, <replacement_string>])`. `REPLACE()` returns the string with all occurrences of the search string replaced by the given replacement string. ## How was this patch tested? Added new test suites. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18047 from kiszk/SPARK-20750.	2017-05-29 11:47:31 -07:00
Tejas Patil	f9b59abeae	[SPARK-20758][SQL] Add Constant propagation optimization ## What changes were proposed in this pull request? See class doc of `ConstantPropagation` for the approach used. ## How was this patch tested? - Added unit tests Author: Tejas Patil <tejasp@fb.com> Closes #17993 from tejasapatil/SPARK-20758_const_propagation.	2017-05-29 12:21:34 +02:00
Takeshi Yamamuro	24d34281d7	[SPARK-20841][SQL] Support table column aliases in FROM clause ## What changes were proposed in this pull request? This pr added parsing rules to support table column aliases in FROM clause. ## How was this patch tested? Added tests in `PlanParserSuite`, `SQLQueryTestSuite`, and `PlanParserSuite`. Author: Takeshi Yamamuro <yamamuro@apache.org> Closes #18079 from maropu/SPARK-20841.	2017-05-28 13:23:18 -07:00
Xiao Li	06c155c90d	[SPARK-20908][SQL] Cache Manager: Hint should be ignored in plan matching ### What changes were proposed in this pull request? In Cache manager, the plan matching should ignore Hint. ```Scala val df1 = spark.range(10).join(broadcast(spark.range(10))) df1.cache() spark.range(10).join(spark.range(10)).explain() ``` The output plan of the above query shows that the second query is not using the cached data of the first query. ``` BroadcastNestedLoopJoin BuildRight, Inner :- Range (0, 10, step=1, splits=2) +- BroadcastExchange IdentityBroadcastMode +- Range (0, 10, step=1, splits=2) ``` After the fix, the plan becomes ``` InMemoryTableScan [id#20L, id#23L] +- InMemoryRelation [id#20L, id#23L], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas) +- BroadcastNestedLoopJoin BuildRight, Inner :- Range (0, 10, step=1, splits=2) +- BroadcastExchange IdentityBroadcastMode +- Range (0, 10, step=1, splits=2) ``` ### How was this patch tested? Added a test. Author: Xiao Li <gatorsmile@gmail.com> Closes #18131 from gatorsmile/HintCache.	2017-05-27 21:32:18 -07:00
liuxian	3969a8078e	[SPARK-20876][SQL] If the input parameter is float type for ceil or floor, the result is not what we expected ## What changes were proposed in this pull request? spark-sql>SELECT ceil(cast(12345.1233 as float)); spark-sql>12345 For this case, the result we expected is `12346`. spark-sql>SELECT floor(cast(-12345.1233 as float)); spark-sql>-12345 For this case, the result we expected is `-12346`. This happens because `inputTypes` in `Ceil` and `Floor` has no FloatType, so the input is converted to LongType. ## How was this patch tested? After the modification: spark-sql>SELECT ceil(cast(12345.1233 as float)); spark-sql>12346 spark-sql>SELECT floor(cast(-12345.1233 as float)); spark-sql>-12346 Author: liuxian <liu.xian3@zte.com.cn> Closes #18103 from 10110346/wip-lx-0525-1.	2017-05-27 16:23:45 -07:00
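The symptom in SPARK-20876 can be mimicked in plain Java: converting a float to a long before applying ceil or floor discards the fraction, so the rounding has nothing left to act on. This is only a sketch under that assumption; the class and method names are hypothetical and none of this is Spark code.

```java
// Hypothetical illustration; not code from the Spark patch.
final class FloatCeilFloorSketch {

    // Mimics the reported behavior: the float is narrowed to long first,
    // which truncates toward zero, and only then rounded.
    static long ceilViaLong(float f) {
        return (long) Math.ceil((double) (long) f);
    }

    static long floorViaLong(float f) {
        return (long) Math.floor((double) (long) f);
    }

    // Widening to double keeps the fraction, so ceil/floor behave as expected.
    static long ceilViaDouble(float f) {
        return (long) Math.ceil((double) f);
    }

    static long floorViaDouble(float f) {
        return (long) Math.floor((double) f);
    }

    public static void main(String[] args) {
        System.out.println(ceilViaLong(12345.1233f) + " vs " + ceilViaDouble(12345.1233f));
        System.out.println(floorViaLong(-12345.1233f) + " vs " + floorViaDouble(-12345.1233f));
    }
}
```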
Yuming Wang	a0f8a072e3	[SPARK-20748][SQL] Add built-in SQL function CH[A]R. ## What changes were proposed in this pull request? Add built-in SQL function `CH[A]R`: For `CHR(bigint\|double n)`, returns the ASCII character having the binary equivalent to `n`. If n is larger than 256 the result is equivalent to CHR(n % 256) ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18019 from wangyum/SPARK-20748.	2017-05-26 20:59:14 -07:00
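The wrap-around rule stated for `CHR` (values above 256 behave like `CHR(n % 256)`) can be sketched in a few lines of Java. This is an illustration of that one rule only: the class name is hypothetical, and rejecting negative input is a choice made for this sketch, since the commit message does not say how negative values are handled.

```java
// Hypothetical illustration; not code from the Spark patch.
final class ChrSketch {

    // Maps n to the character with code n % 256, as the commit message describes.
    static String chr(long n) {
        if (n < 0) {
            // Sketch-only choice: the source does not specify negative input.
            throw new IllegalArgumentException("negative input not covered by this sketch");
        }
        return String.valueOf((char) (n % 256));
    }

    public static void main(String[] args) {
        System.out.println(chr(65));        // code 65
        System.out.println(chr(65 + 256));  // wraps to the same code
    }
}
```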
Michael Armbrust	d935e0a9d9	[SPARK-20844] Remove experimental from Structured Streaming APIs Now that Structured Streaming has been out for several Spark release and has large production use cases, the `Experimental` label is no longer appropriate. I've left `InterfaceStability.Evolving` however, as I think we may make a few changes to the pluggable Source & Sink API in Spark 2.3. Author: Michael Armbrust <michael@databricks.com> Closes #18065 from marmbrus/streamingGA.	2017-05-26 13:33:23 -07:00
Liang-Chi Hsieh	8ce0d8ffb6	[SPARK-20392][SQL] Set barrier to prevent re-entering a tree ## What changes were proposed in this pull request? It is reported that there is performance downgrade when applying ML pipeline for dataset with many columns but few rows. A big part of the performance downgrade comes from some operations (e.g., `select`) on DataFrame/Dataset which re-create new DataFrame/Dataset with a new `LogicalPlan`. The cost can be ignored in the usage of SQL, normally. However, it's not rare to chain dozens of pipeline stages in ML. When the query plan grows incrementally during running those stages, the total cost spent on re-creation of DataFrame grows too. In particular, the `Analyzer` will go through the big query plan even most part of it is analyzed. By eliminating part of the cost, the time to run the example code locally is reduced from about 1min to about 30 secs. In particular, the time applying the pipeline locally is mostly spent on calling transform of the 137 `Bucketizer`s. Before the change, each call of `Bucketizer`'s transform can cost about 0.4 sec. So the total time spent on all `Bucketizer`s' transform is about 50 secs. After the change, each call only costs about 0.1 sec. <del>We also make `boundEnc` as lazy variable to reduce unnecessary running time.</del> ### Performance improvement The codes and datasets provided by Barry Becker to re-produce this issue and benchmark can be found on the JIRA. Before this patch: about 1 min After this patch: about 20 secs ## How was this patch tested? Existing tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #17770 from viirya/SPARK-20392.	2017-05-26 13:45:55 +08:00
liuxian	197f9018a4	[SPARK-20403][SQL] Modify the instructions of some functions ## What changes were proposed in this pull request? 1. add instructions of 'cast' function When using 'show functions' and 'desc function cast' command in spark-sql 2. Modify the instructions of functions，such as boolean，tinyint，smallint，int，bigint，float，double，decimal，date，timestamp，binary，string ## How was this patch tested? Before modification： spark-sql>desc function boolean; Function: boolean Class: org.apache.spark.sql.catalyst.expressions.Cast Usage: boolean(expr AS type) - Casts the value `expr` to the target data type `type`. After modification： spark-sql> desc function boolean; Function: boolean Class: org.apache.spark.sql.catalyst.expressions.Cast Usage: boolean(expr) - Casts the value `expr` to the target data type `boolean`. spark-sql> desc function cast Function: cast Class: org.apache.spark.sql.catalyst.expressions.Cast Usage: cast(expr AS type) - Casts the value `expr` to the target data type `type`. Author: liuxian <liu.xian3@zte.com.cn> Closes #17698 from 10110346/wip_lx_0418.	2017-05-24 17:32:02 -07:00
Reynold Xin	a64746677b	[SPARK-20867][SQL] Move hints from Statistics into HintInfo class ## What changes were proposed in this pull request? This is a follow-up to SPARK-20857 to move the broadcast hint from Statistics into a new HintInfo class, so we can be more flexible in adding new hints in the future. ## How was this patch tested? Updated test cases to reflect the change. Author: Reynold Xin <rxin@databricks.com> Closes #18087 from rxin/SPARK-20867.	2017-05-24 13:57:19 -07:00
Reynold Xin	0d589ba00b	[SPARK-20857][SQL] Generic resolved hint node ## What changes were proposed in this pull request? This patch renames BroadcastHint to ResolvedHint (and Hint to UnresolvedHint) so the hint framework is more generic and would allow us to introduce other hint types in the future without introducing new hint nodes. ## How was this patch tested? Updated test cases. Author: Reynold Xin <rxin@databricks.com> Closes #18072 from rxin/SPARK-20857.	2017-05-23 18:44:49 +02:00
Xiao Li	a2460be9c3	[SPARK-17410][SPARK-17284] Move Hive-generated Stats Info to HiveClientImpl ### What changes were proposed in this pull request? After we adding a new field `stats` into `CatalogTable`, we should not expose Hive-specific Stats metadata to `MetastoreRelation`. It complicates all the related codes. It also introduces a bug in `SHOW CREATE TABLE`. The statistics-related table properties should be skipped by `SHOW CREATE TABLE`, since it could be incorrect in the newly created table. See the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-13792 Also fix the issue to fill Hive-generated RowCounts to our stats. This PR is to handle Hive-specific Stats metadata in `HiveClientImpl`. ### How was this patch tested? Added a few test cases. Author: Xiao Li <gatorsmile@gmail.com> Closes #14971 from gatorsmile/showCreateTableNew.	2017-05-22 17:28:30 -07:00
gatorsmile	f3ed62a381	[SPARK-20831][SQL] Fix INSERT OVERWRITE data source tables with IF NOT EXISTS ### What changes were proposed in this pull request? Currently, we have a bug when we specify `IF NOT EXISTS` in `INSERT OVERWRITE` data source tables. For example, given a query: ```SQL INSERT OVERWRITE TABLE $tableName partition (b=2, c=3) IF NOT EXISTS SELECT 9, 10 ``` we will get the following error: ``` unresolved operator 'InsertIntoTable Relation[a#425,d#426,b#427,c#428] parquet, Map(b -> Some(2), c -> Some(3)), true, true;; 'InsertIntoTable Relation[a#425,d#426,b#427,c#428] parquet, Map(b -> Some(2), c -> Some(3)), true, true +- Project [cast(9#423 as int) AS a#429, cast(10#424 as int) AS d#430] +- Project [9 AS 9#423, 10 AS 10#424] +- OneRowRelation$ ``` This PR is to fix the issue to follow the behavior of Hive serde tables > INSERT OVERWRITE will overwrite any existing data in the table or partition unless IF NOT EXISTS is provided for a partition ### How was this patch tested? Modified an existing test case Author: gatorsmile <gatorsmile@gmail.com> Closes #18050 from gatorsmile/insertPartitionIfNotExists.	2017-05-22 22:24:50 +08:00
caoxuewen	3c9eef35a8	[SPARK-20786][SQL] Improve ceil and floor handle the value which is not expected ## What changes were proposed in this pull request? spark-sql>SELECT ceil(1234567890123456); 1234567890123456 spark-sql>SELECT ceil(12345678901234567); 12345678901234568 spark-sql>SELECT ceil(123456789012345678); 123456789012345680 when the length of the getText is greater than 16. long to double will be precision loss. but mysql handle the value is ok. mysql> SELECT ceil(1234567890123456); +------------------------+ \| ceil(1234567890123456) \| +------------------------+ \| 1234567890123456 \| +------------------------+ 1 row in set (0.00 sec) mysql> SELECT ceil(12345678901234567); +-------------------------+ \| ceil(12345678901234567) \| +-------------------------+ \| 12345678901234567 \| +-------------------------+ 1 row in set (0.00 sec) mysql> SELECT ceil(123456789012345678); +--------------------------+ \| ceil(123456789012345678) \| +--------------------------+ \| 123456789012345678 \| +--------------------------+ 1 row in set (0.00 sec) ## How was this patch tested? Supplement the unit test. Author: caoxuewen <cao.xuewen@zte.com.cn> Closes #18016 from heary-cao/ceil_long.	2017-05-21 22:39:07 -07:00
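The precision loss reported in SPARK-20786 follows from routing a long through a double, which has only 53 bits of mantissa. A minimal Java sketch of that effect is shown below; the class and method names are hypothetical and the code does not reflect how the patch itself resolves the problem.

```java
// Hypothetical illustration; not code from the Spark patch.
final class LongCeilSketch {

    // Converts the long to double before ceil, as a double-based
    // implementation would; large values are rounded to the nearest double.
    static long ceilViaDouble(long v) {
        return (long) Math.ceil((double) v);
    }

    public static void main(String[] args) {
        long[] inputs = {1234567890123456L, 12345678901234567L, 123456789012345678L};
        for (long v : inputs) {
            System.out.println(v + " -> " + ceilViaDouble(v));
        }
    }
}
```

The first input fits in 53 bits and survives unchanged; the other two come back altered, matching the spark-sql output quoted in the commit message.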
liuxian	ea3b1e352a	[SPARK-20763][SQL] The functions `month` and `day` return unexpected values. ## What changes were proposed in this pull request? spark-sql>select month("1582-09-28"); spark-sql>10 For this case, the expected result is 9, but it is 10. spark-sql>select day("1582-04-18"); spark-sql>28 For this case, the expected result is 18, but it is 28. When the date is before "1582-10-04", the functions `month` and `day` return values that are not what we expect. ## How was this patch tested? unit tests Author: liuxian <liu.xian3@zte.com.cn> Closes #17997 from 10110346/wip_lx_0516.	2017-05-19 10:25:21 -07:00
Yuming Wang	bff021dfaf	[SPARK-20751][SQL] Add built-in SQL Function - COT ## What changes were proposed in this pull request? Add built-in SQL Function - COT. ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #17999 from wangyum/SPARK-20751.	2017-05-19 09:40:22 -07:00
Ala Luszczak	ce8edb8bf4	[SPARK-20798] GenerateUnsafeProjection should check if a value is null before calling the getter ## What changes were proposed in this pull request? GenerateUnsafeProjection.writeStructToBuffer() did not honor the assumption that the caller must make sure that a value is not null before using the getter. This could lead to various errors. This change fixes that behavior. Example of code generated before: ```scala final UTF8String fieldName = value.getUTF8String(0); if (value.isNullAt(0)) { rowWriter1.setNullAt(0); } else { rowWriter1.write(0, fieldName); } ``` Example of code generated now: ```scala boolean isNull1 = value.isNullAt(0); UTF8String value1 = isNull1 ? null : value.getUTF8String(0); if (isNull1) { rowWriter1.setNullAt(0); } else { rowWriter1.write(0, value1); } ``` ## How was this patch tested? Adds GenerateUnsafeProjectionSuite. Author: Ala Luszczak <ala@databricks.com> Closes #18030 from ala/fix-generate-unsafe-projection.	2017-05-19 13:18:48 +02:00
Xingbo Jiang	b7aac15d56	[SPARK-20700][SQL] InferFiltersFromConstraints stackoverflows for query (v2) ## What changes were proposed in this pull request? In the previous approach we used `aliasMap` to link an `Attribute` to the expression with potentially the form `f(a, b)`, but we only searched the `expressions` and `children.expressions` for this, which is not enough when an `Alias` may lies deep in the logical plan. In that case, we can't generate the valid equivalent constraint classes and thus we fail at preventing the recursive deductions. We fix this problem by collecting all `Alias`s from the logical plan. ## How was this patch tested? No additional test case is added, but do modified one test case to cover this situation. Author: Xingbo Jiang <xingbo.jiang@databricks.com> Closes #18020 from jiangxb1987/inferConstrants.	2017-05-17 23:32:31 -07:00
Liang-Chi Hsieh	7463a88be6	[SPARK-20690][SQL] Subqueries in FROM should have alias names ## What changes were proposed in this pull request? We add missing attributes into Filter in Analyzer. But we shouldn't do it through subqueries like this: select 1 from (select 1 from onerow t1 LIMIT 1) where t1.c1=1 This query works in current codebase. However, the outside where clause shouldn't be able to refer `t1.c1` attribute. The root cause is we allow subqueries in FROM have no alias names previously, it is confusing and isn't supported by various databases such as MySQL, Postgres, Oracle. We shouldn't support it too. ## How was this patch tested? Jenkins tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #17935 from viirya/SPARK-20690.	2017-05-17 12:57:35 +08:00
Herman van Hovell	69bb7715f9	[SQL][TRIVIAL] Lower parser log level to debug ## What changes were proposed in this pull request? Currently the parser logs the query it is parsing at `info` level. This is too high, this PR lowers the log level to `debug`. ## How was this patch tested? Existing tests. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #18006 from hvanhovell/lower_parser_log_level.	2017-05-16 15:58:50 -07:00
Kazuaki Ishizaki	6f62e9d9b9	[SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit ## What changes were proposed in this pull request? When an expression for `df.filter()` has many nodes (e.g. 400), the size of the Java bytecode for the generated Java code is more than 64KB. It produces a Java exception. As a result, the execution fails. With this PR, execution continues by calling `Expression.eval()` with code generation disabled if an exception has been caught. ## How was this patch tested? Add a test suite into `DataFrameSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #17087 from kiszk/SPARK-19372.	2017-05-16 14:47:21 -07:00
Takuya UESHIN	c8c878a416	[SPARK-20588][SQL] Cache TimeZone instances. ## What changes were proposed in this pull request? Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck. This especially happens when casting from string value containing timezone info to timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets TimeZone instance on the site. This pr makes a cache of the generated TimeZone instances to avoid the synchronization. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <ueshin@databricks.com> Closes #17933 from ueshin/issues/SPARK-20588.	2017-05-15 16:52:22 -07:00
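The caching idea in SPARK-20588 can be sketched with a concurrent map keyed by zone ID, so that repeated lookups avoid the lookup call the commit message describes as synchronized. This is a minimal sketch under assumptions: the class name `TimeZoneCacheSketch` and its `get` method are hypothetical, and the actual patch may structure its cache differently.

```java
import java.util.TimeZone;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not code from the Spark patch.
final class TimeZoneCacheSketch {

    // One shared instance per zone ID; lookups of an existing key do not lock.
    private static final ConcurrentHashMap<String, TimeZone> CACHE =
            new ConcurrentHashMap<>();

    // Returns the cached TimeZone for the ID, creating it on first use.
    // Callers must not mutate the returned instance, because it is shared.
    static TimeZone get(String id) {
        return CACHE.computeIfAbsent(id, TimeZone::getTimeZone);
    }

    public static void main(String[] args) {
        TimeZone first = get("UTC");
        TimeZone second = get("UTC");
        System.out.println(first == second); // same cached instance
    }
}
```

A design note: `TimeZone` objects are mutable, so a shared cache like this is only safe if every caller treats the returned instance as read-only.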
Takeshi Yamamuro	b0888d1ac3	[SPARK-20730][SQL] Add an optimizer rule to combine nested Concat ## What changes were proposed in this pull request? This pr added a new Optimizer rule to combine nested Concat. The master supports a pipeline operator '\|\|' to concatenate strings in #17711 (This pr is follow-up). Since the parser currently generates nested Concat expressions, the optimizer needs to combine the nested expressions. ## How was this patch tested? Added tests in `CombineConcatSuite` and `SQLQueryTestSuite`. Author: Takeshi Yamamuro <yamamuro@apache.org> Closes #17970 from maropu/SPARK-20730.	2017-05-15 16:24:55 +08:00


2297 commits