### What changes were proposed in this pull request?
This patch bumps the master branch version to 3.1.0-SNAPSHOT.
### Why are the changes needed?
N/A
### Does this PR introduce any user-facing change?
N/A
### How was this patch tested?
N/A
Closes#27698 from gatorsmile/updateVersion.
Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
We should also expose it in the documentation, as we marked it as an unstable API in SPARK-30547.
Note that the Javadoc -> Scaladoc conversion does not seem to work, but fixing that is not the target of this PR.
### Why are the changes needed?
To show the documentation of the API.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually built the docs via `jekyll serve` under the `docs` directory:
![Screen Shot 2020-01-31 at 4 04 15 PM](https://user-images.githubusercontent.com/6477701/73519315-12143300-4444-11ea-9260-070c9f672dde.png)
Closes#27412 from HyukjinKwon/SPARK-30547.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This reverts https://github.com/apache/spark/pull/26418 and files a new ticket under https://issues.apache.org/jira/browse/SPARK-30546 for better tracking of interval behavior.
### Why are the changes needed?
Revert the interval ISO/ANSI SQL standard output, since we decided not to follow ANSI here and there is no round trip.
### Does this PR introduce any user-facing change?
No; the reverted change has not been released yet.
### How was this patch tested?
Existing unit tests.
Closes#27304 from yaooqinn/SPARK-30593.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Mark `CalendarInterval` class with `since 3.0.0`.
### Why are the changes needed?
https://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#since
This class is going public for the first time, so this is the first time the annotation is being added, and we don't want people to get confused and try to use it in 2.4.x.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
No tests needed; this is an annotation-only change.
Closes#27299 from yaooqinn/SPARK-30547-F.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
`CalendarInterval` is maintained as a private class but can be used by users in a public way, e.g.:
```scala
scala> spark.udf.register("getIntervalMonth", (_:org.apache.spark.unsafe.types.CalendarInterval).months)
scala> sql("select interval 2 month 1 day a").selectExpr("getIntervalMonth(a)").show
+-------------------+
|getIntervalMonth(a)|
+-------------------+
| 2|
+-------------------+
```
It has existed since 1.5.0; now that we are entering the 3.x era, it may be time to make it public.
### Why are the changes needed?
To make the interval type more future-proof.
### Does this PR introduce any user-facing change?
Documentation change only.
### How was this patch tested?
Added a unit test.
Closes#27258 from yaooqinn/SPARK-30547.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Since we are not going to follow ANSI in implementing separate year-month and day-time interval types, it is odd for our current interval implementation to compare the year-month part against the day-time part.
Additionally, the current ordering logic comes from PostgreSQL, where the implementation of intervals is messy, and we are not aiming for PostgreSQL compliance at all.
This PR reverts https://github.com/apache/spark/pull/26681 and https://github.com/apache/spark/pull/26337.
### Why are the changes needed?
To make the interval type more future-proof.
### Does this PR introduce any user-facing change?
No; the reverted changes are new in 3.0 and unreleased.
### How was this patch tested?
Existing unit tests should still pass.
Closes#27262 from yaooqinn/SPARK-30551.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
If `spark.sql.ansi.enabled` is set, throw an exception when a cast to any numeric type does not follow the ANSI SQL standard.
### Why are the changes needed?
The ANSI SQL standard does not allow invalid strings to be cast to numeric types; an exception should be thrown instead. Currently, Spark SQL returns NULL in such cases.
Before:
`select cast('str' as decimal) => NULL`
After:
`select cast('str' as decimal) => invalid input syntax for type numeric: str`
These results are after setting `spark.sql.ansi.enabled=true`
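For illustration, a minimal spark-shell sketch of the new behavior (a sketch, assuming only the conf named above):
```scala
// With ANSI mode on, casting an invalid string to a numeric type now
// throws an exception instead of silently returning NULL.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("select cast('str' as decimal)").show()
// expected: an exception like "invalid input syntax for type numeric: str"
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("select cast('str' as decimal)").show()   // NULL again
```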
### Does this PR introduce any user-facing change?
Yes. Now, when ANSI mode is on, users will get an arithmetic exception for invalid strings.
### How was this patch tested?
Unit Tests Added.
Closes#26933 from iRakson/castDecimalANSI.
Lead-authored-by: root1 <raksonrakesh@gmail.com>
Co-authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79
2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1
### Why are the changes needed?
Shouldn't change master.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master
Closes#26915 from wangyum/revert-master.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
### What changes were proposed in this pull request?
Columnar execution support for interval types
### Why are the changes needed?
To support caching tables with interval columns, and to improve performance.
### Does this PR introduce any user-facing change?
Yes, `CACHE TABLE` now accepts interval columns.
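A minimal sketch of what this enables (illustrative, not the PR's actual test):
```scala
// Caching a relation with a CalendarIntervalType column now goes through
// the columnar in-memory cache instead of being unsupported.
val df = spark.sql("select interval 1 day 2 hours as i")
df.cache()
df.show()
```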
### How was this patch tested?
Added unit tests.
Closes#26699 from yaooqinn/SPARK-30066.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Follow-up of #26134 to document the reason for adding the `days` field and explain how we use it.
### Why are the changes needed?
Comment-only change.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
No tests needed; comment-only change.
Closes#26701 from LinhongLiu/spark-29486-followup.
Authored-by: Liu,Linhong <liulinhong@baidu.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
### What changes were proposed in this pull request?
A Java-like string trim method trims all whitespace characters less than or equal to 0x20. Currently, our `UTF8String` handles only the space character (0x20). This is not suitable for many cases in Spark, such as trimming interval, date, and timestamp strings, or PostgreSQL-style casting of strings to booleans.
### Why are the changes needed?
To improve whitespace handling in `UTF8String`, with some related bugs fixed as well.
### Does this PR introduce any user-facing change?
Yes: strings with control characters at either end can now be converted to date/timestamp and interval values.
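A hedged illustration of the described behavior:
```scala
// Leading/trailing ASCII control characters (<= 0x20) no longer break
// string-to-date casts; previously only the plain space (0x20) was handled.
spark.sql("select cast('\t2019-11-18\n' as date)").show()
// expected: 2019-11-18 (NULL before this change)
```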
### How was this patch tested?
Added unit tests.
Closes#26626 from yaooqinn/SPARK-29986.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Modify `UTF8String.toInt/toLong` to trim spaces on both sides before converting to byte/short/int/long.
This kind of "cheap" trim helps improve performance when casting strings to integrals. The idea is from https://github.com/apache/spark/pull/24872#issuecomment-556917834
### Why are the changes needed?
To make the behavior consistent.
### Does this PR introduce any user-facing change?
Yes: casting strings to an integral type, and binary comparisons between strings and integrals, will trim spaces first. Their behavior will then be consistent with float and double.
### How was this patch tested?
1. Added unit tests.
2. Benchmark tests.
the benchmark is modified based on https://github.com/apache/spark/pull/24872#issuecomment-503827016
```scala
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.sql.execution.benchmark
import org.apache.spark.benchmark.Benchmark
/**
* Benchmark trim the string when casting string type to Boolean/Numeric types.
* To run this benchmark:
* {{{
* 1. without sbt:
* bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
* 2. build/sbt "sql/test:runMain <this class>"
* 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
* Results will be written to "benchmarks/CastBenchmark-results.txt".
* }}}
*/
object CastBenchmark extends SqlBasedBenchmark {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
val title = "Cast String to Integral"
runBenchmark(title) {
withTempPath { dir =>
val N = 500L << 14
val df = spark.range(N)
val types = Seq("int", "long")
(1 to 5).by(2).foreach { i =>
df.selectExpr(s"concat(id, '${" " * i}') as str")
.write.mode("overwrite").parquet(dir + i.toString)
}
val benchmark = new Benchmark(title, N, minNumIters = 5, output = output)
Seq(true, false).foreach { trim =>
types.foreach { t =>
val str = if (trim) "trim(str)" else "str"
val expr = s"cast($str as $t) as c_$t"
(1 to 5).by(2).foreach { i =>
benchmark.addCase(expr + s" - with $i spaces") { _ =>
spark.read.parquet(dir + i.toString).selectExpr(expr).collect()
}
}
}
}
benchmark.run()
}
}
}
}
```
#### Benchmark results
Normal trim vs. trim in toInt/toLong:
```java
================================================================================================
Cast String to Integral
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1
Intel(R) Core(TM) i5-5287U CPU 2.90GHz
Cast String to Integral: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
cast(trim(str) as int) as c_int - with 1 spaces 10220 12994 1337 0.8 1247.5 1.0X
cast(trim(str) as int) as c_int - with 3 spaces 4763 8356 357 1.7 581.4 2.1X
cast(trim(str) as int) as c_int - with 5 spaces 4791 8042 NaN 1.7 584.9 2.1X
cast(trim(str) as long) as c_long - with 1 spaces 4014 6755 NaN 2.0 490.0 2.5X
cast(trim(str) as long) as c_long - with 3 spaces 4737 6938 NaN 1.7 578.2 2.2X
cast(trim(str) as long) as c_long - with 5 spaces 4478 6919 1404 1.8 546.6 2.3X
cast(str as int) as c_int - with 1 spaces 4443 6222 NaN 1.8 542.3 2.3X
cast(str as int) as c_int - with 3 spaces 3659 3842 170 2.2 446.7 2.8X
cast(str as int) as c_int - with 5 spaces 4372 7996 NaN 1.9 533.7 2.3X
cast(str as long) as c_long - with 1 spaces 3866 5838 NaN 2.1 471.9 2.6X
cast(str as long) as c_long - with 3 spaces 3793 5449 NaN 2.2 463.0 2.7X
cast(str as long) as c_long - with 5 spaces 4947 5961 1198 1.7 603.9 2.1X
```
Closes#26622 from yaooqinn/cheapstringtrim.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/26418. That PR removed `CalendarInterval`'s `toString` along with some unfinished changes.
### Why are the changes needed?
1. Ideally we should make each PR isolated and separate targeting one issue without touching unrelated codes.
2. There are some other places where the string formats were exposed to users. For example:
```scala
scala> sql("select interval 1 days as a").selectExpr("to_csv(struct(a))").show()
```
```
+--------------------------+
|to_csv(named_struct(a, a))|
+--------------------------+
| "CalendarInterval...|
+--------------------------+
```
3. Such fixes:
```diff
private def writeMapData(
map: MapData, mapType: MapType, fieldWriter: ValueWriter): Unit = {
val keyArray = map.keyArray()
+ val keyString = mapType.keyType match {
+ case CalendarIntervalType =>
+ (i: Int) => IntervalUtils.toMultiUnitsString(keyArray.getInterval(i))
+ case _ => (i: Int) => keyArray.get(i, mapType.keyType).toString
+ }
```
can cause a performance regression due to type dispatch for each map.
### Does this PR introduce any user-facing change?
Yes, see 2. case above.
### How was this patch tested?
Manually tested.
Closes#26572 from HyukjinKwon/SPARK-29783.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add 3 interval output styles named `SQL_STANDARD`, `ISO_8601`, and `MULTI_UNITS`, along with a new conf `spark.sql.dialect.intervalOutputStyle`. The `MULTI_UNITS` style displays interval values with the former behavior and is the default. The newly added `SQL_STANDARD` and `ISO_8601` styles are shown in the following table.
Style | Conf value | Year-Month Interval | Day-Time Interval | Mixed Interval
-- | -- | -- | -- | --
Format With Time Unit Designators | MULTI_UNITS | 1 year 2 mons | 1 days 2 hours 3 minutes 4.123456 seconds | interval 1 days 2 hours 3 minutes 4.123456 seconds
SQL Standard | SQL_STANDARD | 1-2 | 3 4:05:06 | -1-2 3 -4:05:06
ISO 8601 Basic Format | ISO_8601 | P1Y2M | P3DT4H5M6S | P-1Y-2M3D-4H-5M-6S
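A hedged spark-shell sketch of switching styles (conf name as proposed above):
```scala
// MULTI_UNITS is the default; switching to SQL_STANDARD changes the
// rendering of interval values, per the table above.
spark.conf.set("spark.sql.dialect.intervalOutputStyle", "SQL_STANDARD")
spark.sql("select interval 1 year 2 months as i").show()
// expected rendering: 1-2 (instead of "1 years 2 months")
```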
### Why are the changes needed?
for ANSI SQL support
### Does this PR introduce any user-facing change?
Yes, interval output now has 3 styles.
### How was this patch tested?
add new unit tests
Closes#26418 from yaooqinn/SPARK-29783.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Add interval type support for `>`, `>=`, `<`, `<=`, `=`, `<=>`, `ORDER BY`, `min`, `max`, etc.
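A quick sketch of what is now supported:
```scala
spark.sql("select interval 1 day > interval 23 hours").show()   // true
spark.sql("select max(i) from values (interval 1 day), (interval 2 days) as t(i)").show()
spark.sql("select i from values (interval 1 day), (interval 2 days) as t(i) order by i").show()
```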
### Why are the changes needed?
Part of SPARK-27764 Feature Parity between PostgreSQL and Spark
### Does this PR introduce any user-facing change?
Yes, we now support comparing intervals.
### How was this patch tested?
Added unit tests.
Closes#26337 from yaooqinn/SPARK-29679.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Move the `add`/`subtract`/`negate` methods from `CalendarInterval` to `IntervalUtils`.
### Why are the changes needed?
Suggested in https://github.com/apache/spark/pull/26410#discussion_r343125468.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
Added unit tests and moved some existing ones.
Closes#26423 from yaooqinn/SPARK-29787.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
```java
public static final int YEARS_PER_DECADE = 10;
public static final int YEARS_PER_CENTURY = 100;
public static final int YEARS_PER_MILLENNIUM = 1000;
public static final byte MONTHS_PER_QUARTER = 3;
public static final int MONTHS_PER_YEAR = 12;
public static final byte DAYS_PER_WEEK = 7;
public static final long DAYS_PER_MONTH = 30L;
public static final long HOURS_PER_DAY = 24L;
public static final long MINUTES_PER_HOUR = 60L;
public static final long SECONDS_PER_MINUTE = 60L;
public static final long SECONDS_PER_HOUR = MINUTES_PER_HOUR * SECONDS_PER_MINUTE;
public static final long SECONDS_PER_DAY = HOURS_PER_DAY * SECONDS_PER_HOUR;
public static final long MILLIS_PER_SECOND = 1000L;
public static final long MILLIS_PER_MINUTE = SECONDS_PER_MINUTE * MILLIS_PER_SECOND;
public static final long MILLIS_PER_HOUR = MINUTES_PER_HOUR * MILLIS_PER_MINUTE;
public static final long MILLIS_PER_DAY = HOURS_PER_DAY * MILLIS_PER_HOUR;
public static final long MICROS_PER_MILLIS = 1000L;
public static final long MICROS_PER_SECOND = MILLIS_PER_SECOND * MICROS_PER_MILLIS;
public static final long MICROS_PER_MINUTE = SECONDS_PER_MINUTE * MICROS_PER_SECOND;
public static final long MICROS_PER_HOUR = MINUTES_PER_HOUR * MICROS_PER_MINUTE;
public static final long MICROS_PER_DAY = HOURS_PER_DAY * MICROS_PER_HOUR;
public static final long MICROS_PER_MONTH = DAYS_PER_MONTH * MICROS_PER_DAY;
/* 365.25 days per year assumes leap year every four years */
public static final long MICROS_PER_YEAR = (36525L * MICROS_PER_DAY) / 100;
public static final long NANOS_PER_MICROS = 1000L;
public static final long NANOS_PER_MILLIS = MICROS_PER_MILLIS * NANOS_PER_MICROS;
public static final long NANOS_PER_SECOND = MILLIS_PER_SECOND * NANOS_PER_MILLIS;
```
The above constants are defined across `IntervalUtils`, `DateTimeUtils`, and `CalendarInterval`; some are redundant and some are cross-referenced.
### Why are the changes needed?
To simplify the code, enhance consistency, and reduce risk.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
Modified unit tests.
Closes#26399 from yaooqinn/SPARK-29757.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
remove the leading "interval" in `CalendarInterval.toString`.
### Why are the changes needed?
Although the "interval" prefix is allowed when casting a string to an interval, it's not recommended.
This is also consistent with pgsql:
```
cloud0fan=# select interval '1' day;
interval
----------
1 day
(1 row)
```
### Does this PR introduce any user-facing change?
Yes: when displaying a DataFrame with an interval-type column, the result is different.
### How was this patch tested?
updated tests.
Closes#26401 from cloud-fan/interval.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In the PR, I propose new function `stringToInterval()` in `IntervalUtils` for converting `UTF8String` to `CalendarInterval`. The function is used in casting a `STRING` column to an `INTERVAL` column.
### Why are the changes needed?
The proposed implementation is ~10 times faster. For example, parsing 9 interval units on JDK 8:
Before:
```
9 units w/ interval 14004 14125 116 0.1 14003.6 0.0X
9 units w/o interval 13785 14056 290 0.1 13784.9 0.0X
```
After:
```
9 units w/ interval 1343 1344 1 0.7 1343.0 0.3X
9 units w/o interval 1345 1349 8 0.7 1344.6 0.3X
```
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
- By new tests for `stringToInterval` in `IntervalUtilsSuite`
- By existing tests
Closes#26256 from MaxGekk/string-to-interval.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
I propose 2 new methods for `CalendarInterval`:
- `extractAsPeriod()` returns the date part of an interval as an instance of `java.time.Period`
- `extractAsDuration()` returns the time part of an interval as an instance of `java.time.Duration`
For example:
```scala
scala> import org.apache.spark.unsafe.types.CalendarInterval
scala> import java.time._
scala> val i = spark.sql("select interval 1 year 3 months 4 days 10 hours 30 seconds").collect()(0).getAs[CalendarInterval](0)
scala> LocalDate.of(2019, 11, 1).plus(i.extractAsPeriod())
res8: java.time.LocalDate = 2021-02-05
scala> ZonedDateTime.parse("2019-11-01T12:13:14Z").plus(i.extractAsPeriod()).plus(i.extractAsDuration())
res9: java.time.ZonedDateTime = 2021-02-05T22:13:44Z
```
### Why are the changes needed?
Taking into account that `CalendarInterval` has already been partially exposed to users via the collect operation, and will probably be fully exposed in the future, it could be convenient for users to get the date and time parts of intervals as Java classes:
- to avoid unnecessary dependency from Spark's classes in user code
- to easily use external libraries that accept standard Java classes.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By new test in `CalendarIntervalSuite`.
Closes#26368 from MaxGekk/interval-java-period-duration.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
In this PR, I propose to change `CalendarInterval.toString`:
- to skip the `week` unit
- to render `milliseconds` and `microseconds` as the fractional part of the `seconds` unit.
### Why are the changes needed?
To improve readability.
### Does this PR introduce any user-facing change?
Yes
### How was this patch tested?
- By `CalendarIntervalSuite` and `IntervalUtilsSuite`
- `literals.sql`, `datetime.sql` and `interval.sql`
Closes#26367 from MaxGekk/interval-to-string-format.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
The `assertEquals` method of JUnit Assert requires the first parameter to be the expected value. In this PR, I propose to change the order of parameters when the expected value is passed as the second parameter.
### Why are the changes needed?
The wrong order of assert parameters is confusing when the assert fails and the parameters have a special string representation. For example:
```java
assertEquals(input1.add(input2), new CalendarInterval(5, 5, 367200000000L));
```
```
java.lang.AssertionError:
Expected :interval 5 months 5 days 101 hours
Actual :interval 5 months 5 days 102 hours
```
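With the order fixed, the expected value comes first: `assertEquals(new CalendarInterval(5, 5, 367200000000L), input1.add(input2));`. The Expected/Actual labels in the failure message then match reality.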
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By existing tests.
Closes#26377 from MaxGekk/fix-order-in-assert-equals.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
The current `CalendarInterval` has 2 fields: months and microseconds. This PR changes it to 3 fields: months, days, and microseconds. This is because one logical day interval may have a different number of microseconds (daylight saving time).
### Why are the changes needed?
One logical day interval may have a different number of microseconds (daylight saving time). For example, in the PST timezone, there are 25 hours from 2019-11-02 12:00:00 to 2019-11-03 12:00:00.
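A hedged java.time sketch of the motivating case:
```scala
import java.time._
// In America/Los_Angeles (PST/PDT), 2019-11-03 has 25 hours because of
// the DST fall-back, so "plus 1 day" and "plus 24 hours" differ.
val zone  = ZoneId.of("America/Los_Angeles")
val start = ZonedDateTime.of(2019, 11, 2, 12, 0, 0, 0, zone)
val end   = start.plusDays(1)                 // same wall-clock time next day
Duration.between(start, end).toHours          // 25, not 24
```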
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
Unit tests and newly added test cases.
Closes#26134 from LinhongLiu/calendarinterval.
Authored-by: Liu,Linhong <liulinhong@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
To push the built jars to the Maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.
Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the sparkR version number check logic to allow jvm version like `3.0.0-preview`
**Please note these changes were generated by the release script in the past, but since this time we manually added tags on the master branch, we need to manually apply these changes too.**
We shall revert the changes after the 3.0.0-preview release has passed.
### Why are the changes needed?
To make the Maven release repository accept the built jars.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
N/A
### What changes were proposed in this pull request?
In the PR, I propose to move all static methods from the `CalendarInterval` class to the `IntervalUtils` object. All those methods are rewritten from Java to Scala.
### Why are the changes needed?
- For consistency with other helper methods. Such methods were placed to the helper object `IntervalUtils`, see https://github.com/apache/spark/pull/26190
- Taking into account that `CalendarInterval` will be fully exposed to users in the future (see https://github.com/apache/spark/pull/25022), it would be nice to clean it up by moving service methods to an internal object.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
- By moved tests from `CalendarIntervalSuite` to `IntervalUtilsSuite`
- By existing test suites
Closes#26261 from MaxGekk/refactoring-calendar-interval.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
To push the built jars to the Maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.
Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the PySpark version from `3.0.0.dev0` to `3.0.0`
**Please note these changes were generated by the release script in the past, but since this time we manually added tags on the master branch, we need to manually apply these changes too.**
We shall revert the changes after the 3.0.0-preview release has passed.
### Why are the changes needed?
To make the Maven release repository accept the built jars.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
N/A
Closes#26243 from jiangxb1987/3.0.0-preview-prepare.
Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
### What changes were proposed in this pull request?
Only use antlr4 to parse the interval string, and remove the duplicated parsing logic from `CalendarInterval`.
### Why are the changes needed?
Simplify the code and fix inconsistent behaviors.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Pass the Jenkins with the updated test cases.
Closes#26190 from cloud-fan/parser.
Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
In this PR, I propose to move interval parsing to `CalendarInterval.fromCaseInsensitiveString()`, which throws an `IllegalArgumentException` for invalid strings, and to reuse it from `CalendarInterval.fromString()`. The latter catches only `IllegalArgumentException` and returns `NULL` for invalid interval strings. This allows supporting interval strings without the `interval` prefix when casting strings to intervals and in the interval type constructor, because both use `fromString()` to parse interval strings.
For example:
```sql
spark-sql> select cast('1 year 10 days' as interval);
interval 1 years 1 weeks 3 days
spark-sql> SELECT INTERVAL '1 YEAR 10 DAYS';
interval 1 years 1 weeks 3 days
```
### Why are the changes needed?
To maintain feature parity with PostgreSQL, which supports interval strings without the prefix:
```sql
# select interval '2 months 1 microsecond';
interval
------------------------
2 mons 00:00:00.000001
```
and to improve Spark SQL UX.
### Does this PR introduce any user-facing change?
Yes; previously, parsing interval strings without the `interval` prefix gave `NULL`:
```sql
spark-sql> select interval '2 months 1 microsecond';
NULL
```
After:
```sql
spark-sql> select interval '2 months 1 microsecond';
interval 2 months 1 microseconds
```
### How was this patch tested?
- Added new tests to `CalendarIntervalSuite.java`
- A test for casting strings to intervals in `CastSuite`
- Test for interval type constructor from strings in `ExpressionParserSuite`
Closes#26079 from MaxGekk/interval-str-without-prefix.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In this PR, I propose to pass the `Pattern.CASE_INSENSITIVE` flag while compiling interval patterns in `CalendarInterval`. This makes casting string values to intervals case-insensitive, tolerating any case of `interval`, `year(s)`, `month(s)`, `week(s)`, `day(s)`, `hour(s)`, `minute(s)`, `second(s)`, `millisecond(s)`, and `microsecond(s)`.
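A minimal sketch of the flag's effect (plain `java.util.regex`, not Spark's exact pattern):
```scala
import java.util.regex.Pattern
// Without the flag, "10 Days" fails to match a lower-case unit pattern;
// with CASE_INSENSITIVE it matches regardless of case.
val p = Pattern.compile("(\\d+)\\s+days?", Pattern.CASE_INSENSITIVE)
p.matcher("10 Days").matches()     // true
p.matcher("10 DAYS").matches()     // true
```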
### Why are the changes needed?
There are at least 2 reasons:
- To maintain feature parity with PostgreSQL which is not sensitive to case:
```sql
# select cast('10 Days' as INTERVAL);
interval
----------
10 days
(1 row)
```
- Spark is tolerant to case of interval literals. Case insensitivity in casting should be convenient for Spark users.
```sql
spark-sql> SELECT INTERVAL 1 YEAR 1 WEEK;
interval 1 years 1 weeks
```
### Does this PR introduce any user-facing change?
Yes; the current implementation produces `NULL` when `interval`, `year`, ..., `microsecond` are not in lower case.
Before:
```sql
spark-sql> SELECT CAST('INTERVAL 10 DAYS' as INTERVAL);
NULL
```
After:
```sql
spark-sql> SELECT CAST('INTERVAL 10 DAYS' as INTERVAL);
interval 1 weeks 3 days
```
### How was this patch tested?
- by new tests in `CalendarIntervalSuite.java`
- new test in `CastSuite`
Closes#26010 from MaxGekk/interval-case-insensitive.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to remove `scalatest` deprecation warnings with the following changes.
- `org.scalatest.mockito.MockitoSugar` -> `org.scalatestplus.mockito.MockitoSugar`
- `org.scalatest.selenium.WebBrowser` -> `org.scalatestplus.selenium.WebBrowser`
- `org.scalatest.prop.Checkers` -> `org.scalatestplus.scalacheck.Checkers`
- `org.scalatest.prop.GeneratorDrivenPropertyChecks` -> `org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks`
### Why are the changes needed?
According to the Jenkins logs, there are 118 warnings about this.
```
grep "is deprecated" ~/consoleText | grep scalatest | wc -l
118
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
After Jenkins passes, we need to check the Jenkins log.
Closes#25982 from dongjoon-hyun/SPARK-29307.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR fixes typos in comments and replaces explicit type arguments with the diamond operator `<>` for Java 8+ (e.g. `new HashMap<>()` instead of `new HashMap<String, Integer>()`).
## How was this patch tested?
Manually tested.
Closes#25338 from younggyuchun/younggyu.
Authored-by: younggyu chun <younggyuchun@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This patch keeps things consistent wherever the UTF-8 charset is needed, using `StandardCharsets.UTF_8` instead of the string "UTF-8". If the String type is needed, `StandardCharsets.UTF_8.name()` is used.
This change also brings the benefit of getting rid of `UnsupportedEncodingException`, as we're providing `Charset` instead of `String` whenever possible.
This also changes some private Catalyst helper methods to operate on encodings as `Charset` objects rather than strings.
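A small sketch of the pattern being standardized on:
```scala
import java.nio.charset.StandardCharsets
// Passing a Charset avoids the checked UnsupportedEncodingException that
// the String-based overloads declare, and removes the magic "UTF-8" literal.
val bytes = "héllo".getBytes(StandardCharsets.UTF_8)
val str   = new String(bytes, StandardCharsets.UTF_8)
val name  = StandardCharsets.UTF_8.name()   // "UTF-8", when a String is required
```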
## How was this patch tested?
Existing unit tests.
Closes#25335 from HeartSaVioR/SPARK-28601.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
The interval conversion behavior is the same as PostgreSQL's.
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/interval.sql#L180-L203
## How was this patch tested?
Unit tests.
Closes#25000 from lipzhu/SPARK-28107.
Lead-authored-by: Zhu, Lipeng <lipzhu@ebay.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Lipeng Zhu <lipzhu@icloud.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
The sub-second part of the interval should be padded before parsing. Currently, Spark gives a correct value only when there are exactly 9 digits after the decimal point `.`.
```
spark-sql> select interval '0 0:0:0.123456789' day to second;
interval 123 milliseconds 456 microseconds
spark-sql> select interval '0 0:0:0.12345678' day to second;
interval 12 milliseconds 345 microseconds
spark-sql> select interval '0 0:0:0.1234' day to second;
interval 1 microseconds
```
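A sketch of the padding idea (illustrative only, not the patch's code):
```scala
// Right-pad the fractional part to 9 digits (nanoseconds) before parsing,
// so ".1234" means 123,400,000 ns rather than 1,234 ns.
def fractionToNanos(frac: String): Long =
  (frac + "0" * (9 - frac.length)).toLong

fractionToNanos("123456789")   // 123456789
fractionToNanos("1234")        // 123400000
```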
## How was this patch tested?
Pass the Jenkins with the fixed test cases.
Closes#25079 from dongjoon-hyun/SPARK-28308.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Continuation of https://github.com/apache/spark/pull/24788.
## What changes were proposed in this pull request?
The changes are related to big-endian systems. They are done to:
1. identify the s390x platform, and
2. use the BIG_ENDIAN byte order on big-endian systems.
The changes for item 2 are made in the access functions putFloats() and putDouble().
## How was this patch tested?
The changes were built successfully on both s390x and x86 platforms.
Closes#24861 from ketank-new/ketan_latest_v2.3.2.
Authored-by: ketank-new <ketan22584@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR significantly improves the performance of `UTF8String.replace()` by performing direct replacement over UTF8 bytes instead of decoding those bytes into Java Strings.
In cases where the search string is not found (i.e. no replacements are performed, a case which I expect to be common) this new implementation performs no object allocation or memory copying.
My implementation is modeled after `commons-lang3`'s `StringUtils.replace()` method. As part of my implementation, I needed a StringBuilder / resizable buffer, so I moved `UTF8StringBuilder` from the `catalyst` package to `unsafe`.
## How was this patch tested?
Copied tests from `StringExpressionSuite` to `UTF8StringSuite` and added a couple of new cases.
To evaluate performance, I did some quick local benchmarking by running the following code in `spark-shell` (with Java 1.8.0_191):
```scala
import org.apache.spark.unsafe.types.UTF8String
def benchmark(text: String, search: String, replace: String) {
val utf8Text = UTF8String.fromString(text)
val utf8Search = UTF8String.fromString(search)
val utf8Replace = UTF8String.fromString(replace)
val start = System.currentTimeMillis
var i = 0
while (i < 1000 * 1000 * 100) {
utf8Text.replace(utf8Search, utf8Replace)
i += 1
}
val end = System.currentTimeMillis
println(end - start)
}
benchmark("ABCDEFGH", "DEF", "ZZZZ") // replacement occurs
benchmark("ABCDEFGH", "Z", "") // no replacement occurs
```
On my laptop this took ~54 / ~40 seconds before this patch's changes and ~6.5 / ~3.8 seconds afterwards.
Closes#24707 from JoshRosen/faster-string-replace.
Authored-by: Josh Rosen <rosenville@gmail.com>
Signed-off-by: Josh Rosen <rosenville@gmail.com>
## What changes were proposed in this pull request?
UTF8String.trim() allocates a new object even if the string has no whitespace, when it can just return itself. A simple check for this case makes the method about 3x faster in the common case.
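A hedged sketch of the fast-path idea (not Spark's exact code):
```scala
// If neither end has a space, return the original reference instead of
// allocating a copy; otherwise copy only the trimmed range.
def fastTrim(bytes: Array[Byte]): Array[Byte] = {
  var start = 0
  var end = bytes.length - 1
  while (start <= end && bytes(start) == 0x20) start += 1
  while (end >= start && bytes(end) == 0x20) end -= 1
  if (start == 0 && end == bytes.length - 1) bytes   // common case: no allocation
  else java.util.Arrays.copyOfRange(bytes, start, end + 1)
}
```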
## How was this patch tested?
Existing tests.
A rough benchmark of 90% strings without whitespace (at ends), and 10% that do have whitespace, suggests the average runtime goes from 20 ns to 6 ns.
Closes#24884 from srowen/SPARK-28066.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Currently, Spark SQL supports an interval format like this:
```sql
SELECT INTERVAL '0 23:59:59.155' DAY TO SECOND
```
Like Presto/Teradata, this PR aims to support grammar like the following:
```sql
SELECT INTERVAL '23:59:59.155' HOUR TO SECOND
```
Although we could add a new function for this pattern, it is better to extend the existing code to handle the missing-day case. So the following are also supported:
```sql
SELECT INTERVAL '23:59:59.155' DAY TO SECOND
SELECT INTERVAL '1 23:59:59.155' HOUR TO SECOND
```
Currently Vertica/Teradata/PostgreSQL/SQL Server fully support the interval qualifiers below.
- interval ... year to month
- interval ... day to hour
- interval ... day to minute
- interval ... day to second
- interval ... hour to minute
- interval ... hour to second
- interval ... minute to second
References:
- https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Literals/interval-qualifier.htm
- https://github.com/postgres/postgres/blob/df1a699e5b/src/test/regress/sql/interval.sql#L180-L203
- https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/KdCtT3pYFo~_enc8~kGKVw
- https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-literals?view=sql-server-2017
## How was this patch tested?
Pass the Jenkins with the updated test cases.
Closes#24472 from lipzhu/SPARK-27578.
Lead-authored-by: Zhu, Lipeng <lipzhu@ebay.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Lipeng Zhu <lipzhu@icloud.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
This PR aims to remove the following `java.nio.Bits.unaligned` warning on JDK 9/10/11/12. Please note that there are more warnings, which are beyond this PR's scope. JDK 9+ shows only the first warning if you don't pass `--illegal-access=warn`.
**BEFORE (Among 5 warnings, there is `java.nio.Bits.unaligned` warning at the startup)**
```
$ bin/spark-shell --driver-java-options=--illegal-access=warn
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/APACHE/spark/assembly/target/scala-2.12/jars/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to method java.nio.Bits.unaligned()
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/APACHE/spark/assembly/target/scala-2.12/jars/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/APACHE/spark/assembly/target/scala-2.12/jars/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to field java.nio.DirectByteBuffer.cleaner
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/dhyun/APACHE/spark/assembly/target/scala-2.12/jars/hadoop-auth-2.7.4.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/dhyun/APACHE/spark/assembly/target/scala-2.12/jars/hadoop-auth-2.7.4.jar) to method sun.security.krb5.Config.getDefaultRealm()
19/06/08 11:01:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1560016882712).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT
/_/
Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 11.0.3)
```
**AFTER (Among 4 warnings, there is no `java.nio.Bits.unaligned` warning with `hadoop-2.7` profile)**
```
$ bin/spark-shell --driver-java-options=--illegal-access=warn
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/PRS/PLATFORM/assembly/target/scala-2.12/jars/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/PRS/PLATFORM/assembly/target/scala-2.12/jars/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to field java.nio.DirectByteBuffer.cleaner
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/dhyun/PRS/PLATFORM/assembly/target/scala-2.12/jars/hadoop-auth-2.7.4.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/dhyun/PRS/PLATFORM/assembly/target/scala-2.12/jars/hadoop-auth-2.7.4.jar) to method sun.security.krb5.Config.getDefaultRealm()
19/06/08 11:08:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1560017311171).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT
/_/
Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 11.0.3)
```
**AFTER (Among 2 warnings, there is no `java.nio.Bits.unaligned` warning with `hadoop-3.2` profile)**
```
$ bin/spark-shell --driver-java-options=--illegal-access=warn
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/PRS/PLATFORM/assembly/target/scala-2.12/jars/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/PRS/PLATFORM/assembly/target/scala-2.12/jars/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to field java.nio.DirectByteBuffer.cleaner
19/06/08 10:52:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1560016330287).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT
/_/
Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 11.0.3)
...
```
## How was this patch tested?
Manual. Run Spark command like `spark-shell` with `--driver-java-options=--illegal-access=warn` option in JDK9/10/11/12 environment.
Closes#24825 from dongjoon-hyun/SPARK-27981.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Some APIs in Structured Streaming require the user to specify an interval. Right now these APIs don't accept upper-case strings.
This PR adds a new method `fromCaseInsensitiveString` to `CalendarInterval` to support parsing upper-case strings, and fixes all APIs that need to parse an interval string.
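A hedged example of the kind of API affected (assuming `Trigger.ProcessingTime` is among them):
```scala
import org.apache.spark.sql.streaming.Trigger
// Interval strings are now parsed case-insensitively, so both forms
// below are accepted (the upper-case one used to fail to parse).
val t1 = Trigger.ProcessingTime("10 seconds")
val t2 = Trigger.ProcessingTime("10 SECONDS")
```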
## How was this patch tested?
The new unit test.
Closes#24619 from zsxwing/SPARK-27735.
Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
If the interval is `0`, neither the value `0` nor the unit is shown at all. For example, this happens in explain plans and in the Spark Web UI on the `EventTimeWatermark` diagram.
**BEFORE**
```scala
scala> spark.readStream.schema("ts timestamp").parquet("/tmp/t").withWatermark("ts", "1 microsecond").explain
== Physical Plan ==
EventTimeWatermark ts#0: timestamp, interval 1 microseconds
+- StreamingRelation FileSource[/tmp/t], [ts#0]
scala> spark.readStream.schema("ts timestamp").parquet("/tmp/t").withWatermark("ts", "0 microsecond").explain
== Physical Plan ==
EventTimeWatermark ts#3: timestamp, interval
+- StreamingRelation FileSource[/tmp/t], [ts#3]
```
**AFTER**
```scala
scala> spark.readStream.schema("ts timestamp").parquet("/tmp/t").withWatermark("ts", "1 microsecond").explain
== Physical Plan ==
EventTimeWatermark ts#0: timestamp, interval 1 microseconds
+- StreamingRelation FileSource[/tmp/t], [ts#0]
scala> spark.readStream.schema("ts timestamp").parquet("/tmp/t").withWatermark("ts", "0 microsecond").explain
== Physical Plan ==
EventTimeWatermark ts#3: timestamp, interval 0 microseconds
+- StreamingRelation FileSource[/tmp/t], [ts#3]
```
## How was this patch tested?
Pass the Jenkins with the updated test case.
Closes#24516 from dongjoon-hyun/SPARK-27624.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Fix typos in comments by replacing "in-heap" with "on-heap".
## How was this patch tested?
Existing Tests.
Closes#23533 from SongYadong/typos_inheap_to_onheap.
Authored-by: SongYadong <song.yadong1@zte.com.cn>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
In Java 9+ we can't use sun.misc.Cleaner by default anymore, and this was largely handled in https://github.com/apache/spark/pull/22993. However, I think the change there left a significant problem.
Now, if a DirectByteBuffer is allocated using the reflective hack in Platform, we can't set a Cleaner by default. But I believe this means the memory isn't freed promptly, or possibly at all. If a Cleaner can't be set, I think we need to use the normal APIs to allocate the direct ByteBuffer.
According to comments in the code, the downside is simply that the normal APIs will check and impose limits on how much off-heap memory can be allocated. Per the original review on https://github.com/apache/spark/pull/22993 this much seems fine, as either way in this case the user would have to add a JVM setting (increase max, or allow the reflective access).
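A sketch of the fallback described above:
```scala
import java.nio.ByteBuffer
// When a Cleaner can't be attached to a reflectively-created DirectByteBuffer,
// fall back to the standard API: it registers its own cleaner and enforces
// the -XX:MaxDirectMemorySize limit, which is the acceptable downside noted above.
val buffer = ByteBuffer.allocateDirect(1024)
```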
## How was this patch tested?
Existing tests. This resolved an OutOfMemoryError in Java 11 from TimSort tests without increasing test heap size. (See https://github.com/apache/spark/pull/23419#issuecomment-450772125 ) This suggests there is a problem and that this resolves it.
Closes#23424 from srowen/SPARK-24421.2.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
A followup of https://github.com/apache/spark/pull/23043
There are 4 places we need to deal with NaN and -0.0:
1. comparison expressions. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
2. Join keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
3. grouping keys. `-0.0` and `0.0` should be assigned to the same group. Different NaNs should be assigned to the same group.
4. window partition keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
Case 1 is OK. Our comparison already handles NaN and -0.0, and for struct/array/map, we recursively compare the fields/elements.
Cases 2, 3, and 4 are problematic, as they compare `UnsafeRow` binaries directly; different NaNs have different binary representations, and the same applies to -0.0 and 0.0.
To fix it, a simple solution is: normalize float/double when building unsafe data (`UnsafeRow`, `UnsafeArrayData`, `UnsafeMapData`). Then we don't need to worry about it anymore.
Following this direction, this PR moves the handling of NaN and -0.0 from `Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not handle them, which reduces the perf overhead. It's also easier to add comments explaining why we do it in `UnsafeWriter`.
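A sketch of the normalization rules described above (illustrative, not the writer's exact code):
```scala
// -0.0 becomes 0.0 and every NaN becomes the canonical NaN, so that
// byte-wise comparison of unsafe data matches value-wise comparison.
def normalize(d: Double): Double =
  if (d.isNaN) Double.NaN            // collapse all NaN bit patterns
  else if (d == -0.0d) 0.0d          // true for both 0.0 and -0.0; a no-op for 0.0
  else d
```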
## How was this patch tested?
existing tests
Closes#23239 from cloud-fan/minor.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request?
A followup of https://github.com/apache/spark/pull/23043. Add a test to show the minor behavior change introduced by #23043, and add a migration guide.
## How was this patch tested?
a new test
Closes#23141 from cloud-fan/follow.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
GROUP BY treats -0.0 and 0.0 as different values, which is unlike Hive's behavior.
In addition current behavior with codegen is unpredictable (see example in JIRA ticket).
## What changes were proposed in this pull request?
In `Platform.putDouble/putFloat()`, check if the value is -0.0 and, if so, replace it with 0.0.
These methods are used by `UnsafeRow`, so it won't contain -0.0 values.
## How was this patch tested?
Added tests
Closes#23043 from adoron/adoron-spark-26021-replace-minus-zero-with-zero.
Authored-by: Alon Doron <adoron@palantir.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>