[SPARK-31189][SQL][DOCS] Fix errors and missing parts for datetime pattern document

### What changes were proposed in this pull request?

Fix errors and missing parts in the datetime pattern document:
1. The pattern Spark uses is similar to `DateTimeFormatter` and `SimpleDateFormat` but identical to neither, so the API docs should link to our own documentation rather than to either of them.
2. Some pattern letters were missing from the table.
3. Some pattern letters are explicitly banned: `Set('A', 'c', 'e', 'n', 'N', 'p')`.
4. The second-fraction pattern uses different logic for parsing and formatting (see the sketch below).
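
For illustration, a minimal pyspark sketch of point 4, assuming Spark 3.0+ and a running `SparkSession` named `spark` (the data is made up):

```python
from pyspark.sql.functions import col, date_format, to_timestamp

df = spark.createDataFrame([("2020-03-20 10:30:00.1",)], ["t"])

# Parsing: a 6-'S' pattern accepts any fraction of 1 to 6 digits, so '.1' parses.
parsed = df.select(to_timestamp(col("t"), "yyyy-MM-dd HH:mm:ss.SSSSSS").alias("ts"))

# Formatting: the fraction is zero-padded to the pattern width -> '...10:30:00.100000'.
parsed.select(date_format(col("ts"), "yyyy-MM-dd HH:mm:ss.SSSSSS")).show(truncate=False)
```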

### Why are the changes needed?

Fix errors in and improve the datetime pattern documentation.

### Does this PR introduce any user-facing change?

Yes: new and updated documentation.

### How was this patch tested?

Passes Jenkins.
Viewed locally with `jekyll serve`:
![image](https://user-images.githubusercontent.com/8326978/77044447-6bd3bb00-69fa-11ea-8d6f-7084166c5dea.png)

Closes #27956 from yaooqinn/SPARK-31189.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Commit 88ae6c4481 (parent b402bc900a): 11 changed files with 155 additions and 66 deletions

docs/sql-ref-datetime-pattern.md

@ -21,11 +21,13 @@ license: |
There are several common scenarios for datetime usage in Spark:
- CSV/JSON datasources use the pattern string for parsing and formatting time content.
- CSV/JSON datasources use the pattern string for parsing and formatting datetime content.
- Datetime functions related to convert string to/from `DateType` or `TimestampType`. For example, unix_timestamp, date_format, to_unix_timestamp, from_unixtime, to_date, to_timestamp, from_utc_timestamp, to_utc_timestamp, etc.
- Datetime functions related to converting `StringType` to/from `DateType` or `TimestampType`.
For example, `unix_timestamp`, `date_format`, `to_unix_timestamp`, `from_unixtime`, `to_date`, `to_timestamp`, `from_utc_timestamp`, `to_utc_timestamp`, etc.
Spark uses pattern letters in the following table for date and timestamp parsing and formatting:
Spark uses the below letters in date and timestamp parsing and formatting:
<table class="table">
<tr>
<th> <b>Symbol</b> </th>
@ -52,7 +54,7 @@ Spark uses the below letters in date and timestamp parsing and formatting:
<td> 189 </td>
</tr>
<tr>
<td> <b>M</b> </td>
<td> <b>M/L</b> </td>
<td> month-of-year </td>
<td> number/text </td>
<td> 7; 07; Jul; July; J </td>
@ -63,6 +65,12 @@ Spark uses the below letters in date and timestamp parsing and formatting:
<td> number </td>
<td> 28 </td>
</tr>
<tr>
<td> <b>Q/q</b> </td>
<td> quarter-of-year </td>
<td> number/text </td>
<td> 3; 03; Q3; 3rd quarter </td>
</tr>
<tr>
<td> <b>Y</b> </td>
<td> week-based-year </td>
@ -147,6 +155,12 @@ Spark uses the below letters in date and timestamp parsing and formatting:
<td> fraction </td>
<td> 978 </td>
</tr>
<tr>
<td> <b>V</b> </td>
<td> time-zone ID </td>
<td> zone-id </td>
<td> America/Los_Angeles; Z; -08:30 </td>
</tr>
<tr>
<td> <b>z</b> </td>
<td> time-zone name </td>
@ -189,6 +203,18 @@ Spark uses the below letters in date and timestamp parsing and formatting:
<td> literal </td>
<td> ' </td>
</tr>
<tr>
<td> <b>[</b> </td>
<td> optional section start </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> <b>]</b> </td>
<td> optional section end </td>
<td> </td>
<td> </td>
</tr>
</table>
The count of pattern letters determines the format.
@ -199,11 +225,16 @@ The count of pattern letters determines the format.
- Number/Text: If the count of pattern letters is 3 or greater, use the Text rules above. Otherwise use the Number rules above.
- Fraction: Outputs the micro-of-second field as a fraction-of-second. The micro-of-second value has six digits, thus the count of pattern letters is from 1 to 6. If it is less than 6, then the micro-of-second value is truncated, with only the most significant digits being output.
- Fraction: Use one or more (up to 9) contiguous `'S'` characters, e.g. `SSSSSS`, to parse and format the fraction of a second.
For parsing, the acceptable fraction length is in the range [1, the number of contiguous 'S'].
For formatting, the fraction is padded with zeros to the number of contiguous 'S'.
Spark supports datetime values with microsecond precision, i.e. up to 6 significant fraction digits, but it can parse nanosecond fractions, truncating any digits beyond the sixth.
- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
- Zone names: This outputs the display name of the time-zone ID. If the count of letters is one, two or three, then the short name is output. If the count of letters is four, then the full name is output. Five or more letters will fail.
- Zone ID(V): This outputs the time-zone ID. The pattern letter count must be 2.
- Zone names(z): This outputs the textual display name of the time-zone ID. If the count of letters is one, two or three, then the short name is output. If the count of letters is four, then the full name is output. Five or more letters will fail.
- Offset X and x: This formats the offset based on the number of pattern letters. One letter outputs just the hour, such as '+01', unless the minute is non-zero in which case the minute is also output, such as '+0130'. Two letters outputs the hour and minute, without a colon, such as '+0130'. Three letters outputs the hour and minute, with a colon, such as '+01:30'. Four letters outputs the hour and minute and optional second, without a colon, such as '+013015'. Five letters outputs the hour and minute and optional second, with a colon, such as '+01:30:15'. Six or more letters will fail. Pattern letter 'X' (upper case) will output 'Z' when the offset to be output would be zero, whereas pattern letter 'x' (lower case) will output '+00', '+0000', or '+00:00'.
@ -211,6 +242,11 @@ The count of pattern letters determines the format.
- Offset Z: This formats the offset based on the number of pattern letters. One, two or three letters outputs the hour and minute, without a colon, such as '+0130'. The output will be '+0000' when the offset is zero. Four letters outputs the full form of localized offset, equivalent to four letters of Offset-O. The output will be the corresponding localized offset text if the offset is zero. Five letters outputs the hour, minute, with optional second if non-zero, with colon. It outputs 'Z' if the offset is zero. Six or more letters will fail.
- Optional section start and end: Use `[]` to define an optional section, which may itself be nested.
During formatting, all valid data will be output, even if it is in an optional section.
During parsing, the whole section may be missing from the parsed string.
An optional section starts with `[` and ends with `]` (or at the end of the pattern); see the sketch below.
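
A hedged sketch of the optional-section behavior, again assuming a running `SparkSession` named `spark`:

```python
from pyspark.sql.functions import col, to_timestamp

# The bracketed fraction is optional during parsing, so both strings
# parse with a single pattern; formatting always writes present fields.
df = spark.createDataFrame(
    [("2020-03-20 10:30:00",), ("2020-03-20 10:30:00.123",)], ["t"])
df.select(to_timestamp(col("t"), "yyyy-MM-dd HH:mm:ss[.SSS]").alias("ts")).show(truncate=False)
```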
More details for the text style:
- Short Form: Short text, typically an abbreviation. For example, day-of-week Monday might output "Mon".

python/pyspark/sql/functions.py

@ -973,8 +973,9 @@ def date_format(date, format):
format given by the second argument.
A pattern could be for instance `dd.MM.yyyy` and could return a string like '18.03.1993'. All
pattern letters of the Java class `java.time.format.DateTimeFormatter` can be used.
pattern letters of `datetime pattern`_ can be used.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
.. note:: Whenever possible, use specialized functions like `year`. These benefit from a
specialized implementation.
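
A hedged doctest-style sketch of `date_format` with a custom pattern (assumes the usual doctest `spark` session):

```python
>>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
>>> df.select(date_format('dt', 'MM/dd/yyyy').alias('date')).collect()
[Row(date='04/08/2015')]
```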
@ -1191,11 +1192,12 @@ def months_between(date1, date2, roundOff=True):
@since(2.2)
def to_date(col, format=None):
"""Converts a :class:`Column` into :class:`pyspark.sql.types.DateType`
using the optionally specified format. Specify formats according to
`DateTimeFormatter <https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html>`_. # noqa
using the optionally specified format. Specify formats according to `datetime pattern`_.
By default, it follows casting rules to :class:`pyspark.sql.types.DateType` if the format
is omitted. Equivalent to ``col.cast("date")``.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>> df.select(to_date(df.t).alias('date')).collect()
[Row(date=datetime.date(1997, 2, 28))]
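
A hedged sketch of the explicit-format variant, reusing `df` from the doctest above:

```python
>>> df.select(to_date(df.t, 'yyyy-MM-dd HH:mm:ss').alias('date')).collect()
[Row(date=datetime.date(1997, 2, 28))]
```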
@ -1215,11 +1217,12 @@ def to_date(col, format=None):
@since(2.2)
def to_timestamp(col, format=None):
"""Converts a :class:`Column` into :class:`pyspark.sql.types.TimestampType`
using the optionally specified format. Specify formats according to
`DateTimeFormatter <https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html>`_. # noqa
using the optionally specified format. Specify formats according to `datetime pattern`_.
By default, it follows casting rules to :class:`pyspark.sql.types.TimestampType` if the format
is omitted. Equivalent to ``col.cast("timestamp")``.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>> df.select(to_timestamp(df.t).alias('dt')).collect()
[Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]
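
Likewise for `to_timestamp`, a hedged explicit-format sketch with the same `df`:

```python
>>> df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()
[Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]
```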

python/pyspark/sql/readwriter.py

@ -221,12 +221,11 @@ class DataFrameReader(OptionUtils):
it uses the value specified in
``spark.sql.columnNameOfCorruptRecord``.
:param dateFormat: sets the string that indicates a date format. Custom date formats
follow the formats at ``java.time.format.DateTimeFormatter``. This
applies to date type. If None is set, it uses the
follow the formats at `datetime pattern`_.
This applies to date type. If None is set, it uses the
default value, ``yyyy-MM-dd``.
:param timestampFormat: sets the string that indicates a timestamp format.
Custom date formats follow the formats at
``java.time.format.DateTimeFormatter``.
Custom date formats follow the formats at `datetime pattern`_.
This applies to timestamp type. If None is set, it uses the
default value, ``yyyy-MM-dd'T'HH:mm:ss.SSSXXX``.
:param multiLine: parse one record, which may span multiple lines, per file. If None is
@ -255,6 +254,7 @@ class DataFrameReader(OptionUtils):
disables `partition discovery`_.
.. _partition discovery: /sql-data-sources-parquet.html#partition-discovery
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> df1 = spark.read.json('python/test_support/sql/people.json')
>>> df1.dtypes
@ -430,12 +430,11 @@ class DataFrameReader(OptionUtils):
:param negativeInf: sets the string representation of a negative infinity value. If None
is set, it uses the default value, ``Inf``.
:param dateFormat: sets the string that indicates a date format. Custom date formats
follow the formats at ``java.time.format.DateTimeFormatter``. This
applies to date type. If None is set, it uses the
follow the formats at `datetime pattern`_.
This applies to date type. If None is set, it uses the
default value, ``yyyy-MM-dd``.
:param timestampFormat: sets the string that indicates a timestamp format.
Custom date formats follow the formats at
``java.time.format.DateTimeFormatter``.
Custom date formats follow the formats at `datetime pattern`_.
This applies to timestamp type. If None is set, it uses the
default value, ``yyyy-MM-dd'T'HH:mm:ss.SSSXXX``.
:param maxColumns: defines a hard limit of how many columns a record can have. If None is
@ -491,6 +490,8 @@ class DataFrameReader(OptionUtils):
:param recursiveFileLookup: recursively scan a directory for files. Using this option
disables `partition discovery`_.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> df = spark.read.csv('python/test_support/sql/ages.csv')
>>> df.dtypes
[('_c0', 'string'), ('_c1', 'string')]
@ -850,12 +851,11 @@ class DataFrameWriter(OptionUtils):
known case-insensitive shorten names (none, bzip2, gzip, lz4,
snappy and deflate).
:param dateFormat: sets the string that indicates a date format. Custom date formats
follow the formats at ``java.time.format.DateTimeFormatter``. This
applies to date type. If None is set, it uses the
follow the formats at `datetime pattern`_.
This applies to date type. If None is set, it uses the
default value, ``yyyy-MM-dd``.
:param timestampFormat: sets the string that indicates a timestamp format.
Custom date formats follow the formats at
``java.time.format.DateTimeFormatter``.
Custom date formats follow the formats at `datetime pattern`_.
This applies to timestamp type. If None is set, it uses the
default value, ``yyyy-MM-dd'T'HH:mm:ss.SSSXXX``.
:param encoding: specifies encoding (charset) of saved json files. If None is set,
@ -865,6 +865,8 @@ class DataFrameWriter(OptionUtils):
:param ignoreNullFields: Whether to ignore null fields when generating JSON objects.
If None is set, it uses the default value, ``true``.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> df.write.json(os.path.join(tempfile.mkdtemp(), 'data'))
"""
self.mode(mode)
@ -954,13 +956,12 @@ class DataFrameWriter(OptionUtils):
the default value, ``false``.
:param nullValue: sets the string representation of a null value. If None is set, it uses
the default value, empty string.
:param dateFormat: sets the string that indicates a date format. Custom date formats
follow the formats at ``java.time.format.DateTimeFormatter``. This
applies to date type. If None is set, it uses the
:param dateFormat: sets the string that indicates a date format. Custom date formats follow
the formats at `datetime pattern`_.
This applies to date type. If None is set, it uses the
default value, ``yyyy-MM-dd``.
:param timestampFormat: sets the string that indicates a timestamp format.
Custom date formats follow the formats at
``java.time.format.DateTimeFormatter``.
Custom date formats follow the formats at `datetime pattern`_.
This applies to timestamp type. If None is set, it uses the
default value, ``yyyy-MM-dd'T'HH:mm:ss.SSSXXX``.
:param ignoreLeadingWhiteSpace: a flag indicating whether or not leading whitespaces from
@ -980,6 +981,8 @@ class DataFrameWriter(OptionUtils):
:param lineSep: defines the line separator that should be used for writing. If None is
set, it uses the default value, ``\\n``. Maximum length is 1 character.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> df.write.csv(os.path.join(tempfile.mkdtemp(), 'data'))
"""
self.mode(mode)
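
A hedged sketch of the `dateFormat` option on the batch CSV reader (assumes a `SparkSession` named `spark`; the schema and path are hypothetical):

```python
from pyspark.sql.types import DateType, StringType, StructField, StructType

# `dateFormat` applies to DateType columns and follows the datetime
# patterns linked above.
schema = StructType([
    StructField("name", StringType()),
    StructField("birthday", DateType()),
])
df = (spark.read
      .schema(schema)
      .option("dateFormat", "MM/dd/yyyy")  # custom date pattern
      .csv("/tmp/people.csv"))             # hypothetical input path
```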

python/pyspark/sql/streaming.py

@ -459,12 +459,11 @@ class DataStreamReader(OptionUtils):
it uses the value specified in
``spark.sql.columnNameOfCorruptRecord``.
:param dateFormat: sets the string that indicates a date format. Custom date formats
follow the formats at ``java.time.format.DateTimeFormatter``. This
applies to date type. If None is set, it uses the
follow the formats at `datetime pattern`_.
This applies to date type. If None is set, it uses the
default value, ``yyyy-MM-dd``.
:param timestampFormat: sets the string that indicates a timestamp format.
Custom date formats follow the formats at
``java.time.format.DateTimeFormatter``.
Custom date formats follow the formats at `datetime pattern`_.
This applies to timestamp type. If None is set, it uses the
default value, ``yyyy-MM-dd'T'HH:mm:ss.SSSXXX``.
:param multiLine: parse one record, which may span multiple lines, per file. If None is
@ -491,6 +490,7 @@ class DataStreamReader(OptionUtils):
disables `partition discovery`_.
.. _partition discovery: /sql-data-sources-parquet.html#partition-discovery
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> json_sdf = spark.readStream.json(tempfile.mkdtemp(), schema = sdf_schema)
>>> json_sdf.isStreaming
@ -671,12 +671,11 @@ class DataStreamReader(OptionUtils):
:param negativeInf: sets the string representation of a negative infinity value. If None
is set, it uses the default value, ``Inf``.
:param dateFormat: sets the string that indicates a date format. Custom date formats
follow the formats at ``java.time.format.DateTimeFormatter``. This
applies to date type. If None is set, it uses the
follow the formats at `datetime pattern`_.
This applies to date type. If None is set, it uses the
default value, ``yyyy-MM-dd``.
:param timestampFormat: sets the string that indicates a timestamp format.
Custom date formats follow the formats at
``java.time.format.DateTimeFormatter``.
Custom date formats follow the formats at `datetime pattern`_.
This applies to timestamp type. If None is set, it uses the
default value, ``yyyy-MM-dd'T'HH:mm:ss.SSSXXX``.
:param maxColumns: defines a hard limit of how many columns a record can have. If None is
@ -726,6 +725,8 @@ class DataStreamReader(OptionUtils):
:param recursiveFileLookup: recursively scan a directory for files. Using this option
disables `partition discovery`_.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
>>> csv_sdf = spark.readStream.csv(tempfile.mkdtemp(), schema = sdf_schema)
>>> csv_sdf.isStreaming
True
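
A hedged streaming counterpart (assumes a `SparkSession` named `spark`; the schema and directory are hypothetical):

```python
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# The same `timestampFormat` option applies to readStream sources.
sdf_schema = StructType([
    StructField("event", StringType()),
    StructField("time", TimestampType()),
])
csv_sdf = (spark.readStream
           .schema(sdf_schema)
           .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")  # custom pattern
           .csv("/tmp/stream-in"))                            # hypothetical dir
```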

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala

@ -603,7 +603,7 @@ case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCa
arguments = """
Arguments:
* timestamp - A date/timestamp or string to be converted to the given format.
* fmt - Date/time format pattern to follow. See `java.time.format.DateTimeFormatter` for valid date
* fmt - Date/time format pattern to follow. See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a> for valid date
and time format patterns.
""",
examples = """
@ -678,13 +678,14 @@ case class DateFormatClass(left: Expression, right: Expression, timeZoneId: Opti
* Converts time string with given pattern.
* Deterministic version of [[UnixTimestamp]], must have at least one parameter.
*/
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(timeExp[, format]) - Returns the UNIX timestamp of the given time.",
arguments = """
Arguments:
* timeExp - A date/timestamp or string which is returned as a UNIX timestamp.
* format - Date/time format pattern to follow. Ignored if `timeExp` is not a string.
Default value is "yyyy-MM-dd HH:mm:ss". See `java.time.format.DateTimeFormatter`
Default value is "yyyy-MM-dd HH:mm:ss". See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a>
for valid date and time format patterns.
""",
examples = """
@ -693,6 +694,7 @@ case class DateFormatClass(left: Expression, right: Expression, timeZoneId: Opti
1460098800
""",
since = "1.6.0")
// scalastyle:on line.size.limit
case class ToUnixTimestamp(
timeExp: Expression,
format: Expression,
@ -714,9 +716,10 @@ case class ToUnixTimestamp(
override def prettyName: String = "to_unix_timestamp"
}
// scalastyle:off line.size.limit
/**
* Converts time string with given pattern to Unix time stamp (in seconds), returns null if fail.
* See [https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html].
* See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a>.
* Note that hive Language Manual says it returns 0 if fail, but in fact it returns null.
* If the second parameter is missing, use "yyyy-MM-dd HH:mm:ss".
* If no parameters provided, the first parameter will be current_timestamp.
@ -729,7 +732,7 @@ case class ToUnixTimestamp(
Arguments:
* timeExp - A date/timestamp or string. If not provided, this defaults to current time.
* format - Date/time format pattern to follow. Ignored if `timeExp` is not a string.
Default value is "yyyy-MM-dd HH:mm:ss". See `java.time.format.DateTimeFormatter`
Default value is "yyyy-MM-dd HH:mm:ss". See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html"> Datetime Patterns</a>
for valid date and time format patterns.
""",
examples = """
@ -740,6 +743,7 @@ case class ToUnixTimestamp(
1460041200
""",
since = "1.5.0")
// scalastyle:on line.size.limit
case class UnixTimestamp(timeExp: Expression, format: Expression, timeZoneId: Option[String] = None)
extends UnixTime {
@ -917,12 +921,13 @@ abstract class UnixTime extends ToTimestamp {
* format. If the format is missing, using format like "1970-01-01 00:00:00".
* Note that hive Language Manual says it returns 0 if fail, but in fact it returns null.
*/
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(unix_time, format) - Returns `unix_time` in the specified `format`.",
arguments = """
Arguments:
* unix_time - UNIX Timestamp to be converted to the provided format.
* format - Date/time format pattern to follow. See `java.time.format.DateTimeFormatter`
* format - Date/time format pattern to follow. See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a>
for valid date and time format patterns.
""",
examples = """
@ -931,6 +936,7 @@ abstract class UnixTime extends ToTimestamp {
1969-12-31 16:00:00
""",
since = "1.5.0")
// scalastyle:on line.size.limit
case class FromUnixTime(sec: Expression, format: Expression, timeZoneId: Option[String] = None)
extends BinaryExpression with TimeZoneAwareExpression with ImplicitCastInputTypes {
@ -1451,6 +1457,7 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
/**
* Parses a column to a date based on the given format.
*/
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = """
_FUNC_(date_str[, fmt]) - Parses the `date_str` expression with the `fmt` expression to
@ -1460,7 +1467,7 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
arguments = """
Arguments:
* date_str - A string to be parsed to date.
* fmt - Date format pattern to follow. See `java.time.format.DateTimeFormatter` for valid
* fmt - Date format pattern to follow. See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a> for valid
date and time format patterns.
""",
examples = """
@ -1471,6 +1478,7 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
2016-12-31
""",
since = "1.5.0")
// scalastyle:on line.size.limit
case class ParseToDate(left: Expression, format: Option[Expression], child: Expression)
extends RuntimeReplaceable {
@ -1499,6 +1507,7 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr
/**
* Parses a column to a timestamp based on the supplied format.
*/
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = """
_FUNC_(timestamp_str[, fmt]) - Parses the `timestamp_str` expression with the `fmt` expression
@ -1508,7 +1517,7 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr
arguments = """
Arguments:
* timestamp_str - A string to be parsed to timestamp.
* fmt - Timestamp format pattern to follow. See `java.time.format.DateTimeFormatter` for valid
* fmt - Timestamp format pattern to follow. See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a> for valid
date and time format patterns.
""",
examples = """
@ -1519,6 +1528,7 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr
2016-12-31 00:00:00
""",
since = "2.2.0")
// scalastyle:on line.size.limit
case class ParseToTimestamp(left: Expression, format: Option[Expression], child: Expression)
extends RuntimeReplaceable {
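
A hedged SQL-level sketch of the expressions documented above (assumes a `SparkSession` named `spark`; outputs depend on the session time zone):

```python
spark.sql("SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd')").show()
spark.sql("SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss')").show()
spark.sql("SELECT to_date('2016-12-31', 'yyyy-MM-dd')").show()
spark.sql("SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd')").show()
```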

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

@ -162,6 +162,8 @@ private object DateTimeFormatterHelper {
toFormatter(builder, TimestampFormatter.defaultLocale)
}
final val unsupportedLetters = Set('A', 'c', 'e', 'n', 'N', 'p')
/**
* In Spark 3.0, we switch to the Proleptic Gregorian calendar and use DateTimeFormatter for
* parsing/formatting datetime values. The pattern string is incompatible with the one defined
@ -179,7 +181,7 @@ private object DateTimeFormatterHelper {
(pattern + " ").split("'").zipWithIndex.map {
case (patternPart, index) =>
if (index % 2 == 0) {
for (c <- patternPart if c == 'c' || c == 'e') {
for (c <- patternPart if unsupportedLetters.contains(c)) {
throw new IllegalArgumentException(s"Illegal pattern character: $c")
}
// The meaning of 'u' was day number of week in SimpleDateFormat, it was changed to year
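
A hedged user-level sketch of the stricter check (assumes a `SparkSession` named `spark`; the exact exception type surfaced in Python may vary):

```python
# Any of the unsupported letters 'A', 'c', 'e', 'n', 'N', 'p' in a
# pattern should now be rejected with "Illegal pattern character".
try:
    spark.sql("SELECT date_format(current_timestamp(), 'yyyy A')").collect()
except Exception as e:
    print(e)  # expect: Illegal pattern character: A
```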

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelperSuite.scala

@ -18,7 +18,7 @@
package org.apache.spark.sql.catalyst.util
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper.convertIncompatiblePattern
import org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper._
class DateTimeFormatterHelperSuite extends SparkFunSuite {
@ -36,10 +36,10 @@ class DateTimeFormatterHelperSuite extends SparkFunSuite {
=== "uuuu-MM'u contains in quoted text'''''HH:mm:ss")
assert(convertIncompatiblePattern("yyyy-MM-dd'T'HH:mm:ss.SSSz G")
=== "yyyy-MM-dd'T'HH:mm:ss.SSSz G")
val e1 = intercept[IllegalArgumentException](convertIncompatiblePattern("yyyy-MM-dd eeee G"))
assert(e1.getMessage === "Illegal pattern character: e")
val e2 = intercept[IllegalArgumentException](convertIncompatiblePattern("yyyy-MM-dd cccc G"))
assert(e2.getMessage === "Illegal pattern character: c")
unsupportedLetters.foreach { l =>
val e = intercept[IllegalArgumentException](convertIncompatiblePattern(s"yyyy-MM-dd $l G"))
assert(e.getMessage === s"Illegal pattern character: $l")
}
assert(convertIncompatiblePattern("yyyy-MM-dd uuuu") === "uuuu-MM-dd eeee")
assert(convertIncompatiblePattern("yyyy-MM-dd EEEE") === "uuuu-MM-dd EEEE")
assert(convertIncompatiblePattern("yyyy-MM-dd'e'HH:mm:ss") === "uuuu-MM-dd'e'HH:mm:ss")

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

@ -391,11 +391,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
* `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
* created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
* <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
* Custom date formats follow the formats at `java.time.format.DateTimeFormatter`.
* Custom date formats follow the formats at
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to date type.</li>
* <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`): sets the string that
* indicates a timestamp format. Custom date formats follow the formats at
* `java.time.format.DateTimeFormatter`. This applies to timestamp type.</li>
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to timestamp type.</li>
* <li>`multiLine` (default `false`): parse one record, which may span multiple lines,
* per file</li>
* <li>`encoding` (by default it is not set): allows to forcibly set one of standard basic
@ -616,11 +620,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
* <li>`negativeInf` (default `-Inf`): sets the string representation of a negative infinity
* value.</li>
* <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
* Custom date formats follow the formats at `java.time.format.DateTimeFormatter`.
* Custom date formats follow the formats at
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to date type.</li>
* <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`): sets the string that
* indicates a timestamp format. Custom date formats follow the formats at
* `java.time.format.DateTimeFormatter`. This applies to timestamp type.</li>
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to timestamp type.</li>
* <li>`maxColumns` (default `20480`): defines a hard limit of how many columns
* a record can have.</li>
* <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

@ -750,11 +750,15 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
* one of the known case-insensitive shorten names (`none`, `bzip2`, `gzip`, `lz4`,
* `snappy` and `deflate`). </li>
* <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
* Custom date formats follow the formats at `java.time.format.DateTimeFormatter`.
* Custom date formats follow the formats at
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to date type.</li>
* <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`): sets the string that
* indicates a timestamp format. Custom date formats follow the formats at
* `java.time.format.DateTimeFormatter`. This applies to timestamp type.</li>
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to timestamp type.</li>
* <li>`encoding` (by default it is not set): specifies encoding (charset) of saved json
* files. If it is not set, the UTF-8 charset will be used. </li>
* <li>`lineSep` (default `\n`): defines the line separator that should be used for writing.</li>
@ -871,11 +875,15 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
* one of the known case-insensitive shorten names (`none`, `bzip2`, `gzip`, `lz4`,
* `snappy` and `deflate`). </li>
* <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
* Custom date formats follow the formats at `java.time.format.DateTimeFormatter`.
* Custom date formats follow the formats at
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to date type.</li>
* <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`): sets the string that
* indicates a timestamp format. Custom date formats follow the formats at
* `java.time.format.DateTimeFormatter`. This applies to timestamp type.</li>
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to timestamp type.</li>
* <li>`ignoreLeadingWhiteSpace` (default `true`): a flag indicating whether or not leading
* whitespaces from values being written should be skipped.</li>
* <li>`ignoreTrailingWhiteSpace` (default `true`): a flag indicating defines whether or not

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

@ -2655,7 +2655,9 @@ object functions {
* Converts a date/timestamp/string to a value of string in the format specified by the date
* format given by the second argument.
*
* See [[java.time.format.DateTimeFormatter]] for valid date and time format patterns
* See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>
* for valid date and time format patterns
*
* @param dateExpr A date, timestamp or string. If a string, the data must be in a format that
* can be cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`
@ -2913,7 +2915,9 @@ object functions {
* representing the timestamp of that moment in the current system time zone in the given
* format.
*
* See [[java.time.format.DateTimeFormatter]] for valid date and time format patterns
* See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>
* for valid date and time format patterns
*
* @param ut A number of a type that is castable to a long, such as string or integer. Can be
* negative for timestamps before the unix epoch
@ -2957,7 +2961,9 @@ object functions {
/**
* Converts time string with given pattern to Unix timestamp (in seconds).
*
* See [[java.time.format.DateTimeFormatter]] for valid date and time format patterns
* See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>
* for valid date and time format patterns
*
* @param s A date, timestamp or string. If a string, the data must be in a format that can be
* cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`
@ -2985,7 +2991,9 @@ object functions {
/**
* Converts time string with the given pattern to timestamp.
*
* See [[java.time.format.DateTimeFormatter]] for valid date and time format patterns
* See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>
* for valid date and time format patterns
*
* @param s A date, timestamp or string. If a string, the data must be in a format that can be
* cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`
@ -3010,7 +3018,9 @@ object functions {
/**
* Converts the column into a `DateType` with a specified format
*
* See [[java.time.format.DateTimeFormatter]] for valid date and time format patterns
* See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>
* for valid date and time format patterns
*
* @param e A date, timestamp or string. If a string, the data must be in a format that can be
* cast to a date, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala

@ -253,11 +253,15 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
* `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
* created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
* <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
* Custom date formats follow the formats at `java.time.format.DateTimeFormatter`.
* Custom date formats follow the formats at
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to date type.</li>
* <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`): sets the string that
* indicates a timestamp format. Custom date formats follow the formats at
* `java.time.format.DateTimeFormatter`. This applies to timestamp type.</li>
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to timestamp type.</li>
* <li>`multiLine` (default `false`): parse one record, which may span multiple lines,
* per file</li>
* <li>`lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
@ -319,11 +323,15 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
* <li>`negativeInf` (default `-Inf`): sets the string representation of a negative infinity
* value.</li>
* <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
* Custom date formats follow the formats at `java.time.format.DateTimeFormatter`.
* Custom date formats follow the formats at
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to date type.</li>
* <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`): sets the string that
* indicates a timestamp format. Custom date formats follow the formats at
* `java.time.format.DateTimeFormatter`. This applies to timestamp type.</li>
* <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
* Datetime Patterns</a>.
* This applies to timestamp type.</li>
* <li>`maxColumns` (default `20480`): defines a hard limit of how many columns
* a record can have.</li>
* <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed