spark-instrumented-optimizer/sql/hive
hyukjinkwon 29952ed096 [SPARK-16216][SQL] Read/write timestamps and dates in ISO 8601 and dateFormat/timestampFormat option for CSV and JSON
## What changes were proposed in this pull request?

### Default - ISO 8601

Currently, CSV datasource is writing `Timestamp` and `Date` as numeric form and JSON datasource is writing both as below:

- CSV
  ```
  // TimestampType
  1414459800000000
  // DateType
  16673
  ```

- Json

  ```
  // TimestampType
  1970-01-01 11:46:40.0
  // DateType
  1970-01-01
  ```

So, for CSV we can't read back what we write and for JSON it becomes ambiguous because the timezone is being missed.

So, this PR make both **write** `Timestamp` and `Date` in ISO 8601 formatted string (please refer the [ISO 8601 specification](https://www.w3.org/TR/NOTE-datetime)).

- For `Timestamp` it becomes as below: (`yyyy-MM-dd'T'HH:mm:ss.SSSZZ`)

  ```
  1970-01-01T02:00:01.000-01:00
  ```

- For `Date` it becomes as below (`yyyy-MM-dd`)

  ```
  1970-01-01
  ```

### Custom date format option - `dateFormat`

This PR also adds the support to write and read dates and timestamps in a formatted string as below:

- **DateType**

  - With `dateFormat` option (e.g. `yyyy/MM/dd`)

    ```
    +----------+
    |      date|
    +----------+
    |2015/08/26|
    |2014/10/27|
    |2016/01/28|
    +----------+
    ```

### Custom date format option - `timestampFormat`

- **TimestampType**

  - With `dateFormat` option (e.g. `dd/MM/yyyy HH:mm`)

    ```
    +----------------+
    |            date|
    +----------------+
    |2015/08/26 18:00|
    |2014/10/27 18:30|
    |2016/01/28 20:00|
    +----------------+
    ```

## How was this patch tested?

Unit tests were added in `CSVSuite` and `JsonSuite`. For JSON, existing tests cover the default cases.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14279 from HyukjinKwon/SPARK-16216-json-csv.
2016-08-24 22:16:20 +02:00
..
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-17149][SQL] array.sql for testing array related functions 2016-08-19 18:14:45 -07:00
src [SPARK-16216][SQL] Read/write timestamps and dates in ISO 8601 and dateFormat/timestampFormat option for CSV and JSON 2016-08-24 22:16:20 +02:00
pom.xml [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent 2016-07-19 11:59:46 +01:00