spark-instrumented-optimizer/R/pkg/inst
titicaca bc0a0e6392 [SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column
## What changes were proposed in this pull request?

Fix a bug in collect method for collecting timestamp column, the bug can be reproduced as shown in the following codes and outputs:

```
library(SparkR)
sparkR.session(master = "local")
df <- data.frame(col1 = c(0, 1, 2),
                 col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))

sdf1 <- createDataFrame(df)
print(dtypes(sdf1))
df1 <- collect(sdf1)
print(lapply(df1, class))

sdf2 <- filter(sdf1, "col1 > 0")
print(dtypes(sdf2))
df2 <- collect(sdf2)
print(lapply(df2, class))
```

As we can see from the printed output, the column type of col2 in df2 is converted to numeric unexpectedly, when NA exists at the top of the column.

This is caused by method `do.call(c, list)`, if we convert a list, i.e. `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01"))`, the class of the result is numeric instead of POSIXct.

Therefore, we need to cast the data type of the vector explicitly.

## How was this patch tested?

The patch can be tested manually with the same code above.

Author: titicaca <fangzhou.yang@hotmail.com>

Closes #16689 from titicaca/sparkr-dev.
2017-02-12 10:42:15 -08:00
..
profile [SPARK-15159][SPARKR] SparkR SparkSession API 2016-06-17 21:36:01 -07:00
tests/testthat [SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column 2017-02-12 10:42:15 -08:00
worker [SPARK-17919] Make timeout to RBackend configurable in SparkR 2016-10-30 16:17:23 -07:00