spark-instrumented-optimizer


titicaca bc0a0e6392 [SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column

## What changes were proposed in this pull request?

Fix a bug in the collect method when collecting a timestamp column. The bug can be reproduced with the following code:

```
library(SparkR)
sparkR.session(master = "local")
df <- data.frame(col1 = c(0, 1, 2),
                 col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))
sdf1 <- createDataFrame(df)
print(dtypes(sdf1))
df1 <- collect(sdf1)
print(lapply(df1, class))
sdf2 <- filter(sdf1, "col1 > 0")
print(dtypes(sdf2))
df2 <- collect(sdf2)
print(lapply(df2, class))
```

As the printed output shows, col2 in df2 is unexpectedly converted to numeric when an NA is at the top of the column. This is caused by `do.call(c, list)`: when converting a list such as `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the result is numeric instead of POSIXct. Therefore, the data type of the vector needs to be cast explicitly.

## How was this patch tested?

The patch can be tested manually with the same code above.

Author: titicaca <fangzhou.yang@hotmail.com>

Closes #16689 from titicaca/sparkr-dev.

2017-02-12 10:42:15 -08:00
profile	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
tests/testthat	[SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column	2017-02-12 10:42:15 -08:00
worker	[SPARK-17919] Make timeout to RBackend configurable in SparkR	2016-10-30 16:17:23 -07:00