Commit graph

152 commits

Author SHA1 Message Date
Yanbo Liang ba02f6cb5a [SPARK-12025][SPARKR] Rename some window rank function names for SparkR
Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side.
There are two reasons that we should make this change:
* We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645)
* Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0.

It's better to fix this issue before the 1.6 release; otherwise we will make a breaking API change.
cc shivaram sun-rui
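
For reference, a minimal sketch of the renamed calls (illustrative only, not from the PR; each returns a Column expression):
```r
# snake_case replacements for the deprecated camelCase names:
r1 <- cume_dist()     # was cumeDist()
r2 <- dense_rank()    # was denseRank()
r3 <- percent_rank()  # was percentRank()
r4 <- row_number()    # was rowNumber()
```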

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10016 from yanboliang/SPARK-12025.
2015-11-27 11:48:01 -08:00
felixcheung 1a93323c5b [SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR
Added tests for functions that are reported as masked, to make sure the base:: or stats:: function can be called.

For those we can't call, I added them to the SparkR programming guide.

It would seem to me that `table, sample, subset, filter, cov` not working is not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like, as they are defined in base or stats, they are missing the S3 generic, e.g.
```
> methods("transform")
[1] transform,ANY-method       transform.data.frame
[3] transform,DataFrame-method transform.default
see '?methods' for accessing help and source code
> methods("subset")
[1] subset.data.frame       subset,DataFrame-method subset.default
[4] subset.matrix
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, parent.frame()) :
  function 'subset' appears not to be S3 generic; found functions that look like S3 methods
```
Any idea?

More information on masking:
http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
http://www.sfu.ca/~sweldon/howTo/guide4.pdf
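
As a hedged illustration of the `::` workaround discussed above, a masked base/stats function can still be reached explicitly once SparkR is attached:
```r
library(SparkR)
# stats::filter is masked by SparkR's filter(); the original time-series
# filter remains reachable via the :: operator:
stats::filter(presidents, rep(1/3, 3))
```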

This is what the output doc looks like (minus css):
![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png)

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9785 from felixcheung/rmasked.
2015-11-18 23:32:49 -08:00
Sun Rui 224723e6a8 [SPARK-11773][SPARKR] Implement collection functions in SparkR.
Author: Sun Rui <rui.sun@intel.com>

Closes #9764 from sun-rui/SPARK-11773.
2015-11-18 08:41:45 -08:00
zero323 a97d6f3a58 [SPARK-11281][SPARKR] Add tests covering the issue.
The goal of this PR is to add tests covering the issue to ensure that it was resolved by [SPARK-11086](https://issues.apache.org/jira/browse/SPARK-11086).

Author: zero323 <matthew.szymkiewicz@gmail.com>

Closes #9743 from zero323/SPARK-11281-tests.
2015-11-18 08:32:03 -08:00
Sun Rui 835a79d78e [SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritable
The basic idea is that:
The archive of the SparkR package itself, sparkr.zip, is created during the build process and is contained in the Spark binary distribution. No change is made to it after the distribution is installed, as the directory it resides in ($SPARK_HOME/R/lib) may not be writable.

When there is R source code contained in jars or in Spark packages specified with the "--jars" or "--packages" command-line options, a temporary directory is created by calling Utils.createTempDir(), and the R packages built from that source code are installed there. The temporary directory is writable, does not interfere with other concurrent SparkR sessions, and is deleted when this SparkR session ends. The R binary packages installed in the temporary directory are then packed into an archive named rpkg.zip.

sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes.

The distribution of rpkg.zip in Standalone modes is not supported in this PR, and will be addressed in another PR.

Various R files are updated to accept multiple lib paths (one for the SparkR package, the other for other R packages) so that these packages can be accessed in R.
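
A rough sketch of the multiple-lib-path idea on the R side (paths here are hypothetical, not from the PR):
```r
# Hypothetical locations: the read-only SparkR install plus the writable
# per-session directory holding packages unpacked from rpkg.zip.
sparkRLibDir <- "/opt/spark/R/lib"
rpkgDir      <- file.path(tempdir(), "rpkg")
.libPaths(c(sparkRLibDir, rpkgDir, .libPaths()))
library(SparkR)  # resolved via the first path; user R packages via the second
```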

Author: Sun Rui <rui.sun@intel.com>

Closes #9390 from sun-rui/SPARK-10500.
2015-11-15 19:29:09 -08:00
zero323 d7d9fa0b87 [SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop when createDataFrame
Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame`

At this moment SparkR's createDataFrame uses a nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow, especially with data.table (~2 orders of magnitude slower than the PySpark/Pandas version on a DataFrame of 1M rows x 2 columns).

A simple improvement is to apply `dropFactors` column-wise and then reshape the output list.

It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277).
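
A minimal sketch of the column-wise approach (illustrative, not the actual SparkR implementation):
```r
# Convert factor columns to character once per column instead of once
# per cell inside a nested row loop.
dropFactorsColumnwise <- function(df) {
  cols <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}
```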

Author: zero323 <matthew.szymkiewicz@gmail.com>

Closes #9099 from zero323/SPARK-11086.
2015-11-15 19:15:27 -08:00
felixcheung ed04846e14 [SPARK-11263][SPARKR] lintr Throws Warnings on Commented Code in Documentation
Clean out hundreds of `style: Commented code should be removed.` from lintr

Like these:
```
/opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:513:3: style: Commented code should be removed.
# sc <- sparkR.init()
  ^~~~~~~~~~~~~~~~~~~
/opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:514:3: style: Commented code should be removed.
# sqlContext <- sparkRSQL.init(sc)
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:515:3: style: Commented code should be removed.
# path <- "path/to/file.json"
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Tried without export or rdname; neither works.
Instead, added `#' noRd` to suppress .Rd file generation.

Also updated `family` for DataFrame functions to use longer descriptive text instead of `dataframe_funcs`:
![image](https://cloud.githubusercontent.com/assets/8969467/10933937/17bf5b1e-8291-11e5-9777-40fc632105dc.png)

This covers *most* of the 'Commented code' warnings, but I left out a few that look legitimate.
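
An illustrative roxygen pattern for the fix (assumed shape, not the exact diff): example code becomes roxygen `#'` lines, which lintr does not flag as commented code, and `noRd` suppresses .Rd generation:
```r
#' @examples
#' \dontrun{
#' sc <- sparkR.init()
#' sqlContext <- sparkRSQL.init(sc)
#' path <- "path/to/file.json"
#' }
#' @noRd
```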

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9463 from felixcheung/rlintr.
2015-11-12 20:02:49 -08:00
JihongMa d292f74831 [SPARK-11420] Updating Stddev support via Imperative Aggregate
Switched stddev support from DeclarativeAggregate to ImperativeAggregate.

Author: JihongMa <linlin200605@gmail.com>

Closes #9380 from JihongMA/SPARK-11420.
2015-11-12 13:47:34 -08:00
felixcheung 1a8e0468a1 [SPARK-11468] [SPARKR] add stddev/variance agg functions for Column
Checked names, none of them should conflict with anything in base

shivaram davies rxin

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9489 from felixcheung/rstddev.
2015-11-10 22:45:17 -08:00
Yanbo Liang f14e95115c [ML][R] SparkR::glm summary result to compare with native R
Follow-up to #9561. Since [SPARK-11587](https://issues.apache.org/jira/browse/SPARK-11587) has been fixed, we should compare the SparkR::glm summary result with native R output rather than a hard-coded one. mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9590 from yanboliang/glm-r-test.
2015-11-10 11:34:36 -08:00
Oscar D. Lara Yejas 47735cdc2a [SPARK-10863][SPARKR] Method coltypes() (New version)
This is a follow-up to PR #8984, as the corresponding branch for that PR was damaged.

Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>

Closes #9579 from olarayej/SPARK-10863_NEW14.
2015-11-10 11:07:57 -08:00
Shivaram Venkataraman c4e19b3819 [SPARK-11587][SPARKR] Fix the summary generic to match base R
The signature is summary(object, ...) as defined in
https://stat.ethz.ch/R-manual/R-devel/library/base/html/summary.html

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #9582 from shivaram/summary-fix.
2015-11-09 21:06:01 -08:00
felixcheung cd174882a5 [SPARK-9865][SPARKR] Flaky SparkR test: test_sparkSQL.R: sample on a DataFrame
Make sample test less flaky by setting the seed

Tested with
```
repeat {  if (count(sample(df, FALSE, 0.1)) == 3) { break } }
```
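
A sketch of the deterministic variant (assuming a seed argument on `sample`; that argument is my assumption, not stated in the message):
```r
# Hypothetical: with a fixed seed the Bernoulli sample is reproducible,
# so the test no longer depends on luck.
sampled <- sample(df, withReplacement = FALSE, fraction = 0.1, seed = 0)
count(sampled)
```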

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9549 from felixcheung/rsample.
2015-11-09 10:26:09 -08:00
Yanbo Liang 8c0e1b50e9 [SPARK-11494][ML][R] Expose R-like summary statistics in SparkR::glm for linear regression
Expose R-like summary statistics in SparkR::glm for linear regression; the output of ```summary``` looks like:
```
$DevianceResiduals
 Min        Max
 -0.9509607 0.7291832

$Coefficients
                   Estimate   Std. Error t value   Pr(>|t|)
(Intercept)        1.6765     0.2353597  7.123139  4.456124e-11
Sepal_Length       0.3498801  0.04630128 7.556598  4.187317e-12
Species_versicolor -0.9833885 0.07207471 -13.64402 0
Species_virginica  -1.00751   0.09330565 -10.79796 0
```
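
An illustrative call that yields a summary of this shape (assuming an iris-based DataFrame whose dotted column names were converted to underscores):
```r
df <- createDataFrame(sqlContext, iris)
model <- glm(Sepal_Width ~ Sepal_Length + Species, data = df,
             family = "gaussian")
summary(model)
```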

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9561 from yanboliang/spark-11494.
2015-11-09 08:56:22 -08:00
Imran Rashid 49f1a82037 [SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bits
https://issues.apache.org/jira/browse/SPARK-10116

This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`.

mengxr mkolod

Author: Imran Rashid <irashid@cloudera.com>

Closes #8314 from squito/SPARK-10116.
2015-11-06 20:06:24 +00:00
Davies Liu 2440106242 [SPARK-11542] [SPARKR] fix glm with long formula
Because deparse() will break a long string into multiple lines, the deserialization will fail.
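
A minimal sketch of the failure mode and the usual remedy (illustrative):
```r
# deparse() splits a long expression into a multi-element character vector.
f <- as.formula(paste("y ~", paste(paste0("x", 1:40), collapse = " + ")))
length(deparse(f))                 # > 1 for a long formula
paste(deparse(f), collapse = " ")  # single string, safe to ship to the JVM
```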

Author: Davies Liu <davies@databricks.com>

Closes #9510 from davies/fix_glm.
2015-11-05 16:34:10 -08:00
adrian555 b9455d1f18 [SPARK-11260][SPARKR] with() function support
Author: adrian555 <wzhuang@us.ibm.com>
Author: Adrian Zhuang <adrian555@users.noreply.github.com>

Closes #9443 from adrian555/with.
2015-11-05 14:47:38 -08:00
Yanbo Liang e328b69c31 [SPARK-9492][ML][R] LogisticRegression in R should provide model statistics
Like ml ```LinearRegression```, ```LogisticRegression``` should provide a training summary including feature names and their coefficients.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9303 from yanboliang/spark-9492.
2015-11-04 08:28:33 -08:00
felixcheung bb5a2af034 [SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio
Mapping spark.driver.memory from sparkEnvir to spark-submit command-line arguments.

shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf?
sun-rui
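
A hedged usage example of the supported property:
```r
# spark.driver.memory must be set before the driver JVM launches, so it
# is mapped from sparkEnvir onto the spark-submit command line.
sc <- sparkR.init(master = "local[2]",
                  sparkEnvir = list(spark.driver.memory = "2g"))
```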

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #9290 from felixcheung/rdrivermem.
2015-10-30 13:51:32 -07:00
Sun Rui 40c77fb23a [SPARK-11210][SPARKR] Add window functions into SparkR [step 2].
Author: Sun Rui <rui.sun@intel.com>

Closes #9196 from sun-rui/SPARK-11210.
2015-10-30 10:56:06 -07:00
Sun Rui dc3220ce11 [SPARK-11209][SPARKR] Add window functions into SparkR [step 1].
Author: Sun Rui <rui.sun@intel.com>

Closes #9193 from sun-rui/SPARK-11209.
2015-10-26 20:58:18 -07:00
Narine Kokhlikyan 3689beb98b [SPARK-10979][SPARKR] Sparkrmerge: Add merge to DataFrame with R signature
Add a merge function to DataFrame which supports the R signature.
https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html

Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>

Closes #9012 from NarineK/sparkrmerge.
2015-10-26 15:12:25 -07:00
Forest Fang 94e2064fa1 [SPARK-11244][SPARKR] sparkR.stop() should remove SQLContext
SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise even when SparkContext is reinitialized, `sparkRSQL.init` returns the stale copy of the object and complains:

```r
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sparkR.stop()
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sqlContext
```
producing
```r
Error in callJMethod(x, "getClass") :
  Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
```

I have added the check and removal only when SparkContext itself is initialized. I have also added corresponding test for this fix. Let me know if you want me to move the test to SQL test suite instead.

P.S. I tried lint-r but ended up with a lot of errors on existing code.

Author: Forest Fang <forest.fang@outlook.com>

Closes #9205 from saurfang/sparkR.stop.
2015-10-22 09:34:07 -07:00
Davies Liu f8c6bec657 [SPARK-11197][SQL] run SQL on files directly
This PR introduces a new feature to run SQL directly on files without creating a table, for example:

```
select id from json.`path/to/json/files` as j
```

Author: Davies Liu <davies@databricks.com>

Closes #9173 from davies/source.
2015-10-21 13:38:30 -07:00
lewuathe 4c33a34ba3 [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L2 regularization if the number of features is small

Author: lewuathe <lewuathe@me.com>
Author: Lewuathe <sasaki@treasure-data.com>
Author: Kai Sasaki <sasaki@treasure-data.com>
Author: Lewuathe <lewuathe@me.com>

Closes #8884 from Lewuathe/SPARK-10668.
2015-10-19 10:46:10 -07:00
Sun Rui 390b22fad6 [SPARK-10996] [SPARKR] Implement sampleBy() in DataFrameStatFunctions.
Author: Sun Rui <rui.sun@intel.com>

Closes #9023 from sun-rui/SPARK-10996.
2015-10-13 22:31:23 -07:00
Monica Liu 8b32885704 [SPARK-10981] [SPARKR] SparkR Join improvements
I was having issues with collect() and orderBy() in Spark 1.5.0, so I used the DataFrame.R and test_sparkSQL.R files from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi", and added corresponding test cases for join() and merge() in the test_sparkSQL.R file.
I'm opening this pull request because I filed this JIRA bug report:
https://issues.apache.org/jira/browse/SPARK-10981
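
Illustrative calls with the added join types (sketch; df1/df2 and the key column are hypothetical):
```r
joined <- join(df1, df2, df1$key == df2$key, "fullouter")
semi   <- join(df1, df2, df1$key == df2$key, "leftsemi")
```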

Author: Monica Liu <liu.monica.f@gmail.com>

Closes #9029 from mfliu/master.
2015-10-13 22:24:52 -07:00
Adrian Zhuang f7f28ee7a5 [SPARK-10913] [SPARKR] attach() function support
Bring the change code up to date.

Author: Adrian Zhuang <adrian555@users.noreply.github.com>
Author: adrian555 <wzhuang@us.ibm.com>

Closes #9031 from adrian555/attach2.
2015-10-13 10:21:07 -07:00
Narine Kokhlikyan 1e0aba90b9 [SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrame
as.DataFrame is a more R-style signature.
Also, I'd like to know if we could make the context (e.g. sqlContext) global, so that we do not have to specify it as an argument each time we create a DataFrame.
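
A hedged sketch of the synonym (argument order as in createDataFrame at the time):
```r
df1 <- createDataFrame(sqlContext, faithful)
df2 <- as.DataFrame(sqlContext, faithful)   # equivalent spelling
```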

Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>

Closes #8952 from NarineK/sparkrasDataFrame.
2015-10-13 10:09:05 -07:00
Sun Rui 5e3868ba13 [SPARK-10051] [SPARKR] Support collecting data of StructType in DataFrame
Two points in this PR:

1.    The original thought was that a named R list would be assumed to be a struct in SerDe. But this is problematic because some R functions implicitly generate named lists that are not intended to be structs when transferred by SerDe. So SerDe clients have to explicitly mark a named list as a struct by changing its class from "list" to "struct" (see the sketch after this list).

2.    SerDe is in the Spark Core module, and data of StructType is represented as GenericRow, which is defined in the Spark SQL module. SerDe can't import GenericRow because in the Maven build the Spark SQL module depends on the Spark Core module. So this PR adds a registration hook in SerDe to allow SQLUtils in the Spark SQL module to register its functions for serialization and deserialization of StructType.
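
A minimal sketch of the explicit struct marking from point 1 (assumed shape):
```r
# A plain named list stays a list; SerDe clients opt in to struct
# semantics by changing the class attribute.
s <- list(a = 1L, b = "x")
class(s) <- "struct"
```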

Author: Sun Rui <rui.sun@intel.com>

Closes #8794 from sun-rui/SPARK-10051.
2015-10-13 10:02:21 -07:00
Sun Rui 864de3bf40 [SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions.
1.  Add a "col" function into DataFrame.
2.  Move the current "col" function in Column.R to functions.R and convert it to an S4 function.
3.  Add an S4 "column" function in functions.R.
4.  Convert the "column" function in Column.R to an S4 function. This is for private use.

Author: Sun Rui <rui.sun@intel.com>

Closes #8864 from sun-rui/SPARK-10079.
2015-10-09 23:05:38 -07:00
Rerngvit Yanggratoke 70f44ad2d8 [SPARK-10905] [SPARKR] Export freqItems() for DataFrameStatFunctions
[SPARK-10905][SparkR]: Export freqItems() for DataFrameStatFunctions
- Add function (together with roxygen2 doc) to DataFrame.R and generics.R
- Expose the function in NAMESPACE
- Add unit test for the function

Author: Rerngvit Yanggratoke <rerngvit@kth.se>

Closes #8962 from rerngvit/SPARK-10905.
2015-10-09 09:36:40 -07:00
Narine Kokhlikyan e8f90d9dda [SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrame
The sort function can be used as an alternative to arrange(...).
As arguments it accepts x (a DataFrame), decreasing (TRUE/FALSE, or a vector of per-column orderings), and the list of columns, given as string names.

For example:
```
sort(df, TRUE, "col1", "col2", "col3", "col5") # sort the given columns, all in the same order

sort(df, decreasing=TRUE, "col1")
sort(df, decreasing=c(TRUE,FALSE), "col1", "col2")
```

Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>

Closes #8920 from NarineK/sparkrsort.
2015-10-08 09:53:44 -07:00
Sun Rui f57c63d4c3 [SPARK-10752] [SPARKR] Implement corr() and cov in DataFrameStatFunctions.
Author: Sun Rui <rui.sun@intel.com>

Closes #8869 from sun-rui/SPARK-10752.
2015-10-07 09:46:37 -07:00
felixcheung 721e8b5f35 [SPARK-10904] [SPARKR] Fix to support select(df, c("col1", "col2"))
The fix is to coerce `c("a", "b")` into a list so that it can be serialized for the JVM call.
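
Illustrative usage after the fix:
```r
# The character vector is coerced to a list before being serialized
# for the JVM call, so multi-column select works with c(...):
head(select(df, c("col1", "col2")))
```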

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #8961 from felixcheung/rselect.
2015-10-03 22:42:36 -07:00
Oscar D. Lara Yejas f21e2da03f [SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collect
Created method as.data.frame as a synonym for collect().

Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
Author: olarayej <oscar.lara.yejas@us.ibm.com>
Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>

Closes #8908 from olarayej/SPARK-10807.
2015-09-30 18:03:31 -07:00
Eric Liang 922338812c [SPARK-9681] [ML] Support R feature interactions in RFormula
This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`).

To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms.
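
A hedged example of the `:` operator in a SparkR formula (assuming an iris-derived DataFrame `df`):
```r
# ':' crosses two terms, here a string factor with a numeric column.
model <- glm(Sepal_Length ~ Species : Sepal_Width, data = df,
             family = "gaussian")
```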

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8830 from ericl/interaction-2.
2015-09-25 00:43:22 -07:00
Sun Rui 896edb51ab [SPARK-10050] [SPARKR] Support collecting data of MapType in DataFrame.
1. Support collecting data of MapType from DataFrame.
2. Support data of MapType in createDataFrame.

Author: Sun Rui <rui.sun@intel.com>

Closes #8711 from sun-rui/SPARK-10050.
2015-09-16 13:20:39 -07:00
JihongMa f4a22808e0 [SPARK-6548] Adding stddev to DataFrame functions
Adding STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance. Please review the code change.
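
A Welford-style one-pass variance update in R, sketching the kind of online algorithm the message refers to (not the Scala implementation):
```r
onePassStddev <- function(x) {
  n <- 0; m <- 0; m2 <- 0
  for (xi in x) {
    n  <- n + 1
    delta <- xi - m
    m  <- m + delta / n
    m2 <- m2 + delta * (xi - m)  # uses the updated mean
  }
  sqrt(m2 / (n - 1))             # sample standard deviation
}
onePassStddev(c(2, 4, 4, 4, 5, 5, 7, 9))  # matches sd() on the same data
```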

Author: JihongMa <linlin200605@gmail.com>
Author: Jihong MA <linlin200605@gmail.com>
Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>

Closes #6297 from JihongMA/SPARK-SQL.
2015-09-12 10:17:15 -07:00
Sun Rui 45e3be5c13 [SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame.
This PR:
1.  Enhances reflection in RBackend, automatically matching a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.

2.  Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame after collection is observed to be of Scala Seq type.

3.  Supports ArrayType in createDataFrame().

Author: Sun Rui <rui.sun@intel.com>

Closes #8458 from sun-rui/SPARK-10049.
2015-09-10 12:21:13 -07:00
CHOIJAEHONG af0e3125cb [SPARK-8951] [SPARKR] support Unicode characters in collect()
Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters.
I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R.

Author: CHOIJAEHONG <redrock07@naver.com>

Closes #7494 from CHOIJAEHONG1/SPARK-8951.
2015-09-03 13:38:26 -07:00
felixcheung 2a4e00ca4d [SPARK-9803] [SPARKR] Add subset and transform + tests
Add subset and transform
Also reorganize `[` & `[[` to subset instead of select

Note: transform is very similar to mutate. Spark doesn't seem to replace an existing column with the same name in mutate (i.e. `mutate(df, age = df$age + 2)` returns a DataFrame with 2 columns, both named 'age'), so I am not doing that for now in transform.
Though it is clearly stated that it should replace a column with a matching name (should I open a JIRA for mutate/transform?).
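
A short reproduction of the mutate behavior described (sketch):
```r
df2 <- mutate(df, age = df$age + 2)
columns(df2)   # "age" appears twice instead of replacing the column
```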

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #8503 from felixcheung/rsubset_transform.
2015-08-28 18:35:01 -07:00
Shivaram Venkataraman 2f99c37273 [SPARK-10328] [SPARKR] Fix generic for na.omit
S3 function is at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8495 from shivaram/na-omit-fix.
2015-08-28 00:37:50 -07:00
Shivaram Venkataraman e936cf8088 [SPARK-10219] [SPARKR] Fix varargsToEnv and add test case
cc sun-rui davies

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #8475 from shivaram/varargs-fix.
2015-08-26 22:27:31 -07:00
Yu ISHIKAWA 773ca037a4 [MINOR] [SPARKR] Fix some validation problems in SparkR
Getting rid of some validation problems in SparkR
https://github.com/apache/spark/pull/7883

cc shivaram

```
inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
  expect_equal(class(x), "character")
                                     ^~
inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.

^~
R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.

^~~~~~~~~~~~~~
R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
                df <- data.frame(row.names = 1 : nrow)
                                                      ^~~~~~~~~~~~~~~~
R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.

^~~~~~~~~~~~~~
R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
                    df[[names[colIndex]]] <- vec
                                                ^~~~~~~~~~~~~~~~~~
```

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8474 from yu-iskw/minor-fix-sparkr.
2015-08-26 18:14:32 -07:00
felixcheung 75d4773aa5 [SPARK-9316] [SPARKR] Add support for filtering using [ (synonym for filter / select)
Add support for
```
   df[df$name == "Smith", c(1,2)]
   df[df$age %in% c(19, 30), 1:2]
```

shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #8394 from felixcheung/rsubset.
2015-08-25 23:48:16 -07:00
Sun Rui 71a138cd0e [SPARK-10048] [SPARKR] Support arbitrary nested Java array in serde.
This PR:
1. supports transferring arbitrary nested array from JVM to R side in SerDe;
2. based on 1, the collect() implementation is improved. Now it can support collecting data of complex types
   from a DataFrame.

Author: Sun Rui <rui.sun@intel.com>

Closes #8276 from sun-rui/SPARK-10048.
2015-08-25 13:14:10 -07:00
Yu ISHIKAWA d898c33f77 [SPARK-10106] [SPARKR] Add ifelse Column function to SparkR
### JIRA
[[SPARK-10106] Add `ifelse` Column function to SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10106)
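
An illustrative use, mirroring base R's vectorized ifelse (column name hypothetical):
```r
df2 <- select(df, ifelse(df$age > 21, "adult", "minor"))
```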

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8303 from yu-iskw/SPARK-10106.
2015-08-19 12:39:37 -07:00
Yu ISHIKAWA 2fcb9cb955 [SPARK-9856] [SPARKR] Add expression functions into SparkR whose params are complicated
I added lots of Column functions into SparkR. I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala, since we need such APIs for the R integer type.

### JIRA
[[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8264 from yu-iskw/SPARK-9856-3.
2015-08-19 10:41:14 -07:00
Yu ISHIKAWA bf32c1f7f4 [SPARK-10075] [SPARKR] Add when expression function in SparkR
- Add `when` and `otherwise` as `Column` methods
- Add `When` as an expression function
- Add `%otherwise%` infix as an alias of `otherwise`

Since R doesn't support a feature like method chaining, the `otherwise(when(condition, value), value)` style is a little annoying for me. If `%otherwise%` looks strange to shivaram, I can remove it. What do you think?
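
A hedged sketch of both spellings (column name hypothetical):
```r
# Plain nested style:
label <- otherwise(when(df$age > 21, "adult"), "minor")
# Proposed infix alias:
label <- when(df$age > 21, "adult") %otherwise% "minor"
```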

### JIRA
[[SPARK-10075] Add `when` expression function in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10075)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8266 from yu-iskw/SPARK-10075.
2015-08-18 20:27:36 -07:00
Yu ISHIKAWA 26e760581f [SPARK-9871] [SPARKR] Add expression functions into SparkR which have a variable parameter
### Summary

- Add `lit` function
- Add `concat`, `greatest`, `least` functions

I think we need to improve the `collect` function in order to implement the `struct` function, since `collect` doesn't work with arguments that include a nested `list` variable. It seems that a list passed to `struct` still has `jobj` classes. So it would be better to solve this problem in another issue.
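
Illustrative usage of the added functions (column names hypothetical):
```r
head(select(df, lit("x"), concat(df$first, lit(" "), df$last)))
head(select(df, greatest(df$a, df$b), least(df$a, df$b)))
```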

### JIRA
[[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8194 from yu-iskw/SPARK-9856.
2015-08-16 23:33:20 -07:00
Sun Rui 5f9ce738fe [SPARK-8844] [SPARKR] head/collect is broken in SparkR.
This is a WIP patch for SPARK-8844 for collecting reviews.

This bug is about reading an empty DataFrame. In readCol(),
```
lapply(1:numRows, function(x) {
```
does not take into consideration the case where numRows = 0.

Will add a unit test case.
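
The underlying R gotcha, sketched (the usual fix is `seq_len`):
```r
numRows <- 0
1:numRows                # c(1, 0): iterates twice, not zero times
seq_len(numRows)         # integer(0): a genuinely empty sequence
lapply(seq_len(numRows), function(x) x)  # zero iterations as intended
```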

Author: Sun Rui <rui.sun@intel.com>

Closes #7419 from sun-rui/SPARK-8844.
2015-08-16 00:30:02 -07:00
Yu ISHIKAWA f4bc01f1f3 [SPARK-9855] [SPARKR] Add expression functions into SparkR whose params are simple
I added lots of expression functions for SparkR. This PR includes only functions whose params are `(Column)` or `(Column, Column)`. I think we need to improve how we test those functions, but it would be better to work on that in another issue.

## Diff Summary

- Add lots of functions in `functions.R` and their generic in `generic.R`
- Add aliases for `ceiling` and `sign`
- Move expression functions from `column.R` to `functions.R`
- Modify `rdname` from `column` to `functions`

I haven't supported the `not` function, because the name collides with the `testthat` package and I couldn't think of a way to define it.

## New Supported Functions

```
approxCountDistinct
ascii
base64
bin
bitwiseNOT
ceil (alias: ceiling)
crc32
dayofmonth
dayofyear
explode
factorial
hex
hour
initcap
isNaN
last_day
length
log2
ltrim
md5
minute
month
negate
quarter
reverse
round
rtrim
second
sha1
signum (alias: sign)
size
soundex
to_date
trim
unbase64
unhex
weekofyear
year

datediff
levenshtein
months_between
nanvl
pmod
```

## JIRA
[[SPARK-9855] Add expression functions into SparkR whose params are simple - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9855)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8123 from yu-iskw/SPARK-9855.
2015-08-12 18:33:27 -07:00
Burak Yavuz c9a4c36d05 [SPARK-8313] R Spark packages support
shivaram cafreeman Could you please help me test this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason I can't get it to work during Spark Submit. It just starts relaunching Spark Submit.

For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #7139 from brkyvz/r-submit and squashes the following commits:

0de384f [Burak Yavuz] remove unused imports 2
d253708 [Burak Yavuz] removed unused imports
6603d0d [Burak Yavuz] addressed comments
4258ffe [Burak Yavuz] merged master
ddfcc06 [Burak Yavuz] added zipping test
3a1be7d [Burak Yavuz] don't zip
77995df [Burak Yavuz] fix URI
ac45527 [Burak Yavuz] added zipping of all libs
e6bf7b0 [Burak Yavuz] add println ignores
1bc5554 [Burak Yavuz] add assumes for tests
9778e03 [Burak Yavuz] addressed comments
b42b300 [Burak Yavuz] merged master
ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
d867756 [Burak Yavuz] add apache header
eff5ba1 [Burak Yavuz] ready for review
8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
e5b5a06 [Burak Yavuz] added doc
bb751ce [Burak Yavuz] fix null bug
0226768 [Burak Yavuz] fixed issues
8810beb [Burak Yavuz] R packages support
2015-08-04 18:20:12 -07:00
Hossein 712f5b7a9a [SPARK-9318] [SPARK-9320] [SPARKR] Aliases for merge and summary functions on DataFrames
This PR adds synonyms for ```merge``` and ```summary``` in SparkR DataFrame API.

cc shivaram

Author: Hossein <hossein@databricks.com>

Closes #7806 from falaki/SPARK-9320 and squashes the following commits:

72600f7 [Hossein] Updated docs
92a6e75 [Hossein] Fixed merge generic signature issue
4c2b051 [Hossein] Fixing naming with mllib summary
0f3a64c [Hossein] Added ... to generic for merge
30fbaf8 [Hossein] Merged master
ae1a4cf [Hossein] Merge branch 'master' into SPARK-9320
e8eb86f [Hossein] Add a generic for merge
fc01f2d [Hossein] Added unit test
8d92012 [Hossein] Added merge as an alias for join
5b8bedc [Hossein] Added unit test
632693d [Hossein] Added summary as an alias for describe for DataFrame
2015-07-31 19:24:44 -07:00
Hossein 710c2b5dd2 [SPARK-9324] [SPARK-9322] [SPARK-9321] [SPARKR] Some aliases for R-like functions in DataFrames
Adds following aliases:
* unique (distinct)
* rbind (unionAll): accepts many DataFrames
* nrow (count)
* ncol
* dim
* names (columns): along with the replacement function to change names

Author: Hossein <hossein@databricks.com>

Closes #7764 from falaki/sparkR-alias and squashes the following commits:

56016f5 [Hossein] Updated R documentation
5e4a4d0 [Hossein] Removed extra code
f51cbef [Hossein] Merge branch 'master' into sparkR-alias
c1b88bd [Hossein] Moved setGeneric and other comments applied
d9307f8 [Hossein] Added tests
b5aa988 [Hossein] Added dim, ncol, nrow, names, rbind, and unique functions to DataFrames
2015-07-31 14:08:18 -07:00
Shivaram Venkataraman 82f47b8116 [SPARK-9510] [SPARKR] Remaining SparkR style fixes
With the change in this patch, I get no more warnings from `./dev/lint-r` in my machine

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #7834 from shivaram/sparkr-style-fixes and squashes the following commits:

716cd8e [Shivaram Venkataraman] Remaining SparkR style fixes
2015-07-31 14:02:44 -07:00
Yu ISHIKAWA fc0e57e5ab [SPARK-9053] [SPARKR] Fix spaces around parens, infix operators etc.
### JIRA
[[SPARK-9053] Fix spaces around parens, infix operators etc. - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9053)

### The Result of `lint-r`
[The result of lint-r at revision a4c83cb1e4b066cd60264b6572fd3e51d160d26a](https://gist.github.com/yu-iskw/d253d7f8ef351f86443d)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #7584 from yu-iskw/SPARK-9053 and squashes the following commits:

613170f [Yu ISHIKAWA] Ignore a warning about a space before a left parentheses
ede61e1 [Yu ISHIKAWA] Ignores two warnings about a space before a left parentheses. TODO: After updating `lintr`, we will remove the ignores
de3e0db [Yu ISHIKAWA] Add '## nolint start' & '## nolint end' statement to ignore infix space warnings
e233ea8 [Yu ISHIKAWA] [SPARK-9053][SparkR] Fix spaces around parens, infix operators etc.
2015-07-31 09:33:38 -07:00
Hossein 157840d1b1 [SPARK-8742] [SPARKR] Improve SparkR error messages for DataFrame API
This patch improves SparkR error message reporting, especially with DataFrame API. When there is a user error (e.g., malformed SQL query), the message of the cause is sent back through the RPC and the R client reads it and returns it back to user.

cc shivaram

Author: Hossein <hossein@databricks.com>

Closes #7742 from falaki/SPARK-8742 and squashes the following commits:

4f643c9 [Hossein] Not logging exceptions in RBackendHandler
4a8005c [Hossein] Returning stack track of causing exception from RBackendHandler
5cf17f0 [Hossein] Adding unit test for error messages from SQLContext
2af75d5 [Hossein] Reading error message in case of failure and stoping with that message
f479c99 [Hossein] Wrting exception cause message in JVM
2015-07-30 16:16:17 -07:00
Eric Liang e7905a9395 [SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormula
Preview:

```
> summary(m)
            features coefficients
1        (Intercept)    1.6765001
2       Sepal_Length    0.3498801
3 Species.versicolor   -0.9833885
4  Species.virginica   -1.0075104

```

Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit

cc mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #7771 from ericl/summary and squashes the following commits:

ccd54c3 [Eric Liang] second pass
a5ca93b [Eric Liang] comments
2772111 [Eric Liang] clean up
70483ef [Eric Liang] fix test
7c247d4 [Eric Liang] Merge branch 'master' into summary
3c55024 [Eric Liang] working
8c539aa [Eric Liang] first pass
2015-07-30 16:15:43 -07:00
Yuu ISHIKAWA 7492a33fdd [SPARK-9248] [SPARKR] Closing curly-braces should always be on their own line
### JIRA
[[SPARK-9248] Closing curly-braces should always be on their own line - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9248)

## The result of `dev/lint-r`
[The result of `dev/lint-r` for SPARK-9248 at revision 6175d6cfe795fbd88e3ee713fac375038a3993a8](https://gist.github.com/yu-iskw/96cadcea4ce664c41f81)

Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #7795 from yu-iskw/SPARK-9248 and squashes the following commits:

c8eccd3 [Yuu ISHIKAWA] [SPARK-9248][SparkR] Closing curly-braces should always be on their own line
2015-07-30 10:00:27 -07:00
Eric Liang 8d5bb5283c [SPARK-9391] [ML] Support minus, dot, and intercept operators in SparkR RFormula
Adds '.', '-', and intercept parsing to RFormula. Also splits RFormulaParser into a separate file.

Umbrella design doc here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #7707 from ericl/string-features-2 and squashes the following commits:

8588625 [Eric Liang] exclude complex types for .
8106ffe [Eric Liang] comments
a9350bb [Eric Liang] s/var/val
9c50d4d [Eric Liang] Merge branch 'string-features' into string-features-2
581afb2 [Eric Liang] Merge branch 'master' into string-features
08ae539 [Eric Liang] Merge branch 'string-features' into string-features-2
f99131a [Eric Liang] comments
cecec43 [Eric Liang] Merge branch 'string-features' into string-features-2
0bf3c26 [Eric Liang] update docs
4592df2 [Eric Liang] intercept supports
7412a2e [Eric Liang] Fri Jul 24 14:56:51 PDT 2015
3cf848e [Eric Liang] fix the parser
0556c2b [Eric Liang] Merge branch 'string-features' into string-features-2
c302a2c [Eric Liang] fix tests
9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features
e713da3 [Eric Liang] comments
cd231a9 [Eric Liang] Wed Jul 22 17:18:44 PDT 2015
4d79193 [Eric Liang] revert to seq + distinct
169a085 [Eric Liang] tweak functional test
a230a47 [Eric Liang] Merge branch 'master' into string-features
72bd6f3 [Eric Liang] fix merge
d841cec [Eric Liang] Merge branch 'master' into string-features
5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015
b01c7c5 [Eric Liang] add test
8a637db [Eric Liang] encoder wip
a1d03f4 [Eric Liang] refactor into estimator
2015-07-28 14:16:57 -07:00
trestletech 6143234062 Use vector-friendly comparison for packages argument.
Otherwise, `sparkR.init()` with multiple `sparkPackages` results in this warning:

```
Warning message:
In if (packages != "") { :
  the condition has length > 1 and only the first element will be used
```

Author: trestletech <jeff.allen@trestletechnology.net>

Closes #7701 from trestletech/compare-packages and squashes the following commits:

72c8b36 [trestletech] Correct function name.
c52db0e [trestletech] Added test for multiple packages.
3aab1a7 [trestletech] Use vector-friendly comparison for packages argument.
2015-07-28 10:45:19 -07:00
Eric Liang 8ddfa52c20 [SPARK-9230] [ML] Support StringType features in RFormula
This adds StringType feature support via OneHotEncoder. As part of this task it was necessary to change RFormula to an Estimator, so that factor levels could be determined from the training dataset.

Not sure if I am using uids correctly here, would be good to get reviewer help on that.
cc mengxr

Umbrella design doc: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit#

Author: Eric Liang <ekl@databricks.com>

Closes #7574 from ericl/string-features and squashes the following commits:

f99131a [Eric Liang] comments
0bf3c26 [Eric Liang] update docs
c302a2c [Eric Liang] fix tests
9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features
e713da3 [Eric Liang] comments
4d79193 [Eric Liang] revert to seq + distinct
169a085 [Eric Liang] tweak functional test
a230a47 [Eric Liang] Merge branch 'master' into string-features
72bd6f3 [Eric Liang] fix merge
d841cec [Eric Liang] Merge branch 'master' into string-features
5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015
b01c7c5 [Eric Liang] add test
8a637db [Eric Liang] encoder wip
a1d03f4 [Eric Liang] refactor into estimator
2015-07-27 17:17:49 -07:00
Xiangrui Meng 2f5cbd860e [SPARK-8364] [SPARKR] Add crosstab to SparkR DataFrames
Add `crosstab` to SparkR DataFrames; it takes two column names and returns a local R data.frame. This is similar to `table` in R. However, `table` in SparkR is used for loading SQL tables as DataFrames. The return type is data.frame instead of table so that `crosstab` is compatible with Scala/Python.
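
An illustrative call (column names hypothetical):
```r
# Returns a local R data.frame of pairwise value counts:
ct <- crosstab(df, "cyl", "gear")
```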

I couldn't run the R tests successfully on my local machine; many unit tests failed. So let's try Jenkins.

Author: Xiangrui Meng <meng@databricks.com>

Closes #7318 from mengxr/SPARK-8364 and squashes the following commits:

d75e894 [Xiangrui Meng] fix tests
53f6ddd [Xiangrui Meng] fix tests
f1348d6 [Xiangrui Meng] update test
47cb088 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8364
5621262 [Xiangrui Meng] first version without test
2015-07-22 21:40:23 -07:00
Eric Liang 1cbdd89918 [SPARK-9201] [ML] Initial integration of MLlib + SparkR using RFormula
This exposes the SparkR:::glm() and SparkR:::predict() APIs. It was necessary to change RFormula to silently drop the label column if it was missing from the input dataset, which is kind of a hack but necessary to integrate with the Pipeline API.

The umbrella design doc for MLlib + SparkR integration can be viewed here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #7483 from ericl/spark-8774 and squashes the following commits:

3dfac0c [Eric Liang] update
17ef516 [Eric Liang] more comments
1753a0f [Eric Liang] make glm generic
b0f50f8 [Eric Liang] equivalence test
550d56d [Eric Liang] export methods
c015697 [Eric Liang] second pass
117949a [Eric Liang] comments
5afbc67 [Eric Liang] test label columns
6b7f15f [Eric Liang] Fri Jul 17 14:20:22 PDT 2015
3a63ae5 [Eric Liang] Fri Jul 17 13:41:52 PDT 2015
ce61367 [Eric Liang] Fri Jul 17 13:41:17 PDT 2015
0299c59 [Eric Liang] Fri Jul 17 13:40:32 PDT 2015
e37603f [Eric Liang] Fri Jul 17 12:15:03 PDT 2015
d417d0c [Eric Liang] Merge remote-tracking branch 'upstream/master' into spark-8774
29a2ce7 [Eric Liang] Merge branch 'spark-8774-1' into spark-8774
d1959d2 [Eric Liang] clarify comment
2db68aa [Eric Liang] second round of comments
dc3c943 [Eric Liang] address comments
5765ec6 [Eric Liang] fix style checks
1f361b0 [Eric Liang] doc
d33211b [Eric Liang] r support
fb0826b [Eric Liang] [SPARK-8774] Add R model formula with basic support as a transformer
2015-07-20 20:49:38 -07:00
Yu ISHIKAWA 5a3c1ad087 [SPARK-9093] [SPARKR] Fix single-quotes strings in SparkR
[[SPARK-9093] Fix single-quotes strings in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9093)

This is the result of lintr at revision 011551620faa87107a787530f074af3d9be7e695:
[[SPARK-9093] The result of lintr at 011551620f](https://gist.github.com/yu-iskw/8c47acf3202796da4d01)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #7439 from yu-iskw/SPARK-9093 and squashes the following commits:

61c391e [Yu ISHIKAWA] [SPARK-9093][SparkR] Fix single-quotes strings in SparkR
2015-07-17 17:00:50 +09:00
Liang-Chi Hsieh 0a795336df [SPARK-8807] [SPARKR] Add between operator in SparkR
JIRA: https://issues.apache.org/jira/browse/SPARK-8807

Add between operator in SparkR.
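
Illustrative usage (sketch; column name hypothetical):
```r
# between(column, bounds) takes a length-2 lower/upper bounds vector:
head(filter(df, between(df$age, c(18, 65))))
```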

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #7356 from viirya/add_r_between and squashes the following commits:

7f51b44 [Liang-Chi Hsieh] Add test for non-numeric column.
c6a25c5 [Liang-Chi Hsieh] Add between function.
2015-07-15 23:36:57 -07:00
Liang-Chi Hsieh 6f6902597d [SPARK-8840] [SPARKR] Add float coercion on SparkR
JIRA: https://issues.apache.org/jira/browse/SPARK-8840

Currently the type coercion rules don't include float type. This PR simply adds it.

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #7280 from viirya/add_r_float_coercion and squashes the following commits:

c86dc0e [Liang-Chi Hsieh] For comments.
dbf0c1b [Liang-Chi Hsieh] Implicitly convert Double to Float based on provided schema.
733015a [Liang-Chi Hsieh] Add test case for DataFrame with float type.
30c2a40 [Liang-Chi Hsieh] Update test case.
52b5294 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_r_float_coercion
6f9159d [Liang-Chi Hsieh] Add another test case.
8db3244 [Liang-Chi Hsieh] schema also needs to support float. add test case.
0dcc992 [Liang-Chi Hsieh] Add float coercion on SparkR.
2015-07-15 09:48:33 -07:00
Sun Rui f650a005e0 [SPARK-8808] [SPARKR] Fix assignments in SparkR.
Author: Sun Rui <rui.sun@intel.com>

Closes #7395 from sun-rui/SPARK-8808 and squashes the following commits:

ce603bc [Sun Rui] Use '<-' instead of '='.
88590b1 [Sun Rui] Use '<-' instead of '='.
2015-07-14 22:21:01 -07:00
Sun Rui 7f487c8bde [SPARK-6797] [SPARKR] Add support for YARN cluster mode.
This PR enables SparkR to dynamically ship the SparkR binary package to the AM node in YARN cluster mode, thus it is no longer required that the SparkR package be installed on each worker node.

This PR uses the JDK jar tool to package the SparkR package, because jar is thought to be available on both Linux/Windows platforms where JDK has been installed.

This PR does not address the R worker involved in RDD API. Will address it in a separate JIRA issue.

This PR does not address SBT build. SparkR installation and packaging by SBT will be addressed in a separate JIRA issue.

R/install-dev.bat is not tested. shivaram, could you help to test it?

Author: Sun Rui <rui.sun@intel.com>

Closes #6743 from sun-rui/SPARK-6797 and squashes the following commits:

ca63c86 [Sun Rui] Adjust MimaExcludes after rebase.
7313374 [Sun Rui] Fix unit test errors.
72695fb [Sun Rui] Fix unit test failures.
193882f [Sun Rui] Fix Mima test error.
fe25a33 [Sun Rui] Fix Mima test error.
35ecfa3 [Sun Rui] Fix comments.
c38a005 [Sun Rui] Unzipped SparkR binary package is still required for standalone and Mesos modes.
b05340c [Sun Rui] Fix scala style.
2ca5048 [Sun Rui] Fix comments.
1acefd1 [Sun Rui] Fix scala style.
0aa1e97 [Sun Rui] Fix scala style.
41d4f17 [Sun Rui] Add support for locating SparkR package for R workers required by RDD APIs.
49ff948 [Sun Rui] Invoke jar.exe with full path in install-dev.bat.
7b916c5 [Sun Rui] Use 'rem' consistently.
3bed438 [Sun Rui] Add a comment.
681afb0 [Sun Rui] Fix a bug that RRunner does not handle client deployment modes.
cedfbe2 [Sun Rui] [SPARK-6797][SPARKR] Add support for YARN cluster mode.
2015-07-13 08:21:47 -07:00
Yu ISHIKAWA a0cb111b22 [SPARK-8549] [SPARKR] Fix the line length of SparkR
[[SPARK-8549] Fix the line length of SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8549)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #7204 from yu-iskw/SPARK-8549 and squashes the following commits:

6fb131a [Yu ISHIKAWA] Fix the typo
1737598 [Yu ISHIKAWA] [SPARK-8549][SparkR] Fix the line length of SparkR
2015-07-05 20:50:02 -07:00
Sun Rui 69c5dee2f0 [SPARK-7714] [SPARKR] SparkR tests should use more specific expectations than expect_true
1. Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'.
2. Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'.
3. Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'.
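
For instance (illustrative assertions):
```r
expect_equal(count(df), 3)              # was expect_true(count(df) == 3)
expect_is(df, "DataFrame")              # was expect_true(inherits(df, "DataFrame"))
expect_identical(columns(df), c("a", "b"))
```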

Author: Sun Rui <rui.sun@intel.com>

Closes #7152 from sun-rui/SPARK-7714 and squashes the following commits:

8ad2440 [Sun Rui] Fix test case errors.
8fe9f0c [Sun Rui] Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'.
f1b8005 [Sun Rui] Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'.
f631e94 [Sun Rui] Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'.
2015-07-01 09:50:12 -07:00
cafreeman 9d11817765 [SPARK-8607] SparkR -- jars not being added to application classpath correctly
Add `getStaticClass` method in SparkR's `RBackendHandler`

This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185).

cc shivaram

Author: cafreeman <cfreeman@alteryx.com>

Closes #7001 from cafreeman/branch-1.4 and squashes the following commits:

8f81194 [cafreeman] Add missing license
31aedcf [cafreeman] Refactor test to call an external R script
2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
0bea809 [cafreeman] Fixed relative path issue and added smaller JAR
ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
9a5c362 [cafreeman] test for including JAR when launching sparkContext
9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
5a80844 [cafreeman] Fix style nits
7c6bd0c [cafreeman] [SPARK-8607] SparkR

(cherry picked from commit 2579948bf5)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-26 17:06:16 -07:00
cafreeman a56516fc92 [SPARK-8662] SparkR Update SparkSQL Test
Test `infer_type` using a more fine-grained approach rather than comparing environments. Since `all.equal`'s behavior changed in R 3.2, the test could no longer pass.

JIRA here:
https://issues.apache.org/jira/browse/SPARK-8662

Author: cafreeman <cfreeman@alteryx.com>

Closes #7045 from cafreeman/R32_Test and squashes the following commits:

b97cc52 [cafreeman] Add `checkStructField` utility
3381e5c [cafreeman] Update SparkSQL Test

(cherry picked from commit 78b31a2a63)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-26 10:07:49 -07:00
Holden Karau 43e66192f4 [SPARK-8506] Add packages to R context created through init.
Author: Holden Karau <holden@pigscanfly.ca>

Closes #6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits:

b60dd63 [Holden Karau] Add an example with the spark-csv package
fa8bc92 [Holden Karau] typo: sparm -> spark
865a90c [Holden Karau] strip spaces for comparision
c7a4471 [Holden Karau] Add some documentation
c1a9233 [Holden Karau] refactor for testing
c818556 [Holden Karau] Add pakages to R
2015-06-24 11:55:20 -07:00
Alok Singh f2fb0285ab [SPARK-8111] [SPARKR] SparkR shell should display Spark logo and version banner on startup.
The Spark version is taken from the environment variable SPARK_VERSION.

Author: Alok  Singh <singhal@Aloks-MacBook-Pro.local>
Author: Alok  Singh <singhal@aloks-mbp.usca.ibm.com>

Closes #6944 from aloknsingh/aloknsingh_spark_jiras and squashes the following commits:

ed607bd [Alok  Singh] [SPARK-8111][SparkR] As per suggestion, 1) using the version from sparkContext rather than the Sys.env. 2) change "Welcome to SparkR!" to "Welcome to" followed by Spark logo and version
acd5b85 [Alok  Singh] fix the jira SPARK-8111 to add the spark version and logo. Currently spark version is taken from the environment variable SPARK_VERSION
2015-06-23 12:47:55 -07:00
Yu ISHIKAWA d4f633514a [SPARK-8431] [SPARKR] Add in operator to DataFrame Column in SparkR
[[SPARK-8431] Add in operator to DataFrame Column in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8431)
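
Illustrative usage of the new operator (column name hypothetical):
```r
head(filter(df, df$age %in% c(19, 30)))
```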

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #6941 from yu-iskw/SPARK-8431 and squashes the following commits:

1f64423 [Yu ISHIKAWA] Modify the comment
f4309a7 [Yu ISHIKAWA] Make a `setMethod` for `%in%` be independent
6e37936 [Yu ISHIKAWA] Modify a variable name
c196173 [Yu ISHIKAWA] [SPARK-8431][SparkR] Add in operator to DataFrame Column in SparkR
2015-06-22 23:04:36 -07:00
Yu ISHIKAWA 44fa7df64d [SPARK-8548] [SPARKR] Remove the trailing whitespaces from the SparkR files
[[SPARK-8548] Remove the trailing whitespaces from the SparkR files - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8548)

- This is the result of `lint-r`
    https://gist.github.com/yu-iskw/0019b37a2c1167f33986

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #6945 from yu-iskw/SPARK-8548 and squashes the following commits:

0bd567a [Yu ISHIKAWA] [SPARK-8548][SparkR] Remove the trailing whitespaces from the SparkR files
2015-06-22 20:55:38 -07:00
Hossein 1fa29c2df2 [SPARK-8452] [SPARKR] expose jobGroup API in SparkR
This pull request adds the following methods to SparkR:

```R
setJobGroup()
cancelJobGroup()
clearJobGroup()
```
For each method, the Spark context is passed as the first argument. There does not seem to be a good way to test these in R.
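
A hedged usage sketch, with the Spark context as the first argument per the description:
```r
setJobGroup(sc, "analysis-1", "exploratory queries", TRUE)  # interruptOnCancel
cancelJobGroup(sc, "analysis-1")
clearJobGroup(sc)
```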

cc shivaram and davies

Author: Hossein <hossein@databricks.com>

Closes #6889 from falaki/SPARK-8452 and squashes the following commits:

9ce9f1e [Hossein] Added basic tests to verify methods can be called and won't throw errors
c706af9 [Hossein] Added examples
a2c19af [Hossein] taking spark context as first argument
343ca77 [Hossein] Added setJobGroup, cancelJobGroup and clearJobGroup to SparkR
2015-06-19 15:51:59 -07:00
hqzizania a5c52c1a34 [SPARK-6820] [SPARKR] Convert NAs to null type in SparkR DataFrames
Author: hqzizania <qian.huang@intel.com>

Closes #6190 from hqzizania/R and squashes the following commits:

1641f9e [hqzizania] fixes and add test units
bb3411a [hqzizania] Convert NAs to null type in SparkR DataFrames
2015-06-08 21:40:12 -07:00
Shivaram Venkataraman 12f5eaeee1 [SPARK-8085] [SPARKR] Support user-specified schema in read.df
cc davies sun-rui

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6620 from shivaram/sparkr-read-schema and squashes the following commits:

16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test
a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader
ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df
2015-06-05 10:19:03 -07:00
Sun Rui 708c63bbbe [SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
Author: Sun Rui <rui.sun@intel.com>

Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits:

51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
2015-06-03 11:56:35 -07:00
Sun Rui 46576ab303 [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.
Author: Sun Rui <rui.sun@intel.com>

Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:

dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.
2015-05-31 15:01:59 -07:00
Davies Liu 7af3818c6b [SPARK-6806] [SPARKR] [DOCS] Fill in SparkR examples in programming guide
sqlCtx -> sqlContext

You can check the docs by:

```
$ cd docs
$ SKIP_SCALADOC=1 jekyll serve
```
cc shivaram

Author: Davies Liu <davies@databricks.com>

Closes #5442 from davies/r_docs and squashes the following commits:

7a12ec6 [Davies Liu] remove rdd in R docs
8496b26 [Davies Liu] remove the docs related to RDD
e23b9d6 [Davies Liu] delete R docs for RDD API
222e4ff [Davies Liu] Merge branch 'master' into r_docs
89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
f0a10e1 [Davies Liu] address comments from @shivaram
f61de71 [Davies Liu] Update pairRDD.R
3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
2f10a77 [Davies Liu] address comments from @cafreeman
9c2a062 [Davies Liu] mention R api together with Python API
23f751a [Davies Liu] Fill in SparkR examples in programming guide
2015-05-23 00:01:40 -07:00
Josh Rosen c9fa870a6d [SPARK-7687] [SQL] DataFrame.describe() should cast all aggregates to String
In `DataFrame.describe()`, the `count` aggregate produces an integer, the `avg` and `stdev` aggregates produce doubles, and `min` and `max` aggregates can produce varying types depending on what type of column they're applied to.  As a result, we should cast all aggregate results to String so that `describe()`'s output types match its declared output schema.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6218 from JoshRosen/SPARK-7687 and squashes the following commits:

146b615 [Josh Rosen] Fix R test.
2974bd5 [Josh Rosen] Cast to string type instead
f206580 [Josh Rosen] Cast to double to fix SPARK-7687
307ecbf [Josh Rosen] Add failing regression test for SPARK-7687
2015-05-18 21:53:44 -07:00
qhuang 50da9e8916 [SPARK-7226] [SPARKR] Support math functions in R DataFrame
Author: qhuang <qian.huang@intel.com>

Closes #6170 from hqzizania/master and squashes the following commits:

f20c39f [qhuang] add tests units and fixes
2a7d121 [qhuang] use a function name more familiar to R users
07aa72e [qhuang] Support math functions in R DataFrame
2015-05-15 14:06:16 -07:00
Sun Rui df9b94a57c [SPARK-7482] [SPARKR] Rename some DataFrame API methods in SparkR to match their counterparts in Scala.
Author: Sun Rui <rui.sun@intel.com>

Closes #6007 from sun-rui/SPARK-7482 and squashes the following commits:

5c5cf5e [Sun Rui] Implement alias loadDF() as a new function.
3a30c10 [Sun Rui] Rename load()/save() to read.df()/write.df(). Also add loadDF()/saveDF() as aliases.
9f569d6 [Sun Rui] [SPARK-7482][SparkR] Rename some DataFrame API methods in SparkR to match their counterparts in Scala.
2015-05-12 23:52:30 -07:00
Joshi b94a93371c [SPARK-7435] [SPARKR] Make DataFrame.show() consistent with that of Scala and pySpark
Author: Joshi <rekhajoshm@gmail.com>
Author: Rekha Joshi <rekhajoshm@gmail.com>

Closes #5989 from rekhajoshm/fix/SPARK-7435 and squashes the following commits:

cfc9e02 [Joshi] Spark-7435[R]: updated patch for review comments
62becc1 [Joshi] SPARK-7435: Update to DataFrame
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
2015-05-11 21:04:32 -07:00
Shivaram Venkataraman 0a901dd3a1 [SPARK-7231] [SPARKR] Changes to make SparkR DataFrame dplyr friendly.
Changes include
1. Rename sortDF to arrange
2. Add new aliases `group_by` and `sample_frac`, `summarize`
3. Add more user friendly column addition (mutate), rename
4. Support mean as an alias for avg in Scala and also support n_distinct, n as in dplyr

Using these changes we can pretty much run the examples as described in http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html with the same syntax

The only thing missing in SparkR is auto-resolving column names when used in an expression, i.e. making something like `select(flights, delay)` work as it does in dplyr; right now we need `select(flights, flights$delay)` or `select(flights, "delay")`. But this is a complicated change and I'll file a new issue for it.
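
A hedged sketch of the renamed/aliased API in use (column names hypothetical):
```r
df2 <- arrange(df, df$delay)                 # formerly sortDF
g   <- group_by(df, df$carrier)              # alias for groupBy
summarize(g, avg_delay = mean(df$delay))     # mean as an alias for avg
df3 <- mutate(df, gain = df$arr_delay - df$dep_delay)
```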

cc sun-rui rxin

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6005 from shivaram/sparkr-df-api and squashes the following commits:

5e0716a [Shivaram Venkataraman] Fix some roxygen bugs
1254953 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into sparkr-df-api
0521149 [Shivaram Venkataraman] Changes to make SparkR DataFrame dplyr friendly. Changes include 1. Rename sortDF to arrange 2. Add new aliases `group_by` and `sample_frac`, `summarize` 3. Add more user friendly column addition (mutate), rename 4. Support mean as an alias for avg in Scala and also support n_distinct, n as in dplyr
2015-05-08 18:29:57 -07:00
Sun Rui 9cfa9a516e [SPARK-6812] [SPARKR] filter() on DataFrame does not work as expected.
According to the R manual: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html,
" if a function .First is found on the search path, it is executed as .First(). Finally, function .First.sys() in the base package is run. This calls require to attach the default packages specified by options("defaultPackages")."
In .First() in profile/shell.R, we load the SparkR package. This means SparkR is loaded before the default packages, so if a default package defines the same name, it masks the one in SparkR. This is why filter() in SparkR is masked by filter() in stats, which is usually in the default package list.
We need to make sure SparkR is loaded after the default packages. The solution is to append SparkR to the default packages, instead of loading it in .First().
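A hedged illustration of the masking problem and the `::` escape hatch (uses the built-in `faithful` dataset; assumes an initialized `sqlContext`):

```
df <- createDataFrame(sqlContext, faithful)
filter(df, df$waiting > 70)          # breaks if stats::filter() masks SparkR's filter()
SparkR::filter(df, df$waiting > 70)  # explicit namespace always resolves correctly
```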

BTW, I'd like to discuss our policy on how to resolve name conflicts. Previously, we renamed API names from the Scala API if there was a name conflict with base or other commonly-used packages. However, from a long-term perspective this is not good for API stability, because we can't predict name conflicts: what if, in the future, a name added to the base package conflicts with an API in SparkR? So the better policy is to keep API names the same as Scala's without worrying about name conflicts. When users use SparkR, they should load SparkR as the last package, so that all API names take effect. Users can explicitly use :: to refer to hidden names from other packages. If we agree on this, I can submit a JIRA issue to change back some renamed API methods, for example, DataFrame.sortDF().

Author: Sun Rui <rui.sun@intel.com>

Closes #5938 from sun-rui/SPARK-6812 and squashes the following commits:

b569145 [Sun Rui] [SPARK-6812][SparkR] filter() on DataFrame does not work as expected.
2015-05-06 22:48:16 -07:00
qhuang a466944399 [SPARK-6841] [SPARKR] add support for mean, median, stdev etc.
Moving here from https://github.com/amplab-extras/SparkR-pkg/pull/241
sum() has been implemented. (https://github.com/amplab-extras/SparkR-pkg/pull/242)

Now Phase 1 (mean, sd, var) has been implemented, but some things still need to be improved following the suggestions in https://issues.apache.org/jira/browse/SPARK-6841
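A minimal sketch of the describe() API added here (the `mtcars` columns are just an example; assumes an initialized `sqlContext`):

```
df <- createDataFrame(sqlContext, mtcars)
showDF(describe(df, "mpg", "wt"))   # count, mean, stddev, min, max per named column
```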

Author: qhuang <qian.huang@intel.com>

Closes #5446 from hqzizania/R and squashes the following commits:

f283572 [qhuang] add test unit for describe()
2e74d5a [qhuang] add describe() DataFrame API
2015-05-05 20:39:56 -07:00
Shivaram Venkataraman c688e3c5e4 [SPARK-7230] [SPARKR] Make RDD private in SparkR.
This change makes the RDD API private in SparkR; all internal uses of the SparkR API now go through SparkR::: to access private functions.
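A hedged sketch of what this means for callers (the specific RDD functions shown are illustrative; assumes an initialized `sc`):

```
# The public DataFrame API is unchanged; RDD functions now need the ::: operator
rdd <- SparkR:::parallelize(sc, 1:10, 2L)
SparkR:::reduce(rdd, "+")   # 55
```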

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #5895 from shivaram/rrdd-private and squashes the following commits:

bdb2f07 [Shivaram Venkataraman] Make RDD private in SparkR. This change also makes all internal uses of the SparkR API use SparkR::: to access private functions
2015-05-05 14:40:33 -07:00
云峤 f32e69ecc3 [SPARK-7319][SQL] Improve the output from DataFrame.show()
Author: 云峤 <chensong.cs@alibaba-inc.com>

Closes #5865 from kaka1992/df.show and squashes the following commits:

c79204b [云峤] Update
a1338f6 [云峤] Update python dataFrame show test and add empty df unit test.
734369c [云峤] Update python dataFrame show test and add empty df unit test.
84aec3e [云峤] Update python dataFrame show test and add empty df unit test.
159b3d5 [云峤] update
03ef434 [云峤] update
7394fd5 [云峤] update test show
ced487a [云峤] update pep8
b6e690b [云峤] Merge remote-tracking branch 'upstream/master' into df.show
30ac311 [云峤] [SPARK-7294] ADD BETWEEN
7d62368 [云峤] [SPARK-7294] ADD BETWEEN
baf839b [云峤] [SPARK-7294] ADD BETWEEN
d11d5b9 [云峤] [SPARK-7294] ADD BETWEEN
2015-05-04 12:08:38 -07:00
hlin09 ca9f4ebb8e [SPARK-6991] [SPARKR] Adds support for zipPartitions.
Author: hlin09 <hlin09pu@gmail.com>

Closes #5568 from hlin09/zipPartitions and squashes the following commits:

12c08a5 [hlin09] Fix comments
d2d32db [hlin09] Merge branch 'master' into zipPartitions
ec56d2f [hlin09] Fix test.
27655d3 [hlin09] Adds support for zipPartitions.
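A hedged sketch of the new API (the partition counts and combining function are made up; assumes an initialized `sc`):

```
rdd1 <- parallelize(sc, 1:4, 2L)   # partitions: 1:2 and 3:4
rdd2 <- parallelize(sc, 5:8, 2L)   # partitions: 5:6 and 7:8
zipped <- zipPartitions(rdd1, rdd2,
                        func = function(x, y) list(sum(unlist(x)) + sum(unlist(y))))
collect(zipped)   # list(14, 22): per-partition sums across both RDDs
```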
2015-04-27 15:04:37 -07:00
Jeff Harrison 7078f6028b [SPARK-6856] [R] Make RDD information more useful in SparkR
Author: Jeff Harrison <jeffrharrison@gmail.com>

Closes #5667 from His-name-is-Joof/joofspark and squashes the following commits:

f8814a6 [Jeff Harrison] newline added after RDD show() output
4d9d972 [Jeff Harrison] Merge branch 'master' into joofspark
9d2295e [Jeff Harrison] parallelize with 1:10
878b830 [Jeff Harrison] Merge branch 'master' into joofspark
c8c0b80 [Jeff Harrison] add test for RDD function show()
123be65 [Jeff Harrison] SPARK-6856
2015-04-27 13:38:25 -07:00
Sun Rui ebb77b2aff [SPARK-7033] [SPARKR] Clean usage of split. Use partition instead where applicable.
Author: Sun Rui <rui.sun@intel.com>

Closes #5628 from sun-rui/SPARK-7033 and squashes the following commits:

046bc9e [Sun Rui] Clean split usage in tests.
d531c86 [Sun Rui] [SPARK-7033][SPARKR] Clean usage of split. Use partition instead where applicable.
2015-04-24 11:00:19 -07:00
Sun Rui 73db132bf5 [SPARK-6818] [SPARKR] Support column deletion in SparkR DataFrame API.
Author: Sun Rui <rui.sun@intel.com>

Closes #5655 from sun-rui/SPARK-6818 and squashes the following commits:

7c66570 [Sun Rui] [SPARK-6818][SPARKR] Support column deletion in SparkR DataFrame API.
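A minimal sketch of the deletion syntax (uses the built-in `faithful` dataset; assumes an initialized `sqlContext`):

```
df <- createDataFrame(sqlContext, faithful)
df$waiting <- NULL   # delete a column, mirroring base R data.frame semantics
columns(df)          # "eruptions"
```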
2015-04-23 16:08:14 -07:00
cafreeman 59e206deb7 [SPARK-6807] [SparkR] Merge recent SparkR-pkg changes
This PR pulls in recent changes in SparkR-pkg, including

cartesian, intersection, sampleByKey, subtract, subtractByKey, except, and some APIs for StructType and StructField.
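A hedged sketch of the schema API pulled in here (the field names and types are illustrative; assumes an initialized `sqlContext`):

```
schema <- structType(structField("name", "string"),
                     structField("age", "integer"))
df <- createDataFrame(sqlContext, list(list("Alice", 25L)), schema)
```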

Author: cafreeman <cfreeman@alteryx.com>
Author: Davies Liu <davies@databricks.com>
Author: Zongheng Yang <zongheng.y@gmail.com>
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Sun Rui <rui.sun@intel.com>

Closes #5436 from davies/R3 and squashes the following commits:

c2b09be [Davies Liu] SQLTypes -> schema
a5a02f2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into R3
168b7fe [Davies Liu] sort generics
b1fe460 [Davies Liu] fix conflict in README.md
e74c04e [Davies Liu] fix schema.R
4f5ac09 [Davies Liu] Merge branch 'master' of github.com:apache/spark into R5
41f8184 [Davies Liu] rm man
ae78312 [Davies Liu] Merge pull request #237 from sun-rui/SPARKR-154_3
1bdcb63 [Zongheng Yang] Updates to README.md.
5a553e7 [cafreeman] Use object attribute instead of argument
71372d9 [cafreeman] Update docs and examples
8526d2e [cafreeman] Remove `tojson` functions
6ef5f2d [cafreeman] Fix spacing
7741d66 [cafreeman] Rename the SQL DataType function
141efd8 [Shivaram Venkataraman] Merge pull request #245 from hqzizania/upstream
9387402 [Davies Liu] fix style
40199eb [Shivaram Venkataraman] Move except into sorted position
07d0dbc [Sun Rui] [SPARKR-244] Fix test failure after integration of subtract() and subtractByKey() for RDD.
7e8caa3 [Shivaram Venkataraman] Merge pull request #246 from hlin09/fixCombineByKey
ed66c81 [cafreeman] Update `subtract` to work with `generics.R`
f3ba785 [cafreeman] Fixed duplicate export
275deb4 [cafreeman] Update `NAMESPACE` and tests
1a3b63d [cafreeman] new version of `CreateDF`
836c4bf [cafreeman] Update `createDataFrame` and `toDF`
be5d5c1 [cafreeman] refactor schema functions
40338a4 [Zongheng Yang] Merge pull request #244 from sun-rui/SPARKR-154_5
20b97a6 [Zongheng Yang] Merge pull request #234 from hqzizania/assist
ba54e34 [Shivaram Venkataraman] Merge pull request #238 from sun-rui/SPARKR-154_4
c9497a3 [Shivaram Venkataraman] Merge pull request #208 from lythesia/master
b317aa7 [Zongheng Yang] Merge pull request #243 from hqzizania/master
136a07e [Zongheng Yang] Merge pull request #242 from hqzizania/stats
cd66603 [cafreeman] new line at EOF
8b76e81 [Shivaram Venkataraman] Merge pull request #233 from redbaron/fail-early-on-missing-dep
7dd81b7 [cafreeman] Documentation
0e2a94f [cafreeman] Define functions for schema and fields
2015-04-17 13:42:19 -07:00
hlin09 d7f2c19867 [SPARK-6881][SparkR] Changes the checkpoint directory name.
Author: hlin09 <hlin09pu@gmail.com>

Closes #5493 from hlin09/fixCheckpointDir and squashes the following commits:

e67fc40 [hlin09] Change to temp dir.
1f7ed9e [hlin09] Change the checkpoint dir name.
2015-04-13 16:53:50 -07:00