ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Yanbo Liang	83af297ac4	[SPARK-13925][ML][SPARKR] Expose R-like summary statistics in SparkR::glm for more family and link functions ## What changes were proposed in this pull request? Expose R-like summary statistics in SparkR::glm for more family and link functions. Note: Not all values in R [summary.glm](http://stat.ethz.ch/R-manual/R-patched/library/stats/html/summary.glm.html) are exposed, we only provide the most commonly used statistics in this PR. More statistics can be added in the followup work. ## How was this patch tested? Unit tests. SparkR Output: ``` Deviance Residuals: (Note: These are approximate quantiles with relative error <= 0.01) Min 1Q Median 3Q Max -0.95096 -0.16585 -0.00232 0.17410 0.72918 Coefficients: Estimate Std. Error t value Pr(>\|t\|) (Intercept) 1.6765 0.23536 7.1231 4.4561e-11 Sepal_Length 0.34988 0.046301 7.5566 4.1873e-12 Species_versicolor -0.98339 0.072075 -13.644 0 Species_virginica -1.0075 0.093306 -10.798 0 (Dispersion parameter for gaussian family taken to be 0.08351462) Null deviance: 28.307 on 149 degrees of freedom Residual deviance: 12.193 on 146 degrees of freedom AIC: 59.22 Number of Fisher Scoring iterations: 1 ``` R output: ``` Deviance Residuals: Min 1Q Median 3Q Max -0.95096 -0.16522 0.00171 0.18416 0.72918 Coefficients: Estimate Std. Error t value Pr(>\|t\|) (Intercept) 1.67650 0.23536 7.123 4.46e-11 * Sepal.Length 0.34988 0.04630 7.557 4.19e-12 * Speciesversicolor -0.98339 0.07207 -13.644 < 2e-16 * Speciesvirginica -1.00751 0.09331 -10.798 < 2e-16 * --- Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for gaussian family taken to be 0.08351462) Null deviance: 28.307 on 149 degrees of freedom Residual deviance: 12.193 on 146 degrees of freedom AIC: 59.217 Number of Fisher Scoring iterations: 2 ``` cc mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #12393 from yanboliang/spark-13925.	2016-04-15 08:23:51 -07:00
Burak Yavuz	1146c534d6	[SPARK-14353] Dataset Time Window `window` API for R ## What changes were proposed in this pull request? The `window` function was added to Dataset with [this PR](https://github.com/apache/spark/pull/12008). This PR adds the R API for this function. With this PR, SQL, Java, and Scala will share the same APIs as in users can use: - `window(timeColumn, windowDuration)` - `window(timeColumn, windowDuration, slideDuration)` - `window(timeColumn, windowDuration, slideDuration, startTime)` In Python and R, users can access all APIs above, but in addition they can do - In R: `window(timeColumn, windowDuration, startTime=...)` that is, they can provide the startTime without providing the `slideDuration`. In this case, we will generate tumbling windows. ## How was this patch tested? Unit tests + manual tests Author: Burak Yavuz <brkyvz@gmail.com> Closes #12141 from brkyvz/R-windows.	2016-04-05 17:21:41 -07:00
Yanbo Liang	13cbb2de70	[SPARK-13010][ML][SPARKR] Implement a simple wrapper of AFTSurvivalRegression in SparkR ## What changes were proposed in this pull request? This PR continues the work in #11447, we implemented the wrapper of ```AFTSurvivalRegression``` named ```survreg``` in SparkR. ## How was this patch tested? Test against output from R package survival's survreg. cc mengxr felixcheung Close #11447 Author: Yanbo Liang <ybliang8@gmail.com> Closes #11932 from yanboliang/spark-13010-new.	2016-03-24 22:29:34 -07:00
Xusen Yin	d6dc12ef01	[SPARK-13449] Naive Bayes wrapper in SparkR ## What changes were proposed in this pull request? This PR continues the work in #11486 from yinxusen with some code refactoring. In R package e1071, `naiveBayes` supports both categorical (Bernoulli) and continuous features (Gaussian), while in MLlib we support Bernoulli and multinomial. This PR implements the common subset: Bernoulli. I moved the implementation out from SparkRWrappers to NaiveBayesWrapper to make it easier to read. Argument names, default values, and summary now match e1071's naiveBayes. I removed the preprocess part that omit NA values because we don't know which columns to process. ## How was this patch tested? Test against output from R package e1071's naiveBayes. cc: yanboliang yinxusen Closes #11486 Author: Xusen Yin <yinxusen@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #11890 from mengxr/SPARK-13449.	2016-03-22 14:16:51 -07:00
Yanbo Liang	50e60e36f7	[SPARK-13504] [SPARKR] Add approxQuantile for SparkR ## What changes were proposed in this pull request? Add ```approxQuantile``` for SparkR. ## How was this patch tested? unit tests Author: Yanbo Liang <ybliang8@gmail.com> Closes #11383 from yanboliang/spark-13504 and squashes the following commits: 4f17adb [Yanbo Liang] Add approxQuantile for SparkR	2016-02-25 21:23:41 -08:00
Xusen Yin	8d29001dec	[SPARK-13011] K-means wrapper in SparkR https://issues.apache.org/jira/browse/SPARK-13011 Author: Xusen Yin <yinxusen@gmail.com> Closes #11124 from yinxusen/SPARK-13011.	2016-02-23 15:42:58 -08:00
Yanbo Liang	e7f9199e70	[SPARK-12903][SPARKR] Add covar_samp and covar_pop for SparkR Add ```covar_samp``` and ```covar_pop``` for SparkR. Should we also provide ```cov``` alias for ```covar_samp```? There is ```cov``` implementation at stats.R which masks ```stats::cov``` already, but may bring to breaking API change. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10829 from yanboliang/spark-12903.	2016-01-26 19:29:47 -08:00
Sun Rui	1b2a918e59	[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #10201 from sun-rui/SPARK-12204.	2016-01-20 21:08:15 -08:00
felixcheung	488bbb216c	[SPARK-12232][SPARKR] New R API for read.table to avoid name conflict shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table` Author: felixcheung <felixcheung_m@hotmail.com> Closes #10406 from felixcheung/readtable.	2016-01-19 18:31:03 -08:00
Sun Rui	3ac648289c	[SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #10309 from sun-rui/SPARK-12337.	2016-01-19 16:37:18 -08:00
felixcheung	37fefa66cb	[SPARK-12168][SPARKR] Add automated tests for conflicted function in R Currently this is reported when loading the SparkR package in R (probably would add is.nan) ``` Loading required package: methods Attaching package: ‘SparkR’ The following objects are masked from ‘package:stats’: cov, filter, lag, na.omit, predict, sd, var The following objects are masked from ‘package:base’: colnames, colnames<-, intersect, rank, rbind, sample, subset, summary, table, transform ``` Adding this test adds an automated way to track changes to masked method. Also, the second part of this test check for those functions that would not be accessible without namespace/package prefix. Incidentally, this might point to how we would fix those inaccessible functions in base or stats. Looking for feedback for adding this test. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10171 from felixcheung/rmaskedtest.	2016-01-19 16:33:48 -08:00
Oscar D. Lara Yejas	ba4a641902	[SPARK-11031][SPARKR] Method str() on a DataFrame Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Closes #9613 from olarayej/SPARK-11031.	2016-01-15 07:37:54 -08:00
Yanbo Liang	3d77cffec0	[SPARK-12645][SPARKR] SparkR support hash function Add ```hash``` function for SparkR ```DataFrame```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645.	2016-01-09 12:29:51 +05:30
Yanbo Liang	d1fea41363	[SPARK-12393][SPARKR] Add read.text and write.text for SparkR Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10348 from yanboliang/spark-12393.	2016-01-06 12:05:41 +05:30
Yanbo Liang	22f6cd86fc	[SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR Add ```write.json``` and ```write.parquet``` for SparkR, and deprecated ```saveAsParquetFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310.	2015-12-16 10:34:30 -08:00
Yanbo Liang	0fb9825556	[SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files * ```jsonFile``` should support multiple input files, such as: ```R jsonFile(sqlContext, c(“path1”, “path2”)) # character vector as arguments jsonFile(sqlContext, “path1,path2”) ``` * Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed at Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` at SparkR side. * Replace all ```jsonFile``` with ```read.json``` at test_sparkSQL.R, but still keep jsonFile test case. * If this PR is accepted, we should also make almost the same change for ```parquetFile```. cc felixcheung sun-rui shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10145 from yanboliang/spark-12146.	2015-12-11 11:47:35 -08:00
Yanbo Liang	eeb58722ad	[SPARK-12198][SPARKR] SparkR support read.parquet and deprecate parquetFile SparkR support ```read.parquet``` and deprecate ```parquetFile```. This change is similar with #10145 for ```jsonFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10191 from yanboliang/spark-12198.	2015-12-10 09:44:53 -08:00
Sun Rui	c8d0e160da	[SPARK-11774][SPARKR] Implement struct(), encode(), decode() functions in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #9804 from sun-rui/SPARK-11774.	2015-12-05 15:49:51 -08:00
felixcheung	c793d2d9a1	[SPARK-9319][SPARKR] Add support for setting column names, types Add support for for colnames, colnames<-, coltypes<- Also added tests for names, names<- which have no test previously. I merged with PR 8984 (coltypes). Clicked the wrong thing, crewed up the PR. Recreated it here. Was #9218 shivaram sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #9654 from felixcheung/colnamescoltypes.	2015-11-28 21:16:21 -08:00
Yanbo Liang	ba02f6cb5a	[SPARK-12025][SPARKR] Rename some window rank function names for SparkR Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side. There are two reasons that we should make this change: * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645) * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0. It's better to fix this issue before 1.6 release, otherwise we will make breaking API change. cc shivaram sun-rui Author: Yanbo Liang <ybliang8@gmail.com> Closes #10016 from yanboliang/SPARK-12025.	2015-11-27 11:48:01 -08:00
Sun Rui	224723e6a8	[SPARK-11773][SPARKR] Implement collection functions in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #9764 from sun-rui/SPARK-11773.	2015-11-18 08:41:45 -08:00
felixcheung	1a8e0468a1	[SPARK-11468] [SPARKR] add stddev/variance agg functions for Column Checked names, none of them should conflict with anything in base shivaram davies rxin Author: felixcheung <felixcheung_m@hotmail.com> Closes #9489 from felixcheung/rstddev.	2015-11-10 22:45:17 -08:00
Oscar D. Lara Yejas	47735cdc2a	[SPARK-10863][SPARKR] Method coltypes() (New version) This is a follow up on PR #8984, as the corresponding branch for such PR was damaged. Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Closes #9579 from olarayej/SPARK-10863_NEW14.	2015-11-10 11:07:57 -08:00
adrian555	b9455d1f18	[SPARK-11260][SPARKR] with() function support Author: adrian555 <wzhuang@us.ibm.com> Author: Adrian Zhuang <adrian555@users.noreply.github.com> Closes #9443 from adrian555/with.	2015-11-05 14:47:38 -08:00
Sun Rui	40c77fb23a	[SPARK-11210][SPARKR] Add window functions into SparkR [step 2]. Author: Sun Rui <rui.sun@intel.com> Closes #9196 from sun-rui/SPARK-11210.	2015-10-30 10:56:06 -07:00
Sun Rui	dc3220ce11	[SPARK-11209][SPARKR] Add window functions into SparkR [step 1]. Author: Sun Rui <rui.sun@intel.com> Closes #9193 from sun-rui/SPARK-11209.	2015-10-26 20:58:18 -07:00
Sun Rui	390b22fad6	[SPARK-10996] [SPARKR] Implement sampleBy() in DataFrameStatFunctions. Author: Sun Rui <rui.sun@intel.com> Closes #9023 from sun-rui/SPARK-10996.	2015-10-13 22:31:23 -07:00
Adrian Zhuang	f7f28ee7a5	[SPARK-10913] [SPARKR] attach() function support Bring the change code up to date. Author: Adrian Zhuang <adrian555@users.noreply.github.com> Author: adrian555 <wzhuang@us.ibm.com> Closes #9031 from adrian555/attach2.	2015-10-13 10:21:07 -07:00
Narine Kokhlikyan	1e0aba90b9	[SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrame as.DataFrame is more a R-style like signature. Also, I'd like to know if we could make the context, e.g. sqlContext global, so that we do not have to specify it as an argument, when we each time create a dataframe. Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8952 from NarineK/sparkrasDataFrame.	2015-10-13 10:09:05 -07:00
Sun Rui	864de3bf40	[SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions. 1. Add a "col" function into DataFrame. 2. Move the current "col" function in Column.R to functions.R, convert it to S4 function. 3. Add a s4 "column" function in functions.R. 4. Convert the "column" function in Column.R to S4 function. This is for private use. Author: Sun Rui <rui.sun@intel.com> Closes #8864 from sun-rui/SPARK-10079.	2015-10-09 23:05:38 -07:00
Rerngvit Yanggratoke	70f44ad2d8	[SPARK-10905] [SPARKR] Export freqItems() for DataFrameStatFunctions [SPARK-10905][SparkR]: Export freqItems() for DataFrameStatFunctions - Add function (together with roxygen2 doc) to DataFrame.R and generics.R - Expose the function in NAMESPACE - Add unit test for the function Author: Rerngvit Yanggratoke <rerngvit@kth.se> Closes #8962 from rerngvit/SPARK-10905.	2015-10-09 09:36:40 -07:00
Sun Rui	f57c63d4c3	[SPARK-10752] [SPARKR] Implement corr() and cov in DataFrameStatFunctions. Author: Sun Rui <rui.sun@intel.com> Closes #8869 from sun-rui/SPARK-10752.	2015-10-07 09:46:37 -07:00
Oscar D. Lara Yejas	f21e2da03f	[SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collect Created method as.data.frame as a synonym for collect(). Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: olarayej <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Closes #8908 from olarayej/SPARK-10807.	2015-09-30 18:03:31 -07:00
felixcheung	2a4e00ca4d	[SPARK-9803] [SPARKR] Add subset and transform + tests Add subset and transform Also reorganize `[` & `[[` to subset instead of select Note: for transform, transform is very similar to mutate. Spark doesn't seem to replace existing column with the name in mutate (ie. `mutate(df, age = df$age + 2)` - returned DataFrame has 2 columns with the same name 'age'), so therefore not doing that for now in transform. Though it is clearly stated it should replace column with matching name (should I open a JIRA for mutate/transform?) Author: felixcheung <felixcheung_m@hotmail.com> Closes #8503 from felixcheung/rsubset_transform.	2015-08-28 18:35:01 -07:00
Shivaram Venkataraman	ad7f0f160b	[SPARK-10308] [SPARKR] Add %in% to the exported namespace I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine. cc yu-iskw Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8473 from shivaram/in-namespace.	2015-08-26 18:13:07 -07:00
Yu ISHIKAWA	d898c33f77	[SPARK-10106] [SPARKR] Add `ifelse` Column function to SparkR ### JIRA [[SPARK-10106] Add `ifelse` Column function to SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10106) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8303 from yu-iskw/SPARK-10106.	2015-08-19 12:39:37 -07:00
Yu ISHIKAWA	2fcb9cb955	[SPARK-9856] [SPARKR] Add expression functions into SparkR whose params are complicated I added lots of Column functinos into SparkR. And I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala. Since we need such APIs for R integer type. ### JIRA [[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8264 from yu-iskw/SPARK-9856-3.	2015-08-19 10:41:14 -07:00
Yu ISHIKAWA	bf32c1f7f4	[SPARK-10075] [SPARKR] Add `when` expressino function in SparkR - Add `when` and `otherwise` as `Column` methods - Add `When` as an expression function - Add `%otherwise%` infix as an alias of `otherwise` Since R doesn't support a feature like method chaining, `otherwise(when(condition, value), value)` style is a little annoying for me. If `%otherwise%` looks strange for shivaram, I can remove it. What do you think? ### JIRA [[SPARK-10075] Add `when` expressino function in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10075) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8266 from yu-iskw/SPARK-10075.	2015-08-18 20:27:36 -07:00
Yuu ISHIKAWA	1968276af0	[SPARK-10007] [SPARKR] Update `NAMESPACE` file in SparkR for simple parameters functions ### JIRA [[SPARK-10007] Update `NAMESPACE` file in SparkR for simple parameters functions - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10007) Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8277 from yu-iskw/SPARK-10007.	2015-08-18 09:10:59 -07:00
Yu ISHIKAWA	26e760581f	[SPARK-9871] [SPARKR] Add expression functions into SparkR which have a variable parameter ### Summary - Add `lit` function - Add `concat`, `greatest`, `least` functions I think we need to improve `collect` function in order to implement `struct` function. Since `collect` doesn't work with arguments which includes a nested `list` variable. It seems that a list against `struct` still has `jobj` classes. So it would be better to solve this problem on another issue. ### JIRA [[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8194 from yu-iskw/SPARK-9856.	2015-08-16 23:33:20 -07:00
Hossein	712f5b7a9a	[SPARK-9318] [SPARK-9320] [SPARKR] Aliases for merge and summary functions on DataFrames This PR adds synonyms for ```merge``` and ```summary``` in SparkR DataFrame API. cc shivaram Author: Hossein <hossein@databricks.com> Closes #7806 from falaki/SPARK-9320 and squashes the following commits: 72600f7 [Hossein] Updated docs 92a6e75 [Hossein] Fixed merge generic signature issue 4c2b051 [Hossein] Fixing naming with mllib summary 0f3a64c [Hossein] Added ... to generic for merge 30fbaf8 [Hossein] Merged master ae1a4cf [Hossein] Merge branch 'master' into SPARK-9320 e8eb86f [Hossein] Add a generic for merge fc01f2d [Hossein] Added unit test 8d92012 [Hossein] Added merge as an alias for join 5b8bedc [Hossein] Added unit test 632693d [Hossein] Added summary as an alias for describe for DataFrame	2015-07-31 19:24:44 -07:00
Hossein	710c2b5dd2	[SPARK-9324] [SPARK-9322] [SPARK-9321] [SPARKR] Some aliases for R-like functions in DataFrames Adds following aliases: * unique (distinct) * rbind (unionAll): accepts many DataFrames * nrow (count) * ncol * dim * names (columns): along with the replacement function to change names Author: Hossein <hossein@databricks.com> Closes #7764 from falaki/sparkR-alias and squashes the following commits: 56016f5 [Hossein] Updated R documentation 5e4a4d0 [Hossein] Removed extra code f51cbef [Hossein] Merge branch 'master' into sparkR-alias c1b88bd [Hossein] Moved setGeneric and other comments applied d9307f8 [Hossein] Added tests b5aa988 [Hossein] Added dim, ncol, nrow, names, rbind, and unique functions to DataFrames	2015-07-31 14:08:18 -07:00
Eric Liang	e7905a9395	[SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormula Preview: ``` > summary(m) features coefficients 1 (Intercept) 1.6765001 2 Sepal_Length 0.3498801 3 Species.versicolor -0.9833885 4 Species.virginica -1.0075104 ``` Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit cc mengxr Author: Eric Liang <ekl@databricks.com> Closes #7771 from ericl/summary and squashes the following commits: ccd54c3 [Eric Liang] second pass a5ca93b [Eric Liang] comments 2772111 [Eric Liang] clean up 70483ef [Eric Liang] fix test 7c247d4 [Eric Liang] Merge branch 'master' into summary 3c55024 [Eric Liang] working 8c539aa [Eric Liang] first pass	2015-07-30 16:15:43 -07:00
Xiangrui Meng	2f5cbd860e	[SPARK-8364] [SPARKR] Add crosstab to SparkR DataFrames Add `crosstab` to SparkR DataFrames, which takes two column names and returns a local R data.frame. This is similar to `table` in R. However, `table` in SparkR is used for loading SQL tables as DataFrames. The return type is data.frame instead table for `crosstab` to be compatible with Scala/Python. I couldn't run R tests successfully on my local. Many unit tests failed. So let's try Jenkins. Author: Xiangrui Meng <meng@databricks.com> Closes #7318 from mengxr/SPARK-8364 and squashes the following commits: d75e894 [Xiangrui Meng] fix tests 53f6ddd [Xiangrui Meng] fix tests f1348d6 [Xiangrui Meng] update test 47cb088 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8364 5621262 [Xiangrui Meng] first version without test	2015-07-22 21:40:23 -07:00
Eric Liang	1cbdd89918	[SPARK-9201] [ML] Initial integration of MLlib + SparkR using RFormula This exposes the SparkR:::glm() and SparkR:::predict() APIs. It was necessary to change RFormula to silently drop the label column if it was missing from the input dataset, which is kind of a hack but necessary to integrate with the Pipeline API. The umbrella design doc for MLlib + SparkR integration can be viewed here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit mengxr Author: Eric Liang <ekl@databricks.com> Closes #7483 from ericl/spark-8774 and squashes the following commits: 3dfac0c [Eric Liang] update 17ef516 [Eric Liang] more comments 1753a0f [Eric Liang] make glm generic b0f50f8 [Eric Liang] equivalence test 550d56d [Eric Liang] export methods c015697 [Eric Liang] second pass 117949a [Eric Liang] comments 5afbc67 [Eric Liang] test label columns 6b7f15f [Eric Liang] Fri Jul 17 14:20:22 PDT 2015 3a63ae5 [Eric Liang] Fri Jul 17 13:41:52 PDT 2015 ce61367 [Eric Liang] Fri Jul 17 13:41:17 PDT 2015 0299c59 [Eric Liang] Fri Jul 17 13:40:32 PDT 2015 e37603f [Eric Liang] Fri Jul 17 12:15:03 PDT 2015 d417d0c [Eric Liang] Merge remote-tracking branch 'upstream/master' into spark-8774 29a2ce7 [Eric Liang] Merge branch 'spark-8774-1' into spark-8774 d1959d2 [Eric Liang] clarify comment 2db68aa [Eric Liang] second round of comments dc3c943 [Eric Liang] address comments 5765ec6 [Eric Liang] fix style checks 1f361b0 [Eric Liang] doc d33211b [Eric Liang] r support fb0826b [Eric Liang] [SPARK-8774] Add R model formula with basic support as a transformer	2015-07-20 20:49:38 -07:00
Liang-Chi Hsieh	0a795336df	[SPARK-8807] [SPARKR] Add between operator in SparkR JIRA: https://issues.apache.org/jira/browse/SPARK-8807 Add between operator in SparkR. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7356 from viirya/add_r_between and squashes the following commits: 7f51b44 [Liang-Chi Hsieh] Add test for non-numeric column. c6a25c5 [Liang-Chi Hsieh] Add between function.	2015-07-15 23:36:57 -07:00
Hossein	1fa29c2df2	[SPARK-8452] [SPARKR] expose jobGroup API in SparkR This pull request adds following methods to SparkR: ```R setJobGroup() cancelJobGroup() clearJobGroup() ``` For each method, the spark context is passed as the first argument. There does not seem to be a good way to test these in R. cc shivaram and davies Author: Hossein <hossein@databricks.com> Closes #6889 from falaki/SPARK-8452 and squashes the following commits: 9ce9f1e [Hossein] Added basic tests to verify methods can be called and won't throw errors c706af9 [Hossein] Added examples a2c19af [Hossein] taking spark context as first argument 343ca77 [Hossein] Added setJobGroup, cancelJobGroup and clearJobGroup to SparkR	2015-06-19 15:51:59 -07:00
Sun Rui	46576ab303	[SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits: dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna(). 41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.	2015-05-31 15:01:59 -07:00
Shivaram Venkataraman	a40bca0111	[SPARK-6811] Copy SparkR lib in make-distribution.sh This change also remove native libraries from SparkR to make sure our distribution works across platforms Tested by building on Mac, running on Amazon Linux (CentOS), Windows VM and vice-versa (built on Linux run on Mac) I will also test this with YARN soon and update this PR. Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6373 from shivaram/sparkr-binary and squashes the following commits: ae41b5c [Shivaram Venkataraman] Remove native libraries from SparkR Also include the built SparkR package in make-distribution.sh	2015-05-23 00:04:01 -07:00
qhuang	50da9e8916	[SPARK-7226] [SPARKR] Support math functions in R DataFrame Author: qhuang <qian.huang@intel.com> Closes #6170 from hqzizania/master and squashes the following commits: f20c39f [qhuang] add tests units and fixes 2a7d121 [qhuang] use a function name more familiar to R users 07aa72e [qhuang] Support math functions in R DataFrame	2015-05-15 14:06:16 -07:00

1 2

59 commits