ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Herman van Hovell	1017327930	[SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. The PR changes this behavior, a Decimal literal is now converted into an extact ```BigDecimal```. The behavior for scientific decimals, for example ```12.1e01```, is unchanged. This will be converted into a Double. This PR replaces the ```BigDecimal``` literal by a ```Double``` literal, because the ```BigDecimal``` is the default now. You can use the double literal by appending a 'D' to the value, for instance: ```3.141527D``` cc davies rxin Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10796 from hvanhovell/SPARK-12848.	2016-01-20 15:13:01 -08:00
felixcheung	488bbb216c	[SPARK-12232][SPARKR] New R API for read.table to avoid name conflict shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table` Author: felixcheung <felixcheung_m@hotmail.com> Closes #10406 from felixcheung/readtable.	2016-01-19 18:31:03 -08:00
Sun Rui	3ac648289c	[SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #10309 from sun-rui/SPARK-12337.	2016-01-19 16:37:18 -08:00
felixcheung	37fefa66cb	[SPARK-12168][SPARKR] Add automated tests for conflicted function in R Currently this is reported when loading the SparkR package in R (probably would add is.nan) ``` Loading required package: methods Attaching package: ‘SparkR’ The following objects are masked from ‘package:stats’: cov, filter, lag, na.omit, predict, sd, var The following objects are masked from ‘package:base’: colnames, colnames<-, intersect, rank, rbind, sample, subset, summary, table, transform ``` Adding this test adds an automated way to track changes to masked method. Also, the second part of this test check for those functions that would not be accessible without namespace/package prefix. Incidentally, this might point to how we would fix those inaccessible functions in base or stats. Looking for feedback for adding this test. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10171 from felixcheung/rmaskedtest.	2016-01-19 16:33:48 -08:00
felixcheung	92502703f4	[SPARK-12862][SPARKR] Jenkins does not run R tests Slight correction: I'm leaving sparkR as-is (ie. R file not supported) and fixed only run-tests.sh as shivaram described. I also assume we are going to cover all doc changes in https://issues.apache.org/jira/browse/SPARK-12846 instead of here. rxin shivaram zjffdu Author: felixcheung <felixcheung_m@hotmail.com> Closes #10792 from felixcheung/sparkRcmd.	2016-01-17 09:29:08 -08:00
Oscar D. Lara Yejas	ba4a641902	[SPARK-11031][SPARKR] Method str() on a DataFrame Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Closes #9613 from olarayej/SPARK-11031.	2016-01-15 07:37:54 -08:00
Wenchen Fan	962e9bcf94	[SPARK-12756][SQL] use hash expression in Exchange This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is same between shuffle and bucketed data source, which enables us to only shuffle one side when join a bucketed table and a normal one. This PR also fixes the tests that are broken by the new hash behaviour in shuffle. Author: Wenchen Fan <wenchen@databricks.com> Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.	2016-01-13 22:43:28 -08:00
Yanbo Liang	3d77cffec0	[SPARK-12645][SPARKR] SparkR support hash function Add ```hash``` function for SparkR ```DataFrame```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645.	2016-01-09 12:29:51 +05:30
Yanbo Liang	d1fea41363	[SPARK-12393][SPARKR] Add read.text and write.text for SparkR Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10348 from yanboliang/spark-12393.	2016-01-06 12:05:41 +05:30
felixcheung	cc4d5229c9	[SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API rxin davies shivaram Took save mode from my PR #10480, and move everything to writer methods. This is related to PR #10559 - [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed) Author: felixcheung <felixcheung_m@hotmail.com> Closes #10584 from felixcheung/rremovedeprecated.	2016-01-04 22:32:07 -08:00
felixcheung	c3d505602d	[SPARK-12327][SPARKR] fix code for lintr warning for commented code shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #10408 from felixcheung/rcodecomment.	2016-01-03 20:53:35 +05:30
Hossein	f6ecf14333	[SPARK-11199][SPARKR] Improve R context management story and add getOrCreate * Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context. * Adds a simple test [SPARK-11199] #comment link with JIRA Author: Hossein <hossein@databricks.com> Closes #9185 from falaki/SPARK-11199.	2015-12-29 11:44:20 -08:00
Forest Fang	d80cc90b55	[SPARK-12526][SPARKR] ifelse`,` when`,` otherwise` unable to take Column as value `ifelse`, `when`, `otherwise` is unable to take `Column` typed S4 object as values. For example: ```r ifelse(lit(1) == lit(1), lit(2), lit(3)) ifelse(df$mpg > 0, df$mpg, 0) ``` will both fail with ```r attempt to replicate an object of type 'environment' ``` The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempt to vectorize(i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency because `ifelse` in base R is vectorized but I cannot foresee any scenarios these functions will want to be vectorized in SparkR. For reference, added test cases which trigger failures: ```r . Error: when(), otherwise() and ifelse() with column on a DataFrame ---------- error in evaluating the argument 'x' in selecting a method for function 'collect': error in evaluating the argument 'col' in selecting a method for function 'select': attempt to replicate an object of type 'environment' Calls: when -> when -> ifelse -> ifelse 1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage")) 2: eval(code, new_test_environment) 3: eval(expr, envir, enclos) 4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126 5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label) 6: condition(object) 7: compare(actual, expected, ...) 8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1)))) Error: Test failures Execution halted ``` Author: Forest Fang <forest.fang@outlook.com> Closes #10481 from saurfang/spark-12526.	2015-12-29 12:45:24 +05:30
Reynold Xin	f496031bd2	Bump master version to 2.0.0-SNAPSHOT. Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.	2015-12-19 15:13:05 -08:00
Yanbo Liang	22f6cd86fc	[SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR Add ```write.json``` and ```write.parquet``` for SparkR, and deprecated ```saveAsParquetFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310.	2015-12-16 10:34:30 -08:00
Jeff Zhang	2eb5af5f0d	[SPARK-12318][SPARKR] Save mode in SparkR should be error by default shivaram Please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318.	2015-12-16 10:32:32 -08:00
Shivaram Venkataraman	fb3778de68	[SPARK-12327] Disable commented code lintr temporarily cc yhuai felixcheung shaneknapp Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #10300 from shivaram/comment-lintr-disable.	2015-12-14 16:13:55 -08:00
gatorsmile	1e3526c2d3	[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases The existing sample functions miss the parameter `seed`, however, the corresponding function interface in `generics` has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value. This could cause SparkR unit tests failed. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull Author: gatorsmile <gatorsmile@gmail.com> Closes #10160 from gatorsmile/sampleR.	2015-12-11 20:55:16 -08:00
Yanbo Liang	0fb9825556	[SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files * ```jsonFile``` should support multiple input files, such as: ```R jsonFile(sqlContext, c(“path1”, “path2”)) # character vector as arguments jsonFile(sqlContext, “path1,path2”) ``` * Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed at Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` at SparkR side. * Replace all ```jsonFile``` with ```read.json``` at test_sparkSQL.R, but still keep jsonFile test case. * If this PR is accepted, we should also make almost the same change for ```parquetFile```. cc felixcheung sun-rui shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10145 from yanboliang/spark-12146.	2015-12-11 11:47:35 -08:00
Yanbo Liang	d9d354ed40	[SPARK-12234][SPARKR] Fix ```subset`` `function error when only set` ``select``` argument Fix ```subset``` function error when only set ```select``` argument. Please refer to the [JIRA](https://issues.apache.org/jira/browse/SPARK-12234) about the error and how to reproduce it. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10217 from yanboliang/spark-12234.	2015-12-10 10:18:58 -08:00
Yanbo Liang	eeb58722ad	[SPARK-12198][SPARKR] SparkR support read.parquet and deprecate parquetFile SparkR support ```read.parquet``` and deprecate ```parquetFile```. This change is similar with #10145 for ```jsonFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10191 from yanboliang/spark-12198.	2015-12-10 09:44:53 -08:00
Sun Rui	39d677c8f1	[SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. This PR: 1. Suppress all known warnings. 2. Cleanup test cases and fix some errors in test cases. 3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext. 4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat' 5. Make sure the default Hadoop file system is local when running test cases. 6. Turn on warnings into errors. Author: Sun Rui <rui.sun@intel.com> Closes #10030 from sun-rui/SPARK-12034.	2015-12-07 10:38:17 -08:00
Yanbo Liang	b6e8e63a0d	[SPARK-12044][SPARKR] Fix usage of isnan, isNaN 1, Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should has three related variable functions: ```isNaN, isNull, isNotNull```. 2, Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` at SparkR side. Because ```DataFrame.isNaN``` has been deprecated and will be removed at Spark 2.0. <del>3, Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should has two related functions: ```isnan, isnull```.<del> cc shivaram sun-rui felixcheung Author: Yanbo Liang <ybliang8@gmail.com> Closes #10037 from yanboliang/spark-12044.	2015-12-05 22:51:05 -08:00
Yanbo Liang	6979edf4e1	[SPARK-12115][SPARKR] Change numPartitions() to getNumPartitions() to be consistent with Scala/Python Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python. <del>Note: If we can not catch up with 1.6 release, it will be breaking change for 1.7 that we also need to explain in release note.<del> cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10123 from yanboliang/spark-12115.	2015-12-05 16:39:01 -08:00
felixcheung	895b6c4747	[SPARK-11715][SPARKR] Add R support corr for Column Aggregration Need to match existing method signature Author: felixcheung <felixcheung_m@hotmail.com> Closes #9680 from felixcheung/rcorr.	2015-12-05 16:00:12 -08:00
Sun Rui	c8d0e160da	[SPARK-11774][SPARKR] Implement struct(), encode(), decode() functions in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #9804 from sun-rui/SPARK-11774.	2015-12-05 15:49:51 -08:00
Sun Rui	5011f264fb	[SPARK-12104][SPARKR] collect() does not handle multiple columns with same name. Author: Sun Rui <rui.sun@intel.com> Closes #10118 from sun-rui/SPARK-12104.	2015-12-03 21:11:10 -08:00
felixcheung	2213441e5e	[SPARK-12019][SPARKR] Support character vector for sparkR.init(), check param and fix doc and add tests. Spark submit expects comma-separated list Author: felixcheung <felixcheung_m@hotmail.com> Closes #10034 from felixcheung/sparkrinitdoc.	2015-12-03 13:25:20 -08:00
Sun Rui	cc7a1bc937	[SPARK-11781][SPARKR] SparkR has problem in inferring type of raw type. Author: Sun Rui <rui.sun@intel.com> Closes #9769 from sun-rui/SPARK-11781.	2015-11-29 11:08:26 -08:00
felixcheung	c793d2d9a1	[SPARK-9319][SPARKR] Add support for setting column names, types Add support for for colnames, colnames<-, coltypes<- Also added tests for names, names<- which have no test previously. I merged with PR 8984 (coltypes). Clicked the wrong thing, crewed up the PR. Recreated it here. Was #9218 shivaram sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #9654 from felixcheung/colnamescoltypes.	2015-11-28 21:16:21 -08:00
felixcheung	28e46ab463	[SPARK-12029][SPARKR] Improve column functions signature, param check, tests, fix doc and add examples shivaram sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #10019 from felixcheung/rfunctionsdoc.	2015-11-28 21:02:05 -08:00
Yanbo Liang	ba02f6cb5a	[SPARK-12025][SPARKR] Rename some window rank function names for SparkR Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side. There are two reasons that we should make this change: * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645) * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0. It's better to fix this issue before 1.6 release, otherwise we will make breaking API change. cc shivaram sun-rui Author: Yanbo Liang <ybliang8@gmail.com> Closes #10016 from yanboliang/SPARK-12025.	2015-11-27 11:48:01 -08:00
felixcheung	a6239d587c	[SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help information for SparkR:::summary correctly Fix use of aliases and changes uses of rdname and seealso `aliases` is the hint for `?` - it should not be linked to some other name - those should be seealso https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html Clean up usage on family, as multiple use of family with the same rdname is causing duplicated See Also html blocks (like http://spark.apache.org/docs/latest/api/R/count.html) Also changing some rdname for dplyr-like variant for better R user visibility in R doc, eg. rbind, summary, mutate, summarize shivaram yanboliang Author: felixcheung <felixcheung_m@hotmail.com> Closes #9750 from felixcheung/rdocaliases.	2015-11-20 15:10:55 -08:00
felixcheung	1a93323c5b	[SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR Added tests for function that are reported as masked, to make sure the base:: or stats:: function can be called. For those we can't call, added them to SparkR programming guide. It would seem to me `table, sample, subset, filter, cov` not working are not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like as they are defined in base or stats they are missing the S3 generic, eg. ``` > methods("transform") [1] transform,ANY-method transform.data.frame [3] transform,DataFrame-method transform.default see '?methods' for accessing help and source code > methods("subset") [1] subset.data.frame subset,DataFrame-method subset.default [4] subset.matrix see '?methods' for accessing help and source code Warning message: In .S3methods(generic.function, class, parent.frame()) : function 'subset' appears not to be S3 generic; found functions that look like S3 methods ``` Any idea? More information on masking: http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm http://www.sfu.ca/~sweldon/howTo/guide4.pdf This is what the output doc looks like (minus css): ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png) Author: felixcheung <felixcheung_m@hotmail.com> Closes #9785 from felixcheung/rmasked.	2015-11-18 23:32:49 -08:00
Yanbo Liang	e222d75849	[SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and example codes This PR includes: * Update SparkR:::glm, SparkR:::summary API docs. * Update SparkR machine learning user guide and example codes to show: * supporting feature interaction in R formula. * summary for gaussian GLM model. * coefficients for binomial GLM model. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9727 from yanboliang/spark-11684.	2015-11-18 13:30:29 -08:00
Sun Rui	224723e6a8	[SPARK-11773][SPARKR] Implement collection functions in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #9764 from sun-rui/SPARK-11773.	2015-11-18 08:41:45 -08:00
zero323	a97d6f3a58	[SPARK-11281][SPARKR] Add tests covering the issue. The goal of this PR is to add tests covering the issue to ensure that is was resolved by [SPARK-11086](https://issues.apache.org/jira/browse/SPARK-11086). Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9743 from zero323/SPARK-11281-tests.	2015-11-18 08:32:03 -08:00
Yanbo Liang	8fb775ba87	[SPARK-11755][R] SparkR should export "predict" The bug described at [SPARK-11755](https://issues.apache.org/jira/browse/SPARK-11755), after exporting ```predict``` we can both get the help information from the SparkR and base R package like the following: ```Java > help(predict) Help on topic ‘predict’ was found in the following packages: Package Library SparkR /Users/yanboliang/data/trunk2/spark/R/lib stats /Library/Frameworks/R.framework/Versions/3.2/Resources/library Choose one 1: Make predictions from a model {SparkR} 2: Model Predictions {stats} ``` Author: Yanbo Liang <ybliang8@gmail.com> Closes #9732 from yanboliang/spark-11755.	2015-11-17 22:13:15 -08:00
Sun Rui	835a79d78e	[SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritable The basic idea is that: The archive of the SparkR package itself, that is sparkr.zip, is created during build process and is contained in the Spark binary distribution. No change to it after the distribution is installed as the directory it resides ($SPARK_HOME/R/lib) may not be writable. When there is R source code contained in jars or Spark packages specified with "--jars" or "--packages" command line option, a temporary directory is created by calling Utils.createTempDir() where the R packages built from the R source code will be installed. The temporary directory is writable, and won't interfere with each other when there are multiple SparkR sessions, and will be deleted when this SparkR session ends. The R binary packages installed in the temporary directory then are packed into an archive named rpkg.zip. sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes. The distribution of rpkg.zip in Standalone modes is not supported in this PR, and will be address in another PR. Various R files are updated to accept multiple lib paths (one is for SparkR package, the other is for other R packages) so that these package can be accessed in R. Author: Sun Rui <rui.sun@intel.com> Closes #9390 from sun-rui/SPARK-10500.	2015-11-15 19:29:09 -08:00
zero323	d7d9fa0b87	[SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop when createDataFrame Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame` At this moment SparkR createDataFrame is using nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow especially with data.table (~ 2 orders of magnitude compared to PySpark / Pandas version on a DateFrame of size 1M rows x 2 columns). A simple improvement is to apply `dropFactor `column-wise and then reshape output list. It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277). Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9099 from zero323/SPARK-11086.	2015-11-15 19:15:27 -08:00
felixcheung	ed04846e14	[SPARK-11263][SPARKR] lintr Throws Warnings on Commented Code in Documentation Clean out hundreds of `style: Commented code should be removed.` from lintr Like these: ``` /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:513:3: style: Commented code should be removed. # sc <- sparkR.init() ^~~~~~~~~~~~~~~~~~~ /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:514:3: style: Commented code should be removed. # sqlContext <- sparkRSQL.init(sc) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:515:3: style: Commented code should be removed. # path <- "path/to/file.json" ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` tried without export or rdname, neither work instead, added this `#' noRd` to suppress .Rd file generation also updated `family` for DataFrame functions for longer descriptive text instead of `dataframe_funcs` ![image](https://cloud.githubusercontent.com/assets/8969467/10933937/17bf5b1e-8291-11e5-9777-40fc632105dc.png) this covers most of 'Commented code' but I left out a few that looks legitimate. Author: felixcheung <felixcheung_m@hotmail.com> Closes #9463 from felixcheung/rlintr.	2015-11-12 20:02:49 -08:00
JihongMa	d292f74831	[SPARK-11420] Updating Stddev support via Imperative Aggregate switched stddev support from DeclarativeAggregate to ImperativeAggregate. Author: JihongMa <linlin200605@gmail.com> Closes #9380 from JihongMA/SPARK-11420.	2015-11-12 13:47:34 -08:00
felixcheung	1a8e0468a1	[SPARK-11468] [SPARKR] add stddev/variance agg functions for Column Checked names, none of them should conflict with anything in base shivaram davies rxin Author: felixcheung <felixcheung_m@hotmail.com> Closes #9489 from felixcheung/rstddev.	2015-11-10 22:45:17 -08:00
Yanbo Liang	f14e95115c	[ML][R] SparkR::glm summary result to compare with native R Follow up #9561. Due to [SPARK-11587](https://issues.apache.org/jira/browse/SPARK-11587) has been fixed, we should compare SparkR::glm summary result with native R output rather than hard-code one. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9590 from yanboliang/glm-r-test.	2015-11-10 11:34:36 -08:00
Oscar D. Lara Yejas	47735cdc2a	[SPARK-10863][SPARKR] Method coltypes() (New version) This is a follow up on PR #8984, as the corresponding branch for such PR was damaged. Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Closes #9579 from olarayej/SPARK-10863_NEW14.	2015-11-10 11:07:57 -08:00
Yin Huai	e0701c7560	[SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s https://issues.apache.org/jira/browse/SPARK-9830 This PR contains the following main changes. * Removing `AggregateExpression1`. * Removing `Aggregate` operator, which is used to evaluate `AggregateExpression1`. * Removing planner rule used to plan `Aggregate`. * Linking `MultipleDistinctRewriter` to analyzer. * Renaming `AggregateExpression2` to `AggregateExpression` and `AggregateFunction2` to `AggregateFunction`. * Updating places where we create aggregate expression. The way to create aggregate expressions is `AggregateExpression(aggregateFunction, mode, isDistinct)`. * Changing `val`s in `DeclarativeAggregate`s that touch children of this function to `lazy val`s (when we create aggregate expression in DataFrame API, children of an aggregate function can be unresolved). Author: Yin Huai <yhuai@databricks.com> Closes #9556 from yhuai/removeAgg1.	2015-11-10 11:06:29 -08:00
Shivaram Venkataraman	c4e19b3819	[SPARK-11587][SPARKR] Fix the summary generic to match base R The signature is summary(object, ...) as defined in https://stat.ethz.ch/R-manual/R-devel/library/base/html/summary.html Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #9582 from shivaram/summary-fix.	2015-11-09 21:06:01 -08:00
felixcheung	cd174882a5	[SPARK-9865][SPARKR] Flaky SparkR test: test_sparkSQL.R: sample on a DataFrame Make sample test less flaky by setting the seed Tested with ``` repeat { if (count(sample(df, FALSE, 0.1)) == 3) { break } } ``` Author: felixcheung <felixcheung_m@hotmail.com> Closes #9549 from felixcheung/rsample.	2015-11-09 10:26:09 -08:00
Yanbo Liang	8c0e1b50e9	[SPARK-11494][ML][R] Expose R-like summary statistics in SparkR::glm for linear regression Expose R-like summary statistics in SparkR::glm for linear regression, the output of ```summary``` like ```Java $DevianceResiduals Min Max -0.9509607 0.7291832 $Coefficients Estimate Std. Error t value Pr(>\|t\|) (Intercept) 1.6765 0.2353597 7.123139 4.456124e-11 Sepal_Length 0.3498801 0.04630128 7.556598 4.187317e-12 Species_versicolor -0.9833885 0.07207471 -13.64402 0 Species_virginica -1.00751 0.09330565 -10.79796 0 ``` Author: Yanbo Liang <ybliang8@gmail.com> Closes #9561 from yanboliang/spark-11494.	2015-11-09 08:56:22 -08:00
Imran Rashid	49f1a82037	[SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bits https://issues.apache.org/jira/browse/SPARK-10116 This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`. mengxr mkolod Author: Imran Rashid <irashid@cloudera.com> Closes #8314 from squito/SPARK-10116.	2015-11-06 20:06:24 +00:00
Davies Liu	2440106242	[SPARK-11542] [SPARKR] fix glm with long fomular Because deparse() will break the long string into multiple lines, the deserialization will fail Author: Davies Liu <davies@databricks.com> Closes #9510 from davies/fix_glm.	2015-11-05 16:34:10 -08:00
adrian555	b9455d1f18	[SPARK-11260][SPARKR] with() function support Author: adrian555 <wzhuang@us.ibm.com> Author: Adrian Zhuang <adrian555@users.noreply.github.com> Closes #9443 from adrian555/with.	2015-11-05 14:47:38 -08:00
Yanbo Liang	e328b69c31	[SPARK-9492][ML][R] LogisticRegression in R should provide model statistics Like ml ```LinearRegression```, ```LogisticRegression``` should provide a training summary including feature names and their coefficients. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9303 from yanboliang/spark-9492.	2015-11-04 08:28:33 -08:00
lewuathe	d648a4ad54	[DOC] Missing link to R DataFrame API doc Author: lewuathe <lewuathe@me.com> Author: Lewuathe <lewuathe@me.com> Closes #9394 from Lewuathe/missing-link-to-R-dataframe.	2015-11-03 16:38:22 -08:00
vectorijk	c020f7d9d4	[SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models Deprecated in `LogisticRegression` and `LinearRegression` Author: vectorijk <jiangkai@gmail.com> Closes #9311 from vectorijk/spark-10592.	2015-11-02 16:12:04 -08:00
felixcheung	bb5a2af034	[SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio Mapping spark.driver.memory from sparkEnvir to spark-submit commandline arguments. shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf? sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #9290 from felixcheung/rdrivermem.	2015-10-30 13:51:32 -07:00
Sun Rui	40c77fb23a	[SPARK-11210][SPARKR] Add window functions into SparkR [step 2]. Author: Sun Rui <rui.sun@intel.com> Closes #9196 from sun-rui/SPARK-11210.	2015-10-30 10:56:06 -07:00
felixcheung	d89be0bf81	[SPARK-11409][SPARKR] Enable url link in R doc for Persist Quick one line doc fix link is not clickable ![image](https://cloud.githubusercontent.com/assets/8969467/10833041/4e91dd7c-7e4c-11e5-8905-713b986dbbde.png) shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #9363 from felixcheung/rpersistdoc.	2015-10-29 21:01:10 -07:00
Yanbo Liang	fba9e95452	[SPARK-11369][ML][R] SparkR glm should support setting standardize SparkR glm currently support : ```formula, family = c(“gaussian”, “binomial”), data, lambda = 0, alpha = 0``` We should also support setting standardize which has been defined at [design documentation](https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit) Author: Yanbo Liang <ybliang8@gmail.com> Closes #9331 from yanboliang/spark-11369.	2015-10-28 08:50:21 -07:00
Sun Rui	dc3220ce11	[SPARK-11209][SPARKR] Add window functions into SparkR [step 1]. Author: Sun Rui <rui.sun@intel.com> Closes #9193 from sun-rui/SPARK-11209.	2015-10-26 20:58:18 -07:00
Narine Kokhlikyan	3689beb98b	[SPARK-10979][SPARKR] Sparkrmerge: Add merge to DataFrame with R signature Add merge function to DataFrame, which supports R signature. https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #9012 from NarineK/sparkrmerge.	2015-10-26 15:12:25 -07:00
felixcheung	5e45812501	[SPARK-11294][SPARKR] Improve R doc for read.df, write.df, saveAsTable Add examples for read.df, write.df; fix grouping for read.df, loadDF; fix formatting and text truncation for write.df, saveAsTable. Several text issues: ![image](https://cloud.githubusercontent.com/assets/8969467/10708590/1303a44e-79c3-11e5-854f-3a2e16854cd7.png) - text collapsed into a single paragraph - text truncated at 2 places, eg. "overwrite: Existing data is expected to be overwritten by the contents of error:" shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #9261 from felixcheung/rdocreadwritedf.	2015-10-23 21:42:00 -07:00
Forest Fang	94e2064fa1	[SPARK-11244][SPARKR] sparkR.stop() should remove SQLContext SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise even when SparkContext is reinitialized, `sparkRSQL.init` returns the stale copy of the object and complains: ```r sc <- sparkR.init("local") sqlContext <- sparkRSQL.init(sc) sparkR.stop() sc <- sparkR.init("local") sqlContext <- sparkRSQL.init(sc) sqlContext ``` producing ```r Error in callJMethod(x, "getClass") : Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed. ``` I have added the check and removal only when SparkContext itself is initialized. I have also added corresponding test for this fix. Let me know if you want me to move the test to SQL test suite instead. p.s. I tried lint-r but ended up a lots of errors on existing code. Author: Forest Fang <forest.fang@outlook.com> Closes #9205 from saurfang/sparkR.stop.	2015-10-22 09:34:07 -07:00
Davies Liu	f8c6bec657	[SPARK-11197][SQL] run SQL on files directly This PR introduce a new feature to run SQL directly on files without create a table, for example: ``` select id from json.`path/to/json/files` as j ``` Author: Davies Liu <davies@databricks.com> Closes #9173 from davies/source.	2015-10-21 13:38:30 -07:00
felixcheung	1107bd958a	[SPARK-11221][SPARKR] fix R doc for lit and add examples Currently the documentation for `lit` is inconsistent with doc format, references "Scala symbol" and has no example. Fixing that. shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #9187 from felixcheung/rlit.	2015-10-20 21:40:22 -07:00
lewuathe	4c33a34ba3	[SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L… …2 regularization if the number of features is small Author: lewuathe <lewuathe@me.com> Author: Lewuathe <sasaki@treasure-data.com> Author: Kai Sasaki <sasaki@treasure-data.com> Author: Lewuathe <lewuathe@me.com> Closes #8884 from Lewuathe/SPARK-10668.	2015-10-19 10:46:10 -07:00
Sun Rui	390b22fad6	[SPARK-10996] [SPARKR] Implement sampleBy() in DataFrameStatFunctions. Author: Sun Rui <rui.sun@intel.com> Closes #9023 from sun-rui/SPARK-10996.	2015-10-13 22:31:23 -07:00
Monica Liu	8b32885704	[SPARK-10981] [SPARKR] SparkR Join improvements I was having issues with collect() and orderBy() in Spark 1.5.0 so I used the DataFrame.R file and test_sparkSQL.R file from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi" and added corresponding test cases in the test for join() and merge() in test_sparkSQL.R file. Pull request because I filed this JIRA bug report: https://issues.apache.org/jira/browse/SPARK-10981 Author: Monica Liu <liu.monica.f@gmail.com> Closes #9029 from mfliu/master.	2015-10-13 22:24:52 -07:00
Adrian Zhuang	f7f28ee7a5	[SPARK-10913] [SPARKR] attach() function support Bring the change code up to date. Author: Adrian Zhuang <adrian555@users.noreply.github.com> Author: adrian555 <wzhuang@us.ibm.com> Closes #9031 from adrian555/attach2.	2015-10-13 10:21:07 -07:00
Narine Kokhlikyan	1e0aba90b9	[SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrame as.DataFrame is more a R-style like signature. Also, I'd like to know if we could make the context, e.g. sqlContext global, so that we do not have to specify it as an argument, when we each time create a dataframe. Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8952 from NarineK/sparkrasDataFrame.	2015-10-13 10:09:05 -07:00
Sun Rui	5e3868ba13	[SPARK-10051] [SPARKR] Support collecting data of StructType in DataFrame Two points in this PR: 1. Originally thought was that a named R list is assumed to be a struct in SerDe. But this is problematic because some R functions will implicitly generate named lists that are not intended to be a struct when transferred by SerDe. So SerDe clients have to explicitly mark a names list as struct by changing its class from "list" to "struct". 2. SerDe is in the Spark Core module, and data of StructType is represented as GenricRow which is defined in Spark SQL module. SerDe can't import GenricRow as in maven build Spark SQL module depends on Spark Core module. So this PR adds a registration hook in SerDe to allow SQLUtils in Spark SQL module to register its functions for serialization and deserialization of StructType. Author: Sun Rui <rui.sun@intel.com> Closes #8794 from sun-rui/SPARK-10051.	2015-10-13 10:02:21 -07:00
Sun Rui	864de3bf40	[SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions. 1. Add a "col" function into DataFrame. 2. Move the current "col" function in Column.R to functions.R, convert it to S4 function. 3. Add a s4 "column" function in functions.R. 4. Convert the "column" function in Column.R to S4 function. This is for private use. Author: Sun Rui <rui.sun@intel.com> Closes #8864 from sun-rui/SPARK-10079.	2015-10-09 23:05:38 -07:00
Rerngvit Yanggratoke	70f44ad2d8	[SPARK-10905] [SPARKR] Export freqItems() for DataFrameStatFunctions [SPARK-10905][SparkR]: Export freqItems() for DataFrameStatFunctions - Add function (together with roxygen2 doc) to DataFrame.R and generics.R - Expose the function in NAMESPACE - Add unit test for the function Author: Rerngvit Yanggratoke <rerngvit@kth.se> Closes #8962 from rerngvit/SPARK-10905.	2015-10-09 09:36:40 -07:00
Narine Kokhlikyan	e8f90d9dda	[SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrame the sort function can be used as an alternative to arrange(... ). As arguments it accepts x - dataframe, decreasing - TRUE/FALSE, a list of orderings for columns and the list of columns, represented as string names for example: sort(df, TRUE, "col1","col2","col3","col5") # for example, if we want to sort some of the columns in the same order sort(df, decreasing=TRUE, "col1") sort(df, decreasing=c(TRUE,FALSE), "col1","col2") Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8920 from NarineK/sparkrsort.	2015-10-08 09:53:44 -07:00
Sun Rui	f57c63d4c3	[SPARK-10752] [SPARKR] Implement corr() and cov in DataFrameStatFunctions. Author: Sun Rui <rui.sun@intel.com> Closes #8869 from sun-rui/SPARK-10752.	2015-10-07 09:46:37 -07:00
felixcheung	721e8b5f35	[SPARK-10904] [SPARKR] Fix to support `select(df, c("col1", "col2"))` The fix is to coerce `c("a", "b")` into a list such that it could be serialized to call JVM with. Author: felixcheung <felixcheung_m@hotmail.com> Closes #8961 from felixcheung/rselect.	2015-10-03 22:42:36 -07:00
Oscar D. Lara Yejas	f21e2da03f	[SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collect Created method as.data.frame as a synonym for collect(). Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: olarayej <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Closes #8908 from olarayej/SPARK-10807.	2015-09-30 18:03:31 -07:00
Narine Kokhlikyan	6fcee906d2	[SPARK-10760] [SPARKR] SparkR glm: the documentation in examples - family argument is missing Hi everyone, Since the family argument is required for the glm function, the execution of: model <- glm(Sepal_Length ~ Sepal_Width, df) is failing. I've fixed the documentation by adding the family argument and also added the summay(model) which will show the coefficients for the model. Thanks, Narine Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8870 from NarineK/sparkrml.	2015-09-25 11:55:08 -07:00
Eric Liang	922338812c	[SPARK-9681] [ML] Support R feature interactions in RFormula This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`). To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read attribute the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side-benefit of cleaning up the double-underscores in the attributes generated for non-interaction terms. mengxr Author: Eric Liang <ekl@databricks.com> Closes #8830 from ericl/interaction-2.	2015-09-25 00:43:22 -07:00
Sun Rui	896edb51ab	[SPARK-10050] [SPARKR] Support collecting data of MapType in DataFrame. 1. Support collecting data of MapType from DataFrame. 2. Support data of MapType in createDataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8711 from sun-rui/SPARK-10050.	2015-09-16 13:20:39 -07:00
Reynold Xin	09b7e7c198	Update version to 1.6.0-SNAPSHOT. Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.	2015-09-15 00:54:20 -07:00
JihongMa	f4a22808e0	[SPARK-6548] Adding stddev to DataFrame functions Adding STDDEV support for DataFrame using 1-pass online /parallel algorithm to compute variance. Please review the code change. Author: JihongMa <linlin200605@gmail.com> Author: Jihong MA <linlin200605@gmail.com> Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com> Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local> Closes #6297 from JihongMA/SPARK-SQL.	2015-09-12 10:17:15 -07:00
Sun Rui	45e3be5c13	[SPARK-10049] [SPARKR] Support collecting data of ArraryType in DataFrame. this PR : 1. Enhance reflection in RBackend. Automatically matching a Java array to Scala Seq when finding methods. Util functions like seq(), listToSeq() in R side can be removed, as they will conflict with the Serde logic that transferrs a Scala seq to R side. 2. Enhance the SerDe to support transferring a Scala seq to R side. Data of ArrayType in DataFrame after collection is observed to be of Scala Seq type. 3. Support ArrayType in createDataFrame(). Author: Sun Rui <rui.sun@intel.com> Closes #8458 from sun-rui/SPARK-10049.	2015-09-10 12:21:13 -07:00
Shivaram Venkataraman	143e521d1a	[MINOR] Minor style fix in SparkR `dev/lintr-r` passes on my machine now Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8601 from shivaram/sparkr-style-fix.	2015-09-04 11:24:47 -07:00
CHOIJAEHONG	af0e3125cb	[SPARK-8951] [SPARKR] support Unicode characters in collect() Spark gives an error message and does not show the output when a field of the result DataFrame contains characters in CJK. I changed SerDe.scala in order that Spark support Unicode characters when writes a string to R. Author: CHOIJAEHONG <redrock07@naver.com> Closes #7494 from CHOIJAEHONG1/SPARK-8951.	2015-09-03 13:38:26 -07:00
felixcheung	2a4e00ca4d	[SPARK-9803] [SPARKR] Add subset and transform + tests Add subset and transform Also reorganize `[` & `[[` to subset instead of select Note: for transform, transform is very similar to mutate. Spark doesn't seem to replace existing column with the name in mutate (ie. `mutate(df, age = df$age + 2)` - returned DataFrame has 2 columns with the same name 'age'), so therefore not doing that for now in transform. Though it is clearly stated it should replace column with matching name (should I open a JIRA for mutate/transform?) Author: felixcheung <felixcheung_m@hotmail.com> Closes #8503 from felixcheung/rsubset_transform.	2015-08-28 18:35:01 -07:00
Luciano Resende	499e8e154b	[SPARK-8952] [SPARKR] - Wrap normalizePath calls with suppressWarnings This is based on davies comment on SPARK-8952 which suggests to only call normalizePath() when path starts with '~' Author: Luciano Resende <lresende@apache.org> Closes #8343 from lresende/SPARK-8952.	2015-08-28 09:13:21 -07:00
Shivaram Venkataraman	2f99c37273	[SPARK-10328] [SPARKR] Fix generic for na.omit S3 function is at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com> Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8495 from shivaram/na-omit-fix.	2015-08-28 00:37:50 -07:00
Shivaram Venkataraman	e936cf8088	[SPARK-10219] [SPARKR] Fix varargsToEnv and add test case cc sun-rui davies Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8475 from shivaram/varargs-fix.	2015-08-26 22:27:31 -07:00
Yu ISHIKAWA	773ca037a4	[MINOR] [SPARKR] Fix some validation problems in SparkR Getting rid of some validation problems in SparkR https://github.com/apache/spark/pull/7883 cc shivaram ``` inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous. expect_equal(class(x), "character") ^~ inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous. ^~ R/DataFrame.R:664:1: style: Trailing whitespace is superfluous. ^~~~~~~~~~~~~~ R/DataFrame.R:670:55: style: Trailing whitespace is superfluous. df <- data.frame(row.names = 1 : nrow) ^~~~~~~~~~~~~~~~ R/DataFrame.R:672:1: style: Trailing whitespace is superfluous. ^~~~~~~~~~~~~~ R/DataFrame.R:686:49: style: Trailing whitespace is superfluous. df[[names[colIndex]]] <- vec ^~~~~~~~~~~~~~~~~~ ``` Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8474 from yu-iskw/minor-fix-sparkr.	2015-08-26 18:14:32 -07:00
Shivaram Venkataraman	ad7f0f160b	[SPARK-10308] [SPARKR] Add %in% to the exported namespace I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine. cc yu-iskw Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8473 from shivaram/in-namespace.	2015-08-26 18:13:07 -07:00
felixcheung	75d4773aa5	[SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select) Add support for ``` df[df$name == "Smith", c(1,2)] df[df$age %in% c(19, 30), 1:2] ``` shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #8394 from felixcheung/rsubset.	2015-08-25 23:48:16 -07:00
Sun Rui	71a138cd0e	[SPARK-10048] [SPARKR] Support arbitrary nested Java array in serde. This PR: 1. supports transferring arbitrary nested array from JVM to R side in SerDe; 2. based on 1, collect() implemenation is improved. Now it can support collecting data of complex types from a DataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8276 from sun-rui/SPARK-10048.	2015-08-25 13:14:10 -07:00
Yu ISHIKAWA	d4549fe58f	[SPARK-10214] [SPARKR] [DOCS] Improve SparkR Column, DataFrame API docs cc: shivaram ## Summary - Add name tags to each methods in DataFrame.R and column.R - Replace `rdname column` with `rdname {each_func}`. i.e. alias method : `rdname column` => `rdname alias` ## Generated PDF File https://drive.google.com/file/d/0B9biIZIU47lLNHN2aFpnQXlSeGs/view?usp=sharing ## JIRA [[SPARK-10214] Improve SparkR Column, DataFrame API docs - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10214) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8414 from yu-iskw/SPARK-10214.	2015-08-25 00:28:51 -07:00
Yu ISHIKAWA	6511bf559b	[SPARK-10118] [SPARKR] [DOCS] Improve SparkR API docs for 1.5 release cc: shivaram ## Summary - Modify `tdname` of expression functions. i.e. `ascii`: `rdname functions` => `rdname ascii` - Replace the dynamical function definitions to the static ones because of thir documentations. ## Generated PDF File https://drive.google.com/file/d/0B9biIZIU47lLX2t6ZjRoRnBTSEU/view?usp=sharing ## JIRA [[SPARK-10118] Improve SparkR API docs for 1.5 release - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10118) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8386 from yu-iskw/SPARK-10118.	2015-08-24 18:17:51 -07:00
Yu ISHIKAWA	d898c33f77	[SPARK-10106] [SPARKR] Add `ifelse` Column function to SparkR ### JIRA [[SPARK-10106] Add `ifelse` Column function to SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10106) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8303 from yu-iskw/SPARK-10106.	2015-08-19 12:39:37 -07:00
Yu ISHIKAWA	2fcb9cb955	[SPARK-9856] [SPARKR] Add expression functions into SparkR whose params are complicated I added lots of Column functinos into SparkR. And I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala. Since we need such APIs for R integer type. ### JIRA [[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8264 from yu-iskw/SPARK-9856-3.	2015-08-19 10:41:14 -07:00
Yu ISHIKAWA	bf32c1f7f4	[SPARK-10075] [SPARKR] Add `when` expressino function in SparkR - Add `when` and `otherwise` as `Column` methods - Add `When` as an expression function - Add `%otherwise%` infix as an alias of `otherwise` Since R doesn't support a feature like method chaining, `otherwise(when(condition, value), value)` style is a little annoying for me. If `%otherwise%` looks strange for shivaram, I can remove it. What do you think? ### JIRA [[SPARK-10075] Add `when` expressino function in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10075) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8266 from yu-iskw/SPARK-10075.	2015-08-18 20:27:36 -07:00
Yu ISHIKAWA	b4b35f133a	[SPARKR] [MINOR] Get rid of a long line warning ``` R/functions.R:74:1: style: lines should not be more than 100 characters. jc <- callJStatic("org.apache.spark.sql.functions", "lit", ifelse(class(x) == "Column", xjc, x)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8297 from yu-iskw/minor-lint-r.	2015-08-18 19:18:05 -07:00
Hossein	04e0fea79b	Bump SparkR version string to 1.5.0 This patch is against master, but we need to apply it to 1.5 branch as well. cc shivaram and rxin Author: Hossein <hossein@databricks.com> Closes #8291 from falaki/SparkRVersion1.5.	2015-08-18 18:02:22 -07:00

1 2 3 4 5

223 commits