ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
gatorsmile	6cb8f836da	[SPARK-15396][SQL][DOC] It can't connect hive metastore database #### What changes were proposed in this pull request? The `hive.metastore.warehouse.dir` property in hive-site.xml is deprecated since Spark 2.0.0. Users might not be able to connect to the existing metastore if they do not use the new conf parameter `spark.sql.warehouse.dir`. This PR is to update the document and example for explaining the latest changes in the configuration of default location of database. Below is the screenshot of the latest generated docs: <img width="681" alt="screenshot 2016-05-20 08 38 10" src="https://cloud.githubusercontent.com/assets/11567269/15433296/a05c4ace-1e66-11e6-8d2b-73682b32e9c2.png"> <img width="789" alt="screenshot 2016-05-20 08 53 26" src="https://cloud.githubusercontent.com/assets/11567269/15433734/645dc42e-1e68-11e6-9476-effc9f8721bb.png"> <img width="789" alt="screenshot 2016-05-20 08 53 37" src="https://cloud.githubusercontent.com/assets/11567269/15433738/68569f92-1e68-11e6-83d3-ef5bb221a8d8.png"> No change is made in the R's example. <img width="860" alt="screenshot 2016-05-20 08 54 38" src="https://cloud.githubusercontent.com/assets/11567269/15433779/965b8312-1e68-11e6-8bc4-53c88ceacde2.png"> #### How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> Closes #13225 from gatorsmile/document.	2016-05-21 23:12:27 -07:00
Sean Zhong	25b315e6ca	[SPARK-15171][SQL] Remove the references to deprecated method dataset.registerTempTable ## What changes were proposed in this pull request? Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`. ## How was this patch tested? This PR only changes the unit test code, examples, and comments. It should be safe. This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged. Author: Sean Zhong <seanzhong@databricks.com> Closes #13098 from clockfly/spark-15171-remove-deprecation.	2016-05-18 09:01:59 +08:00
Sun Rui	4ae9fe091c	[SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR. ## What changes were proposed in this pull request? dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame. The function signature is: dapply(df, function(localDF) {}, schema = NULL) R function input: local data.frame from the partition on local node R function output: local data.frame Schema specifies the Row format of the resulting DataFrame. It must match the R function's output. If schema is not specified, each partition of the result DataFrame will be serialized in R into a single byte array. Such resulting DataFrame can be processed by successive calls to dapply(). ## How was this patch tested? SparkR unit tests. Author: Sun Rui <rui.sun@intel.com> Author: Sun Rui <sunrui2016@gmail.com> Closes #12493 from sun-rui/SPARK-12919.	2016-04-29 16:41:07 -07:00
Dongjoon Hyun	6ab4d9e0c7	[SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date ## What changes were proposed in this pull request? This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules. - Remove the wrong usage of `map`. We need to use `lapply` in `sparkR` if needed. However, `lapply` is private so far. The corrected example will be added later. - Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency - Fix datatypes in `sparkr.md`. - Update a data result in `sparkr.md`. - Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet - Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet - Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`. - Other minor syntax fixes and a typo. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12649 from dongjoon-hyun/SPARK-14883.	2016-04-24 22:10:27 -07:00
Mark Grover	ff9ae61a3b	[SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly ## What changes were proposed in this pull request? Removing references to assembly jar in documentation. Adding an additional (previously undocumented) usage of spark-submit to run examples. ## How was this patch tested? Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit. Author: Mark Grover <mark@apache.org> Closes #12365 from markgrover/spark-14601.	2016-04-14 18:51:43 -07:00
Dongjoon Hyun	1a0cca1fc8	[MINOR][DOCS] Fix wrong data types in JSON Datasets example. ## What changes were proposed in this pull request? This PR fixes the `age` data types from `integer` to `long` in `SQL Programming Guide: JSON Datasets`. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12290 from dongjoon-hyun/minor_fix_type_in_json_example.	2016-04-11 09:03:11 +01:00
Reynold Xin	9ca0760d67	[SPARK-10063][SQL] Remove DirectParquetOutputCommitter ## What changes were proposed in this pull request? This patch removes DirectParquetOutputCommitter. This was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures. If there are multiple attempts of the same task (e.g. speculation or task failures or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue. ## How was this patch tested? Removed the related tests also. Author: Reynold Xin <rxin@databricks.com> Closes #12229 from rxin/SPARK-10063.	2016-04-07 00:51:45 -07:00
Marcelo Vanzin	24d7d2e453	[SPARK-13579][BUILD] Stop building the main Spark assembly. This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packages in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.	2016-04-04 16:52:22 -07:00
Daoyuan Wang	d1c193a2f1	[SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test ## What changes were proposed in this pull request? Since developer API of plug-able parser has been removed in #10801 , docs should be updated accordingly. ## How was this patch tested? This patch will not affect the real code path. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #11758 from adrian-wang/spark12855.	2016-03-16 22:52:10 -07:00
Dongjoon Hyun	4ce2d24e2a	[SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x ## What changes were proposed in this pull request? `Shark` was merged into `Spark SQL` since [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The followings seem to be the only legacy. For Spark 2.x, we had better clean up those docs. Migration Guide ``` - ## Migration Guide for Shark Users - ... - ### Scheduling - ... - ### Reducer number - ... - ### Caching ``` ## How was this patch tested? Pass the Jenkins test. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11770 from dongjoon-hyun/SPARK-13942.	2016-03-16 15:50:24 -07:00
Dongjoon Hyun	c3689bc24e	[SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. ## What changes were proposed in this pull request? In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing codes like the followings by using `diamond` operator. ``` - final ArrayList<Product2<Object, Object>> dataToWrite = - new ArrayList<Product2<Object, Object>>(); + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>(); ``` Java 7 or higher supports diamond operator which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code use mixed usage of this. ## How was this patch tested? Manual. Pass the existing tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11541 from dongjoon-hyun/SPARK-13702.	2016-03-09 10:31:26 +00:00
Dongjoon Hyun	024482bf51	[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was the this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.	2016-02-22 09:52:07 +00:00
Sasaki Toru	c2f21d8898	[SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template In spark-env.sh.template, there are multi-byte characters, this PR will remove it. Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.	2016-02-11 09:30:36 +00:00
Sebastián Ramírez	c882ec57de	[SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation Update JDBC documentation based on http://stackoverflow.com/a/30947090/219530 as SPARK_CLASSPATH is deprecated. Also, that's how it worked, it didn't work with the SPARK_CLASSPATH or the --jars alone. This would solve issue: https://issues.apache.org/jira/browse/SPARK-13040 Author: Sebastián Ramírez <tiangolo@gmail.com> Closes #10948 from tiangolo/patch-docs-jdbc.	2016-02-09 08:49:34 +00:00
Takeshi YAMAMURO	da9146c91a	[DOCS] Fix the jar location of datanucleus in sql-programming-guid.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10901 from maropu/DocFix.	2016-02-01 12:02:06 -08:00
Sun Rui	1b2a918e59	[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #10201 from sun-rui/SPARK-12204.	2016-01-20 21:08:15 -08:00
Brandon Bradley	a767ee8a05	[SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting Warning users about casting changes. Author: Brandon Bradley <bradleytastic@gmail.com> Closes #10708 from blbradley/spark-12758.	2016-01-11 14:21:50 -08:00
Josh Rosen	6c83d938cc	[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with the our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection. This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different). This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons). Author: Josh Rosen <joshrosen@databricks.com> Closes #10519 from JoshRosen/jdbc-driver-precedence.	2016-01-04 10:39:42 -08:00
Yin Huai	ac8cdf1cdc	[SPARK-11678][SQL][DOCS] Document basePath in the programming guide. This PR adds document for `basePath`, which is a new parameter used by `HadoopFsRelation`. The compiled doc is shown below. ![image](https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png) JIRA: https://issues.apache.org/jira/browse/SPARK-11678 Author: Yin Huai <yhuai@databricks.com> Closes #10211 from yhuai/basePathDoc.	2015-12-09 18:09:36 -08:00
Michael Armbrust	3959489423	[SPARK-12069][SQL] Update documentation with Datasets Author: Michael Armbrust <michael@databricks.com> Closes #10060 from marmbrus/docs.	2015-12-08 15:58:35 -08:00
woj-i	6a8cf80cc8	[SPARK-11821] Propagate Kerberos keytab for all environments andrewor14 the same PR as in branch 1.5 harishreedharan Author: woj-i <wojciechindyk@gmail.com> Closes #9859 from woj-i/master.	2015-12-01 11:05:45 -08:00
Stephen Samuel	026ea2eab1	Updated sql programming guide to include jdbc fetch size Author: Stephen Samuel <sam@sksamuel.com> Closes #9377 from sksamuel/master.	2015-11-23 19:52:12 -08:00
Cheng Lian	7b1407c7b9	[SPARK-11089][SQL] Adds option for disabling multi-session in Thrift server This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server. Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is by default on, no JDBC connection can modify global configurations like the newly added one. Author: Cheng Lian <lian@databricks.com> Closes #9740 from liancheng/spark-11089.single-session-option.	2015-11-17 11:17:52 -08:00
gatorsmile	2f38378856	[SPARK-11360][DOC] Loss of nullability when writing parquet files This fix is to add one line to explain the current behavior of Spark SQL when writing Parquet files. All columns are forced to be nullable for compatibility reasons. Author: gatorsmile <gatorsmile@gmail.com> Closes #9314 from gatorsmile/lossNull.	2015-11-09 16:06:48 -08:00
Rohit Agarwal	b541b31630	[DOC][MINOR][SQL] Fix internal link It doesn't show up as a hyperlink currently. It will show up as a hyperlink after this change. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9544 from mindprince/patch-2.	2015-11-09 13:28:00 +01:00
xin Wu	26739059bc	[SPARK-10046][SQL] Hive warehouse dir not set in current directory when not … Doc change to align with HiveConf default in terms of where to create `warehouse` directory. Author: xin Wu <xinwu@us.ibm.com> Closes #9365 from xwu0226/spark-10046-commit.	2015-11-08 12:28:19 -08:00
Rohit Agarwal	5c4e6d7ec9	[DOC][SQL] Remove redundant out-of-place python snippet This snippet seems to be mistakenly introduced at two places in #5348. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9540 from mindprince/patch-1.	2015-11-08 14:24:26 +00:00
Wenchen Fan	e0fc9c7e59	[SPARK-11197][SQL] add doc for run SQL on files directly Author: Wenchen Fan <wenchen@databricks.com> Closes #9467 from cloud-fan/doc.	2015-11-04 09:33:30 -08:00
lewuathe	d648a4ad54	[DOC] Missing link to R DataFrame API doc Author: lewuathe <lewuathe@me.com> Author: Lewuathe <lewuathe@me.com> Closes #9394 from Lewuathe/missing-link-to-R-dataframe.	2015-11-03 16:38:22 -08:00
Josh Rosen	b67dc6a434	[SPARK-11299][DOC] Fix link to Scala DataFrame Functions reference The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that. Author: Josh Rosen <joshrosen@databricks.com> Closes #9269 from JoshRosen/SPARK-11299.	2015-10-25 10:31:44 +01:00
Sean Owen	82bbc2a5f2	[SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'. Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs. Follow-on to https://github.com/apache/spark/pull/8385 CC nssalian Author: Sean Owen <sowen@cloudera.com> Closes #8968 from srowen/SPARK-9570.	2015-10-04 09:31:52 +01:00
Jacek Laskowski	ca9fe540fe	[SPARK-10662] [DOCS] Code snippets are not properly formatted in tables * Backticks are processed properly in Spark Properties table * Removed unnecessary spaces * See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8795 from jaceklaskowski/docs-yarn-formatting.	2015-09-21 19:46:39 +01:00
Josh Rosen	2117eea71e	[SPARK-10710] Remove ability to disable spilling in core and SQL It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling. Author: Josh Rosen <joshrosen@databricks.com> Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.	2015-09-19 21:40:21 -07:00
Kousuke Saruta	d507f9c0b7	[SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong. In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong. /CC yhuai Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8776 from sarutak/SPARK-10584-2.	2015-09-19 01:59:36 -07:00
Kousuke Saruta	cf2821ef5f	[SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong. The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1. Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8739 from sarutak/SPARK-10584.	2015-09-14 12:06:23 -07:00
GuoQiang Li	5369be8068	[SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide Author: GuoQiang Li <witgo@qq.com> Closes #8520 from witgo/SPARK-10350.	2015-08-29 13:20:27 -07:00
Yin Huai	b3dd569ad4	[SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path https://issues.apache.org/jira/browse/SPARK-10287 After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet). Author: Yin Huai <yhuai@databricks.com> Closes #8469 from yhuai/jsonRefresh.	2015-08-27 16:11:25 -07:00
Michael Armbrust	dc86a227e4	[SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide Author: Michael Armbrust <michael@databricks.com> Closes #8441 from marmbrus/documentation.	2015-08-27 11:45:15 -07:00
Cheng Lian	0fac144f6b	[SPARK-9424] [SQL] Parquet programming guide updates for 1.5 Author: Cheng Lian <lian@databricks.com> Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.	2015-08-26 18:14:54 -07:00
Bill Chambers	b23c4d3ffc	Fix Broken Link Link was broken because it included tick marks. Author: Bill Chambers <wchambers@ischool.berkeley.edu> Closes #8302 from anabranch/patch-1.	2015-08-19 00:05:01 -07:00
Davies Liu	17284db314	[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing. cc marmbrus rxin Author: Davies Liu <davies@databricks.com> Closes #7998 from davies/tungsten and squashes the following commits: c1c16da [Davies Liu] update doc 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe (cherry picked from commit `4e70e8256c`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-08-06 19:42:02 -07:00
Davies Liu	49b1504fe3	Revert "[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe" This reverts commit `4e70e8256c`.	2015-08-06 17:36:12 -07:00
Davies Liu	4e70e8256c	[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing. cc marmbrus rxin Author: Davies Liu <davies@databricks.com> Closes #7998 from davies/tungsten and squashes the following commits: c1c16da [Davies Liu] update doc 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe	2015-08-06 17:30:31 -07:00
KaiXinXiaoLei	536d2adc12	[SPARK-9535][SQL][DOCS] Modify document for codegen. #7142 made codegen enabled by default so let's modify the corresponding documents. Closes #7142 Author: KaiXinXiaoLei <huleilei1@huawei.com> Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #7863 from sarutak/SPARK-9535 and squashes the following commits: 0884424 [Kousuke Saruta] Removed a line which mentioned about the effect of codegen enabled 3c11af0 [Kousuke Saruta] Merge branch 'sqlconfig' of https://github.com/KaiXinXiaoLei/spark into SPARK-9535 4ee531d [KaiXinXiaoLei] delete space 4cfd11d [KaiXinXiaoLei] change spark.sql.planner.externalSort d624cf8 [KaiXinXiaoLei] sql config is wrong	2015-08-02 20:04:21 -07:00
Sean Owen	873ab0f969	[SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement Use print(x) not print x for Python 3 in eval examples CC sethah mengxr -- just wanted to close this out before 1.5 Author: Sean Owen <sowen@cloudera.com> Closes #7822 from srowen/SPARK-9490 and squashes the following commits: 01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples	2015-07-31 13:45:28 -07:00
Cheng Lian	bebe3f7b45	[SPARK-9207] [SQL] Enables Parquet filter push-down by default PARQUET-136 and PARQUET-173 have been fixed in parquet-mr 1.7.0. It's time to enable filter push-down by default now. Author: Cheng Lian <lian@databricks.com> Closes #7612 from liancheng/spark-9207 and squashes the following commits: 77e6b5e [Cheng Lian] Enables Parquet filter push-down by default	2015-07-23 17:49:33 -07:00
Alok Singh	8f3cd93278	[SPARK-8909][Documentation] Change the scala example in sql-programmi… …ng-guide#Manually Specifying Options to be in sync with java,python, R version Author: Alok Singh <“singhal@us.ibm.com”> Closes #7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits: d3c20ba [Alok Singh] fix the file to .parquet from .json d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version	2015-07-08 14:51:18 -07:00
Sun Rui	bf02e37716	[SPARK-8894] [SPARKR] [DOC] Example code errors in SparkR documentation. Author: Sun Rui <rui.sun@intel.com> Closes #7287 from sun-rui/SPARK-8894 and squashes the following commits: da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.	2015-07-08 09:48:16 -07:00
Tijo Thomas	08192a1b8a	[SPARK-8886][Documentation]python Style update Fixed comment given by rxin Author: Tijo Thomas <tijoparacka@gmail.com> Closes #7281 from tijoparacka/modification_for_python_style and squashes the following commits: 6334e21 [Tijo Thomas] removed space 3de4cd8 [Tijo Thomas] python Style update	2015-07-07 22:35:39 -07:00
Tijo Thomas	9213f73a8e	[SPARK-8615] [DOCUMENTATION] Fixed Sample deprecated code Modified the deprecated jdbc api in the documentation. Author: Tijo Thomas <tijoparacka@gmail.com> Closes #7039 from tijoparacka/JIRA_8615 and squashes the following commits: 6e73b8a [Tijo Thomas] Reverted new lines 4042fcf [Tijo Thomas] updated to sql documentation a27949c [Tijo Thomas] Fixed Sample deprecated code	2015-06-30 10:50:45 -07:00
Cheng Lian	111d6b9b8a	[SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0. Author: Cheng Lian <lian@databricks.com> Closes #6683 from liancheng/output-committer-docs and squashes the following commits: b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options	2015-06-23 17:24:26 -07:00
Cheng Lian	d96d7b5574	[DOC] [SQL] Addes Hive metastore Parquet table conversion section This PR adds a section about Hive metastore Parquet table conversion. It documents: 1. Schema reconciliation rules introduced in #5214 (see [this comment] [1] in #5188) 2. Metadata refreshing requirement introduced in #5339 [1]: https://github.com/apache/spark/pull/5188#issuecomment-86531248 Author: Cheng Lian <lian@databricks.com> Closes #5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits: 42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet 4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet 756e660 [Cheng Lian] Adds Python snippet for metadata refreshing 50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section	2015-06-23 14:19:21 -07:00
Jihong MA	ebd363aecd	[SPARK-7265] Improving documentation for Spark SQL Hive support Please review this pull request. Author: Jihong MA <linlin200605@gmail.com> Closes #5933 from JihongMA/SPARK-7265 and squashes the following commits: dfaa971 [Jihong MA] SPARK-7265 minor fix of the content ace454d [Jihong MA] SPARK-7265 take out PySpark on YARN limitation 9ea0832 [Jihong MA] Merge remote-tracking branch 'upstream/master' d5bf3f5 [Jihong MA] Merge remote-tracking branch 'upstream/master' 7b842e6 [Jihong MA] Merge remote-tracking branch 'upstream/master' 9c84695 [Jihong MA] SPARK-7265 address review comment a399aa6 [Jihong MA] SPARK-7265 Improving documentation for Spark SQL Hive support	2015-06-19 14:06:49 +02:00
Lars Francke	4ce3bab89f	[SPARK-8462] [DOCS] Documentation fixes for Spark SQL This fixes various minor documentation issues on the Spark SQL page Author: Lars Francke <lars.francke@gmail.com> Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits: dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462 34eff2c [Lars Francke] Minor documentation fixes	2015-06-18 19:40:32 -07:00
zsxwing	78a430ea4d	[SPARK-7961][SQL]Refactor SQLConf to display better error message 1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`. 2. Verify the value when setting a configuration if this is in SQLConf. 3. Use `SET -v` to display all public configurations. Author: zsxwing <zsxwing@gmail.com> Closes #6747 from zsxwing/sqlconf and squashes the following commits: 7d09bad [zsxwing] Use SQLConfEntry in HiveContext 49f6213 [zsxwing] Add getConf, setConf to SQLContext and HiveContext e014f53 [zsxwing] Merge branch 'master' into sqlconf 93dad8e [zsxwing] Fix the unit tests cf950c1 [zsxwing] Fix the code style and tests 3c5f03e [zsxwing] Add unsetConf(SQLConfEntry) and fix the code style a2f4add [zsxwing] getConf will return the default value if a config is not set 037b1db [zsxwing] Add schema to SetCommand 0520c3c [zsxwing] Merge branch 'master' into sqlconf 7afb0ec [zsxwing] Fix the configurations about HiveThriftServer 7e728e3 [zsxwing] Add doc for SQLConfEntry and fix 'toString' 5e95b10 [zsxwing] Add enumConf c6ba76d [zsxwing] setRawString => setConfString, getRawString => getConfString 4abd807 [zsxwing] Fix the test for 'set -v' 6e47e56 [zsxwing] Fix the compilation error 8973ced [zsxwing] Remove floatConf 1fc3a8b [zsxwing] Remove the 'conf' command and use 'set -v' instead 99c9c16 [zsxwing] Fix tests that use SQLConfEntry as a string 88a03cc [zsxwing] Add new lines between confs and return types ce7c6c8 [zsxwing] Remove seqConf f3c1b33 [zsxwing] Refactor SQLConf to display better error message	2015-06-17 23:22:54 -07:00
Moussa Taifi	dc455b8833	[SPARK-DOCS] [SPARK-SQL] Update sql-programming-guide.md Typo in thriftserver section Author: Moussa Taifi <moutai10@gmail.com> Closes #6847 from moutai/patch-1 and squashes the following commits: 1bd29df [Moussa Taifi] Update sql-programming-guide.md	2015-06-16 20:59:22 +01:00
Peter Hoffmann	f3f2a4397d	fix read/write mixup Author: Peter Hoffmann <ph@peter-hoffmann.com> Closes #6815 from hoffmann/patch-1 and squashes the following commits: 2abb6da [Peter Hoffmann] fix read/write mixup	2015-06-14 11:41:16 -07:00
Cheng Lian	8f7308f9c4	[SQL] [MINOR] Fixes a minor Java example error in SQL programming guide Author: Cheng Lian <lian@databricks.com> Closes #6749 from liancheng/java-sample-fix and squashes the following commits: 5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide	2015-06-10 11:48:14 -07:00
Liang-Chi Hsieh	03ef6be9ce	[SPARK-7939] [SQL] Add conf to enable/disable partition column type inference JIRA: https://issues.apache.org/jira/browse/SPARK-7939 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6503 from viirya/disable_partition_type_inference and squashes the following commits: 3e90470 [Liang-Chi Hsieh] Default to enable type inference and update docs. 455edb1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into disable_partition_type_inference 9a57933 [Liang-Chi Hsieh] Add conf to enable/disable partition column type inference.	2015-06-08 17:50:38 +08:00
Luca Martinetti	4060526cd3	[SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort Add documentation for spark.sql.planner.externalSort Author: Luca Martinetti <luca@luca.io> Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits: 985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort	2015-06-05 13:40:11 -07:00
Reynold Xin	00a7137900	Update documentation for the new DataFrame reader/writer interface. Author: Reynold Xin <rxin@databricks.com> Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits: c227be7 [Reynold Xin] Updated link. 040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.	2015-05-30 20:10:02 -07:00
Reynold Xin	7716a5a1ec	Updated SQL programming guide's Hive connectivity section.	2015-05-30 14:57:23 -07:00
Cheng Lian	6e3f0c7810	[SPARK-7849] [SQL] [Docs] Updates SQL programming guide for 1.4 Author: Cheng Lian <lian@databricks.com> Closes #6520 from liancheng/spark-7849 and squashes the following commits: 705264b [Cheng Lian] Updates SQL programming guide for 1.4	2015-05-30 12:16:09 -07:00
Shivaram Venkataraman	5f48e5c33b	[SPARK-6806] [SPARKR] [DOCS] Add a new SparkR programming guide This PR adds a new SparkR programming guide at the top-level. This will be useful for R users as our APIs don't directly match the Scala/Python APIs and as we need to explain SparkR without using RDDs as examples etc. cc rxin davies pwendell cc cafreeman -- Would be great if you could also take a look at this ! Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6490 from shivaram/sparkr-guide and squashes the following commits: d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries 408dce5 [Shivaram Venkataraman] Fix link dbb86e3 [Shivaram Venkataraman] Fix minor typo 9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example d09703c [Shivaram Venkataraman] Fix default argument in read.df ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better	2015-05-29 14:11:58 -07:00
Matt Wise	35410614de	[DOCS] Fix typo in documentation for Java UDF registration This contribution is my original work and I license the work to the project under the project's open source license Author: Matt Wise <mwise@quixey.com> Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits: e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration	2015-05-27 22:39:19 -07:00
Davies Liu	7af3818c6b	[SPARK-6806] [SPARKR] [DOCS] Fill in SparkR examples in programming guide sqlCtx -> sqlContext You can check the docs by: ``` $ cd docs $ SKIP_SCALADOC=1 jekyll serve ``` cc shivaram Author: Davies Liu <davies@databricks.com> Closes #5442 from davies/r_docs and squashes the following commits: 7a12ec6 [Davies Liu] remove rdd in R docs 8496b26 [Davies Liu] remove the docs related to RDD e23b9d6 [Davies Liu] delete R docs for RDD API 222e4ff [Davies Liu] Merge branch 'master' into r_docs 89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs f0a10e1 [Davies Liu] address comments from @shivaram f61de71 [Davies Liu] Update pairRDD.R 3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b 2f10a77 [Davies Liu] address comments from @cafreeman 9c2a062 [Davies Liu] mention R api together with Python API 23f751a [Davies Liu] Fill in SparkR examples in programming guide	2015-05-23 00:01:40 -07:00
vidmantas zemleris	640f63b959	[SPARK-6994][SQL] Update docs for fetching Row fields by name add docs for https://issues.apache.org/jira/browse/SPARK-6994 Author: vidmantas zemleris <vidmantas@vinted.com> Closes #6030 from vidma/docs/row-with-named-fields and squashes the following commits: 241b401 [vidmantas zemleris] [SPARK-6994][SQL] Update docs for fetching Row fields by name	2015-05-11 22:29:24 -07:00
Reynold Xin	3a9b6997df	[SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames. Author: Reynold Xin <rxin@databricks.com> Closes #6062 from rxin/agg-retain-doc and squashes the following commits: 43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.	2015-05-11 18:07:12 -07:00
gchen	8e674331d9	[SPARK-7516] [Minor] [DOC] Replace depreciated inferSchema() with createDataFrame() JIRA: https://issues.apache.org/jira/browse/SPARK-7516 In sql-programming-guide, deprecated python data frame api inferSchema() should be replaced by createDataFrame(): schemaPeople = sqlContext.inferSchema(people) -> schemaPeople = sqlContext.createDataFrame(people) Author: gchen <chenguancheng@gmail.com> Closes #6041 from gchen/python-docs and squashes the following commits: c27eb7c [gchen] replace inferSchema() with createDataFrame()	2015-05-11 14:37:18 -07:00
ksonj	fae4e2d609	[SPARK-7035] Encourage __getitem__ over __getattr__ on column access in the Python DataFrame API Author: ksonj <kson@siberie.de> Closes #5971 from ksonj/doc and squashes the following commits: dadfebb [ksonj] __getitem__ is cleaner than __getattr__	2015-05-07 01:02:00 -07:00
Deborah Siegel	59b7cfc41b	[SPARK-7136][Docs] Spark SQL and DataFrame Guide fix example file and paths Changes example file for Generic Load/Save Functions to users.parquet rather than people.parquet which doesn't exist unless a later example has already been executed. Also adds filepaths. Author: Deborah Siegel <deborah.siegel@gmail.com> Author: DEBORAH SIEGEL <deborahsiegel@d-140-142-0-49.dhcp4.washington.edu> Author: DEBORAH SIEGEL <deborahsiegel@DEBORAHs-MacBook-Pro.local> Author: DEBORAH SIEGEL <deborahsiegel@d-69-91-154-197.dhcp4.washington.edu> Closes #5693 from d3borah/master and squashes the following commits: 4d5e43b [Deborah Siegel] sparkSQL doc change b15a497 [Deborah Siegel] Revert "sparkSQL doc change" 5a2863c [DEBORAH SIEGEL] Merge remote-tracking branch 'upstream/master' 91972fc [DEBORAH SIEGEL] sparkSQL doc change f000e59 [DEBORAH SIEGEL] Merge remote-tracking branch 'upstream/master' db54173 [DEBORAH SIEGEL] fixed aggregateMessages example in graphX doc	2015-04-24 20:25:07 -07:00
Ken Geis	67bccbda1e	Update sql-programming-guide.md fix typo Author: Ken Geis <geis.ken@gmail.com> Closes #5674 from kgeis/patch-1 and squashes the following commits: 5ae67de [Ken Geis] Update sql-programming-guide.md	2015-04-23 20:45:33 -07:00
Olivier Girardot	5f095d5605	SPARK-6992 : Fix documentation example for Spark SQL on StructType This patch is fixing the Java examples for Spark SQL when defining programmatically a Schema and mapping Rows. Author: Olivier Girardot <o.girardot@lateral-thoughts.com> Closes #5569 from ogirardot/branch-1.3 and squashes the following commits: c29e58d [Olivier Girardot] SPARK-6992 : Fix documentation example for Spark SQL on StructType (cherry picked from commit c9b1ba4b16a7afe93d45bf75b128cc0dd287ded0) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-04-18 00:31:15 -07:00
Olivier Girardot	d305e686b3	SPARK-6988 : Fix documentation regarding DataFrames using the Java API This patch includes : * adding how to use map after an sql query using javaRDD * fixing the first few java examples that were written in Scala Thank you for your time, Olivier. Author: Olivier Girardot <o.girardot@lateral-thoughts.com> Closes #5564 from ogirardot/branch-1.3 and squashes the following commits: 9f8d60e [Olivier Girardot] SPARK-6988 : Fix documentation regarding DataFrames using the Java API (cherry picked from commit 6b528dc139da594ef2e651d84bd91fe0f738a39d) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-04-17 16:27:02 -05:00
Liang-Chi Hsieh	e3e4e9a38b	[SPARK-6800][SQL] Update doc for JDBCRelation's columnPartition JIRA https://issues.apache.org/jira/browse/SPARK-6800 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #5488 from viirya/fix_jdbc_where and squashes the following commits: 51386c8 [Liang-Chi Hsieh] Update code comment. 1dcc929 [Liang-Chi Hsieh] Update document. 3eb74d6 [Liang-Chi Hsieh] Revert and modify doc. df11783 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_jdbc_where 3e7db15 [Liang-Chi Hsieh] Fix wrong logic to generate WHERE clause for JDBC.	2015-04-15 13:02:25 -07:00
Santiago M. Mola	6437e7cc3b	[SPARK-6863] Fix formatting on SQL programming guide. https://issues.apache.org/jira/browse/SPARK-6863 Author: Santiago M. Mola <santiago.mola@sap.com> Closes #5472 from smola/fix/sql-docs and squashes the following commits: 42503d4 [Santiago M. Mola] [SPARK-6863] Fix formatting on SQL programming guide.	2015-04-11 15:42:03 -07:00
Davies Liu	6ada4f6f52	[SPARK-6781] [SQL] use sqlContext in python shell Use `sqlContext` in PySpark shell, make it consistent with SQL programming guide. `sqlCtx` is also kept for compatibility. Author: Davies Liu <davies@databricks.com> Closes #5425 from davies/sqlCtx and squashes the following commits: af67340 [Davies Liu] sqlCtx -> sqlContext 15a278f [Davies Liu] use sqlContext in python shell	2015-04-08 13:31:45 -07:00
Michael Armbrust	aad0032276	[DOCS][SQL] Fix JDBC example Author: Michael Armbrust <michael@databricks.com> Closes #5192 from marmbrus/fixJDBCDocs and squashes the following commits: b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example	2015-03-26 14:51:46 -07:00
Bill Chambers	c5cc41468e	[DOCUMENTATION]Fixed Missing Type Import in Documentation Needed to import the types specifically, not the more general pyspark.sql Author: Bill Chambers <wchambers@ischool.berkeley.edu> Author: anabranch <wac.chambers@gmail.com> Closes #5179 from anabranch/master and squashes the following commits: 8fa67bf [anabranch] Corrected SqlContext Import 603b080 [Bill Chambers] [DOCUMENTATION]Fixed Missing Type Import in Documentation	2015-03-24 22:24:35 -07:00
vinodkc	2bf40c58e6	[SPARK-6337][Documentation, SQL]Spark 1.3 doc fixes Author: vinodkc <vinod.kc.in@gmail.com> Closes #5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits: 2c6aee6 [vinodkc] Spark 1.3 doc fixes	2015-03-22 20:00:08 +00:00
Kamil Smuga	6ef48632fb	SPARK-6454 [DOCS] Fix links to pyspark api Author: Kamil Smuga <smugakamil@gmail.com> Author: stderr <smugakamil@gmail.com> Closes #5120 from kamilsmuga/master and squashes the following commits: fee3281 [Kamil Smuga] more python api links fixed for docs 13240cb [Kamil Smuga] resolved merge conflicts with upstream/master 6649b3b [Kamil Smuga] fix broken docs links to Python API 92f03d7 [stderr] Fix links to pyspark api	2015-03-22 15:56:25 +00:00
Tijo Thomas	a012e08635	[SPARK-6383][SQL]Fixed compiler and errors in Dataframe examples Author: Tijo Thomas <tijoparacka@gmail.com> Closes #5068 from tijoparacka/fix_sql_dataframe_example and squashes the following commits: 6953ac1 [Tijo Thomas] Handled Java and Python example sections 0751a74 [Tijo Thomas] Fixed compiler and errors in Dataframe examples	2015-03-17 18:50:19 -07:00
Cheng Lian	69ff8e8cfb	[SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide Also fixed a bunch of minor styling issues. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5001) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #5001 from liancheng/parquet-doc and squashes the following commits: 89ad3db [Cheng Lian] Addresses @rxin's comments 7eb6955 [Cheng Lian] Docs for the new Parquet data source 415eefb [Cheng Lian] Some minor formatting improvements	2015-03-13 21:34:50 +08:00
zzcclp	304366c465	[SPARK-6275][Documentation]Miss toDF() function in docs/sql-programming-guide.md Miss `toDF()` function in docs/sql-programming-guide.md Author: zzcclp <xm_zzc@sina.com> Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits: 9a96c7b [zzcclp] Miss toDF()	2015-03-12 15:07:15 +00:00
Michael Armbrust	2672374110	[SPARK-5183][SQL] Update SQL Docs with JDBC and Migration Guide Author: Michael Armbrust <michael@databricks.com> Closes #4958 from marmbrus/sqlDocs and squashes the following commits: 9351dbc [Michael Armbrust] fix parquet example 6877e13 [Michael Armbrust] add sql examples d81b7e7 [Michael Armbrust] rxins comments e393528 [Michael Armbrust] fix order 19c2735 [Michael Armbrust] more on data source load/store 00d5914 [Michael Armbrust] Update SQL Docs with JDBC and Migration Guide	2015-03-10 18:13:09 -07:00
Reynold Xin	3cac1991a1	[SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames. Author: Reynold Xin <rxin@databricks.com> Closes #4954 from rxin/df-docs and squashes the following commits: c592c70 [Reynold Xin] [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames.	2015-03-09 16:16:16 -07:00
CodingCat	31efb39c1d	[Minor] fix typo in SQL document Author: CodingCat <zhunansjtu@gmail.com> Closes #4656 from CodingCat/fix_typo and squashes the following commits: b41d15c [CodingCat] recover 689fe46 [CodingCat] fix typo	2015-02-17 12:16:52 -08:00
Antonio Navarro Perez	6a1be026cf	[SQL][DOCS] Update sql documentation Updated examples using the new api and added DataFrame concept Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com> Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits: 82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext. 8d5376a [Antonio Navarro Perez] fixed typo 8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation	2015-02-12 12:46:17 -08:00
Davies Liu	ea60284095	[SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns Deprecate inferSchema() and applySchema(), use createDataFrame() instead, which could take an optional `schema` to create an DataFrame from an RDD. The `schema` could be StructType or list of names of columns. Author: Davies Liu <davies@databricks.com> Closes #4498 from davies/create and squashes the following commits: 08469c1 [Davies Liu] remove Scala/Java API for now c80a7a9 [Davies Liu] fix hive test d1bd8f2 [Davies Liu] cleanup applySchema 9526e97 [Davies Liu] createDataFrame from RDD with columns	2015-02-10 19:40:12 -08:00
Daoyuan Wang	6fa4ac1b00	[Branch-1.3] [DOC] doc fix for date Trivial fix. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #4400 from adrian-wang/docdate and squashes the following commits: 31bbe40 [Daoyuan Wang] doc fix for date	2015-02-05 12:42:27 -08:00
Matei Zaharia	4d74f0601a	[SPARK-5608] Improve SEO of Spark documentation pages - Add meta description tags on some of the most important doc pages - Shorten the titles of some pages to have more relevant keywords; for example there's no reason to have "Spark SQL Programming Guide - Spark 1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0 documentation". Author: Matei Zaharia <matei@databricks.com> Closes #4381 from mateiz/docs-seo and squashes the following commits: 4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages	2015-02-05 11:12:50 -08:00
Daoyuan Wang	0c20ce69fb	[SPARK-4987] [SQL] parquet timestamp type support Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3820 from adrian-wang/parquettimestamp and squashes the following commits: b1e2a0d [Daoyuan Wang] fix for nanos 4dadef1 [Daoyuan Wang] fix wrong read 93f438d [Daoyuan Wang] parquet timestamp support	2015-02-03 12:06:06 -08:00
scwf	1a200a3eea	[SQL][Minor] Update sql doc according to data type APIs changes Follow up of #3925 /cc rxin Author: scwf <wangfei1@huawei.com> Closes #4095 from scwf/sql-doc and squashes the following commits: 97e311b [scwf] update sql doc since now expose only one version of the data type APIs	2015-01-18 11:03:13 -08:00
luogankun	2deac748b4	[SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE is eager `CACHE TABLE tbl` is now __eager__ by default not __lazy__ Author: luogankun <luogankun@gmail.com> Closes #3773 from luogankun/SPARK-4930 and squashes the following commits: cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ... bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager	2014-12-30 12:18:55 -08:00
luogankun	f7a41a0e79	[SPARK-4916][SQL][DOCS]Update SQL programming guide about cache section `SchemeRDD.cache()` now uses in-memory columnar storage. Author: luogankun <luogankun@gmail.com> Closes #3759 from luogankun/SPARK-4916 and squashes the following commits: 7b39864 [luogankun] [SPARK-4916]Update SQL programming guide 6018122 [luogankun] Merge branch 'master' of https://github.com/apache/spark into SPARK-4916 0b93785 [luogankun] [SPARK-4916]Update SQL programming guide 99b2336 [luogankun] [SPARK-4916]Update SQL programming guide	2014-12-30 12:17:49 -08:00
Peter Vandenabeele	1a9e35e57a	[DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line * This commit hopes to avoid the confusion I faced when trying to submit a regular, valid multi-line JSON file, also see http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html Author: Peter Vandenabeele <peter@vandenabeele.com> Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits: 1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text 6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt" fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line	2014-12-16 13:58:01 -08:00
Judy Nash	17688d1429	[SQL] SPARK-4700: Add HTTP protocol spark thrift server Add HTTP protocol support and test cases to spark thrift server, so users can deploy thrift server in both TCP and http mode. Author: Judy Nash <judynash@microsoft.com> Author: judynash <judynash@microsoft.com> Closes #3672 from judynash/master and squashes the following commits: 526315d [Judy Nash] correct spacing on startThriftServer method 31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue 47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length 2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide 1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master' 2b1d312 [Judy Nash] updated http thrift server support based on feedback 377532c [judynash] add HTTP protocol spark thrift server	2014-12-16 12:37:26 -08:00
Andy Konwinski	15cf3b0125	Fix typo in Spark SQL docs. Author: Andy Konwinski <andykonwinski@gmail.com> Closes #3611 from andyk/patch-3 and squashes the following commits: 7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.	2014-12-04 18:27:02 -08:00
Daoyuan Wang	5edbcbfb61	[SQL][DOC] Date type in SQL programming guide Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3535 from adrian-wang/datedoc and squashes the following commits: 18ff1ed [Daoyuan Wang] [DOC] Date type	2014-12-01 14:04:07 -08:00
wangfei	7b79957879	[SQL] Minor fix for doc and comment Author: wangfei <wangfei1@huawei.com> Closes #3533 from scwf/sql-doc1 and squashes the following commits: 962910b [wangfei] doc and comment fix	2014-12-01 14:02:02 -08:00

1 2 3 4

192 commits