ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Sun Rui	4ae9fe091c	[SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR. ## What changes were proposed in this pull request? dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame. The function signature is: dapply(df, function(localDF) {}, schema = NULL) R function input: local data.frame from the partition on local node R function output: local data.frame Schema specifies the Row format of the resulting DataFrame. It must match the R function's output. If schema is not specified, each partition of the result DataFrame will be serialized in R into a single byte array. Such resulting DataFrame can be processed by successive calls to dapply(). ## How was this patch tested? SparkR unit tests. Author: Sun Rui <rui.sun@intel.com> Author: Sun Rui <sunrui2016@gmail.com> Closes #12493 from sun-rui/SPARK-12919.	2016-04-29 16:41:07 -07:00
Dongjoon Hyun	6ab4d9e0c7	[SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date ## What changes were proposed in this pull request? This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules. - Remove the wrong usage of `map`. We need to use `lapply` in `sparkR` if needed. However, `lapply` is private so far. The corrected example will be added later. - Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency - Fix datatypes in `sparkr.md`. - Update a data result in `sparkr.md`. - Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet - Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet - Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`. - Other minor syntax fixes and a typo. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12649 from dongjoon-hyun/SPARK-14883.	2016-04-24 22:10:27 -07:00
Mark Grover	ff9ae61a3b	[SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly ## What changes were proposed in this pull request? Removing references to assembly jar in documentation. Adding an additional (previously undocumented) usage of spark-submit to run examples. ## How was this patch tested? Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit. Author: Mark Grover <mark@apache.org> Closes #12365 from markgrover/spark-14601.	2016-04-14 18:51:43 -07:00
Dongjoon Hyun	1a0cca1fc8	[MINOR][DOCS] Fix wrong data types in JSON Datasets example. ## What changes were proposed in this pull request? This PR fixes the `age` data types from `integer` to `long` in `SQL Programming Guide: JSON Datasets`. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12290 from dongjoon-hyun/minor_fix_type_in_json_example.	2016-04-11 09:03:11 +01:00
Reynold Xin	9ca0760d67	[SPARK-10063][SQL] Remove DirectParquetOutputCommitter ## What changes were proposed in this pull request? This patch removes DirectParquetOutputCommitter. This was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures. If there are multiple attempts of the same task (e.g. speculation or task failures or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue. ## How was this patch tested? Removed the related tests also. Author: Reynold Xin <rxin@databricks.com> Closes #12229 from rxin/SPARK-10063.	2016-04-07 00:51:45 -07:00
Marcelo Vanzin	24d7d2e453	[SPARK-13579][BUILD] Stop building the main Spark assembly. This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packages in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.	2016-04-04 16:52:22 -07:00
Daoyuan Wang	d1c193a2f1	[SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test ## What changes were proposed in this pull request? Since developer API of plug-able parser has been removed in #10801 , docs should be updated accordingly. ## How was this patch tested? This patch will not affect the real code path. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #11758 from adrian-wang/spark12855.	2016-03-16 22:52:10 -07:00
Dongjoon Hyun	4ce2d24e2a	[SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x ## What changes were proposed in this pull request? `Shark` was merged into `Spark SQL` since [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The followings seem to be the only legacy. For Spark 2.x, we had better clean up those docs. Migration Guide ``` - ## Migration Guide for Shark Users - ... - ### Scheduling - ... - ### Reducer number - ... - ### Caching ``` ## How was this patch tested? Pass the Jenkins test. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11770 from dongjoon-hyun/SPARK-13942.	2016-03-16 15:50:24 -07:00
Dongjoon Hyun	c3689bc24e	[SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. ## What changes were proposed in this pull request? In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing codes like the followings by using `diamond` operator. ``` - final ArrayList<Product2<Object, Object>> dataToWrite = - new ArrayList<Product2<Object, Object>>(); + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>(); ``` Java 7 or higher supports diamond operator which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code use mixed usage of this. ## How was this patch tested? Manual. Pass the existing tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11541 from dongjoon-hyun/SPARK-13702.	2016-03-09 10:31:26 +00:00
Dongjoon Hyun	024482bf51	[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was the this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.	2016-02-22 09:52:07 +00:00
Sasaki Toru	c2f21d8898	[SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template In spark-env.sh.template, there are multi-byte characters, this PR will remove it. Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.	2016-02-11 09:30:36 +00:00
Sebastián Ramírez	c882ec57de	[SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation Update JDBC documentation based on http://stackoverflow.com/a/30947090/219530 as SPARK_CLASSPATH is deprecated. Also, that's how it worked, it didn't work with the SPARK_CLASSPATH or the --jars alone. This would solve issue: https://issues.apache.org/jira/browse/SPARK-13040 Author: Sebastián Ramírez <tiangolo@gmail.com> Closes #10948 from tiangolo/patch-docs-jdbc.	2016-02-09 08:49:34 +00:00
Takeshi YAMAMURO	da9146c91a	[DOCS] Fix the jar location of datanucleus in sql-programming-guid.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10901 from maropu/DocFix.	2016-02-01 12:02:06 -08:00
Sun Rui	1b2a918e59	[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #10201 from sun-rui/SPARK-12204.	2016-01-20 21:08:15 -08:00
Brandon Bradley	a767ee8a05	[SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting Warning users about casting changes. Author: Brandon Bradley <bradleytastic@gmail.com> Closes #10708 from blbradley/spark-12758.	2016-01-11 14:21:50 -08:00
Josh Rosen	6c83d938cc	[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with the our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection. This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different). This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons). Author: Josh Rosen <joshrosen@databricks.com> Closes #10519 from JoshRosen/jdbc-driver-precedence.	2016-01-04 10:39:42 -08:00
Yin Huai	ac8cdf1cdc	[SPARK-11678][SQL][DOCS] Document basePath in the programming guide. This PR adds document for `basePath`, which is a new parameter used by `HadoopFsRelation`. The compiled doc is shown below. ![image](https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png) JIRA: https://issues.apache.org/jira/browse/SPARK-11678 Author: Yin Huai <yhuai@databricks.com> Closes #10211 from yhuai/basePathDoc.	2015-12-09 18:09:36 -08:00
Michael Armbrust	3959489423	[SPARK-12069][SQL] Update documentation with Datasets Author: Michael Armbrust <michael@databricks.com> Closes #10060 from marmbrus/docs.	2015-12-08 15:58:35 -08:00
woj-i	6a8cf80cc8	[SPARK-11821] Propagate Kerberos keytab for all environments andrewor14 the same PR as in branch 1.5 harishreedharan Author: woj-i <wojciechindyk@gmail.com> Closes #9859 from woj-i/master.	2015-12-01 11:05:45 -08:00
Stephen Samuel	026ea2eab1	Updated sql programming guide to include jdbc fetch size Author: Stephen Samuel <sam@sksamuel.com> Closes #9377 from sksamuel/master.	2015-11-23 19:52:12 -08:00
Cheng Lian	7b1407c7b9	[SPARK-11089][SQL] Adds option for disabling multi-session in Thrift server This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server. Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is by default on, no JDBC connection can modify global configurations like the newly added one. Author: Cheng Lian <lian@databricks.com> Closes #9740 from liancheng/spark-11089.single-session-option.	2015-11-17 11:17:52 -08:00
gatorsmile	2f38378856	[SPARK-11360][DOC] Loss of nullability when writing parquet files This fix is to add one line to explain the current behavior of Spark SQL when writing Parquet files. All columns are forced to be nullable for compatibility reasons. Author: gatorsmile <gatorsmile@gmail.com> Closes #9314 from gatorsmile/lossNull.	2015-11-09 16:06:48 -08:00
Rohit Agarwal	b541b31630	[DOC][MINOR][SQL] Fix internal link It doesn't show up as a hyperlink currently. It will show up as a hyperlink after this change. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9544 from mindprince/patch-2.	2015-11-09 13:28:00 +01:00
xin Wu	26739059bc	[SPARK-10046][SQL] Hive warehouse dir not set in current directory when not … Doc change to align with HiveConf default in terms of where to create `warehouse` directory. Author: xin Wu <xinwu@us.ibm.com> Closes #9365 from xwu0226/spark-10046-commit.	2015-11-08 12:28:19 -08:00
Rohit Agarwal	5c4e6d7ec9	[DOC][SQL] Remove redundant out-of-place python snippet This snippet seems to be mistakenly introduced at two places in #5348. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9540 from mindprince/patch-1.	2015-11-08 14:24:26 +00:00
Wenchen Fan	e0fc9c7e59	[SPARK-11197][SQL] add doc for run SQL on files directly Author: Wenchen Fan <wenchen@databricks.com> Closes #9467 from cloud-fan/doc.	2015-11-04 09:33:30 -08:00
lewuathe	d648a4ad54	[DOC] Missing link to R DataFrame API doc Author: lewuathe <lewuathe@me.com> Author: Lewuathe <lewuathe@me.com> Closes #9394 from Lewuathe/missing-link-to-R-dataframe.	2015-11-03 16:38:22 -08:00
Josh Rosen	b67dc6a434	[SPARK-11299][DOC] Fix link to Scala DataFrame Functions reference The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that. Author: Josh Rosen <joshrosen@databricks.com> Closes #9269 from JoshRosen/SPARK-11299.	2015-10-25 10:31:44 +01:00
Sean Owen	82bbc2a5f2	[SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'. Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs. Follow-on to https://github.com/apache/spark/pull/8385 CC nssalian Author: Sean Owen <sowen@cloudera.com> Closes #8968 from srowen/SPARK-9570.	2015-10-04 09:31:52 +01:00
Jacek Laskowski	ca9fe540fe	[SPARK-10662] [DOCS] Code snippets are not properly formatted in tables * Backticks are processed properly in Spark Properties table * Removed unnecessary spaces * See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8795 from jaceklaskowski/docs-yarn-formatting.	2015-09-21 19:46:39 +01:00
Josh Rosen	2117eea71e	[SPARK-10710] Remove ability to disable spilling in core and SQL It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling. Author: Josh Rosen <joshrosen@databricks.com> Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.	2015-09-19 21:40:21 -07:00
Kousuke Saruta	d507f9c0b7	[SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong. In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong. /CC yhuai Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8776 from sarutak/SPARK-10584-2.	2015-09-19 01:59:36 -07:00
Kousuke Saruta	cf2821ef5f	[SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong. The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1. Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8739 from sarutak/SPARK-10584.	2015-09-14 12:06:23 -07:00
GuoQiang Li	5369be8068	[SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide Author: GuoQiang Li <witgo@qq.com> Closes #8520 from witgo/SPARK-10350.	2015-08-29 13:20:27 -07:00
Yin Huai	b3dd569ad4	[SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path https://issues.apache.org/jira/browse/SPARK-10287 After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet). Author: Yin Huai <yhuai@databricks.com> Closes #8469 from yhuai/jsonRefresh.	2015-08-27 16:11:25 -07:00
Michael Armbrust	dc86a227e4	[SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide Author: Michael Armbrust <michael@databricks.com> Closes #8441 from marmbrus/documentation.	2015-08-27 11:45:15 -07:00
Cheng Lian	0fac144f6b	[SPARK-9424] [SQL] Parquet programming guide updates for 1.5 Author: Cheng Lian <lian@databricks.com> Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.	2015-08-26 18:14:54 -07:00
Bill Chambers	b23c4d3ffc	Fix Broken Link Link was broken because it included tick marks. Author: Bill Chambers <wchambers@ischool.berkeley.edu> Closes #8302 from anabranch/patch-1.	2015-08-19 00:05:01 -07:00
Davies Liu	17284db314	[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing. cc marmbrus rxin Author: Davies Liu <davies@databricks.com> Closes #7998 from davies/tungsten and squashes the following commits: c1c16da [Davies Liu] update doc 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe (cherry picked from commit `4e70e8256c`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-08-06 19:42:02 -07:00
Davies Liu	49b1504fe3	Revert "[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe" This reverts commit `4e70e8256c`.	2015-08-06 17:36:12 -07:00
Davies Liu	4e70e8256c	[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing. cc marmbrus rxin Author: Davies Liu <davies@databricks.com> Closes #7998 from davies/tungsten and squashes the following commits: c1c16da [Davies Liu] update doc 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe	2015-08-06 17:30:31 -07:00
KaiXinXiaoLei	536d2adc12	[SPARK-9535][SQL][DOCS] Modify document for codegen. #7142 made codegen enabled by default so let's modify the corresponding documents. Closes #7142 Author: KaiXinXiaoLei <huleilei1@huawei.com> Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #7863 from sarutak/SPARK-9535 and squashes the following commits: 0884424 [Kousuke Saruta] Removed a line which mentioned about the effect of codegen enabled 3c11af0 [Kousuke Saruta] Merge branch 'sqlconfig' of https://github.com/KaiXinXiaoLei/spark into SPARK-9535 4ee531d [KaiXinXiaoLei] delete space 4cfd11d [KaiXinXiaoLei] change spark.sql.planner.externalSort d624cf8 [KaiXinXiaoLei] sql config is wrong	2015-08-02 20:04:21 -07:00
Sean Owen	873ab0f969	[SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement Use print(x) not print x for Python 3 in eval examples CC sethah mengxr -- just wanted to close this out before 1.5 Author: Sean Owen <sowen@cloudera.com> Closes #7822 from srowen/SPARK-9490 and squashes the following commits: 01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples	2015-07-31 13:45:28 -07:00
Cheng Lian	bebe3f7b45	[SPARK-9207] [SQL] Enables Parquet filter push-down by default PARQUET-136 and PARQUET-173 have been fixed in parquet-mr 1.7.0. It's time to enable filter push-down by default now. Author: Cheng Lian <lian@databricks.com> Closes #7612 from liancheng/spark-9207 and squashes the following commits: 77e6b5e [Cheng Lian] Enables Parquet filter push-down by default	2015-07-23 17:49:33 -07:00
Alok Singh	8f3cd93278	[SPARK-8909][Documentation] Change the scala example in sql-programmi… …ng-guide#Manually Specifying Options to be in sync with java,python, R version Author: Alok Singh <“singhal@us.ibm.com”> Closes #7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits: d3c20ba [Alok Singh] fix the file to .parquet from .json d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version	2015-07-08 14:51:18 -07:00
Sun Rui	bf02e37716	[SPARK-8894] [SPARKR] [DOC] Example code errors in SparkR documentation. Author: Sun Rui <rui.sun@intel.com> Closes #7287 from sun-rui/SPARK-8894 and squashes the following commits: da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.	2015-07-08 09:48:16 -07:00
Tijo Thomas	08192a1b8a	[SPARK-8886][Documentation]python Style update Fixed comment given by rxin Author: Tijo Thomas <tijoparacka@gmail.com> Closes #7281 from tijoparacka/modification_for_python_style and squashes the following commits: 6334e21 [Tijo Thomas] removed space 3de4cd8 [Tijo Thomas] python Style update	2015-07-07 22:35:39 -07:00
Tijo Thomas	9213f73a8e	[SPARK-8615] [DOCUMENTATION] Fixed Sample deprecated code Modified the deprecated jdbc api in the documentation. Author: Tijo Thomas <tijoparacka@gmail.com> Closes #7039 from tijoparacka/JIRA_8615 and squashes the following commits: 6e73b8a [Tijo Thomas] Reverted new lines 4042fcf [Tijo Thomas] updated to sql documentation a27949c [Tijo Thomas] Fixed Sample deprecated code	2015-06-30 10:50:45 -07:00
Cheng Lian	111d6b9b8a	[SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0. Author: Cheng Lian <lian@databricks.com> Closes #6683 from liancheng/output-committer-docs and squashes the following commits: b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options	2015-06-23 17:24:26 -07:00
Cheng Lian	d96d7b5574	[DOC] [SQL] Addes Hive metastore Parquet table conversion section This PR adds a section about Hive metastore Parquet table conversion. It documents: 1. Schema reconciliation rules introduced in #5214 (see [this comment] [1] in #5188) 2. Metadata refreshing requirement introduced in #5339 [1]: https://github.com/apache/spark/pull/5188#issuecomment-86531248 Author: Cheng Lian <lian@databricks.com> Closes #5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits: 42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet 4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet 756e660 [Cheng Lian] Adds Python snippet for metadata refreshing 50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section	2015-06-23 14:19:21 -07:00

1 2 3

140 commits