ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Timothy Chen	8938a74893	[SPARK-7962] [MESOS] Fix master url parsing in rest submission client. Only parse standalone master url when master url starts with spark:// Author: Timothy Chen <tnachen@gmail.com> Closes #6517 from tnachen/fix_mesos_client and squashes the following commits: 61a1198 [Timothy Chen] Fix master url parsing in rest submission client. (cherry picked from commit `78657d53d7`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-05-29 23:56:27 -07:00
Octavian Geagla	11a4b30d1e	[SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct Author: Octavian Geagla <ogeagla@gmail.com> Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits: 4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback. f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct. (cherry picked from commit `da2112aef2`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-05-29 23:55:29 -07:00
Burak Yavuz	1513cffa35	[SPARK-7957] Preserve partitioning when using randomSplit cc JoshRosen Thanks for noticing this! Author: Burak Yavuz <brkyvz@gmail.com> Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits: 497465d [Burak Yavuz] addressed code review 293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit (cherry picked from commit `7ed06c3992`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-29 22:19:23 -07:00
Taka Shinagawa	400e6dbce2	[DOCS][Tiny] Added a missing dash(-) in docs/configuration.md The first line had only two dashes (--) instead of three(---). Because of this missing dash(-), 'jekyll build' command was not converting configuration.md to _site/configuration.html Author: Taka Shinagawa <taka.epsilon@gmail.com> Closes #6513 from mrt/docfix3 and squashes the following commits: c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from converting configuration.md to html format (cherry picked from commit `3792d25836`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-29 20:35:26 -07:00
Ram Sriharsha	9a88be1833	[SPARK-6013] [ML] Add more Python ML examples for spark.ml Author: Ram Sriharsha <rsriharsha@hw11853.local> Closes #6443 from harsha2010/SPARK-6013 and squashes the following commits: 732506e [Ram Sriharsha] Code Review Feedback 121c211 [Ram Sriharsha] python style fix 5f9b8c3 [Ram Sriharsha] python style fixes 925ca86 [Ram Sriharsha] Simple Params Example 8b372b1 [Ram Sriharsha] GBT Example 965ec14 [Ram Sriharsha] Random Forest Example (cherry picked from commit `dbf8ff38de`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-05-29 15:22:38 -07:00
Shivaram Venkataraman	2bd4460548	[SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init cc davies Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6507 from shivaram/sparkr-init and squashes the following commits: 6fdd169 [Shivaram Venkataraman] Create SparkContext in sparkRSQL init (cherry picked from commit `5fb97dca9b`) Signed-off-by: Davies Liu <davies@databricks.com>	2015-05-29 15:08:50 -07:00
Shivaram Venkataraman	cf4122e4d4	[SPARK-6806] [SPARKR] [DOCS] Add a new SparkR programming guide This PR adds a new SparkR programming guide at the top-level. This will be useful for R users as our APIs don't directly match the Scala/Python APIs and as we need to explain SparkR without using RDDs as examples etc. cc rxin davies pwendell cc cafreeman -- Would be great if you could also take a look at this ! Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6490 from shivaram/sparkr-guide and squashes the following commits: d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries 408dce5 [Shivaram Venkataraman] Fix link dbb86e3 [Shivaram Venkataraman] Fix minor typo 9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example d09703c [Shivaram Venkataraman] Fix default argument in read.df ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better (cherry picked from commit `5f48e5c33b`) Signed-off-by: Davies Liu <davies@databricks.com>	2015-05-29 14:12:18 -07:00
Reynold Xin	f40605f064	[SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker. … Author: Reynold Xin <rxin@databricks.com> Closes #6491 from rxin/more-whitespace and squashes the following commits: f6e63dc [Reynold Xin] [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker. (cherry picked from commit `94f62a4979`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-29 13:39:02 -07:00
Patrick Wendell	e549874c33	Preparing development version 1.4.0-SNAPSHOT	2015-05-29 13:07:07 -07:00
Patrick Wendell	dd109a8746	Preparing Spark release v1.4.0-rc3	2015-05-29 13:06:59 -07:00
Patrick Wendell	18811ca20b	Revert "[SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior" This reverts commit `645e611644`.	2015-05-29 13:03:52 -07:00
Patrick Wendell	c68abaa34e	Preparing development version 1.4.0-SNAPSHOT	2015-05-29 12:15:18 -07:00
Patrick Wendell	fb60503ff2	Preparing Spark release v1.4.0-rc3	2015-05-29 12:15:13 -07:00
MechCoder	4be701aa50	[SPARK-7946] [MLLIB] DecayFactor wrongly set in StreamingKMeans Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #6497 from MechCoder/spark-7946 and squashes the following commits: 2fdd0a3 [MechCoder] Add non-regression test 8c988c6 [MechCoder] [SPARK-7946] DecayFactor wrongly set in StreamingKMeans (cherry picked from commit `6181937f31`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-05-29 11:36:48 -07:00
Cheng Lian	645e611644	[SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior The `HiveThriftServer2Test` relies on proper logging behavior to assert whether the Thrift server daemon process is started successfully. However, some other jar files listed in the classpath may potentially contain an unexpected Log4J configuration file which overrides the logging behavior. This PR writes a temporary `log4j.properties` and prepend it to driver classpath before starting the testing Thrift server process to ensure proper logging behavior. cc andrewor14 yhuai Author: Cheng Lian <lian@databricks.com> Closes #6493 from liancheng/override-log4j and squashes the following commits: c489e0e [Cheng Lian] Fixes minor Scala styling issue b46ef0d [Cheng Lian] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior (cherry picked from commit `4782e13040`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-05-29 11:11:47 -07:00
Reynold Xin	62df047a36	HOTFIX: Scala style checker for DataTypeSuite.scala.	2015-05-29 11:06:33 -07:00
Cheng Lian	caea7a618d	[SPARK-7950] [SQL] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext() When starting `HiveThriftServer2` via `startWithContext`, property `spark.sql.hive.version` isn't set. This causes Simba ODBC driver 1.0.8.1006 behaves differently and fails simple queries. Hive2 JDBC driver works fine in this case. Also, when starting the server with `start-thriftserver.sh`, both Hive2 JDBC driver and Simba ODBC driver works fine. Please refer to [SPARK-7950] [1] for details. [1]: https://issues.apache.org/jira/browse/SPARK-7950 Author: Cheng Lian <lian@databricks.com> Closes #6500 from liancheng/odbc-bugfix and squashes the following commits: 051e3a3 [Cheng Lian] Fixes import order 3a97376 [Cheng Lian] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext() (cherry picked from commit `e7b6177557`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-05-29 10:43:44 -07:00
Reynold Xin	23bd05fff7	HOTFIX: Scala style checker failure due to a missing space in TachyonBlockManager.scala.	2015-05-29 09:37:46 -07:00
Tim Ellison	459c3d22e0	[SPARK-7756] [CORE] Use testing cipher suites common to Oracle and IBM security providers Add alias names for supported cipher suites to the sample SSL configuration. The IBM JSSE provider reports its cipher suite with an SSL_ prefix, but accepts TLS_ prefixed suite names as an alias. However, Jetty filters the requested ciphers based on the provider's reported supported suites, so the TLS_ versions are never passed through to JSSE causing an SSL handshake failure. Author: Tim Ellison <t.p.ellison@gmail.com> Closes #6282 from tellison/SSLFailure and squashes the following commits: 8de8a3e [Tim Ellison] Update SecurityManagerSuite with new expected suite names 96158b2 [Tim Ellison] Update the sample configs to use ciphers that are common to both the Oracle and IBM security providers. 705421b [Tim Ellison] Merge branch 'master' of github.com:tellison/spark into SSLFailure 68b9425 [Tim Ellison] Merge branch 'master' of https://github.com/apache/spark into SSLFailure b0c35f6 [Tim Ellison] [CORE] Add aliases used for cipher suites in IBM provider (cherry picked from commit `bf46580708`) Signed-off-by: Sean Owen <sowen@cloudera.com>	2015-05-29 05:15:00 -04:00
Xiangrui Meng	509a7cafcc	[SPARK-7912] [SPARK-7921] [MLLIB] Update OneHotEncoder to handle ML attributes and change includeFirst to dropLast This PR contains two major changes to `OneHotEncoder`: 1. more robust handling of ML attributes. If the input attribute is unknown, we look at the values to get the max category index 2. change `includeFirst` to `dropLast` and leave the default to `true`. There are couple benefits: a. consistent with other tutorials of one-hot encoding (or dummy coding) (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm) b. keep the indices unmodified in the output vector. If we drop the first, all indices will be shifted by 1. c. If users use `StringIndex`, the last element is the least frequent one. Sorry for including two changes in one PR! I'll update the user guide in another PR. jkbradley sryza Author: Xiangrui Meng <meng@databricks.com> Closes #6466 from mengxr/SPARK-7912 and squashes the following commits: a280dca [Xiangrui Meng] fix tests d8f234d [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7912 171b276 [Xiangrui Meng] mention the difference between our impl vs sklearn's 00dfd96 [Xiangrui Meng] update OneHotEncoder in Python 208ddad [Xiangrui Meng] update OneHotEncoder to handle ML attributes and change includeFirst to dropLast (cherry picked from commit `23452be944`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-05-29 00:51:24 -07:00
Patrick Wendell	6bf5a42084	Preparing development version 1.4.0-SNAPSHOT	2015-05-28 23:40:27 -07:00
Patrick Wendell	f2796816be	Preparing Spark release v1.4.0-rc3	2015-05-28 23:40:22 -07:00
Reynold Xin	55dc7a6933	[SPARK-7929] Turn whitespace checker on for more token types. This is the last batch of changes to complete SPARK-7929. Previous related PRs: https://github.com/apache/spark/pull/6480 https://github.com/apache/spark/pull/6478 https://github.com/apache/spark/pull/6477 https://github.com/apache/spark/pull/6476 https://github.com/apache/spark/pull/6475 https://github.com/apache/spark/pull/6474 https://github.com/apache/spark/pull/6473 Author: Reynold Xin <rxin@databricks.com> Closes #6487 from rxin/whitespace-lint and squashes the following commits: b33d43d [Reynold Xin] [SPARK-7929] Turn whitespace checker on for more token types. (cherry picked from commit `97a60cf75d`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 23:00:08 -07:00
Patrick Wendell	119c93af9c	Preparing development version 1.4.0-SNAPSHOT	2015-05-28 22:57:31 -07:00
Patrick Wendell	2d97d7a0aa	Preparing Spark release v1.4.0-rc3	2015-05-28 22:57:26 -07:00
Patrick Wendell	e419821c3b	[HOTFIX] Minor style fix from last commit	2015-05-28 22:48:25 -07:00
Tathagata Das	7a52fdf25f	[SPARK-7931] [STREAMING] Do not restart receiver when stopped Attempts to restart the socket receiver when it is supposed to be stopped causes undesirable error messages. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #6483 from tdas/SPARK-7931 and squashes the following commits: 09aeee1 [Tathagata Das] Do not restart receiver when stopped	2015-05-28 22:48:23 -07:00
Xiangrui Meng	68559423ac	[SPARK-7922] [MLLIB] use DataFrames for user/item factors in ALSModel Expose user/item factors in DataFrames. This is to be more consistent with the pipeline API. It also helps maintain consistent APIs across languages. This PR also removed fitting params from `ALSModel`. coderxiang Author: Xiangrui Meng <meng@databricks.com> Closes #6468 from mengxr/SPARK-7922 and squashes the following commits: 7bfb1d5 [Xiangrui Meng] update ALSModel in PySpark 1ba5607 [Xiangrui Meng] use DataFrames for user/item factors in ALS (cherry picked from commit `db95137897`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-05-28 22:38:46 -07:00
Tathagata Das	f7cb272b7c	[SPARK-7930] [CORE] [STREAMING] Fixed shutdown hook priorities Shutdown hook for temp directories had priority 100 while SparkContext was 50. So the local root directory was deleted before SparkContext was shutdown. This leads to scary errors on running jobs, at the time of shutdown. This is especially a problem when running streaming examples, where Ctrl-C is the only way to shutdown. The fix in this PR is to make the temp directory shutdown priority lower than SparkContext, so that the temp dirs are the last thing to get deleted, after the SparkContext has been shut down. Also, the DiskBlockManager shutdown priority is change from default 100 to temp_dir_prio + 1, so that it gets invoked just before all temp dirs are cleared. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #6482 from tdas/SPARK-7930 and squashes the following commits: d7cbeb5 [Tathagata Das] Removed unnecessary line 1514d0b [Tathagata Das] Fixed shutdown hook priorities (cherry picked from commit `cd3d9a5c0c`) Signed-off-by: Patrick Wendell <patrick@databricks.com>	2015-05-28 22:28:31 -07:00
Kay Ousterhout	aee046dfa1	[SPARK-7932] Fix misleading scheduler delay visualization The existing code rounds down to the nearest percent when computing the proportion of a task's time that was spent on each phase of execution, and then computes the scheduler delay proportion as 100 - sum(all other proportions). As a result, a few extra percent can end up in the scheduler delay. This commit eliminates the rounding so that the time visualizations correspond properly to the real times. sarutak If you could take a look at this, that would be great! Not sure if there's a good reason to round here that I missed. cc shivaram Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #6484 from kayousterhout/SPARK-7932 and squashes the following commits: 1723cc4 [Kay Ousterhout] [SPARK-7932] Fix misleading scheduler delay visualization (cherry picked from commit `04ddcd4db7`) Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>	2015-05-28 22:09:59 -07:00
Xiangrui Meng	1d49d8c3fd	[MINOR] fix RegressionEvaluator doc `make clean html` under `python/doc` returns ~~~ /Users/meng/src/spark/python/pyspark/ml/evaluation.py:docstring of pyspark.ml.evaluation.RegressionEvaluator.setParams:3: WARNING: Definition list ends without a blank line; unexpected unindent. ~~~ harsha2010 Author: Xiangrui Meng <meng@databricks.com> Closes #6469 from mengxr/fix-regression-evaluator-doc and squashes the following commits: 91e2dad [Xiangrui Meng] fix RegressionEvaluator doc (cherry picked from commit `834e699524`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-05-28 21:26:49 -07:00
Xiangrui Meng	6e99dd5d04	[SPARK-7926] [PYSPARK] use the official Pyrolite release Switch to the official Pyrolite release from the one published under `org.spark-project`. Thanks irmen for making the releases on Maven Central. We didn't upgrade to 4.6 because we don't have enough time for QA. I excludes `serpent` from its dependencies because we don't use it in Spark. ~~~ [info] +-net.jpountz.lz4:lz4:1.3.0 [info] +-net.razorvine:pyrolite:4.4 [info] +-net.sf.py4j:py4j:0.8.2.1 ~~~ davies Author: Xiangrui Meng <meng@databricks.com> Closes #6472 from mengxr/SPARK-7926 and squashes the following commits: 7b3c6bf [Xiangrui Meng] use the official Pyrolite release (cherry picked from commit `c45d58c143`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-05-28 21:21:01 -07:00
Reynold Xin	b3a590061d	[SPARK-7927] whitespace fixes for GraphX. So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6474 from rxin/whitespace-graphx and squashes the following commits: 4d3cd26 [Reynold Xin] Fixed tests. 869dde4 [Reynold Xin] [SPARK-7927] whitespace fixes for GraphX. (cherry picked from commit `b069ad23d9`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 20:17:28 -07:00
Reynold Xin	e3dd2802f6	[SPARK-7927] whitespace fixes for core. So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6473 from rxin/whitespace-core and squashes the following commits: 058195d [Reynold Xin] Fixed tests. fce11e9 [Reynold Xin] [SPARK-7927] whitespace fixes for core. (cherry picked from commit `7f7505d8db`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 20:16:35 -07:00
Reynold Xin	22e42e3fee	[SPARK-7927] whitespace fixes for Catalyst module. So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6476 from rxin/whitespace-catalyst and squashes the following commits: 650409d [Reynold Xin] Fixed tests. 51a9e5d [Reynold Xin] [SPARK-7927] whitespace fixes for Catalyst module. (cherry picked from commit `8da560d7de`) Signed-off-by: Reynold Xin <rxin@databricks.com> Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala	2015-05-28 20:14:53 -07:00
Reynold Xin	142ae52d48	[SPARK-7929] Remove Bagel examples & whitespace fix for examples. Author: Reynold Xin <rxin@databricks.com> Closes #6480 from rxin/whitespace-example and squashes the following commits: 8a4a3d4 [Reynold Xin] [SPARK-7929] Remove Bagel examples & whitespace fix for examples. (cherry picked from commit `2881d14cbe`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 20:11:11 -07:00
Reynold Xin	9b97e95e86	[SPARK-7927] whitespace fixes for SQL core. So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6477 from rxin/whitespace-sql-core and squashes the following commits: ce6e369 [Reynold Xin] Fixed tests. 6095fed [Reynold Xin] [SPARK-7927] whitespace fixes for SQL core. (cherry picked from commit `ff44c711ab`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 20:10:28 -07:00
Xiangrui Meng	0c05115063	[SPARK-7927] [MLLIB] Enforce whitespace for more tokens in style checker rxin Author: Xiangrui Meng <meng@databricks.com> Closes #6481 from mengxr/mllib-scalastyle and squashes the following commits: 3ca4d61 [Xiangrui Meng] revert scalastyle config 30961ba [Xiangrui Meng] adjust spaces in mllib/test 571b5c5 [Xiangrui Meng] fix spaces in mllib (cherry picked from commit `04616b1a2f`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 20:09:21 -07:00
Kay Ousterhout	3479e6a127	[SPARK-7933] Remove Patrick's username/pw from merge script Looks like this was added by accident when pwendell merged a commit back in September: `fe2b1d6a20` Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #6485 from kayousterhout/SPARK-7933 and squashes the following commits: 7c6164a [Kay Ousterhout] [SPARK-7933] Remove Patrick's username/pw from merge script (cherry picked from commit `66c49ed60d`) Signed-off-by: Patrick Wendell <patrick@databricks.com>	2015-05-28 19:04:51 -07:00
Reynold Xin	3b38c06f0d	[SPARK-7927] whitespace fixes for Hive and ThriftServer. So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6478 from rxin/whitespace-hive and squashes the following commits: e01b0e0 [Reynold Xin] Fixed tests. a3bba22 [Reynold Xin] [SPARK-7927] whitespace fixes for Hive and ThriftServer. (cherry picked from commit `ee6a0e12fb`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 18:09:09 -07:00
Reynold Xin	f4b135337c	[SPARK-7927] whitespace fixes for streaming. So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6475 from rxin/whitespace-streaming and squashes the following commits: 810dae4 [Reynold Xin] Fixed tests. 89068ad [Reynold Xin] [SPARK-7927] whitespace fixes for streaming. (cherry picked from commit `3af0b3136e`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 17:55:29 -07:00
Xusen Yin	7bb445a38c	[SPARK-7577] [ML] [DOC] add bucketizer doc CC jkbradley Author: Xusen Yin <yinxusen@gmail.com> Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits: e2dc32e [Xusen Yin] rename colums e350e49 [Xusen Yin] add all demos 006ddf1 [Xusen Yin] add java test 3238481 [Xusen Yin] add bucketizer (cherry picked from commit `1bd63e82fd`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-05-28 17:30:33 -07:00
Yin Huai	8f4a86eaa1	[SPARK-7853] [SQL] Fix HiveContext in Spark Shell https://issues.apache.org/jira/browse/SPARK-7853 This fixes the problem introduced by my change in https://github.com/apache/spark/pull/6435, which causes that Hive Context fails to create in spark shell because of the class loader issue. Author: Yin Huai <yhuai@databricks.com> Closes #6459 from yhuai/SPARK-7853 and squashes the following commits: 37ad33e [Yin Huai] Do not use hiveQlTable at all. 47cdb6d [Yin Huai] Move hiveconf.set to the end of setConf. 005649b [Yin Huai] Update comment. 35d86f3 [Yin Huai] Access TTable directly to make sure Hive will not internally use any metastore utility functions. 3737766 [Yin Huai] Recursively find all jars. (cherry picked from commit `572b62cafe`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-05-28 17:12:38 -07:00
Reynold Xin	9c2c6b4a67	Remove SizeEstimator from o.a.spark package. See comments on https://github.com/apache/spark/pull/3913 Author: Reynold Xin <rxin@databricks.com> Closes #6471 from rxin/sizeestimator and squashes the following commits: c057095 [Reynold Xin] Fixed import. 2da478b [Reynold Xin] Remove SizeEstimator from o.a.spark package. (cherry picked from commit `0077af22ca`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-28 16:57:06 -07:00
Xiangrui Meng	b9bdf12a1c	[SPARK-7198] [MLLIB] VectorAssembler should output ML attributes `VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases: 1. DoubleType with ML attribute: carry over 2. DoubleType without ML attribute: numeric value 3. Scalar type: numeric value 4. VectorType with all ML attributes: carry over and update names 5. VectorType with number of ML attributes: assume all numeric 6. VectorType without ML attributes: check the first row and get the number of attributes jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6452 from mengxr/SPARK-7198 and squashes the following commits: a9d2469 [Xiangrui Meng] add space facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes (cherry picked from commit `7859ab659e`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-05-28 16:32:59 -07:00
Mike Dusenberry	0a65224aed	[DOCS] Fixing broken "IDE setup" link in the Building Spark documentation. The location of the IDE setup information has changed, so this just updates the link on the Building Spark page. Author: Mike Dusenberry <dusenberrymw@gmail.com> Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and squashes the following commits: 75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building Spark documentation by pointing to new location. (cherry picked from commit `3e312a5ed0`) Signed-off-by: Sean Owen <sowen@cloudera.com>	2015-05-28 17:16:42 -04:00
Li Yao	4485283981	[MINOR] Fix the a minor bug in PageRank Example. Fix the bug that entering only 1 arg will cause array out of bounds exception in PageRank example. Author: Li Yao <hnkfliyao@gmail.com> Closes #6455 from lastland/patch-1 and squashes the following commits: de06128 [Li Yao] Fix the bug that entering only 1 arg will cause array out of bounds exception. (cherry picked from commit `c771589c96`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-05-28 13:39:49 -07:00
Xiangrui Meng	7b5dffb802	[SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times ~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~ The fix above didn't work. So I added a workaround for this. If a Python UDF is applied to a Python UDT. This will put the Python SQL types as inputs. Still incorrect, but at least it doesn't throw exceptions on the Scala side. davies harsha2010 Author: Xiangrui Meng <meng@databricks.com> Closes #6442 from mengxr/SPARK-7903 and squashes the following commits: c257d2a [Xiangrui Meng] add a workaround for VectorUDT (cherry picked from commit `530efe3e80`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-05-28 12:03:55 -07:00
zsxwing	ab62d73ddb	[SPARK-7895] [STREAMING] [EXAMPLES] Move Kafka examples from scala-2.10/src to src Since `spark-streaming-kafka` now is published for both Scala 2.10 and 2.11, we can move `KafkaWordCount` and `DirectKafkaWordCount` from `examples/scala-2.10/src/` to `examples/src/` so that they will appear in `spark-examples-***-jar` for Scala 2.11. Author: zsxwing <zsxwing@gmail.com> Closes #6436 from zsxwing/SPARK-7895 and squashes the following commits: c6052f1 [zsxwing] Update examples/pom.xml 0bcfa87 [zsxwing] Fix the sleep time b9d1256 [zsxwing] Move Kafka examples from scala-2.10/src to src (cherry picked from commit `000df2f0d6`) Signed-off-by: Patrick Wendell <patrick@databricks.com>	2015-05-28 09:04:22 -07:00
zuxqoj	bd568df224	[SPARK-7782] fixed sort arrow issue Current behaviour:: In spark UI ![screen shot 2015-05-27 at 3 27 51 pm](https://cloud.githubusercontent.com/assets/3919211/7837541/47d330ba-04a5-11e5-89d1-e5b11da1a513.png) In YARN ![screen shot 2015-05-27 at 3](https://cloud.githubusercontent.com/assets/3919211/7837594/aebd1d36-04a5-11e5-8216-86e03c07d2bd.png) In jira ![screen shot 2015-05-27 at 3_2](https://cloud.githubusercontent.com/assets/3919211/7837616/d3fedce2-04a5-11e5-9e68-960ed54e5d83.png) Author: zuxqoj <sbshekhar@gmail.com> Closes #6437 from zuxqoj/SPARK-7782_PR and squashes the following commits: cd068b9 [zuxqoj] [SPARK-7782] fixed sort arrow issue (cherry picked from commit `e838a25bdb`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-05-27 23:13:19 -07:00
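
The SPARK-7932 entry above describes how rounding each phase's share down to a whole percent, and then taking the scheduler delay as 100 minus the sum, pushes the lost fractions into the scheduler delay. A minimal sketch of that arithmetic follows; it is not the code from the commit, and the function name, phase durations, and task total are hypothetical values chosen for illustration.

```python
import math


def phase_percentages(phase_ms, total_ms, round_down):
    """Return (per-phase percentages, scheduler-delay percentage).

    Hypothetical helper: scheduler delay is derived as the remainder,
    100 - sum(all other proportions), as the commit message describes.
    """
    shares = [100.0 * ms / total_ms for ms in phase_ms]
    if round_down:
        # Old behaviour per the commit message: round down to a whole percent.
        shares = [float(math.floor(s)) for s in shares]
    scheduler_delay = 100.0 - sum(shares)
    return shares, scheduler_delay


# Assumed example task: 1000 ms in total, five phases of 199 ms each,
# so the real scheduler delay is 5 ms (0.5% of the task).
phases = [199, 199, 199, 199, 199]
total = 1000

rounded_shares, rounded_delay = phase_percentages(phases, total, round_down=True)
exact_shares, exact_delay = phase_percentages(phases, total, round_down=False)

print("rounded:", rounded_shares, "scheduler delay %:", rounded_delay)
print("exact:  ", exact_shares, "scheduler delay %:", exact_delay)
```

With these assumed numbers each phase loses 0.9 of a percentage point to rounding, so the derived scheduler delay grows from about 0.5% to 5%. Dropping the rounding keeps the remainder equal to the real delay.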

11294 commits