ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	42de5253f3	[SPARK-11745][SQL] Enable more JSON parsing options This patch adds the following options to the JSON data source, for dealing with non-standard JSON files: * `allowComments` (default `false`): ignores Java/C++ style comment in JSON records * `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names * `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes * `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012) To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options. Also updated documentation to explain these options. Scala ![screen shot 2015-11-15 at 6 12 12 pm](https://cloud.githubusercontent.com/assets/323388/11172965/e3ace6ec-8bc4-11e5-805e-2d78f80d0ed6.png) Python ![screen shot 2015-11-15 at 6 11 28 pm](https://cloud.githubusercontent.com/assets/323388/11172964/e23ed6ee-8bc4-11e5-8216-312f5983acd5.png) Author: Reynold Xin <rxin@databricks.com> Closes #9724 from rxin/SPARK-11745.	2015-11-16 00:06:14 -08:00
Josh Rosen	fd50fa4c3e	Revert "[SPARK-11572] Exit AsynchronousListenerBus thread when stop() is called" This reverts commit `3e0a6cf1e0`.	2015-11-15 22:38:30 -08:00
gatorsmile	b58765caa6	[SPARK-9928][SQL] Removal of LogicalLocalTable LogicalLocalTable in ExistingRDD.scala is replaced by localRelation in LocalRelation.scala? Do you know any reason why we still keep this class? Author: gatorsmile <gatorsmile@gmail.com> Closes #9717 from gatorsmile/LogicalLocalTable.	2015-11-15 21:10:46 -08:00
Sun Rui	835a79d78e	[SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritable The basic idea is that: The archive of the SparkR package itself, that is sparkr.zip, is created during build process and is contained in the Spark binary distribution. No change to it after the distribution is installed as the directory it resides ($SPARK_HOME/R/lib) may not be writable. When there is R source code contained in jars or Spark packages specified with "--jars" or "--packages" command line option, a temporary directory is created by calling Utils.createTempDir() where the R packages built from the R source code will be installed. The temporary directory is writable, and won't interfere with each other when there are multiple SparkR sessions, and will be deleted when this SparkR session ends. The R binary packages installed in the temporary directory then are packed into an archive named rpkg.zip. sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes. The distribution of rpkg.zip in Standalone modes is not supported in this PR, and will be addressed in another PR. Various R files are updated to accept multiple lib paths (one is for SparkR package, the other is for other R packages) so that these packages can be accessed in R. Author: Sun Rui <rui.sun@intel.com> Closes #9390 from sun-rui/SPARK-10500.	2015-11-15 19:29:09 -08:00
zero323	d7d9fa0b87	[SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop when createDataFrame Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame` At this moment SparkR createDataFrame is using nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow especially with data.table (~ 2 orders of magnitude compared to PySpark / Pandas version on a DataFrame of size 1M rows x 2 columns). A simple improvement is to apply `dropFactors` column-wise and then reshape output list. It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277). Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9099 from zero323/SPARK-11086.	2015-11-15 19:15:27 -08:00
Yu Gao	72c1d68b4a	[SPARK-10181][SQL] Do kerberos login for credentials during hive client initialization On driver process start up, UserGroupInformation.loginUserFromKeytab is called with the principal and keytab passed in, and therefore static var UserGroupInformation.loginUser is set to that principal with kerberos credentials saved in its private credential set, and all threads within the driver process are supposed to see and use these login credentials to authenticate with Hive and Hadoop. However, because of IsolatedClientLoader, UserGroupInformation class is not shared for hive metastore clients, and instead it is loaded separately and of course not able to see the prepared kerberos login credentials in the main thread. The first proposed fix would cause other classloader conflict errors, and is not an appropriate solution. This new change does kerberos login during hive client initialization, which will make credentials ready for the particular hive client instance. yhuai Please take a look and let me know. If you are not the right person to talk to, could you point me to someone responsible for this? Author: Yu Gao <ygao@us.ibm.com> Author: gaoyu <gaoyu@gaoyu-macbookpro.roam.corp.google.com> Author: Yu Gao <crystalgaoyu@gmail.com> Closes #9272 from yolandagao/master.	2015-11-15 14:53:59 -08:00
Yin Huai	3e2e1873b2	[SPARK-11738] [SQL] Making ArrayType orderable https://issues.apache.org/jira/browse/SPARK-11738 Author: Yin Huai <yhuai@databricks.com> Closes #9718 from yhuai/makingArrayOrderable.	2015-11-15 13:59:59 -08:00
Xiangrui Meng	64e5551103	[SPARK-11672][ML] set active SQLContext in JavaDefaultReadWriteSuite The same as #9694, but for Java test suite. yhuai Author: Xiangrui Meng <meng@databricks.com> Closes #9719 from mengxr/SPARK-11672.4.	2015-11-15 13:23:05 -08:00
Reynold Xin	d22fc10887	[SPARK-11734][SQL] Rename TungstenProject -> Project, TungstenSort -> Sort I didn't remove the old Sort operator, since we still use it in randomized tests. I moved it into test module and renamed it ReferenceSort. Author: Reynold Xin <rxin@databricks.com> Closes #9700 from rxin/SPARK-11734.	2015-11-15 10:33:53 -08:00
Yin Huai	d83c2f9f0b	[SPARK-11736][SQL] Add monotonically_increasing_id to function registry. https://issues.apache.org/jira/browse/SPARK-11736 Author: Yin Huai <yhuai@databricks.com> Closes #9703 from yhuai/MonotonicallyIncreasingID.	2015-11-14 21:04:18 -08:00
Rohan Bhanderi	22e96b87fb	Typo in comment: use 2 seconds instead of 1 Use 2 seconds batch size as duration specified in JavaStreamingContext constructor is 2000 ms Author: Rohan Bhanderi <rohan.bhanderi@sjsu.edu> Closes #9714 from RohanBhanderi/patch-2.	2015-11-14 13:38:53 +00:00
Gábor Lipták	9461f5ee80	[SPARK-11573] Correct 'reflective access of structural type member method should be enabled' Scala warnings Author: Gábor Lipták <gliptak@gmail.com> Closes #9550 from gliptak/SPARK-11573.	2015-11-14 12:02:02 +00:00
Kai Jiang	9a73b33a9a	[MINOR][DOCS] typo in docs/configuration.md `<\code>` end tag missing backslash in docs/configuration.md{L308-L339} ref #8795 Author: Kai Jiang <jiangkai@gmail.com> Closes #9715 from vectorijk/minor-typo-docs.	2015-11-14 11:59:37 +00:00
hyukjinkwon	139c15b624	[SPARK-11694][SQL] Parquet logical types are not being tested properly All the physical types are properly tested at `ParquetIOSuite` but logical type mapping is not being tested. Author: hyukjinkwon <gurwls223@gmail.com> Author: Hyukjin Kwon <gurwls223@gmail.com> Closes #9660 from HyukjinKwon/SPARK-11694.	2015-11-14 18:36:01 +08:00
nitin goyal	c939c70ac1	[SPARK-7970] Skip closure cleaning for SQL operations Also introduces a new Spark-private API in RDD.scala named 'mapPartitionsInternal', which doesn't closure-clean the RDD elements. Author: nitin goyal <nitin.goyal@guavus.com> Author: nitin.goyal <nitin.goyal@guavus.com> Closes #9253 from nitin2goyal/master.	2015-11-13 18:09:08 -08:00
Xiangrui Meng	bdfbc1dcaf	[MINOR][ML] remove MLlibTestsSparkContext from ImpuritySuite ImpuritySuite doesn't need SparkContext. Author: Xiangrui Meng <meng@databricks.com> Closes #9698 from mengxr/remove-mllib-test-context-in-impurity-suite.	2015-11-13 13:19:04 -08:00
Xusen Yin	912b94363b	[SPARK-11336] Add links to example codes https://issues.apache.org/jira/browse/SPARK-11336 mengxr I add a hyperlink of Spark on Github and a hint of their existences in Spark code repo in each code example. I remove the config key for changing the example code dir, since we assume all examples should be in spark/examples. The hyperlink, though we cannot use it now, since the Spark v1.6.0 has not been released yet, can be used after the release. So it is not a problem. I add some screen shots, so you can get an instant feeling. <img width="949" alt="screen shot 2015-10-27 at 10 47 18 pm" src="https://cloud.githubusercontent.com/assets/2637239/10780634/bd20e072-7cfc-11e5-8960-def4fc62a8ea.png"> <img width="1144" alt="screen shot 2015-10-27 at 10 47 31 pm" src="https://cloud.githubusercontent.com/assets/2637239/10780636/c3f6e180-7cfc-11e5-80b2-233589f4a9a3.png"> Author: Xusen Yin <yinxusen@gmail.com> Closes #9320 from yinxusen/SPARK-11336.	2015-11-13 13:14:25 -08:00
Xiangrui Meng	2d2411faa2	[SPARK-11672][ML] Set active SQLContext in MLlibTestSparkContext.beforeAll Still saw some error messages caused by `SQLContext.getOrCreate`: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3997/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/testReport/junit/org.apache.spark.ml.util/JavaDefaultReadWriteSuite/testDefaultReadWrite/ This PR sets the active SQLContext in beforeAll, which is not automatically set in `new SQLContext`. This makes `SQLContext.getOrCreate` return the right SQLContext. cc: yhuai Author: Xiangrui Meng <meng@databricks.com> Closes #9694 from mengxr/SPARK-11672.3.	2015-11-13 13:09:28 -08:00
Wenchen Fan	d7b2b97ad6	[SPARK-11727][SQL] Split ExpressionEncoder into FlatEncoder and ProductEncoder also add more tests for encoders, and fix bugs that I found: * when converting array to catalyst array, we can only skip element conversion for native types (e.g. int, long, boolean), not `AtomicType` (String is AtomicType but we need to convert it) * we should also handle scala `BigDecimal` when converting from catalyst `Decimal`. * complex map type should be supported other issues that are still under investigation: * encode java `BigDecimal` and decode it back, seems we will lose precision info. * when encoding a case class that is defined inside an object, `ClassNotFound` exception will be thrown. I'll remove unused code in a follow-up PR. Author: Wenchen Fan <wenchen@databricks.com> Closes #9693 from cloud-fan/split.	2015-11-13 11:25:33 -08:00
Wenchen Fan	23b8188f75	[SPARK-11654][SQL][FOLLOW-UP] fix some mistakes and clean up * rename `AppendColumn` to `AppendColumns` to be consistent with the physical plan name. * clean up stale comments. * always pass in resolved encoder to `TypedColumn.withInputType`(test added) * enable a mistakenly disabled java test. Author: Wenchen Fan <wenchen@databricks.com> Closes #9688 from cloud-fan/follow.	2015-11-13 11:13:09 -08:00
Andrew Ray	a24477996e	[SPARK-11690][PYSPARK] Add pivot to python api This PR adds pivot to the python api of GroupedData with the same syntax as Scala/Java. Author: Andrew Ray <ray.andrew@gmail.com> Closes #9653 from aray/sql-pivot-python.	2015-11-13 10:31:17 -08:00
Yanbo Liang	99693fef0a	[SPARK-11723][ML][DOC] Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame, include: * Use libSVM data source for all example codes under examples/ml, and remove unused import. * Use libSVM data source for user guides under ml-*** which were omitted by #8697. * Fix bug: We should use ```sqlContext.read().format("libsvm").load(path)``` at Java side, but the API doc and user guides misuse as ```sqlContext.read.format("libsvm").load(path)```. * Code cleanup. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9690 from yanboliang/spark-11723.	2015-11-13 08:43:05 -08:00
Rishabh Bhardwaj	61a28486cc	[SPARK-11445][DOCS] Replaced example code in mllib-ensembles.md using include_example I have made the required changes and tested. Kindly review the changes. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9407 from rishabhbhardwaj/SPARK-11445.	2015-11-13 08:36:46 -08:00
Yin Huai	7b5d9051cf	[SPARK-11678][SQL] Partition discovery should stop at the root path of the table. https://issues.apache.org/jira/browse/SPARK-11678 The change of this PR is to pass root paths of table to the partition discovery logic. So, the process of partition discovery stops at those root paths instead of going all the way to the root path of the file system. Author: Yin Huai <yhuai@databricks.com> Closes #9651 from yhuai/SPARK-11678.	2015-11-13 18:36:56 +08:00
Shixiong Zhu	ec80c0c2fc	[SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures This PR just checks the test results and returns 1 if the test fails, so that `run-tests.py` can mark it fail. Author: Shixiong Zhu <shixiong@databricks.com> Closes #9669 from zsxwing/streaming-python-tests.	2015-11-13 00:30:27 -08:00
Davies Liu	ad960885bf	[SPARK-8029] Robust shuffle writer Currently, all the shuffle writer will write to target path directly, the file could be corrupted by other attempt of the same partition on the same executor. They should write to temporary file then rename to target path, as what we do in output committer. In order to make the rename atomic, the temporary file should be created in the same local directory (FileSystem). This PR is based on #9214 , thanks to squito . Closes #9214 Author: Davies Liu <davies@databricks.com> Closes #9610 from davies/safe_shuffle.	2015-11-12 22:44:57 -08:00
Yanbo Liang	ea5ae2705a	[SPARK-11629][ML][PYSPARK][DOC] Python example code for Multilayer Perceptron Classification Add Python example code for Multilayer Perceptron Classification, and make example code in user guide document testable. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9594 from yanboliang/spark-11629.	2015-11-12 21:29:43 -08:00
Lewuathe	2035ed392e	[SPARK-11717] Ignore R session and history files from git see: https://issues.apache.org/jira/browse/SPARK-11717 SparkR generates R session data and history files under current directory. It might be useful to ignore these files even running SparkR on spark directory for test or development. Author: Lewuathe <lewuathe@me.com> Closes #9681 from Lewuathe/SPARK-11717.	2015-11-12 20:09:42 -08:00
felixcheung	ed04846e14	[SPARK-11263][SPARKR] lintr Throws Warnings on Commented Code in Documentation Clean out hundreds of `style: Commented code should be removed.` from lintr Like these: ``` /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:513:3: style: Commented code should be removed. # sc <- sparkR.init() ^~~~~~~~~~~~~~~~~~~ /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:514:3: style: Commented code should be removed. # sqlContext <- sparkRSQL.init(sc) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:515:3: style: Commented code should be removed. # path <- "path/to/file.json" ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` tried without export or rdname, neither work instead, added this `#' noRd` to suppress .Rd file generation also updated `family` for DataFrame functions for longer descriptive text instead of `dataframe_funcs` ![image](https://cloud.githubusercontent.com/assets/8969467/10933937/17bf5b1e-8291-11e5-9777-40fc632105dc.png) this covers most of 'Commented code' but I left out a few that looks legitimate. Author: felixcheung <felixcheung_m@hotmail.com> Closes #9463 from felixcheung/rlintr.	2015-11-12 20:02:49 -08:00
Xiangrui Meng	e71c07557c	[SPARK-11672][ML] flaky spark.ml read/write tests We set `sqlContext = null` in `afterAll`. However, this doesn't change `SQLContext.activeContext` and then `SQLContext.getOrCreate` might use the `SparkContext` from previous test suite and hence causes the error. This PR calls `clearActive` in `beforeAll` and `afterAll` to avoid using an old context from other test suites. cc: yhuai Author: Xiangrui Meng <meng@databricks.com> Closes #9677 from mengxr/SPARK-11672.2.	2015-11-12 20:01:13 -08:00
Tathagata Das	e4e46b20f6	[SPARK-11681][STREAMING] Correctly update state timestamp even when state is not updated Bug: Timestamp is not updated if there is data but the corresponding state is not updated. This is wrong, as timeout is defined as "no data for a while", not "no state update for a while". Fix: Update timestamp when timeout is specified, otherwise no need. Also refactored the code for better testability and added unit tests. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #9648 from tdas/SPARK-11681.	2015-11-12 19:02:49 -08:00
Burak Yavuz	7786f9cc07	[SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks The support for closing WriteAheadLog files after writes was just merged in. Closing every file after a write is a very expensive operation as it creates many small files on S3. It's not necessary to enable it on HDFS anyway. However, when you have many small files on S3, recovery takes very long. In addition, files start stacking up pretty quickly, and deletes may not be able to keep up, therefore deletes can also be parallelized. This PR adds support for the two parallelization steps mentioned above, in addition to a couple more failures I encountered during recovery. Author: Burak Yavuz <brkyvz@gmail.com> Closes #9373 from brkyvz/par-recovery.	2015-11-12 18:03:23 -08:00
Shixiong Zhu	0f1d00a905	[SPARK-11663][STREAMING] Add Java API for trackStateByKey TODO - [x] Add Java API - [x] Add API tests - [x] Add a function test Author: Shixiong Zhu <shixiong@databricks.com> Closes #9636 from zsxwing/java-track.	2015-11-12 17:48:43 -08:00
Michael Armbrust	41bbd23004	[SPARK-11654][SQL] add reduce to GroupedDataset This PR adds a new method, `reduce`, to `GroupedDataset`, which allows similar operations to `reduceByKey` on a traditional `PairRDD`. ```scala val ds = Seq("abc", "xyz", "hello").toDS() ds.groupBy(_.length).reduce(_ + _).collect() // not actually commutative :P res0: Array(3 -> "abcxyz", 5 -> "hello") ``` While implementing this method and its test cases several more deficiencies were found in our encoder handling. Specifically, in order to support positional resolution, named resolution and tuple composition, it is important to keep the unresolved encoder around and to use it when constructing new `Datasets` with the same object type but different output attributes. We now divide the encoder lifecycle into three phases (that mirror the lifecycle of standard expressions) and have checks at various boundaries: - Unresolved Encoders: all user-facing encoders (those constructed by implicits, static methods, or tuple composition) are unresolved, meaning they have only `UnresolvedAttributes` for named fields and `BoundReferences` for fields accessed by ordinal. - Resolved Encoders: internal to a `[Grouped]Dataset` the encoder is resolved, meaning all input has been resolved to a specific `AttributeReference`. Any encoders that are placed into a logical plan for use in object construction should be resolved. - BoundEncoder: Are constructed by physical plans, right before actual conversion from row -> object is performed. It is left to future work to add explicit checks for resolution and provide good error messages when it fails. We might also consider enforcing the above constraints in the type system (i.e. `fromRow` only exists on a `ResolvedEncoder`), but we should probably wait before spending too much time on this. Author: Michael Armbrust <michael@databricks.com> Author: Wenchen Fan <wenchen@databricks.com> Closes #9673 from marmbrus/pr/9628.	2015-11-12 17:20:30 -08:00
Joseph K. Bradley	dcb896fd8c	[SPARK-11712][ML] Make spark.ml LDAModel be abstract Per discussion in the initial Pipelines LDA PR [https://github.com/apache/spark/pull/9513], we should make LDAModel abstract and create a LocalLDAModel. This code simplification should be done before the 1.6 release to ensure API compatibility in future releases. CC feynmanliang mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #9678 from jkbradley/lda-pipelines-2.	2015-11-12 17:03:19 -08:00
Xiangrui Meng	bc092966f8	[SPARK-11709] include creation site info in SparkContext.assertNotStopped error message This helps debug issues caused by multiple SparkContext instances. JoshRosen andrewor14 ~~~ scala> sc.stop() scala> sc.parallelize(0 until 10) java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext. This stopped SparkContext was created at: org.apache.spark.SparkContext.<init>(SparkContext.scala:82) org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017) $iwC$$iwC.<init>(<console>:9) $iwC.<init>(<console>:18) <init>(<console>:20) .<init>(<console>:24) .<clinit>(<console>) .<init>(<console>:7) .<clinit>(<console>) $print(<console>) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) java.lang.reflect.Method.invoke(Method.java:606) org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340) org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) The active context was created at: (No active SparkContext.) ~~~ Author: Xiangrui Meng <meng@databricks.com> Closes #9675 from mengxr/SPARK-11709.	2015-11-12 16:43:04 -08:00
Chris Snow	68ef61bb65	[SPARK-11658] simplify documentation for PySpark combineByKey Author: Chris Snow <chsnow123@gmail.com> Closes #9640 from snowch/patch-3.	2015-11-12 15:50:47 -08:00
Andrew Or	12a0784ac0	[SPARK-11667] Update dynamic allocation docs to reflect supported cluster managers Author: Andrew Or <andrew@databricks.com> Closes #9637 from andrewor14/update-da-docs.	2015-11-12 15:48:42 -08:00
Andrew Or	cf38fc7551	[SPARK-11670] Fix incorrect kryo buffer default value in docs <img width="931" alt="screen shot 2015-11-11 at 1 53 21 pm" src="https://cloud.githubusercontent.com/assets/2133137/11108261/35d183d4-889a-11e5-9572-85e9d6cebd26.png"> Author: Andrew Or <andrew@databricks.com> Closes #9638 from andrewor14/fix-kryo-docs.	2015-11-12 15:47:29 -08:00
Jean-Baptiste Onofré	74c30049a8	[SPARK-2533] Add locality levels on stage summary view Author: Jean-Baptiste Onofré <jbonofre@apache.org> Closes #9487 from jbonofre/SPARK-2533-2.	2015-11-12 15:46:21 -08:00
Chris Snow	380dfcc0dc	[SPARK-11671] documentation code example typo Example for sqlContext.createDataFrame from pandas.DataFrame has a typo Author: Chris Snow <chsnow123@gmail.com> Closes #9639 from snowch/patch-2.	2015-11-12 15:42:30 -08:00
Shixiong Zhu	f0d3b58d91	[SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build Should not create SparkContext in the constructor of `TrackStateRDDSuite`. This is a follow up PR for #9256 to fix the test for maven build. Author: Shixiong Zhu <shixiong@databricks.com> Closes #9668 from zsxwing/hotfix.	2015-11-12 14:52:03 -08:00
Marcelo Vanzin	767d288b6b	[SPARK-11655][CORE] Fix deadlock in handling of launcher stop(). The stop() callback was trying to close the launcher connection in the same thread that handles connection data, which ended up causing a deadlock. So avoid that by dispatching the stop() request in its own thread. On top of that, add some exception safety to a few parts of the code, and use "destroyForcibly" from Java 8 if it's available, to force kill the child process. The flip side is that "kill()" may not actually work if running Java 7. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9633 from vanzin/SPARK-11655.	2015-11-12 14:29:16 -08:00
JihongMa	d292f74831	[SPARK-11420] Updating Stddev support via Imperative Aggregate switched stddev support from DeclarativeAggregate to ImperativeAggregate. Author: JihongMa <linlin200605@gmail.com> Closes #9380 from JihongMA/SPARK-11420.	2015-11-12 13:47:34 -08:00
hyukjinkwon	f5a9526fec	[SPARK-10113][SQL] Explicit error message for unsigned Parquet logical types Parquet supports some unsigned datatypes. However, since Spark does not support unsigned datatypes, it needs to emit an exception with a clear message rather than with the one saying illegal datatype. Author: hyukjinkwon <gurwls223@gmail.com> Closes #9646 from HyukjinKwon/SPARK-10113.	2015-11-12 12:29:50 -08:00
Cheng Lian	4fe99c72c6	[SPARK-11191][SQL] Looks up temporary function using execution Hive client When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary functions are registered there. Author: Cheng Lian <lian@databricks.com> Closes #9664 from liancheng/spark-11191.fix-temp-function.	2015-11-12 12:17:51 -08:00
Gaurav Kumar	df0e318152	Fixed error in scaladoc of convertToCanonicalEdges The code convertToCanonicalEdges is such that srcIds are smaller than dstIds but the scaladoc suggested otherwise. Have fixed the same. Author: Gaurav Kumar <gauravkumar37@gmail.com> Closes #9666 from gauravkumar37/patch-1.	2015-11-12 12:14:00 -08:00
jerryshao	08660a0bc9	[BUILD][MINOR] Remove non-exist yarnStable module in Sbt project Remove some old yarn related building codes, please review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #9625 from jerryshao/remove-old-module.	2015-11-12 17:23:24 +01:00
Reynold Xin	30e7433643	[SPARK-11673][SQL] Remove the normal Project physical operator (and keep TungstenProject) Also make full outer join being able to produce UnsafeRows. Author: Reynold Xin <rxin@databricks.com> Closes #9643 from rxin/SPARK-11673.	2015-11-12 08:14:08 -08:00
Yin Huai	14cf753704	[SPARK-11661][SQL] Still pushdown filters returned by unhandledFilters. https://issues.apache.org/jira/browse/SPARK-11661 Author: Yin Huai <yhuai@databricks.com> Closes #9634 from yhuai/unhandledFilters.	2015-11-12 16:47:00 +08:00
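
The SPARK-8029 entry above describes writing shuffle output to a temporary file and then renaming it to the target path, with the temporary file created in the same local directory so the rename can be atomic. A minimal sketch of that general pattern in plain Python follows. It is not Spark's shuffle writer; the function name `write_atomically` and the file name used in the demo are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_atomically(target_path: str, data: bytes) -> None:
    """Write data to target_path via a temporary file and a rename.

    Hypothetical helper illustrating the pattern; not part of Spark.
    """
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Create the temporary file in the same directory as the target so the
    # final rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # os.replace swaps the finished file into place, overwriting any
        # existing target.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the partial temporary file if the write or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        out = os.path.join(workdir, "shuffle_0_0.data")  # hypothetical name
        write_atomically(out, b"partition bytes")
        with open(out, "rb") as f:
            print(f.read())
```

With this pattern a reader of the target path sees either the previous complete file or the new complete file, which is the property the commit message is after when several attempts of the same partition run on one executor.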

13853 commits