ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Eric Liang	8d5bb5283c	[SPARK-9391] [ML] Support minus, dot, and intercept operators in SparkR RFormula Adds '.', '-', and intercept parsing to RFormula. Also splits RFormulaParser into a separate file. Umbrella design doc here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing mengxr Author: Eric Liang <ekl@databricks.com> Closes #7707 from ericl/string-features-2 and squashes the following commits: 8588625 [Eric Liang] exclude complex types for . 8106ffe [Eric Liang] comments a9350bb [Eric Liang] s/var/val 9c50d4d [Eric Liang] Merge branch 'string-features' into string-features-2 581afb2 [Eric Liang] Merge branch 'master' into string-features 08ae539 [Eric Liang] Merge branch 'string-features' into string-features-2 f99131a [Eric Liang] comments cecec43 [Eric Liang] Merge branch 'string-features' into string-features-2 0bf3c26 [Eric Liang] update docs 4592df2 [Eric Liang] intercept supports 7412a2e [Eric Liang] Fri Jul 24 14:56:51 PDT 2015 3cf848e [Eric Liang] fix the parser 0556c2b [Eric Liang] Merge branch 'string-features' into string-features-2 c302a2c [Eric Liang] fix tests 9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features e713da3 [Eric Liang] comments cd231a9 [Eric Liang] Wed Jul 22 17:18:44 PDT 2015 4d79193 [Eric Liang] revert to seq + distinct 169a085 [Eric Liang] tweak functional test a230a47 [Eric Liang] Merge branch 'master' into string-features 72bd6f3 [Eric Liang] fix merge d841cec [Eric Liang] Merge branch 'master' into string-features 5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015 b01c7c5 [Eric Liang] add test 8a637db [Eric Liang] encoder wip a1d03f4 [Eric Liang] refactor into estimator	2015-07-28 14:16:57 -07:00
Yin Huai	6cdcc21fe6	[SPARK-9196] [SQL] Ignore test DatetimeExpressionsSuite: function current_timestamp. This test is flaky. https://issues.apache.org/jira/browse/SPARK-9196 will track the fix of it. For now, let's disable this test. Author: Yin Huai <yhuai@databricks.com> Closes #7727 from yhuai/SPARK-9196-ignore and squashes the following commits: f92bded [Yin Huai] Ignore current_timestamp.	2015-07-28 13:16:48 -07:00
Marcelo Vanzin	31ec6a871e	[SPARK-9327] [DOCS] Fix documentation about classpath config options. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7651 from vanzin/SPARK-9327 and squashes the following commits: 2923e23 [Marcelo Vanzin] [SPARK-9327] [docs] Fix documentation about classpath config options.	2015-07-28 11:48:56 -07:00
trestletech	6143234062	Use vector-friendly comparison for packages argument. Otherwise, `sparkR.init()` with multiple `sparkPackages` results in this warning: ``` Warning message: In if (packages != "") { : the condition has length > 1 and only the first element will be used ``` Author: trestletech <jeff.allen@trestletechnology.net> Closes #7701 from trestletech/compare-packages and squashes the following commits: 72c8b36 [trestletech] Correct function name. c52db0e [trestletech] Added test for multiple packages. 3aab1a7 [trestletech] Use vector-friendly comparison for packages argument.	2015-07-28 10:45:19 -07:00
Aaron Davidson	35ef853b3f	[SPARK-9397] DataFrame should provide an API to find source data files if applicable Certain applications would benefit from being able to inspect DataFrames that are straightforwardly produced by data sources that stem from files, and find out their source data. For example, one might want to display to a user the size of the data underlying a table, or to copy or mutate it. This PR exposes an `inputFiles` method on DataFrame which attempts to discover the source data in a best-effort manner, by inspecting HadoopFsRelations and JSONRelations. Author: Aaron Davidson <aaron@databricks.com> Closes #7717 from aarondav/paths and squashes the following commits: ff67430 [Aaron Davidson] inputFiles 0acd3ad [Aaron Davidson] [SPARK-9397] DataFrame should provide an API to find source data files if applicable	2015-07-28 10:12:09 -07:00
Reynold Xin	9bbe0171cb	[SPARK-8196][SQL] Fix null handling & documentation for next_day. The original patch didn't handle nulls correctly for next_day. Author: Reynold Xin <rxin@databricks.com> Closes #7718 from rxin/next_day and squashes the following commits: 616a425 [Reynold Xin] Merged DatetimeExpressionsSuite into DateFunctionsSuite. faa78cf [Reynold Xin] Merged DatetimeFunctionsSuite into DateExpressionsSuite. 6c4fb6a [Reynold Xin] [SPARK-8196][SQL] Fix null handling & documentation for next_day.	2015-07-28 09:43:39 -07:00
Reynold Xin	c740bed172	[SPARK-9373][SQL] follow up for StructType support in Tungsten projection. Author: Reynold Xin <rxin@databricks.com> Closes #7720 from rxin/struct-followup and squashes the following commits: d9757f5 [Reynold Xin] [SPARK-9373][SQL] follow up for StructType support in Tungsten projection.	2015-07-28 09:43:12 -07:00
Reynold Xin	5a2330e546	[SPARK-9402][SQL] Remove CodegenFallback from Abs / FormatNumber. Both expressions already implement code generation. Author: Reynold Xin <rxin@databricks.com> Closes #7723 from rxin/abs-formatnum and squashes the following commits: 31ed765 [Reynold Xin] [SPARK-9402][SQL] Remove CodegenFallback from Abs / FormatNumber.	2015-07-28 09:42:35 -07:00
vinodkc	4af622c855	[SPARK-8919] [DOCUMENTATION, MLLIB] Added @since tags to mllib.recommendation Author: vinodkc <vinod.kc.in@gmail.com> Closes #7325 from vinodkc/add_since_mllib.recommendation and squashes the following commits: 93156f2 [vinodkc] Changed 0.8.0 to 0.9.1 c413350 [vinodkc] Added @since	2015-07-28 08:48:57 -07:00
Kenichi Maehashi	ac8c549e2f	[EC2] Cosmetic fix for usage of spark-ec2 --ebs-vol-num option The last line of the usage seems ugly. ``` $ spark-ec2 --help <snip> --ebs-vol-num=EBS_VOL_NUM Number of EBS volumes to attach to each node as /vol[x]. The volumes will be deleted when the instances terminate. Only possible on EBS-backed AMIs. EBS volumes are only attached if --ebs-vol-size > 0.Only support up to 8 EBS volumes. ``` After applying this patch: ``` $ spark-ec2 --help <snip> --ebs-vol-num=EBS_VOL_NUM Number of EBS volumes to attach to each node as /vol[x]. The volumes will be deleted when the instances terminate. Only possible on EBS-backed AMIs. EBS volumes are only attached if --ebs-vol-size > 0. Only support up to 8 EBS volumes. ``` As this is a trivial thing I didn't create JIRA for this. Author: Kenichi Maehashi <webmaster@kenichimaehashi.com> Closes #7632 from kmaehashi/spark-ec2-cosmetic-fix and squashes the following commits: 526c118 [Kenichi Maehashi] cosmetic fix for spark-ec2 --ebs-vol-num option usage	2015-07-28 15:57:21 +01:00
Reynold Xin	15724fac56	[SPARK-9394][SQL] Handle parentheses in CodeFormatter. Our CodeFormatter currently does not handle parentheses, and as a result in code dump, we see code formatted this way: ``` foo( a, b, c) ``` With this patch, it is formatted this way: ``` foo( a, b, c) ``` Author: Reynold Xin <rxin@databricks.com> Closes #7712 from rxin/codeformat-parentheses and squashes the following commits: c2b1c5f [Reynold Xin] Took square bracket out `3cfb174` [Reynold Xin] Code review feedback. 91f5bb1 [Reynold Xin] [SPARK-9394][SQL] Handle parentheses in CodeFormatter.	2015-07-28 00:52:26 -07:00
Reynold Xin	fc3bd96bc3	Closes #6836 since Round has already been implemented.	2015-07-27 23:56:16 -07:00
zsxwing	d93ab93d67	[SPARK-9335] [STREAMING] [TESTS] Make sure the test stream is deleted in KinesisBackedBlockRDDSuite KinesisBackedBlockRDDSuite should make sure delete the stream. Author: zsxwing <zsxwing@gmail.com> Closes #7663 from zsxwing/fix-SPARK-9335 and squashes the following commits: f0e9154 [zsxwing] Revert "[HOTFIX] - Disable Kinesis tests due to rate limits" 71a4552 [zsxwing] Make sure the test stream is deleted	2015-07-27 23:34:29 -07:00
Cheng Hao	9c5612f4e1	[MINOR] [SQL] Support mutable expression unit test with codegen projection This is actually contains 3 minor issues: 1) Enable the unit test(codegen) for mutable expressions (FormatNumber, Regexp_Replace/Regexp_Extract) 2) Use the `PlatformDependent.copyMemory` instead of the `System.arrayCopy` Author: Cheng Hao <hao.cheng@intel.com> Closes #7566 from chenghao-intel/codegen_ut and squashes the following commits: 24f43ea [Cheng Hao] enable codegen for mutable expression & UTF8String performance	2015-07-27 23:02:23 -07:00
Reynold Xin	60f08c7c87	[SPARK-9373][SQL] Support StructType in Tungsten projection This pull request updates GenerateUnsafeProjection to support StructType. If an input struct type is backed already by an UnsafeRow, GenerateUnsafeProjection copies the bytes directly into its buffer space without any conversion. However, if the input is not an UnsafeRow, GenerateUnsafeProjection runs the code generated recursively to convert the input into an UnsafeRow and then copies it into the buffer space. Also create a TungstenProject operator that projects data directly into UnsafeRow. Note that I'm not sure if this is the way we want to structure Unsafe+codegen operators, but we can defer that decision to follow-up pull requests. Author: Reynold Xin <rxin@databricks.com> Closes #7689 from rxin/tungsten-struct-type and squashes the following commits: 9162f42 [Reynold Xin] Support IntervalType in UnsafeRow's getter. be9f377 [Reynold Xin] Fixed tests. 10c4b7c [Reynold Xin] Format generated code. 77e8d0e [Reynold Xin] Fixed NondeterministicSuite. ac4951d [Reynold Xin] Yay. ac203bf [Reynold Xin] More comments. 9f36216 [Reynold Xin] Updated comment. 6b781fe [Reynold Xin] Reset the change in DataFrameSuite. 525b95b [Reynold Xin] Merged with master, more documentation & test cases. 321859a [Reynold Xin] [SPARK-9373][SQL] Support StructType in Tungsten projection [WIP]	2015-07-27 22:51:15 -07:00
Yijie Shen	63a492b931	[SPARK-8828] [SQL] Revert SPARK-5680 JIRA: https://issues.apache.org/jira/browse/SPARK-8828 Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #7667 from yjshen/revert_combinesum_2 and squashes the following commits: c37ccb1 [Yijie Shen] add test case 8377214 [Yijie Shen] revert spark.sql.useAggregate2 to its default value e2305ac [Yijie Shen] fix bug - avg on decimal column 7cb0e95 [Yijie Shen] [wip] resolving bugs 1fadb5a [Yijie Shen] remove occurance 17c6248 [Yijie Shen] revert SPARK-5680	2015-07-27 22:47:33 -07:00
Reynold Xin	3bc7055e26	Fixed a test failure.	2015-07-27 22:04:54 -07:00
Reynold Xin	84da8792e2	[SPARK-9395][SQL] Create a SpecializedGetters interface to track all the specialized getters. As we are adding more and more specialized getters to more classes (coming soon ArrayData), this interface can help us prevent missing a method in some interfaces. Author: Reynold Xin <rxin@databricks.com> Closes #7713 from rxin/SpecializedGetters and squashes the following commits: 3b39be1 [Reynold Xin] Added override modifier. 567ba9c [Reynold Xin] [SPARK-9395][SQL] Create a SpecializedGetters interface to track all the specialized getters.	2015-07-27 21:41:15 -07:00
Daoyuan Wang	2e7f99a004	[SPARK-8195] [SPARK-8196] [SQL] udf next_day last_day next_day, returns next certain dayofweek. last_day, returns the last day of the month which given date belongs to. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #6986 from adrian-wang/udfnlday and squashes the following commits: ef7e3da [Daoyuan Wang] fix 02b3426 [Daoyuan Wang] address 2 comments dc69630 [Daoyuan Wang] address comments from rxin 8846086 [Daoyuan Wang] address comments from rxin d09bcce [Daoyuan Wang] multi fix 1a9de3d [Daoyuan Wang] function next_day and last_day	2015-07-27 21:08:56 -07:00
zsxwing	daa1964b60	[SPARK-8882] [STREAMING] Add a new Receiver scheduling mechanism The design doc: https://docs.google.com/document/d/1ZsoRvHjpISPrDmSjsGzuSu8UjwgbtmoCTzmhgTurHJw/edit?usp=sharing Author: zsxwing <zsxwing@gmail.com> Closes #7276 from zsxwing/receiver-scheduling and squashes the following commits: 137b257 [zsxwing] Add preferredNumExecutors to rescheduleReceiver 61a6c3f [zsxwing] Set state to ReceiverState.INACTIVE in deregisterReceiver 5e1fa48 [zsxwing] Fix the code style 7451498 [zsxwing] Move DummyReceiver back to ReceiverTrackerSuite 715ef9c [zsxwing] Rename: scheduledLocations -> scheduledExecutors; locations -> executors 05daf9c [zsxwing] Use receiverTrackingInfo.toReceiverInfo 1d6d7c8 [zsxwing] Merge branch 'master' into receiver-scheduling 8f93c8d [zsxwing] Use hostPort as the receiver location rather than host; fix comments and unit tests 59f8887 [zsxwing] Schedule all receivers at the same time when launching them 075e0a3 [zsxwing] Add receiver RDD name; use '!isTrackerStarted' instead 276a4ac [zsxwing] Remove "ReceiverLauncher" and move codes to "launchReceivers" fab9a01 [zsxwing] Move methods back to the outer class 4e639c4 [zsxwing] Fix unintentional changes f60d021 [zsxwing] Reorganize ReceiverTracker to use an event loop for lock free 105037e [zsxwing] Merge branch 'master' into receiver-scheduling 5fee132 [zsxwing] Update tha scheduling algorithm to avoid to keep restarting Receiver 9e242c8 [zsxwing] Remove the ScheduleReceiver message because we can refuse it when receiving RegisterReceiver a9acfbf [zsxwing] Merge branch 'squash-pr-6294' into receiver-scheduling 881edb9 [zsxwing] ReceiverScheduler -> ReceiverSchedulingPolicy e530bcc [zsxwing] [SPARK-5681][Streaming] Use a lock to eliminate the race condition when stopping receivers and registering receivers happen at the same time #6294 3b87e4a [zsxwing] Revert SparkContext.scala a86850c [zsxwing] Remove submitAsyncJob and revert JobWaiter f549595 [zsxwing] Add comments for the scheduling approach 9ecc08e [zsxwing] Fix comments and code style 28d1bee [zsxwing] Make 'host' protected; rescheduleReceiver -> getAllowedLocations 2c86a9e [zsxwing] Use tryFailure to support calling jobFailed multiple times ca6fe35 [zsxwing] Add a test for Receiver.restart 27acd45 [zsxwing] Add unit tests for LoadBalanceReceiverSchedulerImplSuite cc76142 [zsxwing] Add JobWaiter.toFuture to avoid blocking threads d9a3e72 [zsxwing] Add a new Receiver scheduling mechanism	2015-07-27 17:59:43 -07:00
Michael Armbrust	ce89ff477a	[SPARK-9386] [SQL] Feature flag for metastore partition pruning Since we have been seeing a lot of failures related to this new feature, lets put it behind a flag and turn it off by default. Author: Michael Armbrust <michael@databricks.com> Closes #7703 from marmbrus/optionalMetastorePruning and squashes the following commits: 6ad128c [Michael Armbrust] style 8447835 [Michael Armbrust] [SPARK-9386][SQL] Feature flag for metastore partition pruning fd37b87 [Michael Armbrust] add config flag	2015-07-27 17:32:34 -07:00
Eric Liang	8ddfa52c20	[SPARK-9230] [ML] Support StringType features in RFormula This adds StringType feature support via OneHotEncoder. As part of this task it was necessary to change RFormula to an Estimator, so that factor levels could be determined from the training dataset. Not sure if I am using uids correctly here, would be good to get reviewer help on that. cc mengxr Umbrella design doc: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit# Author: Eric Liang <ekl@databricks.com> Closes #7574 from ericl/string-features and squashes the following commits: f99131a [Eric Liang] comments 0bf3c26 [Eric Liang] update docs c302a2c [Eric Liang] fix tests 9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features e713da3 [Eric Liang] comments 4d79193 [Eric Liang] revert to seq + distinct 169a085 [Eric Liang] tweak functional test a230a47 [Eric Liang] Merge branch 'master' into string-features 72bd6f3 [Eric Liang] fix merge d841cec [Eric Liang] Merge branch 'master' into string-features 5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015 b01c7c5 [Eric Liang] add test 8a637db [Eric Liang] encoder wip a1d03f4 [Eric Liang] refactor into estimator	2015-07-27 17:17:49 -07:00
Yin Huai	dafe8d857d	[SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint. Instead of disabling all python style check, we should enable PEP8. So, this PR just comments out the part installing pylint. Author: Yin Huai <yhuai@databricks.com> Closes #7704 from yhuai/SPARK-9385 and squashes the following commits: 0056359 [Yin Huai] Enable PEP8 but disable installing pylint.	2015-07-27 15:49:42 -07:00
jerryshao	ab62595661	[SPARK-4352] [YARN] [WIP] Incorporate locality preferences in dynamic allocation requests Currently there's no locality preference for container request in YARN mode, this will affect the performance if fetching data remotely, so here proposed to add locality in Yarn dynamic allocation mode. Ping sryza, please help to review, thanks a lot. Author: jerryshao <saisai.shao@intel.com> Closes #6394 from jerryshao/SPARK-4352 and squashes the following commits: d45fecb [jerryshao] Add documents 6c3fe5c [jerryshao] Fix bug 8db6c0e [jerryshao] Further address the comments 2e2b2cb [jerryshao] Fix rebase compiling problem ce5f096 [jerryshao] Fix style issue 7f7df95 [jerryshao] Fix rebase issue 9ca9e07 [jerryshao] Code refactor according to comments d3e4236 [jerryshao] Further address the comments 5e7a593 [jerryshao] Fix bug introduced code rebase 9ca7783 [jerryshao] Style changes 08317f9 [jerryshao] code and comment refines 65b2423 [jerryshao] Further address the comments a27c587 [jerryshao] address the comment 27faabc [jerryshao] redundant code remove 9ce06a1 [jerryshao] refactor the code f5ba27b [jerryshao] Style fix 2c6cc8a [jerryshao] Fix bug and add unit tests 0757335 [jerryshao] Consider the distribution of existed containers to recalculate the new container requests 0ad66ff [jerryshao] Fix compile bugs 1c20381 [jerryshao] Minor fix 5ef2dc8 [jerryshao] Add docs and improve the code 3359814 [jerryshao] Fix rebase and test bugs 0398539 [jerryshao] reinitialize the new implementation 67596d6 [jerryshao] Still fix the code 654e1d2 [jerryshao] Fix some bugs 45b1c89 [jerryshao] Further polish the algorithm dea0152 [jerryshao] Enable node locality information in YarnAllocator 74bbcc6 [jerryshao] Support node locality for dynamic allocation initial commit	2015-07-27 15:46:35 -07:00
Yin Huai	2104931d7d	[SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check https://issues.apache.org/jira/browse/SPARK-9385 Comment out Python style check because of error shown in https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console Author: Yin Huai <yhuai@databricks.com> Closes #7702 from yhuai/SPARK-9385 and squashes the following commits: 146e6ef [Yin Huai] Comment out Python style check because of error shown in https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console	2015-07-27 15:18:48 -07:00
Hari Shreedharan	c1be9f309a	[SPARK-8988] [YARN] Make sure driver log links appear in secure cluste… …r mode. The NodeReports API currently used does not work in secure mode since we do not get RM tokens. Instead this patch just uses environment vars exported by YARN to create the log links. Author: Hari Shreedharan <hshreedharan@apache.org> Closes #7624 from harishreedharan/driver-logs-env and squashes the following commits: 7368c7e [Hari Shreedharan] [SPARK-8988][YARN] Make sure driver log links appear in secure cluster mode.	2015-07-27 15:16:46 -07:00
Wenchen Fan	3ab7525dce	[SPARK-9355][SQL] Remove InternalRow.get generic getter call in columnar cache code Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7673 from cloud-fan/row-generic-getter-columnar and squashes the following commits: 88b1170 [Wenchen Fan] fix style eeae712 [Wenchen Fan] Remove Internal.get generic getter call in columnar cache code	2015-07-27 13:40:50 -07:00
Cheng Lian	8e7d2bee23	[SPARK-9378] [SQL] Fixes test case "CTAS with serde" This is a proper version of PR #7693 authored by viirya The reason why "CTAS with serde" fails is that the `MetastoreRelation` gets converted to a Parquet data source relation by default. Author: Cheng Lian <lian@databricks.com> Closes #7700 from liancheng/spark-9378-fix-ctas-test and squashes the following commits: 4413af0 [Cheng Lian] Fixes test case "CTAS with serde"	2015-07-27 13:28:03 -07:00
Yin Huai	55946e76fd	[SPARK-9349] [SQL] UDAF cleanup https://issues.apache.org/jira/browse/SPARK-9349 With this PR, we only expose `UserDefinedAggregateFunction` (an abstract class) and `MutableAggregationBuffer` (an interface). Other internal wrappers and helper classes are moved to `org.apache.spark.sql.execution.aggregate` and marked as `private[sql]`. Author: Yin Huai <yhuai@databricks.com> Closes #7687 from yhuai/UDAF-cleanup and squashes the following commits: db36542 [Yin Huai] Add comments to UDAF examples. ae17f66 [Yin Huai] Address comments. 9c9fa5f [Yin Huai] UDAF cleanup.	2015-07-27 13:26:57 -07:00
Reynold Xin	fa84e4a7ba	Closes #7690 since it has been merged into branch-1.4.	2015-07-27 13:21:04 -07:00
Reynold Xin	85a50a6352	[HOTFIX] Disable pylint since it is failing master.	2015-07-27 12:25:34 -07:00
Wenchen Fan	75438422c2	[SPARK-9369][SQL] Support IntervalType in UnsafeRow Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7688 from cloud-fan/interval and squashes the following commits: 5b36b17 [Wenchen Fan] fix codegen a99ed50 [Wenchen Fan] address comment 9e6d319 [Wenchen Fan] Support IntervalType in UnsafeRow	2015-07-27 11:28:22 -07:00
Wenchen Fan	dd9ae7945a	[SPARK-9351] [SQL] remove literals from grouping expressions in Aggregate literals in grouping expressions have no effect at all, only make our grouping key bigger, so we should remove them in Optimizer. I also make old and new aggregation code consistent about literals in grouping here. In old aggregation, actually literals in grouping are already removed but new aggregation is not. So I explicitly make it a rule in Optimizer. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7583 from cloud-fan/minor and squashes the following commits: 471adff [Wenchen Fan] add test 0839925 [Wenchen Fan] use transformDown when rewrite final result expressions	2015-07-27 11:23:29 -07:00
George Dittmar	1f7b3d9dc7	[SPARK-7423] [MLLIB] Modify ClassificationModel and Probabalistic model to use Vector.argmax Use Vector.argmax call instead of converting to dense vector before calculating predictions. Author: George Dittmar <georgedittmar@gmail.com> Closes #7670 from GeorgeDittmar/sprk-7423 and squashes the following commits: e796747 [George Dittmar] Changing ClassificationModel and ProbabilisticClassificationModel to use Vector.argmax instead of converting to DenseVector	2015-07-27 11:16:33 -07:00
Wenchen Fan	e2f38167f8	[SPARK-9376] [SQL] use a seed in RandomDataGeneratorSuite Make this test deterministic, i.e. make sure this test can be passed no matter how many times we run it. The origin implementation uses a random seed and gives a chance that we may break the null check assertion `assert(Iterator.fill(100)(generator()).contains(null))`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7691 from cloud-fan/seed and squashes the following commits: eae7281 [Wenchen Fan] use a seed in RandomDataGeneratorSuite	2015-07-27 11:02:16 -07:00
Ryan Williams	c0b7df68f8	[SPARK-9366] use task's stageAttemptId in TaskEnd event Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #7681 from ryan-williams/task-stage-attempt and squashes the following commits: d6d5f0f [Ryan Williams] use task's stageAttemptId in TaskEnd event	2015-07-27 12:54:08 -05:00
Josh Rosen	ecad9d4346	[SPARK-9364] Fix array out of bounds and use-after-free bugs in UnsafeExternalSorter This patch fixes two bugs in UnsafeExternalSorter and UnsafeExternalRowSorter: - UnsafeExternalSorter does not properly update freeSpaceInCurrentPage, which can cause it to write past the end of memory pages and trigger segfaults. - UnsafeExternalRowSorter has a use-after-free bug when returning the last row from an iterator. Author: Josh Rosen <joshrosen@databricks.com> Closes #7680 from JoshRosen/SPARK-9364 and squashes the following commits: 590f311 [Josh Rosen] null out row f4cf91d [Josh Rosen] Fix use-after-free bug in UnsafeExternalRowSorter. 8abcf82 [Josh Rosen] Properly decrement freeSpaceInCurrentPage in UnsafeExternalSorter	2015-07-27 09:34:49 -07:00
Alexander Ulanov	90006f3c51	Pregel example type fix Pregel example to express single source shortest path from https://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api does not work due to incorrect type. The reason is that `GraphGenerators.logNormalGraph` returns the graph with `Long` vertices. Fixing `val graph: Graph[Int, Double]` to `val graph: Graph[Long, Double]`. Author: Alexander Ulanov <nashb@yandex.ru> Closes #7695 from avulanov/SPARK-9380-pregel-doc and squashes the following commits: c269429 [Alexander Ulanov] Pregel example type fix	2015-07-28 01:33:31 +09:00
Rene Treffer	aa19c696e2	[SPARK-4176] [SQL] Supports decimal types with precision > 18 in Parquet This PR is based on #6796 authored by rtreffer. To support large decimal precisions (> 18), we do the following things in this PR: 1. Making `CatalystSchemaConverter` support large decimal precision Decimal types with large precision are always converted to fixed-length byte array. 2. Making `CatalystRowConverter` support reading decimal values with large precision When the precision is > 18, constructs `Decimal` values with an unscaled `BigInteger` rather than an unscaled `Long`. 3. Making `RowWriteSupport` support writing decimal values with large precision In this PR we always write decimals as fixed-length byte array, because Parquet write path hasn't been refactored to conform Parquet format spec (see SPARK-6774 & SPARK-8848). Two follow-up tasks should be done in future PRs: - [ ] Writing decimals as `INT32`, `INT64` when possible while fixing SPARK-8848 - [ ] Adding compatibility tests as part of SPARK-5463 Author: Cheng Lian <lian@databricks.com> Closes #7455 from liancheng/spark-4176 and squashes the following commits: a543d10 [Cheng Lian] Fixes errors introduced while rebasing 9e31cdf [Cheng Lian] Supports decimals with precision > 18 for Parquet	2015-07-27 23:29:40 +08:00
Carson Wang	6228381657	[SPARK-8405] [DOC] Add how to view logs on Web UI when yarn log aggregation is enabled Some users may not be aware that the logs are available on Web UI even if Yarn log aggregation is enabled. Update the doc to make this clear and what need to be configured. Author: Carson Wang <carson.wang@intel.com> Closes #7463 from carsonwang/YarnLogDoc and squashes the following commits: 274c054 [Carson Wang] Minor text fix 74df3a1 [Carson Wang] address comments 5a95046 [Carson Wang] Update the text in the doc e5775c1 [Carson Wang] Update doc about how to view the logs on Web UI when yarn log aggregation is enabled	2015-07-27 08:02:40 -05:00
Cheng Lian	72981bc8f0	[SPARK-7943] [SPARK-8105] [SPARK-8435] [SPARK-8714] [SPARK-8561] Fixes multi-database support This PR fixes a set of issues related to multi-database. A new data structure `TableIdentifier` is introduced to identify a table among multiple databases. We should stop using a single `String` (table name without database name), or `Seq[String]` (optional database name plus table name) to identify tables internally. Author: Cheng Lian <lian@databricks.com> Closes #7623 from liancheng/spark-8131-multi-db and squashes the following commits: f3bcd4b [Cheng Lian] Addresses PR comments e0eb76a [Cheng Lian] Fixes styling issues 41e2207 [Cheng Lian] Fixes multi-database support d4d1ec2 [Cheng Lian] Adds multi-database test cases	2015-07-27 17:15:35 +08:00
Wenchen Fan	4ffd3a1db5	[SPARK-9371][SQL] fix the support for special chars in column names for hive context Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7684 from cloud-fan/hive and squashes the following commits: da21ffe [Wenchen Fan] fix the support for special chars in column names for hive context	2015-07-26 23:58:03 -07:00
Reynold Xin	aa80c64fcf	[SPARK-9368][SQL] Support get(ordinal, dataType) generic getter in UnsafeRow. Author: Reynold Xin <rxin@databricks.com> Closes #7682 from rxin/unsaferow-generic-getter and squashes the following commits: 3063788 [Reynold Xin] Reset the change for real this time. 0f57c55 [Reynold Xin] Reset the changes in ExpressionEvalHelper. fb6ca30 [Reynold Xin] Support BinaryType. 24a3e46 [Reynold Xin] Added support for DateType/TimestampType. 9989064 [Reynold Xin] JoinedRow. 11f80a3 [Reynold Xin] [SPARK-9368][SQL] Support get(ordinal, dataType) generic getter in UnsafeRow.	2015-07-26 23:01:04 -07:00
Liang-Chi Hsieh	945d8bcbf6	[SPARK-9306] [SQL] Don't use SortMergeJoin when joining on unsortable columns JIRA: https://issues.apache.org/jira/browse/SPARK-9306 Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7645 from viirya/smj_unsortable and squashes the following commits: a240707 [Liang-Chi Hsieh] Use forall instead of exists for readability. 55221fa [Liang-Chi Hsieh] Shouldn't use SortMergeJoin when joining on unsortable columns.	2015-07-26 22:13:37 -07:00
Cheng Hao	1efe97dc9e	[SPARK-8867][SQL] Support list / describe function usage As Hive does, we need to list all of the registered UDF and its usage for user. We add the annotation to describe a UDF, so we can get the literal description info while registering the UDF. e.g. ```scala ExpressionDescription( usage = "_FUNC_(expr) - Returns the absolute value of the numeric value", extended = """> SELECT _FUNC_('-1') 1""") case class Abs(child: Expression) extends UnaryArithmetic { ... ``` Author: Cheng Hao <hao.cheng@intel.com> Closes #7259 from chenghao-intel/desc_function and squashes the following commits: cf29bba [Cheng Hao] fixing the code style issue 5193855 [Cheng Hao] Add more powerful parser for show functions c645a6b [Cheng Hao] fix bug in unit test 78d40f1 [Cheng Hao] update the padding issue for usage 48ee4b3 [Cheng Hao] update as feedback 70eb4e9 [Cheng Hao] add show/describe function support	2015-07-26 18:34:19 -07:00
Cheng Lian	c025c3d0a1	[SPARK-9095] [SQL] Removes the old Parquet support This PR removes the old Parquet support: - Removes the old `ParquetRelation` together with related SQL configuration, plan nodes, strategies, utility classes, and test suites. - Renames `ParquetRelation2` to `ParquetRelation` - Renames `RowReadSupport` and `RowRecordMaterializer` to `CatalystReadSupport` and `CatalystRecordMaterializer` respectively, and moved them to separate files. This follows naming convention used in other Parquet data models implemented in parquet-mr. It should be easier for developers who are familiar with Parquet to follow. There's still some other code that can be cleaned up. Especially `RowWriteSupport`. But I'd like to leave this part to SPARK-8848. Author: Cheng Lian <lian@databricks.com> Closes #7441 from liancheng/spark-9095 and squashes the following commits: c7b6e38 [Cheng Lian] Removes WriteToFile 2d688d6 [Cheng Lian] Renames ParquetRelation2 to ParquetRelation ca9e1b7 [Cheng Lian] Removes old Parquet support	2015-07-26 16:49:19 -07:00
Kay Ousterhout	6b2baec04f	[SPARK-9326] Close lock file used for file downloads. A lock file is used to ensure multiple executors running on the same machine don't download the same file concurrently. Spark never closes these lock files (releasing the lock does not close the underlying file); this commit fixes that. cc vanzin (looks like you've been involved in various other fixes surrounding these lock files) Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #7650 from kayousterhout/SPARK-9326 and squashes the following commits: 0401bd1 [Kay Ousterhout] Close lock file used for file downloads.	2015-07-26 13:35:16 -07:00
Andrew Or	1cf19760d6	[SPARK-9352] [SPARK-9353] Add tests for standalone scheduling code This also fixes a small issue in the standalone Master that was uncovered by the new tests. For more detail, read the description of SPARK-9353. Author: Andrew Or <andrew@databricks.com> Closes #7668 from andrewor14/standalone-scheduling-tests and squashes the following commits: d852faf [Andrew Or] Add tests + fix scheduling with memory limits	2015-07-26 13:03:13 -07:00
Yijie Shen	fb5d43fb25	[SPARK-9356][SQL]Remove the internal use of DecimalType.Unlimited JIRA: https://issues.apache.org/jira/browse/SPARK-9356 Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #7671 from yjshen/deprecated_unlimit and squashes the following commits: c707f56 [Yijie Shen] remove pattern matching in changePrecision 4a1823c [Yijie Shen] remove internal occurrence of Decimal.Unlimited	2015-07-26 10:29:22 -07:00
Reynold Xin	6c400b4f39	[SPARK-9354][SQL] Remove InternalRow.get generic getter call in Hive integration code. Replaced them with get(ordinal, datatype) so we can use UnsafeRow here. I passed the data types throughout. Author: Reynold Xin <rxin@databricks.com> Closes #7669 from rxin/row-generic-getter-hive and squashes the following commits: 3467d8e [Reynold Xin] [SPARK-9354][SQL] Remove Internal.get generic getter call in Hive integration code.	2015-07-26 10:27:39 -07:00

1 2 3 4 5 ...

12159 commits