Commit graph

60 commits

Author SHA1 Message Date
Michael Armbrust c7a183b2c2 [SPARK-2041][SQL] Correctly analyze queries where columnName == tableName.
Author: Michael Armbrust <michael@databricks.com>

Closes #985 from marmbrus/tableName and squashes the following commits:

3caaa27 [Michael Armbrust] Correctly analyze queries where columnName == tableName.
2014-06-05 17:42:08 -07:00
Takuya UESHIN e4c11eef2f [SPARK-2036] [SQL] CaseConversionExpression should check if the evaluated value is null.
`CaseConversionExpression` should check if the evaluated value is `null`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #982 from ueshin/issues/SPARK-2036 and squashes the following commits:

61e1c54 [Takuya UESHIN] Add check if the evaluated value is null.
2014-06-05 12:00:31 -07:00
Takuya UESHIN 7c160293d6 [SPARK-2029] Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.
Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #974 from ueshin/issues/SPARK-2029 and squashes the following commits:

e19e8f4 [Takuya UESHIN] Bump version number to 1.1.0-SNAPSHOT.
2014-06-05 11:27:33 -07:00
egraldlo ec8be274a7 [SPARK-1995][SQL] system function upper and lower can be supported
I don't know whether it's time to implement system function about string operation in spark sql now.

Author: egraldlo <egraldlo@gmail.com>

Closes #936 from egraldlo/stringoperator and squashes the following commits:

3c6c60a [egraldlo] Add UPPER, LOWER, MAX and MIN into hive parser
ea76d0a [egraldlo] modify the formatting issues
b49f25e [egraldlo] modify the formatting issues
1f0bbb5 [egraldlo] system function upper and lower supported
13d3267 [egraldlo] system function upper and lower supported
2014-06-02 18:02:57 -07:00
Takuya UESHIN 3ce81494c5 [SPARK-1947] [SQL] Child of SumDistinct or Average should be widened to prevent overflows the same as Sum.
Child of `SumDistinct` or `Average` should be widened to prevent overflows the same as `Sum`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #902 from ueshin/issues/SPARK-1947 and squashes the following commits:

99c3dcb [Takuya UESHIN] Insert Cast for SumDistinct and Average.
2014-05-31 11:30:03 -07:00
Takuya UESHIN 9df86835b6 [SPARK-1938] [SQL] ApproxCountDistinctMergeFunction should return Int value.
`ApproxCountDistinctMergeFunction` should return `Int` value because the `dataType` of `ApproxCountDistinct` is `IntegerType`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #893 from ueshin/issues/SPARK-1938 and squashes the following commits:

3970e88 [Takuya UESHIN] Remove a superfluous line.
5ad7ec1 [Takuya UESHIN] Make dataType for each of CountDistinct, ApproxCountDistinctMerge and ApproxCountDistinct LongType.
cbe7c71 [Takuya UESHIN] Revert a change.
fc3ac0f [Takuya UESHIN] Fix evaluated value type of ApproxCountDistinctMergeFunction to Int.
2014-05-27 22:17:50 -07:00
Takuya UESHIN 3b0babad1f [SPARK-1915] [SQL] AverageFunction should not count if the evaluated value is null.
Average values are difference between the calculation is done partially or not partially.
Because `AverageFunction` (in not-partially calculation) counts even if the evaluated value is null.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #862 from ueshin/issues/SPARK-1915 and squashes the following commits:

b1ff3c0 [Takuya UESHIN] Modify AverageFunction not to count if the evaluated value is null.
2014-05-27 14:55:23 -07:00
Takuya UESHIN d1375a2bff [SPARK-1926] [SQL] Nullability of Max/Min/First should be true.
Nullability of `Max`/`Min`/`First` should be `true` because they return `null` if there are no rows.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #881 from ueshin/issues/SPARK-1926 and squashes the following commits:

322610f [Takuya UESHIN] Fix nullability of Min/Max/First.
2014-05-27 14:53:57 -07:00
Takuya UESHIN d6395d86f9 [SPARK-1914] [SQL] Simplify CountFunction not to traverse to evaluate all child expressions.
`CountFunction` should count up only if the child's evaluated value is not null.

Because it traverses to evaluate all child expressions, even if the child is null, it counts up if one of the all children is not null.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #861 from ueshin/issues/SPARK-1914 and squashes the following commits:

3b37315 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-1914
2afa238 [Takuya UESHIN] Simplify CountFunction not to traverse to evaluate all child expressions.
2014-05-26 00:17:20 -07:00
Kan Zhang 6052db9dc1 [SPARK-1822] SchemaRDD.count() should use query optimizer
Author: Kan Zhang <kzhang@apache.org>

Closes #841 from kanzhang/SPARK-1822 and squashes the following commits:

2f8072a [Kan Zhang] [SPARK-1822] Minor style update
cf4baa4 [Kan Zhang] [SPARK-1822] Adding Scaladoc
e67c910 [Kan Zhang] [SPARK-1822] SchemaRDD.count() should use optimizer
2014-05-25 00:06:42 -07:00
Takuya UESHIN bb88875ad5 [SPARK-1889] [SQL] Apply splitConjunctivePredicates to join condition while finding join ke...
...ys.

When tables are equi-joined by multiple-keys `HashJoin` should be used, but `CartesianProduct` and then `Filter` are used.
The join keys are paired by `And` expression so we need to apply `splitConjunctivePredicates` to join condition while finding join keys.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits:

fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys.
2014-05-21 15:37:47 -07:00
Cheng Hao a20fea9881 [Spark-1461] Deferred Expression Evaluation (short-circuit evaluation)
This patch unify the foldable & nullable interface for Expression.
1) Deterministic-less UDF (like Rand()) can not be folded.
2) Short-circut will significantly improves the performance in Expression Evaluation, however, the stateful UDF should not be ignored in a short-circuit evaluation(e.g. in expression: col1 > 0 and row_sequence() < 1000, row_sequence() can not be ignored even if col1 > 0 is false)

I brought an concept of DeferredObject from Hive, which has 2 kinds of children classes (EagerResult / DeferredResult), the former requires triggering the evaluation before it's created, while the later trigger the evaluation when first called its get() method.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #446 from chenghao-intel/expression_deferred_evaluation and squashes the following commits:

d2729de [Cheng Hao] Fix the codestyle issues
a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation
af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF
b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation
2014-05-15 22:12:34 -07:00
Takuya UESHIN 94c9d6f598 [SPARK-1819] [SQL] Fix GetField.nullable.
`GetField.nullable` should be `true` not only when `field.nullable` is `true` but also when `child.nullable` is `true`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #757 from ueshin/issues/SPARK-1819 and squashes the following commits:

8781a11 [Takuya UESHIN] Modify a test to use named parameters.
5bfc77d [Takuya UESHIN] Fix GetField.nullable.
2014-05-15 11:21:33 -07:00
Prashant Sharma 46324279da Package docs
This is a few changes based on the original patch by @scrapcodes.

Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #785 from pwendell/package-docs and squashes the following commits:

c32b731 [Patrick Wendell] Changes based on Prashant's patch
c0463d3 [Prashant Sharma] added eof new line
ce8bf73 [Prashant Sharma] Added eof new line to all files.
4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
2014-05-14 22:24:41 -07:00
wangfei 44165fc91a [SPARK-1826] fix the head notation of package object dsl
Author: wangfei <scnbwf@yeah.net>

Closes #765 from scwf/dslfix and squashes the following commits:

d2d1a9d [wangfei] Update package.scala
66ff53b [wangfei] fix the head notation of package object dsl
2014-05-14 17:59:11 -07:00
Michael Armbrust 6ce0884446 [SQL] Improve column pruning.
Fixed a bug that was preventing us from ever pruning beneath Joins.

## TPC-DS Q3
### Before:
```
Aggregate false, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand_id#64 AS brand_id#0,i_brand#65 AS brand#1,SUM(PartialSum#79) AS sum_agg#2]
 Exchange (HashPartitioning [d_year#12:0,i_brand#65:1,i_brand_id#64:2], 150)
  Aggregate true, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand#65,i_brand_id#64,SUM(CAST(ss_ext_sales_price#49, DoubleType)) AS PartialSum#79]
   Project [d_year#12:6,i_brand#65:59,i_brand_id#64:58,ss_ext_sales_price#49:43]
    HashJoin [ss_item_sk#36], [i_item_sk#57], BuildRight
     Exchange (HashPartitioning [ss_item_sk#36:30], 150)
      HashJoin [d_date_sk#6], [ss_sold_date_sk#34], BuildRight
       Exchange (HashPartitioning [d_date_sk#6:0], 150)
        Filter (d_moy#14:8 = 12)
         HiveTableScan [d_date_sk#6,d_date_id#7,d_date#8,d_month_seq#9,d_week_seq#10,d_quarter_seq#11,d_year#12,d_dow#13,d_moy#14,d_dom#15,d_qoy#16,d_fy_year#17,d_fy_quarter_seq#18,d_fy_week_seq#19,d_day_name#20,d_quarter_name#21,d_holiday#22,d_weekend#23,d_following_holiday#24,d_first_dom#25,d_last_dom#26,d_same_day_ly#27,d_same_day_lq#28,d_current_day#29,d_current_week#30,d_current_month#31,d_current_quarter#32,d_current_year#33], (MetastoreRelation default, date_dim, Some(dt)), None
       Exchange (HashPartitioning [ss_sold_date_sk#34:0], 150)
        HiveTableScan [ss_sold_date_sk#34,ss_sold_time_sk#35,ss_item_sk#36,ss_customer_sk#37,ss_cdemo_sk#38,ss_hdemo_sk#39,ss_addr_sk#40,ss_store_sk#41,ss_promo_sk#42,ss_ticket_number#43,ss_quantity#44,ss_wholesale_cost#45,ss_list_price#46,ss_sales_price#47,ss_ext_discount_amt#48,ss_ext_sales_price#49,ss_ext_wholesale_cost#50,ss_ext_list_price#51,ss_ext_tax#52,ss_coupon_amt#53,ss_net_paid#54,ss_net_paid_inc_tax#55,ss_net_profit#56], (MetastoreRelation default, store_sales, None), None
     Exchange (HashPartitioning [i_item_sk#57:0], 150)
      Filter (i_manufact_id#70:13 = 436)
       HiveTableScan [i_item_sk#57,i_item_id#58,i_rec_start_date#59,i_rec_end_date#60,i_item_desc#61,i_current_price#62,i_wholesale_cost#63,i_brand_id#64,i_brand#65,i_class_id#66,i_class#67,i_category_id#68,i_category#69,i_manufact_id#70,i_manufact#71,i_size#72,i_formulation#73,i_color#74,i_units#75,i_container#76,i_manager_id#77,i_product_name#78], (MetastoreRelation default, item, None), None
```
### After
```
Aggregate false, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand_id#224 AS brand_id#160,i_brand#225 AS brand#161,SUM(PartialSum#239) AS sum_agg#162]
 Exchange (HashPartitioning [d_year#172:0,i_brand#225:1,i_brand_id#224:2], 150)
  Aggregate true, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand#225,i_brand_id#224,SUM(CAST(ss_ext_sales_price#209, DoubleType)) AS PartialSum#239]
   Project [d_year#172:1,i_brand#225:5,i_brand_id#224:3,ss_ext_sales_price#209:0]
    HashJoin [ss_item_sk#196], [i_item_sk#217], BuildRight
     Exchange (HashPartitioning [ss_item_sk#196:2], 150)
      Project [ss_ext_sales_price#209:2,d_year#172:1,ss_item_sk#196:3]
       HashJoin [d_date_sk#166], [ss_sold_date_sk#194], BuildRight
        Exchange (HashPartitioning [d_date_sk#166:0], 150)
         Project [d_date_sk#166:0,d_year#172:1]
          Filter (d_moy#174:2 = 12)
           HiveTableScan [d_date_sk#166,d_year#172,d_moy#174], (MetastoreRelation default, date_dim, Some(dt)), None
        Exchange (HashPartitioning [ss_sold_date_sk#194:2], 150)
         HiveTableScan [ss_ext_sales_price#209,ss_item_sk#196,ss_sold_date_sk#194], (MetastoreRelation default, store_sales, None), None
     Exchange (HashPartitioning [i_item_sk#217:1], 150)
      Project [i_brand_id#224:0,i_item_sk#217:1,i_brand#225:2]
       Filter (i_manufact_id#230:3 = 436)
        HiveTableScan [i_brand_id#224,i_item_sk#217,i_brand#225,i_manufact_id#230], (MetastoreRelation default, item, None), None
```

Author: Michael Armbrust <michael@databricks.com>

Closes #729 from marmbrus/fixPruning and squashes the following commits:

5feeff0 [Michael Armbrust] Improve column pruning.
2014-05-13 23:27:22 -07:00
larvaboy c33b8dcbf6 Implement ApproximateCountDistinct for SparkSql
Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib, and do the count in two phases: 1) counting the number of distinct elements in each partitions, and 2) merge the HyperLogLog results from different partitions.

A simple serializer and test cases are added as well.

Author: larvaboy <larvaboy@gmail.com>

Closes #737 from larvaboy/master and squashes the following commits:

bd8ef3f [larvaboy] Add support of user-provided standard deviation to ApproxCountDistinct.
9ba8360 [larvaboy] Fix alignment and null handling issues.
95b4067 [larvaboy] Add a test case for count distinct and approximate count distinct.
f57917d [larvaboy] Add the parser for the approximate count.
a2d5d10 [larvaboy] Add ApproximateCountDistinct aggregates and functions.
7ad273a [larvaboy] Add SparkSql serializer for HyperLogLog.
1d9aacf [larvaboy] Fix a minor typo in the toString method of the Count case class.
653542b [larvaboy] Fix a couple of minor typos.
2014-05-13 21:26:08 -07:00
Andrew Ash 156df87e7c SPARK-1757 Failing test for saving null primitives with .saveAsParquetFile()
https://issues.apache.org/jira/browse/SPARK-1757

The first test succeeds, but the second test fails with exception:

```
[info] - save and load case class RDD with Nones as parquet *** FAILED *** (14 milliseconds)
[info]   java.lang.RuntimeException: Unsupported datatype StructType(List())
[info]   at scala.sys.package$.error(package.scala:27)
[info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetRelation.scala:201)
[info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
[info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
[info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[info]   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
[info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetRelation.scala:234)
[info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetRelation.scala:267)
[info]   at org.apache.spark.sql.parquet.ParquetRelation$.createEmpty(ParquetRelation.scala:143)
[info]   at org.apache.spark.sql.parquet.ParquetRelation$.create(ParquetRelation.scala:122)
[info]   at org.apache.spark.sql.execution.SparkStrategies$ParquetOperations$.apply(SparkStrategies.scala:139)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
[info]   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
[info]   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:264)
[info]   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:264)
[info]   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:265)
[info]   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:265)
[info]   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:268)
[info]   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:268)
[info]   at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:66)
[info]   at org.apache.spark.sql.SchemaRDD.saveAsParquetFile(SchemaRDD.scala:98)
```

Author: Andrew Ash <andrew@andrewash.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #690 from ash211/rdd-parquet-save and squashes the following commits:

747a0b9 [Andrew Ash] Merge pull request #1 from marmbrus/pr/690
54bd00e [Michael Armbrust] Need to put Option first since Option <: Seq.
8f3f281 [Andrew Ash] SPARK-1757 Add failing test for saving SparkSQL Schemas with Option[?] fields as parquet
2014-05-12 19:23:39 -07:00
Takuya UESHIN 322b1808d2 [SPARK-1754] [SQL] Add missing arithmetic DSL operations.
Add missing arithmetic DSL operations: `unary_-`, `%`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #689 from ueshin/issues/SPARK-1754 and squashes the following commits:

a09ef69 [Takuya UESHIN] Add also missing ! (not) operation.
f73ae2c [Takuya UESHIN] Remove redundant tests.
5b3f087 [Takuya UESHIN] Add tests relating DSL operations.
e09c5b8 [Takuya UESHIN] Add missing arithmetic DSL operations.
2014-05-08 15:31:47 -07:00
Michael Armbrust 19c8fb02bc [SQL] Improve SparkSQL Aggregates
* Add native min/max (was using hive before).
* Handle nulls correctly in Avg and Sum.

Author: Michael Armbrust <michael@databricks.com>

Closes #683 from marmbrus/aggFixes and squashes the following commits:

64fe30b [Michael Armbrust] Improve SparkSQL Aggregates * Add native min/max (was using hive before). * Handle nulls correctly in Avg and Sum.
2014-05-08 01:08:43 -04:00
Cheng Hao ca43186867 [SQL] Fix Performance Issue in data type casting
Using lazy val object instead of function in the class Cast, which improved the performance nearly by 2X in my local micro-benchmark.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #679 from chenghao-intel/fix_type_casting and squashes the following commits:

71b0902 [Cheng Hao] using lazy val object instead of function for data type casting
2014-05-07 16:54:58 -04:00
Cheng Hao 3eb53bd59e [WIP][Spark-SQL] Optimize the Constant Folding for Expression
Currently, expression does not support the "constant null" well in constant folding.
e.g. Sum(a, 0) actually always produces Literal(0, NumericType) in runtime.

For example:
```
explain select isnull(key+null)  from src;
== Logical Plan ==
Project [HiveGenericUdf#isnull((key#30 + CAST(null, IntegerType))) AS c_0#28]
 MetastoreRelation default, src, None

== Optimized Logical Plan ==
Project [true AS c_0#28]
 MetastoreRelation default, src, None

== Physical Plan ==
Project [true AS c_0#28]
 HiveTableScan [], (MetastoreRelation default, src, None), None
```

I've create a new Optimization rule called NullPropagation for such kind of constant folding.

Author: Cheng Hao <hao.cheng@intel.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #482 from chenghao-intel/optimize_constant_folding and squashes the following commits:

2f14b50 [Cheng Hao] Fix code style issues
68b9fad [Cheng Hao] Remove the Literal pattern matching for NullPropagation
29c8166 [Cheng Hao] Update the code for feedback of code review
50444cc [Cheng Hao] Remove the unnecessary null checking
80f9f18 [Cheng Hao] Update the UnitTest for aggregation constant folding
27ea3d7 [Cheng Hao] Fix Constant Folding Bugs & Add More Unittests
b28e03a [Cheng Hao] Merge pull request #1 from marmbrus/pr/482
9ccefdb [Michael Armbrust] Add tests for optimized expression evaluation.
543ef9d [Cheng Hao] fix code style issues
9cf0396 [Cheng Hao] update code according to the code review comment
536c005 [Cheng Hao] Add Exceptional case for constant folding
3c045c7 [Cheng Hao] Optimize the Constant Folding by adding more rules
2645d4f [Cheng Hao] Constant Folding(null propagation)
2014-05-07 03:37:12 -04:00
Michael Armbrust 3c64750bdd [SQL] SPARK-1732 - Support for null primitive values.
I also removed a println that I bumped into.

Author: Michael Armbrust <michael@databricks.com>

Closes #658 from marmbrus/nullPrimitives and squashes the following commits:

a3ec4f3 [Michael Armbrust] Remove println.
695606b [Michael Armbrust] Support for null primatives from using scala and java reflection.
2014-05-05 22:59:42 -07:00
Michael Armbrust b295714708 [SQL] Better logging when applying rules.
Author: Michael Armbrust <michael@databricks.com>

Closes #616 from marmbrus/ruleLogging and squashes the following commits:

39c09fe [Michael Armbrust] Fix off by one error.
5af3537 [Michael Armbrust] Better logging when applying rules.
2014-05-03 18:38:44 -07:00
witgo 030f2c2126 Improved build configuration
1, Fix SPARK-1441: compile spark core error with hadoop 0.23.x
2, Fix SPARK-1491: maven hadoop-provided profile fails to build
3, Fix org.scala-lang: * ,org.apache.avro:* inconsistent versions dependency
4, A modified on the sql/catalyst/pom.xml,sql/hive/pom.xml,sql/core/pom.xml (Four spaces formatted into two spaces)

Author: witgo <witgo@qq.com>

Closes #480 from witgo/format_pom and squashes the following commits:

03f652f [witgo] review commit
b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
0da4bc3 [witgo] merge master
d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
e345919 [witgo] add avro dependency to yarn-alpha
77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
934f24d [witgo] review commit
cf46edc [witgo] exclude jruby
06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
99464d2 [witgo] fix maven hadoop-provided profile fails to build
0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
2014-04-28 22:51:46 -07:00
Takuya UESHIN 8e37ed6eb8 [SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.
`Cast.nullable` should be `true` when cast from `StringType` to `NumericType` or `TimestampType`.
Because if `StringType` expression has an illegal number string or illegal timestamp string, the casted value becomes `null`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #532 from ueshin/issues/SPARK-1608 and squashes the following commits:

065d37c [Takuya UESHIN] Add tests to check nullabilities of cast expressions.
f278ed7 [Takuya UESHIN] Revert test to keep it readable and concise.
9fc9380 [Takuya UESHIN] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.
2014-04-26 14:39:54 -07:00
Michael Armbrust 86ff8b1027 Generalize pattern for planning hash joins.
This will be helpful for [SPARK-1495](https://issues.apache.org/jira/browse/SPARK-1495) and other cases where we want to have custom hash join implementations but don't want to repeat the logic for finding the join keys.

Author: Michael Armbrust <michael@databricks.com>

Closes #418 from marmbrus/hashFilter and squashes the following commits:

d5cc79b [Michael Armbrust] Address @rxin 's comments.
366b6d9 [Michael Armbrust] style fixes
14560eb [Michael Armbrust] Generalize pattern for planning hash joins.
f4809c1 [Michael Armbrust] Move common functions to PredicateHelper.
2014-04-24 21:42:33 -07:00
Michael Armbrust 4660991e67 [SQL] Add support for parsing indexing into arrays in SQL.
Author: Michael Armbrust <michael@databricks.com>

Closes #518 from marmbrus/parseArrayIndex and squashes the following commits:

afd2d6b [Michael Armbrust] 100 chars
c3d6026 [Michael Armbrust] Add support for parsing indexing into arrays in SQL.
2014-04-24 18:21:00 -07:00
Arun Ramakrishnan 35e3d199f0 SPARK-1438 RDD.sample() make seed param optional
copying form previous pull request https://github.com/apache/spark/pull/462

Its probably better to let the underlying language implementation take care of the default . This was easier to do with python as the default value for seed in random and numpy random is None.

In Scala/Java side it might mean propagating an Option or null(oh no!) down the chain until where the Random is constructed. But, looks like the convention in some other methods was to use System.nanoTime. So, followed that convention.

Conflict with overloaded method in sql.SchemaRDD.sample which also defines default params.
sample(fraction, withReplacement=false, seed=math.random)
Scala does not allow more than one overloaded to have default params. I believe the author intended to override the RDD.sample method and not overload it. So, changed it.

If backward compatible is important, 3 new method can be introduced (without default params) like this
sample(fraction)
sample(fraction, withReplacement)
sample(fraction, withReplacement, seed)

Added some tests for the scala RDD takeSample method.

Author: Arun Ramakrishnan <smartnut007@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #477 from smartnut007/master and squashes the following commits:

07bb06e [Arun Ramakrishnan] SPARK-1438 fixing more space formatting issues
b9ebfe2 [Arun Ramakrishnan] SPARK-1438 removing redundant import of random in python rddsampler
8d05b1a [Arun Ramakrishnan] SPARK-1438 RDD . Replace System.nanoTime with a Random generated number. python: use a separate instance of Random instead of seeding language api global Random instance.
69619c6 [Arun Ramakrishnan] SPARK-1438 fix spacing issue
0c247db [Arun Ramakrishnan] SPARK-1438 RDD language apis to support optional seed in RDD methods sample/takeSample
2014-04-24 17:27:16 -07:00
Takuya UESHIN 27b2821cf1 [SPARK-1610] [SQL] Fix Cast to use exact type value when cast from BooleanType to NumericTy...
...pe.

`Cast` from `BooleanType` to `NumericType` are all using `Int` value.
But it causes `ClassCastException` when the casted value is used by the following evaluation like the code below:

``` scala
scala> import org.apache.spark.sql.catalyst._
import org.apache.spark.sql.catalyst._

scala> import types._
import types._

scala> import expressions._
import expressions._

scala> Add(Cast(Literal(true), ShortType), Literal(1.toShort)).eval()
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Short
	at scala.runtime.BoxesRunTime.unboxToShort(BoxesRunTime.java:102)
	at scala.math.Numeric$ShortIsIntegral$.plus(Numeric.scala:72)
	at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
	at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
	at org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:114)
	at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:58)
	at .<init>(<console>:17)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
	at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
	at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
	at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
	at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
	at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
```

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #533 from ueshin/issues/SPARK-1610 and squashes the following commits:

70f36e8 [Takuya UESHIN] Fix Cast to use exact type value when cast from BooleanType to NumericType.
2014-04-24 09:57:28 -07:00
Sean Owen 0247b5c546 SPARK-1488. Resolve scalac feature warnings during build
For your consideration: scalac currently notes a number of feature warnings during compilation:

```
[warn] there were 65 feature warning(s); re-run with -feature for details
```

Warnings are like:

```
[warn] /Users/srowen/Documents/spark/core/src/main/scala/org/apache/spark/SparkContext.scala:1261: implicit conversion method rddToPairRDDFunctions should be enabled
[warn] by making the implicit value scala.language.implicitConversions visible.
[warn] This can be achieved by adding the import clause 'import scala.language.implicitConversions'
[warn] or by setting the compiler option -language:implicitConversions.
[warn] See the Scala docs for value scala.language.implicitConversions for a discussion
[warn] why the feature should be explicitly enabled.
[warn]   implicit def rddToPairRDDFunctions[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) =
[warn]                ^
```

scalac is suggesting that it's just best practice to explicitly enable certain language features by importing them where used.

This PR simply adds the imports it suggests (and squashes one other Java warning along the way). This leaves just deprecation warnings in the build.

Author: Sean Owen <sowen@cloudera.com>

Closes #404 from srowen/SPARK-1488 and squashes the following commits:

8598980 [Sean Owen] Quiet scalac warnings about language features by explicitly importing language features.
39bc831 [Sean Owen] Enable -feature in scalac to emit language feature warnings
2014-04-14 19:50:00 -07:00
Patrick Wendell 4bc07eebbf SPARK-1480: Clean up use of classloaders
The Spark codebase is a bit fast-and-loose when accessing classloaders and this has caused a few bugs to surface in master.

This patch defines some utility methods for accessing classloaders. This makes the intention when accessing a classloader much more explicit in the code and fixes a few cases where the wrong one was chosen.

case (a) -> We want the classloader that loaded Spark
case (b) -> We want the context class loader, or if not present, we want (a)

This patch provides a better fix for SPARK-1403 (https://issues.apache.org/jira/browse/SPARK-1403) than the current work around, which it reverts. It also fixes a previously unreported bug that the `./spark-submit` script did not work for running with `local` master. It didn't work because the executor classloader did not properly delegate to the context class loader (if it is defined) and in local mode the context class loader is set by the `./spark-submit` script. A unit test is added for that case.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #398 from pwendell/class-loaders and squashes the following commits:

b4a1a58 [Patrick Wendell] Minor clean up
14f1272 [Patrick Wendell] SPARK-1480: Clean up use of classloaders
2014-04-13 08:58:37 -07:00
Michael Armbrust f99401a630 [SQL] Improve column pruning in the optimizer.
Author: Michael Armbrust <michael@databricks.com>

Closes #378 from marmbrus/columnPruning and squashes the following commits:

779da56 [Michael Armbrust] More consistent naming.
1a4e9ea [Michael Armbrust] More comments.
2f4e7b9 [Michael Armbrust] Improve column pruning in the optimizer.
2014-04-10 16:20:33 -07:00
Sandeep 930b70f052 Remove Unnecessary Whitespace's
stack these together in a commit else they show up chunk by chunk in different commits.

Author: Sandeep <sandeep@techaddict.me>

Closes #380 from techaddict/white_space and squashes the following commits:

b58f294 [Sandeep] Remove Unnecessary Whitespace's
2014-04-10 15:04:13 -07:00
Patrick Wendell 87bd1f9ef7 SPARK-1093: Annotate developer and experimental API's
This patch marks some existing classes as private[spark] and adds two types of API annotations:
- `EXPERIMENTAL API` = experimental user-facing module
- `DEVELOPER API - UNSTABLE` = developer-facing API that might change

There is some discussion of the different mechanisms for doing this here:
https://issues.apache.org/jira/browse/SPARK-1081

I was pretty aggressive with marking things private. Keep in mind that if we want to open something up in the future we can, but we can never reduce visibility.

A few notes here:
- In the past we've been inconsistent with the visiblity of the X-RDD classes. This patch marks them private whenever there is an existing function in RDD that can directly creat them (e.g. CoalescedRDD and rdd.coalesce()). One trade-off here is users can't subclass them.
- Noted that compression and serialization formats don't have to be wire compatible across versions.
- Compression codecs and serialization formats are semi-private as users typically don't instantiate them directly.
- Metrics sources are made private - user only interacts with them through Spark's reflection

Author: Patrick Wendell <pwendell@gmail.com>
Author: Andrew Or <andrewor14@gmail.com>

Closes #274 from pwendell/private-apis and squashes the following commits:

44179e4 [Patrick Wendell] Merge remote-tracking branch 'apache-github/master' into private-apis
042c803 [Patrick Wendell] spark.annotations -> spark.annotation
bfe7b52 [Patrick Wendell] Adding experimental for approximate counts
8d0c873 [Patrick Wendell] Warning in SparkEnv
99b223a [Patrick Wendell] Cleaning up annotations
e849f64 [Patrick Wendell] Merge pull request #2 from andrewor14/annotations
982a473 [Andrew Or] Generalize jQuery matching for non Spark-core API docs
a01c076 [Patrick Wendell] Merge pull request #1 from andrewor14/annotations
c1bcb41 [Andrew Or] DeveloperAPI -> DeveloperApi
0d48908 [Andrew Or] Comments and new lines (minor)
f3954e0 [Andrew Or] Add identifier tags in comments to work around scaladocs bug
99192ef [Andrew Or] Dynamically add badges based on annotations
824011b [Andrew Or] Add support for injecting arbitrary JavaScript to API docs
037755c [Patrick Wendell] Some changes after working with andrew or
f7d124f [Patrick Wendell] Small fixes
c318b24 [Patrick Wendell] Use CSS styles
e4c76b9 [Patrick Wendell] Logging
f390b13 [Patrick Wendell] Better visibility for workaround constructors
d6b0afd [Patrick Wendell] Small chang to existing constructor
403ba52 [Patrick Wendell] Style fix
870a7ba [Patrick Wendell] Work around for SI-8479
7fb13b2 [Patrick Wendell] Changes to UnionRDD and EmptyRDD
4a9e90c [Patrick Wendell] EXPERIMENTAL API --> EXPERIMENTAL
c581dce [Patrick Wendell] Changes after building against Shark.
8452309 [Patrick Wendell] Style fixes
1ed27d2 [Patrick Wendell] Formatting and coloring of badges
cd7a465 [Patrick Wendell] Code review feedback
2f706f1 [Patrick Wendell] Don't use floats
542a736 [Patrick Wendell] Small fixes
cf23ec6 [Patrick Wendell] Marking GraphX as alpha
d86818e [Patrick Wendell] Another naming change
5a76ed6 [Patrick Wendell] More visiblity clean-up
42c1f09 [Patrick Wendell] Using better labels
9d48cbf [Patrick Wendell] Initial pass
2014-04-09 01:14:46 -07:00
Reynold Xin f27e56aa61 Change timestamp cast semantics. When cast to numeric types, return the unix time in seconds (instead of millis).
@marmbrus @chenghao-intel

Author: Reynold Xin <rxin@apache.org>

Closes #352 from rxin/timestamp-cast and squashes the following commits:

18aacd3 [Reynold Xin] Fixed precision for double.
2adb235 [Reynold Xin] Change timestamp cast semantics. When cast to numeric types, return the unix time in seconds (instead of millis).
2014-04-07 19:28:24 -07:00
Reynold Xin 31e6fff037 Added eval for Rand (without any support for user-defined seed).
Author: Reynold Xin <rxin@apache.org>

Closes #349 from rxin/rand and squashes the following commits:

fd11322 [Reynold Xin] Added eval for Rand (without any support for user-defined seed).
2014-04-07 18:40:08 -07:00
Reynold Xin 55dfd5dcdb Removed the default eval implementation from Expression, and added a bunch of override's in classes I touched.
It is more robust to not provide a default implementation for Expression's.

Author: Reynold Xin <rxin@apache.org>

Closes #350 from rxin/eval-default and squashes the following commits:

0a83b8f [Reynold Xin] Removed the default eval implementation from Expression, and added a bunch of override's in classes I touched.
2014-04-07 18:39:18 -07:00
Reynold Xin 83f2a2f14e [sql] Rename Expression.apply to eval for better readability.
Also used this opportunity to add a bunch of override's and made some members private.

Author: Reynold Xin <rxin@apache.org>

Closes #340 from rxin/eval and squashes the following commits:

a7c7ca7 [Reynold Xin] Fixed conflicts in merge.
9069de6 [Reynold Xin] Merge branch 'master' into eval
3ccc313 [Reynold Xin] Merge branch 'master' into eval
1a47e10 [Reynold Xin] Renamed apply to eval for generators and added a bunch of override's.
ea061de [Reynold Xin] Rename Expression.apply to eval for better readability.
2014-04-07 10:45:31 -07:00
Michael Armbrust accd0999f9 [SQL] SPARK-1371 Hash Aggregation Improvements
Given:
```scala
case class Data(a: Int, b: Int)
val rdd =
  sparkContext
    .parallelize(1 to 200)
    .flatMap(_ => (1 to 50000).map(i => Data(i % 100, i)))
rdd.registerAsTable("data")
cacheTable("data")
```
Before:
```
SELECT COUNT(*) FROM data:[10000000]
16795.567ms
SELECT a, SUM(b) FROM data GROUP BY a
7536.436ms
SELECT SUM(b) FROM data
10954.1ms
```

After:
```
SELECT COUNT(*) FROM data:[10000000]
1372.175ms
SELECT a, SUM(b) FROM data GROUP BY a
2070.446ms
SELECT SUM(b) FROM data
958.969ms
```

Author: Michael Armbrust <michael@databricks.com>

Closes #295 from marmbrus/hashAgg and squashes the following commits:

ec63575 [Michael Armbrust] Add comment.
d0495a9 [Michael Armbrust] Use scaladoc instead.
b4a6887 [Michael Armbrust] Address review comments.
a2d90ba [Michael Armbrust] Capture child output statically to avoid issues with generators and serialization.
7c13112 [Michael Armbrust] Rewrite Aggregate operator to stream input and use projections.  Remove unused local RDD functions implicits.
5096f99 [Michael Armbrust] Make HiveUDAF fields transient since object inspectors are not serializable.
6a4b671 [Michael Armbrust] Add option to avoid binding operators expressions automatically.
92cca08 [Michael Armbrust] Always include serialization debug info when running tests.
1279df2 [Michael Armbrust] Increase default number of partitions.
2014-04-07 00:14:00 -07:00
witgo 7012ffafad Fix SPARK-1420 The maven build error for Spark Catalyst
Author: witgo <witgo@qq.com>

Closes #333 from witgo/SPARK-1420 and squashes the following commits:

902519e [witgo] add dependency scala-reflect to catalyst
2014-04-06 16:03:06 -07:00
Michael Armbrust d956cc2516 [SQL] Minor fixes.
Author: Michael Armbrust <michael@databricks.com>

Closes #315 from marmbrus/minorFixes and squashes the following commits:

b23a15d [Michael Armbrust] fix scaladoc
11062ac [Michael Armbrust] Fix registering "SELECT *" queries as tables and caching them.  As some tests for this and self-joins.
3997dc9 [Michael Armbrust] Move Row extractor to catalyst.
208bf5e [Michael Armbrust] More idiomatic naming of DSL functions. * subquery => as * for join condition => on, i.e., `r.join(s, condition = 'a == 'b)` =>`r.join(s, on = 'a == 'b)`
87211ce [Michael Armbrust] Correctly handle self joins of in-memory cached tables.
69e195e [Michael Armbrust] Change != to !== in the DSL since != will always translate to != on Any.
01f2dd5 [Michael Armbrust] Correctly assign aliases to tables in SqlParser.
2014-04-04 17:23:17 -07:00
Cheng Hao 5d1feda217 [SPARK-1360] Add Timestamp Support for SQL
This PR includes:
1) Add new data type Timestamp
2) Add more data type casting base on Hive's Rule
3) Fix bug missing data type in both parsers (HiveQl & SQLParser).

Author: Cheng Hao <hao.cheng@intel.com>

Closes #275 from chenghao-intel/timestamp and squashes the following commits:

df709e5 [Cheng Hao] Move orc_ends_with_nulls to blacklist
24b04b0 [Cheng Hao] Put 3 cases into the black lists(describe_pretty,describe_syntax,lateral_view_outer)
fc512c2 [Cheng Hao] remove the unnecessary data type equality check in data casting
d0d1919 [Cheng Hao] Add more data type for scala reflection
3259808 [Cheng Hao] Add the new Golden files
3823b97 [Cheng Hao] Update the UnitTest cases & add timestamp type for HiveQL
54a0489 [Cheng Hao] fix bug mapping to 0 (which is supposed to be null) when NumberFormatException occurs
9cb505c [Cheng Hao] Fix issues according to PR comments
e529168 [Cheng Hao] Fix bug of converting from String
6fc8100 [Cheng Hao] Update Unit Test & CodeStyle
8a1d4d6 [Cheng Hao] Add DataType for SqlParser
ce4385e [Cheng Hao] Add TimestampType Support
2014-04-03 15:33:17 -07:00
Andre Schumacher fbebaedf26 Spark parquet improvements
A few improvements to the Parquet support for SQL queries:
- Instead of files a ParquetRelation is now backed by a directory, which simplifies importing data from other
  sources
- InsertIntoParquetTable operation now supports switching between overwriting or appending (at least in
  HiveQL)
- tests now use the new API
- Parquet logging can be set to WARNING level (Default)
- Default compression for Parquet files (GZIP, as in parquet-mr)

Author: Andre Schumacher <andre.schumacher@iki.fi>

Closes #195 from AndreSchumacher/spark_parquet_improvements and squashes the following commits:

54df314 [Andre Schumacher] SPARK-1383 [SQL] Improvements to ParquetRelation
2014-04-03 15:31:47 -07:00
Michael Armbrust 47ebea5468 [SQL] SPARK-1364 Improve datatype and test coverage for ScalaReflection schema inference.
Author: Michael Armbrust <michael@databricks.com>

Closes #293 from marmbrus/reflectTypes and squashes the following commits:

f54e8e8 [Michael Armbrust] Improve datatype and test coverage for ScalaReflection schema inference.
2014-04-02 18:14:31 -07:00
Reynold Xin ed730c9502 StopAfter / TopK related changes
1. Renamed StopAfter to Limit to be more consistent with naming in other relational databases.
2. Renamed TopK to TakeOrdered to be more consistent with Spark RDD API.
3. Avoid breaking lineage in Limit.
4. Added a bunch of override's to execution/basicOperators.scala.

@marmbrus @liancheng

Author: Reynold Xin <rxin@apache.org>
Author: Michael Armbrust <michael@databricks.com>

Closes #233 from rxin/limit and squashes the following commits:

13eb12a [Reynold Xin] Merge pull request #1 from marmbrus/limit
92b9727 [Michael Armbrust] More hacks to make Maps serialize with Kryo.
4fc8b4e [Reynold Xin] Merge branch 'master' of github.com:apache/spark into limit
87b7d37 [Reynold Xin] Use the proper serializer in limit.
9b79246 [Reynold Xin] Updated doc for Limit.
47d3327 [Reynold Xin] Copy tuples in Limit before shuffle.
231af3a [Reynold Xin] Limit/TakeOrdered: 1. Renamed StopAfter to Limit to be more consistent with naming in other relational databases. 2. Renamed TopK to TakeOrdered to be more consistent with Spark RDD API. 3. Avoid breaking lineage in Limit. 4. Added a bunch of override's to execution/basicOperators.scala.
2014-04-02 12:48:04 -07:00
Michael Armbrust f5c418da04 [SQL] SPARK-1372 Support for caching and uncaching tables in a SQLContext.
This doesn't yet support different databases in Hive (though you can probably workaround this by calling `USE <dbname>`).  However, given the time constraints for 1.0 I think its probably worth including this now and extending the functionality in the next release.

Author: Michael Armbrust <michael@databricks.com>

Closes #282 from marmbrus/cacheTables and squashes the following commits:

83785db [Michael Armbrust] Support for caching and uncaching tables in a SQLContext.
2014-04-01 14:45:44 -07:00
Michael Armbrust 5731af5be6 [SQL] Rewrite join implementation to allow streaming of one relation.
Before we were materializing everything in memory.  This also uses the projection interface so will be easier to plug in code gen (its ported from that branch).

@rxin @liancheng

Author: Michael Armbrust <michael@databricks.com>

Closes #250 from marmbrus/hashJoin and squashes the following commits:

1ad873e [Michael Armbrust] Change hasNext logic back to the correct version.
8e6f2a2 [Michael Armbrust] Review comments.
1e9fb63 [Michael Armbrust] style
bc0cb84 [Michael Armbrust] Rewrite join implementation to allow streaming of one relation.
2014-03-31 15:23:46 -07:00
jerryshao 95d7d2a3fc [SPARK-1354][SQL] Add tableName as a qualifier for SimpleCatelogy
Fix attribute unresolved when query with table name as a qualifier in SQLContext with SimplCatelog, details please see [SPARK-1354](https://issues.apache.org/jira/browse/SPARK-1354?jql=project%20%3D%20SPARK).

Author: jerryshao <saisai.shao@intel.com>

Closes #272 from jerryshao/qualifier-fix and squashes the following commits:

7950170 [jerryshao] Add tableName as a qualifier for SimpleCatelogy
2014-03-30 10:04:28 -07:00
Cheng Hao af3746ce0d Implement the RLike & Like in catalyst
This PR includes:
1) Unify the unit test for expression evaluation
2) Add implementation of RLike & Like

Author: Cheng Hao <hao.cheng@intel.com>

Closes #224 from chenghao-intel/string_expression and squashes the following commits:

84f72e9 [Cheng Hao] fix bug in RLike/Like & Simplify the unit test
aeeb1d7 [Cheng Hao] Simplify the implementation/unit test of RLike/Like
319edb7 [Cheng Hao] change to spark code style
91cfd33 [Cheng Hao] add implementation for rlike/like
2c8929e [Cheng Hao] Update the unit test for expression evaluation
2014-03-29 15:12:43 -07:00