Commit graph

10723 commits

Author SHA1 Message Date
Yanbo Liang 10c78607b2 [SPARK-6496] [MLLIB] GeneralizedLinearAlgorithm.run(input, initialWeights) should initialize numFeatures
In GeneralizedLinearAlgorithm ```numFeatures``` is default to -1, we need to update it to correct value when we call run() to train a model.
```LogisticRegressionWithLBFGS.run(input)``` works well, but when we call ```LogisticRegressionWithLBFGS.run(input, initialWeights)``` to train multiclass classification model, it will throw exception due to the numFeatures is not updated.
In this PR, we just update numFeatures at the beginning of GeneralizedLinearAlgorithm.run(input, initialWeights) and add test case.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #5167 from yanboliang/spark-6496 and squashes the following commits:

8131c48 [Yanbo Liang] LogisticRegressionWithLBFGS.run(input, initialWeights) should initialize numFeatures
2015-03-25 17:05:56 +00:00
zzcclp 64262ed999 [SPARK-6483][SQL]Improve ScalaUdf called performance.
As issue [SPARK-6483](https://issues.apache.org/jira/browse/SPARK-6483) description, ScalaUdf is low performance because of calling *asInstanceOf* to convert per record.
With this, the performance of ScalaUdf is the same as other case.
thank lianhuiwang for telling me how to resolve this problem.

Author: zzcclp <xm_zzc@sina.com>

Closes #5154 from zzcclp/SPARK-6483 and squashes the following commits:

5ac6e09 [zzcclp] Add a newline at the end of source file
cc6868e [zzcclp] Fix for fail on unit test.
0a8cdc3 [zzcclp] indention issue
b73836a [zzcclp] Access Seq[Expression] element by :: operator, and update the code gen script.
7763848 [zzcclp] rebase from master
2015-03-25 19:11:04 +08:00
Bill Chambers c5cc41468e [DOCUMENTATION]Fixed Missing Type Import in Documentation
Needed to import the types specifically, not the more general pyspark.sql

Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Author: anabranch <wac.chambers@gmail.com>

Closes #5179 from anabranch/master and squashes the following commits:

8fa67bf [anabranch] Corrected SqlContext Import
603b080 [Bill Chambers] [DOCUMENTATION]Fixed Missing Type Import in Documentation
2015-03-24 22:24:35 -07:00
Xiangrui Meng c14ddd97ed [SPARK-6515] update OpenHashSet impl
Though I don't see any bug in the existing code, the update in this PR makes it read better. rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #5176 from mengxr/SPARK-6515 and squashes the following commits:

134494d [Xiangrui Meng] update OpenHashSet impl
2015-03-24 18:58:27 -07:00
Reynold Xin 94598653bc [SPARK-6428][Streaming] Added explicit types for all public methods.
Author: Reynold Xin <rxin@databricks.com>

Closes #5110 from rxin/streaming-explicit-type and squashes the following commits:

2c2db32 [Reynold Xin] [SPARK-6428][Streaming] Added explicit types for all public methods.
2015-03-24 17:08:25 -07:00
Xiangrui Meng 6930e965e2 [SPARK-6512] add contains to OpenHashMap
Add `contains` to test whether a key exists in an OpenHashMap. rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #5171 from mengxr/openhashmap-contains and squashes the following commits:

d6e6f1f [Xiangrui Meng] add contains to primitivekeyopenhashmap
748a69b [Xiangrui Meng] add contains to OpenHashMap
2015-03-24 17:06:22 -07:00
Christophe Préaud 05c2214b41 [SPARK-6469] Improving documentation on YARN local directories usage
Clarify the local directories usage in YARN

Author: Christophe Préaud <christophe.preaud@kelkoo.com>

Closes #5165 from preaudc/yarn-doc-local-dirs and squashes the following commits:

6912b90 [Christophe Préaud] Fix some formatting issues.
4fa8ec2 [Christophe Préaud] Merge remote-tracking branch 'upstream/master' into yarn-doc-local-dirs
eaaf519 [Christophe Préaud] Clarify the local directories usage in YARN
436fb7d [Christophe Préaud] Revert "Clarify the local directories usage in YARN"
876ae5e [Christophe Préaud] Clarify the local directories usage in YARN
608dbfa [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
a49a2ce [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
9ba89ca [Christophe Préaud] Ensure that files are fetched atomically
54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
c6a5590 [Christophe Préaud] Revert commit 8ea871f8130b2490f1bad7374a819bf56f0ccbbd
7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
8ea871f [Christophe Préaud] Ensure that files are fetched atomically
2015-03-24 17:05:49 -07:00
Andrew Or dd907d1a9d Revert "[SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called"
This reverts commit dd077abf2e.

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala
	core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
2015-03-24 16:49:27 -07:00
Andrew Or f7c3668ee6 Revert "[SPARK-5771][UI][hotfix] Change Requested Cores into * if default cores is not set"
This reverts commit 12135e9054.
2015-03-24 16:41:31 -07:00
Kay Ousterhout d8ccf655f3 [SPARK-3570] Include time to open files in shuffle write time.
Opening shuffle files can be very significant when the disk is
contended, especially when using ext3. While writing data to
a file can avoid hitting disk (and instead hit the buffer
cache), opening a file always involves writing some metadata
about the file to disk, so the open time can be a very significant
portion of the shuffle write time. In one job I ran recently, the time to
write shuffle data to the file was only 4ms for each task, but
the time to open the file was about 100x as long (~400ms).

When we add metrics about spilled data (#2504), we should ensure
that the file open time is also included there.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4550 from kayousterhout/SPARK-3570 and squashes the following commits:

ea3a4ae [Kay Ousterhout] Added comment about excluded open time
fdc5185 [Kay Ousterhout] Improved comment
42b7e43 [Kay Ousterhout] Fixed parens for nanotime
2423555 [Kay Ousterhout] [SPARK-3570] Include time to open files in shuffle write time.
2015-03-24 16:29:40 -07:00
Kay Ousterhout 6948ab6f8b [SPARK-6088] Correct how tasks that get remote results are shown in UI.
It would be great to fix this for 1.3. since the fix is surgical and it helps understandability for users.

cc shivaram pwendell

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4839 from kayousterhout/SPARK-6088 and squashes the following commits:

3ab012c [Kay Ousterhout] Update getting result time incrementally, correctly set GET_RESULT status
f346b49 [Kay Ousterhout] Typos
748ea6b [Kay Ousterhout] Fixed build failure
84d617c [Kay Ousterhout] [SPARK-6088] Correct how tasks that get remote results are shown in the UI.
2015-03-24 16:26:43 -07:00
Reynold Xin 73348012d4 [SPARK-6428][SQL] Added explicit types for all public methods in catalyst
I think after this PR, we can finally turn the rule on. There are still some smaller ones that need to be fixed, but those are easier.

Author: Reynold Xin <rxin@databricks.com>

Closes #5162 from rxin/catalyst-explicit-types and squashes the following commits:

e7eac03 [Reynold Xin] [SPARK-6428][SQL] Added explicit types for all public methods in catalyst.
2015-03-24 16:03:55 -07:00
Josh Rosen 7215aa7455 [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR)
ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang.  See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources.  It also adds logging for unexpected error cases.

This PR is an extended version of #4935 and adds a regression test.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4944 from JoshRosen/executorclassloader-leak-master-branch and squashes the following commits:

e0e3c25 [Josh Rosen] Wrap try block around getReponseCode; re-enable keep-alive by closing error stream
961c284 [Josh Rosen] Roll back changes that were added to get the regression test to fail
7ee2261 [Josh Rosen] Add a failing regression test
e2d70a3 [Josh Rosen] Properly clean up after errors in ExecutorClassLoader
2015-03-24 14:38:20 -07:00
Michael Armbrust a8f51b8296 [SPARK-6458][SQL] Better error messages for invalid data sources
Avoid unclear match errors and use `AnalysisException`.

Author: Michael Armbrust <michael@databricks.com>

Closes #5158 from marmbrus/dataSourceError and squashes the following commits:

af9f82a [Michael Armbrust] Yins comment
90c6ba4 [Michael Armbrust] Better error messages for invalid data sources
2015-03-24 14:10:56 -07:00
Michael Armbrust cbeaf9ebab [SPARK-6376][SQL] Avoid eliminating subqueries until optimization
Previously it was okay to throw away subqueries after analysis, as we would never try to use that tree for resolution again.  However, with eager analysis in `DataFrame`s this can cause errors for queries such as:

```scala
val df = Seq(1,2,3).map(i => (i, i.toString)).toDF("int", "str")
df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count()
```

As a result, in this PR we defer the elimination of subqueries until the optimization phase.

Author: Michael Armbrust <michael@databricks.com>

Closes #5160 from marmbrus/subqueriesInDfs and squashes the following commits:

a9bb262 [Michael Armbrust] Update Optimizer.scala
27d25bf [Michael Armbrust] fix hive tests
9137e03 [Michael Armbrust] add type
81cd597 [Michael Armbrust] Avoid eliminating subqueries until optimization
2015-03-24 14:08:20 -07:00
Michael Armbrust 046c1e2aa4 [SPARK-6375][SQL] Fix formatting of error messages.
Author: Michael Armbrust <michael@databricks.com>

Closes #5155 from marmbrus/errorMessages and squashes the following commits:

b898188 [Michael Armbrust] Fix formatting of error messages.
2015-03-24 13:22:46 -07:00
Michael Armbrust 3fa3d121df [SPARK-6054][SQL] Fix transformations of TreeNodes that hold StructTypes
Due to a recent change that made `StructType` a `Seq` we started inadvertently turning `StructType`s into generic `Traversable` when attempting nested tree transformations.  In this PR we explicitly avoid descending into `DataType`s to avoid this bug.

Author: Michael Armbrust <michael@databricks.com>

Closes #5157 from marmbrus/udfFix and squashes the following commits:

26f7087 [Michael Armbrust] Fix transformations of TreeNodes that hold StructTypes
2015-03-24 12:28:01 -07:00
Michael Armbrust 26c6ce3d29 [SPARK-6437][SQL] Use completion iterator to close external sorter
Otherwise we will leak files when spilling occurs.

Author: Michael Armbrust <michael@databricks.com>

Closes #5161 from marmbrus/cleanupAfterSort and squashes the following commits:

cb13d3c [Michael Armbrust] hint to inferencer
cdebdf5 [Michael Armbrust] Use completion iterator to close external sorter
2015-03-24 12:10:30 -07:00
Michael Armbrust 32efadd050 [SPARK-6459][SQL] Warn when constructing trivially true equals predicate
For example, one might expect the following code to work, but it does not.  Now you will at least get a warning with a suggestion to use aliases.

```scala
val df = sqlContext.load(path, "parquet")
val txns = df.groupBy("cust_id").agg($"cust_id", countDistinct($"day_num").as("txns"))
val spend = df.groupBy("cust_id").agg($"cust_id", sum($"extended_price").as("spend"))
val rmJoin = txns.join(spend, txns("cust_id") === spend("cust_id"), "inner")
```

Author: Michael Armbrust <michael@databricks.com>

Closes #5163 from marmbrus/selfJoinError and squashes the following commits:

16c1f0b [Michael Armbrust] fix visibility
1b57e8d [Michael Armbrust] Warn when constructing trivially true equals predicate
2015-03-24 12:09:02 -07:00
Xiangrui Meng 6bdddb6f6f [SPARK-6361][SQL] support adding a column with metadata in DF
This is used by ML pipelines to embed ML attributes in columns created by ML transformers/estimators. marmbrus

Author: Xiangrui Meng <meng@databricks.com>

Closes #5151 from mengxr/SPARK-6361 and squashes the following commits:

bb30de3 [Xiangrui Meng] support adding a column with metadata in DF
2015-03-24 12:08:19 -07:00
Xiangrui Meng a1d1529dae [SPARK-6475][SQL] recognize array types when infer data types from JavaBeans
Right now if there is a array field in a JavaBean, the user wold see an exception in `createDataFrame`. liancheng

Author: Xiangrui Meng <meng@databricks.com>

Closes #5146 from mengxr/SPARK-6475 and squashes the following commits:

51e87e5 [Xiangrui Meng] validate schemas
4f2df5e [Xiangrui Meng] recognize array types when infer data types from JavaBeans
2015-03-24 10:11:27 -07:00
Peter Rudenko 08d4528011 [ML][docs][minor] Define LabeledDocument/Document classes in CV example
To easier copy/paste Cross-Validation example code snippet need to define LabeledDocument/Document in it, since they difined in a previous example.

Author: Peter Rudenko <petro.rudenko@gmail.com>

Closes #5135 from petro-rudenko/patch-3 and squashes the following commits:

5190c75 [Peter Rudenko] Fix primitive types for java examples.
1d35383 [Peter Rudenko] [SQL][docs][minor] Define LabeledDocument/Document classes in CV example
2015-03-24 16:33:38 +00:00
Kousuke Saruta 85cf063682 [SPARK-5559] [Streaming] [Test] Remove oppotunity we met flakiness when running FlumeStreamSuite
When we run FlumeStreamSuite on Jenkins, sometimes we get error like as follows.

    sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 52 times over 10.094849836 seconds. Last failure message: Error connecting to localhost/127.0.0.1:23456.
	    at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
	    at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
	    at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
	    at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
	   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
	   at org.apache.spark.streaming.flume.FlumeStreamSuite.writeAndVerify(FlumeStreamSuite.scala:116)
           at org.apache.spark.streaming.flume.FlumeStreamSuite.org$apache$spark$streaming$flume$FlumeStreamSuite$$testFlumeStream(FlumeStreamSuite.scala:74)
	   at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply$mcV$sp(FlumeStreamSuite.scala:66)
	    at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
	    at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
	    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	    at org.scalatest.Transformer.apply(Transformer.scala:22)
	    at org.scalatest.Transformer.apply(Transformer.scala:20)
    	    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
	    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
	    at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	    at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)

This error is caused by check-then-act logic  when it find free-port .

      /** Find a free port */
      private def findFreePort(): Int = {
        Utils.startServiceOnPort(23456, (trialPort: Int) => {
          val socket = new ServerSocket(trialPort)
          socket.close()
          (null, trialPort)
        }, conf)._2
      }

Removing the check-then-act is not easy but we can reduce the chance of having the error by choosing random value for initial port instead of 23456.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4337 from sarutak/SPARK-5559 and squashes the following commits:

16f109f [Kousuke Saruta] Added `require` to Utils#startServiceOnPort
c39d8b6 [Kousuke Saruta] Merge branch 'SPARK-5559' of github.com:sarutak/spark into SPARK-5559
1610ba2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
33357e3 [Kousuke Saruta] Changed "findFreePort" method in MQTTStreamSuite and FlumeStreamSuite so that it can choose valid random port
a9029fe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
9489ef9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
8212e42 [Kousuke Saruta] Modified default port used in FlumeStreamSuite from 23456 to random value
2015-03-24 16:20:52 +00:00
Marcelo Vanzin b293afc42c [SPARK-6473] [core] Do not try to figure out Scala version if not needed...
....

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5143 from vanzin/SPARK-6473 and squashes the following commits:

a2e5e2d [Marcelo Vanzin] [SPARK-6473] [core] Do not try to figure out Scala version if not needed.
2015-03-24 13:48:33 +00:00
Cong Yue c12312f8b1 Update the command to use IPython notebook
As for "notebook --pylab inline" is not supported any more, update the related documentation for this.

Author: Cong Yue <yuecong1104@gmail.com>

Closes #5111 from yuecong/patch-1 and squashes the following commits:

872df76 [Cong Yue] Update the command to use IPython notebook
2015-03-24 12:58:58 +00:00
Brennon York 37fac1dcd2 [SPARK-6477][Build]: Run MIMA tests before the Spark test suite
This moves the MIMA checks to before the full Spark test suite such that, if new PR's fail the MIMA check, they will return much faster having not run the entire test suite. This is preferable to the current scenario where a user would have to wait until the entire test suite completes before realizing it failed on a MIMA check in which case, once the MIMA issues are fixed, the user would have to resubmit and rerun the full test suite again.

Author: Brennon York <brennon.york@capitalone.com>

Closes #5145 from brennonyork/SPARK-6477 and squashes the following commits:

12b0aee [Brennon York] updated to put the mima checks before the spark test suite
2015-03-24 10:33:04 +00:00
Cheng Lian 1afcf773d0 [SPARK-6452] [SQL] Checks for missing attributes and unresolved operator for all types of operator
In `CheckAnalysis`, `Filter` and `Aggregate` are checked in separate case clauses, thus never hit those clauses for unresolved operators and missing input attributes.

This PR also removes the `prettyString` call when generating error message for missing input attributes. Because result of `prettyString` doesn't contain expression ID, and may give confusing messages like

> resolved attributes a missing from a

cc rxin

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5129)
<!-- Reviewable:end -->

Author: Cheng Lian <lian@databricks.com>

Closes #5129 from liancheng/spark-6452 and squashes the following commits:

52cdc69 [Cheng Lian] Addresses comments
029f9bd [Cheng Lian] Checks for missing attributes and unresolved operator for all types of operator
2015-03-24 01:12:11 -07:00
Reynold Xin 4ce2782a61 [SPARK-6428] Added explicit types for all public methods in core.
Author: Reynold Xin <rxin@databricks.com>

Closes #5125 from rxin/core-explicit-type and squashes the following commits:

f471415 [Reynold Xin] Revert style checker changes.
81b66e4 [Reynold Xin] Code review feedback.
a7533e3 [Reynold Xin] Mima excludes.
1d795f5 [Reynold Xin] [SPARK-6428] Added explicit types for all public methods in core.
2015-03-23 23:41:06 -07:00
Volodymyr Lyubinets bfd3ee9f76 [SPARK-6124] Support jdbc connection properties in OPTIONS part of the query
One more thing if this PR is considered to be OK - it might make sense to add extra .jdbc() API's that take Properties to SQLContext.

Author: Volodymyr Lyubinets <vlyubin@gmail.com>

Closes #4859 from vlyubin/jdbcProperties and squashes the following commits:

7a8cfda [Volodymyr Lyubinets] Support jdbc connection properties in OPTIONS part of the query
2015-03-23 17:00:27 -07:00
Patrick Wendell 6cd7058b36 Revert "[SPARK-6122][Core] Upgrade Tachyon client version to 0.6.1."
This reverts commit a41b9c6004.
2015-03-23 15:08:39 -07:00
MechCoder 474d1320c9 [SPARK-6308] [MLlib] [Sql] Override TypeName in VectorUDT and MatrixUDT
Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #5118 from MechCoder/spark-6308 and squashes the following commits:

6c8ffab [MechCoder] Add test for simpleString
b966242 [MechCoder] [SPARK-6308] [MLlib][Sql] VectorUDT is displayed as vecto in dtypes
2015-03-23 13:30:21 -07:00
Yadong Qi 9f3273bd9c [SPARK-6397][SQL] Check the missingInput simply
https://github.com/apache/spark/pull/5082

/cc liancheng

Author: Yadong Qi <qiyadong2010@gmail.com>

Closes #5132 from watermen/sql-missingInput-new and squashes the following commits:

1e5bdc5 [Yadong Qi] Check the missingInput simply
2015-03-23 18:16:49 +08:00
Cheng Lian bf044def4c Revert "[SPARK-6397][SQL] Check the missingInput simply"
This reverts commit e566fe5982.
2015-03-23 12:15:19 +08:00
q00251598 e566fe5982 [SPARK-6397][SQL] Check the missingInput simply
Author: q00251598 <qiyadong@huawei.com>

Closes #5082 from watermen/sql-missingInput and squashes the following commits:

25766b9 [q00251598] Check the missingInput simply
2015-03-23 12:06:13 +08:00
Daoyuan Wang 4659468f36 [SPARK-4985] [SQL] parquet support for date type
This PR might have some issues with #3732 ,
and this would have merge conflicts with #3820 so the review can be delayed till that 2 were merged.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3822 from adrian-wang/parquetdate and squashes the following commits:

2c5d54d [Daoyuan Wang] add a test case
faef887 [Daoyuan Wang] parquet support for primitive date
97e9080 [Daoyuan Wang] parquet support for date type
2015-03-23 11:46:16 +08:00
vinodkc 2bf40c58e6 [SPARK-6337][Documentation, SQL]Spark 1.3 doc fixes
Author: vinodkc <vinod.kc.in@gmail.com>

Closes #5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits:

2c6aee6 [vinodkc] Spark 1.3 doc fixes
2015-03-22 20:00:08 +00:00
Reynold Xin 7a0da47708 [HOTFIX] Build break due to https://github.com/apache/spark/pull/5128 2015-03-22 12:08:15 -07:00
Calvin Jia a41b9c6004 [SPARK-6122][Core] Upgrade Tachyon client version to 0.6.1.
Changes the Tachyon client version from 0.5 to 0.6 in spark core and distribution script.

New dependencies in Tachyon 0.6.0 include

commons-codec:commons-codec:jar:1.5:compile
io.netty:netty-all:jar:4.0.23.Final:compile

These are already in spark core.

Author: Calvin Jia <jia.calvin@gmail.com>

Closes #4867 from calvinjia/upgrade_tachyon_0.6.0 and squashes the following commits:

eed9230 [Calvin Jia] Update tachyon version to 0.6.1.
11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings.
71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
2015-03-22 11:11:29 -07:00
Kamil Smuga 6ef48632fb SPARK-6454 [DOCS] Fix links to pyspark api
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>

Closes #5120 from kamilsmuga/master and squashes the following commits:

fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
2015-03-22 15:56:25 +00:00
Jongyoul Lee adb2ff752f [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes
- Moved Suites from o.a.s.s.mesos to o.a.s.s.cluster.mesos

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #5126 from jongyoul/SPARK-6453 and squashes the following commits:

4f24a3e [Jongyoul Lee] [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes - Fixed imports orders
8ab149d [Jongyoul Lee] [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes - Moved Suites from o.a.s.s.mesos to o.a.s.s.cluster.mesos
2015-03-22 15:54:19 +00:00
Hangchen Yu ab4f516fbe [SPARK-6455] [docs] Correct some mistakes and typos
Correct some typos. Correct a mistake in lib/PageRank.scala. The first PageRank implementation uses standalone Graph interface, but the second uses Pregel interface. It may mislead the code viewers.

Author: Hangchen Yu <yuhc@gitcafe.com>

Closes #5128 from yuhc/master and squashes the following commits:

53e5432 [Hangchen Yu] Merge branch 'master' of https://github.com/yuhc/spark
67b77b5 [Hangchen Yu] [SPARK-6455] [docs] Correct some mistakes and typos
206f2dc [Hangchen Yu] Correct some mistakes and typos.
2015-03-22 15:51:10 +00:00
Ryan Williams b9fe504b49 [SPARK-6448] Make history server log parse exceptions
This helped me to debug a parse error that was due to the event log format changing recently.

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #5122 from ryan-williams/histerror and squashes the following commits:

5831656 [Ryan Williams] line length
c3742ae [Ryan Williams] Make history server log parse exceptions
2015-03-22 11:54:23 +00:00
ypcat 9b1e1f20d4 [SPARK-6408] [SQL] Fix JDBCRDD filtering string literals
Author: ypcat <ypcat6@gmail.com>
Author: Pei-Lun Lee <pllee@appier.com>

Closes #5087 from ypcat/spark-6408 and squashes the following commits:

1becc16 [ypcat] [SPARK-6408] [SQL] styling
1bc4455 [ypcat] [SPARK-6408] [SQL] move nested function outside
e57fa4a [ypcat] [SPARK-6408] [SQL] fix test case
245ab6f [ypcat] [SPARK-6408] [SQL] add test cases for filtering quoted strings
8962534 [Pei-Lun Lee] [SPARK-6408] [SQL] Fix filtering string literals
2015-03-22 15:49:13 +08:00
Reynold Xin b6090f902e [SPARK-6428][SQL] Added explicit type for all public methods for Hive module
Author: Reynold Xin <rxin@databricks.com>

Closes #5108 from rxin/hive-public-type and squashes the following commits:

a320328 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods for Hive module.
2015-03-21 14:30:04 -07:00
Yin Huai 94a102acb8 [SPARK-6250][SPARK-6146][SPARK-5911][SQL] Types are now reserved words in DDL parser.
This PR creates a trait `DataTypeParser` used to parse data types. This trait aims to be single place to provide the functionality of parsing data types' string representation. It is currently mixed in with `DDLParser` and `SqlParser`. It is also used to parse the data type for `DataFrame.cast` and to convert Hive metastore's data type string back to a `DataType`.

JIRA: https://issues.apache.org/jira/browse/SPARK-6250

Author: Yin Huai <yhuai@databricks.com>

Closes #5078 from yhuai/ddlKeywords and squashes the following commits:

0e66097 [Yin Huai] Special handle struct<>.
fea6012 [Yin Huai] Style.
c9733fb [Yin Huai] Create a trait to parse data types.
2015-03-21 13:27:53 -07:00
Venkata Ramana Gollamudi ee569a0c71 [SPARK-5680][SQL] Sum function on all null values, should return zero
SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src;
Should give output as
0.0	NULL	NULL	NULL
This fixes hive udaf_number_format.q

Author: Venkata Ramana G <ramana.gollamudihuawei.com>

Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>

Closes #4466 from gvramana/sum_fix and squashes the following commits:

42e14d1 [Venkata Ramana Gollamudi] Added comments
39415c0 [Venkata Ramana Gollamudi] Handled the partitioned Sum expression scenario
df66515 [Venkata Ramana Gollamudi] code style fix
4be2606 [Venkata Ramana Gollamudi] Add udaf_number_format to whitelist and golden answer
330fd64 [Venkata Ramana Gollamudi] fix sum function for all null data
2015-03-21 13:24:24 -07:00
x1- 52dd4b2b27 [SPARK-5320][SQL]Add statistics method at NoRelation (override super).
Because of no statistics override, in spute of super class say 'LeafNode must override'.
fix issue

[SPARK-5320: Joins on simple table created using select gives error](https://issues.apache.org/jira/browse/SPARK-5320)

Author: x1- <viva008@gmail.com>

Closes #5105 from x1-/SPARK-5320 and squashes the following commits:

e561aac [x1-] Add statistics method at NoRelation (override super).
2015-03-21 13:22:34 -07:00
Yanbo Liang e5d2c37c68 [SPARK-5821] [SQL] JSON CTAS command should throw error message when delete path failure
When using "CREATE TEMPORARY TABLE AS SELECT" to create JSON table, we first delete the path file or directory and then generate a new directory with the same name. But if only read permission was granted, the delete failed.
Here we just throwing an error message to let users know what happened.
ParquetRelation2 may also hit this problem. I think to restrict JSONRelation and ParquetRelation2 must base on directory is more reasonable for access control. Maybe I can do it in follow up works.

Author: Yanbo Liang <ybliang8@gmail.com>
Author: Yanbo Liang <yanbohappy@gmail.com>

Closes #4610 from yanboliang/jsonInsertImprovements and squashes the following commits:

c387fce [Yanbo Liang] fix typos
42d7fb6 [Yanbo Liang] add unittest & fix output format
46f0d9d [Yanbo Liang] Update JSONRelation.scala
e2df8d5 [Yanbo Liang] check path exisit when write
79f7040 [Yanbo Liang] Update JSONRelation.scala
e4bc229 [Yanbo Liang] Update JSONRelation.scala
5a42d83 [Yanbo Liang] JSONRelation CTAS should check if delete is successful
2015-03-21 11:23:28 +08:00
Cheng Lian 937c1e5503 [SPARK-6315] [SQL] Also tries the case class string parser while reading Parquet schema
When writing Parquet files, Spark 1.1.x persists the schema string into Parquet metadata with the result of `StructType.toString`, which was then deprecated in Spark 1.2 by a schema string in JSON format. But we still need to take the old schema format into account while reading Parquet files.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5034)
<!-- Reviewable:end -->

Author: Cheng Lian <lian@databricks.com>

Closes #5034 from liancheng/spark-6315 and squashes the following commits:

a182f58 [Cheng Lian] Adds a regression test
b9c6dbe [Cheng Lian] Also tries the case class string parser while reading Parquet schema
2015-03-21 11:18:45 +08:00
Yanbo Liang bc37c9743e [SPARK-5821] [SQL] ParquetRelation2 CTAS should check if delete is successful
Do the same check as #4610 for ParquetRelation2.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #5107 from yanboliang/spark-5821-parquet and squashes the following commits:

7092c8d [Yanbo Liang] ParquetRelation2 CTAS should check if delete is successful
2015-03-21 10:53:04 +08:00