Commit graph

9762 commits

Author SHA1 Message Date
gasparms f80e2629bb [SPARK-5800] Streaming Docs. Change linked files according the selected language
Currently, Spark Streaming Programming Guide after updateStateByKey  explanation links to file stateful_network_wordcount.py and note "For the complete Scala code ..." for any language tab selected. This is an incoherence.

I've changed the guide and link its pertinent example file. JavaStatefulNetworkWordCount.java example was not created so I added to the commit.

Author: gasparms <gmunoz@stratio.com>

Closes #4589 from gasparms/feature/streaming-guide and squashes the following commits:

7f37f89 [gasparms] More style changes
ec202b0 [gasparms] Follow spark style guide
f527328 [gasparms] Improve example to look like scala example
4d8785c [gasparms] Remove throw exception
e92e6b8 [gasparms] Fix incoherence
92db405 [gasparms] Fix Streaming Programming Guide. Change files according the selected language
2015-02-14 20:10:29 +00:00
Reynold Xin e98dfe627c [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
- The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
- toDataFrame -> toDF
- Dsl -> functions
- implicits moved into SQLContext.implicits
- addColumn -> withColumn
- renameColumn -> withColumnRenamed

Python changes:
- toDataFrame -> toDF
- Dsl -> functions package
- addColumn -> withColumn
- renameColumn -> withColumnRenamed
- add toDF functions to RDD on SQLContext init
- add flatMap to DataFrame

Author: Reynold Xin <rxin@databricks.com>
Author: Davies Liu <davies@databricks.com>

Closes #4556 from rxin/SPARK-5752 and squashes the following commits:

5ef9910 [Reynold Xin] More fix
61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
ff5832c [Reynold Xin] Fix python
749c675 [Reynold Xin] count(*) fixes.
5806df0 [Reynold Xin] Fix build break again.
d941f3d [Reynold Xin] Fixed explode compilation break.
fe1267a [Davies Liu] flatMap
c4afb8e [Reynold Xin] style
d9de47f [Davies Liu] add comment
b783994 [Davies Liu] add comment for toDF
e2154e5 [Davies Liu] schema() -> schema
3a1004f [Davies Liu] Dsl -> functions, toDF()
fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
97dd47c [Davies Liu] fix mistake
6168f74 [Davies Liu] fix test
1fc0199 [Davies Liu] fix test
a075cd5 [Davies Liu] clean up, toPandas
663d314 [Davies Liu] add test for agg('*')
9e214d5 [Reynold Xin] count(*) fixes.
1ed7136 [Reynold Xin] Fix build break again.
921b2e3 [Reynold Xin] Fixed explode compilation break.
14698d4 [Davies Liu] flatMap
ba3e12d [Reynold Xin] style
d08c92d [Davies Liu] add comment
5c8b524 [Davies Liu] add comment for toDF
a4e5e66 [Davies Liu] schema() -> schema
d377fc9 [Davies Liu] Dsl -> functions, toDF()
6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
2015-02-13 23:03:22 -08:00
Sean Owen 0ce4e430a8 SPARK-3290 [GRAPHX] No unpersist callls in SVDPlusPlus
This just unpersist()s each RDD in this code that was cache()ed.

Author: Sean Owen <sowen@cloudera.com>

Closes #4234 from srowen/SPARK-3290 and squashes the following commits:

66c1e11 [Sean Owen] unpersist() each RDD that was cache()ed
2015-02-13 20:12:52 -08:00
Josh Rosen d06d5ee9b3 [SPARK-5227] [SPARK-5679] Disable FileSystem cache in WholeTextFileRecordReaderSuite
This patch fixes two difficult-to-reproduce Jenkins test failures in InputOutputMetricsSuite (SPARK-5227 and SPARK-5679).  The problem was that WholeTextFileRecordReaderSuite modifies the `fs.local.block.size` Hadoop configuration and this change was affecting subsequent test suites due to Hadoop's caching of FileSystem instances (see HADOOP-8490 for more details).

The fix implemented here is to disable FileSystem caching in WholeTextFileRecordReaderSuite.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4599 from JoshRosen/inputoutputsuite-fix and squashes the following commits:

47dc447 [Josh Rosen] [SPARK-5227] [SPARK-5679] Disable FileSystem cache in WholeTextFileRecordReaderSuite
2015-02-13 17:45:31 -08:00
Xiangrui Meng 4f4c6d5a5d [SPARK-5730][ML] add doc groups to spark.ml components
This PR adds three groups to the ScalaDoc: `param`, `setParam`, and `getParam`. Params will show up in the generated Scala API doc as the top group. Setters/getters will be at the bottom.

Preview:

![screen shot 2015-02-13 at 2 47 49 pm](https://cloud.githubusercontent.com/assets/829644/6196657/5740c240-b38f-11e4-94bb-bd8ef5a796c5.png)

Author: Xiangrui Meng <meng@databricks.com>

Closes #4600 from mengxr/SPARK-5730 and squashes the following commits:

febed9a [Xiangrui Meng] add doc groups to spark.ml components
2015-02-13 16:45:59 -08:00
Xiangrui Meng d50a91d529 [SPARK-5803][MLLIB] use ArrayBuilder to build primitive arrays
because ArrayBuffer is not specialized.

Author: Xiangrui Meng <meng@databricks.com>

Closes #4594 from mengxr/SPARK-5803 and squashes the following commits:

1261bd5 [Xiangrui Meng] merge master
a4ea872 [Xiangrui Meng] use ArrayBuilder to build primitive arrays
2015-02-13 16:43:49 -08:00
Xiangrui Meng cc56c8729a [SPARK-5806] re-organize sections in mllib-clustering.md
Put example code close to the algorithm description.

Author: Xiangrui Meng <meng@databricks.com>

Closes #4598 from mengxr/SPARK-5806 and squashes the following commits:

a137872 [Xiangrui Meng] re-organize sections in mllib-clustering.md
2015-02-13 15:09:27 -08:00
Yin Huai 2e0c084528 [SPARK-5789][SQL]Throw a better error message if JsonRDD.parseJson encounters unrecoverable parsing errors.
Author: Yin Huai <yhuai@databricks.com>

Closes #4582 from yhuai/jsonErrorMessage and squashes the following commits:

152dbd4 [Yin Huai] Update error message.
1466256 [Yin Huai] Throw a better error message when a JSON object in the input dataset span multiple records (lines for files or strings for an RDD of strings).
2015-02-13 13:51:06 -08:00
Daoyuan Wang 2cbb3e433a [SPARK-5642] [SQL] Apply column pruning on unused aggregation fields
select k from (select key k, max(value) v from src group by k) t

Author: Daoyuan Wang <daoyuan.wang@intel.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #4415 from adrian-wang/groupprune and squashes the following commits:

5d2d8a3 [Daoyuan Wang] address Michael's comments
61f8ef7 [Daoyuan Wang] add a unit test
80ddcc6 [Daoyuan Wang] keep project
b69d385 [Daoyuan Wang] add a prune rule for grouping set
2015-02-13 13:48:39 -08:00
Andrew Or 5d3cc6b3d7 [HOTFIX] Fix build break in MesosSchedulerBackendSuite 2015-02-13 13:10:29 -08:00
Reynold Xin 378c7eb0d6 [HOTFIX] Ignore DirectKafkaStreamSuite. 2015-02-13 12:43:53 -08:00
Emre Sevinç 9f31db0610 SPARK-5805 Fixed the type error in documentation.
Fixes SPARK-5805 : Fix the type error in the final example given in MLlib - Clustering documentation.

Author: Emre Sevinç <emre.sevinc@gmail.com>

Closes #4596 from emres/SPARK-5805 and squashes the following commits:

1029f66 [Emre Sevinç] SPARK-5805 Fixed the type error in documentation.
2015-02-13 12:31:27 -08:00
Josh Rosen 077eec2d9d [SPARK-5735] Replace uses of EasyMock with Mockito
This patch replaces all uses of EasyMock with Mockito.  There are two motivations for this:

1. We should use a single mocking framework in our tests in order to keep things consistent.
2. EasyMock may be responsible for non-deterministic unit test failures due to its Objensis dependency (see SPARK-5626 for more details).

Most of these changes are fairly mechanical translations of Mockito code to EasyMock, although I made a small change that strengthens the assertions in one test in KinesisReceiverSuite.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4578 from JoshRosen/SPARK-5735-remove-easymock and squashes the following commits:

0ab192b [Josh Rosen] Import sorting plus two minor changes to more closely match old semantics.
977565b [Josh Rosen] Remove EasyMock from build.
fae1d8f [Josh Rosen] Remove EasyMock usage in KinesisReceiverSuite.
7cca486 [Josh Rosen] Remove EasyMock usage in MesosSchedulerBackendSuite
fc5e94d [Josh Rosen] Remove EasyMock in CacheManagerSuite
2015-02-13 09:55:36 -08:00
Ryan Williams fc6d3e796a [SPARK-5783] Better eventlog-parsing error messages
Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #4573 from ryan-williams/history and squashes the following commits:

a8647ec [Ryan Williams] fix test calls to .replay()
98aa3fe [Ryan Williams] include filename in history-parsing error message
8deecf0 [Ryan Williams] add line number to history-parsing error message
b668b52 [Ryan Williams] add log info line to history-eventlog parsing
2015-02-13 09:47:26 -08:00
sboeschhuawei e1a1ff8108 [SPARK-5503][MLLIB] Example code for Power Iteration Clustering
Author: sboeschhuawei <stephen.boesch@huawei.com>

Closes #4495 from javadba/picexamples and squashes the following commits:

3c84b14 [sboeschhuawei] PIC Examples updates from Xiangrui's comments round 5
2878675 [sboeschhuawei] Fourth round with xiangrui on PICExample
d7ac350 [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
d7f0cba [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
cef28f4 [sboeschhuawei] Further updates to PICExample from Xiangrui's comments
f7ff43d [sboeschhuawei] Update to PICExample from Xiangrui's comments
efeec45 [sboeschhuawei] Update to PICExample from Xiangrui's comments
03e8de4 [sboeschhuawei] Added PICExample
c509130 [sboeschhuawei] placeholder for pic examples
5864d4a [sboeschhuawei] placeholder for pic examples
2015-02-13 09:45:57 -08:00
uncleGen c0ccd25641 [SPARK-5732][CORE]:Add an option to print the spark version in spark script.
Naturally, we may need to add an option to print the spark version in spark script. It is pretty common in script tool.
![9](https://cloud.githubusercontent.com/assets/7402327/6183331/cab1b74e-b38e-11e4-9daa-e26e6015cff3.JPG)

Author: uncleGen <hustyugm@gmail.com>
Author: genmao.ygm <genmao.ygm@alibaba-inc.com>

Closes #4522 from uncleGen/master-clean-150211 and squashes the following commits:

9f2127c [genmao.ygm] revert the behavior of "-v"
015ddee [uncleGen] minor changes
463f02c [uncleGen] minor changes
2015-02-13 09:44:13 -08:00
WangTaoTheTonic 1768bd5143 [SPARK-4832][Deploy]some other processes might take the daemon pid
Some other processes might use the pid saved in pid file. In that case we should ignore it and launch daemons.

JIRA is down for maintenance. I will file one once it return.

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTaoTheTonic <wangtao111@huawei.com>

Closes #3683 from WangTaoTheTonic/otherproc and squashes the following commits:

daa86a1 [WangTaoTheTonic] some bash style fix
8befee7 [WangTaoTheTonic] handle the mistake scenario
cf4ecc6 [WangTaoTheTonic] remove redundant condition
f36cfb4 [WangTaoTheTonic] some other processes might take the pid
2015-02-13 10:27:23 +00:00
tianyi 1c8633f3fe [SPARK-3365][SQL]Wrong schema generated for List type
This PR fix the issue SPARK-3365.
The reason is Spark generated wrong schema for the type `List` in `ScalaReflection.scala`
for example:

the generated schema for type `Seq[String]` is:
```
{"name":"x","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}}`
```

the generated schema for type `List[String]` is:
```
{"name":"x","type":{"type":"struct","fields":[]},"nullable":true,"metadata":{}}`
```

Author: tianyi <tianyi.asiainfo@gmail.com>

Closes #4581 from tianyi/SPARK-3365 and squashes the following commits:

a097e86 [tianyi] change the order of resolution in ScalaReflection.scala
2015-02-12 22:18:39 -08:00
Yin Huai 2aea892ebd [SQL] Fix docs of SQLContext.tables
Author: Yin Huai <yhuai@databricks.com>

Closes #4579 from yhuai/tablesDoc and squashes the following commits:

7f8964c [Yin Huai] Fix doc.
2015-02-12 20:37:55 -08:00
Yin Huai 1d0596a16e [SPARK-3299][SQL]Public API in SQLContext to list tables
https://issues.apache.org/jira/browse/SPARK-3299

Author: Yin Huai <yhuai@databricks.com>

Closes #4547 from yhuai/tables and squashes the following commits:

6c8f92e [Yin Huai] Add tableNames.
acbb281 [Yin Huai] Update Python test.
7793dcb [Yin Huai] Fix scala test.
572870d [Yin Huai] Address comments.
aba2e88 [Yin Huai] Format.
12c86df [Yin Huai] Add tables() to SQLContext to return a DataFrame containing existing tables.
2015-02-12 18:08:01 -08:00
Yin Huai c025a46882 [SQL] Move SaveMode to SQL package.
Author: Yin Huai <yhuai@databricks.com>

Closes #4542 from yhuai/moveSaveMode and squashes the following commits:

65a4425 [Yin Huai] Move SaveMode to sql package.
2015-02-12 15:32:17 -08:00
Vladimir Grigor ada993e954 [SPARK-5335] Fix deletion of security groups within a VPC
Please see https://issues.apache.org/jira/browse/SPARK-5335.

The fix itself is in e58a8b01a8bedcbfbbc6d04b1c1489255865cf87 commit. Two earlier commits are fixes of another VPC related bug waiting to be merged. I should have created former bug fix in own branch then this fix would not have former fixes. :(

This code is released under the project's license.

Author: Vladimir Grigor <vladimir@kiosked.com>
Author: Vladimir Grigor <vladimir@voukka.com>

Closes #4122 from voukka/SPARK-5335_delete_sg_vpc and squashes the following commits:

090dca9 [Vladimir Grigor] fixes as per review: removed printing of group_id and added comment
730ec05 [Vladimir Grigor] fix for SPARK-5335: Destroying cluster in VPC with "--delete-groups" fails to remove security groups
2015-02-12 23:26:24 +00:00
Daoyuan Wang d5fc514918 [SPARK-5755] [SQL] remove unnecessary Add
explain extended select +key from src;
before:
== Parsed Logical Plan ==
'Project [(0 + 'key) AS _c0#8]
 'UnresolvedRelation [src], None

== Analyzed Logical Plan ==
Project [(0 + key#10) AS _c0#8]
 MetastoreRelation test, src, None

== Optimized Logical Plan ==
Project [(0 + key#10) AS _c0#8]
 MetastoreRelation test, src, None

== Physical Plan ==
Project [(0 + key#10) AS _c0#8]
 HiveTableScan [key#10], (MetastoreRelation test, src, None), None

after this patch:
== Parsed Logical Plan ==
'Project ['key]
 'UnresolvedRelation [src], None

== Analyzed Logical Plan ==
Project [key#10]
 MetastoreRelation test, src, None

== Optimized Logical Plan ==
Project [key#10]
 MetastoreRelation test, src, None

== Physical Plan ==
HiveTableScan [key#10], (MetastoreRelation test, src, None), None

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #4551 from adrian-wang/positive and squashes the following commits:

0821ae4 [Daoyuan Wang] remove unnecessary Add
2015-02-12 15:22:07 -08:00
Michael Armbrust ee04a8b19b [SPARK-5573][SQL] Add explode to dataframes
Author: Michael Armbrust <michael@databricks.com>

Closes #4546 from marmbrus/explode and squashes the following commits:

eefd33a [Michael Armbrust] whitespace
a8d496c [Michael Armbrust] Merge remote-tracking branch 'apache/master' into explode
4af740e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explode
dc86a5c [Michael Armbrust] simple version
d633d01 [Michael Armbrust] add scala specific
950707a [Michael Armbrust] fix comments
ba8854c [Michael Armbrust] [SPARK-5573][SQL] Add explode to dataframes
2015-02-12 15:19:19 -08:00
Yin Huai c352ffbdb9 [SPARK-5758][SQL] Use LongType as the default type for integers in JSON schema inference.
Author: Yin Huai <yhuai@databricks.com>

Closes #4544 from yhuai/jsonUseLongTypeByDefault and squashes the following commits:

6e2ffc2 [Yin Huai] Use LongType as the default type for integers in JSON schema inference.
2015-02-12 15:17:25 -08:00
Davies Liu 0bf0315825 [SPARK-5780] [PySpark] Mute the logging during unit tests
There a bunch of logging coming from driver and worker, it's noisy and scaring, and a lots of exception in it, people are confusing about the tests are failing or not.

This PR will mute the logging during tests, only show them if any one failed.

Author: Davies Liu <davies@databricks.com>

Closes #4572 from davies/mute and squashes the following commits:

1e9069c [Davies Liu] mute the logging during python tests
2015-02-12 14:54:38 -08:00
David Y. Ross 26c816e738 SPARK-5747: Fix wordsplitting bugs in make-distribution.sh
The `$MVN` command variable may have spaces, so when referring to it, must wrap in quotes.

Author: David Y. Ross <dyross@gmail.com>

Closes #4540 from dyross/dyr-fix-make-distribution2 and squashes the following commits:

5a41596 [David Y. Ross] SPARK-5747: Fix wordsplitting bugs in make-distribution.sh
2015-02-12 14:52:38 -08:00
lianhuiwang 947b8bd82e [SPARK-5759][Yarn]ExecutorRunnable should catch YarnException while NMClient start contain...
some time since some reasons, it lead to some exception while NMClient start some containers.example:we do not config spark_shuffle on some machines, so it will throw a exception:
java.lang.Error: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist.
because YarnAllocator use ThreadPoolExecutor to start Container, so we can not find which container or hostname throw exception. I think we should catch YarnException in ExecutorRunnable when start container. if there are some exceptions, we can know the container id or hostname of failed container.

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #4554 from lianhuiwang/SPARK-5759 and squashes the following commits:

caf5a99 [lianhuiwang] use SparkException to warp exception
c02140f [lianhuiwang] ExecutorRunnable should catch YarnException while NMClient start container
2015-02-12 14:51:06 -08:00
Andrew Or 1d5663e92c [SPARK-5760][SPARK-5761] Fix standalone rest protocol corner cases + revamp tests
The changes are summarized in the commit message. Test or test-related code accounts for 90% of the lines changed.

Author: Andrew Or <andrew@databricks.com>

Closes #4557 from andrewor14/rest-tests and squashes the following commits:

b4dc980 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests
b55e40f [Andrew Or] Add test for unknown fields
cc96993 [Andrew Or] private[spark] -> private[rest]
578cf45 [Andrew Or] Clean up test code a little
d82d971 [Andrew Or] v1 -> serverVersion
ea48f65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests
00999a8 [Andrew Or] Revamp tests + fix a few corner cases
2015-02-12 14:47:52 -08:00
Kay Ousterhout 47c73d410a [SPARK-5762] Fix shuffle write time for sort-based shuffle
mateiz was excluding the time to write this final file from the shuffle write time intentional?

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4559 from kayousterhout/SPARK-5762 and squashes the following commits:

5c6f3d9 [Kay Ousterhout] Use foreach
94e4237 [Kay Ousterhout] Removed open time metrics added inadvertently
ace156c [Kay Ousterhout] Moved metrics to finally block
d773276 [Kay Ousterhout] Use nano time
5a59906 [Kay Ousterhout] [SPARK-5762] Fix shuffle write time for sort-based shuffle
2015-02-12 14:46:37 -08:00
Venkata Ramana Gollamudi 629d0143ee [SPARK-5765][Examples]Fixed word split problem in run-example and compute-classpath
Author: Venkata Ramana G <ramana.gollamudihuawei.com>

Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>

Closes #4561 from gvramana/word_split and squashes the following commits:

285c8d4 [Venkata Ramana Gollamudi] Fixed word split problem in run-example and compute-classpath
2015-02-12 14:44:21 -08:00
Katsunori Kanda 9c8076502f [EC2] Update default Spark version to 1.2.1
Author: Katsunori Kanda <potix2@gmail.com>

Closes #4566 from potix2/ec2-update-version-1-2-1 and squashes the following commits:

77e7840 [Katsunori Kanda] [EC2] Update default Spark version to 1.2.1
2015-02-12 14:38:42 -08:00
Kay Ousterhout 893d6fd704 [SPARK-5645] Added local read bytes/time to task metrics
ksakellis I stumbled on your JIRA for this yesterday; I know it's assigned to you but I'd already done this for my own uses a while ago so thought I could help save you the work of doing it!  Hopefully this doesn't duplicate any work you've already done.

Here's a screenshot of what the UI looks like:
![image](https://cloud.githubusercontent.com/assets/1108612/6135352/c03e7276-b11c-11e4-8f11-c6aefe1f35b9.png)
Based on a discussion with pwendell, I put the data read remotely in as an additional metric rather than showing it in brackets as you'd suggested, Kostas.  The assumption here is that the average user doesn't care about the differentiation between local / remote data, so it's better not to pollute the UI.

I also added data about the local read time, which I've found very helpful for debugging, but I didn't put it in the UI because I think it's probably something not a ton of people will need to use.

With this change, the total read time and total write time shown in the UI will be equal, fixing a long-term source of user confusion:
![image](https://cloud.githubusercontent.com/assets/1108612/6135399/25f14490-b11d-11e4-8086-20be5f4002e6.png)

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4510 from kayousterhout/SPARK-5645 and squashes the following commits:

4a0182c [Kay Ousterhout] oops
5f5da1b [Kay Ousterhout] Small style fix
5da04cf [Kay Ousterhout] Addressed more comments from Kostas
ba05149 [Kay Ousterhout] Remove parens
a9dc685 [Kay Ousterhout] Kostas comment, test fix
33d2e2d [Kay Ousterhout] Merge remote-tracking branch 'upstream/master' into SPARK-5645
347e2cd [Kay Ousterhout] [SPARK-5645] Added local read bytes/time to task metrics
2015-02-12 14:36:27 -08:00
Michael Armbrust aa4ca8b873 [SQL] Improve error messages
Author: Michael Armbrust <michael@databricks.com>
Author: wangfei <wangfei1@huawei.com>

Closes #4558 from marmbrus/errorMessages and squashes the following commits:

5e5ab50 [Michael Armbrust] Merge pull request #15 from scwf/errorMessages
fa38881 [wangfei] fix for grouping__id
f279a71 [wangfei] make right references for ScriptTransformation
d29fbde [Michael Armbrust] extra case
1a797b4 [Michael Armbrust] comments
d4e9015 [Michael Armbrust] add comment
af9e668 [Michael Armbrust] no braces
34eb3a4 [Michael Armbrust] more work
6197cd5 [Michael Armbrust] [SQL] Better error messages for analysis failures
2015-02-12 13:11:28 -08:00
Antonio Navarro Perez 6a1be026cf [SQL][DOCS] Update sql documentation
Updated examples using the new api and added DataFrame concept

Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>

Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:

82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
8d5376a [Antonio Navarro Perez] fixed typo
8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
2015-02-12 12:46:17 -08:00
Sean Owen bc57789bbb SPARK-5776 JIRA version not of form x.y.z breaks merge_spark_pr.py
Consider only x.y.z verisons from JIRA. CC JoshRosen who will probably know this script well.
Alternative is to call the version "2.0.0" after all in JIRA.

Author: Sean Owen <sowen@cloudera.com>

Closes #4570 from srowen/SPARK-5776 and squashes the following commits:

fffafde [Sean Owen] Consider only x.y.z verisons from JIRA
2015-02-12 20:14:45 +00:00
Xiangrui Meng 99bd500665 [SPARK-5757][MLLIB] replace SQL JSON usage in model import/export by json4s
This PR detaches MLlib model import/export code from SQL's JSON support, and hence unblocks #4544 . yhuai

Author: Xiangrui Meng <meng@databricks.com>

Closes #4555 from mengxr/SPARK-5757 and squashes the following commits:

b0415e8 [Xiangrui Meng] replace SQL JSON usage by json4s
2015-02-12 10:48:13 -08:00
Andrew Rowson 466b1f671b [SPARK-5655] Don't chmod700 application files if running in YARN
[Was previously PR4507]

As per SPARK-5655, recently committed code chmod 700s all application files created on the local fs by a spark executor. This is both unnecessary and broken on YARN, where files created in the nodemanager's working directory are already owned by the user running the job and the 'yarn' group. Group read permission is also needed for the auxiliary shuffle service to be able to read the files, as this is running as the 'yarn' user.

Author: Andrew Rowson <github@growse.com>

Closes #4509 from growse/master and squashes the following commits:

7ca993c [Andrew Rowson] Moved chmod700 functionality into Utils.getOrCreateLocalRootDirs
f57ce6b [Andrew Rowson] [SPARK-5655] Don't chmod700 application files if running in a YARN container
2015-02-12 18:41:39 +00:00
Oren Mazor 9a6efbccf9 ignore cache paths for RAT tests
RAT fails on cache paths. add to .rat-excludes

Author: Oren Mazor <oren.mazor@gmail.com>

Closes #4569 from orenmazor/apache_master and squashes the following commits:

d0c9e7e [Oren Mazor] ignore cache paths for RAT tests
2015-02-12 18:37:00 +00:00
Sean Owen 9a3ea49f74 SPARK-5727 [BUILD] Remove Debian packaging
(for master / 1.4 only)

Author: Sean Owen <sowen@cloudera.com>

Closes #4526 from srowen/SPARK-5727.2 and squashes the following commits:

83ba49c [Sean Owen] Remove Debian packaging
2015-02-12 12:36:26 +00:00
Michael Armbrust a38e23c30f [SQL] Make dataframe more tolerant of being serialized
Eases use in the spark-shell.

Author: Michael Armbrust <michael@databricks.com>

Closes #4545 from marmbrus/serialization and squashes the following commits:

04748e6 [Michael Armbrust] @scala.annotation.varargs
b36e219 [Michael Armbrust] moreFixes
2015-02-11 19:05:49 -08:00
Reynold Xin d931b01dca [SQL] Two DataFrame fixes.
- Removed DataFrame.apply for projection & filtering since they are extremely confusing.
- Added implicits for RDD[Int], RDD[Long], and RDD[String]

Author: Reynold Xin <rxin@databricks.com>

Closes #4543 from rxin/df-cleanup and squashes the following commits:

81ec915 [Reynold Xin] [SQL] More DataFrame fixes.
2015-02-11 18:32:48 -08:00
Reynold Xin fa6bdc6e81 [SPARK-3688][SQL] More inline comments for LogicalPlan.
As a follow-up to https://github.com/apache/spark/pull/4524

Author: Reynold Xin <rxin@databricks.com>

Closes #4539 from rxin/SPARK-3688 and squashes the following commits:

5ac56c7 [Reynold Xin] exists
da8eea4 [Reynold Xin] [SPARK-3688][SQL] More inline comments for LogicalPlan.
2015-02-11 15:26:31 -08:00
tianyi 44b2311d94 [SPARK-3688][SQL]LogicalPlan can't resolve column correctlly
This PR fixed the resolving problem described in https://issues.apache.org/jira/browse/SPARK-3688
```
CREATE TABLE t1(x INT);
CREATE TABLE t2(a STRUCT<x: INT>, k INT);
SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;
```

Author: tianyi <tianyi.asiainfo@gmail.com>

Closes #4524 from tianyi/SPARK-3688 and squashes the following commits:

237a256 [tianyi] resolve a name with table.column pattern first.
2015-02-11 12:50:17 -08:00
Michael Armbrust a60d2b70ad [SPARK-5454] More robust handling of self joins
Also I fix a bunch of bad output in test cases.

Author: Michael Armbrust <michael@databricks.com>

Closes #4520 from marmbrus/selfJoin and squashes the following commits:

4f4a85c [Michael Armbrust] comments
49c8e26 [Michael Armbrust] fix tests
6fc38de [Michael Armbrust] fix style
55d64b3 [Michael Armbrust] fix dataframe selfjoins
2015-02-11 12:31:56 -08:00
Daniel Darabos 03bf704bf4 Remove outdated remark about take(n).
Looking at the code, I believe this remark about `take(n)` computing partitions on the driver is no longer correct. Apologies if I'm wrong.

This came up in http://stackoverflow.com/q/28436559/3318517.

Author: Daniel Darabos <darabos.daniel@gmail.com>

Closes #4533 from darabos/patch-2 and squashes the following commits:

cc80f3a [Daniel Darabos] Remove outdated remark about take(n).
2015-02-11 20:24:17 +00:00
Davies Liu b694eb9c2f [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Python DataFrame API remaining tasks
1. DataFrame.renameColumn

2. DataFrame.show() and _repr_

3. Use simpleString() rather than jsonValue in DataFrame.dtypes

4. createDataFrame from local Python data, including pandas.DataFrame

Author: Davies Liu <davies@databricks.com>

Closes #4528 from davies/df3 and squashes the following commits:

014acea [Davies Liu] fix typo
6ba526e [Davies Liu] fix tests
46f5f95 [Davies Liu] address comments
6cbc154 [Davies Liu] dataframe.show() and improve dtypes
6f94f25 [Davies Liu] create DataFrame from local Python data
2015-02-11 12:13:16 -08:00
guliangliang 1ac099e3e0 [SPARK-5733] Error Link in Pagination of HistroyPage when showing Incomplete Applications
The links in pagination of HistroyPage is wrong when showing Incomplete Applications.

If "2" is click on the following page "http://history-server:18080/?page=1&showIncomplete=true", it will go to "http://history-server:18080/?page=2" instead of "http://history-server:18080/?page=2&showIncomplete=true".

Author: guliangliang <guliangliang@qiyi.com>

Closes #4523 from marsishandsome/Spark5733 and squashes the following commits:

9d7b593 [guliangliang] [SPARK-5733] Error Link in Pagination of HistroyPage when showing Incomplete Applications
2015-02-11 15:55:49 +00:00
Sean Owen bd0d6e0cc3 SPARK-5727 [BUILD] Deprecate Debian packaging
This just adds a deprecation message. It's intended for backporting to branch 1.3 but can go in master too, to be followed by another PR that removes it for 1.4.

Author: Sean Owen <sowen@cloudera.com>

Closes #4516 from srowen/SPARK-5727.1 and squashes the following commits:

d48989f [Sean Owen] Refer to Spark 1.4
6c1c8b3 [Sean Owen] Deprecate Debian packaging
2015-02-11 08:30:16 +00:00
Sean Owen da89720bf4 SPARK-5728 [STREAMING] MQTTStreamSuite leaves behind ActiveMQ database files
Use temp dir for ActiveMQ database

Author: Sean Owen <sowen@cloudera.com>

Closes #4517 from srowen/SPARK-5728 and squashes the following commits:

1d3aeb8 [Sean Owen] Use temp dir for ActiveMQ database
2015-02-11 08:13:51 +00:00