This should also close #6243.
Author: Reynold Xin <rxin@databricks.com>
Closes #6431 from rxin/JavaTypeInference-guava and squashes the following commits:
e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
(cherry picked from commit 6fec1a9409)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This issue is related to #6419.
AllJobPage doesn't currently have a "kill link", but I think we should fix the issue mentioned in #6419 there as well, to avoid accidents in the future.
It's a minor issue for now, so I haven't filed it in JIRA.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #6432 from sarutak/remove-ambiguity-of-link and squashes the following commits:
cd1a503 [Kousuke Saruta] Fixed ambiguity link issue in AllJobPage
(cherry picked from commit 0db76c90ad)
Signed-off-by: Andrew Or <andrew@databricks.com>
Please refer to [SPARK-7847] [1] for details.
[1]: https://issues.apache.org/jira/browse/SPARK-7847
Author: Cheng Lian <lian@databricks.com>
Closes #6389 from liancheng/spark-7847 and squashes the following commits:
935c652 [Cheng Lian] Adds test case for writing various data types as dynamic partition value
f4fc398 [Cheng Lian] Converts partition columns to Scala type when writing dynamic partitions
d0aeca0 [Cheng Lian] Fixes dynamic partition directory escaping
(cherry picked from commit 15459db4f6)
Signed-off-by: Yin Huai <yhuai@databricks.com>
follow up for #6377
Change time to the equivalent in GMT
/cc squito
Author: scwf <wangfei1@huawei.com>
Closes #6425 from scwf/fix-HistoryServerSuite and squashes the following commits:
4d37935 [scwf] fix HistoryServerSuite
(cherry picked from commit 4615081d7a)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
Two minor changes.
cc brkyvz
Author: Reynold Xin <rxin@databricks.com>
Closes #6428 from rxin/math-func-cleanup and squashes the following commits:
5910df5 [Reynold Xin] [SQL] Rename MathematicalExpression UnaryMathExpression, and specify BinaryMathExpression's output data type as DoubleType.
(cherry picked from commit 3e7d7d6b3d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7697
The problem was reported against MySQL. H2 has no unsigned int type, so a corresponding test cannot be added.
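As a rough illustration of why an unsigned 32-bit column needs a wider type, the sketch below checks whether the largest unsigned int value fits in a signed 32-bit or a signed 64-bit slot. It is not the JDBCRDD code; the helper name `fits` is hypothetical.

```python
import struct

UNSIGNED_INT_MAX = 2**32 - 1  # largest value an unsigned 32-bit column can hold


def fits(fmt, value):
    """Return True if `value` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


print(fits("<i", UNSIGNED_INT_MAX))  # signed 32-bit slot
print(fits("<q", UNSIGNED_INT_MAX))  # signed 64-bit slot
```

The signed 32-bit check fails while the 64-bit one succeeds, which is the motivation for mapping unsigned int to a long type.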
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #6229 from viirya/unsignedint_as_long and squashes the following commits:
dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long
608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
(cherry picked from commit 4f98d7a7f1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references.
Author: Cheolsoo Park <cheolsoop@netflix.com>
Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
(cherry picked from commit 6dd645870d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Ignores `_temporary` directories so that partial or corrupted data files left by failed tasks/jobs won't affect normal data scans.
Author: Cheng Lian <lian@databricks.com>
Closes #6411 from liancheng/spark-7868 and squashes the following commits:
273ea36 [Cheng Lian] Ignores _temporary directories
(cherry picked from commit b463e6d618)
Signed-off-by: Yin Huai <yhuai@databricks.com>
In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows. However, we should be using the output schema instead, since our scan might return a subset of the relation's columns.
This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTextRelation` that prevented this bug from being caught by our old tests:
> In `SimpleTextRelation`, we specified `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all. This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested.
Closes #5986.
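The schema mismatch described above can be sketched outside Spark. In this toy model a "schema" is a list of (column name, converter) pairs and rows are converted positionally. All names here (`RELATION_SCHEMA`, `convert_rows`, `output_schema`) are hypothetical and only mimic the idea, not the actual `DataSourceStrategy` code.

```python
# Hypothetical relation schema: column name -> converter for that column.
RELATION_SCHEMA = [("id", int), ("name", str), ("score", float)]


def convert_rows(rows, schema):
    """Convert each raw row positionally using the converters in `schema`."""
    converters = [conv for _, conv in schema]
    return [tuple(conv(v) for conv, v in zip(converters, row)) for row in rows]


def output_schema(required_columns):
    """Schema of the columns the scan actually returns, in scan order."""
    by_name = dict(RELATION_SCHEMA)
    return [(name, by_name[name]) for name in required_columns]


# The scan was asked for a subset of the relation's columns only.
scanned = [("alice", "3.5"), ("bob", "4.0")]

converted = convert_rows(scanned, output_schema(["name", "score"]))
print(converted)  # [('alice', 3.5), ('bob', 4.0)]

# Using the full relation schema misaligns converters and values.
try:
    convert_rows(scanned, RELATION_SCHEMA)
except ValueError as err:
    print("relation schema failed:", err)
```

With the relation schema, the `id` converter is applied to the `name` value, which is the kind of misalignment that using the output schema avoids.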
Author: Josh Rosen <joshrosen@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Author: Cheng Lian <liancheng@users.noreply.github.com>
Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits:
e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into 61664732b2
9968fba [Cheng Lian] Tests the data type conversion code paths
(cherry picked from commit 0c33c7b4a6)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6422 from dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the following commits:
36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
(cherry picked from commit 0463428b6e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
**Reproduction.** Run a long-running job, go to the job page, expand the DAG visualization, and click into a stage. Your stage is now killed. Why? This is because the visualization code just reaches into the stage table and grabs the first link it finds. In our case, this first link happens to be the kill link instead of the one to the stage page.
**Fix.** Use proper CSS selectors to avoid ambiguity.
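The ambiguity can be reproduced in miniature. The sketch below uses Python's ElementTree path syntax as a stand-in for CSS selectors; the row markup, class names, and URLs are made up for illustration and are not taken from the Spark UI.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one row of the stage table.
ROW = """
<tr>
  <td>
    <a class="kill-link" href="/stages/stage/kill?id=3">(kill)</a>
    <a class="name-link" href="/stages/stage?id=3">count at Job.scala:10</a>
  </td>
</tr>
"""

row = ET.fromstring(ROW)

# Ambiguous: grabs whichever link happens to come first in the row.
first_link = row.find(".//a")

# Unambiguous: selects the link by its class.
stage_link = row.find(".//a[@class='name-link']")

print(first_link.get("href"))
print(stage_link.get("href"))
```

Selecting by class keeps the lookup correct even if other links are added to the row later.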
This is an alternative to #6407. Thanks carsonwang for catching this.
Author: Andrew Or <andrew@databricks.com>
Closes #6419 from andrewor14/fix-ui-viz-kill and squashes the following commits:
25203bd [Andrew Or] Do not kill innocent stages
(cherry picked from commit 8f20824268)
Signed-off-by: Andrew Or <andrew@databricks.com>
With decent coverage of feature transformers, algorithms, and model tuning support, it is time to graduate `spark.ml` from alpha. This PR changes all `AlphaComponent` annotations to either `DeveloperApi` or `Experimental`, depending on whether we expect a class/method to be used by end users (who use the pipeline API to assemble/tune their ML pipelines but not to create new pipeline components). `UnaryTransformer` becomes a `DeveloperApi` in this PR.
jkbradley harsha2010
Author: Xiangrui Meng <meng@databricks.com>
Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:
effbccd [Xiangrui Meng] organize imports
c15028e [Xiangrui Meng] added missing docs
1b2e5f8 [Xiangrui Meng] update package doc
73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)
(cherry picked from commit 836a75898f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
This corresponds to https://github.com/mesos/spark-ec2/pull/116 in the spark-ec2 repo. The only change required in the spark_ec2.py script is to open the RM port.
cc andrewor14
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6376 from shivaram/spark-ec2-yarn and squashes the following commits:
961504a [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into spark-ec2-yarn
152c94c [Shivaram Venkataraman] Open 8088 for YARN in EC2
(cherry picked from commit 2e9a5f229e)
Signed-off-by: Andrew Or <andrew@databricks.com>
The densities in KernelDensity are scaled down by (number of parallel processes × number of points), but they should be scaled by just the number of samples. This results in broken tests in KernelDensitySuite, which haven't been tested properly.
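A minimal sketch of the intended normalization, assuming a Gaussian kernel: each partition contributes a (kernel sum, sample count) pair, the pairs are merged, and the total is divided once by the total number of samples. The function names are hypothetical and this is not the MLlib implementation.

```python
import math


def kernel_sum(samples, point, bandwidth):
    """Return (sum of Gaussian kernel values at `point`, number of samples)."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)
    total = sum(
        norm * math.exp(-0.5 * ((point - s) / bandwidth) ** 2) for s in samples
    )
    return total, len(samples)


def density(partitions, point, bandwidth):
    """Merge per-partition (sum, count) pairs, then divide once by the total count."""
    total, count = 0.0, 0
    for part in partitions:
        part_sum, part_count = kernel_sum(part, point, bandwidth)
        total += part_sum
        count += part_count
    return total / count


samples = [0.0, 1.0, 2.0, 3.0]
one_partition = density([samples], 1.5, 1.0)
two_partitions = density([samples[:2], samples[2:]], 1.5, 1.0)
print(one_partition, two_partitions)
```

Because the division happens once over the merged count, the estimate does not depend on how the samples are split across partitions.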
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #6383 from MechCoder/spark-7844 and squashes the following commits:
ab81302 [MechCoder] Math->math
9b8ed50 [MechCoder] Make one pass to update count
a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity
(cherry picked from commit 61664732b2)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
A couple of links in the MLlib Naive Bayes documentation for v1.4 were broken due to the addition of either space or newline characters between the link title and link URL in the markdown doc. (Interestingly enough, they are rendered correctly in the GitHub viewer, but not when compiled to HTML by Jekyll.)
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6412 from dusenberrymw/Fix_Broken_Links_In_MLlib_Naive_Bayes_Docs and squashes the following commits:
91a4028 [Mike Dusenberry] Fixing misformatted links by removing space and newline characters.
(cherry picked from commit e5a63a0e39)
Signed-off-by: Sean Owen <sowen@cloudera.com>
I have used this script to launch, destroy, start, and stop clusters successfully.
Author: meawoppl <meawoppl@gmail.com>
Closes #6336 from meawoppl/py3ec2spark and squashes the following commits:
2e87046 [meawoppl] Py3 compat fixes.
(cherry picked from commit 8dbe777703)
Signed-off-by: Davies Liu <davies@databricks.com>
In PySpark we record the memory used before and after a spill and report the difference of the two values as memorySpilled. If the value before the spill is smaller than the value after it, the result is negative; in that case 0 is a more reasonable value.
Below is the result in HistoryServer we have tested:
| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk) | Errors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 21 s | 0.1 s | 128.1 MB (hadoop) / 3237 | 70 ms | 10.1 MB / 2529 | 0.0 B | 5.7 MB | |
| 2 | 2 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3205 | 0.1 s | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
| 1 | 1 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3271 | 68 ms | 10.1 MB / 2529 | -1048576.0 B | 5.6 MB | |
| 4 | 4 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3192 | 51 ms | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
| 3 | 3 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3262 | 51 ms | 10.1 MB / 2529 | 1024.0 KB | 5.8 MB | |
| 5 | 5 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3256 | 93 ms | 10.1 MB / 2529 | -1048576.0 B | 5.7 MB | |
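
The clamping idea can be shown in a few lines. This is a sketch with a hypothetical helper name, not the actual PySpark shuffle code; it only demonstrates using `max` to keep the reported spill non-negative.

```python
MB = 1 << 20  # bytes in one mebibyte


def memory_spilled(used_before, used_after):
    """Bytes released by a spill, clamped so the metric is never negative."""
    return max(used_before - used_after, 0)


# Memory dropped by 1 MB during the spill: report the difference.
print(memory_spilled(5 * MB, 4 * MB))

# Memory grew by 1 MB while spilling: report 0 instead of -1048576.
print(memory_spilled(4 * MB, 5 * MB))
```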
/cc davies
Author: linweizhong <linweizhong@huawei.com>
Closes #5887 from Sephiroth-Lin/spark-7339 and squashes the following commits:
9186c81 [linweizhong] Use max function to get a nonnegative value
d41672b [linweizhong] Update MemoryBytesSpilled when memorySpilled > 0
(cherry picked from commit 8948ad3fb5)
Signed-off-by: Davies Liu <davies@databricks.com>
```
sbt.ForkMain$ForkError: 1424424077190 was not equal to 1424474477190
at org.scalatest.MatchersHelper$.newTestFailedException(MatchersHelper.scala:160)
at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6231)
at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6265)
at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply$mcV$sp(SimpleDateParamTest.scala:25)
at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at org.scalatest.Suite$class.withFixture(Suite.scala:
```
Set timezone to fix SimpleDateParamTest
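The two values in the failure above differ by exactly 14 hours, which is what one would see if the same wall-clock string were interpreted under two different zone offsets. The sketch below shows that effect with fixed offsets. The timestamp strings and the -6/+8 offsets are assumptions chosen for illustration, not details taken from the test.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(wall_clock, utc_offset_hours):
    """Interpret a wall-clock string at a fixed UTC offset; return epoch milliseconds."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(wall_clock, "%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=tz)
    # Integer division of timedeltas avoids floating-point rounding.
    return (local - EPOCH) // timedelta(milliseconds=1)


expected = 1424474477190
actual = 1424424077190
print((expected - actual) / 3_600_000)  # 14.0

# Pinning the zone explicitly makes the result independent of the machine.
print(epoch_millis("2015-02-20T23:21:17.190", 0))

# Assumed offsets: the same wall-clock time read at UTC-6 versus UTC+8.
print(epoch_millis("2015-02-20T17:21:17.190", -6))
print(epoch_millis("2015-02-20T17:21:17.190", 8))
```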
Author: scwf <wangfei1@huawei.com>
Author: Fei Wang <wangfei1@huawei.com>
Closes #6377 from scwf/fix-SimpleDateParamTest and squashes the following commits:
b8df1e5 [Fei Wang] Update SimpleDateParamSuite.scala
8bb74f0 [scwf] fix SimpleDateParamSuite
(cherry picked from commit bf49c22130)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
The Catalyst DSL is no longer used as a public-facing API. This pull request removes the UDF and writeToFile features from it since they are not used in unit tests.
Author: Reynold Xin <rxin@databricks.com>
Closes #6350 from rxin/unused-logical-dsl and squashes the following commits:
90b3de6 [Reynold Xin] [SQL][minor] Removed unused Catalyst logical plan DSL.
(cherry picked from commit c9adcad81a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
https://issues.apache.org/jira/browse/SPARK-7832
Author: Yin Huai <yhuai@databricks.com>
Closes #6385 from yhuai/runSQLTests and squashes the following commits:
3d399bc [Yin Huai] Always run SQL tests in master build.
(cherry picked from commit f38e619c41)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Adds a section in the RDD persistence section of the programming-guide docs detailing Spark-Tachyon version compatibility as discussed in [[SPARK-6391]](https://issues.apache.org/jira/browse/SPARK-6391).
Author: Calvin Jia <jia.calvin@gmail.com>
Closes #6382 from calvinjia/spark-6391 and squashes the following commits:
113e863 [Calvin Jia] Move compatibility info to the offheap storage level section.
7942dc5 [Calvin Jia] Add a section in the programming-guide docs for Tachyon compatibility.
(cherry picked from commit ce0051d6f7)
Signed-off-by: Reynold Xin <rxin@databricks.com>
When committing/aborting a write task issued in `InsertIntoHadoopFsRelation`, if an exception is thrown from `OutputWriter.close()`, the committing/aborting process is interrupted and leaves temporary artifacts behind (e.g., the `_temporary` directory created by `FileOutputCommitter`).
This PR makes these two processes more robust by catching potential exceptions and falling back to normal task commitment/abort.
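One way to read this is sketched below with stand-in classes. `FakeWriter` and `FakeCommitter` are hypothetical and only model a writer whose `close()` may fail and a committer that records what it was asked to do; this is not the Spark implementation.

```python
class FakeWriter:
    """Stand-in for an output writer whose close() may fail."""

    def __init__(self, fail_on_close=False):
        self.fail_on_close = fail_on_close

    def close(self):
        if self.fail_on_close:
            raise IOError("close failed")


class FakeCommitter:
    """Stand-in for an output committer; records the calls it receives."""

    def __init__(self):
        self.events = []

    def commit_task(self):
        self.events.append("commit")

    def abort_task(self):
        self.events.append("abort")


def abort_task(writer, committer):
    try:
        writer.close()
    except Exception:
        pass  # a failing close must not prevent the abort
    finally:
        committer.abort_task()


def commit_task(writer, committer):
    try:
        writer.close()
        committer.commit_task()
    except Exception:
        # Closing or committing failed: clean up through the abort path, then re-raise.
        abort_task(writer, committer)
        raise
```

In this sketch a failure in `close()` still reaches the committer's abort step, so the temporary output can be cleaned up instead of being left behind.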
Author: Cheng Lian <lian@databricks.com>
Closes #6378 from liancheng/spark-7838 and squashes the following commits:
f18253a [Cheng Lian] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust
(cherry picked from commit 8af1bf10b7)
Signed-off-by: Cheng Lian <lian@databricks.com>
The "Database does not exist" error reported in SPARK-7684 was caused by `HiveContext.newTemporaryConfiguration()`, which always creates a new temporary metastore directory and returns a metastore configuration pointing to that directory. This makes `TestHive.reset()` always replace the old temporary metastore with an empty new one.
Author: Cheng Lian <lian@databricks.com>
Closes #6359 from liancheng/spark-7684 and squashes the following commits:
95d2eb8 [Cheng Lian] Addresses @marmbrust's comment
042769d [Cheng Lian] Don't create new temp directory in HiveContext.newTemporaryConfiguration()
(cherry picked from commit bfeedc69a2)
Signed-off-by: Cheng Lian <lian@databricks.com>
https://issues.apache.org/jira/browse/SPARK-7805
Because `sql/hive`'s tests depend on the test jar of `sql/core`, we do not need to store `SQLTestUtils` and `ParquetTest` in `src/main`. We should only add stuff that will be needed by `sql/console` or Python tests (for Python, we need it in `src/main`, right? davies).
Author: Yin Huai <yhuai@databricks.com>
Closes #6334 from yhuai/SPARK-7805 and squashes the following commits:
af6d0c9 [Yin Huai] mima
b86746a [Yin Huai] Move SQLTestUtils.scala and ParquetTest.scala to src/test.
(cherry picked from commit ed21476bc0)
Signed-off-by: Yin Huai <yhuai@databricks.com>
https://issues.apache.org/jira/browse/SPARK-7845
Author: Yin Huai <yhuai@databricks.com>
Closes #6384 from yhuai/hadoop1Test and squashes the following commits:
82fcea8 [Yin Huai] Use hadoop 1.2.1 (a stable version) for hadoop 1 test.
(cherry picked from commit bfbc0df729)
Signed-off-by: Yin Huai <yhuai@databricks.com>
This fixes an issue reported in #6373 where the `cp` would fail if `-Psparkr` was not used in the build.
cc dragos pwendell
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6379 from shivaram/make-distribution-hotfix and squashes the following commits:
08eb7e4 [Shivaram Venkataraman] Copy SparkR lib if it exists in make-distribution
(cherry picked from commit b231baa248)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
This one continues the work of https://github.com/apache/spark/pull/6216.
Author: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6366 from yhuai/insert and squashes the following commits:
3d717fb [Yin Huai] Use insertInto to handle the casue when table exists and Append is used for saveAsTable.
56d2540 [Yin Huai] Add PreWriteCheck to HiveContext's analyzer.
c636e35 [Yin Huai] Remove unnecessary empty lines.
cf83837 [Yin Huai] Move insertInto to write. Also, remove the partition columns from InsertIntoHadoopFsRelation.
0841a54 [Reynold Xin] Removed experimental tag for deprecated methods.
33ed8ef [Reynold Xin] [SPARK-7654][SQL] Move insertInto into reader/writer interface.
(cherry picked from commit 2b7e63585d)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Add tests later.
Author: Davies Liu <davies@databricks.com>
Closes #6375 from davies/insertInto and squashes the following commits:
826423e [Davies Liu] add insertInto() to Writer
(cherry picked from commit be47af1bdb)
Signed-off-by: Davies Liu <davies@databricks.com>
In the old implementation, if a batch has no block, `areWALRecordHandlesPresent` will be `true` and it will return `WriteAheadLogBackedBlockRDD`.
This PR handles this case by returning `WriteAheadLogBackedBlockRDD` or `BlockRDD` according to the configuration.
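The empty-batch case is an instance of vacuous truth: an "all blocks have a handle" check succeeds when there are no blocks at all. A small sketch, assuming a hypothetical `wal_enabled` flag stands in for the configuration and that handles are modeled as a plain list:

```python
def choose_rdd(wal_handles, wal_enabled):
    """Pick the RDD type for a batch from its per-block WAL record handles."""
    if wal_handles:
        # Non-empty batch: use the WAL-backed RDD only if every block has a handle.
        use_wal = all(handle is not None for handle in wal_handles)
    else:
        # Empty batch: all() would be vacuously True, so follow the configuration.
        use_wal = wal_enabled
    return "WriteAheadLogBackedBlockRDD" if use_wal else "BlockRDD"


print(all([]))  # True
print(choose_rdd([], wal_enabled=False))
print(choose_rdd(["handle-1", "handle-2"], wal_enabled=True))
```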
Author: zsxwing <zsxwing@gmail.com>
Closes #6372 from zsxwing/SPARK-7777 and squashes the following commits:
788f895 [zsxwing] Handle the case when there is no block in a batch
(cherry picked from commit ad0badba14)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This change also removes native libraries from SparkR to make sure our distribution works across platforms.
Tested by building on Mac and running on Amazon Linux (CentOS) and a Windows VM, and vice versa (built on Linux, run on Mac).
I will also test this with YARN soon and update this PR.
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6373 from shivaram/sparkr-binary and squashes the following commits:
ae41b5c [Shivaram Venkataraman] Remove native libraries from SparkR Also include the built SparkR package in make-distribution.sh
(cherry picked from commit a40bca0111)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
sqlCtx -> sqlContext
You can check the docs by:
```
$ cd docs
$ SKIP_SCALADOC=1 jekyll serve
```
cc shivaram
Author: Davies Liu <davies@databricks.com>
Closes #5442 from davies/r_docs and squashes the following commits:
7a12ec6 [Davies Liu] remove rdd in R docs
8496b26 [Davies Liu] remove the docs related to RDD
e23b9d6 [Davies Liu] delete R docs for RDD API
222e4ff [Davies Liu] Merge branch 'master' into r_docs
89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
f0a10e1 [Davies Liu] address comments from @shivaram
f61de71 [Davies Liu] Update pairRDD.R
3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
2f10a77 [Davies Liu] address comments from @cafreeman
9c2a062 [Davies Liu] mention R api together with Python API
23f751a [Davies Liu] Fill in SparkR examples in programming guide
(cherry picked from commit 7af3818c6b)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #6369 from tdas/SPARK-7838 and squashes the following commits:
87d1c7f [Tathagata Das] Addressed comment
37775d8 [Tathagata Das] set scope for kinesis stream
(cherry picked from commit baa89838cc)
Signed-off-by: Andrew Or <andrew@databricks.com>
Enables the SparkR profiles for all the binary builds we create.
cc pwendell
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6371 from shivaram/sparkr-create-release and squashes the following commits:
ca5a0b2 [Shivaram Venkataraman] Add -Psparkr to create-release.sh
(cherry picked from commit 017b3404a5)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
Added logistic regression to the list of Multiclass Classification Supported Methods in the MLlib Classification and Regression documentation, as it was missing.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6357 from dusenberrymw/Add_LR_To_List_Of_Multiclass_Classification_Methods and squashes the following commits:
7918650 [Mike Dusenberry] Updating broken link due to the "Binary Classification" section on the Linear Methods page being renamed to "Classification".
3005dc2 [Mike Dusenberry] Adding logistic regression to the list of Multiclass Classification Supported Methods in the MLlib Classification and Regression documentation, as it was missing.
(cherry picked from commit 63a5ce75ea)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
The previous PR for SPARK-7224 (#5790) broke the JDK 6 build because it used `java.nio.file.Path`, which exists in JDK 7 but not in JDK 6. This PR uses Guava's `Files` to handle directory creation and similar operations.
The description from the previous PR:
> This patch contains an `IvyTestUtils` file, which dynamically generates jars and pom files to test the `--packages` feature without having to rely on the internet, and Maven Central.
cc pwendell
I also ran the flaky test about 20 times locally and it didn't fail a single time, but I think it may fail roughly once every 100 builds. I still haven't figured out the cause, but the test before it, `--jars`, was also failing after we turned off the `--packages` test in `SparkSubmitSuite`. It may be related to the launch of SparkSubmit.
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #5892 from brkyvz/maven-utils and squashes the following commits:
e9b1903 [Burak Yavuz] fix merge conflict
68214e0 [Burak Yavuz] remove ignore for test(neglect spark dependencies)
e632381 [Burak Yavuz] fix ignore
9ef1408 [Burak Yavuz] re-enable --packages test
22eea62 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into maven-utils
05cd0de [Burak Yavuz] added mock repository generator
(cherry picked from commit 8014e1f6bb)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
KinesisReceiver calls worker.run(), which is a blocking call (a while loop) according to the source code of the kinesis-client library: https://github.com/awslabs/amazon-kinesis-client/blob/v1.2.1/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/Worker.java
This results in an infinite loop when calling sparkStreamingContext.stop(stopSparkContext = false, stopGracefully = true), perhaps because ReceiverTracker is never able to register the receiver (its receiverInfo field is an empty map), so it stays stuck waiting for the running flag to be set to false.
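The general shape of a non-blocking start can be sketched as follows: the blocking call moves to a named background thread so that the start method returns immediately. The `Receiver` class, its method names, and the thread name format are hypothetical and do not mirror the KinesisReceiver API.

```python
import threading


class Receiver:
    """Toy receiver that runs a blocking worker loop on a background thread."""

    def __init__(self, receiver_id, blocking_run):
        self.receiver_id = receiver_id
        self._blocking_run = blocking_run
        self.thread = None

    def on_start(self):
        # Returns right away; the blocking call runs on its own thread,
        # named after the receiver id so it is easy to spot in thread dumps.
        self.thread = threading.Thread(
            target=self._blocking_run,
            name=f"receiver-worker-{self.receiver_id}",
            daemon=True,
        )
        self.thread.start()


# Simulate a worker that blocks until it is told to shut down.
shutdown = threading.Event()
receiver = Receiver(0, shutdown.wait)
receiver.on_start()
print(receiver.thread.name, receiver.thread.is_alive())

shutdown.set()
receiver.thread.join(timeout=5)
print(receiver.thread.is_alive())
```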
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #6348 from tdas/SPARK-7788 and squashes the following commits:
2584683 [Tathagata Das] Added receiver id in thread name
6cf1cd4 [Tathagata Das] Made KinesisReceiver.onStart non-blocking
(cherry picked from commit 1c388a9985)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
The default add time of 5s is still too slow for small jobs. Also, the current default remove time of 10 minutes seems rather high. This patch lowers both and rephrases a few log messages.
Author: Andrew Or <andrew@databricks.com>
Closes #6301 from andrewor14/da-minor and squashes the following commits:
6d614a6 [Andrew Or] Lower log level
2811492 [Andrew Or] Log information when requests are canceled
5fcd3eb [Andrew Or] Fix tests
3320710 [Andrew Or] Lower timeouts + rephrase a few log messages
(cherry picked from commit 3d8760d76e)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #6363 from marmbrus/windowErrors and squashes the following commits:
516b02d [Michael Armbrust] [SPARK-7834] [SQL] Better window error messages
(cherry picked from commit 3c1305107a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7270
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #5864 from viirya/dyn_partition_insert and squashes the following commits:
b5627df [Liang-Chi Hsieh] For comments.
3b21e4b [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into dyn_partition_insert
8a4352d [Liang-Chi Hsieh] Consider dynamic partition when inserting into hive table.
(cherry picked from commit 126d7235de)
Signed-off-by: Michael Armbrust <michael@databricks.com>