Netty's DefaultFileRegion requires a FileDescriptor in its constructor, which means we need to have an open file handle. In very large workloads, this can lead to "too many open files" errors because of the way these file descriptors are cleaned up. This pull request creates a new LazyFileRegion that initializes the FileDescriptor only when we are sending data for the first time.
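A minimal sketch of the lazy-open idea (illustrative only; the class and the simplified `transferTo` signature below are made up and stand in for what a LazyFileRegion does, not the actual class added by this patch):

```scala
import java.io.{File, FileInputStream}
import java.nio.channels.{FileChannel, WritableByteChannel}

// Defers opening the underlying file until the first transfer, so a region that
// is queued but never sent does not hold an open file descriptor.
class LazilyOpenedRegion(file: File, offset: Long, length: Long) {
  private var channel: FileChannel = _  // opened lazily on first use

  def transferTo(target: WritableByteChannel, position: Long): Long = {
    if (channel == null) {
      // Only now do we consume a file descriptor.
      channel = new FileInputStream(file).getChannel
    }
    channel.transferTo(offset + position, length - position, target)
  }

  def close(): Unit = if (channel != null) channel.close()
}
```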
Author: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@apache.org>
Closes#3172 from rxin/lazyFD and squashes the following commits:
0bdcdc6 [Reynold Xin] Added reference to Netty's DefaultFileRegion
d4564ae [Reynold Xin] Added SparkConf to the ctor argument of IndexShuffleBlockManager.
6ed369e [Reynold Xin] Code review feedback.
04cddc8 [Reynold Xin] [SPARK-4307] Initialize FileDescriptor lazily in FileRegion.
This PR checks all of the existing Python MLlib APIs to make sure that numpy.array is supported as a Vector (and also RDDs of numpy.array).
It also improves some docstrings and doctests.
cc mateiz mengxr
Author: Davies Liu <davies@databricks.com>
Closes#3189 from davies/numpy and squashes the following commits:
d5057c4 [Davies Liu] fix tests
6987611 [Davies Liu] support numpy.array for all MLlib API
In running-on-yarn.md, there is a link to the YARN overview.
But the URL points to the YARN alpha docs.
It should point to the stable docs.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#3196 from sarutak/SPARK-4330 and squashes the following commits:
30baa21 [Kousuke Saruta] Fixed running-on-yarn.md to point proper URL for YARN
As [reported][1] on the mailing list, GraphX throws
```
java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
at org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39)
at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
```
when sort-based shuffle attempts to spill to disk. This is because GraphX defines custom serializers for shuffling pair RDDs that assume Spark will always serialize the entire pair object rather than breaking it up into its components. However, the spill code path in sort-based shuffle [violates this assumption][2].
GraphX uses the custom serializers to compress vertex ID keys using variable-length integer encoding. However, since the serializer can no longer rely on the key and value being serialized and deserialized together, performing such encoding would either require writing a tag byte (costly) or maintaining state in the serializer and assuming that serialization calls will alternate between key and value (fragile).
Instead, this PR simply removes the custom serializers. This causes a **10% slowdown** (494 s to 543 s) and **16% increase in per-iteration communication** (2176 MB to 2518 MB) for PageRank (averages across 3 trials, 10 iterations per trial, uk-2007-05 graph, 16 r3.2xlarge nodes).
[1]: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassCastException-java-lang-Long-cannot-be-cast-to-scala-Tuple2-td13926.html#a14501
[2]: f9d6220c79/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala (L329)
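For context, the variable-length integer encoding mentioned above stores small vertex IDs in fewer bytes. A generic sketch of such an encoding (my own illustration, not the removed GraphX serializer code):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

object VarLong {
  // Write 7 bits per byte, setting the high bit while more bytes follow.
  def write(out: ByteArrayOutputStream, value: Long): Unit = {
    var v = value
    while ((v & ~0x7FL) != 0) {
      out.write(((v & 0x7F) | 0x80).toInt)
      v >>>= 7
    }
    out.write(v.toInt)
  }

  // Read bytes until one without the continuation bit, reassembling the value.
  def read(in: ByteArrayInputStream): Long = {
    var result = 0L
    var shift = 0
    var b = in.read()
    while ((b & 0x80) != 0) {
      result |= (b & 0x7FL) << shift
      shift += 7
      b = in.read()
    }
    result | (b.toLong << shift)
  }
}
```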
Author: Ankur Dave <ankurdave@gmail.com>
Closes#2503 from ankurdave/SPARK-3649 and squashes the following commits:
a49c2ad [Ankur Dave] [SPARK-3649] Remove GraphX custom serializers
Author: Cheng Hao <hao.cheng@intel.com>
Closes#3139 from chenghao-intel/comparison_test and squashes the following commits:
f5d7146 [Cheng Hao] avoid exception in printing the codegen enabled
When converting files to RDDs, there are three loops over the files sequence in the Spark source.
The loops over the files sequence are:
1. files.map(...)
2. files.zip(fileRDDs)
3. files-size.foreach
This is very time consuming when there are lots of files, so I made the following correction:
three loops over the files sequence => only one loop (see the sketch below)
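A hedged sketch of the single-pass refactor, with made-up helper names (the real change is in the streaming file-input code):

```scala
// Illustrative only: loadFile stands in for "convert one file into an RDD".
object SinglePassExample {
  def loadFile(path: String): Array[Byte] = path.getBytes("UTF-8")

  def main(args: Array[String]): Unit = {
    val files = Seq("a.txt", "b.txt", "c.txt")

    // Before: three separate traversals of `files`
    //   val rdds  = files.map(loadFile)
    //   val pairs = files.zip(rdds)
    //   files.foreach(f => println(s"queued $f"))

    // After: a single traversal that builds the pairs and does the logging as it goes
    val pairs = files.map { f =>
      println(s"queued $f")  // side effect folded into the same pass
      f -> loadFile(f)
    }
    pairs.foreach { case (f, data) => println(s"$f -> ${data.length} bytes") }
  }
}
```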
Author: surq <surq@asiainfo.com>
Closes#2811 from surq/SPARK-3954 and squashes the following commits:
321bbe8 [surq] updated the code style.The style from [for...yield]to [files.map(file=>{})]
88a2c20 [surq] Merge branch 'master' of https://github.com/apache/spark into SPARK-3954
178066f [surq] modify code's style. [Exceeds 100 columns]
626ef97 [surq] remove redundant import(ArrayBuffer)
739341f [surq] promote the speed of convert files to RDDS
This implements the feature davies mentioned in https://github.com/apache/spark/pull/2901#discussion-diff-19313312
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#3012 from adrian-wang/iso8601 and squashes the following commits:
50df6e7 [Daoyuan Wang] json data timestamp ISO8601 support
Author: Cheng Hao <hao.cheng@intel.com>
Closes#3114 from chenghao-intel/constant_null_oi and squashes the following commits:
e603bda [Cheng Hao] fix the bug of null value for primitive types
50a13ba [Cheng Hao] fix the timezone issue
f54f369 [Cheng Hao] fix bug of constant null value for ObjectInspector
It generates warnings at compile time. /cc marmbrus
Author: Xiangrui Meng <meng@databricks.com>
Closes#3192 from mengxr/dtc-decimal and squashes the following commits:
955e9fb [Xiangrui Meng] remove a decimal case branch that has no effect
In `HiveThriftServer2`, when an exception is thrown during a SQL execution, the SQL operation state should be set to `ERROR`, but now it remains `RUNNING`. This affects the result of the `GetOperationStatus` Thrift API.
Author: Cheng Lian <lian@databricks.com>
Closes#3175 from liancheng/fix-op-state and squashes the following commits:
6d4c1fe [Cheng Lian] Sets SQL operation state to ERROR when exception is thrown
This is a follow up of #2845. In addition to unit-tests.log files, also upload failure output files generated by `HiveCompatibilitySuite` to Jenkins master. These files can be very helpful to debug Hive compatibility test failures.
/cc pwendell marmbrus
Author: Cheng Lian <lian@databricks.com>
Closes#2993 from liancheng/upload-hive-compat-logs and squashes the following commits:
8e6247f [Cheng Lian] Uploads HiveCompatibilitySuite logs
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes#3185 from ueshin/issues/SPARK-4319 and squashes the following commits:
a44a38e [Takuya UESHIN] Enable an ignored test "null count".
Based on SPARK-2434, this PR generates runtime warnings for example implementations (Python, Scala) of PageRank.
Author: Varadharajan Mukundan <srinathsmn@gmail.com>
Closes#2894 from varadharajan/SPARK-4047 and squashes the following commits:
5f9406b [Varadharajan Mukundan] [SPARK-4047] - Point users to LogisticRegressionWithSGD and LogisticRegressionWithLBFGS instead of LogisticRegressionModel
252f595 [Varadharajan Mukundan] a. Generate runtime warnings for
05a018b [Varadharajan Mukundan] Fix PageRank implementation's package reference
5c2bf54 [Varadharajan Mukundan] [SPARK-4047] - Generate runtime warnings for example implementation of PageRank
pwendell rxin
Please take a look
Author: tedyu <yuzhihong@gmail.com>
Closes#3115 from tedyu/master and squashes the following commits:
2b079c8 [tedyu] SPARK-1297 Upgrade HBase dependency to 0.98
Author: Sandy Ryza <sandy@cloudera.com>
Closes#3107 from sryza/sandy-spark-4230 and squashes the following commits:
37a1d19 [Sandy Ryza] Clear up a couple things
34d53de [Sandy Ryza] SPARK-4230. Doc for spark.default.parallelism is incorrect
sbt-launch-lib.bash includes a `die` command, but it's not a valid command on Linux, Mac OS X, or Windows.
Closes#2898
Author: Jey Kottalam <jey@kottalam.net>
Closes#3182 from sarutak/SPARK-4312 and squashes the following commits:
24c6677 [Jey Kottalam] bash doesn't have "die"
Trying this example, I missed the moment when the checkpoint was initiated.
Author: comcmipi <pitonak@fns.uniba.sk>
Closes#2735 from comcmipi/patch-1 and squashes the following commits:
b6d8001 [comcmipi] Update RecoverableNetworkWordCount.scala
96fe274 [comcmipi] Update RecoverableNetworkWordCount.scala
Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatting issues along the way I believe.
Author: Sean Owen <sowen@cloudera.com>
Closes#2564 from srowen/SPARK-2548 and squashes the following commits:
0d0bf29 [Sean Owen] Update checkpoint call as in https://github.com/apache/spark/pull/2735
35f23e3 [Sean Owen] Remove old comment about running in standalone mode
179b3c2 [Sean Owen] Re-port RecoverableNetworkWordCount to Java example, and touch up doc / formatting in related examples
For me, the core tests failed because there are two locale-dependent parts in the code.
See the JIRA ticket for details.
Why is it necessary to check the exception message in isBindCollision in
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1686
?
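For illustration, the kind of locale dependence involved usually looks like this (my own example, not the patched code): number formatting and case conversion should pin an explicit Locale, or tests break under locales that use ',' as the decimal separator or Turkish casing rules.

```scala
import java.util.Locale

object LocaleSafe {
  // Fragile: "%.1f" renders as "1,5" under e.g. Locale.GERMANY, breaking string asserts.
  def fractionDefaultLocale(x: Double): String = "%.1f".format(x)

  // Robust: pin the locale so the output is identical everywhere.
  def fractionFixedLocale(x: Double): String = String.format(Locale.US, "%.1f", Double.box(x))

  // Robust: an explicit locale for case conversion avoids the Turkish dotless-i problem.
  def normalize(s: String): String = s.toLowerCase(Locale.ROOT)
}
```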
Author: Niklas Wilcke <1wilcke@informatik.uni-hamburg.de>
Closes#3036 from numbnut/core-test-fix and squashes the following commits:
1fb0d04 [Niklas Wilcke] Fixing locale dependend code and tests
marmbrus
Author: Xiangrui Meng <meng@databricks.com>
Closes#3164 from mengxr/hive-udt and squashes the following commits:
57c7519 [Xiangrui Meng] support udt->hive types (hive->udt is not supported)
Make Tachyon related unit tests execute without deploying a Tachyon system locally.
Author: RongGu <gurongwalker@gmail.com>
Closes#3030 from RongGu/SPARK-2703 and squashes the following commits:
ad08827 [RongGu] Make Tachyon related unit tests execute without deploying a Tachyon system locally
This commit exists to close the following pull requests on Github:
Closes#2898 (close requested by 'pwendell')
Closes#2212 (close requested by 'pwendell')
Closes#2102 (close requested by 'pwendell')
Author: Sandy Ryza <sandy@cloudera.com>
This patch had conflicts when merged, resolved by
Committer: Kay Ousterhout <kayousterhout@gmail.com>
Closes#2968 from sryza/sandy-spark-3179 and squashes the following commits:
dce4784 [Sandy Ryza] More review feedback
8d350d1 [Sandy Ryza] Fix test against Hadoop 2.5+
e7c74d0 [Sandy Ryza] More review feedback
6cff9c4 [Sandy Ryza] Review feedback
fb2dde0 [Sandy Ryza] SPARK-3179
andrewor14 Another try at SPARK-1209, to address https://github.com/apache/spark/pull/2814#issuecomment-61197619
I successfully tested with `mvn -Dhadoop.version=1.0.4 -DskipTests clean package; mvn -Dhadoop.version=1.0.4 test`. I assume that is what failed Jenkins last time. I also tried `-Dhadoop.version=1.2.1` and `-Phadoop-2.4 -Pyarn -Phive` for more coverage.
So this is why the class was put in `org.apache.hadoop` to begin with, I assume. One option is to leave this as-is for now and move it only when Hadoop 1.0.x support goes away.
This is the other option, which adds a call to force the constructor to be public at run-time. It's probably less surprising than putting Spark code in `org.apache.hadoop`, but, does involve reflection. A `SecurityManager` might forbid this, but it would forbid a lot of stuff Spark does. This would also only affect Hadoop 1.0.x it seems.
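A rough sketch of the reflection approach described above (an illustrative helper, not the exact Spark change):

```scala
object ReflectionUtil {
  // Find a (possibly non-public) constructor with the right arity and force it to be
  // callable. A strict SecurityManager could veto setAccessible, but that would also
  // forbid plenty of other things Spark does.
  def newInstanceForcingAccess[T](clazz: Class[T], args: AnyRef*): T = {
    val ctor = clazz.getDeclaredConstructors
      .find(_.getParameterTypes.length == args.length)
      .getOrElse(throw new NoSuchMethodException(clazz.getName))
    ctor.setAccessible(true)
    ctor.newInstance(args: _*).asInstanceOf[T]
  }
}
```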
Author: Sean Owen <sowen@cloudera.com>
Closes#3048 from srowen/SPARK-1209 and squashes the following commits:
0d48f4b [Sean Owen] For Hadoop 1.0.x, make certain constructors public, which were public in later versions
466e179 [Sean Owen] Disable MIMA warnings resulting from moving the class -- this was also part of the PairRDDFunctions type hierarchy though?
eb61820 [Sean Owen] Move SparkHadoopMapRedUtil / SparkHadoopMapReduceUtil from org.apache.hadoop to org.apache.spark
This commit exists to close the following pull requests on Github:
Closes#464 (close requested by 'JoshRosen')
Closes#283 (close requested by 'pwendell')
Closes#449 (close requested by 'pwendell')
Closes#907 (close requested by 'pwendell')
Closes#2478 (close requested by 'JoshRosen')
Closes#2192 (close requested by 'tdas')
Closes#918 (close requested by 'pwendell')
Closes#1465 (close requested by 'pwendell')
Closes#3135 (close requested by 'JoshRosen')
Closes#1693 (close requested by 'tdas')
Closes#1279 (close requested by 'pwendell')
Use "k" in javadoc of top and takeOrdered to avoid confusion with type K in pair RDDs. I think this resolves the discussion in SPARK-1344.
Author: Sean Owen <sowen@cloudera.com>
Closes#3168 from srowen/SPARK-1344 and squashes the following commits:
6963fcc [Sean Owen] Use "k" in javadoc of top and takeOrdered to avoid confusion with type K in pair RDDs
This is a trivial change to add links to the wiki from `README.md` and the main docs page. It is already linked to from spark.apache.org.
Author: Sean Owen <sowen@cloudera.com>
Closes#3169 from srowen/SPARK-971 and squashes the following commits:
dcb84d0 [Sean Owen] Add link to wiki from README, docs home page
In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()`, I don't think it makes sense to allow `stop() start()`.
The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.
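For example, the test-fixture pattern this enables looks roughly like the following (a sketch using the standard StreamingContext API):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopBeforeStartFixture {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("fixture")
    // StreamingContext constructs its own SparkContext here.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Tear down without ever calling start(): with this change the underlying
    // SparkContext is stopped too, so the fixture does not leak it.
    ssc.stop(stopSparkContext = true)
  }
}
```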
Prior discussions:
- https://github.com/apache/spark/pull/3053#discussion-diff-19710333R490
- https://github.com/apache/spark/pull/3121#issuecomment-61927353
Author: Josh Rosen <joshrosen@databricks.com>
Closes#3160 from JoshRosen/SPARK-4301 and squashes the following commits:
dbcc929 [Josh Rosen] Address more review comments
bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
832a7f4 [Josh Rosen] Address review comment
5142517 [Josh Rosen] Add tests; improve Scaladoc.
813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.
Author: Aaron Davidson <aaron@databricks.com>
Closes#3166 from aarondav/closeQuietlyer and squashes the following commits:
78096b5 [Aaron Davidson] Don't NPE on closeQuietly(null)
SPARK-1553 added alternating nonnegative least squares to MLlib; however, it's not possible to access it via the Python API. This pull request resolves that.
Author: Michelangelo D'Agostino <mdagostino@civisanalytics.com>
Closes#3095 from mdagost/python_nmf and squashes the following commits:
a6743ad [Michelangelo D'Agostino] Use setters instead of static methods in PythonMLLibAPI. Remove the new static methods I added. Set seed in tests. Change ratings to ratingsRDD in both train and trainImplicit for consistency.
7cffd39 [Michelangelo D'Agostino] Swapped nonnegative and seed in a few more places.
3fdc851 [Michelangelo D'Agostino] Moved seed to the end of the python parameter list.
bdcc154 [Michelangelo D'Agostino] Change seed type to java.lang.Long so that it can handle null.
cedf043 [Michelangelo D'Agostino] Added in ability to set the seed from python and made that play nice with the nonnegative changes. Also made the python ALS tests more exact.
a72fdc9 [Michelangelo D'Agostino] Expose nonnegative ALS in the python API.
This PR fixes sortBy()/sortByKey() on an empty RDD.
This should be backported into 1.1/1.2.
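The case being fixed, shown here with the Scala API for consistency with the other sketches (the patch itself targets the Python API; the expected behavior is simply an empty result):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits in this era of Spark

object EmptySortExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("empty-sort"))
    // Sorting an empty RDD should just return an empty result,
    // not fail while computing range-partition bounds.
    assert(sc.parallelize(Seq.empty[(Int, String)]).sortByKey().collect().isEmpty)
    assert(sc.parallelize(Seq.empty[Int]).sortBy(identity).collect().isEmpty)
    sc.stop()
  }
}
```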
Author: Davies Liu <davies@databricks.com>
Closes#3162 from davies/fix_sort and squashes the following commits:
84f64b7 [Davies Liu] add tests
52995b5 [Davies Liu] fix sortByKey() on empty RDD
This commit exists to close the following pull requests on Github:
Closes#3016 (close requested by 'andrewor14')
Closes#2798 (close requested by 'andrewor14')
Closes#2864 (close requested by 'andrewor14')
Closes#3154 (close requested by 'JoshRosen')
Closes#3156 (close requested by 'JoshRosen')
Closes#214 (close requested by 'kayousterhout')
Closes#2584 (close requested by 'andrewor14')
Array index out of bounds.
Author: xiao321 <1042460381@qq.com>
Closes#3153 from xiao321/patch-1 and squashes the following commits:
0ed17b5 [xiao321] Update JavaCustomReceiver.java
When doing an insert into a Hive table with partitions, the folders written to the file system are in a random order instead of the order defined in the table creation. It seems that the loadPartition method in Hive.java has a Map<String,String> parameter but expects to be called with a map that has a defined ordering, such as a LinkedHashMap. Working on a test but having IntelliJ problems.
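To illustrate the ordering distinction (an illustrative sketch, not the Spark/Hive code itself): a plain HashMap loses the declared partition-column order, while a LinkedHashMap preserves insertion order, which is what the partition directory layout needs.

```scala
import java.util.{HashMap => JHashMap, LinkedHashMap => JLinkedHashMap}

object PartitionSpecOrder {
  def main(args: Array[String]): Unit = {
    // Imagine a table declared as PARTITIONED BY (year, month, day).
    val spec = Seq("year" -> "2014", "month" -> "11", "day" -> "10")

    val hash = new JHashMap[String, String]()
    val linked = new JLinkedHashMap[String, String]()
    spec.foreach { case (k, v) => hash.put(k, v); linked.put(k, v) }

    // A plain HashMap's iteration order is effectively arbitrary, so directory paths
    // built from it (year=.../month=.../day=...) can come out in the wrong order.
    println(s"HashMap order:       ${hash.keySet()}")
    // A LinkedHashMap preserves insertion order, matching the table definition.
    println(s"LinkedHashMap order: ${linked.keySet()}")
  }
}
```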
Author: Matthew Taylor <matthew.t@tbfe.net>
Closes#3076 from tbfenet/partition_dir_order_problem and squashes the following commits:
f1b9a52 [Matthew Taylor] Comment format fix
bca709f [Matthew Taylor] review changes
0e50f6b [Matthew Taylor] test fix
99f1a31 [Matthew Taylor] partition ordering fix
369e618 [Matthew Taylor] partition ordering fix
`Cast` from `DateType` to `DecimalType` throws `NullPointerException`.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes#3134 from ueshin/issues/SPARK-4270 and squashes the following commits:
7394e4b [Takuya UESHIN] Fix Cast from DateType to DecimalType.
Currently, the data "unwrap" only supports a couple of primitive types, not all of them. This does not cause an exception, but adding the missing ones may gain some performance in table scanning for types like binary, date, timestamp, decimal, etc.
Author: Cheng Hao <hao.cheng@intel.com>
Closes#3136 from chenghao-intel/table_reader and squashes the following commits:
fffb729 [Cheng Hao] fix bug for retrieving the timestamp object
e9c97a4 [Cheng Hao] Add more unwrapper functions for primitive type in TableReader
The following description is quoted from the JIRA:
When I issue a hql query against a HiveContext where my predicate uses a column of string type with one of LT, LTE, GT, or GTE operator, I get the following error:
scala.MatchError: StringType (of class org.apache.spark.sql.catalyst.types.StringType$)
Looking at the code in org.apache.spark.sql.parquet.ParquetFilters, StringType is absent from the corresponding functions for creating these filters.
To reproduce, in a Hive 0.13.1 shell, I created the following table (at a specified DB):
create table sparkbug (
id int,
event string
) stored as parquet;
Insert some sample data:
insert into table sparkbug select 1, '2011-06-18' from <some table> limit 1;
insert into table sparkbug select 2, '2012-01-01' from <some table> limit 1;
Launch a spark shell and create a HiveContext to the metastore where the table above is located.
import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
hc.setConf("spark.sql.shuffle.partitions", "10")
hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
hc.setConf("spark.sql.parquet.compression.codec", "snappy")
import hc._
hc.hql("select * from <db>.sparkbug where event >= '2011-12-01'")
A scala.MatchError will appear in the output.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#3083 from sarutak/SPARK-4213 and squashes the following commits:
4ab6e56 [Kousuke Saruta] WIP
b6890c6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4213
9a1fae7 [Kousuke Saruta] Fixed ParquetFilters so that compare Strings
'DOUBLE' should be moved before 'ELSE' according to the ordering convention.
Author: Jacky Li <jacky.likun@gmail.com>
Closes#3080 from jackylk/patch-5 and squashes the following commits:
3c11df7 [Jacky Li] [SQL] Modify keyword val location according to ordering
Author: Michael Armbrust <michael@databricks.com>
Closes#3096 from marmbrus/reflectionContext and squashes the following commits:
adc221f [Michael Armbrust] Support ScalaReflection of schema in different universes
This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in the assembly jar to inspect Spark version. Currently, when built with Maven, the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 MANIFEST.MF, probably because of the assembly/shading tricks.
Another related PR is #3103, which tries to fix the MANIFEST issue.
Author: Cheng Lian <lian@databricks.com>
Closes#3105 from liancheng/spark-4225 and squashes the following commits:
d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version
Author: wangfei <wangfei1@huawei.com>
Closes#3127 from scwf/patch-9 and squashes the following commits:
e39a560 [wangfei] now support dynamic partitioning
This PR eliminates the network package's usage of the Java serializer and replaces it with Encodable, which is a lightweight binary protocol. Each message is preceded by a type id, which will allow us to change messages (by only adding new ones), or to change the format entirely by switching to a special id (such as -1).
This protocol has the advantage over Java that we can guarantee that messages will remain compatible across compiled versions and JVMs, though it does not provide a clean way to do schema migration. In the future, it may be good to use a more heavy-weight serialization format like protobuf, thrift, or avro, but these all add several dependencies which are unnecessary at the present time.
Additionally this unifies the RPC messages of NettyBlockTransferService and ExternalShuffleClient.
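A hedged sketch of the framing idea (simplified, with made-up message names; the real Encodable messages in the network module differ in detail): each message is written with its type id first, so the decoder can dispatch on it and a new message kind only needs a new id.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

sealed trait Message {
  def typeId: Byte
  def encode(buf: ByteBuffer): Unit
}

// Hypothetical message; real messages carry things like app/executor ids and block ids.
final case class AppHandshake(appId: String) extends Message {
  val typeId: Byte = 0
  def encode(buf: ByteBuffer): Unit = {
    val bytes = appId.getBytes(StandardCharsets.UTF_8)
    buf.putInt(bytes.length).put(bytes)
  }
}

object Message {
  // Encoding: leading type id byte, then the message-specific payload.
  def toByteBuffer(msg: Message, capacity: Int = 256): ByteBuffer = {
    val buf = ByteBuffer.allocate(capacity)
    buf.put(msg.typeId)
    msg.encode(buf)
    buf.flip()
    buf
  }

  // Decoding: dispatch on the type id; adding a message kind only adds a new id.
  def fromByteBuffer(buf: ByteBuffer): Message = buf.get().toInt match {
    case 0 =>
      val bytes = new Array[Byte](buf.getInt())
      buf.get(bytes)
      AppHandshake(new String(bytes, StandardCharsets.UTF_8))
    case other =>
      throw new IllegalArgumentException(s"Unknown message type: $other")
  }
}
```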
Author: Aaron Davidson <aaron@databricks.com>
Closes#3146 from aarondav/free and squashes the following commits:
ed1102a [Aaron Davidson] Remove some unused imports
b8e2a49 [Aaron Davidson] Add appId to test
538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages
This PR fixes `Utils.exceptionString` to output the full exception information. However, the stack trace may become very large, so I also updated the Web UI to collapse the error information by default (it displays the first line, and clicking `+detail` will display the full info).
Here are the screenshots:
Stages:
![stages](https://cloud.githubusercontent.com/assets/1000778/4882441/66d8cc68-6356-11e4-8346-6318677d9470.png)
Details for one stage:
![stage](https://cloud.githubusercontent.com/assets/1000778/4882513/1311043c-6357-11e4-8804-ca14240a9145.png)
The full information in the gray text field is:
```Java
org.apache.spark.shuffle.FetchFailedException: Connection reset by peer
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:166)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
```
/cc aarondav
Author: zsxwing <zsxwing@gmail.com>
Closes#3073 from zsxwing/SPARK-4204 and squashes the following commits:
176d1e3 [zsxwing] Add comments to explain the stack trace difference
ca509d3 [zsxwing] Add fullStackTrace to the constructor of ExceptionFailure
a07057b [zsxwing] Core style fix
dfb0032 [zsxwing] Backward compatibility for old history server
1e50f71 [zsxwing] Update as per review and increase the max height of the stack trace details
94f2566 [zsxwing] Change Utils.exceptionString to contain the inner exceptions and make the error information in Web UI more friendly
This relies on a hook from whoever is hosting the shuffle service to invoke removeApplication() when the application is completed. Once invoked, we will clean up all the executors' shuffle directories we know about.
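A rough sketch of what such a hook amounts to (illustrative only; the real shuffle service tracks registered executors and their local dirs):

```scala
import java.io.File
import scala.collection.concurrent.TrieMap

object ShuffleCleanupSketch {
  // appId -> local directories registered by that application's executors
  private val executorDirs = TrieMap.empty[String, Seq[File]]

  def registerExecutor(appId: String, localDirs: Seq[File]): Unit =
    executorDirs.put(appId, executorDirs.getOrElse(appId, Seq.empty) ++ localDirs)

  // Invoked by whoever hosts the shuffle service once the application completes.
  def removeApplication(appId: String): Unit =
    executorDirs.remove(appId).foreach(_.foreach(deleteRecursively))

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }
}
```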
Author: Aaron Davidson <aaron@databricks.com>
Closes#3126 from aarondav/cleanup and squashes the following commits:
33a64a9 [Aaron Davidson] Missing brace
e6e428f [Aaron Davidson] Address comments
16a0d27 [Aaron Davidson] Cleanup
e4df3e7 [Aaron Davidson] [SPARK-4236] Cleanup removed applications' files in shuffle service
This adds a RetryingBlockFetcher to the NettyBlockTransferService which is wrapped around our typical OneForOneBlockFetcher, adding retry logic in the event of an IOException.
This sort of retry allows us to avoid marking an entire executor as failed due to garbage collection or high network load.
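Conceptually, the retry wrapper behaves like the sketch below (a simplification; the actual RetryingBlockFetcher is asynchronous and only re-requests the blocks that have not yet succeeded):

```scala
import java.io.IOException

object RetryOnIOException {
  // Retry `op` up to `maxRetries` extra times, but only on IOException (e.g. a
  // connection reset while the remote executor is stalled in a GC pause).
  // Any other failure propagates immediately.
  def withRetries[T](maxRetries: Int, waitMs: Long)(op: => T): T =
    try op
    catch {
      case e: IOException if maxRetries > 0 =>
        Thread.sleep(waitMs)
        withRetries(maxRetries - 1, waitMs)(op)
    }
}
```

A caller would wrap a fetch as, e.g., `withRetries(3, 5000) { fetchBlock(blockId) }`, where `fetchBlock` and `blockId` stand in for the real fetch call.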
TODO:
- [x] unit tests
- [x] put in ExternalShuffleClient too
Author: Aaron Davidson <aaron@databricks.com>
Closes#3101 from aarondav/retry and squashes the following commits:
72a2a32 [Aaron Davidson] Add that we should remove the condition around the retry thingy
c7fd107 [Aaron Davidson] Fix unit tests
e80e4c2 [Aaron Davidson] Address initial comments
6f594cd [Aaron Davidson] Fix unit test
05ff43c [Aaron Davidson] Add to external shuffle client and add unit test
66e5a24 [Aaron Davidson] [SPARK-4238] [Core] Perform network-level retry of shuffle file fetches
Author: Aaron Davidson <aaron@databricks.com>
Closes#3142 from aarondav/worker and squashes the following commits:
3780bd7 [Aaron Davidson] Address comments
2dcdfc1 [Aaron Davidson] Add private[worker]
47f49d3 [Aaron Davidson] NettyBlockTransferService shouldn't care about app ids (it's only b/t executors)
258417c [Aaron Davidson] [SPARK-4277] Support external shuffle service on executor
I did not realize there was a `network.util.JavaUtils` when I wrote this code. This PR moves the `ByteBuffer` string conversion to the appropriate place. I tested the changes on a stable yarn cluster.
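The conversion being moved is essentially the following pair of helpers (a sketch; the actual JavaUtils methods may differ in detail):

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object ByteBufferStrings {
  // UTF-8 String -> ByteBuffer
  def stringToBytes(s: String): ByteBuffer =
    ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8))

  // ByteBuffer -> UTF-8 String (consumes the buffer's remaining bytes)
  def bytesToString(b: ByteBuffer): String = {
    val bytes = new Array[Byte](b.remaining())
    b.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }
}
```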
Author: Andrew Or <andrew@databricks.com>
Closes#3144 from andrewor14/yarn-shuffle-util and squashes the following commits:
b6c08bf [Andrew Or] Remove unused import
94e205c [Andrew Or] Use netty Unpooled
85202a5 [Andrew Or] Use guava Charsets
057135b [Andrew Or] Reword comment
adf186d [Andrew Or] Move byte buffer String conversion logic to JavaUtils