spark-instrumented-optimizer/network
Aaron Davidson 968ad97217 [SPARK-7003] Improve reliability of connection failure detection between Netty block transfer service endpoints
Currently we rely on the assumption that an exception will be raised and the channel closed if two endpoints cannot communicate over a Netty TCP channel. However, this guarantee does not hold in all network environments, and [SPARK-6962](https://issues.apache.org/jira/browse/SPARK-6962) seems to point to a case where only the server side of the connection detected a fault.

This patch improves the robustness of fetch/RPC requests by adding an explicit timeout in the transport layer, which closes the connection if there is a period of inactivity while requests are still outstanding.
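
The timeout rule described above can be sketched in isolation as follows. This is a minimal illustration, not the code from the patch: the class `IdleRequestMonitor` and its methods are hypothetical names, and the caller supplies timestamps explicitly so the logic can be exercised without a real channel.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Spark): decides whether a connection
 * should be closed because it has been inactive for longer than the
 * timeout while at least one request is still waiting for a response.
 */
final class IdleRequestMonitor {
    private final long timeoutNanos;
    private long lastActivityNanos;
    private int outstandingRequests;

    IdleRequestMonitor(long timeoutMillis, long nowNanos) {
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.lastActivityNanos = nowNanos;
    }

    /** Record that a fetch/RPC request was written to the channel. */
    void requestSent(long nowNanos) {
        outstandingRequests++;
        lastActivityNanos = nowNanos;
    }

    /** Record that a response (success or failure) arrived for a request. */
    void responseReceived(long nowNanos) {
        if (outstandingRequests > 0) {
            outstandingRequests--;
        }
        lastActivityNanos = nowNanos;
    }

    /**
     * True only when requests are outstanding AND the channel has been
     * inactive for longer than the timeout. An idle channel with nothing
     * pending is left open.
     */
    boolean shouldClose(long nowNanos) {
        return outstandingRequests > 0
            && nowNanos - lastActivityNanos > timeoutNanos;
    }
}
```

In a Netty pipeline, a check like `shouldClose` would typically be driven by idle events (for example from Netty's `IdleStateHandler`) rather than by polling. How the patch itself wires this up is not shown here.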

NB: Excluding the testing-related code, this patch adds only around 50 lines.

Author: Aaron Davidson <aaron@databricks.com>

Closes #5584 from aarondav/timeout and squashes the following commits:

8699680 [Aaron Davidson] Address Reynold's comments
37ce656 [Aaron Davidson] [SPARK-7003] Improve reliability of connection failure detection between Netty block transfer service endpoints
2015-04-20 09:54:21 -07:00
common [SPARK-7003] Improve reliability of connection failure detection between Netty block transfer service endpoints 2015-04-20 09:54:21 -07:00
shuffle [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. 2015-03-20 18:43:57 +00:00
yarn [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. 2015-03-20 18:43:57 +00:00