[SPARK-7810] [PYSPARK] solve python rdd socket connection problem

The method "_load_from_socket" in rdd.py cannot load data from the JVM socket when IPv6 is used; the current implementation only works with IPv4. This change makes it work with both protocols.

Author: Ai He <ai.he@ussuning.com>
Author: AiHe <ai.he@ussuning.com>

Closes #6338 from AiHe/pyspark-networking-issue and squashes the following commits:

d4fc9c4 [Ai He] handle code review 2
e75c5c8 [Ai He] handle code review
5644953 [AiHe] solve python rdd socket connection problem to jvm
Ai He 2015-06-29 14:36:26 -07:00 committed by Davies Liu
parent f6fc254ec4
commit ecd3aacf28

@@ -121,10 +121,22 @@ def _parse_memory(s):
 def _load_from_socket(port, serializer):
-    sock = socket.socket()
-    sock.settimeout(3)
+    sock = None
+    # Support for both IPv4 and IPv6.
+    # On most of IPv6-ready systems, IPv6 will take precedence.
+    for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
+        af, socktype, proto, canonname, sa = res
+        try:
+            sock = socket.socket(af, socktype, proto)
+            sock.settimeout(3)
+            sock.connect(sa)
+        except socket.error:
+            sock = None
+            continue
+        break
+    if not sock:
+        raise Exception("could not open socket")
     try:
-        sock.connect(("localhost", port))
         rf = sock.makefile("rb", 65536)
         for item in serializer.load_stream(rf):
             yield item
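
The fallback pattern in this change can be tried outside PySpark. The following is a minimal sketch, assuming Python 3; `connect_any_family` is a hypothetical helper name used only for illustration and is not part of PySpark. Unlike the patch, it closes each socket that fails to connect before trying the next address.

```python
import socket


def connect_any_family(host, port, timeout=3):
    """Connect to host:port over whichever address family works (hypothetical helper)."""
    last_error = None
    # AF_UNSPEC asks the resolver for both IPv4 and IPv6 candidates.
    for af, socktype, proto, _canonname, sa in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sa)
            return sock
        except OSError as exc:  # socket.error is an alias of OSError in Python 3
            last_error = exc
            if sock is not None:
                sock.close()  # do not leak the socket that failed to connect
    raise OSError("could not open socket to %s:%s (%s)" % (host, port, last_error))
```

When a server listens only on 127.0.0.1 and "localhost" resolves to ::1 first, the IPv6 attempt is refused and the loop moves on to the IPv4 address. For plain client connections, the standard library's `socket.create_connection` performs a similar loop over `getaddrinfo` results.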