Re: 'tcp: bind() fix when many ports are bound' problem

Previous thread: [PATCH] ipv4/route.c: respect prefsrc for local routes by Joel Sing on Monday, January 3, 2011 - 11:24 pm. (9 messages)

Next thread: [PATCH v2 01/10] net/fec: fix MMFR_OP type in fec_enet_mdio_write by Shawn Guo on Tuesday, January 4, 2011 - 2:24 am. (1 message)
From: Daniel Baluta
Date: Tuesday, January 4, 2011 - 1:53 am

Hi,

After a series of discussions [1], Eric provided
"tcp: bind() fix when many ports are bound" patch. ([2])

However, it was reverted because of the problem described in [3].
Were there any follow-ups on this patch?

I have spent some time looking at inet_csk_get_port, and my
only conclusion is that it's scary :D.

Should I start from the patch "tcp: bind() fix when many ports are bound"
and try to fix problem [3], or is that a dead end?

thanks,
Daniel.

[1] http://kerneltrap.org/mailarchive/linux-netdev/2010/4/20/6275120
[2] http://kerneltrap.org/mailarchive/git-commits-head/2010/4/24/32191
[3] http://kerneltrap.org/mailarchive/linux-kernel/2010/4/28/4563937
--

From: Gaspar Chilingarov
Date: Tuesday, January 4, 2011 - 2:12 am

Hi there!

Well, that looks strange.

On my side, I've just put in a workaround (manually binding to all ports
in sequence :) ) and moved the production code to FreeBSD, as it has a
more scalable network stack.
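
A workaround of the kind described above might look roughly like the following sketch. The helper name `bind_first_free_port`, the loopback address, and the port range are illustrative assumptions, not taken from the poster's actual code; the idea is only that the application walks the ports itself instead of relying on the kernel's automatic selection.

```python
import errno
import socket


def bind_first_free_port(sock, host, ports):
    """Try each port in order and return the first one bind() accepts."""
    for port in ports:
        try:
            sock.bind((host, port))
            return port
        except OSError as exc:
            # Skip ports that are already taken; re-raise anything else
            # (for example EACCES on privileged ports).
            if exc.errno != errno.EADDRINUSE:
                raise
    raise OSError(errno.EADDRINUSE, "no free port in the given range")


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # 127.0.0.1 and this port range are placeholder values.
    print(bind_first_free_port(s, "127.0.0.1", range(32768, 61000)))
    s.close()
```

A failed bind() leaves the socket unbound, so the same socket can simply be retried with the next port.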

I can see that bind() problem potentially affecting highly
loaded DNS servers/resolvers, which establish tons of outgoing UDP
connections.

In some cases those connections could fail, and since not receiving an
answer is a normal condition for DNS, this will go totally unnoticed.

I don't think anyone will hit this bug in a production environment,
except for very high-load applications.

/Gaspar




-- 
Gaspar Chilingarov

tel +37493 419763 (mobile - leave voice mail message)
icq 63174784
skype://gasparch
e mailto:nm@web.am mailto:gasparch@gmail.com
w http://gasparchilingarov.com/
--

From: Eric Dumazet
Date: Tuesday, January 4, 2011 - 4:22 am

Don't mix TCP and UDP; they are not the same.

The problem with TCP is that you can have TIME_WAIT sockets, which
prevent a port from being reused. Not so with UDP.

The connect() [without a previous bind()], or sendto() [without a
previous bind()], problem is more of an API problem.

When the kernel autobinds a UDP socket [to get a local IP/port], there
is a problem with the selection of the local address: it must be
ANY_ADDR (0.0.0.0).

For TCP, by contrast, the IP address won't change for the whole session.
The problem is that the port can really be random, while the local
address comes from the routing tables. To reach one destination, we
usually use one preferred IP address, even if many are available.

If you don't bind() a socket before sending a UDP frame, the kernel
cannot assume the local IP address won't change later (for other sent
frames, if routing takes another path), so it must use the ANY address
for the port selection done in autobind. Max 2^16-1 choices.
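
The autobind behaviour described above can be observed from user space with a small experiment. This is only a sketch: the helper name `autobound_name` and the destination 127.0.0.1:9 are illustrative assumptions. On Linux, the local name reported after an unbound sendto() is expected to carry the wildcard address with a kernel-chosen port.

```python
import socket


def autobound_name(dest=("127.0.0.1", 9)):
    """Send one datagram from an unbound UDP socket and return the
    local (address, port) pair the kernel assigned during autobind."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(b"x", dest)  # no bind() beforehand, so the kernel autobinds
        return s.getsockname()


if __name__ == "__main__":
    print(autobound_name())
```

Because the address part stays at the wildcard, every such socket competes for the same single port space, regardless of how many local addresses exist.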

If you have 100 IP addresses on your machine, it doesn't change this ANY
selection [for UDP] at all.

If you need more than 2^16 local endpoints and you have more than one
external IP address, the only portable way is to use bind() yourself and
manage a pool of [tuples]. Well, this is not true for some old OSes
(Solaris 2.5.1 comes to mind with TCP sockets)
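
One way to picture the "bind() yourself and manage a pool" approach is the sketch below. It assumes the pool consists of (local address, port) pairs, which is a reading of the message rather than something it spells out; the class name `EndpointPool` and its `acquire` method are hypothetical.

```python
import errno
import itertools
import socket


class EndpointPool:
    """Hand out explicitly bound UDP sockets from (address, port) pairs."""

    def __init__(self, addresses, ports):
        # Every (address, port) combination is a candidate local endpoint,
        # so N addresses give roughly N times the port space.
        self._candidates = itertools.product(addresses, ports)

    def acquire(self):
        """Return a socket bound to the next free candidate endpoint."""
        for addr, port in self._candidates:
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            try:
                s.bind((addr, port))
                return s
            except OSError as exc:
                s.close()
                # Skip endpoints that are already in use; re-raise the rest.
                if exc.errno != errno.EADDRINUSE:
                    raise
        raise OSError(errno.EADDRINUSE, "endpoint pool exhausted")
```

This sketch never returns endpoints to the pool; a real implementation would likely track released sockets so that their (address, port) pairs can be reused.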



--
