Hello all, this is about the same box as the message from Remi an hour ago, but most probably not related. I'm running a Teredo (RFC4830) relay on a i386 Xen domU with kernel 2.6.26 vanilla with the integrated pv_ops feature. This relay function is implemented in a userspace daemon called Miredo which provides a tun interface to the OS where native IPv6 for 2001::/32 is routed into. The traffic is then handled in the userspace daemon and emitted encapsulated in IPv4/UDP. Apart from a few scalability problems which seem to be related to the neighbor or route cache size it works fine. The machine is doing around 2kpps of IPv6 traffic (which means 4kpps of IPv4+IPv6). As there are a couple of similar relays globally anycasted I'm supposed to withdraw the route from BGP if the daemon or the machine fails. To do this I'm running ripngd from the Quagga routing suite, which announces the Teredo prefix to my core routers using RIPng (RFC2080). On a kernel level RIPng is basically periodic UDP to a link-local multicast address [ff02::9]:521. Every few hours this announcement fails (no announcements reach the core routers anymore, which kill the routing entry after a timeout). ripngd debugging claims that it could not send the announcement due to "Invalid argument". There are no outgoing packets in tcpdump anymore. I even get the same error when doing a multicast ping6: miredo:~# ping6 -I eth0 ff02::9 PING ff02::9(ff02::9) from fe80::216:3eff:feb9:29f5 eth0: 56 data bytes ping: sendmsg: Invalid argument 64 bytes from fe80::216:3eff:feb9:29f5: icmp_seq=1 ttl=64 time=0.030 ms 64 bytes from fe80::216:3eff:feb9:29f5: icmp_seq=1 ttl=64 time=0.018 ms (DUP!) (fe80::216:3eff:feb9:29f5 is the box itself, it's the only one that ever answers ... duplicate however) ping6 to other multicast addresses, even in the same scope works fine miredo:~# ping6 -I eth0 ff02::2 PING ff02::2(ff02::2) from fe80::216:3eff:feb9:29f5 ...
Well, it'd certainly be good to identify exactly where in
the sendmsg path you're getting the EINVAL from. If
possible, I'd suggest putting in some debugging code
and reproducing it.
If you can pick up the exact arguments when it's happening
via strace and send those here, that may help, but I wouldn't
expect those to be incorrect from ping6 only some of the
time...
I'll look into this some more if someone doesn't beat me to
it, but I'm travelling now so it will be a few days at least before
I get to it.
+-DLS
--
Do you have any suggestions/patches? I'm not an experienced programmer,
and since it usually takes a couple of hours for this problem to appear
I probably would need a few years to get anything reasonable.
Btw, it's happening again just right now (2.6.27-rc5), I've set a static
route so I can run some debugging without affecting the service. So if
you have any debugging commands you would like me to run this is the
The working group:
# strace -e recvmsg,sendmsg ping6 -c 1 -I eth0 ff02::2
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\22\37\274H4x\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\22\37\274H4x\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 256
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\22\37\274H4x\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
sendmsg(3, {msg_name(28)={sa_family=AF_INET6, sin6_port=htons(58), inet_pton(AF_INET6, "ff02::2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, msg_iov(1)=[{"\200\0\0\0004x\0\1\22\37\274H\366\v\5\0\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30"..., 64}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_IPV6, cmsg_type=, ...}, msg_flags=MSG_PEEK|MSG_PROXY|MSG_WAITALL|MSG_CONFIRM|MSG_FIN|MSG_SYN|MSG_RST|MSG_CMSG_CLOEXEC|0x8bc0000}, 0) = 64
recvmsg(3, {msg_name(28)={sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "fe80::216:3eff:feb9:29f5", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=if_nametoindex("eth0")}, msg_iov(1)=[{"\201\0\305\n4x\0\1\22\37\274H\366\v\5\0\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30"..., 4208}], msg_controllen=36, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 64
The non-working RIPng ...BTW, these are the sendmsg() calls from ripngd (strace attached to the
running process):
sendmsg(5, {msg_name(28)={sa_family=AF_INET6, sin6_port=htons(521), inet_pton(AF_INET6, "ff02::9", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, msg_iov(1)=[{"\2\1\0\0 \1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 \1"..., 24}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_IPV6, cmsg_type=, ...}, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)
Regards,
Bernhard
--
It seems that in the latter case, you haven't specified the interface (sin6_scope_id=0), but in the former case you have. You can't send to link-local multicast groups if you have multiple interface if the interface isn't specified. -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings --
On Mon, Sep 01, 2008 at 08:57:51PM +0300, Pekka Savola wrote: I did specify the interface: miredo:~# ping6 -I eth0 ff02::2 PING ff02::2(ff02::2) from fe80::216:3eff:feb9:29f5 eth0: 56 data bytes 64 bytes from fe80::216:3eff:feb9:29f5: icmp_seq=1 ttl=64 time=0.034 ms [...] miredo:~# ping6 -I eth0 ff02::9 PING ff02::9(ff02::9) from fe80::216:3eff:feb9:29f5 eth0: 56 data bytes ping: sendmsg: Invalid argument 64 bytes from fe80::216:3eff:feb9:29f5: icmp_seq=1 ttl=64 time=0.033 ms I have no idea what the different flags mean or where they come from. Also please note that broken ping6 is just a symptom, the real problem is ripngd suddenly not being able to send to the multicast group anymore. Bernhard --
Sorry, as a matter of fact in the quoted text, both sin6_scope_id was 0. I was looking at recvmsg where it was different. -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings --
Well, at least in the ping6 sources I have, msg_flags is never initialized before the sendmsg() call, and since it's allocated on the stack it can have random bits set. Can you rebuild your ping6 with the attached patch and retry? -Brian
Done, no change.
sendmsg(3, {msg_name(28)={sa_family=AF_INET6, sin6_port=htons(58), inet_pton(AF_INET6, "ff02::2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, msg_iov(1)=[{"\200\0\0\0\3252\0\0010T\275H\274\314\7\0\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30"..., 64}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_IPV6, cmsg_type=, ...}, msg_flags=0}, 0) = 64
vs.
sendmsg(3, {msg_name(28)={sa_family=AF_INET6, sin6_port=htons(58), inet_pton(AF_INET6, "ff02::9", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, msg_iov(1)=[{"\200\0\0\0\3162\0\1+T\275H\255K\16\0\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30"..., 64}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_IPV6, cmsg_type=, ...}, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)
don't push too hard on ping6, I just included it to show that all
processes are affected sending to this particular group.
Bernhard
--
That was just the obvious answer to why the flags were different. Since EINVAL is too generic to point at one place in the kernel code path, I'd second David Stevens' suggestion of finding where in the sendmsg() code this is coming from. Maybe you can trace the miredo daemon to see what it's doing that might fix the problem if you don't want to start hacking in the kernel. -Brian --
Bernhard,
I looked at this some more and didn't see anything obvious.
The send side doesn't need group membership to send, or anything
special, really. The only thing that comes to mind is that maybe you
have a bogus route installed (since you don't have a bogus interface
flag :-)).
Can you do an "ip -6 route list" when it's happening?
Also might be worthwhile to see the entire arg list, so
maybe using the "-s" option to strace to increase it, and we
probably only need sendmsg(), so maybe:
strace -s 1024 -e trace=sendmsg -e verbose=sendmsg ping6 -I eth0 ....
+-DLS
--
On Mon, Sep 08, 2008 at 05:34:00PM -0700, David Stevens wrote:
Sure, here we go
miredo:~# ip -6 route list
2001::/32 via fe80::1 dev teredo metric 1024 mtu 1280 advmss 1220 hoplimit 4294967295
2001:1b10:100::1:1 via fe80::2c0:9fff:fe4b:8ccf dev eth0 proto zebra metric 2 mtu 1500 advmss 1440 hoplimit 4294967295
2001:1b10:100::1:2 via fe80::2c0:9fff:fe4b:8a4d dev eth0 proto zebra metric 2 mtu 1500 advmss 1440 hoplimit 4294967295
2001:1b10:100::21:1 via fe80::2c0:9fff:fe4b:8a4d dev eth0 proto zebra metric 2 mtu 1500 advmss 1440 hoplimit 4294967295
2001:1b10:100::53:1 via fe80::2c0:9fff:fe4b:8ccf dev eth0 proto zebra metric 2 mtu 1500 advmss 1440 hoplimit 4294967295
2001:1b10:100::119:1 via fe80::2c0:9fff:fe4b:8a4d dev eth0 proto zebra metric 2 mtu 1500 advmss 1440 hoplimit 4294967295
2001:1b10:100::1:9000:1 via fe80::2c0:9fff:fe4b:8a4d dev eth0 proto zebra metric 2 mtu 1500 advmss 1440 hoplimit 4294967295
2001:1b10:100:3::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 4294967295
fe80::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 4294967295
fe80::/64 dev teredo proto kernel metric 256 mtu 1280 advmss 1220 hoplimit 4294967295
ff00::/8 dev eth0 metric 256 mtu 1500 advmss 1440 hoplimit 4294967295
ff00::/8 dev teredo metric 256 mtu 1280 advmss 1220 hoplimit 4294967295
Working (all-hosts):
sendmsg(3, {msg_name(28)={sa_family=AF_INET6, sin6_port=htons(58), inet_pton(AF_INET6, "ff02::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, msg_iov(1)=[{"\200\0\0\0\25S\0\3\0\305\305H\212/\r\0\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37 !\"#$%&'()*+,-./01234567"..., 64}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_IPV6, cmsg_type=, ...}, msg_flags=0}, MSG_CONFIRM) = 64
Non-working (RIPng group):
sendmsg(3, {msg_name(28)={sa_family=AF_INET6, sin6_port=htons(58), inet_pton(AF_INET6, "ff02::9", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, ...Did you specify the interface on those?
I don't know whether or not I should trust that sin_scope_id==0,
which Brian also mentioned.
For link-local addresses, it must be specified, though you can
get there from an SO_BINDTODEVICE too. The error if it isn't
there is EINVAL, but sin_scope_id is showing as 0 for both cases.
The send side shouldn't actually care anything about the group
number unless they matched different routes, which I don't see
in your routing table.
So, I think it's back to running it on a debug kernel that prints
something distinguishing from the different EINVAL cases
to narrow it down.
+-DLS
--
On Tue, 9 Sep 2008 02:38:53 +0200, Bernhard Schmidt <berni@birkenwald.de>
^^^^^^
Uho... that interface is not multicast-capable. Not sure how the kernel
handles the conflicting routes.
--
Rémi Denis-Courmont
--
Multicast programs shouldn't rely on the routing
table at all, IMAO. Unicast routing has nothing at all
to do with multicast routing, where a single address
means copying and forwarding to multiple different
segments, and the address has nothing at all to do with
the routing topology.
That's why the scope_id should be set (and to
the proper index for eth0), but I think either strace
is lying to us here or they're using SO_BINDTODEVICE
or IP_MULTICAST_IF.
But when it's breaking, I imagine it is ending
up on the wrong interface and thus the EINVAL... I
just don't see how some addresses still work while
others fail at the same time. :-)
+-DLS
--
I agree with you that it should not rely on it. However, it obviously did (in some way): miredo:~# ip -6 route del ff00::/8 dev teredo metric 256 mtu 1280 advmss 1220 hoplimit 4294967295 miredo:~# ping6 -I eth0 ff02::9 PING ff02::9(ff02::9) from fe80::216:3eff:feb9:29f5 eth0: 56 data bytes 64 bytes from fe80::216:3eff:feb9:29f5: icmp_seq=1 ttl=64 time=0.098 ms 64 bytes from fe80::2c0:9fff:fe4b:8ccf: icmp_seq=1 ttl=64 time=0.453 ms (DUP!) 64 bytes from fe80::20c:86ff:fe9a:3819: icmp_seq=1 ttl=64 time=0.467 ms (DUP!) 64 bytes from fe80::20c:86ff:fe9a:2819: icmp_seq=1 ttl=64 time=0.472 ms (DUP!) I'm not convinced that this route was the culprit though, I think it might be related to some sort of routing-table locking foo that got resolved when I changed something. I'll keep it running (will take one or two days max to reappear) and try deleting an arbitrary route when it happens again. Bernhard --
Bernhard,
It'll definitely fall back to using the routing table
when the interface is not specified already, which is why
I was ranting about it (I think it should fail without a unicast
routing table lookup in that case, and always require the
application to provide the interface).
What is odd about your case is you specified
"-I eth0" on the ping6, so it should never be doing a
route lookup (but it also should have a nonzero scope_id).
That could be a bug in ping6. If it's passing
uninitialized flags to sendmsg, it clearly needs some work.
Then the question is whether RIPng is explicitly
providing the interface for the sendmsg or not. If it isn't,
that's the problem, but I'd expect it to-- to operate correctly,
it needs to know what interface it's talking on.
+-DLS
--
I wonder if it's relevant that msg_iov in working case has MSG_CONFIRM but in non-working case this is zero? I guess not but apart from the contents of msg_iov, that's the only difference.. -- Pekka Savola "You each name yourselves king, yet the Netcore Oy kingdom bleeds." Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings --
From: Pekka Savola <pekkas@netcore.fi> MSG_CONFIRM refreshes the route's neighbour ->confirmed timestamp with the current value of jiffies. I really can't see how that could lead to the observed behavior, but who knows at this point :) --
| Jesse Barnes | Re: [stable] [BUG][PATCH] cpqphp: fix kernel NULL pointer dereference |
| Greg KH | [003/136] p54usb: add Zcomax XG-705A usbid |
| Magnus Damm | [PATCH 03/07] ARM: Use shared GIC entry macros on Realview |
| Oliver Neukum | Re: [Bug #13682] The webcam stopped working when upgrading from 2.6.29 to 2.6.30 |
| Martin Schwidefsky | Re: [PATCH] optimized ktime_get[_ts] for GENERIC_TIME=y |
git: | |
| Junio C Hamano | Re: Some advanced index playing |
| Jeff King | Re: confusion over the new branch and merge config |
| Robin Rosenberg | Re: cvs2svn conversion directly to git ready for experimentation |
| Linus Torvalds | git binary size... |
| Ævar Arnfjörð Bjarmason | Re: Challenge with Git-Bash |
| Linux Kernel Mailing List | md: move allocation of ->queue from mddev_find to md_probe |
| Linux Kernel Mailing List | md: raid0: Represent zone->zone_offset in sectors. |
| Linux Kernel Mailing List | [ARM] S3C24XX: Add gpio_to_irq() facility |
| Linux Ke |
