Jul 8 18:03:59 doris kernel: ------------[ cut here ]------------
Jul 8 18:03:59 doris kernel: kernel BUG at fs/sysfs/file.c:540!
Jul 8 18:03:59 doris kernel: invalid opcode: 0000 [#1] PREEMPT SMP
Jul 8 18:03:59 doris kernel: last sysfs file: /sys/devices/virtual/misc/tun/dev
Jul 8 18:03:59 doris kernel: Modules linked in: tun snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device vboxnetadp vboxnetflt vboxdrv ipv6 cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq speedstep_lib freq_table ipt_REJECT ipt_LOG xt_limit xt_recent xt_state xt_tcpudp iptable_mangle iptable_nat iptable_filter nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack ip_tables x_tables fuse nls_utf8 loop snd_hda_codec_realtek arc4 snd_hda_intel ecb snd_hda_codec iwlagn snd_pcm iwlcore snd_timer mac80211 snd cfg80211 soundcore intel_agp video usb_storage agpgart i2c_i801 rfkill snd_page_alloc output button battery ac joydev sg evdev tg3 edd ext4 jbd2 crc16 sha256_generic aes_i586 aes_generic cbc dm_crypt linear rtc_cmos uhci_hcd rtc_core rtc_lib sd_mod crc_t10dif ehci_hcd usbcore dm_snapshot dm_mod fan processor thermal [last unloaded: tun]
Jul 8 18:03:59 doris kernel:
Jul 8 18:03:59 doris kernel: Pid: 4320, comm: tunctl Not tainted 2.6.34.1 #3 Kuril /40684JG
Jul 8 18:03:59 doris kernel: EIP: 0060:[<c03129e1>] EFLAGS: 00010246 CPU: 0
Jul 8 18:03:59 doris kernel: EIP is at sysfs_create_file+0x21/0x30
Jul 8 18:03:59 doris kernel: EAX: 00000000 EBX: f66a73c0 ECX: f66a72bc EDX: f81587b4
Jul 8 18:03:59 doris kernel: ESI: 00000000 EDI: f67a7f00 EBP: f68ade90 ESP: f68ade90
Jul 8 18:03:59 doris kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 8 18:03:59 doris kernel: Process tunctl (pid: 4320, ti=f68ac000 task=f66e8f40 task.ti=f68ac000)
Jul 8 18:03:59 doris kernel: Stack:
Jul 8 18:03:59 doris kernel: f68ade98 c03e6da3 f68adf00 f8157fe0 f8158700 f48496c0 00000001 f66a7000
Jul 8 18:03:59 doris ...
Network namespaces didn't go in until 2.6.35-rc, how is this working on .34.1? Can you verify this works properly in .35-rc4? thanks, greg k-h --
Hi,

On Fri, 9 Jul 2010 16:57:44 -0700

Huh? Network namespaces have been there for quite some time - just for fun I grabbed a random older kernel (2.6.27.48, as it happened) to check. CONFIG_NET_NS was also in there (but it was defined in net/Kconfig and had a dependency on !SYSFS). In 2.6.34 it is defined in init/Kconfig (and has no dependency on SYSFS at all).

Indeed, it works there (in 2.6.35-rc4).

There are several differences between 2.6.34.1 and 2.6.35-rc4 in the fs/sysfs/* code as well as in e.g. net/core/net_namespace.c. I'll bisect that (but not on this Atom netbook I have used so far ;-) ).

-- MfG, Michael Leun --
I'd not recommend using it until .35-rc because of the sysfs changes required. sorry, greg k-h --
On Sat, 10 Jul 2010 07:08:00 -0700

...and knowing now that basic sysfs support for netns was missing all along, it works surprisingly well most of the time. One point might be that I never used interfaces with the same name in [...]

git bisect seems to have a somewhat pessimistic nature - it only wants to help find out when something got broken, not when something got fixed (the good revision cannot be newer than the bad one). So I had to do some manual mumbo jumbo to find out that a1b3f594dc5faab91d3a218c7019e9b5edd9fe1a seems to be the one finally fixing it, but of course it depends on a zillion other ones, most notably the ones adding sysfs support for namespaces at all...

I think I'll manage to understand that they will not fit in a stable [...]

As I said, I have been using it for quite a while now, and it works surprisingly well (knowing what is missing) in the scenarios I have used so far.

Thank you very much for your support.

-- MfG, Michael Leun --
Hi,

On Sat, 10 Jul 2010 16:52:08 +0200
[bug solved in 2.6.35-rcX - I used 2.6.34.1]

Now that we have solved that last one, I have another glitch (this time using 2.6.35-rc4):

In a network namespace I can use a tun/tap tunnel through ssh, and when closing that namespace everything is fine. But when using openvpn (also tunnelling through tun/tap) in a network namespace and then closing that namespace, I get:

unregister_netdevice: waiting for lo to become free [repeated]

Please see the following two examples showing that difference:

# > unshare -n /bin/bash
# > # how to setup veth device pair to get connectivity into namespace not shown here
# > tunctl -u ml -t tap1
# > ssh -o Tunnel=Ethernet -w 1:1 somewhere
[ running some traffic over tap1 not shown here ]
^d # logging out from somewhere
# > tunctl -d tap1
# > exit # logging out from shell in network namespace

Now the veth device pair used automagically vanishes and nothing from that different network namespace remains - very well.

but

# > unshare -n /bin/bash
# > # how to setup veth device pair to get connectivity into namespace not shown here
# > openvpn --config some.config
[ running some traffic over vpn device not shown here ]
^c # stopping openvpn
# > lsof -i
# > netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
# > ps ax|grep openvpn|grep -v grep
# > # cannot find anything that suggests there is anything left from that openvpn session
# > exit # logging out from shell in network namespace

Now I get

Jul 10 20:02:36 doris kernel: unregister_netdevice: waiting for lo to become free. Usage count = 3
[repeated]

Now one might say it is the fault of openvpn (used OpenVPN 2.1_rc20 i586-suse-linux - the one in the openSuSE 11.2 package): openvpn didn't close some resource, and ssh does fine. But: ...
Yes, you are correct. Care to resend all of this to the network-namespace developer(s) and the netdev mailing list so that the correct people are notified so they can fix it all? thanks, greg k-h --
Hi,

On Sat, 10 Jul 2010 16:52:08 +0200
[bug (no sysfs support for net namespaces at all) solved in 2.6.35-rcX - I used 2.6.34.1]

Now that we have solved that last one, I have another glitch (this time using 2.6.35-rc4):

In a network namespace I can use a tun/tap tunnel through ssh, and when closing that namespace everything is fine. But when using openvpn (also tunnelling through tun/tap) in a network namespace and then closing that namespace, I get:

unregister_netdevice: waiting for lo to become free [repeated]

Please see the following two examples showing that difference:

# > unshare -n /bin/bash
# > # how to setup veth device pair to get connectivity into namespace not shown here
# > tunctl -u ml -t tap1
# > ssh -o Tunnel=Ethernet -w 1:1 somewhere
[ running some traffic over tap1 not shown here ]
^d # logging out from somewhere
# > tunctl -d tap1
# > exit # logging out from shell in network namespace

Now the veth device pair used automagically vanishes and nothing from that different network namespace remains - very well.

but

# > unshare -n /bin/bash
# > # how to setup veth device pair to get connectivity into namespace not shown here
# > openvpn --config some.config
[ running some traffic over vpn device not shown here ]
^c # stopping openvpn
# > lsof -i
# > netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
# > ps ax|grep openvpn|grep -v grep
# > # cannot find anything that suggests there is anything left from that openvpn session
# > exit # logging out from shell in network namespace

Now I get

Jul 10 20:02:36 doris kernel: unregister_netdevice: waiting for lo to become free. Usage count = 3
[repeated]

Now one might say it is the fault of openvpn (used OpenVPN 2.1_rc20 i586-suse-linux - the one in the openSuSE 11.2 package): openvpn didn't close ...
Hi,

On Sun, 11 Jul 2010 19:29:39 +0200
Michael Leun <lkml20100708@newton.leun.net> wrote:

Didn't work. I got no reaction from the network mailing list at all, and the bug is still in 2.6.35.

-- MfG, Michael Leun --
Eric, here's a bug with the network namespace stuff, care to work on resolving it? thanks, greg k-h --
Does this repeat indefinitely, or are there only a couple of repetitions? If this repeats indefinitely, every 5 seconds or so, we have a serious bug. Otherwise we just have cleanup taking longer than it should, which isn't [...]

Greg, thanks for forwarding this in my direction.

Eric --
Hi,

On Wed, 4 Aug 2010 14:46:18 -0700

Just in case, I provide the complete scenario again below. If I can help somehow (provide further information, test something...), of course I'll happily do so.

In a network namespace I can use a tun/tap tunnel through ssh, and when closing that namespace everything is fine. But when using openvpn (also tunnelling through tun/tap) in a network namespace and then closing that namespace, I get:

unregister_netdevice: waiting for lo to become free [repeated]

Please see the following two examples showing that difference:

# > unshare -n /bin/bash
# > # how to setup veth device pair to get connectivity into namespace not shown here
# > tunctl -u ml -t tap1
# > ssh -o Tunnel=Ethernet -w 1:1 somewhere
[ running some traffic over tap1 not shown here ]
^d # logging out from somewhere
# > tunctl -d tap1
# > exit # logging out from shell in network namespace

Now the veth device pair used automagically vanishes and nothing from that different network namespace remains - very well.

but

# > unshare -n /bin/bash
# > # how to setup veth device pair to get connectivity into namespace not shown here
# > openvpn --config some.config
[ running some traffic over vpn device not shown here ]
^c # stopping openvpn
# > lsof -i
# > netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
# > ps ax|grep openvpn|grep -v grep
# > # cannot find anything that suggests there is anything left from that openvpn session
# > exit # logging out from shell in network namespace

Now I get

Jul 10 20:02:36 doris kernel: unregister_netdevice: waiting for lo to become free. Usage count = 3
[repeated]

Now one might say it is the fault of openvpn (used OpenVPN 2.1_rc20 i586-suse-linux - the one in the openSuSE 11.2 package - EDIT: meanwhile it is 2.1.1, openSuSE 11.3), ...
We do, and the only place you will see:

unregister_netdevice: waiting for lo to become free. Usage count = 3
[repeated]

is when a network namespace is being cleaned up. However, it looks like something is either taking a long time to get cleaned up, or there is a bug and something is failing to get cleaned up altogether, resulting in an infinite stream of messages about waiting for lo to become free.

I know of cases where a recent kernel can be slow to clean up everything attached to lo. I don't know of any cases where it will actually fail to clean up lo. So I suspect all you are seeing is a cleanup process that is slow and annoying, not wrong.

Eric --
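Eric's distinction between a cleanup that is merely slow and a reference that is never dropped can be illustrated with a toy model. The sketch below is a hypothetical simulation, not kernel code: `struct model_dev`, `struct holder` and `wait_for_refs()` are invented for illustration, and the warning interval is just a parameter (the logs in this thread show roughly one message every 10 seconds). A slow holder eventually lets the count reach zero after a few warnings; a leaked one keeps the warnings coming for as long as the simulation runs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model, not kernel code: a device being unregistered
 * cannot be freed while any holder still references it. */
struct model_dev {
    const char *name;
    int refcnt; /* references still held by other objects */
};

/* One reference holder. It drops its reference at tick `release_after`;
 * a negative value models a leaked reference that is never dropped. */
struct holder {
    int release_after;
    int released;
};

/* Simulate the wait for up to `max_ticks` ticks (think: seconds).
 * Every `warn_every` ticks with references outstanding, print a warning in
 * the style of the log lines in this thread and count it in *warnings.
 * Returns the tick at which the count reached zero, or -1 if it never did. */
int wait_for_refs(struct model_dev *dev, struct holder *h, int nholders,
                  int max_ticks, int warn_every, int *warnings)
{
    *warnings = 0;
    for (int t = 0; t <= max_ticks; t++) {
        for (int i = 0; i < nholders; i++) {
            if (!h[i].released && h[i].release_after >= 0 &&
                t >= h[i].release_after) {
                h[i].released = 1;
                dev->refcnt--;
            }
        }
        if (dev->refcnt == 0)
            return t; /* slow but finite cleanup */
        if (t > 0 && t % warn_every == 0) {
            printf("unregister_netdevice: waiting for %s to become free. "
                   "Usage count = %d\n", dev->name, dev->refcnt);
            (*warnings)++;
        }
    }
    return -1; /* still held: with a real leak this would never end */
}
```

For example, with one holder releasing at tick 3 and another at tick 25, a 10-tick warning interval produces two warnings and the wait ends at tick 25; a single leaked holder produces a warning every 10 ticks until the simulation gives up.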
Hi,

On Wed, 04 Aug 2010 17:12:29 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

First, thank you very much for picking that up (and, of course, for [...]

Unfortunately, it looks like indefinitely. I haven't watched it any longer so far (I rebooted soon after), but I am now seeing this message repeated every 10 seconds for ~10 minutes on an idle system.

Additionally, when testing this I found another one (I accidentally started my firewall script in that namespace...), involving netfilter RECENT:

Aug 5 11:19:47 doris kernel: [ 218.420238] ------------[ cut here ]------------
Aug 5 11:19:47 doris kernel: [ 218.420256] kernel BUG at net/netfilter/xt_recent.c:609!
Aug 5 11:19:47 doris kernel: [ 218.420268] invalid opcode: 0000 [#1] PREEMPT SMP
Aug 5 11:19:47 doris kernel: [ 218.420284] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT0/charge_full
Aug 5 11:19:47 doris kernel: [ 218.420295] Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ipt_REJECT ipt_LOG xt_limit xt_recent iptable_mangle iptable_nat iptable_filter nf_conntrack_ipv6 xt_state xt_tcpudp ip6table_filter ip6_tables nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack ip_tables x_tables fuse nls_utf8 loop arc4 ecb iwlagn iwlcore mac80211 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm cfg80211 snd_timer snd usb_storage iTCO_wdt soundcore tg3 sg iTCO_vendor_support snd_page_alloc rfkill i2c_i801 pcspkr battery ac ext4 jbd2 crc16 sha256_generic aes_generic cbc dm_crypt linear i915 drm_kms_helper drm i2c_algo_bit sd_mod video intel_agp button dm_snapshot dm_mod fan processor ata_piix ahci libahci libata scsi_mod thermal thermal_sys
Aug 5 11:19:47 doris kernel: [ 218.420529]
Aug 5 11:19:47 doris kernel: [ 218.420542] Pid: 13, comm: netns Not tainted 2.6.35 #2 Kuril /40684JG
Aug 5 11:19:47 doris kernel: [ 218.420557] EIP: ...
Ugh. A real bug then. These can be a pain to track down and fix. I think the last one of these I tracked down took a couple of weeks. I [...]

Michael, this is on 2.6.35?

Alexey, can you look at this BUG_ON? It looks like there has been a regression.

Eric --
On Thu, 05 Aug 2010 02:51:29 -0700 As I said, if I can do anything to support you, testing or so, please let me know. Yup - almost vanilla 2.6.35, only patches for aufs (union filesystem) [...] -- MfG, Michael Leun --
On Thu, 05 Aug 2010 02:51:29 -0700

OK, fortunately (hopefully) you have not put too much time into that so far, because everything I said about the use of tun and the difference between ssh and openvpn is complete nonsense.

I happen to have a script in that openvpn config which puts an ipv6 address on the vpn device. Putting an ipv6 address on a device seems to be the trigger:

OrigNS > # ip link add type veth
OrigNS > # ip link set dev veth0 up
OrigNS > # unshare -n /bin/bash
NewNS > # echo $$
<SomePID>
OrigNS > # ip link set dev veth1 netns <SomePID> # this, of course, is on a different terminal
NewNS > # ip link set dev veth1 up
NewNS > # ip -6 addr add dev veth1 fd50:dead:beef::1/64
NewNS > # exit

Yields

kernel: unregister_netdevice: waiting for veth1 to become free. Usage count = 3

Oh - it's veth1 this time, not lo. Add an "ip link set up dev lo" in the above scenario just after the unshare, and you get the message with lo.

[...] also does the trick, so I tried it - and it does NOT.

In the above scenario, not setting veth0 and veth1 up also makes it not happen. Setting only veth1 up is not enough either (it seems to need to be "really up", which, as you surely know, with veth is only the case when both sides are up).

I hope this makes it somewhat easier to track down.

-- MfG, Michael Leun --
What puzzles me is that I am running a slightly patched 2.6.32 (so sysfs works) and doing very similar things (openvpn tunnels, ipv6 to the network as a whole, etc.), and I am not seeing the infinite unregister_netdevice: messages you are talking about.

When a network device is removed, most references to it are redirected to the loopback device, so a normal network device should not see the worst of the problems. That is why lo showed up. In that context I'm a bit surprised you managed to trigger a problem on veth1.

I wonder what has changed with ipv6 recently.

Eric --
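Eric's remark that references to a removed device are mostly redirected to the loopback device is why lo tends to be the name in the warning. A minimal sketch of that idea, assuming a simplified cache of route-like entries that each pin one device; the types and the `dev_hold_model()`, `dev_put_model()` and `redirect_to_loopback()` helpers are hypothetical. In kernels of that era the corresponding step is, roughly, what `dst_ifdown()` in net/core/dst.c does for cached dst entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a device with a plain reference count. */
struct ndev {
    const char *name;
    int refcnt;
};

/* Hypothetical cached entry (think: a route) that pins one device. */
struct route_entry {
    struct ndev *dev;
};

void dev_hold_model(struct ndev *d) { d->refcnt++; }
void dev_put_model(struct ndev *d)  { d->refcnt--; }

/* On unregister of `dying`, retarget every cached entry that still points
 * at it to the namespace's loopback device, moving the reference with it:
 * take a reference on lo first, then drop the one on the dying device. */
void redirect_to_loopback(struct route_entry *e, size_t n,
                          struct ndev *dying, struct ndev *lo)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].dev == dying) {
            e[i].dev = lo;
            dev_hold_model(lo);
            dev_put_model(dying);
        }
    }
}
```

After the redirect the dying device's count can drop to zero, but if one of those entries is itself leaked, the stuck reference now sits on lo.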
From: ebiederm@xmission.com (Eric W. Biederman)

There was a recent fix to the IGMP snooping code we have in the bridging layer: if parsing of an ipv6 IGMP packet failed, we'd leak the packet (and thus references to whatever device it referenced).

commit 6d1d1d398cb7db7a12c5d652d50f85355345234f
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Jul 29 01:12:31 2010 +0000

    bridge: Fix skb leak when multicast parsing fails on TX

    On the bridge TX path we're leaking an skb when br_multicast_rcv
    returns an error.

    Reported-by: David Lamparter <equinox@diac24.net>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 4cec805..f49bcd9 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -48,8 +48,10 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	rcu_read_lock();
 	if (is_multicast_ether_addr(dest)) {
-		if (br_multicast_rcv(br, NULL, skb))
+		if (br_multicast_rcv(br, NULL, skb)) {
+			kfree_skb(skb);
 			goto out;
+		}
 		mdst = br_mdb_get(br, skb);
 		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))
--
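The leak pattern this patch closes, an error path that jumps out without releasing a buffer that pins a device, can be reduced to a few lines. This is a hypothetical model, not bridge code: `struct pkt` stands in for an `sk_buff`, `parse_fails()` for `br_multicast_rcv()` returning an error, and the `fixed` flag toggles the one-line `kfree_skb(skb)` fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: a device with a plain reference count. */
struct ndev {
    const char *name;
    int refcnt;
};

/* Hypothetical stand-in for an sk_buff that (indirectly) pins a device. */
struct pkt {
    struct ndev *dev;
};

struct pkt *pkt_alloc(struct ndev *d)
{
    struct pkt *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    p->dev = d;
    d->refcnt++; /* the packet keeps the device alive */
    return p;
}

void pkt_free(struct pkt *p)
{
    if (!p)
        return;
    p->dev->refcnt--;
    free(p);
}

/* Stand-in for br_multicast_rcv(): nonzero means parsing failed. */
int parse_fails(int malformed)
{
    return malformed ? -1 : 0;
}

/* Transmit path that owns the packet. With fixed == 0 it mirrors the
 * pre-patch code: on a parse failure it jumps out without freeing. */
void xmit(struct pkt *p, int malformed, int fixed)
{
    if (parse_fails(malformed)) {
        if (fixed)
            pkt_free(p); /* the fix: kfree_skb(skb) before goto out */
        return;          /* goto out */
    }
    pkt_free(p); /* normal path: packet handed on and eventually released */
}
```

Without the fix, each malformed packet leaves one device reference behind, which is the kind of leftover count that later shows up in an "unregister_netdevice: waiting for ..." message.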
On Thu, 05 Aug 2010 13:11:28 -0700 (PDT)
[...]

But this patch is not in 2.6.35 and therefore cannot explain the difference Eric sees (or believes he sees) between his modified 2.6.32 and 2.6.35.

Also, if I understand it correctly, this patch only changes bridging, and in my scenario bridge.ko (I have it as a module) was not even loaded, so applying this patch should not make any difference for the bug I see. Or am I overlooking something?

So, I guess, your answer was general information in response to Eric's question about what changed with ipv6, not related to the particular bug we are looking for?

-- MfG, Michael Leun --
On Thu, 05 Aug 2010 12:57:59 -0700

Hmmm, I think there are 2 possibilities:

- You send me a patch against plain 2.6.32, so I can check my scenarios against that kernel, or
- You could try it yourself; it's really just those few lines on a freshly booted system in a clean, easy-to-reproduce state.

The difference was that when the message showed up with veth1, lo in that namespace was down while testing. When lo was up, it showed up on lo.

-- MfG, Michael Leun --
Hi,

unfortunately the bug described below, originally reported against 2.6.35-rcX, is still there in 2.6.36.

Is there anything I can do to help fix it (short of fixing it myself; I do not have the know-how)?

On Thu, 5 Aug 2010 13:47:07 +0200
Michael Leun <lkml20100708@newton.leun.net> wrote:

-- MfG, Michael Leun --
can you post your full kernel .config? I'm using network namespaces quite extensively - OpenVPN and IPv6 included - and I haven't hit this bug. That makes it rather likely it depends on some option difference.

btw, while the bridging bug is unrelated, it might have the same kind of origin - an skb not being freed.

can you leave the namespace running for a few minutes and check whether the usage count number is higher then?

-David --
Hi,

On Fri, 22 Oct 2010 14:48:58 +0200

I'll take suggestions for options to change for testing... ;-)

I hadn't noticed this so far: in 2.6.35-rcX the usage count was 3 every time, if I remember correctly; in 2.6.36 the usage count is 1, regardless of whether I exit the namespace immediately or half an hour later.

ml@doris:~> su
Passwort:
doris:/home/ml # ip link add type veth
doris:/home/ml # ip link set dev veth0 up
doris:/home/ml # unshare -n /bin/bash
doris:/home/ml # echo $$
3942
doris:/home/ml # ip link set dev veth1 up
doris:/home/ml # ip -6 addr add dev veth1 fd50:dead:beef::1/64
doris:/home/ml # date
Fr 22. Okt 18:17:09 CEST 2010
doris:/home/ml # date; exit
Fr 22. Okt 18:49:02 CEST 2010
exit
doris:/home/ml #
Message from syslogd@doris at Oct 22 18:49:12 ...
 kernel:[ 2093.460240] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:49:22 ...
 kernel:[ 2103.500166] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:49:32 ...
 kernel:[ 2113.540203] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:49:42 ...
 kernel:[ 2123.580209] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:49:52 ...
 kernel:[ 2133.620200] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:50:02 ...
 kernel:[ 2143.660215] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:50:12 ...
 kernel:[ 2153.700201] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:50:22 ...
 kernel:[ 2163.740187] unregister_netdevice: waiting for veth1 to become free. Usage count = 1

Message from syslogd@doris at Oct 22 18:50:32 ...
 kernel:[ 2173.780169] unregister_netdevice: waiting for veth1 to become ...
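David's suggestion to compare the usage count over time can be automated with a small log filter. A minimal sketch, assuming syslog lines in the format shown in this thread; `parse_wait_line()` and `summarise()` are hypothetical helpers, not an existing tool. If the first and last counts for a device match over a long period, the count is stuck rather than growing, as with the constant count of 1 above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed message format, taken from the log lines in this thread. */
#define WAIT_PREFIX "waiting for "
#define WAIT_SUFFIX " to become free. Usage count = "

/* Parse "... waiting for DEV to become free. Usage count = N".
 * On a match, copies DEV into dev (size devlen), stores N in *count and
 * returns 1; returns 0 for any other line. */
int parse_wait_line(const char *line, char *dev, size_t devlen, int *count)
{
    const char *p = strstr(line, WAIT_PREFIX);
    if (!p)
        return 0;
    p += strlen(WAIT_PREFIX);
    const char *q = strstr(p, WAIT_SUFFIX);
    if (!q)
        return 0;
    size_t n = (size_t)(q - p);
    if (n == 0 || n >= devlen)
        return 0;
    memcpy(dev, p, n);
    dev[n] = '\0';
    return sscanf(q + strlen(WAIT_SUFFIX), "%d", count) == 1;
}

/* Scan a batch of log lines for one device. Returns the number of matching
 * lines and stores the first and last usage counts seen. */
int summarise(const char **lines, size_t nlines, const char *want,
              int *first, int *last)
{
    char dev[64];
    int c, matches = 0;
    for (size_t i = 0; i < nlines; i++) {
        if (!parse_wait_line(lines[i], dev, sizeof dev, &c))
            continue; /* e.g. the "Message from syslogd@..." header lines */
        if (strcmp(dev, want) != 0)
            continue;
        if (matches == 0)
            *first = c;
        *last = c;
        matches++;
    }
    return matches;
}
```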
On Fri, 22 Oct 2010 19:05:32 +0200

Now I have tried a kernel config with almost everything disabled - please find it below. If I have overlooked anything I should disable (or if you suggest I should not disable, but enable some option), please tell me.

I've created a ~2.5MB bootable ISO image based on buildroot/busybox with this kernel, which also shows this bug (e.g. when run in VirtualBox). If anybody is interested, I could send this image as a mail attachment or provide a download link.

ml@xenia:~> grep -v "^#" .config |grep -v "^ *$"
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_EARLY_RES=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx ...
Hi,

curiously, I'm facing a similar problem with 2.6.36.1 in my container: when I configure ipv6 addresses on the interfaces, everything seems fine on the first boot of the host. If I shut down my container (lxc) and then boot it again, I observe the following logs:

Dec 6 17:04:12 suntory.u06.univ-nantes.prive kernel: [ 368.192019] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:04:22 suntory.u06.univ-nantes.prive kernel: [ 378.432018] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:04:32 suntory.u06.univ-nantes.prive kernel: [ 388.672015] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:04:42 suntory.u06.univ-nantes.prive kernel: [ 398.912016] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:04:53 suntory.u06.univ-nantes.prive kernel: [ 409.152016] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:05:03 suntory.u06.univ-nantes.prive kernel: [ 419.392018] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:05:13 suntory.u06.univ-nantes.prive kernel: [ 429.632018] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:05:23 suntory.u06.univ-nantes.prive kernel: [ 439.876016] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:05:34 suntory.u06.univ-nantes.prive kernel: [ 450.116015] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:05:44 suntory.u06.univ-nantes.prive kernel: [ 460.356019] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:05:54 suntory.u06.univ-nantes.prive kernel: [ 470.596020] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:06:04 suntory.u06.univ-nantes.prive kernel: [ 480.836019] unregister_netdevice: waiting for lo to become free. Usage count = 4
Dec 6 17:06:05 suntory.u06.univ-nantes.prive kernel: [ 481.468021] INFO: task ...
2.6.37-rc4 is working here. There were problems in the earlier rcs. Can you try that?

There have been a couple of different reference counting bugs between 2.6.34 and the present, and I haven't tracked them, just noticed they exist.

Eric --
On Mon, 06 Dec 2010 13:22:00 -0800

I can still reproduce the following on 2.6.36.1, but NOT on 2.6.37-rc4, so it indeed seems to be fixed!

Putting an ipv6 address on a device seems to be the trigger:

OrigNS > # ip link add type veth
OrigNS > # ip link set dev veth0 up
OrigNS > # unshare -n /bin/bash
NewNS > # echo $$
<SomePID>
OrigNS > # ip link set dev veth1 netns <SomePID> # this, of course, is on a different terminal
NewNS > # ip link set dev veth1 up
NewNS > # ip -6 addr add dev veth1 fd50:dead:beef::1/64
NewNS > # exit

Yields

kernel: unregister_netdevice: waiting for veth1 to become free. Usage count = 3

-- MfG, Michael Leun --
Hi,

thanks for the info. Unfortunately, I have another error with kernel 2.6.37-rc5, also related to ipv6:

Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.803477] ------------[ cut here ]------------
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.803554] WARNING: at net/ipv6/ip6_fib.c:1172 fib6_del+0x3e/0x2ce [ipv6]()
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.803616] Hardware name: PowerEdge M605
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.803673] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT veth fuse xt_physdev ip6t_LOG ip6table_filter ip6_tables ipt_LOG xt_multiport xt_limit xt_tcpudp xt_state iptable_filter ip_tables x_tables nf_conntrack_tftp nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 8021q bridge stp ext2 mbcache dm_round_robin dm_multipath nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 snd_pcm snd_timer snd soundcore snd_page_alloc tpm_tis tpm psmouse tpm_bios i2c_nforce2 pcspkr shpchp serio_raw pci_hotplug i2c_core button joydev ghes hed evdev dcdbas processor thermal_sys xfs exportfs dm_mod btrfs zlib_deflate crc32c libcrc32c sg sr_mod cdrom usbhid hid usb_storage ses sd_mod enclosure megaraid_sas lpfc ohci_hcd scsi_transport_fc scsi_tgt scsi_mod bnx2 ehci_hcd [last unloaded: scsi_wait_scan]
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.806871] Pid: 5, comm: kworker/u:0 Not tainted 2.6.37-rc5-dsiun-1a #2
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.806931] Call Trace:
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.806991] [<ffffffff81040f56>] ? warn_slowpath_common+0x78/0x8c
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.807057] [<ffffffffa02df719>] ? fib6_del+0x3e/0x2ce [ipv6]
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.807118] [<ffffffff812d69c8>] ? schedule+0x79d/0x846
Dec 7 11:36:54 suntory.u06.univ-nantes.prive kernel: [ 1454.807182] [<ffffffffa02df9ed>] ? ...
I once spent a similar amount of time putting in debug variants that printed info every time a netdev was acquired and released. Maybe similar logic could be put into the official kernel (and disabled by default)? That should save effort in the long run, I'd think.

Thanks,
Ben

-- Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com --
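Ben's idea of a debug mode that records every acquire and release can be sketched as a reference count that also keeps a tally per owner. This is a hypothetical design, not an existing kernel interface: `struct tracked_dev`, `dev_hold_tracked()`, `dev_put_tracked()` and `report_leaks()` are invented here, and the sketch assumes each hold and its matching put pass the same owner tag. At unregister time, `report_leaks()` names the owners whose holds and puts do not balance.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OWNERS 16

/* Hypothetical debug refcount: besides the total, keep a per-owner tally
 * so that a stuck unregister can report who still holds the device. */
struct tracked_dev {
    const char *name;
    int refcnt;
    const char *owner[MAX_OWNERS];
    int held[MAX_OWNERS];
    int nowners;
};

/* Find or add the slot for `owner`; -1 if the table is full. */
int owner_index(struct tracked_dev *d, const char *owner)
{
    for (int i = 0; i < d->nowners; i++)
        if (strcmp(d->owner[i], owner) == 0)
            return i;
    if (d->nowners == MAX_OWNERS)
        return -1; /* table full: counted in the total, but untracked */
    d->owner[d->nowners] = owner;
    d->held[d->nowners] = 0;
    return d->nowners++;
}

void dev_hold_tracked(struct tracked_dev *d, const char *owner)
{
    int i = owner_index(d, owner);
    d->refcnt++;
    if (i >= 0)
        d->held[i]++;
}

void dev_put_tracked(struct tracked_dev *d, const char *owner)
{
    int i = owner_index(d, owner);
    d->refcnt--;
    if (i >= 0)
        d->held[i]--;
}

/* Print every owner whose holds and puts do not balance; return how many. */
int report_leaks(const struct tracked_dev *d)
{
    int n = 0;
    for (int i = 0; i < d->nowners; i++) {
        if (d->held[i] != 0) {
            printf("%s: %d unbalanced reference(s) held by %s\n",
                   d->name, d->held[i], d->owner[i]);
            n++;
        }
    }
    return n;
}
```

A fixed-size table keeps the sketch simple; once it is full, further owners are still counted in the total but not tracked individually.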
