From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Yes, we discussed the locking issue over past few days. See the thread: "stuck localhost TCP connections, v2.6.26-rc3+" More and more, the arguments are mounting to completely revert the established code path changes, and frankly that is likely what I am going to do by the end of today. --
the 3 reverts have been extensively tested in -tip via:
# tip/out-of-tree: 9e5b6ca: tcp: revert DEFER_ACCEPT modifications
and the distcc problems are fixed. (The locking fix alone did not fix it conclusively in my testing, possibly due to the follow-on observations outlined in your description.)
Tested-by: Ingo Molnar <mingo@elte.hu>
Ingo
--
From: Ingo Molnar <mingo@elte.hu> I didn't revert all three changes, just the final part of that 3 part series. Please test the patch I actually applied. --
i just updated all my testsystems to revert the change i tested so far,
and updated it to yours. The delta between the two is the 3 lines patch
below.
A few testsystems already booted into your patch, so if i dont report a
hung TCP connection in the next 6 hours consider it:
Tested-by: Ingo Molnar <mingo@elte.hu>
Ingo
--------------->
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index ec83448..045e799 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -466,9 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
reqp=&lopt->syn_table[i];
while ((req = *reqp) != NULL) {
if (time_after_eq(now, req->expires)) {
- if ((req->retrans < thresh ||
- (inet_rsk(req)->acked && req->retrans < max_retries))
- && !req->rsk_ops->rtx_syn_ack(parent, req)) {
+ if ((req->retrans < (inet_rsk(req)->acked ? max_retries : thresh)) &&
+ (inet_rsk(req)->acked ||
+ !req->rsk_ops->rtx_syn_ack(parent, req))) {
unsigned long timeo;
if (req->retrans++ == 0)
--
this threw the warning below - i never saw that before in thousands of bootups, and this was the only networking change that happened. config and bootlog attached. Might be an unlucky coincidence.
Ingo
[ 173.354049] NETDEV WATCHDOG: eth0: transmit timed out
[ 173.354148] ------------[ cut here ]------------
[ 173.354221] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0x9a/0xec()
[ 173.354298] Modules linked in:
[ 173.354421] Pid: 13452, comm: cc1 Tainted: G W 2.6.26-rc6-00273-g81ae43a-dirty #2573
[ 173.354516] [<c01250ca>] warn_on_slowpath+0x46/0x76
[ 173.354641] [<c011d428>] ? try_to_wake_up+0x1d6/0x1e0
[ 173.354815] [<c01411e9>] ? trace_hardirqs_off+0xb/0xd
[ 173.357370] [<c011d43d>] ? default_wake_function+0xb/0xd
[ 173.357370] [<c014112a>] ? trace_hardirqs_off_caller+0x15/0xc9
[ 173.357370] [<c01411e9>] ? trace_hardirqs_off+0xb/0xd
[ 173.357370] [<c0142c83>] ? trace_hardirqs_on+0xb/0xd
[ 173.357370] [<c0142b33>] ? trace_hardirqs_on_caller+0x16/0x15b
[ 173.357370] [<c0142c83>] ? trace_hardirqs_on+0xb/0xd
[ 173.357370] [<c06bb3c9>] ? _spin_unlock_irqrestore+0x5b/0x71
[ 173.357370] [<c0133d46>] ? __queue_work+0x2d/0x32
[ 173.357370] [<c0134023>] ? queue_work+0x50/0x72
[ 173.357483] [<c0134059>] ? schedule_work+0x14/0x16
[ 173.357654] [<c05c59b8>] dev_watchdog+0x9a/0xec
[ 173.357783] [<c012d456>] run_timer_softirq+0x13d/0x19d
[ 173.357905] [<c05c591e>] ? dev_watchdog+0x0/0xec
[ 173.358073] [<c05c591e>] ? dev_watchdog+0x0/0xec
[ 173.360804] [<c0129ad7>] __do_softirq+0xb2/0x15c
[ 173.360804] [<c0129a25>] ? __do_softirq+0x0/0x15c
[ 173.360804] [<c0105526>] do_softirq+0x84/0xe9
[ 173.360804] [<c0129996>] irq_exit+0x4b/0x88
[ 173.360804] [<c010ec7a>] smp_apic_timer_interrupt+0x73/0x81
[ 173.360804] [<c0103ddd>] apic_timer_interrupt+0x2d/0x34
[ 173.360804] =======================
[ 173.360804] ---[ end trace a7919e7f17c0a725 ]---
[ 173.396182] evbug.c: Event. Dev: <NULL>, Type: 0, Code: 0, Value: 0
[ ...
hm, threw a second warning after 6 more hours of testing:
[ 362.170209] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0xde/0xf0()
that appears to be more than just coincidence. I've applied the patch
below - which brings me back to the well-tested revert from Ilpo.
This is the only change i've done for the overnight -tip testruns, so if
the warning from sch_generic.c goes away it's this change that has an
impact on that warning.
Ingo
--------------------->
commit 3019ae9652fe44c099669e5dba116acad583cfcb
Author: Ingo Molnar <mingo@elte.hu>
Date: Fri Jun 13 23:09:28 2008 +0200
tcp: revert again
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 045e799..ec83448 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -466,9 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
reqp=&lopt->syn_table[i];
while ((req = *reqp) != NULL) {
if (time_after_eq(now, req->expires)) {
- if ((req->retrans < (inet_rsk(req)->acked ? max_retries : thresh)) &&
- (inet_rsk(req)->acked ||
- !req->rsk_ops->rtx_syn_ack(parent, req))) {
+ if ((req->retrans < thresh ||
+ (inet_rsk(req)->acked && req->retrans < max_retries))
+ && !req->rsk_ops->rtx_syn_ack(parent, req)) {
unsigned long timeo;
if (req->retrans++ == 0)
--
From: Ingo Molnar <mingo@elte.hu>
So that we can make forward progress here, please confirm that the
following patch against -tip makes your problems go away for good.
Once you can confirm I will push it to Linus.
Thanks!
tcp: Revert reset of deferred accept changes in 2.6.26
Ingo's system is still seeing strange behavior, and he
reports that it goes away if the rest of the deferred
accept changes are reverted too.
Therefore this reverts e4c78840284f3f51b1896cf3936d60a6033c4d2c
("[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack") and
539fae89bebd16ebeafd57a87169bc56eb530d76 ("[TCP]: TCP_DEFER_ACCEPT
updates - defer timeout conflicts with max_thresh").
Just like the other revert, these ideas can be revisited for
2.6.27.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/inet_connection_sock.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 045e799..ec83448 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -466,9 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
reqp=&lopt->syn_table[i];
while ((req = *reqp) != NULL) {
if (time_after_eq(now, req->expires)) {
- if ((req->retrans < (inet_rsk(req)->acked ? max_retries : thresh)) &&
- (inet_rsk(req)->acked ||
- !req->rsk_ops->rtx_syn_ack(parent, req))) {
+ if ((req->retrans < thresh ||
+ (inet_rsk(req)->acked && req->retrans < max_retries))
+ && !req->rsk_ops->rtx_syn_ack(parent, req)) {
unsigned long timeo;
if (req->retrans++ == 0)
--
1.5.5.1.308.g1fbb5
--
i triggered the net/sched/sch_generic.c:222 warning once more meanwhile (yesterday) with the full revert applied (which i think is the same as the patch below). So i think it's either some unlucky coincidence or some timing relationship - perhaps the change impacts packet ordering for certain workload patterns? [but that same condition can occur without that patch too]

I also checked kerneloops.org and this warning seems to have been reported by others as well - although it's not triggering heavily. In some of those other reports the warning came together with a dead interface, while in my case it's just a warning with still working networking.

So since there's no clear bug pattern and no sure reproducibility on my side, i'd suggest we track this problem separately and "do nothing" right now. I've excluded this warning from my 'is the freshly booted kernel buggy' list of conditions in -tip testing, so it's not holding me up.

and i can apply any test-patch if that would be helpful - if it does a WARN_ON() i'll notice it. (pure extra debug printks with no stack trace are much harder to notice in automated tests)

btw., it would be nice if there was some .config driven networking debug option that randomized packet ordering in the tx and rx queue. (transparently enabled, with zero-config on the userspace side) I.e. it would have an (expensive, because O(N)) debug mechanism that randomized things - it would insert new packets into a random place within the queue where they get queued. We could hit races and rarer codepaths much sooner that way - especially since in LAN based testing there's a strong natural ordering of packets, so randomizing it artificially looks promising to me.

If you make that new option =y enable-able in the .config (dependent on DEBUG_KERNEL && default off, etc.), and as long as it does not have to be configured on the userspace side (i'm testing unmodified userspace images with default distro installs, etc.) the randconfig ...
From: Ingo Molnar <mingo@elte.hu> I'm going to push the revert through just to be safe and I think it's a good idea to do so because all of those defer accept changes should [...] I don't have time to work on your bug, sorry. Someone else will have to step forward and help you with it. FWIW I don't think your TX timeout problem has anything to do with packet ordering. The TX element of the network device is totally stateless, but it's hanging under some set of circumstances to the point where we time out and reset the hardware to get it going again. --
okay - in that case the full revert is well-tested on my side as well,
fwiw.
it's not really "my bug" - i just offered help to debug someone else's
bug :-) This is pretty common hw so i guess there will be such reports.
Let me describe what i'm doing exactly: i do a lot of randomized testing
on about a dozen real systems (all across the x86 spectrum) so i tend to
trigger a lot of mainline bugs pretty early on.
My collection of kernel bugs for the last 8 months shows 1285 bugs
(kernel crashes or build failures - about 50%/50%) triggered. One
test-system alone has a serial log of 15 gigabytes - and there's a dozen
of them. That's about 5 kernel bugs a day handled by me, on average.
These systems have about 10 times the hardware variability of your
Niagara system for example, and many of them are rather difficult to
debug (laptops without serial port, etc.). So i physically cannot avoid
and debug all bugs on all my test-systems, like you do on the Niagara. I
will report bugs, i'll bisect anything that is bisectable (on average i
bisect once a day), and i can add patches and report any test-results,
and i'll of course debug any bugs that look like heavy mainline
ok. That's e1000 then. Cc:s added. Stock T60 laptop, 32-bit:
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
Subsystem: Lenovo ThinkPad T60
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
I/O ports at 2000 [size=32]
Capabilities: <access denied>
Kernel driver in use: e1000
the problem is this non-fatal warning showing up after bootup,
sporadically, in a non-reproducible way:
[ 173.354049] NETDEV WATCHDOG: eth0: transmit timed out
[ 173.354148] ------------[ cut here ]------------
[ 173.354221] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0x9a/0xec()
[ 173.354298] Modules linked in:
[ 173.354421] Pid: 13452, comm: cc1 Tainted: G ...
btw., this reminds me that this is the same system that has a serious e1000 network latency bug which i reported more than a year ago, but which still does not appear to be fixed in latest mainline:
PING europe (10.0.1.15) 56(84) bytes of data.
64 bytes from europe (10.0.1.15): icmp_seq=1 ttl=64 time=1.51 ms
64 bytes from europe (10.0.1.15): icmp_seq=2 ttl=64 time=404 ms
64 bytes from europe (10.0.1.15): icmp_seq=3 ttl=64 time=487 ms
64 bytes from europe (10.0.1.15): icmp_seq=4 ttl=64 time=296 ms
64 bytes from europe (10.0.1.15): icmp_seq=5 ttl=64 time=305 ms
64 bytes from europe (10.0.1.15): icmp_seq=6 ttl=64 time=1011 ms
64 bytes from europe (10.0.1.15): icmp_seq=7 ttl=64 time=0.209 ms
64 bytes from europe (10.0.1.15): icmp_seq=8 ttl=64 time=763 ms
64 bytes from europe (10.0.1.15): icmp_seq=9 ttl=64 time=1000 ms
64 bytes from europe (10.0.1.15): icmp_seq=10 ttl=64 time=0.438 ms
64 bytes from europe (10.0.1.15): icmp_seq=11 ttl=64 time=1000 ms
64 bytes from europe (10.0.1.15): icmp_seq=12 ttl=64 time=0.299 ms
^C
--- europe ping statistics ---
12 packets transmitted, 12 received, 0% packet loss, time 11085ms
those delays of up to 1000 msecs can be 'felt' via ssh too: if this problem triggers, the system is almost unusable via the network. Local latencies are perfect, so it's an e1000 problem.
Ingo
--
From: Ingo Molnar <mingo@elte.hu> Or some kind of weird interrupt problem. Such an interrupt-level bug would also account for the TX timeouts you're seeing, btw. --
when i originally reported it i debugged it back to missing e1000 TX completion IRQs. I tried various versions of the driver to figure out whether newer workarounds for e1000 cover it, but it was fruitless. There is a 1000 msec internal watchdog timer IRQ within e1000 that gets things going if it's stuck. But the sch_generic.c:222 problem is new. It could be an escalation of this same problem - not even the hw-internal watchdog timeout fixing things up? So basically two levels of completion failed, and the third fallback level (a hard reset of the interface) got things going again. High score from me for networking layer robustness :-) Ingo --
From: Ingo Molnar <mingo@elte.hu> Then that explains your latency: the chip is getting stuck, and I think it is an escalation of the same problem. My first thought is that there must have been some change to the reset logic and it isn't as foolproof as it used to be, especially under load. --
note that the 1000 msecs timer is AFAIK internal to the e1000
_hardware_, not the driver itself. I.e. probably the firmware detects
and works around a hung transmitter. This is not detectable from the OS
(it's not an OS timer), but it can be observed by a lot of testing on a
totally quiescent system - which i did back then ;-)
i also played a lot with the various knobs of the e1000, none of which
seemed to help.
/me digs in archives
i reported it to the e1000 folks in 2006:
Date: Mon, 4 Dec 2006 11:24:00 +0100
against 2.6.19. The original report is below - with a trace and various
things i tried to debug this.
i eventually got the suggestion from Auke to set RxIntDelay=8 which
seemed to work around the issue - but since i use a built-in driver i
dont have that setting here (RxIntDelay=8 is a module load parameter and
not exposed via Kconfig methods) and the e1000 driver does not seem to
have changed its default setting for RxIntDelay.
2.6.18-1.2849.fc6 was the last kernel that worked fine.
Ingo
-------------------->
Date: Wed, 13 Dec 2006 22:09:22 +0100
From: Ingo Molnar <mingo@elte.hu>
To: Auke Kok <auke-jan.h.kok@intel.com>
Subject: Re: e1000: 2.6.19 & long packet latencies
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>,
"Ronciak, John" <john.ronciak@intel.com>
Jesse, et al.,
i'm having a weird packet processing latency problem with the e1000
driver and recent kernels.
The symptom is this: if i connect to a T60 laptop (which has an on-board
e1000) from the outside, i see large delays in network activity, and ssh
sessions are very sluggish.
ping latencies show it best under a dynticks kernel (but vanilla 2.6.19
is affected too):
titan:~/linux/linux> ping e
PING europe (10.0.1.15) 56(84) bytes of data.
64 bytes from europe (10.0.1.15): icmp_seq=1 ttl=64 time=0.340 ms
64 bytes from europe (10.0.1.15): icmp_seq=2 ttl=64 time=757 ms
64 bytes from europe (10.0.1.15): icmp_seq=3 ttl=64 time=1001 ms
64 bytes ...
will try it. But even if it solves the problem it's a nasty complication: given how many times i have to bisect back into the times when there was only e1000 around, how do i handle the transition? I have automated bisection tools, etc. and i bisect very frequently.

It's a real practical problem for me: if i have E1000E=y in my .config and go back to an older kernel, i lose that .config setting in 'make oldconfig'. Then when the bisection run happens to go back into the E1000E times, 'make oldconfig' picks up E1000E with a default-off setting - and things break or work differently.

no other Linux driver i'm using forces me to do that, and i rely on many of them and on proper 'make oldconfig' behavior on a daily basis. Until now i was able to do automatic bisection back for _years_, to the v2.6.19 times. You broke that. And that's just one driver out of thousands of Linux drivers. Imagine what would happen to bisectability and migration quality if every driver version update was as careless about its installed base as e1000/e1000e.

The e1000 -> e1000e migration was not only done in an incompetent, amateurish way, you also ignored real feedback, and that combined together is totally lame and unacceptable behavior in my book. You should not expect praise and roses from me as long as you do stupid things like that.
Ingo
--
where were you when we discussed this? We took over a year and a half to get to a final plan, and many people responded and provided feedback. In the end Jeff Garzik and many community members suggested a plan and this is what I implemented. In not a single way did I force anything down anyone's throat. I did exactly what the community wanted me to do, and in the way that seemed best to everyone.

You only complain and do not provide a single solution to your problem. Your continued screaming and whining is neither productive nor constructive at all, and frankly is insulting, since you completely ignore the fact that we worked with the community for more than two years to come to some maintainable situation. All you do is complain. Direct your problems to the network stack and driver maintainers, since they approved and worked with me to implement the changes.

*** NOTE: I NO LONGER MAINTAIN E1000/E1000E, nor do I represent them or speak for them. ***

I frankly suggested that you try e1000e because this might provide valuable information for the people who are taking this thankless job after me. This was meant in a productive and constructive way.

Your flame is totally inappropriate and unprofessional. Either come up with a solution or start working on one, like I did when I took the much hated job as e1000 maintainer. I am totally open to suggestions, and if needed I will work with the current e1000/e1000e maintainers on working something out if I see a better solution than the current situation. Until I see such a thing I can't do much else than ignore your childish whining.
Auke
--
From: "Kok, Auke" <auke-jan.h.kok@intel.com> Join the club. I'm also ignoring everything he writes until he changes his modus operandi to one that is more constructive than the pure hurtful whining he is emitting as of late. --
Technically, Ingo has asked for a solution (but btw. he gave some Actually, screaming and whining often "helps" to have things done, Two-year work doesn't guarantee the solution is right (but it might be I give 25... Regards, Jarek P. --
ok, i tried it now, and there's good news: the latency problem seems largely fixed by e1000e. (yay!)
with e1000 i got these anomalous latencies:
64 bytes from europe (10.0.1.15): icmp_seq=10 ttl=64 time=1000 ms
64 bytes from europe (10.0.1.15): icmp_seq=11 ttl=64 time=0.882 ms
64 bytes from europe (10.0.1.15): icmp_seq=12 ttl=64 time=1007 ms
64 bytes from europe (10.0.1.15): icmp_seq=13 ttl=64 time=0.522 ms
64 bytes from europe (10.0.1.15): icmp_seq=14 ttl=64 time=1003 ms
64 bytes from europe (10.0.1.15): icmp_seq=15 ttl=64 time=0.381 ms
64 bytes from europe (10.0.1.15): icmp_seq=16 ttl=64 time=1010 ms
with e1000e i get:
64 bytes from europe (10.0.1.15): icmp_seq=1 ttl=64 time=0.212 ms
64 bytes from europe (10.0.1.15): icmp_seq=2 ttl=64 time=0.372 ms
64 bytes from europe (10.0.1.15): icmp_seq=3 ttl=64 time=0.815 ms
64 bytes from europe (10.0.1.15): icmp_seq=4 ttl=64 time=0.961 ms
64 bytes from europe (10.0.1.15): icmp_seq=5 ttl=64 time=0.201 ms
64 bytes from europe (10.0.1.15): icmp_seq=6 ttl=64 time=0.788 ms
TCP latencies are fine too - ssh feels snappy again.
it still does not have nearly as good latencies as say forcedeth though:
64 bytes from mercury (10.0.1.13): icmp_seq=1 ttl=64 time=0.076 ms
64 bytes from mercury (10.0.1.13): icmp_seq=2 ttl=64 time=0.085 ms
64 bytes from mercury (10.0.1.13): icmp_seq=3 ttl=64 time=0.045 ms
64 bytes from mercury (10.0.1.13): icmp_seq=4 ttl=64 time=0.053 ms
that's 10 times better packet latencies.
and even an ancient Realtek RTL-8139 over 10 megabit Ethernet (!) has better latencies than the e1000e over 1000 megabit:
64 bytes from pluto (10.0.1.10): icmp_seq=2 ttl=64 time=0.309 ms
64 bytes from pluto (10.0.1.10): icmp_seq=3 ttl=64 time=0.333 ms
64 bytes from pluto (10.0.1.10): icmp_seq=4 ttl=64 time=0.329 ms
64 bytes from pluto (10.0.1.10): icmp_seq=5 ttl=64 time=0.311 ms
64 bytes from pluto (10.0.1.10): icmp_seq=6 ttl=64 time=0.302 ms
is it done intentionally perhaps? I dont think it makes ...
Idle box, ICH8 chipset, e1000e, latest git.
MegaRouterCore-KARAM ~ # ping 192.168.20.26
PING 192.168.20.26 (192.168.20.26) 56(84) bytes of data.
64 bytes from 192.168.20.26: icmp_seq=1 ttl=64 time=0.109 ms
64 bytes from 192.168.20.26: icmp_seq=2 ttl=64 time=0.134 ms
64 bytes from 192.168.20.26: icmp_seq=3 ttl=64 time=0.120 ms
64 bytes from 192.168.20.26: icmp_seq=4 ttl=64 time=0.117 ms
64 bytes from 192.168.20.26: icmp_seq=5 ttl=64 time=0.117 ms
64 bytes from 192.168.20.26: icmp_seq=6 ttl=64 time=0.113 ms
Disabling interrupt moderation:
MegaRouterCore-KARAM ~ # ethtool -C eth0 rx-usecs 0
MegaRouterCore-KARAM ~ # ping 192.168.20.26
PING 192.168.20.26 (192.168.20.26) 56(84) bytes of data.
64 bytes from 192.168.20.26: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 192.168.20.26: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 192.168.20.26: icmp_seq=3 ttl=64 time=0.066 ms
64 bytes from 192.168.20.26: icmp_seq=4 ttl=64 time=0.065 ms
64 bytes from 192.168.20.26: icmp_seq=5 ttl=64 time=0.077 ms
64 bytes from 192.168.20.26: icmp_seq=6 ttl=64 time=0.073 ms
Maybe try the same?
ethtool -C eth0 rx-usecs 0
--
------
Technical Manager
Virtual ISP S.A.L.
Lebanon
--
ok, that looks much better!
i have another box with e1000, ich7:
64 bytes from titan (10.0.1.14): icmp_seq=5 ttl=64 time=0.345 ms
64 bytes from titan (10.0.1.14): icmp_seq=6 ttl=64 time=1.03 ms
64 bytes from titan (10.0.1.14): icmp_seq=7 ttl=64 time=0.383 ms
64 bytes from titan (10.0.1.14): icmp_seq=8 ttl=64 time=0.320 ms
64 bytes from titan (10.0.1.14): icmp_seq=9 ttl=64 time=0.996 ms
well i tend not to tweak my drivers with such options because i want to experience and test what 99.9% of our users will experience in the field. The reality is that if it's not the default behavior, it's almost as if it didnt exist at all.
but even with that tuning on e1000e (on the t60, ich7) i still get rather large numbers:
earth4:~/s> ping eu
PING europe (10.0.1.15) 56(84) bytes of data.
64 bytes from europe (10.0.1.15): icmp_seq=1 ttl=64 time=0.250 ms
64 bytes from europe (10.0.1.15): icmp_seq=2 ttl=64 time=0.250 ms
64 bytes from europe (10.0.1.15): icmp_seq=3 ttl=64 time=0.225 ms
64 bytes from europe (10.0.1.15): icmp_seq=4 ttl=64 time=0.932 ms
64 bytes from europe (10.0.1.15): icmp_seq=5 ttl=64 time=0.251 ms
64 bytes from europe (10.0.1.15): icmp_seq=6 ttl=64 time=0.915 ms
64 bytes from europe (10.0.1.15): icmp_seq=7 ttl=64 time=0.250 ms
64 bytes from europe (10.0.1.15): icmp_seq=8 ttl=64 time=0.238 ms
64 bytes from europe (10.0.1.15): icmp_seq=9 ttl=64 time=0.390 ms
64 bytes from europe (10.0.1.15): icmp_seq=10 ttl=64 time=0.260 ms
Ingo
--
Maybe there is some flow control involved? ethtool -S eth0 ?
This is interrupt throttling, i guess. e1000 also has parameters for it, but they are available only at insmod stage:
parm: TxIntDelay:Transmit Interrupt Delay (array of int)
parm: TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm: RxIntDelay:Receive Interrupt Delay (array of int)
parm: RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
Every coin has two sides. On one side, low latencies (a 1 ms difference; does it matter anywhere?)
Are all these hosts on the same switch? Is the switch manageable or not? For example, i am having problems with packet loss on a long fiber link between two cheap Linksys switches. Without flow control i cannot survive, and as a result i have 1-2 ms of additional delay under load, and +-0.500 ms of jitter "inside" these switches (probably from the switches).
Many things matter. Maybe even processor sleep latencies are involved? Bus latency, PCI latency, whatever. Laptops also run dynamic frequency scaling (SpeedStep). With a 600 MHz Pentium M (SpeedStep, ondemand):
64 bytes from 127.0.0.1: icmp_seq=17 ttl=64 time=0.017 ms
at full speed, 1.7 GHz:
64 bytes from 127.0.0.1: icmp_seq=33 ttl=64 time=0.007 ms
on the network i also see a difference of -0.030 ms when i am running burnP6 (from the CPUburn package).
--
------
Technical Manager
Virtual ISP S.A.L.
Lebanon
--
i have reported the problem and even provided a fix.
I have triggered an e1000/e1000e related problem that got introduced in
the v2.6.25 merge window - one of my testboxes came up with no
networking and it took me an hour to figure out why. (i wasnt
particularly focusing on e1000, i just happened to hit that bug in 9
million lines of Linux kernel code)
I have reported it here, two and a half months ago:
http://lkml.org/lkml/2008/4/8/256
I even showed you which commit introduced the problem and gave you a
oneliner fix that i tested (it solved the problem):
http://bugzilla.kernel.org/attachment.cgi?id=15704&action=view
You were Cc:-ed to that. (attached below again for reference) The bug
was added to the regression list of v2.6.25. I never expected to spend
more than 10 minutes on this problem once i found out what's happening -
we fix dozens of bugs like this per stable kernel release.
I just checked latest -git, my fix is still not upstream (or any
equivalent solution - i really dont mind how it's solved and i'm not
maintaining this code).
no alternative patch was sent to me - i offered to test any solution
back then.
FYI, since i first reported it i've been hit by that problem roughly a
dozen times. (it happened sporadically so i forgot about it - until i
again had a system come up with no networking.) It caused me lost time
and lost work that could have been spent on better things.
Ingo
------------------------>
Subject: e1000=y && e1000e=m regression fix
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Apr 09 21:09:35 CEST 2008
fix a regression from v2.6.24: do not transfer the e1000e PCI IDs from
e1000 to e1000e if e1000 is built-in and e1000e is a module.
Built-in drivers take precedence over modules in many ways - and in this
case it's clear that the user intended the e1000 driver to be the
primary one. "Silently change behavior and break existing configs" is
never a good migration strategy. Most users will use ...
The revert patch makes the socket leak problem go away.
--
Thanks,
Vitaliy Gusev
--