I have encountered an error which looks tg3-related. Some time after
adding some HTB queue rules (which I don't have handy ATM but can provide
if needed), the following messages appear in the kernel log:
Oct 3 17:04:04 sbd kernel: [ 1941.584154] tg3: eth0: The system may be re-ordering memory-mapped I/O cycles to the network device, attempting to recover. Please report the problem to the driver maintainer and include system chipset information.
Oct 3 17:04:04 sbd kernel: [ 1941.686114] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Oct 3 17:04:04 sbd kernel: [ 1941.750691] tg3: eth0: Link is down.
Oct 3 17:04:08 sbd kernel: [ 1945.300166] tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct 3 17:04:08 sbd kernel: [ 1945.300196] tg3: eth0: Flow control is on for TX and on for RX.
After that, the machine is pretty much dead. It doesn't crash hard
(the messages reached syslog) but the network no longer works, so
a reboot is necessary anyway.
The machine is a Dell PE860 with two tg3 controllers (the second one
is not used at all):
0000:04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)
Subsystem: Dell: Unknown device 01e6
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fe8f0000 (64-bit, non-prefetchable) [size=64K]
Capabilities:  Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities:  Vital Product Data
Capabilities:  Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
Address: 042401610720cc0c Data: 02a0
Capabilities: [d0] #10 ...
The HTB rules are as simple as below:
$TC qdisc del dev $INTERFACE root
$TC qdisc add dev $INTERFACE root handle 1:0 htb
$TC class add dev $INTERFACE parent 1:0 classid 1:1 htb rate 1000Mbit ceil 1000Mbit
$TC class add dev $INTERFACE parent 1:1 classid 1:2 htb rate 5Mbit ceil 5Mbit
$TC qdisc add dev $INTERFACE parent 1:2 sfq
$TC filter add dev $INTERFACE parent 1:0 protocol ip prio 10 u32 match ip src 10.10.10.10 flowid 1:2
(the IP address was changed but it's one of the machine's 65 local IP
addresses).
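
For anyone trying to reproduce this, the rules above can be wrapped in a
small script along the following lines. This is only a sketch: the dry-run
default (TC set to "echo tc"), the function name apply_htb_rules, the eth0
default and the LIMITED_IP variable are assumptions made for illustration,
not part of the setup that triggered the error.

```shell
#!/bin/sh
# Dry-run by default: the commands are printed instead of executed.
# Set TC=/sbin/tc (as root) to actually apply the rules.
TC="${TC:-echo tc}"
INTERFACE="${INTERFACE:-eth0}"             # assumed interface name
LIMITED_IP="${LIMITED_IP:-10.10.10.10}"    # placeholder address, as above

apply_htb_rules() {
    # Deleting the root qdisc fails if none is installed; ignore that case.
    $TC qdisc del dev "$INTERFACE" root 2>/dev/null || true
    $TC qdisc add dev "$INTERFACE" root handle 1:0 htb
    # 1:1 represents the whole link, 1:2 is the 5Mbit-limited child class.
    $TC class add dev "$INTERFACE" parent 1:0 classid 1:1 htb rate 1000Mbit ceil 1000Mbit
    $TC class add dev "$INTERFACE" parent 1:1 classid 1:2 htb rate 5Mbit ceil 5Mbit
    $TC qdisc add dev "$INTERFACE" parent 1:2 sfq
    # Send packets originating from the limited address to class 1:2.
    $TC filter add dev "$INTERFACE" parent 1:0 protocol ip prio 10 u32 match ip src "$LIMITED_IP" flowid 1:2
}

apply_htb_rules
```

Run without arguments, this only prints the six tc commands, so they can be
reviewed before being applied for real.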
The traffic is 4-5kpps both ways, 2Mbps in, 40Mbps out. There are
several database servers running. The point of these rules was to
limit the data rate of 10.10.10.10 (a MySQL server) to 5Mbps. Currently
the traffic from this IP accounts for much of the total traffic, but overall
it's pretty bursty: it ranges from nothing at all to 10Mbps to 70Mbps.
It may be relevant that on other machines the HTB queues are saturated
only occasionally and usually the data rates are well below HTB
settings, so a drastically overlimit queue may well be unique to this