What happens in e1000 when the kernel disables the IRQ line?
Intel has released fixes for this issue several times.
Unfortunately, each time a fix was released, a similar issue
came up again. The latest info from Intel on the linux-netdev mailing
list shows that Intel has released the newest patch for
I tried to find out why, once one issue was fixed, another one came up.
My explanation is that the fixes didn't "kill" the root bug.
The most likely way for the spurious interrupts counted by the kernel
to reach the irqs_unhandled limit is that the I/O APIC level is, for
some reason, never deasserted.
Whichever version of the e1000 driver you look at, you will find a reset
command to the NIC such as the following line:
E1000_WRITE_REG_IO(hw, CTRL, (ctrl | E1000_CTRL_RST));
Writing CTRL_RST to the NIC sets the ICR register to 0 (zero).
As you can imagine, zeroing ICR without deasserting the I/O APIC level
leads the I/O APIC to keep sending the interrupt to the local APIC until
another interrupt arrives and the interrupt handler deasserts the I/O
APIC level.
What if another interrupt doesn't come? Then the kernel will disable
the IRQ line, because every interrupt signal arriving at the APIC is
counted as a spurious interrupt.
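
To make the scenario above concrete, here is a small user-space model of it.
This is only a sketch of my hypothesis, not kernel or driver code: every
sim_* name is made up for illustration, and the thresholds (a window of
100000 interrupts, more than 99900 of them unhandled) are what I recall of
the kernel's spurious-interrupt check, so treat them as assumptions. The
model also assumes, as argued above, that the reset clears ICR while the
level-triggered line stays asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed thresholds, modelled on the kernel's spurious-IRQ heuristic. */
#define SIM_IRQ_WINDOW      100000u
#define SIM_UNHANDLED_LIMIT  99900u

/* Hypothetical model of the NIC as seen by the interrupt path. */
struct sim_nic {
    unsigned int icr;    /* interrupt cause register */
    bool line_asserted;  /* level presented to the I/O APIC */
};

/* Hypothetical model of the per-IRQ bookkeeping. */
struct sim_irq {
    unsigned int irq_count;
    unsigned int irqs_unhandled;
    bool disabled;
};

/* Assumption from the post: reset zeroes ICR but leaves the level asserted. */
void sim_reset(struct sim_nic *nic)
{
    nic->icr = 0;
}

/* Handler model: with ICR == 0 it reports "not mine" and the line stays up. */
bool sim_handler(struct sim_nic *nic)
{
    if (nic->icr == 0)
        return false;
    nic->icr = 0;               /* reading ICR clears the cause ... */
    nic->line_asserted = false; /* ... and the level is deasserted  */
    return true;
}

/* Count unhandled interrupts; disable the IRQ if nearly all were unhandled. */
void sim_note_interrupt(struct sim_irq *irq, bool handled)
{
    if (!handled)
        irq->irqs_unhandled++;
    if (++irq->irq_count < SIM_IRQ_WINDOW)
        return;
    irq->irq_count = 0;
    if (irq->irqs_unhandled > SIM_UNHANDLED_LIMIT)
        irq->disabled = true;
    irq->irqs_unhandled = 0;
}

/* Redeliver the interrupt for as long as the level stays asserted. */
unsigned long sim_run(struct sim_nic *nic, struct sim_irq *irq,
                      unsigned long max_deliveries)
{
    unsigned long delivered = 0;

    assert(nic != NULL && irq != NULL);
    while (nic->line_asserted && !irq->disabled &&
           delivered < max_deliveries) {
        sim_note_interrupt(irq, sim_handler(nic));
        delivered++;
    }
    return delivered;
}
```

In this model, calling sim_reset() on a NIC whose line is asserted and then
sim_run() redelivers the interrupt 100000 times, every delivery is counted
as unhandled, and the IRQ ends up disabled. If ICR still holds a cause when
the handler runs, a single delivery is enough to drop the line.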
IMHO, as long as Intel doesn't fix the way the NIC is reset,
all the released fixes will be useless.