Hello,
(I sent this earlier today, but it doesn't look like it made it through.
I apologize if it arrives multiple times.)
I am working on an application that uses a fairly simple UDP protocol to
send data between two embedded devices. I'm noticing an issue with an
initial test that was written where datagrams are received but not seen
by the recvfrom() call until more data arrives after it. As of right now
the test case does not implement any type of lost packet protection or
other flow control, which is what makes the issue so noticeable.
The target for this code is a board using the Atmel AT91SAM9260 ARM
processor. I have tested with 2.6.20 and 2.6.25 on this board.
The test consists of two applications with the following pseudo code
(msg_size = 127, 9003/9005 are the UDP ports used):
"client app"
while(1) {
    sendto(9003, &msg_size, 4bytes);
    sendto(9003, buffer, msg_size);
    recvfrom(9005, &msg_size, 4bytes);
    recvfrom(9005, buffer, msg_size);
}
"server app"
while(1) {
    recvfrom(9003, &msg_size, 4bytes);
    recvfrom(9003, buffer, msg_size);
    sendto(9005, &msg_size, 4bytes);
    sendto(9005, buffer, msg_size);
}
As long as the server is started first and no packets are lost or out of
order, the client and server should continue indefinitely. When run
between two boards on a local gigabit switch, the application will run
smoothly most of the time, but I periodically see delays of 30 seconds
or more where one of the applications is waiting for the second datagram
to arrive before sending the next packet. Wireshark shows that the data
was sent very shortly after the first datagram, and no packets are ever
lost. ifconfig reports no collisions, overruns, or errors.
When I run the application between two identical devices on a cross-over
cable, data is transferred for a few seconds after which everything
freezes until I send a ping between the two boards in the background.
This forces the communication to start up again for a few seconds ...
--
On Tue, 17 Jun 2008 17:08:58 -0500

I am unfamiliar with interrupts on the ARM. Are IRQs level or edge
triggered? NAPI won't work if interrupts are edge-triggered.
--
Interrupts in this case are set to be level triggered. The processor has
an interrupt controller that allows them to be configured several ways.
The EMAC driver for the at91sam9260 is in drivers/net/macb.[ch].

Also note that the 133 MHz x86 that I tested on was an STPC Elite (it
also displayed the same behavior).

Thanks,
Travis
--
UDP packets can be lost anywhere, including in the receive buffer after
they have been received by the NIC. You probably just need to write your
code smarter to use non-blocking IO and deal with packet loss.

Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
--
Thanks Ben. I understand that there is no guarantee of anything with
UDP, but it seems to me that if there is a packet in the buffer (it
shows up after another packet comes in behind it) the system should
know about it, right?

The code will eventually deal with packet loss / retransmission (it is
actually a customer's application, not my own). Development was only
stopped at this point because this behavior was discovered. However, if
the final application behaves in the same way that things are going
now, the application would need to time out on read, request
retransmission, receive the original packet (that was just stuck in the
buffer somewhere) and the retransmitted packet, and decide which to
toss every couple of seconds. This is a whole lot more retransmissions
than I would expect to see on a cross-over cable, especially from
receiving and processing only two small packets at one pass.

If this is what's required I will relay that to the customer or
implement some type of workaround to force a poll or flush. However, if
there is possibly a bug or race condition that is not getting handled
properly it would be better to try and find it.

Thanks,
--
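For reference, the "time out on read" step described above could look
roughly like the sketch below. It is only an illustration, not code from
the test application: recv_with_timeout() is a hypothetical helper name,
and it assumes a plain datagram socket and POSIX poll(). It would not
cure a driver that holds on to a packet, but it would let the
application notice the stall instead of blocking indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the original test program): wait up
 * to timeout_ms milliseconds for one datagram on fd and read it into buf.
 *
 * Returns the number of bytes received, or -1 with errno set. A timeout
 * is reported as -1 with errno == ETIMEDOUT.
 */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    assert(buf != NULL);

    /* Retry if a signal interrupts the wait (the timeout restarts). */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;              /* poll() failed; errno is already set */
    if (ready == 0) {
        errno = ETIMEDOUT;      /* nothing arrived within timeout_ms */
        return -1;
    }

    /* Something is pending; the sender address is not needed here. */
    return recvfrom(fd, buf, len, 0, NULL, NULL);
}
```

A caller would treat ETIMEDOUT as the cue to request a retransmission,
and would still have to recognize the late original if it turns up
afterwards.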
Ahh, I see what you mean. I'm afraid I don't know anything about your
NIC driver, and it would seem to be implicated.

Thanks,
Ben
--
I agree, but it also troubles me that the x86 board that I noticed the
same issue on uses the realtek (8139too) driver, so I'm not completely
convinced that the issue is at the NIC level.
I was able to do some more extensive testing today with the macb (Atmel
Ethernet MAC controller) driver and noticed that the
netif_rx_schedule_prep function is returning false at times in the
interrupt handler. In the code below, the printk shows up during heavy
traffic, though it only happens a handful of times. (The else block is
code that I have added to the driver while debugging).
	if (status & MACB_RX_INT_FLAGS) {
		if (netif_rx_schedule_prep(dev)) {
			/*
			 * There's no point taking any more interrupts
			 * until we have processed the buffers
			 */
			macb_writel(bp, IDR, MACB_RX_INT_FLAGS);
			dev_dbg(&bp->pdev->dev, "scheduling RX softirq\n");
			__netif_rx_schedule(dev);
		} else {
			printk(KERN_ERR "%s: Driver bug: interrupt while in polling mode\n", dev->name);
			/* disable interrupts */
			macb_writel(bp, IDR, MACB_RX_INT_FLAGS);
		}
	}
As far as I can tell, a false return means that polling is already
enabled for the interface (though I haven't looked much deeper than the
inline for netif_rx_schedule_prep()).
I went through the poll function, and actually rewrote the whole thing
according to the guidelines in the NAPI documentation, and I can't see
any way for it to get out of poll with interrupts enabled without first
removing itself from the polling list.
Can someone who knows more about this give me some more insight into
what might be happening here? I can post the poll function or a patch to
macb.c if it would be helpful.
Thanks,
--
If you run a sniffer on the machine that is dropping/delaying receiving
the packet, you can probably determine whether it is a driver issue or
some other stack issue: if you see the packet in the sniffer, but not
in the application, then it's probably a UDP stack issue, or at least
not the driver. Otherwise, the driver must be holding onto the packet.

Thanks,
Ben
--
I looked at macb.c and can see that it uses NAPI only for rx work,
leaving tx interrupts enabled at all times. The interrupt handler reads
the device interrupt status when a tx interrupt happens and may find rx
bits also set. As a result, your netif_rx_schedule_prep() will
sometimes return false because NAPI might be already scheduled. The
code you have above (i.e. the "driver bug" case) is wrong.

The NAPI code in the in-tree version looks suspect because it seems to
enable rx interrupts unconditionally regardless of whether NAPI rx
processing is complete.

It might help to post a patch here showing all of your changes.

--
James Chapman
Katalix Systems Ltd http://www.katalix.com
Catalysts for your Embedded Linux software development
--
Thanks for the reply James.

That is somewhat confusing to me because once an rx interrupt is
detected and the rx interrupts are disabled the rx bits should not be
set in the interrupt status register until they are re-enabled again
the ISR is read and when the rx bits are tested and rx ints are
disabled for it to be there the next time around in the while(status)
loop.

Correct, this is one of the reasons that I rewrote the driver poll
function.

Did this earlier today. I should get a patch against 2.6.25 up
tomorrow, which will be a little more useful.

Thanks!
Travis
--
The rx and tx status are flagged in the same status register. The bits
are set regardless of whether rx or tx interrupts are enabled in the
device. So when you handle a tx interrupt, the interrupt routine will
read the status register and may see rx bits also set.

You could mask the status register value that you read to ignore rx
bits if rx interrupts are disabled (NAPI polled mode). But to be
honest, I think it is simpler to handle rx _and_ tx work in the NAPI
poll handler so you only get interrupts when not in NAPI polled mode.
See tg3.c or e100.c for example.

--
James Chapman
--
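The masking idea above can be sketched in isolation as follows. This is
a userspace illustration under stated assumptions, not a patch for
macb.c: the EX_* bit values are made up for the example (the real ones
are defined in macb.h), and ex_filter_status() and its rx_masked flag
are hypothetical names for state the driver would have to track itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; the real definitions live in macb.h. */
#define EX_RX_INT_FLAGS 0x0000000eu
#define EX_TX_INT_FLAGS 0x000000f0u

/*
 * Hypothetical helper: given the raw interrupt status and whether the
 * driver currently has rx interrupts disabled (NAPI polled mode),
 * return only the bits the interrupt handler should act on.
 */
static inline uint32_t ex_filter_status(uint32_t raw_status, bool rx_masked)
{
    /* The example rx and tx flag sets must not overlap. */
    assert((EX_RX_INT_FLAGS & EX_TX_INT_FLAGS) == 0u);

    if (rx_masked)
        return raw_status & ~EX_RX_INT_FLAGS;   /* rx is owned by poll */
    return raw_status;
}
```

With rx masked, a status word carrying both a tx bit and an rx bit
comes back with only the tx bit set, so the handler would not try to
schedule NAPI a second time.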
I will take a look at modifying the driver to use NAPI for tx.

Thanks,
Travis
--
Hi.

Did you run wireshark on the receiver or the sender? Check the MIB
stats to see whether a packet was dropped because of low memory, an
incorrect checksum, or some other problematic field in the UDP header.
The sending side can see the packet as perfectly correct, which says
nothing about whether there is an issue on the receiver.

If the packet was delivered to the receiving host, the UDP input path
is rather simple, so there are no places which can race with something
and thus lose the packet.

--
Evgeniy Polyakov
--
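One way to check the UDP MIB counters from a program is to read
/proc/net/snmp and pick values out of the "Udp:" rows. The sketch below
is only an illustration: udp_mib_counter() is a hypothetical helper, and
it assumes the usual layout of a header row of counter names followed by
a row of values. Which counters are present (for example InErrors or
RcvbufErrors) depends on the kernel version, so the lookup is by name
rather than by column position.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer just past `prefix` on the first line starting with it. */
static const char *find_row(const char *text, const char *prefix)
{
    size_t n = strlen(prefix);

    while (text != NULL && *text != '\0') {
        if (strncmp(text, prefix, n) == 0)
            return text + n;
        text = strchr(text, '\n');      /* advance to the next line */
        if (text != NULL)
            text++;
    }
    return NULL;
}

/*
 * Hypothetical helper: look up one counter by name in the "Udp:" rows
 * of text in /proc/net/snmp format. Returns 0 and stores the counter in
 * *value on success, -1 if the rows or the name cannot be found.
 */
int udp_mib_counter(const char *text, const char *name,
                    unsigned long long *value)
{
    const char *hdr, *val;
    size_t name_len;

    assert(text != NULL && name != NULL && value != NULL);

    hdr = find_row(text, "Udp:");               /* row of counter names */
    val = hdr ? find_row(hdr, "Udp:") : NULL;   /* row of counter values */
    if (val == NULL)
        return -1;

    name_len = strlen(name);
    for (;;) {
        size_t len;
        char *end;
        unsigned long long v;

        hdr += strspn(hdr, " ");
        len = strcspn(hdr, " \n");
        if (len == 0)
            return -1;          /* end of header row, name not found */

        val += strspn(val, " ");
        if (*val < '0' || *val > '9')
            return -1;          /* fewer values than names */
        v = strtoull(val, &end, 10);
        val = end;

        if (len == name_len && strncmp(hdr, name, len) == 0) {
            *value = v;
            return 0;
        }
        hdr += len;
    }
}
```

Reading the file into a buffer before and after a stall and comparing
the counters would show whether the receiving stack dropped anything
during that window.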
Initially, I had run wireshark on my PC and connected it to one of the
embedded boards (the issue still shows up in this case). I did some
more testing today where I ran tcpdump on both of the boards connected
with a cross-over cable until the application froze. What I was able to
find was that the first 1 or 2 hangups are corrected after 4 or 5
seconds because the boards send an ARP request when data communication
stops. This causes communication to start up again. No packets are ever
lost or corrupted; they just don't appear to the application until
something else happens on the network.

Here is a snippet of the packet trace surrounding the hangup (these are
from the same session, but the clocks on the two boards were not set to
the same time):

(On the "server" -- sbc41):
22:53:57.763656 IP sbc042.emacinc.com.3072 > sbc041.emacinc.com.9003: UDP, length 4
22:53:57.764000 IP sbc042.emacinc.com.3072 > sbc041.emacinc.com.9003: UDP, length 127
22:53:57.764229 IP sbc041.emacinc.com.3072 > sbc042.emacinc.com.9005: UDP, length 4
22:53:57.764387 IP sbc041.emacinc.com.3072 > sbc042.emacinc.com.9005: UDP, length 127
22:54:01.034522 arp who-has sbc041.emacinc.com tell sbc042.emacinc.com
22:54:01.034642 arp reply sbc041.emacinc.com is-at 00:50:c2:0d:6e:00 (oui Unknown)
22:54:01.035585 IP sbc042.emacinc.com.3072 > sbc041.emacinc.com.9003: UDP, length 4
22:54:01.035736 IP sbc042.emacinc.com.3072 > sbc041.emacinc.com.9003: UDP, length 127
22:54:01.036095 IP sbc041.emacinc.com.3072 > sbc042.emacinc.com.9005: UDP, length 4
22:54:01.036263 IP sbc041.emacinc.com.3072 > sbc042.emacinc.com.9005: UDP, length 127
22:54:01.036793 IP sbc042.emacinc.com.3072 > sbc041.emacinc.com.9003: UDP, length 4
--
22:54:01.803384 IP sbc041.emacinc.com.3072 > sbc042.emacinc.com.9005: UDP, length 127
22:54:01.803773 IP sbc042.emacinc.com.3072 > sbc041.emacinc.com.9003: UDP, length 4
22:54:01.803916 IP sbc042.emacinc.com.3072 > sbc041.emacinc.com.9003: UDP, length 127
22:54:01.804274 IP sbc041.emacinc.com.3072 > ...
