Re: weird ppp upgrade problems

Previous thread: Re: cp(1) bug ? by Tom Van Looy on Saturday, October 20, 2007 - 2:59 am. (6 messages)

Next thread: Re: Squid/authpf with lookups on Active Directory by Schöberle Dániel on Saturday, October 20, 2007 - 5:37 am. (1 message)
From: Mitja Muženič
Date: Saturday, October 20, 2007 - 4:41 am

Hi!

I have a box with two external ISDN modems attached to it that acts as an
outgoing modem pool to a number of remote located ISDN routers (zyxel P-202H
Plus v2).

Recently I was given the go-ahead to upgrade this 3.6 box so I swapped the
disk with a new one with a fresh install of 4.1 (4.2 cds haven't yet arrived
then). Unfortuantely, this upgrade caused the outgoing ppp connections to
fail.

My initial investigation didn't show me a clear reason for this so I went
back to 3.6 and then upgraded version by version, until I found the moment
when the unchanged very simple ppp.conf (at the bottom of this mail) didn't
work anymore.

To cut a long story shirt, something happened between 3.8 and 3.9 that
changes the behaviour of the LCP negotiation protocol in my specific case:

3.8:
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP: deflink: LayerStart
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP: deflink: SendConfigReq(1)
state = Stopped
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  ACFCOMP[2]
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  PROTOCOMP[2]
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  ACCMAP[6] 0x00000000
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  MRU[4] 1500
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  MAGICNUM[6] 0x434b14d1
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP: deflink: State change Stopped
--> Req-Sent
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP: deflink: RecvConfigRej(1)
state = Req-Sent
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  PROTOCOMP[2]
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  ACFCOMP[2]
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP: deflink: SendConfigReq(2)
state = Req-Sent
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  ACCMAP[6] 0x00000000
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  MRU[4] 1500
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP:  MAGICNUM[6] 0x434b14d1
Oct 18 16:29:27 ringie ppp[11094]: tun0: LCP: deflink: RecvConfigAck(2)
state = Req-Sent

3.9:

Oct 18 16:28:45 ringie ppp[6602]: tun0: LCP: deflink: ...
From: Mitja Muženič
Date: Sunday, October 28, 2007 - 12:34 pm

I managed to narrow down this isssue significantly.

My hardware setup on this box is a soekris 4801 board + a 4-port serial card
by Sunix (
http://www.sunix.com.tw/it/en/Product_Detail.php?cate=2&class_a_id=34&sid=36
1 ), full dmesg at the end.

The serial ports are:
....
puc0 at pci0 dev 10 function 0 "Sunix 40XX" rev 0x01: ports: 4 com
pccom3 at puc0 port 0 irq 11: ti16750, 64 byte fifo
pccom4 at puc0 port 1 irq 11: ti16750, 64 byte fifo
pccom5 at puc0 port 2 irq 11: ti16750, 64 byte fifo
pccom6 at puc0 port 3 irq 11: ti16750, 64 byte fifo
...
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom0: console
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
.....


OpenBSD 3.6-3.8: modem comms using an ISDN modem over /dev/cua03 to cua06
work OK.

OpenBSD 3.9 and later up to -current of a few days ago: using /dev/cua0[3,4]
fails, /dev/cua01 works OK. When I use the sunix ports, on the remote end it
appears that my ppp process is sending corrupted data mixed with valid data
fields in LCP proposals. The fact that the on-board isa-attached pccom1
works shows that userland ppp is not at fault for this breakage.

So far everything points at the changes in puc(4) between 3.8 and 3.9.

Before I start reverting the (luckily few) changes one by one, does anyone
have a better idea? Or a hint on how to debug this better.


Thanks, Mitja
(and apologies for not submiting a dmesg the first time, but I felt that the
original mail was long enough even without it)


OpenBSD 4.2-current (GENERIC) #10: Sat Oct 20 15:59:16 CEST 2007
    root@test.xxxxxxx.net:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Geode(TM) Integrated Processor by National Semi ("Geode by NSC"
586-class) 267 MHz
cpu0: FPU,TSC,MSR,CX8,CMOV,MMX
cpu0: TSC disabled
real mem  = 133787648 (127MB)
avail mem = 121565184 (115MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 20/70/08, BIOS32 rev. 0 @ 0xf7840
pcibios0 at bios0: rev 2.0 @ 0xf0000/0x10000
pcibios0: pcibios_get_intr_routing - ...
From: Otto Moerbeek
Date: Sunday, October 28, 2007 - 1:18 pm

Can you try a more recent snap? It has fifo probe code enabled, and it
is interesteing to see if the fifo probe reports a different depth.

That commit was done to resolve an issue for a 2 port Sunix card. For
this card (4036A) the actual fifo depth is 32 instead of 64.

See
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/ic/com_subr.c.diff?r1=1.10&
r2=1.11


From: Mitja Muženič
Date: Sunday, October 28, 2007 - 3:25 pm

Good catch!

...
puc0 at pci0 dev 10 function 0 "Sunix 40XX" rev 0x01: ports: 4 com
pccom3 at puc0 port 0 irq 11: ti16750, 64 byte fifo
pccom3: probed fifo depth: 32 bytes
pccom4 at puc0 port 1 irq 11: ti16750, 64 byte fifo
pccom4: probed fifo depth: 32 bytes
pccom5 at puc0 port 2 irq 11: ti16750, 64 byte fifo
pccom5: probed fifo depth: 32 bytes
pccom6 at puc0 port 3 irq 11: ti16750, 64 byte fifo
pccom6: probed fifo depth: 32 bytes
....

And guess what? The ppp modem connection now works!

Now, is there a way to add a quirk for this specific card for 4.2? Or should
I take my chances and uncomment com_fifo_probe() in the 4.2 source? I would
rather not run -current in production at this early stage o