carp: intermittent master/backup swapping

To: <misc@...>
Date: Friday, August 31, 2007 - 10:38 pm

i have 2 sun netra t1s running sparc64 4.1-release as my firewalls and am
experiencing intermittent swapping of MASTER and BACKUP states on the carp
interfaces. i have carp working fine in a number of other places and do
not see this behavior there, although the working setups are i386-based.

NOTE: i've included several tcpdumps and various outputs, so this is a
long message. have spent several hours at this without a resolution and
do appreciate folks taking the time to read through it =)

problems are most apparent when the internal interface drops packets,
but the most serious case is that of the public IPs that are carp-ed.
example:

- the external interface on both machines, hme1, is carp-ed; the
ifconfig output for each machine's interfaces is as follows:

FW #1

hme1: flags=8b63 mtu 1500
lladdr 08:00:20:c2:21:45
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet6 fe80::a00:20ff:fec2:2145%hme1 prefixlen 64 scopeid 0x2
inet 208.70.19.203 netmask 0xfffffff8 broadcast 208.70.19.207
...
carp0: flags=8843 mtu 1500
lladdr 00:00:5e:00:01:01
carp: MASTER carpdev hme1 vhid 1 advbase 1 advskew 0
groups: carp
inet 208.70.19.202 netmask 0xfffffff8 broadcast 208.70.19.207
inet6 fe80::200:5eff:fe00:101%carp0 prefixlen 64 scopeid 0xd

FW #2

hme1: flags=8b63 mtu 1500
lladdr 08:00:20:f9:a8:8d
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet6 fe80::a00:20ff:fef9:a88d%hme1 prefixlen 64 scopeid 0x2
inet 208.70.19.204 netmask 0xfffffff8 broadcast 208.70.19.207
...
carp0: flags=8843 mtu 1500
lladdr 00:00:5e:00:01:01
carp: BACKUP carpdev hme1 vhid 1 advbase 1 advskew 100
groups: carp
inet6 fe80::200:5eff:fe00:101%carp0 prefixlen 64 scopeid 0xc
inet 208.70.19.202 netmask 0xfffffff8 broadcast 208.70.19.207
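
for anyone comparing the two carp0 blocks above, here is a minimal python sketch that pulls the state, vhid, advbase and advskew out of the `carp:` line and works out the nominal advertisement interval. it assumes the interval is advbase + advskew/256 seconds, which is how carp spaces its advertisements; the function names `parse_carp_line` and `advert_interval` are made up for this example and are not part of any tool.

```python
import re

# Matches the "carp:" status line as it appears in the ifconfig output above.
CARP_LINE = re.compile(
    r"carp:\s+(?P<state>\w+)\s+carpdev\s+(?P<dev>\S+)"
    r"\s+vhid\s+(?P<vhid>\d+)\s+advbase\s+(?P<advbase>\d+)"
    r"\s+advskew\s+(?P<advskew>\d+)"
)


def parse_carp_line(line):
    """Return the carp settings found in one ifconfig line as a dict."""
    m = CARP_LINE.search(line)
    if m is None:
        raise ValueError("not a carp status line: %r" % line)
    fields = m.groupdict()
    for key in ("vhid", "advbase", "advskew"):
        fields[key] = int(fields[key])
    return fields


def advert_interval(advbase, advskew):
    """Nominal seconds between advertisements: advbase + advskew/256."""
    return advbase + advskew / 256.0


if __name__ == "__main__":
    # The two status lines are copied from the ifconfig output in this message.
    lines = {
        "FW #1": "carp: MASTER carpdev hme1 vhid 1 advbase 1 advskew 0",
        "FW #2": "carp: BACKUP carpdev hme1 vhid 1 advbase 1 advskew 100",
    }
    for host, line in lines.items():
        c = parse_carp_line(line)
        print(host, c["state"], "vhid", c["vhid"],
              "interval %.3fs" % advert_interval(c["advbase"], c["advskew"]))
```

with the values above, FW #1 would advertise nominally every 1 second and FW #2, if it were MASTER, every 1 + 100/256 seconds.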

the pf rules that are in place to allow carp traffic are

carp_if = "{ hme1 fxp0 vlan0 }"
...
pass quick on $carp_if proto carp

running a tcpdump on each machine's hme1 interface while pinging from
another public IP should only show packets going to the MASTER host, but
roughly every 4th echo request (seq 2, 6 and 10 below) arrives at the
BACKUP host instead, while the echo replies all leave from FW #1:

FW #1

# tcpdump -nettvi hme1 icmp
1188612639.360200 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:0) (ttl 245, id 10051,
len 84)
1188612639.360459 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:0) (ttl 255, id 54830,
len 84)
1188612640.367703 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:1) (ttl 245, id 16678,
len 84)
1188612640.367845 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:1) (ttl 255, id 39539,
len 84)
1188612641.377920 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:2) (ttl 254, id 34078,
len 84)
1188612642.387150 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:3) (ttl 245, id 24763,
len 84)
1188612642.387234 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:3) (ttl 255, id 34369,
len 84)
1188612643.397651 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:4) (ttl 245, id 27355,
len 84)
1188612643.397737 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:4) (ttl 255, id 63331,
len 84)
1188612644.407624 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:5) (ttl 245, id 6342, len 84)
1188612644.407705 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:5) (ttl 255, id 42235,
len 84)
1188612645.417367 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:6) (ttl 254, id 38737,
len 84)
1188612646.427398 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:7) (ttl 245, id 11606,
len 84)
1188612646.427537 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:7) (ttl 255, id 61344,
len 84)
1188612647.437498 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:8) (ttl 245, id 6757, len 84)
1188612647.437591 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:8) (ttl 255, id 47796,
len 84)
1188612648.447103 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:9) (ttl 245, id 11092,
len 84)
1188612648.447187 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:9) (ttl 255, id 55450,
len 84)
1188612649.457584 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:10) (ttl 254, id 43973,
len 84)

FW #2

# tcpdump -nettvi hme1 icmp
tcpdump: listening on hme1, link-type EN10MB
1188612641.394907 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:2) (ttl 245, id 12881,
len 84)
1188612645.434527 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:6) (ttl 245, id 1775, len 84)
1188612649.474761 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:10) (ttl 245, id 1815,
len 84)
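
to make the pattern in the two icmp captures easier to spot, the following python sketch lists which echo request sequence numbers were seen on the BACKUP but not on the MASTER. it is only a rough helper, assuming the captures are saved as text in the `tcpdump -nettv` format shown above; `request_seqs` and `misdelivered` are hypothetical names, and the sample strings are a subset of the packets quoted in this message with the line wrapping kept.

```python
import re

# An echo request as printed by tcpdump above, e.g. "echo request (id:273d seq:2)".
ECHO_REQ = re.compile(r"echo request \(id:[0-9a-f]+ seq:(\d+)\)")


def request_seqs(capture_text):
    """Return the seq numbers of all echo requests in a capture, in order."""
    # Collapse whitespace first so mail-wrapped lines do not matter.
    flat = " ".join(capture_text.split())
    return [int(seq) for seq in ECHO_REQ.findall(flat)]


def misdelivered(master_capture, backup_capture):
    """Seq numbers of requests seen on the backup but not on the master."""
    seen_on_master = set(request_seqs(master_capture))
    return [s for s in request_seqs(backup_capture) if s not in seen_on_master]


# Subset of the FW #1 capture: requests 0, 1 and 3 (request 2 never shows up).
FW1_SAMPLE = """
1188612639.360200 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:0) (ttl 245, id 10051,
len 84)
1188612640.367703 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:1) (ttl 245, id 16678,
len 84)
1188612641.377920 8:0:20:c2:21:45 0:d:88:db:90:c2 0800 98: 208.70.19.202
> 69.217.100.54: icmp: echo reply (id:273d seq:2) (ttl 254, id 34078,
len 84)
1188612642.387150 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:3) (ttl 245, id 24763,
len 84)
"""

# The three requests from the FW #2 capture.
FW2_SAMPLE = """
1188612641.394907 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:2) (ttl 245, id 12881,
len 84)
1188612645.434527 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:6) (ttl 245, id 1775, len 84)
1188612649.474761 0:d:88:db:90:c2 0:0:5e:0:1:1 0800 98: 69.217.100.54 >
208.70.19.202: icmp: echo request (id:273d seq:10) (ttl 245, id 1815,
len 84)
"""

if __name__ == "__main__":
    print("requests on FW #1:", request_seqs(FW1_SAMPLE))
    print("requests on FW #2:", request_seqs(FW2_SAMPLE))
    print("only on FW #2:   ", misdelivered(FW1_SAMPLE, FW2_SAMPLE))
```

on the samples this reports seq 2, 6 and 10 as arriving only at FW #2, i.e. every 4th request.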

tcpdumps for protocol 112 (carp) on both hosts show the expected,
AFAICT, output:

FW #1:

# tcpdump -nettvi hme1 proto 112
tcpdump: listening on hme1, link-type EN10MB
1188612677.154711 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 39664, len 56)
1188612677.525521 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.204
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100
demote=0 (DF) [tos 0x10] (ttl 255, id 38372, len 56)
1188612677.525619 0:0:5e:0:1:1 33:33:0:0:0:12 86dd 90:
fe80::a00:20ff:fef9:a88d > ff02::12: ip-proto-112 36 (len 36, hlim 255)
1188612678.164831 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 42236, len 56)
1188612679.174879 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 60259, len 56)
1188612680.184975 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 56877, len 56)
1188612681.195066 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 49024, len 56)
1188612681.566012 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.204
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100
demote=0 (DF) [tos 0x10] (ttl 255, id 49912, len 56)
1188612681.566088 0:0:5e:0:1:1 33:33:0:0:0:12 86dd 90:
fe80::a00:20ff:fef9:a88d > ff02::12: ip-proto-112 36 (len 36, hlim 255)

FW #2:

# tcpdump -nettvi hme1 proto 112
tcpdump: listening on hme1, link-type EN10MB
1188612672.121790 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 60730, len 56)
1188612673.131839 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 59159, len 56)
1188612673.502429 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.204
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100
demote=0 (DF) [tos 0x10] (ttl 255, id 34477, len 56)
1188612673.502526 0:0:5e:0:1:1 33:33:0:0:0:12 86dd 90:
fe80::a00:20ff:fef9:a88d > ff02::12: ip-proto-112 36 (len 36, hlim 255)
1188612674.141927 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 46803, len 56)
1188612675.152044 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 38570, len 56)
1188612676.162157 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 52865, len 56)
1188612677.172266 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
(DF) [tos 0x10] (ttl 255, id 39664, len 56)
1188612677.542949 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.204
> 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100
demote=0 (DF) [tos 0x10] (ttl 255, id 38372, len 56)
1188612677.543020 0:0:5e:0:1:1 33:33:0:0:0:12 86dd 90:
fe80::a00:20ff:fef9:a88d > ff02::12: ip-proto-112 36 (len 36, hlim 255)
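
a quick way to read the carp captures is to group the IPv4 advertisements by source address and look at the gaps between them. the python sketch below does that; it is a rough helper under the assumption that the capture is in the `tcpdump -nettv` format shown above, the names `advert_times` and `intervals` are invented for this example, and the sample is the IPv4 part of the FW #2 capture with each packet unwrapped onto one line and trimmed after `demote=0`.

```python
import re
from collections import defaultdict

# One IPv4 carp advertisement as printed by tcpdump above:
# timestamp, two MACs, ethertype 0800, length, then "carp <src> > 224.0.0.18: ...".
ADVERT = re.compile(
    r"(\d+\.\d+) \S+ \S+ 0800 \d+: carp (\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: "
    r"CARPv2-advertise \d+: vhid=\d+ advbase=\d+ advskew=\d+"
)


def advert_times(capture_text):
    """Map each advertising source IP to the list of its packet timestamps."""
    # Collapse whitespace so mail-wrapped captures parse the same way.
    flat = " ".join(capture_text.split())
    times = defaultdict(list)
    for stamp, src in ADVERT.findall(flat):
        times[src].append(float(stamp))
    return dict(times)


def intervals(stamps):
    """Gaps in seconds between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# IPv4 advertisements from the FW #2 capture, unwrapped and trimmed.
FW2_CARP_SAMPLE = """
1188612672.121790 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
1188612673.131839 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
1188612673.502429 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.204 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 demote=0
1188612674.141927 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
1188612675.152044 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
1188612676.162157 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
1188612677.172266 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.203 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 demote=0
1188612677.542949 0:0:5e:0:1:1 1:0:5e:0:0:12 0800 70: carp 208.70.19.204 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 demote=0
"""

if __name__ == "__main__":
    for src, stamps in sorted(advert_times(FW2_CARP_SAMPLE).items()):
        gaps = ", ".join("%.3f" % g for g in intervals(stamps))
        print("%s: %d adverts, gaps (s): %s" % (src, len(stamps), gaps))
```

on this sample, 208.70.19.203 advertises about every 1.01 seconds, and 208.70.19.204 also shows up with two advertisements about 4.04 seconds apart.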

the relevant sysctls are set on both machines:

net.inet.carp.allow=1
net.inet.carp.preempt=1
net.inet.ip.forwarding=1

this is the second time i've battled with this problem and would like to
get it fixed. i would love to know if i am being a dumbass here but any
clues are appreciated.

cheers,
jake

--

Messages in current thread:
carp: intermittent master/backup swapping, Jacob Yocom-Piatt, (Fri Aug 31, 10:38 pm)
Re: carp: intermittent master/backup swapping, Stuart Henderson, (Sat Sep 1, 5:52 am)
Re: carp: intermittent master/backup swapping, Jacob Yocom-Piatt, (Sat Sep 1, 10:36 am)