"I have a question for the PF/ALTQ masters out there," Matthew Dillon began on the DragonFlyBSD kernel mailing list, having recently switched from using a Cisco router to a DragonFlySD server running PF. "I am trying to configure PF in a manner similar to what Cisco's fair-queue algorithm does. Cisco's algorithm basically hashes TCP and UDP traffic based on the port/IP pairs, creating a bunch of lists of backlogged packets and then schedules the packets at the head of each list." He went on to explain that he was unsuccessfully trying to configure the same thing with PF, "neither CBQ nor HFSC seem to work well. I can separate certain types of traffic but the real problem is when there are multiple TCP connections that are essentially classified the same, and one is hogging the outgoing bandwidth. So the question is, is there a PF solution for that or do I need to write a new ALTQ mechanic to implement fair queueing?"
Not finding a solution, he followed with a series of patches implementing what he needed. He explained the resulting logic, noting, "unless something comes up I am going to commit this to DragonFly on Friday and call it done. I would be pleased if other projects picked up some or all of the work":
"The queues are scanned from highest priority to lowest priority; if the packet bandwidth on the queue does not exceed the bandwidth parameter and a packet is available, a packet will be chosen fro that queue; if a packet is available but the queue has exceeded the specified bandwidth, the next lower priority queue is scanned (and so forth); if NO lower priority queues either have packets or are all over the bandwidth limit, then a packet will be taken from the highest priority queue with a packet ready; packet rate can exceed the queue bandwidth specification (but will not exceed the interface bandwidth specification, of course), but under full saturation the average bandwidth for any given queue will be limited to the specified value."
From: Matthew Dillon Subject: Network transition complete + PF question Date: Apr 2, 11:08 pm 2008 The network move is complete. I have a question for the PF/ALTQ masters out there. I am trying to configure PF in a manner similar to what Cisco's fair-queue algorithm does. Cisco's algorithm basically hashes TCP and UDP traffic based on the port/IP pairs, creating a bunch of lists of backlogged packets and then schedules the packets at the head of each list. I am trying to find something equivalent with PF and not having much luck. Neither CBQ nor HFSC seem to work well. I can separate certain types of traffic but the real problem is when there are multiple TCP connections that are essentially classified the same, and one is hogging the outgoing bandwidth. So the question is, is there a PF solution for that or do I need to write a new ALTQ mechanic to implement fair queueing ? If there is no current solution I have a pretty good idea how to implement it. I can use PF's 'keep state' mechanism and then hash the state structure pointer and store it in the packet header, then implement a new ALTQ that takes that hash code and throws it into an array of queues from which it fair-dequeues packets for output. -Matt
From: Matthew Dillon Subject: FairQ ALTQ for PF - Patch #1 Date: Apr 3, 9:28 pm 2008

Ok, This is my first attempt at adding a fairq feature to ALTQ/PF. It isn't perfect yet, but it appears to work reasonably well.

    fetch http://apollo.backplane.com/DFlyMisc/fairq01.patch

It isn't hierarchical (at least not yet), but you can specify multiple queues for each interface as long as you give them different priorities.

Here is an example configuration:

    altq on vke0 fairq bandwidth 500Kb queue { normal, fair }
    queue fair priority 1 bandwidth 100Kb fairq(buckets 64) qlimit 50
    queue normal priority 2 bandwidth 400Kb fairq(buckets 64, default) qlimit 50

    pass out on vke0 inet proto tcp from any to any keep state queue normal
    pass out on vke0 inet proto tcp from any to 216.240.41.28 keep state queue fair

Here is how it works:

* The queues are scanned from highest priority to lowest priority.

* If the packet bandwidth on the queue does not exceed the bandwidth parameter and a packet is available, a packet will be chosen from that queue.

* If a packet is available but the queue has exceeded the specified bandwidth, the next lower priority queue is scanned (and so forth).

* If NO lower priority queues either have packets or are all over the bandwidth limit, then a packet will be taken from the highest priority queue with a packet ready.

* Packet rate can exceed the queue bandwidth specification (but will not exceed the interface bandwidth specification, of course), but under full saturation the average bandwidth for any given queue will be limited to the specified value.

Here is how the fair queueing works:

* You MUST specify 'keep state' in the related rules.

* keep state 'connections' will be given a fingerprint hash code which will be used to enqueue the mbuf in one of the N buckets (64 in our example) for each fair queue.

* When PF requests a packet from the fairq, a packet will be selected from each of the 64 buckets in a round-robin fashion.

Thus if you have a very hungry connection, it will not be able to steal all the bandwidth (or queue up tons of packets to the actual interface) from other connections within the queue.

Caveats and issues:

(1) The qlimit is per-bucket. So 64 buckets x 50 packets is, worst case, 3200 packets. It's unlikely this would ever occur, but it's an issue that I haven't dealt with yet.

(2) Due to limitations on the number of buckets, multiple connections can end up in the same bucket. If one of those connections is a heavy hitter, the others will suffer. This could probably be fixed with further sorting or perhaps a different topology (e.g. like a tree instead of a fixed array).

Please Test! I have this running on my router box right now and it appears to work very well.

-Matt
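The bucket mechanism described in this message can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the patch itself: the `FairQueue` class and its method names are hypothetical, the flow hash is passed in by the caller (in the patch it comes from PF's keep-state entry), and packets are plain strings rather than mbufs.

```python
from collections import deque


class FairQueue:
    """Toy model of one fairq queue: fixed buckets served round-robin."""

    def __init__(self, buckets=64, qlimit=50):
        self.buckets = [deque() for _ in range(buckets)]
        self.qlimit = qlimit      # per-bucket limit, as in caveat (1)
        self.next_bucket = 0      # round-robin scan position

    def enqueue(self, flow_hash, packet):
        """Place the packet in the bucket chosen by its flow hash."""
        bucket = self.buckets[flow_hash % len(self.buckets)]
        if len(bucket) >= self.qlimit:
            return False          # bucket full: the packet is dropped
        bucket.append(packet)
        return True

    def dequeue(self):
        """Take one packet from the next non-empty bucket, then move on."""
        n = len(self.buckets)
        for step in range(n):
            i = (self.next_bucket + step) % n
            if self.buckets[i]:
                self.next_bucket = (i + 1) % n
                return self.buckets[i].popleft()
        return None
```

A flow that floods its bucket only fills that bucket, so a packet from a second flow is sent after at most one packet from the first. Two flows whose hashes collide share a bucket and therefore share its fate, which is caveat (2).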
From: Max Laier Subject: Re: FairQ ALTQ for PF - Patch #1 Date: Apr 5, 10:18 am 2008 On Friday 04 April 2008 06:28:22 Matthew Dillon wrote: > Ok, This is my first attempt at adding a fairq feature to ALTQ/PF. > It isn't perfect yet, but it appears to work reasonably well. > > fetch http://apollo.backplane.com/DFlyMisc/fairq01.patch There is a WFQ discipline for ALTQ: http://www.kame.net/dev/cvsweb2.cgi/kame/kame/sys/altq/ altq_wfq.{h,c} It has never been integrated with pf, but I think using your approach of passing a hash in the pkthdr this should be rather straight forward. > It isn't hierarchical (at least not yet), but you can specify > multiple queues for each interface as long as you give them different > priorities. > > Here is an example configuration: > > altq on vke0 fairq bandwidth 500Kb queue { normal, fair } > queue fair priority 1 bandwidth 100Kb fairq(buckets 64) qlimit 50 > queue normal priority 2 bandwidth 400Kb fairq(buckets 64, default) > qlimit 50 > > pass out on vke0 inet proto tcp from any to any keep state queue normal > pass out on vke0 inet proto tcp from any to 216.240.41.28 keep state > queue fair > > Here is how it works: > > * The queues are scanned from highest priority to lowest priority. > > * If the packet bandwidth on the queue does not exceed the > bandwidth parameter and a packet is available, a packet will be chosen > from that queue. > > * If a packet is available but the queue has exceeded the specified > bandwidth, the next lower priority queue is scanned (and so > forth). > > * If NO lower priority queues either have packets or are all over > the bandwidth limit, then a packet will be taken from the highest > priority queue with a packet ready. > > * Packet rate can exceed the queue bandwidth specification (but > will not exceed the interface bandwidth specification, of > course), but under full saturation the average bandwidth for any given > queue will be limited to the specified value. 
> > Here is how the fair queueing works: > > * You MUST specify 'keep state' in the related rules. > > * keep state 'connections' will be given a fingerprint hash code > which will be used to enqueue the mbuf in one of the N buckets (64 in > our example) for each fair queue. > > * When PF request's a packet from the fairq, a packet will be > selected from each of the 64 buckets in a round-robin fashion. > > Thus if you have a very hungy connection, it will not be able to > steal all the bandwidth (or queue up tons of packets to the > actual interface) from other connections within the queue. > > Caveats and issues: > > (1) The qlimit is per-bucket. So 64 buckets x 50 packets is, worst > case, 3200 packets. It's unlikely this would ever occur, but it's an > issue that I haven't dealt with yet. > > (2) Due to limitations on the number of buckets, multiple > connections can end up in the same bucket. If one of those connections > is a heavy hitter, the others will suffer. > > This could probably be fixed with further sorting or perhaps a > different topology (e.g. like a tree instead of a fixed array). > > Please Test! I have this running on my router box right now and > it appears to work very well. > > -Matt -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #1 Date: Apr 5, 11:09 am 2008 :There is a WFQ discipline for ALTQ: :http://www.kame.net/dev/cvsweb2.cgi/kame/kame/sys/altq/ altq_wfq.{h,c} : :It has never been integrated with pf, but I think using your approach of :passing a hash in the pkthdr this should be rather straight forward. Ah, there we go. Wow, way back in 1997. The core of that code is definitely fair-queue. I'm not sure what they are doing at the top level, though, I don't see any prioritization or bandwidth control. There's a queue->quota and a queue->weight that looks like it has been partially coded but not finished. They are using a list of queues instead of an array which I think is somewhat superior to what I'm doing (bitmap of active queues with an iterator), but I think my bandwidth and prioritization algorithm is a bit more advanced. One thing I can theorize would be beneficial would be to record the bandwidth being used by each sub-queue and then allow low bandwidth queues to 'burst' data by moving the queue to the head of the list if it is recognized as having low bandwidth and is otherwise empty. To prevent starvation from having many low bw connections you'd keep another counter which is reset when the round-robin encounters the queue normally without it having been moved. So, e.g. if you do a 'pounding the keyboard' test on an interactive connection you would get interactive response. Right now with my implementation if you pound the keyboard you get intermediate responsiveness because the round-robin has to cycle around to that queue before the packet gets sent. Maybe that is what they were trying to control with the weighting variable. I am going to research it a bit more. I kinda like my base better (well, that's no surprise), but the list of queues approach WFQ takes has a lot more flexibility. -Matt Matthew Dillon <dillon@backplane.com>
From: Matthew Dillon Subject: FairQ ALTQ for PF - Patch #2 Date: Apr 5, 3:15 pm 2008

After looking at WFQ (thanks to Max Laier for the reference!), and reading a few papers on it, I've got the second version of my fairq patch for ALTQ ready to go.

    fetch http://apollo.backplane.com/DFlyMisc/fairq02.patch

This version removes the bitmap and the fixed array scan. It keeps the fixed array of buckets but links the active buckets together into a circular queue.

The 'hogs' option is now operational. This option allows a bucket to drain in a burst (i.e. to not advance the round robin pointer) as long as its bandwidth is less than the specified bandwidth.

My fair share scheduler is not yet weighted, but the new topology makes it possible to implement a full blown fair share scheduler (aka a weighted scheduler). I haven't decided whether I want to go that far yet but in the meantime I did implement a quick hack to insert new empty low-bandwidth queues (bw < hogs bw) at the head of the circular list instead of the tail (kind of a poor-man's deadline mechanic but not really). I'm considering my options. The new circular list gives me a lot of flexibility.

Here is an example configuration. Also note that your kernel must be compiled with the various ALTQ options, including the new ALTQ_FAIRQ option.

-Matt

    ports="{ 22, 25 }"

    altq on vke0 fairq bandwidth 500Kb queue { normal, bulk }
    queue bulk priority 1 bandwidth 100Kb \
        fairq(buckets 64, hogs 25Kb) qlimit 50
    queue normal priority 2 bandwidth 400Kb \
        fairq(buckets 64, hogs 25Kb, default) qlimit 50

    pass out on vke0 inet proto tcp from any to any \
        keep state queue normal
    pass out on vke0 inet proto tcp from any to 216.240.41.28 port $ports \
        keep state queue bulk
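The circular list and the 'hogs' behaviour described in this message might look something like the sketch below. It is only an approximation under stated assumptions: `Bucket`, `RingFairQueue`, and the `rate` field are hypothetical names, the per-bucket rate is set by hand here rather than measured, and a `deque` of bucket indices stands in for the linked circular queue.

```python
from collections import deque


class Bucket:
    def __init__(self):
        self.packets = deque()
        self.rate = 0.0   # hypothetical: measured bits/s, maintained elsewhere


class RingFairQueue:
    """Toy model: only active (non-empty) buckets are linked into the ring."""

    def __init__(self, buckets=64, hogs=25_000):
        self.buckets = [Bucket() for _ in range(buckets)]
        self.hogs = hogs
        self.active = deque()   # bucket indices; active[0] is served next

    def enqueue(self, flow_hash, packet):
        i = flow_hash % len(self.buckets)
        b = self.buckets[i]
        if not b.packets:
            # Newly active bucket: low-bandwidth flows are inserted at the
            # head of the ring, everything else at the tail.
            if b.rate < self.hogs:
                self.active.appendleft(i)
            else:
                self.active.append(i)
        b.packets.append(packet)

    def dequeue(self):
        if not self.active:
            return None
        b = self.buckets[self.active[0]]
        packet = b.packets.popleft()
        if not b.packets:
            self.active.popleft()     # drained: unlink from the ring
        elif b.rate >= self.hogs:
            self.active.rotate(-1)    # advance the round-robin pointer
        # Otherwise the bucket stays at the head and may drain in a burst.
        return packet
```

With two heavy flows and one light flow queued, the light flow's packets leave first and back to back, after which the heavy flows alternate.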
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 6, 11:39 am 2008 This has been running well on my router and doesn't really affect other ALTQ disciplines so I am going to go ahead and commit it to clear room to port the probability keyword that Cedric mentioned, before I get back to finishing up HAMMER. -Matt
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 6, 3:36 pm 2008 :Matthew Dillon wrote: :> This has been running well on my router and doesn't really effect :> other ALTQ disciplines so I am going to go ahead and commit it :> to clear room to port the probability keyword that Cedric mentioned, :> before I get back to finishing up HAMMER. :> :> -Matt : :For some reason, since a week ago, your servers have been unreachable to :Linux clients. The problem can be temporarily bypassed by setting the :Linux sysctl net.ipv4.tcp_window_scaling to 0 : :-- :Robert Luciani It's got to be something PF (packet filter) is doing. I was using a Cisco with the T1. I'm using a DFly box running PF with the DSL line. I'm trying to track it down. -Matt
From: Max Laier Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 6, 4:31 pm 2008 On Monday 07 April 2008 00:36:29 Matthew Dillon wrote: > :Matthew Dillon wrote: > :> This has been running well on my router and doesn't really > :> effect other ALTQ disciplines so I am going to go ahead and commit > :> it to clear room to port the probability keyword that Cedric > :> mentioned, before I get back to finishing up HAMMER. > :> > :> -Matt > : > :For some reason, since a week ago, your servers have been unreachable > : to Linux clients. The problem can be temporarily bypassed by setting > : the Linux sysctl net.ipv4.tcp_window_scaling to 0 > : > :-- > :Robert Luciani > > It's got to be something PF (packet filter) is doing. I was using > a Cisco with the T1. I'm using a DFly box running PF with the DSL > line. I'm trying to track it down. This is usually a symptom of creating state on a TCP packet other than the initial SYN. Make sure you add "flags S/SA" to all your tcp keep state rules. There is plenty on this in the FAQs and lists (freebsd-pf@ and the OpenBSD pf list) for more detailed reference. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 6, 4:48 pm 2008 :> It's got to be something PF (packet filter) is doing. I was using :> a Cisco with the T1. I'm using a DFly box running PF with the DSL :> line. I'm trying to track it down. : :This is usually a symptom of creating state on a TCP packet other than the :initial SYN. Make sure you add "flags S/SA" to all your tcp keep state :rules. There is plenty on this in the FAQs and lists (freebsd-pf@ and :the OpenBSD pf list) for more detailed reference. : :-- :/"\ Best regards, | mlaier@freebsd.org :\ / Max Laier | ICQ #67774661 I kinda half understand that. Are you saying that because creating state on other than the initial syn has no information on the window scale (which is only handled in the SYN and SYN+ACK), that it will blow up? Here are two questions: (1) I'm using keep state, not synproxy. Is PF still attempting to do window sequence space comparisons and dropping packets if they do not match? If it is, do you know where in the code that is (I've been staring at it a while trying to find just such a comparison but not having a whole lot of luck). (2) If I restart PF, and do not create state for pre-existing connections, won't that blow up the classification of those connections? In particular, if there are a lot of flows going through the router and it drops some of its state, won't those flows wind up being left out of the state code from that point on? They would not be identifiable to the fairq code, then, which would be a fairly significant problem. What I would like to do, if (1) is true, is modify PF to flag that the state was created without a SYN, and have it automatically ignore sequence space comparisons for that case. -Matt Matthew Dillon <dillon@backplane.com>
From: Max Laier Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 6, 5:32 pm 2008 On Monday 07 April 2008 01:48:28 Matthew Dillon wrote: > :> It's got to be something PF (packet filter) is doing. I was > :> using a Cisco with the T1. I'm using a DFly box running PF with the > :> DSL line. I'm trying to track it down. > : > :This is usually a symptom of creating state on a TCP packet other than > : the initial SYN. Make sure you add "flags S/SA" to all your tcp keep > : state rules. There is plenty on this in the FAQs and lists > : (freebsd-pf@ and the OpenBSD pf list) for more detailed reference. > : > :-- > :/"\ Best regards, | mlaier@freebsd.org > :\ / Max Laier | ICQ #67774661 > > I kinda half understand that. Are you saying that because creating > state on other then the initial syn has no information on the > window scale (which is only handled in the SYN and SYN+ACK), that it > will blow up? Right. > Here are two questions: > > (1) I'm using keep state, not synproxy. Is PF still attempting to > do window sequence space comparisons and dropping packets if they do > not match? If it is, do you know where in the code that is > (I've been staring at it a while trying to find just such a > comparison but not having a whole lot of luck). See the attached forward from the pf mailinglist. The referenced paper is a good read, too. > (2) If I restart PF, and do not create state for pre-existing > connections, won't that blow up the classification of those > connections? Yes, if you also flush states. > In particular, if there are a lot of flows going through the router > and it drops some of its state, won't those flows wind up being > left out of the state code from that point on? They would not be > identifiable to the fairq code, then, which would be a fairly > significant problem. Usually you won't drop active states. 
You'd simply time them out more aggressively (see adaptive.{start,end} in pf.conf(5) if your version has that already) or not allow a new state to be created. > What I would like to do, if (1) is true, is modify PF to flag that > the state was created without a SYN, and have it automatically ignore > sequence space comparisons for that case. It really depends on what you want to achieve. If you are after security for a network of clients with bad/broken TCP stacks then leaving out the window checks is not a good idea. I can see that there are cases where you'd want to check only the (src,dst,proto)-tuple and pass every matching packet regardless. Currently pf doesn't allow for this to happen statefully and I don't think OpenBSD is going to make that change, ever. If you think of pf as a security first and foremost mechanism this makes sense. I'm also somewhat reluctant to make that change in FreeBSD, otoh there are cases where you'd want that rope. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News
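The failure mode discussed in this exchange can be illustrated with a toy window check. This is not pf's code, whose sequence tracking is considerably more involved. The `in_window` function and its parameters are hypothetical, and the sketch ignores sequence-number wraparound. It only shows why a tracker that never saw the SYN, and so assumes no scaling, may judge legitimate segments to be out of window.

```python
def in_window(seq, length, ack_edge, advertised, wscale):
    """Return True if the segment [seq, seq + length) fits the window.

    ack_edge   -- highest acknowledgement seen from the receiver
    advertised -- raw 16-bit window field from the receiver's header
    wscale     -- shift count from the receiver's SYN, or None if the
                  tracker never saw that SYN (mid-stream pickup)
    """
    shift = wscale if wscale is not None else 0   # unknown is treated as 0
    right_edge = ack_edge + (advertised << shift)
    return seq + length <= right_edge


# Receiver advertised 1000 with a shift of 7, i.e. a real window of 128000.
seen_syn = in_window(50_000, 1460, ack_edge=0, advertised=1000, wscale=7)
missed_syn = in_window(50_000, 1460, ack_edge=0, advertised=1000, wscale=None)
```

Here `seen_syn` is true and `missed_syn` is false for the same segment, which is consistent with the report that disabling window scaling on the Linux client worked around the problem.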
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 6, 6:26 pm 2008 :> (1) I'm using keep state, not synproxy. Is PF still attempting to :> do window sequence space comparisons and dropping packets if they do :> not match? If it is, do you know where in the code that is :> (I've been staring at it a while trying to find just such a :> comparison but not having a whole lot of luck). : :See the attached forward from the pf mailinglist. The referenced paper is :a good read, too. (reading that right now) :> and it drops some of its state, won't those flows wind up being :> left out of the state code from that point on? They would not be :> identifiable to the fairq code, then, which would be a fairly :> significant problem. : :Usually you won't drop active states. You'd simply time them out more :aggressively (see adaptive.{start,end} in pf.conf(5) if your version has :that already) or not allow a new state to be created. :... :It really depends on what you want to achieve. If you are after security :for a network of clients with bad/broken TCP stacks then leaving out the :window checks is not a good idea. I can see that there are cases where :you'd want to check only the (src,dst,proto)-tuple and pass every :matching packet regardless. Currently pf doesn't allow for this to :happen statefully and I don't think OpenBSD is going to make that change, :ever. If you think of pf as a security first and foremost mechanism this :makes sense. I'm also somewhat reluctant to make that change in FreeBSD, :otoh there are cases where you'd want that rope. : :-- :/"\ Best regards, | mlaier@freebsd.org :\ / Max Laier | ICQ #67774661 Yah, we have the adaptive.start/end stuff. I think I have a pretty good handle on the issues now. I understand NetBSD's viewpoint on connection tracking. 
But for my own network I am extremely uncomfortable allowing a router to drop a good TCP connection, and even more uncomfortable having the router control timeouts considering that the only way to overcome such a situation in the face of overload would be to drop the keepalive timeouts on all my machines down to fairly small values. I don't want a reboot of my router to blow up the several hundred active TCP connections from half a dozen servers that are running through it. At the same time I really want to use the keep-state mechanic to serve as a basis for caching that hash code for my fairq. I don't want to roll my own like the WFQ code does... that would be a massive duplication of work. I think the solution is to add another flavor of keep state that is explicitly meant for use with fairq (or fairq-like) mechanisms, or for middle-of-network routing (versus edge routing), which want that hash code or want some sort of identification entity for flows. If I create a 'hash state' keyword that would be fairly obvious in its function. It would basically operate the same as keep state, but explicitly omit any checks which cannot be done if the state is picked up in the middle of the connection. I definitely want to make fairq portable to other OS's. What do you think about a 'hash state' keyword? From a coding perspective it's a little work in parse.y and maybe three or four conditionals in the TCP state code (to omit the sequence space checks for that case). -Matt
From: Max Laier Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 6, 6:54 pm 2008 On Monday 07 April 2008 03:26:32 Matthew Dillon wrote: > :> (1) I'm using keep state, not synproxy. Is PF still attempting > :> to do window sequence space comparisons and dropping packets if they > :> do not match? If it is, do you know where in the code that is (I've > :> been staring at it a while trying to find just such a comparison but > :> not having a whole lot of luck). > : > :See the attached forward from the pf mailinglist. The referenced > : paper is a good read, too. > > (reading that right now) > > :> and it drops some of its state, won't those flows wind up being > :> left out of the state code from that point on? They would not be > :> identifiable to the fairq code, then, which would be a fairly > :> significant problem. > : > :Usually you won't drop active states. You'd simply time them out more > :aggressively (see adaptive.{start,end} in pf.conf(5) if your version > : has that already) or not allow a new state to be created. > :... > :It really depends on what you want to achieve. If you are after > : security for a network of clients with bad/broken TCP stacks then > : leaving out the window checks is not a good idea. I can see that > : there are cases where you'd want to check only the > : (src,dst,proto)-tuple and pass every matching packet regardless. > : Currently pf doesn't allow for this to happen statefully and I don't > : think OpenBSD is going to make that change, ever. If you think of pf > : as a security first and foremost mechanism this makes sense. I'm > : also somewhat reluctant to make that change in FreeBSD, otoh there > : are cases where you'd want that rope. > : > :-- > :/"\ Best regards, | mlaier@freebsd.org > :\ / Max Laier | ICQ #67774661 > > Yah, we have the adaptive.start/end stuff. I think I have a pretty > good handle on the issues now. I understand NetBSD's viewpoint on > connection tracking. 
> > But for my own network I am extremely uncomfortable allowing a > router to drop a good TCP connection, and even more uncomfortable > having the router control timeouts considering that the only way to > overcome such a situation in the face of overload would be to drop the > keepalive timeouts on all my machines down to fairly small values. I > don't want a reboot of my router to blow up the several hundred active > TCP connections from half a dozen servers that are running through it. > > At the same time I really want to use the keep-state mechanic to > serve as a basis for caching that hash code for my fairq. I don't want > to roll my own like the WFQ code does... that would be a massive > duplication of work. Agreed. The code in WFQ is historical when there was altqd and /dev/altq and the altq_classifier. pf (or any firewall for that matter) really is the place to do the classification. > I think the solution is to add another flavor of keep state that > is explicitly meant for use with fairq (or fairq-like) mechanisms, > or for middle-of-network routing (verses edge routing), which want > that hash code or want some sort of identification entity for > flows. > > If I create a 'hash state' keyword that would be fairly obvious > in its function. It would basically operate the same as keep > state, but explicitly omit any checks which cannot be done if the state > is picked up in the middle of the connection. > > I definitely want to make fairq portable to other OS's. What do > you think about a 'hash state' keyword? From a coding perspective it's > a little work in parse.y and maybe three or four conditionals in the > TCP state code (to omit the sequence space checks for that case). I think "reduced state tracking" and the fairq are orthogonal. You can have either independent of each other. If I were to do reduced states, I'd probably make it a "state-opt" (see pf.conf(5) BNF) so that it could be applied to any keep state rule with various effects. 
This way you could even do modulate state or synproxy state as long as you see the initial SYN. If not, you fall back to creating a reduced state. This option would, of course, also have a setting where it would always just create a reduced state and be done with it. As for the name ... maybe, 'extra-tcp-state' with a possible setting of 'on' (default), 'off' and 'force-off' or something like that. This could also be a global setting similar to the timeouts which can also be set on a per-rule basis. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 7, 12:50 am 2008

:...
:could even do modulate state or synproxy state as long as you see the
:initial SYN. If not, you fall back to creating a reduced state. This
:option would, of course, also have a setting where it would always just
:create a reduced state and be done with it.
:
:As for the name ... maybe, 'extra-tcp-state' with a possible setting
:of 'on' (default), 'off' and 'force-off' or something like that. This
:could also be a global setting similar to the timeouts which can also be
:set on a per-rule basis.
:
:\ / Max Laier | ICQ #67774661

I came across an interesting item. I believe (but I'm not entirely sure if I am correct) that NetBSD implies S/SA for TCP keep state and it no longer needs to be specified in the rule. Is this correct? It makes sense since keep state is completely broken for TCP if S/SA isn't specified sans the type of augmentation we've been discussing.

With that in mind here is my proposed state_opt_item feature. I am soliciting opinions on the feature:

[additions to state_opt_item]

pickups
    Specify that mid-stream pickups are to be allowed. The default is to NOT allow mid-stream pickups and implies flags S/SA for TCP connections. If pickups are enabled, flags S/SA are not implied for TCP connections and state can be created for any packet.

    The implied flags parameters need not be specified in either case unless you explicitly wish to override them, which also allows you to roll-up several protocols into a single rule.

    Certain validations are disabled when mid-stream pickups occur. For example, the window scaling options are not known for TCP pickups and sequence space comparisons must be disabled.

    This does not affect state representing fully quantified connections (for which the SYN/SYN-ACK passed through the routing engine). Those connections continue to be fully validated.

nopickups
    Specify that mid-stream pickups are not to be allowed. This is the default and this keyword does not normally need to be specified. However, if you are concerned about rule set portability then specifying this keyword guarantees flags S/SA for TCP connections, and pfctl generates a parse-time error if it doesn't understand the feature.

hashonly
    Implies pickups and maintains a state table entry but disables most validations whether or not the connection has been fully quantified.

    This feature is used if you do not wish to validate connection state, for example for a router operating in the center of a large network where such validations would be impossible to maintain.

    However, even though such validations may not be desired you may still require keep state for the purposes of driving the FAIRQ ALTQ. FAIRQ depends on keep state to generate the hash codes identifying the buckets in which it should place packets.

    You might also want to use this feature to identify high-bandwidth connections via the state table for analysis purposes, even at the center of a large network.

-Matt
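The three proposed options can be summarised as two decisions per TCP packet: whether a state entry may be created, and whether sequence-space validation applies to it. The sketch below models only the wording of the proposal. The `Pickup` enum and both function names are hypothetical and do not correspond to identifiers in pf.

```python
from enum import Enum


class Pickup(Enum):
    NOPICKUPS = "nopickups"   # proposed default, implies flags S/SA
    PICKUPS = "pickups"
    HASHONLY = "hashonly"     # implies pickups


def may_create_state(mode, is_initial_syn):
    """nopickups creates state only on the initial SYN; the others on any packet."""
    return is_initial_syn or mode is not Pickup.NOPICKUPS


def validate_sequence(mode, fully_quantified):
    """Decide whether sequence-space checks apply to a state entry.

    fully_quantified -- the SYN/SYN-ACK passed through the router, so the
                        window scaling options are known
    """
    if mode is Pickup.HASHONLY:
        return False            # validations disabled regardless
    return fully_quantified     # mid-stream pickups are not validated
```

Under `nopickups` every state entry starts from a SYN, so in this model `fully_quantified` would always be true and validation always applies.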
From: Cédric Berger Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 7, 6:09 am 2008 Matthew Dillon wrote: > :... > :could even do modulate state or synproxy state as long as you see the > :initial SYN. If not, you fall back to creating a reduced state. This > :option would, of course, also have a setting where it would always just > :create a reduced state and be done with it. > : > :As for the name ... maybe, 'extra-tcp-state' with a possible setting > :of 'on' (default), 'off' and 'force-off' or something like that. This > :could also be a global setting similar to the timeouts which can also be > :set on a per-rule basis. > : > :\ / Max Laier | ICQ #67774661 > > I came across an interesting item. I believe (but I'm not entirely > sure if I am correct) that NetBSD implies S/SA for TCP keep > state and it no longer needs to be specified in the rule. Is this > correct? Yes, quoting http://www.openbsd.org/faq/pf/filter.html: In OpenBSD 4.1 and later, the default flags S/SA are applied to all TCP filter rules. Since OpenBSD 4.1, "keep state" is also the default. Cedric
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 7, 8:05 am 2008 :Yes, quoting http://www.openbsd.org/faq/pf/filter.html: : :In OpenBSD 4.1 and later, the default flags S/SA are applied to all TCP :filter rules. : :Since OpenBSD 4.1, "keep state" is also the default. : :Cedric I found the code. NetBSD hasn't seemed to have adopted that change. I'm not sure I want to adopt the keep state by default on pass rules but S/SA clearly must be adopted and its default modified by the new options (i.e. S/SA set by default (also for 'nopickups'), and not set if 'pickups' or 'hashonly' since we want to pick up the stream in the middle for the latter two). Some of this stuff is starting to look a little overboard. I can see having keep state on as a default if it didn't have such an adverse effect on existing TCP streams on reboot, but it does and because it does I don't think I want it turned on as a default in DragonFly. Or, alternatively, we could turn it on by default in DragonFly but as 'hashonly' unless a keep state directive is explicitly specified in the rule. But then issues pop up where the administrator might not have wanted keep state for everything due to extreme volumes and doing that could blow out the areas he DID want keep state on. So, right now, I'm inclined not to turn on keep state by default if it isn't specified in the rule. -Matt Matthew Dillon <dillon@backplane.com>
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 7, 11:42 am 2008 :I concur. Keep state should be explicit. Furthermore, I don't expect :keep state not to work across reboots. That's why I then write keep :state flags S/SA. Something clearly need to be untangled here. Keep :state should keep state as good as possible, but not reject connections. : :cheers : simon I figured out another reason why linux boxes couldn't connect to me. I wasn't running keep state on incoming traffic, only outgoing. That means the keep state didn't have the initial SYN packet from an outside host making a connection into me. No initial SYN, no window scaling info. My current pickup check is not quite sufficient, either. I have to check that the SYN was observed in both directions. Seeing just one of the SYNs may not be enough. I'll have to re-read the window scaling rules. Max, or anyone... do you happen to remember whether window scaling is negotiated the same for both directions or whether each direction in a TCP connection can use a different scaling factor? -Matt Matthew Dillon <dillon@backplane.com>
From: Max Laier Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 7, 12:32 pm 2008 On Monday 07 April 2008 20:42:08 Matthew Dillon wrote: > :I concur. Keep state should be explicit. Furthermore, I don't expect > :keep state not to work across reboots. That's why I then write keep > :state flags S/SA. Something clearly need to be untangled here. Keep > :state should keep state as good as possible, but not reject > : connections. > : > :cheers > : simon > > I figured out another reason why linux boxes couldn't connect to > me. > > I wasn't running keep state on incoming traffic, only outgoing. > That means the keep state didn't have the initial SYN packet from an > outside host making a connection into me. No initial SYN, no window > scaling info. > > My current pickup check is not quite sufficient, either. I have to > check that the SYN was observed in both directions. Seeing just > one of the SYNs may not be enough. I'll have to re-read the window > scaling rules. > > Max, or anyone... do you happen to remember whether window scaling > is negotiated the same for both directions or whether each > direction in a TCP connection can use a different scaling factor? The latter, wouldn't make much sense if your peer could dictate a scaling factor. The wscale for the other direction is set here: http://fxr.watson.org/fxr/source/net/pf/pf.c?v=DFBSD#L3810 ff. Note that this is in the state tracking already, we are looking at the first packet from src and TH_SYN is set (-> this is the SYN+ACK) from the peer. dst.wscale was already set when the state was created: http://fxr.watson.org/fxr/source/net/pf/pf.c?v=DFBSD#L2727 (where src is the other end sending the initial SYN). At least this is the way things behave when you have "flags S/SA". -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News
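The point about per-direction scaling can be made concrete with a little arithmetic. The sketch below follows the general TCP window scale rules (each side announces its own shift count in its SYN, and scaling is only in effect when both SYNs carried the option). The function and parameter names are illustrative and do not correspond to anything in pf.c.

```python
MAX_WSCALE = 14  # largest shift count the TCP window scale option permits


def advertised_window(raw_window, own_wscale, peer_wscale):
    """Bytes actually advertised by one endpoint.

    raw_window  -- 16-bit window field taken from that endpoint's segment
    own_wscale  -- shift count that endpoint sent in its SYN, or None
    peer_wscale -- shift count the other endpoint sent in its SYN, or None
    """
    if own_wscale is None or peer_wscale is None:
        # Scaling is off unless both SYNs carried the option. A filter
        # that missed a SYN cannot know which of these cases applies.
        return raw_window
    return raw_window << min(own_wscale, MAX_WSCALE)
```

With the same raw window field of 1000, an endpoint that announced a shift of 7 is advertising 128000 bytes while a peer that announced a shift of 2 is advertising 4000, which is why a state entry that never saw one of the SYNs cannot validate windows for that direction.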
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #2 Date: Apr 7, 2:30 pm 2008 :The latter, wouldn't make much sense if your peer could dictate a scaling :factor. : :The wscale for the other direction is set here: :http://fxr.watson.org/fxr/source/net/pf/pf.c?v=DFBSD#L3810 ff. Note that :this is in the state tracking already, we are looking at the first packet :from src and TH_SYN is set (-> this is the SYN+ACK) from the peer. :dst.wscale was already set when the state was created: :http://fxr.watson.org/fxr/source/net/pf/pf.c?v=DFBSD#L2727 (where src is :the other end sending the initial SYN). : :At least this is the way things behave when you have "flags S/SA". : :\ / Max Laier | ICQ #67774661 Got it. Oooh, that's nasty. It's confirming that the SYN is for the other direction by testing the seqlo variable, which is non-zero on the direction that already got the SYN, and zero on the direction that hasn't. That code comment deserves to be expanded a bit :-) Here's a new patch, changing the one SYN detect flag into two flags and setting them in the proper places. 'pfctl -s state -v -v' now reports three possible states: 'indeterminate', 'incomplete', and 'good'. fetch http://apollo.backplane.com/DFlyMisc/pickups02.patch I did some quick testing and all three states appear to work properly, so if someone forgets to 'keep state' in both directions the state output will say 'incomplete' instead of 'good'. -Matt Matthew Dillon <dillon@backplane.com>
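One way to read the two-flag scheme described above is as a small classification function. This is a guess at the mapping rather than the patch's actual logic: the message only says that forgetting keep state in one direction yields 'incomplete' instead of 'good', so treating 'indeterminate' as "neither SYN observed" is an assumption, and the function and argument names are hypothetical.

```python
def pickup_status(syn_seen_fwd, syn_seen_rev):
    """Map two per-direction 'SYN observed' flags to a reported status."""
    if syn_seen_fwd and syn_seen_rev:
        return "good"            # window scaling known for both directions
    if syn_seen_fwd or syn_seen_rev:
        return "incomplete"      # only one side's SYN was tracked
    return "indeterminate"       # assumed: state picked up mid-stream
```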
From: Matthew Dillon Subject: FairQ ALTQ for PF - Patch #3 Date: Apr 9, 11:27 am 2008 Ok, here is patch #3. This is the final patch short of bug fixes: fetch http://apollo.backplane.com/DFlyMisc/pickups03.patch * Added set keep-policy to set the default stateful inspection policy. * Removed NetBSD's window scale patch. After playing with keep state for the last few days I understand now why OpenBSD made it the default. I wound up having to put it on every single pass rule I had on my router. However, I continue to believe quite strongly that keep state w/ flags S/SA is an inappropriate default due to the adverse effect it has on pre-existing TCP connections, so I wanted to come up with a solution that would be acceptable to projects that might have a different opinion. I came up with set keep-policy in your pf.conf. For example: set keep-policy keep state (pickups) This will cause all pass rules to use the specified policy by default, so it does not have to be specified for each rule. The policy can be overridden in each rule. I implemented the OpenBSD 'no keep' feature as well so it can also be turned off. I did not see a similar feature to my 'set keep-policy' in OpenBSD. I think this is the best solution. This way the fact that stateful inspection is being used is explicitly specified in the pf.conf, which should satisfy everyone, plus additional features such as 'pickups' can be specified cleanly. Unless something comes up I am going to commit this to DragonFly on Friday and call it done. I would be pleased if other projects picked up some or all of the work. Max, if you make fixes or further enhancements to this for any porting you do to FreeBSD could you give me a heads up? I'd like to keep them in sync at least for a little while. -Matt
From: Matthew Dillon Subject: Re: FairQ ALTQ for PF - Patch #3 Date: Apr 9, 11:40 am 2008 Er, in case it wasn't obvious from the content, that's PICKUPS patch #3, not ALTQ patch #3. I borrowed the wrong Subject line. -Matt
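The precedence that the pickups patch #3 message describes for set keep-policy (a global default, a per-rule override, and a way to switch state tracking off for a rule) could be modelled roughly as follows. This is a sketch only: `NO_KEEP` and `effective_keep` are hypothetical names, the policy strings are opaque labels rather than parsed pf.conf syntax, and nothing here reflects how pfctl actually resolves rules.

```python
NO_KEEP = "no keep"  # hypothetical marker for a rule that disables state


def effective_keep(rule_policy, default_policy):
    """Return the state policy that applies to one pass rule.

    rule_policy    -- policy written on the rule itself, NO_KEEP, or None
    default_policy -- value given to 'set keep-policy', or None if unset
    """
    # A policy written on the rule always wins over the global default.
    chosen = default_policy if rule_policy is None else rule_policy
    # An explicit opt-out, or no policy anywhere, means a stateless rule.
    return None if chosen in (None, NO_KEEP) else chosen
```

For example, with a default of "keep state (pickups)", a rule that says nothing inherits that default, a rule that names its own policy keeps it, and a rule marked with the opt-out stays stateless.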


Something similar on Linux?
Should be possible already
Should be possible already if you attach ESFQ to queues managed by HFSC
Traffic shaping
The Linux equivalent of ALTQ is the traffic shaping framework.
I'm not sure about the ALTQ patch being discussed here, but the description of it sounds similar to the 'sfq' queue in Linux.
Traffic shaping rocks, it really should get used more...
regular sfq user
I use sfq in my router PC at home. 3 years and counting.
fairq for freebsd
I hope that somebody will pick this up for FreeBSD; this is an additional feature that I _really_ want to get in the tree.