Re: (Lack of) specification for RX n-tuple filtering

From: Ben Hutchings
Date: Thursday, July 22, 2010 - 2:02 pm

The n-tuple filtering facility is half-baked at present.  There is an
interface to add filters but none to remove them!  And ETHTOOL_GRXNTUPLE
is not at all symmetric with ETHTOOL_SRXNTUPLE (which I complained about
at the time it was added, to no avail).

An ETHTOOL_RESET command with flag ETH_RESET_FILTER set could be defined
to clear all the filters, but that's a big hammer to use, and I think
that in general drivers should push the same configuration back to the
hardware after resetting it for whatever reason.
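To make the suggestion concrete: a driver can keep a software copy of every filter the user installed and replay it into the hardware after any reset, so that a reset never silently drops filters. This is a minimal sketch assuming a hypothetical fixed-size hardware table; `struct nic`, `hw_write_filter()`, `nic_add_filter()` and `nic_reset()` are illustrative names, not taken from any real driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS 8  /* hypothetical hardware filter table size */

struct filter {
	uint32_t dst_ip;    /* network byte order */
	uint16_t dst_port;  /* network byte order */
	uint8_t  proto;
	int      queue;     /* RX queue that matching packets go to */
};

struct nic {
	struct filter hw[NUM_SLOTS];      /* stands in for device registers */
	bool hw_valid[NUM_SLOTS];
	struct filter shadow[NUM_SLOTS];  /* driver's software copy */
	bool shadow_valid[NUM_SLOTS];
};

/* Hypothetical register write for one filter slot. */
static void hw_write_filter(struct nic *nic, size_t slot, const struct filter *f)
{
	nic->hw[slot] = *f;
	nic->hw_valid[slot] = true;
}

/* Install a filter: remember it in the shadow table, then program hardware. */
static void nic_add_filter(struct nic *nic, size_t slot, const struct filter *f)
{
	nic->shadow[slot] = *f;
	nic->shadow_valid[slot] = true;
	hw_write_filter(nic, slot, f);
}

/* A reset wipes the hardware table; the shadow table survives it. */
static void nic_reset(struct nic *nic)
{
	memset(nic->hw_valid, 0, sizeof(nic->hw_valid));
	/* Push the same configuration back to the hardware. */
	for (size_t i = 0; i < NUM_SLOTS; i++)
		if (nic->shadow_valid[i])
			hw_write_filter(nic, i, &nic->shadow[i]);
}
```

With this design the user-visible filter set is defined by the shadow table, and a reset is invisible to userspace.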

So far as I can work out, ixgbe clears all the filters when the filter
table fills up.  Is that true?  Is this really the intended behaviour of
manually set filters?

I also see this in the ixgbe implementation:

	/*
	 * Program the relevant mask registers.  If src/dst_port or src/dst_addr
	 * are zero, then assume a full mask for that field.  Also assume that
	 * a VLAN of 0 is unspecified, so mask that out as well.  L4type
	 * cannot be masked out in this implementation.
	 *
	 * This also assumes IPv4 only.  IPv6 masking isn't supported at this
	 * point in time.
	 */

An IPv4 address of 0 is certainly valid, so this isn't a good rule.  And
in any case, such a rule should be specified *with the interface*, in
<linux/ethtool.h>, not the implementation.

This also implies that 'mask' specifies bits to be ignored, not bits to
be matched.  That also was not specified.
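The two possible readings of 'mask' give opposite results for the same bits, which is why the convention belongs in the interface definition. The sketch below compares one 32-bit field under both conventions, plus the zero-means-wildcard rule from the ixgbe comment quoted above. The function names are made up for illustration; this is not the ethtool or ixgbe code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convention A: a set mask bit means "ignore this bit". */
static bool match_ignore_mask(uint32_t field, uint32_t value, uint32_t mask)
{
	return ((field ^ value) & ~mask) == 0;
}

/* Convention B: a set mask bit means "this bit must match". */
static bool match_care_mask(uint32_t field, uint32_t value, uint32_t mask)
{
	return ((field ^ value) & mask) == 0;
}

/*
 * Rule from the ixgbe comment: a zero value is treated as "unspecified",
 * i.e. the whole field is masked out.  Consequently no filter can
 * require the address to be exactly 0.0.0.0.
 */
static bool match_zero_is_wildcard(uint32_t field, uint32_t value)
{
	if (value == 0)
		return true;  /* wildcard: matches every address */
	return field == value;
}
```

With mask 0x0000ffff, convention A accepts field 0x12345678 against value 0x12340000, while convention B rejects it.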

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--

From: Dimitris Michailidis
Date: Thursday, July 22, 2010 - 2:50 pm

It's a bit worse than that.  Currently one can only append filters, not 
insert at a given position, as ethtool_rx_ntuple doesn't have an index 
field.  For devices that use TCAMs, where position matters, it's quite an 
obstacle.  It also means one cannot modify an existing filter by specifying 

--

From: Ben Hutchings
Date: Tuesday, September 7, 2010 - 7:43 am

It looks like drivers for devices that use TCAMs should implement the
RXNFC interface instead.

Ben.

--

From: Vladislav Zolotarov
Date: Wednesday, December 8, 2010 - 9:24 am

Ben, from the ethtool man page it sounds like the RXNFC option defines the way
the RSS hash should be calculated, while SRXNTUPLE is meant to control
the destination Rx queue for a stream specified by one or more filters. The
semantics for specifying the stream are also quite different. For
instance, how do you define a rule to drop all packets with source IP
address 192.168.10.200 by means of RXNFC? With SRXNTUPLE it's
straightforward. So, if I understood the semantics of both interfaces
correctly, there is a very limited range of functionality where they can
replace one another. Please correct me if I'm wrong.

I also agree with Dimitris: what we have here is an offload of some
Netfilter functionality to HW. Regardless of the HW implementation (TCAM or
not), if more than one rule may be configured for the same
protocol, the ordering of the filtering rules is important: for instance, if you
change the order in which the rules in the example below are applied, the result
of filtering traffic with both VLAN 4 and destination port
3000 will be different.

ethtool -U ethX flow-type tcp4 vlan 4 action 0
ethtool -U ethX flow-type tcp4 dst-port 3000 action 3
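To see why the order matters, here is a toy first-match evaluator for the two rules above, applied to a TCP/IPv4 packet carrying both VLAN 4 and destination port 3000. It is a sketch assuming first-match semantics; the `rule`/`pkt` structures and `first_match()` are invented for illustration and are not how any particular NIC or driver represents rules.

```c
#include <stddef.h>

#define ANY (-1)  /* field not specified in the rule */

struct pkt {
	int vlan;
	int dst_port;
};

struct rule {
	int vlan;      /* ANY or a VLAN ID */
	int dst_port;  /* ANY or a port number */
	int queue;     /* "action": RX queue index */
};

static int rule_matches(const struct rule *r, const struct pkt *p)
{
	return (r->vlan == ANY || r->vlan == p->vlan) &&
	       (r->dst_port == ANY || r->dst_port == p->dst_port);
}

/* Return the queue of the first matching rule, or -1 if none matches. */
static int first_match(const struct rule *rules, size_t n, const struct pkt *p)
{
	for (size_t i = 0; i < n; i++)
		if (rule_matches(&rules[i], p))
			return rules[i].queue;
	return -1;
}
```

Both rules match such a packet, so with the order {vlan 4 -> 0, dst-port 3000 -> 3} it lands on queue 0, and with the two rules swapped it lands on queue 3.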

By the way, it's also unclear from the ethtool man page whether it's allowed
to configure more than one rule for the same protocol. If it's not, then
the above example is void... ;) However, if we want to define a proper
filtering interface, I think we shouldn't prevent a driver
implementation from supporting a set of rules for the same protocol,
while still allowing it not to.

So, I think that attaching an index to each rule could be a good idea:
this would allow us both to insert rules at the desired positions in the
filtering rule table and to edit existing rules.
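An index per rule gives two operations that the append-only interface lacks: inserting at a chosen position and overwriting an existing entry. A minimal sketch over a fixed-size array, assuming lower indices have higher priority; `rule_table`, `table_insert()` and `table_replace()` are hypothetical names, not an ethtool API.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16

struct rule {
	int match_port;  /* simplified match key */
	int queue;       /* action */
};

struct rule_table {
	struct rule rules[TABLE_SIZE];
	size_t count;
};

/* Insert at position idx, shifting later (lower-priority) rules down. */
static int table_insert(struct rule_table *t, size_t idx, struct rule r)
{
	if (t->count == TABLE_SIZE || idx > t->count)
		return -1;
	memmove(&t->rules[idx + 1], &t->rules[idx],
		(t->count - idx) * sizeof(t->rules[0]));
	t->rules[idx] = r;
	t->count++;
	return 0;
}

/* Edit the rule already at position idx in place. */
static int table_replace(struct rule_table *t, size_t idx, struct rule r)
{
	if (idx >= t->count)
		return -1;
	t->rules[idx] = r;
	return 0;
}
```

Inserting a port-3000 rule at index 0 of a table that already holds a port-80 rule pushes the port-80 rule to index 1; replacing index 1 then edits that rule without disturbing the other.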

It's also unclear what the relation between RXNFC and SRXNTUPLE is. The
latter may, in general, override the decision made based on the hash result.
So it sounds like the SRXNTUPLE rules should be applied before
the RSS logic, and only if there was no match RSS ...

--

From: David Miller
Date: Wednesday, December 8, 2010 - 9:39 am

From: "Vladislav Zolotarov" <vladz@broadcom.com>

It's not the same; this whole ordering thing you expect in netfilter
land is simply not present in these hardware implementations.

The hardware does a parallel TCAM match lookup, and whatever matches
is used.

Some hardware does link-level protocol lookups first, then L3/L4 later
in the RX path right before computing the hash and selecting an RX
queue.
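The receive path described here, combined with the precedence Vladislav proposes above (explicit filters first, RSS only when nothing matches), can be sketched as follows. The hash function, table size and filter representation are placeholders assumed for illustration; real devices typically use a Toeplitz hash and their own table layouts.

```c
#include <stddef.h>
#include <stdint.h>

#define INDIR_SIZE 128  /* assumed indirection table size */

struct flow {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

struct steer_filter {
	uint32_t dst_ip;
	uint16_t dst_port;
	int queue;
};

/* Placeholder hash (not Toeplitz) mixing the 4-tuple. */
static uint32_t flow_hash(const struct flow *f)
{
	uint32_t h = f->src_ip * 2654435761u;
	h ^= f->dst_ip + 0x9e3779b9u + (h << 6) + (h >> 2);
	h ^= ((uint32_t)f->src_port << 16) | f->dst_port;
	return h;
}

static int select_rx_queue(const struct flow *f,
			   const struct steer_filter *filters, size_t nfilters,
			   const uint8_t indir[INDIR_SIZE])
{
	/* 1. An exact-match filter overrides the hash. */
	for (size_t i = 0; i < nfilters; i++)
		if (filters[i].dst_ip == f->dst_ip &&
		    filters[i].dst_port == f->dst_port)
			return filters[i].queue;

	/* 2. Otherwise RSS: hash the flow, then look up the indirection table. */
	return indir[flow_hash(f) % INDIR_SIZE];
}
```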

There really is no ordering available, so let's not pretend it can be
used "just like" netfilter rules.

As for the difference between the various ethtool facilities, this
just reflects the fact that what's available to offload differs
per device.  The best we can do is encapsulate the commonality as well
as we can, but each interface essentially represents what one
major chipset provides.
--

From: Ben Hutchings
Date: Wednesday, December 8, 2010 - 10:29 am

I think the match with the lowest index wins, which is why it's possible
to specify the rule's index (location) with ETHTOOL_SRXCLSRLINS and why

I think the interfaces are actually somewhat more flexible than any of
the current implementations.

Ben.

--

From: David Miller
Date: Wednesday, December 8, 2010 - 10:31 am

From: Ben Hutchings <bhutchings@solarflare.com>

Yeah you're probably right.

--

From: Vladislav Zolotarov
Date: Thursday, December 9, 2010 - 3:31 am

Ben, in practice, with the current ethtool userspace implementation
there seems to be no way to specify the CAM index of the rule in the
n-tuple interface, is there? So the choice of index is left to the
vendor, which creates uncertainty.

And I guess that's exactly what Dimitris meant when talking about the index:
he said "a rule index", you say "a CAM index", but generally we are
talking about the same thing. You are referring to ETHTOOL_SRXCLSRLINS, but
it has no userspace interface yet and it's unclear when it will, while
n-tuple is already there. We can't remove the existing userspace
interfaces, I agree. Then let's not add interfaces that interfere
with the existing ones. This immediately implies that
ETHTOOL_SRXCLSRLINS should never see the light of day in userland as a separate
interface, and the n-tuple user interface should be properly extended to
implement the missing ETHTOOL_SRXCLSRLINS functionality.

Please comment.

thanks,
vlad


--

From: Vladislav Zolotarov
Date: Wednesday, December 8, 2010 - 10:31 am

So, you're saying that within a single protocol all rules form a set
whose ordering is vendor-specific, and that the same configuration of
n-tuple rules may produce different results for the same traffic on
NICs from different vendors? Don't you think that's confusing from the user's
point of view? ;)

--

From: Ben Hutchings
Date: Wednesday, December 8, 2010 - 10:22 am

By 'RXNFC interface' I mean ETHTOOL_{G,S}RXCLS* and not

Something like this, I think:

struct ethtool_rxnfc insert_rule = {
	.cmd = ETHTOOL_SRXCLSRLINS,
	.flow_type = IP_USER_SPEC,
	.fs = {
		.flow_type = IP_USER_SPEC,
		.h_u.usr_ip4_spec = {
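			/* inet_addr() returns the address in network byte order */
			.ip4src = inet_addr("192.168.10.200"),
			.ip_ver = ETH_RX_NFC_IP4
		},
		/* Here a set mask bit means "ignore this bit" */
		.m_u.usr_ip4_spec = {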
			.ip4dst = 0xffffffff,
			.l4_4_bytes = 0xffffffff,
			.tos = 0xff,
			.proto = 0xff
		},
		.ring_cookie = RX_CLS_FLOW_DISC,
		.location = 0,
	}
};


Our hardware (and, I suspect, the ixgbe hardware) has hash tables for
specific types of matching.  There is some control of precedence between



That's right.

Ben.

--

From: Vladislav Zolotarov
Date: Wednesday, December 8, 2010 - 11:39 am

Aha. Ok. From the remarks in the upstream ethtool.h I see now that
ethtool_rxnfc has quite wide configuration possibilities (including the
above). I missed it before. ;)

Ben, could you please explain to me, then, what the difference is between
defining the rule as you wrote above on top of the -N option (nfc) and
defining a rule doing the same thing on top of the -U (n-tuple) option, and
when I, as a user, should prefer one option over the other? Are they expected
to be implemented differently from the FW/HW perspective?

thanks,
vlad

P.S. I see that ethtool.h from the 2.6.36 tree already has the
ethtool_rxnfc that would allow such a filter definition; however, from
the man page of the 2.6.36 version of the ethtool package it's unclear
what the command line for such a configuration should be. Is it supported
by the current ethtool version, or am I missing something in the man
page?


--

From: Ben Hutchings
Date: Wednesday, December 8, 2010 - 12:02 pm

The -N option modifies the hash function for all flows of a specific
type (using ETHTOOL_SRXFH) whereas the -U option steers a specific flow
or set of flows (using ETHTOOL_SRXNTUPLE).  The implementation of the -U
option could potentially be made to fall back to ETHTOOL_SRXCLSRLINS if

It's not supported.  Santwona Behera implemented the kernel side of this
but so far as I know he never sent any patches for ethtool.

Ben.

--

From: Vladislav Zolotarov
Date: Wednesday, December 8, 2010 - 12:10 pm

Having said that, don't you think it would be more user-friendly to
extend the ETHTOOL_SRXCLSRLINS interface to handle vlan_tag and
user_def and drop the n-tuple interface altogether?

thanks,
vlad 


--

From: Ben Hutchings
Date: Wednesday, December 8, 2010 - 12:14 pm

No, we can't remove userland interfaces.

Ben.

--

From: Ben Hutchings
Date: Wednesday, December 8, 2010 - 12:39 pm

Having said that, this particular interface is fairly broken...

$ cat test.c
#include <stddef.h>
#include <stdio.h>

#include <linux/ethtool.h>

int main(void)
{
    printf("%zu\n", offsetof(struct ethtool_rx_flow_spec, ring_cookie));
    return 0;
}
$ cc -m64 -Wall test.c
$ ./a.out 
152
$ cc -m32 -Wall test.c
$ ./a.out 
148
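The two offsets differ because the i386 ABI aligns a 64-bit integer inside a struct on a 4-byte boundary, while x86-64 aligns it on 8, so a 64-bit member following a field that ends on a 4-byte (but not 8-byte) boundary lands at different offsets. The toy structs below (not the real ethtool layout) show the effect and one way to pin a layout, an explicit alignment attribute (a GCC/Clang extension). Applying that to an existing structure would itself change the 32-bit layout that old binaries rely on.

```c
#include <stddef.h>
#include <stdint.h>

/* ABI-dependent layout: on i386 'cookie' sits at offset 4,
 * on x86-64 it is padded to offset 8. */
struct natural {
	uint32_t flow_type;
	uint64_t cookie;
};

/* Forcing 8-byte alignment gives offset 8 on both ABIs. */
struct pinned {
	uint32_t flow_type;
	uint64_t cookie __attribute__((aligned(8)));
};

static size_t natural_cookie_offset(void) { return offsetof(struct natural, cookie); }
static size_t pinned_cookie_offset(void)  { return offsetof(struct pinned, cookie); }
```

Compiling with `-m32` and `-m64`, as in the transcript above, shows the natural offset changing while the pinned one stays at 8.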

Ben.

--

From: Dimitris Michailidis
Date: Wednesday, December 8, 2010 - 11:54 am

I think the mask would be 0 for don't care fields and 1 for care, so

	.m_u.usr_ip4_spec.ip4src = htonl(0xffffffff),
	.m_u.usr_ip4_spec.ip4dst = htonl(0),
etc

There's a lot of overlap between SRXCLSRLINS and SRXNTUPLE and neither is a 
superset.  SRXCLSRLINS has the advantage of specifying position but 
SRXNTUPLE includes vlan and a device-specific field that are handy.

Also for reporting rules GRXNTUPLE is more flexible than GRXCLSRULE as it 
lets the driver specify the information it reports.  In fact I've been 
thinking of using SRXCLSRLINS and GRXNTUPLE for cxgb4 but haven't gotten 

It can be more involved than this.  Our HW allows a rule to select a 
different part of the RSS table so you get a filter hit and still do RSS 
afterwards if you want.  Current ethtool interfaces do not support this, 

--

From: Ben Hutchings
Date: Wednesday, December 8, 2010 - 12:14 pm

That is definitely the opposite of what ixgbe and sfc do for
ethtool_ntuple_rx_flow_spec, and I believe it is the opposite of what
niu does for ethtool_rx_flow_spec.


So does the rule specify an offset added to the output of the RSS hash
and indirection table, or can it also select a different indirection
table?  Our current hardware also has a filter flag for the former
behaviour...  There are still plenty of bits to spare in 'action' and
'ring_cookie' so perhaps we could define a flag for this?
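The two behaviours being distinguished here can be sketched side by side. In one, the filter adds a base offset to the queue that RSS would otherwise pick. In the other (Dimitris's partitioned table), the filter chooses a region of the indirection table and the hash picks an entry within that region. The function names and parameters are assumptions for illustration, not any device's actual semantics.

```c
#include <stdint.h>

/* Behaviour 1: RSS as usual, then add the filter's queue offset. */
static unsigned queue_with_offset(uint32_t hash, const uint8_t *indir,
				  unsigned indir_size, unsigned base)
{
	return base + indir[hash % indir_size];
}

/* Behaviour 2: the filter selects region [start, start + len) of a
 * partitioned table, and the hash picks an entry inside that region. */
static unsigned queue_from_region(uint32_t hash, const uint8_t *indir,
				  unsigned start, unsigned len)
{
	return indir[start + hash % len];
}
```

The first needs only an offset in the filter's action; the second needs the filter to carry a region identifier, which is what a new flag in 'action' or 'ring_cookie' would have to encode.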

Ben.

--

From: Dimitris Michailidis
Date: Wednesday, December 8, 2010 - 12:26 pm

These are the values as our HW at least wants them.  The care bits are 1 in 

You can partition the indirection table and then a rule can specify that 
matching packets should consult region X of the table.  The hash value is 

--