Thread (28 messages) 28 messages, 6 authors, 2018-12-20

Re: [PATCH net-next,v6 00/12] add flow_rule infrastructure

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: 2018-12-18 19:57:15

Hi Jakub,

On Mon, Dec 17, 2018 at 05:39:29PM -0800, Jakub Kicinski wrote:
On Fri, 14 Dec 2018 19:11:53 +0100, Pablo Neira Ayuso wrote:
quoted
Hi,

This patchset introduces a kernel intermediate representation (IR) to
express ACL hardware offloads, this is heavily based on the existing
flow dissector infrastructure and the TC actions. This IR can be used by
different frontend ACL interfaces such as ethtool_rxnfc, tc and
netfilter to represent ACL hardware offloads.

The main goals of this patchset are:

* Provide an unified representation that can be passed to the driver
  for translation to HW IR. This consolidates the code to be maintained
  in the driver and it also simplifies the development of ACL hardware
  offloads. Driver developers do not need to add one specific parser for
  each supported ACL frontend, instead each frontend just generates
  this flow_rule IR and pass it to drivers to populate the hardware IR.
Ack.
Thanks.
quoted
* Do not expose TC software frontend details to the drivers anymore,
  such as structure layouts. Hence, if TC needs to be updated to support
  a new software feature, no changes to the drivers will be required.
This one you may need to clarify.  TC already transforms the rules and
for the most part provides abstract enough accessor helpers to protect
from changes in internal implementation of the SW path.
Drivers can still modify the TC structure content, they can modify
frontend configurations, make tc dump non-sense values to userspace
via netlink or they may crash software plane at a later stage. Better
if we tend to expose configuration plane data through read-only
infrastructure that is consumed by drivers.
You could make an argument that if one subsystem already offloads
something and the IR can express it another subsystem can add the same
functionality and drivers don't have to change.  But why do we have
multiple subsystem offloading the same thing in the first place?..
Having multiple subsystems that allow you to do the same thing is a
decision that was made long time ago. This is allowing multiple
approaches to compete inside the kernel and this also allowing users
to select the subsystem that fits better their requirements.

Anyway, the problem that this patchset addresses _already exists_ with
ethtool_rxnfc and cls_flower in place, it would be good to fix it now.
quoted
In handcrafted ascii art, the idea is the following:

                .   ethtool_rxnfc   tc      netfilter
               |       (ioctl)   (netlink)  (netlink)
               |          |         |           |     translate native
      Frontend |          |         |           |  interface representation
               |          |         |           |      to flow_rule IR
               |          |         |           |
                .         \/        \/         \/
                .        ----- flow_rule IR ------
               |                     |
       Drivers |                     | parsing of flow_rule IR
               |                     |  to populate hardware IR
               |                    \/
                .            hardware IR (driver)

This patchset only converts tc and ethtool_rxnfc to use this
infrastructure. Therefore. this patchset comes with no netfilter
changes at this stage.
Let's try to differentiate between cls_flower and TC as a subsystem.
Right. cls_flower is the one more capable subset of tc at this stage.
[...]
quoted
P.S: Moving forward, my rough plan is to propose and discuss the
following changes to use this infrastructure from netfilter for ACL
hardware offload:

* Rename tc_setup ndo to flow_offload (or similar), so it can be used
  from netfilter too. Otherwise, I'm open for alternatives.
That is the part which really makes me uneasy.  Ordering rules, tables
and offloads in HW is already a hard question, which keeps coming back
(see Jesse's talk from LPC for example).  (And if you think you know
the answer you probably forgot about legacy SR-IOV :))

The TC offloads themselves today are still quite flaky. [...]
I'm very optimistic in what I'm seeing, patch 1/12 is rather large
because we already have a _fair good amount of drivers_ using the
existing upstream infrastructure.

I don't see anything unfixable from the existing approach that we can
incrementally extend what is already upstream to solve the existing
limitations.

[...]
I don't really see how you could rename ndo_setup_tc to flow_offload.
We can start by using a single .ndo and use the enum parameter to
route the offload request to the corresponding driver handler to deal
with tc details at this stage. So we use the single .ndo as a
multiplexor. By consolidating parsers for ethtool_rxnfc and
cls_flower, we are already saving redundant code in this patchset, and
I can see more codebase in the tree that drivers could simplify.
What level of abstraction are we really going to achieve here?  The
devices still need to know netfilter is a separate subsystem with its
quirks and rules.  Say someone adds a PASS rule to flower, and a DROP
rule to netfilter.  Packet has to be dropped, but the device will most
likely PASS because flower's rule will hit first.  n-tuple filters
don't have pass so the problem isn't obvious from this patch set.
In software, tc ingress comes before netfilter, then ethtool_rxnfc
comes before tc ingress and netfilter. Once we have a single .ndo,
performing unified resource management for offloads will be easier,
ie. store ownership of rules in the driver ACL database, then when
rebuilding ACL offload, we make sure we place ethtool rules before tc,
then tc before netfilter. So we achieve the same semantics as in
software. We cannot solve this problem until we have unified
infrastructure to address this.

[...]
I don't mean pretending its flower rules just for offload, but rather
have a TC classifier module that can take nft rules (in SW path) and
which would semantically fit very clearly into the existing TC
framework.  Mirroring SW behaviour in HW becomes easier, we don't have
to teach drivers about another subsystem.
Then, we'll have to rewrite userspace netfilter to run on top of TC.
Moreover, add support for conntrack, NAT, logging... And after all
that is done. For netlink users, we'll have to translate netfilter
netlink message to TC from the kernel, then route this to TC. It will
take ages to rewrite netfilter on top of tc and there will be semantic
differences. This patchset already deals with the existing diversity
in the ecosystem by providing a single unified representation for the
hardware, no matter what frontend is being used for this goal.

Cheers,
Pablo
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help