Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Toke Høiland-Jørgensen <toke@kernel.org>
Date: 2026-06-29 11:08:24
Ralf Lici [off-list ref] writes:
> On Tue, 23 Jun 2026 21:59:44 +0200, Toke Høiland-Jørgensen [off-list ref] wrote:
>> Ralf Lici [off-list ref] writes:
>>> On the BPF point specifically: I agree a BPF program should be able to
>>> decide whether to translate. What I am less sure about is whether
>>> redirecting to a netdevice is the best way to expose that. A TC action
>>> (yet another model, I know :)) gives you the same thing in-pipeline and
>>> more directly:
>>>
>>>     tc filter add dev wwan0 egress \
>>>         bpf obj match.o action ipxlat4to6 domain clat0
>>>
>>> Let BPF make the policy decision, with the native action doing the
>>> translation work that the current BPF CLAT implementations have trouble
>>> with: fragmentation, checksum corner cases, and ICMP error inner
>>> headers (as explained by Beniamino).
>>>
>>> So TC clsact looks like the natural in-kernel replacement for today's
>>> TC-BPF CLAT programs: no extra netdev, you attach to the existing
>>> uplink, direction is explicit, and on egress you sit on the real route
>>> dst, so the synthetic-dst and double-routing problems above just don't
>>> arise.
>>>
>>> The cost is more moving parts than a single bpf_redirect since
>>> userspace has to manage clsact, filters, priorities and action
>>> lifecycle/cleanup.
>>
>> Hmm, so no one really uses the bpf filter mechanism, since you can just
>> do everything from an action anyway (and with TCX attachment, you can
>> even avoid the overhead of the TC filter/action infrastructure
>> entirely).
>>
>> However, point taken wrt how to integrate this with BPF. I guess the
>> most flexible thing would be to expose the functionality directly (as a
>> kfunc callable from a BPF program). Which also fits with your point
>> below:
>
> Ah, I see, the cls_bpf example was dated, and I like the kfunc angle
> better than a new TC action. I would probably keep that as the minimal
> per-packet interface: BPF can decide whether a packet should be
> translated, and the kfunc can do the actual translation work for packets
> whose translated form still fits the output MTU. The full 4->6
> fragmentation case still looks like output-path/harness territory to me,
> since it is a 1->N fan-out operation.

Yeah, that would probably be fine; I would expect that in most cases you'd want to configure your MTU to avoid fragmentation anyway :)
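
To make the "still fits the output MTU" condition concrete, here is a minimal userspace sketch of the size check. It is an illustration only, not code from the patches: the function names and the simplified rule (the IPv4 header including options is replaced by a fixed 40-byte IPv6 header, plus an 8-byte fragment header when the caller says one is needed) are assumptions, and ICMP error packets, whose embedded inner headers also grow, are not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IPV4_HDR_MIN_LEN   20u  /* IPv4 header without options */
#define IPV6_HDR_LEN       40u  /* fixed IPv6 header */
#define IPV6_FRAG_HDR_LEN   8u  /* IPv6 fragment extension header */

/*
 * Hypothetical helper: length of a packet after 4->6 translation.
 * Assumes the whole IPv4 header (options included) is replaced by the
 * fixed IPv6 header, optionally followed by a fragment header.
 */
static size_t xlat46_out_len(size_t v4_total_len, size_t v4_hdr_len,
                             bool needs_frag_hdr)
{
    assert(v4_hdr_len >= IPV4_HDR_MIN_LEN && v4_hdr_len <= v4_total_len);

    size_t payload = v4_total_len - v4_hdr_len;

    return IPV6_HDR_LEN + (needs_frag_hdr ? IPV6_FRAG_HDR_LEN : 0) + payload;
}

/*
 * Hypothetical helper: true if the translated packet fits out_mtu, i.e.
 * the 1->1 case a per-packet interface could handle. A false result is
 * the 1->N fragmentation case left to the output path.
 */
static bool xlat46_fits_mtu(size_t v4_total_len, size_t v4_hdr_len,
                            bool needs_frag_hdr, size_t out_mtu)
{
    return xlat46_out_len(v4_total_len, v4_hdr_len, needs_frag_hdr) <= out_mtu;
}
```

With a 20-byte IPv4 header the translated packet grows by 20 bytes, so on a 1500-byte IPv6 path the 1->1 case holds for IPv4 packets up to 1480 bytes under these assumptions.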
>>> For a gateway translator, though, I still think a device-bound model is
>>> less natural. There the translation point is more like a forwarding
>>> decision across routes and nexthops, so a route/LWT attachment, or
>>> possibly a netfilter attachment seems easier to reason about. Also, as
>>> you already pointed out while discussing LWT, an admin setting up NAT64
>>> is more likely to reach for an nft rule than for a clsact filter on a
>>> specific device.
>>>
>>> Taking a step back, ipxlat is really a generic translation engine plus
>>> a thin harness around it. So rather than pick one attachment, it might
>>> be worth structuring the engine so different harnesses can drive it.
>>> There's interesting precedent for this shape:
>>>
>>> - ILA, again, is the closest sibling: stateless IPv6 address
>>>   translation with a shared core in ila_common.c, driven both by an LWT
>>>   frontend in ila_lwt.c and by an inline netfilter hook with a
>>>   netlink-configured mapping table in ila_xlat.c.
>>>
>>> - act_ct is the precedent for the TC side specifically: a TC action
>>>   that reuses the netfilter conntrack engine rather than reimplementing
>>>   it.
>>>
>>> And act_nat is the cautionary counter-example: a standalone TC
>>> reimplementation of stateless NAT that shares no code with nf_nat, and
>>> carries a "would be nice to share code" comment :)
>>>
>>> So I am wondering whether the right direction is to factor the
>>> translation engine cleanly, land it with one harness first, and keep
>>> the other attachment points as follow-up work once the core semantics
>>> are settled. Does that direction seem reasonable to you?
>>
>> Yes, reusable functionality that can be called from multiple places
>> sounds like a good fit; let's try to structure it that way!
>
> Great, that's the direction I'll take then.
>
>> As for which hook to start with, well, let's see if we hear back from
>> the netfilter devs, but either netfilter or the routing subsystem (LWT
>> style) would be OK for me I think.
>
> Works for me. The engine factoring is common to all of them, so I'll
> start there. Once it's in shape I can sketch a harness against it to
> sanity-check the interface.

Awesome, sounds good!

-Toke
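
As a rough picture of the "one engine, several harnesses" shape discussed above, here is a small userspace sketch. Everything in it is hypothetical (the struct, enum, and function names are made up for illustration and do not come from the ipxlat patches); the only part taken from a spec is the /96 address embedding from RFC 6052. The engine only maps addresses; each harness decides when to call it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical engine-side state: one translation domain. */
struct xlat_domain {
    uint8_t prefix96[12];   /* RFC 6052 /96 translation prefix */
};

enum xlat_verdict {
    XLAT_DONE,  /* address was translated */
    XLAT_PASS,  /* harness policy said: leave the packet alone */
};

/* Engine: embed the IPv4 address in the low 32 bits of the /96 prefix. */
static void xlat_engine_addr_4to6(const struct xlat_domain *d,
                                  const uint8_t v4[4], uint8_t v6[16])
{
    assert(d && v4 && v6);
    memcpy(v6, d->prefix96, 12);
    memcpy(v6 + 12, v4, 4);
}

/* Engine: reverse mapping; returns -1 if v6 is outside the domain prefix. */
static int xlat_engine_addr_6to4(const struct xlat_domain *d,
                                 const uint8_t v6[16], uint8_t v4[4])
{
    assert(d && v4 && v6);
    if (memcmp(v6, d->prefix96, 12) != 0)
        return -1;
    memcpy(v4, v6 + 12, 4);
    return 0;
}

/* Harness A (route-style): the route already selected the domain. */
static enum xlat_verdict harness_route_4to6(const struct xlat_domain *d,
                                            const uint8_t v4[4],
                                            uint8_t v6[16])
{
    xlat_engine_addr_4to6(d, v4, v6);
    return XLAT_DONE;
}

/* Harness B (rule-style): a policy callback decides per packet. */
typedef int (*xlat_match_fn)(const uint8_t v4[4]);

static enum xlat_verdict harness_rule_4to6(const struct xlat_domain *d,
                                           xlat_match_fn match,
                                           const uint8_t v4[4],
                                           uint8_t v6[16])
{
    if (!match(v4))
        return XLAT_PASS;
    xlat_engine_addr_4to6(d, v4, v6);
    return XLAT_DONE;
}

/* Example policy for harness B: only translate 10.0.0.0/8 destinations. */
static int match_net10(const uint8_t v4[4])
{
    return v4[0] == 10;
}
```

Both harnesses stay thin because neither reimplements the mapping, which is the property the act_nat counter-example lacks.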