Re: [PATCH v2 2/7] seg6: add End.M.GTP4.E behavior
From: Yuya Kusakabe <hidden>
Date: 2026-06-11 02:59:59
Also in:
linux-doc, linux-kselftest, lkml
Hi Andrea, Thank you for the detailed review. I intend to address every point below in the rework agreed in the cover letter thread (one RFC series per behavior, on a new lwtunnel encap type with its own SEG6_MOBILE_* attribute namespace); answers to your questions inline.
There are three issues with the End.M.GTP4.E UAPI behavior: (1) In the example, v4_mask_len = 32 and the src value is required but not used.
[...]
(2) With v4_mask_len < 32, the low bits of the IPv4 SA that the inbound packet does not carry come from src.
[...]
(3) With v4_mask_len < 32, the low bits of the IPv4 DA that the SID does not carry are set to zero.
[...]
Together with the IPv4 SA behavior in (2), does it make sense to support v4_mask_len values other than 32?
Agreed on all three, and no, I see no practical use for v4_mask_len values other than 32. I will drop both v4_mask_len and the src template from End.M.GTP4.E: the next version will recover the full 32-bit IPv4 DA from the SID right after the locator, and the full 32-bit IPv4 SA from the inbound IPv6 SA at bit offset v6_src_prefix_len (default 64), per RFC 9433 Figures 9 and 10. That leaves no required attribute whose value can be unused or partially discarded.
This check cannot trigger at runtime. seg6_mobile_v4_validate() already rejects v6_src_prefix_len > 96 at install time, and the default (64) leaves p_bits at most 96. If kept for defense in depth, or if you plan to reuse this helper elsewhere, the silent return of 0 produces an outgoing IPv4 packet with SA = 0.0.0.0 without any error. Nit: the (unsigned int) cast on p_bits can be removed.
Will fix: the range will be enforced only at build_state time with an extack message, so the data path helper cannot fail silently, and the cast will go away.
seg6_mobile_skb_prefix_bits() reads the matched route prefix length from the FIB on every packet. [...] Caching it in slwt->mobile_info at validate would let the data path read minfo->locator_bits directly.
[...]
The operator cannot configure the IPv4 DA to start at a position different from where the route prefix ends. Is this intentional?
It was, but it does not survive your first point: deriving the locator length from the matched route prefix couples the SID layout to the routing table and costs a per-packet FIB read. I will replace it with an explicit locator-length attribute (sr_prefix_len, validated at build_state time), which removes the per-packet rcu/container_of walk and the 128 fallback, and also decouples the IPv4 DA position from the route prefix, so one route can cover SIDs whose locator is longer than the route prefix.
Nit: skb_push() returns void * so the explicit casts on two of the three calls are unnecessary.
Will fix.
orig_dst may be NULL or stale. This is an NF_HOOK finish callback, and Netfilter processing during NF_INET_PRE_ROUTING can drop or replace the skb dst. skb->cb is not guaranteed to be untouched (IPCB/IP6CB aliases it).
As agreed in the cover letter thread, the initial per-behavior series will drop the NF_HOOK split entirely: each behavior becomes a single input function with no skb->cb context and no finish callback. To be revisited on top of your seg6 netfilter fix.
Same four-sizeof sum as ovhd above. Move ovhd to function scope to avoid the repetition.
Will fix; with iptunnel_handle_offloads() below, the GSO-vs-MTU check itself goes away, leaving a single worst-case skb_cow_head().
input_action_end_m_gtp4_e_finish() never calls iptunnel_handle_offloads(), so GSO of the outer UDP tunnel cannot operate correctly. The call should go between skb_cow_head() and seg6_mobile_push_gtpu().
[...]
The fix needs end-to-end testing to confirm it works correctly.
Will do exactly as you describe. I will also add a GSO case to the per-behavior selftest so the fix gets end-to-end coverage.
The caller already reserves worst-case headroom, so the skb_cow_head() calls inside seg6_mobile_push_gtpu() are always no-ops. Could they be removed?
Yes, will remove them and make the helper non-failing; the caller reserves the worst case once.
Does this mirror the drivers/net/gtp.c data-path convention (src == dst == 2152)?
Yes -- drivers/net/gtp.c transmits from its bound socket port, so its packets also leave with src == dst == 2152.
The comment says the packet "traverses NF_INET_LOCAL_OUT", but dst_output() goes through NF_INET_POST_ROUTING only. [...] IMHO, gateways that do protocol transformation can set rp_filter=0. Using ip_route_input() + dst_input() and documenting the rp_filter requirement is preferable to silently bypassing the FORWARD chain.
Agreed; will switch to ip_route_input() + dst_input() so the rebuilt IPv4 packet traverses NF_INET_PRE_ROUTING and NF_INET_FORWARD like the other behaviors, and will document the rp_filter=0 requirement on the ingress device (code comment and the iproute2 man page).
The anonymous { } scope block should be avoided. rt and fl4 should be
declared at the function top. The same pattern appears in other patches
of this series.[...]
Variable declarations are not in reverse Christmas tree order. Same issue in the other functions introduced by this patch.
Will fix both across the series, splitting the egress half into its own helper with all declarations at function top.
This pskb_may_pull pulls the OUTER IPv6 header, not the inner T-PDU. BAD_INNER is the wrong drop reason here. [...] Same BAD_INNER misuse: ipv6_skip_exthdr() is parsing the OUTER IPv6 extension headers [...]
Right. Per the cover letter thread the MUP-specific drop reasons will be left out of the initial per-behavior series; the behaviors will adopt the SRv6-level reasons from your prep series once it lands, and BAD_SID / BAD_GTPU will return with correct scoping at that point.
frag_off is not checked after ipv6_skip_exthdr(). [...] Adding "if (frag_off) goto drop;" after the ipv6_skip_exthdr() call would handle this.
Will add the check, plus a fragmented-input case to the selftests as discussed in the cover letter thread.
Same seg6_mobile_get_validated_srh() bug described in my reply to patch 1.
Will fix as discussed there: the helper will return a three-state result (absent / present / malformed) so an absent SRH is distinguishable from a malformed one, with HMAC validated whenever an SRH is present.
RFC 6040 Section 1.1 scopes the document to ECN field processing in IP-in-IP tunnels. The patch cites it for DSCP and Hop Limit/TTL on a protocol conversion (the IPv6 outer is popped, not encapsulated). Could you clarify the RFC 6040 reference?
You are right, the citation was over-scoped. I will drop the reference and describe the DSCP/ECN and Hop Limit to TTL propagation in plain words.
The inner_nfproto-based size selection appears several times: pskb_may_pull, skb->protocol, and skb_set_transport_header. Computing a local inner_hdr_len once inside the switch would replace all three. Same pattern in every other behavior.
Will fix in every behavior.
parse_nla_mobile_pdu_type() accepts the full 4-bit range 0..15, but the function comment in seg6_mobile_push_gtpu() notes that only 0 (downlink) and 1 (uplink) have a defined meaning. RFC 9433 describes the E behaviors in the downlink packet flows (Section 5.3.1.2 and 5.3.2.2). How are PDU Type 1 (uplink) and the reserved values supposed to be used with the GTP*.E behaviors? UAPI cannot be tightened after merge. IMHO, if a type is not supported yet, the parser should reject it and notify userspace.
Agreed. RFC 9433 describes End.M.GTP4.E only in the downlink toward the gNB (Sections 6.6 and 5.3.2.2), so the End.M.GTP4.E series will accept only PDU Type 0 (DL) and reject everything else with an extack message. PDU Type 1 (UL) belongs to End.M.GTP6.E, which regenerates uplink GTP-U in the drop-in mode (Section 5.4: "There is one instance of the End.M.GTP6.E SID per PDU type"), so it will be enabled by that series; the reserved values 2..15 stay rejected there too. Thanks, Yuya