Re: [PATCH net-next] icmp: support rfc 4884

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2020-06-30 16:52:48

On Tue, Jun 30, 2020 at 12:41 PM Tom Herbert [off-list ref] wrote:

On Tue, Jun 30, 2020 at 9:16 AM Eric Dumazet [off-list ref] wrote:

quoted



On 6/30/20 6:57 AM, Willem de Bruijn wrote:

quoted

On Mon, Jun 29, 2020 at 10:19 PM Willem de Bruijn
[off-list ref] wrote:

quoted

On Mon, Jun 29, 2020 at 8:37 PM Tom Herbert [off-list ref] wrote:

quoted

On Mon, Jun 29, 2020 at 4:07 PM Eric Dumazet [off-list ref] wrote:

quoted



On 6/29/20 2:30 PM, Willem de Bruijn wrote:

quoted

On Mon, Jun 29, 2020 at 5:15 PM Eric Dumazet [off-list ref] wrote:

quoted



On 6/29/20 9:57 AM, Willem de Bruijn wrote:

quoted

From: Willem de Bruijn <willemb@google.com>

ICMP messages may include an extension structure after the original
datagram. RFC 4884 standardized this behavior.

It introduces an explicit original datagram length field in the ICMP
header to delineate the original datagram from the extension struct.

Return this field when reading an ICMP error from the error queue.

RFC mentions a 'length' field of 8 bits, your patch chose to export the whole
second word of icmp header.

Why is this field mapped to a prior one (icmp_hdr(skb)->un.gateway) ?

Should we add an element in the union to make this a little bit more explicit/readable ?

diff --git a/include/uapi/linux/icmp.h b/include/uapi/linux/icmp.h
index 5589eeb791ca580bb182e1dc38c05eab1c75adb9..427ed5a6765316a4c1e2fa06f3b6618447c01564 100644
--- a/include/uapi/linux/icmp.h
+++ b/include/uapi/linux/icmp.h

@@ -76,6 +76,7 @@ struct icmphdr {
                __be16  sequence;
        } echo;
        __be32  gateway;
+       __be32  second_word; /* RFC 4884 4.[123] : <unused:8>,<length:8>,<mtu:16> */
        struct {
                __be16  __unused;
                __be16  mtu;

Okay. How about a variant of the existing struct frag?

@@ -80,6 +80,11 @@ struct icmphdr {
                __be16  __unused;
                __be16  mtu;
        } frag;
+       struct {
+               __u8    __unused;
+               __u8    length;
+               __be16  mtu;
+       } rfc_4884;
        __u8    reserved[4];
   } un;

Sure, but my point was later in the code :

quoted

+     if (inet_sk(sk)->recverr_rfc4884)
+             info = ntohl(icmp_hdr(skb)->un.gateway);

ntohl(icmp_hdr(skb)->un.second_word);

If you leave there "info = ntohl(icmp_hdr(skb)->un.gateway)" it is a bit hard for someone
reading linux kernel code to understand why we do this.

It's also potentially problematic. The other bits are Unused, which
isn't the same thing as necessarily being zero. Userspace might assume
that info is the length without checking its bounded.

It shouldn't. The icmp type and code are passed in sock_extended_err
as ee_type and ee_code. So it can demultiplex the meaning of the rest
of the icmp header.

It just needs access to the other 32-bits, which indeed are context
sensitive. It makes more sense to me to let userspace demultiplex this
in one place, rather than demultiplex in the kernel and define a new,
likely no simpler, data structure to share with userspace.

Specific to RFC 4884, the 8-bit length field coexists with the
16-bit mtu field in case of ICMP_FRAG_NEEDED, so we cannot just pass
the first as ee_info in RFC 4884 mode. sock_extended_err additionally
has ee_data, but after that we're out of fields, too, so this approach
is not very future proof to additional ICMP extensions.

On your previous point, it might be useful to define struct rfc_4884
equivalent outside struct icmphdr, so that an application can easily
cast to that. RFC 4884 itself does not define any extension objects.
That is out of scope there, and in my opinion, here. Again, better
left to userspace. Especially because as it describes, it standardized
the behavior after observing non-compliant, but existing in the wild,
proprietary extension variants. Users may have to change how they
interpret the fields based on what they have deployed.

As this just shares the raw icmp header data, I should probably
change the name to something less specific to RFC 4884.

Since it would also help with decoding other extensions, such as
the one you mention in  draft-ietf-6man-icmp-limits-08.

Unfortunately I cannot simply reserve IP_RECVERR with integer 2.
Perhaps IP_RECVERR_EXINFO.

Perhaps let the icmp header as is, but provides the extra information
as an explicit ancillary message in ip_recv_error() ?

Really this is all about documentation and providing stable API.

Actually, I think we may have a subtle bug here.

RFC4884 appends the ICMP extension to the original datagram. The RFC
describes backwards compatibility in section 5. To be backwards
compatible with legacy implementations that don't know how to parse
the extension, and in particular to find the length of the original
datagram in the data, the requirement is that at the original datagram
is at least 128 bytes long and it seems to assume no ICMP application
need to parse beyond that. But parsing beyond 128 bytes is possible,
consider that the original datagram was UDP/IPv6 with an extension
header such that the UDP header is offset beyond 128 bytes in the
packet. If we don't take this into account, the UDP ports for socket
lookup would be incorrect.

To fix this, we could check the Length field per the types that
support extensions as described in RFC4884. If it's non-zero then
assume the extension is present, so before processing the original
datagram, e.g. performing socket lookup, trim the skb to the length of
the orignal datagram.

This is somewhat orthogonal to this patch.

You are suggesting proactively protecting applications that do not
know how to parse RFC 4884 compliant packets by truncating
those unless the application sets the new setsockopt?

I'm afraid that might break legacy applications that currently do
expect extension objects. They already can in case of ICMPv6.
And unfortunately they can try in an unsafe manner by parsing
the original datagram for ICMPv4, too.

ICMPv6 differs from ICMP in that it suggests forwarding as much
of the "invoking" packet as possible without exceeding the min
MTU (1280B).

RFC 4884 mentions the minimum 128 B for ICMP but not for
ICMPv6. It specifically keeps the wording about forwarding as
much as possible.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help