Re: [PATCH net-next] icmp: support rfc 4884
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2020-06-29 20:56:29
On Mon, Jun 29, 2020 at 4:34 PM Tom Herbert [off-list ref] wrote:
On Mon, Jun 29, 2020 at 12:23 PM Willem de Bruijn [off-list ref] wrote:quoted
From: Willem de Bruijn <willemb@google.com> ICMP messages may include an extension structure after the original datagram. RFC 4884 standardized this behavior. It introduces an explicit original datagram length field in the ICMP header to delineate the original datagram from the extension struct. Return this field when reading an ICMP error from the error queue. ICMPv6 by default already returns the entire 32-bit part of the header that includes this field by default. For consistency, do the exact same for ICMP. So far it only returns mtu on ICMP_FRAG_NEEDED and gw on ICMP_PARAMETERPROB. For backwards compatibility, make this optional, set by setsockopt SOL_IP/IP_RECVERR_RFC4884. For API documentation and feature test, see https://github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp.c Alternative implementation to reading from the skb in ip_icmp_error is to pass the field from icmp_unreach, again analogous to ICMPv6. But this would require changes to every $proto_err() callback, which for ICMP_FRAG_NEEDED pass the u32 info arg to a pmtu update function. Signed-off-by: Willem de Bruijn <willemb@google.com> --- include/net/inet_sock.h | 1 + include/uapi/linux/in.h | 1 + net/ipv4/ip_sockglue.c | 12 ++++++++++++ 3 files changed, 14 insertions(+)diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index a7ce00af6c44..a3702d1d4875 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h@@ -225,6 +225,7 @@ struct inet_sock { mc_all:1, nodefrag:1; __u8 bind_address_no_port:1, + recverr_rfc4884:1, defer_connect:1; /* Indicates that fastopen_connect is set * and cookie exists so we defer connect * until first data frame is writtendiff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index 8533bf07450f..3d0d8231dc19 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h@@ -123,6 +123,7 @@ struct in_addr { #define IP_CHECKSUM 23 #define IP_BIND_ADDRESS_NO_PORT 24 #define IP_RECVFRAGSIZE 25 +#define IP_RECVERR_RFC4884 26 /* IP_MTU_DISCOVER values */ #define IP_PMTUDISC_DONT 0 /* Never send DF frames */diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 84ec3703c909..525140e3947c 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c@@ -398,6 +398,9 @@ void ip_icmp_error(struct sock *sk, struct sk_buff *skb, int err, if (!skb) return; + if (inet_sk(sk)->recverr_rfc4884) + info = ntohl(icmp_hdr(skb)->un.gateway); +Willem, Doesn't this assume that all received ICMP errors received on the socket use the extended RFC4884 format once the option is set on the socket? I think what we might need to do this properly is to switch on the ICMP Type/Code to determine if the format is RFC4884. If it is then, we can figure out where the appropriate info for the ICMP error is. Good example of this is draft-ietf-6man-icmp-limits-08 currently on the RFC editor queue. A code for destination unreachable is added for "aggregate header limit exceeded". An RFC4884 format is used to contain an ICMP extension that includes a pointer to the offending byte (unlike parameter problem, destination unreachable doesn't have Pointer in ICMP header). So in this case it makes sense that the kernel returns the Pointer in extended ICMP as the info.
This implementation matches the existing behavior of ICMPv6. You're right that the RFC 4884 length field is relevant only for a selection of ICMP types. For other types most of the bits are reserved. I don't see any issue with exposing those. Anything more fine grained will have to be extended for each additional feature such as the one you mention. I prefer to just make this information available once and for all. It really helped that it was already available on v6, for instance.
Tomquoted
serr = SKB_EXT_ERR(skb); serr->ee.ee_errno = err; serr->ee.ee_origin = SO_EE_ORIGIN_ICMP;@@ -755,6 +758,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, case IP_RECVORIGDSTADDR: case IP_CHECKSUM: case IP_RECVFRAGSIZE: + case IP_RECVERR_RFC4884: if (optlen >= sizeof(int)) { if (get_user(val, (int __user *) optval)) return -EFAULT;@@ -914,6 +918,11 @@ static int do_ip_setsockopt(struct sock *sk, int level, if (!val) skb_queue_purge(&sk->sk_error_queue); break; + case IP_RECVERR_RFC4884: + if (val != 0 && val != 1) + goto e_inval; + inet->recverr_rfc4884 = val; + break; case IP_MULTICAST_TTL: if (sk->sk_type == SOCK_STREAM) goto e_inval;@@ -1588,6 +1597,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname, case IP_RECVERR: val = inet->recverr; break; + case IP_RECVERR_RFC4884: + val = inet->recverr_rfc4884; + break; case IP_MULTICAST_TTL: val = inet->mc_ttl; break; --2.27.0.212.ge8ba1cc988-goog