Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: 2026-05-25 19:04:03
On Mon, May 25, 2026 at 1:34 PM Jakub Kicinski [off-list ref] wrote:
On Mon, 25 May 2026 12:22:40 -0400 Jamal Hadi Salim wrote:quoted
On Mon, May 25, 2026 at 11:39 AM Jakub Kicinski [off-list ref] wrote:quoted
quoted
So as an alternative to the piece i posted? i.e this:diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 79921b8d89ba..8f0f84b50c85 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, if (write_offset < 0) { if (skb_cow(skb, -write_offset)) goto bad; + if (write_offset + (int)sizeof(hdata) > 0) { + if (skb_ensure_writable(skb, + min_t(int, skb->len, + write_offset + (int)sizeof(hdata)))) + goto bad; + } } else { if (unlikely(check_add_overflow(write_offset, (int)sizeof(hdata),Yup! Even better.Dude, it's hard to follow you sometimes ;-> It's hard to grok what you mean the problem of "we are writing to frags".Long threads have the tendency of losing focus. Better to just repost the whole diff than tossing snippets at some point.. ;)quoted
Let me try to be verbose and you can narrow it down. There are _two_ codelets dealing with "frags" both of which can be written to. 1) The patch from Rajat deals with zero copy from app with shared flags. That's whats being exploited in the wild. There's a piece of code there that does this in Rajat's patch to handle it: + /* + * If the skb has shared frags the user is likely using zero-copy + * (e.g. sendfile). Those page frags may point to page-cache pages; + * writing into them would silently corrupt the page cache. + * Linearize so pedit operates on a private copy. + * TL;DR: if you want zero-copy, don't use pedit. + */ + if (skb_has_shared_frag(skb)) { + if (__skb_linearize(skb)) + goto bad; + } After you posted, i thought this is the "we are writing to frags" issue you were referring to. I if-zeroed that code and indeed it does not seem that the exploit can be executed even with that taken out.I don't want the skb_has_shared_frag() being added, that's my ask. TBH I missed that skb_ensure_writable() when reading the patch. If we use skb_ensure_writable() before all the writes we can delete the skb_has_shared_frag() (as your test proves). This is the extent to which I care about the patch, anything else - no strong opinion :)
Ok, I will send the patch probably tommorow AM because Rajat may be confused by now ;->
quoted
2) There's the frags coming from the other (driver) direction in particular when skbs get cloned. That is what sashiko2 (nipa variant) pointed out. IOW, there's no active report for that specific one but Han Guidong responded after i posted this:--- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, if (write_offset < 0) { if (skb_cow(skb, -write_offset)) goto bad; + if (write_offset + (int)sizeof(hdata) > 0) { + if (skb_ensure_writable(skb, + min_t(int, skb->len, + write_offset + (int)sizeof(hdata)))) + goto bad; + } } else { if (unlikely(check_add_overflow(write_offset, (int)sizeof(hdata),Saying he was able to recreate that scenario with a kernel module. And that this patchlet fixed it. Hope you are still following at this point ;-> So when you said we can use skb pulls - I thought you were referring to removing the above patchlet and instead to use an skb pull approach (for which you posted a sample patch).Yes, skb_ensure_writable() does a pull already. So the only problem with existing patch was that the negative offset branch was missing a skb_ensure_writable(). Your snippet added it, plugging that hole, so skb_has_shared_frag() can now be 100% safely removed. Hence my "LGTM". Please also remove the skb_store_bits(), and skb_header_pointer(). Unless I'm missing something these are dead code.
That also seems sensible. But needs testing.
quoted
I mentioned the two issues: 1) It will likely break the negative offset that work with pedit already (skb pull could conceivably be tricked to assume a large positive number)Your snippet looked fine tho unnecessarily complex in practice. (AI generated?). I'd go with skb_ensure_writable(sizeof(*ptr)) as the worst case. Packets shorter than 4B are irrelevant in practice. But again, up to you.
Sashiko-nipa provided very good context and implicitly suggested a way to solve it (and it doesnt seem to be a mechanical turk ;->). We need a session at netdev conf to discuss all this tooling. Maybe some of the security people can show up and share their secret trade - it's overloading. cheers, jamal
quoted
2) that skb clones could result in writting into the shared data (which i said i may be overthinking). So which one of the two are you referring to? Or maybe it is both. Should we keep #1? or this the one that should be replaced? Are you ok with patchlet for number #2? Or do you want that replaced? Provide as much context as you can so we dont go back and forth ;->