Thread (40 messages) 40 messages, 6 authors, 3d ago

Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: 2026-05-25 19:04:03

On Mon, May 25, 2026 at 1:34 PM Jakub Kicinski [off-list ref] wrote:
On Mon, 25 May 2026 12:22:40 -0400 Jamal Hadi Salim wrote:
quoted
On Mon, May 25, 2026 at 11:39 AM Jakub Kicinski [off-list ref] wrote:
quoted
quoted
So as an alternative to the piece i posted? i.e this:
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 79921b8d89ba..8f0f84b50c85 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
                if (write_offset < 0) {
                        if (skb_cow(skb, -write_offset))
                                goto bad;
+                       if (write_offset + (int)sizeof(hdata) > 0) {
+                               if (skb_ensure_writable(skb,
+                                               min_t(int, skb->len,
+                                                     write_offset + (int)sizeof(hdata))))
+                                       goto bad;
+                       }
                } else {
                        if (unlikely(check_add_overflow(write_offset,
                                                        (int)sizeof(hdata),
Yup! Even better.
Dude, it's hard to follow you sometimes ;-> It's hard to grok what you
mean the problem of "we are writing to frags".
Long threads have the tendency of losing focus.
Better to just repost the whole diff than tossing snippets at some point.. ;)
quoted
Let me try to be verbose and you can narrow it down. There are _two_
codelets dealing with "frags" both of which can be written to.

1) The patch from Rajat deals with zero copy from app with shared
flags. That's whats being exploited in the wild.
There's a piece of code there that does this in Rajat's patch to handle it:

+       /*
+        * If the skb has shared frags the user is likely using zero-copy
+        * (e.g. sendfile).  Those page frags may point to page-cache pages;
+        * writing into them would silently corrupt the page cache.
+        * Linearize so pedit operates on a private copy.
+        * TL;DR: if you want zero-copy, don't use pedit.
+        */
+       if (skb_has_shared_frag(skb)) {
+               if (__skb_linearize(skb))
+                       goto bad;
+       }

After you posted, i thought this is the "we are writing to frags"
issue you were referring to.
I if-zeroed that code and indeed it does not seem that the exploit can
be executed even with that taken out.
I don't want the skb_has_shared_frag() being added, that's my ask.
TBH I missed that skb_ensure_writable() when reading the patch.

If we use skb_ensure_writable() before all the writes we can delete
the skb_has_shared_frag() (as your test proves). This is the extent
to which I care about the patch, anything else - no strong opinion :)
Ok, I will send the patch probably tommorow AM because Rajat may be
confused by now ;->
quoted
2) There's the frags coming from the other (driver) direction in
particular when skbs get cloned. That is what sashiko2 (nipa variant)
pointed out. IOW, there's no active report for that specific one but
Han Guidong responded after i posted this:
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
                if (write_offset < 0) {
                        if (skb_cow(skb, -write_offset))
                                goto bad;
+                       if (write_offset + (int)sizeof(hdata) > 0) {
+                               if (skb_ensure_writable(skb,
+                                               min_t(int, skb->len,
+                                                     write_offset + (int)sizeof(hdata))))
+                                       goto bad;
+                       }
                } else {
                        if (unlikely(check_add_overflow(write_offset,
                                                        (int)sizeof(hdata),
Saying he was able to recreate that scenario with a kernel module. And
that this patchlet fixed it.

Hope you are still following at this point ;->
So when you said we can use skb pulls - I thought you were referring
to removing the above patchlet and instead to use an skb pull approach
(for which you posted a sample patch).
Yes, skb_ensure_writable() does a pull already. So the only problem
with existing patch was that the negative offset branch was missing
a skb_ensure_writable(). Your snippet added it, plugging that hole,
so skb_has_shared_frag() can now be 100% safely removed. Hence my
"LGTM".

Please also remove the skb_store_bits(), and skb_header_pointer().
Unless I'm missing something these are dead code.
That also seems sensible. But needs testing.
quoted
I mentioned the two issues:
1) It will likely break the negative offset that work with pedit
already (skb pull could conceivably be tricked to assume a large
positive number)
Your snippet looked fine tho unnecessarily complex in practice.
(AI generated?). I'd go with skb_ensure_writable(sizeof(*ptr))
as the worst case. Packets shorter than 4B are irrelevant in practice.
But again, up to you.
Sashiko-nipa provided very good context and implicitly suggested a way
to solve it (and it doesnt seem to be a mechanical turk ;->).

We need a session at netdev conf to discuss all this tooling. Maybe
some of the security people can show up and share their secret trade -
it's overloading.

cheers,
jamal
quoted
2) that skb clones could result in writting into the shared data
(which i said i may be overthinking).

So which one of the two are you referring to? Or maybe it is both.
Should we keep #1? or this the one that should be replaced?
Are you ok with patchlet for number #2? Or do you want that replaced?

Provide as much context as you can so we dont go back and forth ;->
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help