Thread (23 messages) 23 messages, 5 authors, 2021-08-10

Re: [PATCH net] net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry

From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: 2021-08-02 11:20:04
Also in: bridge

On Mon, Aug 02, 2021 at 02:02:36PM +0300, Nikolay Aleksandrov wrote:
quoted
quoted
Actually I believe there is still a bug in 52e4bec15546 even with this fix.
The flag can change after the dst has been read in br_switchdev_fdb_notify()
so in theory you could still do a null pointer dereference. fdb_notify()
can be called from a few places without locking. The code shouldn't dereference
the dst based on the flag.
Are you thinking of a specific code path that triggers a race between
(a) a writer side doing WRITE_ONCE(fdb->dst, NULL) and then
    set_bit(BR_FDB_LOCAL, &fdb->flags), exactly in this order, and
Visible order is not guaranteed, there are no barriers neither at writer nor reader
sides, especially when used without locking. So we cannot make any assumptions
about the order visibility of these writes.
quoted
(b) a reader side catching that fdb exactly in between the above 2
    statements, through fdb_notify or otherwise (br_fdb_replay)?

Because I don't see any.

Plus, I am a bit nervous about protecting against theoretical/unproven
races in a way that masks real bugs, as we would be doing if I add an
extra check in br_fdb_replay_one and br_switchdev_fdb_notify against the
case where an entry has fdb->dst == NULL but not BR_FDB_LOCAL.
The bits are _not_ visible atomically with the setting of ->dst. It is obvious
you must not dereference anything based on them, they are only indications when used
outside of locked regions and code must be able to deal with inconsistencies as that
is implied by the way they're used. It is a clear and obvious bug dereferencing based
on a bit that can change in parallel without any memory ordering guarantees.
Ok, I will send a separate patch for that.
You are not "masking" anything, but fixing what is currently buggy use of fdb bits.
I am "masking" in the sense that the bug I am fixing here was not
obvious to me until it triggered a NPD. That would stop happening with
the patch I'm about to send, but maybe there are still bridge UAPI
functions that do not validate the 'permanent' flag from FDB entries.
As I already said - this doesn't fix the null deref bug completely, in fact it fixes a different
inconsistency, before at worst you'd get blackholed traffic for such entries now
you get a null pointer dereference.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help