Re: [PATCH net] net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: 2021-08-02 11:20:04
Also in:
bridge
On Mon, Aug 02, 2021 at 02:02:36PM +0300, Nikolay Aleksandrov wrote:
quoted
quoted
Actually I believe there is still a bug in 52e4bec15546 even with this fix. The flag can change after the dst has been read in br_switchdev_fdb_notify() so in theory you could still do a null pointer dereference. fdb_notify() can be called from a few places without locking. The code shouldn't dereference the dst based on the flag.Are you thinking of a specific code path that triggers a race between (a) a writer side doing WRITE_ONCE(fdb->dst, NULL) and then set_bit(BR_FDB_LOCAL, &fdb->flags), exactly in this order, andVisible order is not guaranteed, there are no barriers neither at writer nor reader sides, especially when used without locking. So we cannot make any assumptions about the order visibility of these writes.quoted
(b) a reader side catching that fdb exactly in between the above 2 statements, through fdb_notify or otherwise (br_fdb_replay)? Because I don't see any. Plus, I am a bit nervous about protecting against theoretical/unproven races in a way that masks real bugs, as we would be doing if I add an extra check in br_fdb_replay_one and br_switchdev_fdb_notify against the case where an entry has fdb->dst == NULL but not BR_FDB_LOCAL.The bits are _not_ visible atomically with the setting of ->dst. It is obvious you must not dereference anything based on them, they are only indications when used outside of locked regions and code must be able to deal with inconsistencies as that is implied by the way they're used. It is a clear and obvious bug dereferencing based on a bit that can change in parallel without any memory ordering guarantees.
Ok, I will send a separate patch for that.
You are not "masking" anything, but fixing what is currently buggy use of fdb bits.
I am "masking" in the sense that the bug I am fixing here was not obvious to me until it triggered a NPD. That would stop happening with the patch I'm about to send, but maybe there are still bridge UAPI functions that do not validate the 'permanent' flag from FDB entries.
As I already said - this doesn't fix the null deref bug completely, in fact it fixes a different inconsistency, before at worst you'd get blackholed traffic for such entries now you get a null pointer dereference.