Re: [PATCH v7 08/16] rust: add devres abstraction
From: Danilo Krummrich <dakr@kernel.org>
Date: 2025-01-15 06:42:03
Also in:
linux-pci, lkml, rcu, rust-for-linux
On Tue, Jan 14, 2025 at 05:33:45PM +0100, Danilo Krummrich wrote:
quoted
quoted
+impl<T> Drop for Devres<T> { + fn drop(&mut self) { + // Revoke the data, such that it gets dropped already and the actual resource is freed. + // + // `DevresInner` has to stay alive until the devres callback has been called. This is + // necessary since we don't know when `Devres` is dropped and calling + // `devm_remove_action()` instead could race with `devres_release_all()`.IIUC, the outcome of that race is the `WARN` if devres_release_all takes the spinlock first and has already remvoed the action? Could you do a custom devres_release here that mimick `devm_remove_action` but omit the `WARN`? This way it allows the memory behind DevresInner to be freed early without keeping it allocated until the end of device lifetime.Better late than never, I now remember what's the *actual* race I was preventing here: | Task 0 | Task 1 --|---------------------------------------------------------------------------- 1 | Devres::drop() { | devres_release_all() { 2 | DevresInner::remove_action() { | spin_lock_irqsave(); 3 | | remove_nodes(&todo); 4 | | spin_unlock_irqrestore(); 5 | devm_remove_action_nowarn(); | 6 | let _ = Arc::from_raw(); | 7 | } | 8 | } | 9 | | release_nodes(&todo); 10| | } So, if devres_release_all() wins the race it stores the devres action within the temporary todo list. Which means that in 9 we enter DevresInner::devres_callback() even though DevresInner has been freed already. Unfortunately, this race can happen with [1], but not with this original version of Devres.
Well, actually this can't happen, since obviously we know whether Devres:drop removed it or whether it will be released by devres_release_all() and we only free DevresInner conditionally. So, the code in [1] is fine; I somehow managed to confuse myself. Sorry for the noise. - Danilo
I see two ways to fix it: 1) Just revert [1] and stick with the original version. 2) Use devm_release_action() instead of devm_remove_action() and don't drop the reference in DevresInner::remove_action() (6). This way the reference is always dropped from the callback. With 2) we still have an unnecessary call to revoke() if Devres is dropped before the device is detached from the driver, but that's still better than keeping DevresInner alive until the device is detached from the driver, which the original version did. Hence, I'll go ahead and prepare a patch for this, unless there are any concerns. - Danilo [1] https://lore.kernel.org/rust-for-linux/20250107122609.8135-2-dakr@kernel.org/ (local)