Thread (4 messages) 4 messages, 2 authors, 2024-02-02

Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

From: Marco Pagani <hidden>
Date: 2024-02-01 14:28:02
Also in: lkml


On 2024-01-30 21:47, Luis Chamberlain wrote:
On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
quoted
The current implementation of try_module_get() requires the module to
exist and be live as a precondition. While this may seem intuitive at
first glance, enforcing the precondition can be tricky, considering that
modules can be unloaded at any time if not previously taken. For
instance, the caller could be preempted just before calling
try_module_get(), and while preempted, the module could be unloaded and
freed. More subtly, the module could also be unloaded at any point while
executing try_module_get() before incrementing the refount with
atomic_inc_not_zero().

Neglecting the precondition that the module must exist and be live can
cause unexpected race conditions that can lead to crashes. However,
ensuring that the precondition is met may require additional locking
that increases the complexity of the code and can make it more
error-prone.

This patch adds a slower yet safer implementation of try_module_get()
that checks if the module is valid by looking into the mod_tree before
taking the module's refcount. This new function can be safely called on
stale and invalid module pointers, relieving developers from the burden
of ensuring that the module exists and is live before attempting to take
it.

The tree lookup and refcount increment are executed after taking the
module_mutex to prevent the module from being unloaded after looking up
the tree.

Signed-off-by: Marco Pagani <redacted>
It very much sounds like there is a desire to have this but without a
user, there is no justification.
I was working on a set of patches to fix an issue in the fpga subsystem
when I came across your commit 557aafac1153 ("kernel/module: add
documentation for try_module_get()") that made me realize we also had a
safety problem. 

To solve this problem for the fpga manager, we had to add a mutex to
ensure the low-level module still exists before calling
try_module_get(). However, having a safer version of try_module_get()
would have simplified the code and made it more robust against changes.

https://lore.kernel.org/linux-fpga/20240111160242.149265-1-marpagan@redhat.com/ (local)

I suspect there may be other cases where try_module_get() is
inadvertently called without ensuring that the module still exists
that may benefit from a safer implementation.
quoted
+bool try_module_get_safe(struct module *module)
+{
+	struct module *mod;
+	bool ret = true;
+
+	if (!module)
+		goto out;
+
+	mutex_lock(&module_mutex);
If a user comes around then this should be mutex_lock_interruptible(),
and add might_sleep()
Would it be okay to return false if it gets interrupted, or should I
change the return type to int to propagate -EINTR? My concern with
changing the signature is that it would be less straightforward to
use the function in place of try_module_get().
quoted
+
+	/*
+	 * Check if the address points to a valid live module and take
+	 * the refcount only if it points to the module struct.
+	 */
+	mod = __module_address((unsigned long)module);
+	if (mod && mod == module && module_is_live(mod))
+		__module_get(mod);
+	else
+		ret = false;
+
+	mutex_unlock(&module_mutex);
+
+out:
+	return ret;
+}
+EXPORT_SYMBOL(try_module_get_safe);
And EXPORT_SYMBOL_GPL() would need to be used.
Okay, I initially used EXPORT_SYMBOL() to be compatible with
try_module_get().
I'd also expect selftests to be expanded for this case, but again,
without a user, this is just trying to resolve a problem which does not
exist.
I can add selftests in the next versions.
Thanks,
Marco
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help