Re: [PATCH v2] module: Don't wait for GOING modules

From: Petr Mladek <pmladek@suse.com>
Date: 2022-12-07 15:15:27
Also in: lkml, stable

On Tue 2022-12-06 17:57:30, Petr Pavlu wrote:

On 12/5/22 20:54, Petr Mladek wrote:

quoted

On Mon 2022-12-05 11:35:57, Petr Pavlu wrote:

quoted

During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded which causes the init function of the new module to
return an error.

Since commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for
modules that have finished loading"), the kernel waits for modules in
MODULE_STATE_GOING state to finish unloading before making another
attempt to load the same module.

This creates unnecessary work in the described scenario and delays the
boot. In the worst case, it can prevent udev from loading drivers for
other devices and might cause timeouts of services waiting on them and
subsequently a failed boot.

This patch attempts a different solution for the problem 6e6de3dee51a
was trying to solve. Rather than waiting for the unloading to complete,
it returns a different error code (-EBUSY) for modules in the GOING
state. This should avoid the error situation that was described in
6e6de3dee51a (user space attempting to load a dependent module because
the -EEXIST error code would suggest to user space that the first module
had been loaded successfully), while avoiding the delay situation too.

--- a/kernel/module/main.c
+++ b/kernel/module/main.c

@@ -2386,7 +2386,8 @@ static bool finished_loading(const char *name)
 	sched_annotate_sleep();
 	mutex_lock(&module_mutex);
 	mod = find_module_all(name, strlen(name), true);
-	ret = !mod || mod->state == MODULE_STATE_LIVE;
+	ret = !mod || mod->state == MODULE_STATE_LIVE
+		|| mod->state == MODULE_STATE_GOING;

There is a actually one more race.

This function is supposed to wait until load of a particular module
finishes. But we might find some another module of the same name here.

Maybe, it is not that bad. If many modules of the same name are loaded
in parallel then hopefully most of them would wait for the first one
in add_unformed_module(). And they will never appear in the @modules
list.

Good point, a load waiting in add_unformed_module() could miss that its older
parallel load already finished if another insert of the same module appears in
the modules list in the meantime. This requires that the new load happens to
arrive just after the old one finishes and before the waiting load makes its
check.

This is somewhat similar to the current state where new same-name insert
requests can skip and starve the ones already waiting in
add_unformed_module().

I think in practice the situation should occur very rarely and be cleared
soon, as long one doesn't continuously try to insert the same module.

I am not completely sure how rare it is. Anyway, the conditions are
rather complex so it feels rare ;-)

Anyway, the important part of this race is that the new module
found in finished_loading() did not see the previously loaded
module and did not wait in add_unformed_module(). It means
that the load was not parallel enough. It means that the
optimization added by this patch was not used.

By other words, if the multiple loads are not parallel enough
then the optimization added by this patch will not help.

And the other way. If this patch helps in practice then
this race is not important.

quoted

Anyway, to be on the safe side. We might want to pass the pointer
to the @old module found in add_unformed_module() and make sure
that we find the same module here. Something like:

/*
 * @pending_mod: pointer to module that we are waiting for
 * @name: name of the module; the string must stay even when
 *	the pending module goes away completely
 */
static bool finished_loading(const struct module *pending_mod,
			    const char *name)
{
	struct module *mod;
	bool ret = true;

	/*
	 * The module_mutex should not be a heavily contended lock;
	 * if we get the occasional sleep here, we'll go an extra iteration
	 * in the wait_event_interruptible(), which is harmless.
	 */
	sched_annotate_sleep();
	mutex_lock(&module_mutex);

	mod = find_module_all(name, strlen(name), true);
	/* Check if the pending module is still being loaded */
	if (mod == pending_mod &&
	    (mod->state == MODULE_STATE_UNFORMED ||
	       mod->state == MODULE_STATE_COMMING))
	       ret = false;
	mutex_unlock(&module_mutex);

	return ret;
}

The new pending_mod pointer has no ownership of the target module which can be
then destroyed at any time. While no dereference of pending_mod is made in
finished_loading(), this looks still problematic to me.

I think it is generally good to treat a pointer as invalid and its value as
indeterminate when the object it points to reaches the end of its lifetime.
One specific problem here is that nothing guarantees that a new module doesn't
get allocated at the exactly same address as the old one that got released and
the code is actually waiting on.

Improving this case then likely results in a more complex solution, similar
to the one that was discussed originally.

Fair enough. Let's ignore the race in finished_loading().

Otherwise, the patch looks good to me. It is easy enough.
IMHO, it makes sense to use it if it helps in practice
(multiple loads are parallel enough[*]). Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Of course, the ideal solution would be to avoid the multiple
loads in the first place. AFAIK, everything starts in the kernel
that sends the same udev events for each CPU...

Best Regards,
Petr

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help