Re: [PATCH v6 3/8] firmware_loader: Move module refcounts to allow unloading
From: Russ Weight <russ.weight@linux.dev>
Date: 2024-11-14 16:36:00
Also in:
lkml
On Tue, Nov 12, 2024 at 06:40:28PM -0800, Dan Williams wrote:
Dionna Glaze wrote:quoted
If a kernel module registers a firmware upload API ops set, then it's unable to be moved due to effectively a cyclic reference that the module depends on the upload which depends on the module. Instead, only require the try_module_get when an upload is requested to disallow unloading a module only while the upload is in progress.Oh, interesting, I wondered why CXL did not uncover this loop in its usage only to realize that CXL calls firmware registration from the cxl_pci module, but the @module paramter passed to firmware_upload_register() is the cxl_core module. I.e. we are accidentally avoiding the problem. I assume other CONFIG_FW_UPLOAD users simply do not test module removal. However, I think the fix is simply to remove all module reference taking by the firmware_loader core. It is the consumer's responsibility to call firmware_upload_unregister() in its module removal path and that should flush any and all future usage of the passed in ops structure.
As I understand it, if a module directly references symbols in another module, then the reference count is automatically incremented to ensure that the dependent symbols are available to the consumer. In this case, the firmware_loader does not directly reference symbol names in the device driver that registered it. The call-back function pointers are provided during registration. Without explicitly incrementing the module reference count, it is possible to remove the device driver while leaving the firmware loader instance (and sysfs entries) intact. Accessing those sysfs nodes would result in references to pointers that are no longer valid. Clearly this would be an unexpected/unusual case. Someone with root access would have to remove the device driver. I'm not sure how much effort should be expended in preventing it - but this is the reasoning behind the incrementing/decrementing of the module reference counts. - Russ