Re: [PATCH v9] livepatch: Clear relocation targets on a module removal

From: Song Liu <song@kernel.org>
Date: 2023-01-20 19:41:22
Also in: live-patching, lkml

On Fri, Jan 20, 2023 at 11:16 AM Josh Poimboeuf [off-list ref] wrote:

On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:

quoted

From: Miroslav Benes <mbenes@suse.cz>

Josh reported a bug:

  When the object to be patched is a module, and that module is
  rmmod'ed and reloaded, it fails to load with:

  module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

  The livepatch module has a relocation which references a symbol
  in the _previous_ loading of nfsd. When apply_relocate_add()
  tries to replace the old relocation with a new one, it sees that
  the previous one is nonzero and it errors out.

Should we add a selftest to make sure this problem doesn't come back?

IIRC, a selftest for this issue is not easy without Joe's klp-convert work.
At the moment I use kpatch-build for testing.

quoted

  On ppc64le, we have a similar issue:

  module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

Just to clarify my previous comment, the reference to the powerpc issue
should be removed because it'll be fixed in a separate patch.

Will remove.

quoted

He also proposed three different solutions. We could remove the error
check in apply_relocate_add() introduced by commit eda9cec4c9a1
("x86/module: Detect and skip invalid relocations"). However the check
is useful for detecting corrupted modules.

We could also deny the patched modules to be removed. If it proved to be
a major drawback for users, we could still implement a different
approach. The solution would also complicate the existing code a lot.

We thus decided to reverse the relocation patching (clear all relocation
targets on x86_64). The solution is not
universal and is too much arch-specific, but it may prove to be simpler
in the end.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Co-developed-by: Song Liu <song@kernel.org>
Reported-by: Josh Poimboeuf <redacted>

According to submitting-patches.rst the Co-developed-by is supposed to
be immediately before your Signed-off-by.

Also, other than the commit log, this patch is almost completely
unrecognizable compared to Miroslav's original patch.  Does it still
make sense for him to be listed as the author?

In the -tip tree they sometimes use an Originally-by tag, which might be
relevant here.

How about:

Signed-off-by: Song Liu <song@kernel.org>
Originally-by: Miroslav Benes [off-list ref]
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reported-by: Josh Poimboeuf <redacted>

quoted

-static int __apply_relocate_add(Elf64_Shdr *sechdrs,
+static int __write_relocate_add(Elf64_Shdr *sechdrs,
                 const char *strtab,
                 unsigned int symindex,
                 unsigned int relsec,
                 struct module *me,
-                void *(*write)(void *dest, const void *src, size_t len))
+                void *(*write)(void *dest, const void *src, size_t len),
+                bool apply)
 {
      unsigned int i;
      Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
      Elf64_Sym *sym;
      void *loc;
      u64 val;
+     u64 zero = 0ULL;

-     DEBUGP("Applying relocate section %u to %u\n",
+     DEBUGP("%s relocate section %u to %u\n",
+            (apply) ? "Applying" : "Clearing",

"(apply)" has unnecessary parentheses.

quoted

             relsec, sechdrs[relsec].sh_info);
      for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
+             int write_size = 4;

We already know we're writing, I suggest just calling it 'size' to be
more consistent with kernel style.

Also, it can be an unsigned value like size_t.

Also, instead of initializing it here with a potentially bogus value
which may need to be overwritten for a 64-bit reloc, it's better to
explicitly set the size in each individual case below.  That's makes the
logic clearer and helps prevent future bugs if new 64-bit relocation
types are added later.

Will fix in v10.

quoted

              case R_X86_64_PC32:
              case R_X86_64_PLT32:
-                     if (*(u32 *)loc != 0)
-                             goto invalid_relocation;
                      val -= (u64)loc;
-                     write(loc, &val, 4);
 #if 0
-                     if ((s64)val != *(s32 *)loc)
+                     if ((s64)val != *(s32 *)&val)
                              goto overflow;
 #endif

Why is the compiled-out code getting changed?  Is it actually fixing a
hypothetical bug?

I just changed them the same way as other cases (assuming we remove
the #if 0, of course).

This code really needs to be removed anyway, it's been dead for at least
15 years.

Shall we remove it now? Within the same patch? Or with a preparation
patch?

quoted

                      break;
              case R_X86_64_PC64:
-                     if (*(u64 *)loc != 0)
-                             goto invalid_relocation;
                      val -= (u64)loc;
-                     write(loc, &val, 8);
+                     write_size = 8;
                      break;
              default:
                      pr_err("%s: Unknown rela relocation: %llu\n",
                             me->name, ELF64_R_TYPE(rel[i].r_info));
                      return -ENOEXEC;
              }
+
+             if (apply) {
+                     if (memcmp(loc, &zero, write_size)) {
+                             pr_err("x86/modules: Skipping invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",

It's not just "skipping", it's erroring out completely.  Yes, it's is a
pre-existing error message but we might as well improve it while
touching it.

Maybe just remove the "Skipping", i.e. change "Skipping invalid ..." to
"Invalid ..." ?

Sounds good to me.

quoted

+                                    (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
+                             return -ENOEXEC;
+                     }
+                     write(loc, &val, write_size);
+             } else {
+                     if (memcmp(loc, &val, write_size)) {
+                             pr_warn("x86/modules: Clearing invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
+                                     (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
+                     }
+                     write(loc, &zero, write_size);

If the value doesn't match then something has gone badly wrong.  Why go
ahead with the clearing in that case?

We can pr_err() then return -ENOEXEC (?). But I guess we need to
handle the error case in:
  klp_cleanup_module_patches_limited()
  klp_module_coming()
  klp_module_going()
and all the functions that call klp_module_going().

This seems a big overkill to me...

Or do you mean we just skip the write()?

quoted

+#ifdef CONFIG_LIVEPATCH
+
+void clear_relocate_add(Elf64_Shdr *sechdrs,
+                     const char *strtab,
+                     unsigned int symindex,
+                     unsigned int relsec,
+                     struct module *me)
+{
+     write_relocate_add(sechdrs, strtab, symindex, relsec, me, false);
+}
+#endif

Superflous newline after the '#ifdef CONFIG_LIVEPATCH'.

quoted

+
 #endif

 int module_finalize(const Elf_Ehdr *hdr,

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 7b4587a19189..0b54ec9856df 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h

@@ -75,6 +75,13 @@ int apply_relocate_add(Elf_Shdr *sechdrs,
                     unsigned int symindex,
                     unsigned int relsec,
                     struct module *mod);
+#ifdef CONFIG_LIVEPATCH

This could use a comment explaining the purpose of this function:

/*
 * Some architectures (namely x86_64 and ppc64) perform sanity checks when
 * applying relocations.  If a patched module gets unloaded and then later
 * reloaded (and re-patched), klp re-applies relocations to the replacement
 * function(s).  Any leftover relocations from the previous loading of the
 * patched module might trigger the sanity checks.
 *
 * To prevent that, when unloading a patched module, clear out any relocations
 * that might trigger arch-specific sanity checks on a future module reload.
 */

Sounds great. Will add this to the next version.

quoted

+void clear_relocate_add(Elf_Shdr *sechdrs,
+                const char *strtab,
+                unsigned int symindex,
+                unsigned int relsec,
+                struct module *me);
+#endif


--
Josh

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help