Thread (51 messages), 3 authors, 2024-06-20

Re: [PATCH v5 33/37] s390/uaccess: Add KMSAN support to put_user() and get_user()

From: Ilya Leoshkevich <iii@linux.ibm.com>
Date: 2024-06-20 17:06:27
Also in: linux-mm, linux-s390, lkml

On Thu, 2024-06-20 at 13:19 +0200, Ilya Leoshkevich wrote:
On Thu, 2024-06-20 at 10:36 +0200, Alexander Potapenko wrote:
On Wed, Jun 19, 2024 at 5:45 PM Ilya Leoshkevich wrote:
put_user() uses inline assembly with precise constraints, so Clang is
in principle capable of instrumenting it automatically. Unfortunately,
one of the constraints contains a dereferenced user pointer, and Clang
does not currently distinguish user and kernel pointers. Therefore
KMSAN attempts to access shadow for user pointers, which is not the
right thing to do.

By the way, how does this problem manifest?
I was expecting KMSAN to generate dummy shadow accesses in this case,
and reading/writing 1-8 bytes from dummy shadow shouldn't be a
problem.

(On the other hand, not inlining the get_user/put_user functions is
probably still faster than retrieving the dummy shadow, so I'm fine
either way)

We have two problems here: not only can Clang not distinguish user and
kernel pointers, but the KMSAN runtime, which is supposed to clean that
up, can't do that either, because kernel and user address spaces
overlap on s390. So the instrumentation ultimately tries to access the
real shadow.

I forgot what the consequences of that were exactly, so I reverted the
patch and now I get:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 000003fed25fa000 TEID: 000003fed25fa403
Fault in home space mode while using kernel ASCE.
AS:0000000005a70007 R3:00000000824d8007 S:0000000000000020
Oops: 0010 ilc:2 [#1] SMP
Modules linked in:
CPU: 3 PID: 1 Comm: init Tainted: G    B            N 6.10.0-rc4-g8aadb00f495e #11
Hardware name: IBM 3931 A01 704 (KVM/Linux)
Krnl PSW : 0704c00180000000 000003ffe288975a (memset+0x3a/0xa0)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 000003fed25fa180 000003fed25fa180 000003ffe28897a6
           0000000000000007 000003ffe0000000 0000000000000000 000002ee06e68190
           000002ee06f19000 000003fed25fa180 000003ffd25fa180 000003ffd25fa180
           0000000000000008 0000000000000000 000003ffe17262e0 0000037ee000f730
Krnl Code: 000003ffe288974c: 41101100           la      %r1,256(%r1)
           000003ffe2889750: a737fffb           brctg   %r3,000003ffe2889746
          #000003ffe2889754: c03000000029       larl    %r3,000003ffe28897a6
          >000003ffe288975a: 44403000           ex      %r4,0(%r3)
           000003ffe288975e: 07fe               bcr     15,%r14
           000003ffe2889760: a74f0001           cghi    %r4,1
           000003ffe2889764: b9040012           lgr     %r1,%r2
           000003ffe2889768: a784001c           brc     8,000003ffe28897a0
Call Trace:
 [<000003ffe288975a>] memset+0x3a/0xa0 
([<000003ffe17262bc>] kmsan_internal_set_shadow_origin+0x21c/0x3a0)
 [<000003ffe1725fb6>] kmsan_internal_unpoison_memory+0x26/0x30 
 [<000003ffe1c1c646>] create_elf_tables+0x13c6/0x2620 
 [<000003ffe1c0ebaa>] load_elf_binary+0x50da/0x68f0  
 [<000003ffe18c41fc>] bprm_execve+0x201c/0x2f40 
 [<000003ffe18bff9a>] kernel_execve+0x2cda/0x2d00 
 [<000003ffe49b745a>] kernel_init+0x9ba/0x1630 
 [<000003ffe000cd5c>] __ret_from_fork+0xbc/0x180 
 [<000003ffe4a1907a>] ret_from_fork+0xa/0x30 
Last Breaking-Event-Address:
 [<000003ffe2889742>] memset+0x22/0xa0
Kernel panic - not syncing: Fatal exception: panic_on_oops

So is_bad_asm_addr() returned false for a userspace address.
Why? Because it happened to collide with the kernel modules area:
precisely the effect of overlapping.

VMALLOC_START: 0x37ee0000000
VMALLOC_END:   0x3a960000000
MODULES_VADDR: 0x3ff60000000
Address:       0x3ffd157a580
MODULES_END:   0x3ffe0000000
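
The collision can be checked with plain arithmetic. What follows is a
minimal user-space sketch, not kernel code: the constants are the values
printed above, while in_range(), looks_like_modules() and
looks_like_vmalloc() are hypothetical helper names introduced only for
this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the layout printed above (this particular boot). */
#define VMALLOC_START 0x37ee0000000ULL
#define VMALLOC_END   0x3a960000000ULL
#define MODULES_VADDR 0x3ff60000000ULL
#define MODULES_END   0x3ffe0000000ULL

/* Hypothetical helper: true if addr lies in the half-open range [start, end). */
static bool in_range(uint64_t addr, uint64_t start, uint64_t end)
{
	return addr >= start && addr < end;
}

/* Hypothetical helper: would a purely numeric check place addr in the modules area? */
static bool looks_like_modules(uint64_t addr)
{
	return in_range(addr, MODULES_VADDR, MODULES_END);
}

/* Hypothetical helper: the same check for the vmalloc area. */
static bool looks_like_vmalloc(uint64_t addr)
{
	return in_range(addr, VMALLOC_START, VMALLOC_END);
}
```

With these values, looks_like_modules(0x3ffd157a580) is true and
looks_like_vmalloc(0x3ffd157a580) is false, so a check that looks only
at the numeric value of the pointer cannot tell this user address apart
from a module address.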

Now the question is, why do we crash when accessing shadow for modules?

So, Alexander G. and I have figured it out. KMSAN maps vmalloc/modules
metadata lazily - when the corresponding memory is allocated. Here we
have a completely random address that did not come from a prior
vmalloc()/execmem_alloc(), so the corresponding metadata pages are
missing.

We could probably detect this situation and perform the lazy
initialization in this case as well, but I don't know if it's worth the
effort.
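
To illustrate the lazy-mapping effect, here is a toy user-space model.
It is only a sketch and does not reflect how KMSAN actually stores or
looks up metadata: toy_alloc(), toy_has_metadata(), the fixed-size table
and the 4K page size are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4K pages */
#define TOY_MAX_PAGES  16	/* arbitrary capacity of the toy table */

/* Toy table of page numbers whose metadata has been "mapped". */
static uint64_t toy_mapped[TOY_MAX_PAGES];
static size_t toy_nr_mapped;

/*
 * Models an allocation in the modules/vmalloc area: metadata is
 * created for every page of the (page-aligned) range at this point,
 * and never before.
 */
static void toy_alloc(uint64_t start, uint64_t size)
{
	for (uint64_t a = start; a < start + size; a += 1ULL << TOY_PAGE_SHIFT)
		if (toy_nr_mapped < TOY_MAX_PAGES)
			toy_mapped[toy_nr_mapped++] = a >> TOY_PAGE_SHIFT;
}

/* True only if an earlier toy_alloc() covered the page containing addr. */
static bool toy_has_metadata(uint64_t addr)
{
	for (size_t i = 0; i < toy_nr_mapped; i++)
		if (toy_mapped[i] == addr >> TOY_PAGE_SHIFT)
			return true;
	return false;
}
```

After toy_alloc(0x3ff60000000, 0x2000), an address inside that
allocation has metadata, while 0x3ffd157a580 does not, even though both
lie numerically inside the modules range. In the real kernel the
missing metadata pages are what the memset() in the oops above runs
into.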
I'll need to investigate, this does not look normal. But even if that
worked, we clearly wouldn't want userspace accesses to pollute module
shadow, so I think we need this patch in its current form.

[...]
  