Thread (20 messages) 20 messages, 5 authors, 2023-07-08

Re: [PATCH] net: tls: enable __GFP_ZERO upon tls_init()

From: Ard Biesheuvel <ardb@kernel.org>
Date: 2023-06-30 15:18:00
Also in: linux-crypto

Possibly related (same subject, not in this thread)

On Fri, 30 Jun 2023 at 13:55, Alexander Potapenko [off-list ref] wrote:
On Fri, Jun 30, 2023 at 1:49 PM Ard Biesheuvel [off-list ref] wrote:
quoted
On Fri, 30 Jun 2023 at 13:38, Alexander Potapenko [off-list ref] wrote:
quoted
On Fri, Jun 30, 2023 at 12:18 PM Ard Biesheuvel [off-list ref] wrote:
quoted
On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko [off-list ref] wrote:
quoted
On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel [off-list ref] wrote:
quoted
On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa
[off-list ref] wrote:
quoted
On 2023/06/30 18:36, Ard Biesheuvel wrote:
quoted
Why are you sending this now?
Just because this is currently top crasher and I can reproduce locally.
quoted
Do you have a reproducer for this issue?
Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works.
Could you please share your kernel config and the resulting kernel log
when running the reproducer? I'll try to reproduce locally as well,
and see if I can figure out what is going on in the crypto layer
The config together with the repro is available at
https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the
latest row of the "Crashes" table that contains a C repro.
Could you explain why that bug contains ~50 reports that seem entirely
unrelated?
These are some unfortunate effects of syzbot trying to deduplicate
bugs. There's a tradeoff between reporting every single crash
separately and grouping together those that have e.g. the same origin.
Applying this algorithm transitively results in bigger clusters
containing unwanted reports.
We'll look closer.
quoted
AIUI, this actual issue has not been reproduced since
2020??
Oh, sorry, I misread the table and misinformed you. The topmost row of
the table is indeed the _oldest_ one.
Another manifestation of the bug was on 2023/05/23
(https://syzkaller.appspot.com/text?tag=CrashReport&x=146f66b1280000)
That one has nothing to do with networking, so I don't see how this
patch would affect it.
I definitely have to be more attentive.
You are right that this bug report is also unrelated. Yet it is still
fine to use the build artifacts corresponding to it (which is what I
did).
I'll investigate why so many reports got clustered into this one.


quoted
OK, thanks for the instructions.

Out of curiosity - does the stack trace you cut off here include the
BPF routine mentioned in the report?
It does:

[  151.522472][ T5865] =====================================================
[  151.523843][ T5865] BUG: KMSAN: uninit-value in aes_encrypt+0x15cc/0x1db0
[  151.525120][ T5865]  aes_encrypt+0x15cc/0x1db0
[  151.526113][ T5865]  aesti_encrypt+0x7d/0xf0
[  151.527057][ T5865]  crypto_cipher_encrypt_one+0x112/0x200
[  151.528224][ T5865]  crypto_cbcmac_digest_update+0x301/0x4b0
[  151.529459][ T5865]  shash_ahash_finup+0x66e/0xc00
[  151.530541][ T5865]  shash_async_finup+0x7f/0xc0
[  151.531542][ T5865]  crypto_ahash_finup+0x1b8/0x3e0
[  151.532583][ T5865]  crypto_ccm_auth+0x1269/0x1350
[  151.533606][ T5865]  crypto_ccm_encrypt+0x1c9/0x7a0
[  151.534650][ T5865]  crypto_aead_encrypt+0xe0/0x150
[  151.535695][ T5865]  tls_push_record+0x3bf3/0x4ec0
[  151.539491][ T5865]  bpf_exec_tx_verdict+0x46e/0x21d0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[  151.540597][ T5865]  tls_sw_do_sendpage+0x1150/0x1ad0
OK, so after poking around a little bit, I have managed to confirm
that the problem is in the TLS layer, and I am a bit out of my depth
debugging that.

With the debugging code below applied, KMSAN triggers on an
uninit-memory in the input scatterlist provided by the TLS layer into
the CCM code.

[  148.375852][ T2424] =====================================================
[  148.377269][ T2424] BUG: KMSAN: uninit-value in tls_push_record+0x2d9f/0x3eb0
[  148.378623][ T2424]  tls_push_record+0x2d9f/0x3eb0
[  148.379559][ T2424]  bpf_exec_tx_verdict+0x5ba/0x2530
[  148.380534][ T2424]  tls_sw_do_sendpage+0x169c/0x1f80
[  148.381519][ T2424]  tls_sw_sendpage+0x247/0x2b0
...
[  148.411559][ T2424]
[  148.412108][ T2424] Bytes 0-15 of 16 are uninitialized
[  148.413379][ T2424] Memory access of size 16 starts at ffff8880157889c7

Note that this is the *input* scatterlist containing the AAD
(additional authenticated data) and the crypto input, and so there is
definitely a bug here that shouldn't be papered over by zero'ing the
allocation.
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -543,6 +543,21 @@ static int tls_do_encryption(struct sock *sk,
        list_add_tail((struct list_head *)&rec->list, &ctx->tx_list);
        atomic_inc(&ctx->encrypt_pending);

+       {
+               int len = aead_req->assoclen + aead_req->cryptlen;
+               struct sg_mapping_iter miter;
+
+               sg_miter_start(&miter, rec->sg_aead_in,
+                              sg_nents(rec->sg_aead_in),
+                              SG_MITER_TO_SG | SG_MITER_ATOMIC);
+
+               while (len > 0 && sg_miter_next(&miter)) {
+                       kmsan_check_memory(miter.addr, min(len,
(int)miter.length));
+                       len -= miter.length;
+               }
+               sg_miter_stop(&miter);
+       }
+

The reason that this cascades all the way down to the AES cipher code
appears to be that sbox substitution involves array indexing, which is
one of the actions KMSAN qualifies as 'use' of uninit data.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help