Re: [PATCHv7 10/14] x86/mm: Avoid load_unaligned_zeropad() stepping into unaccepted memory
From: Dave Hansen <hidden>
Date: 2022-08-02 23:46:44
Also in:
linux-efi, linux-mm, lkml
On 7/26/22 03:21, Borislav Petkov wrote:
> On Tue, Jun 14, 2022 at 03:02:27PM +0300, Kirill A. Shutemov wrote:
>> But, this approach does not work for unaccepted memory. For TDX, a load from unaccepted memory will not lead to a recoverable exception within the guest. The guest will exit to the VMM where the only recourse is to terminate the guest.
>
> FTR, this random-memory-access-to-unaccepted-memory-is-deadly thing is really silly. We should be able to handle such cases - because they do happen often - in a more resilient way. Just look at the complex dance this patch needs to do just to avoid this. IOW, this part of the coco technology needs improvement.

This particular wound is self-inflicted. The hardware can *today* generate a #VE for these accesses. But, to make writing the #VE code more straightforward, we asked that the hardware not even bother delivering the exception. At the time, nobody could come up with a case why there would ever be a legitimate, non-buggy access to unaccepted memory.

We learned about load_unaligned_zeropad() the hard way. I never ran into it and never knew it was there. Dangit.

We _could_ go back to the way it was originally. We could add load_unaligned_zeropad() support to the #VE handler, and there's little risk of load_unaligned_zeropad() itself being used in the interrupts-disabled window early in the #VE handler. That would get rid of all the nasty adjacent page handling in the unaccepted memory code.

But, that would mean that we can land in the #VE handler from more contexts. Any normal, non-buggy use of load_unaligned_zeropad() can end up there, obviously. We would, for instance, need to be more careful about #VE recursion. We'd also have to make sure that _bugs_ that land in the #VE handler can still be handled in a sane way.

To sum it all up, I'm not happy with the complexity of the page acceptance code either but I'm not sure that it's a bad tradeoff compared to greater #VE complexity or fragility.

Does anyone think we should go back and really reconsider this?
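
For anyone following along who has not met load_unaligned_zeropad(): the sketch below is a userspace model of the result it is expected to produce when a word-sized load runs off the end of a page, and nothing more. It is not the kernel implementation. load_zeropad_model() and its next_page_ok flag are hypothetical names made up for illustration; the real helper does the full load and relies on an exception fixup when the second page faults, which is exactly the step that cannot happen if the access never produces a recoverable exception in the guest. The model assumes a little-endian machine and 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Hypothetical model of the value a zero-padding unaligned load yields.
 * next_page_ok == 0 stands in for "the following page would fault".
 */
static unsigned long load_zeropad_model(const unsigned char *addr,
					int next_page_ok)
{
	unsigned long word = 0;
	size_t offset, in_page, n = sizeof(word);

	assert(addr != NULL);

	offset = (uintptr_t)addr & (MODEL_PAGE_SIZE - 1);
	in_page = MODEL_PAGE_SIZE - offset;	/* bytes before the boundary */

	/* Model the fixup: only bytes on the first page contribute. */
	if (!next_page_ok && in_page < n)
		n = in_page;

	/* Bytes not copied stay zero; on little-endian they are the high bytes. */
	memcpy(&word, addr, n);
	return word;
}
```

With a load that starts 3 bytes before a page boundary, the model returns those 3 bytes in the low end of the word and zeroes above them when the next page is treated as inaccessible. A load that fits entirely within one page is unaffected by the flag.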