Re: [RFC 0/6] x86: prefetch_page() vDSO call
From: Peter Zijlstra <peterz@infradead.org>
Date: 2021-02-25 08:41:24
Also in: lkml
On Wed, Feb 24, 2021 at 11:29:04PM -0800, Nadav Amit wrote:
From: Nadav Amit <redacted>

Just as applications can use prefetch instructions to overlap computation and memory accesses, applications may want to overlap page-faults and compute, or overlap the I/O accesses that are required for page-faults of different pages.

Applications can use multiple threads and cores for this purpose, by running one thread that prefetches the data (i.e., faults in the data) and another that does the compute, but this scheme is inefficient. Using mincore() can tell whether a page is mapped, but might not tell whether the page is in the page-cache and does not fault in the data.

Introduce a prefetch_page() vDSO call to prefetch, i.e. fault in, memory asynchronously. The semantics of this call are: try to prefetch the page at a given address and return zero if the page is accessible following the call. Start I/O operations to retrieve the page if such operations are required and there is no high memory pressure that might introduce slowdowns.

Note that, as usual, the page might be paged out at any point and therefore, similarly to mincore(), there is no guarantee that the page will be present at the time that the user application uses the data that resides on the page. Nevertheless, it is expected that in the vast majority of cases this would not happen, since prefetch_page() accesses the page and therefore sets the PTE access-bit (if it is clear).

The implementation is as follows. The vDSO code accesses the data, triggering a page-fault if it is not present. The handler detects, based on the instruction pointer, that this is an asynchronous #PF, using the recently introduced vDSO exception tables. If the page can be brought in without waiting (e.g., the page is already in the page-cache), the kernel handles the fault and returns success (zero). If there is memory pressure that prevents the proper handling of the fault (i.e., requires heavy-weight reclamation), it returns a failure. Otherwise, it starts an I/O to bring in the page and returns failure.

Compilers can be extended to issue the prefetch_page() calls when needed.
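
For reference, the mincore() limitation mentioned in the quoted text can be sketched roughly as follows. This is only an illustration and not part of the RFC: the helper name page_is_resident() is made up here, and the code assumes Linux with glibc. It shows that mincore() merely reports a residency bit for the page and starts no fault or I/O, which is the gap the proposed prefetch_page() call is meant to fill.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the RFC): query mincore() for the single
 * page that contains addr.
 *
 * Returns 1 if mincore() reports the page as resident, 0 if it does not,
 * and -1 on error (errno is left as set by mincore(), e.g. ENOMEM for an
 * unmapped address).
 *
 * Nothing is faulted in and no I/O is started by this query.
 */
static int page_is_resident(const void *addr)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	uintptr_t start;

	if (page_size <= 0)
		return -1;

	/* mincore() requires a page-aligned start address. */
	start = (uintptr_t)addr & ~((uintptr_t)page_size - 1);

	if (mincore((void *)start, (size_t)page_size, &vec) != 0)
		return -1;

	/* Only the least significant bit of each vector byte is defined. */
	return vec & 1;
}
```

The result is only a snapshot: the page may be reclaimed right after the call returns, so a caller that needs the data still has to touch the page and possibly block on the fault.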
Interesting, but given we've been removing explicit prefetch from some parts of the kernel, how useful is this in actual use? I'm thinking there should at least be a real user and performance numbers with this before merging.