Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
From: Jann Horn <jannh@google.com>
Date: 2019-08-13 15:41:31
Also in:
linux-doc, linux-fsdevel, linux-mm, lkml
On Tue, Aug 13, 2019 at 5:30 PM Joel Fernandes [off-list ref] wrote:
On Mon, Aug 12, 2019 at 08:14:38PM +0200, Jann Horn wrote: [snip]quoted
quoted
+/* Helper to get the start and end frame given a pos and count */ +static int page_idle_get_frames(loff_t pos, size_t count, struct mm_struct *mm, + unsigned long *start, unsigned long *end) +{ + unsigned long max_frame; + + /* If an mm is not given, assume we want physical frames */ + max_frame = mm ? (mm->task_size >> PAGE_SHIFT) : max_pfn; + + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE) + return -EINVAL; + + *start = pos * BITS_PER_BYTE; + if (*start >= max_frame) + return -ENXIO; + + *end = *start + count * BITS_PER_BYTE; + if (*end > max_frame) + *end = max_frame; + return 0; +}You could add some overflow checks for the multiplications. I haven't seen any place where it actually matters, but it seems unclean; and in particular, on a 32-bit architecture where the maximum user address is very high (like with a 4G:4G split), it looks like this function might theoretically return with `*start > *end`, which could be confusing to callers.I could store the multiplication result in unsigned long long (since we are bounds checking with max_frame, start > end would not occur). Something like the following (with extraneous casts). But I'll think some more about the point you are raising.
check_mul_overflow() exists and could make that a bit cleaner.
quoted
This means that BITMAP_CHUNK_SIZE is UAPI on big-endian systems, right? My opinion is that it would be slightly nicer to design the UAPI such that incrementing virtual addresses are mapped to incrementing offsets in the buffer (iow, either use bytewise access or use little-endian), but I'm not going to ask you to redesign the UAPI this late.That would also be slow and consume more memory in userspace buffers. Currently, a 64-bit (8 byte) chunk accounts for 64 pages worth or 256KB.
I still wanted to use one bit per page; I just wanted to rearrange the bits. So the first byte would always contain 8 bits corresponding to the first 8 pages, instead of corresponding to pages 56-63 on some systems depending on endianness. Anyway, this is a moot point, since as you said...
Also I wanted to keep the interface consistent with the global /sys/kernel/mm/page_idle interface.
Sorry, I missed that this is already UAPI in the global interface. I agree, using a different API for the per-process interface would be a bad idea.