Thread (34 messages) 34 messages, 6 authors, 2021-11-29

Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults

From: Catalin Marinas <catalin.marinas@arm.com>
Date: 2021-11-29 23:00:24
Also in: linux-arm-kernel, linux-fsdevel, lkml

On Mon, Nov 29, 2021 at 10:40:38AM -0800, Linus Torvalds wrote:
On Mon, Nov 29, 2021 at 7:36 AM Catalin Marinas [off-list ref] wrote:
quoted
That's what this series does when it probes the whole range in
fault_in_writeable(). The main reason was that it's more efficient to do
a read than a write on a large range (the latter dirtying the cache
lines).
The more this thread goes on, the more I'm starting to think that we
should just make "fault_in_writable()" (and readable, of course) only
really work on the beginning of the area.

Not just for the finer-granularity pointer color probing, but for the
page probing too.
I have patches for the finer-granularity checking of the beginning of
the buffer. They need a bit of testing, so probably posting them
tomorrow.
I'm looking at our current fault_in_writeable(), and I'm going

 (a) it uses __put_user() without range checks, which is really not great
For arm64 at least __put_user() does the access_ok() check. I thought
only unsafe_put_user() should skip the checks. If __put_user() can write
arbitrary memory, we may have a bigger problem.
 (b) it looks like a disaster from another standpoint: essentially
user-controlled loop size with no limit checking, no preemption, and
no check for fatal signals.
Indeed, the fault_in_*() loop can get pretty long, bounded by how much
memory can be faulted in the user process. My patches for now only
address the outer loop doing the copy_to_user() as that can be
unbounded.
Now, (a) should be fixed with a access_ok() or similar.

And (b) can easily be fixed multiple ways, with one option simply just
being adding a can_resched() call and checking for fatal signals.

But faulting in the whole region is actually fundamentally wrong in
low-memory situations - the beginning of the region might be swapped
out by the time we get to the end. That's unlikely to be a problem in
real life, but it's an example of how it's simply not conceptually
sensible.

So I do wonder why we don't just say "fault_in_writable will fault in
_at_most_ X bytes", and simply limit the actual fault-in size to
something reasonable.

That solves _all_ the problems. It solves the lack of preemption and
fatal signals (by virtue of just limiting the amount of work we do).
It solves the low memory situation. And it solves the "excessive dirty
cachelines" case too.
I think that would be useful, though it doesn't solve the potential
livelock with sub-page faults. We still need the outer loop to
handle the copy_to_user() for the whole user buffer and the sub-page
probing of the beginning of such buffer (or whenever copy_to_user()
failed). IOW, you still fault in the whole buffer eventually.

Anyway, I think the sub-page probing and limiting the fault-in are
complementary improvements.

-- 
Catalin
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help