Re: [PATCH 1/2] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk
From: Andrea Arcangeli <hidden>
Date: 2016-09-23 19:18:44
Hello Hugh, On Thu, Sep 22, 2016 at 03:36:36AM -0700, Hugh Dickins wrote:
I suppose: this one seems overblown to me, and risks more change (as the CONFIG_DEBUG_VM_RB=y crashes showed).
When DEBUG_VM_RB=n there was no bug that I know of. So I don't think the fact there was a false positive in the validation code that didn't immediately cope with the new changes, should be a major concern.
But I've come back to it several times, not found any incorrectness, and was just about ready to Ack it (once the VM_RB fix is folded in, though I've not studied that yet): when I noticed that what I'd liked least about this one, looks unnecessary too - see below.
The reason the VM_RB=y incremental fix to the validation code is not a few liner, is to micro-optimize it. I call directly __vma_unlink_common to be sure the additional parameter is eliminated at build time if CONFIG_DEBUG_VM_RB=n, and it never risks to go through the stack in production.
At the bottom I've appended my corrected version of Andrea's earlier patches for comparison: maybe better for stable?
I think it's perfectly suitable for -stable, if there is urgency to merge it in -stable. OTOH with regard to urgency, this isn't exploitable and the bug was there for 10+ years?
quoted
+static inline void __vma_unlink_prev(struct mm_struct *mm, + struct vm_area_struct *vma, + struct vm_area_struct *prev) +{ + __vma_unlink_common(mm, vma, prev, true); +} + +static inline void __vma_unlink(struct mm_struct *mm, + struct vm_area_struct *vma) +{ + __vma_unlink_common(mm, vma, NULL, false); +} +Umm, how many functions do we need to unlink a vma? Perhaps I'm missing some essential, but what's wrong with a single __vma_unlink(mm, vma)? (Could omit mm, but probably better with it.)
Of course that would work, I did that initially. I only had __vma_unlink and I just removed the "prev" parameter from it.
The existing __vma_unlink(mm, vma, prev) dates, of course, from long before Linus added vma->vm_prev in 2.6.36. It doesn't really need its prev arg nowadays, and I wonder if that misled you into all this prev and has_prev stuff?
After removing "prev" from __vma_unlink I reintroduced __vma_unlink_prev as a microoptimization for remove_next = 1/2 cases. In those two cases we have already "prev" and it's guaranteed not null. So by keeping the _common version __always_inline the parameters of the _common disappears in the assembly and in turn the __vma_unlink_prev is a bit faster. Perhaps it's not worth to do these kind of microoptimizations? The only reason I reintroduced a version of __vma_unlink_prev that gets prev not NULL as parameter was explicitly to microoptimize with __always_inline.
(Yes, of course it needs to handle the NULL vma->vm_prev mm->mmap case, but that doesn't need these three functions.) But I see this area gets touched again in yesterday's 3/4 to fix the VM_RB issue. I haven't tried applying that patch on top to see what the result looks like, but I hope simpler than this.
Right, to handle the case of DEBUG_VM_RB=y I need to pass a different "ignore" parameter in remove_next == 3, so it's even more worth to microoptimize now that I'm forced to have a different kind of call anyway, and I can't just call __vma_unlink(next). Once the two patches are folded, __vma_unlink is renamed to __vma_unlink_prev that is a more accurate name anyway I think, given the parameters and that assumption it does on prev being not NULL.
quoted
+ if (remove_next != 3) + __vma_unlink_prev(mm, next, vma); + else + /* vma is not before next if they've been swapped */ + __vma_unlink(mm, next);And if the VM_RB issue doesn't complicate it, this would just amount to __vma_unlink(mm, next); without any remove_next 3 variation.
Yes, and VM_RB complicates it.
quoted
+ if (remove_next != 3) {if (vma == orig_vma), and you won't need the remove_next 3 state at all.
I think that would be less readable. I don't want to risk to mistake case 1/2/3. I could use an enum and REMOVE_NEXT, REMOVE_NEXT_NEXT, REMOVE_PREV, or I could use -1 instead of 3 to show it's removing prev if you wish, but I would prefer not to use vma == orig_vma to detect remove_next != 3. It can't improve performance either. orig_vma is purely for trans_huge split when the vma->vm_start/end (and next->vm_start if adjust_next) boundary changes. The only point of orig_vma is to replace this statement: "remove_next != 3 : vma : next". I wouldn't mix up the detection of case 1/2/3 with that micro-optimization.
Here's my fixup of Andrea's earlier version, not swapping vma and next as the above does, but applying properties of next to vma as before. Maybe this version should go in first, so that it's available as an easier and safer candidate for stable backports: whatever akpm prefers.
I think this is a more conservative and in turn safer approach for urgent -stable or for urgent backports. Performance-wise I doubt any difference is measurable. For the longer term upstream, I think removing the oddness factor from case 8 for good is better, as we may get bitten by it again and it's also quite counter intuitive for the callers of vma_merge to receive a vma that isn't already fully in sync with all the parameters passed to vma_merge. And having to overwrite the "different" bits by hand. I feel the oddness in case 8 should be dropped for good and it's not much more complicated to do so (especially if we ignore the __vma_unlink details which are a fully self contained problem and they cannot add up to the complexity of vma_merge/vma_adjust). I successfully tested your fix with the testcase that exercises the race and reviewed your fix and it's certainly correct too to solve the race against rmap_walks that access vma_page_prot/vm_flags. The fix in -mm however is solving the race condition for all fields, if any rmap_walk accessed more than those two fields, and without having to copy them off. Tested-by: Andrea Arcangeli <redacted> Reviewed-by: Andrea Arcangeli <redacted> Overall I'm fine either ways, but I had to elaborate my preference :). Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>