Thread: 55 messages, 9 authors, 2013-01-02

Re: [patch] mm, mempolicy: Introduce spinlock to read shared policy tree

From: Hugh Dickins <hughd@google.com>
Date: 2012-12-21 18:21:23
Also in: lkml

On Fri, 21 Dec 2012, Linus Torvalds wrote:
On Fri, Dec 21, 2012 at 5:47 AM, Mel Gorman [off-list ref] wrote:
quoted
On Thu, Dec 20, 2012 at 02:55:22PM -0800, David Rientjes wrote:
quoted
This is probably worth discussing now to see if we can't revert
b22d127a39dd ("mempolicy: fix a race in shared_policy_replace()"), keep it
only as a spinlock as you suggest, and do what KOSAKI suggested in
http://marc.info/?l=linux-kernel&m=133940650731255 instead.  I don't think
it's worth trying to optimize this path at the cost of having both a
spinlock and mutex.
Jeez, I'm still not keen on that approach for the reasons that are explained
in the changelog for b22d127a39dd.
Christ, Mel.

Your reasons in b22d127a39dd are weak as hell, and then you come up
with *THIS* shit instead:
quoted
That leads to this third *ugly* option that conditionally drops the lock
and it's up to the caller to figure out what happened. Fooling around with
how it conditionally releases the lock results in different sorts of ugly.
We now have three ugly sister patches for this. Who wants to be Cinderella?

---8<---
mm: numa: Release the PTL if calling vm_ops->get_policy during NUMA hinting faults
Heck no. In fact, not a f*cking way in hell. Look yourself in the
mirror, Mel. This patch is ugly, and *guaranteed* to result in subtle
locking issues, and then you have the *gall* to quote the "uhh, that's
a bit ugly due to some trivial duplication" thing in commit
b22d127a39dd.

Reverting commit b22d127a39dd and just having a "ok, if we need to
allocate, then drop the lock, allocate, re-get the lock, and see if we
still need the new allocation" is *beautiful* code compared to the
diseased abortion you just posted.

Seriously. Conditional locking is error-prone, and about a million
times worse than the trivial fix that Kosaki suggested.
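
(A minimal userspace sketch of the "drop the lock, allocate, re-get the
lock, and see if we still need the new allocation" pattern described
above. Assumptions: a pthread mutex stands in for the kernel spinlock,
malloc() for the allocation that may sleep, and a single pointer for the
shared policy tree. install_node(), tree_lock and slot are hypothetical
names for illustration, not the kernel's code and not Kosaki's patch.)

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct node { int key; };

/* Hypothetical stand-ins: one lock, one slot instead of a policy tree. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *slot;

/* Returns 0 on success, -1 if the allocation failed. */
int install_node(int key)
{
    struct node *new = NULL;

restart:
    pthread_mutex_lock(&tree_lock);
    if (slot == NULL) {
        if (new == NULL) {
            /* Need memory: never allocate while holding the lock. */
            pthread_mutex_unlock(&tree_lock);
            new = malloc(sizeof(*new));
            if (new == NULL)
                return -1;
            /* State may have changed while unlocked, so recheck. */
            goto restart;
        }
        new->key = key;
        slot = new;
        new = NULL;             /* ownership moved to the slot */
    }
    assert(slot != NULL);       /* invariant on the way out */
    pthread_mutex_unlock(&tree_lock);

    /* Someone else filled the slot first: drop the unused allocation. */
    free(new);
    return 0;
}
```

The lock is always released on every path out of the function, which is
the property the conditional-unlock variant gives up.
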
I'm picking up a vibe that you don't entirely like Mel's approach.

I've an unsubstantiated suspicion that it's also incomplete as is.
Although at first I thought huge_memory.c does not need a similar
mod, because THPages are anonymous and cannot come from tmpfs,
I now wonder about a MAP_PRIVATE mapping from tmpfs - for better
or for worse, anon pages there are subject to the same mempolicy
as the shared file pages, and I don't see what prevents khugepaged
from gathering those into THPages.  But it didn't happen when I
tried, so maybe I'm just missing what prevents it.

I don't understand David's and Mel's remarks about the "shared pages"
check making Sasha's warning unlikely: page_mapcount has nothing to do
with whether a page belongs to shm/shmem/tmpfs, and it's easy enough
to reproduce Sasha's warning on the current git tree.  "mount -o
remount,mpol=local /tmp" or something like that is useful in testing.

I wish wish wish I had time to spend on this today, but I don't.
And I've not looked to see (let alone tested) whether it's easy
to revert Mel's mutex then add in Kosaki's patch (which I didn't
look at so have no opinion on).

Shall we go for Peter/David's mutex+spinlock for rc1 - I assume
they both tested that - with a promise to do better in rc2?

What I wanted to try is separate the get_vma_policy() out from
mpol_misplaced(), and have the various callsites do that first
outside the page table lock, passing it in to mpol_misplaced.
But that doesn't work (efficiently) unless it also returns the
range that that policy is valid for, so we don't have to (drop
lock and) call it on every pte.  I cannot do that for rc1, and
perhaps it's irrelevant if Kosaki's patch is preferred.
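
(A rough userspace sketch of the idea in the paragraph above: look the
policy up once, outside the lock, together with the address range it
covers, and only look it up again when the scan leaves that range.
Everything here is an assumption for illustration: struct policy_range,
lookup_policy_range(), misplaced() and scan_range() are hypothetical
names, the table stands in for the shared policy tree, and the locking
itself is only indicated in comments. It is not get_vma_policy() or
mpol_misplaced().)

```c
#include <assert.h>
#include <stddef.h>

struct policy { int preferred_node; };

struct policy_range {
    unsigned long start, end;       /* policy is valid for [start, end) */
    const struct policy *pol;
};

/* Hypothetical table standing in for the shared policy tree. */
static const struct policy pol_a = { 0 }, pol_b = { 1 };
static const struct policy_range table[] = {
    { 0x0000, 0x4000, &pol_a },
    { 0x4000, 0x8000, &pol_b },
};

static int lookups;                 /* counts the expensive lookups */

/* May take sleeping locks in real life: call outside the page table lock. */
static struct policy_range lookup_policy_range(unsigned long addr)
{
    struct policy_range none = { addr, addr + 1, NULL };
    size_t i;

    lookups++;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (addr >= table[i].start && addr < table[i].end)
            return table[i];
    return none;
}

/* Cheap check, safe under the page table lock: no lookup, no sleeping. */
static int misplaced(const struct policy *pol, int cur_node)
{
    return pol != NULL && pol->preferred_node != cur_node;
}

/* Counts pages in [start, end) that sit on the wrong node. */
int scan_range(unsigned long start, unsigned long end,
               unsigned long step, int cur_node)
{
    struct policy_range r = lookup_policy_range(start);
    unsigned long addr;
    int n = 0;

    for (addr = start; addr < end; addr += step) {
        if (addr >= r.end) {
            /* Here the real code would drop the lock, look up, relock. */
            r = lookup_policy_range(addr);
        }
        n += misplaced(r.pol, cur_node);
    }
    return n;
}
```

With the range returned, the lookup count tracks the number of distinct
policy ranges crossed rather than the number of ptes scanned.
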

(Perhaps I should confess I've another reason to come here for
rc2: that "+ info->vfs_inode.i_ino" we recently added for better
interleave distribution in shmem_alloc_page: I think NUMA placement
faults will be fighting shmem_alloc_page's choices because that
offset is not exposed.)

Hugh
