Re: [PATCH v4 01/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks
From: Leonardo Bras <hidden>
Date: 2019-10-01 18:40:44
Also in:
linux-arch, linux-mm, lkml
On Mon, 2019-09-30 at 14:47 -0700, John Hubbard wrote:
On 9/30/19 11:42 AM, Leonardo Bras wrote:
On Mon, 2019-09-30 at 10:57 -0700, John Hubbard wrote:
As I told before, there are cases where this function is called from 'real mode' in powerpc, which doesn't disable irqs and may have a tricky behavior if we do. So, encapsulate the irq disable in this function can be a bad choice.

You still haven't explained how this works in that case. So far, the synchronization we've discussed has depended upon interrupt disabling as part of the solution, in order to hold off page splitting and page table freeing.

The irqs are already disabled by another mechanism (hw): MSR_EE=0. So, serialize will work as expected.

I get that they're disabled. But will this interlock with the code that issues IPIs?? Because it's not just disabling interrupts that matters, but rather, synchronizing with the code (TLB flushing) that *happens* to require issuing IPIs, which in turn interact with disabling interrupts. So I'm still not seeing how that could work here, unless there is something interesting about the smp_call_function_many() on ppc with MSR_EE=0 mode...?
I am failing to understand the issue. I mean, smp_call_function_many() will issue an IPI to each CPU in the cpumask and wait for it to run before returning. If interrupts are disabled (either by MSR_EE=0 or local_irq_disable()), the IPI will not run on that CPU, and the wait will block the calling thread until interrupts are enabled again. Could you please point out the issue there?
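To make the argument above concrete, here is a toy, single-threaded model of it: an IPI sent to a CPU whose interrupts are masked stays pending, and the sender's wait can only finish once that CPU unmasks them. Every name below (struct model_cpu, model_send_ipi() and so on) is made up for illustration and is not kernel API; the model also does not try to capture any real-mode specifics.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel APIs. */
struct model_cpu {
	bool irqs_enabled;	/* stands in for "MSR_EE=1 and irqs not disabled" */
	bool ipi_pending;	/* an IPI was sent but its handler has not run yet */
	int ipis_handled;	/* number of IPI handlers that have run here */
};

/* Run the pending handler only if this CPU currently accepts interrupts. */
static void model_deliver(struct model_cpu *cpu)
{
	if (cpu->irqs_enabled && cpu->ipi_pending) {
		cpu->ipi_pending = false;
		cpu->ipis_handled++;
	}
}

static void model_irq_disable(struct model_cpu *cpu)
{
	cpu->irqs_enabled = false;
}

/* Unmasking interrupts lets any IPI that arrived meanwhile run. */
static void model_irq_enable(struct model_cpu *cpu)
{
	cpu->irqs_enabled = true;
	model_deliver(cpu);
}

static void model_send_ipi(struct model_cpu *cpu)
{
	assert(!cpu->ipi_pending);	/* this model allows one outstanding IPI */
	cpu->ipi_pending = true;
	model_deliver(cpu);
}

/* The sender's wait loop would keep spinning while this returns false. */
static bool model_ipi_completed(const struct model_cpu *cpu)
{
	return !cpu->ipi_pending;
}
```

In this model, calling model_irq_disable(), then model_send_ipi(), leaves model_ipi_completed() false until model_irq_enable() runs, which is the "wait part" described above. Whether the hardware MSR_EE=0 case behaves the same way is exactly the open question in this thread.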
Simply skipping that means that an additional mechanism is required...which btw might involve a new, ppc-specific routine, so maybe this is going to end up pretty close to what I pasted in after all...

Of course, if we really need that, we can add a bool parameter to the function to choose about disabling/enabling irqs.

* This is really a core mm function, so don't hide it away in arch layers. (If you're changing mm/ files, that's a big hint.)

My idea here is to let the arch decide on how this 'register' is going to work, as archs may have different needs (in powerpc for example, we can't always disable irqs, since we may be in realmode).

Yes, the tension there is that a) some things are per-arch, and b) it's easy to get it wrong. The commit below (d9101bfa6adc) is IMHO a perfect example of that. So, I would like core mm/ functions that guide the way, but the interrupt behavior complicates it. I think your original passing of just struct_mm is probably the right balance, assuming that I'm wrong about interrupts.
I think, for the generic function, that including {en,dis}abling the
interrupt is fine. I mean, if disabling the interrupt is the generic
behavior, it's ok.
I will just make sure to explain that the interrupt {en,dis}abling is
part of the sync process. If an arch doesn't like it, it can write a
specific function that does the sync in a better way (and define
__HAVE_ARCH_LOCKLESS_PGTBL_WALK_COUNTER to override the generic function).
In this case, the generic function would also include the ifdef'ed
atomic inc and the memory barrier.
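As a rough userspace sketch of what such a generic pair could look like (C11 atomics standing in for atomic_inc() and smp_mb(), a plain bool standing in for the irq state): the function names, struct model_mm, the model_irq_*() helpers, and the order of the steps are all assumptions made for illustration, not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Defined here only so the counting path is compiled in this sketch. */
#define LOCKLESS_PAGE_TABLE_WALK_TRACKING 1

/* Userspace stand-ins (hypothetical): not the kernel's types or irq API. */
struct model_mm {
	atomic_int lockless_pgtbl_nr_walkers;
};

static bool model_irqs_enabled = true;

/* Mask "interrupts" and return the previous state, like local_irq_save(). */
static bool model_irq_save(void)
{
	bool flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

static void model_irq_restore(bool flags)
{
	model_irqs_enabled = flags;
}

#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_COUNTER
/* Generic version; an arch defining the macro would supply its own pair. */
static bool model_begin_lockless_walk(struct model_mm *mm)
{
	assert(mm != NULL);
#ifdef LOCKLESS_PAGE_TABLE_WALK_TRACKING
	atomic_fetch_add(&mm->lockless_pgtbl_nr_walkers, 1);
#endif
	/* Full fence standing in for smp_mb(); needed even without counting. */
	atomic_thread_fence(memory_order_seq_cst);
	/* Assumed order: irq masking is the last step of the sync setup. */
	return model_irq_save();
}

static void model_end_lockless_walk(struct model_mm *mm, bool flags)
{
	assert(mm != NULL);
	model_irq_restore(flags);
#ifdef LOCKLESS_PAGE_TABLE_WALK_TRACKING
	/* Sequentially consistent decrement, so no extra fence is added here. */
	atomic_fetch_sub(&mm->lockless_pgtbl_nr_walkers, 1);
#endif
}
#endif /* !__HAVE_ARCH_LOCKLESS_PGTBL_WALK_COUNTER */
```

A caller would wrap the walk as `flags = model_begin_lockless_walk(mm); ... model_end_lockless_walk(mm, flags);`. An arch that cannot touch the irq state (realmode, for example) would define __HAVE_ARCH_LOCKLESS_PGTBL_WALK_COUNTER and provide its own versions.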
Maybe we can create a generic function instead of a dummy, and let it be replaced in case the arch needs to do so.

Yes, that might be what we need, if it turns out that ppc can't use this approach (although let's see about that).

I initially used the dummy approach because I did not see anything like serialize in other archs. I mean, even if I put some generic function here, if there is no function to use the 'lockless_pgtbl_walk_count', it becomes only a overhead.

Not really: the memory barrier is required in all cases, and this code would be good I think:

+void register_lockless_pgtable_walker(struct mm_struct *mm)
+{
+#ifdef LOCKLESS_PAGE_TABLE_WALK_TRACKING
+	atomic_inc(&mm->lockless_pgtbl_nr_walkers);
+#endif
+	/*
+	 * This memory barrier pairs with any code that is either trying to
+	 * delete page tables, or split huge pages.
+	 */
+	smp_mb();
+}
+EXPORT_SYMBOL_GPL(gup_fast_lock_acquire);

And this is the same as your original patch, with just a minor name change:

@@ -2341,9 +2395,11 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
 	    gup_fast_permitted(start, end)) {
+		register_lockless_pgtable_walker(current->mm);
 		local_irq_save(flags);
 		gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr);
 		local_irq_restore(flags);
+		deregister_lockless_pgtable_walker(current->mm);

Btw, hopefully minor note: it also looks like there's a number of changes in the same area that conflict, for example:

    commit d9101bfa6adc ("powerpc/mm/mce: Keep irqs disabled during lockless page table walk") <Aneesh Kumar K.V> (Thu, 19 Sep 2019)

...so it would be good to rebase this onto 5.4-rc1, now that that's here.
Yeap, agree. Already rebased on top of v5.4-rc1.
thanks,
Thank you!