My problem with that is it's not really much different to just skipping the
page table update entirely. Skipping the DSB is closer to what is done on
x86, where we bound the stale entry time to the next context-switch.
Which of the three implementations is the "that" and "it" in the first sentence?
that = it = skipping the whole invalidation + the DSB
The TLB is tiny compared to the size of the inactive list. For a stale entry
to matter, it has to survive in the TLB for the page's entire life on the
inactive list. That is not an easy feat except for the hottest of pages.
If there is a context-switch, most of the original thread's TLB entries will
be evicted because a TLB has a hard time holding two threads' working sets.
So, in practice, that is almost the same as the x86 guarantee.
The worst case cannot have a large impact because the maximum number of pages
whose TLB entries are never evicted is the number of entries in the TLB. For
example, with 4 KB pages, a 1024 entry TLB can at worst result in 4 MB of
pages erroneously reclaimed. That is not bad on a system with 4+ GB of memory.
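
To make the arithmetic behind that bound explicit, here is a minimal sketch. The
helper names are hypothetical and not taken from any kernel code, and the 4 KB
page size is an assumption, since the 4 MB figure only follows from 1024
entries if each entry maps a 4 KB page.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not kernel code): upper bound on
 * the memory that could be reclaimed erroneously if every TLB entry stayed
 * resident and stale. Assumes one page per TLB entry.
 */
static uint64_t worst_case_stale_bytes(uint64_t tlb_entries, uint64_t page_size)
{
	/* Guard against overflow in the multiplication below. */
	assert(page_size == 0 || tlb_entries <= UINT64_MAX / page_size);
	return tlb_entries * page_size;
}

/* The same bound expressed in parts per million of total memory. */
static uint64_t stale_ppm(uint64_t stale_bytes, uint64_t total_bytes)
{
	assert(total_bytes > 0);
	return stale_bytes * 1000000 / total_bytes;
}
```

With 1024 entries and 4096-byte pages, worst_case_stale_bytes() returns
4194304 (4 MB), and against 4 GB of memory stale_ppm() gives 976, i.e. a bit
under 0.1% of memory. Larger page sizes would scale the bound accordingly.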
We did benchmark the extreme case where half of the pages accessed were not
evicted from the TLB. In the read case, skipping the DSB was ~10% faster than
either skipping the invalidate or doing both the invalidate and the DSB.
Compared to the improvement in the average performance and variability in the
other cases we tested, the 10% loss in a carefully crafted test is not as
important.