Thread (10 messages, 3 authors, 2017-11-24)

Re: [PATCH 1/3] lockdep: Apply crossrelease to PG_locked locks

From: Jan Kara <jack@suse.cz>
Date: 2017-11-24 09:38:26
Also in: linux-mm, lkml

On Fri 24-11-17 09:11:49, Michal Hocko wrote:
On Fri 24-11-17 12:02:36, Byungchul Park wrote:
On Thu, Nov 16, 2017 at 02:07:46PM +0100, Michal Hocko wrote:
On Thu 16-11-17 21:48:05, Byungchul Park wrote:
On 11/16/2017 9:02 PM, Michal Hocko wrote:
for each struct page. So you are doubling the size. Who is going to
enable this config option? You are moving this to page_ext in a later
patch which is a good step but it doesn't go far enough because this
still consumes those resources. Is there any problem with making this
controllable from the kernel command line, as we do for page_owner, for
example?
Sure. I will add it.
Also it would be really great if you could give us some measurements of
the runtime overhead. I do not expect it to be very large but this is
The major overhead would come from the additional memory consumed by
the 'lockdep_map's.
yes
Do you want me to measure the overhead in terms of the additional memory
consumption?

Or do you expect some other overhead?
I would also be interested in how much impact this has on performance. I do
not expect it to be too large, but it would be good to have some numbers for
cache-cold parallel kbuild or other heavy page-lock workloads.
Hello Michal,

I measured 'cache cold parallel kbuild' on my QEMU machine. The results
vary a lot, so I cannot say for sure, but I think there's no meaningful
difference between before and after applying crossrelease to page locks.

Actually, I expect little overhead in lock_page() and unlock_page() even
after applying crossrelease to page locks; I only expect a bit of overhead
from the additional memory consumed by the per-page 'lockdep_map's.

I ran the following commands in a QEMU x86_64 guest with 4GB of memory and 4 CPUs:

   make clean
   echo 3 > drop_caches
   time make -j4
Maybe FS people will help you find a more representative workload. E.g.
a linear cache-cold file read should be good as well. Maybe there are some
tests in fstests (or whatever they call xfstests these days).
So a relatively good test of page handling costs is to mmap a cache-hot
file and measure the time to fault in all the pages in the mapping. That way
IO and the filesystem stay out of the way and you measure only the page table
lookup, page handling (taking a page reference and locking the page), and
instantiation of the new PTE. Of these, page handling is actually the
significant part.

								Honza
-- 
Jan Kara
SUSE Labs, CR