Thread: 34 messages, 10 authors, 2012-06-15

RE: [PATCH v2 3/3] x86: Support local_flush_tlb_kernel_range

From: Dan Magenheimer <hidden>
Date: 2012-06-15 20:14:56
Also in: lkml

From: Nitin Gupta [mailto:ngupta@vflare.org]
Subject: Re: [PATCH v2 3/3] x86: Support local_flush_tlb_kernel_range

On 06/15/2012 12:39 PM, Dan Magenheimer wrote:
From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
The decompression path calls lzo1x directly and it would be
a huge pain to make lzo1x smart about page boundaries.  BUT
since we know that the decompressed result will always fit
into a page (actually exactly a page), you COULD do an extra
copy to the end of the target page (using the same smart-
about-page-boundaries copying code from above) and then do
in-place decompression, knowing that the decompression will
not cross a page boundary.  So, with the extra copy, the "pair
mapping" can be avoided for decompression as well.
This is an interesting thought.

But this does result in a copy in the decompression (i.e. page fault)
path, where right now it is copy-free.  The compressed data is
decompressed directly from its zsmalloc allocation to the page allocated
in the fault path.
The page fault occurs as soon as the lzo1x decompression code starts anyway,
as do all the cache faults... both just occur earlier, so the only
additional cost is the actual cpu instructions to move the sequence of
(compressed) bytes from the zsmalloc-allocated area to the end
of the target page.

TLB operations can be very expensive, not to mention (as the
subject of this thread attests) non-portable.
Even if you go for copying chunks followed by decompression, it still
requires two kmaps and kunmaps. Each of these requires one local TLB
invlpg. So, a total of 2 local maps + unmaps even with this approach.
That may be true for i386, but on a modern (non-highmem) machine those
kmaps/kunmaps are free and "pair mappings" in the TLB are still expensive
and not very portable.  Doesn't make sense to me to design for better
performance on highmem and poorer performance on non-highmem.
 
The only additional requirement of zsmalloc is that it requires two
mappings which are virtually contiguous. The cost is the same in both
approaches but the current zsmalloc approach presents a much cleaner
interface.
OK, it's your code and I'm just making a suggestion. I will shut up now ;-)

Dan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .