Thread (35 messages) 35 messages, 8 authors, 2011-01-14

still nfs problems [Was: Linux 2.6.37-rc8]

From: Trond Myklebust <hidden>
Date: 2011-01-05 17:17:32
Also in: linux-nfs, lkml

Possibly related (same subject, not in this thread)

On Wed, 2011-01-05 at 15:52 +0000, Russell King - ARM Linux wrote: 
On Wed, Jan 05, 2011 at 10:14:17AM -0500, Trond Myklebust wrote:
quoted
OK. So,the new behaviour in 2.6.37 is that we're writing to a series of
pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
interfaces, but we can end up reading them via a virtual address range
that gets set up via vm_map_ram() (that range gets set up before the
write occurs).
kmap of lowmem pages will always reuses the existing kernel direct
mapping, so there won't be a problem there.
quoted
Do we perhaps need an invalidate_kernel_vmap_range() before we can read
the data on ARM in this kind of scenario?
Firstly, vm_map_ram() does no cache maintainence of any sort, nor does
it take care of page colouring - so any architecture where cache aliasing
can occur will see this problem.  It should not limited to ARM.

Secondly, no, invalidate_kernel_vmap_range() probably isn't sufficient.
There's two problems here:

	addr = kmap(lowmem_page);
	*addr = stuff;
	kunmap(lowmem_page);

Such lowmem pages are accessed through their kernel direct mapping.

	ptr = vm_map_ram(lowmem_page);
	read = *ptr;

This creates a new mapping which can alias with the kernel direct mapping.
Now, as this is a new mapping, there should be no cache lines associated
with it.  (Looking at vm_unmap_ram(), it calls free_unmap_vmap_area_addr(),
free_unmap_vmap_area(), which then calls flush_cache_vunmap() on the
region.  vb_free() also calls flush_cache_vunmap() too.)

If the write after kmap() hits an already present cache line, the cache
line will be updated, but it won't be written back to memory.  So, on
a subsequent vm_map_ram(), with any kind of aliasing cache, there's
no guarantee that you'll hit that cache line and read the data just
written there.

The kernel direct mapping would need to be flushed.
We should already be flushing the kernel direct mapping after writing by
means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
and all the helpers in net/sunrpc/xdr.c.

The only new thing is the read access through the virtual address
mapping. That mapping is created outside the loop in
nfs_readdir_xdr_to_array(), which is why I'm thinking we do need the
invalidate_kernel_vmap_range(): we're essentially doing a series of
writes through the kernel direct mapping (i.e. readdir RPC calls), then
reading the results through the virtual mapping.

i.e. we're doing

ptr = vm_map_ram(lowmem_pages);
while (need_more_data) {

for (i = 0; i < npages; i++) {
addr = kmap_atomic(lowmem_page[i]);
*addr = rpc_stuff;
flush_dcache_page(lowmem_page[i]);
kunmap_atomic(lowmem_page[i]);
}

invalidate_kernel_vmap_range(ptr); // Needed here?

read = *ptr;
}
vm_unmap_ram(lowmem_pages)
I'm really getting to the point of hating the poliferation of RAM
remapping interfaces - it's going to (and is) causing nothing but lots
of pain on virtual cache architectures, needing more and more cache
flushing interfaces to be created.

Is there any other solution to this?
Arbitrary sized pages. :-)

The problem here is that we want to read variable sized records (i.e.
readdir() records) from a multi-page buffer. We could do that by copying
those particular records that overlap with page boundaries, but that
would make for a fairly intrusive rewrite too.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help