Thread (19 messages) 19 messages, 9 authors, 2021-08-04

Re: flush_kernel_dcache_page fixes and removal

From: Christoph Hellwig <hch@lst.de>
Date: 2021-07-13 09:11:23
Also in: linux-arm-kernel, linux-doc, linux-mips, linux-mmc, linux-scsi, linux-sh, lkml

On Tue, Jul 13, 2021 at 09:46:48AM +0100, Russell King (Oracle) wrote:
I think you need to be careful - I seem to have a recollection that the
reason we ended up with flush_kernel_dcache_page() was the need to avoid
the taking of the mmap lock for 32-bit ARM VIVT based CPUs in
flush_dcache_page(). 32-bit ARM flush_dcache_page() can block.
Where can arm32 flush_dcache_page block?  __flush_dcache_aliases does
walk mapping->i_mmap, but using a spinlock hidden under
flush_dcache_mmap_lock.  If flush_dcache_page did block plenty of code
already calling it e.g. from interrupt and block submission contexts
in various mmc/sdcard drivers would be rather unhappy.  Even today
calls to flush_kernel_dcache_page page in the block I/O path are
vastly outnumber by calls to flush_dcache_page.
The second issue I have is that, when we are reading a page into a page
cache page, how can that page be mapped to userspace? Isn't that a
violation of semantics? If the page is mapped to userspace but does not
contain data from the underlying storage device, then the page contains
stale data (if it's a new page, lets hope that's zeroed and not some
previous contents - which would be a massive security hole.)
I did not come up with the rules, but these are the existing documented
ones for flush_dcache_page:

        Any time the kernel writes to a page cache page, _OR_
	the kernel is about to read from a page cache page and
	user space shared/writable mappings of this page potentially
	exist, this routine is called.

So writing to (aka reading into) a page cache page is not conditional
on it being mapped yet.  Only the kernel reading from the page cache
page has this condition.
As I
understand it, the flush_kernel_dcache_page() calls in the block layer
are primarily there to cope with drivers that do PIO read to write to a
page cache page to ensure that later userspace mappings can see the data
in the page cache page - by ensuring that the page cache pages are in
the same state as far as caches go as if they had been DMA'd to.
PIO is one big case, but also kernel generated data and all kinds of
bounce buffering schemes.
We know that the current implementation works fine - you're now
proposing to radically change it, asserting that it's buggy. I'm
nervous about this change.
Do we know it works?  There are very few calls to
flush_dcache_kernel_page and very few implementations that differ from
flush_dcache_page.  For arm32 the relevant drivers would mostly be
mmc drivers using the sg_miter interface, are they even used much on
the platforms where the difference exists?
-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
---end quoted text---
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help