Re: /proc/PID/io/read_bytes accounting regression?
From: David Wysochanski <hidden>
Date: 2023-02-17 14:09:58
On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne [off-list ref] wrote:
Hi, Maybe someone here can quickly point me in the right direction for this oddity that we noticed. On newer kernels, it looks like the task io accounting is not incrementing the read_bytes when reading from a NFS mount? This was definitely working on v5.16 downwards, but has not been working since v5.18 up to v6.2 (I haven't tested v5.17 yet). If I read from a local filesystem, then the read_bytes for that PID is incremented as expected. If I read over NFS using directIO, then the read_bytes is also correctly incremented for that PID. It's just when reading normally without directIO that it is not. The write_bytes and rchar are also still both correct in all situations. I have checked the kernel config and I'm fairly sure I have all the right things enabled: CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y CONFIG_TASKSTATS=y Unless there was some extra config introduced specific to the nfs client in later kernels that I missed? Cheers, Daire
Daire, Thanks for the report. Willy, Question for you at the bottom of this. First, here's what looks to be the candidate changes between these versions: $ git log --oneline v5.16..v5.18 fs/nfs/read.c 89c2be8a9516 NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS fc1c5abfca7e NFS: Rename fscache read and write pages functions 8786fde8421c Convert NFS from readpages to readahead 16f2f4e679cf nfs: Implement cache I/O by accessing the cache directly I would be this is due to this patch, which went into 5.18: 8786fde8421c Convert NFS from readpages to readahead And the hunks that now call readahead_page vs read_cache_pages:
@@ -397,14 +396,16 @@ int nfs_readpage(struct file *file, struct page *page) return ret; } -int nfs_readpages(struct file *file, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages) +void nfs_readahead(struct readahead_control *ractl) { + unsigned int nr_pages = readahead_count(ractl); + struct file *file = ractl->file; struct nfs_readdesc desc; - struct inode *inode = mapping->host; + struct inode *inode = ractl->mapping->host; + struct page *page; int ret; - trace_nfs_aop_readahead(inode, lru_to_page(pages), nr_pages); + trace_nfs_aop_readahead(inode, readahead_pos(ractl), nr_pages); nfs_inc_stats(inode, NFSIOS_VFSREADPAGES); ret = -ESTALE;
@@ -422,14 +423,18 @@ int nfs_readpages(struct file *file, structaddress_space *mapping,
nfs_pageio_init_read(&desc.pgio, inode, false,
&nfs_async_read_completion_ops);
- ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
+ while ((page = readahead_page(ractl)) != NULL) {
+ ret = readpage_async_filler(&desc, page);
+ put_page(page);
+ if (ret)
+ break;
+ }
nfs_pageio_complete_read(&desc.pgio);
put_nfs_open_context(desc.ctx);
out:
trace_nfs_aop_readahead_done(inode, nr_pages, ret);
- return ret;
}
and this hunk:@@ -397,14 +396,16 @@ int nfs_readpage(struct file *file, struct page *page) return ret; } -int nfs_readpages(struct file *file, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages) +void nfs_readahead(struct readahead_control *ractl) { + unsigned int nr_pages = readahead_count(ractl); + struct file *file = ractl->file; struct nfs_readdesc desc; - struct inode *inode = mapping->host; + struct inode *inode = ractl->mapping->host; + struct page *page; int ret; - trace_nfs_aop_readahead(inode, lru_to_page(pages), nr_pages); + trace_nfs_aop_readahead(inode, readahead_pos(ractl), nr_pages); nfs_inc_stats(inode, NFSIOS_VFSREADPAGES); ret = -ESTALE;
@@ -422,14 +423,18 @@ int nfs_readpages(struct file *file, structaddress_space *mapping,
nfs_pageio_init_read(&desc.pgio, inode, false,
&nfs_async_read_completion_ops);
- ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
+ while ((page = readahead_page(ractl)) != NULL) {
+ ret = readpage_async_filler(&desc, page);
+ put_page(page);
+ if (ret)
+ break;
+ }
nfs_pageio_complete_read(&desc.pgio);
put_nfs_open_context(desc.ctx);
out:
trace_nfs_aop_readahead_done(inode, nr_pages, ret);
- return ret;
}
int __init nfs_init_readpagecache(void)
In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109
of read_cache_pages();
76 /**
77 * read_cache_pages - populate an address space with some pages &
start reads against them
78 * @mapping: the address_space
79 * @pages: The address of a list_head which contains the target
pages. These
80 * pages have their ->index populated and are otherwise uninitialised.
81 * @filler: callback routine for filling a single page.
82 * @data: private data for the callback routine.
83 *
84 * Hides the details of the LRU cache etc from the filesystems.
85 *
86 * Returns: %0 on success, error return by @filler otherwise
87 */
88 int read_cache_pages(struct address_space *mapping, struct list_head *pages,
89 int (*filler)(void *, struct page *), void *data)
90 {
91 struct page *page;
92 int ret = 0;
93
94 while (!list_empty(pages)) {
95 page = lru_to_page(pages);
96 list_del(&page->lru);
97 if (add_to_page_cache_lru(page, mapping, page->index,
98 readahead_gfp_mask(mapping))) {
99 read_cache_pages_invalidate_page(mapping, page);
100 continue;
101 }
102 put_page(page);
103
104 ret = filler(data, page);
105 if (unlikely(ret)) {
106 read_cache_pages_invalidate_pages(mapping, pages);
107 break;
108 }
109 task_io_account_read(PAGE_SIZE);
But there's no call to task_io_account_read() anymore in the new
readahead code paths that I could tell,
but maybe I'm missing something.
Willy,
Does each caller of readahead_page() now need to call
task_io_account_read() or should we add that into
readahead_page() or maybe inside read_pages()?