Re: [RFC PATCH 0/3] cgroup: fsio throttle controller
From: Josef Bacik <josef@toxicpanda.com>
Date: 2019-01-18 17:12:06
Also in:
cgroups, lkml
On Fri, Jan 18, 2019 at 06:07:45PM +0100, Paolo Valente wrote:
> On 18 Jan 2019, at 17:35, Josef Bacik [off-list ref] wrote:
>
>> On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote:
>>
>>> This is a redesign of my old cgroup-io-throttle controller:
>>> https://lwn.net/Articles/330531/
>>>
>>> I'm resuming this old patch to point out a problem that I think is
>>> still not completely solved.
>>>
>>> = Problem =
>>>
>>> The io.max controller works really well at limiting synchronous I/O
>>> (READs), but a lot of I/O requests are initiated outside the context
>>> of the process that is ultimately responsible for their creation
>>> (e.g., WRITEs).
>>>
>>> Throttling at the block layer is in some cases too late, and we may
>>> end up slowing down processes that are not responsible for the I/O
>>> being processed at that level.
>>
>> How so? The writeback threads are per-cgroup and have the cgroup stuff
>> set properly. So if you dirty a bunch of pages, they are associated
>> with your cgroup, then writeback happens in the writeback thread
>> associated with your cgroup, and that is throttled. Then you are
>> throttled at balance_dirty_pages() because the writeout is taking
>> longer.
>
> IIUC, Andrea described this problem: certain processes in a certain
> group dirty a lot of pages, causing writeback to start. Then some other
> blameless process in the same group experiences very high latency,
> even though it has to do little I/O.
In that case the io controller isn't doing its job and needs to be fixed (or reconfigured). io.latency guards against this; I assume io.max would also prevent it if it were configured properly.
> Does your blk_cgroup_congested() stuff solve this issue? Or maybe I simply didn't get what Andrea meant at all :)
I _think_ Andrea is talking about the fact that we can generate I/O indirectly and never get throttled for it, which is what blk_cgroup_congested() is meant to address. I added it specifically because some low-priority task was allocating all of the memory on the system and causing a lot of pressure from swapping, but there was no direct feedback loop there. blk_cgroup_congested() provides that feedback loop.

Of course, I could be wrong too and we're all just talking past each other ;).

Thanks,

Josef
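The feedback loop described above can be pictured with a toy user-space model: I/O generated indirectly on a group's behalf (for example swap-out of its pages, issued from another context) is charged back to that group, and once the group is over budget its own tasks are delayed. This is only a sketch under loud assumptions: struct cg_io, cg_congested(), charge_indirect_io(), demo() and the fixed per-charge penalty are hypothetical illustrations, not the kernel's blk-cgroup code, which tracks congestion and applies delays differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cgroup accounting; NOT the kernel's struct blkcg_gq. */
struct cg_io {
    long budget;   /* bytes the group may cause per period (assumed limit) */
    long charged;  /* bytes charged to the group so far, incl. indirect I/O */
    long delay_ms; /* throttling delay imposed on the group's own tasks */
};

/* Hypothetical stand-in for blk_cgroup_congested(): true once the group
 * has caused more I/O than its budget allows. */
static bool cg_congested(const struct cg_io *cg)
{
    return cg->charged > cg->budget;
}

/* Charge I/O generated on behalf of @cg back to @cg, even if another
 * context actually issued it. If the group is now congested, add a delay
 * to the task that caused the I/O: this closes the feedback loop. */
static void charge_indirect_io(struct cg_io *cg, long bytes, long penalty_ms)
{
    assert(bytes >= 0);
    cg->charged += bytes;
    if (cg_congested(cg))
        cg->delay_ms += penalty_ms;
}

/* Hypothetical scenario: a low-priority group with a 4 MiB budget causes
 * 1 MiB of swap per step for 8 steps. Steps 5..8 exceed the budget, so
 * each of them adds a 10 ms delay. */
static long demo(void)
{
    struct cg_io lowprio = { .budget = 4L << 20, .charged = 0, .delay_ms = 0 };

    for (int step = 0; step < 8; step++)
        charge_indirect_io(&lowprio, 1L << 20, 10);
    return lowprio.delay_ms;
}
```

In this model demo() accumulates 40 ms of delay for the low-priority group, while tasks in other groups are never charged for its swapping; the point is only where the cost lands, not how large the penalty should be.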