Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

From: Josef Bacik <josef@toxicpanda.com>
Date: 2019-01-18 17:12:06
Also in: cgroups, lkml

On Fri, Jan 18, 2019 at 06:07:45PM +0100, Paolo Valente wrote:

On 18 Jan 2019, at 17:35, Josef Bacik [off-list ref] wrote:

On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote:

This is a redesign of my old cgroup-io-throttle controller:
https://lwn.net/Articles/330531/

I'm resuming this old patch to point out a problem that I think is still
not solved completely.

= Problem =

The io.max controller works really well at limiting synchronous I/O
(READs), but a lot of I/O requests are initiated outside the context of
the process that is ultimately responsible for their creation (e.g.,
WRITEs).

Throttling at the block layer in some cases is too late and we may end
up slowing down processes that are not responsible for the I/O that
is being processed at that level.

How so?  The writeback threads are per-cgroup and have the cgroup stuff set
properly.  So if you dirty a bunch of pages, they are associated with your
cgroup, and then writeback happens and it's done in the writeback thread
associated with your cgroup and then that is throttled.  Then you are throttled
at balance_dirty_pages() because the writeout is taking longer.

IIUC, Andrea described this problem: certain processes in a certain group dirty a
lot of pages, causing write back to start.  Then some other blameless
process in the same group experiences very high latency, in spite of
the fact that it has to do little I/O.

In that case the io controller isn't doing its job and needs to be fixed (or
reconfigured).  io.latency guards against this, and I assume io.max would keep
this from happening if it were configured properly.

Does your blk_cgroup_congested() stuff solve this issue?

Or maybe I simply didn't get what Andrea meant at all :)

I _think_ Andrea is talking about the fact that we can generate IO indirectly
and never get throttled for it, which is what blk_cgroup_congested() is meant to
address.  I added it specifically because some low prio task was just allocating
all of the memory on the system and causing a lot of pressure because of
swapping, but there was no direct feedback loop there.  blk_cgroup_congested()
provides that feedback loop.
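
To make the feedback loop a bit more concrete, here is a toy userspace model
of the idea.  It is only a sketch and not the kernel code: struct toy_cgroup,
toy_cgroup_congested() and toy_allocate() are hypothetical names made up for
illustration, and the fixed per-page delay is an assumption.  The point is just
that a task generating IO indirectly (e.g. via swap) checks whether its cgroup,
or any ancestor, is marked congested, and pays a delay if so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; NOT a kernel structure. */
struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the root group */
	int congestion_count;		/* > 0 while the IO controller flags the group */
};

/* A group counts as congested if it, or any of its ancestors, is flagged. */
static bool toy_cgroup_congested(const struct toy_cgroup *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->congestion_count > 0)
			return true;
	return false;
}

/*
 * Model a task allocating 'pages' pages, where every allocation may cause
 * indirect IO (think swap).  Each allocation made while the group is
 * congested costs 'delay_per_page' ticks (an assumed, fixed penalty).
 * Returns the total number of ticks the task was delayed.
 */
static int toy_allocate(const struct toy_cgroup *cg, int pages, int delay_per_page)
{
	int delayed = 0;

	assert(pages >= 0 && delay_per_page >= 0);
	for (int i = 0; i < pages; i++)
		if (toy_cgroup_congested(cg))
			delayed += delay_per_page;
	return delayed;
}
```

With a child group under an unflagged root, toy_allocate(&child, 100, 2) costs
nothing; once congestion_count is raised on the child or on the root, the same
call is charged 2 ticks per page, so the task causing the pressure is the one
that slows down.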

Course I could be wrong too and we're all just talking past each other ;).
Thanks,

Josef
