Re: [Documentation] State of CPU controller in cgroup v2
From: Andy Lutomirski <hidden>
Date: 2016-09-16 18:20:12
Also in:
cgroups, lkml
On Fri, Sep 16, 2016 at 9:50 AM, Peter Zijlstra [off-list ref] wrote:
> On Fri, Sep 16, 2016 at 09:29:06AM -0700, Andy Lutomirski wrote:
>>> SCHED_DEADLINE, it's a 'Global'-EDF like scheduler that doesn't support CPU affinities (because that doesn't make sense). The only way to restrict it is to partition. 'Global' because you can partition it. If you reduce your system to single CPU partitions you'll reduce to P-EDF. (The same is true of SCHED_FIFO, that's a 'Global'-FIFO on the same partition scheme, it however does support sched_affinity, but using it gives 'interesting' schedulability results -- call it a historic accident).
>>
>> Hmm, I didn't realize that the deadline scheduler was global. But ISTM requiring the use of "exclusive" to get this working is unfortunate. What if a user wants two separate partitions, one using CPUs 1 and 2 and the other using CPUs 3 and 4 (with 5 reserved for non-RT stuff)?
>
> {1,2} {3,4} {5} seem exclusive, did I miss something? (other than that 5 cpu parts are 'rare').

There's no overlap, so they're logically exclusive, but it avoids needing the "cpu_exclusive" parameter. It always seemed confusing to me that a setting on a child cgroup would strictly remove a resource from the parent. (To be clear: I don't have any particularly strong objection to cpu_exclusive. It just always seemed like a bit of a hack that mostly duplicated what you could get by just setting the cpusets appropriately throughout the hierarchy.)
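
To make the {1,2} {3,4} {5} case concrete, here is a minimal sketch of what "logically exclusive" means: no CPU appears in more than one partition, with no cpu_exclusive flag involved. The function names and the partition labels are hypothetical; the only thing borrowed from the kernel is the CPU list syntax ("1-2", "0,3-4") used by cpuset files.

```python
from itertools import combinations


def parse_cpulist(text):
    """Parse a kernel-style CPU list such as '1-2' or '0,3-4' into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields such as a trailing comma
        first, _, last = part.partition("-")
        lo = int(first)
        hi = int(last) if last else lo
        if hi < lo:
            raise ValueError(f"descending range in CPU list: {part!r}")
        cpus.update(range(lo, hi + 1))
    return cpus


def overlapping_partitions(partitions):
    """Return (name_a, name_b, shared_cpus) for every pair of partitions that overlap.

    `partitions` maps a (hypothetical) partition name to its CPU list string.
    An empty result means the partitions are pairwise disjoint.
    """
    parsed = {name: parse_cpulist(cpus) for name, cpus in partitions.items()}
    clashes = []
    for (a, cpus_a), (b, cpus_b) in combinations(sorted(parsed.items()), 2):
        shared = cpus_a & cpus_b
        if shared:
            clashes.append((a, b, sorted(shared)))
    return clashes


if __name__ == "__main__":
    layout = {"rt_a": "1-2", "rt_b": "3-4", "housekeeping": "5"}
    print(overlapping_partitions(layout))
```

This only checks the configuration on paper. Whether the scheduler actually treats the groups as separate partitions still depends on how load balancing is set up across the hierarchy.
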

>>> Note that related, but differently, we have the isolcpus boot parameter which creates single CPU partitions for all listed CPUs and gives the rest to the root cpuset. Ideally we'd kill this option given it's a boot time setting (for something which is trivial to do at runtime).
>>>
>>> But this cannot be done, because that would mean we'd have to start with a !0 cpuset layout:
>>>
>>>                 '/'
>>>            load_balance=0
>>>            /            \
>>>      'system'        'isolated'
>>>   cpus=~isolcpus    cpus=isolcpus
>>>                     load_balance=0
>>>
>>> And start with _everything_ in the /system group (including default IRQ affinities).
>>>
>>> Of course, that will break everything cgroup :-(
>>
>> I would actually *much* prefer this over the status quo. I'm tired of my crappy, partially-working script that sits there and creates exactly this configuration (minus the isolcpus part because I actually want migration to work) on boot.
>>
>> (Actually, it could have two automatic cgroups: /kernel and /init -- init and UMH would go in init and kernel threads and such would go in /kernel. Userspace would be able to request that a different cgroup be used for newly-created kernel threads.)
>
> So there's a problem with sticking kernel threads (and esp. kthreadd) into !root groups. For example if you place it in a cpuset that doesn't have all cpus, then binding your shiny new kthread to a cpu will fail. You can fix that of course, and we used to do exactly that, but we kept running into 'fun' cases like that.

Blech. But maybe this *should* have that effect. I'm sick of random kernel crap being scheduled on my RT CPUs and on the CPUs that I intend to be kept forcibly idle.
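
For reference, the 'system' plus 'isolated' layout quoted above could be sketched roughly as follows. This assumes a cgroup v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset and a single memory node "0"; both are assumptions, as are the function names. The sketch only computes the list of writes and applies nothing, and it is not the boot script mentioned in the thread.

```python
from pathlib import PurePosixPath

# Assumption: cgroup v1 cpuset controller mounted here; adjust for the system at hand.
CPUSET_ROOT = PurePosixPath("/sys/fs/cgroup/cpuset")


def format_cpulist(cpus):
    """Render a set of CPU numbers in kernel list syntax, e.g. {0, 1, 2, 5} -> '0-2,5'."""
    runs = []
    for cpu in sorted(cpus):
        if runs and cpu == runs[-1][1] + 1:
            runs[-1][1] = cpu        # extend the current consecutive run
        else:
            runs.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def plan_layout(all_cpus, isolated_cpus, mems="0"):
    """Return (file, value) writes for the two-group layout.

    The 'system' and 'isolated' directories must be created (mkdir) before
    these writes are applied; that step is left out of the plan.
    """
    all_cpus, isolated_cpus = set(all_cpus), set(isolated_cpus)
    if not isolated_cpus < all_cpus:
        raise ValueError("isolated CPUs must be a proper subset of all CPUs")
    system_cpus = all_cpus - isolated_cpus  # the 'cpus=~isolcpus' side
    system = CPUSET_ROOT / "system"
    isolated = CPUSET_ROOT / "isolated"
    return [
        # Root stops balancing across everything, so the children can partition.
        (str(CPUSET_ROOT / "cpuset.sched_load_balance"), "0"),
        # cpus and mems must be populated before tasks can be attached.
        (str(system / "cpuset.cpus"), format_cpulist(system_cpus)),
        (str(system / "cpuset.mems"), mems),
        (str(isolated / "cpuset.cpus"), format_cpulist(isolated_cpus)),
        (str(isolated / "cpuset.mems"), mems),
        # No balancing inside 'isolated': each CPU there ends up on its own.
        (str(isolated / "cpuset.sched_load_balance"), "0"),
    ]


if __name__ == "__main__":
    for path, value in plan_layout(range(6), {4, 5}):
        print(f"echo {value} > {path}")
```

Moving every existing task (and the default IRQ affinities) into 'system' is the part that the plan above does not cover, and it is exactly where the kthread binding problems described in the quoted text show up.
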

> The unbound workqueue stuff is totally arbitrary borkage though, that can be made to work just fine, TJ didn't like it for some reason which I really cannot remember.
>
> Also, UMH?

User mode helper. Fortunately most users are gone now, but it still exists.