Re: [PATCH] glibc: Remove CPU set size checking from affinity functions [BZ #19143]

From: Florian Weimer <hidden>
Date: 2015-10-23 14:00:34
Also in: cgroups

On 10/19/2015 07:23 PM, Carlos O'Donell wrote:

quoted

The current situation, briefly stated, is this: glibc tries to guess the
kernel CPU set size and rejects attempts to specify an affinity mask
which is larger than that, but it does not work, and glibc and the
kernel still silently accept CPU affinity masks with invalid bits,
without returning an error.  The glibc check does not provide any value
to applications, it just adds pointless complexity to the library.
Therefore, I want to remove it from glibc.

The point of the code you want to remove was to detect the case where
the user set CPU bits outside of the maximum possible number of supported
CPUs. AFAIK today that value is CONFIG_NR_CPUS and the various variables
that derive from that value.

CONFIG_NR_CPUS is the absolute maximum for a specific kernel (available
at run time in /sys/devices/system/cpu/kernel_max).  The kernel lowers
the observable value if it detects the system cannot support more than a
specific number of CPUs.  This is the kernel-internal nr_cpu_ids
variable.  I don't think its value is directly exported, but it can
currently be derived from /sys/devices/system/cpu/possible.

(The difficulty of obtaining this value, and the tendency of the kernel
to replace hard compile-time limits with run-time configuration options,
makes me think it is unwise to expose these values through sysconf.)

Can you elaborate some more on any of the false negative cases you think
might impact user applications? For example what happens if you use the
stock cpu_set_t size? It would seem to me that such a use keeps working
and there is no change.

Yes, applications which worked before continue to work.  But you can now
specify an all-ones mask you have allocated, and glibc will not reject
it because it has set bits beyond the value it guessed for nr_cpu_ids.

quoted

Remove CPU set size checking from affinity functions [BZ #19143]

With current kernel versions, the check does not reliably detect that
unavailable CPUs are requested, for these reasons:

(1) The kernel will silently ignore non-allowed CPUs.

You mean to say that if sched_setaffinity is called with a CPU mask 
bit set to enabled, but that cpu is not allowed for the process, then
it will ignore the setting?

Yes, that is what the kernel does.

This requires you run sched_getaffinity to
verify what CPUs you're actually set to run on?

Yes, if you care about this detail.

Is this because the
cpuset mechanism is merged with sched_setaffinity and overrides it?

As far as I can tell, yes.  I do not know where the kernel gets the
other mask from, but it seems cgroups-related (hence the Cc:), and it
does and AND on those two masks.

quoted

(3) The existing probing code assumes that the CPU mask size is a
    power of two and at least 1024.  Neither it has to be a power
    of two, nor is the minimum possible value 1024, so the value
    determined is often too large, resulting in incorrect false
    negatives.

Could you explain those "false negative" cases again?

I tried to make this clearer in the revised commit message of the
attached patch.

The goal is to keep nptl/ free from linux-isms, and AFAICT your test is
indeed free of any linux-specific features since your code for finding
the size of the cpu mask is generic. Is there anything I might have missed
that would make your test linux-specific? Keep in mind that we share nptl/
with the nacl port.

Good point.  sched_getcpu is not universally available, so I had to move
the tests to sysdeps/unix/sysv/linux.

I noticed that tst-getcpu was failing (bug 19164), so I added another
sched_setaffinity test variant that supersedes it.

quoted

+* sched_setaffinity, pthread_setaffinity_np no longer attempt to guess the
+  kernel-internal CPU set size.  This means that requests that change the
+  CPU affinity which failed before will now succeed.  Applications that need

Please provide at least one example of a failure which now succeeds.

I've updated the NEWS entry.

quoted

+/* We wave two loops running for two seconds each.  */
+#define TIMEOUT 8

Why two seconds?

Why two threads?

If the value of 2 seconds is arbitrary please state so in 
a comment such that future reviewers can adjust it as they
see fit without having to review the history of the value.

Should be clearer in the new version.

quoted

+++ b/posix/tst-affinity.c

@@ -0,0 +1,254 @@

Needs a one line test description with BZ#.

I did not include the bug number because it is a generic, non-regression
test.

quoted

+static int
+find_set_size (void)
+{
+  /* We need to use multiples of 64 because otherwise, CPU_ALLOC
+     over-allocates, and and we do not see all bits returned by the
+     kernel.  */

How does CPU_ALLOC over-allocating result in not seeing bits from 
the kernel? Is this because you get over-allocation in CPU_ALLOC,
but your own external count of num_cpus would be lower and it's
num_cpus you return? Can't you rely on the result of CPU_ALLOC_SIZE
and return that?

See find_last_cpu.  It is tricky to determine the actual size of a CPU
set just based on the macros.  I think I got it right.

It turns out the comment was incorrect, I changed the code.

Why do we do this instead of calling sysconf to get the number
of CPUs?

sysconf currently does not give us the proper number, and I really don't
think applications should rely on it.  This is a separate conversation,
IMHO.  I added a comment.

quoted

+
+static bool
+test_size (const struct conf *conf, size_t size)
+{

Should print PASS:/FAIL: prefix to make grepping easier.

I added info/warning/error prefixes instead.

Should do what other tests do e.g. err |= and run each
test on a distinct line. Makes it easier to add tests and
disable tests while debugging.

Ah, right.

I have added tests which cover additional scenarios (cross-process
sched_* calls and cross-thread pthread_sched* calls).

Florian

Attachments

0001-Remove-CPU-set-size-checking-from-affinity-functions.patch [text/x-patch] 44653 bytes · preview

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help