On Fri, Aug 20, 2021, Mathieu Desnoyers wrote:
> I still really hate flakiness in tests, because then people stop caring when they
> fail once in a while. And with the nature of rseq, a once-in-a-while failure is a
> big deal. Let's see if we can use other tricks to ensure stability of the cpu id
> without changing timings too much.
Yeah, zero argument regarding flaky tests.
> One idea would be to use a seqcount lock.
A sequence counter did the trick! Thanks much!
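For anyone following along, here is a rough sketch of the sequence counter idea in C11 atomics. It is not the code from the test; the struct and function names are hypothetical, and the two data fields are just stand-ins for the two CPU IDs being compared. The writer makes the count odd before an update and even afterwards, and the reader retries whenever the count was odd or changed while it was reading.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the selftest's actual code. */
struct cpu_snapshot {
	atomic_uint seq;	/* odd while the writer is mid-update */
	atomic_int cpu_a;	/* stand-in for the rseq CPU ID */
	atomic_int cpu_b;	/* stand-in for sched_getcpu() */
};

/* Writer: make the count odd before touching the data. */
static void snapshot_write_begin(struct cpu_snapshot *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Writer: make the count even again once the data is consistent. */
static void snapshot_write_end(struct cpu_snapshot *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/*
 * Reader: returns true only if both values were read with no write in
 * between. Masking off bit 0 makes an odd (in-progress) count always
 * mismatch the re-read, so the caller knows to retry.
 */
static bool snapshot_try_read(struct cpu_snapshot *s, int *a, int *b)
{
	unsigned int snap;

	snap = atomic_load_explicit(&s->seq, memory_order_acquire) & ~1u;
	*a = atomic_load_explicit(&s->cpu_a, memory_order_relaxed);
	*b = atomic_load_explicit(&s->cpu_b, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return snap == atomic_load_explicit(&s->seq, memory_order_relaxed);
}
```

A reader would typically spin on it, e.g. `while (!snapshot_try_read(&s, &a, &b)) ;`, and only compare the two values once the read succeeds.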
> But even if we use that, I'm concerned that the very long writer critical
> section calling sched_setaffinity would need to be alternated with a sleep to
> ensure the read-side progresses. The sleep delay could be relatively small
> compared to the duration of the sched_setaffinity call, e.g. ratio 1:10.
I already had an arbitrary usleep(10) to let the reader make progress between
sched_setaffinity() calls. Dropping it down to 1us didn't affect reproducibility,
so I went with that to shave those precious cycles :-) Eliminating the delay
entirely did result in no repro, which was a nice confirmation that it's needed
to let the reader get back into KVM_RUN.
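Roughly, the writer side looks like the sketch below. This is a simplified illustration under my own assumptions, not the patch itself: the counter and function names are hypothetical, error handling is minimal (the count is left odd if an affinity call fails), and it simply walks every CPU the thread is currently allowed on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for sched_setaffinity() and the CPU_* macros */
#endif
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical counter name; an odd value means a migration is in flight. */
static atomic_uint migrate_seq;

/*
 * Pin the calling thread to each CPU it is allowed on, in turn, for
 * "rounds" passes. Returns 0 on success, -1 if an affinity call fails.
 */
static int migrate_round_robin(int rounds)
{
	cpu_set_t possible, target;
	int i, cpu;

	if (sched_getaffinity(0, sizeof(possible), &possible))
		return -1;

	for (i = 0; i < rounds; i++) {
		for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
			if (!CPU_ISSET(cpu, &possible))
				continue;

			CPU_ZERO(&target);
			CPU_SET(cpu, &target);

			/* Odd count: the CPU ID is not stable right now. */
			atomic_fetch_add(&migrate_seq, 1);
			if (sched_setaffinity(0, sizeof(target), &target))
				return -1;
			/* Even count: the migration is done. */
			atomic_fetch_add(&migrate_seq, 1);

			/* Brief pause so the reader can make progress. */
			usleep(1);
		}
	}

	/* Restore the original affinity mask before returning. */
	return sched_setaffinity(0, sizeof(possible), &possible);
}
```

The usleep(1) is the only thing between back-to-back affinity changes, so it is what gives the reader a window in which the count is even and stays put.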
Thanks again!