Re: [PATCH v2] Add /proc/pid_gen

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: 2018-11-22 15:27:25
Also in: linux-doc, lkml

----- On Nov 21, 2018, at 7:30 PM, Daniel Colascione dancol@google.com wrote:
[...]

quoted

The problem here is the possibility of confusion, even if it's rare.
Does the naive approach of just walking /proc and ignoring the
possibility of PID reuse races work most of the time? Sure. But "most
of the time" isn't good enough. It's not that there are tons of sob
stories: it's that without completely robust reporting, we can't rule
out the possibility that weirdness we observe in a given trace is
actually just an artifact from a kinda-sorta-working best-effort trace
collection system instead of a real anomaly in behavior. Tracing,
essentially, gives us deltas for system state, and without an accurate
baseline, collected via some kind of scan on trace startup, it's
impossible to use these deltas to robustly reconstruct total system
state at a given time. And this matters, because errors in
reconstruction (e.g., assigning a thread to the wrong process because
the IDs happen to be reused) can affect processing of the whole trace.
If it's 3am and I'm analyzing the lone trace from a dogfooder
demonstrating a particularly nasty problem, I don't want to find out
that the trace I'm analyzing ended up being useless because the
kernel's trace system is merely best effort. It's very cheap to be
100% reliable here, so let's be reliable and rule out sources of
error.

[...]

I've just been CC'd on this thread for some reason, so I'll add my 2 cents.

FWIW, I think using /proc to add stateful information to a time-based
trace is the wrong way to do things. Here, the fact that you need to
add a generation counter to struct pid_namespace and expose it via /proc
just highlights its limitations when it comes to dealing with state
that changes over time. Your current issue is with PID re-use, but
you will eventually face the same issue for re-use of all other resources
you are trying to model. For instance, a file descriptor may be associated
with a path at some point in time, but that is no longer true after a
sequence of close/open which re-uses that file descriptor. Does that
mean we will eventually end up needing per-file-descriptor generation
counters as well?
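
To make the file descriptor case concrete, here is a minimal sketch of
resolving an fd to a path by time-stamp rather than by a generation
counter. The FdTable class and its method names are hypothetical, not
part of any existing tool, and the sketch assumes events are recorded
in time-stamp order.

```python
import bisect


class FdTable:
    """Per-process history of fd -> path bindings (hypothetical helper)."""

    def __init__(self):
        # fd -> list of (timestamp, path or None), appended in timestamp order
        self._history = {}

    def record_open(self, ts, fd, path):
        self._history.setdefault(fd, []).append((ts, path))

    def record_close(self, ts, fd):
        # None marks the fd as unbound from this timestamp onwards
        self._history.setdefault(fd, []).append((ts, None))

    def path_at(self, ts, fd):
        """Return the path bound to fd at time ts, or None if unbound."""
        entries = self._history.get(fd, [])
        stamps = [t for t, _ in entries]
        # Index of the last entry whose timestamp is <= ts
        idx = bisect.bisect_right(stamps, ts) - 1
        return entries[idx][1] if idx >= 0 else None


table = FdTable()
table.record_open(10, 3, "/etc/passwd")
table.record_close(20, 3)
table.record_open(30, 3, "/tmp/log")  # fd 3 re-used for a different file
```

The same fd number resolves to different paths depending on the
time-stamp of the query, so no per-fd counter is needed.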

LTTng solves this by dumping the system state as events within the
trace [1], which associates time-stamps with the state being dumped.
It is recorded while the rest of the system is being traced, so tools
can reconstruct full system state by combining this statedump with the
rest of the events recording state transitions.
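
As a rough illustration of that reconstruction, the sketch below
replays statedump records and state-transition events in time-stamp
order to answer "which processes were alive at time T". The event
layout ("kind", "ts", "pid", "comm") is an assumption made for this
example and is not LTTng's actual event format.

```python
def live_pids_at(events, when):
    """Replay events up to `when` and return {pid: comm} for live processes.

    `events` is an iterable of dicts with hypothetical keys:
    kind ("statedump", "fork" or "exit"), ts, pid, and comm.
    """
    live = {}
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["ts"] > when:
            break
        if ev["kind"] in ("statedump", "fork"):
            # A statedump record for a process already seen via fork
            # adds nothing new, so keep the first record.
            live.setdefault(ev["pid"], ev["comm"])
        elif ev["kind"] == "exit":
            live.pop(ev["pid"], None)
    return live


events = [
    {"kind": "statedump", "ts": 1, "pid": 100, "comm": "daemon"},
    {"kind": "fork", "ts": 2, "pid": 200, "comm": "worker"},
    {"kind": "exit", "ts": 3, "pid": 200, "comm": "worker"},
    {"kind": "fork", "ts": 4, "pid": 200, "comm": "other"},  # PID 200 re-used
]
```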

So while I agree that it's important to have a way to reconstruct
system state that is aware of PID re-use, I think trying to extend
/proc for this is the wrong approach. It adds extra fields to struct
pid_namespace that seem to be only useful for tracing, whereas using
the time-stamp at which the thread/process was first seen in the trace
(either fork or statedump) as secondary key should suffice to uniquely
identify a thread/process. I would recommend extending tracing
facilities to dump the data you need rather than /proc.
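
A minimal sketch of the secondary-key idea, assuming the analysis tool
sees a fork or statedump record for every process before any other
event that mentions it. ProcessIndex and its methods are hypothetical
names used only for illustration.

```python
class ProcessIndex:
    """Maps a bare PID to a unique (pid, first_seen_ts) key."""

    def __init__(self):
        # pid -> time-stamp at which the live incarnation was first seen
        self._current = {}

    def first_seen(self, ts, pid):
        """Call on fork or statedump; returns the unique key."""
        # If the process was already seen (e.g. fork, then statedump),
        # keep the earlier time-stamp.
        self._current.setdefault(pid, ts)
        return (pid, self._current[pid])

    def exited(self, pid):
        """Call on exit; the PID may be re-used afterwards."""
        self._current.pop(pid, None)

    def key_for(self, pid):
        """Key of the incarnation currently using pid, or None."""
        ts = self._current.get(pid)
        return None if ts is None else (pid, ts)


index = ProcessIndex()
first = index.first_seen(100, 42)    # fork of PID 42 at t=100
same = index.first_seen(150, 42)     # statedump sees the same process
index.exited(42)
gone = index.key_for(42)
second = index.first_seen(300, 42)   # PID 42 re-used by a new process
```

Two incarnations of PID 42 get distinct keys, so events can be
attributed to the right process without any counter in struct
pid_namespace.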

Thanks,

Mathieu

[1] http://git.lttng.org/?p=lttng-modules.git;a=blob;f=lttng-statedump-impl.c;h=dc037508c055b7f61b8c758d581bd0178e26552a;hb=HEAD


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
