Re: [PATCH v2] Add /proc/pid_gen

From: Andrew Morton <akpm@linux-foundation.org>
Date: 2018-11-21 22:12:26
Also in: linux-doc, lkml

On Wed, 21 Nov 2018 12:54:20 -0800 Daniel Colascione [off-list ref] wrote:

Trace analysis code needs a coherent picture of the set of processes
and threads running on a system. While it's possible to enumerate all
tasks via /proc, this enumeration is not atomic. If PID numbering
rolls over during snapshot collection, the resulting snapshot of the
process and thread state of the system may be incoherent, confusing
trace analysis tools. The fundamental problem is that if a PID is
reused during a userspace scan of /proc, it's impossible to tell, in
post-processing, whether a fact that the userspace /proc scanner
reports regarding a given PID refers to the old or new task named by
that PID, as the scan of that PID may or may not have occurred before
the PID reuse, and there's no way to "stamp" a fact read from the
kernel with a trace timestamp.

This change adds a per-pid-namespace 64-bit generation number,
incremented on PID rollover, and exposes it via a new proc file
/proc/pid_gen. By examining this file before and after /proc
enumeration, user code can detect the potential reuse of a PID and
restart the task enumeration process, repeating until it gets a
coherent snapshot.

PID rollover ought to be rare, so in practice, scan repetitions will
be rare.

In general, tracing is a rather specialized thing.  Why is this very
occasional confusion a sufficiently serious problem to warrant addition
of this code?

Which userspace tools will be using pid_gen?  Are the developers of
those tools signed up to use pid_gen?

quoted hunk ↗ jump to hunk

--- a/include/linux/pid.h
+++ b/include/linux/pid.h

@@ -112,6 +112,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 int next_pidmap(struct pid_namespace *pid_ns, unsigned int last);
 
 extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern u64 read_pid_generation(struct pid_namespace *ns);

pig_generation_read() would be a better (and more idiomatic) name.

 extern void free_pid(struct pid *pid);
 extern void disable_pid_allocation(struct pid_namespace *ns);

...

+u64 read_pid_generation(struct pid_namespace *ns)
+{
+	u64 generation;
+
+
+	spin_lock_irq(&pidmap_lock);
+	generation = ns->generation;
+	spin_unlock_irq(&pidmap_lock);
+	return generation;
+}

What is the spinlocking in here for?  afaict the only purpose it serves
is to make the 64-bit read atomic, so it isn't needed on 32-bit?

quoted hunk ↗ jump to hunk

 void disable_pid_allocation(struct pid_namespace *ns)
 {
 	spin_lock_irq(&pidmap_lock);

@@ -449,6 +463,17 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 	return idr_get_next(&ns->idr, &nr);
 }
 
+#ifdef CONFIG_PROC_FS
+static int pid_generation_show(struct seq_file *m, void *v)
+{
+	u64 generation =
+		read_pid_generation(proc_pid_ns(file_inode(m->file)));

	u64 generation;

	generation = read_pid_generation(proc_pid_ns(file_inode(m->file)));

is a nicer way of avoiding column wrap.

quoted hunk ↗ jump to hunk

+	seq_printf(m, "%llu\n", generation);
+	return 0;
+
+};
+#endif
+
 void __init pid_idr_init(void)
 {
 	/* Verify no one has done anything silly: */

@@ -465,4 +490,13 @@ void __init pid_idr_init(void)
 
 	init_pid_ns.pid_cachep = KMEM_CACHE(pid,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
+
+}
+
+void __init pid_proc_init(void)
+{
+	/* pid_idr_init is too early, so get a separate init function. */

s/get a/use a/

+#ifdef CONFIG_PROC_FS
+	WARN_ON(!proc_create_single("pid_gen", 0, NULL, pid_generation_show));
+#endif
 }

This whole function could vanish if !CONFIG_PROC_FS.  Doesn't matter
much with __init code though.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help