Re: [PATCH v3 21/22] netoops: Add user-programmable boot_id
From: Matt Mackall <hidden>
Date: 2010-12-14 22:47:55
Also in:
lkml, netdev
On Tue, 2010-12-14 at 14:33 -0800, Mike Waychison wrote:
> On Tue, Dec 14, 2010 at 2:06 PM, Matt Mackall [off-list ref] wrote:
> > On Tue, 2010-12-14 at 13:59 -0800, Mike Waychison wrote:
> > > On Tue, Dec 14, 2010 at 1:42 PM, Matt Mackall [off-list ref] wrote:
> > > > On Tue, 2010-12-14 at 13:30 -0800, Mike Waychison wrote:
> > > > > Add support for letting userland define a 32bit boot id. This is
> > > > > useful for users to be able to correlate netoops reports to
> > > > > specific boot instances offline.
> > > >
> > > > This sounds a lot like the pre-existing
> > > > /proc/sys/kernel/random/boot_id that's used by kerneloops.org.
> > >
> > > Could be. I'm looking at it now... There is no documentation for
> > > this boot_id field?
> >
> > Probably not. It's just a random number generated at boot.
> >
> > > Reusing this guy would work, except that it doesn't appear to allow
> > > arbitrary values to be set. We need to inject our boot sequence
> > > number (which is figured out in userland) in the packet somehow as
> > > we need to correlate it to our other monitoring systems.
> >
> > What happens if you oops before userspace is available?
>
> Either one of two general cases:
>
> - The crash is a one-off and the machine comes back. The boot number
>   sequence will see a hole in it, which is a clue that something bad
>   happened.
> - The machine is in a crash loop. This has the same failure mode for
>   us as if the machine never made it onto the network due to whatever
>   reason: bad cables, bad firmware, bad ram, ...
>
> In both cases, we can detect that something is wrong and handle it.
>
> Note that our firmware is responsible for incrementing the boot
> sequence at bootup, which is why the above works.
>
> In general though, our machines do make it up to userland -- staying
> alive once booted is the hard part ;)

Interesting. Is this Google-specific firmware magic?

I'd probably accept a hook in random.c to fold a number into the UUID,
which would unify things.

-- 
Mathematics is the supreme nostalgia of our time.
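
As a rough illustration of the hook suggested above, here is a minimal userland sketch in C. It is not code from random.c or from the netoops patches. It assumes that "fold" means overwriting the last four bytes of the 16-byte boot_id with the 32-bit boot sequence number (big-endian), so that the number stays recoverable for offline correlation; an XOR-style mix would be another reading. The names embed_boot_seq, extract_boot_seq and format_uuid are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (assumption, not the kernel's API): store a 32-bit
 * boot sequence number in bytes 12..15 of a 16-byte UUID, big-endian.
 * Bytes 6 and 8, which carry the UUID version and variant bits, are
 * left untouched.
 */
static void embed_boot_seq(uint8_t uuid[16], uint32_t seq)
{
    for (int i = 0; i < 4; i++)
        uuid[12 + i] = (uint8_t)(seq >> (8 * (3 - i)));
}

/* Recover the sequence number written by embed_boot_seq(). */
static uint32_t extract_boot_seq(const uint8_t uuid[16])
{
    uint32_t seq = 0;

    for (int i = 0; i < 4; i++)
        seq = (seq << 8) | uuid[12 + i];
    return seq;
}

/* Format the UUID in the usual 8-4-4-4-12 hex layout (36 chars + NUL). */
static void format_uuid(const uint8_t u[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x-%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

With this layout, a collector that receives the UUID can call extract_boot_seq() on it and compare consecutive values per machine; a gap in the sequence would correspond to the "hole" case described in the quoted reply.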