Re: [PATCH v8 07/18] nfsd: add "localio" support

From: Chuck Lever III <chuck.lever@oracle.com>
Date: 2024-06-28 14:40:24

On Jun 27, 2024, at 11:35 PM, NeilBrown [off-list ref] wrote:
On Fri, 28 Jun 2024, Chuck Lever III wrote:

On Jun 27, 2024, at 1:27 PM, Mike Snitzer [off-list ref] wrote:
On Thu, Jun 27, 2024 at 12:07:03PM -0400, Chuck Lever wrote:

On Thu, Jun 27, 2024 at 11:48:09AM -0400, Jeff Layton wrote:

Chuck mentioned this earlier, but I don't think we ought to merge the
dprintks. If they're useful for debugging then they should be turned
into tracepoints. This one, I'd probably just drop.

Right; the problem with dprintk() is that it can create so much chatter
that the systemd journal will automatically toss those messages and
they are lost. No diagnostic value in that.

Also you probably won't find it useful if lots of debugging info
goes into the trace log but a handful of the stuff you are
looking for is dumped into the system journal; the two use different
timestamps and so are really hard to line up after the fact.

We're trying to transition away from dprintk() in NFSD for these
reasons.

OK, I understand wanting to not allow new dprintk() to be added.

Meanwhile:
$ grep -ri dprintk fs/nfsd/*.[ch]  | wc -l
   181

So I'm struggling to get motivated to convert to tracepoints.  Feels
like a needless make-work hurdle; these could be converted by others
more proficient with tracepoints if/when needed.

Making everyone have to be proficient at developing debugging via
tracepoints seems misplaced (but I also understand that forcing others
to fish enables "others" to not be doing the conversion work).

Trace points are part of the cost of contributing to NFSD,
just like XDR, RCU, lockdep_assert, and dozens of other
kernel facilities. Not a hurdle, and I don't ask for busy
work changes.

I think trace points are quite different from the other facilities you
highlight.
You need to know XDR, RCU, etc. to get correct, performant code.  If you
get it wrong, then the code won't work or (hopefully) a reviewer will
tell you.

But trace points .... when and where are they really useful?  The answer
to that question is nowhere near as clear cut.

I disagree; see below.

While I'm sure they can be useful, I rarely find them to be so.  I've
certainly had a few positive experiences, but also seen a lot of noise
that doesn't really help me with the particular behaviour that I'm
trying to analyse.  systemtap can be incredibly useful as it is
targeted.  Fixed trace points are (for me) only occasionally useful.

Some of Oracle's customers, for example, refuse to use out-of-band
debugging facilities like BPF or systemtap because that requires
bespoke case-specific code to be written. They feel that enabling
any lightly-tested code at a kernel privilege level on heavily-used
production systems introduces an unacceptable risk of crashing such
systems. (I'm told by Red Hat support engineers that they have
heard the same concerns.)

dprintk impacts thread timing and has a heavy performance penalty.
It can also run the root file system out of space, thus it's not
something that can be left enabled for long periods of time. It
has no mechanisms for data reduction during capture. So it's
simply not a viable player in most live debugging scenarios.

If you prefer systemtap or BPF, you are still free to use those
instead! However, built-in tracing is the only choice for the
above cases, and it has to be part of the source code.

I think it would be good to know if localio is active - maybe something
in /proc/self/mountinfo could provide that.
I think it might be useful to know what server-uuid each server and each
mount was using.  The client could again have it in
/proc/self/mountinfo.  The server ...  maybe in /proc/fs/nfsd/, maybe
available over netlink...

Netlink is where we are adding such things these days.

just fyi, the most valuable part of the dprintk debugging in my
experience is the rpc_show_tasks() call when rpc debugging is turned on
or off.  This view into the current status can be very useful.


NFSD now has a similar facility via netlink.

Note that the client's "show tasks" mechanism can also be
accessed via /sys.


--
Chuck Lever
