Thread: 49 messages, 7 authors, 2021-01-22

Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

From: Thomas Gleixner <hidden>
Date: 2020-10-17 16:08:24
Also in: linux-arch, linux-arm-kernel, lkml, netdev

On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
> On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
>> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> I think that the goal of a "finding source of disturbance" interface is
> different from what can be accomplished by tracing in two ways:
>
> 1. "Source of disturbance" should provide some useful information about
> category of event and its cause as opposed to determining all precise
> details about things being called that resulted or could result in
> disturbance. It should not depend on the user's knowledge about
> details

Tracepoints already give you selectively useful information.

> of implementations, it should provide some definite answer of what
> happened (with whatever amount of details can be given in a generic
> mechanism) even if the user has no idea how those things happen and
> what part of kernel is responsible for either causing or processing
> them. Then if the user needs further details, they can be obtained with
> tracing.

It's just a matter of defining the tracepoint at the right place.

> 2. It should be usable as a runtime error handling mechanism, so the
> information it provides should be suitable for application use and
> logging. It should be usable when applications are running on a system
> in production, and no specific tracing or monitoring mechanism can be
> in use.

That's a strawman really. There is absolutely no reason why a specific
set of tracepoints cannot be enabled on a production system.

Your tracker is a monitoring mechanism, just a different flavour.  By
your logic above it cannot be enabled on a production system either.

Also you can enable tracepoints from a control application, consume, log
and act upon them. It's not any different from opening some magic
isolation tracker interface. There are even multiple ways to do that
including libraries.
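
As a rough illustration of the control-application approach described above, the sketch below enables a single tracepoint through tracefs and follows the resulting event stream. It assumes tracefs is mounted at /sys/kernel/tracing and that the process has the privileges to write there. The helper names set_event_enabled and follow_events are invented for this sketch and are not part of any existing library.

```python
from pathlib import Path

# Assumption: the usual tracefs mount point. Older setups expose the same
# files under /sys/kernel/debug/tracing instead.
TRACEFS = Path("/sys/kernel/tracing")


def set_event_enabled(subsystem, event, on=True, tracefs=TRACEFS):
    """Enable or disable one tracepoint by writing to its 'enable' file.

    Hypothetical helper: it only wraps the write of "1" or "0" to
    events/<subsystem>/<event>/enable.
    """
    enable_file = Path(tracefs) / "events" / subsystem / event / "enable"
    enable_file.write_text("1" if on else "0")
    return enable_file


def follow_events(tracefs=TRACEFS):
    """Yield trace lines one by one.

    On a real system, reading trace_pipe blocks until events arrive and
    consumes them, so the caller can log each line or act on it.
    """
    with open(Path(tracefs) / "trace_pipe", "r", errors="replace") as pipe:
        for line in pipe:
            yield line.rstrip("\n")
```

A control application could call set_event_enabled("irq", "irq_handler_entry") once at startup and then iterate over follow_events() in a logging thread.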

> If, say, thousands of devices are controlling neutrino detectors on an
> ocean floor, and in a month of work one of them got one isolation
> breaking event, it should be able to report that isolation was broken
> by an interrupt from a network interface, so the users will be able to
> track it down to some userspace application reconfiguring those
> interrupts.

Tracing can do that and it can do it selectively on the isolated
CPUs. It's just a matter of proper configuration and usage.
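
One possible way to limit tracing to the isolated CPUs is the tracing_cpumask file in tracefs. The sketch below reads a CPU list in the kernel's list format (for example from /sys/devices/system/cpu/isolated), converts it to a hex mask, and writes it. The function names are hypothetical, and the tracefs location is an assumption.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '2-3,5' into [2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are listed
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def cpumask_hex(cpus):
    """Format CPUs as comma-separated 32-bit hex words, highest word first."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while True:
        words.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
        if not mask:
            break
    return ",".join(reversed(words))


def trace_only_on(cpus, tracefs=Path("/sys/kernel/tracing")):
    """Restrict tracing to the given CPUs (hypothetical helper).

    Assumes tracefs is mounted at the default path and is writable.
    """
    mask = cpumask_hex(cpus)
    (Path(tracefs) / "tracing_cpumask").write_text(mask)
    return mask
```

For instance, trace_only_on(parse_cpu_list(Path("/sys/devices/system/cpu/isolated").read_text())) would trace only the CPUs the kernel reports as isolated.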

> It will be a good idea to make such a mechanism optional and suitable for
> tracking things on conditions other than "always enabled" and "enabled
> with task isolation".

Tracing already provides that. Tracepoints are individually controlled
and filtered.
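
To make the per-event filtering concrete, here is a minimal sketch that attaches a filter expression to one tracepoint through its filter file. The irq:irq_handler_entry event and its name field exist in the kernel; the "eth*" pattern is purely an assumed example of an interrupt name, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: default tracefs mount point.
TRACEFS = Path("/sys/kernel/tracing")


def irq_name_filter(pattern):
    """Build a filter matching interrupt names against a glob pattern."""
    return f'name ~ "{pattern}"'


def set_event_filter(subsystem, event, expression, tracefs=TRACEFS):
    """Write a filter expression to events/<subsystem>/<event>/filter."""
    filter_file = Path(tracefs) / "events" / subsystem / event / "filter"
    filter_file.write_text(expression)
    return filter_file


def clear_event_filter(subsystem, event, tracefs=TRACEFS):
    """Remove the filter again; writing '0' clears it."""
    return set_event_filter(subsystem, event, "0", tracefs=tracefs)
```

With set_event_filter("irq", "irq_handler_entry", irq_name_filter("eth*")), only interrupt entries whose handler name matches the pattern would be recorded.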

> However in my opinion, there should be something in kernel entry
> procedure that, if enabled, prepares something to be filled by the
> cause data, and we know at least one such situation when this kernel
> entry procedure should be triggered -- when task isolation is on.

A tracepoint will gather that information for you.

Task isolation is not special, it's just yet another way to configure
and use a system and tracepoints provide everything you need with the
bonus that you can gather more correlated information when you need it.

In fact tracing and tracepoints have replaced all specialized trackers
which were in the kernel before tracing was available. We're not going
to add a new one just because.

If there is anything which you find that tracing and tracepoints cannot
provide then the obvious solution is to extend that infrastructure so it
can serve your use case.

Thanks,

        tglx