Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
From: Thomas Gleixner <hidden>
Date: 2020-10-17 16:08:24
Also in:
linux-arch, linux-arm-kernel, lkml, netdev
On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
I think that the goal of "finding source of disturbance" interface is different from what can be accomplished by tracing in two ways:
1. "Source of disturbance" should provide some useful information about category of event and its cause as opposed to determining all precise details about things being called that resulted or could result in disturbance. It should not depend on the user's knowledge about details
Tracepoints already give you selectively useful information.
of implementations, it should provide some definite answer of what happened (with whatever amount of details can be given in a generic mechanism) even if the user has no idea how those things happen and what part of kernel is responsible for either causing or processing them. Then if the user needs further details, they can be obtained with tracing.
It's just a matter of defining the tracepoint at the right place.
2. It should be usable as a runtime error handling mechanism, so the information it provides should be suitable for application use and logging. It should be usable when applications are running on a system in production, and no specific tracing or monitoring mechanism can be in use.
That's a strawman really. There is absolutely no reason why a specific set of tracepoints cannot be enabled on a production system. Your tracker is a monitoring mechanism, just a different flavour. By your logic above it cannot be enabled on a production system either. Also you can enable tracepoints from a control application, consume, log and act upon them. It's not any different from opening some magic isolation tracker interface. There are even multiple ways to do that including libraries.
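A minimal sketch of such a control application, assuming tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing) and using the existing irq:irq_handler_entry tracepoint. The helper names set_event, parse_irq_line and watch are hypothetical and only serve the illustration:

```python
import os
import re

# Assumption: tracefs mount point on the target system.
TRACEFS = "/sys/kernel/tracing"

def set_event(tracefs, event, on):
    """Enable or disable one tracepoint, e.g. event='irq/irq_handler_entry'."""
    path = os.path.join(tracefs, "events", event, "enable")
    with open(path, "w") as f:
        f.write("1" if on else "0")

# Payload printed by the irq_handler_entry tracepoint: "irq=<nr> name=<name>"
IRQ_RE = re.compile(r"irq_handler_entry: irq=(\d+) name=(\S+)")

def parse_irq_line(line):
    """Return (irq number, handler name) for an irq_handler_entry line, else None."""
    m = IRQ_RE.search(line)
    return (int(m.group(1)), m.group(2)) if m else None

def watch(tracefs=TRACEFS):
    """Enable the tracepoint, then consume trace_pipe and log each hit."""
    set_event(tracefs, "irq/irq_handler_entry", True)
    try:
        with open(os.path.join(tracefs, "trace_pipe")) as pipe:
            for line in pipe:  # blocks until new events arrive
                hit = parse_irq_line(line)
                if hit:
                    print("disturbed by irq %d (%s)" % hit)
    finally:
        set_event(tracefs, "irq/irq_handler_entry", False)
```

Calling watch() needs sufficient privileges to write to tracefs. The print call stands in for whatever logging or reaction the application wants.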
If, say, thousands of devices are controlling neutrino detectors on an ocean floor, and in a month of work one of them got one isolation breaking event, it should be able to report that isolation was broken by an interrupt from a network interface, so the users will be able to track it down to some userspace application reconfiguring those interrupts.
Tracing can do that and it can do it selectively on the isolated CPUs. It's just a matter of proper configuration and usage.
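As a sketch of the "selectively on the isolated CPUs" part: ftrace exposes a tracing_cpumask file that takes a hex CPU mask in comma-separated 32-bit groups. The helpers cpus_to_mask and restrict_tracing below are hypothetical names, and the tracefs path is an assumption about the target system:

```python
import os

# Assumption: tracefs mount point on the target system.
TRACEFS = "/sys/kernel/tracing"

def cpus_to_mask(cpus):
    """Format CPU numbers as a hex mask in comma-separated 32-bit groups."""
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    chunks = []
    while True:
        chunks.append("%08x" % (bits & 0xFFFFFFFF))
        bits >>= 32
        if not bits:
            break
    # Most significant group comes first.
    return ",".join(reversed(chunks))

def restrict_tracing(cpus, tracefs=TRACEFS):
    """Limit tracing to the given CPUs, e.g. the isolated ones."""
    with open(os.path.join(tracefs, "tracing_cpumask"), "w") as f:
        f.write(cpus_to_mask(cpus))
```

For example, restrict_tracing([2, 3]) writes the mask 0000000c, so only events from CPUs 2 and 3 are recorded.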
It will be a good idea to make such mechanism optional and suitable for tracking things on conditions other than "always enabled" and "enabled with task isolation".
Tracing already provides that. Tracepoints are individually controlled and filtered.
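To illustrate per-tracepoint filtering: every event directory in tracefs has a filter file that accepts a boolean expression over the event's fields, and writing 0 clears it. A minimal sketch, where irq_filter and set_filter are hypothetical helper names and the tracefs path is an assumption:

```python
import os

# Assumption: tracefs mount point on the target system.
TRACEFS = "/sys/kernel/tracing"

def irq_filter(irqs):
    """Build a filter expression matching any of the given irq numbers."""
    return " || ".join("irq == %d" % irq for irq in irqs)

def set_filter(event, expr, tracefs=TRACEFS):
    """Install a filter on one tracepoint; expr=None clears the filter."""
    path = os.path.join(tracefs, "events", event, "filter")
    with open(path, "w") as f:
        f.write(expr if expr is not None else "0")
```

For example, set_filter("irq/irq_handler_entry", irq_filter([30, 31])) restricts that tracepoint to interrupts 30 and 31, while other tracepoints stay unaffected.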
However, in my opinion there should be something in the kernel entry procedure that, if enabled, prepares something to be filled with the cause data, and we know at least one situation in which this kernel entry procedure should be triggered -- when task isolation is on.
A tracepoint will gather that information for you.
Task isolation is not special, it's just yet another way to configure
and use a system and tracepoints provide everything you need with the
bonus that you can gather more correlated information when you need it.
In fact tracing and tracepoints have replaced all specialized trackers
which were in the kernel before tracing was available. We're not going
to add a new one just because.
If there is anything which you find that tracing and tracepoints cannot
provide then the obvious solution is to extend that infrastructure so it
can serve your usecase.
Thanks,
tglx