Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel


From: Alex Belits <hidden>
Date: 2020-10-17 05:44:48
Also in: linux-api, linux-arch, linux-arm-kernel, lkml

On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:

On 10/4/20 7:14 PM, Frederic Weisbecker wrote:

On Sun, Oct 04, 2020 at 02:44:39PM +0000, Alex Belits wrote:

On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:

On Wed, Jul 22, 2020 at 02:49:49PM +0000, Alex Belits wrote:

+/*
+ * Description of the last two tasks that ran isolated on a given CPU.
+ * This is intended only for messages about isolation breaking. We
+ * don't want any references to actual task while accessing this from
+ * CPU that caused isolation breaking -- we know nothing about timing
+ * and don't want to use locking or RCU.
+ */
+struct isol_task_desc {
+	atomic_t curr_index;
+	atomic_t curr_index_wr;
+	bool	warned[2];
+	pid_t	pid[2];
+	pid_t	tgid[2];
+	char	comm[2][TASK_COMM_LEN];
+};
+static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);

So that's quite a huge patch that would have needed to be split up.
Especially this tracing engine.

Speaking of which, I agree with Thomas that it's unnecessary. It's too much
code and complexity. We can use the existing trace events and perform the
analysis from userspace to find the source of the disturbance.

The idea behind this is that isolation breaking events are supposed to
be known to the applications while applications run normally, and they
should not require any analysis or human intervention to be handled.

Sure but you can use trace events for that. Just trace interrupts, workqueues,
timers, syscalls, exceptions and scheduler events and you get all the local
disturbance. You might want to tune a few filters but that's pretty much it.

As for the source of the disturbances, if you really need that information,
you can trace the workqueue and timer queue events and just filter those that
target your isolated CPUs.

I agree that we can do all those things with tracing.
However, IMHO having a simplified logging mechanism to gather the source of
violation may help in reducing the manual effort.

Although, I am not sure how easy will it be to maintain such an interface
over time.

I think that the goal of a "finding the source of disturbance" interface is
different from what can be accomplished by tracing in two ways:

1. "Source of disturbance" should provide some useful information about
the category of the event and its cause, as opposed to determining all the
precise details about things being called that resulted or could result in
disturbance. It should not depend on the user's knowledge of implementation
details; it should provide a definite answer about what happened (with
whatever amount of detail can be given in a generic mechanism) even if the
user has no idea how those things happen and what part of the kernel is
responsible for either causing or processing them. Then if the user needs
further details, they can be obtained with tracing.

2. It should be usable as a runtime error handling mechanism, so the
information it provides should be suitable for application use and
logging. It should be usable when applications are running on a system
in production, when no specific tracing or monitoring mechanism can be
in use. If, say, thousands of devices are controlling neutrino
detectors on the ocean floor, and in a month of operation one of them gets
one isolation breaking event, it should be able to report that isolation
was broken by an interrupt from a network interface, so the users will
be able to track it down to some userspace application reconfiguring
those interrupts.
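
To make the two points above concrete, here is a minimal userspace sketch
of what an application-facing report could look like. Everything in it is
an assumption for illustration only: the cause categories, the
isol_break_report structure and isol_report_format() are hypothetical
names and are not part of the patch set or of any existing kernel
interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cause categories; none of these names come from the patches. */
enum isol_cause {
	ISOL_CAUSE_UNKNOWN = 0,
	ISOL_CAUSE_IRQ,
	ISOL_CAUSE_IPI,
	ISOL_CAUSE_SYSCALL,
	ISOL_CAUSE_PAGE_FAULT,
};

/* Hypothetical record that an application could receive and log. */
struct isol_break_report {
	enum isol_cause cause;
	int cpu;		/* CPU whose isolation was broken */
	int irq;		/* IRQ number, or -1 if not applicable */
	char source[32];	/* device or driver name, may be empty */
};

/* Map a category to a human-readable name for log output. */
static const char *isol_cause_name(enum isol_cause cause)
{
	switch (cause) {
	case ISOL_CAUSE_IRQ:		return "interrupt";
	case ISOL_CAUSE_IPI:		return "inter-processor interrupt";
	case ISOL_CAUSE_SYSCALL:	return "system call";
	case ISOL_CAUSE_PAGE_FAULT:	return "page fault";
	default:			return "unknown";
	}
}

/* Format one log line into buf; returns whatever snprintf() returns. */
static int isol_report_format(const struct isol_break_report *r,
			      char *buf, size_t len)
{
	assert(r && buf && len > 0);
	if (r->cause == ISOL_CAUSE_IRQ && r->irq >= 0)
		return snprintf(buf, len,
				"cpu %d: isolation broken by %s %d (%s)",
				r->cpu, isol_cause_name(r->cause), r->irq,
				r->source[0] ? r->source : "unknown source");
	return snprintf(buf, len, "cpu %d: isolation broken by %s",
			r->cpu, isol_cause_name(r->cause));
}
```

The point of the sketch is only that the record carries a category and a
short generic cause, enough to log "interrupt from a network interface"
without the application knowing which kernel path was involved. Finer
details would still come from tracing.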

It would be a good idea to make such a mechanism optional and suitable for
tracking things under conditions other than "always enabled" and "enabled
with task isolation". However, in my opinion, there should be something
in the kernel entry procedure that, if enabled, prepares something to be
filled in with the cause data, and we know at least one situation in which
this kernel entry procedure should be triggered: when task isolation
is on.
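
As a rough illustration of the "prepare something to be filled in" idea,
the following is a userspace simulation, not kernel code. The names
entry_cause_slot, entry_prepare(), entry_record_cause() and the
cause_tracking_enabled flag are all hypothetical; in a real
implementation the flag would presumably be a static key or a per-task
state such as task isolation being on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry categories, for illustration only. */
enum entry_cause {
	ENTRY_NONE = 0,
	ENTRY_IRQ,
	ENTRY_SYSCALL,
	ENTRY_EXCEPTION,
};

/* Hypothetical slot that the entry path prepares and later code fills. */
struct entry_cause_slot {
	bool armed;
	enum entry_cause cause;
	unsigned long detail;	/* e.g. IRQ or syscall number */
};

/* Stand-in for whatever condition enables tracking (assumption). */
static bool cause_tracking_enabled;

/* Called on kernel entry: arm the slot only if tracking is enabled. */
static struct entry_cause_slot *entry_prepare(struct entry_cause_slot *slot)
{
	assert(slot != NULL);
	if (!cause_tracking_enabled)
		return NULL;
	slot->armed = true;
	slot->cause = ENTRY_NONE;
	slot->detail = 0;
	return slot;
}

/* Called by whoever knows the cause; a NULL or unarmed slot is a no-op. */
static void entry_record_cause(struct entry_cause_slot *slot,
			       enum entry_cause cause, unsigned long detail)
{
	if (!slot || !slot->armed)
		return;
	if (slot->cause != ENTRY_NONE)
		return;		/* keep the first recorded cause */
	slot->cause = cause;
	slot->detail = detail;
}
```

Keeping the first recorded cause is a design choice of this sketch: the
first reason for entering the kernel is the one that broke isolation,
and anything recorded afterwards is a consequence of already being there.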

-- 
Alex
