Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2

From: Johannes Weiner <hannes@cmpxchg.org>
Date: 2018-07-18 22:19:15
Also in: linux-mm, lkml

On Tue, Jul 17, 2018 at 01:25:15PM +0200, Michal Hocko wrote:

On Mon 16-07-18 10:57:45, Daniel Drake wrote:

Hi Johannes,

Thanks for your work on psi! 

We have also been investigating the "thrashing problem" on our Endless
desktop OS. We have seen that systems can easily get into a state where the
UI becomes unresponsive to input, and the mouse cursor becomes extremely
slow or stuck when the system is running out of memory. We are working with
a full GNOME desktop environment on systems with only 2GB RAM, and
sometimes no real swap (although zram-swap helps mitigate the problem to
some extent).

My analysis so far indicates that when the system is low on memory and hits
this condition, the system is spending much of the time under
__alloc_pages_direct_reclaim. "perf trace -F" shows many many page faults
in executable code while this is going on. I believe the kernel is
swapping out executable code in order to satisfy memory allocation
requests, but then that swapped-out code is needed a moment later so it
gets swapped in again via the page fault handler, and all this activity
severely starves the system from being able to respond to user input.

I appreciate the kernel's attempt to keep processes alive, but in the
desktop case we see that the system rarely recovers from this situation,
so you have to hard shutdown. In this case we view it as desirable that
the OOM killer would step in (it is not doing so because direct reclaim
is not actually failing).

Yes, we currently use a userspace application that monitors pressure
and OOM kills (there is usually plenty of headroom left for a small
application to run by the time quality of service for most workloads
has already tanked to unacceptable levels). We want to eventually add
this back into the kernel with the appropriate configuration options
(pressure threshold value and sustained duration etc.)

Yes this is really unfortunate. One thing that could help would be to
consider a thrashing level during the reclaim (get_scan_count) to simply
forget about LRUs which are constantly refaulting pages back. We already
have the infrastructure for that. We just need to plumb it in.

This doesn't work without quantifying the actual time you're spending
on thrashing IO. The cutoff for acceptable refaults is very different
between rotating disks, crappy SSDs, and high-end flash.
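
To make the point above concrete, here is a back-of-the-envelope sketch of
how the same refault rate translates into very different stall time
depending on the backing device. The per-refault latencies are made-up round
numbers chosen for illustration, not measurements, and `stall_fraction` is a
hypothetical helper, not anything from the patch series.

```python
def stall_fraction(refaults_per_sec: float, latency_sec: float) -> float:
    """Fraction of each second one task spends waiting on refault IO.

    Assumes faults are serviced one at a time; the result is capped at 1.0
    because a task cannot be stalled for more than the whole second.
    """
    return min(1.0, refaults_per_sec * latency_sec)


# Hypothetical per-refault latencies, for illustration only.
ASSUMED_LATENCY_SEC = {
    "rotating disk": 0.008,
    "low-end SSD": 0.001,
    "high-end flash": 0.0001,
}

if __name__ == "__main__":
    for device, latency in ASSUMED_LATENCY_SEC.items():
        share = stall_fraction(100, latency)
        print(f"{device}: {share:.0%} of the time stalled at 100 refaults/s")
```

Under these assumed latencies, a refault rate that is nearly free on fast
flash eats most of the wall clock on a rotating disk, which is why a cutoff
expressed as a refault count does not carry over between devices.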

But in the future we might want the OOM killer to monitor psi memory
levels and dispatch tasks when we sustain X percent for Y seconds.
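
A minimal userspace sketch of the "X percent for Y seconds" idea follows. It
is not the monitoring application mentioned above. It assumes the pressure
file lives at /proc/pressure/memory and uses the
`some avg10=... avg60=... avg300=... total=...` line format of the mainline
PSI interface, which may differ from what this version of the series exports.
The names `parse_pressure` and `SustainedPressure`, the 10% threshold, and
the 5 second duration are all hypothetical.

```python
import time


def parse_pressure(text: str) -> dict:
    """Parse lines such as 'some avg10=1.23 avg60=0.50 avg300=0.10 total=42'."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, value = field.partition("=")
            values[key] = float(value)
        result[kind] = values
    return result


class SustainedPressure:
    """Reports True once pressure has stayed at or above a threshold
    for at least duration_sec without dropping below it."""

    def __init__(self, threshold_pct: float, duration_sec: float):
        self.threshold_pct = threshold_pct
        self.duration_sec = duration_sec
        self._since = None  # time at which pressure first crossed the threshold

    def update(self, pressure_pct: float, now: float) -> bool:
        if pressure_pct < self.threshold_pct:
            self._since = None  # pressure dropped, restart the clock
            return False
        if self._since is None:
            self._since = now
        return now - self._since >= self.duration_sec


if __name__ == "__main__":
    # Assumed policy values; a real tool would make these configurable.
    detector = SustainedPressure(threshold_pct=10.0, duration_sec=5.0)
    while True:
        with open("/proc/pressure/memory") as f:
            pressure = parse_pressure(f.read())
        if detector.update(pressure["some"]["avg10"], time.monotonic()):
            print("sustained memory pressure: a policy action would go here")
        time.sleep(1)
```

Passing the timestamp into `update()` rather than reading the clock inside
it keeps the threshold logic easy to exercise with synthetic samples.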