Thread: 51 messages, 13 authors, 2012-11-26

Re: [RFC v3 0/3] vmpressure_fd: Linux VM pressure notifications

From: David Rientjes <rientjes@google.com>
Date: 2012-11-18 22:53:24
Also in: linux-mm, lkml

On Fri, 16 Nov 2012, Anton Vorontsov wrote:
 The main change is that I decided to go with discrete levels of the
 pressure.

 When I started writing the man page, I had to describe the 'reclaimer
 inefficiency index', and while doing this I realized that I'm describing
 how the kernel is doing the memory management, which we try to avoid in
 the vmevent. And applications don't really care about these details:
 reclaimers, their inefficiency indexes, scanning window sizes, priority
 levels, etc. -- it's all "not interesting", and purely kernel's stuff. So
 I guess Mel Gorman was right, we need some sort of levels.

 What applications (well, activity managers) are really interested in is
 this:

 1. Do we sacrifice resources for new memory allocations (e.g. file
    cache)?
 2. Does the cost of new memory allocations become too high, and the system
    hurts because of this?
 3. Are we about to OOM soon?

 And here are the answers:

 1. VMEVENT_PRESSURE_LOW
 2. VMEVENT_PRESSURE_MED
 3. VMEVENT_PRESSURE_OOM

 There is no "high" pressure, since I really don't see any definition of
 it, but it's possible to introduce new levels without breaking ABI.
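
As an illustration of how an activity manager might consume the three levels, here is a minimal sketch in C. The numeric enum values, the `pressure_action` enum, and the `pick_action()` helper are all assumptions made for the example; the RFC's actual ABI and any real policy may look different.

```c
/* Hypothetical numeric values; the RFC's real constants may differ. */
enum vmevent_pressure_level {
	VMEVENT_PRESSURE_LOW,	/* resources (e.g. file cache) are being sacrificed */
	VMEVENT_PRESSURE_MED,	/* allocation cost is hurting the system */
	VMEVENT_PRESSURE_OOM,	/* about to run out of memory */
};

/* Hypothetical reactions an activity manager could choose from. */
enum pressure_action {
	ACTION_NONE,
	ACTION_TRIM_CACHES,
	ACTION_STOP_BACKGROUND,
	ACTION_KILL_EXPENDABLE,
};

/*
 * Map a pressure level to a reaction. Unknown levels map to ACTION_NONE,
 * so a level added later would simply be ignored by this (old) userland.
 */
static enum pressure_action pick_action(int level)
{
	switch (level) {
	case VMEVENT_PRESSURE_LOW:
		return ACTION_TRIM_CACHES;
	case VMEVENT_PRESSURE_MED:
		return ACTION_STOP_BACKGROUND;
	case VMEVENT_PRESSURE_OOM:
		return ACTION_KILL_EXPENDABLE;
	default:
		return ACTION_NONE;
	}
}
```

The `default` branch is one way to read the remark about adding levels without breaking ABI: a consumer written against three levels keeps working when it sees a fourth.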

Later I came up with the fourth level:

 Maybe it makes sense to implement something like PRESSURE_MILD/BALANCE
 with an additional nr_pages threshold, which basically hints the kernel
 about how many easily reclaimable pages userland has (that would be a
 part of our definition for the mild/balance pressure level).

I.e. the fourth level can serve as a two-way communication w/ the kernel.
But again, this would be just an extension, I don't want to introduce this
now.
That certainly makes sense; it would be too much of a usage and
maintenance burden to assume that the implementation of the VM is to
remain the same.
The set of nodes that a thread is allowed to allocate from may face memory 
pressure up to and including oom while the rest of the system may have a 
ton of free memory.  Your solution is to compile and mount memcg if you 
want notifications of memory pressure on those nodes.  Others in this 
thread have already said they don't want to rely on memcg for any of this 
and, as Anton showed, this can be tied directly into the VM without any 
help from memcg as it sits today.  So why implement a simple and clean 
You meant 'why not'?
Yes, sorry.
mempressure cgroup that can be used alone or co-existing with either memcg 
or cpusets?

Same thing with a separate mempressure cgroup.  The point is that there 
will be users of this cgroup that do not want the overhead imposed by 
memcg (which is why it's disabled in defconfig) and there's no direct 
dependency that causes it to be a part of memcg.
There's also an API "inconvenience issue" with memcg's usage_in_bytes
stuff: applications have a hard time resetting the threshold to 'emulate'
the pressure notifications, and they also have to count bytes (like 'total
- used = free') to set the threshold, while a separate 'pressure'
notification shows exactly what apps actually want to know: the pressure.
Agreed.
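
To make the byte counting concrete: an application that wants to hear about "less than N bytes free" through a usage threshold has to do the conversion itself. A minimal sketch in C, where `usage_threshold()` is a hypothetical helper and the total and free target are assumed to be already known in bytes; it only shows the arithmetic and does not touch the memcg interface.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn "notify when fewer than free_target bytes
 * remain" into a usage threshold, using used = total - free.
 * Returns 0 on success, -1 if free_target is larger than total.
 */
static int usage_threshold(uint64_t total, uint64_t free_target,
			   uint64_t *threshold)
{
	if (free_target > total)
		return -1;

	*threshold = total - free_target;
	return 0;
}
```

The result goes stale whenever the total changes, so it would have to be recomputed and the threshold re-registered, which is the kind of bookkeeping a direct pressure notification avoids.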

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .