Thread: 51 messages, 13 authors, 2012-11-26

Re: [RFC v3 0/3] vmpressure_fd: Linux VM pressure notifications

From: Anton Vorontsov <hidden>
Date: 2012-11-17 01:24:46
Also in: linux-mm, lkml

On Fri, Nov 16, 2012 at 01:57:09PM -0800, David Rientjes wrote:
I'm wondering if we should have more than three different levels.
In the case I outlined below, for backwards compatibility. What I
actually mean is that memcg *currently* allows arbitrary notifications.
One way to merge those, while moving to a saner 3-point notification, is
to still allow the old writes and fit them in the closest bucket.
Yeah, but I'm wondering why three is the right answer.
You were not Cc'ed, so let me repeat why I ended up w/ the levels (not
necessarily three levels), instead of relying on the 0..100 scale:

 The main change is that I decided to go with discrete levels of the
 pressure.

 When I started writing the man page, I had to describe the 'reclaimer
 inefficiency index', and while doing this I realized that I'm describing
 how the kernel is doing the memory management, which we try to avoid in
 the vmevent. And applications don't really care about these details:
 reclaimers, their inefficiency indexes, scanning window sizes, priority
 levels, etc. -- it's all "not interesting", and purely kernel's stuff. So
 I guess Mel Gorman was right, we need some sort of levels.

 What applications (well, activity managers) are really interested in is
 this:

 1. Do we sacrifice resources for new memory allocations (e.g. file
    cache)?
 2. Does the cost of new memory allocations become too high, and does the
    system hurt because of this?
 3. Are we about to OOM soon?

 And here are the answers:

 1. VMEVENT_PRESSURE_LOW
 2. VMEVENT_PRESSURE_MED
 3. VMEVENT_PRESSURE_OOM

 There is no "high" pressure, since I really don't see any definition of
 it, but it's possible to introduce new levels without breaking ABI.
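
As a rough illustration of the three levels above, here is a minimal C
sketch. Only the three VMEVENT_PRESSURE_* names come from this mail; the
numeric values, the per-level actions, the helper names pressure_action()
and closest_level(), and the 0..100 cut points are all assumptions made
up for the example, not what the patch set does.

```c
#include <stdio.h>

/* Names are from the mail; the numeric values are an assumption. */
enum vmevent_pressure {
	VMEVENT_PRESSURE_LOW,	/* 1. resources are sacrificed for new allocations */
	VMEVENT_PRESSURE_MED,	/* 2. allocation cost is high, the system hurts */
	VMEVENT_PRESSURE_OOM,	/* 3. about to OOM */
};

/* Hypothetical reaction of an activity manager to each level. */
static const char *pressure_action(enum vmevent_pressure level)
{
	switch (level) {
	case VMEVENT_PRESSURE_LOW:
		return "trim caches";
	case VMEVENT_PRESSURE_MED:
		return "stop background work";
	case VMEVENT_PRESSURE_OOM:
		return "kill idle apps";
	}
	return "unknown";
}

/*
 * Fit a legacy 0..100 value into the closest bucket.
 * The cut points (34 and 67) are arbitrary and purely illustrative.
 */
static enum vmevent_pressure closest_level(int scale)
{
	if (scale < 34)
		return VMEVENT_PRESSURE_LOW;
	if (scale < 67)
		return VMEVENT_PRESSURE_MED;
	return VMEVENT_PRESSURE_OOM;
}
```

The point of the sketch is that userland only ever switches on a level
and never needs to know how the kernel derived it.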

Later I came up with the fourth level:

 Maybe it makes sense to implement something like PRESSURE_MILD/BALANCE
 with an additional nr_pages threshold, which basically hints the kernel
 about how many easily reclaimable pages userland has (that would be a
 part of our definition for the mild/balance pressure level).

I.e. the fourth level can serve as a two-way communication w/ the kernel.
But again, this would be just an extension, I don't want to introduce this
now.
Umm, why do users of cpusets not want to be able to trigger memory 
pressure notifications?
Because cpusets only deal with memory placement, not memory usage.
The set of nodes that a thread is allowed to allocate from may face memory 
pressure up to and including oom while the rest of the system may have a 
ton of free memory.  Your solution is to compile and mount memcg if you 
want notifications of memory pressure on those nodes.  Others in this 
thread have already said they don't want to rely on memcg for any of this 
and, as Anton showed, this can be tied directly into the VM without any 
help from memcg as it sits today.  So why implement a simple and clean 
You meant 'why not'?
mempressure cgroup that can be used alone or co-existing with either memcg 
or cpusets?
And it is not that moving a task to cpuset disallows you to do any of
this: you could, as long as the same set of tasks are mounted in a
corresponding memcg.
Same thing with a separate mempressure cgroup.  The point is that there 
will be users of this cgroup that do not want the overhead imposed by 
memcg (which is why it's disabled in defconfig) and there's no direct 
dependency that causes it to be a part of memcg.
There's also an API "inconvenience issue" with memcg's usage_in_bytes
stuff: applications have a hard time resetting the threshold to 'emulate'
the pressure notifications, and they also have to count bytes (like 'total
- used = free') to set the threshold. A separate 'pressure' notification,
by contrast, shows exactly what apps actually want to know: the pressure.
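
To make the byte counting concrete, here is a small C sketch of what an
application has to compute before it can arm a usage_in_bytes threshold.
The helper names usage_threshold() and format_event_control() are
hypothetical; the "<event_fd> <fd> <threshold>" string is the format I
believe memcg's cgroup.event_control expects, but treat that as an
assumption and check the memcg documentation. No files are opened here,
the sketch only does the arithmetic and the formatting.

```c
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/*
 * The app thinks "notify me when less than min_free bytes are left",
 * but memcg wants a usage threshold, so: threshold = total - min_free.
 * Saturates at 0 if min_free exceeds the total.
 */
static uint64_t usage_threshold(uint64_t total_bytes, uint64_t min_free_bytes)
{
	if (min_free_bytes >= total_bytes)
		return 0;
	return total_bytes - min_free_bytes;
}

/*
 * Build the line to be written to cgroup.event_control (assumed format:
 * "<event_fd> <fd of memory.usage_in_bytes> <threshold>").
 * Returns the snprintf() result.
 */
static int format_event_control(char *buf, size_t len, int event_fd,
				int usage_fd, uint64_t threshold)
{
	return snprintf(buf, len, "%d %d %" PRIu64, event_fd, usage_fd,
			threshold);
}
```

Every time the limit or the desired free margin changes, the threshold
has to be recomputed and re-registered, which is the inconvenience.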

Thanks,
Anton.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .