Thread (61 messages) 61 messages, 10 authors, 2011-03-23

Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting

From: Johannes Weiner <hannes@cmpxchg.org>
Date: 2011-03-16 21:52:33
Also in: linux-fsdevel, lkml

On Wed, Mar 16, 2011 at 02:19:26PM -0700, Greg Thelen wrote:
On Wed, Mar 16, 2011 at 6:13 AM, Johannes Weiner [off-list ref] wrote:
quoted
On Tue, Mar 15, 2011 at 02:48:39PM -0400, Vivek Goyal wrote:
quoted
I think even for background we shall have to implement some kind of logic
where inodes are selected by traversing memcg->lru list so that for
background write we don't end up writting too many inodes from other
root group in an attempt to meet the low background ratio of memcg.

So to me it boils down to coming up a new inode selection logic for
memcg which can be used both for background as well as foreground
writes. This will make sure we don't end up writting pages from the
inodes we don't want to.
Originally for struct page_cgroup reduction, I had the idea of
introducing something like

       struct memcg_mapping {
               struct address_space *mapping;
               struct mem_cgroup *memcg;
       };

hanging off page->mapping to make memcg association no longer per-page
and save the pc->memcg linkage (it's not completely per-inode either,
multiple memcgs can still refer to a single inode).

We could put these descriptors on a per-memcg list and write inodes
from this list during memcg-writeback.

We would have the option of extending this structure to contain hints
as to which subrange of the inode is actually owned by the cgroup, to
further narrow writeback to the right pages - iff shared big files
become a problem.

Does that sound feasible?
If I understand your memcg_mapping proposal, then each inode could
have a collection of memcg_mapping objects representing the set of
memcg that were charged for caching pages of the inode's data.  When a
new file page is charged to a memcg, then the inode's set of
memcg_mapping would be scanned to determine if current's memcg is
already in the memcg_mapping set.  If this is the first page for the
memcg within the inode, then a new memcg_mapping would be allocated
and attached to the inode.  The memcg_mapping may be reference counted
and would be deleted when the last inode page for a particular memcg
is uncharged.
Dead-on.  Well, on which side you put the list - a per-memcg list of
inodes, or a per-inode list of memcgs - really depends on which way
you want to do the lookups.  But this is the idea, yes.
  page->mapping = &memcg_mapping
  inode->i_mapping = collection of memcg_mapping, grows/shrinks with [un]charge
If the memcg_mapping list (or hash-table for quick find-or-create?)
was to be on the inode side, I'd put it in struct address_space, since
this is all about page cache, not so much an fs thing.

Still, correct in general.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help