Re: [PATCH 00/33] Swap over NFS -v14

From: Peter Zijlstra <hidden>
Date: 2007-10-31 12:57:18
Also in: linux-mm, lkml

On Wed, 2007-10-31 at 08:16 -0400, Jeff Garzik wrote:

Thoughts:

1) I absolutely agree that NFS is far more prominent and useful than any 
network block device, at the present time.


2) Nonetheless, swap over NFS is a pretty rare case.  I view this work 
as interesting, but I really don't see a huge need, for swapping over 
NBD or swapping over NFS.  I tend to think swapping to a remote resource 
starts to approach "migration" rather than merely swapping.  Yes, we can 
do it...  but given the lack of burning need one must examine the price.

There is a large corporate demand for this, which is why I'm doing this.

The typical usage scenarios are:
 - cluster/blades, where having local disks is a cost issue (maintenance
   of failures, heat, etc)
 - virtualisation, where dumping the storage on a networked storage unit
   makes for trivial migration and what not..

But please, people who want this (I'm sure some of you are reading) do
speak up. I'm just the motivated corporate drone implementing the
feature :-)

3) You note

Swap over network has the problem that the network subsystem does not use fixed
sized allocations, but heavily relies on kmalloc(). This makes mempools
unusable.

True, but IMO there are mitigating factors that should be researched and 
taken into account:

a) To give you some net driver background/history, most mainstream net 
drivers were coded to allocate RX skbs of size 1538, under the theory 
that they would all be allocating out of the same underlying slab cache.
It would not be difficult to update a great many of the [non-jumbo]
cases to create a fixed size allocation pattern.
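To illustrate why a fixed-size allocation pattern matters here, the following is a minimal user-space sketch of a mempool-style reserve. It is not the kernel's mempool API: toy_pool, TOY_POOL_MIN and the helper names are hypothetical, and the sketch models only the preallocated reserve, assuming the regular allocator has already failed. The point it shows is that such a reserve can guarantee a number of elements of one size only, so a variable-sized request cannot be backed by it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MIN 4  /* hypothetical number of guaranteed elements */

/* Hypothetical fixed-size reserve; not the kernel's mempool_t. */
struct toy_pool {
    size_t elem_size;             /* the one size this pool can guarantee */
    void *reserve[TOY_POOL_MIN];  /* preallocated elements */
    int nr_free;                  /* elements currently in the reserve */
};

/* Preallocate TOY_POOL_MIN elements of elem_size bytes. Returns 0 or -1. */
static int toy_pool_init(struct toy_pool *p, size_t elem_size)
{
    p->elem_size = elem_size;
    p->nr_free = 0;
    for (int i = 0; i < TOY_POOL_MIN; i++) {
        void *e = malloc(elem_size);
        if (e == NULL) {
            while (p->nr_free > 0)
                free(p->reserve[--p->nr_free]);
            return -1;
        }
        p->reserve[p->nr_free++] = e;
    }
    return 0;
}

/* Hand out a reserved element, but only for the size the pool was built for. */
static void *toy_pool_alloc(struct toy_pool *p, size_t size)
{
    assert(p->nr_free >= 0 && p->nr_free <= TOY_POOL_MIN);
    if (size != p->elem_size || p->nr_free == 0)
        return NULL;
    return p->reserve[--p->nr_free];
}

/* Return an element: refill the reserve first, release anything beyond it. */
static void toy_pool_free(struct toy_pool *p, void *e)
{
    if (p->nr_free < TOY_POOL_MIN)
        p->reserve[p->nr_free++] = e;
    else
        free(e);
}

/* Release whatever is still held in the reserve. */
static void toy_pool_destroy(struct toy_pool *p)
{
    while (p->nr_free > 0)
        free(p->reserve[--p->nr_free]);
}
```

With a pool built for 1538-byte elements, toy_pool_alloc(&p, 1538) succeeds TOY_POOL_MIN times and then returns NULL until something is freed, while toy_pool_alloc(&p, 200) is refused outright.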

One issue that comes to mind is how to ensure we'd still overflow the
IP-reassembly buffers. Currently those are managed on the number of
bytes present, not the number of fragments.
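One reading of that concern, sketched in user-space C: when the limit is expressed in bytes, the number of buffers held before the limit trips depends on fragment size. The names (toy_frag_budget, toy_frag_charge, toy_frags_until_full) are hypothetical and the accounting is deliberately simplified; it is not the kernel's reassembly code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical byte budget for a reassembly queue. */
struct toy_frag_budget {
    size_t used;   /* bytes currently held */
    size_t limit;  /* byte threshold at which new fragments are refused */
};

/* Charge one fragment of len bytes; refuse it if the budget would overflow. */
static bool toy_frag_charge(struct toy_frag_budget *b, size_t len)
{
    assert(b->used <= b->limit);
    if (len > b->limit - b->used)
        return false;
    b->used += len;
    return true;
}

/* Count how many equal-sized fragments fit before the budget refuses one. */
static size_t toy_frags_until_full(size_t limit, size_t frag_len)
{
    struct toy_frag_budget b = { .used = 0, .limit = limit };
    size_t n = 0;

    if (frag_len == 0)
        return 0;  /* zero-length fragments would never fill a byte budget */
    while (toy_frag_charge(&b, frag_len))
        n++;
    return n;
}
```

With a 4096-byte limit, four 1024-byte fragments fit, but so do sixty-four 64-byte fragments, so a byte limit alone does not bound the number of buffers in use.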

One of the goals of my approach was to not rewrite the network subsystem
to accommodate this feature (and I hope I succeeded).

b) Spare-time experiments and anecdotal evidence point to RX and TX skb 
recycling as a potentially valuable area of research.  If you are able 
to do something like that, then memory suddenly becomes a lot more 
bounded and predictable.


So my gut feeling is that taking a hard look at how net drivers function 
in the field should give you a lot of good ideas that approach the 
shared goal of making network memory allocations more predictable and 
bounded.

Note that being bounded only comes from dropping most packets before
tying them to a socket. That is the crucial part of the RX path: to
receive all packets from the NIC (regardless of their size) but to not
pass them on to the network stack, unless they belong to a 'special'
socket that promises undelayed processing.
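A minimal sketch of that RX rule, in user-space C rather than kernel code. All names here (toy_sock, toy_pkt, toy_rx_deliver) are hypothetical and are not the interfaces from the patch series. The sketch assumes that each packet records whether its buffer came from the emergency reserve, and that the socket lookup has already been done by the time the decision is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct sock or struct sk_buff. */
struct toy_sock {
    bool critical;  /* socket promises undelayed processing */
};

struct toy_pkt {
    size_t len;                 /* packet size; deliberately not part of the decision */
    bool from_reserve;          /* buffer was allocated from the emergency reserve */
    const struct toy_sock *sk;  /* owning socket after lookup, NULL if none */
};

/*
 * Decide whether a received packet may be passed on to the stack.
 * Every packet is taken off the NIC regardless of size; the decision
 * is made only once the owning socket is known.
 */
static bool toy_rx_deliver(const struct toy_pkt *pkt)
{
    assert(pkt != NULL);
    if (!pkt->from_reserve)
        return true;           /* ordinary memory: no restriction */
    if (pkt->sk == NULL)
        return false;          /* no owner: drop so the buffer can be reused */
    return pkt->sk->critical;  /* only 'special' sockets may keep reserve memory */
}
```

Under these assumptions, a reserve-backed packet for an ordinary socket is dropped right after the lookup, which is what keeps the reserve usage bounded.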

Thanks for these ideas, I'll look into them.
