Re: [PATCH net-next 3/3] net: auto-tune mergeable rx buffer size for improved performance
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2014-01-08 17:37:53
Also in:
virtualization
On Fri, Dec 27, 2013 at 01:41:28PM -0800, Michael Dalton wrote:
I'm working on a followup patchset to address current feedback. I think it will be cleaner to do a debugfs implementation for per-receive queue packet buffer size exporting, so I'm trying that out.
On Thu, Dec 26, 2013 at 7:04 PM, Jason Wang [off-list ref] wrote:
We can make this more accurate by using extra data structure to track the real buf size and using it as token.
I agree -- we can do precise buffer total len tracking. Something like
struct mergeable_packet_buffer_ctx {
	void *buf;
	unsigned int total_len;
Maybe make total_len long so size is a power of 2.
};
Hmm, this doubles the VQ cache footprint. In the past, when I tried increasing the cache footprint, this hurt performance measurably. It's just a suggestion though, YMMV; if the numbers are good we don't need to argue about this.
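To make the size argument concrete, here is a minimal userspace sketch, not driver code. It assumes the struct from the quoted proposal with total_len widened to long as suggested, and compares it against tracking a bare buffer pointer per ring entry. The helper names (is_power_of_2 as a local function, ctx_size, footprint_ratio, bare_token) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Layout from the quoted proposal, with total_len widened to long. */
struct mergeable_packet_buffer_ctx {
	void *buf;
	unsigned long total_len;
};

/* Baseline assumption: without a context, the per-buffer token is one pointer. */
typedef void *bare_token;

/* True when n is a non-zero power of two. */
bool is_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes of bookkeeping per posted buffer when using the context struct. */
size_t ctx_size(void)
{
	return sizeof(struct mergeable_packet_buffer_ctx);
}

/* How many times larger the context is than a bare pointer token. */
size_t footprint_ratio(void)
{
	return ctx_size() / sizeof(bare_token);
}
```

On common ABIs the context is two pointers wide, so the per-buffer bookkeeping is twice that of a bare pointer. Whether that translates into a measurable slowdown is something only benchmarks can show.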
Each receive queue could have a pointer to an array of N buffer contexts, where N is queue size (kzalloc'd in init_vqs or similar). That would allow us to allocate all of our buffer context data at startup. Would this be preferred to the current approach or is there another approach you would prefer? All other things being equal, having precise length tracking is advantageous, so I'm inclined to try this out and see how it goes. I think this is a big design point - for example, if we have an extra buffer context structure, then per-receive queue frag allocators are not required for auto-tuning and we can reduce the number of patches in this patchset.
I'd be careful with adding even more stuff to mergeable_packet_buffer_ctx, for the above reason.
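For reference, the preallocated per-queue context array described above could look roughly like the following userspace sketch. It is an illustration under assumptions, not the patch: calloc stands in for kzalloc, struct receive_queue_sketch and the rq_ctx_* helpers are hypothetical names, and how a slot is chosen for each posted buffer is left to the caller because the thread does not specify it.

```c
#include <stdlib.h>

/* Context struct as proposed in the thread. */
struct mergeable_packet_buffer_ctx {
	void *buf;
	unsigned int total_len;
};

/* Hypothetical stand-in for the driver's per-receive-queue state. */
struct receive_queue_sketch {
	struct mergeable_packet_buffer_ctx *ctx;	/* N entries, N = queue size */
	unsigned int num;
};

/* Allocate all contexts up front (calloc zeroes them, like kzalloc). */
int rq_ctx_init(struct receive_queue_sketch *rq, unsigned int queue_size)
{
	rq->ctx = calloc(queue_size, sizeof(*rq->ctx));
	if (!rq->ctx)
		return -1;
	rq->num = queue_size;
	return 0;
}

/*
 * Record the exact length of a buffer being posted and return the context,
 * which would serve as the token handed to the virtqueue.
 */
struct mergeable_packet_buffer_ctx *
rq_ctx_post(struct receive_queue_sketch *rq, unsigned int slot,
	    void *buf, unsigned int total_len)
{
	if (slot >= rq->num)
		return NULL;
	rq->ctx[slot].buf = buf;
	rq->ctx[slot].total_len = total_len;
	return &rq->ctx[slot];
}

/* Release the context array at teardown. */
void rq_ctx_free(struct receive_queue_sketch *rq)
{
	free(rq->ctx);
	rq->ctx = NULL;
	rq->num = 0;
}
```

With this shape, the precise buffer length comes back with the token on the receive path, at the cost of the extra per-buffer state discussed above.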
I'm happy to implement either way. Thanks!
Best,
Mike