Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)


From: Evgeniy Polyakov <hidden>
Date: 2006-03-07 10:17:11
Also in: lkml

On Tue, Mar 07, 2006 at 10:43:59AM +0100, Ingo Oeser (netdev@axxeo.de) wrote:

Evgeniy Polyakov wrote:

On Mon, Mar 06, 2006 at 06:44:07PM +0100, Ingo Oeser (netdev@axxeo.de) wrote:

Hmm, so I should resurrect my user page table walker abstraction?

There I would hand each page to a "recording" function, which
can drop the page from the collection or coalesce it in the collector
if your scatter gather implementation allows it.
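
A minimal userspace sketch of such a "recording" callback, under stated assumptions: the names `record_fn`, `record_coalesce`, `walk_pages` and `struct collector` are hypothetical and are not taken from any posted patch, pages are represented by plain addresses, and the page size is fixed at 4096 bytes. The collector merges a page into the previous segment when it is contiguous with it and drops the page when the collection is full.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096UL	/* assumed page size for this sketch */
#define MAX_SEGS 16	/* assumed capacity of the collector */

/* One scatter-gather segment: a contiguous run of pages. */
struct sg_seg {
	unsigned long start;
	unsigned long len;
};

struct collector {
	struct sg_seg segs[MAX_SEGS];
	size_t nsegs;
};

enum record_result { REC_STORED, REC_COALESCED, REC_DROPPED };

/* Hypothetical callback type: the walker hands each page to it. */
typedef enum record_result (*record_fn)(void *ctx, unsigned long addr);

/* Coalesce with the previous segment if contiguous, else open a new one. */
static enum record_result record_coalesce(void *ctx, unsigned long addr)
{
	struct collector *c = ctx;

	if (c->nsegs > 0) {
		struct sg_seg *last = &c->segs[c->nsegs - 1];

		if (last->start + last->len == addr) {
			last->len += PAGE_SZ;
			return REC_COALESCED;
		}
	}
	if (c->nsegs == MAX_SEGS)
		return REC_DROPPED;	/* collector full: drop the page */
	c->segs[c->nsegs].start = addr;
	c->segs[c->nsegs].len = PAGE_SZ;
	c->nsegs++;
	return REC_STORED;
}

/* Stand-in for the walker: one loop, one callback per page.
 * Returns the number of pages that were not dropped. */
static size_t walk_pages(const unsigned long *pages, size_t n,
			 record_fn rec, void *ctx)
{
	size_t kept = 0;
	size_t i;

	assert(rec != NULL);
	for (i = 0; i < n; i++)
		if (rec(ctx, pages[i]) != REC_DROPPED)
			kept++;
	return kept;
}
```

Walking the addresses 0x1000, 0x2000, 0x3000 and 0x8000 through `record_coalesce` leaves two segments in the collector: one of three pages and one of a single page.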

It depends on where performance growth stops.
At first glance it does not look like find_extend_vma();
more likely it is follow_page() faulting and thus __handle_mm_fault().
I cannot say for sure, but if that is true and performance growth
stops because of the increased number of faults and their processing,
your approach will hit this problem too, won't it?

My approach reduced the number of loops performed and the amount
of memory needed, at the expense of doing more work in the main
loop of get_user_pages().

This was mitigated for the common case of getting just one page by 
providing a get_one_user_page() function.

The whole reason we need such multiple loops is that we have
no common container object for "IO vector + additional data".

So we always end up looping over the vector returned by
get_user_pages(). The bigger that vector,
the bigger the impact.

Maybe something as simple as providing get_user_pages() with some offsetof
and container_of hackery will work these days, without the disadvantages
my old get_user_pages() work had.

The idea is that you provide a vector (like the arguments to calloc) and two
offsets: one at which to store the page within each element and one at which
to store the vma.

If an offset has a special value (e.g. MAX_LONG), nothing is stored there at all.
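
The vector-plus-offsets idea can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `store_entry`, `struct io_entry` and the `NO_STORE` sentinel are hypothetical names, and `struct page` and `struct vma` are dummy stand-ins for the kernel types. The caller describes its own container by element size and two byte offsets, and the helper writes the page and vma pointers straight into it, skipping any field whose offset equals the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct vm_area_struct. */
struct page { int id; };
struct vma  { int id; };

/* Assumed sentinel meaning "store nothing for this field". */
#define NO_STORE ((size_t)-1)

/* Caller-defined container: one I/O vector slot plus additional data. */
struct io_entry {
	unsigned long cookie;	/* caller's own per-entry data */
	struct page *page;
	struct vma *vma;
};

/*
 * Store pg and vm into element idx of a vector whose elements are
 * size bytes each. page_off and vma_off are byte offsets inside one
 * element; an offset equal to NO_STORE skips that field.
 */
static void store_entry(void *vec, size_t size, size_t idx,
			size_t page_off, size_t vma_off,
			struct page *pg, struct vma *vm)
{
	char *elem;

	assert(vec != NULL);
	elem = (char *)vec + idx * size;
	if (page_off != NO_STORE)
		*(struct page **)(elem + page_off) = pg;
	if (vma_off != NO_STORE)
		*(struct vma **)(elem + vma_off) = vm;
}
```

A caller would pass `sizeof(struct io_entry)`, `offsetof(struct io_entry, page)` and either `offsetof(struct io_entry, vma)` or `NO_STORE`, so no second loop is needed to copy the results into its own structure.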

You still need to find the VMA in one loop, and run through its (mm_struct's)
pages in a second loop.

But if the performance problem really is get_user_pages() itself 
(and not its callers), then my approach won't help at all.

It looks so.
My test pseudocode is as follows:
fget_light();
igrab();
kzalloc(number_of_pages * sizeof(void *));
get_user_pages(number_of_pages);
... undo ...

I've attached two graphs of performance with and without
get_user_pages(): get_user_pages.png and kmalloc.png.

The vertical axis is the number of Mbytes per second pushed through the
above code; the horizontal axis is the number of pages in each run.
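
For readers who want to reproduce the shape of such a measurement, here is a rough userspace analogue. It is a sketch under stated assumptions, not the test that produced the graphs: `calloc()` stands in for `kzalloc()`, the `get_user_pages()` step is only marked by a comment because it has no userspace equivalent, the page size is taken as 4096 bytes, and one Mbyte is taken as 2^20 bytes. The helper names `mbytes_per_sec` and `time_alloc_runs` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define PAGE_SZ 4096.0	/* assumed page size in bytes */

/* Throughput in Mbytes (2^20 bytes) per second for a given run. */
static double mbytes_per_sec(size_t npages, size_t iterations, double seconds)
{
	return (double)npages * PAGE_SZ * (double)iterations
		/ (1024.0 * 1024.0) / seconds;
}

/* Time `iterations` runs that each allocate a vector of npages pointers.
 * Returns the elapsed wall-clock time in seconds. */
static double time_alloc_runs(size_t npages, size_t iterations)
{
	struct timespec t0, t1;
	size_t i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iterations; i++) {
		void **vec = calloc(npages, sizeof(void *));

		assert(vec != NULL);
		/* the kernel test would call get_user_pages() here,
		 * then undo it before freeing the vector */
		free(vec);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (double)(t1.tv_sec - t0.tv_sec)
		+ (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Plotting `mbytes_per_sec(n, iterations, time_alloc_runs(n, iterations))` for increasing `n` gives a curve with the same axes as the attached graphs, though the absolute numbers are not comparable to the in-kernel test.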

Regards

Ingo Oeser

-- 
	Evgeniy Polyakov

Attachments

get_user_pages.png [image/png] 5498 bytes
kmalloc.png [image/png] 5816 bytes
