Thread: 101 messages, 14 authors, 2005-03-17

Re: radeon, apertures & memory mapping

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: 2005-03-13 23:11:19

> I'm not being clear....
>
> Leave AGP memory as normal RAM
> driver does its thing to the memory
> driver executes flush of data cache on CPU
> after flush tell GPU to access the data
>
> The performance hit of executing the flush is probably negligible
> since you probably didn't care about anything in the data cache. All
> of those entries would be replaced by later code anyway. You will lose
> some later overlap parallelism as the cache is refilled.

Should be measured though, but yes, I agree. We must make sure we have a
proper hook in the userland DRI to flush AGP pages before they get
submitted (indirect buffers, texture data, host data blits, ...) and the
kernel DRM should flush ring entries (probably easy to do from the
various ring access macros).
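
The sequence discussed above (write to AGP memory through a cacheable
mapping, flush the data cache over the written range, then tell the GPU)
could be sketched in C roughly as follows. This is only an illustration:
CACHE_LINE, flush_dcache_range(), notify_gpu() and submit_to_gpu() are
made-up names, the 32-byte line size is an assumption, and the GPU
notification is a stub rather than real DRM ring code. On non-PowerPC
builds the flush degrades to a compiler barrier so the sketch still
compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32-byte cache lines. The real size depends on the CPU. */
#define CACHE_LINE 32u

/* Flush one data cache line to memory (dcbf on PowerPC). */
static void flush_dcache_line(const void *p)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ __volatile__("dcbf 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
    __asm__ __volatile__("" : : : "memory"); /* compiler barrier only */
#endif
}

/* Flush every cache line touching [start, start + len).
 * Returns the number of lines flushed. */
static size_t flush_dcache_range(const void *start, size_t len)
{
    uintptr_t addr = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t lines = 0;

    assert(len > 0);
    for (; addr < end; addr += CACHE_LINE, lines++)
        flush_dcache_line((const void *)addr);
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ __volatile__("sync" : : : "memory"); /* wait for the flushes */
#endif
    return lines;
}

/* Hypothetical stand-in for "tell GPU to access the data". */
static int gpu_notified;
static void notify_gpu(void) { gpu_notified = 1; }

/* Write, flush, then notify: the ordering is the whole point. */
static size_t submit_to_gpu(void *agp_buf, const void *data, size_t len)
{
    size_t lines;

    memcpy(agp_buf, data, len);               /* driver does its thing */
    lines = flush_dcache_range(agp_buf, len); /* push it out of the cache */
    notify_gpu();                             /* only now may the GPU read */
    return lines;
}
```

The same flush_dcache_range() call is what a userland hook would issue
before submitting an indirect buffer, and what a ring access macro would
issue after writing ring entries. How much it costs still has to be
measured.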

> > Though the flushes may be fast if there is no actual hit in the cache,
> > I agree. Again, that should be benched.
> >
> > In fact, I would _love_ to be able to mark AGP memory as cacheable on
> > ppc, even if there is no performance benefit in the end. The issue is
> > that currently, we end up having both a cacheable and a non-cacheable
> > mapping for those pages (the kernel linear mapping still maps those
> > pages cacheable, and it's almost impossible to get rid of that unless
> > you are prepared to disable the large page mapping of kernel space or
> > the BATs on ppc32, which would harm kernel performance significantly).
> >
> > It works, but it's illegal. That means that the CPU might well
> > speculate a load from one of these pages in kernel-land just because
> > it happens to be next to a page where you are iterating an array, and
> > may then bring a bit of that page into the cache.

> That shouldn't matter; the page brought in would be for a speculative
> read and never accessed. It should just fall out of the cache and not
> be written back. There is only one cacheable mapping. In this model
> writes are always followed by a flush before telling the GPU to access
> the memory that has just been written.

I was talking about the current state of having both cacheable and
non-cacheable mappings. I was saying that this model has the above
possible issue, and that indeed, mapping everything cacheable would fix
it.
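
To make the aliasing condition concrete, here is a toy model (not kernel
code) that scans a list of mappings and reports whether any physical
page is mapped both cacheable and non-cacheable at the same time. The
struct, enum and function names are invented for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum cache_attr { CACHEABLE, NONCACHEABLE };

/* One virtual mapping of a physical page (hypothetical model). */
struct mapping {
    unsigned long phys_page; /* physical page frame number */
    enum cache_attr attr;    /* caching attribute of this mapping */
};

/* Return true if some physical page appears with conflicting
 * cache attributes, i.e. the "works, but illegal" situation. */
static bool has_attr_conflict(const struct mapping *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (m[i].phys_page == m[j].phys_page &&
                m[i].attr != m[j].attr)
                return true;
    return false;
}
```

With the current setup (kernel linear mapping cacheable, plus a
non-cacheable AGP mapping of the same page) the check reports a
conflict. With every mapping cacheable it does not.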

> > At that point, a non-cacheable access from userland to that same line
> > that was brought into the cache may lead to undefined behaviour,
> > ranging from "just works" to checkstopping the CPU, with cases of
> > writing corrupted data, etc., depending on the CPU.
> >
> > I have yet to see the problem happening in practice, but we are
> > definitely not on the safe side currently. I suspect ppc32 in practice
> > won't hit it, but ppc64 will...

Ben.
-- 
Benjamin Herrenschmidt [off-list ref]