Thread: 8 messages, 4 authors, 2022-01-24

Re: fbdev: Garbage collect fbdev scrolling acceleration

From: Sven Schnelle <hidden>
Date: 2022-01-19 16:34:05
Also in: dri-devel

Hi Daniel,

Daniel Vetter [off-list ref] writes:
On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
Hi Daniel,

Daniel Vetter [off-list ref] writes:
On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
Helge Deller [off-list ref] writes:
Maybe on fast new x86 boxes the performance difference isn't huge,
but for all old systems, or when emulated in qemu, this makes
a big difference.

Helge
I second that. For most people, the framebuffer isn't important as
they're mostly interested in getting to X11/Wayland as fast as possible.
But for systems like servers without X11 it's nice to have a fast
console.
Fast console howto:
- shadow buffer in cached memory
- timer based upload of changed areas to the real framebuffer

This one is actually fast, unlike trying to use hardware blit (bltcopy)
acceleration and having the most terrible fallback path once that's gone.
Yes, the drm fbdev helpers have this (but it's not enabled on most drivers
because very, very few people care).
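The shadow-buffer approach from the howto above can be sketched in C roughly as follows. This is a minimal user-space model under stated assumptions, not the drm fbdev helper code: `struct shadow_fb`, `shadow_fill_rect()`, `shadow_scroll_up()` and `shadow_flush()` are hypothetical names, the "real" framebuffer is just a second heap buffer standing in for the mapped hardware memory, damage is tracked as a single range of rows rather than arbitrary rectangles, and the timer is left to the caller, which is assumed to invoke `shadow_flush()` at a fixed rate. What it illustrates is that scrolling becomes a memmove in cached memory, and the real framebuffer is only ever written, never read back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shadow-buffered console state. */
struct shadow_fb {
    uint32_t *shadow;       /* cached system memory, one uint32_t per pixel */
    uint32_t *real;         /* stand-in for the mapped hardware framebuffer */
    int width, height;      /* in pixels */
    int dirty_y1, dirty_y2; /* damaged rows [dirty_y1, dirty_y2); empty if y1 >= y2 */
};

static void clear_damage(struct shadow_fb *fb)
{
    fb->dirty_y1 = fb->height;
    fb->dirty_y2 = 0;
}

static int shadow_fb_init(struct shadow_fb *fb, int width, int height)
{
    size_t n = (size_t)width * (size_t)height;
    fb->shadow = calloc(n, sizeof(uint32_t));
    fb->real = calloc(n, sizeof(uint32_t));
    fb->width = width;
    fb->height = height;
    clear_damage(fb);
    return (fb->shadow && fb->real) ? 0 : -1;
}

static void shadow_fb_fini(struct shadow_fb *fb)
{
    free(fb->shadow);
    free(fb->real);
}

/* Grow the damaged row range to include rows [y1, y2). */
static void mark_dirty(struct shadow_fb *fb, int y1, int y2)
{
    if (y1 < fb->dirty_y1) fb->dirty_y1 = y1;
    if (y2 > fb->dirty_y2) fb->dirty_y2 = y2;
}

/* Draw into the shadow buffer only; the real framebuffer is untouched. */
static void shadow_fill_rect(struct shadow_fb *fb, int x, int y, int w, int h,
                             uint32_t color)
{
    assert(x >= 0 && y >= 0 && x + w <= fb->width && y + h <= fb->height);
    for (int row = y; row < y + h; row++)
        for (int col = x; col < x + w; col++)
            fb->shadow[(size_t)row * fb->width + col] = color;
    mark_dirty(fb, y, y + h);
}

/* Scroll up by 'lines' pixel rows with a memmove in cached memory,
 * filling the freed rows at the bottom with 'bg'. */
static void shadow_scroll_up(struct shadow_fb *fb, int lines, uint32_t bg)
{
    assert(lines > 0 && lines <= fb->height);
    size_t stride = (size_t)fb->width;
    size_t keep = (size_t)(fb->height - lines) * stride;
    size_t total = (size_t)fb->height * stride;
    memmove(fb->shadow, fb->shadow + (size_t)lines * stride,
            keep * sizeof(uint32_t));
    for (size_t i = keep; i < total; i++)
        fb->shadow[i] = bg;
    mark_dirty(fb, 0, fb->height);
}

/* Meant to be called from a periodic timer: a write-only upload of the
 * damaged rows. Returns the number of bytes written to the real buffer. */
static size_t shadow_flush(struct shadow_fb *fb)
{
    if (fb->dirty_y1 >= fb->dirty_y2)
        return 0;
    size_t off = (size_t)fb->dirty_y1 * fb->width;
    size_t bytes = (size_t)(fb->dirty_y2 - fb->dirty_y1) * fb->width
                   * sizeof(uint32_t);
    memcpy(fb->real + off, fb->shadow + off, bytes);
    clear_damage(fb);
    return bytes;
}
```

A caller would draw with `shadow_fill_rect()` and `shadow_scroll_up()` as output arrives and call `shadow_flush()` from its timer; however many scrolls happen between two ticks, the damaged rows are uploaded only once per tick.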
Hmm... Take my laptop with a 4k (3840x2160) screen as an example:

Let's say that, on average, half of every line is filled with text.

So 3840/2 * 2160 = 4,147,200 pixels change. At 4 bytes per pixel, that's
16,588,800 bytes per timer interrupt. In another mail, updating on vsync
was mentioned, so multiply that by 60 and you get ~995MB (~0.93GiB) per
second. And even if you only update the screen 4 times per second, that
would be ~66MB (~63MiB) of data per second. I'm likely missing something
here.
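The arithmetic in the paragraph above can be checked with two small helpers. A sketch, assuming 4 bytes per pixel (XRGB8888) and the "half of every line" damage estimate from the mail; `bytes_per_update()` and `bytes_per_second()` are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes pushed per update when fill_num/fill_den of every row is damaged
 * (the mail assumes half of each line, i.e. 1/2). */
static uint64_t bytes_per_update(uint64_t width, uint64_t height,
                                 uint64_t bytes_per_pixel,
                                 uint64_t fill_num, uint64_t fill_den)
{
    assert(fill_den != 0 && fill_num <= fill_den);
    uint64_t damaged_px_per_row = width * fill_num / fill_den;
    return damaged_px_per_row * height * bytes_per_pixel;
}

/* Sustained upload rate for a given number of updates per second. */
static uint64_t bytes_per_second(uint64_t per_update, uint64_t updates_per_sec)
{
    return per_update * updates_per_sec;
}
```

For 3840x2160 with half of each line damaged, `bytes_per_update(3840, 2160, 4, 1, 2)` gives 16,588,800 bytes; at 60 updates per second that is 995,328,000 bytes/s, and at 4 updates per second 66,355,200 bytes/s.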
Since you say 4k, it's a modern box, so you have on the order of 10GB/s
of write bandwidth.

And around 100MB/s of read bandwidth, both as seen from the CPU. It all
adds up. It's the uncached read that kills you and means dmesg takes
seconds to display.
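To see why the uncached read dominates, here is a back-of-the-envelope model in C using the rough figures from this mail (~10GB/s write, ~100MB/s uncached read). It is a simplification, assuming the whole 3840x2160 XRGB8888 framebuffer is moved on every scroll and ignoring the cost of the memmove in cached RAM; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Order-of-magnitude figures quoted in the thread, not measurements. */
static const double write_bw = 10e9;  /* ~10GB/s CPU writes */
static const double read_bw = 100e6;  /* ~100MB/s CPU reads from uncached memory */

/* Seconds per scroll when moving data inside the real framebuffer:
 * every byte is read back over the slow uncached path, then written. */
static double scroll_in_framebuffer_s(uint64_t fb_bytes)
{
    assert(fb_bytes > 0);
    return (double)fb_bytes / read_bw + (double)fb_bytes / write_bw;
}

/* Seconds per scroll with a cached shadow buffer: the real framebuffer
 * is only written (the cached memmove is ignored in this model). */
static double scroll_with_shadow_s(uint64_t fb_bytes)
{
    assert(fb_bytes > 0);
    return (double)fb_bytes / write_bw;
}
```

With `fb_bytes = 3840 * 2160 * 4 = 33,177,600`, scrolling inside the real framebuffer comes out at roughly 0.34 s per scroll versus about 3.3 ms for the write-only upload, a factor of 101 with these inputs; a dozen or so scrolls in a row already add up to several seconds.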

Also, since this is 4k, looking at sales volume we're talking integrated
graphics, so whether it's the GPU or the CPU doing the memcpy, it's the
same memory bandwidth budget you're burning down.
That might be true for integrated graphics; as I said, I don't know the
architecture. But something being good on one architecture doesn't mean
it's good for everyone. If you have an external GPU, then the memory/system
bus bandwidth used would differ depending on whether a memcpy or the GPU
does the scrolling. And whether the graphics are internal or external, the
CPU could do other work while the GPU scrolls.

Quite a lot of discussion for a revert of a patch that was already in
the kernel for more than 20(?) years.

/Sven