Thread: 18 messages, 5 authors, 2019-06-04

Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware

From: Segher Boessenkool <hidden>
Date: 2019-05-03 15:36:48

On Fri, May 03, 2019 at 10:10:57AM +1000, Stewart Smith wrote:
David Gibson [off-list ref] writes:
On Wed, May 01, 2019 at 01:42:21PM +1000, Alexey Kardashevskiy wrote:
At the moment, on a guest with 256 CPUs and 256 PCI devices, it takes
the guest about 8.5 sec to fetch the entire device tree via the client
interface, as the DT is traversed twice: once for the strings blob and
once for the struct blob. Also, "getprop" is quite slow, as SLOF stores
properties in a linked list.

However, since [1], SLOF builds a flattened device tree (FDT) for another
purpose. [2] adds a new "fdt-fetch" client interface call for the OS to
fetch the FDT.

This tries the new method; if not supported, this falls back to
the old method.

There is a change in the FDT layout: the old method produced
(reserved map, strings, structs); the new one receives only strings and
structs from the firmware and adds the final reserved map to the end,
so it is (fw reserved map, strings, structs, reserved map).
This still produces the same unflattened device tree.

This merges the reserved map from the firmware into the kernel's reserved
map. At the moment SLOF generates an empty reserved map, so this does not
change the existing behaviour with regard to reservations.
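As an illustration of the merge described above, here is a minimal sketch
in C. It is not the code from the patch: the names rsv_entry and
merge_rsvmap are hypothetical, entries are kept in host byte order for
simplicity (a real FDT stores them as big-endian 64-bit values), and it
assumes the usual FDT convention that the map ends with an all-zero entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One memory reservation entry; hypothetical host-endian stand-in. */
struct rsv_entry {
    uint64_t base;
    uint64_t size;
};

/*
 * Append the firmware's entries (terminated by {0, 0}) to the kernel's
 * map, which currently holds `count` entries in an array of `cap` slots.
 * Returns the new entry count (terminator excluded), or -1 if the
 * result, including its terminator, would not fit.
 */
int merge_rsvmap(struct rsv_entry *kmap, int count, int cap,
                 const struct rsv_entry *fw)
{
    assert(kmap != NULL && fw != NULL && count >= 0);

    for (; fw->base != 0 || fw->size != 0; fw++) {
        if (count + 1 >= cap)   /* keep one slot for the terminator */
            return -1;
        kmap[count++] = *fw;
    }
    if (count >= cap)
        return -1;

    /* Re-terminate the merged map. */
    kmap[count].base = 0;
    kmap[count].size = 0;
    return count;
}
```

With an empty firmware map (just the terminator), the function returns
the original count and the kernel's entries are untouched, which matches
the "no change in behaviour" claim above.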

This supports only v17 onward, as only that version provides
dt_struct_size; this works because "fdt-fetch" produces only v17 blobs.
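To make the version dependency concrete, the following is a small sketch
in C of reading the struct block size from a blob header. The helper
names (be32_at, fdt_struct_size) are hypothetical and this is not the
patch's code; the offsets assume the standard 40-byte FDT header layout,
where all fields are 32-bit big-endian, the version is at byte 20 and the
struct block size (dt_struct_size above) is at byte 36.

```c
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu

/* Read a 32-bit big-endian value regardless of host endianness. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Store the size of the struct block in *size_out and return 0.
 * Returns -1 if the magic is wrong or the blob is older than v17,
 * because the size field does not exist before that version.
 */
int fdt_struct_size(const uint8_t *blob, uint32_t *size_out)
{
    if (be32_at(blob + 0) != FDT_MAGIC)
        return -1;
    if (be32_at(blob + 20) < 17)
        return -1;
    *size_out = be32_at(blob + 36);
    return 0;
}
```

A caller would pass at least the first 40 bytes of the blob and treat a
-1 return as a reason to use the old fetch method instead.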

If "fdt-fetch" is not available, the old method of fetching the DT is used.

[1] https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=e6fc84652c9c00
[2] https://git.qemu.org/?p=SLOF.git;a=commit;h=ecda95906930b80

Signed-off-by: Alexey Kardashevskiy <redacted>
Hrm.  I've gotta say I'm not terribly convinced that it's worth adding
a new interface we'll need to maintain to save 8s on a somewhat
contrived testcase.
256 CPUs aren't that many anymore though. Although I guess that many PCI
devices is still a little uncommon.

A 4-socket POWER8 or POWER9 can easily be that large, and a small test
kernel/userspace will boot in ~2.5-4 seconds. So it's possible that
the device tree fetch could be a surprisingly non-trivial percentage of
boot time, at least on some machines.
All client interface calls are really heavy, and you need to do a lot of
them if you have a big device tree.  This takes time, even if the linked
list stuff does not kill you :-)
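A rough way to see how the number of calls grows with the tree is the
sketch below, in C. It is a simplified cost model and not the actual
prom_init code: struct node and both function names are hypothetical, and
it assumes each node costs one "child" and one "peer" call, one
"nextprop" call per property plus one that reports the end of the list,
and one "getprop" call per property.

```c
#include <stddef.h>

/* Hypothetical in-memory stand-in for a firmware device tree node. */
struct node {
    size_t nprops;              /* number of properties on this node */
    const struct node *child;   /* first child, or NULL */
    const struct node *peer;    /* next sibling, or NULL */
};

/* Count the client interface calls for one full walk of the tree. */
size_t calls_per_pass(const struct node *n)
{
    size_t calls = 0;

    for (; n != NULL; n = n->peer) {
        calls += n->nprops + 1;   /* "nextprop" until no more properties */
        calls += n->nprops;       /* one "getprop" per property */
        calls += 2;               /* one "child" and one "peer" lookup */
        calls += calls_per_pass(n->child);
    }
    return calls;
}

/* The old method walks the tree twice: strings blob, then struct blob. */
size_t calls_for_two_passes(const struct node *root)
{
    return 2 * calls_per_pass(root);
}
```

Under this model every node and every property adds a fixed number of
firmware round trips, and the double traversal doubles the total, whereas
a single bulk fetch would replace the whole walk with one call.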


Segher