Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
From: Baoquan He <bhe@redhat.com>
Date: 2018-07-25 02:21:31
Also in:
kexec, linux-devicetree, linux-input, linux-pci, lkml, nvdimm
Hi Andrew, On 07/19/18 at 12:44pm, Andrew Morton wrote:
On Thu, 19 Jul 2018 23:17:53 +0800 Baoquan He [off-list ref] wrote:quoted
quoted
As far as I can tell, the above is the whole reason for the patchset, yes? To avoid confusing users.In fact, it's not just trying to avoid confusing users. Kexec loading and kexec_file loading are just do the same thing in essence. Just we need do kernel image verification on uefi system, have to port kexec loading code to kernel. Kexec has been a formal feature in our distro, and customers owning those kind of very large machine can make use of this feature to speed up the reboot process. On uefi machine, the kexec_file loading will search place to put kernel under 4G from top to down. As we know, the 1st 4G space is DMA32 ZONE, dma, pci mmcfg, bios etc all try to consume it. It may have possibility to not be able to find a usable space for kernel/initrd. From the top down of the whole memory space, we don't have this worry. And at the first post, I just posted below with AKASHI's walk_system_ram_res_rev() version. Later you suggested to use list_head to link child sibling of resource, see what the code change looks like. http://lkml.kernel.org/r/20180322033722.9279-1-bhe@redhat.com Then I posted v2 http://lkml.kernel.org/r/20180408024724.16812-1-bhe@redhat.com Rob Herring mentioned that other components which has this tree struct have planned to do the same thing, replacing the singly linked list with list_head to link resource child sibling. Just quote Rob's words as below. I think this could be another reason. ~~~~~ From Rob The DT struct device_node also has the same tree structure with parent, child, sibling pointers and converting to list_head had been on the todo list for a while. ACPI also has some tree walking functions (drivers/acpi/acpica/pstree.c). Perhaps there should be a common tree struct and helpers defined either on top of list_head or a ~~~~~ new struct if that saves some size.Please let's get all this into the changelogs?
Sorry for late reply because of some urgent customer hotplug issues. I am rewriting all change logs, and cover letter. Then found I was wrong about the 2nd reason. The current kexec_file_load calls kexec_locate_mem_hole() to go through all system RAM region, if one region is larger than the size of kernel or initrd, it will search a position in that region from top to down. Since kexec will jump to 2nd kernel and don't need to care the 1st kernel's data, we can always find a usable space to load kexec kernel/initrd under 4G. So the only reason for this patch is keeping consistent with kexec_load and avoid confusion. And since x86 5-level paging mode has been added, we have another issue for top-down searching in the whole system RAM. That is we support dynamic 4-level to 5-level changing. Namely a kernel compiled with 5-level support, we can add 'no5lvl' to force 4-level. Then jumping from a 5-level kernel to 4-level kernel, e.g we load kernel at the top of system RAM in 5-level paging mode which might be bigger than 64TB, then try to jump to 4-level kernel with the upper limit of 64TB. For this case, we need add limit for kexec kernel loading if in 5-level kernel. All this mess makes me hesitate to choose a deligate method. Maybe I should drop this patchset.
quoted
quoted
Is that sufficient? Can we instead simplify their lives by providing better documentation or informative printks or better Kconfig text, etc? And who *are* the people who are performing this configuration? Random system administrators? Linux distro engineers? If the latter then they presumably aren't easily confused!Kexec was invented for kernel developer to speed up their kernel rebooting. Now high end sever admin, kernel developer and QE are also keen to use it to reboot large box for faster feature testing, bug debugging. Kernel dev could know this well, about kernel loading position, admin or QE might not be aware of it very well.quoted
In other words, I'm trying to understand how much benefit this patchset will provide to our users as a whole.Understood. The list_head replacing patch truly involes too many code changes, it's risky. I am willing to try any idea from reviewers, won't persuit they have to be accepted finally. If don't have a try, we don't know what it looks like, and what impact it may have. I am fine to take AKASHI's simple version of walk_system_ram_res_rev() to lower risk, even though it could be a little bit low efficient.The larger patch produces a better result. We can handle it ;)
For this issue, if we stop changing the kexec top down searching code, I am not sure if we should post this replacing with list_head patches separately. Thanks Baoquan