Re: [RFC/RFT PATCH 2/5] memblock: introduce generic memblock_setup_resources()
From: Mike Rapoport <rppt@kernel.org>
Date: 2021-06-02 18:43:49
On Wed, Jun 02, 2021 at 04:51:41PM +0100, Russell King (Oracle) wrote:
> On Wed, Jun 02, 2021 at 04:54:17PM +0300, Mike Rapoport wrote:
> > On Wed, Jun 02, 2021 at 11:15:21AM +0100, Russell King (Oracle) wrote:
> > > On Wed, Jun 02, 2021 at 11:33:10AM +0300, Mike Rapoport wrote:
> > > > On Tue, Jun 01, 2021 at 02:54:15PM +0100, Russell King (Oracle) wrote:
> > > > > If I look at one of my kernels:
> > > > >
> > > > > c0008000 T _text
> > > > > c0b5b000 R __end_rodata
> > > > > ... exception and unwind tables live here ...
> > > > > c0c00000 T __init_begin
> > > > > c0e00000 D _sdata
> > > > > c0e68870 D _edata
> > > > > c0e68870 B __bss_start
> > > > > c0e995d4 B __bss_stop
> > > > > c0e995d4 B _end
> > > > >
> > > > > So the original covers _text..__init_begin-1 which includes the
> > > > > exception and unwind tables. Your version above omits these, which
> > > > > leaves them exposed.
> > > >
> > > > Right, this needs to be fixed. Is there any reason the exception and
> > > > unwind tables cannot be placed between _sdata and _edata?
> > > >
> > > > It seems to me that they were left outside for purely historical
> > > > reasons. Commit ee951c630c5c ("ARM: 7568/1: Sort exception table at
> > > > compile time") moved the exception tables out of .data section before
> > > > _sdata existed. Commit 14c4a533e099 ("ARM: 8583/1: mm: fix location of
> > > > _etext") moved _etext before the unwind tables and didn't bother to
> > > > put them into data or rodata areas.
> > >
> > > You can not assume that all sections will be between these symbols.
> > > This isn't specific to 32-bit ARM. If you look at x86's
> > > vmlinux.lds.in, you will see that BUG_TABLE and ORC_UNWIND_TABLE are
> > > after _edata, along with many other undiscarded sections before
> > > __bss_start.
> >
> > But if you look at x86's setup_arch() all these never make it to the
> > resource tree. So there are holes in /proc/iomem between the kernel
> > resources.
>
> Also true. However, my point was to counter your claim that these
> sections should be part of the .text/.data/.rodata etc sections in the
> output vmlinux.
>
> There is, however, a more important point. The __ex_table section
> must exist and be separate from the .text/.data/.rodata sections in
> the output ELF file, as sorttable (the exception table sorter) relies
> on this to be able to find the table and sort it.
>
> So, it isn't entirely "for historical reasons" as you said two
> messages ago.

Back when __ex_table was moved out of the .data section, _sdata and
_edata were part of the .data section. Today they are not. So something
like the patch below would ensure, for instance, that __ex_table is part
of "Kernel data" in /proc/iomem without moving it into the .data section:

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index f7f4620d59c3..2991feceab31 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -72,13 +72,6 @@ SECTIONS
 
 	RO_DATA(PAGE_SIZE)
 
-	. = ALIGN(4);
-	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
-		__start___ex_table = .;
-		ARM_MMU_KEEP(*(__ex_table))
-		__stop___ex_table = .;
-	}
-
 #ifdef CONFIG_ARM_UNWIND
 	ARM_UNWIND_SECTIONS
 #endif
@@ -143,6 +136,14 @@ SECTIONS
 	__init_end = .;
 
 	_sdata = .;
+
+	. = ALIGN(4);
+	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
+		__start___ex_table = .;
+		ARM_MMU_KEEP(*(__ex_table))
+		__stop___ex_table = .;
+	}
+
 	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
 	_edata = .;
 

> Now, bear in mind that /proc/iomem is a user API, one which userspace
> depends on. If we start going around making /proc/iomem report stuff
> like kernel boot time reservations as "reserved" memory, we will end
> up breaking the kexec tooling on some platforms.
>
> For example, kexec tooling for 32-bit ARM parses /proc/iomem, looking
> for "System RAM", "System RAM (boot alias)" and "reserved" regions.
>
> So, I think changes to make this "more consistent" come with high
> risk.
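
To make the concern in the quoted text concrete, here is a minimal sketch of
the kind of /proc/iomem parsing it describes. This is not the kexec-tools
code; the function names and the sample contents are hypothetical, and only
the three region names come from the message above. It assumes the usual
"start-end : name" layout with two spaces of indentation per nesting level.

```python
import re

# Region names that, per the quoted message, 32-bit ARM kexec tooling looks for.
WANTED = {"System RAM", "System RAM (boot alias)", "reserved"}

# Assumed line layout: optional indentation, "start-end : name", addresses in hex.
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return (depth, start, end, name) tuples from /proc/iomem-style text."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a resource line
        indent, start, end, name = m.groups()
        depth = len(indent) // 2  # assumes two spaces per nesting level
        regions.append((depth, int(start, 16), int(end, 16), name.strip()))
    return regions


def wanted_regions(text):
    """Keep only the regions whose name matches WANTED exactly."""
    return [r for r in parse_iomem(text) if r[3] in WANTED]


# Hypothetical sample; real contents depend on the system and kernel.
SAMPLE = """\
80000000-bfffffff : System RAM
  80008000-80bfffff : Kernel code
  80e00000-80e995d3 : Kernel data
  88000000-8bffffff : reserved
c0000000-c00fffff : some-device
"""

if __name__ == "__main__":
    for depth, start, end, name in wanted_regions(SAMPLE):
        print(f"{'  ' * depth}{start:08x}-{end:08x} {name}")
```

Because a parser like this selects regions by exact name, any extra or
renamed "reserved" entry changes what it sees, which is where the
compatibility risk comes from.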

I agree there is a risk, but I don't think it's high. It does not look
like minor changes in "reserved" reporting in /proc/iomem will break
kexec tooling. Anyway, the amount of reserved and free memory depends on
the particular system, kernel version, configuration and command line.

I have no intention of reporting kernel boot time reservations in
/proc/iomem on architectures that do not report them there today,
although this also does not seem like a significant factor.

On the other hand, making /proc/iomem reporting consistent among
architectures will allow reducing the complexity of both the kernel and
the kexec tools in the long run.

-- 
Sincerely yours,
Mike.