Thread: 21 messages, 5 authors, 2021-06-03

Re: [RFC/RFT PATCH 2/5] memblock: introduce generic memblock_setup_resources()

From: Mike Rapoport <rppt@kernel.org>
Date: 2021-06-02 18:43:49
Also in: linux-mips, linux-mm, linux-s390, lkml
Subsystem: arm port, the rest · Maintainers: Russell King, Linus Torvalds

On Wed, Jun 02, 2021 at 04:51:41PM +0100, Russell King (Oracle) wrote:
On Wed, Jun 02, 2021 at 04:54:17PM +0300, Mike Rapoport wrote:
On Wed, Jun 02, 2021 at 11:15:21AM +0100, Russell King (Oracle) wrote:
On Wed, Jun 02, 2021 at 11:33:10AM +0300, Mike Rapoport wrote:
On Tue, Jun 01, 2021 at 02:54:15PM +0100, Russell King (Oracle) wrote:
If I look at one of my kernels:

c0008000 T _text
c0b5b000 R __end_rodata
... exception and unwind tables live here ...
c0c00000 T __init_begin
c0e00000 D _sdata
c0e68870 D _edata
c0e68870 B __bss_start
c0e995d4 B __bss_stop
c0e995d4 B _end

So the original covers _text..__init_begin-1 which includes the
exception and unwind tables. Your version above omits these, which
leaves them exposed.
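
For a sense of scale, the symbol listing above can be turned into sizes. This is only a sketch: the addresses are copied from the listing, the helper name `span` is made up for illustration, and the gap between __end_rodata and __init_begin is assumed to hold the exception and unwind tables plus whatever alignment padding the linker added.

```python
# Symbol addresses copied from the listing above.
SYMBOLS = {
    "_text": 0xC0008000,
    "__end_rodata": 0xC0B5B000,
    "__init_begin": 0xC0C00000,
    "_sdata": 0xC0E00000,
    "_edata": 0xC0E68870,
}


def span(start, end):
    """Size in bytes of the half-open range [start, end) (illustrative helper)."""
    return SYMBOLS[end] - SYMBOLS[start]


# Range covered by the original resource: _text..__init_begin-1.
original_size = span("_text", "__init_begin")

# Region between the end of rodata and the start of init, where the
# listing says the exception and unwind tables live (padding included).
tables_gap = span("__end_rodata", "__init_begin")

print(f"_text..__init_begin-1   : {original_size:#x} bytes")
print(f"__end_rodata..__init_begin: {tables_gap:#x} bytes ({tables_gap // 1024} KiB)")
```

On this particular kernel the region in question is 0xa5000 bytes, so a resource that stops at __end_rodata would leave roughly 660 KiB uncovered.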
Right, this needs to be fixed. Is there any reason the exception and unwind
tables cannot be placed between _sdata and _edata? 

It seems to me that they were left outside for purely historical reasons.
Commit ee951c630c5c ("ARM: 7568/1: Sort exception table at compile time")
moved the exception tables out of .data section before _sdata existed.
Commit 14c4a533e099 ("ARM: 8583/1: mm: fix location of _etext") moved
_etext before the unwind tables and didn't bother to put them into data or
rodata areas.
You cannot assume that all sections will be between these symbols. This
isn't specific to 32-bit ARM. If you look at x86's vmlinux.lds.in, you
will see that BUG_TABLE and ORC_UNWIND_TABLE are after _edata, along
with many other undiscarded sections before __bss_start.
But if you look at x86's setup_arch(), none of these ever make it to the
resource tree. So there are holes in /proc/iomem between the kernel
resources.
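
To make "holes between the kernel resources" concrete, here is a minimal sketch that lists the gaps between consecutive child resources. The address ranges are hypothetical and not taken from any real x86 system, and `gaps_between` is an illustrative helper, not kernel code.

```python
def gaps_between(resources):
    """Return inclusive (start, end) gaps between consecutive resources.

    `resources` is a list of inclusive (start, end) pairs that do not overlap.
    """
    ordered = sorted(resources)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


# Hypothetical kernel resources under one "System RAM" entry.
kernel_resources = [
    (0x01000000, 0x01E00FFF),  # Kernel code
    (0x02000000, 0x0265FFFF),  # Kernel rodata
    (0x02800000, 0x02A3FFFF),  # Kernel data
    (0x02E00000, 0x02FFFFFF),  # Kernel bss
]

holes = gaps_between(kernel_resources)
for start, end in holes:
    print(f"hole: {start:#010x}-{end:#010x}")
```

Each hole is memory that belongs to the kernel image but has no resource of its own under the parent entry.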
Also true. However, my point was to counter your claim that these
sections should be part of the .text/.data/.rodata etc sections in the
output vmlinux.

There is, however, a more important point. The __ex_table section
must exist and be separate from the .text/.data/.rodata sections in
the output ELF file, as sorttable (the exception table sorter) relies
on this to be able to find the table and sort it.

So, it isn't entirely "for historical reasons" as you said two messages
ago.
Back when __ex_table was moved out of the .data section, _sdata and _edata
were part of the .data section. Today they are not. So something like the
patch below would ensure, for instance, that __ex_table is part of
"Kernel data" in /proc/iomem without moving it into the .data section:
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index f7f4620d59c3..2991feceab31 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -72,13 +72,6 @@ SECTIONS
 
 	RO_DATA(PAGE_SIZE)
 
-	. = ALIGN(4);
-	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
-		__start___ex_table = .;
-		ARM_MMU_KEEP(*(__ex_table))
-		__stop___ex_table = .;
-	}
-
 #ifdef CONFIG_ARM_UNWIND
 	ARM_UNWIND_SECTIONS
 #endif
@@ -143,6 +136,14 @@ SECTIONS
 	__init_end = .;
 
 	_sdata = .;
+
+	. = ALIGN(4);
+	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
+		__start___ex_table = .;
+		ARM_MMU_KEEP(*(__ex_table))
+		__stop___ex_table = .;
+	}
+
 	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
 	_edata = .;
 
 
Now, bear in mind that /proc/iomem is a user API, one which userspace
depends on. If we start going around making /proc/iomem report stuff
like kernel boot time reservations as "reserved" memory, we will end up
breaking the kexec tooling on some platforms. For example, kexec
tooling for 32-bit ARM parses /proc/iomem, looking for "System RAM",
"System RAM (boot alias)" and "reserved" regions.

So, I think changes to make this "more consistent" come with high
risk.
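
The dependency described above can be illustrated with a small parser. This is a sketch, not the kexec-tools implementation: the sample text is a hypothetical /proc/iomem excerpt, the helper names are invented, and the only things taken from this message are the three region names. It relies on the /proc/iomem line format of `start-end : name` in hex, indented two spaces per nesting level.

```python
import re

# Region names this message says 32-bit ARM kexec tooling looks for.
NAMES_OF_INTEREST = {"System RAM", "System RAM (boot alias)", "reserved"}

# Hypothetical /proc/iomem excerpt, for illustration only.
SAMPLE = """\
80000000-9fffffff : System RAM
  80008000-80bfffff : Kernel code
  80e00000-80e995d3 : Kernel data
a0000000-a00fffff : reserved
"""

LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Parse iomem-style text into (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a resource line
        indent, start, end, name = match.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return entries


def regions_of_interest(entries):
    """Keep only entries whose name exactly matches one of the known names."""
    return [entry for entry in entries if entry[3] in NAMES_OF_INTEREST]


entries = parse_iomem(SAMPLE)
for depth, start, end, name in regions_of_interest(entries):
    print(f"{start:#010x}-{end:#010x} {name}")
```

A consumer written this way matches on exact strings, so a region that starts or stops being reported under one of these names changes what it sees.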
I agree there is a risk, but I don't think it's high. It does not look like
minor changes in "reserved" reporting in /proc/iomem will break kexec
tooling. In any case, the amount of reserved and free memory depends on the
particular system, kernel version, configuration and command line.
I have no intention to report kernel boot time reservations
to /proc/iomem on architectures that do not report them there today,
although this also does not seem like a significant factor.

On the other hand, making /proc/iomem reporting consistent across
architectures would make it possible to reduce the complexity of both the
kernel and the kexec tools in the long run.

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel