Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
From: Nathan Fontenot <hidden>
Date: 2010-08-16 14:34:17
Also in:
linux-mm, lkml
On 08/12/2010 02:08 PM, Andrew Morton wrote:
> On Mon, 09 Aug 2010 12:53:00 -0500 Nathan Fontenot [off-list ref] wrote:
>> This set of patches de-couples the idea that there is a single directory in sysfs for each memory section. The intent of the patches is to reduce the number of sysfs directories created to resolve a boot-time performance issue. On very large systems boot times are getting very long (as seen on powerpc hardware) due to the enormous number of sysfs directories being created. On a system with 1 TB of memory we create ~63,000 directories. For even larger systems boot times are being measured in hours.
> And those "hours" are mainly due to this problem, I assume.
Yes, those hours are spent creating the sysfs directories for each of the memory sections.
>> This set of patches allows for each directory created in sysfs to cover more than one memory section. The default behavior for sysfs directory creation is the same, in that each directory represents a single memory section. A new file 'end_phys_index' in each directory contains the physical_id of the last memory section covered by the directory so that users can easily determine the memory section range of a directory.
> What you're proposing appears to be a non-back-compatible userspace-visible change. This is a big issue! It's not an unresolvable issue, as this is a must-fix problem. But you should tell us what your proposal is to prevent breakage of existing installations. A Kconfig option would be good, but a boot-time kernel command line option which selects the new format would be much better.
This shouldn't break existing installations, unless an architecture chooses to do so. With my patch only the powerpc/pseries arch is updated such that what is seen in userspace is different. The default behavior is maintained for all architectures unless they define their own version of memory_block_size_bytes(). The default definition of this routine (defined as __weak in Patch 5/8) sets the memory block size to the same size it currently is, thus preserving the existing one sysfs directory per memory section. The only change that will be seen is a new property for each memory section, 'end_phys_index', which will have the same value as the existing 'phys_index' property.
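A minimal userspace sketch of the weak-default arrangement described above, not the code from Patch 5/8. The SECTION_SIZE_BYTES macro, its 16 MB value, and the two helper functions are hypothetical names chosen for illustration; only memory_block_size_bytes() is taken from the patchset. It also ignores memory holes.

```c
#include <assert.h>

/* Hypothetical stand-in for the size of one memory section (illustrative value). */
#define SECTION_SIZE_BYTES (16UL << 20)

/*
 * Generic default: a memory block is exactly one memory section.
 * Being weak, it is replaced at link time if an architecture
 * provides its own non-weak definition with the same name.
 */
__attribute__((weak)) unsigned long memory_block_size_bytes(void)
{
	return SECTION_SIZE_BYTES;
}

/* Hypothetical helper: number of memory sections covered by one sysfs directory. */
static unsigned long sections_per_block(void)
{
	unsigned long size = memory_block_size_bytes();

	/* A block is assumed to be a whole number of sections. */
	assert(size >= SECTION_SIZE_BYTES && size % SECTION_SIZE_BYTES == 0);
	return size / SECTION_SIZE_BYTES;
}

/* Hypothetical helper: last section index covered by the directory starting at phys_index. */
static unsigned long end_phys_index(unsigned long phys_index)
{
	return phys_index + sections_per_block() - 1;
}
```

With only the weak default linked in, sections_per_block() is 1 and end_phys_index(n) equals n, which matches the unchanged one-directory-per-section layout. An architecture that returns a larger block size would see each directory span several sections.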
> However you didn't mention this issue at all, and it's the most important one.
>> Updates for version 5 of the patchset include the following:
>>
>> Patch 4/8 Add mutex for add/remove of memory blocks
>> - Define the mutex using DEFINE_MUTEX macro.
>>
>> Patch 8/8 Update memory-hotplug documentation
>> - Add information concerning memory holes in phys_index..end_phys_index.
> And you forgot to tell us how long those machines boot with the patchset applied, which is the entire point of the patchset!
Yes, I am working on getting more time on our large systems to get performance numbers with this patch. I'll post them when I get them.

-Nathan