Re: [PATCH 1/8] pseries: phyp dump: Documentation
From: Michael Ellerman <hidden>
Date: 2008-01-09 04:58:35
On Tue, 2008-01-08 at 22:29 -0600, Nathan Lynch wrote:
> Manish Ahuja wrote:
> > +
> > +                   Hypervisor-Assisted Dump
> > +                   ------------------------
> > +                       November 2007
>
> Date is unneeded (and, uhm, dated :)
>
> > +The goal of hypervisor-assisted dump is to enable the dump of
> > +a crashed system, and to do so from a fully-reset system, and
> > +to minimize the total elapsed time until the system is back
> > +in production use.
>
> Is it actually faster than kdump?
>
> > +As compared to kdump or other strategies, hypervisor-assisted
> > +dump offers several strong, practical advantages:
> > +
> > +-- Unlike kdump, the system has been reset, and loaded
> > +   with a fresh copy of the kernel. In particular,
> > +   PCI and I/O devices have been reinitialized and are
> > +   in a clean, consistent state.
> > +-- As the dump is performed, the dumped memory becomes
> > +   immediately available to the system for normal use.
> > +-- After the dump is completed, no further reboots are
> > +   required; the system will be fully usable, and running
> > +   in its normal, production mode on its normal kernel.
> > +
> > +The above can only be accomplished by coordination with,
> > +and assistance from the hypervisor. The procedure is
> > +as follows:
> > +
> > +-- When a system crashes, the hypervisor will save
> > +   the low 256MB of RAM to a previously registered
> > +   save region. It will also save system state, system
> > +   registers, and hardware PTE's.
> > +
> > +-- After the low 256MB area has been saved, the
> > +   hypervisor will reset PCI and other hardware state.
> > +   It will *not* clear RAM. It will then launch the
> > +   bootloader, as normal.
> > +
> > +-- The freshly booted kernel will notice that there
> > +   is a new node (ibm,dump-kernel) in the device tree,
> > +   indicating that there is crash data available from
> > +   a previous boot. It will boot into only 256MB of RAM,
> > +   reserving the rest of system memory.
> > +
> > +-- Userspace tools will parse /sys/kernel/release_region
> > +   and read /proc/vmcore to obtain the contents of memory,
> > +   which holds the previous crashed kernel. The userspace
> > +   tools may copy this info to disk, or network, nas, san,
> > +   iscsi, etc. as desired.
> > +
> > +   For Example: the values in /sys/kernel/release_region
> > +   would look something like this (address-range pairs).
> > +   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
> > +   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
> > +
> > +-- As the userspace tools complete saving a portion of
> > +   dump, they echo an offset and size to
> > +   /sys/kernel/release_region to release the reserved
> > +   memory back to general use.
> > +
> > +   An example of this is:
> > +     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
> > +   which will release 256MB at the 1GB boundary.
>
> This violates the "one file, one value" rule of sysfs, but nobody
> really takes that seriously, I guess.  In any case, consider
> documenting this in Documentation/ABI.
>
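
For illustration only, here is a rough sketch (Python, not part of the
patch) of what a userspace tool would have to do with the multi-value
format quoted above. It assumes each pair is "start address-length",
which the text does not actually spell out, and that section labels are
upper-case words followed by a colon. The names parse_release_region()
and release_command() are made up for this example.

```python
import re

# Example value quoted in the patch, with the "/" continuation joined up.
EXAMPLE = ("CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: "
           "DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A")

# Either a section label ("CPU:") or one hex pair ("0x...-0x...").
TOKEN = re.compile(r"([A-Z]+):|0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)")


def parse_release_region(text):
    """Return {label: [(first, second), ...]} for each labelled section.

    Assumption: the two numbers in a pair are (start address, length).
    """
    regions = {}
    label = None
    for match in TOKEN.finditer(text):
        if match.group(1):
            # New section label; later pairs belong to it.
            label = match.group(1)
            regions.setdefault(label, [])
        elif label is not None:
            pair = (int(match.group(2), 16), int(match.group(3), 16))
            regions[label].append(pair)
    return regions


def release_command(start, size):
    """Build the shell command the document gives for releasing memory."""
    return "echo 0x%x 0x%x > /sys/kernel/release_region" % (start, size)


regions = parse_release_region(EXAMPLE)

if __name__ == "__main__":
    for name, pairs in regions.items():
        for first, second in pairs:
            print("%-5s 0x%x 0x%x" % (name, first, second))
    # The document's own example: 256MB released at the 1GB boundary.
    print(release_command(1 << 30, 256 << 20))
```

Having to write a tokenizer like this just to read one sysfs file is
the practical cost of departing from one value per file.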
> > +
> > +Please note that the hypervisor-assisted dump feature
> > +is only available on Power6-based systems with recent
> > +firmware versions.
>
> This statement will of course become dated/incorrect so I recommend
> removing it.
>
> > +
> > +Implementation details:
> > +----------------------
> > +In order for this scheme to work, memory needs to be reserved
> > +quite early in the boot cycle. However, access to the device
> > +tree this early in the boot cycle is difficult, and device-tree
> > +access is needed to determine if there is crash data waiting.
>
> I don't think this bit about early device tree access is correct.  By
> the time your code is reserving memory (from early_init_devtree(), I
> think), RTAS has been instantiated and you are able to test for the
> existence of /rtas/ibm,dump-kernel.

Yep it's early_init_devtree(), and yes it's fairly easy to access the
(flattened) device tree at that point.

> > +To work around this problem, all but 256MB of RAM is reserved
> > +during early boot. A short while later in boot, a check is made
> > +to determine if there is dump data waiting. If there isn't,
> > +then the reserved memory is released to general kernel use.
>
> So I think these gymnastics are unneeded -- unless I'm
> misunderstanding something, you should be able to determine very early
> whether to reserve that memory.

I agree.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person