Re: [PATCH V4] powerpc/prom: Export device tree physical address via proc
From: Matthew McClintock <hidden>
Date: 2010-07-15 18:58:22
On Thu, 2010-07-15 at 12:37 -0600, Grant Likely wrote:
> On Thu, Jul 15, 2010 at 12:03 PM, Matthew McClintock [off-list ref] wrote:
> On Thu, 2010-07-15 at 10:57 -0600, Grant Likely wrote:
> On Thu, Jul 15, 2010 at 10:39 AM, Matthew McClintock [off-list ref] wrote:
> On Thu, 2010-07-15 at 10:22 -0600, Grant Likely wrote:
> Thanks for taking a look. My first thought was to just blow away all the memreserve regions and start over. But, there are reserve regions for other things that I might not want to blow away. For example, on mpc85xx SMP systems we have an additional reserve region for our boot page.
>
> What is your starting point? Where does the device tree (and memreserve list) come from that you're passing to kexec? My first impression is that if you have to scrub the memreserve list, then the source being used to obtain the memreserves is either faulty or unsuitable to the task.
>
> I'm pulling the device tree passed in via u-boot and passing it to kexec.
>
> How? (what mechanism?) I hope you're not using the debugfs flat-device-tree file.
>
> That is one way to get a good working copy. What is wrong with this mechanism?
>
> It's unstable. It is in the debugfs, so there are no guarantees that the ABI will remain the same. Plus it doesn't reflect any changes that the kernel may make to the device tree. That interface is *debug only*. Do not use it.
Ok.
> Should we duplicate everything u-boot does in kexec to build up a flat device tree? Or is there another way to get a good tree?
>
> That is one option. U-Boot really shouldn't be modifying the tree very much anyway (I know on some platforms U-Boot is almost creating a tree from scratch, but that is insane and an entirely different discussion). /proc/device-tree always gives the kernel's current view of the tree. You can use dtc to extract it and write it into a dtb.
Ok wow, I've missed this completely. dtc to extract the device tree is a very good option. I will pursue that line of thinking.
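A minimal sketch of that extraction step, assuming dtc is installed and /proc/device-tree is readable. The helper names and the current.dtb output path are placeholders for illustration, not anything from kexec-tools; -I fs, -O dtb and -o are standard dtc options.

```python
import subprocess


def dtc_extract_argv(source="/proc/device-tree", output="current.dtb"):
    """Build the dtc command line that flattens a filesystem-style tree."""
    # -I fs : the input is a directory tree such as /proc/device-tree
    # -O dtb: the output is a flattened device tree blob
    return ["dtc", "-I", "fs", "-O", "dtb", "-o", output, source]


def extract_running_tree(output="current.dtb"):
    """Run dtc on the live tree; raises CalledProcessError if dtc fails."""
    subprocess.run(dtc_extract_argv(output=output), check=True)
    return output
```

The resulting blob reflects the kernel's current view of the tree, so it would still need the memreserve fixups discussed below before being handed to kexec.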
> Ideally, we don't make the end user manually edit a device tree.
>
> Of course not, any device tree manipulation is the job of the kexec tools. None of this should be manual. However, the data source is a significant and important question.
Ideally, we don't duplicate this in kexec and u-boot. Right now there is nothing specific to, say, mpc85xx in kexec; it's just ppc32. I would prefer it stay this way.
> It is the most complete device tree and requires the least amount of fixup. I have to scrub two items, the ramdisk/initrd and the device tree because upon kexec'ing the kernel we have the ability to pass in new ramdisk/initrd and device tree. They can also live at different physical addresses for the second reboot.
>
> This sounds like the model is backwards. Rather than scrubbing items, the memreserve list should be built up from a known good source.
>
> You can build one up yourself and it will still work out fine. Or you can pull one from debugfs to get yourself started. Or you can pull it every time.
>
> What do you mean by "pull it every time"?
Exactly what you are saying is bad to do ;-P. Pull it from debugfs. But the above "dtc -I fs" solution practically fixes that issue.
> Out of curiosity, what is responsible for building up the memreserve list? The userspace portion, or the kernel portion of kexec? Or is it done by a totally separate program?
Currently, neither. I have submitted patches for the user space tool to fixup the memreserve regions.
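For context, the entries such a tool has to fix up live in the blob's memory reservation block. A rough sketch of reading them, assuming the standard flattened device tree layout (big-endian header with off_mem_rsvmap at byte offset 16, then 64-bit address/size pairs terminated by an all-zero entry); read_memreserve is a hypothetical helper name, not the code from the submitted patches.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # magic number at the start of a flattened device tree


def read_memreserve(blob):
    """Return the (address, size) pairs from a dtb's memory reservation block."""
    (magic,) = struct.unpack_from(">I", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("not a flattened device tree")
    # off_mem_rsvmap is the fifth 32-bit header field (byte offset 16).
    (offset,) = struct.unpack_from(">I", blob, 16)
    entries = []
    while True:
        address, size = struct.unpack_from(">QQ", blob, offset)
        if address == 0 and size == 0:  # an all-zero entry terminates the list
            break
        entries.append((address, size))
        offset += 16
    return entries
```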
> The initrd addresses are already exposed, so we can update/remove/reuse that entry, we just need a way for kexec to determine the current device tree address so it can replace the correct memreserve region for the kexec'ing kernels' device tree. The whole problem comes from repeatedly kexec'ing, we need to make sure we don't keep losing blobs of memory to reserve regions (so we can't just blindly add). We also need to make sure we don't lose other memreserve regions that might be important for other things (so we can't just blow them all away).
>
> Right, so you need to have a known-good list of reserve sections. Trying to go the other way sounds very fragile.
>
> Yes. Where would we get a list of memreserve sections?
>
> I would say the list of reserves that are not under the control of Linux should be explicitly described in the device tree proper. For instance, if you have a region that firmware depends on, then have a node for describing the firmware and a property stating the memory regions that it depends on. The memreserve regions can be generated from that.
Ok, so we could traverse the tree node-by-node for a persistent-memreserve property and add them to the /memreserve/ list in the kexec user space tools?
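Roughly, that traversal could look like the sketch below. It assumes the tree has already been parsed into nested dicts with "props" and "children" keys, and that each persistent-memreserve property decodes to a list of (address, size) pairs; the dict layout, the function names and the example addresses are all made up for illustration, and the property name itself is only the proposal from this thread, not an existing binding.

```python
def collect_persistent_reserves(node, path="/"):
    """Walk the tree and return (node path, address, size) for every region."""
    found = []
    for address, size in node.get("props", {}).get("persistent-memreserve", []):
        found.append((path, address, size))
    for name, child in node.get("children", {}).items():
        child_path = path.rstrip("/") + "/" + name
        found.extend(collect_persistent_reserves(child, child_path))
    return found


def build_memreserve(tree, extra=()):
    """Rebuild the /memreserve/ list from scratch: tree regions plus new ones."""
    regions = {(address, size) for _, address, size in collect_persistent_reserves(tree)}
    regions.update(extra)  # e.g. the new initrd and dtb for the next kernel
    return sorted(regions)


# Hypothetical tree: a firmware node that owns a 4 KiB boot page.
tree = {
    "props": {},
    "children": {
        "firmware": {
            "props": {"persistent-memreserve": [(0x7FFFF000, 0x1000)]},
            "children": {},
        }
    },
}
reserves = build_memreserve(tree, extra=[(0x01000000, 0x4000)])
```

Because the list is rebuilt from the tree on every kexec rather than edited in place, repeated kexecs do not accumulate stale entries, and the board-dictated regions are never dropped.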
> Should we export the reserve sections instead of the device tree location?
>
> It shouldn't really be something that the kernel is explicitly exporting because it is a characteristic of the board design. It is something that belongs in the tree-proper. ie. when you extract the tree you have data telling what the region is, and why it is reserved.
Agreed.
> We just need a way to preserve what was there at boot to pass to the new kernel.
>
> Yet there is no differentiation between the board-dictated memory reserves and the things that U-Boot/Linux made an arbitrary decision on. The solution should focus not on "can I throw this one away?" but rather "Is this one I should keep?" :-) A subtle difference, I know, but it changes the way you approach the solution.
Fair enough. I think the above solution will work nicely, and I can start implementing something if you agree - if I interpreted your idea correctly. Although it should not require any changes to the kernel proper.

-M