Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Ming Lei <hidden>
Date: 2012-11-28 13:51:17
Also in:
linux-mm, lkml, netdev
On Wed, Nov 28, 2012 at 6:06 PM, Rafael J. Wysocki [off-list ref] wrote:
Well, it may be unfrequent, but does it mean it has to do things that may be avoided (ie. walking the children of every node in the path in some cases)?
I agree so without introducing extra cost, :-)
I don't really think that the counters would cost us that much anyway.
On ARM v7, sizeof(struct device) becomes 376 from 368 after introducing 'unsigned int noio_cnt;' to 'struct dev_pm_info', and total memory increases about 3752bytes in a small configuration(about 494 device instance). The actual memory increase should be more than the data because 'struct device' is generally embedded into other concrete device structure.
quoted
Also looks the current implementation of pm_runtime_set_memalloc_noio() is simple and clean enough with the flag, IMO.I know you always know better. :-)
We still need to consider cost and the function calling frequency, :-)
quoted
quoted
I would use the flag only to store the information that pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly and I'd use a counter for everything else. That is, have power.memalloc_count that would be incremented when (1) pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when power.memalloc_count for one of its children changes from 0 to 1 (and analogously for decrementation). Then, check the counter in rpm_callback().Sorry, could you explain in a bit detail why we need the counter? Looks only checking the flag in rpm_callback() is enough, doesn't it?Why would I want to use power.memalloc_count in addition to the power.memalloc_noio flag? Consider this: pm_runtime_set_memalloc_noio(dev): return if power.memalloc_noio is set set power.memalloc_noio loop: increment power.memalloc_count if power.memalloc_count is 1 now switch to parent and go to loop
I am wondering if the above should be changed to below because the child
count of memalloc_noio device need to be recorded.
pm_runtime_set_memalloc_noio(dev):
return if power.memalloc_noio is set
set power.memalloc_noio
loop:
increment power.memalloc_count
switch to parent and go to loop
So pm_runtime_set_memalloc_noio(dev) will become worse than
the improved pm_runtime_set_memalloc_noio(dev, true), which
can return immediately if one dev or parent's flag is true.
pm_runtime_clear_memalloc_noio(dev):
return if power.memalloc_noio is unset
unset power.memalloc_noio
loop:
decrement power.memalloc_count
if power.memalloc_count is 0 now switch to parent and go to loop
The above will perform well than pm_runtime_set_memalloc_noio(dev, false),
because the above avoids to walk children of device.
So one becomes worse and another becomes better, :-)
Also the children count of one device is generally very small, less than
10 for most devices, see the data obtained in one common x86 pc(thinkpad
t410) from below link:
http://kernel.ubuntu.com/~ming/up/t410-dev-child-cnt.log
- about 8 devices whose child count is more than 10, top three are 18, 17 ,12,
and all the three are root devices.
- about 117 devices whose child count is between 1 and 9
- other 501 devices whose child count is zero
From above data, walking device children should have not much effect on
performance of pm_runtime_set_memalloc_noio(), which is also called in very infrequent path.
Looks kind of simpler, doesn't it?
Looks simpler, but more code lines than single pm_runtime_set_memalloc_noio(), :-)
And why rpm_callback() should check power.memalloc_count instead of the count? Because power.memalloc_noio will only be set for devices that pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for the parents). And that works even if someone calls any of them twice in a row for the same device (presumably by mistake) and doesn't have to make any assumptions about devices it is called for.
IMO, we can ignore the mistake usage because the function is called only in network/block core code currently, not by individual driver.
quoted
quoted
Besides, don't you need to check children for the arg device itself?It isn't needed since the children of network/block device can't be involved of the deadlock in runtime PM path. Also, the function is only called by network device or block device subsystem, both the two kind of device are class device and should have no children.OK, so not walking the arg device's children is an optimization related to some assumptions regarding who's supposed to use this routine. That should be clearly documented.
I think the patch already documents it in the comment of pm_runtime_set_memalloc_noio(). Thanks, -- Ming Lei -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>