Re: 回复: [PATCH 01/10] tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value
From: Zhou Ti (x2019cwm) <hidden>
Date: 2021-03-29 14:49:52
Also in:
lkml
On Mon 2021-03-29 8:45, Rafael J. Wysocki wrote:
On Fri, Mar 26, 2021 at 11:53 PM Zhou Ti (x2019cwm) [off-list ref] wrote:quoted
On Fri, 26 Mar 2021 19:54:26 +0100, Rafael J. Wysocki wrote:quoted
On Fri, Mar 26, 2021 at 6:53 PM Zhou Ti (x2019cwm) [off-list ref] wrote:quoted
On Fri, 26 Mar 2021 18:01:47 +0100, Rafael J. Wysocki wrote:quoted
On Thu, Mar 25, 2021 at 9:37 PM Zhou Ti (x2019cwm) [off-list ref] wrote:quoted
On March 25, 2021 15:50, Rafael J. Wysocki wrote:quoted
On Thu, Mar 25, 2021 at 8:18 PM Zhou Ti (x2019cwm) [off-list ref] wrote:quoted
On March 25, 2021 14:56, Rafael J. Wysocki wrote:quoted
On Thursday, March 25, 2021 2:14:00 PM CET Frederic Weisbecker wrote:quoted
On Tue, Mar 16, 2021 at 04:08:08PM +0000, Zhou Ti (x2019cwm) wrote:quoted
But I don't think it's a good idea to handle this in callers, because logically the function shouldn't return negative values. Returning 0 directly would allow idle governors to get another chance to select again.Hmm, I'm going to leave the last word to Rafael since cpuidle are the only callers of this. In any case we need to fix it.Yes, we do. So I said that I preferred to address this in the callers and the reason why is because, for example, for the teo governor it would be a matter of using a different data type to store the tick_nohz_get_sleep_length() return value, like in the (untested) patch below. So at least in this case there is no need to add any new branches anywhere. I'm still not sure about menu, because it is more complicated, but even if that one needs an extra branch, that is a win already.I would like to point out the potential trouble that fixing this issue in the callers could cause. 1. This function is called multiple times in menu governor and TEO governor.What do you mean by "multiple times"? Each of the governors calls it once per cycle and its previous return value is not used in the next cycle at least in teo.I remember a governor called this function twice in a cycle, I guess I remember wrong.That obviously depends on the governor, but both teo and menu call it once per cycle.quoted
quoted
quoted
I'm not sure that receiving results using signed integers is enough to solve all the problems, in the worst case it may require increasing the logical complexity of the code.That is a valid concern, so it is a tradeoff between increasing the logical complexity of the code and adding branches to it.quoted
2. This function is important for developing idle governor. If the problem is not fixed in the function itself, then this potential pitfall should be explicitly stated in the documentation.That I can agree with.quoted
This is because it is difficult to predict from the definition and naming of the function that it might return a negative number. I actually discovered this anomaly when I was doing data analysis on my own idle governor. For some idle control algorithms, this exception return could lead to serious consequences, because negative return logically won't happen.Well, it's a matter of how to take the possible negative return value into account so it does not affect the result of the computations.I think it is challenging for some algorithms to take negative return values into account properly. For TEO (and even menu), it is possible to solve the problem by just changing the way the data is received is because the learning mechanism for both algorithms is simple.Of course this depends on the governor.quoted
One of the interesting things about the CPUIdle subsystem is that it is well suited to introduce machine learning and probabilistic statistical methods.You need to remember that the governor code runs in the idle loop context which is expected to be reasonably fast. That's why we are worrying about individual branches here.quoted
This means that many of the more complex and data-sensitive algorithms can potentially be explored. In the best case we will still need to add additional code complexity to a new algorithm.So I'm not sure what the problem with adding an upfront negative value check to the governor is.quoted
It would reduce a lot of unnecessary considerations (for example, highlight this shortcoming in the documentation) if we could ensure that this function would work as it is logically defined. But I don't really understand how much of a burden adding an extra branch would impose, so I don't know if this tradeoff is worth it.It ultimately depends on the governor, which is why I think that the negative value check should be done by the governor, if needed, and not by the function called by it, because in the latter case the check may be redundant and we end up with an extra branch (or two branches in this particular case) for no good reason whatsoever. Yes, there are governors which simply can do the negative value check upfront right after calling that function and ensure that they will not deal with negative values going forward. This is probably what I'll do in the menu case. However, if the governor is simple enough and it can avoid doing the explicit negative value check, I don't see a reason to do that check elsewhere "just in case".Makes sense. I will submit my patch to fix this issue in menu and TEO.Well, I have patches for that already and they are not super-straightforward. Though If you want to try to fix this yourself, I'll wait for your submission.Thanks! I really like this subsystem, so I hope to contribute a little. I still have some questions. For TEO governor: Even if we change some datatypes as your patch did before, some explicit type conversions still need to be added to prevent wrong results.That's true. That patch was way incomplete and I would start with changing the data type of exit_latency_ns and target_residency_ns in struct cpuidle_state to s64 for this reason.quoted
For example: line 276 (because 1u > -1 will be false) line 329 line 422 Since some of them are in a loop, does the overhead caused by the type conversions worth it? or do I need to do some pre-processing to avoid duplicate conversions (which may cause additional space overhead) ?Well, it looks like it may be better if I send my patches after all, because it will take less time overall than me explaining here what I would do. Let me do that.quoted
For menu governor: If we simply change the datatypes, the conversions required are even more.I would add a negative return value check to menu as I said before. Not really AFAICS, but again let me send a patch to illustrate my point.
Sure! I will study your patches carefully.