Thread: 4 messages, 2 authors, 2007-03-23

Re: [PATCH RFC] Change softlockup watchdog to ignore stolen time

From: Zachary Amsden <hidden>
Date: 2007-03-22 23:51:42
Also in: lkml

Jeremy Fitzhardinge wrote:
> The softlockup watchdog is currently a nuisance in a virtual machine,
> since the whole system could have the CPU stolen from it for a long
> period of time.  While it would be unlikely for a guest domain to be
> denied timer interrupts for over 10s, it could happen and any softlockup
> message would be completely spurious.

No, it is not unlikely.  4-way SMP VMs idling exhibit this behavior with 
NO_HZ or NO_IDLE_HZ because they get quiet enough to schedule nothing on 
the APs.

And that can happen on native hardware as well.

> Earlier I proposed that sched_clock() return time in unstolen
> nanoseconds, which is how Xen and VMI currently implement it.  If the
> softlockup watchdog uses sched_clock() to measure time, it would
> automatically ignore stolen time, and therefore only report when the
> guest itself locked up.  When running native, sched_clock() returns
> real-time nanoseconds, so the behaviour would be unchanged.
>
> Does this seem sound?
>
> Also, softlockup.c's use of jiffies seems archaic now.  Should it be
> converted to use timers?  Mightn't it report lockups just because there
> was no timer event?

This looks good to me, as a first order approximation.  But on native 
hardware, with NO_HZ, this is just broken to begin with.  Perhaps we 
should make SOFTLOCKUP depend on !NO_HZ.

Zach
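
The proposal quoted above, measuring the watchdog interval on a clock
that excludes stolen time, can be sketched in plain userspace C.  This
is only an illustrative model, not the kernel patch: struct vcpu_clock,
unstolen_clock_ns(), vcpu_advance() and the watchdog_*() helpers are
hypothetical names invented for the example, and unstolen_clock_ns()
merely stands in for sched_clock() as described in the thread.  The 10s
threshold is the figure mentioned in the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC            1000000000ULL
#define SOFTLOCKUP_THRESHOLD_NS (10 * NSEC_PER_SEC) /* 10s, as in the thread */

/* Hypothetical model of one virtual CPU's clocks; not a kernel structure. */
struct vcpu_clock {
    uint64_t wall_ns;   /* real time elapsed */
    uint64_t stolen_ns; /* time the hypervisor spent running something else */
};

/* Hypothetical watchdog state: when the guest last proved it was alive. */
struct watchdog {
    uint64_t last_touch_ns;
};

/* Stand-in for sched_clock() as proposed: nanoseconds the guest really ran. */
static uint64_t unstolen_clock_ns(const struct vcpu_clock *c)
{
    return c->wall_ns - c->stolen_ns;
}

/* Advance the model: the guest runs for ran_ns and is descheduled for stolen_ns. */
static void vcpu_advance(struct vcpu_clock *c, uint64_t ran_ns, uint64_t stolen_ns)
{
    c->wall_ns += ran_ns + stolen_ns;
    c->stolen_ns += stolen_ns;
}

/* Record "still alive" using the unstolen clock. */
static void watchdog_touch(struct watchdog *w, const struct vcpu_clock *c)
{
    w->last_touch_ns = unstolen_clock_ns(c);
}

/* Report a lockup only if the guest itself ran for longer than the threshold. */
static bool watchdog_lockup(const struct watchdog *w, const struct vcpu_clock *c)
{
    return unstolen_clock_ns(c) - w->last_touch_ns > SOFTLOCKUP_THRESHOLD_NS;
}
```

In this model, a guest that runs for 1s while 30s are stolen from it
does not trip the watchdog, whereas a further 10s of actual guest
execution without a touch does.  With stolen_ns left at zero the
unstolen clock equals wall time, which mirrors the point that native
behaviour would be unchanged.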