Thread (25 messages) 25 messages, 7 authors, 2016-07-30

Re: [Query] Preemption (hogging) of the work handler

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: 2016-07-14 22:12:57
Also in: lkml

Possibly related (same subject, not in this thread)

On 14-07-16, 16:12, Jan Kara wrote:
Exactly. Calling printk() from certain parts of the kernel (like scheduler
code or timer code) has been always unsafe because printk itself uses these
parts and so it can lead to deadlocks. That's why printk_deffered() has
been introduced as you mention below.

And with sync printk the above deadlock doesn't trigger only by chance - if
there happened to be a waiter on console_sem while we suspend, the same
deadlock would trigger because up(&console_sem) will try to wake him up and
the warning in timekeeping code will cause recursive printk.

So I think your patch doesn't really address the real issue - it only
works around the particular WARN_ON(timekeeping_enabled) warning but if
there was a different warning in timekeeping code which would trigger, it
has a potential for causing recursive printk deadlock (and indeed we had
such issues previously - see e.g. 504d58745c9c "timer: Fix lock inversion
between hrtimer_bases.lock and scheduler locks").

So there are IMHO two issues here worth looking at:

1) I didn't find how a wakeup would would lead to calling to ktime_get() in
the current upstream kernel or even current RT kernel. Maybe this is a
problem specific to the 3.10 kernel you are using? If yes, we don't have to
do anything for current upstream AFAIU.
I haven't checked that earlier, but I see the path in both 3.10 and mainline.

vprintk_emit
 -> wake_up_process
  -> try_to_wake_up
   -> ttwu_queue
    -> ttwu_do_activate
     -> ttwu_activate
      -> activate_task
       -> enqueue_task (sched/core.c)
        -> enqueue_task_rt (rt.c)
         -> enqueue_rt_entity
          -> __enqueue_rt_entity
           -> inc_rt_tasks
            -> inc_rt_group
             -> start_rt_bandwidth
              -> start_bandwidth_timer
               -> __hrtimer_start_range_ns
                -> ktime_get()
If I just missed how wakeup can call into ktime_get() in current upstream,
there is another question:

2) Is it OK that printk calls wakeup so late during suspend?
To clarify again to everybody, we are talking about the place where all non-boot
CPUs are already hot-unplugged and the last running one has disabled interrupts.

I believe that we can't do migration at all now, right? What will we get by
calling wake_up_process() now anyway ?
I believe it
is but I'm neither scheduler nor suspend expert. If it is OK, and wakeup
can lead to ktime_get() in current upstream, then this contradicts the
check WARN_ON(timekeeping_suspended) in ktime_get() and something is wrong.

Adding Thomas to CC as timer / RT expert...
Thanks.

-- 
viresh
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help