Re: CONFIG_PREEMPT_RT local_softirq_pending warning when ISR blocks

From: Brian Silverman <hidden>
Date: 2015-03-10 00:37:06
Also in: lkml

On Mon, Mar 9, 2015 at 12:08 PM, Sebastian Andrzej Siewior
[off-list ref] wrote:

* Brian Silverman | 2015-03-05 00:16:20 [-0500]:

quoted

Beforehand, 000 is spending most of its time in interrupts, but bash
is doing something related to memory management on it in between.
           bash-14721 [000] ......1  6854.629126: rt_spin_lock <-free_pcppages_bulk
           bash-14721 [000] ....1.1  6854.629126: mm_page_pcpu_drain: page=ffffea000fa1aa40 pfn=4097705 order=0 migratetype=1
           bash-14721 [000] ......1  6854.629127: get_pfnblock_flags_mask <-free_pcppages_bulk
           bash-14721 [000] ......1  6854.629127: __mod_zone_page_state <-free_pcppages_bulk
           bash-14721 [000] ....1.1  6854.629127: mm_page_pcpu_drain: page=ffffea000f572ac0 pfn=4021419 order=0 migratetype=0
           bash-14721 [000] ......1  6854.629128: get_pfnblock_flags_mask <-free_pcppages_bulk
           bash-14721 [000] ......1  6854.629128: __mod_zone_page_state <-free_pcppages_bulk
... # lots more virtually identical repetitions of those last 3 lines
           bash-14721 [000] ....1.1  6854.629139: mm_page_pcpu_drain: page=ffffea000f481a80 pfn=4005994 order=0 migratetype=1
           bash-14721 [000] ......1  6854.629139: get_pfnblock_flags_mask <-free_pcppages_bulk

You free memory and hold the zone->lock

quoted

           bash-14721 [000] d.....1  6854.629139: do_IRQ <-ret_from_intr
           bash-14721 [000] d.....1  6854.629139: irq_enter <-do_IRQ
... # wakes up the can1 ISR thread on 001 and the can0 one on 000
(same physical IRQ line)
           bash-14721 [000] d...3.1  6854.629261: sched_switch: prev_comm=bash prev_pid=14721 prev_prio=120 prev_state=R+ ==> next_comm=irq/18-can0 next_pid=2015 next_prio=28

I would assume that this one raises NET_RX softirq. But at the bottom
you have the irq handler on the other CPU which confuses me…

There wasn't actually any traffic on can0 for this test, so it didn't.
The can0 ISR only makes a few reads/writes to the device, never
calling netif_rx.

The can1 handler (which actually raises a NET_RX softirq) runs on 001
because it's pinned there.

quoted

... # runs the can0 ISR
    irq/18-can0-2015  [000] d...3..  6854.629283: sched_switch: prev_comm=irq/18-can0 prev_pid=2015 prev_prio=28 prev_state=S ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=98
...
    ksoftirqd/0-3     [000] ....1.1  6854.629291: softirq_entry: vec=1 [action=TIMER]
...
    ksoftirqd/0-3     [000] ....1.1  6854.629293: softirq_exit: vec=1 [action=TIMER]

only the timer since nobody raised NET_RX

Correct. I included that as context for what 000 spent its time doing
with the lock held, not implying that it's incorrect.

quoted

...
    ksoftirqd/0-3     [000] .......  6854.629298: schedule <-smpboot_thread_fn ...
    ksoftirqd/0-3     [000] d...3..  6854.629304: sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=98 prev_state=S ==> next_comm=bash next_pid=14721 next_prio=28
...
           bash-14721 [000] d...1.1  6854.629308: smp_trace_reschedule_interrupt <-trace_reschedule_interrupt
# Actually unnecessary schedule IPI from 001?
           bash-14721 [000] d...1.1  6854.629309: irq_enter <-smp_trace_reschedule_interrupt
...
           bash-14721 [000] ....1.1  6854.629316: __tick_nohz_task_switch <-__schedule
           bash-14721 [000] ......1  6854.629316: __mod_zone_page_state <-free_pcppages_bulk
           bash-14721 [000] ....1.1  6854.629317: mm_page_pcpu_drain: page=ffffea000f57a900 pfn=4021924 order=0 migratetype=0
           bash-14721 [000] ......1  6854.629317: get_pfnblock_flags_mask <-free_pcppages_bulk
           bash-14721 [000] ......1  6854.629317: __mod_zone_page_state <-free_pcppages_bulk

and it continues cleaning up memory.

quoted

... # more of this like it was doing before
I don't see it unlocking the problematic mutex before the trace ends.

The relevant part for 001 starts while it's running the can1 ISR.
    irq/18-can1-7228  [001] ....1.1  6854.629275: netif_rx: dev=can1 skbaddr=ffff880412d8fc00 len=16
    irq/18-can1-7228  [001] ......1  6854.629275: migrate_disable <-netif_rx_internal
    irq/18-can1-7228  [001] ......2  6854.629275: enqueue_to_backlog <-netif_rx_internal

enqueue_to_backlog() looks like packet reception so this should be
handled in napi so I assume we run in NET_RX softirq

enqueue_to_backlog is the napi function that actually raises the NET_RX softirq.

quoted

    irq/18-can1-7228  [001] d.....2  6854.629276: _raw_spin_lock <-enqueue_to_backlog
    irq/18-can1-7228  [001] d...1.2  6854.629276: __raise_softirq_irqoff <-enqueue_to_backlog
    irq/18-can1-7228  [001] d...1.2  6854.629276: do_raise_softirq_irqoff <-__raise_softirq_irqoff
    irq/18-can1-7228  [001] d...2.2  6854.629276: softirq_raise: vec=3 [action=NET_RX]

This is where the softirq gets raised.

quoted

... # continues handling the can1 interrupt
    irq/18-can1-7228  [001] ......6  6854.629286: rt_spin_lock <-get_page_from_freelist
    irq/18-can1-7228  [001] ......6  6854.629287: rt_spin_lock_slowlock <-get_page_from_freelist
    irq/18-can1-7228  [001] ......6  6854.629287: _raw_spin_lock <-rt_spin_lock_slowlock
    irq/18-can1-7228  [001] ....1.6  6854.629287: __try_to_take_rt_mutex <-rt_spin_lock_slowlock
    irq/18-can1-7228  [001] ....1.6  6854.629287: _raw_spin_lock_irq <-rt_spin_lock_slowlock
    irq/18-can1-7228  [001] d...2.6  6854.629288: _raw_spin_unlock_irq <-rt_spin_lock_slowlock
    irq/18-can1-7228  [001] ....1.6  6854.629288: task_blocks_on_rt_mutex <-rt_spin_lock_slowlock

it might be zone->lock it goes after. It boosts the bash process which
seems to free memory so it would make sense.

quoted

... # rt_mutex/scheduling stuff
    irq/18-can1-7228  [001] d...4.6  6854.629291: sched_pi_setprio: comm=bash pid=14721 oldprio=120 newprio=28
... # more scheduler stuff
    irq/18-can1-7228  [001] d...3.6  6854.629299: native_smp_send_reschedule <-rt_mutex_setprio
... # more scheduler stuff
    irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_fair <-__schedule
    irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_stop <-__schedule
    irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_dl <-__schedule
    irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_rt <-__schedule
    irq/18-can1-7228  [001] d...2.6  6854.629307: pick_next_task_fair <-__schedule
    irq/18-can1-7228  [001] d...2.6  6854.629308: pick_next_task_idle <-__schedule
    irq/18-can1-7228  [001] d...3.6  6854.629308: sched_switch: prev_comm=irq/18-can1 prev_pid=7228 prev_prio=28 prev_state=D ==>
next_comm=swapper/1 next_pid=0 next_prio=120
...
         <idle>-0     [001] d...1..  6854.629319: softirq_check_pending_idle <-tick_nohz_idle_enter
My tracing_off() call is in softirq_check_pending_idle, so that's it.

It looks like your softirq for net_rx is getting a packet and then after
raising NET_RX (again?) it blocks on a lock. In order to get this lock
it boosts and schedules bash. It gets runable but on the other CPU. On
CPU1 there is nothig going is nothing going and the only runable task is
the idle thread. And this is probably where the warning is written
because we go to idle while we should process a softirq instead.

That sounds like the issue. Doing the softirq instead of going idle in
this situation seems like it means calling thread_do_softirq() from
__schedule, but I don't know where the right place is. Can anybody
give me some help on where exactly to check for softirqs from?

Thanks,
Brian

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help