[PATCH] clocksource: em_sti: Adjust clock event rating to fix SMP broadcast
From: magnus.damm@gmail.com (Magnus Damm)
Date: 2013-08-29 08:41:55
Also in:
linux-sh
On Thu, Aug 1, 2013 at 5:45 AM, Stephen Boyd [off-list ref] wrote:
On 07/31/13 12:17, Magnus Damm wrote:quoted
Hi Stephen, On Thu, Aug 1, 2013 at 2:32 AM, Stephen Boyd [off-list ref] wrote:quoted
On 07/30/13 23:25, Simon Horman wrote:quoted
From: Magnus Damm <redacted> Update the STI rating from 200 to 80 to fix SMP operation with the ARM broadcast timer. This breakage was introduced by: f7db706 ARM: 7674/1: smp: Avoid dummy clockevent being preferred over real hardware clock-event Without this fix SMP operation is broken on EMEV2 since no broadcast timer interrupts trigger on the secondary CPU cores. Signed-off-by: Magnus Damm <redacted> Signed-off-by: Simon Horman <redacted> ---This looks suspicious. Are you're purposefully deflating the rating so that the STI timer fills in the broadcast position? Why not make the STI cpumask be all possible CPUs? Presumably the interrupt can target all CPUs since it isn't a per-cpu interrupt and doing this would cause the STI to fill in the broadcast slot, leaving the per-cpu dummys in the tick position.While letting the timer broadcast to all CPUs sounds interesting the STI driver has so far only been used to drive a single CPU core. This used to work well for us but has since some time unfortunately been broken. I agree that it may be suboptimal with a single timer like STI and using IPI for broadcast, but for more efficient SMP we already have TWD or arch timer.I think there is some confusion. The mask field says what CPUs the timer can possibly interrupt and for non-percpu interrupts this should be all possible CPUs (unless we're talking clusters, etc. but I don't think we are). Can you please give the output of /proc/timer_list or confirm that the STI is your broadcast source? If so you should probably be marking the cpumask for all possible CPUs so that the clockevent core knows to prefer this clockevent for the broadcast source and not a per-cpu source. Then you can leave the rating as is.
Hello Stephen, Thanks for your suggestion. Yes, there was indeed some confusion. Now after diving into the code a bit deeper I can finally understand what you mean. Instead of adjusting the rating I've changed the cpumask member like this:
--- 0001/drivers/clocksource/em_sti.c
+++ work/drivers/clocksource/em_sti.c 2013-08-29 17:33:16.000000000 +0900@@ -301,7 +301,7 @@ static void em_sti_register_clockevent(s ced->name = dev_name(&p->pdev->dev); ced->features = CLOCK_EVT_FEAT_ONESHOT; ced->rating = 200; - ced->cpumask = cpumask_of(0); + ced->cpumask = cpu_all_mask; ced->set_next_event = em_sti_clock_event_next; ced->set_mode = em_sti_clock_event_mode;
Without the cpumask fix or without the earlier rating fix the following interrupt count can be seen in /proc/interrupts on KZM9D: 157: 140 0 GIC 157 e0180000.sti 160: 0 0 e0050000.gpio 1 eth0 IPI0: 0 0 CPU wakeup interrupts IPI1: 0 0 Timer broadcast interrupts Above, notice how no IPI1 interrupts seem to be arriving. With the cpumask fix above the interrupt count becomes like this: 157: 559 0 GIC 157 e0180000.sti 160: 0 0 e0050000.gpio 1 eth0 IPI0: 0 0 CPU wakeup interrupts IPI1: 0 601 Timer broadcast interrupts Would this be in line with your expectation? Thanks, / magnus