Thread (11 messages) 11 messages, 3 authors, 2013-11-03

[PATCH] watchdog: at91sam9_wdt: various fixes

From: boris brezillon <hidden>
Date: 2013-11-03 14:42:18
Also in: linux-watchdog, lkml

Hello Guenter,

Are you okay with the 3 fixes/improvements provided by this patch,
and the explanation I gave regarding the reason for these changes ?

If so, I will submit a new series splitting the patch and including your 
suggestions.

Best Regards,

Boris

On 29/10/2013 18:22, boris brezillon wrote:
On 29/10/2013 17:43, Guenter Roeck wrote:
quoted
On Tue, Oct 29, 2013 at 05:22:50PM +0100, boris brezillon wrote:
quoted
On 29/10/2013 16:45, Guenter Roeck wrote:
quoted
On Tue, Oct 29, 2013 at 11:37:33AM +0100, Boris BREZILLON wrote:
quoted
Fix the secs_to_ticks macro in case 0 is passed as an argument.

Rework the heartbeat calculation to increase the security margin 
of the
watchdog reset timer.

Use the min_heartbeat value instead of the calculated heartbeat 
value for
the first watchdog reset.

Signed-off-by: Boris BREZILLON <redacted>
Hi Boris,

can you possibly split the three changes into separate patches ?
Sure. My first idea was to split this in 3 patches, but, as the
buggy at91 watchdog series was
already applied to linux-watchdog-next, I thought it would be faster
to only provide one
patch to fix all the issues at once.
quoted
The first is a no-brainer. Gives my opinion of my code review 
capabilities
a slight damper ;-).

For the other two problems, it might make sense to describe
the problems you are trying to solve.

Couple of comments inline.

Thanks,
Guenter

quoted
---
  drivers/watchdog/at91sam9_wdt.c |   35 
++++++++++++++++++++++++-----------
  1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/watchdog/at91sam9_wdt.c 
b/drivers/watchdog/at91sam9_wdt.c
index 9bd089e..f1b59f1 100644
--- a/drivers/watchdog/at91sam9_wdt.c
+++ b/drivers/watchdog/at91sam9_wdt.c
@@ -51,7 +51,7 @@
  #define ticks_to_hz_rounddown(t)    ((((t) + 1) * HZ) >> 8)
  #define ticks_to_hz_roundup(t)        (((((t) + 1) * HZ) + 255) 
quoted
quoted
8)
  #define ticks_to_secs(t)        (((t) + 1) >> 8)
-#define secs_to_ticks(s)        (((s) << 8) - 1)
+#define secs_to_ticks(s)        (s ? (((s) << 8) - 1) : 0)
    (s)
quoted
  #define WDT_MR_RESET    0x3FFF2FFF
@@ -61,6 +61,11 @@
  /* Watchdog max delta/value in secs */
  #define WDT_COUNTER_MAX_SECS ticks_to_secs(WDT_COUNTER_MAX_TICKS)
+/* Watchdog heartbeat shift used for security margin:
+ * we'll try to rshift the heartbeat value with this value to secure
+ * the watchdog reset. */
+#define WDT_HEARTBEAT_SHIFT    2
+
  /* Hardware timeout in seconds */
  #define WDT_HW_TIMEOUT 2
@@ -158,7 +163,9 @@ static int at91_wdt_init(struct 
platform_device *pdev, struct at91wdt *wdt)
      int err;
      u32 mask = wdt->mr_mask;
      unsigned long min_heartbeat = 1;
+    unsigned long max_heartbeat;
      struct device *dev = &pdev->dev;
+    int shift;
      tmp = wdt_read(wdt, AT91_WDT_MR);
      if ((tmp & mask) != (wdt->mr & mask)) {
@@ -181,23 +188,27 @@ static int at91_wdt_init(struct 
platform_device *pdev, struct at91wdt *wdt)
      if (delta < value)
          min_heartbeat = ticks_to_hz_roundup(value - delta);
-    wdt->heartbeat = ticks_to_hz_rounddown(value);
-    if (!wdt->heartbeat) {
+    max_heartbeat = ticks_to_hz_rounddown(value);
+    if (!max_heartbeat) {
          dev_err(dev,
              "heartbeat is too small for the system to handle it 
correctly\n");
          return -EINVAL;
      }
-    if (wdt->heartbeat < min_heartbeat + 4) {
+    for (shift = WDT_HEARTBEAT_SHIFT; shift > 0; shift--) {
+        if ((max_heartbeat >> shift) < min_heartbeat)
+            continue;
+
+        wdt->heartbeat = max_heartbeat >> shift;
+        break;
+    }
+
+    if (!shift)
          wdt->heartbeat = min_heartbeat;
Correct me if I am wrong, but it seems to me that

    if ((max_heartbeat >> 2) >= min_heartbeat)
         wdt->heartbeat = max_heartbeat >> 2;
    else if ((max_heartbeat >> 1) >= min_heartbeat)
        wdt->heartbeat = max_heartbeat >> 1;
    else
        wdt->heartbeat = min_heartbeat;

would accomplish the same and be easier to understand.
This is exactly what I'm trying to accomplish.
I used the for loop in case we ever want to change the
WDT_HEARTBEAT_SHIFT value
(which is unlikely to happen).

I'll move to your proposition.
quoted
However, given that, I wonder if it is really necessary to bail out 
above if
max_heartbeat is 0. After all, you set heartbeat to min_heartbeat 
anyway
in this case.
Yes it is necessary. The max_heartbeat is a configuration that
cannot be changed once configured.
We have to inform the user that his max_heartbeat value is too small
to be handled correctly by the Linux kernel.

If we simply use the min_heartbeat value as heartbeat and ignore the
wrong max_heartbeat value,
the watchdog will reset the SoC before we can ever reset the
watchdog counter.
quoted
quoted
+
+    if (max_heartbeat < min_heartbeat + 4)
          dev_warn(dev,
               "min heartbeat and max heartbeat might be too close 
for the system to handle it correctly\n");
-        if (wdt->heartbeat < 4)
-            dev_warn(dev,
-                 "heartbeat might be too small for the system to 
handle it correctly\n");
-    } else {
-        wdt->heartbeat -= 4;
-    }
      if ((tmp & AT91_WDT_WDFIEN) && wdt->irq) {
          err = request_irq(wdt->irq, wdt_interrupt,
@@ -213,7 +224,9 @@ static int at91_wdt_init(struct 
platform_device *pdev, struct at91wdt *wdt)
               tmp & wdt->mr_mask, wdt->mr & wdt->mr_mask);
      setup_timer(&wdt->timer, at91_ping, (unsigned long)wdt);
-    mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
+    /* Use min_heartbeat the first time because the watchdog 
timer might
+     * be running for a long time when we reach this init 
function. */
    /*
     * Multi-line comment style
     *
     * Not really sure I understand what this accomplishes. What 
problem
     * are you trying to solve here ?
     */
Sure, I'll change the comment style.

What, I'm trying to explain, is that the watchdog might (or should)
have been resetted
before loading the linux kernel. But loading the kernel takes some
time (uncompressing,
low level init, ...), and if we take the heartbeat (or max_heartbeat
/ 4 in the common case) value as
the next trigger to reset the watchdog counter, the watchdog timer
might have already expired.
But increasing anything in the watchdog driver itself won't help you 
there.
You can not execute any kernel code before that kernel code is running.
Of course, but you can at least try to minimize the time between the 
watchdog driver init
and the first wathdog counter reset.
quoted
quoted
Here is an example:
  - max_heartbeat configured to 8 seconds
  - min_heartbeat configured to 1 second
  => heartbeat = 2s
  - kernel boot time (before at91 watchdog is loaded) = 6 secs
Guess that is where I got lost. Do you mean the time from loading the 
driver
to starting the watchdog application ? Because the init function is only
executed after the driver is loaded, so nothing will help you if it 
takes
too long for the driver to load.
I think there is another bug here: the driver should not be compiled 
as a module.

Here is why:

At POR the at91 SoC configure its watchdog with these default values:
 - enabled
 - min heartbeat = 0 ticks
 - max heartbeat = 0xfff ticks <=> 16 secs
 - some reset options
After a POR the watchdog can only be reconfigured once (and only once).
This configuration oftenly takes place in the the bootstrap (or 
bootloader),
but can eventually be done by the Linux kernel.

If the watchdog is kept enabled by the bootstrap (or bootloader), this 
means
the linux kernel will have to reset the watchdog counter as soon as 
possible to avoid
a possible watchdog reset.

=> If we authorize the at91 sam9 watchdog to be compiled as a module, 
we're not sure
the module will be loaded soon enough to be able to reset the watchdog 
counter.
quoted
You really have two times to deal with:
- Time from ROMMON handoff to loading the driver
   Nothing you can do there. If the watchdog fires before the driver 
is loaded,
   you are lost. Only way t handle this situation is to increase the 
timeout
   in the ROMMON.
- Time from loading driver to watchdog device open. This is really 
the time
   you are increasing with your change.
This is where it gets a bit tricky.

The heartbeat I'm talking about is not the "user space" heartbeat
(struct watchdog_device timeout field), it's the "hardware watchdog 
counter reset"
heartbeat (struct at91wdt heartbeat field).

Actually the at91 sam9 wdt driver does not provide a direct access to 
the watchdog reset
counter functionnality.
Instead, it periodically reset the watchdog counter (based on the 
timing config retrieved from
the hw registers), and eventually, if the user open a the watchdog dev 
file and fails to reset
the watchdog (using the user space API), the drivers stops resetting 
the hw counter, which leads
to a watchdog reset.

I hope I'm clear enough, cause it's quite complicated to explain.

Best Regards,

Boris
quoted
Thanks,
Guenter
quoted
If I use the heartbeat value when configuring the first expiration
of the timer,
it might be too late to reset the watchdog counter.

I'll try to find a proper to explain this use case :-).
quoted
quoted
+    mod_timer(&wdt->timer, jiffies + min_heartbeat);
      /* Try to set timeout from device tree first */
      if (watchdog_init_timeout(&wdt->wdd, 0, dev))
-- 
1.7.9.5

-- 
To unsubscribe from this list: send the line "unsubscribe 
linux-watchdog" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
-- 
To unsubscribe from this list: send the line "unsubscribe 
linux-watchdog" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help