Re: [PATCH v13 01/11] can: etas_es58x: add core support for ETAS ES58X CAN... | linux-can

Re: [PATCH v13 01/11] can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: 2021-03-29 09:19:40

On 20.03.2021 19:33:02, Vincent MAILHOL wrote:

quoted

+/**
+ * es58x_set_skb_timestamp() - Set the hardware timestamp of an skb.
+ * @netdev: CAN network device.
+ * @skb: socket buffer of a CAN message.
+ * @timestamp: Timestamp received from an ES58X device.
+ *
+ * Used for both received and loopback messages.
+ *
+ * Return: zero on success, -EFAULT if @skb is NULL.

That's not correct.

Please make sure to call it with skb != NULL and make it a void function.

Ack.
All calls to that function already checked that the skb was not NULL.
I will make the function void.

quoted

Does it make sense for you to use of struct cyclecounter/struct timecounter.
Have a look at:

https://lore.kernel.org/linux-can/20210304160328.2752293-6-mkl@pengutronix.de/ (local)

As your device already provides a u64 timestamp you don't need the worker.

I saw your patch but because my device already uses a 64 bit
counter, I did not see a benefit to do so.

Which problem would the struct cyclecounter/struct timecounter
solve for my driver?

It probably doesn't matter.

quoted

+ */
+static int es58x_set_skb_timestamp(struct net_device *netdev,
+                                struct sk_buff *skb, u64 timestamp)
+{
+     struct es58x_device *es58x_dev = es58x_priv(netdev)->es58x_dev;
+     struct skb_shared_hwtstamps *hwts;
+
+     hwts = skb_hwtstamps(skb);
+     /* Ignoring overflow (overflow on 64 bits timestamp with nano
+      * second precision would occur after more than 500 years).
+      */
+     hwts->hwtstamp = ns_to_ktime(es58x_timestamp_to_ns(timestamp) +
+                                  es58x_dev->realtime_diff_ns);
+
+     return 0;
+}
+
+/**
+ * es58x_rx_timestamp() - Handle a received timestamp.
+ * @es58x_dev: ES58X device.
+ * @timestamp: Timestamp received from a ES58X device.
+ *
+ * Calculate the difference between the ES58X device and the kernel
+ * internal clocks. This difference will be later used as an offset to
+ * convert the timestamps of RX and loopback messages to match the
+ * kernel system time (e.g. convert to UNIX time).

I'm not sure if we want to have the timestamp based on the kernel system time.
The problem I see is that your clock is not synchronized to your system clock
and will probably drift, so you might accumulate an offset.

When looking at candump output, a timestamp that more or less current time gives
a false sense of correctness, but if the timestamp is short after 01.0.1970 you
don't expect it to be correct.

But this is a different problem.....

Here, we are discussing tastes and colors. I am perfectly aware
that the two quartz used by the device and the kernel are not in
sync and will drift away from each other. However, when looking
at my traces, I like to have a hint of when an event occurs. In
the worst case, the quartz might drift away up to four seconds a day,
but the date, the hours and the minutes stay accurate.

So it is a discussion of making it convenient versus making it dummy proof.

If there is a strong consensus not to base the hardware timestamp
on the kernel, then I will remove it. But if I have the choice, I
prefer to keep it as it is.

I've changed the mcp251xfd timestamp to sync to the kernel time, too :)

[...]

quoted

+/**
+ * es58x_rx_err_msg() - Handle a received CAN event or error message.
+ * @netdev: CAN network device.
+ * @error: Error code.
+ * @event: Event code.
+ * @timestamp: Timestamp received from a ES58X device.
+ *
+ * Handle the errors and events received by the ES58X device, create
+ * a CAN error skb and post it.
+ *
+ * In some rare cases the devices might get stucked alternating between
+ * CAN_STATE_ERROR_PASSIVE and CAN_STATE_ERROR_WARNING. To prevent
+ * this behavior, we force a bus off state if the device goes in
+ * CAN_STATE_ERROR_WARNING for ES58X_MAX_CONSECUTIVE_WARN consecutive
+ * times with no successful transmission or reception in between.
+ *
+ * Once the device is in bus off state, the only way to restart it is
+ * through the drivers/net/can/dev.c:can_restart() function. The
+ * device is technically capable to recover by itself under certain
+ * circumstances, however, allowing self recovery would create
+ * complex race conditions with drivers/net/can/dev.c:can_restart()
+ * and thus was not implemented. To activate automatic restart, please
+ * set the restart-ms parameter (e.g. ip link set can0 type can
+ * restart-ms 100).
+ *
+ * If the bus is really instable, this function would try to send a
+ * lot of log messages. Those are rate limited (i.e. you will see
+ * messages such as "net_ratelimit: XXX callbacks suppressed" in
+ * dmesg).
+ *
+ * Return: zero on success, errno when any error occurs.
+ */
+int es58x_rx_err_msg(struct net_device *netdev, enum es58x_err error,
+                  enum es58x_event event, u64 timestamp)
+{

[...]

quoted

+     switch (event) {
+     case ES58X_EVENT_OK:    /* 0: No event */
+             break;
+
+     case ES58X_EVENT_CRTL_ACTIVE:
+             if (can->state == CAN_STATE_BUS_OFF) {
+                     netdev_err(netdev,
+                                "%s: state transition: BUS OFF -> ACTIVE\n",
+                                __func__);

Your device should not go autoamtically from bus off to active. Only when it's
restarted by the kernel (or the user).

Agree, this is why it is an error message.

CAN controllers are allowed to automatically recover under very
specific conditions (c.f. ISO 11898-1, section 10.9.4 "Bus
integration state": "Nodes that are in bus-off state shall
re-enter the bus integration state immediately after detecting
the idle condition if the bus-off recovery condition is not yet
met").

On Linux the controllers should not recover automatically, but
controlled by the kernel as configured by restart-ms. So shut down the
controller in case of a bus off and wait for the kernel to recover via
priv->can.do_set_mode(CAN_MODE_START).

Aside from the idle condition, I do not think that the nodes are
allowed to self-recover.

ACK

But because the can_restart() function does not use locks for the
echo_skb[], I treat this scenario by printing an error message and
ignoring it.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

Attachments

signature.asc [application/pgp-signature] 488 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help