Thread: 4 messages, 3 authors, 2019-05-02

Re: [Patch net-next] net: add a generic tracepoint for TX queue timeout

From: Cong Wang <hidden>
Date: 2019-05-02 00:50:57

On Wed, May 1, 2019 at 6:11 AM Eran Ben Elisha [off-list ref] wrote:


> On 4/30/2019 9:50 PM, Cong Wang wrote:
>> Although devlink health report does a nice job on reporting TX
>> timeout and other NIC errors, unfortunately it requires drivers
>> to support it but currently only mlx5 has implemented it.
>
> The devlink health was never intended to be the generic mechanism for
> monitoring all driver's TX timeouts notifications. mlx5e driver chose to
> handle TX timeout notification by reporting it to the newly devlink
> health mechanism.

Understood.

>> Before other drivers could catch up, it is useful to have a
>> generic tracepoint to monitor this kind of TX timeout. We have
>> been suffering TX timeout with different drivers, we plan to
>> start to monitor it with rasdaemon which just needs a new tracepoint.
>
> Great idea to suggest a generic trace message that can be monitored over
> all drivers.

>> Sample output:
>>
>>   ksoftirqd/1-16    [001] ..s2   144.043173: net_dev_xmit_timeout: dev=ens3 driver=e1000 queue=0
>>
>> Cc: Eran Ben Elisha <redacted>
>> Cc: Jiri Pirko <redacted>
>> Signed-off-by: Cong Wang <redacted>
>> ---
>>  include/trace/events/net.h | 23 +++++++++++++++++++++++
>>  net/sched/sch_generic.c    |  2 ++
>>  2 files changed, 25 insertions(+)
>> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
>> index 1efd7d9b25fe..002d6f04b9e5 100644
>> --- a/include/trace/events/net.h
>> +++ b/include/trace/events/net.h
>> @@ -303,6 +303,29 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit,
>>      TP_ARGS(ret)
>>  );
>
> I would have put this next to net_dev_xmit trace event declaration.

Sounds reasonable, it would be slightly easier to find it.
I will send v2.

Thanks.