Thread (59 messages) 59 messages, 13 authors, 2011-08-23

Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods

From: Eric Dumazet <hidden>
Date: 2011-07-20 00:43:18

Le mardi 19 juillet 2011 à 20:19 -0400, Neil Horman a écrit :
quoted
You are correct Eric, this can cause a significant performance regression, but I
think that beats causing a panic or other unexpected behavior.  I read your
previous threads with others regarding fixing this with vlans, but I don't think
its fair to just say 'its fast, but it might cause oopses'. 

And its not sufficient to simply forbid soft drivers to make use of pktgen, its
not just a soft driver problem, its systemic.  Any driver which assumes that it
has exclusive access to an skb submitted for transmit is at risk from pktgen in
its current implementation.  That of course as a subset includes all the soft
drivers, but others are also suceptible.  As examples (some of which I noted in
the origional post) virtio_net uses the skb->cb to hold vnet header information
which will be corrupted on sucessive sends.  bnx2x linearizes skbs under certain
circumstances, which means pktgen, if it marshals a fragmented frame will not
send a fragmented frame after the first iteration.  The PPP and Slip drivers
skb_push the skb to prepend a header to the frame on send, meaning sucessive
uses, up until they get an skb_under_panic will get iteratively more malformed
frames on the wire as ppp headers get stacked on top of one another.  These are
ust a few of the examples I've found.

The long and the short of it in my mind, is that we have a fundamental
disconnect between driver asumptions and pktgen.  If its ok to submit shared
skbs to drivers, then we need to augment drivers that modify skbs on transmit to
clone the skb (likey not an efficient solution), or if its not ok to do so, we
need to change pktgen to not behave that way.
Its a known problem, please check mail archives. Nobody felt a fix was
needed.
quoted
Note : a sysadmin has other ways to make a machine panic or reboot or
halt...
Yes, predictable ways, that the sysadmin can see coming based on what they're
doing (i.e. no one should be shocked if they dd /dev/random to /dev/kmem and get
a hang or panic, or if they issue a sysrq-c, etc).  This case is different.  A
sysadmin reasonably expects pktgen to send the frames they configure on the
interface they specify.  While its arguably reasonable to forsee that it may not
work with soft interfaces, pktgen just won't work with some hardware drivers (as
per the examples above).  And it won't always be an oops, it may be occasional
random behvaior in the output data, and its highly dependent not just on the use
of pktgen, but rather the specific command(s) issued.


I'm sensitive to the performance impact, but I would much rather see a lower
performing pktgen that doesn't randomly crash, and bring the performance back up
in a safe, reliable way.  To that end, I've been starting to think about
pre-allocating a ring buffer of skbs with a skb->users count biased up to
prevent driver freeing.  That way we could detect 'unused skb's' by a user count
that was at the bias level.  Thoughts?
I dont know. I use pktgen maybe once per week and never got a single
crash like this. We probably are very few pktgen users in the world, and
we use it exactly to avoid calling skb_clone() or other expensive per
xmit setup.

Just remove pktgen from RedHat kernels, if you dont trust sysadmins.
# CONFIG_PKTGEN is not set

Alternatively, add a check to problematic drivers to _not_ mess skb if
skb_shared(skb) is true : eventually use skb_share_check()


Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help