Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Eric Dumazet <hidden>
Date: 2011-07-20 00:43:18
On Tuesday, 19 July 2011 at 20:19 -0400, Neil Horman wrote:
quoted
You are correct Eric, this can cause a significant performance regression, but I think that beats causing a panic or other unexpected behavior. I read your previous threads with others regarding fixing this with vlans, but I don't think it's fair to just say 'it's fast, but it might cause oopses'. And it's not sufficient to simply forbid soft drivers to make use of pktgen; it's not just a soft driver problem, it's systemic. Any driver which assumes that it has exclusive access to an skb submitted for transmit is at risk from pktgen in its current implementation. That of course includes all the soft drivers as a subset, but others are also susceptible. As examples (some of which I noted in the original post), virtio_net uses the skb->cb to hold vnet header information, which will be corrupted on successive sends. bnx2x linearizes skbs under certain circumstances, which means pktgen, if it marshals a fragmented frame, will not send a fragmented frame after the first iteration. The PPP and SLIP drivers skb_push the skb to prepend a header to the frame on send, meaning successive uses, up until they get an skb_under_panic, will get iteratively more malformed frames on the wire as ppp headers get stacked on top of one another. These are just a few of the examples I've found. The long and the short of it, in my mind, is that we have a fundamental disconnect between driver assumptions and pktgen. If it's ok to submit shared skbs to drivers, then we need to augment drivers that modify skbs on transmit to clone the skb (likely not an efficient solution), or if it's not ok to do so, we need to change pktgen to not behave that way.
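The header-stacking failure described above for PPP and SLIP can be illustrated with a small userspace model. This is only a sketch, not kernel code: `toy_skb`, `toy_xmit`, `send_shared` and `send_copied` are hypothetical names, and the headroom and header sizes are arbitrary assumptions chosen to make the effect visible after a few sends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64
#define HEADROOM 8   /* assumed headroom in front of the payload */
#define HDR_LEN  4   /* assumed size of the header the driver prepends */

/* Toy stand-in for an skb: valid data starts at buf + data_off. */
struct toy_skb {
    unsigned char buf[BUF_SIZE];
    size_t data_off;
    size_t len;
};

static void toy_skb_init(struct toy_skb *skb, const char *payload)
{
    size_t n = strlen(payload);

    assert(n <= BUF_SIZE - HEADROOM);
    memset(skb, 0, sizeof(*skb));
    skb->data_off = HEADROOM;
    skb->len = n;
    memcpy(skb->buf + skb->data_off, payload, n);
}

/*
 * Hypothetical driver transmit that prepends a header in place, the way
 * a driver calling skb_push() would. Returns the frame length, or -1 when
 * the headroom is exhausted (where the kernel would hit skb_under_panic).
 */
static int toy_xmit(struct toy_skb *skb)
{
    if (skb->data_off < HDR_LEN)
        return -1;
    skb->data_off -= HDR_LEN;
    skb->len += HDR_LEN;
    memset(skb->buf + skb->data_off, 0xFF, HDR_LEN);
    return (int)skb->len;
}

/* Resend the same buffer 'count' times: headers stack up on each send. */
static int send_shared(struct toy_skb *skb, int count)
{
    int ret = 0;

    for (int i = 0; i < count; i++) {
        ret = toy_xmit(skb);
        if (ret < 0)
            return ret;
    }
    return ret;
}

/* Send a private copy each time: the template buffer is never modified. */
static int send_copied(const struct toy_skb *skb, int count)
{
    int ret = 0;

    for (int i = 0; i < count; i++) {
        struct toy_skb copy = *skb;

        ret = toy_xmit(&copy);
        if (ret < 0)
            return ret;
    }
    return ret;
}
```

With a 4-byte payload, the shared path produces a longer frame on each send and then fails once the headroom is gone, while the copying path produces the same frame length every time at the cost of a copy per send.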
It's a known problem; please check the mail archives. Nobody felt a fix was needed.
quoted
Note: a sysadmin has other ways to make a machine panic or reboot or halt...
Yes, predictable ways, that the sysadmin can see coming based on what they're doing (i.e. no one should be shocked if they dd /dev/random to /dev/kmem and get a hang or panic, or if they issue a sysrq-c, etc). This case is different. A sysadmin reasonably expects pktgen to send the frames they configure on the interface they specify. While it's arguably reasonable to foresee that it may not work with soft interfaces, pktgen just won't work with some hardware drivers (as per the examples above). And it won't always be an oops; it may be occasional random behavior in the output data, and it's highly dependent not just on the use of pktgen, but rather on the specific command(s) issued. I'm sensitive to the performance impact, but I would much rather see a lower performing pktgen that doesn't randomly crash, and bring the performance back up in a safe, reliable way. To that end, I've been starting to think about pre-allocating a ring buffer of skbs with a skb->users count biased up to prevent driver freeing. That way we could detect unused skbs by a user count that was at the bias level. Thoughts?
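The ring-buffer idea floated above can be sketched in userspace as follows. This is an illustration under assumptions, not a proposed patch: `toy_ring`, `ring_get`, `ring_put`, the ring size and the bias value are all hypothetical, and a plain int stands in for the atomic skb->users counter.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE  4
#define USERS_BIAS 1   /* assumed extra reference held by the generator */

/* Toy stand-in for a pre-allocated skb; only the refcount is modelled. */
struct toy_slot {
    int users;
};

struct toy_ring {
    struct toy_slot slots[RING_SIZE];
    size_t next;   /* where the next search for a free slot starts */
};

static void ring_init(struct toy_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        r->slots[i].users = USERS_BIAS;
    r->next = 0;
}

/*
 * Find a slot the driver is no longer holding (users back at the bias
 * level), take a reference on the driver's behalf, and return its index.
 * Returns -1 if every slot is still in flight.
 */
static int ring_get(struct toy_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        size_t idx = (r->next + i) % RING_SIZE;

        if (r->slots[idx].users == USERS_BIAS) {
            r->slots[idx].users++;
            r->next = (idx + 1) % RING_SIZE;
            return (int)idx;
        }
    }
    return -1;
}

/*
 * Models the driver freeing the buffer on transmit completion: the
 * reference is dropped, but the bias keeps the count from reaching zero,
 * so the buffer is never actually released.
 */
static void ring_put(struct toy_ring *r, int idx)
{
    assert(r->slots[idx].users > USERS_BIAS);
    r->slots[idx].users--;
}
```

The sketch only covers detecting reusable buffers. A buffer that a driver has modified on transmit would presumably still need its contents restored before being sent again, which this model leaves out.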
I don't know. I use pktgen maybe once per week and never got a single crash like this. There are probably very few pktgen users in the world, and we use it exactly to avoid calling skb_clone() or other expensive per-xmit setup.

Just remove pktgen from RedHat kernels, if you don't trust sysadmins.

    # CONFIG_PKTGEN is not set

Alternatively, add a check to problematic drivers to _not_ mess with the skb if skb_shared(skb) is true, possibly using skb_share_check().
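The driver-side alternative can be sketched with a userspace model of the check. This is a simplification under stated assumptions: `toy_skb`, `toy_alloc`, `toy_share_check` and `toy_xmit` are hypothetical names, and the toy makes a full private copy, whereas in the kernel skb_share_check() relies on skb_clone(), which duplicates the sk_buff but still shares the packet data.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an skb with a reference count and a small payload. */
struct toy_skb {
    int users;
    size_t len;
    unsigned char data[32];
};

static struct toy_skb *toy_alloc(unsigned char first_byte)
{
    struct toy_skb *skb = calloc(1, sizeof(*skb));

    if (skb) {
        skb->users = 1;
        skb->len = 1;
        skb->data[0] = first_byte;
    }
    return skb;
}

/* Shared means someone besides the caller holds a reference. */
static int toy_skb_shared(const struct toy_skb *skb)
{
    return skb->users != 1;
}

/*
 * Loosely modelled on skb_share_check(): if the buffer is shared, drop
 * the caller's reference and hand back a private copy (NULL if the
 * allocation fails); otherwise return the buffer unchanged.
 */
static struct toy_skb *toy_share_check(struct toy_skb *skb)
{
    struct toy_skb *copy;

    if (!toy_skb_shared(skb))
        return skb;

    copy = malloc(sizeof(*copy));
    if (copy) {
        *copy = *skb;
        copy->users = 1;
    }
    skb->users--;
    return copy;
}

/*
 * Hypothetical transmit path that modifies the frame in place. It takes
 * over the caller's reference and returns 0 on success, -1 on failure.
 */
static int toy_xmit(struct toy_skb *skb)
{
    skb = toy_share_check(skb);
    if (!skb)
        return -1;

    assert(skb->users == 1);
    skb->data[0] ^= 0xFF;   /* in-place change, now on a private buffer */
    free(skb);              /* transmit consumes its reference */
    return 0;
}
```

When the sender holds an extra reference, as pktgen does when it reuses a buffer, the transmit path modifies only the private copy and the sender's buffer is left intact. An unshared buffer takes the fast path with no copy.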