Re: fealnx oopses

From: Denis Vlasenko <hidden>
Date: 2004-03-29 22:50:42

I think the bug can be considered fixed if I can start
a netcat UDP flood, wait however long I want, then press
Ctrl-C and get my bash prompt back. The local netcat
closes its socket and exits, the remote netcat gets its
ICMP 'port unreachable' and exits too. Everybody's
happy.

Oopses are gone, but it looks like the box is so flooded with
interrupts that userspace has no chance of processing Ctrl-C.
What can we do? I think the driver can do something useful
when it detects 'too much work in interrupt'. Disabling rx
for several ms seems like a 'quick and dirty' way.
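
A minimal sketch of that idea, written as a user-space model and not as
fealnx code. Every name in it (struct nic_model, model_intr_handler(),
model_timer_tick(), MAX_WORK, RX_PAUSE_MS) is hypothetical, and the budget
and pause length are assumed values. A real driver would mask the rx
interrupt in hardware and use a kernel timer instead of the flags used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK     20   /* assumed per-interrupt event budget */
#define RX_PAUSE_MS   5   /* assumed length of the rx pause, in ms */

/* Hypothetical model of the card state; not a fealnx structure. */
struct nic_model {
    unsigned pending;       /* events the card still wants handled */
    bool rx_enabled;        /* false while rx is throttled */
    unsigned rx_resume_at;  /* model clock (ms) when rx is re-enabled */
};

/* Returns true if the work budget ran out and rx was paused. */
static bool model_intr_handler(struct nic_model *nic, unsigned now_ms)
{
    int boguscnt = MAX_WORK;

    assert(nic != NULL);
    while (nic->pending > 0) {
        if (--boguscnt < 0) {
            /* Too much work: stop taking rx events for a while. */
            nic->rx_enabled = false;
            nic->rx_resume_at = now_ms + RX_PAUSE_MS;
            nic->pending = 0;   /* under a flood these are lost anyway */
            return true;
        }
        nic->pending--;         /* handle one event */
    }
    return false;
}

/* Stand-in for a timer callback: re-enable rx once the pause is over. */
static void model_timer_tick(struct nic_model *nic, unsigned now_ms)
{
    assert(nic != NULL);
    if (!nic->rx_enabled && now_ms >= nic->rx_resume_at)
        nic->rx_enabled = true;
}
```

With pending well above MAX_WORK the handler gives up after MAX_WORK
events and leaves rx off until model_timer_tick() sees the pause expire,
which is the window in which userspace would get to run.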

Francois, what do you think? Can you code something up
for me to test?

On Tuesday 30 March 2004 00:20, Francois Romieu wrote:

Denis Vlasenko [off-list ref]:
[...]

in intr_handler():
                if (--boguscnt < 0) {
                        printk(KERN_WARNING "%s: Too much work at interrupt, "
                               "status=0x%4.4x.\n", dev->name, intr_status);
                        break;
                }
Shall we do something with this condition?
What if the card has simply gone mad? Maybe a card reset?

1 - Yes.
2 - disable the offending interrupt/NAPI (reset is not needed)

Imagine that hardware got stuck with intr constantly asserted.
Reset can cure that. In any event, it might give us a needed
pause of several ms, just what I want.

If you worry about lost packets, that's not a concern -
if we reached this, we are dropping tons of them already.

[...]

static int netdev_rx(struct net_device *dev)
{
        struct netdev_private *np = dev->priv;

        if (!(!(np->cur_rx->status & RXOWN) && np->cur_rx->skbuff)) {
                //vda:
                printk(KERN_ERR "netdev_rx(): nothing to do?! "
                        "(np->cur_rx->status & RXOWN) == 0x%04x, "
                        "np->cur_rx->skbuff == %p\n",
                        (np->cur_rx->status & RXOWN),
                        np->cur_rx->skbuff);
        }
I added this. If we trigger this, netdev_rx won't enter
while() loop and will do essentially nothing
except for trying to allocate_rx_buffers(dev).

It is supposed to mean that there is an unallocated buffer in the ring and
that the driver has simply wrapped to the point where it met it again.
So there is only one thing to do: try to allocate.

Hm, but why did we get an rx intr at all? The card couldn't receive a packet
into a non-allocated buffer, right?

[...]

I did trigger this right before 'too much work'
(RXOWN was set, ->skbuff was not NULL).
What does it mean? The card received a packet but _not_
into this buffer? How does the card decide which buffer
to receive into? Shall we check them all?

It probably means that several packets were processed during a previous
interrupt, so when this interrupt is triggered, there's nothing to
do.

Aha, card didn't know that and prods CPU again. I got it.

[...]

                np->cur_rx = np->cur_rx->next_desc_logical;
        }                       /* end of while loop */
If the if(pkt_len < rx_copybreak...) path is taken, skbuff is still usable
for the next rx, no? Then why np->cur_rx = np->cur_rx->next_desc_logical?

Not for the next Rx: the whole ring will have to be processed first. The
sole difference when copybreak does not apply is that an allocation should
be performed for the relevant descriptor. The descriptors are set up in a
circular list and the asic walks this list. So whatever happens, the driver
must consider the next descriptor as current for the upcoming interrupt.
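
The ring walk described above can be modelled in user space. This is only
a sketch under stated assumptions, not the fealnx code: struct desc_model,
struct ring_model, ring_init() and model_netdev_rx() are hypothetical, and
the RXOWN bit value, RING_SIZE and RX_COPYBREAK are assumed numbers. It
also assumes that the copybreak path hands the descriptor straight back to
the card by setting RXOWN again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE     4
#define RXOWN         0x80000000u  /* assumed: set while the card owns it */
#define RX_COPYBREAK  200          /* assumed copybreak threshold, bytes */

/* Hypothetical descriptor model; has_buf stands in for skbuff != NULL. */
struct desc_model {
    unsigned status;
    unsigned pkt_len;
    bool has_buf;
    struct desc_model *next_desc_logical;
};

struct ring_model {
    struct desc_model ring[RING_SIZE];
    struct desc_model *cur_rx;
    unsigned need_alloc;   /* descriptors waiting for a fresh buffer */
};

/* Link the descriptors into a circle, all owned by the card. */
static void ring_init(struct ring_model *r)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        r->ring[i].status = RXOWN;
        r->ring[i].pkt_len = 0;
        r->ring[i].has_buf = true;
        r->ring[i].next_desc_logical = &r->ring[(i + 1) % RING_SIZE];
    }
    r->cur_rx = &r->ring[0];
    r->need_alloc = 0;
}

/* Handle completed descriptors; returns how many packets were taken. */
static int model_netdev_rx(struct ring_model *r)
{
    int handled = 0;

    while (!(r->cur_rx->status & RXOWN) && r->cur_rx->has_buf) {
        struct desc_model *d = r->cur_rx;

        if (d->pkt_len < RX_COPYBREAK) {
            /* Small packet: data is copied out, the buffer stays. */
            d->status = RXOWN;
        } else {
            /* Large packet: the buffer goes up the stack. */
            d->has_buf = false;
            r->need_alloc++;
        }
        handled++;
        /* Either way the card moves on, so the driver must too. */
        r->cur_rx = d->next_desc_logical;
    }
    assert(handled <= RING_SIZE);
    return handled;
}
```

A second call made before the card has filled another descriptor returns 0
and leaves cur_rx where it was, which corresponds to the "nothing to do"
case discussed earlier in the thread.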

/me feels enlightened
--
vda
