Re: Fw: [Bugme-new] [Bug 4628] New: Test server hang while running rhr (network) test on RHEL4 with kernel 2.6.12-rc1-mm4
From: Andrew Morton <hidden>
Date: 2005-05-26 20:31:23
Jian Jun He [off-list ref] wrote:
2. Download rhr2-rhel4-1.0-14a.noarch.rpm from rhn.redhat.com and install it on the test machine.
3. Configure and run the rhr test via invoking redhat-ready.
This is the problematic bit.

- Please provide a full URL which can be used to obtain rhr. rhn.redhat.com is subscription-based.
- Please describe the hardware setup - surely the test requires at least two machines. How are they configured?
- Provide an exact transcript of the commands which are to be used. Is it just redhat-ready with no arguments?

All that being said, we already have a quite specific diagnosis via code inspection, from Herbert:

Herbert Xu [off-list ref] wrote:
Andrew Morton [off-list ref] wrote:
Might be a bug in the e100 driver, might not be. I assume this is the

	BUG_ON(skb->list != NULL);

It certainly is a bug in e100.

e100_tx_timeout -> e100_down -> e100_rx_clean_list

is racing against

e100_poll -> e100_rx_clean -> e100_rx_indicate

e100_rx_clean/e100_rx_indicate takes an skb off the RX ring and while it's being processed e100_rx_clean_list comes along and frees it.

From a quick check similar problems may exist in other drivers that have lockless ->poll() functions with RX rings.
Do the e100 maintainers agree with this diagnosis? If so then more testing isn't required at this stage - the next step is to fix the above bug, no?
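
For illustration only, the race Herbert describes can be modelled in user space. This is a rough sketch, not e100 code and not a proposed patch: struct buf, struct rx_ring, ring_take() and ring_clean_list() are hypothetical names standing in for the skb, the RX ring, the poll path and the teardown path. It assumes one way of closing the window, namely that the poll path detaches the buffer from the ring under a lock before processing it, so that the teardown path can only free buffers which are still attached.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

/* Hypothetical stand-in for an sk_buff; not the kernel structure. */
struct buf {
	int payload;
};

/* Hypothetical RX ring: one lock serializes poll against teardown. */
struct rx_ring {
	pthread_mutex_t lock;
	struct buf *slot[RING_SIZE];
};

/* Fill every slot with a freshly allocated buffer. */
static void ring_init(struct rx_ring *r)
{
	size_t i;

	pthread_mutex_init(&r->lock, NULL);
	for (i = 0; i < RING_SIZE; i++) {
		r->slot[i] = malloc(sizeof(struct buf));
		if (r->slot[i])
			r->slot[i]->payload = (int)i;
	}
}

/*
 * Poll path: detach the buffer from the ring while holding the lock,
 * so the teardown path can no longer see it. The caller owns the
 * returned buffer and must free it after processing.
 */
static struct buf *ring_take(struct rx_ring *r, size_t i)
{
	struct buf *b;

	pthread_mutex_lock(&r->lock);
	b = r->slot[i];
	r->slot[i] = NULL;
	pthread_mutex_unlock(&r->lock);
	return b;
}

/*
 * Teardown path: free only the buffers still attached to the ring.
 * Returns the number of buffers freed.
 */
static int ring_clean_list(struct rx_ring *r)
{
	int freed = 0;
	size_t i;

	pthread_mutex_lock(&r->lock);
	for (i = 0; i < RING_SIZE; i++) {
		if (r->slot[i]) {
			free(r->slot[i]);
			r->slot[i] = NULL;
			freed++;
		}
	}
	pthread_mutex_unlock(&r->lock);
	return freed;
}
```

In this model a buffer obtained from ring_take() stays valid across a concurrent ring_clean_list(), which is the property the lockless ->poll() path lacks. Whether a lock, disabling polling during teardown, or something else is the right answer for e100 is for the maintainers to decide.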