Re: Fw: [Bugme-new] [Bug 4628] New: Test server hang while running rhr (network) test on RHEL4 with kernel 2.6.12-rc1-mm4
From: Jian Jun He <hidden>
Date: 2005-05-27 06:18:07
hello,

I re-tested the problem on 2.6.12-rc5 with the mm1 patch. The test server works fine throughout the test procedure, so I will close this defect in bugme. Thanks to all of you for your attention to this defect.

To Andrew Morton:
1) If you are registered at rhn.redhat.com, you can search for the package "rhr2" and download it there. You can also download rhr2 from the following link:
http://people.redhat.com/rlandry/rhr2/test/1.0-17beta/rhr2-1.0-17beta.noarch.rpm
2) The attachments are the conf files that I used for the rhr2 test. (See attached file: hardware.conf)(See attached file: rhr.conf)(See attached file: system.conf)(See attached file: tests.conf)
3) Invoking redhat-ready with no arguments is fine.

Best Regards!
Jian Jun He
CSDL, Beijing
Email: hejianj@cn.ibm.com

Andrew Morton [off-list ref]
2005-05-27 04:31
To: Jian Jun He/China/Contr/IBM@IBMCN
cc: ganesh.venkatesan@intel.com, anton@samba.org, Dang En Ren/China/IBM@IBMCN, ganesh.venkatesan@gmail.com, herbert@gondor.apana.org.au, jesse.brandeburg@intel.com, jgarzik@pobox.com, Jia Sen Wang/China/IBM@IBMCN, john.ronciak@intel.com, Lei CDL Wang/China/Contr/IBM@IBMCN, linuxppc64-dev@lists.linuxppc.org.s gi.com, netdev@oss.sgi.com
Subject: Re: Fw: [Bugme-new] [Bug 4628] New: Test server hang while running rhr (network) test on RHEL4 with kernel 2.6.12-rc1-mm4

Jian Jun He [off-list ref] wrote:
2. Download rhr2-rhel4-1.0-14a.noarch.rpm from rhn.redhat.com and install it on the test machine.
3. Configure and run the rhr test via invoking redhat-ready.
This is the problematic bit.
- Please provide a full URL which can be used to obtain rhr. rhn.redhat.com is subscription-based.
- Please describe the hardware setup - surely the test requires at least two machines. How are they configured?
- Provide an exact transcript of the commands which are to be used. Is it just

      redhat-ready

  with no arguments?
All that being said, we already have a quite specific diagnosis via code inspection, from Herbert:
Herbert Xu [off-list ref] wrote:
Andrew Morton [off-list ref] wrote:
Might be a bug in the e100 driver, might not be. I assume this is the BUG_ON(skb->list != NULL);

It certainly is a bug in e100. e100_tx_timeout -> e100_down -> e100_rx_clean_list is racing against e100_poll -> e100_rx_clean -> e100_rx_indicate.

e100_rx_clean/e100_rx_indicate takes an skb off the RX ring, and while it is being processed, e100_rx_clean_list comes along and frees it.

From a quick check, similar problems may exist in other drivers that have lockless ->poll() functions with RX rings.
Do the e100 maintainers agree with this diagnosis? If so then more testing isn't required at this stage - the next step is to fix the above bug, no?
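The race Herbert describes (the poll path holding a buffer it took off the RX ring while the down path frees everything it still believes is on the ring) can be illustrated with a small userspace model. This is a hypothetical sketch, not e100 code: `struct ring`, `poll_thread`, `clean_list_thread` and `run_race_model` are invented names, and a pthread mutex stands in for whatever serialization a real kernel fix would use (a driver fix might instead quiesce the poll routine before cleaning the ring). The point it shows is ownership handoff: the poll side detaches a slot under the lock before processing it, so the clean-up side can never see, and free, a buffer that is in flight.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of an RX ring; not the real e100 structures. */
#define RING_SIZE 64

struct buf {
    int free_count;                  /* times this buffer was "freed" */
};

struct ring {
    pthread_mutex_t lock;            /* serializes poll vs. clean_list */
    struct buf *slot[RING_SIZE];     /* NULL means the slot is empty */
};

static void buf_release(struct buf *b)
{
    /* Count instead of calling free() so a double free is detectable. */
    b->free_count++;
}

/* Poll side: detach the buffer from the ring under the lock, then process
 * it without the lock. The buggy pattern would read slot[i] but leave it
 * set, so clean_list could free the buffer while it is being processed. */
static void *poll_thread(void *arg)
{
    struct ring *r = arg;
    for (int i = 0; i < RING_SIZE; i++) {
        pthread_mutex_lock(&r->lock);
        struct buf *b = r->slot[i];
        r->slot[i] = NULL;           /* ownership moves to this thread */
        pthread_mutex_unlock(&r->lock);
        if (b)
            buf_release(b);          /* "process", then drop the buffer */
    }
    return NULL;
}

/* Down side: free only what is still on the ring, under the same lock. */
static void *clean_list_thread(void *arg)
{
    struct ring *r = arg;
    pthread_mutex_lock(&r->lock);
    for (int i = 0; i < RING_SIZE; i++) {
        if (r->slot[i]) {
            buf_release(r->slot[i]);
            r->slot[i] = NULL;
        }
    }
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Runs both sides concurrently `rounds` times and returns how many buffers
 * were not released exactly once (0 means no leak and no double free). */
int run_race_model(int rounds)
{
    int bad = 0;
    for (int n = 0; n < rounds; n++) {
        struct ring r;
        struct buf bufs[RING_SIZE] = {0};
        pthread_t t1, t2;

        pthread_mutex_init(&r.lock, NULL);
        for (int i = 0; i < RING_SIZE; i++)
            r.slot[i] = &bufs[i];

        pthread_create(&t1, NULL, poll_thread, &r);
        pthread_create(&t2, NULL, clean_list_thread, &r);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        pthread_mutex_destroy(&r.lock);

        for (int i = 0; i < RING_SIZE; i++)
            if (bufs[i].free_count != 1)
                bad++;
    }
    return bad;
}
```

Calling `run_race_model(200)` should return 0 under any interleaving, because each slot is cleared by exactly one side while the lock is held.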
Attachments
- hardware.conf [application/octet-stream] 1672 bytes
- rhr.conf [application/octet-stream] 509 bytes
- system.conf [application/octet-stream] 226 bytes
- tests.conf [application/octet-stream] 1253 bytes