Thread: 6 messages, 3 authors, 2005-05-27

Re: Fw: [Bugme-new] [Bug 4628] New: Test server hang while running rhr (network) test on RHEL4 with kernel 2.6.12-rc1-mm4

From: Jian Jun He <hidden>
Date: 2005-05-27 06:18:07


Hello,

I retested the problem on 2.6.12-rc5 with the mm1 patch. The test server works
fine during the test procedure, so I will close this defect in bugme.

Thank you all for your attention to this defect.


To Andrew Morton:

1) If you are registered at rhn.redhat.com, you can search for the package
"rhr2" and download it from there.
You can also download rhr2 from the following link:
http://people.redhat.com/rlandry/rhr2/test/1.0-17beta/rhr2-1.0-17beta.noarch.rpm

2) The attachments are the conf files that I used for the rhr2 test.

(See attached file: hardware.conf) (See attached file: rhr.conf)
(See attached file: system.conf) (See attached file: tests.conf)

3) Invoking redhat-ready with no arguments is fine.

Best Regards!

Jian Jun He

CSDL, Beijing
Email: hejianj@cn.ibm.com



From: Andrew Morton [off-list ref]
Date: 2005-05-27 04:31
To: Jian Jun He/China/Contr/IBM@IBMCN
Cc: ganesh.venkatesan@intel.com, anton@samba.org,
    Dang En Ren/China/IBM@IBMCN, ganesh.venkatesan@gmail.com,
    herbert@gondor.apana.org.au, jesse.brandeburg@intel.com,
    jgarzik@pobox.com, Jia Sen Wang/China/IBM@IBMCN,
    john.ronciak@intel.com, Lei CDL Wang/China/Contr/IBM@IBMCN,
    linuxppc64-dev@lists.linuxppc.org.sgi.com, netdev@oss.sgi.com
Subject: Re: Fw: [Bugme-new] [Bug 4628] New: Test server hang while running
    rhr (network) test on RHEL4 with kernel 2.6.12-rc1-mm4




Jian Jun He [off-list ref] wrote:
 2. Download rhr2-rhel4-1.0-14a.noarch.rpm from rhn.redhat.com and install
    it on the test machine.
 3. Configure and run the rhr test via invoking redhat-ready.

This is the problematic bit.

- Please provide a full URL which can be used to obtain rhr.
  rhn.redhat.com is subscription-based.

- Please describe the hardware setup - surely the test requires at least
  two machines.  How are they configured?

- Provide an exact transcript of the commands which are to be used.  Is
  it just

             redhat-ready

  with no arguments?



All that being said, we already have a quite specific diagnosis via code
inspection, from Herbert:


Herbert Xu [off-list ref] wrote:
Andrew Morton [off-list ref] wrote:
Might be a bug in the e100 driver, might not be.

I assume this is the

       BUG_ON(skb->list != NULL);

It certainly is a bug in e100.

e100_tx_timeout -> e100_down -> e100_rx_clean_list

is racing against

e100_poll -> e100_rx_clean -> e100_rx_indicate

e100_rx_clean/e100_rx_indicate takes an skb off the RX ring and
while it's being processed e100_rx_clean_list comes along and
frees it.

From a quick check similar problems may exist in other drivers that
have lockless ->poll() functions with RX rings.

Do the e100 maintainers agree with this diagnosis?  If so then more testing
isn't required at this stage - the next step is to fix the above bug, no?
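
Editorial illustration: the interleaving Herbert describes can be replayed in
a small single-threaded userspace model. This is only a sketch and is not
taken from the e100 source. The names struct buf, struct rx_ring, poll_take,
poll_deliver, clean_list, replay_race and replay_ordered are all hypothetical,
and a "freed" flag stands in for actually freeing the buffer so the model can
observe the stale pointer safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4

/* Hypothetical stand-in for an skb. 'freed' replaces a real free(). */
struct buf {
    int id;
    bool freed;
};

/* Hypothetical RX ring: each slot points at a buffer or is NULL. */
struct rx_ring {
    struct buf *slot[RING_SIZE];
};

/* First half of the poll path: pick up the buffer, slot still points at it. */
struct buf *poll_take(struct rx_ring *ring, size_t i)
{
    assert(i < RING_SIZE);
    return ring->slot[i];
}

/* Second half of the poll path: detach the slot and hand the buffer on.
 * Returns false if the buffer was released while poll was holding it. */
bool poll_deliver(struct rx_ring *ring, size_t i, struct buf *b)
{
    assert(i < RING_SIZE);
    ring->slot[i] = NULL;
    return !b->freed;
}

/* Teardown path: release every buffer still attached to the ring. */
void clean_list(struct rx_ring *ring)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        if (ring->slot[i] != NULL) {
            ring->slot[i]->freed = true;
            ring->slot[i] = NULL;
        }
    }
}

static void ring_fill(struct rx_ring *ring, struct buf *bufs)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        bufs[i].id = (int)i;
        bufs[i].freed = false;
        ring->slot[i] = &bufs[i];
    }
}

/* Teardown runs between the two halves of poll. Returns true if poll
 * ended up holding a buffer that teardown had already released. */
bool replay_race(void)
{
    struct buf bufs[RING_SIZE];
    struct rx_ring ring;

    ring_fill(&ring, bufs);
    struct buf *in_flight = poll_take(&ring, 0);
    clean_list(&ring);
    return !poll_deliver(&ring, 0, in_flight);
}

/* Same steps, but teardown only runs after poll has finished. */
bool replay_ordered(void)
{
    struct buf bufs[RING_SIZE];
    struct rx_ring ring;

    ring_fill(&ring, bufs);
    struct buf *in_flight = poll_take(&ring, 0);
    bool stale = !poll_deliver(&ring, 0, in_flight);
    clean_list(&ring);
    return stale;
}
```

In this model replay_race() reports a stale buffer and replay_ordered() does
not, which suggests that any fix has to keep the teardown path from running
while the poll path still holds a buffer. How the driver should enforce that
is not settled in this thread.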
