Thread: 19 messages, 5 authors, 2008-03-29

Re: kernel 2.6.25-rc7 highly unstable on high load

From: Denys Fedoryshchenko <hidden>
Date: 2008-03-28 07:38:54
Also in: netfilter-devel

Already patched and tested, it doesn't change anything.

On Fri, 28 Mar 2008 06:49:53 +0100, Eric Dumazet wrote
Denys Fedoryshchenko wrote:
Just to make sure 2.6.24.3 is stable and this is a regression, I am supplying
output from it.
Do you want me to submit a summary to bugzilla and the regression list as well?

In short, IMHO 2.6.25 has major routing issues that have to be fixed before
release. TRIE is crashing, and even with HASH there is a leak. I am trying my
best to bisect it, but this is a major router and I cannot take much risk with
it, so I wish I could simulate it in my home mini-lab. I still have not been
able to get even a proper switch (Lebanon is a difficult country for IT).

Kup ~ # uname -a
Linux Kup 2.6.24.3-build-0023 #3 SMP Sat Mar 8 13:01:35 EET 2008 i686 unknown
Kup ~ # rtstat -i60 -c6000
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
   54750|    4430|    1128|       0|      12|       0|       0|       0|     263|     190|       0|     709|     708|       0|       0|    3545|     313|
   92913|    8829|    1211|       0|       1|       0|       0|       0|     343|     163|       0|    1375|    1373|       0|       0|   12545|     724|
  115323|    8232|     906|       0|       0|       0|       0|       0|     299|     128|       0|    1035|    1033|       0|       0|   18069|     813|
  128985|    8650|     839|       0|       0|       0|       0|       0|     289|     115|       0|     954|     952|       0|       0|   22515|     845|
  116682|    8911|     861|       0|       0|       0|       0|       0|     288|     117|       0|     978|     976|       0|       0|   23433|     775|
   99969|    9164|     889|       0|       0|       0|       0|       0|     280|     113|       0|    1002|    1000|       0|       0|   26741|     839|
  124602|    9395|    1012|       0|       0|       0|       0|       0|     271|     122|       0|    1134|    1132|       0|       0|   27381|     787|
  110051|   10036|     824|       0|       0|       0|       0|       0|     279|     120|       0|     944|     942|       0|       0|   28558|     783|
  126835|   10631|     772|       0|       0|       0|       0|       0|     274|     117|       0|     888|     886|       0|       0|   29451|     780|
  111881|   10357|     762|       0|       0|       0|       0|       0|     275|     117|       0|     879|     877|       0|       0|   28235|     751|
  127018|   10178|     796|       0|       0|       0|       0|       0|     283|     117|       0|     913|     911|       0|       0|   29480|     807|
  112242|    9839|     814|       0|       0|       0|       0|       0|     293|     115|       0|     929|     927|       0|       0|   28095|     796|
   41267|    9493|    1217|       0|       1|       0|       0|       0|     269|     138|       0|     811|     810|       0|       0|   18545|     548|
   76380|    9722|    1060|       0|       1|       0|       0|       0|     250|     135|       0|    1195|    1193|       0|       0|   14786|     414|
   99922|    9811|     779|       0|       0|       0|       0|       0|     281|     124|       0|     902|     900|       0|       0|   21853|     589|
Kup ~ # rtstat -i60 -c6000
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
  122053|  150955|   14888|       0|      25|       1|       0|       0|    4611|    2090|       0|   15820|   15789|       0|       0|  369513|   11562|
  105226|   10215|     872|       0|       0|       0|       0|       0|     279|     116|       0|     988|     986|       0|       0|   30343|     799|
  126236|   10462|     924|       0|       0|       0|       0|       0|     260|     120|       0|    1044|    1042|       0|       0|   31699|     782|
  114492|    9782|     884|       0|       0|       0|       0|       0|     253|     120|       0|    1005|    1003|       0|       0|   29695|     722|
After ip route flush cache
Kup ~ # rtstat -i60 -c6000
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
 entries|  in_hit|in_slow_|in_slow_|in_no_ro|  in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
        |        |     tot|      mc|     ute|        |  an_dst|  an_src|        |    _tot|     _mc|        |      ed|    miss| verflow| _search|t_search|
    9088|  202136|   19262|       0|      29|       1|       0|       0|    5976|    2696|       0|   20647|   20606|       0|       0|  521714|   15415|

!!!!!
I am not wrong, ip route flush cache doesn't work on 2.6.25-rc7. I will
make sure about that now.
  
Maybe you are a little bit too fast for "ip route flush cache" :)

It used to work like this: schedule a timer to start a flush in
about 2 seconds. A flush means: scan the whole table and delete
all entries.

On machines with 4 million dst entries, this took too much
time and eventually crashed.

On recent kernels, each rtable entry has a special field named
rt_genid, so that "ip route flush cache" doesn't have to scan the
whole table, but only changes the global genid. rtable entries will
be deleted later, when their rt_genid is found to be different from
the global genid.
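
To make the difference between the two flush styles concrete, here is a small toy model in Python. It is only a sketch of the idea described above, not kernel code: the class RouteCache, its methods, and the dictionary layout are made up for illustration, and only the notion of a per-entry genid compared against a global genid comes from the explanation.

```python
# Toy model of a route cache with generation-based flushing.
# All names (RouteCache, flush_scan, flush_genid, lookup) are
# hypothetical and chosen for illustration; this is not kernel code.

class RouteCache:
    def __init__(self):
        self.genid = 0      # global generation id
        self.entries = {}   # key -> (genid at insert time, route)

    def insert(self, key, route):
        # Each entry remembers the generation it was created in.
        self.entries[key] = (self.genid, route)

    def flush_scan(self):
        """Old style: walk the whole table and delete every entry (O(n))."""
        for key in list(self.entries):
            del self.entries[key]

    def flush_genid(self):
        """New style: only bump the global genid (O(1)); nothing is deleted yet."""
        self.genid += 1

    def lookup(self, key):
        """Return the cached route, dropping the entry if it is stale."""
        item = self.entries.get(key)
        if item is None:
            return None
        entry_genid, route = item
        if entry_genid != self.genid:
            del self.entries[key]   # stale entry: removed on first touch
            return None
        return route


cache = RouteCache()
cache.insert("10.0.0.1", "eth0")
cache.insert("10.0.0.2", "eth1")

cache.flush_genid()
print(len(cache.entries))        # 2: stale entries are still in the table
print(cache.lookup("10.0.0.1"))  # None: the stale entry is rejected and removed
print(len(cache.entries))        # 1
```

In this toy model the entry count does not drop at the moment of the flush; it only shrinks as stale entries are touched. If the real cache behaves similarly, a counter read right after the flush could still show old entries.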

Please try the patch that was suggested yesterday, as it is probably 
the cure your router needs.

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=7c0ecc4c4f8fd90988aab8a95297b9c0038b6160
Thank you

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.