Re: TCP sacked_out and fackets_out inconsistency (Was: Re: BUG: unable to handle kernel NULL pointer dereference at 000000000000002c)
From: Stefan Priebe - Profihost AG <hidden>
Date: 2012-02-08 08:26:42
Hi Eric,

On 06.02.2012 13:47, Ilpo Järvinen wrote:
>> Any idea about that? Is it due to my custom patch being buggy, or is it
>> something you know to be missing in 3.0.X too?
>
> This warning is known to trigger every now and then...
>> That's the check in tcp_fastretrans_alert():
>>
>>     if (WARN_ON(!tp->sacked_out && tp->fackets_out))
>>         tp->fackets_out = 0;
>>
>> I don't know if some recent patch addressed this issue.
>
> ...the recent fix from Neal to pick the correct MSS might fix this, but it
> is of course hard to confirm for sure (we'll see it indirectly eventually
> if these rare splats stop appearing). If one had infinite time, it would
> be quite simple to see whether changing the MSS setup triggers this and
> whether Neal's fix helped; however, I don't consider this particular
> inconsistency worth the effort.
>
> ...What I can say for sure is that at least
>
>     tp->fackets_out -= min(pkts_acked, tp->fackets_out);
>
> seems to fail when pkts_acked (u32) underflows due to the MSS badness we
> used to have. So it could actually solve this for real.
>
> The effects of this counter inconsistency are not that devastating.
> fackets_out mainly affects when recovery is triggered and which segments
> are marked lost in the recovery itself. Two extremes I can think of:
>
> - recovery not triggered => RTO triggers and everyone is happy except
>   some researcher who finds that odd and unwanted and needs to fix
>   it :-);
> - recovery in progress but working too far ahead, as if dupthresh
>   (tp->reordering) were slightly smaller (if in-order behavior in the
>   network is assumed this is still fully safe; dupthresh is there to
>   help in cases of minor reordering).
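The underflow described in the quoted text can be illustrated outside the kernel. The following is only a minimal userspace sketch, not kernel code: struct sack_counters, pkts_acked_delta(), ack_update(), check_and_repair() and demo() are hypothetical names, and it assumes pkts_acked is derived as the difference of two unsigned packet counts, which is how a u32 could wrap. The numbers in demo() are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two counters; NOT the kernel's struct tcp_sock. */
struct sack_counters {
	uint32_t sacked_out;   /* roughly: segments SACKed by the receiver */
	uint32_t fackets_out;  /* roughly: segments up to the highest SACKed one */
};

/* Assumption: the acked count is a difference of two unsigned counts.
 * Unsigned subtraction wraps modulo 2^32 when after > before. */
uint32_t pkts_acked_delta(uint32_t pcount_before, uint32_t pcount_after)
{
	return pcount_before - pcount_after;
}

/* Same shape as the quoted update:
 * fackets_out -= min(pkts_acked, fackets_out); */
void ack_update(struct sack_counters *c, uint32_t pkts_acked)
{
	uint32_t dec = pkts_acked < c->fackets_out ? pkts_acked : c->fackets_out;

	c->fackets_out -= dec;
}

/* Same shape as the quoted check; returns 1 if it had to repair the counter. */
int check_and_repair(struct sack_counters *c)
{
	if (!c->sacked_out && c->fackets_out) {
		c->fackets_out = 0;
		return 1;
	}
	return 0;
}

/* One wrapped-delta scenario with made-up numbers. */
void demo(void)
{
	struct sack_counters c = { .sacked_out = 3, .fackets_out = 5 };
	uint32_t acked = pkts_acked_delta(2, 3); /* wraps to 4294967295 */

	ack_update(&c, acked);
	printf("pkts_acked=%u sacked_out=%u fackets_out=%u\n",
	       (unsigned)acked, (unsigned)c.sacked_out, (unsigned)c.fackets_out);
}
```

With these made-up values the wrapped count is clamped by min(), so fackets_out is zeroed in one step while sacked_out stays at 3, i.e. the two counters no longer agree. check_and_repair() shows the opposite mismatch that the WARN_ON guards against (sacked_out == 0 with fackets_out != 0). Whether this is the exact path behind the splats in the field is not established here.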

What do you think about this? Can anybody give me the commit ID?

Stefan