Thread: 50 messages, 2 authors, 2021-05-17

Re: nvme tcp receive errors

From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-05-13 19:55:45


On 5/13/21 8:48 AM, Keith Busch wrote:
On Tue, May 11, 2021 at 10:17:09AM -0700, Sagi Grimberg wrote:
I may have a theory about this issue. I think that the problem is in
cases where we send commands with data to the controller and then in
nvme_tcp_send_data between the last successful kernel_sendpage
and before nvme_tcp_advance_req, the controller sends back a successful
completion.

If that is the case, then the completion path could be triggered,
the tag would be reused, triggering a new .queue_rq, setting again
the req.iter with the new bio params (none of this is done under the
send_mutex) and then the send context would call nvme_tcp_advance_req
progressing the req.iter with the former sent bytes... And given that
the req.iter is used for reads/writes, it is possible that it can
explain both issues.
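
The sequence above can be replayed deterministically in a toy model. This
is only a sketch of the ordering described in the theory, not driver code:
the Request class, its offset field (standing in for req.iter) and
simulate_stale_advance are hypothetical names made up for illustration, and
the three contexts are run one after another instead of concurrently.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Toy stand-in for a per-tag request; 'offset' plays the role of req.iter."""
    tag: int
    length: int      # bytes this request has to transfer
    offset: int = 0  # bytes the iterator has been advanced by

    def init_iter(self, length: int) -> None:
        # Models .queue_rq setting up the iterator for a new bio.
        self.length = length
        self.offset = 0

    def advance(self, nbytes: int) -> None:
        # Models the send context accounting for bytes it has sent.
        self.offset += nbytes


def simulate_stale_advance(old_len: int, new_len: int, last_chunk: int) -> Request:
    """Replay the suspected interleaving for a single reused tag."""
    req = Request(tag=7, length=old_len)
    # Everything except the last chunk was already sent and accounted for.
    req.advance(old_len - last_chunk)

    # 1. Send context: the last chunk goes out, but is not yet accounted for.
    sent = last_chunk

    # 2. Receive context: the controller completes the command, the tag is freed.
    # 3. Submission context: the tag is reused and the iterator is reset.
    req.init_iter(new_len)

    # 4. Send context resumes and accounts for bytes of the *old* command.
    req.advance(sent)
    return req


if __name__ == "__main__":
    req = simulate_stale_advance(old_len=8192, new_len=4096, last_chunk=4096)
    print(f"new request: offset={req.offset} of {req.length} "
          "before any of its own data was sent")
```

In this model the new request starts with its iterator already advanced by
the old command's last chunk, which is the kind of state that could affect
either a read or a write on the reused tag.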

While this is not easy to trigger, I think there is nothing that
can prevent it. The driver used to have a single context that
would do both send and recv so this could not have happened, but
now that we added the .queue_rq send context, I guess this can
indeed confuse the driver.
Awesome, this is exactly the type of sequence I've been trying to
capture, but couldn't quite get there. Now that you've described it,
that flow can certainly explain the observations, including the
corrupted debug trace event I was trying to add.

The sequence looks unlikely to happen, which agrees with the difficulty
in reproducing it. I am betting right now that you got it, but a little
surprised no one else is reporting a similar problem yet.
We had at least one report from Potnuri that I think may have been
triggered by this; it ended up fixed (or rather worked around)
with 5c11f7d9f843.
Your option "1" looks like the best one, IMO. I've requested dropping
all debug and test patches and using just this one on the current nvme
baseline for the next test cycle.
Cool, waiting to hear back...
This patch has been tested successfully on the initial workloads. There
are several more that need to be validated, but each one runs for many
hours, so it may be a couple more days before they complete. Just wanted to
let you know: so far, so good.
Encouraging... I'll send a patch for that as soon as you give me the
final verdict. I'm assuming Narayan would be the reporter and the
tester?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme