[PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
From: Chopra, Manish <hidden>
Date: 2018-03-24 14:30:28
Also in:
linux-arm-msm, lkml, netdev
-----Original Message----- From: Sinan Kaya [mailto:okaya at codeaurora.org] Sent: Friday, March 23, 2018 10:44 PM To: David Miller <davem@davemloft.net> Cc: netdev at vger.kernel.org; timur at codeaurora.org; sulrich at codeaurora.org; linux-arm-msm at vger.kernel.org; linux-arm-kernel at lists.infradead.org; Elior, Ariel [off-list ref]; Dept-Eng Everest Linux L2 <Dept- EngEverestLinuxL2 at cavium.com>; linux-kernel at vger.kernel.org Subject: Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly- ordered archs On 3/23/2018 1:04 PM, David Miller wrote:quoted
From: Sinan Kaya <redacted> Date: Fri, 23 Mar 2018 12:51:47 -0400quoted
It could if txdata->tx_db was not a union. There is a data dependency between txdata->tx_db.data.prod and txdata->tx_db.raw. So, no reordering.I don't see it that way, the code requires that: txdata->tx_db.data.prod += nbd; is visible before the doorbell update.> barrier() doesn't provide that. Neither does writel_relaxed(). However plain writel() does.Correct for some architectures including ARM but not correct universally. writel() just guarantees register read/writes before and after to be ordered when HW observes it. writel() doesn't guarantee that the memory update is visible to the HW on all architectures. If you need memory update visibility, that barrier() should have been a wmb() A correct multi-arch pattern is wmb() writel_relaxed() mmiowb()
Sinan, Since you have mentioned the use of mmiowb() here after writel_relaxed(). I believe this is not always correct for all types of IO mapped memory [Specially if IO memory is mapped using write combined (for ex. Ioremap_wc())]. We have a current issue on our NIC (qede) driver on x86 for which the patch is already been sent more than a week ago [Still awaiting to hear from David on that]. where mmiowb() seems to be useless since we use write combined mapped doorbell and mmiowb() just seems to be a compiler barrier() there. So in order to flush write combined buffer we really need writel_relaxed() followed by a wmb() to synchronize writes among CPU cores. I think the correct pattern in such cases (for write combined IO) would have been like below - wmb(); writel_relaxed(); wmb(); -> To flush the writes actually. Thanks.