Thread (15 messages) 15 messages, 3 authors, 2018-03-24

[PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs

From: Chopra, Manish <hidden>
Date: 2018-03-24 14:30:28
Also in: linux-arm-msm, lkml, netdev

-----Original Message-----
From: Sinan Kaya [mailto:okaya at codeaurora.org]
Sent: Friday, March 23, 2018 10:44 PM
To: David Miller <davem@davemloft.net>
Cc: netdev at vger.kernel.org; timur at codeaurora.org; sulrich at codeaurora.org;
linux-arm-msm at vger.kernel.org; linux-arm-kernel at lists.infradead.org; Elior,
Ariel [off-list ref]; Dept-Eng Everest Linux L2 <Dept-
EngEverestLinuxL2 at cavium.com>; linux-kernel at vger.kernel.org
Subject: Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-
ordered archs

On 3/23/2018 1:04 PM, David Miller wrote:
quoted
From: Sinan Kaya <redacted>
Date: Fri, 23 Mar 2018 12:51:47 -0400
quoted
It could if txdata->tx_db was not a union. There is a data dependency
between txdata->tx_db.data.prod and txdata->tx_db.raw.

So, no reordering.
I don't see it that way, the code requires that:

 	txdata->tx_db.data.prod += nbd;

is visible before the doorbell update.>
barrier() doesn't provide that.

Neither does writel_relaxed().  However plain writel() does.
Correct for some architectures including ARM but not correct universally.

writel() just guarantees register read/writes before and after to be ordered
when HW observes it.

writel() doesn't guarantee that the memory update is visible to the HW on all
architectures.

If you need memory update visibility, that barrier() should have been a
wmb()

A correct multi-arch pattern is

wmb()
writel_relaxed()
mmiowb()
Sinan,  Since you have mentioned the use of mmiowb() here after writel_relaxed().
I believe this is not always correct for all types of IO mapped memory [Specially if IO memory is mapped using write combined (for ex. Ioremap_wc())].
We have a current issue on our NIC (qede) driver on x86 for which the patch is already been sent more than a week ago [Still awaiting to hear from David on that].
where mmiowb() seems to be useless since we use write combined mapped doorbell and mmiowb() just seems to be a compiler barrier() there.
So in order to flush write combined buffer we really need writel_relaxed() followed by a wmb() to synchronize writes among CPU cores.
I think  the correct pattern in such cases (for write combined IO) would have been like below - 

wmb();
writel_relaxed();
wmb(); -> To flush the writes actually.

Thanks.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help