Thread: 56 messages, 12 authors, 2021-06-09

Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

From: Palmer Dabbelt <palmer@dabbelt.com>
Date: 2021-06-03 04:14:47
Also in: linux-riscv, linux-sunxi, lkml

On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote:
On Wed, May 19, 2021 at 3:15 PM Anup Patel [off-list ref] wrote:
On Wed, May 19, 2021 at 12:24 PM Drew Fustini [off-list ref] wrote:
On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
Since the existing RISC-V ISA cannot solve this problem, it is better
to provide some configuration for the SOC vendor to customize.
We've been talking about this problem for close to five years.  So no,
if you don't manage to get the feature into the ISA it can't be
supported.
Isn't it a good goal for Linux to support the capabilities present in
the SoCs that are currently being fab'd?

I believe the CMO group only started last year [1] so the RV64GC SoCs
that are going into mass production this year would not have had the
opportunity of utilizing any RISC-V ISA extension for handling cache
management.
The current Linux RISC-V policy is to only accept patches for frozen or
ratified ISA specs.
(Refer to Documentation/riscv/patch-acceptance.rst)

This means even if we emulate CMO instructions in OpenSBI, the Linux
patches won't be taken by Palmer because CMO specification is
still in draft stage.
Before CMO specification release, could we use a sbi_ecall to solve
the current problem? This is not against the specification, when CMO
is ready we could let users choose to use the new CMO in Linux.

From a tech view, CMO trap emulation is the same as sbi_ecall.
Also, we all know how much time it takes for RISC-V International
to freeze some spec. Judging by that we are looking at another
3-4 years at minimum.
Sorry for being slow here, this thread got buried.

I've been trying to work with a handful of folks at the RISC-V
foundation to try and get a subset of the various in-development
specifications (some simple CMOs, something about non-caching in the
page tables, and some way to prevent speculative accesses from generating
coherence traffic that will break non-coherent systems).  I'm not sure
we can get this together quickly, but I'd prefer to at least try before
we jump to taking vendor-specified behavior here.  It's obviously an
up-hill battle to try and get specifications through the process and I'm
certainly not going to promise it will work, but I'm hoping that the
impending need to avoid forking the ISA will be sufficient to get people
behind producing some specifications in a timely fashion.

I wasn't aware that this chip had non-coherent devices until I saw this
thread, so we'd been mostly focused on the Beagle V chip.  That was in a
sense an easier problem because the SiFive IP in it was never designed
to have non-coherent devices so we'd have to make anything work via a
series of slow workarounds, which would make emulating the eventually
standardized behavior reasonable in terms of performance (ie, everything
would be super slow so who really cares).

I don't think relying on some sort of SBI call for the CMOs would be
such a performance hit that it would prevent these systems from being
viable, but assuming you have reasonable performance on your non-cached
accesses then that's probably not going to be viable to trap and
emulate.  At that point it really just becomes silly to pretend that
we're still making things work by emulating the eventually ratified
behavior, as anyone who actually tries to use this thing to do IO would
need out of tree patches.  I'm not sure exactly what the plan is for the
page table bits in the specification right now, but if you can give me a
pointer to some documentation then I'm happy to try and push for
something compatible.

If we can't make the process work at the foundation then I'd be strongly
in favor of just biting the bullet and starting to take vendor-specific
code that's been implemented in hardware and is necessary to make
things work acceptably.  That's obviously a sub-optimal solution as
it'll lead to a bunch of ISA fragmentation, but at least we'll be able
to keep the software stack together.

Can you tell us when these will be in the hands of users?  That's pretty
important here, as I don't want to be blocking real users from having
their hardware work.  IIRC there were some plans to distribute early
boards, but it looks like the foundation got involved and I guess I lost
the thread at that point.

Sorry this is all such a headache, but hopefully we can get things
sorted out.
I talked with some of the RISC-V foundation folks, we're not going to 
have an ISA specification for the non-coherent stuff any time soon.  I 
took a look at this code and I definitely don't want to take it as is, 
but I'm not opposed to taking something that makes the hardware work as 
long as it's a lot cleaner.  We've already got two of these non-coherent 
chips, I'm sure more will come, and I'd rather have the extra headaches 
than make everyone fork the software stack.

After talking to Atish it looks like there's likely to be an SBI 
extension to handle the CMOs, which should let us avoid the bulk of the 
vendor-specific behavior in the kernel.  I know some people are worried 
about adding to the SBI surface.  I'm worried about that too, but that's 
way better than sticking a bunch of vendor-specific instructions into 
the kernel.  The SBI extension should make for a straight-forward cache 
flush implementation in Linux, so let's just plan on that getting 
through quickly (as has been done before).
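
To make the idea concrete, here is a minimal user-space sketch of what a
cache flush routed through a firmware call might look like.  No such SBI
extension had been specified at this point, so the extension ID, the
function numbers, the 64-byte line size, and every name below are
placeholders made up for illustration.  The firmware call is replaced by
a mock that only records its arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL: placeholder extension ID and function numbers. */
#define SBI_EXT_CMO_HYPOTHETICAL 0x0A000000UL
enum cmo_op { CMO_CLEAN = 0, CMO_INVALIDATE = 1, CMO_FLUSH = 2 };

#define CACHE_LINE 64UL /* assumed cache line size in bytes */

struct cmo_request {
    unsigned long ext, fid, start, size;
};

/* Records the most recent request so the sketch can be inspected. */
static struct cmo_request last_request;

/*
 * Stand-in for the firmware call.  A real kernel would trap into the
 * SBI implementation here; this mock just stores the arguments.
 */
static long mock_sbi_ecall(unsigned long ext, unsigned long fid,
                           unsigned long start, unsigned long size)
{
    last_request = (struct cmo_request){ ext, fid, start, size };
    return 0;
}

/* Widen [addr, addr + len) to whole cache lines, then make one call. */
static long dma_cache_op(enum cmo_op op, unsigned long addr,
                         unsigned long len)
{
    unsigned long start, end;

    if (len == 0)
        return 0;

    start = addr & ~(CACHE_LINE - 1);
    end = (addr + len + CACHE_LINE - 1) & ~(CACHE_LINE - 1);

    return mock_sbi_ecall(SBI_EXT_CMO_HYPOTHETICAL, (unsigned long)op,
                          start, end - start);
}
```

Passing the whole range in a single call, rather than one call per cache
line, is a design choice of this sketch: it keeps the number of traps
into firmware independent of the buffer size.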

Unfortunately we've yet to come up with a way to handle the 
non-cacheable mappings without introducing a degree of vendor-specific 
behavior or seriously impacting performance (mark them as not valid and 
deal with them in the trap handler).  I'm not really sure it counts as 
supporting the hardware if it's massively slow, so that really leaves us 
with vendor-specific mappings as the only option to make these chips 
work.

This implementation, which adds some Kconfig entries that control page 
table bits, definitely isn't suitable for upstream.  Allowing users to 
set arbitrary page table bits will eventually conflict with the 
standard, and is just going to be a mess.  It'll also lead to kernels 
that are only compatible with specific designs, which we're trying very 
hard to avoid.  At a bare minimum we'll need some way to detect systems 
with these page table bits before setting them, and some description of 
what the bits actually do so we can reason about them.
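
One possible shape for "detect first, then set" is a lookup table that
maps an identified implementation to the page table bits it understands,
with unknown implementations getting no extra bits at all.  The sketch
below assumes detection is keyed on a vendor ID (for example the value
of the mvendorid CSR), which this thread does not specify.  The vendor
ID value and the bit position are placeholders and do not describe any
real chip.

```c
#include <stddef.h>
#include <stdint.h>

struct pte_quirk {
    uint64_t vendor_id;     /* HYPOTHETICAL vendor identifier */
    uint64_t noncache_bits; /* PTE bits meaning "non-cacheable" there */
};

/* HYPOTHETICAL table: one made-up vendor using a made-up PTE bit. */
static const struct pte_quirk quirks[] = {
    { 0x1234, 1ULL << 62 },
};

/* Return the vendor's non-cacheable bits, or 0 if the vendor is unknown. */
static uint64_t noncache_bits_for(uint64_t vendor_id)
{
    size_t i;

    for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
        if (quirks[i].vendor_id == vendor_id)
            return quirks[i].noncache_bits;
    }
    return 0;
}

/* Build a PTE for a DMA mapping; unknown hardware gets the PTE unchanged. */
static uint64_t make_dma_pte(uint64_t pte, uint64_t vendor_id)
{
    return pte | noncache_bits_for(vendor_id);
}
```

Because the bits are chosen at run time, one kernel image could boot on
both kinds of system, which a build-time Kconfig value cannot offer.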