Thread (56 messages), 12 authors, 2021-06-09

RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

From: Anup Patel <hidden>
Date: 2021-06-07 03:38:29
Also in: linux-riscv, linux-sunxi, lkml

-----Original Message-----
From: Guo Ren <guoren@kernel.org>
Sent: 06 June 2021 22:42
To: Anup Patel <redacted>; Atish Patra <redacted>
Cc: Palmer Dabbelt <palmer@dabbelt.com>; anup@brainfault.org;
drew@beagleboard.org; Christoph Hellwig [off-list ref]; wefu@redhat.com;
lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux-
kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux-
sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley
[off-list ref]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Hi Anup and Atish,

On Thu, Jun 3, 2021 at 2:00 PM Anup Patel [off-list ref] wrote:
quoted

quoted
-----Original Message-----
From: Palmer Dabbelt <palmer@dabbelt.com>
Sent: 03 June 2021 09:43
To: guoren@kernel.org
Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig
[off-list ref]; Anup Patel [off-list ref]; wefu@redhat.com;
lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux-
kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux-
sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley
[off-list ref]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
quoted
On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote:
quoted
On Wed, May 19, 2021 at 3:15 PM Anup Patel [off-list ref]
wrote:
quoted
quoted
quoted
On Wed, May 19, 2021 at 12:24 PM Drew Fustini
[off-list ref] wrote:
quoted
quoted
quoted
quoted
On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig
wrote:
quoted
quoted
quoted
quoted
quoted
On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
quoted
Since the existing RISC-V ISA cannot solve this problem,
it is better to provide some configuration for the SoC
vendor to customize.
quoted
quoted
quoted
quoted
quoted
We've been talking about this problem for close to five years.
So no, if you don't manage to get the feature into the ISA
it can't be supported.
Isn't it a good goal for Linux to support the capabilities
present in the SoCs that are currently being fab'd?

I believe the CMO group only started last year [1] so the
RV64GC SoCs that are going into mass production this year
would not have had the opportunity of utilizing any RISC-V ISA
extension for handling cache management.
The current Linux RISC-V policy is to only accept patches for
frozen or ratified ISA specs.
(Refer to Documentation/riscv/patch-acceptance.rst)

This means even if we emulate CMO instructions in OpenSBI, the
Linux patches won't be taken by Palmer because CMO specification
is still in draft stage.
Before the CMO specification is released, could we use an sbi_ecall to
solve the current problem? This is not against the specification;
when CMO is ready we could let users choose to use the new CMO in
Linux.
quoted
quoted
quoted
quoted
From a technical point of view, CMO trap emulation is the same as sbi_ecall.
quoted
Also, we all know how much time it takes for RISC-V International
to freeze a spec. Judging by that, we are looking at another
3-4 years at minimum.
Sorry for being slow here, this thread got buried.

I've been trying to work with a handful of folks at the RISC-V
foundation to try and get a subset of the various in-development
specifications (some simple CMOs, something about non-caching in
the page tables, and some way to prevent speculative accesses from
generating coherence traffic that will break non-coherent systems).
I'm not sure we can get this together quickly, but I'd prefer to
at least try before we jump to taking vendor-specified behavior here.
It's obviously an up-hill battle to try and get specifications
through the process and I'm certainly not going to promise it will
work, but I'm hoping that the impending need to avoid forking the
ISA will be sufficient to get people behind producing some
specifications in a timely fashion.
quoted
I wasn't aware that this chip had non-coherent devices until I saw
this thread, so we'd been mostly focused on the Beagle V chip.
That was in a sense an easier problem because the SiFive IP in it
was never designed to have non-coherent devices so we'd have to
make anything work via a series of slow workarounds, which would
make emulating the eventually standardized behavior reasonable in
terms of performance (i.e., everything would be super slow so who really
cares).
quoted
quoted
quoted
I don't think relying on some sort of SBI call for the CMOs would
be such a performance hit that it would prevent these systems from
being viable, but assuming you have reasonable performance on your
non-cached accesses then that's probably not going to be viable to
trap and emulate.  At that point it really just becomes silly to
pretend that we're still making things work by emulating the
eventually ratified behavior, as anyone who actually tries to use
this thing to do IO would need out of tree patches.  I'm not sure
exactly what the plan is for the page table bits in the
specification right now, but if you can give me a pointer to some
documentation then I'm happy to try and push for something
compatible.
quoted
quoted
quoted
If we can't make the process work at the foundation then I'd be
strongly in favor of just biting the bullet and starting to take
vendor-specific code that's been implemented in hardware and is
necessary to make things work acceptably.  That's obviously a
sub-optimal solution as it'll lead to a bunch of ISA
fragmentation, but at least we'll be able to keep the software stack
together.
quoted
quoted
quoted
Can you tell us when these will be in the hands of users?  That's
pretty important here, as I don't want to be blocking real users
from having their hardware work.  IIRC there were some plans to
distribute early boards, but it looks like the foundation got
involved and I guess I lost the thread at that point.

Sorry this is all such a headache, but hopefully we can get things
sorted out.
I talked with some of the RISC-V foundation folks, we're not going
to have an ISA specification for the non-coherent stuff any time
soon.  I took a look at this code and I definitely don't want to
take it as is, but I'm not opposed to taking something that makes the
hardware work as long as it's a lot cleaner.
quoted
quoted
We've already got two of these non-coherent chips, I'm sure more
will come, and I'd rather have the extra headaches than make
everyone fork the software stack.
Thanks for confirming. The CMO extension is still in its early stages, so
it will certainly take more time. After the CMO extension is
finalized, it will take some more time to have actual RISC-V platforms with
CMO implemented.
quoted
quoted
After talking to Atish it looks like there's likely to be an SBI
extension to handle the CMOs, which should let us avoid the bulk of
the vendor-specific behavior in the kernel.  I know some people are
worried about adding to the SBI surface.  I'm worried about that
too, but that's way better than sticking a bunch of vendor-specific
instructions into the kernel.  The SBI extension should make for a
straightforward cache flush implementation in Linux, so let's just plan on
that getting through quickly (as has been done before).
quoted
Yes, I agree. We can have a single SBI call meant for DMA sync
purposes only, which means it will flush/invalidate pages from
all cache levels irrespective of the cache hierarchy (i.e.
flush/invalidate to RAM). The CMO extension might define more generic
cache operations which can target any cache level.

I am already preparing a write-up for the SBI DMA sync call in the SBI spec. I
will share it with you separately as well.
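
As a rough illustration of the single DMA sync call described above, here is a minimal C sketch of how DMA sync hooks could funnel into one firmware call. The operation codes, the sbi_dma_sync() name and the recording stub are all hypothetical (the SBI call was still being drafted at the time of this thread); on real hardware the stub would be an ecall into the SBI firmware. The direction-to-operation mapping is one common convention, not a statement of what the final spec requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numbering as the kernel's enum dma_data_direction. */
enum dma_dir { DMA_BIDIRECTIONAL = 0, DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2 };

/* HYPOTHETICAL operation codes, not taken from any SBI specification. */
enum sbi_dma_sync_op {
    SBI_DMA_CLEAN,      /* write dirty lines back to RAM */
    SBI_DMA_INVALIDATE, /* drop cached lines */
    SBI_DMA_FLUSH       /* write back, then drop */
};

/* Stand-in for the firmware: records each request instead of issuing an ecall. */
struct sbi_call { enum sbi_dma_sync_op op; uint64_t start; uint64_t size; };
#define SBI_LOG_MAX 8
static struct sbi_call sbi_log[SBI_LOG_MAX];
static size_t sbi_log_len;

static int sbi_dma_sync(enum sbi_dma_sync_op op, uint64_t start, uint64_t size)
{
    assert(sbi_log_len < SBI_LOG_MAX);
    sbi_log[sbi_log_len++] = (struct sbi_call){ op, start, size };
    return 0;
}

/* Called before the device touches the buffer. */
int dma_sync_for_device(uint64_t paddr, uint64_t size, enum dma_dir dir)
{
    switch (dir) {
    case DMA_TO_DEVICE:     /* device will read: make RAM current */
        return sbi_dma_sync(SBI_DMA_CLEAN, paddr, size);
    case DMA_FROM_DEVICE:   /* device will write: drop stale lines */
        return sbi_dma_sync(SBI_DMA_INVALIDATE, paddr, size);
    case DMA_BIDIRECTIONAL: /* both of the above */
        return sbi_dma_sync(SBI_DMA_FLUSH, paddr, size);
    }
    return -1;
}

/* Called after the device is done and before the CPU reads the buffer. */
int dma_sync_for_cpu(uint64_t paddr, uint64_t size, enum dma_dir dir)
{
    if (dir == DMA_TO_DEVICE)
        return 0; /* device only read the buffer, nothing to drop */
    return sbi_dma_sync(SBI_DMA_INVALIDATE, paddr, size);
}
```

The kernel side never needs to know which cache levels exist; that knowledge stays behind the firmware call, which is the point of restricting the call to "flush/invalidate to RAM".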
quoted
Unfortunately we've yet to come up with a way to handle the
non-cacheable mappings without introducing a degree of
vendor-specific behavior or seriously impacting performance (mark
them as not valid and deal with them in the trap handler).  I'm not
really sure it counts as supporting the hardware if it's massively
slow, so that really leaves us with vendor-specific mappings as the only
option to make these chips work.
quoted
A RISC-V platform can have non-cacheable mappings in the following
possible ways:
1) Fixed physical address range as non-cacheable using PMAs
2) Custom page table attributes
3) Svpbmt extension being defined by RVI

Atish and I both think it is possible to have a RISC-V specific DMA ops
implementation which can handle all of the above cases. Atish is already
working on a DMA ops implementation for RISC-V.
Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please have a
look at:
https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-email-guoren@kernel.org/T/#u
The icache_sync and __vdso_icache_sync will have to be addressed
differently. The SBI DMA sync extension cannot address this.

It seems Allwinner D1 has more non-standard stuff:
1) Custom PTE bits for IO-coherent access
2) Custom data cache flush/invalidate for DMA sync
3) Custom icache flush/invalidate

On the other hand, BeagleV has only two problems:
1) Custom physical address range for IO-coherent access
2) Custom L2 cache flush/invalidate for DMA sync

From the above, issue #2 can be solved by an SBI DMA sync call and
Linux DMA ops for both BeagleV and Allwinner D1.

On BeagleV, issue #1 can be solved using "dma-ranges".

On Allwinner D1, issues #1 and #3 need to be addressed
separately.

I think supporting BeagleV in upstream Linux is relatively
easy compared to Allwinner D1.

@Guo, please check if you can reserve a dedicated
physical address range for IO-coherent access (just like
BeagleV). If yes, then we can tackle issue #1 for Allwinner
D1 using "dma-ranges" DT property.
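
For readers unfamiliar with "dma-ranges": each entry is a (child-bus-address, parent-bus-address, length) tuple, and the address a device must use is derived from the CPU physical address by a simple offset within the matching window. The C sketch below shows only that arithmetic. The struct, the function name and the example window are hypothetical illustrations, not values from any BeagleV or Allwinner D1 device tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One (child-bus-address, parent-bus-address, length) tuple. */
struct dma_range {
    uint64_t dma_start; /* address as seen by the device */
    uint64_t cpu_start; /* CPU physical address */
    uint64_t size;      /* window length in bytes */
};

/*
 * Translate a CPU physical address into the address the device must use.
 * Returns false when no window covers paddr.
 */
bool phys_to_dma(const struct dma_range *r, size_t n,
                 uint64_t paddr, uint64_t *dma)
{
    for (size_t i = 0; i < n; i++) {
        if (paddr >= r[i].cpu_start &&
            paddr - r[i].cpu_start < r[i].size) {
            *dma = r[i].dma_start + (paddr - r[i].cpu_start);
            return true;
        }
    }
    return false;
}

/* HYPOTHETICAL window: 2 GiB at CPU address 0x80000000, seen at 0x1000000000. */
static const struct dma_range example_ranges[] = {
    { 0x1000000000ULL, 0x80000000ULL, 0x80000000ULL },
};
```

With this example table, CPU address 0x80001000 translates to 0x1000001000, and any address outside the 2 GiB window is rejected.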

Regards,
Anup
quoted
quoted
This implementation, which adds some Kconfig entries that control
page table bits, definitely isn't suitable for upstream.  Allowing
users to set arbitrary page table bits will eventually conflict with
the standard, and is just going to be a mess.  It'll also lead to
kernels that are only compatible with specific designs, which we're
trying very hard to avoid.  At a bare minimum we'll need some way to
detect systems with these page table bits before setting them, and
some description of what the bits actually do so we can reason about
them.
quoted
Yes, vendor-specific Kconfig options are a strict no-no. We can't
give up the goal of a unified kernel image for all platforms.
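
To make the contrast with Kconfig-selected page table bits concrete, here is a minimal C sketch of choosing the non-cacheable mechanism at boot from what the platform reports, so that one kernel image can cover the three options listed earlier in this mail. Every name here is hypothetical, and STD_PTE_NC_MASK is only a placeholder, since no standard encoding had been settled when this thread was written.

```c
#include <stdbool.h>
#include <stdint.h>

enum noncache_method {
    NC_NONE,         /* nothing usable was detected */
    NC_PMA_RANGE,    /* fixed non-cacheable physical window (PMAs) */
    NC_CUSTOM_PTE,   /* vendor-defined page table attribute bits */
    NC_STANDARD_PTE  /* page table attribute from a ratified extension */
};

/* What early boot code discovered, e.g. from the device tree (hypothetical). */
struct platform_info {
    bool     has_standard_pte_ext;
    bool     has_custom_pte_bits;
    uint64_t custom_nc_mask;      /* supplied by the platform, not Kconfig */
    bool     has_uncached_window;
};

/* PLACEHOLDER value used only for illustration. */
#define STD_PTE_NC_MASK (1ULL << 61)

/* Prefer the standard mechanism, then custom bits, then a PMA window. */
enum noncache_method pick_noncache_method(const struct platform_info *p)
{
    if (p->has_standard_pte_ext)
        return NC_STANDARD_PTE;
    if (p->has_custom_pte_bits && p->custom_nc_mask != 0)
        return NC_CUSTOM_PTE;
    if (p->has_uncached_window)
        return NC_PMA_RANGE;
    return NC_NONE;
}

/* Extra PTE bits to OR into a non-cacheable mapping; 0 if none apply. */
uint64_t noncached_pte_bits(const struct platform_info *p)
{
    switch (pick_noncache_method(p)) {
    case NC_STANDARD_PTE:
        return STD_PTE_NC_MASK;
    case NC_CUSTOM_PTE:
        return p->custom_nc_mask;
    default:
        return 0; /* PMA window or nothing: no PTE bits are set */
    }
}
```

Because the mask comes from detection rather than build configuration, a kernel built this way never sets bits on a platform that did not advertise them.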

Regards,
Anup


--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/