Thread (16 messages) 16 messages, 5 authors, 2020-08-25

RE: pcm|dmaengine|imx-sdma race condition on i.MX6

From: Robin Gong <hidden>
Date: 2020-08-14 08:45:25
Also in: alsa-devel, dmaengine, lkml

On 2020/08/13 19:23: Richard Leitner [off-list ref] wrote:
Hi,
we've found a race condition with the PCM on the i.MX6 which results in an
-EIO for the SNDRV_PCM_IOCTL_READI_FRAMES ioctl after an -EPIPE (XRUN).

A possible reproduction may look like the following reduced call graph during a
PCM capture:

us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES)
      - wait_for_avail()
        - schedule_timeout()
   -> snd_pcm_update_hw_ptr0()
      - snd_pcm_update_state: EPIPE (XRUN)
      - sdma_disable_channel_async() # get's scheduled away due to sleep us
<- ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns -EPIPE us ->
ioctl(SNDRV_PCM_IOCTL_PREPARE) # as reaction to the EPIPE (XRUN) us ->
ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) # next try to capture frames
      - sdma_prep_dma_cyclic()
        - sdma_load_context() # not loaded as context_loaded is 1
      - wait_for_avail()
        - schedule_timeout()
# now the sdma_channel_terminate_work() comes back and sets #
context_loaded = false and frees in vchan_dma_desc_free_list().
us <- ioctl returns -EIO (capture write error (DMA or IRQ trouble?))
Seems the write error caused by context_loaded not set to false before
next transfer start? If yes, please have a try with the 03/04 of the below
patch set, anyway, could you post your failure log?
https://lkml.org/lkml/2020/8/11/111

What we have found out, based on our understanding:
The dmaengine docu states that a dmaengine_terminate_async() must be
followed by a dmaengine_synchronize().
However, in the pcm_dmaengine.c, only dmaengine_terminate_async() is
called (for performance reasons and because it might be called from an
interrupt handler).

In our tests, we saw that the user-space immediately calls
ioctl(SNDRV_PCM_IOCTL_PREPARE) as a handler for the happened xrun
(previous ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns with -EPIPE). In
our case (imx-sdma.c), the terminate really happens asynchronously with a
worker thread which is not awaited/synchronized by the
ioctl(SNDRV_PCM_IOCTL_PREPARE) call.

Since the syscall immediately enters an atomic context
(snd_pcm_stream_lock_irq()), we are not able to flush the work of the
termination worker from within the DMA context. This leads to an
unterminated DMA getting re-initialized and then terminated.

On the i.MX6 platform the problem is (if I got it correctly) that the
sdma_channel_terminate_work() called after the -EPIPE gets scheduled away
(for the 1-2ms sleep [1]). During that time the userspace already sends in the
ioctl(SNDRV_PCM_IOCTL_PREPARE) and
ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
As none of them are anyhow synchronized to the terminate_worker the
vchan_dma_desc_free_list() [2] and "sdmac->context_loaded = false;" [3] are
executed during the wait_for_avail() [4] of the
ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).

To make sure we identified the problem correctly we've tested to add a
"dmaengine_synchronize()" before the snd_pcm_prepare() in [5]. This fixed the
race condition in all our tests. (Before we were able to reproduce it in 100% of
the test runs).

Based on our understanding, there are two different points to ensure the
termination:
Either ensure that the termination is finished within the previous
SNDRV_PCM_IOCTL_READI_FRAMES call (inside the DMA context) or finishing
it in the SNDRV_PCM_IOCTL_PREPARE call (and all other applicable ioclts)
before entering the atomic context (from the PCM context).

We initially thought about implementing the first approach, basically splitting
up the dma_device terminate_all operation into a sync
(busy-wait) and a async one. This would align the operations with the
DMAengine interface and would enable a sync termination variant from atomic
contexts.
However, we saw that the dma_free_attrs() function has a WARN_ON on irqs
disabled, which would be the case for the sync variant. 
Side note: We found this issue on the current v5.4.y LTS branch, but it also
affects v5.8.y.

Any feedback or pointers how we may fix the problem are warmly welcome!
If anything is unclear please just ask :-)

regards;
Richard Leitner
Benjamin Bara

[1]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
L1066&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
329145824068928&amp;sdata=D9F%2FRUG27xv9nv8J1KtrLtld2eaI6gsXiWIAIgk
Avjw%3D&amp;reserved=0
[2]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
L1071&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
329145824068928&amp;sdata=0EKDVgzOZzL7TpX4ykhqjvpz5ryUHUpWw7frRe
cksBU%3D&amp;reserved=0
[3]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
L1072&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
329145824068928&amp;sdata=aIhatvb1ocQqyYCVFEg71LgJlRBoVusbDFPIxnte
PuY%3D&amp;reserved=0
[4]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_lib.c%23L1
825&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7e
7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6373
29145824073919&amp;sdata=y0Udbd%2FKGaVgqLrcp6fNOlMlFCGHCMfojkpp
B4HzUuE%3D&amp;reserved=0
[5]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_native.c%2
3L3226&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f
7e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63
7329145824073919&amp;sdata=ch3BQ5DDGU5HWXqIZSvUeFnBoRoP%2BMM
HEpnk8mIfWj8%3D&amp;reserved=0
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help