Thread (5 messages) 5 messages, 3 authors, 2025-06-15

Re: [PATCH] ALSA: pcm: Convert snd_pcm_ioctl_sync_ptr_{compat/x32} to user_access_begin/user_access_end()

From: Takashi Iwai <hidden>
Date: 2025-06-13 09:15:14
Also in: linux-sound, lkml

On Fri, 13 Jun 2025 07:24:46 +0200,
Christophe Leroy wrote:


Le 12/06/2025 à 13:02, Takashi Iwai a écrit :
quoted
On Thu, 12 Jun 2025 12:39:39 +0200,
Christophe Leroy wrote:
quoted
With user access protection (Called SMAP on x86 or KUAP on powerpc)
each and every call to get_user() or put_user() performs heavy
operations to unlock and lock kernel access to userspace.

SNDRV_PCM_IOCTL_SYNC_PTR ioctl is a hot path that needs to be
optimised. To do that, perform user accesses by blocks using
user_access_begin/user_access_end() and unsafe_get_user()/
unsafe_put_user() and alike.

Before the patch the 9 calls to put_user() at the
end of snd_pcm_ioctl_sync_ptr_compat() imply the following set of
instructions about 9 times (access_ok - enable user - write - disable
user):
     0.00 :   c057f858:       3d 20 7f ff     lis     r9,32767
     0.29 :   c057f85c:       39 5e 00 14     addi    r10,r30,20
     0.77 :   c057f860:       61 29 ff fc     ori     r9,r9,65532
     0.32 :   c057f864:       7c 0a 48 40     cmplw   r10,r9
     0.36 :   c057f868:       41 a1 fb 58     bgt     c057f3c0 <snd_pcm_ioctl+0xbb0>
     0.30 :   c057f86c:       3d 20 dc 00     lis     r9,-9216
     1.95 :   c057f870:       7d 3a c3 a6     mtspr   794,r9
     0.33 :   c057f874:       92 8a 00 00     stw     r20,0(r10)
     0.27 :   c057f878:       3d 20 de 00     lis     r9,-8704
     0.28 :   c057f87c:       7d 3a c3 a6     mtspr   794,r9
...

A perf profile shows that in total the 9 put_user() represent 36% of
the time spent in snd_pcm_ioctl() and about 80 instructions.

With this patch everything is done in 13 instructions and represent
only 15% of the time spent in snd_pcm_ioctl():

     0.57 :   c057f5dc:       3d 20 dc 00     lis     r9,-9216
     0.98 :   c057f5e0:       7d 3a c3 a6     mtspr   794,r9
     0.16 :   c057f5e4:       92 7f 00 04     stw     r19,4(r31)
     0.63 :   c057f5e8:       93 df 00 0c     stw     r30,12(r31)
     0.16 :   c057f5ec:       93 9f 00 10     stw     r28,16(r31)
     4.95 :   c057f5f0:       92 9f 00 14     stw     r20,20(r31)
     0.19 :   c057f5f4:       92 5f 00 18     stw     r18,24(r31)
     0.49 :   c057f5f8:       92 bf 00 1c     stw     r21,28(r31)
     0.27 :   c057f5fc:       93 7f 00 20     stw     r27,32(r31)
     5.88 :   c057f600:       93 36 00 00     stw     r25,0(r22)
     0.11 :   c057f604:       93 17 00 00     stw     r24,0(r23)
     0.00 :   c057f608:       3d 20 de 00     lis     r9,-8704
     0.79 :   c057f60c:       7d 3a c3 a6     mtspr   794,r9

Note that here the access_ok() in user_write_access_begin() is skipped
because the exact same verification has already been performed at the
beginning of the fonction with the call to user_read_access_begin().

Signed-off-by: Christophe Leroy <redacted>
---
This is a lighter version of previous patch "[PATCH v2] ALSA: pcm: Convert multiple {get/put}_user to user_access_begin/user_access_end()" focussing on identified hot path.
Moved and nested the failure labels closer in order to increase readability
Thanks for the revised patch!

Although it's now much lighter, I still believe that we can reduce
get_user() / put_user() calls significantly by adjusting the struct
usage.

Could you check whether the patch below can improve?
(Zero-ing of sstatus can be an overhead here, but there are some
holes, and these need to be initialized before copying back...)
Thanks for the proposed patch. Unfortunately it doesn't improve the
situation. The problem with copy_from_user() and copy_to_user() is
that they perform quite bad on small regions. And for the from_user
side we still get two user access enable/disable instead 3 and for the
to_user side we still get two as well (Allthough we had 7
previously). Those 4 user access enable/disable still have a cost.

Nowadays the tendency is really to go for the unsafe_put/get_user
style, see some significant exemples below. And as mentioned in those
commits, behind the performance improvement it also lead to much
cleaner code generation.
- 38ebcf5096a8 ("scm: optimize put_cmsg()")
- 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()")
- ef0ba0553829 ("poll: fix performance regression due to out-of-line
__put_user()")

Commit 38ebcf5096a8 is explicit about copy_to_user() being bad for
small regions.

Here below is some comparison between the three way of doing
snd_pcm_ioctl_sync_ptr_compat(), output is from 'perf top':

Initially I got the following. That 12%+ is the reason why I started
investigating.

    14.20%  test_perf           [.] engine_main
==> 12.86%  [kernel]            [k] snd_pcm_ioctl
    11.91%  [kernel]            [k] finish_task_switch.isra.0
     4.15%  [kernel]            [k] snd_pcm_group_unlock_irq.part.0
     4.07%  libc.so.6           [.] __ioctl_time64
     3.58%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin_generic
     3.37%  [kernel]            [k] sys_ioctl
     2.96%  libasound.so.2.0.0  [.] snd_pcm_hw_avail_update
     2.73%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin
     2.58%  [kernel]            [k] system_call_exception
     1.93%  libasound.so.2.0.0  [.] sync_ptr1
     1.85%  libasound.so.2.0.0  [.] snd_pcm_unlock
     1.84%  libasound.so.2.0.0  [.] snd_pcm_mmap_begin
     1.83%  libasound.so.2.0.0  [.] bad_pcm_state
     1.68%  libasound.so.2.0.0  [.] snd_pcm_mmap_avail
     1.67%  libasound.so.2.0.0  [.] snd_pcm_avail_update

With _your_ patch I get the following. copy_from_user() calls
_copy_from_user() and copy_to_user() calls _copy_to_user(). Both then
call __copy_tofrom_user(). In total it is 16.4% so it is worse than
before.

    14.47%  test_perf           [.] engine_main
    12.00%  [kernel]            [k] finish_task_switch.isra.0
==>  8.37%  [kernel]            [k] snd_pcm_ioctl
     5.44%  libc.so.6           [.] __ioctl_time64
     5.03%  [kernel]            [k] snd_pcm_group_unlock_irq.part.0
==>  4.86%  [kernel]            [k] __copy_tofrom_user
     4.62%  [kernel]            [k] sys_ioctl
     3.22%  [kernel]            [k] system_call_exception
     2.42%  libasound.so.2.0.0  [.] snd_pcm_mmap_begin
     2.31%  [kernel]            [k] fdget
     2.23%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin_generic
     2.19%  [kernel]            [k] syscall_exit_prepare
     1.92%  libasound.so.2.0.0  [.] snd_pcm_mmap_avail
     1.86%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin
     1.68%  libasound.so.2.0.0  [.] snd_pcm_hw_avail_update
==>  1.67%  [kernel]            [k] _copy_from_user
     1.66%  libasound.so.2.0.0  [.] bad_pcm_state
==>  1.53%  [kernel]            [k] _copy_to_user
     1.40%  libasound.so.2.0.0  [.] sync_ptr1

With my patch I get the following:

    17.46%  test_perf           [.] engine_main
     9.14%  [kernel]            [k] finish_task_switch.isra.0
==>  4.92%  [kernel]            [k] snd_pcm_ioctl
     3.99%  [kernel]            [k] snd_pcm_group_unlock_irq.part.0
     3.71%  libc.so.6           [.] __ioctl_time64
     3.61%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin_generic
     2.72%  libasound.so.2.0.0  [.] sync_ptr1
     2.65%  [kernel]            [k] system_call_exception
     2.46%  [kernel]            [k] sys_ioctl
     2.43%  [kernel]            [k] __rseq_handle_notify_resume
     2.34%  [kernel]            [k] do_epoll_wait
     2.30%  libasound.so.2.0.0  [.] __snd_pcm_mmap_commit
     2.14%  libasound.so.2.0.0  [.] __snd_pcm_avail
     2.04%  libasound.so.2.0.0  [.] snd_pcm_hw_avail_update
     1.89%  libasound.so.2.0.0  [.] snd_pcm_lock
     1.84%  libasound.so.2.0.0  [.] snd_pcm_mmap_avail
     1.76%  libasound.so.2.0.0  [.] __snd_pcm_avail_update
     1.61%  libasound.so.2.0.0  [.] bad_pcm_state
     1.60%  libasound.so.2.0.0  [.] __snd_pcm_mmap_begin
     1.49%  libasound.so.2.0.0  [.] query_status_data
Thanks for the detailed analysis!  Then unsafe_*_user() seems to be
the way to go.  I'll check your latest patches.


Takashi
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help