Thread: 27 messages, 6 authors, 2025-12-02

Re: [PATCH net-next] vhost: use "checked" versions of get_user() and put_user()

From: Jason Wang <jasowang@redhat.com>
Date: 2025-11-20 01:57:26
Also in: kvm, lkml, virtualization

On Tue, Nov 18, 2025 at 1:35 AM Jon Kohler [off-list ref] wrote:
> On Nov 16, 2025, at 11:32 PM, Jason Wang [off-list ref] wrote:
> > On Fri, Nov 14, 2025 at 10:53 PM Jon Kohler [off-list ref] wrote:
> > > On Nov 12, 2025, at 8:09 PM, Jason Wang [off-list ref] wrote:
> > > > On Thu, Nov 13, 2025 at 8:14 AM Jon Kohler [off-list ref] wrote:
> > > > > vhost_get_user and vhost_put_user leverage __get_user and __put_user,
> > > > > respectively, which were both added in 2016 by commit 6b1e6cc7855b
> > > > > ("vhost: new device IOTLB API").
> > > >
> > > > It has been used even before this commit.
> > >
> > > Ah, thanks for the pointer. I’d have to go dig to find its genesis, but
> > > it's more to say, this existed prior to the LFENCE commit.
> > >
> > > > > In a heavy UDP transmit workload on a
> > > > > vhost-net backed tap device, these functions showed up as ~11.6% of
> > > > > samples in a flamegraph of the underlying vhost worker thread.
> > > > >
> > > > > Quoting Linus from [1]:
> > > > >   Anyway, every single __get_user() call I looked at looked like
> > > > >   historical garbage. [...] End result: I get the feeling that we
> > > > >   should just do a global search-and-replace of the __get_user/
> > > > >   __put_user users, replace them with plain get_user/put_user instead,
> > > > >   and then fix up any fallout (eg the coco code).
> > > > >
> > > > > Switch to plain get_user/put_user in vhost, which results in a slight
> > > > > throughput speedup. get_user is now ~8.4% of samples in the flamegraph.
> > > > >
> > > > > Basic iperf3 test on an Intel 5416S CPU with Ubuntu 25.10 guest:
> > > > > TX: taskset -c 2 iperf3 -c <rx_ip> -t 60 -p 5200 -b 0 -u -i 5
> > > > > RX: taskset -c 2 iperf3 -s -p 5200 -D
> > > > > Before: 6.08 Gbits/sec
> > > > > After:  6.32 Gbits/sec
> > > >
> > > > I wonder if we need to test on archs like ARM.
> > >
> > > Are you thinking from a performance perspective? Or a correctness one?
> >
> > Performance, I think the patch is correct.
> >
> > Thanks
>
> Ok gotcha. If anyone has an ARM system stuffed in their
> front pocket and can give this a poke, I’d appreciate it, as
> I don’t have ready access to one personally.
>
> That said, I think this might end up in “well, it is what it is”
> territory as Linus was alluding to, i.e. if performance dips on
> ARM for vhost, then that's a compelling point to optimize whatever
> ends up being the culprit for get/put user?
>
> Said another way, would ARM perf testing (or any other arch) be a
> blocker to taking this change?

Not a must, but at least we need to explain the implication for other
archs, as the discussion you quoted is all about x86.

Thanks

> Thanks - Jon
  