Thread: 29 messages, 6 authors, 2024-09-01

Re: Regarding patch "block/blk-mq: Don't complete locally if capacities are different"

From: MANISH PANDEY <hidden>
Date: 2024-08-01 09:25:57
Also in: lkml

++ adding linux-kernel group

On 7/31/2024 7:16 PM, MANISH PANDEY wrote:
Hi Qais Yousef,
We recently noticed that the patch below has been merged:
https://lore.kernel.org/all/20240223155749.2958009-3-qyousef@layalina.io

This patch causes a ~20% performance degradation in random IO, along with
a significant drop in sequential IO performance. We would therefore like
to revert it, as it heavily impacts MCQ UFS devices; non-MCQ devices are
also affected.

We have several concerns with the patch:
1. It takes away the device driver's freedom to affine completions to the
   best possible CPUs, and restricts the driver to the same group of CPUs.

2. Why can't the device driver use IRQ affinity to choose the CPUs that
   complete the IO request, instead of having this forced by the block
   layer?

3. CPUs are already grouped based on LLC, so why is a new categorization
   required?
> big performance impact if the IO request
> was done from a CPU with higher capacity but the interrupt is serviced
> on a lower capacity CPU.
This patch doesn't consider contention in the submission and completion
paths. Also, what if we want a request submitted on a smaller-capacity CPU
to be completed on a higher-capacity CPU?
Shouldn't the device driver take care of this and let vendors use whichever
combination works best for them?
Does it consider MCQ devices and their different SQ<->CQ mappings?
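To make the capacity question concrete, the local-vs-IPI completion check can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel source: struct cpu_info, struct queue_flags, flags_from_rq_affinity() and model_need_ipi() are hypothetical names, capacities are compared as raw numbers, and the !SMP and force-threaded-IRQ short-circuits of the real blk_mq_complete_need_ipi() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the blk-mq "complete locally or send an
 * IPI to the submitting CPU?" decision, with the capacity check applied.
 * Not kernel code; all names here are illustrative stand-ins.
 */
struct cpu_info {
	int llc_id;        /* CPUs with the same llc_id share a last-level cache */
	unsigned capacity; /* e.g. 1024 for big cores, lower for little cores */
	bool online;
};

/* Flags modelled on what the rq_affinity sysfs value sets on the queue. */
struct queue_flags {
	bool same_comp;  /* rq_affinity >= 1: redirect completion near submitter */
	bool same_force; /* rq_affinity == 2: redirect to the submitter itself */
};

static struct queue_flags flags_from_rq_affinity(int rq_affinity)
{
	assert(rq_affinity >= 0 && rq_affinity <= 2);
	struct queue_flags f = {
		.same_comp = rq_affinity >= 1,
		.same_force = rq_affinity == 2,
	};
	return f;
}

/*
 * Returns true if the completion running on CPU `cur` should be redirected
 * (via IPI) to the submitting CPU `sub`.
 */
static bool model_need_ipi(const struct cpu_info *cpus, int cur, int sub,
			   struct queue_flags f)
{
	if (!f.same_comp)
		return false;         /* rq_affinity=0: always complete locally */
	if (cur == sub)
		return false;         /* already running on the submitting CPU */
	if (!f.same_force &&
	    cpus[cur].llc_id == cpus[sub].llc_id &&
	    cpus[cur].capacity == cpus[sub].capacity)
		return false;         /* shared LLC and (with the patch) equal capacity */
	return cpus[sub].online;     /* never IPI an offline CPU */
}
```

In this model, with rq_affinity=1 a request submitted on a 1024-capacity CPU and completed on a 512-capacity CPU in the same LLC is redirected to the submitter, whereas without the capacity comparison the shared LLC alone would have kept the completion local.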
> Without the patch I see the BLOCK softirq always running on little cores
> (where the hardirq is serviced). With it I can see it running on all
> cores.
Why can't we use echo 2 > rq_affinity to force completion on the CPU that
initiated the request?
Also, why force vendors to always use SOFTIRQ for completion? We should
have the flexibility to complete the IO request via IPI, HARDIRQ, or
SOFTIRQ.
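On the IPI/HARDIRQ/SOFTIRQ point, the choice blk-mq makes when a request completes can be sketched as below, loosely following the order of checks in blk_mq_complete_request_remote(). This is an assumption-laden model rather than kernel code: enum completion_ctx and model_completion_ctx() are hypothetical, single_ctx_on_submitter stands for "the hardware context maps a single software context and the completing CPU is the submitter", and need_ipi stands for the result of the rq_affinity/LLC/capacity check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of where a blk-mq completion ends up
 * running. Names are illustrative, not kernel APIs.
 */
enum completion_ctx {
	COMPLETE_IN_CALLER,      /* driver's completion context, typically hardirq */
	COMPLETE_REMOTE_SOFTIRQ, /* IPI to submitting CPU, BLOCK softirq there */
	COMPLETE_LOCAL_SOFTIRQ,  /* BLOCK softirq raised on the completing CPU */
};

static enum completion_ctx model_completion_ctx(bool single_ctx_on_submitter,
						bool polled, bool need_ipi,
						int nr_hw_queues)
{
	assert(nr_hw_queues >= 1);
	/* Nothing to redirect: finish directly in the caller's context. */
	if (single_ctx_on_submitter || polled)
		return COMPLETE_IN_CALLER;
	/* Redirected completions are finished in softirq on the target CPU. */
	if (need_ipi)
		return COMPLETE_REMOTE_SOFTIRQ;
	/* Single hardware queue: defer to the local BLOCK softirq. */
	if (nr_hw_queues == 1)
		return COMPLETE_LOCAL_SOFTIRQ;
	/* Multi-queue and no IPI needed: complete in the caller's context. */
	return COMPLETE_IN_CALLER;
}
```

In this model, a multi-queue device (such as an MCQ UFS controller) that needs no IPI completes in the caller's context, a single-queue device always goes through the BLOCK softirq, and any redirected completion runs in BLOCK softirq on the submitting CPU.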


An SoC can have many different CPU configurations, and this patch forces a
restriction on the completion path. The problem is worse on MCQ devices,
since they can have different SQ<->CQ mappings.

So we would like to revert the patch. Please let us know if there are any
concerns.

Regards
Manish Pandey