Thread (32 messages), 6 authors, 2024-01-25

Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()

From: Andrew Donnellan <hidden>
Date: 2023-03-23 06:27:48

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
From: Nathan Lynch <redacted>

The kernel can handle retrying RTAS function calls in response to
-2/990x in the sys_rtas() handler instead of relaying the intermediate
status to user space.

Justifications:

* Currently it's nondeterministic and quite variable in practice
  whether a retry status is returned for any given invocation of
  sys_rtas(). Therefore user space code cannot be expecting a retry
  result without already being broken.

* This tends to significantly reduce the total number of system calls
  issued by programs such as drmgr which make use of sys_rtas(),
  improving the experience of tracing and debugging such
  programs. This is the main motivation for me: I think this change
  will make it easier for us to characterize current sys_rtas() use
  cases as we move them to other interfaces over time.

* It reduces the number of opportunities for user space to leave
  complex operations, such as those associated with DLPAR, incomplete
  and difficult to recover.

* We can expect performance improvements for existing sys_rtas()
  users, not only because of overall reduction in the number of system
  calls issued, but also due to the better handling of -2/990x in the
  kernel. For example, librtas still sleeps for 1ms on -2, which is
  completely unnecessary.

Would be good to see this fixed on the librtas side.

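A minimal user-space sketch of what that could look like, not actual
librtas code: it treats -2 as "retry right away" and 9900..9905 as a hint
to wait 10^x ms. That reading of the 990x encoding is an assumption here,
and the macro and function names are made up for illustration.

```c
/* Hypothetical names; only the -2 and 990x status values come from the
 * patch description above. */
#define RETRY_BUSY      (-2)
#define RETRY_EXT_MIN   9900
#define RETRY_EXT_MAX   9905

/*
 * Suggested wait in milliseconds before reissuing the call:
 *    0 -> retry right away (-2), no fixed 1ms sleep
 *   >0 -> wait this long (990x, assumed to mean 10^x ms)
 *   -1 -> not a retry status, the result is final
 */
static long retry_delay_ms(int status)
{
	long ms = 1;
	int order;

	if (status == RETRY_BUSY)
		return 0;
	if (status < RETRY_EXT_MIN || status > RETRY_EXT_MAX)
		return -1;
	for (order = status - RETRY_EXT_MIN; order > 0; order--)
		ms *= 10;
	return ms;
}
```
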
Performance differences for PHB add and remove on a small P10 PowerVM
partition are included below. For add, elapsed time is slightly
reduced. For remove, there are more significant improvements: the
number of context switches is reduced by an order of magnitude, and
elapsed time is reduced by over half.

(- before, + after):

  Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):

-          1,847.58 msec task-clock                       #    0.135 CPUs utilized               ( +- 14.15% )
-            10,867      cs                               #    9.800 K/sec                       ( +- 14.14% )
+          1,901.15 msec task-clock                       #    0.148 CPUs utilized               ( +- 14.13% )
+            10,451      cs                               #    9.158 K/sec                       ( +- 14.14% )

-         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
+          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )

  Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):

-          1,473.75 msec task-clock                       #    0.092 CPUs utilized               ( +- 14.15% )
-             2,652      cs                               #    3.000 K/sec                       ( +- 14.16% )
+          1,444.55 msec task-clock                       #    0.221 CPUs utilized               ( +- 14.14% )
+               104      cs                               #  119.957 /sec                        ( +- 14.63% )

-          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
+           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )

Move the existing rtas_lock-guarded critical section in sys_rtas()
into a conventional rtas_busy_delay()-based loop, returning to user
space only when a final success or failure result is available.

Signed-off-by: Nathan Lynch <redacted>

Should there be some kind of timeout? I'm a bit worried by sleeping in
a syscall for an extended period.
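
To make the question concrete, here is a rough user-space sketch of a
retry loop bounded by a deadline. It is not the patch's code and not a
proposal for a specific timeout value. The callback type, helper names
and stub functions are all hypothetical, and the 990x handling assumes a
10^x ms hint.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sched.h>
#include <time.h>

/* Hypothetical callback: issues one call and returns its raw status. */
typedef int (*retry_op_fn)(void *ctx);

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	/* Resume with the remaining time if a signal interrupts the sleep. */
	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;
}

/* 0 for -2, 10^x ms for 990x (assumed encoding), -1 for a final status. */
static long delay_hint_ms(int status)
{
	long ms = 1;

	if (status == -2)
		return 0;
	if (status < 9900 || status > 9905)
		return -1;
	for (status -= 9900; status > 0; status--)
		ms *= 10;
	return ms;
}

/*
 * Retry op() until it yields a final status or timeout_ms has elapsed.
 * Returns 0 with the final status in *status, or -ETIMEDOUT with the
 * last retry status seen in *status.
 */
static int call_with_deadline(retry_op_fn op, void *ctx, long timeout_ms,
			      int *status)
{
	long long deadline = now_ms() + timeout_ms;

	for (;;) {
		long wait;
		long long left;

		*status = op(ctx);
		wait = delay_hint_ms(*status);
		if (wait < 0)
			return 0;
		left = deadline - now_ms();
		if (left <= 0)
			return -ETIMEDOUT;
		if (wait > left)
			wait = (long)left;	/* never sleep past the deadline */
		if (wait > 0)
			sleep_ms(wait);
		else
			sched_yield();
	}
}

/* Stand-ins for a real call, for demonstration only. */
static int busy_twice_then_ok(void *ctx)
{
	int *calls = ctx;

	return ++*calls < 3 ? -2 : 0;
}

static int always_busy(void *ctx)
{
	(void)ctx;
	return -2;
}
```

With the stubs above, busy_twice_then_ok completes on its third call,
while always_busy gives up with -ETIMEDOUT once the deadline passes.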

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited