Thread: 6 messages, 3 authors, 2024-12-04

Re: [PATCH v2] madvise: MADV_SOFT_OFFLINE requests can return -EBUSY

From: Luis Claudio R. Goncalves <hidden>
Date: 2024-12-04 20:45:26

On Wed, Dec 04, 2024 at 09:35:20PM +0100, Alejandro Colomar wrote:
Hi Luis, Tyonnchie,

On Fri, Nov 29, 2024 at 06:43:39PM -0300, Luis Claudio R. Goncalves wrote:
On Thu, Nov 28, 2024 at 12:35:48PM +0100, Alejandro Colomar wrote:
Hi Tyonnchie,

On Tue, Nov 26, 2024 at 11:12:03AM -0500, tyberry@redhat.com wrote:
If the page could not be offlined madvise will return -EBUSY. This might occur if the page is currently in use or locked.
Could you show this in a small example program (if possible)?
Like 30 lines or so.  If not, it's okay.
Hi Alejandro!

Given the ongoing holidays, let me take the liberty of giving some context
in order to keep the conversation going.

We received reports of failed LTP madvise11[1] tests. The errors looked
like this:

    madvise11.c:409: TINFO: Spawning 4 threads, with a total of 640 memory pages
    madvise11.c:132: TFAIL: madvise failed: EBUSY (16)
    madvise11.c:163: TINFO: Thread  [0]  returned 16, failed.
    madvise11.c:191: TFAIL: thread  [0]  - exited with errors
    madvise11.c:163: TINFO: Thread  [2]  returned 0, succeeded.
    madvise11.c:163: TINFO: Thread  [3]  returned 0, succeeded.
    madvise11.c:163: TINFO: Thread  [1]  returned 0, succeeded.
    madvise11.c:361: TINFO: Restore 629 Soft-offlined pages
    madvise11.c:290: TWARN: write(3,0x7ffce114b8a0,8) failed: EBUSY (16)

Clearly the problem had to do with -EBUSY being returned by a madvise()
operation. The bug was initially reported on kernels with PREEMPT_RT
enabled but we soon observed that the problem also happened with the stock
kernel, though it took more repetitions to trigger the issue.

After debugging and investigation, we observed that the -EBUSY return was a valid
case in the kernel code and was not being handled by the test. A fix was
sent to the LTP project by Li Wang[2], specifically for the madvise11 test.
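
As an illustration only (this is not the LTP fix itself, and not the example
requested for the man page), a caller could separate "page busy" from real
failures roughly as in the sketch below. The enum and both function names are
hypothetical, and the fallback value for MADV_SOFT_OFFLINE is an assumption
taken from the generic Linux headers. The call itself needs CAP_SYS_ADMIN and
a kernel built with CONFIG_MEMORY_FAILURE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101   /* assumed fallback: asm-generic value */
#endif

/* Hypothetical outcome categories; not part of any kernel or libc API. */
enum offline_result { OFFLINE_OK, OFFLINE_BUSY, OFFLINE_ERROR };

/* Map a madvise() return value and the errno it left behind to an outcome.
 * errno is only meaningful when ret is non-zero, so it is ignored on success. */
enum offline_result classify_offline(int ret, int err)
{
    if (ret == 0)
        return OFFLINE_OK;
    return (err == EBUSY) ? OFFLINE_BUSY : OFFLINE_ERROR;
}

/* Try to soft-offline the pages in [addr, addr + len).
 * OFFLINE_BUSY means a page was in use or locked; the caller may retry
 * later or skip the range instead of treating it as a hard failure. */
enum offline_result soft_offline(void *addr, size_t len)
{
    int ret = madvise(addr, len, MADV_SOFT_OFFLINE);

    return classify_offline(ret, errno);
}
```

A test in the style of madvise11 might then count OFFLINE_BUSY as an
acceptable result and fail only on OFFLINE_ERROR.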

In this process, we noticed that the man pages did not mention -EBUSY as a
possible result of a failed offlining operation, as described by Tyonnchie.

I hope this helps!
Thanks!  I've applied the patch, with some tweaks:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=3205359a3a7079d9d40a50388e851874729a827a>

I added an Acked-by on your behalf, Luis.
Thank you!

You have all my respect for the great work you and many others do
with the man pages!

Luis
Have a lovely night!
Alex
Best regards,
Luis

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise11.c
[2] https://lists.linux.it/pipermail/ltp/2024-May/038310.html

Have a lovely day!
Alex
Signed-off-by: Tyonnchie Berry <redacted>

---
diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
index 4f2210ee2..c10dcd599 100644
--- a/man/man2/madvise.2
+++ b/man/man2/madvise.2
@@ -702,6 +702,13 @@ The map exists, but the area maps something that isn't a file.
 .BR MADV_COLLAPSE )
 Could not charge hugepage to cgroup: cgroup limit exceeded.
 .TP
+.B EBUSY
+(for
+.B MADV_SOFT_OFFLINE )
+If any pages within the add+length range could not be offlined,
+madvise will return -EBUSY.
+This might occur if the page is currently in use or locked.
+.TP
 .B EFAULT
 .I advice
 is
-- 
<https://www.alejandro-colomar.es/>

---end quoted text---


