Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD
From: Eric Dumazet <hidden>
Date: 2013-01-02 17:45:53
Also in:
linux-fsdevel, lkml
Subsystem:
filesystems (vfs and infrastructure), the rest · Maintainers:
Alexander Viro, Christian Brauner, Linus Torvalds
On Tue, 2013-01-01 at 23:56 +0000, Eric Wong wrote:
quoted hunk ↗ jump to hunk
Linus Torvalds [off-list ref] wrote:quoted
Please document the barrier that this mb() pairs with, and then give an explanation for the fix in the commit message, and I'll happily take it. Even if it's just duplicating the comments above the wq_has_sleeper() function, except modified for the ep_modify() case.Hopefully my explanation is correct and makes sense below, I think both effects of the barrier are neededquoted
Of course, it would be good to get verification from Jason and Andreas that the alternate patch also works for them.Jason just confirmed it. ------------------------------- 8< ---------------------------- From 02f43757d04bb6f2786e79eecf1cfa82e6574379 Mon Sep 17 00:00:00 2001 From: Eric Wong <redacted> Date: Tue, 1 Jan 2013 21:20:27 +0000 Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to ensure events are not missed. Since the modifications to the interest mask are not protected by the same lock as ep_poll_callback, we need to ensure the change is visible to other CPUs calling ep_poll_callback. We also need to ensure f_op->poll() has an up-to-date view of past events which occured before we modified the interest mask. So this barrier also pairs with the barrier in wq_has_sleeper(). This should guarantee either ep_poll_callback or f_op->poll() (or both) will notice the readiness of a recently-ready/modified item. This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong <redacted> Cc: Hans Verkuil <redacted> Cc: Jiri Olsa <redacted> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <redacted> Cc: Hans de Goede <redacted> Cc: Mauro Carvalho Chehab <redacted> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <redacted> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andreas Voellmy <redacted> Tested-by: "Junchang(Jason) Wang" <redacted> Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org --- fs/eventpoll.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)diff --git a/fs/eventpoll.c b/fs/eventpoll.c index cd96649..39573ee 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c@@ -1285,7 +1285,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even * otherwise we might miss an event that happens between the * f_op->poll() call and the new event set registering. */ - epi->event.events = event->events; + epi->event.events = event->events; /* need barrier below */ pt._key = event->events; epi->event.data = event->data; /* protected by mtx */ if (epi->event.events & EPOLLWAKEUP) {@@ -1296,6 +1296,26 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even } /* + * The following barrier has two effects: + * + * 1) Flush epi changes above to other CPUs. This ensures + * we do not miss events from ep_poll_callback if an + * event occurs immediately after we call f_op->poll(). + * We need this because we did not take ep->lock while + * changing epi above (but ep_poll_callback does take + * ep->lock). + * + * 2) We also need to ensure we do not miss _past_ events + * when calling f_op->poll(). This barrier also + * pairs with the barrier in wq_has_sleeper (see + * comments for wq_has_sleeper). + * + * This barrier will now guarantee ep_poll_callback or f_op->poll + * (or both) will notice the readiness of an item. + */ + smp_mb(); + + /* * Get current event bits. We can safely use the file* here because * its usage count has been increased by the caller of this function. */-- Eric Wong
First, thanks for working on this issue. It seems the real problem is the epi->event.events = event->events; which is done without taking ep->lock While a smp_mb() could reduce the race window, I believe there is still a race, and the following patch would close it.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index be56b21..25e5c53 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c@@ -1313,7 +1313,10 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even * otherwise we might miss an event that happens between the * f_op->poll() call and the new event set registering. */ + spin_lock_irq(&ep->lock); epi->event.events = event->events; + spin_unlock_irq(&ep->lock); + pt._key = event->events; epi->event.data = event->data; /* protected by mtx */ if (epi->event.events & EPOLLWAKEUP) {