Thread: 53 messages, 6 authors, 2013-01-11

Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

From: Eric Dumazet <hidden>
Date: 2013-01-02 17:45:53
Also in: linux-fsdevel, lkml
Subsystem: filesystems (vfs and infrastructure), the rest · Maintainers: Alexander Viro, Christian Brauner, Linus Torvalds

On Tue, 2013-01-01 at 23:56 +0000, Eric Wong wrote:
Linus Torvalds [off-list ref] wrote:
> Please document the barrier that this mb() pairs with, and then give
> an explanation for the fix in the commit message, and I'll happily
> take it. Even if it's just duplicating the comments above the
> wq_has_sleeper() function, except modified for the ep_modify() case.

Hopefully my explanation below is correct and makes sense.
I think both effects of the barrier are needed.

> Of course, it would be good to get verification from Jason and Andreas
> that the alternate patch also works for them.

Jason just confirmed it.

------------------------------- 8< ----------------------------
From 02f43757d04bb6f2786e79eecf1cfa82e6574379 Mon Sep 17 00:00:00 2001
From: Eric Wong <redacted>
Date: Tue, 1 Jan 2013 21:20:27 +0000
Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed.  Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occurred before we modified the interest mask.  So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <redacted>
Cc: Hans Verkuil <redacted>
Cc: Jiri Olsa <redacted>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <redacted>
Cc: Hans de Goede <redacted>
Cc: Mauro Carvalho Chehab <redacted>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <redacted>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Voellmy <redacted>
Tested-by: "Junchang(Jason) Wang" <redacted>
Cc: netdev@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
---
 fs/eventpoll.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cd96649..39573ee 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1285,7 +1285,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	 * otherwise we might miss an event that happens between the
 	 * f_op->poll() call and the new event set registering.
 	 */
-	epi->event.events = event->events;
+	epi->event.events = event->events; /* need barrier below */
 	pt._key = event->events;
 	epi->event.data = event->data; /* protected by mtx */
 	if (epi->event.events & EPOLLWAKEUP) {
@@ -1296,6 +1296,26 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	}
 
 	/*
+	 * The following barrier has two effects:
+	 *
+	 * 1) Flush epi changes above to other CPUs.  This ensures
+	 *    we do not miss events from ep_poll_callback if an
+	 *    event occurs immediately after we call f_op->poll().
+	 *    We need this because we did not take ep->lock while
+	 *    changing epi above (but ep_poll_callback does take
+	 *    ep->lock).
+	 *
+	 * 2) We also need to ensure we do not miss _past_ events
+	 *    when calling f_op->poll().  This barrier also
+	 *    pairs with the barrier in wq_has_sleeper (see
+	 *    comments for wq_has_sleeper).
+	 *
+	 * This barrier will now guarantee ep_poll_callback or f_op->poll
+	 * (or both) will notice the readiness of an item.
+	 */
+	smp_mb();
+
+	/*
 	 * Get current event bits. We can safely use the file* here because
 	 * its usage count has been increased by the caller of this function.
 	 */
-- 
Eric Wong

First, thanks for working on this issue.

It seems the real problem is that the store epi->event.events = event->events
is done without taking ep->lock.
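
For background, the userspace-visible behaviour that ep_modify() has to
preserve can be sketched as below: an event that arrived before
EPOLL_CTL_MOD enabled interest in it should still be reported afterwards.
This is only a single-threaded illustration, so it does not exercise the
race itself. The helper name mod_reports_past_event() is made up for this
sketch, and it assumes a Linux system with epoll available.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns the number of events epoll_wait() reports after EPOLL_CTL_MOD
 * enables EPOLLIN on a pipe that already holds data, or -1 on error.
 * mod_reports_past_event() is a hypothetical helper, not a kernel or libc API.
 */
int mod_reports_past_event(void)
{
    int pipefd[2];
    int epfd;
    int n = -1;
    struct epoll_event ev = { 0 };
    struct epoll_event out;
    char byte = 'x';

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    /* Register the read end with an empty interest mask. */
    ev.events = 0;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto close_all;

    /* The event happens while EPOLLIN is not yet part of the mask. */
    if (write(pipefd[1], &byte, 1) != 1)
        goto close_all;

    /* Widen the mask; the kernel polls the file as part of the MOD. */
    ev.events = EPOLLIN;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) < 0)
        goto close_all;

    /* Zero timeout: report only what is ready right now. */
    n = epoll_wait(epfd, &out, 1, 0);
    if (n == 1 && !(out.events & EPOLLIN))
        n = -1;

close_all:
    close(epfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

Called from a small test program, the helper returns 1 when the already
pending data is reported after the MOD.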

While an smp_mb() could reduce the race window, I believe there is still
a race, and the following patch would close it:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index be56b21..25e5c53 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1313,7 +1313,10 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	 * otherwise we might miss an event that happens between the
 	 * f_op->poll() call and the new event set registering.
 	 */
+	spin_lock_irq(&ep->lock);
 	epi->event.events = event->events;
+	spin_unlock_irq(&ep->lock);
+
 	pt._key = event->events;
 	epi->event.data = event->data; /* protected by mtx */
 	if (epi->event.events & EPOLLWAKEUP) {
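
The reasoning behind taking the lock can be illustrated with a simplified
userspace model. It is a sketch under stated assumptions, not kernel code:
a pthread mutex stands in for ep->lock, a plain integer for
epi->event.events, and an atomic flag for the readiness that f_op->poll()
would report. All names (struct model, waker, modifier, count_missed,
WANT_IN) are invented for the illustration. Whichever critical section
runs second sees the effects of the first, so in this model at least one
side notices the event.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WANT_IN 0x1u  /* hypothetical stand-in for the newly requested bit */

/* Userspace model only; every name here is illustrative, not kernel code. */
struct model {
    pthread_mutex_t lock;   /* plays the role of ep->lock */
    unsigned int mask;      /* plays the role of epi->event.events */
    atomic_bool ready;      /* what f_op->poll() would report */
    bool callback_saw_mask; /* did the callback side see the new mask? */
    bool poll_saw_ready;    /* did the modify side see the event? */
};

/* Models the wakeup path: the event becomes ready, then the callback
 * checks the interest mask while holding the lock. */
static void *waker(void *arg)
{
    struct model *m = arg;

    atomic_store(&m->ready, true);
    pthread_mutex_lock(&m->lock);
    m->callback_saw_mask = (m->mask & WANT_IN) != 0;
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* Models ep_modify() with the mask store done under the lock,
 * followed by the readiness check. */
static void *modifier(void *arg)
{
    struct model *m = arg;

    pthread_mutex_lock(&m->lock);
    m->mask = WANT_IN;
    pthread_mutex_unlock(&m->lock);
    m->poll_saw_ready = atomic_load(&m->ready);
    return NULL;
}

/* Runs both sides concurrently and counts runs in which both missed the
 * event. Returns -1 if a thread could not be created. */
int count_missed(int iterations)
{
    int missed = 0;

    for (int i = 0; i < iterations; i++) {
        struct model m = { .mask = 0 };  /* remaining members are zeroed */
        pthread_t a, b;

        pthread_mutex_init(&m.lock, NULL);
        if (pthread_create(&a, NULL, waker, &m) != 0) {
            pthread_mutex_destroy(&m.lock);
            return -1;
        }
        if (pthread_create(&b, NULL, modifier, &m) != 0) {
            pthread_join(a, NULL);
            pthread_mutex_destroy(&m.lock);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        pthread_mutex_destroy(&m.lock);
        if (!m.callback_saw_mask && !m.poll_saw_ready)
            missed++;
    }
    return missed;
}
```

Under the C11 memory model and the ordering guarantees of the mutex,
count_missed() returns 0 for any iteration count as long as the threads
can be created. The model says nothing about other orderings in the real
wakeup path.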



