Re: epoll_wait misses edge-triggered eventfd events: bug in Linux 5.3 and 5.4
From: Jakub Kicinski <kuba@kernel.org>
Date: 2020-02-01 20:16:52
Also in:
linux-fsdevel, lkml
On Fri, 31 Jan 2020 14:57:30 +0100, Max Neunhoeffer wrote:
> Dear All,
>
> I believe I have found a bug in Linux 5.3 and 5.4 in epoll_wait/epoll_ctl
> when an eventfd together with edge-triggered or the EPOLLONESHOT policy is
> used. If an epoll_ctl call to rearm the eventfd happens approximately at
> the same time as the epoll_wait goes to sleep, the event can be lost, even
> though proper protection through a mutex is employed.
>
> The details together with two programs showing the problem can be found
> here:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=205933
>
> Older kernels seem not to have this problem, although I did not test all
> versions. I know that 4.15 and 5.0 do not show the problem.
>
> Note that this method of using epoll_wait/eventfd is used by boost::asio
> to wake up event loops in case a new completion handler is posted to an
> io_service, so this is probably relevant for many applications.
>
> Any help with this would be appreciated.
Could be networking related but let's CC FS folks just in case. Would you be able to perform bisection to narrow down the search for a buggy change?
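
For readers following the thread, the usage pattern described in the quoted report can be sketched roughly as below. This is only a minimal single-threaded illustration of the signal / wait / drain / rearm cycle on an eventfd registered with EPOLLET | EPOLLONESHOT. It is not one of the reproducers attached to the bugzilla entry, and it does not trigger the reported race, which needs the EPOLL_CTL_MOD rearm to run in one thread while another thread is entering epoll_wait. The helper name run_rearm_cycle and the 1000 ms timeout are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the report): signal the eventfd, wait for
 * it, drain it and rearm it, `rounds` times in a row.
 * Returns the number of wakeups observed, or -1 on a syscall error. */
int run_rearm_cycle(int rounds)
{
    assert(rounds >= 0);

    struct epoll_event ev = { .events = EPOLLIN | EPOLLET | EPOLLONESHOT };
    int seen = -1;
    int efd = eventfd(0, EFD_NONBLOCK);
    int epfd = epoll_create1(0);

    ev.data.fd = efd;
    if (efd >= 0 && epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) == 0) {
        seen = 0;
        for (int i = 0; i < rounds; i++) {
            uint64_t one = 1, counter = 0;
            struct epoll_event got;

            /* Producer side: post a wakeup by bumping the counter. */
            if (write(efd, &one, sizeof one) != (ssize_t)sizeof one) {
                seen = -1;
                break;
            }

            /* Consumer side: wait up to 1000 ms (arbitrary) for it. */
            int n = epoll_wait(epfd, &got, 1, 1000);
            if (n < 0) {
                seen = -1;
                break;
            }
            if (n == 0)
                break;          /* timed out: a wakeup went missing */
            seen++;

            /* Drain the counter, then rearm: EPOLLONESHOT disabled the
             * registration when the event above was delivered. */
            if (read(efd, &counter, sizeof counter) != (ssize_t)sizeof counter ||
                epoll_ctl(epfd, EPOLL_CTL_MOD, efd, &ev) != 0) {
                seen = -1;
                break;
            }
        }
    }

    if (epfd >= 0)
        close(epfd);
    if (efd >= 0)
        close(efd);
    return seen;
}
```

Calling run_rearm_cycle(3) from a small main() should report three wakeups when nothing is lost. A count lower than the number of rounds would mean a wait timed out, which is the symptom the report describes for the concurrent case.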