Re: [PATCH] llist: Fix missing lockless_dereference()
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: 2015-02-10 03:42:10
Also in:
lkml, stable
----- Original Message -----
From: "Huang Ying" <redacted> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com> Cc: "Michael Cree" <redacted>, "Greg KH" <gregkh@linuxfoundation.org>, linux-alpha@vger.kernel.org, "Richard Henderson" [off-list ref], "Ivan Kokshaysky" [off-list ref], "Matt Turner" [off-list ref], linux-kernel@vger.kernel.org, "Paul McKenney" [off-list ref], "David Howells" [off-list ref], "Pranith Kumar" [off-list ref], stable@vger.kernel.org Sent: Monday, February 9, 2015 8:52:28 PM Subject: Re: [PATCH] llist: Fix missing lockless_dereference() Hi, Mathieu, On Sun, 2015-02-08 at 04:25 +0000, Mathieu Desnoyers wrote:quoted
----- Original Message -----quoted
From: "Michael Cree" <redacted> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com> Cc: "Greg KH" <gregkh@linuxfoundation.org>, linux-alpha@vger.kernel.org, "Richard Henderson" [off-list ref], "Ivan Kokshaysky" [off-list ref], "Matt Turner" [off-list ref], "Huang Ying" [off-list ref], linux-kernel@vger.kernel.org, "Paul McKenney" [off-list ref], "David Howells" [off-list ref], "Pranith Kumar" [off-list ref], stable@vger.kernel.org Sent: Saturday, February 7, 2015 7:47:29 PM Subject: Re: [PATCH] llist: Fix missing lockless_dereference() On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:quoted
quoted
On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:quoted
A lockless_dereference() appears to be missing in llist_del_first(). It should only matter for Alpha in practice.What could one anticipate to be the symptoms of such a missing lockless_dereference()?This can trigger corruption of the lockless linked-list, which is used across a few subsystems. AFAIU, the scenario is as follows. Please bear with me, because it's been a while since I've read on the Alpha multi-cache-banks behavior. The list here would be initially non-empty. Initial state of new_last->next is unset (newly allocated); IOW: garbage. CPU A adds a node into the list while CPU B removes a node from the head of the list. CPU A CPU B llist_add_batch() - Stores to new_last->next - implicit full mb before cmpxchg makes the update to CPU A's cache bank containing new_last->next visible to other CPUs before CPU A's cache bank update making head->first visible to other CPUs. - cmpxchg updates head->first = new_first llist_del_first() - entry = load head->first -> here, lack of barrier on Alpha creates a window where CPU B's cache bank can see the updated "head->first", but the cache bank holding the next value did not receive the update yet, since each cache bank have their own channel, which can be independently saturated. - next = load entry->next (dereference entry pointer) - cmpxchg updates head->first = next -> can store unset "next" value into head->first, thus corrupting the linked list.If my understanding were correct, cmpxchg will imply a full mb before and after it, so that there is a mb between load head->first in cmpxchg and load entry->next. If so, the memory barrier is only needed before the loop.
Yes, indeed, and by using lockless_dereference(), this is what we end up doing. FWIW, the reason why I moved smp_read_barrier_depends() into the loop was to issue it after the check for NULL pointer, assuming that getting a NULL pointer was a relatively frequent case compared to a failing cmpxchg. But we're talking about very minor optimisations compared to the upside of lockless_dereference() making the code easier to understand. Thanks, Mathieu
Best Regards, Huang, Yingquoted
quoted
The Alpha kernel is behaving pretty well provided one builds a machine specific kernel and UP. When running an SMP kernel some packages (most notably the java runtime, but there are a few others) occasionally lock up in a pthread call --- could be a problem in libc rather then the kernel.Are those lockups always occasional, or you have ways to reproduce them frequently with stress-tests ? Thanks, Mathieuquoted
quoted
quoted
Meta-comment, do we really care about Alpha anymore? Is it still consered an "active" arch we support?There are a few of us still running recent kernels on Alpha. I am maintaining the unofficial Debian alpha port at debian-ports, and the Debian popcon shows about 10 installations of Debian Alpha. Cheers Michael.
-- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com