Re: [PATCH] refcount: Strengthen inc_not_zero()

[PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
[PATCH v9 01/17] mm: introduce vma_start_read_locked{_nested} helpers · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
[PATCH v9 02/17] mm: move per-vma lock into vm_area_struct · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
[PATCH v9 03/17] mm: mark vma as detached until it's added into vma tree · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
[PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas · Lorenzo Stoakes <hidden> · 2025-01-14
[PATCH v9 05/17] mm: mark vmas detached upon exit · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 05/17] mm: mark vmas detached upon exit · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 05/17] mm: mark vmas detached upon exit · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 05/17] mm: mark vmas detached upon exit · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 05/17] mm: mark vmas detached upon exit · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 05/17] mm: mark vmas detached upon exit · Vlastimil Babka <hidden> · 2025-01-13
Re: [PATCH v9 05/17] mm: mark vmas detached upon exit · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 05/17] mm: mark vmas detached upon exit · Lorenzo Stoakes <hidden> · 2025-01-14
[PATCH v9 06/17] types: move struct rcuwait into types.h · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 06/17] types: move struct rcuwait into types.h · Lorenzo Stoakes <hidden> · 2025-01-13
[PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail · Lorenzo Stoakes <hidden> · 2025-01-14
[PATCH v9 08/17] mm: move mmap_init_lock() out of the header file · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 08/17] mm: move mmap_init_lock() out of the header file · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 08/17] mm: move mmap_init_lock() out of the header file · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
[PATCH v9 09/17] mm: uninline the main body of vma_start_write() · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 09/17] mm: uninline the main body of vma_start_write() · Lorenzo Stoakes <hidden> · 2025-01-13
[PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited · David Laight <hidden> · 2025-01-11
Re: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited · Matthew Wilcox <willy@infradead.org> · 2025-01-11
Re: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited · "Paul E. McKenney" <paulmck@kernel.org> · 2025-01-11
Re: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited · David Laight <hidden> · 2025-01-11
Re: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited - clang 17.0.1 bug · David Laight <hidden> · 2025-01-11
Re: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited - clang 17.0.1 bug · David Laight <hidden> · 2025-01-12
Re: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited - clang 17.0.1 bug · "Paul E. McKenney" <paulmck@kernel.org> · 2025-01-12
[PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Mateusz Guzik <hidden> · 2025-01-11
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Mateusz Guzik <hidden> · 2025-01-11
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Vlastimil Babka <hidden> · 2025-01-11
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Peter Zijlstra <peterz@infradead.org> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Peter Zijlstra <peterz@infradead.org> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Peter Zijlstra <peterz@infradead.org> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Peter Zijlstra <peterz@infradead.org> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
[PATCH] refcount: Strengthen inc_not_zero() · Peter Zijlstra <peterz@infradead.org> · 2025-01-15
Re: [PATCH] refcount: Strengthen inc_not_zero() · Suren Baghdasaryan <surenb@google.com> · 2025-01-16
Re: [PATCH] refcount: Strengthen inc_not_zero() · Will Deacon <will@kernel.org> · 2025-01-17
Re: [PATCH] refcount: Strengthen inc_not_zero() · Will Deacon <will@kernel.org> · 2025-01-27
Re: [PATCH] refcount: Strengthen inc_not_zero() · Suren Baghdasaryan <surenb@google.com> · 2025-01-27
Re: [PATCH] refcount: Strengthen inc_not_zero() · Suren Baghdasaryan <surenb@google.com> · 2025-01-28
[PATCH 1/1] refcount: provide ops for cases when object's memory can be reused · Suren Baghdasaryan <surenb@google.com> · 2025-02-06
Re: [PATCH 1/1] refcount: provide ops for cases when object's memory can be reused · Vlastimil Babka <hidden> · 2025-02-06
Re: [PATCH] refcount: Strengthen inc_not_zero() · Suren Baghdasaryan <surenb@google.com> · 2025-02-06
Re: [PATCH] refcount: Strengthen inc_not_zero() · Suren Baghdasaryan <surenb@google.com> · 2025-02-13
Re: [PATCH] refcount: Strengthen inc_not_zero() · Matthew Wilcox <willy@infradead.org> · 2025-01-17
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-12
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-12
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-16
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Suren Baghdasaryan <surenb@google.com> · 2025-01-16
Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count · Wei Yang <hidden> · 2025-01-16
[PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline · Peter Zijlstra <peterz@infradead.org> · 2025-01-15
Re: [PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline · Suren Baghdasaryan <surenb@google.com> · 2025-02-13
[PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma · Liam R. Howlett <hidden> · 2025-01-13
Re: [PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma · Lorenzo Stoakes <hidden> · 2025-01-14
[PATCH v9 14/17] mm: remove extra vma_numab_state_init() call · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 14/17] mm: remove extra vma_numab_state_init() call · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 14/17] mm: remove extra vma_numab_state_init() call · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 14/17] mm: remove extra vma_numab_state_init() call · Lorenzo Stoakes <hidden> · 2025-01-14
[PATCH v9 15/17] mm: prepare lock_vma_under_rcu() for vma reuse possibility · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
[PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Wei Yang <hidden> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Liam R. Howlett <hidden> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Mateusz Guzik <hidden> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Mateusz Guzik <hidden> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Vlastimil Babka <hidden> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-02-13
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Wei Yang <hidden> · 2025-01-15
Re: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
[PATCH v9 17/17] docs/mm: document latest changes to vm_lock · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 17/17] docs/mm: document latest changes to vm_lock · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 17/17] docs/mm: document latest changes to vm_lock · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Matthew Wilcox <willy@infradead.org> · 2025-01-11
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-11
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Lorenzo Stoakes <hidden> · 2025-01-13
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-13
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Lorenzo Stoakes <hidden> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Andrew Morton <akpm@linux-foundation.org> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Andrew Morton <akpm@linux-foundation.org> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Vlastimil Babka <hidden> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Lorenzo Stoakes <hidden> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Liam R. Howlett <hidden> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-14
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Lorenzo Stoakes <hidden> · 2025-01-15
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-15
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Shivank Garg <hidden> · 2025-01-28
Re: [PATCH v9 00/17] reimplement per-vma lock as a refcount · Suren Baghdasaryan <surenb@google.com> · 2025-01-28

From: Suren Baghdasaryan <surenb@google.com>
Date: 2025-02-13 23:04:40
Also in: linux-mm, lkml

On Wed, Feb 5, 2025 at 7:03 PM Suren Baghdasaryan [off-list ref] wrote:

On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan [off-list ref] wrote:

quoted

On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan [off-list ref] wrote:

quoted

On Mon, Jan 27, 2025 at 6:09 AM Will Deacon [off-list ref] wrote:

quoted

On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:

quoted

On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:

quoted

On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:

quoted

Notably, it means refcount_t is entirely unsuitable for anything
SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
conditions after the refcount succeeds.

And this is probably fine, but let me ponder this all a little more.

Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
fix this, these things are hard enough as they are.

Will, others, wdyt?

We should also update the Documentation (atomic_t.txt and
refcount-vs-atomic.rst) if we strengthen this.

quoted

---
Subject: refcount: Strengthen inc_not_zero()

For speculative lookups where a successful inc_not_zero() pins the
object, but where we still need to double check if the object acquired
is indeed the one we set out to aquire, needs this validation to happen
*after* the increment.

Notably SLAB_TYPESAFE_BY_RCU is one such an example.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..340e7ffa445e 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h

@@ -69,9 +69,10 @@
  * its the lock acquire, for RCU/lockless data structures its the dependent
  * load.
  *
- * Do note that inc_not_zero() provides a control dependency which will order
- * future stores against the inc, this ensures we'll never modify the object
- * if we did not in fact acquire a reference.
+ * Do note that inc_not_zero() does provide acquire order, which will order
+ * future load and stores against the inc, this ensures all subsequent accesses
+ * are from this object and not anything previously occupying this memory --
+ * consider SLAB_TYPESAFE_BY_RCU.
  *
  * The decrements will provide release order, such that all the prior loads and
  * stores will be issued before, it also provides a control dependency, which

@@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
    do {
            if (!old)
                    break;
-   } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
+   } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));

Hmm, do the later memory accesses need to be ordered against the store
part of the increment or just the read? If it's the former, then I don't
think that _acquire is sufficient -- accesses can still get in-between
the read and write parts of the RmW.

I dug some more into this at the end of last week. For the
SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
dec_and_test(), then I think using _acquire() above is correct as the
later references can only move up into the critical section in the case
that we successfully obtained a reference.

However, if we're going to make the barriers implicit in the refcount
operations here then I think we also need to do something on the producer
side for when the object is re-initialised after being destroyed and
allocated again. I think that would necessitate release ordering for
refcount_set() so that whatever allows the consumer to validate the
object (e.g. sequence number) is published *before* the refcount.

Thanks Will!
I would like to expand on your answer to provide an example of the
race that would happen without release ordering in the producer. To
save reader's time I provide a simplified flow and reasoning first.
More detailed code of what I'm considering a typical
SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
reply (Addendum).
Simplified flow looks like this:

consumer:
    obj = lookup(collection, key);
    if (!refcount_inc_not_zero(&obj->ref))
        return;
    smp_rmb(); /* Peter's new acquire fence */
    if (READ_ONCE(obj->key) != key) {
        put_ref(obj);
        return;
    }
    use(obj->value);

producer:
    old_key = obj->key;
    remove(collection, old_key);
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(objj);
    ...
    obj = malloc(); /* obj is reused */
    obj->key = new_key;
    obj->value = new_value;
    smp_wmb(); /* Will's proposed release fence */
    refcount_set(obj->ref, 1);
    insert(collection, key, obj);

Let's consider a case when new_key == old_key. Will call both of them
"key". Without WIll's proposed fence the following reordering is
possible:

consumer:
    obj = lookup(collection, key);

                 producer:
                     key = obj->key
                     remove(collection, key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(objj);
                     obj = malloc(); /* obj is reused */
                     refcount_set(obj->ref, 1);
                     obj->key = key; /* same key */

    if (!refcount_inc_not_zero(&obj->ref))
        return;
    smp_rmb();
    if (READ_ONCE(obj->key) != key) {
        put_ref(obj);
        return;
    }
    use(obj->value);

                     obj->value = new_value; /* reordered store */
                     add(collection, key, obj);

So, the consumer finds the old object, successfully takes a refcount
and validates the key. It succeeds because the object is allocated and
has the same key, which is fine. However it proceeds to use stale
obj->value. Will's proposed release ordering would prevent that.

The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
I think it would be better to introduce two new functions:
refcount_add_not_zero_acquire() and refcount_set_release(), clearly
document that they should be used when a freed object can be recycled
and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
should also clarify that once it's called, the object can be accessed
by consumers even if it was not added yet into the collection used for
object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU
comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
then can explicitly use these new functions in the example code,
further clarifying their purpose and proper use.
WDYT?

Hi Peter,
Should I take a stab at preparing a patch to add the two new
refcounting functions suggested above with updates to the
documentation and comments?
If you disagree with that or need more time to decide then I'll wait.
Please let me know.

Not sure if "--in-reply-to" worked but I just posted a patch adding
new refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/ (local)

Since I did not get any replies other than Vlastimil's Ack on the
above patch, I went ahead and posted v10 of my patchset [1] and
included the patch above in it [2]. Feedback is highly appreciated!

[1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/ (local)
[2] https://lore.kernel.org/all/20250213224655.1680278-11-surenb@google.com/ (local)

Since Peter seems to be busy I discussed these ordering requirements
for SLAB_TYPESAFE_BY_RCU with Paul McKenney and he was leaning towards
having separate functions with the additional fences for this case.
That's what I provided in my patch.
Another possible option is to add acquire ordering in the
__refcount_add_not_zero() as Peter suggested and add
refcount_set_release() function.
Thanks,
Suren.

quoted

Thanks,
Suren.

quoted

ADDENDUM.
Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:

struct object {
    refcount_t ref;
    u64 key;
    u64 value;
};

void init(struct object *obj, u64 key, u64 value)
{
    obj->key = key;
    obj->value = value;
    smp_wmb(); /* Will's proposed release fence */
    refcount_set(obj->ref, 1);
}

bool get_ref(struct object *obj, u64 key)
{
    if (!refcount_inc_not_zero(&obj->ref))
        return false;
    smp_rmb(); /* Peter's new acquire fence */
    if (READ_ONCE(obj->key) != key) {
        put_ref(obj);
        return false;
    }
    return true;
}

void put_ref(struct object *obj)
{
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(obj);
}

consumer:
    obj = lookup(collection, key);
    if (!get_ref(obj, key)
        return;
    use(obj->value);

producer:
    remove(collection, old_obj->key);
    put_ref(old_obj);
    new_obj = malloc();
    init(new_obj, new_key, new_value);
    insert(collection, new_key, new_obj);

With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and be
equal to new_obj.

quoted

Will

On Wed, Feb 5, 2025 at 7:03 PM Suren Baghdasaryan [off-list ref] wrote:

On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan [off-list ref] wrote:

quoted

On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan [off-list ref] wrote:

quoted

On Mon, Jan 27, 2025 at 6:09 AM Will Deacon [off-list ref] wrote:

quoted

On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:

quoted

On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:

quoted

On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:

quoted

Notably, it means refcount_t is entirely unsuitable for anything
SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
conditions after the refcount succeeds.

And this is probably fine, but let me ponder this all a little more.

Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
fix this, these things are hard enough as they are.

Will, others, wdyt?

We should also update the Documentation (atomic_t.txt and
refcount-vs-atomic.rst) if we strengthen this.

quoted

---
Subject: refcount: Strengthen inc_not_zero()

For speculative lookups where a successful inc_not_zero() pins the
object, but where we still need to double check if the object acquired
is indeed the one we set out to aquire, needs this validation to happen
*after* the increment.

Notably SLAB_TYPESAFE_BY_RCU is one such an example.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..340e7ffa445e 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h

@@ -69,9 +69,10 @@
  * its the lock acquire, for RCU/lockless data structures its the dependent
  * load.
  *
- * Do note that inc_not_zero() provides a control dependency which will order
- * future stores against the inc, this ensures we'll never modify the object
- * if we did not in fact acquire a reference.
+ * Do note that inc_not_zero() does provide acquire order, which will order
+ * future load and stores against the inc, this ensures all subsequent accesses
+ * are from this object and not anything previously occupying this memory --
+ * consider SLAB_TYPESAFE_BY_RCU.
  *
  * The decrements will provide release order, such that all the prior loads and
  * stores will be issued before, it also provides a control dependency, which

@@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
    do {
            if (!old)
                    break;
-   } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
+   } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));

Hmm, do the later memory accesses need to be ordered against the store
part of the increment or just the read? If it's the former, then I don't
think that _acquire is sufficient -- accesses can still get in-between
the read and write parts of the RmW.

I dug some more into this at the end of last week. For the
SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
dec_and_test(), then I think using _acquire() above is correct as the
later references can only move up into the critical section in the case
that we successfully obtained a reference.

However, if we're going to make the barriers implicit in the refcount
operations here then I think we also need to do something on the producer
side for when the object is re-initialised after being destroyed and
allocated again. I think that would necessitate release ordering for
refcount_set() so that whatever allows the consumer to validate the
object (e.g. sequence number) is published *before* the refcount.

Thanks Will!
I would like to expand on your answer to provide an example of the
race that would happen without release ordering in the producer. To
save reader's time I provide a simplified flow and reasoning first.
More detailed code of what I'm considering a typical
SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
reply (Addendum).
Simplified flow looks like this:

consumer:
    obj = lookup(collection, key);
    if (!refcount_inc_not_zero(&obj->ref))
        return;
    smp_rmb(); /* Peter's new acquire fence */
    if (READ_ONCE(obj->key) != key) {
        put_ref(obj);
        return;
    }
    use(obj->value);

producer:
    old_key = obj->key;
    remove(collection, old_key);
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(objj);
    ...
    obj = malloc(); /* obj is reused */
    obj->key = new_key;
    obj->value = new_value;
    smp_wmb(); /* Will's proposed release fence */
    refcount_set(obj->ref, 1);
    insert(collection, key, obj);

Let's consider a case when new_key == old_key. Will call both of them
"key". Without WIll's proposed fence the following reordering is
possible:

consumer:
    obj = lookup(collection, key);

                 producer:
                     key = obj->key
                     remove(collection, key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(objj);
                     obj = malloc(); /* obj is reused */
                     refcount_set(obj->ref, 1);
                     obj->key = key; /* same key */

    if (!refcount_inc_not_zero(&obj->ref))
        return;
    smp_rmb();
    if (READ_ONCE(obj->key) != key) {
        put_ref(obj);
        return;
    }
    use(obj->value);

                     obj->value = new_value; /* reordered store */
                     add(collection, key, obj);

So, the consumer finds the old object, successfully takes a refcount
and validates the key. It succeeds because the object is allocated and
has the same key, which is fine. However it proceeds to use stale
obj->value. Will's proposed release ordering would prevent that.

The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
I think it would be better to introduce two new functions:
refcount_add_not_zero_acquire() and refcount_set_release(), clearly
document that they should be used when a freed object can be recycled
and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
should also clarify that once it's called, the object can be accessed
by consumers even if it was not added yet into the collection used for
object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU
comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
then can explicitly use these new functions in the example code,
further clarifying their purpose and proper use.
WDYT?

Hi Peter,
Should I take a stab at preparing a patch to add the two new
refcounting functions suggested above with updates to the
documentation and comments?
If you disagree with that or need more time to decide then I'll wait.
Please let me know.

Not sure if "--in-reply-to" worked but I just posted a patch adding
new refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/ (local)
Since Peter seems to be busy I discussed these ordering requirements
for SLAB_TYPESAFE_BY_RCU with Paul McKenney and he was leaning towards
having separate functions with the additional fences for this case.
That's what I provided in my patch.
Another possible option is to add acquire ordering in the
__refcount_add_not_zero() as Peter suggested and add
refcount_set_release() function.
Thanks,
Suren.

quoted

Thanks,
Suren.

quoted

ADDENDUM.
Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:

struct object {
    refcount_t ref;
    u64 key;
    u64 value;
};

void init(struct object *obj, u64 key, u64 value)
{
    obj->key = key;
    obj->value = value;
    smp_wmb(); /* Will's proposed release fence */
    refcount_set(obj->ref, 1);
}

bool get_ref(struct object *obj, u64 key)
{
    if (!refcount_inc_not_zero(&obj->ref))
        return false;
    smp_rmb(); /* Peter's new acquire fence */
    if (READ_ONCE(obj->key) != key) {
        put_ref(obj);
        return false;
    }
    return true;
}

void put_ref(struct object *obj)
{
    if (!refcount_dec_and_test(&obj->ref))
        return;
    obj->key = KEY_INVALID;
    free(obj);
}

consumer:
    obj = lookup(collection, key);
    if (!get_ref(obj, key)
        return;
    use(obj->value);

producer:
    remove(collection, old_obj->key);
    put_ref(old_obj);
    new_obj = malloc();
    init(new_obj, new_key, new_value);
    insert(collection, new_key, new_obj);

With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and be
equal to new_obj.

quoted

Will

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help