Thread (8 messages) 8 messages, 3 authors, 4h ago

Re: [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full

From: Vimal Agrawal <hidden>
Date: 2026-06-29 07:57:15

Hi Kuniyuki,
Thank you for the feedback.
However, the rate limiting issue exists independently of the threshold
values. If entries genuinely exceed gc_thresh3 — regardless of what it
is set to — neigh_forced_gc() is called on every allocation attempt
with no rate limiting. In my workload, most entries are
active/reachable with refcnt > 1, so the GC walk traverses the entire
table without reclaiming anything. Increasing gc_thresh3 would make
this worse, not better, as GC now has a larger table to scan on each
call.

Regarding neigh_hash_shift: in my workload, neigh_alloc() returns
ENOBUFS before reaching do_alloc() since GC cannot reclaim any
entries. kzalloc() is never called, so neigh_hash_grow() is not
involved in the latency I observed. The pre-lock time check in
neigh_forced_gc() is a low-cost safeguard that prevents repeated full
table scans regardless of gc_thresh3 value. It does not interfere with
correct GC behaviour — if entries are still above the threshold, GC
runs normally.


Hi Jakub,
I tested with different threshold values, filling the table completely
with 32k reachable entries and attempting 1000 additional allocations.
Exported neigh_forced_gc so that it can be profiled
                         no change  10ms   50ms   100ms
max cpu usage %          44%        11.8%  2.56%  1.42%
calls > 100us (of 1000)  101        31     13     7

At 10ms, max CPU usage is still 11.8% and 31 out of 1000 calls take
more than 100us. Given that 50ms reduces this to 2.56% and 13 calls
respectively, I would prefer 50ms as the threshold. However, I am open
to further discussion on the right value.

Thanks,
Vimal


On Fri, Jun 26, 2026 at 3:17 AM Kuniyuki Iwashima [off-list ref] wrote:
quoted hunk ↗ jump to hunk
From: Vimal Agrawal <redacted>
Date: Thu, 25 Jun 2026 10:20:20 +0000
quoted
Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
on every allocation attempt with no rate limiting. In workloads with mostly
active/reachable entries, the GC walk traverses a large portion of the
neighbour table without reclaiming entries, holding tbl->lock for an
extended period. This causes severe lock contention and allocation
latencies exceeding 16ms under sustained neighbour creation.

Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
performed within the last second, avoiding repeated full table scans and
lock acquisitions on the hot allocation path.

Profiling of neigh_create() shows ~3 orders of magnitude latency
improvement with this change.

Link:https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/ (local)
From the thread, these look misconfigured.

---8<---
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 32768
---8<---

If gc_thresh3 is larger enough, gc_thresh2 will give you 5s
rate limiting.

If the number of active neigh entries constantly exceeds
gc_thresh3, it will be the correct gc_thresh2 for you.

Also, I guess you want a new kernel param for the first
neigh_hash_alloc(), which is currently fixed for 3, which
is too small for some hosts.

50000 entries require neigh_hash_grow() 13 times.

Can you test this on your real workload, starting from
neigh_hash_shift=16 and appropriate gc_thresh2/3 ?

---8<---
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 1349c0eedb64..a75b3750eec9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1817,6 +1817,22 @@ EXPORT_SYMBOL(neigh_parms_release);
 static struct lock_class_key neigh_table_proxy_queue_class;

 static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly;
+static __initdata unsigned long neigh_hash_shift = 3;
+
+static int __init neigh_set_hash_shift(char *str)
+{
+       ssize_t ret;
+
+       if (!str)
+               return 0;
+
+       ret = kstrtoul(str, 0, &neigh_hash_shift);
+       if (ret)
+               return 0;
+
+       return 1;
+}
+__setup("neigh_hash_shift=", neigh_set_hash_shift);

 void neigh_table_init(int index, struct neigh_table *tbl)
 {
@@ -1843,7 +1859,7 @@ void neigh_table_init(int index, struct neigh_table *tbl)
                panic("cannot create neighbour proc dir entry");
 #endif

-       RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3));
+       RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(neigh_hash_shift));

        phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
        tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);
---8<---


quoted
Signed-off-by: Vimal Agrawal <redacted>
---
 net/core/neighbour.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 1349c0eedb64..078842db3c5f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
      int shrunk = 0;
      int loop = 0;

+     if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
+             return 0;
+
      NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);

      spin_lock_bh(&tbl->lock);
--
2.17.1
v
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help