Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
From: Prasanna Panchamukhi <hidden>
Date: 2026-03-12 21:45:00
Also in:
linux-doc, lkml, netfilter-devel
Hi Fernando,

Thank you for the quick review.

On Thu, Mar 12, 2026 at 5:15 AM Fernando Fernandez Mancera [off-list ref] wrote:
On 3/11/26 8:40 PM, Prasanna S Panchamukhi wrote:
The conntrack garbage collection worker uses an adaptive algorithm that adjusts the scan interval based on the average timeout of tracked entries. The upper bound of this interval is hardcoded as GC_SCAN_INTERVAL_MAX (60 seconds).

Expose the upper bound as a new sysctl, net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at runtime without rebuilding the kernel. The default remains 60 seconds to preserve existing behavior. The sysctl is global and read-only in non-init network namespaces, consistent with nf_conntrack_max and nf_conntrack_buckets.

In environments where long-lived offloaded flows dominate the table, the adaptive average drifts toward the maximum, delaying cleanup of short-lived expired entries such as those in TCP CLOSE state (10s timeout). Adding a sysctl for the maximum GC scan interval allows it to be tuned to the environment.

Signed-off-by: Prasanna S Panchamukhi <redacted>
[...]
---
 Documentation/networking/nf_conntrack-sysctl.rst | 11 +++++++++++
 include/net/netfilter/nf_conntrack.h             |  1 +
 net/netfilter/nf_conntrack_core.c                |  9 ++++++---
 net/netfilter/nf_conntrack_standalone.c          | 10 ++++++++++
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index 35f889259fcd..c848eef9bc4f 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -64,6 +64,17 @@ nf_conntrack_frag6_timeout - INTEGER (seconds)
 	Time to keep an IPv6 fragment in memory.
+nf_conntrack_gc_scan_interval_max - INTEGER (seconds)
+	default 60
+
+	Maximum interval between garbage collection scans of the connection
+	tracking table. The GC worker uses an adaptive algorithm that adjusts
+	the scan interval based on average entry timeouts; this parameter caps
+	the upper bound. Lower values cause expired entries (e.g. connections
+	in CLOSE state) to be cleaned up faster, at the cost of slightly more
+	CPU usage. Minimum value is 1.
+	This sysctl is only writeable in the initial net namespace.
+

I think it would be a good idea to add under which situations it is good to tweak this setting.
Done.
 nf_conntrack_generic_timeout - INTEGER (seconds)
 	default 600
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index bc42dd0e10e6..0449577f322e 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -331,6 +331,7 @@ extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
 extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
+extern unsigned int nf_conntrack_gc_scan_interval_max;

Could it be just int, so there is no need to cast it to s32 later?
Regarding the data type, I encountered the following compilation error when trying to address the signedness:

  ../../net/netfilter/nf_conntrack_core.c: In function 'gc_worker':
  ../../include/linux/compiler_types.h:548:45: error: call to '__compiletime_assert_1027' declared with attribute error: clamp(next_run, (1ul * 250), gc_scan_max) signedness error
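For readers following the signedness discussion, here is a minimal user-space sketch of why mixing a signed value with unsigned long bounds is hazardous, which is the kind of mix the build-time check in clamp() rejects. It is not the kernel macro: `mixed_less` and `clamp_ul` are hypothetical helpers, and the jiffies values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * What a mixed int / unsigned long comparison does after the usual
 * arithmetic conversions: the signed operand is converted to unsigned long.
 * The cast is written out here; the compiler applies it implicitly.
 */
static bool mixed_less(int a, unsigned long b)
{
	return (unsigned long)a < b;
}

/* Hypothetical same-type clamp: all three operands are unsigned long. */
static unsigned long clamp_ul(unsigned long val, unsigned long lo,
			      unsigned long hi)
{
	assert(lo <= hi);	/* assumption: the bounds are ordered */
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```

With HZ=250, as in the error message, 250 and 15000 jiffies correspond to 1 and 60 seconds, so `clamp_ul(next_run, 250ul, 15000ul)` keeps the result inside that window. Note that `mixed_less(-1, 1ul)` is false, because -1 becomes the largest unsigned long before the comparison.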
 /* must be called with rcu read lock held */
 static inline void
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 27ce5fda8993..54949246f329 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -91,7 +91,7 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
  * allowing non-idle machines to wakeup more often when needed.
  */
 #define GC_SCAN_INITIAL_COUNT	100
-#define GC_SCAN_INTERVAL_INIT	GC_SCAN_INTERVAL_MAX
+#define GC_SCAN_INTERVAL_INIT	nf_conntrack_gc_scan_interval_max
 #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
 #define GC_SCAN_EXPIRED_MAX	(64000u / HZ)
@@ -204,6 +204,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
+
+unsigned int nf_conntrack_gc_scan_interval_max __read_mostly = GC_SCAN_INTERVAL_MAX;
+
 seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static siphash_aligned_key_t nf_conntrack_hash_rnd;
@@ -1568,7 +1571,7 @@ static void gc_worker(struct work_struct *work)
 		delta_time = nfct_time_stamp - gc_work->start_time;
 		/* re-sched immediately if total cycle time is exceeded */
-		next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX;
+		next_run = delta_time < (s32)nf_conntrack_gc_scan_interval_max;
 		goto early_exit;
 	}

READ_ONCE() is required IMHO as it can be modified from sysctl concurrently.
Done.
@@ -1630,7 +1633,7 @@ static void gc_worker(struct work_struct *work)
 	gc_work->next_bucket = 0;
-	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
+	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, nf_conntrack_gc_scan_interval_max);

Likewise here, READ_ONCE() recommended.
Done. I have also added a local variable gc_scan_max to avoid multiple load instructions since it is referenced twice in the code.
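To illustrate the snapshot pattern described above, here is a rough user-space sketch, not the actual patch. `READ_ONCE_UINT` is a hypothetical stand-in for the kernel's READ_ONCE(), and `scan_interval_max`, `GC_MIN`, `struct gc_decision` and `gc_decide` are invented for the example. It assumes the tunable never drops below the minimum interval.

```c
#include <assert.h>

/* Hypothetical stand-in for READ_ONCE(): a volatile access, so the
 * compiler performs one load here and does not re-read the variable. */
#define READ_ONCE_UINT(x) (*(const volatile unsigned int *)&(x))

#define GC_MIN 250ul	/* illustrative: 1 second at HZ=250 */

/* Hypothetical tunable, in jiffies; a writer may change it at any time. */
static unsigned int scan_interval_max = 15000;

struct gc_decision {
	int within_budget;	/* cycle time still below the maximum? */
	unsigned long next_run;	/* clamped delay until the next scan */
};

static struct gc_decision gc_decide(long delta_time, unsigned long next_run)
{
	/* Snapshot once; both uses below see the same value. */
	const unsigned long gc_scan_max = READ_ONCE_UINT(scan_interval_max);
	struct gc_decision d;

	assert(GC_MIN <= gc_scan_max);	/* assumption stated in the lead-in */

	d.within_budget = delta_time < (long)gc_scan_max;

	if (next_run < GC_MIN)
		next_run = GC_MIN;
	else if (next_run > gc_scan_max)
		next_run = gc_scan_max;
	d.next_run = next_run;

	return d;
}
```

Reading the tunable into a local means a concurrent write cannot make the budget check and the clamp disagree about the maximum within one call.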
Thanks, Fernando.