[PATCH] man7, man2: document SCHED_EXT policy
From: Cheng-Yang Chou <hidden>
Date: 2026-04-12 18:18:22
Add the sched_ext(7) manual page and update existing scheduling
documentation to include the SCHED_EXT policy.

Signed-off-by: Cheng-Yang Chou <redacted>
---
 man/man2/sched_setattr.2      |  11 +++-
 man/man2/sched_setscheduler.2 |   4 ++
 man/man7/sched.7              |  13 +++++
 man/man7/sched_ext.7          | 100 ++++++++++++++++++++++++++++++++++
 4 files changed, 126 insertions(+), 2 deletions(-)
 create mode 100644 man/man7/sched_ext.7
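Not part of the patch: a minimal sketch for exercising the new policy from
user space while reviewing the pages. It assumes a Linux system and goes
through Python's os.sched_setscheduler(), which wraps sched_setscheduler(2).
The helper name try_sched_ext is made up for illustration, and the fallback
value 7 is taken from the SYNOPSIS of the proposed sched_ext(7) page, since
the os module may not export SCHED_EXT. A kernel that predates sched_ext, or
one built without CONFIG_SCHED_CLASS_EXT, is expected to reject the request.

```python
import errno
import os

# Assumption: 7 is the value given in the proposed sched_ext(7) SYNOPSIS.
# Use the os constant if a future Python version provides one.
SCHED_EXT = getattr(os, "SCHED_EXT", 7)


def try_sched_ext(pid: int = 0) -> str:
    """Ask the kernel to place `pid` (0 means the caller) under SCHED_EXT.

    Returns a short status string instead of raising, so the helper can be
    run on kernels with and without sched_ext support.
    """
    if not hasattr(os, "sched_setscheduler"):
        return "unsupported platform"
    try:
        # sched_priority must be 0 for the non-real-time policies.
        os.sched_setscheduler(pid, SCHED_EXT, os.sched_param(0))
    except OSError as e:
        name = errno.errorcode.get(e.errno, str(e.errno))
        return f"rejected ({name})"
    return f"policy is now {os.sched_getscheduler(pid)}"


if __name__ == "__main__":
    print(try_sched_ext())
```

Per the proposed page, a successful call with no BPF scheduler loaded leaves
the task scheduled like SCHED_NORMAL, so success here only shows that the
kernel accepts the policy value.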
diff --git a/man/man2/sched_setattr.2 b/man/man2/sched_setattr.2
index 80a0ac726dcf..d60678f00e72 100644
--- a/man/man2/sched_setattr.2
+++ b/man/man2/sched_setattr.2
@@ -81,6 +81,10 @@
 a deadline scheduling policy; see
 .BR sched (7)
 for details.
+.TP 14
+.B SCHED_EXT
+for extensible scheduling policies implemented via BPF
+(see \fBsched_ext\fR(7)).
 .P
 The
 .I attr
@@ -95,7 +99,8 @@ struct sched_attr {
     u32 sched_policy;      /* Policy (SCHED_*) */
     u64 sched_flags;       /* Flags */
     s32 sched_nice;        /* Nice value (SCHED_OTHER,
-                              SCHED_BATCH) */
+                              SCHED_BATCH,
+                              SCHED_EXT) */
     u32 sched_priority;    /* Static priority (SCHED_FIFO,
                               SCHED_RR) */
     /* For SCHED_DEADLINE */
@@ -218,8 +223,10 @@
 This field specifies the nice value to be set when specifying
 .I sched_policy
 as
 .B SCHED_OTHER
+,
+.BR SCHED_BATCH ,
 or
-.BR SCHED_BATCH .
+.BR SCHED_EXT .
 The nice value is a number in the range \-20 (high priority)
 to +19 (low priority); see
diff --git a/man/man2/sched_setscheduler.2 b/man/man2/sched_setscheduler.2
index b4c35543e5bf..825eb7290ee7 100644
--- a/man/man2/sched_setscheduler.2
+++ b/man/man2/sched_setscheduler.2
@@ -67,6 +67,10 @@ and
 for running
 .I very
 low priority background jobs.
+.TP
+.B SCHED_EXT
+for extensible scheduling policies implemented via BPF
+(see \fBsched_ext\fR(7)).
 .P
 For each of the above policies,
 .I param\->sched_priority
diff --git a/man/man7/sched.7 b/man/man7/sched.7
index 00926cd34ecf..2e73a4c716b9 100644
--- a/man/man7/sched.7
+++ b/man/man7/sched.7
@@ -116,6 +116,13 @@ and
 .BR sched_get_priority_max (2)
 to find the range of priorities supported for a particular policy.
 .P
+Since Linux 6.12, there is an extensible BPF scheduling policy
+.RB ( SCHED_EXT ),
+which allows for custom scheduling algorithms to be implemented as BPF
+programs.
+See
+.BR sched_ext (7).
+.P
 Conceptually, the scheduler maintains a list of runnable threads for
 each possible
 .I sched_priority
@@ -529,6 +536,12 @@ priority (lower even than a +19 nice value with the
 or
 .B SCHED_BATCH
 policies).
+.SS SCHED_EXT: Extensible BPF Scheduling
+Tasks with this policy are managed by an extensible scheduler class,
+which allows for custom scheduling algorithms to be implemented as
+BPF programs.
+See
+.BR sched_ext (7).
 .\"
 .SS Resetting scheduling policy for child processes
 Each thread has a reset-on-fork scheduling flag.
diff --git a/man/man7/sched_ext.7 b/man/man7/sched_ext.7
new file mode 100644
index 000000000000..7ea467e18b84
--- /dev/null
+++ b/man/man7/sched_ext.7
@@ -0,0 +1,100 @@
+.TH SCHED_EXT 7 2024-04-13 "Linux" "Linux Programmer's Manual"
+.SH NAME
+sched_ext \- Extensible BPF Scheduler Class
+.SH SYNOPSIS
+.B #include <linux/sched.h>
+.PP
+.B #define SCHED_EXT 7
+.SH DESCRIPTION
+.B sched_ext
+is a scheduling class whose behavior can be defined by a set of BPF
+programs, known as the BPF scheduler. It allows for the implementation
+of custom scheduling algorithms that can be loaded and unloaded
+dynamically.
+.PP
+When a BPF scheduler is loaded, it can take over the scheduling of
+tasks that use the
+.B SCHED_EXT
+policy, as well as tasks using standard policies like
+.BR SCHED_NORMAL ,
+.BR SCHED_BATCH ,
+and
+.BR SCHED_IDLE ,
+depending on how the BPF scheduler is configured.
+.SS Switching to and from sched_ext
+The feature is enabled via the
+.B CONFIG_SCHED_CLASS_EXT
+kernel configuration option.
+.PP
+A task can explicitly request the
+.B SCHED_EXT
+policy using system calls such as
+.BR sched_setscheduler (2)
+or
+.BR sched_setattr (2).
+If no BPF scheduler is currently loaded, tasks with the
+.B SCHED_EXT
+policy are treated as
+.B SCHED_NORMAL
+and scheduled by the default fair-class scheduler (CFS/EEVDF).
+.PP
+When a BPF scheduler is loaded:
+.IP \(bu 3
+If
+.B SCX_OPS_SWITCH_PARTIAL
+is NOT set in the scheduler's flags, ALL tasks with policies
+.BR SCHED_NORMAL ,
+.BR SCHED_BATCH ,
+.BR SCHED_IDLE ,
+and
+.B SCHED_EXT
+are scheduled by
+.BR sched_ext .
+.IP \(bu 3
+If
+.B SCX_OPS_SWITCH_PARTIAL
+IS set, only tasks with the
+.B SCHED_EXT
+policy are scheduled by
+.BR sched_ext .
+Tasks with \fBSCHED_NORMAL\fR, \fBSCHED_BATCH\fR, or \fBSCHED_IDLE\fR remain under the fair-class scheduler.
+.PP
+If the BPF scheduler terminates (either normally, due to an error, or
+via a SysRq command), all tasks are automatically reverted to the
+fair-class scheduler.
+.SS System Interfaces
+.B sched_ext
+exposes several interfaces in sysfs for monitoring and control:
+.TP
+.I /sys/kernel/sched_ext/state
+Shows the current state of the BPF scheduler (\fBenabled\fR, \fBdisabled\fR, etc.).
+.TP
+.I /sys/kernel/sched_ext/root/ops
+Shows the name of the currently loaded BPF scheduler.
+.TP
+.I /sys/kernel/sched_ext/enable_seq
+A monotonically incrementing counter that tracks how many times a BPF
+scheduler has been enabled since boot.
+.SS Safety and Debugging
+System integrity is maintained regardless of the BPF scheduler's
+behavior. If a runnable task stalls or an internal error is detected,
+the BPF scheduler is aborted.
+.PP
+The following SysRq sequences are available for emergency management:
+.TP
+.B SysRq-S
+Aborts the current BPF scheduler and reverts all tasks to the fair-class
+scheduler.
+.TP
+.B SysRq-D
+Triggers a debug dump of the current scheduler state to the
+.B sched_ext_dump
+tracepoint.
+.SH SEE ALSO
+.BR sched (7),
+.BR sched_setscheduler (2),
+.BR sched_setattr (2),
+.BR bpf (2)
+.PP
+.I Documentation/scheduler/sched-ext.rst
+in the Linux kernel source tree.
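
Not part of the patch: the switching rules in the new page are easy to
misread, so here is a small model of them as a pure function. It is a
documentation aid only, not a kernel interface. The function name and the
boolean inputs scx_loaded and switch_partial are hypothetical stand-ins for
"a BPF scheduler is loaded" and "SCX_OPS_SWITCH_PARTIAL is set". The page
does not discuss the real-time and deadline policies, so the sketch assumes
they are left alone and reports them as "unaffected".

```python
# Policy numbers as defined in the kernel UAPI header <linux/sched.h>.
SCHED_NORMAL, SCHED_FIFO, SCHED_RR, SCHED_BATCH = 0, 1, 2, 3
SCHED_IDLE, SCHED_DEADLINE, SCHED_EXT = 5, 6, 7

# Policies normally handled by the fair-class scheduler.
FAIR_POLICIES = {SCHED_NORMAL, SCHED_BATCH, SCHED_IDLE}


def scheduling_class(policy: int, scx_loaded: bool, switch_partial: bool) -> str:
    """Return which scheduler runs a task, following the proposed page."""
    if policy == SCHED_EXT:
        # With no BPF scheduler loaded, SCHED_EXT is treated as SCHED_NORMAL.
        return "sched_ext" if scx_loaded else "fair"
    if policy in FAIR_POLICIES:
        # A full switch takes these tasks over; a partial switch does not.
        return "sched_ext" if scx_loaded and not switch_partial else "fair"
    # Assumption: SCHED_FIFO, SCHED_RR and SCHED_DEADLINE are not touched.
    return "unaffected"


if __name__ == "__main__":
    for partial in (False, True):
        print(partial, scheduling_class(SCHED_BATCH, True, partial))
```

The two branches correspond to the two bullets under "When a BPF scheduler
is loaded" in the page.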
--
2.48.1
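
Not part of the patch: a minimal sketch for checking the sysfs files that
the new page lists under "System Interfaces". The three paths come from the
patch text; the function name read_scx_status and the choice to report a
missing or unreadable file as None are illustrative assumptions. On a kernel
without sched_ext, all three values are expected to be None.

```python
from pathlib import Path

# Directory named in the proposed sched_ext(7) page.
SCX_SYSFS = Path("/sys/kernel/sched_ext")

# Result key -> file path relative to the sysfs directory.
SCX_FILES = {
    "state": "state",
    "ops": "root/ops",
    "enable_seq": "enable_seq",
}


def read_scx_status(root: Path = SCX_SYSFS) -> dict:
    """Read each attribute as stripped text; unreadable files map to None."""
    status = {}
    for key, rel in SCX_FILES.items():
        try:
            status[key] = (root / rel).read_text().strip()
        except OSError:
            # Covers a missing directory, a missing file, or no permission.
            status[key] = None
    return status


if __name__ == "__main__":
    for key, value in read_scx_status().items():
        print(f"{key}: {value}")
```

Passing a different root makes the helper usable against a copied or mocked
directory tree when no sched_ext kernel is at hand.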