Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
From: Ingo Molnar <mingo@kernel.org>
Date: 2015-02-18 18:49:34
Also in:
linux-fsdevel, lkml
* Fam Zheng [off-list ref] wrote:
> On Sun, 02/15 15:00, Jonathan Corbet wrote:
> > On Fri, 13 Feb 2015 17:03:56 +0800 Fam Zheng [off-list ref] wrote:
> > > SYNOPSIS
> > >
> > >     #include <sys/epoll.h>
> > >
> > >     int epoll_pwait1(int epfd, int flags,
> > >                      struct epoll_event *events, int maxevents,
> > >                      struct epoll_wait_params *params);
> >
> > Quick, possibly dumb question: might it make sense to also pass in
> > sizeof(struct epoll_wait_params)?  That way, when somebody wants to
> > add another parameter in the future, the kernel can tell which
> > version is in use and they won't have to do an epoll_pwait2()?
>
> Flags can be used for that, if the change is not radically different.

Passing in size is generally better than flags, because that way an extension of the ABI (new field[s]) automatically signals to the kernel what to do with old binaries - while extending the functionality of new binaries, without sacrificing functionality.

With flags you are either limited to the same structure size - or have to decode a 'size' value from the flags value - which is fragile (and in which case a real 'size' parameter is better).

In the perf ABI we use something like that: there's a perf_attr.size parameter that iterates the ABI forward, while still being binary compatible with older software.

If old binaries pass in a smaller structure to a newer kernel then the kernel pads the new fields with zero by default - that way the kernel internals are never burdened with compatibility details and data format versions.

If new user-space passes in a larger structure than the kernel can handle then the kernel returns an error - this way user-space can transparently support conditional features and fallback logic.

It works really well, we've done literally a hundred perf ABI extensions this way in the last 4+ years, in a pretty natural fashion, without littering the kernel (or user-space) with version legacies and without breaking existing perf tooling.

Other syscall ABIs already get painful when trying to handle 2-3 data structure versions, so people either give up, or add flags kludges or go to new syscall entries: which is painful in its own fashion and adds unnecessary latency to feature introduction as well.

Thanks,

	Ingo
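
For illustration, the size-based scheme described above could be sketched roughly as follows. This is a user-space stand-in, not the actual perf or epoll code: the struct layouts, the field names (timeout_ms, clock_id) and the helper fetch_params() are all hypothetical, and memcpy() stands in for the user-copy a real kernel would do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v1 layout: what an old binary was compiled against. */
struct wait_params_v1 {
	int32_t timeout_ms;
};

/* Hypothetical v2 layout known to the "kernel": v1 plus one new field. */
struct wait_params_v2 {
	int32_t  timeout_ms;
	uint32_t clock_id;	/* new in v2; 0 selects the old behaviour */
};

/*
 * Stand-in for the kernel-side fetch of a size-versioned structure.
 * 'usize' is the sizeof() that user-space passed in (an assumption of
 * this sketch; it could equally be a leading 'size' field).
 *
 *  - smaller than the oldest known layout: reject
 *  - larger than the kernel's layout:      reject, caller can fall back
 *  - anything in between:                  copy, missing tail stays zero
 */
static int fetch_params(struct wait_params_v2 *dst, const void *src,
			size_t usize)
{
	assert(dst != NULL && src != NULL);

	if (usize < sizeof(struct wait_params_v1))
		return -EINVAL;
	if (usize > sizeof(*dst))
		return -E2BIG;

	/* Zero first, so fields an old binary does not know read as 0. */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, usize);
	return 0;
}
```

An old binary calling fetch_params(&k, &old, sizeof(struct wait_params_v1)) would get its timeout_ms copied and clock_id left at zero, so the rest of the code only ever deals with the newest layout. A caller passing a structure larger than wait_params_v2 would get -E2BIG and could retry with an older, smaller layout.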