Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
From: Fam Zheng <hidden>
Date: 2015-02-25 03:31:20
Also in: linux-fsdevel, lkml

On Wed, 02/18 19:49, Ingo Molnar wrote:
> * Fam Zheng [off-list ref] wrote:
> > On Sun, 02/15 15:00, Jonathan Corbet wrote:
> > > On Fri, 13 Feb 2015 17:03:56 +0800 Fam Zheng [off-list ref] wrote:
> > > > SYNOPSIS
> > > >
> > > >     #include <sys/epoll.h>
> > > >
> > > >     int epoll_pwait1(int epfd, int flags,
> > > >                      struct epoll_event *events,
> > > >                      int maxevents,
> > > >                      struct epoll_wait_params *params);
> > >
> > > Quick, possibly dumb question: might it make sense to also pass
> > > in sizeof(struct epoll_wait_params)? That way, when somebody
> > > wants to add another parameter in the future, the kernel can
> > > tell which version is in use and they won't have to do an
> > > epoll_pwait2()?
> >
> > Flags can be used for that, if the change is not radically
> > different.
>
> Passing in size is generally better than flags, because that way an
> extension of the ABI (new field[s]) automatically signals towards
> the kernel what to do with old binaries - while extending the
> functionality of new binaries, without sacrificing functionality.
>
> With flags you are either limited to the same structure size - or
> have to decode a 'size' value from the flags value - which is
> fragile (and in which case a real 'size' parameter is better).
>
> In the perf ABI we use something like that: there's a
> perf_attr.size parameter that iterates the ABI forward, while still
> being binary compatible with older software.
>
> If old binaries pass in a smaller structure to a newer kernel then
> the kernel pads the new fields with zero by default - that way the
> kernel internals are never burdened with compatibility details and
> data format versions.
>
> If new user-space passes in a larger structure than the kernel can
> handle then the kernel returns an error - this way user-space can
> transparently support conditional features and fallback logic.
>
> It works really well, we've done literally a hundred perf ABI
> extensions this way in the last 4+ years, in a pretty natural
> fashion, without littering the kernel (or user-space) with version
> legacies and without breaking existing perf tooling.
>
> Other syscall ABIs already get painful when trying to handle 2-3
> data structure versions, so people either give up, or add flags
> kludges or go to new syscall entries: which is painful in its own
> fashion and adds unnecessary latency to feature introduction as
> well.
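
For illustration, the size-based scheme described in the quoted text
could be sketched roughly as below. This is a user-space model only,
not code from perf or from this patch series. The names params_v1,
params_v2, PARAMS_MIN_SIZE and copy_params() are hypothetical, as are
the timeout_ms and clock_id fields; the choice of EINVAL and E2BIG as
error codes is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical first ABI revision, as compiled into old binaries. */
struct params_v1 {
    uint32_t size;        /* sizeof() of the struct the caller was built with */
    int32_t  timeout_ms;
};

/* Hypothetical second revision: new fields are only ever appended. */
struct params_v2 {
    uint32_t size;
    int32_t  timeout_ms;
    uint64_t clock_id;    /* new in v2; zero means "old default behaviour" */
};

#define PARAMS_MIN_SIZE sizeof(struct params_v1)

/*
 * Model of the receiving side, which here knows params_v2.
 * Returns 0 on success or a negative errno-style value.
 */
static int copy_params(struct params_v2 *dst, const void *src)
{
    uint32_t size;

    assert(dst != NULL && src != NULL);

    /* The size field sits first in every revision, so read it alone. */
    memcpy(&size, src, sizeof(size));

    if (size < PARAMS_MIN_SIZE)
        return -EINVAL;   /* smaller than any revision ever shipped */
    if (size > sizeof(*dst))
        return -E2BIG;    /* caller is newer than we are: report an error */

    /* Older caller: zero-fill first so the fields it lacks read as 0. */
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, size);
    dst->size = sizeof(*dst);
    return 0;
}
```

With this shape, the rest of the receiving code only ever sees the
newest layout, and a caller that gets the "too big" error can retry
with an older, smaller structure.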

Excellent. This now makes a lot of sense to me, thanks to your
explanations, Ingo. I'll add the "size" field in the next revision.

Thanks,
Fam