Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time
From: Phil Sutter <phil@nwl.cc>
Date: 2017-10-13 10:31:26
On Thu, Oct 12, 2017 at 09:07:06AM -0700, Stephen Hemminger wrote:
On Wed, 11 Oct 2017 13:10:07 +0200 Phil Sutter [off-list ref] wrote:
On Tue, Oct 10, 2017 at 09:47:43AM -0700, Stephen Hemminger wrote:
On Tue, 10 Oct 2017 08:41:17 +0200 Michal Kubecek [off-list ref] wrote:
On Mon, Oct 09, 2017 at 10:25:25PM +0200, Phil Sutter wrote:
Hi Stephen,
On Mon, Oct 02, 2017 at 10:37:08AM -0700, Stephen Hemminger wrote:
On Thu, 28 Sep 2017 21:33:46 +0800 Hangbin Liu [off-list ref] wrote:
From: Hangbin Liu <redacted>

This is an update for 460c03f3f3cc ("iplink: double the buffer size also in iplink_get()"). After this update, we will not need to double the buffer size every time the number of VFs increases.

With calls like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the length parameter.

With calls like rtnl_talk(&rth, nlh, nlh, sizeof(req)), I add a new variable, answer, to avoid overwriting data in nlh, because there may be more info after nlh. This also avoids the problem of the nlh buffer not being large enough. We need to free answer after using it.

Signed-off-by: Hangbin Liu <redacted>
Signed-off-by: Phil Sutter <phil@nwl.cc>
---

Most of the uses of rtnl_talk() don't need this peek and dynamic sizing. Can only those places that need it be targeted?

We could probably do that, by having a buffer on the stack in __rtnl_talk() which will be used instead of the allocated one if 'answer' is NULL. Or maybe even introduce a dedicated API call for the dynamically allocated receive buffer. But I really doubt that's feasible: AFAICT, that stack buffer still needs to be reasonably sized since the reply might be larger than the request (reusing the request buffer would be the simplest way to tackle this); also, there is support for extack, which may bloat the response to arbitrary size.

Hangbin has shown in his benchmark that the overhead of the second syscall is negligible, so why care about that and increase code complexity even further? Not saying it's not possible, but I just doubt it's worth the effort.

Agreed. Current code is based on the assumption that we can estimate the maximum reply length in advance, and the reason for this series is that this assumption turned out to be wrong. I'm afraid that if we replace it by an assumption that we can estimate the maximum reply length for most requests with only a few exceptions, it's only a matter of time for us to be proven wrong again.

Michal Kubecek

For query responses, yes, the response may be large. But for the common cases of add address or add route, the response should just be an ack or error.

And with extack, the error is comprised of the original request plus an arbitrarily sized error message, so we can't just reuse the request buffer and are back to "guessing" the right length again.

To get an idea of what we're talking about, I wrote a simple benchmark which adds 256 * 254 (= 65024) addresses to an interface, then removes them again one by one, and measured the time that takes for binaries with and without Hangbin's patches:

OP      Vanilla           Hangbin           Delta
--------------------------------------------------------
add     real 2m16.244s    real 2m27.964s    +11.72s  (108.6%)
        user 0m15.241s    user 0m17.295s    +2.054s  (113.5%)
        sys  1m40.229s    sys  1m48.239s    +8.01s   (108.0%)
remove  real 1m44.950s    real 1m47.044s    +2.094s  (102.0%)
        user 0m13.899s    user 0m14.723s    +0.824s  (105.9%)
        sys  1m30.798s    sys  1m31.938s    +1.140s  (101.3%)

So the overhead of the second syscall and dynamic memory allocation is less than 10% overall. Given the short time a single call to 'ip' typically takes, I don't think the difference is noticeable even in highly performance critical applications.

Cheers, Phil

For a better benchmark, I generated 4 million routes then did:

# ip -batch routes.txt
Ah, batch mode. Nice trick!
OP              Vanilla     Hangbin     Delta
-----------------------------------------------------
add     real    1:25.840    1:33.677     +9.13%
        user      10.690       6.078    -56.85%
        sys     1:00.920    1:13.109    +20.00%
remove  real    2:29.881    2:25.872     -2.67%
        user      12.862       7.942    -38.25%
        sys       44.127      44.633     +1.15%

So the answer is addition is slower but deletion appears faster?
Yeah, that's funny. Hangbin's tests show the same in his 'ip link show' test. I can imagine a performance improvement in some situations since the patches eliminate that memcpy() of the reply buffer in __rtnl_talk(), but neither 'route add' nor 'route del' triggers that code path.
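For readers following the thread, the "peek and dynamic sizing" being discussed can be illustrated roughly as follows. This is only a sketch under assumptions, not the code from the patch: recv_alloc() is a hypothetical helper, and it uses plain recv() on a generic datagram socket instead of recvmsg() on a netlink socket. It relies on Linux behaviour, where MSG_TRUNC makes recv() report the real datagram length. It shows where the second syscall and the malloc() come from.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not the iproute2 implementation): receive one
 * datagram into a freshly allocated buffer of exactly the right size.
 * On success, returns the message length and stores the buffer in
 * *answer (the caller must free() it). Returns -1 on error.
 */
static ssize_t recv_alloc(int fd, char **answer)
{
	char probe;
	char *buf;
	ssize_t len, got;

	assert(answer != NULL);

	/*
	 * Syscall 1: peek at the pending datagram. MSG_PEEK leaves it
	 * queued; MSG_TRUNC makes recv() return the real length even
	 * though only one byte fits into the probe buffer.
	 */
	len = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
	if (len < 0)
		return -1;

	/* Allocate exactly what is needed (at least one byte). */
	buf = malloc(len > 0 ? (size_t)len : 1);
	if (!buf)
		return -1;

	/* Syscall 2: actually consume the datagram. */
	got = recv(fd, buf, (size_t)len, 0);
	if (got < 0) {
		free(buf);
		return -1;
	}

	*answer = buf;
	return got;
}
```

A caller would pass the socket and the address of a char pointer, use the returned buffer, and free() it afterwards, which mirrors the "we need to free answer after using it" remark in the patch description.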
If I rerun the Vanilla test, I get about the same times. The slowdown won't impact me, but what about large-scale users like Cumulus?
If they delete routes as often as they add them, things don't look too bad at least. :)

Cheers, Phil
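The Delta column of the batch-mode table can be rechecked from the posted times as (patched - vanilla) / vanilla. The sketch below does that; the helper names are made up for illustration and the inputs are simply the six timings quoted in the thread. Recomputed this way, five of the six rows agree with the posted figures to rounding, while the add/user row comes out at roughly -43.1% rather than -56.85%, which looks more like the plain ratio of the two times. That is only a reading of the posted numbers, not a correction from the original poster.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "minutes:seconds" reading to seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Relative change of the patched run against the vanilla run, in percent. */
static double delta_percent(double vanilla, double patched)
{
	assert(vanilla > 0.0);
	return (patched - vanilla) / vanilla * 100.0;
}

/* One table row: label plus both timings in seconds. */
struct row {
	const char *label;
	double vanilla;
	double patched;
};

/* Print the recomputed Delta column for the batch-mode timings. */
static void print_deltas(void)
{
	const struct row rows[] = {
		{ "add    real", to_seconds(1, 25.840), to_seconds(1, 33.677) },
		{ "add    user", to_seconds(0, 10.690), to_seconds(0, 6.078) },
		{ "add    sys",  to_seconds(1, 0.920),  to_seconds(1, 13.109) },
		{ "remove real", to_seconds(2, 29.881), to_seconds(2, 25.872) },
		{ "remove user", to_seconds(0, 12.862), to_seconds(0, 7.942) },
		{ "remove sys",  to_seconds(0, 44.127), to_seconds(0, 44.633) },
	};
	size_t i;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-12s %+7.2f%%\n", rows[i].label,
		       delta_percent(rows[i].vanilla, rows[i].patched));
}
```

Calling print_deltas() from a main() prints one line per row, which can be compared directly against the table.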