Re: [RFC] batched tc to improve change throughput

From: Thomas Graf <tgraf@suug.ch>
Date: 2005-01-26 14:35:45

* jamal 2005-01-26 08:48

On Mon, 2005-01-24 at 10:06, Thomas Graf wrote:

quoted

I'm not talking of the nlmsg_seq but rather a sequence number with
global or nl_family scope. It gets increased whenever a netlink
message of that family is processed and is returned with the ack. If
a userspace application wants to enforce atomicity between two requests
which cannot be batched because an answer is expected in between, then
it could provide the expected sequence number and the request is only
fulfilled if it matches the current one. Example:

--> RTM_NEWLINK
<-- answer
<-- ACK (seq = 222)
--> RTM_SETLINK (expect = 222)
<-- ACK

Now if another netlink app interferes:

--> RTM_NEWLINK
<-- answer
<-- ACK (seq = 222)

-- other app --
--> RTM_SETLINK
<-- ACK (seq = 223)

-- back to first app --
--> RTM_SETLINK (expect = 222)
<-- ERROR

The application can then retry its operation a few times and
finally give up.  The main problem I see is to extend nlmsghdr
in a way that stays compatible.

The best thing you could get out of this is a warning that something
changed under you, i.e. it doesn't really solve the synchronization issue.

Why? If we do the check while holding the rtnl sem we can guarantee
atomicity. The comparison of the expected seq and the current seq must
be done before any action and within the rtnl semaphore. It is very
unlikely that someone interferes, so strict locking is pretty inefficient.

rtnl_send_atomic(msg, expect_seq)
	retries := 10;
retry:
	res := send_msg(msg, expect_seq);
	if res = -ERETRY and --retries then
		goto retry;
	endif

	if retries = 0 then
		err "Timeout while trying to achieve atomic operation"
	endif

and in the kernel:

rtnl_lock();
if expect_seq != seq then
   rtnl_unlock()
   return -ERETRY;
endif

... atomic action can take place here ...

Of course this only works if netlink requests themselves are
synchronized in the relevant netlink family.
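
A minimal userspace sketch of the check and the retry loop above, in C.
Everything in it is an assumption for illustration: ERETRY and its value,
the counter family_seq, the interference_left hook and all function names
are hypothetical, and the lock functions are empty stand-ins for the rtnl
semaphore. One liberty is taken: each retry repeats the whole get-then-set
exchange so that it picks up a fresh sequence number, since resending a
stale expected value could never succeed.

```c
#include <stddef.h>
#include <stdint.h>

#define ERETRY 1000   /* hypothetical error code, not a real kernel errno */

/* Per-family counter, bumped for every processed request (assumed start value). */
uint32_t family_seq = 221;

/* Simulation hook: how many more times another app slips in between get and set. */
int interference_left = 0;

/* Empty stand-ins for rtnl_lock()/rtnl_unlock(). */
void rtnl_lock_sim(void)   { }
void rtnl_unlock_sim(void) { }

/*
 * "Kernel" side: process one request. If expect is non-NULL, the request is
 * only carried out when *expect equals the current counter. On success the
 * counter is bumped and its new value is returned through ack_seq.
 */
int process_request(const uint32_t *expect, uint32_t *ack_seq)
{
    rtnl_lock_sim();
    if (expect != NULL && *expect != family_seq) {
        rtnl_unlock_sim();
        return -ERETRY;
    }
    /* ... atomic action would take place here ... */
    family_seq++;
    *ack_seq = family_seq;
    rtnl_unlock_sim();
    return 0;
}

/*
 * "Userspace" side: issue a get, then a set that expects the sequence number
 * returned by the get. The whole exchange is repeated up to `retries` times.
 */
int get_then_set_atomic(int retries)
{
    while (retries-- > 0) {
        uint32_t seq, ack;

        process_request(NULL, &seq);        /* get: ACK carries seq */

        if (interference_left > 0) {        /* simulated other app */
            interference_left--;
            process_request(NULL, &ack);
        }

        if (process_request(&seq, &ack) == 0)
            return 0;                       /* nobody interfered */
    }
    return -ERETRY;                         /* gave up */
}
```

With the start value of 221 the first unconditional request is acked with
222, which matches the trace quoted further up.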

[And a lot more complexity is introduced - if you say you want to change
the netlink header and maintain state in the kernel].

This is the big problem, there is no padding gap common to all rtnl users.

What we can do is to set a flag in nlmsghdr stating that a u32 block of
data follows the nlmsg header before the netlink user specific header,
i.e.

 +---------------------------------+
 | nlmsghdr flags |= NLM_F_EXP_SEQ |
 +---------------------------------+
 | expected_seq (u32)              |
 +---------------------------------+
 | netlink user specific data      |
 +---------------------------------+
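
To make the layout concrete, here is a rough sketch of how a sender could
build such a message and how the receiver could find the family specific
data again. struct nlmsghdr and the NLMSG_* macros come from
<linux/netlink.h>; NLM_F_EXP_SEQ does not exist there, so the flag and its
placeholder value are assumptions, as are the function names build_msg()
and family_data().

```c
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NLM_F_EXP_SEQ 0x8000   /* hypothetical flag, placeholder value */

/*
 * Build nlmsghdr [+ expected_seq] + payload into buf (must be 4-byte aligned).
 * Pass expected_seq == NULL to build a message without the extra block.
 * Returns nlmsg_len, or 0 if buf is too small.
 */
size_t build_msg(void *buf, size_t buflen, uint16_t type,
                 const uint32_t *expected_seq,
                 const void *payload, size_t payload_len)
{
    size_t extra = expected_seq ? NLMSG_ALIGN(sizeof(uint32_t)) : 0;
    size_t total = NLMSG_LENGTH(extra + payload_len);
    struct nlmsghdr *nlh = buf;
    char *p;

    if (NLMSG_ALIGN(total) > buflen)
        return 0;

    memset(buf, 0, NLMSG_ALIGN(total));
    nlh->nlmsg_len = total;
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;

    p = NLMSG_DATA(nlh);
    if (expected_seq) {
        nlh->nlmsg_flags |= NLM_F_EXP_SEQ;
        memcpy(p, expected_seq, sizeof(uint32_t));
        p += extra;
    }
    if (payload_len)
        memcpy(p, payload, payload_len);
    return total;
}

/*
 * Receiver side: return a pointer to the family specific data. If the flag
 * is set, the expected sequence number is copied out and skipped first.
 */
const void *family_data(const struct nlmsghdr *nlh,
                        uint32_t *expected_seq, int *has_expected)
{
    const char *p = (const char *)nlh + NLMSG_HDRLEN;

    *has_expected = 0;
    if (nlh->nlmsg_flags & NLM_F_EXP_SEQ) {
        memcpy(expected_seq, p, sizeof(uint32_t));
        *has_expected = 1;
        p += NLMSG_ALIGN(sizeof(uint32_t));
    }
    return p;
}
```

A receiver that does not know the flag would read the u32 as the start of
the family header, which is the compatibility concern with this variant.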

I'd even go one step further and define a header options chain like in
IPv6 so we can add more header attributes later on, like:

 +--------------------------------+
 | nlmsghdr flags |= NLM_F_OPTS   |
 +--------------------------------+
 | size=4, type=expt_seq, next=0  |
 +- - - - - - - -  - - - - - - - -+
 | expected sequence              |
 +--------------------------------+
 | netlink user specific data     |
 +--------------------------------+
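
A possible parser for such a chain, as a sketch only. The diagram does not
fix field widths or the meaning of "next", so the struct below is an
assumption: a 16-bit size, an 8-bit type, and an 8-bit next that is simply
nonzero when another option follows. The names nlopt_hdr, NLOPT_EXPECT_SEQ
and nlopt_parse() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NLOPT_EXPECT_SEQ = 1 };   /* hypothetical option type */

struct nlopt_hdr {               /* hypothetical layout, widths assumed */
    uint16_t size;               /* bytes of option data after this header */
    uint8_t  type;
    uint8_t  next;               /* nonzero if another option follows */
};

/* Round n up to a multiple of 4, as netlink does for its own headers. */
size_t nlopt_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Walk the option chain at opts (len bytes available). Unknown option types
 * are skipped. Returns the number of bytes the chain occupies, i.e. the
 * offset of the netlink user specific data, or -1 if the chain is malformed.
 */
long nlopt_parse(const unsigned char *opts, size_t len,
                 uint32_t *expect_seq, int *has_expect)
{
    size_t off = 0;

    *has_expect = 0;
    for (;;) {
        struct nlopt_hdr h;
        size_t padded;

        if (len - off < sizeof(h))
            return -1;                       /* truncated option header */
        memcpy(&h, opts + off, sizeof(h));
        off += sizeof(h);

        padded = nlopt_align(h.size);
        if (len - off < padded)
            return -1;                       /* truncated option data */

        if (h.type == NLOPT_EXPECT_SEQ) {
            if (h.size != sizeof(uint32_t))
                return -1;                   /* wrong size for this type */
            memcpy(expect_seq, opts + off, sizeof(uint32_t));
            *has_expect = 1;
        }
        off += padded;

        if (!h.next)
            return (long)off;                /* end of chain */
    }
}
```

Because every option carries its own size, a receiver can step over types
it does not understand, which is what makes the chain extensible later on.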

Thoughts?

Your call really - you are the one who is going to maintain it;->
As for ease of use and sparing users from knowing details of how
tlvs are put together etc - i think it doesn't matter how that's done
underneath the hood; it is still doable on top of current libnetlink. In
other words what's required, IMO, is something that hides netlink totally
so that the programmer/user doesn't even get to see TLVs.

Agreed, I even hide the structs exported to userspace to avoid breakage,
i.e. I don't export tc_stats directly for example.
