Thread (57 messages) 57 messages, 4 authors, 2021-08-19

Re: [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

From: Jason Wang <jasowang@redhat.com>
Date: 2021-06-25 03:28:55

在 2021/6/24 下午3:59, Parav Pandit 写道:
quoted
From: Jason Wang <jasowang@redhat.com>
Sent: Thursday, June 24, 2021 12:35 PM
quoted
quoted
Consider we had a mature set of virtio specific uAPI for config space.
It would be a burden if we need an unnecessary translation layer of
netlink in the middle:

[vDPA parent (virtio_net_config)] <-> [netlink
(VDPA_ATTR_DEV_NET_XX)] <-> [userspace
(VDPA_ATTR_DEV_NET_XX)]
quoted
quoted
quoted
<-> [ user (virtio_net_config)]
This translation is not there. We show relevant net config fields as
VDPA_ATTR_DEV_NET individually.
quoted
It is not a binary dump which is harder for users to parse and make any use
of it.


The is done implicitly, user needs to understand the semantic of
virtio_net_config and map the individual fields to the vdpa tool sub-
command.
Mostly not virtio_net_config is for the producer and consumer sw entities.
Here user doesn't know about such layout and where its located.
User only sets config params that gets set in the config space.
(without understanding what is config layout and its location).
quoted
quoted
It is only one level of translation from virtio_net_config (kernel) -> netlink
vdpa fields.
quoted
It is similar to 'struct netdevice' -> rtnl info fields.
I think not, the problem is that the netdevice is not a part of uAPI but
virtio_net_config is.
Virtio_net_config is a UAPI for sw consumption.
That way yes, netlink can also do it, however it requires side channel communicate what is valid.
quoted
quoted
quoted
If we make netlink simply a transport, it would be much easier. And we
had
quoted
quoted
the chance to unify the logic of build_config() and set_config() in the
driver.
quoted
How? We need bit mask to tell that out of 21 fields which fields to update
and which not.
quoted
And that is further mixed with offset and length.
So set_config() could be called from userspace, so did build_config().
The only difference is:

1) they're using different transport, ioctl vs netlink
2) build_config() is only expected to be called by the management tool

If qemu works well via set_config ioctl, netlink should work as well.
mlx5 set_config is noop.
vdpa_set_config() need to return an error code. I don't
vp_vdpa.c blindly writes the config as its passthrough.
Parsing which fields to write and which not, using offset and length is a messy code with typecast and compare old values etc.

I don't see why it needs typecast, virtio_net_config is also uABI, you 
can deference the fields directly.

quoted
Btw, what happens if management tool tries to modify the mac of vDPA
when the device is already used by the driver?
At present it allows modifying, but it should be improved in future to fail if device is in use.

This is something we need to fix I think. Or if it's really useful to 
allowing the attributes to be modified after the device is created.

Why not simply allow the config to be built only at device creation?

quoted
quoted
quoted
quoted
quoted
And actually, it's not the binary blob since uapi clearly define the
format (e.g struct virtio_net_config), can we find a way to use that?
E.g introduce device/net specific command and passing the blob with
length and negotiated features.
Length may change in future, mostly expand. And parsing based on
length
quoted
quoted
is not such a clean way.


Length is only for legal checking. The config is self contained with:
Unlikely. When structure size increases later, the parsing will change based
on the length.
quoted
Because older kernel would return shorter length with older iproute2 tool.
This is fine since the older kernel only support less features. The only
possible issue if the old iproute 2 runs on new kernel. With the current
proposal, it may cause some config fields can't not be showed.
Not showing is ok.
But the code is messy to typecast on size.
quoted
I think it might be useful to introduce a command to simply dump the
config space.

quoted
So user space always have to deal and have nasty parsing/typecasting
based on the length.
Such nasty parsing is not required for netlink interface.
quoted
That's how userspace (Qemu) is expected to work now. The userspace
should determine the semantic of the fields based on the features.

Differentiate config fields doesn't help much, e.g userspace still need
to differ LINK_UP and ANNOUNCE for the status field.
Yes, this parsing is from constant size u16 status.
quoted
[..]
quoted
quoted
Its not about performance. By the time 1st call is made, features got
updated and it is out of sync with config.
quoted
quoted
1) get config
2) get device id
3) get features
This requires using features from 3rd netlink output to decode output of
1st netlink output.
quoted
Which is a bit odd of netlink.
Other netlink nla_put() probably sending whole structure doesn’t need to
do it.


Well, we can pack them all into a single skb isn't it? (probably with a
config len).
You want to pack features and config both in the single nla_put()?
If so, it isn't necessary. There are more examples in kernel that adds individual fields instead of nla_put(blob).
I wouldn’t follow those nla_put() callers.

No, a single skb not single nla_put().

Actually git grep told me a very good example of carrying uABI via 
netlink, that is the ndt_config:

1) we had ndt_config definition in the uAPI
2) netlink simply carries the structure in neightbl_fill_info():

                 if (nla_put(skb, NDTA_CONFIG, sizeof(ndc), &ndc))


For virito_net_config, why not simply:

len = ops->get_config_len();
config = kmalloc(len, GFP_KERNEL);
ops->get_config(vdev, 0, config, len);
nla_put(skb, VIRTIO_CONFIG, config, len);
nla_put_le64(skb, VIRTIO_FETURES, features);

For build_config, we can simply do thing reversely. Then everything 
works via the existing virtio uAPI/ABI.

quoted
quoted
quoted
For build config, it's only one

1) build config

quoted
I prefer to follow rest of the kernel style to return self contained
invidividual fields.


But I saw a lot of kernel codes choose to use e.g nla_put() directly with
module specific structure.
It might be self-contained structure that probably has not found the need
to expand.


I think it's just a matter of putting the config length with the config
data. Note that we've already had .get_config_size() ops for validating
inputs through VHOST_SET_CONFIG/VHOST_GET_CONFIG.
This length comes as part of the netlink interface already, no need for extra length.
The whole point is to avoid parsing based on length.

Well, it doesn't do anything difference compared to xxx_is_valid which 
just calculating the offset implicitly (via the compiler).

We cannot change the virtio_net_config UAPI in use, but netlink code doesn’t need to be bound to size based typecasting and compare fields during build_config().

The points are:

1) Avoid duplicating the existing uAPIs
2) Avoid unnecessary parsing in the netlink, netlink is just the 
transport, it's the charge of the vDPA parent to do that

Thanks


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help