Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework

From: Tan, Jianfeng <hidden>
Date: 2016-09-13 06:51:34

Hi Pankaj,


On 9/12/2016 6:55 PM, Pankaj Chauhan wrote:

On 9/9/2016 2:26 PM, Tan, Jianfeng wrote:

quoted

Hi Pankaj,

Thanks for the massive and great work.

Hi Jianfeng,

Thanks for the review.

quoted

On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:

quoted

Introduce support for a generic framework for handling of switching
between
physical and vhost devices. The vswitch framework introduces the
following
concept:

1. vswitch_dev: Vswitch device is a logical switch which can have
physical and
virtio devices. The devices are operated/used using standard rte_eth
API for
physical devices and lib_vhost API for vhost devices, PMD API is not 
used
for vhost devices.

2. vswitch_port: Any physical or virtio device that is added to
vswitch. The
port can have its own tx/rx functions for doing data transfer, which
are exposed
to the framework using generic function pointers
(vs_port->do_tx/do_rx). This way
the generic code can do tx/rx without understanding the type of device
(Physical or
virtio). Similarly each port has its own functions to select tx/rx
queues. The framework
provides default tx/rx queue selection functions to the port when port
is added
(for both physical and vhost devices). But the framework allows the
switch implementation
to override the queue selection functions (vs_port->get_txq/rxq) if
required.

3. vswitch_ops: The ops is set of function pointers which are used to
do operations
like learning, unlearning, add/delete port, lookup_and_forward. The
user of vswitch
framework (vhost/main.[c,h])uses these function pointers to perform
above mentioned
operations, thus it remains agnostic of the underlying implementation.

Different switching logics can implement their vswitch_device and
vswitch_ops, and
register with the framework. This framework makes vhost-switch
application scalable
in terms of:

1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
2. Number of ports.
3. Policies of selecting ports for rx and tx.

Signed-off-by: Pankaj Chauhan <redacted>

After this patch set, how's mapping relationship between cores and
vswitch_dev? Old vhost example does not have the concept of switch, so
each core is binded with some VDEVs. Now, we still keep original logic?

(a) If yes, provided that phys device could has no direct relationship
with vdevs in other switching logics, we may need to bind those physical
devices to cores too? In that case, switch_worker() will receiving pkts
from all devices (phys or virtual) and dispatch, which looks like:

switch_worker() {
    FOR each port binding to this core {
         rx(port, pkts[], count);
         vs_lookup_n_fwd( information_needed );
    }
}

Since we support only one switch device running at one time. We bind 
the ports of the switchdev to the core. But The switch might have it's 
own logic to bind the port to the core. For example
VMDQ only supports one Physical port, other switch can support more 
than one Physical port. In order to take care of that i have added two 
ops in swithdev_ops:

1. vs_sched_rx_port:

struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev, enum
                                      vswitch_port_type ptype, 
uint16_t core_id)

2. vs_sched_tx_port:

struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev, enum
                                      vswitch_port_type ptype, uint16_t
core_id)

The idea of providing these functions is that vhost/main requests the 
underlying switch implementation to schedule a port for rx or Tx, the 
current running core_id is also passed as parameter. So the switch can
take a decision which port to do rx or tx based on core id, and may be 
some other custom policy.

For example VMDQ always returns the one single Physical port in 
response to these functions called from any core. The other switch
can return the ports bound to the cores.

Similarly there are two port operations (vs_port->get_rxq and 
vs_port->get_txq), here also we pass core_id as parameter so that
the underlying switch implementation can schedule the rx or tx queue 
based on the core_id.

The above mentioned ops are used in drain_eth_rx() and 
do_drain_mbuf_table() (main.c) currently, and they leave binding of 
physical port and the queues to the underlying implementation. This
way we can accommodate VMDQ which uses only one physical port and 
rxqueues based on VMDQ, OR any other switch which uses multiple 
physical port and rx/tx queue scheduling based on some other logic.

Please suggest if this way of scheduling ports and tx/rx queues is 
fine or not?

According to above explanation, in VMDQ switch, we cannot schedule two 
queues (belongs to the same port) on the same core, right?

quoted

(b) If no, we may bind each core with n switches? If so, switch_worker()
also need to be reworked to keep receiving pkts from all ports of the
switch, and dispatch.

switch_worker() {
    FOR each switch binding to this core {
         FOR each port of switch {
             rx(port, pkts[], count);
             vs_lookup_n_fwd( information_needed );
        }
    }
}

Since we currently support only one switch_dev in a running instance of
vhost_switch() we can use binding of ports to core as you suggested in 
#(a).

OK.

quoted

In all, (1) we'd better not use vdev to find phys dev in switch_worker
any more;

I agree with you. I have tried to do following in drain_eth_rx ():


        core_id = rte_lcore_id();
        rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
                                    core_id);
        if (unlikely(!rx_port))
            goto out;

       rxq = rx_port->get_rxq(rx_port, vdev, core_id);
        rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts, 
MAX_PKT_BURST);

Here we don't assume any relation between vdev and the physical device 
or rxq.  But pass vdev to the underlying switch so that it can choose
the rxq for this vdev. In case of VMDQ it will return VMDQ queue 
number for this vdev. In case of any other switch implementation it 
can have their own logic to decide rxq.

Thanks for explanation.

(2) we'd better not differentiate phys device and virtual

quoted

device in generic framework (it's just an attribute of vswitch_port.

What do you think?

I agree with your thought that given the current API in this patchset we
should aim for making switch_worker agnostic of the port type. Ideally 
it should look something like this:

switch_worker() {

        rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;

        rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
        rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
        rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);

Can we hide queues inside struct vswitch_port? I mean:
For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far 
you've already stored "struct vhost_dev *" into vswitch_port.priv when 
it's a virtual port, how about store queue_id into vswitch_port.priv 
when it's a physical port.
For arp_learning switch, make (port_id, all_enabled_queues) as a 
vswitch_port.
Summarize above two: we treat (port_id, all_enabled_queues[]) as a 
vswitch_port.

How about it?

        vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);

}

Here i have two problems in achieving switch_worker as mentioned above:

1. while doing Rx from physical port, we need the associated vdev to 
know the VMDQ rx_q. That was the reason i kept current logic of 
keeping vdevs bound to the core in the switch_worker.  If we don't 
keep list of vdevs assigned to the core, how we' will get the rxq on 
phsyical device in case of VMDQ switch implementation.

I think my proposal above can solve this problem.

2. drain_mbuf_table() currently assumes that there is only one 
physical device on which we have to transmit. We may have to rethink 
where to keep the tx queues (&lcore_tx_queue[core_id]), i case we have 
multiple physical ports we might have to move tx queues to each port.

This function, drain_mbuf_table(), and its related data structures, are 
used for cache on tx direction. But it's not only useful for physical 
port, but also virtual port (vhost_dev). I suggest to make such cache a 
field of per vswitch_port. And each again, all vswitch_ports are 
connected to a core, each core will periodically clean those vswitch_ports.

Thanks,
Jianfeng

Since i didn't have good solution currently for #1 and #2 mentioned 
above i kept assumption of one single physical port as it is and tried 
to integrate VMDQ in the framework with leaving these assumptions (thus
the main structure of switch_worker) as it it.

Please suggest if you have some solution to above mentioned points in 
your mind. Ideally i agree we should have switch_worker() as i 
mentioned in pseudo-code i gave above.

quoted

Thanks,
Jianfeng

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help