Re: [RFC PATCH 0/6] Restructure EAL device model for bus support
From: Shreyansh Jain <hidden>
Date: 2016-11-23 09:42:52
I should have replied to this earlier, apologies. On Sunday 20 November 2016 09:00 PM, David Marchand wrote:
On Thu, Nov 17, 2016 at 6:29 AM, Shreyansh Jain [off-list ref] wrote:quoted
DPDK has been inherently a PCI inclined framework. Because of this, the design of device tree (or list) within DPDK is also PCI inclined. A non-PCI device doesn't have a way of being expressed without using hooks started from EAL to PMD. With this cover letter, some patches are presented which try to break this strict linkage of EAL with PCI devices. Aim is to generalize the device hierarchy on the lines of how Linux handles it: device A1 | +==.===='==============.============+ Bus A. | `--> driver A11 \ device A2 `-> driver A12 \______ |CPU | /````` device A1 / | / +==.===='==============.============+ Bus A` | `--> driver B11 device A2 `-> driver B12 Simply put: - a bus is connect to CPU (or core) - devices are conneted to Bus - drivers are running instances which manage one or more devices - bus is responsible for identifying devices (and interrupt propogation) - driver is responsible for initializing the device (*Reusing text from email [1]) In context of DPDK EAL: - a generic bus (not a driver, not a device). I don't know how to categorize a bus. It is certainly not a device, and then handler for a bus (physical) can be considered a 'bus driver'. So, just 'rte_bus'. - there is a bus for each physical implementation (or virtual). So, a rte_bus Object for PCI, VDEV, ABC, DEF and so on. - Buses are registered just like a PMD - RTE_PMD_BUS_REGISTER() - Each registered bus is part of a doubly list. -- Each device inherits rte_bus -- Each driver inherits rte_bus -- Device and Drivers lists are part of rte_bus - eth_driver is no more required - it was just a holder for PMDs to register themselves. It can be replaced with rte_xxx_driver and corresponding init/ uninit moved to rte_driver - rte_eth_dev modified to disassociate itself from rte_pci_device and connect to generic rte_device Once again, improvising from [1]: __ rte_bus_list / +----------'---+ |rte_bus | | driver_list------> List of rte_bus specific | device_list---- devices | scan | `-> List of rte_bus associated | match | drivers | dump | | ..some refcnt| (#) +--|------|----+ _________/ \_________ +--------/----+ +-\---------------+ |rte_device | |rte_driver | | rte_bus | | rte_bus | | rte_driver |(#) | init | | | | uninit | | devargs | | dev_private_size| +---||--------+ | drv_flags |(#) || | intr_handle(2*) |(#) | \ +----------\\\----+ | \_____________ \\\ | \ ||| +------|---------+ +----|----------+ ||| |rte_pci_device | |rte_xxx_device | (4*) ||| | PCI specific | | xxx device | ||| | info (mem,) | | specific fns | / | \ +----------------+ +---------------+ / | \ _____________________/ / \ / ___/ \ +-------------'--+ +------------'---+ +--'------------+ |rte_pci_driver | |rte_vdev_driver | |rte_xxx_driver | | PCI id table, | | <probably, | | .... | | other driver | | nothing> | +---------------+ | data | +----------------+ +----------------+ (1*) Problem is that probe functions have different arguments. So, generalizing them might be some rework in the respective device layers (2*) Interrupt handling for each driver type might be different. I am not sure how to generalize that either. This is grey area for me. (3*) Probably exposing a bitmask for device capabilities. Nothing similar exists now to relate it. Don't know if that is useful. Allowing applications to question a device about what it supports and what it doesn't - making it more flexible at application layer (but more code as well.) (4*) Even vdev would be an instantiated as a device. It is not being done currently. (#) Items which are still pending With this cover letter, some patches have been posted. These are _only_ for discussion purpose. They are not complete and neither compilable. All the patches, except 0001, have sufficient context about what the changes are and rationale for same. Obviously, code is best answer.As said by Jan B., I too think that splitting the patches into smaller patchsets is important. Looking at your datamodel, some quick comments : - Why init/uninit in rte_driver ? Why not probe/remove ?
No specific reason. At first I wasn't planning to replace driver->init with driver->probe (which rte_pci_driver) is doing. But, I will do this in v2.
- I don't think dev_private_size makes sense in rte_driver. The bus is responsible for creating rte_device objects (embedded or not in rte_$bus_device objects). If a driver needs to allocate some priv (for whatever needs), the driver should do the allocation itself (or maybe a helper for current eth_driver), then reference the original rte_$bus_device object it got from the probe method.
first off, this came as a suggestion on the ML somewhere as far as I remember. Secondly, your point makes sense. I will move it back.
For a first patchset, I would see: - introduce the rte_bus object. In rte_eal_init, for each bus, we call the scan method. Then, for each bus, we find the appropriate rte_driver using the bus match method then call the probe method. If the probe succeeds, the rte_device points to the associated rte_driver, - migrate the pci scan code to a pci bus (scan looks at sysfs for linux / ioctl for bsd + devargs for blacklist / whitelist ?), match is the same at what is done in rte_eal_pci_probe_one_driver() at the moment, - migrate the vdev init code to a vdev bus (scan looks at devargs): this is new, we must create rte_device objects for vdev drivers to use later
Thanks for outlay - I agree with that.
Then we can talk about the next steps once the bus is in place.
I will post the new set probably tomorrow or day-after.
- Shreyansh