Re: [Pv-drivers] [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Chris Wright <hidden>
Date: 2009-09-29 20:30:43
Also in:
lkml
* Bhavesh Davda (bhavesh@vmware.com) wrote:
Hi Chris, Thanks a bunch for your really thorough review! I'll answer some of your questions here. Shreyas can respond to your comments about some of the coding style/comments/etc. in a separate mail.
The style is less important at this stage, but certainly eases review to make it more consistent w/ Linux code. The StudlyCaps, extra macros (screaming caps) and inconistent space/tabs are visual distractions, that's all.
quoted
quoted
INTx, MSI, MSI-X (25 vectors) interrupts 16 Rx queues, 8 Tx queuesDriver doesn't appear to actually support more than a single MSI-X interrupt. What is your plan for doing real multiqueue?When we first wrote the driver a couple of years ago, Linux lacked proper multiqueue support, hence we chose to use only a single queue though the emulated device does support 16 Rx and 8 Tx queues, and 25 MSI-X vectors: 16 for Rx, 8 for Tx and 1 for other asynchronous event notifications, by design. Actually a driver can repurpose any of the 25 vectors for any notifications; just explaining the rationale for desiging the device with 25 MSI-X vectors.
I see, thanks.
We do have an internal prototype of a Linux vmxnet3 driver with 4 Tx queues and 4 Rx queues, using 9 MSI-X vectors, but it needs some work before calling it production ready.
I'd expect once you switch to alloc_etherdev_mq(), make napi work per rx queue, and fix MSI-X allocation (all needed for 4/4), you should have enough to support the max of 16/8 (IOW, 4/4 still sounds like an aritificial limitation).
quoted
How about GRO conversion?Looks attractive, and we'll work on that in a subsequent patch. Again, when we first wrote the driver, the NETIF_F_GRO stuff didn't exist in Linux.
OK, shouldn't be too much work. Another thing I forgot to mention is that net_device now has net_device_stats in it. So you shouldn't need net_device_stats in vmxnet3_adapter.
quoted
Also, heavy use of BUG_ON() (counted 51 of them), are you sure that none of them can be triggered by guest or remote (esp. the ones that happen in interrupt context)? Some initial thoughts below.We'll definitely audit all the BUG_ONs again to make sure they can't be exploited.quoted
quoted
--- /dev/null +++ b/drivers/net/vmxnet3/upt1_defs.h +#define UPT1_MAX_TX_QUEUES 64 +#define UPT1_MAX_RX_QUEUES 64This is different than the 16/8 described above (and seemingly all moot since it becomes a single queue device).Nice catch! Those are not even used and are from the earliest days of our driver development. We'll nuke those.
Could you describe the UPT layer a bit? There were a number of constants that didn't appear to be used.
quoted
quoted
+/* interrupt moderation level */ +#define UPT1_IML_NONE 0 /* no interrupt moderation */ +#define UPT1_IML_HIGHEST 7 /* least intr generated */ +#define UPT1_IML_ADAPTIVE 8 /* adpative intr moderation */enum? also only appears to support adaptive mode?Yes, the Linux driver currently only asks for adaptive mode, but the device supports 8 interrupt moderation levels.quoted
quoted
--- /dev/null +++ b/drivers/net/vmxnet3/vmxnet3_defs.h +struct Vmxnet3_MiscConf { + struct Vmxnet3_DriverInfo driverInfo; + uint64_t uptFeatures; + uint64_t ddPA; /* driver data PA */ + uint64_t queueDescPA; /* queue descriptor tablePA */quoted
+ uint32_t ddLen; /* driver data len */ + uint32_t queueDescLen; /* queue desc. table lenin bytes */quoted
+ uint32_t mtu; + uint16_t maxNumRxSG; + uint8_t numTxQueues; + uint8_t numRxQueues; + uint32_t reserved[4]; +};should this be packed (or others that are shared w/ device)? i assume you've already done 32 vs 64 hereNo need for packing since the fields are naturally 64-bit aligned. True for all structures shared between the driver and device.
I had quickly looked and thought I saw a hole that would lead to inconsistent layout for 32-bit vs 64-bit. I figured I'd be wrong there ;-)
quoted
quoted
+#define VMXNET3_MAX_TX_QUEUES 8 +#define VMXNET3_MAX_RX_QUEUES 16different to UPT, I must've missed some layering hereThese are the authoritiative #defines. Ignore the UPT ones.quoted
quoted
--- /dev/null +++ b/drivers/net/vmxnet3/vmxnet3_drv.c + VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_IMR + intr_idx *8, 0); writel(0, adapter->hw_addr0 + VMXNET3_REG_IMR + intr_idx * 8) seems just as clear to me.Fair enough. We were just trying to clearly show which register accesses go to BAR 0 versus BAR 1.quoted
only ever num_intrs=1, so there's some plan to bump this up and make these wrappers useful?Yes.quoted
quoted
+static void +vmxnet3_process_events(struct vmxnet3_adapter *adapter)Should be trivial to break out to it's own MSI-X vector, basically set up to do that already.Yes, and the device is configurable to use any vector for any "events", but didn't see any compelling reason to do so. "ECR" events are extremely rare and we've got a shadow copy of the ECR register that avoids an expensive round trip to the device, stored in adapter->shared->ecr. So we can cheaply handle events on the hot Tx/Rx path with minimal overhead. But if you really see a compelling reason to allocate a separate MSI-X vector for events, we can certainly do that.
Nah, just thinking outloud while trying to understand the driver. I figured it'd be the + 1 vector (16 + 8 + 1). thanks, -chris