Re: [PATCH 2/2] virtio: fix PCI config err handling
From: Luca Boccassi <hidden>
Date: 2018-08-16 10:27:36
On Thu, 2018-08-16 at 14:46 +0800, Tiwei Bie wrote:
On Wed, Aug 15, 2018 at 10:50:57AM +0100, Luca Boccassi wrote:quoted
On Wed, 2018-08-15 at 11:11 +0800, Tiwei Bie wrote:quoted
On Tue, Aug 14, 2018 at 03:30:35PM +0100, Luca Boccassi wrote:quoted
From: Brian Russell <redacted> In virtio_read_caps, rte_pci_read_config returns the number of bytes read from PCI config or < 0 on error. If less than the expected number of bytes are read then log the failure and return rather than carrying on with garbage.Is this a fix or an improvement? Or did you see anything broken without this patch? If so, we may need a fixes line and Cc stable.It is a fix, as it was creating problems in production due to the constant flux of errors in the logs.Could you be a bit more specific about which errors were logged if possible? If my understanding is correct, you mean the errors were logged because less than the required amount of bytes were read?
Yes - rte_pci_read_config on Linux will return not just 0/-1, but the actual number of bytes read. If it's less than the required amount, the code then goes on and reads garbage, which causes errors later in the execution. Checking that we actually got the amount of data we need fixes this issue.
quoted
But given patch 1/2 is effectively doing a small change in the BSD bus API, and it's a requirement for 2/2, I don't think we can include it in the stable releases unfortunately.If it's a fix, we need a fixes line.
Sure, will send a v2.
quoted
quoted
quoted
[...]quoted
quoted
quoted
@@ -567,16 +567,18 @@ virtio_read_caps(struct rte_pci_device*dev, struct virtio_hw *hw) } ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST); - if (ret < 0) { - PMD_INIT_LOG(DEBUG, "failed to read pci capability list"); + if (ret != 1) { + PMD_INIT_LOG(DEBUG, + "failed to read pci capability list, ret %d", ret); return -1; } while (pos) { ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos); - if (ret < 0) { - PMD_INIT_LOG(ERR, - "failed to read pci cap at pos: %x", pos); + if (ret != sizeof(cap)) {Above code has to successfully read a full virtio PCI capability during each read, otherwise it will give up reading other capabilities and may fallback to the legacy mode. In which case it will fail to read the requested amount of bytes? Should we try to read the generic PCI fields first?
I do not know what exactly causes less than required bytes to be read, but we have seen it happen in production (not 100% of the times though - so I think it's worth keeping the structure as-is). As you said in that case it falls back to legacy mode which, in our experience in production deployments, then succeeds. That's why the error level print is undesired - because the code will actually work via the fallback, but the customers will see scary errors in the logs and open escalations :-)
Besides, you also need to update other calls to rte_pci_read_config(), e.g.: https://github.com/DPDK/dpdk/blob/76b9d9de5c7d/drivers/net/virtio/vir tio_pci.c#L696 Thanks
Sure I will apply the same changes in v2. -- Kind regards, Luca Boccassi