Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
From: Ido Schimmel <hidden>
Date: 2020-06-30 05:59:54
On Tue, Jun 30, 2020 at 02:21:59AM +0200, Andrew Lunn wrote:
> I've no practical experience with modules other than plain old 1G SFPs. And those have all sorts of errors; even basic things like the CRC are systematically incorrect because they are not recalculated after adding the serial number. We have had people trying to submit patches to ethtool to make it ignore bits so that it dumps more information, because the manufacturer failed to set the correct bits, etc. Ido, Adrian, what is your experience with these QSFP-DD devices? Are they generally of better quality, so that the EEPROM can be trusted? Is there any form of compliance test?
Vadim, I know you tested with at least two different QSFP-DD modules, can you please share your experience?
> If we go down the path of using the discovery information, it means we have no way for user space to try to correct for when the information is incorrect. It cannot request specific pages. So maybe we should consider an alternative? The netlink ethtool gives us more flexibility. How about we make a new API where user space can request any pages it wants, and specify the size of the page? ethtool can start out by reading page 0. That should allow it to identify the basic class of device. It can then request additional pages as needed.
Just to make sure I understand, this also means adding a new API towards drivers, right? So that they only read from HW the requested info.
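To make the idea concrete, here is a rough sketch of what a page-oriented request could look like, with a simulated module standing in for the hardware. Everything in it is an assumption made for illustration: `struct module_page_req`, `struct fake_module`, `fake_get_module_page()` and `module_is_qsfp_dd()` are hypothetical names, not existing ethtool or driver interfaces, and the memory layout is simplified to four flat 128-byte pages. The only value taken from a specification is the SFF-8024 identifier 0x18 for QSFP-DD.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODULE_PAGE_LEN    128   /* simplified: every page is 128 bytes */
#define MODULE_NUM_PAGES   4     /* simplified: pages 0..3 only */
#define SFF8024_ID_QSFP_DD 0x18  /* SFF-8024 identifier for QSFP-DD */

/* Hypothetical request passed from user space down to the driver. */
struct module_page_req {
	uint8_t page;    /* page number to read */
	uint8_t offset;  /* offset within the page */
	uint8_t length;  /* number of bytes to read */
	uint8_t *data;   /* caller-provided buffer of at least 'length' bytes */
};

/* Simulated module EEPROM, standing in for real hardware access. */
struct fake_module {
	uint8_t pages[MODULE_NUM_PAGES][MODULE_PAGE_LEN];
	uint8_t present_mask;  /* bit n set if page n exists on the module */
};

/*
 * Hypothetical driver callback: read only what was asked for.
 * Returns the number of bytes read, or a negative errno.
 */
static int fake_get_module_page(const struct fake_module *mod,
				const struct module_page_req *req)
{
	assert(mod && req && req->data);

	if (req->page >= MODULE_NUM_PAGES ||
	    !(mod->present_mask & (1u << req->page)))
		return -ENXIO;   /* illustrative choice for a missing page */
	if (req->offset + req->length > MODULE_PAGE_LEN)
		return -EINVAL;  /* request runs past the end of the page */

	memcpy(req->data, &mod->pages[req->page][req->offset], req->length);
	return req->length;
}

/*
 * User-space side: start with page 0 and look at the identifier byte to
 * decide which further pages are worth requesting.
 * Returns 1 for QSFP-DD, 0 for anything else, or a negative errno.
 */
static int module_is_qsfp_dd(const struct fake_module *mod)
{
	uint8_t id;
	struct module_page_req req = {
		.page = 0, .offset = 0, .length = 1, .data = &id,
	};
	int ret = fake_get_module_page(mod, &req);

	if (ret < 0)
		return ret;
	return id == SFF8024_ID_QSFP_DD;
}
```

With something of this shape, the kernel never has to parse the discovery information itself; it only validates the bounds of the request and performs the read.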
> The nice thing about that is we don't need two parsers of the discovery information, one in user space and a second in kernel space. We don't need to guarantee these two parsers agree with each other in order to correctly decode what the kernel sent to user space. And user space has the flexibility to work around known issues when manufacturers get their EEPROM wrong.
Sounds sane to me... I know that in the past Vadim had to deal with various faulty modules. Vadim, is this something we can support? What happens if user space requests a page that does not exist? For example, in the case of QSFP-DD, let's say we do not provide page 03h but user space still wants it because it believes the manufacturer did not set the correct bits.