RE: [PATCH net-next 0/2] ice: support FEC automatic disable
From: "Keller, Jacob E" <jacob.e.keller@intel.com>
Date: 2022-08-29 07:11:31
-----Original Message----- From: Gal Pressman <redacted> Sent: Sunday, August 28, 2022 3:43 AM To: Jakub Kicinski <kuba@kernel.org>; Keller, Jacob E <jacob.e.keller@intel.com> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org; Simon Horman [off-list ref]; Andy Gospodarek [off-list ref] Subject: Re: [PATCH net-next 0/2] ice: support FEC automatic disable On 27/08/2022 02:57, Jakub Kicinski wrote:quoted
On Fri, 26 Aug 2022 10:51:21 -0700 Jacob Keller wrote:quoted
On 8/25/2022 6:01 PM, Jakub Kicinski wrote:quoted
Oh, but per the IEEE standard No FEC is _not_ an option for CA-L. From the initial reading of your series I thought that Intel NICs would _never_ pick No FEC.That was my original interpretation when I was first introduced to this problem but I was mistaken, hence why the commit message wasn't clear :( This is rather more complicated than I originally understood and the names for various bits have not been named very well so their behavior isn't exactly obvious...quoted
Sounds like we need a bit for "ignore the standard and try everything". What about BASE-R FEC? Is the FW going to try it on the CA-L cable?Ok I got further clarification on this. We have a bit, "Auto FEC enable", as well as a bitmask for which FEC modes to try. If "Auto FEC En" is set, then the Link Establishment State Machine will try all of the FEC options we list in that bitmask, as long as we can theoretically support them even if they aren't spec compliant. For old firmware the bitmask didn't include a bit for "No FEC", where as the new firmware has a bit for "No FEC". We were always setting "Auto FEC En" so currently we try all FEC modes we could theoretically support. If "Auto FEC En" is disabled, then we only try FEC modes which are spec compliant. Additionally, only a single FEC mode is tried based on a priority and the bitmask. Currently and historically the driver has always set "Auto FEC En", so we were enabling non-spec compliant FEC modes, but "No FEC" was only based on spec compliance with the media type. From this, I think I agree the correct behavior is to add a bit for "override the spec and try everything", and then on new firmware we'd set the "No FEC" while on old firmware we'd be limited to only trying FEC modes. Does that make sense? So yea I think we do probably need a "ignore the standard" bit.. but currently that appears to already be what ice does (excepting No FEC which didn't previously have a bit to set for it)Thanks for getting to the bottom of this :) The "override spec modes" bit sounds like a reasonable addition, since we possibly have different behavior between vendors letting the user know if the device will follow the rules can save someone debugging time. But it does sound orthogonal to you adding the No FEC bit to the mask for ice. Let me add Simon and Andy so that we have the major vendors on the CC. (tl;dr the question is whether your FW follows the guidance of 'Table 110C–1—Host and cable assembly combinations' in AUTO FEC mode).
The other engineers I spoke to also wanted to mention that 110C-1 is only a small subset of all of the various link types. They also mentioned something about an SFF standard which describes many more types.
quoted
If all the vendors already ignore the standard (like Intel and AFAIU nVidia) then we just need to document, no point adding the bit...I think we misunderstood each other :). Our implementation definitely *does not* ignore the standard. When autoneg is disabled, and auto fec is enabled, the firmware chooses a fec mode according to the spec. If "no FEC" is not in the spec, we will not pick it (nor do we want to). It sounds like you're not happy with the spec, then why not change it? This doesn't sound like an area where we want to be non-compliant. Regardless, changing our interface because of one device' firmware bug/behavior change doesn't make any sense. This interface affects all consumers, and is going to stick with us forever. Firmware will eventually get updated and it only affects one driver.
Well, the current ice behavior for every FEC mode *except* No FEC, we try modes which may be supportable even though they're outside the spec. As far as I understand, the reason we attempt these is because it allows linking in more scenarios, presumably because some combination of things is not fully spec compliant? I don't really know for sure. For future firmware, this would include No FEC as well. Thus, even with future firmware we'd still be trying some modes outside of the spec. I can try to get some more answers tomorrow about the reasoning and justification for this behavior. Thanks, Jake