Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware
From: Jiri Pirko <jiri@resnulli.us>
Date: 2015-05-29 07:42:45
Thu, May 28, 2015 at 05:35:05PM CEST, john.fastabend@gmail.com wrote:
> On 05/28/2015 02:42 AM, Jiri Pirko wrote:
>> Mon, May 18, 2015 at 10:19:16PM CEST, davem@davemloft.net wrote:
>>> From: Roopa Prabhu <redacted>
>>> Date: Sun, 17 May 2015 16:42:05 -0700
>>>
>>>> On most systems where you can offload routes to hardware, doing
>>>> routing in software is not an option (the CPU limitations make
>>>> routing impossible in software).
>>>
>>> You absolutely do not get to determine this policy, none of us do.
>>>
>>> What matters is that by default the damn switch device being there
>>> is 100% transparent to the user. And the way to achieve that default
>>> is to do software routes as a fallback.
>>>
>>> I am not going to entertain changes of this nature which fail route
>>> loading by default just because we've exceeded a device's HW
>>> capacity to offload.
>>>
>>> I thought I was _really_ clear about this at netdev 0.1
>>
>> I certainly agree that by default, transparency (1:1 sw:hw mapping) is
>> what we need for fib. The current code is a good start!
>>
>> I see a couple of issues regarding switchdev_fib_ipv4_abort:
>> 1) If the user adds an entry and switchdev_fib_ipv4_add fails, abort is
>>    executed and an error is returned. I would expect the route entry to
>>    be added in this case. The next attempt to add the same entry will
>>    be successful. The current behaviour breaks the transparency you are
>>    referring to.
>> 2) When switchdev_fib_ipv4_abort happens to be executed, the offload is
>>    disabled for good (until reboot). That is certainly not nice,
>>    although I understand that it is the easiest solution for now.
>>
>> I believe that we all agree that the 1:1 transparency, although it is
>> the default, may not be optimal for real-life usage. HW resources are
>> limited and the user does not know them. The danger of hitting _abort
>> and screwing up the whole system is huge and unacceptable.
>>
>> So here are a couple of more or less simple things that I suggest
>> doing in order to move a little bit forward:
>> 1) Introduce a system-wide option to switch _abort to just a plain
>>    fail. When HW does not have capacity, do not flush and fall back to
>>    sw, but rather just fail to add the entry. This would not break
>>    anything. Userspace has to be prepared for an entry add to fail.
>> 2) Introduce a way to propagate resources to userspace. The driver
>>    knows about resources used/available/potentially_available.
>>    Switchdev infra could be extended in order to propagate the info to
>>    the user.
>
> I currently use the FlowAPI work I presented at the netdev conference
> for this. Perhaps I was overreaching by trying to also push it as a
> replacement for the ethtool flow classification mechanism all in one
> shot. For what it is worth, replacing the 'ethtool' flow classifier
> when I have a pipeline of tables in a NIC is really my first use case
> for the 'set' operations, but that is probably off-topic.
>
> The benefit I see in using this interface (or, if you want, renaming
> it and pushing it into a different netlink type) is that it gives you
> the entire view of the switch resources and pipeline from a single
> interface. Also, because you are talking about system-wide behaviour
> above, it rolls up nicely into userspace software, where we can act on
> it with the flags we already have for L2 and, if we pursue your
> option (3), also for L3.
>
> I like a single interface vs. scattering the information across many
> different interfaces; this way we can do it once and be done with it.
> If you scatter it across all the interfaces (just L2 and L3 for now,
> but we will get more), then each interface will have its own mechanism,
> and I have no idea where you would put global information such as
> table ordering.
I think that for fib capacities/capabilities, the user should be able to
use an extension of the existing Netlink interface, not some parallel
one. I'm still not convinced that the user should care about the actual
HW pipeline. We already have a pipeline in the kernel. Switch drivers
should just do the mapping, easy as that.
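The abort-versus-fail trade-off discussed in the quoted thread can be made concrete with a small userspace model. This is a sketch under stated assumptions, not switchdev code. `FibModel`, `OffloadPolicy`, `TableFull` and the tiny `hw_capacity` are hypothetical. Only one piece comes from the thread: abort flushes the hardware table and disables offload until reboot. The FAIL policy models Jiri's suggested system-wide option (1), and `resources()` models his suggestion (2). In the ABORT path the route is still installed in software. That is the behaviour Jiri says he expects in issue (1), not the error return he describes for the current code.

```python
from dataclasses import dataclass, field
from enum import Enum


class OffloadPolicy(Enum):
    """Hypothetical system-wide knob (Jiri's suggestion 1), not a real sysctl."""
    ABORT = "abort"  # flush HW, route everything in software until reboot
    FAIL = "fail"    # reject only the entry that does not fit


class TableFull(Exception):
    """Stands in for a driver running out of hardware FIB entries."""


@dataclass
class FibModel:
    """Toy model of a kernel FIB mirrored into a capacity-limited switch."""
    hw_capacity: int
    policy: OffloadPolicy = OffloadPolicy.ABORT
    offload_disabled: bool = False
    sw_routes: set = field(default_factory=set)
    hw_routes: set = field(default_factory=set)

    def _hw_add(self, prefix):
        # Hypothetical driver op: fails once the hardware table is full.
        if len(self.hw_routes) >= self.hw_capacity:
            raise TableFull(prefix)
        self.hw_routes.add(prefix)

    def _hw_abort(self):
        # Abort as described in the thread: flush HW, disable offload for good.
        self.hw_routes.clear()
        self.offload_disabled = True

    def add(self, prefix):
        """Add a route; return True if it was installed in the kernel FIB."""
        if not self.offload_disabled:
            try:
                self._hw_add(prefix)
            except TableFull:
                if self.policy is OffloadPolicy.FAIL:
                    return False  # error to userspace, HW state left intact
                self._hw_abort()  # fall back to software routing...
        self.sw_routes.add(prefix)  # ...but still install the route (issue 1)
        return True

    def resources(self):
        """Used/available HW entries, as a driver might report them (suggestion 2)."""
        used = len(self.hw_routes)
        available = 0 if self.offload_disabled else self.hw_capacity - used
        return {"used": used, "available": available}


if __name__ == "__main__":
    prefixes = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
    for policy in OffloadPolicy:
        fib = FibModel(hw_capacity=2, policy=policy)
        results = [fib.add(p) for p in prefixes]
        print(policy.value, results, sorted(fib.hw_routes), fib.resources())
```

In this example the hardware table holds two entries, so the third add exceeds its capacity. Under ABORT, all three routes end up in software and offload is off from then on. Under FAIL, only the third add is rejected and the first two routes stay offloaded.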