Re: PMTU issues due to TOS field manipulation (for DSCP)
From: David S. Miller <hidden>
Date: 2003-12-12 08:31:43
On Thu, 11 Dec 2003 02:34:51 +0200 (EET) Julian Anastasov [off-list ref] wrote:
On Wed, 10 Dec 2003, David S. Miller wrote:
But regardless, let us say that your system has complexity O(16) lookups as you mention, your proposal changes this to O(16+8).

It is ~16 :)

ip_rt_max_size = (rt_hash_mask + 1) * 16;

This is what happens on full table, of course. OK, some simple numbers for an ideal table:
But look at default gc_thresh setting, which is when we trim
rt cache entries:
ipv4_dst_ops.gc_thresh = (rt_hash_mask + 1);
The ip_rt_max_size value is meant to be a sort of buffer to absorb
the situation where many rt cache entries are unreclaimable.
But this is a separate issue, and we can discuss your further points
regardless.
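
To put numbers on the two limits above, here is a small sketch. The helper names are made up for illustration; only the two formulas come from the code quoted in this thread.

```c
#include <assert.h>

/* Hypothetical helpers; only the two formulas are taken from the thread. */

/* GC starts trimming at one cached entry per hash bucket. */
static unsigned int rt_gc_thresh(unsigned int rt_hash_mask)
{
    return rt_hash_mask + 1;
}

/* Hard ceiling: sixteen cached entries per hash bucket. */
static unsigned int rt_max_size(unsigned int rt_hash_mask)
{
    return (rt_hash_mask + 1) * 16;
}

/* Average chain length for a given number of cached entries,
 * assuming a perfectly even hash distribution. */
static unsigned int avg_chain_len(unsigned int entries,
                                  unsigned int rt_hash_mask)
{
    unsigned int buckets = rt_hash_mask + 1;

    assert(buckets != 0);
    return entries / buckets;
}
```

With rt_hash_mask = 1023 this gives a gc_thresh of 1024 entries (average chain length 1) and an ip_rt_max_size of 16384 entries (average chain length 16), so the 16-deep chains only show up once the table is full.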
2 cases depending on whether TOS is a hash key (path=saddr->daddr):

1. TOS is a hash key:
   - in each chain we have 16 paths, 1 TOS value per path
   - all 8 TOS values for a path are in 8 different chains
2. TOS is not a hash key: 2 paths per chain (2 paths x 8 TOS values => 16 entries)

if all saddr->daddr->tos streams have same packet rate I think the CPU time to lookup them will be same. This is because 8 (number of TOS values) < 16 (chain length). And I hope the users always can tune the proposed TOS settings if they see DoS and if they do not need TOS as a rt key.
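
The arithmetic in the two cases above can be written out as a tiny sketch. The struct and function names are hypothetical; the constants 8 and 16 are the ones used in the analysis, and an ideal, evenly filled table is assumed.

```c
#include <assert.h>

#define RT_TOS_VALUES 8   /* TOS values per path, as in the analysis */
#define RT_CHAIN_LEN  16  /* chain length on a full, ideal table */

/* Composition of one hash chain: how many distinct saddr->daddr paths
 * it holds and how many TOS variants of each path sit in that chain. */
struct chain_shape {
    unsigned int paths;
    unsigned int tos_per_path;
};

/* Hypothetical helper: shape of a full chain in each of the two cases. */
static struct chain_shape ideal_chain(int tos_is_hash_key)
{
    struct chain_shape s;

    assert(tos_is_hash_key == 0 || tos_is_hash_key == 1);
    /* If TOS feeds the hash, the variants of a path spread over other chains. */
    s.tos_per_path = tos_is_hash_key ? 1 : RT_TOS_VALUES;
    s.paths = RT_CHAIN_LEN / s.tos_per_path;
    return s;
}

/* Entries a lookup may have to walk in the worst case. */
static unsigned int chain_entries(struct chain_shape s)
{
    return s.paths * s.tos_per_path;
}
```

Either way a lookup walks at most 16 entries: 16 paths x 1 TOS value in case 1, and 2 paths x 8 TOS values in case 2.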
Ok. I agree with your analysis. Let's propose something concrete.

1) PMTU processing applies PMTU change to all TOS'd instances of a route. This behavior change is sysctl controllable, and on by default. The implementation is to just lookup all 8 possible TOS values.

2) Whether TOS is a routing cache hash key is controlled by another sysctl. When CONFIG_IP_ROUTE_TOS is set this sysctl defaults to on, otherwise it defaults to off.

I think #2 should be very safe because fib node fn_tos values are only ever set when that config variable is enabled, and fib rule r_tos values are only compared on lookup when it is enabled as well. However, there could be a few more ifdefs added to the fib rule code to cover all the assignment cases too but let's not worry about that right now.

Comments?
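
A rough user-space sketch of how the two knobs in the proposal could fit together follows. This is not kernel code: the toy cache, the hash function, and every name in it (rt_cache, tos_in_hash, pmtu_all_tos, rt_update_pmtu, ...) are invented for illustration. It also assumes that the 8 TOS values are the combinations of the three bits under mask 0x1C, and that a PMTU update only ever lowers the stored MTU; neither detail is spelled out in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOS_RT_MASK  0x1Cu  /* assumed: three TOS bits, hence 8 values */
#define TOS_VALUES   8
#define HASH_BUCKETS 64     /* toy table size, power of two */
#define CHAIN_MAX    16

struct rt_entry {
    uint32_t saddr, daddr;
    uint8_t tos;
    unsigned int mtu;
    int used;
};

struct rt_cache {
    struct rt_entry bucket[HASH_BUCKETS][CHAIN_MAX];
    int tos_in_hash;   /* hypothetical sysctl for point 2 */
    int pmtu_all_tos;  /* hypothetical sysctl for point 1 */
};

/* Toy hash; TOS is mixed in only when the sysctl says so. */
static unsigned int rt_hash(const struct rt_cache *c, uint32_t s,
                            uint32_t d, uint8_t tos)
{
    uint32_t h = s ^ (d >> 5) ^ (d << 7);

    if (c->tos_in_hash)
        h ^= (uint32_t)tos << 3;
    h ^= h >> 16;
    h ^= h >> 8;
    return h & (HASH_BUCKETS - 1);
}

/* TOS is always compared on lookup, whether or not it is hashed. */
static struct rt_entry *rt_lookup(struct rt_cache *c, uint32_t s,
                                  uint32_t d, uint8_t tos)
{
    struct rt_entry *chain;

    tos &= TOS_RT_MASK;
    chain = c->bucket[rt_hash(c, s, d, tos)];
    for (int i = 0; i < CHAIN_MAX; i++)
        if (chain[i].used && chain[i].saddr == s &&
            chain[i].daddr == d && chain[i].tos == tos)
            return &chain[i];
    return NULL;
}

/* Returns 0 on success, -1 if the chain is full. */
static int rt_insert(struct rt_cache *c, uint32_t s, uint32_t d,
                     uint8_t tos, unsigned int mtu)
{
    struct rt_entry *chain;

    tos &= TOS_RT_MASK;
    chain = c->bucket[rt_hash(c, s, d, tos)];
    for (int i = 0; i < CHAIN_MAX; i++)
        if (!chain[i].used) {
            chain[i] = (struct rt_entry){ s, d, tos, mtu, 1 };
            return 0;
        }
    return -1;
}

/* Point 1: look up all 8 TOS variants of the route and lower each MTU.
 * With pmtu_all_tos off, only the entry matching the packet's TOS is hit.
 * Returns the number of entries changed. */
static int rt_update_pmtu(struct rt_cache *c, uint32_t s, uint32_t d,
                          uint8_t tos, unsigned int mtu)
{
    int updated = 0;

    assert(c != NULL);
    for (unsigned int i = 0; i < TOS_VALUES; i++) {
        uint8_t t = (uint8_t)(i << 2);
        struct rt_entry *e;

        if (!c->pmtu_all_tos && t != (tos & TOS_RT_MASK))
            continue;
        e = rt_lookup(c, s, d, t);
        if (e && mtu < e->mtu) {
            e->mtu = mtu;
            updated++;
        }
    }
    return updated;
}
```

The extra cost of point 1 is at most 8 lookups per PMTU event, and those events are rare compared to per-packet lookups. When tos_in_hash is off, all 8 lookups land in the same chain; when it is on, they may touch up to 8 different chains.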