Thread: 14 messages, 6 authors, 2023-02-14

Re: question about the performance impact of sec=krb5

From: Trond Myklebust <hidden>
Date: 2023-02-13 15:38:28

On Feb 13, 2023, at 09:55, Olga Kornievskaia [off-list ref] wrote:

On Sun, Feb 12, 2023 at 1:08 PM Chuck Lever III [off-list ref] wrote:
On Feb 12, 2023, at 1:01 AM, Wang Yugui [off-list ref] wrote:

Hi,

question about the performance of sec=krb5.

https://learn.microsoft.com/en-us/azure/azure-netapp-files/performance-impact-kerberos
Performance impact of krb5:
     • Average IOPS decreased by 53%
     • Average throughput decreased by 53%
     • Average latency increased by 3.2 ms
Looking at the numbers in this article... they don't
seem quite right. Here are the others:
Performance impact of krb5i:
     • Average IOPS decreased by 55%
     • Average throughput decreased by 55%
     • Average latency increased by 0.6 ms
Performance impact of krb5p:
     • Average IOPS decreased by 77%
     • Average throughput decreased by 77%
     • Average latency increased by 1.6 ms
I would expect krb5p to be the worst in terms of
latency. And I would like to see round-trip numbers
reported: what part of the increase in latency is
due to server versus client processing?

This is also remarkable:
When nconnect is used in Linux, the GSS security context is shared between all the nconnect connections to a particular server. TCP is a reliable transport that supports out-of-order packet delivery to deal with out-of-order packets in a GSS stream, using a sliding window of sequence numbers. When packets not in the sequence window are received, the security context is discarded, and a new security context is negotiated. All messages sent within the now-discarded context are no longer valid, thus requiring the messages to be sent again. Larger number of packets in an nconnect setup cause frequent out-of-window packets, triggering the described behavior. No specific degradation percentages can be stated with this behavior.

So, does this mean that nconnect makes the GSS sequence
window problem worse, or that when a window underrun
occurs it has broader impact because multiple connections
are affected?
Yes, nconnect makes the GSS sequence window problem worse (it is very
typical to generate more RPCs than the GSS window size, with no
ability to control the order in which they are sent), and yes, all
connections are affected. ONTAP, like Linux, uses a GSS window size of
128, but we've experimented with increasing it to larger values and it
would still cause issues.
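
To make the window behavior concrete, here is a minimal sketch of a server-side sequence window check, loosely following the RPCSEC_GSS rules in RFC 2203: a request is dropped if its sequence number has fallen below the window or has already been seen. The class and function names are hypothetical, and this is a toy model rather than the Linux or ONTAP implementation. It models only the drop decision, not the context renegotiation described in the article.

```python
class GssSeqWindow:
    """Toy model of a server-side RPCSEC_GSS sequence window (hypothetical name)."""

    def __init__(self, size=128):
        self.size = size        # window size advertised by the server
        self.highest = 0        # highest sequence number seen so far
        self.seen = set()       # sequence numbers seen inside the current window

    def accept(self, seq):
        """Return True if the request is processed, False if it is dropped."""
        if seq > self.highest:
            # A new highest number slides the window forward.
            self.highest = seq
            low = self.highest - self.size + 1
            self.seen = {s for s in self.seen if s >= low}
            self.seen.add(seq)
            return True
        if seq <= self.highest - self.size:
            return False        # fell below the window
        if seq in self.seen:
            return False        # duplicate inside the window
        self.seen.add(seq)
        return True


def replay(arrivals, size):
    """Feed sequence numbers to a fresh window in arrival order."""
    window = GssSeqWindow(size)
    return [window.accept(seq) for seq in arrivals]


# With a window of 4, request 7 arriving early pushes request 3 out of the window.
results = replay([1, 2, 7, 3, 4, 7], size=4)
print(results)
```

With several connections sharing one context, nothing forces request 3 to reach the server before request 7, which is the reordering this sketch imitates.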
Seems like maybe nconnect should set up a unique GSS
context for each xprt. It would be helpful to file a bug.
At the time when I saw the issue and asked about it (though can't find
a reference now) I got the impression that having multiple contexts
for the same rpc client was not going to be acceptable.
We have discussed this earlier on this mailing list. To me, the two issues are separate.
- It would be nice to enforce the GSS window on the client, and to throttle further RPC calls from using a context once the window is full.
- It might also be nice to allow for multiple contexts on the client and to have them assigned on a per-xprt basis so that the number of slots scales with the number of connections.
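
As an illustration of the first idea, here is a minimal sketch of client-side throttling, assuming the client knows the server's window size. It refuses to hand out a new sequence number while the oldest outstanding request is a full window behind. The names `WindowThrottle`, `try_send` and `complete` are hypothetical and do not correspond to anything in the Linux SUNRPC code; retransmissions and context expiry are ignored.

```python
class WindowThrottle:
    """Toy client-side limiter for one GSS context (hypothetical name)."""

    def __init__(self, window=128):
        self.window = window      # server's sequence window size
        self.next_seq = 1         # next sequence number to hand out
        self.outstanding = set()  # sequence numbers sent but not yet answered

    def try_send(self):
        """Return a sequence number to use, or None if the window is full."""
        if self.outstanding and self.next_seq - min(self.outstanding) >= self.window:
            return None           # caller should queue the RPC and retry later
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding.add(seq)
        return seq

    def complete(self, seq):
        """Record that the reply for seq has arrived."""
        self.outstanding.discard(seq)


throttle = WindowThrottle(window=4)
sent = [throttle.try_send() for _ in range(5)]
print(sent)                  # the fifth call is refused
throttle.complete(1)
print(throttle.try_send())   # a slot has opened up
```

Because every outstanding sequence number stays within one window of the newest, no arrival order can push an unanswered request below the server's window. Under the same assumptions, the second idea would amount to keeping one such object, and one context, per connection.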

Note though, that window issues do tend to be mitigated by the NFSv4.x (x>0) sessions. It would make sense for server vendors to ensure that they match the GSS window size to the max number of session slots.

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com