Re: RFC: MTU for serving NFS on Infiniband
From: Eric Dumazet <hidden>
Date: 2010-08-25 05:55:08
Also in:
lkml
On Tuesday, 24 August 2010 at 15:39 -0700, Stephen Hemminger wrote:
> If the NFS server is smart enough to generate a header (skb) plus one or more pages in the fragment list, then IP fragmentation could fragment by allocating new (small) header skbs and assigning the same pages to multiple skbs using the page ref count. It obviously isn't working that way.
It is, but ip_append_data() allocates a huge head if the MTU is huge. NFS is trying to build paged skbs, to avoid order-X allocations (X > 0).
> The whole problem is moot because NFS over UDP has known data corruption issues in the face of packet loss. The sequence number of the IP fragment can easily wrap around, causing old data to be grouped with new data, and the UDP checksum is so weak that the resulting UDP packet will be consumed by the NFS client and passed to the user application as a corrupted disk block. DON'T USE NFS OVER UDP!
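For scale, here is a rough back-of-the-envelope sketch of how quickly the 16-bit IPv4 Identification field can wrap. The link rates and the 8192-byte datagram size are assumptions chosen for illustration, and the model assumes one ID consumed per datagram at full line rate, which real traffic will only approximate.

```python
IP_ID_SPACE = 1 << 16  # the IPv4 Identification field is 16 bits wide


def seconds_to_wrap(link_bits_per_sec, datagram_bytes):
    """Time until the IP ID repeats, assuming one ID per datagram at line rate."""
    datagrams_per_sec = link_bits_per_sec / (8 * datagram_bytes)
    return IP_ID_SPACE / datagrams_per_sec


# Assumed link speeds and datagram size, for illustration only.
for rate, label in ((1e9, "1 Gbit/s"), (10e9, "10 Gbit/s")):
    print(f"{label}: IP ID wraps after about {seconds_to_wrap(rate, 8192):.2f} s")
```

Under these assumptions the ID space is reused within a few seconds at most, which is shorter than a typical reassembly timeout, so a stale fragment can still be queued when its ID comes around again.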
But Marc's point is to use a big MTU, so that no IP fragmentation is needed. All UDP applications using MSG_MORE will hit order-2 allocations if MTU=9000, for example...
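To make the "order-2" figure concrete, here is a small sketch of the arithmetic, not kernel code. It assumes 4 KiB pages and ignores the extra header and skb bookkeeping overhead the kernel adds on top of the MTU, so real allocations can only be the same order or larger. The helper name alloc_order() is hypothetical, and the 65520 value is the IPoIB connected-mode MTU, added here as an extra data point.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def alloc_order(size, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= size."""
    order = 0
    while (page_size << order) < size:
        order += 1
    return order


# A linear (non-paged) buffer sized to the MTU needs this many contiguous pages.
for mtu in (1500, 9000, 65520):
    order = alloc_order(mtu)
    print(f"MTU {mtu:>5}: order-{order} allocation ({PAGE_SIZE << order} bytes contiguous)")
```

An MTU of 1500 fits in a single page (order 0), while 9000 bytes needs four contiguous pages (order 2), which is the kind of allocation a paged skb avoids.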