nfs-idmapd startup race

From: Aram Akhavan <hidden>
Date: 2023-03-09 07:24:23

Hi all,

I've been debugging an NFS server issue where ID mapping does not work
correctly after a reboot unless I restart nfs-kernel-server and re-export
the shares shortly afterwards. The main symptom is the following log
entries from nfs-idmapd.service:

Mar 08 22:45:59 343guiltyspark.nub.lan systemd[1]: Starting NFSv4 ID-name mapping service...
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: libnfsidmap: Unable to determine the NFSv4 domain; Using 'localdomain' as the NFSv4 domain which means UIDs will be mapped to the 'Nobody-User' user defined in /etc/idmapd.conf
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: Unable to determine the NFSv4 domain; Using 'localdomain' as the NFSv4 domain which means UIDs will be mapped to the 'Nobody-User' user defined in /etc/idmapd.conf
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: using (default) domain: localdomain
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: Realms list: 'LOCALDOMAIN'
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch

I wrote a little test program to mimic libnfsidmap's domain_from_dns() 
function, which causes the above message:

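#include <errno.h>
#include <netdb.h>   /* gethostbyname(), hstrerror(), h_errno */
#include <stdio.h>
#include <unistd.h>  /* gethostname() */

int main(void) {
     struct hostent *he;
     char hname[64];

     /* Step 1: fetch the local hostname, as domain_from_dns() does. */
     if (gethostname(hname, sizeof(hname))) {
         printf("gethostname error: %d\n", errno);
         return 0;  /* hname is not valid, so skip the lookup */
     }
     hname[sizeof(hname) - 1] = '\0';  /* not guaranteed on truncation */
     printf("gethostname: '%s'\n", hname);

     /* Step 2: resolve that hostname; this is the call that fails at boot. */
     if ((he = gethostbyname(hname)) == NULL)
         printf("gethostbyname error: '%s'\n", hstrerror(h_errno));
     else
         printf("gethostbyname h_name: '%s'\n", he->h_name);
     return 0;  /* always succeed so ExecStartPre= never blocks the service */
}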

and added it as an ExecStartPre= to the systemd service. The output is:

gethostname: '343guiltyspark.nub.lan'
gethostbyname error: 'Host name lookup failure'

It seems DNS resolution isn't working yet when the service is started,
so I added Wants=network-online.target (and After=) to the systemd
service. The lookup still fails.
But if I then add a "sleep 1" to the ExecStartPre, everything starts up
correctly.
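
As a rough sketch, the combination described above could be expressed as a
drop-in like the following. The drop-in file name and the /bin/sleep path
are assumptions for illustration, not something taken from the packaged
unit:

# /etc/systemd/system/nfs-idmapd.service.d/override.conf  (hypothetical name)
[Unit]
# Pull in network-online.target and order the service after it.
Wants=network-online.target
After=network-online.target

[Service]
# Workaround only: delay one second so the hostname lookup can succeed.
ExecStartPre=/bin/sleep 1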

Obviously there are several workarounds, including the above and setting
the domain manually in /etc/idmapd.conf. But on principle I'd like to
solve the underlying race condition and help others avoid the same issue.
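
For completeness, the manual setting would look roughly like this. The
value "nub.lan" is an assumption, taken from the domain part of the
hostname in the logs above:

# /etc/idmapd.conf
[General]
# Set the NFSv4 domain explicitly so no DNS lookup is needed at startup.
# "nub.lan" is assumed from the hostname 343guiltyspark.nub.lan.
Domain = nub.lan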

I'm hoping someone can answer my open questions:

1. Why does libnfsidmap use gethostname() and gethostbyname() (i.e. why
does it need a DNS lookup on the hostname)?

2. nfs-server.service already has a dependency on network-online.target, 
but nfs-idmapd.service does not (and it starts first). Since id mapping 
can depend on DNS resolution (and seems to out of the box), why not add 
the dependency to the latter as well?

3. Since the network-online.target doesn't completely solve the issue, 
any ideas on how to fix the startup race without something haphazard 
like a "sleep"?

Thanks,

Aram