Re: [RFC][PATCH] xfrm: do not leak ESRCH to user space
From: Fernando Luis Vázquez Cao <hidden>
Date: 2008-10-24 01:05:02
On Thu, 2008-10-23 at 14:11 -0700, David Miller wrote:
> From: Fernando Luis Vázquez Cao <redacted>
> Date: Thu, 23 Oct 2008 23:27:19 +0900
>
> > I noticed that, under certain conditions, ESRCH can be leaked from the xfrm layer to user space through sys_connect. In particular, this seems to happen reliably when the kernel fails to resolve a template either because the AF_KEY receive buffer being used by racoon is full or because the SA entry we are trying to use is in XFRM_STATE_EXPIRED state.
> >
> > However, since this could be a transient issue it could be argued that EAGAIN would be more appropriate. Besides, this error code is not even documented in the man page for sys_connect (as of man-pages 3.07). What is the expected behavior (I could not find anything in the RFCs)? Should we just fix the connect(2) man page instead?
>
> I think this case requires some care.
>
> -EAGAIN tells the caller that it is a temporary failure and that retrying can be expected to succeed eventually (some resource is not available at the moment). So applications loop when they see this error returned, they will try again.
>
> But that's not what is happening when ESRCH is signalled. We found no matching policy, and we've done nothing to make such a policy be found in the (near) future. It is more of a hard failure, which should not necessarily be retried over and over again.
>
> So converting this to -EAGAIN doesn't seem correct at all.
That would be true if -ESRCH were not a transient error. Looking at the code, the window during which an entry is in XFRM_STATE_EXPIRED state seems to be about 2 seconds in the worst case. Connection attempts made before that window would most likely succeed, and attempts made after it would most likely get -EAGAIN. Wouldn't it make sense to return -EAGAIN during that 2-second window as well?

As for the case where the kernel does not initiate an SA resolution because the AF_KEY receive buffer is full, I think it fits the "some resource is not available at the moment" definition of -EAGAIN: as the buffer is drained, future attempts are likely to succeed.

This behavior is somewhat confusing, but if it is deemed correct I think it deserves to be properly documented in the connect(2) man page. Do you want me to do that, or should the error code we return to user space be changed in either of the two cases mentioned above?
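To make the user-space side of this concrete, here is a minimal C sketch of how an application might act on the two error codes. Everything in it is hypothetical: classify_connect_errno(), connect_with_retry() and the scripted stand-in for connect(2) are illustration only, not code from racoon, glibc or the kernel. It follows the reading argued in the quoted reply: EAGAIN is retried, while ESRCH (like any other error) is treated as a hard failure.

```c
#include <errno.h>
#include <stddef.h>

/* Verdicts for a single connect attempt (names are hypothetical). */
enum connect_verdict { CONNECT_OK, CONNECT_RETRY, CONNECT_FAIL };

/* Map an errno value from connect(2) to a verdict: EAGAIN means
 * "temporary, try again"; anything else, including ESRCH, is treated
 * as a hard failure. */
static enum connect_verdict classify_connect_errno(int err)
{
    if (err == 0)
        return CONNECT_OK;
    if (err == EAGAIN)
        return CONNECT_RETRY;
    return CONNECT_FAIL; /* e.g. ESRCH: no SA could be resolved */
}

/* One connection attempt: returns 0 on success or an errno value.
 * A real caller would wrap connect(fd, addr, len) and return errno on
 * failure; it is abstracted here so the retry policy can be tested. */
typedef int (*connect_attempt_fn)(void *ctx);

/* Retry only while the attempt reports EAGAIN, at most max_attempts
 * times. Returns the last errno value (0 on success) and stores the
 * number of attempts made in *attempts_used if it is non-NULL. */
static int connect_with_retry(connect_attempt_fn attempt, void *ctx,
                              unsigned max_attempts, unsigned *attempts_used)
{
    int err = EAGAIN; /* reported if max_attempts is 0 */
    unsigned n = 0;

    while (n < max_attempts) {
        err = attempt(ctx);
        n++;
        if (classify_connect_errno(err) != CONNECT_RETRY)
            break;
        /* A real application should back off here (e.g. nanosleep()). */
    }
    if (attempts_used != NULL)
        *attempts_used = n;
    return err;
}

/* Test double standing in for connect(2): replays a fixed sequence of
 * errno values, then reports success once the script is exhausted. */
struct scripted_connect {
    const int *errs;
    size_t len;
    size_t pos;
};

static int scripted_attempt(void *ctx)
{
    struct scripted_connect *s = ctx;
    return s->pos < s->len ? s->errs[s->pos++] : 0;
}
```

With ESRCH mapped to a hard failure, such an application gives up immediately if its connect() lands in the XFRM_STATE_EXPIRED window described above; if the kernel returned -EAGAIN there instead, the same loop would keep retrying until an attempt succeeds or max_attempts runs out.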