Re: [RFC][PATCH] xfrm: do not leak ESRCH to user space

From: Fernando Luis Vázquez Cao <hidden>
Date: 2008-10-24 01:05:02

On Thu, 2008-10-23 at 14:11 -0700, David Miller wrote:

From: Fernando Luis Vázquez Cao <redacted>
Date: Thu, 23 Oct 2008 23:27:19 +0900

I noticed that, under certain conditions, ESRCH can be leaked from the
xfrm layer to user space through sys_connect. In particular, this seems
to happen reliably when the kernel fails to resolve a template either
because the AF_KEY receive buffer being used by racoon is full or
because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
state.

However, since this could be a transient issue it could be argued that
EAGAIN would be more appropriate. Besides, this error code is not even
documented in the man page for sys_connect (as of man-pages 3.07).

What is the expected behavior (I could not find anything in the RFCs)?
Should we just fix the connect(2) man page instead?

I think this case requires some care.

-EAGAIN tells the caller that it is a temporary failure and that
retrying can be expected to succeed eventually (some resource is not
available at the moment).  So applications loop when they see this
error returned; they will try again.

But that's not what is happening when ESRCH is signalled.  We found
no matching policy, and we've done nothing to make such a policy
be found in the (near) future.  It is more of a hard failure, which
should not necessarily be retried over and over again.

So converting this to -EAGAIN doesn't seem correct at all.

That would be so if -ESRCH did not happen to be a transient error.
Looking at the code, the window during which an entry is in
XFRM_STATE_EXPIRED state seems to be about 2 seconds in the worst case.
Connection attempts before and after that window would most likely
result in a successful connection or -EAGAIN, respectively. Would it not
make sense to return -EAGAIN during that 2-second window as well?

Regarding the case when the kernel does not initiate an SA resolution
because the AF_KEY receive buffer is full, I think it fits into the
"some resource is not available at the moment" definition for -EAGAIN.
As the buffer gets emptied chances are the future attempts will succeed.
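
To make the user-space side of this concrete, here is a rough sketch of
how a caller might treat the two error codes. It is illustration only
and not part of the patch: classify_connect_errno(), connect_with_retry()
and the esrch_is_transient flag are hypothetical names, and the one
second delay and the reuse of the same socket after a failed connect()
are assumptions rather than documented behavior.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

enum connect_action { CONNECT_OK, CONNECT_RETRY, CONNECT_FAIL };

/*
 * Hypothetical policy helper: decide what to do with the errno left
 * behind by connect(2). EAGAIN is retried. ESRCH is retried only if
 * the caller chooses to treat it as transient, which is the open
 * question in this thread. Everything else is a hard failure.
 */
static enum connect_action classify_connect_errno(int err,
                                                  int esrch_is_transient)
{
    switch (err) {
    case 0:
        return CONNECT_OK;
    case EAGAIN:
        return CONNECT_RETRY;
    case ESRCH:
        return esrch_is_transient ? CONNECT_RETRY : CONNECT_FAIL;
    default:
        return CONNECT_FAIL;
    }
}

/*
 * Hypothetical wrapper: call connect(2) up to max_attempts times,
 * sleeping one second (an arbitrary choice) between retryable
 * failures. Returns 0 on success, or -1 with errno set to the last
 * error seen.
 */
static int connect_with_retry(int fd, const struct sockaddr *addr,
                              socklen_t len, int max_attempts,
                              int esrch_is_transient)
{
    int err = EINVAL; /* reported if max_attempts <= 0 */

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (connect(fd, addr, len) == 0)
            return 0;
        err = errno;
        if (classify_connect_errno(err, esrch_is_transient) != CONNECT_RETRY)
            break;
        if (attempt + 1 < max_attempts)
            sleep(1);
    }
    errno = err;
    return -1;
}
```

With esrch_is_transient set to 0 the caller gives up on the first
-ESRCH, which matches the "hard failure" reading. With it set to 1 the
caller rides out the window in the same way it would for -EAGAIN.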

This behavior is kind of confusing, but if deemed correct I think it
deserves to be properly documented in the respective man page. Do you
want me to do that or should the error code we return to user space be
changed in either of the two cases mentioned above?
