Thread: 49 messages, 14 authors, 2026-02-06

Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024

From: Alejandro Colomar <alx@kernel.org>
Date: 2026-01-20 20:42:37
Also in: linux-fsdevel

On Tue, Jan 20, 2026 at 09:35:43PM +0100, Alejandro Colomar wrote:
Hi Rich, Zack,

On Tue, Jan 20, 2026 at 12:46:59PM -0500, Rich Felker wrote:
On Tue, Jan 20, 2026 at 12:05:52PM -0500, Zack Weinberg wrote:
On Fri, May 23, 2025 at 02:10:57PM -0400, Zack Weinberg wrote:
[...]
Now, the abstract correct behavior is secondary to the fact that we
know there are both systems where close should not be retried after
EINTR (Linux) and systems where the fd is still open after EINTR
(HP-UX).  But it is my position that *portable code* should assume the
Linux behavior, because that is the safest option.  If you assume the
HP-UX behavior on a machine that implements the Linux behavior, you
might close some unrelated file out from under yourself (probably but
not necessarily a different thread).  If you assume the Linux behavior
on a machine that implements the HP-UX behavior, you have leaked a
file descriptor; the worst things that can do are much less severe.
Unfortunately, regardless of what happens, code portable to old
systems needs to avoid getting into that situation to begin with,
either by not installing interrupting signal handlers or by blocking
signals around close.
[...]
While I agree with all of this, I think the tone is way too
prescriptive. The man pages are to document the behaviors, not tell
people how to program.
I could be persuaded to tone it down a little but in this case I think
the man page's job *is* to tell people how to program.  We know lots of
existing code has gotten the fine details of close() wrong and we are
trying to document how to do it right.
No, the job of the man pages absolutely is not "to tell people how to
program". It's to document behaviors. They are not a programming
tutorial. They are not polemic diatribes. They are unbiased statements
of facts. Facts of what the standards say and what implementations do,
that equip programmers with the knowledge they need to make their own
informed decisions, rather than blindly following what someone who
thinks they know better told them to do.
This reminds me a little bit of the realloc(p,0) fiasco of C89 and
glibc.

While in most cases I agree with you that manual pages are and should
be aseptic, there are cases where I think the manual page needs to be
tutorial.  Especially when there's such a mess, we need both to explain
all the possible behaviors (or at least mention them to some degree)
and to guide programmers about how to best use the API.

For example, in the case of realloc(p,0), we have a fiasco caused by
an accumulation of wrong decisions by the C Committee and, before that,
by System V.  We're a bit lucky that
C17 accidentally broke it so badly that we now have it as UB, and that
gives us the opportunity to fix it now (which BTW might also be the case
for close(2)).

In the case of realloc(3), I went and documented in the manual page that
glibc is broken, and that ISO C is also broken.

	STANDARDS
	     malloc()
	     free()
	     calloc()
	     realloc()
		    C23, POSIX.1‐2024.

	     reallocarray()
		    POSIX.1‐2024.

	   realloc(p, 0)
	     The  behavior of realloc(p, 0) in glibc doesn’t conform to
	     any of C99, C11, POSIX.1‐2001, POSIX.1‐2004, POSIX.1‐2008,
	     POSIX.1‐2013,  POSIX.1‐2017,  or  POSIX.1‐2024.   The  C17
	     specification  was changed to make it conforming, but that
	     specification made it impossible to write code that  reli‐
	     ably  determines if the input pointer is freed after real‐
	     loc(p, 0), and C23 changed it again to make this undefined
	     behavior, acknowledging that  the  C17  specification  was
	     broad enough, so that undefined behavior wasn’t worse than
	     that.

	     reallocarray() suffers the same issues in glibc.

	     musl  libc  and  the BSDs conform to all versions of ISO C
	     and POSIX.1.

	     gnulib provides the realloc‐posix module,  which  provides
	     wrappers  realloc() and reallocarray() that conform to all
	     versions of ISO C and POSIX.1.

	     There’s a proposal to standardize the BSD behavior: https:
	     //www.open-std.org/jtc1/sc22/wg14/www/docs/n3621.txt.

	HISTORY
	     malloc()
	     free()
	     calloc()
	     realloc()
		    POSIX.1‐2001, C89.

	     reallocarray()
		    glibc 2.26.  OpenBSD 5.6, FreeBSD 11.0.

	     malloc() and related functions rejected sizes greater than
	     PTRDIFF_MAX starting in glibc 2.30.

	     free() preserved errno starting in glibc 2.33.

	   realloc(p, 0)
	     C89 was ambiguous in its specification of  realloc(p,  0).
	     C99 partially fixed this.

	     The  original implementation in glibc would have been con‐
	     forming to C99.  However, and ironically, trying to comply
	     with C99 before the standard was released,  glibc  changed
	     its  behavior  in glibc 2.1.1 into something that ended up
	     not conforming to the final C99 specification (but this is
	     debated, as the wording of the standard seems self‐contra‐
	     dicting).

	...

	BUGS
	     Programmers  would  naturally  expect  by  induction  that
	     realloc(p, size)  is  consistent  with  free(p)  and  mal‐
	     loc(size),  as  that  is the behavior in the general case.
	     This is not explicitly required by  POSIX.1‐2024  or  C11,
	     but  all  conforming  implementations  are consistent with
	     that.

	     The glibc implementation of realloc()  is  not  consistent
	     with  that,  and as a consequence, it is dangerous to call
	     realloc(p, 0) in glibc.

	     A  trivial  workaround  for  glibc  is   calling   it   as
	     realloc(p, size?size:1).

	     The workaround for reallocarray() in glibc, which shares the
	     same bug, would be reallocarray(p, n?n:1, size?size:1).


Apart from documenting that glibc and ISO C are broken, we document how
to best deal with it (see the last paragraph in BUGS).  This is
necessary because I fear that just by documenting the different
behaviors, programmers would still not know what to do with that.
Just take into account that even several members of the committee don't
know how to deal with it.
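
As a minimal sketch of the workaround from the BUGS section, the call
can be wrapped so that a size of zero never reaches realloc(3).  The
name realloc_nonzero() is made up for illustration; it is not part of
glibc or any standard.

```c
#include <stdlib.h>

/*
 * realloc_nonzero() is a hypothetical wrapper, not a libc function.
 * It applies the size?size:1 workaround, so realloc(p, 0) is never
 * called and a NULL return always means failure, with p left intact.
 */
static void *
realloc_nonzero(void *p, size_t size)
{
	return realloc(p, size ? size : 1);
}
```

With this, "q = realloc_nonzero(p, 0)" either returns a valid 1-byte
object that must later be freed, or returns NULL and leaves p
allocated, the same as for any other size.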

I'd be willing to have something similar for close(2).


Have a lovely night!
Alex

P.S.:  I have great news about realloc(p,0)!  Microsoft is on-board with
the change.  They told me they like the proposal, and are willing to
fix their realloc(3) implementation.  They'll now conduct tests to make
sure it doesn't break anything too badly, and will come back to me with
any feedback they have from those tests.

I'll put the standards proposal for realloc(3) on hold, waiting for
Microsoft's feedback.
Aside: the reason EINTR *has to* be specified this way is that pthread
cancellation is aligned with EINTR. If EINTR were defined to have
closed the fd, then acting on cancellation during close would also
have closed the fd, but the cancellation handler would have no way to
distinguish this, leading to a situation where you're forced to either
leak fds or introduce a double-close vuln.
The correct way to address this would be to make close() not be a
cancellation point.
This would also be a desirable change, one I would support if other
implementors are on-board with pushing for it.
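
Until implementations change, an application can get a similar effect
for itself by disabling cancellation around the call.  This is only a
sketch under that assumption; close_nocancel() is a hypothetical name,
and the sketch does nothing about EINTR from signal handlers.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/*
 * close_nocancel() is a hypothetical helper, not a libc function.
 * It disables cancellation for the calling thread while close(2)
 * runs, so a pending cancellation request cannot be acted upon in the
 * middle of the call, and then restores the previous state.
 */
static int
close_nocancel(int fd)
{
	int  oldstate, ignored, ret, saved_errno;

	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

	ret = close(fd);
	saved_errno = errno;

	/* A pending request is acted upon at a later cancellation point. */
	pthread_setcancelstate(oldstate, &ignored);

	errno = saved_errno;
	return ret;
}
```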
An outline of what I'd like to see instead:

- Clear explanation of why double-close is a serious bug that must
  always be avoided. (I think we all agree on this.)

- Statement that the historical Linux/glibc behavior and current POSIX
  requirement differ, without language that tries to paint the POSIX
  behavior as a HP-UX bug/quirk. Possibly citing real sources/history
  of the issue (Austin Group tracker items 529, 614; maybe others).

- Consequence of just assuming the Linux behavior (fd leaks on
  conforming systems).

- Consequences of assuming the POSIX behavior (double-close vulns on
  GNU/Linux, maybe others).

- Survey of methods for avoiding the problem (ways to preclude EINTR,
  possibly ways to infer behavior, etc).
This outline seems more or less reasonable to me but, if it's me
writing the text, I _will_ characterize what POSIX currently says
about EINTR returns from close() as a bug in POSIX.  As far as I'm
concerned, that is a fact, not polemic.

I have found that arguing with you in particular, Rich, is generally
not worth the effort.  Therefore, unless you reply and _accept_ that
the final version of the close manpage will say that POSIX is buggy,
I am not going to write another version of this text, nor will I be
drawn into further debate.
I will not accept that because it's a gross violation of the
responsibility of document writing.

Rich
-- 
<https://www.alejandro-colomar.es>


