Thread (3 messages) 3 messages, 2 authors, 2022-07-27

Re: [PATCH 2/2] builtin/cat-file.c: support NUL-delimited input with `-z`

From: Taylor Blau <hidden>
Date: 2022-07-25 23:44:58

Possibly related (same subject, not in this thread)

Hi Junio,

On Sat, Jul 23, 2022 at 10:45:05AM -0700, Junio C Hamano wrote:
quoted
So isn't this just an accident in of us having used the strbuf_getline()
function to mean "\n", but actually it also does "\r\n".

Which is a really unfortunately named function b.t.w., since it sneaks
this bit of Windows portability into places that may not want it in the
first place.
getline() is to read a single "logical" line, so it is fine for it
to strip CRLF on platforms with CRLF, and to leave CR at the end of
line on a LF platform.  If the "protocol" is defined to use LF on
any platform (and allow a payload that ends with CR to be passed),
you can argue that it is a wrong helper to call.
Reading your email up to this point makes me think that we should ignore
any CR bytes we see next to a LF. So by this point I think that we
should take Ævar's suggestion and call:

    strbuf_getwholeline(&buf, stdin, opt->nul_terminated ? '\0' : '\n');

But...
But does that result in a sensible behaviour?  I am not sure.  Some
editors that are popular on LF platforms can produce CRLF files when
the user asks (either on purpose or by mistake), and when not using
the "-z" mode of input, I suspect that most users would expect CR at
the end of the "line" (terminated with LF on their platform of
choice) would be thrown away even on their LF platform, simply
because it is unlikely that the kind of input they are preparing can
usefully and legitimately end with CR, as their colleagues on CRLF
platforms may also have to produce similar input.  IOW, the presence
of CRLF platforms makes text lines that end with CR much less useful
everywhere.

And from that point of view, "getline() or getwholeline(NUL)" may be
a pattern that is practically more useful than the old "use always
getwholeline(NUL or LF)" convention that I invented more than 10
years ago.
This makes me think that we should retain support for dropping the CR
preceeeding a LF and treat it as a historical wart that we are stuck
supporting.

Do you have a preference? I am fine with either approach, FWIW.

Thanks,
Taylor
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help