Thread (53 messages) 53 messages, 6 authors, 2024-03-15

Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers

From: René Scharfe <hidden>
Date: 2024-03-14 19:38:08

Possibly related (same subject, not in this thread)

Am 12.03.24 um 09:05 schrieb Jeff King:
On Thu, Mar 07, 2024 at 08:42:22PM +0100, René Scharfe wrote:
quoted
quoted
@@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
 	/* left-trim */
 	bol += strspn(bol, " \t");

-	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
+	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
If the strspn() call is safe (which it is, as the caller expects the
string to be NUL-terminated) then you could use starts_with() here and
avoid the length calculation.  But that would also match
comment_line_str values that contain LF, which the _mem version does not
and that's better.
I try not to read too much into the use of string functions on what
otherwise appears to be an unterminated buffer. While in Git it is quite
often terminated at allocation time (coming from a strbuf, etc) I feel
like I've fixed a number of out-of-bounds reads simply due to sloppy
practices. And even if something is correct today, it is easy for it to
change, since the assumption is made far away from allocation.

So I dunno. Like you said, fewer computations is fewer opportunity to
mess things up. I don't like the idea of introducing a new hand-grenade
that might blow up later, but maybe if it's right next to a strspn()
call that's already a problem, it's not materially making anything
worse.
Yeah, and my logic was flawed: If the caller somehow guarantees that a
space or tab occurs before eol then the strspn() call is safe.  Its
presence doesn't guarantee NUL termination.  parse_insn_line() would
not be safe to use without that prerequisite, but that's a different
matter..
quoted
quoted
@@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)

 	/* The first paragraph is the title and cannot be trailers */
 	for (s = buf; s < buf + len; s = next_line(s)) {
-		if (s[0] == comment_line_char)
+		if (starts_with_mem(s, buf + len - s, comment_line_str))
 			continue;
 		if (is_blank_line(s))
Another case where starts_with() would be safe to use, as
is_blank_line() expects (and gets) a NUL-terminated string, but it would
allow matching comment_line_str values that contain LF.
Hmm. Yes, it is a NUL-terminated string always, but the caller has told
us not to look past end_of_log_message(). I suspect that if there is no
newline in comment_line_str() it's probably impossible to go past "len"
(just because the end of the log surely ends with either a NUL or a
newline). But it feels iffy to me. I dunno.
Same flawed thinking on my part: As long as we're guaranteed a blank
line in the buffer we won't walk past its end.  That doesn't mean we can
assume a NUL is present.  But that's fragile.  The code should use
memchr() instead of strchrnul().

That's not the problem you set out to solve in your series, though, and
you avoid making it worse by respecting the length limit in the code
you change.  #leftoverbits

Keeping track of the remaining length increases code size and adds
opportunities for mistakes.  Not sure how to avoid it, however.  Using
eol instead of len at least avoids subtractions.

tl;dr: Good patch (in v2).

René
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help