Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()

From: Jeff King <hidden>
Date: 2023-12-13 08:20:28
Subsystem: the rest · Maintainer: Linus Torvalds

On Wed, Dec 13, 2023 at 08:07:21AM +0100, Patrick Steinhardt wrote:

quoted

 static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
 {
-	int c;
 	int take_next_literally = 0;
 
-	while ((c = *in++) != 0) {
+	while (*in) {
+		int c = *in++;
 		if (take_next_literally == 1) {
 			take_next_literally = 0;
 		} else {

I was wondering whether we want to convert `unquote_quoted_pair()` in
the same way. It's not subject to the same issue because it doesn't
return `in` to its callers. But it would lower the cognitive overhead a
bit by keeping the coding style consistent. But please feel free to
ignore this remark.

I dunno. I don't think the consistency matters that much between
functions, and on its own, the "while ((c = *in++))" pattern is not
harmful. It's used elsewhere in the same file for other functions that
are also fine (for decoding q/b headers).

Another thing I was wondering about is the recursive nature of these
functions, and whether it can lead to similar issues like we recently
had when parsing revisions with stack exhaustion. I _think_ this should
not be much of a problem in this case though, as we're talking about
mail headers here. While the length of header values isn't restricted
per se (the line length is restricted to 1000, but with Comment Folding
Whitespace values can span multiple lines), but mail provdiers will sure
clamp down on mails with a "From:" header that is megabytes in size.

It's just unquote_comment() that is recursive, but yeah, there is
nothing to stop it from recursing forever on "((((((...". The stack
requirements are pretty small, though. I needed between 2^17 and 2^18
bytes to segfault on my machine using:

  perl -e 'print "From: ", "(" x 2**18;' |
  git mailinfo /dev/null /dev/null

Absurdly big for an email, but maybe within the realm of possibility? I
think it might be possible to drop the recursion and just use a depth
counter, like this:

diff --git a/mailinfo.c b/mailinfo.c
index 737b9e5e13..db236f9f9f 100644
--- a/mailinfo.c
+++ b/mailinfo.c

@@ -59,6 +59,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
 	int take_next_literally = 0;
+	int depth = 1;
 
 	strbuf_addch(outbuf, '(');

@@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 				take_next_literally = 1;
 				continue;
 			case '(':
-				in = unquote_comment(outbuf, in);
+				strbuf_addch(outbuf, '(');
+				depth++;
 				continue;
 			case ')':
 				strbuf_addch(outbuf, ')');
-				return in;
+				if (!--depth)
+					return in;
+				continue;
 			}
 		}

I doubt it's a big deal either way, but if it's that easy to do it might
be worth it.

-Peff

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help