Thread (16 messages) 16 messages, 4 authors, 2019-10-28

Re: [PATCH v2 1/1] vreportf: Fix interleaving issues, remove 4096 limitation

From: Johannes Schindelin <hidden>
Date: 2019-10-25 14:03:04
Subsystem: the rest · Maintainer: Linus Torvalds

Hi Alex,

On Fri, 25 Oct 2019, Alexandr Miloslavskiy wrote:
quoted
Maybe this could do with an example?
I myself observed results like this when running t5516:
------
fatal: git fatal: remote errouploar: upload-pack: not our ref
64ea4c133d59fa98e86a771eda009872d6ab2886d-pack: not o
ur ref 64ea4c133d59fa98e86a771eda009872d6ab2886
------

Do you want me to add this garbled string to commit message?
I think that would make things a lot more relatable ;-)

My example is even worse (read: more convincing), though:

fatal: git uploadfata-lp: raemcokte :error:  upload-pnot our arcef k6: n4ot our ea4cr1e3f 36d45ea94fca1398e86a771eda009872d63adb28598f6a9
8e86a771eda009872d6ab2886

So maybe you want to use that?
quoted
quoted
+	buf = malloc(buf_size);
quoted
Why not `alloca()`?
Allocating large chunks on stack is usually not recommended. There is a funny
test "init allows insanely long --template" in t0001 which demonstrates that
sometimes vreportf() can attempt to print very long strings. Crashing due to
stack overflow doesn't sound like a good thing.
Indeed. A clipped message will be a lot more helpful in such a scenario.
quoted
And to take a step back, I think that previous rounds of patches trying
to essentially address the same problem made the case that it is okay to
cut off insanely-long messages, so I am not sure we would want to open
that can of worms again...
I draw a different conclusion here. Each author thought that "1024 must
definitely be enough!" but then discovered that it's not enough again, for
example due to long "usage" output. At some point, f4c3edc0 even tried to
remove all limits after a chain of limits that were too small. So I would say
that this is still a problem.
The most important aspect of `vreportf()` is, at least as far as I am
concerned, to "get the message out".

As such, no, I don't want it to fail, neither due to `alloca()` nor due
to `malloc()`. I'd rather have it produce a cut-off message that at
least shows the first 4095 bytes of the message.
quoted
Quite honestly, I would love to avoid that amount of complexity,
certainly in a part of the code that we would like to have rock
solid because it is usually exercised when things go very, very
wrong and we need to provide the user who is bitten by it enough
information to take to the Git contributors to figure out the root
cause(s).
It's a choice between simpler code and trying to account for
everything that could happen. I think we'd rather have more complex
code that handles more cases, exactly to try and deliver output to
user no matter what.
Complex code usually has more bugs than simple code. I don't want
`vreportf()` to have potential bugs that we don't know about.
quoted
Is the problem that causes those failures with VS the fact that
`fprintf(stderr, ...)` might be interleaved with the output of
another process that _also_ wants to write to `stderr`? I assume
that this _is_ the problem.
This is where I started. But if you look at comment in vreportf_buf,
there are more problems, such as interleaving blocks of larger
messages, which could happen on any platform. I tried to make it work
in most cases possible.
Again, I don't think that it is wise to try to make this work for
arbitrary sizes of error messages.

At some stage, the scrollback of the console won't be large enough to
fix that message!

So I think it is very sane to say that at some point, enough is enough.

Four thousand bytes seems a really long message, anyway.
quoted
Further, I guess that the problem is compounded by the fact that we
usually run the tests in a Git Bash on Windows, i.e. in a MinTTY
that emulates a console (there _is_ work under way to support the
newly introduces ptys, but that work is far from done), so the
standard error file handle might behave in unexpected ways in that
scenario.
To my knowledge, this is not related. t5516 failures are because git
explicitly wants stderr to be unbuffered. VC++ and MinGW runtimes take
that literally. fprintf() outputs char-by-char, and all of that
results in char-interleaving.
Yes, as my example above demonstrates. (Ugh!)
quoted
But I do wonder whether replacing that `fprintf()` by a `write()` would
work better. After all, we could write the `prefix` into the `msg`
already:

 size_t off = strlcpy(msg, prefix, sizeof(msg));
 int ret = vsnprintf(msg + off, sizeof(msg) - off, err, params);
 [...]
 if (ret > 0)
  write(2, msg, off + ret);

Would that also work around the problem?
You forgot to add '\'n. But yes, that would solve many problems,

... and indeed, I verified that this patch fixes the problem:

-- snip --
diff --git a/usage.c b/usage.c
index 2fdb20086bd..7f5bdfb0f40 100644
--- a/usage.c
+++ b/usage.c
@@ -10,13 +10,16 @@ void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
 	char *p;
-
-	vsnprintf(msg, sizeof(msg), err, params);
+	size_t off = strlcpy(msg, prefix, sizeof(msg));
+	int ret = vsnprintf(msg + off, sizeof(msg) - off, err, params);
 	for (p = msg; *p; p++) {
 		if (iscntrl(*p) && *p != '\t' && *p != '\n')
 			*p = '?';
 	}
-	fprintf(stderr, "%s%s\n", prefix, msg);
+	if (ret > 0) {
+		msg[off + ret] = '\n'; /* we no longer need a NUL */
+		write_in_full(2, msg, off + ret + 1);
+	}
 }

 static NORETURN void usage_builtin(const char *err, va_list params)
-- snap --
except truncation to 4096. Then I would expect a patch to increase
buffer size to 8192 in the next couple years. And if you also try to
solve truncation, it will get you very close to my code.
My point is: I don't want to "fix" truncation. I actually think of it as
a feature. An error message that is longer than the average news article
I read is too long, period.

BTW I have a couple more tidbits to add to the commit message, if you
would be so kind as to pick them up: I know _which_ two processes battle
for `stderr`. I instrumented the code slightly, and this is what I got:

-- snip --
$ GIT_TRACE=1 GIT_TEST_PROTOCOL_VERSION= ../git.exe --exec-path=$PWD/..  -C trash\ directory.t5516-fetch-push/shallow2/  fetch ../testrepo/.git 64ea4c133d59fa98e86a771eda009872d6ab2886
14:55:55.360382 exec-cmd.c:238          trace: resolved executable dir: C:/git-sdk-64/usr/src/vs2017-test
14:55:55.362379 exec-cmd.c:54           RUNTIME_PREFIX requested, but prefix computation failed.  Using static fallback '/mingw64'.
14:55:55.387189 git.c:444               trace: built-in (pid=21620): git fetch ../testrepo/.git 64ea4c133d59fa98e86a771eda009872d6ab2886
14:55:55.392644 run-command.c:663       trace: run_command: unset GIT_PREFIX; 'git-upload-pack '\''../testrepo/.git'\'''
14:55:55.659992 exec-cmd.c:238          trace: resolved executable dir: C:/git-sdk-64/usr/src/vs2017-test
14:55:55.661762 exec-cmd.c:54           RUNTIME_PREFIX requested, but prefix computation failed.  Using static fallback '/mingw64'.
14:55:55.662759 git.c:444               trace: built-in (pid=27452): git upload-pack ../testrepo/.git
14:55:55.681188 run-command.c:663       trace: run_command: git rev-list --stdin
fatal: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
fatal: remote error (pid=21620): upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
-- snap --

As you can see, the two error messages stem from the `git fetch` process
(with the prefix "remote error:") and the process it spawned, `git upload-pack`.

BTW if you pick up the indicated patch and the tidbits for the commit
message and then send out a new iteration via GitGitGadget, I would not
mind being co-author at all ;-)

Ciao,
Dscho
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help