Re: [PATCH] remote-curl: avoid hang if curl asks for more data after eof

From: Jonathan Tan <hidden>
Date: 2023-11-13 21:22:49

Jiří Hruška [off-list ref] writes:

It has been observed that under some circumstances, libcurl can call
our `CURLOPT_READFUNCTION` callback `rpc_out()` again even after
already getting a return value of zero (EOF) back once before.

This results in `rpc_read_from_out()` trying to read more from the
child process pipe, because `rpc->flush_read_but_not_sent` is reset to
false already the first time EOF is returned, and then the whole
operation gets stuck - the child process is already trying to read
a response back and will not write anything to the output pipe anymore,
while the parent/remote process is now blocked waiting to read more too
and never even finishes sending the request.

Ah, thanks for finding this bug. It sounds worthwhile to make Git more
resilient in this situation.

I'll just make some preliminary comments.

This commit addresses the bug with the following changes:

[snip]

This seems like a long list of changes, when from the description of the
bug above, I would have assumed it sufficient to record somewhere when
the CURLOPT_READFUNCTION callback returns zero, and then always return
zero after that if the callback is ever re-invoked. If this is indeed
not sufficient, we should describe why.

Also, if multiple changes are needed, please split them into several
commits.

A trivial solution would be to just take the line which resets the flag

    /*
     * The line length either does not need to be sent at
     * all or has already been completely sent. Now we can
     * return 0, indicating EOF, meaning that the flush has
     * been fully sent.
     */
-   rpc->flush_read_but_not_sent = 0;
    return 0;

from rpc_out() and reset it only in post_rpc(), before the next time a large
request is being sent out and rpc_out() will go into play again:

    if (large_request) {
        ...
        rpc->initial_buffer = 1;
+       rpc->flush_read_but_not_sent = 0;
        curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);

This way the CURLOPT_READFUNCTION would be returning zeros at the end of the
upload as long as needed, just like fread() at the end of a real file.

Hence, the bug could be fixed with just that two-lines change.

Ah, you describe the straightforward fix here. I haven't looked at this
closely enough to see if this would work, though.

But while trying to figure out the above, I noticed a few things that prolonged
the time I needed to understand what was going on, so I would like to propose
some more changes to make the code perhaps a bit easier to read for the next
person who comes to hack on it after me.

The description of the extra modifications is in the commit message. All of
these changes are obviously optional and naturally subjective. I think that we
can all agree on some points (less indentation = good), but naming is hard,
and so is balance between unclear and too verbose, or when to split all
non-functional changes to a separate commit. So let me know if there are things
to do differently and I will gladly obey, it is your codebase after all.

Yes, please split non-functional changes into a separate commit
(preferably one for each concern). I do envision reviewers saying "let's
put patches X, Y, and Z in, but not patches A, B, and C", so splitting
would make it easier to decide what's worthwhile to have.

Which brings me to the next topic, testing.

Validating the fix would be trivial with a mocked libcurl, but turns out to be
much harder with the integration-level test suite of this project.

[snip]

But outside of that, and as far as the bundled test suite goes, I have failed to
write a test that could validate this problem does not ever occur again.

Due to the nature of the bug and the fix, I do agree that it's hard to
test and I would be OK with including the fix without associated tests.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help