Re: [PATCH] remote-curl: avoid hang if curl asks for more data after eof
From: Jonathan Tan <hidden>
Date: 2023-11-13 21:22:49
Jiří Hruška [off-list ref] writes:
It has been observed that under some circumstances, libcurl can call our `CURLOPT_READFUNCTION` callback `rpc_out()` again even after already getting a return value of zero (EOF) back once before. This results in `rpc_read_from_out()` trying to read more from the child process pipe, because `rpc->flush_read_but_not_sent` is reset to false already the first time EOF is returned, and then the whole operation gets stuck - the child process is already trying to read a response back and will not write anything to the output pipe anymore, while the parent/remote process is now blocked waiting to read more too and never even finishes sending the request.
Ah, thanks for finding this bug. It sounds worthwhile to make Git more resilient in this situation. I'll just make some preliminary comments.
This commit addresses the bug with the following changes:
[snip] This seems like a long list of changes, when from the description of the bug above, I would have assumed it sufficient to record somewhere when the CURLOPT_READFUNCTION callback returns zero, and then always return zero after that if the callback is ever re-invoked. If this is indeed not sufficient, we should describe why. Also, if multiple changes are needed, please split them into several commits.
A trivial solution would be to just take the line which resets the flag
/*
* The line length either does not need to be sent at
* all or has already been completely sent. Now we can
* return 0, indicating EOF, meaning that the flush has
* been fully sent.
*/
- rpc->flush_read_but_not_sent = 0;
return 0;
from rpc_out() and reset it only in post_rpc(), before the next time a large
request is being sent out and rpc_out() will go into play again:
if (large_request) {
...
rpc->initial_buffer = 1;
+ rpc->flush_read_but_not_sent = 0;
curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
This way the CURLOPT_READFUNCTION would be returning zeros at the end of the
upload as long as needed, just like fread() at the end of a real file.
Hence, the bug could be fixed with just that two-lines change.Ah, you describe the straightforward fix here. I haven't looked at this closely enough to see if this would work, though.
But while trying to figure out the above, I noticed a few things that prolonged the time I needed to understand what was going on, so I would like to propose some more changes to make the code perhaps a bit easier to read for the next person who comes to hack on it after me. The description of the extra modifications is in the commit message. All of these changes are obviously optional and naturally subjective. I think that we can all agree on some points (less indentation = good), but naming is hard, and so is balance between unclear and too verbose, or when to split all non-functional changes to a separate commit. So let me know if there are things to do differently and I will gladly obey, it is your codebase after all.
Yes, please split non-functional changes into a separate commit (preferably one for each concern). I do envision reviewers saying "let's put patches X, Y, and Z in, but not patches A, B, and C", so splitting would make it easier to decide what's worthwhile to have.
Which brings me to the next topic, testing. Validating the fix would be trivial with a mocked libcurl, but turns out to be much harder with the integration-level test suite of this project.
[snip]
But outside of that, and as far as the bundled test suite goes, I have failed to write a test that could validate this problem does not ever occur again.
Due to the nature of the bug and the fix, I do agree that it's hard to test and I would be OK with including the fix without associated tests.