Re: Potential Null Pointer Dereference detected by static analysis tool
From: René Scharfe <hidden>
Date: 2025-08-18 19:56:50
Subsystem:
the rest · Maintainer:
Linus Torvalds
On 8/18/25 7:05 AM, Jeff King wrote:
On Mon, Aug 18, 2025 at 12:48:07AM -0400, Jeff King wrote:quoted
And indeed, the current code does then segfault on "git describe foo" at the spot I mentioned. Even though the repository state here is unexpected and corrupt, I do think we should probably be more defensive and avoid the segfault.So you almost nerd-sniped me into making a series. But the more I dug into the rabbit-hole, the more I turned away in disgust. :) The set of problems I found are: 1. What should happen when traversing from HEAD does not find the blob in question? Right now we print a blank line, which is...weird.
Weird indeed.
Probably we should either print nothing, or return an error.
The latter, consistent with "git describe <commit-ish>".
If we return an error, should we respect --always?
The documentation says "git describe <blob>" takes no options. It could learn some, of course. But does it have to? Perhaps better keep that thing contained.
Are we stuck with the
current dumb behavior because it's a plumbing command?git describe is a porcelain command. The documentation doesn't say what happens when the blob is not found in HEAD's ancestry. I can't imagine why someone would use "git describe <blob>" to check whether a particular blob is linked to, but it _is_ slightly faster than "git rev-list --objects --no-object-names HEAD | grep <blob>" for me, and of course easier to type. "git log --find-object <blob>" is slower than either.
2. When we are on an unborn branch, we print a confusing message:
$ git init
$ git commit --allow-empty -m foo
$ git tag foo
$ git symbolic-ref HEAD refs/heads/unborn
$ git describe $(echo blob | git hash-object -w --stdin)
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
We should probably resolve HEAD ourselves and either bail with an
empty output or an error (depending on what we do for (1) above).It already is an error, just needs a better message. It should still report an error even if we were to stick with showing blank lines for unrelated blobs.
3. When we do traverse, if process_object() sees that we didn't find a
commit, we should detect that and either return an empty result or
an error (again, depending on the behavior of (2) above). This is
done by checking is_null_oid(&pcd->current_commit) there.OK, ending the search right there might be the best option. Traversing deeper into the forest that we then know to be cursed would be the unappealing alternative.
4. Then we can teach describe_commit() to take a commit rather than an
oid (and the is_null_oid() check becomes a NULL check).
5. When process_object() has a commit, but it is indescribable, it
shows an error:
$ git describe 5afbe6da1d6ab0b8939343166636b828e561bf35
fatal: No tags can describe '3b681e255cd8577a77983958ef7f566b05806cd0'.
Try --always, or create some tags.
It's not immediately clear that the reported hash belongs to the
found commit. And that suggestion to try --always is misleading,
as "git describe <blob>" takes no options according to the
documentation. I'm not sure I like it in general -- can't tell
if the command is being snarky with me.
So it all depends on what to do with (1), and for a feature that IMHO should not even exist in the first place, I had trouble summoning the will-power to make this 4-patch series.
644eb60bd0 (builtin/describe.c: describe a blob, 2017-11-15) and 15af58c1ad (diffcore: add a pickaxe option to find a specific blob, 2018-01-04) confuse me; the latter's commit message sounds like the former wasn't (supposed to be?) merged. I think the issues you listed are independent, though. Or what's wrong with this demo that addresses point 3 in process_object() and 1 in describe_blob(). If we want a blank line for 1 then we apply only the first hunk. Or am I missing something? René
diff --git a/builtin/describe.c b/builtin/describe.c
index d7dd8139de..9e485240aa 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data) if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) { reset_revision_walk(); - describe_commit(&pcd->current_commit, pcd->dst); - strbuf_addf(pcd->dst, ":%s", path); + if (!is_null_oid(&pcd->current_commit)) { + describe_commit(&pcd->current_commit, pcd->dst); + strbuf_addf(pcd->dst, ":%s", path); + } free_commit_list(pcd->revs->commits); pcd->revs->commits = NULL; }
@@ -519,6 +521,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst) struct rev_info revs; struct strvec args = STRVEC_INIT; struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs}; + size_t orig_len = dst->len; strvec_pushl(&args, "internal: The first arg is not parsed", "--objects", "--in-commit-order", "--reverse", "HEAD",
@@ -532,6 +535,8 @@ static void describe_blob(struct object_id oid, struct strbuf *dst) die("revision walk setup failed"); traverse_commit_list(&revs, process_commit, process_object, &pcd); + if (dst->len == orig_len) + die(_("unable to describe blob '%s'"), oid_to_hex(&oid)); reset_revision_walk(); release_revisions(&revs); strvec_clear(&args);