Thread (23 messages) 23 messages, 5 authors, 2025-08-20

Re: Potential Null Pointer Dereference detected by static analysis tool

From: René Scharfe <hidden>
Date: 2025-08-18 19:56:50
Subsystem: the rest · Maintainer: Linus Torvalds

On 8/18/25 7:05 AM, Jeff King wrote:
On Mon, Aug 18, 2025 at 12:48:07AM -0400, Jeff King wrote:
quoted
And indeed, the current code does then segfault on "git describe foo" at
the spot I mentioned. Even though the repository state here is
unexpected and corrupt, I do think we should probably be more defensive
and avoid the segfault.
So you almost nerd-sniped me into making a series. But the more I dug
into the rabbit-hole, the more I turned away in disgust. :)

The set of problems I found are:

  1. What should happen when traversing from HEAD does not find the blob
     in question? Right now we print a blank line, which is...weird.
Weird indeed.
     Probably we should either print nothing, or return an error.
The latter, consistent with "git describe <commit-ish>".
If we return an error, should we respect --always?
The documentation says "git describe <blob>" takes no options.  It could
learn some, of course.  But does it have to?  Perhaps better keep that
thing contained.
Are we stuck with the
     current dumb behavior because it's a plumbing command?
git describe is a porcelain command.

The documentation doesn't say what happens when the blob is not found
in HEAD's ancestry.

I can't imagine why someone would use "git describe <blob>" to check
whether a particular blob is linked to, but it _is_ slightly faster than
"git rev-list --objects --no-object-names HEAD | grep <blob>" for me,
and of course easier to type.  "git log --find-object <blob>" is slower
than either.
  2. When we are on an unborn branch, we print a confusing message:

       $ git init
       $ git commit --allow-empty -m foo
       $ git tag foo
       $ git symbolic-ref HEAD refs/heads/unborn
       $ git describe $(echo blob | git hash-object -w --stdin)
       fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
       Use '--' to separate paths from revisions, like this:
       'git <command> [<revision>...] -- [<file>...]'

     We should probably resolve HEAD ourselves and either bail with an
     empty output or an error (depending on what we do for (1) above).
It already is an error, just needs a better message.  It should still
report an error even if we were to stick with showing blank lines for
unrelated blobs.
  3. When we do traverse, if process_object() sees that we didn't find a
     commit, we should detect that and either return an empty result or
     an error (again, depending on the behavior of (2) above). This is
     done by checking is_null_oid(&pcd->current_commit) there.
OK, ending the search right there might be the best option.  Traversing
deeper into the forest that we then know to be cursed would be the
unappealing alternative.
  4. Then we can teach describe_commit() to take a commit rather than an
     oid (and the is_null_oid() check becomes a NULL check).
  5. When process_object() has a commit, but it is indescribable, it
     shows an error:

     $ git describe 5afbe6da1d6ab0b8939343166636b828e561bf35
     fatal: No tags can describe '3b681e255cd8577a77983958ef7f566b05806cd0'.
     Try --always, or create some tags.

     It's not immediately clear that the reported hash belongs to the
     found commit.  And that suggestion to try --always is misleading,
     as "git describe <blob>" takes no options according to the
     documentation.  I'm not sure I like it in general -- can't tell
     if the command is being snarky with me.
So it all depends on what to do with (1), and for a feature that IMHO
should not even exist in the first place, I had trouble summoning the
will-power to make this 4-patch series.
644eb60bd0 (builtin/describe.c: describe a blob, 2017-11-15) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04) confuse me; the latter's commit message sounds like the
former wasn't (supposed to be?) merged.

I think the issues you listed are independent, though.  Or what's wrong
with this demo that addresses point 3 in process_object() and 1 in
describe_blob().  If we want a blank line for 1 then we apply only
the first hunk.  Or am I missing something?

René

diff --git a/builtin/describe.c b/builtin/describe.c
index d7dd8139de..9e485240aa 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		describe_commit(&pcd->current_commit, pcd->dst);
-		strbuf_addf(pcd->dst, ":%s", path);
+		if (!is_null_oid(&pcd->current_commit)) {
+			describe_commit(&pcd->current_commit, pcd->dst);
+			strbuf_addf(pcd->dst, ":%s", path);
+		}
 		free_commit_list(pcd->revs->commits);
 		pcd->revs->commits = NULL;
 	}
@@ -519,6 +521,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
 	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
+	size_t orig_len = dst->len;
 
 	strvec_pushl(&args, "internal: The first arg is not parsed",
 		     "--objects", "--in-commit-order", "--reverse", "HEAD",
@@ -532,6 +535,8 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 		die("revision walk setup failed");
 
 	traverse_commit_list(&revs, process_commit, process_object, &pcd);
+	if (dst->len == orig_len)
+		die(_("unable to describe blob '%s'"), oid_to_hex(&oid));
 	reset_revision_walk();
 	release_revisions(&revs);
 	strvec_clear(&args);
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help