Thread: 44 messages, 6 authors, 2022-07-18

Re: [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2022-07-13 01:39:34

On Tue, Jul 12 2022, Siddharth Asthana wrote:
commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite are we going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take tag objects buffer as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder [off-list ref]
Mentored-by: John Cai [off-list ref]
Signed-off-by: Siddharth Asthana <redacted>
---
 cache.h    | 6 +++---
 ident.c    | 2 +-
 revision.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/cache.h b/cache.h
index c9dbe1c29a..9edb7fefd3 100644
--- a/cache.h
+++ b/cache.h
@@ -1689,10 +1689,10 @@ struct ident_split {
 int split_ident_line(struct ident_split *, const char *, int);
 
 /*
- * Given a commit object buffer and the commit headers, replaces the idents
- * in the headers with their canonical versions using the mailmap mechanism.
+ * Given a commit or tag object buffer and the commit or tag headers, replaces
+ * the idents in the headers with their canonical versions using the mailmap mechanism.
  */
-void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap);
 
 /*
  * Compare split idents for equality or strict ordering. Note that we
diff --git a/ident.c b/ident.c
index 9f4f6e9071..5f17bd607d 100644
--- a/ident.c
+++ b/ident.c
@@ -393,7 +393,7 @@ static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct
 	return 0;
 }
 
-void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap)
 {
 	size_t buf_offset = 0;
 
diff --git a/revision.c b/revision.c
index 14dca903b6..6ad3665204 100644
--- a/revision.c
+++ b/revision.c
@@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
+		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
I can live with this so far, but I really think this is cementing the
wrong approach into place here.

We only use commit_match() to feed a commit to grep.c. If you look at
the "header_field" struct there, grep.c takes this pre-formatted output
and parses it *again*, i.e. it finds "author", "reflog", "committer"
etc., and eventually points the regex engine at that buffer.

So we really don't need to get a strbuf here and munge the whole thing
in place to feed it to grep.c. Instead we can:

 1. Not munge it at all, pass it as-is
 2. Pass the mailmap along to grep.c itself
 3. It's already parsing out the headers, so at some point it will have
    "author foo <bar>\n"
 4. In that code, we can just consult the mailmap, and then map the "foo
    <bar>" part to "Baz <bar>" or whatever
 5. Then search that string.

So no need for any in-place rewriting, or no?
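
To make steps 3-5 concrete, here is a rough standalone sketch of mapping
one already-parsed header value at match time, without touching the
commit buffer. Everything in it is made up for illustration and is not
git's API: "struct toy_mailmap_entry" stands in for the real mailmap,
toy_map_ident() and toy_header_matches() are hypothetical helpers, only
the name is mapped (the real mailmap can also rewrite the email), and
strstr() stands in for the regex engine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a mailmap entry: email -> canonical name. */
struct toy_mailmap_entry {
	const char *email;
	const char *canonical_name;
};

/*
 * Given one header value such as "foo <bar> 1657000000 +0000", write the
 * mapped "Name <email>" into a small scratch buffer. The input is never
 * modified. Returns 0 on success, -1 if the value cannot be parsed or the
 * result does not fit.
 */
static int toy_map_ident(const char *value, size_t len,
			 const struct toy_mailmap_entry *map, size_t nr,
			 char *out, size_t outsz)
{
	const char *lt = memchr(value, '<', len);
	const char *gt = lt ? memchr(lt, '>', len - (size_t)(lt - value)) : NULL;
	const char *name = value;
	size_t name_len, email_len, i;
	int n;

	assert(outsz > 0);
	if (!lt || !gt)
		return -1;

	/* The name is everything before '<', minus trailing spaces. */
	name_len = (size_t)(lt - value);
	while (name_len && value[name_len - 1] == ' ')
		name_len--;
	email_len = (size_t)(gt - lt) - 1;

	/* Consult the (toy) mailmap by email; keep the old name on a miss. */
	for (i = 0; i < nr; i++) {
		if (strlen(map[i].email) == email_len &&
		    !memcmp(map[i].email, lt + 1, email_len)) {
			name = map[i].canonical_name;
			name_len = strlen(name);
			break;
		}
	}

	n = snprintf(out, outsz, "%.*s <%.*s>", (int)name_len, name,
		     (int)email_len, lt + 1);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Search the mapped ident; strstr() stands in for the regex engine. */
static int toy_header_matches(const char *value, size_t len,
			      const struct toy_mailmap_entry *map, size_t nr,
			      const char *needle)
{
	char mapped[256];

	if (toy_map_ident(value, len, map, nr, mapped, sizeof(mapped)) < 0)
		return 0;
	return strstr(mapped, needle) != NULL;
}
```

The only copy made is of the one ident being searched, so the cost is
bounded by the header line, not by the size of the commit message.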

Even with this approach this seems a bit odd. E.g. isn't your
commit_rewrite_person() largely a re-invention of find_commit_header()
in commit.c? Can't we use that function there?

The replace_idents_using_mailmap() in 4/4 seems like it could be
improved in a similar way.

I.e. can't we just loop over the object, then as we find "author"
consult the mailmap, and potentially emit a replacement, otherwise the
existing content as-is up until the next \n etc.?

We should be able to "stream" all of this, instead of in-place modifying
a potentially large commit buffer, which involves memmove() etc.
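
A rough standalone sketch of what I mean by "streaming", under loud
assumptions: none of these names exist in git. "struct
toy_mailmap_entry", the toy_emit_fn callback, and "struct toy_sink" are
all invented for the example, and only the name is mapped, not the
email. The input buffer is read-only; each header line is either
emitted as-is or emitted in pieces with the canonical name swapped in,
and everything from the blank line onwards passes through untouched.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a mailmap entry: email -> canonical name. */
struct toy_mailmap_entry {
	const char *email;
	const char *canonical_name;
};

/* Hypothetical output callback, e.g. a write to stdout or to a buffer. */
typedef void (*toy_emit_fn)(const char *data, size_t len, void *cb_data);

/* Small fixed-size collector so the sketch can be exercised. */
struct toy_sink {
	char data[1024];
	size_t len;
};

static void toy_sink_append(const char *data, size_t len, void *cb_data)
{
	struct toy_sink *sink = cb_data;

	assert(sink->len + len < sizeof(sink->data));
	memcpy(sink->data + sink->len, data, len);
	sink->len += len;
	sink->data[sink->len] = '\0';
}

static const char *toy_lookup(const struct toy_mailmap_entry *map, size_t nr,
			      const char *email, size_t email_len)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (strlen(map[i].email) == email_len &&
		    !memcmp(map[i].email, email, email_len))
			return map[i].canonical_name;
	return NULL;
}

/* Emit one ident line, swapping in the canonical name on a mailmap hit. */
static void toy_emit_ident_line(const char *line, size_t len, size_t key_len,
				const struct toy_mailmap_entry *map, size_t nr,
				toy_emit_fn emit, void *cb_data)
{
	const char *lt = memchr(line + key_len, '<', len - key_len);
	const char *gt = lt ? memchr(lt, '>', len - (size_t)(lt - line)) : NULL;
	const char *name = gt ?
		toy_lookup(map, nr, lt + 1, (size_t)(gt - lt) - 1) : NULL;

	if (!name) {
		emit(line, len, cb_data); /* no mapping: pass through */
		return;
	}
	emit(line, key_len, cb_data);                 /* e.g. "author " */
	emit(name, strlen(name), cb_data);            /* canonical name */
	emit(" ", 1, cb_data);
	emit(lt, len - (size_t)(lt - line), cb_data); /* "<email> date\n" */
}

static void toy_stream_with_mailmap(const char *buf, size_t len,
				    const struct toy_mailmap_entry *map,
				    size_t nr, toy_emit_fn emit, void *cb_data)
{
	static const char *keys[] = { "author ", "committer ", "tagger " };
	size_t pos = 0;

	/* Walk header lines; the first empty line ends the header. */
	while (pos < len && buf[pos] != '\n') {
		const char *line = buf + pos;
		const char *eol = memchr(line, '\n', len - pos);
		size_t line_len = eol ? (size_t)(eol - line) + 1 : len - pos;
		size_t i, key_len = 0;

		for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
			size_t kl = strlen(keys[i]);
			if (line_len > kl && !memcmp(line, keys[i], kl)) {
				key_len = kl;
				break;
			}
		}
		if (key_len)
			toy_emit_ident_line(line, line_len, key_len,
					    map, nr, emit, cb_data);
		else
			emit(line, line_len, cb_data);
		pos += line_len;
	}
	/* Blank line and message body are emitted untouched. */
	if (pos < len)
		emit(buf + pos, len - pos, cb_data);
}
```

Calling toy_stream_with_mailmap() with toy_sink_append and a "struct
toy_sink" collects the mapped object; nothing is ever shifted inside the
original buffer, so there is no memmove() over the message body.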