Thread (218 messages), 10 authors, 2022-06-18

Re: [PATCH v2 11/13] merge-tree: provide easy access to `ls-files -u` style info

From: Elijah Newren <hidden>
Date: 2022-02-03 08:39:35

On Wed, Feb 2, 2022 at 5:22 PM Ævar Arnfjörð Bjarmason [off-list ref] wrote:
On Wed, Feb 02 2022, Elijah Newren wrote:
On Wed, Feb 2, 2022 at 1:32 PM Junio C Hamano [off-list ref] wrote:
"Elijah Newren via GitGitGadget" [off-list ref] writes:
@@ -450,7 +451,11 @@ static int real_merge(struct merge_tree_options *o,
              merge_get_conflicted_files(&result, &conflicted_files);
              for (i = 0; i < conflicted_files.nr; i++) {
                      const char *name = conflicted_files.items[i].string;
-                     if (last && !strcmp(last, name))
+                     struct stage_info *c = conflicted_files.items[i].util;
+                     if (!o->exclude_modes_oids_stages)
+                             printf("%06o %s %d\t",
+                                    c->mode, oid_to_hex(&c->oid), c->stage);
+                     else if (last && !strcmp(last, name))
                              continue;
                      write_name_quoted_relative(
                              name, prefix, stdout, line_termination);
OK.  The addition (and disabling of the deduping) is quite trivial.
We do not even have to worry about line termination since the extra
pieces of info are prepended to the pathname.  Nice.
@@ -485,6 +490,10 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
                          N_("do a trivial merge only"), 't'),
              OPT_BOOL(0, "messages", &o.show_messages,
                       N_("also show informational/conflict messages")),
+             OPT_BOOL_F('l', "exclude-modes-oids-stages",
+                        &o.exclude_modes_oids_stages,
+                        N_("list conflicted files without modes/oids/stages"),
+                        PARSE_OPT_NONEG),
Why does "-l" give shorter output than without it?  "-l" strongly
hints a longer output than without, at least to me.  Just wondering
if this will not become a source of confusion to future scripting
users.
Here's another example where I was struggling with naming.  Something
like ls-tree's `--name-only` would have been nice, but I was worried
it'd be confusing since it only affected the conflicted info section
and does not suppress the printing of the toplevel tree or the
informational messages sections.  And the name
--exclude-modes-oids-stages was long enough that I wanted a short flag
for it, and just used the first letter of the description ("list
conflicted files...").  I'm happy to change either the long or the
short name for this flag if anyone has suggestions.
There's always sidestepping it by replacing it with a --format :)
Another solution that occurred to me, and I was _really_ close to
doing it for v3, was to just flat drop this patch entirely and not
include any such option.  But...

  * "Which files had conflicts?" seems like such an obvious question
  * I've used `git ls-files -u | awk {print\$4} | uniq` a lot in the
past after `git merge` (or `git rebase`) to get this info (yeah, it
turns out `git diff --name-only --diff-filter=U` is 4 fewer
characters)
  * "display the list of files where conflicts were present in the web
UI" was listed as an early usecase[1]

[1] https://lore.kernel.org/git/YYlqpuzv+bmZaFzz@nand.local/

So it seemed like making that question easy to answer was worthwhile.
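
As a rough illustration of the "which files had conflicts?" question, here
is a minimal sketch of a filter over `git ls-files -u` style lines.  The
helper name `conflicted_paths` is hypothetical and not part of git.  It
assumes each input line has the form "<mode> <object> <stage>", a TAB, then
the path, and that the stages of one path are adjacent, so that `uniq` can
collapse them.

```sh
# Hypothetical helper: reduce "<mode> <object> <stage>\t<path>" lines
# to one line per conflicted path.
#
# cut's default delimiter is TAB, so "-f2-" keeps everything after the
# first TAB.  Unlike awk's default whitespace splitting, this keeps a
# path that contains spaces in one piece.
conflicted_paths () {
	cut -f2- | uniq
}

# Possible use in a repository with an unresolved merge:
#   git ls-files -u | conflicted_paths
# The thread notes that this also lists the conflicted files:
#   git diff --name-only --diff-filter=U
```

Splitting on the TAB rather than on whitespace is the only design choice
here; the rest is the same pipeline idea as the awk one-liner above.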
Anyway, I'd mentioned that in an earlier review in
[ref]. FWIW here's an experiment
to do that that I polished up (mostly copied from the ls-tree WIP code
I'd written already).

I don't know if it will ever be useful, or if you think it's
worthwhile/simpler, but in either case I think in doing this I spotted
the following issues or otherwise noted inconsistencies in the pre-image:

   The docs say that "<stage> <path>" is SP-separated, but it's
   actually TAB-separated, the rest is SP-separated.
Yeah, good catch.  However, it doesn't actually say they are
SP-separated; it's ambiguous about the spacing.  Which probably isn't
a good thing, but it was kind of copied from the ls-files manual:

"""
       git ls-files just outputs the filenames unless --stage is specified in
       which case it outputs:

           [<tag> ]<mode> <object> <stage> <file>
"""

(which also uses a tab between <stage> and <file> and a space
otherwise, but the output above may lead you to believe otherwise.)
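
To make the separator point concrete, here is a small sketch of splitting
one such line in POSIX shell.  The function name `parse_conflict_line` and
the sample values are made up for illustration.  It relies only on what is
described above: a TAB between <stage> and the path, and single spaces
between the other fields.

```sh
# Hypothetical helper: split one "<mode> <object> <stage>\t<path>" line.
parse_conflict_line () {
	conflict_line=$1
	tab=$(printf '\t')

	# Everything before the first TAB: "<mode> <object> <stage>".
	meta=${conflict_line%%"$tab"*}
	# Everything after the first TAB: the path, which may contain spaces.
	file=${conflict_line#*"$tab"}

	# Word-split the space-separated part into $1, $2 and $3.
	set -- $meta
	printf 'mode=%s oid=%s stage=%s path=%s\n' "$1" "$2" "$3" "$file"
}

# Example call (object name shortened and invented):
#   parse_conflict_line "$(printf '100644 abc123 2\tdir/my file.txt')"
```

Splitting on the first TAB before splitting on spaces is what keeps a path
with embedded spaces in one piece.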
 * That you de-dupe --exclude-modes-oids-stages is a bit of a hidden feature,
   but arguably intuitive. Should it be optional? In any case my formatting
   experiment makes it optional, since it then needs to be generalized to de-dupe
   after we've formatted.
I think without de-duping the flag isn't helpful enough to bother
implementing.  Requiring two flags also seems painful, given the
common case scenario.
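
A quick sketch of why de-duping matters for a path-only listing, using
invented sample data rather than real merge-tree output.  The helper names
`sample_conflicts` and `count_unique` are hypothetical.  The sample assumes
one conflicted path with three stages, each with a different (made-up)
object name.

```sh
# Three stages of one conflicted path in the
# "<mode> <object> <stage>\t<path>" layout (object names are made up).
# printf reuses its format string for each group of three arguments.
sample_conflicts () {
	printf '100644 %s %d\t%s\n' \
		1111111 1 greeting \
		2222222 2 greeting \
		3333333 3 greeting
}

# Drop adjacent duplicate lines and report how many lines remain.
count_unique () {
	uniq | wc -l | tr -d ' '
}

# Full lines differ in object name and stage, so nothing collapses:
#   sample_conflicts | count_unique
# Path-only lines are identical, so they collapse to a single line:
#   sample_conflicts | cut -f2- | count_unique
```

With this sample the first pipeline reports 3 and the second reports 1.
Without the de-dupe step, a path-only listing would repeat each path once
per stage.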

I hope I'm not coming across as dismissive.  I think eventually adding
a --format and --dedupe (the combination of which might be implied by
whatever flag is used now) might be useful additions.  Maybe --abbrev
too...eventually.  But I'm worried that it's distracting from focusing
on usecases.  In particular, I'm worried it leads to "well, script
writers technically can get what they want because we provided
everything" rather than focusing on making the most common things easy
to get, and then extending the command for flexibility as needed
later.

I'd really rather that early versions _just_ focus on actual usecases
as far as UI is concerned (and thus I was really happy to see Dscho
and Taylor concentrate on that side; I think Christian might have been
talking about that angle some but it was hard to differentiate from
the "merge-tree on steroids" spitballing).  While I want to be careful
to avoid preventing UI flexibility, I think building it in from the
beginning tends to lead to a design that is less usable.  (e.g. the
possible loss of de-duping that would naturally have arisen from
looking at things from the other angle.)  It's just a bias I have.
 * Perhaps we should support --abbrev as ls-tree does? The below diff shows
   it's easy enough.
This one is less problematic to me, but I'd still rather that the UI
side of things focused on the usecases for early versions.
 * The dance you have with sed-ing out the hash in the tests could be made much
   easier with "sed 1d <out >actual" and --no-messages for some existing tests.
Ignoring the first line is semantically different than verifying it
looks like a hash.  It also only works on the first line, and hashes
appear in multiple places, so you'd need a variety of different sed
commands for different parts of the output, which doesn't seem any
easier at all to me; I think using the same replacement everywhere is
simpler.  But perhaps I should turn it into a shell function that I
use in each case.
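
A sketch of what such a shell function might look like, reusing the sed
expression already present in the tests.  The name `anonymize_hash` is only
an assumption for illustration.

```sh
# Hypothetical test helper: replace every run of 40 or more lowercase
# hex digits with the literal string HASH.  Works on stdin or on the
# files named as arguments.
anonymize_hash () {
	sed -e "s/[0-9a-f]\{40,\}/HASH/g" "$@"
}

# Possible use in a test, in place of repeating the sed command:
#   anonymize_hash out >actual
```

Because the pattern matches 40 or more hex digits, the same helper would
cover both the toplevel tree line and the object names in the conflicted
file info section, and it leaves abbreviated hashes untouched.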
diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index 6a2ed475106..e906d1dc9bf 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -44,10 +44,9 @@ OPTIONS
        newline.  Also begin the messages section with a NUL character
        instead of a newline.  See OUTPUT below for more information.

---exclude-oids-and-modes::
-       Instead of writing a list of (mode, oid, stage, path) tuples
-       to output for conflicted files, just provide a list of
-       filenames with conflicts.
+--conflict-format::
+       Override the default "%(objectmode) %(objectname)
+       %(stage)%x09%(path)" format.

 --[no-]messages::
        Write any informational messages such as "Auto-merging <path>"
@@ -89,13 +88,13 @@ Conflicted file info

 This is a sequence of lines with the format

-       <mode> <object> <stage> <filename>
+       %(objectmode) %(objectname) %(stage)%x09%(path)

 The filename will be quoted as explained for the configuration
-variable `core.quotePath` (see linkgit:git-config[1]).  However, if
-the `--exclude-oids-and-modes` option is passed, the mode, object, and
-stage will be omitted.  If `-z` is passed, the "lines" are terminated
-by a NUL character instead of a newline character.
+variable `core.quotePath` (see linkgit:git-config[1]).
+
+If `-z` is passed, the "lines" are terminated by a NUL character
+instead of a newline character.

 Informational messages
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 58c0ddc5a32..14fed95a8ce 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -395,9 +395,64 @@ struct merge_tree_options {
        int mode;
        int allow_unrelated_histories;
        int show_messages;
-       int exclude_modes_oids_stages;
+       const char *conflict_format;
+       int unique_conflicts;
+       int abbrev;
 };

+struct expand_conflict_data {
+       const char *prefix;
+       struct string_list_item *item;
+       struct strbuf *scratch;
+       int abbrev;
+       struct strbuf *sb_tmp;
+};
+static size_t expand_conflict_format(struct strbuf *sb,
+                                    const char *start,
+                                    void *context)
+{
+       struct expand_conflict_data *data = context;
+       struct string_list_item *item = data->item;
+       struct stage_info *info = item->util;
+       const char *end;
+       const char *p;
+       size_t len;
+
+       len = strbuf_expand_literal_cb(sb, start, NULL);
+       if (len)
+               return len;
+
+       if (*start != '(')
+               die(_("bad format as of '%s'"), start);
+       end = strchr(start + 1, ')');
+       if (!end)
+               die(_("format element '%s' does not end in ')'"), start);
+       len = end - start + 1;
+
+       if (skip_prefix(start, "(objectmode)", &p)) {
+               strbuf_addf(sb, "%06o", info->mode);
+       } else if (skip_prefix(start, "(objectname)", &p)) {
+               strbuf_addstr(sb, find_unique_abbrev(&info->oid, data->abbrev));
+       } else if (skip_prefix(start, "(stage)", &p)) {
+               strbuf_addf(sb, "%d", info->stage);
+       } else if (skip_prefix(start, "(path)", &p)) {
+               const char *name = item->string;
+
+               if (data->prefix)
+                       name = relative_path(name, data->prefix, data->scratch);
+               strbuf_addstr(sb, name);
+
+               strbuf_reset(data->sb_tmp);
+               /* The relative_path() function resets "scratch" */
+
+       } else {
+               unsigned int errlen = (unsigned long)len;
+               die(_("bad format specifier %%%.*s"), errlen, start);
+       }
+
+       return len;
+}
+
 static int real_merge(struct merge_tree_options *o,
                      const char *branch1, const char *branch2,
                      const char *prefix)
@@ -446,23 +501,43 @@ static int real_merge(struct merge_tree_options *o,
        puts(oid_to_hex(&result.tree->object.oid));
        if (!result.clean) {
                struct string_list conflicted_files = STRING_LIST_INIT_NODUP;
-               const char *last = NULL;
-               int i;
+               struct string_list_item *item;
+               char *last = NULL;
+               struct strbuf sb = STRBUF_INIT;
+               struct strbuf tmp = STRBUF_INIT;

                merge_get_conflicted_files(&result, &conflicted_files);
-               for (i = 0; i < conflicted_files.nr; i++) {
-                       const char *name = conflicted_files.items[i].string;
-                       struct stage_info *c = conflicted_files.items[i].util;
-                       if (!o->exclude_modes_oids_stages)
-                               printf("%06o %s %d\t",
-                                      c->mode, oid_to_hex(&c->oid), c->stage);
-                       else if (last && !strcmp(last, name))
+               for_each_string_list_item(item, &conflicted_files) {
+                       struct expand_conflict_data ctx = {
+                               .prefix = prefix,
+                               .item = item,
+                               .abbrev = o->abbrev,
+                               .scratch = &sb,
+                               .sb_tmp = &tmp,
+                       };
+
+                       strbuf_expand(&sb, o->conflict_format, expand_conflict_format, &ctx);
+                       strbuf_addch(&sb, line_termination);
+
+                       if (o->unique_conflicts && last && !strcmp(last, sb.buf)) {
+                               free(last);
+                               last = strbuf_detach(&sb, NULL);
                                continue;
-                       write_name_quoted_relative(
-                               name, prefix, stdout, line_termination);
-                       last = name;
+                       }
+
+                       fwrite(sb.buf, sb.len, 1, stdout);
+
+                       if (o->unique_conflicts) {
+                               free(last);
+                               last = strbuf_detach(&sb, NULL);
+                       } else {
+                               strbuf_reset(&sb);
+                       }
                }
                string_list_clear(&conflicted_files, 1);
+               strbuf_release(&sb);
+               strbuf_release(&tmp);
+               free(last);
        }
        if (o->show_messages) {
                putchar(line_termination);
@@ -474,7 +549,11 @@ static int real_merge(struct merge_tree_options *o,

 int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 {
-       struct merge_tree_options o = { .show_messages = -1 };
+       struct merge_tree_options o = {
+               .show_messages = -1,
+               .conflict_format = "%(objectmode) %(objectname) %(stage)%x09%(path)",
+               .unique_conflicts = 1,
+       };
        int expected_remaining_argc;
        int original_argc;
@@ -493,14 +572,15 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
                         N_("also show informational/conflict messages")),
                OPT_SET_INT('z', NULL, &line_termination,
                            N_("separate paths with the NUL character"), '\0'),
-               OPT_BOOL_F('l', "exclude-modes-oids-stages",
-                          &o.exclude_modes_oids_stages,
-                          N_("list conflicted files without modes/oids/stages"),
-                          PARSE_OPT_NONEG),
+               OPT_STRING(0, "conflict-format", &o.conflict_format, N_("format"),
+                          N_("specify a custom format to use for conflicted files")),
+               OPT_BOOL(0, "unique-conflicts", &o.unique_conflicts,
+                        N_("omit duplicate --conflict-format lines")),
The latter of which you didn't include in the manual?  Also,
unique_conflicts seems like something that is trivial to understand
from the coding perspective, but probably requires quite a bit more
explanation in the manual.  For example, if objectname is included
in the format, unique-conflicts is essentially a no-op.  And that's
the default...so, you'd probably have to spend time in the manual
explaining under what circumstances it's useful.  I'm also not sure if
a user who wanted (mode, path) would want unique_conflicts to default
to 1; it may be something only meaningful in the particular case of
"just give me conflicted filenames".
                OPT_BOOL_F(0, "allow-unrelated-histories",
                           &o.allow_unrelated_histories,
                           N_("allow merging unrelated histories"),
                           PARSE_OPT_NONEG),
+               OPT__ABBREV(&o.abbrev),
                OPT_END()
        };
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 4de089d976d..e6354b2d284 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -93,7 +93,7 @@ test_expect_success 'Barf on too many arguments' '
 '

 test_expect_success 'test conflict notices and such' '
-       test_expect_code 1 git merge-tree --write-tree --exclude-modes-oids-stages side1 side2 >out &&
+       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" side1 side2 >out &&
        sed -e "s/[0-9a-f]\{40,\}/HASH/g" out >actual &&

        # Expected results:
@@ -115,8 +115,35 @@ test_expect_success 'test conflict notices and such' '
        test_cmp expect actual
 '

+test_expect_success 'merge-tree --unique-conflicts is the default' '
+       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" --no-messages side1 side2 >out &&
+       sed 1d <out >actual &&
+       cat >expect <<-\EOF &&
+       greeting
+       whatever~side1
+       EOF
+       test_cmp expect actual &&
+
+       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" --no-messages side1 side2 >out2 &&
+       sed 1d <out2 >actual2 &&
+       test_cmp actual actual2
+'
+
+test_expect_success 'merge-tree --no-unique-conflicts' '
+       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" --no-unique-conflicts --no-messages side1 side2 >out &&
+       sed 1d <out >actual &&
+       cat >expect <<-\EOF &&
+       greeting
+       greeting
+       greeting
+       whatever~side1
+       whatever~side1
+       EOF
+       test_cmp expect actual
+'
+
 test_expect_success 'Just the conflicted files without the messages' '
-       test_expect_code 1 git merge-tree --write-tree --no-messages --exclude-modes-oids-stages side1 side2 >out &&
+       test_expect_code 1 git merge-tree --write-tree --no-messages --conflict-format="%(path)" side1 side2 >out &&
        sed -e "s/[0-9a-f]\{40,\}/HASH/g" out >actual &&

        test_write_lines HASH greeting whatever~side1 >expect &&