Re: [PATCH v7 3/4] notes.c: introduce '--separator=<paragraph-break>' option

From: Teng Long <hidden>
Date: 2023-03-29 14:20:00

Junio C Hamano [off-list ref] writes:

quoted

The default behavour sometimes is not enough, the user may want
to use a custom delimiter between paragraphs, like when
specifiy '-m', '-F', '-C', '-c' options. So this commit

"like when specifying", you mean?

"when executing"?

quoted

    $ git notes add -m foo -m bar --separator="-"
    $ git notes show HEAD | cat
    foo
    -
    bar

A newline is added to the value given to --separator if it

"a newline is ...", because this is not starting a new sentence, but
continuing from the "for example, when ..." above.

"a newline is ...", because this is not starting a new sentence, but
continuing from the "for example, when ..." above.

quoted

does not end with one already. So execute:

      $ git notes add -m foo -m bar --separator="-"
and
      $ export LF="
      "
      $ git notes add -m foo -m bar --separator="-$LF"

Running A and B produces the same result.

Here, it is totally unclear what A and B refers to.  "both
... and ... produce the same result" or something?

will fix, thanks.

quoted

@@ -101,6 +102,9 @@ struct note_data {
 	int use_editor;
 	char *edit_path;
 	struct strbuf buf;
+	struct strbuf **messages;
+	size_t msg_nr;
+	size_t msg_alloc;
 };

Hmph, OK.  I'd think twice before allowing an array of strbufs,
though.  strbuf is an excellent data structure to allow editing
string safely, and an array of strbufs may be very useful when these
strings need to be edited simultaneously, but such a use case is
very rare and I suspect this would not be it---rather, don't we take
one string at a time while processing each option, perhaps running
stripspace, and then after that aren't we done with the string until
when we finally have these N strings to loop over and concatenate
into a single string?  It would be sensible to use a strbuf to
produce the concatenation, but the strings to be concatenated do not
have to come from strbufs.  So my intuition, before reading the code
but after seeing the data structure, says that "const char **messages"
might be more appropriate here.  It sends a strong message about the
code structure and "phases" of processing, i.e. "we read each piece
and store it in the array once we are done preprocessing; then in
the next phase we concatenate them".

I used string in previous patch, but I found there will be a problem when
specify a binary file to '-F', that is, a binary file maybe contains NUL in
the middle, then if we feed the content to a string, it will be cut off when
appending to the 'string_list'.  So, it seems that strbuf could solve the
issue (will be reproduced as a failure in t3307).

quoted

@@ -209,71 +213,110 @@ static void write_note_data(struct note_data *d, struct object_id *oid)
 	}
 }

+static void insert_separator(struct strbuf *message, size_t pos)
+{
+	if (!separator)
+		strbuf_insertstr(message, pos, "\n");
+	else if (separator[strlen(separator) - 1] == '\n')
+		strbuf_insertstr(message, pos, separator);
+	else
+		strbuf_insertf(message, pos, "%s%s", separator, "\n");
+}

Do we need "insert" separator?  The caller in "concat" certainly
shouldn't---all it needs is "append".

OK, I'm willing to try to do that, "insert at a position" actually because
there is a related codepath need to add the separator in the very beginning.

quoted

+/* Consume messages and append them into d->buf, then free them */
+static void concat_messages(struct note_data *d)
+{
+	struct strbuf msg = STRBUF_INIT;
+
+	size_t i;
+	for (i = 0; i < d->msg_nr ; i++) {
+		if (d->buf.len)
+			insert_separator(&d->buf, d->buf.len);
+		strbuf_add(&msg, d->messages[i]->buf, d->messages[i]->len);
+		strbuf_addbuf(&d->buf, &msg);
+		strbuf_reset(&msg);
+		strbuf_release(d->messages[i]);
+		free(d->messages[i]);
+	}
+	strbuf_release(&msg);
+	free(d->messages);
+}

As I suspected earlier, with the way d->messages[i] are used, they
have no reason to be strbufs---they can and should be a simple
string "const char *".

As I said about why I choose strbuf in the above, but I expect your
further advice on that.

quoted

 static int parse_msg_arg(const struct option *opt, const char *arg, int unset)
 {
 	struct note_data *d = opt->value;
+	struct strbuf *buf;

 	BUG_ON_OPT_NEG(unset);

-	if (d->buf.len)
-		strbuf_addch(&d->buf, '\n');
-	strbuf_addstr(&d->buf, arg);
-	strbuf_stripspace(&d->buf, 0);
+	buf = xmalloc(sizeof(*buf));
+	strbuf_init(buf, strlen(arg));
+	strbuf_addstr(buf, arg);
+	strbuf_stripspace(buf, 0);

-	d->given = 1;
+	ALLOC_GROW_BY(d->messages, d->msg_nr, 1, d->msg_alloc);
+	d->messages[d->msg_nr - 1] = buf;
 	return 0;
 }

And this one can use an on-stack strbuf (initialized with STRBUF_INIT),
and strbuf_detach() the result into d->messages[].

quoted

 static int parse_file_arg(const struct option *opt, const char *arg, int unset)
 {
 	struct note_data *d = opt->value;
+	struct strbuf *buf;

Likewise.

For the same reason as above I think.

quoted

 	BUG_ON_OPT_NEG(unset);

-	if (d->buf.len)
-		strbuf_addch(&d->buf, '\n');
+	buf = xmalloc(sizeof(*buf));
+	strbuf_init(buf, 0);
+
 	if (!strcmp(arg, "-")) {
-		if (strbuf_read(&d->buf, 0, 1024) < 0)
+		if (strbuf_read(buf, 0, 1024) < 0)
 			die_errno(_("cannot read '%s'"), arg);
-	} else if (strbuf_read_file(&d->buf, arg, 1024) < 0)
+	} else if (strbuf_read_file(buf, arg, 1024) < 0)
 		die_errno(_("could not open or read '%s'"), arg);
-	strbuf_stripspace(&d->buf, 0);
+	strbuf_stripspace(buf, 0);

-	d->given = 1;
+	// we will note stripspace the buf here, because the file maybe
+	// is a binary and it maybe contains multiple continuous line breaks.

No // comments in our codebase.

Will fix.

quoted

@@ -567,7 +617,8 @@ static int append_edit(int argc, const char **argv, const char *prefix)
 	const struct object_id *note;
 	char *logmsg;
 	const char * const *usage;
-	struct note_data d = { .buf = STRBUF_INIT };
+	struct note_data d = { .buf = STRBUF_INIT, .messages = NULL };

Why explicitly initialize .messages to NULL when we are leaving
other members to zero initialized implicitly by using designated
initializers?

My bad, will fix.

quoted

@@ -618,7 +675,7 @@ static int append_edit(int argc, const char **argv, const char *prefix)
 		char *prev_buf = read_object_file(note, &type, &size);

 		if (d.buf.len && prev_buf && size)
-			strbuf_insertstr(&d.buf, 0, "\n");
+			insert_separator(&d.buf, 0);
 		if (prev_buf && size)
 			strbuf_insert(&d.buf, 0, prev_buf, size);
 		free(prev_buf);

Having to insert two things in front of d.buf feels somewhat
wasteful. ...

"wasteful" for?

... I wonder if we can easily reorder the logic to first read
the previous one, and then append new stuff to it, perhaps?

I think it can be done in this way, but in another single commit, perhaps?

It wouldn't be a huge deal.  If this weren't the only reason why we
need to invent insertstr that takes "where", I wouldn't even be
mentioning it.

This makes me think that there is a need for a method 'insert_separator'? For
example, to initialize 'separator' and then use 'strbuf_insertstr' directly?
I'm not sure, but it feels like it could work.

quoted

diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
index 3288aaec..716192b5 100755
--- a/t/t3301-notes.sh
+++ b/t/t3301-notes.sh

@@ -5,6 +5,7 @@

 test_description='Test commit notes'

+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh

This is a strange commit to add this.  If the test and the code
involved in the test were leak-sanitizer clean before this commit,
then you should have been able to insert this without any other
change, and we should do it that way in a separate commit.  If we
are fixing an existing leak that the sanitizer complains, then that
is a different matter.  If that is the case, it makes perfect sense
to have this here, but the proposed commit log message should
explain that it is what is happening.

Will fix.

Thanks.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help