Re: [PATCH v4 8/8] sequencer: reencode commit message for am/rebase --show-current-patch
From: Danh Doan <hidden>
Date: 2019-11-07 07:49:04
On 2019-11-07 01:32:23 -0500, Jeff King wrote:
On Thu, Nov 07, 2019 at 09:56:19AM +0700, Doan Tran Cong Danh wrote:quoted
The message file will be used as commit message for the git-{am,rebase} --continue. [...] strbuf_addf(&buf, "%s/message", get_dir(opts)); if (!file_exists(buf.buf)) { - const char *commit_buffer = get_commit_buffer(commit, NULL); + const char *encoding = get_commit_output_encoding(); + const char *commit_buffer = logmsg_reencode(commit, NULL, encoding);That makes sense, though it's hard to understand the flow of this data through multiple sequencer invocations. I _think_ this would be fixing a case like this: -- >8 -- git init repo cd repo # some commits to build off of echo base >file git add file git commit -m base echo side >file git add file git commit -m side # now make a commit in iso8859-1 git checkout -b side HEAD^ echo iso8859-1 >file git add file iconv -f utf8 -t iso8859-1 <<-\EOF | súbject bödy EOF git -c i18n.commitEncoding=iso8859-1 commit -F - # and rebase it with the merge strategy, which will fail; # now .git/rebase-merge/message has iso8859-1 in it git rebase -m master # and if we resolve and commit, presumably we'd get a broken commit, # with iso8859-1 and no encoding header echo resolved >file git add file GIT_EDITOR=: git rebase --continue -- 8< -- But somehow it all seems to work. The resulting commit has real utf8 in it. I'm not sure if we pull it from the original commit via "commit -c",
Yes, somehow it worked. But, without this patch, git also warns:
% GIT_EDITOR=: git rebase --continue
Warning: commit message did not conform to UTF-8.
You may want to amend it after fixing the message, or set the config
variable i18n.commitencoding to the encoding your project uses.
Checking with strace (on glibc, musl strace can't trace execve):
[pid 12848] execve("/home/danh/workspace/git/git", ["/home/danh/workspace/git/git", "commit", "-n", "-F", ".git/rebase-merge/message", "-e", "--allow-empty"], 0x558fb02e8240 /* 51 vars */) = 0
Turn out, it's because of: commit.c::verify_utf8
/*
* This verifies that the buffer is in proper utf8 format.
*
* If it isn't, it assumes any non-utf8 characters are Latin1,
* and does the conversion.
*/
static int verify_utf8(struct strbuf *buf)
Hence, your test is just pure luck (because it's in latin1).
--
Danh