Re: [PATCH v2 3/3] sequencer: reencode to utf-8 before arrange rebase's todo list
From: Danh Doan <hidden>
Date: 2019-11-02 01:02:21
On 2019-11-01 12:59:21 -0400, Jeff King wrote:
> On Fri, Nov 01, 2019 at 03:25:11PM +0700, Doan Tran Cong Danh wrote:
>
> > 	for encoding in utf-8 iso-8859-1; do
> > 		# commit using the encoding
> > 		echo $encoding >file && git add file
> > 		echo "éñcödèd with $encoding" |
> > 		  iconv -f utf-8 -t $encoding |
> > 		  git -c i18n.commitEncoding=$encoding commit -F -
> > 		# and then fixup without it
> > 		echo "$encoding fixed" >file && git add file
> > 		git commit --fixup HEAD
> > 	done
> > 	git rebase -i --autosquash --root
>
> Is it worth adding this as a test in t3900?
I think so, but it needs a little more work. I'll make it a separate patch in a re-roll.
> >  		parse_commit(item->commit);
> > -		commit_buffer = get_commit_buffer(item->commit, NULL);
> > +		commit_buffer = logmsg_reencode(item->commit, NULL, "UTF-8");
>
> I think there are several other spots in this file that could use the
> same treatment. But I can live with it if you want to just fix the one
> that's bugging you and move on. It's still a strict improvement.
There are 6 more occurrences of get_commit_buffer in sequencer.c, and 13 occurrences in other C source files. I'll try to figure out whether it's safe to change them.

Anyway, if we're going to work with a single encoding internally, can we take the other extreme approach: reencode the commit message to utf-8 before writing the commit object? (Is there any codepoint in another encoding that can't be reencoded to utf-8?)

git-log and friends currently do a 2-step conversion of the commit message (reencode to utf-8 first, then reencode again to get_log_output_encoding()). With this new approach, the first step would likely be a noop (but it must be kept for backward compatibility).

-- 
Danh
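
The 2-step conversion mentioned above can be imitated with iconv(1). This is only a rough sketch: git performs the conversion in C rather than by calling iconv(1), the helper `hex_of` and the variable names are made up for illustration, and the output encoding is assumed to be ISO-8859-1.

```shell
#!/bin/sh
# Illustration only: imitate "reencode to utf-8, then to the output
# encoding" with iconv(1). git itself does not shell out like this.

# hex_of (hypothetical helper): print stdin as a lowercase hex string.
hex_of () {
	od -An -tx1 | tr -d ' \n'
}

# A message as stored in a commit whose encoding is ISO-8859-1:
# "café", with "é" as the single byte 0xE9 (octal 351).
stored_hex=$(printf 'caf\351' | hex_of)

# Step 1: reencode the stored bytes to UTF-8.
utf8_hex=$(printf 'caf\351' |
	iconv -f ISO-8859-1 -t UTF-8 | hex_of)

# Step 2: reencode from UTF-8 to the output encoding
# (assumed to be ISO-8859-1 for this sketch).
output_hex=$(printf 'caf\351' |
	iconv -f ISO-8859-1 -t UTF-8 |
	iconv -f UTF-8 -t ISO-8859-1 | hex_of)

# If the message had been stored as UTF-8 in the first place,
# step 1 would leave the bytes unchanged.
noop_hex=$(printf 'caf\303\251' |
	iconv -f UTF-8 -t UTF-8 | hex_of)

echo "stored: $stored_hex"
echo "utf-8:  $utf8_hex"
echo "output: $output_hex"
echo "no-op:  $noop_hex"
```

Comparing the hex strings shows which step actually changes the bytes: for a message already stored as UTF-8, only the second step does any work.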