Thread: 79 messages, 6 authors, 2019-11-11

Re: [PATCH v2 3/3] sequencer: reencode to utf-8 before arrange rebase's todo list

From: Danh Doan <hidden>
Date: 2019-11-02 01:02:21

On 2019-11-01 12:59:21 -0400, Jeff King wrote:
On Fri, Nov 01, 2019 at 03:25:11PM +0700, Doan Tran Cong Danh wrote:
for encoding in utf-8 iso-8859-1; do
	# commit using the encoding
	echo $encoding >file && git add file
	echo "éñcödèd with $encoding" | iconv -f utf-8 -t $encoding |
	  git -c i18n.commitEncoding=$encoding commit -F -
	# and then fixup without it
	echo "$encoding fixed" >file && git add file
	git commit --fixup HEAD
done
git rebase -i --autosquash --root
Is it worth adding this as a test in t3900?
I think so, but it needs a little more work.
I'll make it a separate patch in a re-roll.
 		parse_commit(item->commit);
-		commit_buffer = get_commit_buffer(item->commit, NULL);
+		commit_buffer = logmsg_reencode(item->commit, NULL, "UTF-8");
I think there are several other spots in this file that could use the
same treatment. But I can live with it if you want to just fix the one
that's bugging you and move on. It's still a strict improvement.
There are 6 more occurrences of get_commit_buffer in sequencer.c, and
13 occurrences in other C source files. I'll try to figure out whether
it's safe to change them.
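
A count like this can be reproduced with a small helper, sketched here
under the assumption of a checkout of git.git and a grep that supports
the common -o option. The name count_calls is made up for illustration
and is not part of git:

```shell
# count_calls is a hypothetical helper, not something shipped with git.
# It prints how many times NAME appears in the given files, counting
# every match rather than every matching line. Matching is by
# substring, so longer identifiers that contain NAME are counted too.
count_calls () {
	name=$1
	shift
	cat "$@" | grep -o "$name" | wc -l | tr -d ' '
}
```

For example, `count_calls get_commit_buffer sequencer.c` run from the
top of the source tree. The number will differ between revisions.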

Anyway, if we're going to work with a single encoding internally,
can we take the other extreme approach: reencode the commit message to
utf-8 before writing the commit object? (Is there any code point in
another encoding that can't be reencoded to utf-8?)
git-log and friends currently do a 2-step conversion of the commit
message (reencode to utf-8 first, then reencode again to
get_log_output_encoding()). With this new approach, the first step is
likely a no-op (but must be kept for backward compatibility).
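
To make the 2-step conversion concrete, here is a rough sketch using
iconv(1). It only mimics the idea from the command line and is not the
code path git uses; reencode_two_step is a hypothetical name:

```shell
# reencode_two_step is a hypothetical helper, not git's implementation.
# usage: reencode_two_step <commit-encoding> <output-encoding> <message
# Step 1 converts the stored message to UTF-8 (effectively a no-op when
# the commit is already UTF-8); step 2 converts UTF-8 to the encoding
# requested for output.
reencode_two_step () {
	commit_encoding=$1
	output_encoding=$2
	iconv -f "$commit_encoding" -t UTF-8 |
	iconv -f UTF-8 -t "$output_encoding"
}
```

For example, `printf '\351\n' | reencode_two_step ISO-8859-1 UTF-8`
reads a Latin-1 "é" and writes it as UTF-8. If commits were always
stored as UTF-8, the first iconv call would have nothing to change,
which is the case described above.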

-- 
Danh