Re: [BUG] Unicode filenames handling in `git log --stat`
From: Alexander Meshcheryakov <hidden>
Date: 2022-08-09 19:12:39
Hi Calvin,
Sure, let me demonstrate with a clean git repo:
mkdir git_test; cd git_test
git init
touch Київ.txt Kyiv.txt Маріуполь.txt Mariupol.txt
git add -A
git commit -m 'foobar'
Now let's use GNU awk to check the width of the `git log --stat` lines in bytes:
$ git log --stat | LC_ALL=C awk '/txt/{print length($0), $0}'
27  Kyiv.txt               | 0
27  Mariupol.txt           | 0
27  Київ.txt           | 0
27  Маріуполь.txt | 0
And the width of the same lines in Unicode characters:
$ git log --stat | LC_ALL=en_US.UTF-8 awk '/txt/{print length($0), $0}'
27  Kyiv.txt               | 0
27  Mariupol.txt           | 0
23  Київ.txt           | 0
18  Маріуполь.txt | 0
See, all lines are padded to a length of 27 bytes. But on the screen
the output looks misaligned, because the lengths in characters differ.
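
The same difference can be checked per filename without depending on
which locales are installed. This is only a minimal sketch, assuming the
names are UTF-8 encoded; `byte_len` and `char_len` are helper names made
up for illustration and are not part of git:

```shell
# Hypothetical helpers for illustration; they are not part of git.

# Length of a string in bytes.
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Length in Unicode code points, assuming UTF-8 input: drop the
# continuation bytes (0x80-0xBF) and count the bytes that are left.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

for name in Kyiv.txt Mariupol.txt Київ.txt Маріуполь.txt; do
    printf '%s bytes, %s characters: %s\n' \
        "$(byte_len "$name")" "$(char_len "$name")" "$name"
done
```

Each Cyrillic letter takes two bytes in UTF-8, so Київ.txt is 4
characters shorter than its byte count and Маріуполь.txt is 9 shorter.
That matches the 23 and 18 above (27 - 4 and 27 - 9). Code points are
only an approximation of screen columns in general, but for these names
the two agree.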
On Tue, 9 Aug 2022 at 22:20, Calvin Wan [off-list ref] wrote:
> Hi Alexander,
>
> Thank you for the report! I attempted to reproduce with the steps you
> provided, but was unable to do so. What commands would I have to run
> on a clean git repository to reproduce this?
>
> - Calvin