Re: [bug] git diff --word-diff gives wrong result for utf-8 chinese
From: Phillip Wood <hidden>
Date: 2022-12-01 14:51:37
Hi Ping

On 01/12/2022 07:33, Ping Yin wrote:
> If the rule is "break on ascii whitespace", is there a way to achieve
> this: break English by word, and break Chinese by utf-8 character?
You could extend your current regex so that it matches whole utf-8
codepoints, which is what git does for the builtin userdiff regexes.
I've not tested it, but I think

    git config --global diff.wordregex "[[:alnum:]_]+|[^[:space:]]|$(printf '[\xc0-\xff][\x80-\xbf]+')"

should work. The downside is that you end up with a .gitconfig that is
not valid utf-8. Perhaps someone else has a clever idea to get around
that.

Best Wishes

Phillip
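
For illustration only, here is a minimal Python sketch of how a pattern of that shape would split a line into words. It is an approximation, not what git runs: git uses the C library's regex engine, whereas Python's `re` module takes the first alternative that matches rather than the longest, so the multi-byte alternative is listed before the single-character catch-all here. The name `split_words` is made up for this example, and the character classes are written out as their ASCII equivalents.

```python
import re

# Byte-level approximation of the suggested word regex:
#   [[:alnum:]_]+        -> runs of ASCII letters, digits and underscore
#   [\xc0-\xff][\x80-\xbf]+ -> one UTF-8 lead byte plus its continuation bytes
#   [^[:space:]]         -> any other single non-whitespace byte
# Assumption: the alternatives are reordered because Python's re is
# leftmost-first, not leftmost-longest.
WORD_RE = re.compile(rb"[A-Za-z0-9_]+|[\xc0-\xff][\x80-\xbf]+|[^ \t\n\r\f\v]")


def split_words(text):
    """Return the list of 'words' the pattern finds in text."""
    data = text.encode("utf-8")
    return [m.group().decode("utf-8") for m in WORD_RE.finditer(data)]


if __name__ == "__main__":
    print(split_words("git diff 中文 ok"))
```

With this sketch, English text is split into whole words while each Chinese character becomes its own word, which is the behaviour asked about in the quoted question.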