Re: [GSoC PATCH 1/1] diff: improve scaling of filenames in diffstat to handle UTF-8 chars
From: Lorenzo Pegorari <hidden>
Date: 2026-01-16 00:00:37
On Wed, Jan 14, 2026 at 02:50:02PM -0800, Junio C Hamano wrote:
LorenzoPegorari [off-list ref] writes:
The `show_stats()` function tries to scale the filenames in the diffstat to ensure they don't exceed the given `name-width`. It does so by calculating the "display width" of the characters to be dropped, but then advances the filename pointer by that number of bytes.

However, the "display width" of a character is not always equal to its byte count. The result is that sometimes, when displaying UTF-8 characters, filenames exceed the given `name-width`, and frequently the bytes of the UTF-8 characters are truncated.

The following is an example of the issue, where the 2 files are "HelloHi" and "Hello你好", and `name-width=6`:

    ...oHi | 0
    ...<BD><A0>好 | 0

Make the filename pointer move by the actual number of bytes of the characters to drop from the filename, rather than their display width, using the `utf8_width()` function.

Signed-off-by: LorenzoPegorari <redacted>
---
 diff.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

Two comments and a half.

* The change needed for this is surprisingly simple.
It is indeed surprisingly simple, I agree!
* You already know about samples that may exhibit the issue you are addressing. Can we add it as a test case somewhere in t/ directory?
Yeah, we should add a test case. I will do it in the next reroll.
* The NEEDSWORK item addressed by this patch is one of the two NEEDSWORK items added by ce8529b2 (diff: leave NEEDWORK notes in show_stats() function, 2022-10-21). Makes me wonder how involved the changes would need to be to solve the other one?
Mmh, I see. I'll take a closer look, but at first glance it doesn't seem too involved.
Thanks.
Thank you!
diff --git a/diff.c b/diff.c
index a68ddd2168..271ace5728 100644
--- a/diff.c
+++ b/diff.c
@@ -2859,17 +2859,10 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 				char *slash;
 				prefix = "...";
 				len -= 3;
-				/*
-				 * NEEDSWORK: (name_len - len) counts the display
-				 * width, which would be shorter than the byte
-				 * length of the corresponding substring.
-				 * Advancing "name" by that number of bytes does
-				 * *NOT* skip over that many columns, so it is
-				 * very likely that chomping the pathname at the
-				 * slash we will find starting from "name" will
-				 * leave the resulting string still too long.
-				 */
-				name += name_len - len;
+
+				while (name_len > len)
+					name_len -= utf8_width((const char**)&name, NULL);
+
 				slash = strchr(name, '/');
 				if (slash)
 					name = slash;
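
For anyone following along, here is a minimal standalone sketch of the difference between the old and the new pointer arithmetic in the hunk above. It is not the code from diff.c: `sketch_char_width()`, `sketch_str_width()`, `skip_by_bytes()` and `skip_by_chars()` are hypothetical names, and the width rule is a deliberate simplification (only CJK Unified Ideographs count as two columns), standing in for the real `utf8_width()`. It assumes valid UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for utf8_width(): decode one UTF-8 sequence at
 * *s, advance *s past it, and return its display width in columns.
 * Simplification (an assumption of this sketch): only U+4E00..U+9FFF
 * is treated as double-width; everything else is one column.
 */
int sketch_char_width(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	unsigned long cp;
	int extra;

	assert(p != NULL);

	if (p[0] < 0x80) {
		cp = p[0];
		extra = 0;
	} else if ((p[0] & 0xe0) == 0xc0) {
		cp = p[0] & 0x1f;
		extra = 1;
	} else if ((p[0] & 0xf0) == 0xe0) {
		cp = p[0] & 0x0f;
		extra = 2;
	} else {
		cp = p[0] & 0x07;
		extra = 3;
	}

	p++;
	while (extra-- > 0)
		cp = (cp << 6) | (unsigned long)(*p++ & 0x3f);

	*s = (const char *)p;
	return (cp >= 0x4e00 && cp <= 0x9fff) ? 2 : 1;
}

/* Total display width of a NUL-terminated UTF-8 string. */
int sketch_str_width(const char *s)
{
	int width = 0;

	while (*s)
		width += sketch_char_width(&s);
	return width;
}

/* Old behaviour: use the number of columns to drop as a byte offset. */
const char *skip_by_bytes(const char *name, int len)
{
	return name + (sketch_str_width(name) - len);
}

/* Patched behaviour: drop whole characters until the name fits. */
const char *skip_by_chars(const char *name, int len)
{
	int name_len = sketch_str_width(name);

	while (*name && name_len > len)
		name_len -= sketch_char_width(&name);
	return name;
}
```

With `len = 3` (a `name-width` of 6 minus the three columns of the "..." prefix), `skip_by_bytes()` on "Hello你好" lands in the middle of "你" and leaves the stray bytes `<BD><A0>` in front of "好", as in the example from the commit message, while `skip_by_chars()` returns just "好". For the pure-ASCII "HelloHi" both return "oHi".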