Re: [RFC] send-email: UTF-8 encoding in subject line
From: Shreyansh Paliwal <hidden>
Date: 2026-02-22 15:56:11
On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal [off-list ref] wrote:
>>>> That makes sense, I tried it below. I also wondered whether, in addition to this, it might be helpful to warn on an invalid charset, and/or possibly fall back to UTF-8.

>>> Agreed on the first half of the statement, if we have an easy and portable way to tell if a given random string names a valid charset. I do not recommend to "fall back" to anything, if we are asking an input from the user.

>> Following up on this, I tried adding a warning when the provided charset does not appear to be valid.
>>
>> Current flow is,
>>
>>     Which 8bit encoding should I declare [UTF-8]? y
>>     Are you sure you want to use <y> [y/N]? y
>>
>> With the additional check, it becomes,
>>
>>     Which 8bit encoding should I declare [default: UTF-8]? y
>>     warning: 'y' does not appear to be a valid charset name.
>>     Are you sure you want to use <y> [y/N]?
>>
>> This uses find_encoding() from Perl’s Encode module to detect any unrecognized charset names. Let me know what you think. Also, is there any new test that should be added for this change?
>>
>> Signed-off-by: Shreyansh Paliwal <redacted>
>> ---
>>  git-send-email.perl | 23 ++++++++++++++++++++---
>>  1 file changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/git-send-email.perl b/git-send-email.perl
>> index cd4b316ddc..e62fa259ba 100755
>> --- a/git-send-email.perl
>> +++ b/git-send-email.perl
>> @@ -23,6 +23,7 @@
>>  use Git::LoadCPAN::Error qw(:try);
>>  use Git;
>>  use Git::I18N;
>> +use Encode qw(find_encoding);
>>  Getopt::Long::Configure qw/ pass_through /;
>> @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
>>  		foreach my $f (sort keys %broken_encoding) {
>>  			print " $f\n";
>>  		}
>> -		$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
>> -					  valid_re => qr/.{4}/, confirm_only => 1,
>> -					  default => "UTF-8");
>> +		while (1) {
>> +			my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
>> +				valid_re => qr/^\S+$/,
>> +				default => "UTF-8");

> Here we change things, right?
>
> - The original validation is "at least 4 characters", the new validation is "at least one non-blank."
>   I'm not sure why we'd prefer one or the other, frankly. The original goes to 852a15d748 (send-email: ask confirmation if given encoding name is very short, 2015-02-13), which is motivated by the same problem we're discussing here!
I see. My understanding of the earlier change (852a15d748) is that the length check was a heuristic to catch obviously invalid inputs like "y" and trigger an extra confirmation, on the assumption that charset names are at least 4 characters long. With the additional find_encoding() check, the validation becomes semantic rather than length-based: recognized charset names are accepted directly, while unrecognized ones trigger a warning and still require explicit confirmation. The relaxed regex (one or more non-whitespace characters) is only meant to ensure we receive some non-empty input before passing it to find_encoding().
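To illustrate the difference between the two checks, here is a minimal standalone sketch. It is not part of the patch: looks_like_charset() is a hypothetical helper name used only for this example. It also assumes that "recognized by Perl's Encode" is an acceptable approximation of "valid charset name", which may not hold for every name a mail client would accept.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper, for illustration only (not in git-send-email.perl).
# Returns 1 if the name is non-empty, contains no whitespace, and is
# recognized by Encode; returns 0 otherwise.
sub looks_like_charset {
    my ($name) = @_;
    return 0 unless defined $name && $name =~ /^\S+$/;
    # find_encoding() returns an encoding object for known names
    # (including aliases) and undef for unknown ones.
    return defined find_encoding($name) ? 1 : 0;
}

# Compare the old length-based heuristic with the Encode-based check.
for my $name ("UTF-8", "ISO-8859-1", "y", "yes!") {
    my $old = $name =~ /.{4}/ ? "passes" : "fails";
    my $new = looks_like_charset($name) ? "recognized" : "unrecognized";
    printf "%-12s length check: %-7s find_encoding: %s\n", $name, $old, $new;
}
```

The point of the comparison is that a string such as "yes!" satisfies the old four-character heuristic even though it names no encoding, whereas the find_encoding() lookup decides on the name itself.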
> - We get rid of confirm_only, since we're about to roll our own confirmation below:

>> +			next unless defined $encoding;
>> +			if (find_encoding($encoding)) {
>> +				$auto_8bit_encoding = $encoding;
>> +				last;
>> +			}
>> +			printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
>> +			my $yesno = ask(
>> +				sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
>> +				valid_re => qr/^(?:y|n)/i,
>> +				default => 'n');

> …which might want refactored a bit so it can stay close to the original? idk.
Actually, the flow needed to change slightly to insert the validity warning before the final confirmation step. Since ask() handles confirmation internally via confirm_only and is used in multiple places, it seemed simpler to keep the additional confirmation local here rather than modify ask() itself. Let me know what you think.

Best,
Shreyansh