Re: [RFC] send-email: UTF-8 encoding in subject line

From: D. Ben Knoble <hidden>
Date: 2026-02-22 15:00:40

On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
[off-list ref] wrote:

That makes sense, I tried it below.
I also wondered whether, in addition to this, it might be helpful to warn on
an invalid charset, and/or possibly fall back to UTF-8.

Agreed on the first half of the statement, if we have an easy and
portable way to tell if a given random string names a valid charset.
I do not recommend to "fall back" to anything, if we are asking an
input from the user.

Following up on this, I tried adding a warning when the provided charset
does not appear to be valid. Current flow is,

  Which 8bit encoding should I declare [UTF-8]? y
  Are you sure you want to use <y> [y/N]? y

With the additional check, it becomes,

  Which 8bit encoding should I declare [default: UTF-8]? y
  warning: 'y' does not appear to be a valid charset name.
  Are you sure you want to use <y> [y/N]?

This uses find_encoding() from Perl’s Encode module to detect any
unrecognized charset names.

Let me know what you think.
Also, is there any new test that should be added for this change?

Signed-off-by: Shreyansh Paliwal <redacted>
---
 git-send-email.perl | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index cd4b316ddc..e62fa259ba 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl

@@ -23,6 +23,7 @@
 use Git::LoadCPAN::Error qw(:try);
 use Git;
 use Git::I18N;
+use Encode qw(find_encoding);

 Getopt::Long::Configure qw/ pass_through /;

@@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
        foreach my $f (sort keys %broken_encoding) {
                print "    $f\n";
        }
-       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
-                                 valid_re => qr/.{4}/, confirm_only => 1,
-                                 default => "UTF-8");
+       while (1) {
+               my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
+                       valid_re => qr/^\S+$/,
+                       default  => "UTF-8");

Here we change things, right?

- The original validation is "at least 4 characters"; the new
validation is "at least one non-blank, and nothing but non-blanks."
I'm not sure why we'd prefer one or the other, frankly. The original
goes back to 852a15d748 (send-email: ask confirmation if given
encoding name is very short, 2015-02-13), which is motivated by the
same problem we're discussing here!
- We get rid of confirm_only, since we're about to roll our own
confirmation below:

+               next unless defined $encoding;
+               if (find_encoding($encoding)) {
+                       $auto_8bit_encoding = $encoding;
+                       last;
+               }
+               printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
+               my $yesno = ask(
+                       sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
+                       valid_re => qr/^(?:y|n)/i,
+                       default  => 'n');

…which might want to be refactored a bit so it can stay close to the
original? I'm not sure.

+               if (defined $yesno && $yesno =~ /^y/i) {
+                       $auto_8bit_encoding = $encoding;
+                       last;
+               }
+       }
 }

 if (!$force) {
--
2.53.0
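
To make the difference between the two valid_re patterns concrete,
here is a small standalone sketch (not part of the patch). The helper
name check_name() and the sample inputs are made up for illustration;
only the two regexes and the find_encoding() call are taken from the
code quoted above.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# The two validation patterns under discussion.
my $old_re = qr/.{4}/;     # original: any four consecutive characters
my $new_re = qr/^\S+$/;    # patch: one or more non-whitespace characters only

# Hypothetical helper (not in git-send-email.perl): report how a
# candidate charset name fares against each check.
sub check_name {
	my ($name) = @_;
	return {
		old_re => ($name =~ $old_re ? 1 : 0),
		new_re => ($name =~ $new_re ? 1 : 0),
		known  => (defined find_encoding($name) ? 1 : 0),
	};
}

# Illustrative inputs: a one-letter typo, the default, and four spaces.
for my $name ("y", "UTF-8", "    ") {
	my $r = check_name($name);
	printf "%-8s old_re=%d new_re=%d find_encoding=%d\n",
		"'$name'", $r->{old_re}, $r->{new_re}, $r->{known};
}
```

A lone "y" fails the old pattern but passes the new one, while four
spaces do the opposite. So with the new pattern, the find_encoding()
lookup is the only thing left that catches a short typo like "y".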



-- 
D. Ben Knoble