Thread (12 messages) 12 messages, 4 authors, 2018-12-29
STALE2735d

[PATCH 1/2] Documentation: document UTF-16-related behavior

From: brian m. carlson <hidden>
Date: 2018-12-27 02:17:49
Subsystem: documentation, the rest · Maintainers: Jonathan Corbet, Linus Torvalds

There are a number of broken Windows programs which want to process
files in a UTF-16 variant that is always little endian and always
contains a BOM. Git cannot produce or accept such an encoding for the
working-tree-encoding because no such encoding has been defined with
IANA or implemented in iconv(3).

Document this behavior since it is a frequent source of confusion for
users. Additionally, document that specifying "UTF-16" may produce bytes
of either endianness, but will be sure to provide a BOM to distinguish.

Signed-off-by: brian m. carlson <redacted>
---
 Documentation/gitattributes.txt | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index b8392fc330..2b2c93afd1 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -330,6 +330,11 @@ That operation will fail and cause an error.
 - Reencoding content requires resources that might slow down certain
   Git operations (e.g 'git checkout' or 'git add').
 
+- It is not possible to specify a variant of UTF-16 with a BOM and a
+  specified endianness, because no such variants have been standardized.
+  Using "UTF-16" will produce a BOM with an unspecified endianness, and
+  using "UTF-16LE" or "UTF-16BE" will prohibit a BOM from being used.
+
 Use the `working-tree-encoding` attribute only if you cannot store a file
 in UTF-8 encoding and if you want Git to be able to process the content
 as text.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help