[PATCH] utf-8: include RFC 3629 and clarify endianness which is left ambiguous
From: Shawn Landden <hidden>
Date: 2015-05-26 06:53:06
Subsystem:
the rest · Maintainer:
Linus Torvalds
The endianness is suggested by the order the bytes are displayed, but the
text is ambiguous.
---
 man7/utf-8.7 | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/man7/utf-8.7 b/man7/utf-8.7
index 597fad4..bbb016c 100644
--- a/man7/utf-8.7
+++ b/man7/utf-8.7
@@ -133,12 +133,14 @@ The sequence to be used depends on the UCS code number of the character:
 The
 .I xxx
 bit positions are filled with the bits of the character code number in
-binary representation.
+binary representation, most significant bit first (big-endian).
 Only the shortest possible multibyte sequence
 which can represent the code number of the character can be used.
 .PP
 The UCS code values 0xd800\(en0xdfff (UTF-16 surrogates) as well as 0xfffe and
-0xffff (UCS noncharacters) should not appear in conforming UTF-8 streams.
+0xffff (UCS noncharacters) should not appear in conforming UTF-8 streams. According
+to RFC 3629 no point above U+10FFFF should be used, which limits characters to four
+bytes.
 .SS Example
 The Unicode character 0xa9 = 1010 1001 (the copyright sign) is encoded
 in UTF-8 as
--
2.2.1.209.g41e5f3a
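
For illustration only, here is a minimal Python sketch of the rule the patched
text describes: the xxx bit positions are filled most significant bit first,
and code points above U+10FFFF or in the surrogate range are refused. The
function name utf8_encode is hypothetical and is not part of the man page or
of any library. The sketch does not reject the noncharacters 0xfffe and
0xffff, which the page says should not appear in conforming streams.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (hypothetical helper, illustration only)."""
    if cp < 0 or cp > 0x10FFFF:
        # RFC 3629 caps UTF-8 at U+10FFFF, so four bytes are always enough.
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        # UTF-16 surrogates must not appear in a UTF-8 stream.
        raise ValueError("UTF-16 surrogate code point")
    if cp < 0x80:
        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx: the high bits go into the first byte.
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

With this sketch, utf8_encode(0xA9) yields the two bytes 0xc2 0xa9, matching
the copyright-sign example in the page. Each branch picks the shortest
sequence that can hold the code point, which is the other constraint the
surrounding text states.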