Suppressing hyphenation (was: [PATCH] mctp.7: Add man page for Linux MCTP support)
From: G. Branden Robinson <hidden>
Date: 2021-11-22 01:09:25
[Jeremy Kerr dropped from CC--I hope that's okay]

Hi Alex,

Getting back to this after a month...

At 2021-10-18T09:53:54+0200, Alejandro Colomar (man-pages) wrote:
> On 10/18/21 9:16 AM, Alejandro Colomar (man-pages) wrote:
> > > So we might write
> > >
> > > .B struct\~\%sockaddr_mctp
> >
> > Okay.
>
> Actually, wouldn't it be better to just write?:
>
> .B \%struct\~sockaddr_mctp
>
> This way \% applies to the whole (even if it was unnecessary for
> 'struct\~').

In fact it does not apply to the whole; '\~' still counts as a word
delimiter to groff even if it is not a permissible location for a
"break" (line break).
Before I bust out the long explanation, I'll try to present some short
advice for man page writers.
* If you wish to suppress hyphenation with the '\%' escape sequence,
place it at the _beginning_ of each such word. Except for special
character escape sequences like '\-', '\(ha', and '\[aq]', most groff
escape sequences act as word boundaries, so you may need to specify
'\%' before each word in a series, as in '\%typedef\~int\~\%strsize'.
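
As a rough illustration of that advice, here is a minimal sketch of a shell helper that joins its arguments with '\~' and puts '\%' in front of every one of them. The name nohyphen is hypothetical, not part of groff or man-pages. Prefixing short words such as 'int' is unnecessary but should be harmless.

```shell
#!/bin/sh
# nohyphen (hypothetical helper): join the arguments with '\~' and prefix
# each one with '\%', because '\~' acts as a word boundary and '\%' only
# affects the word it begins.
nohyphen() {
    out=
    for w in "$@"
    do
        if [ -n "$out" ]
        then
            out="$out\\~"
        fi
        out="$out\\%$w"
    done
    printf '%s\n' "$out"
}

nohyphen struct sockaddr_mctp
```

The result can be pasted after a font macro such as '.B' in man page source.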
Now for the deeper dive.
As strange as it may seem, this is consistent with the behavior of
hyphenation when it encounters most other escape sequences[1] (most of
which a portable man page should not attempt to use). The key factor to
consider in matters of hyphenation suppression is where the _word
boundaries_ are, not where white space appears.
By contrast, anything that formats a glyph for output generally _is_
part of a word. But only glyphs that are part of natural language words
(in English, [A-Za-z]) are eligible for adjacent hyphenation.
Here's the documentation of '\%' (and '\:') from the Info documentation
of the forthcoming groff 1.23.0 release.
[[
-- Escape: \%
-- Escape: \:
To tell GNU 'troff' how to hyphenate words as they occur in input,
use the '\%' escape, also known as the "hyphenation character".
Each instance within a word indicates to GNU 'troff' that the word
may be hyphenated at that point, while prefixing a word with this
escape prevents it from being otherwise hyphenated. This mechanism
affects only that occurrence of the word; to change the hyphenation
of a word for the remainder of input processing, use the 'hw'
request.
GNU 'troff' regards the escapes '\X' and '\Y' as starting a word;
that is, the '\%' escape in, say, '\X'...'\%foobar' or
'\Y'...'\%foobar' no longer prevents hyphenation of 'foobar' but
inserts a hyphenation point just prior to it; most likely this
isn't what you want. *Note Postprocessor Access::.
The '\:' escape inserts a non-printing break point; that is, the
word can break there, but the soft hyphen glyph (see below) is not
written to the output if it does. This escape is an input word
boundary, so the remainder of the word is subject to hyphenation as
normal.
You can use '\:' and '\%' in combination to control breaking of a
file name or URL or to permit hyphenation only after certain
explicit hyphens within a word.
The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce
was, in retrospect, inevitable once the contents of
\%/var/log/\:\%httpd/\:\%access_log on the family web
server came to light, revealing visitors from Hogwarts.
]]
Here's a short shell script to tell you where your installed
version of groff will hyphenate words: it forces hyphenation to occur at
every possible location.
$ cat ~/bin/hyphen
#!/bin/sh
for W
do
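    # Enable hyphenation and use a 1-unit line length so that every
    # hyphenation point is taken; then rejoin the output lines.
    printf ".hy 4\n.ll 1u\n%s\n" "$W" | nroff -Wbreak | sed '/^$/d' \
        | tr -d '\n'
    echo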
done
$ LC_ALL=C hyphen antidisestablishmentarianism 'struct\\~sockaddr'
an-tidis-es-tab-lish-men-tar-i-an-ism
struct\~sock-addr
$ LC_ALL=C hyphen sockaddr \\%sockaddr \\%sock\\%addr sock_addr sock^addr
sock-addr
sockaddr
sock-addr
sock_addr
sock^addr
(I set the locale so as to keep this email strictly "basic Latin"; groff
will happily emit proper Unicode hyphens U+2010 to a supporting output
device.)
You can see from the above that we can't recklessly sprinkle '\%': apart
from looking ugly, '\%' at the beginning of a word suppresses only
_automatic_ hyphenation. If you specify it both at the beginning _and_
within a word, its other meaning of marking a hyphenation point is
still honored.
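
To spot that pattern in existing page sources, something like the following sketch might help. The function name find_mixed_percent is hypothetical, and the regular expression is only an approximation: it flags a '\%' at the start of a word (after line start, white space, or the '~' of '\~') that is followed by a second '\%' before any other backslash or white space.

```shell
#!/bin/sh
# find_mixed_percent (hypothetical helper): print, with line numbers,
# input lines where a word starts with '\%' and contains another '\%'.
# Reads the named files, or standard input if none are given.
find_mixed_percent() {
    grep -n -E '(^|[[:space:]~])\\%[^[:space:]\\]+\\%' "$@"
}

# Only the first of these two lines is reported.
printf '%s\n' '.B \%sock\%addr' '.B \%struct\~\%sockaddr_mctp' \
    | find_mixed_percent
```

A match is not necessarily a mistake; it only marks places where the interior '\%' still acts as a hyphenation point.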
Regards,
Branden
[1] There are a few exceptions, like those which "don't produce an input
token" as the groff Texinfo manual puts it, a construction that is more
intelligible to the groff developer than the groff user. These
have to do with escape sequences that change the way glyphs are
rendered, such as changes to the font style or family, type size, or
stroke or fill colors. Most of these should never occur in portable man
pages and even '\f' is, in my view, better handled with man(7) font
style macros and the '\c' escape sequence if required for break
suppression.