Thread (50 messages) 50 messages, 2 authors, 2025-03-04

Re: [PATCH v2 09/39] scripts/kernel-doc.py: add a Python parser

From: Jonathan Corbet <corbet@lwn.net>
Date: 2025-02-24 23:39:00
Also in: linux-hardening, lkml

Mauro Carvalho Chehab [off-list ref] writes:
Maintaining kernel-doc has been a challenge, as there aren't many
perl developers among maintainers. Also, the logic there is too
complex. Having lots of global variables and using pure functions
doesn't help.

Rewrite the script in Python, placing most global variables
inside classes. This should help maintaining the script in long
term.
[...]
quoted hunk ↗ jump to hunk
diff --git a/scripts/kernel-doc.py b/scripts/kernel-doc.py
new file mode 100755
index 000000000000..5cf5ed63f215
--- /dev/null
+++ b/scripts/kernel-doc.py
@@ -0,0 +1,2757 @@
+#!/usr/bin/env python3
+# pylint: disable=R0902,R0903,R0904,R0911,R0912,R0913,R0914,R0915,R0917,R1702
+# pylint: disable=C0302,C0103,C0301
+# pylint: disable=C0116,C0115,W0511,W0613
+# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@kernel.org>.
+# SPDX-License-Identifier: GPL-2.0
The SPDX tag is supposed to be up top, right under the shebang

I also think you should give consideration to preserving the other
copyright notices in the Perl version.  A language translation doesn't
remove existing copyrights...who knows how much creativity went into
some of those regexes?
+# TODO: implement warning filtering
+
+"""
+kernel_doc
+==========
+
+Print formatted kernel documentation to stdout
+
+Read C language source or header FILEs, extract embedded
+documentation comments, and print formatted documentation
+to standard output.
+
+The documentation comments are identified by the "/**"
+opening comment mark.
+
+See Documentation/doc-guide/kernel-doc.rst for the
+documentation comment syntax.
+"""
+
+import argparse
+import logging
+import os
+import re
+import sys
+
+from datetime import datetime
+from pprint import pformat
+
+from dateutil import tz
+
+# Local cache for regular expressions
+re_cache = {}
+
+
+class Re:
So I have to say this bugs me a bit ... the class is fine, but the
one-letter case-only difference from the standard "re" class is just
going to make the code harder for others to approach.  "kern_re" or
something like that?  Or even "kre" if you really want it to be as short
as possible.
+    """
+    Helper class to simplify regex declaration and usage,
+
+    It calls re.compile for a given pattern. It also allows adding
+    regular expressions and define sub at class init time.
+
+    Regular expressions can be cached via an argument, helping to speedup
+    searches.
+    """
[...]
+
+class KernelDoc:
+    # Parser states
+    STATE_NORMAL        = 0        # normal code
+    STATE_NAME          = 1        # looking for function name
+    STATE_BODY_MAYBE    = 2        # body - or maybe more description
+    STATE_BODY          = 3        # the body of the comment
+    STATE_BODY_WITH_BLANK_LINE = 4 # the body which has a blank line
+    STATE_PROTO         = 5        # scanning prototype
+    STATE_DOCBLOCK      = 6        # documentation block
+    STATE_INLINE        = 7        # gathering doc outside main block
+
+    st_name = [
+        "NORMAL",
+        "NAME",
+        "BODY_MAYBE",
+        "BODY",
+        "BODY_WITH_BLANK_LINE",
+        "PROTO",
+        "DOCBLOCK",
+        "INLINE",
+    ]
So these ... kind of look like enums?

That's kind of it for nits ... I do have one wish that will kind of hard
to grant overall ... for the long-term maintenance of this code, it
would be really nice if every non-trivial regex were described by a
comment explaining what it is trying to do.  It's not reasonable to
expect that as a condition for accepting this rewrite, but it sure would
be a nice goal to be working toward.

Thanks,

jon
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help