Thread (55 messages), 5 authors, 2026-03-13

Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: 2026-03-13 10:48:49
Also in: intel-wired-lan, linux-doc, linux-hardening, lkml

On Wed, 04 Mar 2026 12:07:45 +0200
Jani Nikula [off-list ref] wrote:
On Mon, 23 Feb 2026, Jonathan Corbet [off-list ref] wrote:
Jani Nikula [off-list ref] writes:
There's always the question, if you're putting a lot of effort into
making kernel-doc closer to an actual C parser, why not put all that
effort into using and adapting to, you know, an actual C parser?  
Not speaking to the current effort but ... in the past, when I have
contemplated this (using, say, tree-sitter), the real problem is that
those parsers simply strip out the comments.  Kerneldoc without comments
... doesn't work very well.  If there were a parser without those
problems, and which could be made to do the right thing with all of our
weird macro usage, it would certainly be worth considering.  
I think e.g. libclang and its Python bindings can be made to work. The
main problems with that are passing proper compiler options (because
it'll need to include stuff to know about types etc. because it is a
proper parser), preprocessing everything is going to take time, you need
to invest a bunch into it to know how slow exactly compared to the
current thing and whether it's prohibitive, and it introduces an extra
dependency.

So yeah, there are definitely tradeoffs there. But it's not like this
constant patching of kernel-doc is exactly burden free either. 
In my tests with a simple C tokenizer:

	https://lore.kernel.org/linux-doc/cover.1773326442.git.mchehab+huawei@kernel.org/

The tokenizer is working fine and didn't make things much slower: it
increases the time to parse the entire kernel tree from 37s to 47s
for man page generation, but should not change the time for htmldocs
much, as right now only ~4 seconds are needed to read and parse the
files referenced by the kernel-doc tags under Documentation.

The code can still be cleaned up, as there are still some things
hardcoded in the various dump_* functions that could be better
implemented (*).

The advantage of the approach I'm using is that it allows a gradual
migration to the tokenized code, as the conversion can be done
incrementally.

(*) For instance, __attribute__ and a couple of other macros are parsed
    twice in the dump_struct() logic, in different places.
I don't
know, is it just me, but I'd like to think as a profession we'd be past
writing ad hoc C parsers by now.
Probably not, but I don't think we need a C parser, as kernel-doc
just needs to understand data types (enum, struct, typedef, union,
vars) and function/macro prototypes.

For that purpose, a tokenizer sounds like enough.
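
To illustrate the point, here is a minimal sketch of a regex-based C
tokenizer that keeps comments instead of stripping them. It is not the
code from the series: TOKEN_SPEC, the token kinds and tokenize() are
hypothetical names, and the patterns cover only a small subset of C.

```python
import re

# Hypothetical token classes; order matters, as comments and strings
# must be tried before the generic punctuation pattern.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("STRING",  r'"(?:\\.|[^"\\])*"'),
    ("NUMBER",  r"\d\w*"),
    ("NAME",    r"[A-Za-z_]\w*"),
    ("PUNCT",   r"->|\.\.\.|[^\s\w]"),
    ("SPACE",   r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Yield (kind, text) pairs, keeping comments and dropping whitespace."""
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SPACE":
            yield match.lastgroup, match.group()


if __name__ == "__main__":
    for kind, text in tokenize("/** doc */ struct foo { int a; };"):
        print(kind, repr(text))
```

Since the comment arrives as an ordinary token, the kernel-doc block and
the declaration that follows it stay in the same stream.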

Now, there is the code that currently lives in:
	https://github.com/mchehab/linux/blob/tokenizer-v5/tools/lib/python/kdoc/xforms_lists.py

which contains a list of C/gcc/clang keywords that will
be ignored, like:

	__attribute__
	static
	extern
	inline

Together with a sanitized version of the kernel macros it needs
to handle or ignore:

	DECLARE_BITMAP
	DECLARE_HASHTABLE
	__acquires
	__init
	__exit
	struct_group
	...


Once we finish cleaning up kdoc_parser.py to rely only
on it for prototype transformations, this will be the only file
that will require changes when more macros start affecting 
kernel-doc.
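
As a rough sketch of the idea of a single transform table, the snippet
below applies a list of (pattern, replacement) pairs to a prototype.
XFORMS and apply_xforms() are hypothetical names, and the patterns,
including the DECLARE_BITMAP expansion, are assumptions made for
illustration, not the contents of xforms_lists.py.

```python
import re

# Hypothetical transform table: (compiled pattern, replacement).
XFORMS = [
    # Compiler attributes and storage/annotation keywords are dropped.
    (re.compile(r"__attribute__\s*\(\(.*?\)\)"), " "),
    (re.compile(r"\b(?:static|extern|inline|__init|__exit)\b"), " "),
    (re.compile(r"__acquires\s*\([^)]*\)"), " "),
    # Assumed expansion: show the bitmap as the array it declares.
    (re.compile(r"DECLARE_BITMAP\s*\(\s*(\w+)\s*,\s*([^)]+?)\s*\)"),
     r"unsigned long \1[BITS_TO_LONGS(\2)]"),
]


def apply_xforms(proto):
    """Apply every transform in order, then collapse runs of whitespace."""
    for pattern, replacement in XFORMS:
        proto = pattern.sub(replacement, proto)
    return " ".join(proto.split())


if __name__ == "__main__":
    print(apply_xforms("static inline int __init foo(void) __acquires(lock)"))
    print(apply_xforms("DECLARE_BITMAP(mask, 64);"))
```

With a table like this, supporting a new macro means adding one entry
rather than touching the parsing code.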

As this is complex, and may require manual adjustments, it
is probably better not to try to auto-generate the xforms list
at runtime. A better approach is, IMO, to have C preprocessor
based code to help update it periodically, using a target like:

	make kdoc-xforms

that would use either cpp or clang to generate a patch to
update the xforms_list content after new macros that affect
docs generation are added.
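
One possible building block for such a target, sketched under
assumptions: take a dump of "#define" lines (for instance what
"cpp -dM" prints) and report the macro names that the xforms list does
not know yet. missing_macros() and its arguments are hypothetical; no
such helper exists in the tree, and a real tool would still need to
filter for macros that actually affect documentation.

```python
import re

# Matches the macro name in lines such as "#define NAME ..." or
# "#define NAME(args) ...".
DEFINE_RE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)", re.MULTILINE)


def missing_macros(define_dump, known):
    """Return macro names found in define_dump that are not in known.

    Names are returned once each, in order of first appearance.
    """
    seen = dict.fromkeys(DEFINE_RE.findall(define_dump))
    return [name for name in seen if name not in known]


if __name__ == "__main__":
    dump = (
        "#define __init __section_init\n"
        "#define DECLARE_BITMAP(name,bits) unsigned long name[1]\n"
        "#define __cold __attribute__((__cold__))\n"
    )
    print(missing_macros(dump, {"__init", "DECLARE_BITMAP"}))
```

The output is only a list of candidates; deciding how each macro should
be transformed would remain a manual step, as argued above.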

-- 
Thanks,
Mauro