Re: [RFC v2 00/38] Improve ABI documentation generation

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: 2025-01-29 01:45:24
Also in: bpf, linux-arm-kernel, linux-block, linux-doc, linux-f2fs-devel, linux-hardening, linux-iio, linux-media, linux-pm, linux-usb, linux-wireless, lkml, workflows

Em Tue, 28 Jan 2025 15:42:00 -0700
Jonathan Corbet [off-list ref] escreveu:

Mauro Carvalho Chehab [off-list ref] writes:

quoted

Hi Jon/Greg,

That's the second version of my RFC patches meant to modenize the ABI
parser that I wrote in Perl.

I have a couple of minor comments on the individual patches, but overall
I do like this direction.

It would be nice, though, if the code were a bit more extensively
commented.  Parts of it get into the "twistly maze of regexes" mode that
can be awfully hard to follow.

The regex code is indeed complex, but documenting it is not an easy task.
Btw, they are (about) the same that the Perl script does. imported also
the documentation for there. I did some extra cleanups/optimizations there,
though, after checking the results of some expressions.

The big issue is that we don't have an uniform way of defining What: 
expressions. So, each subsystem (and/or author) document it in different
ways.

There are even some ABI symbols with:

	$(readlink)/sys/...

(I intend to send a patch for those later on)

and:

	What: /sys/something/...

	What: .../something_else

(I guess ".../" means "/sys/something/...", but I can't be sure, as this
is on one driver for a hardware I don't have - so, if I send a patch,
I may end breaking it)

If you want to understand how the whole set of regexes work, you can
run:

	$ ./scripts/get_abi.py -d 16 undefined --dry-run >/dev/null
...
	[DEBUG] /sys/kernel/mm/damon/admin/kdamonds/\w+/contexts/\w+/schemes/\w+/quotas/goals/\w+/current_value <== /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/current_value
	[DEBUG] /sys/kernel/mm/damon/admin/kdamonds/\w+/contexts/\w+/schemes/\w+/quotas/goals/\w+/target_metric <== /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/target_metric
	[DEBUG] /sys/kernel/mm/damon/admin/kdamonds/\w+/contexts/\w+/schemes/\w+/quotas/goals/\w+/target_value <== /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/target_value
	[DEBUG] /sys/kernel/mm/damon/admin/kdamonds/\w+/contexts/\w+/schemes/\w+/quotas/goals/nr_goals     <== /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/nr_goals
	[DEBUG] /sys/kernel/mm/damon/admin/kdamonds/\w+/contexts/\w+/schemes/\w+/quotas/ms                 <== /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/ms
	[DEBUG] /sys/kernel/mm/damon/admin/kdamonds/\w+/contexts/\w+/schemes/\w+/quotas/reset_interval_ms  <== /sys/kernel/mm/
...

This will place at stderr all regular expressions that are currently
parsed (they're currently used only for /sys symbols).

Yet, instead of spending too much time documenting them, IMO we shold
do the do the reverse: use the AbiRegex class to convert "What:" into
a new tag (like "Regex:") and use it as much as possible (we'll still
need "What:" for some things that aren't devnodes), as, with regular
expressions, symbols can be clearly documented. As on python match groups
can be named with:

	(?P<name>...)

this could be used to better describe some arguments, e.g. (picking an
easy case):

	What: /sys/module/<MODULENAME>/srcversion

could be described, instead, as:

	Regex: /sys/module/(?P<MODULENAME>[\w\-]+)/srcversion

The Kernel_abi extension (actually AbiParser class) can either display it
as-is (my personal preference), or even replace:
	(?P<MODULENAME>[\w\-]+)
with:
	MODULENAME

and still output this at html/pdf output as before, e. g.:

	What: /sys/module/<MODULENAME>/srcversion

Yet, doing it on a consistent way.

This is easier said than done, as if we do some automatic conversion,
subsystem reviewers/maintainers will need to double-check if the
converted expressions make sense.

quoted

On this series we have:

patches 1 to 11: several bug fixes addressing issues at ABI symbols;

1-3 aren't needed - it seems you already upstreamed #2?

For the rest, is there any reason to not apply them right away?  They
just seem like worthwhile fixes.

quoted

patch 12: a fix for scripts/documentation-file-ref-check
patches 13-15: create new script with rest and search logic and 
  minimally integrate with kernel_abi Sphinx extension(phase 1);
patches 16-19: implement phase 2: class integration (phase 2);
patch 20: fix a bug at kernel_abi: the way it splits lines is buggy;
patches  21-24: rewrite kernel_abi logic to make it simpler and more
  robust;
patches 25-27: add cross-reference support at automarkup;
patches 28-36: several ABI cleanups to cope with the improvements;
patch 37: implement undefined command;
patch 38: get rid of the old Perl script.

To make it easier to review/apply, I may end breaking the next version
on a couple of different patchsets. Still it would be nice to have more
people testing it and providing some feedback.

I've looked over everything, though with limited depth.

My testing hasn't turned up any problems.

Great!

I've only tested with current Sphinx,
have you tried this with the more ancient versions we support?

Not yet, but I double-checked at Sphinx documentation to be sure that
I won't be using any newer methods: I just kept using the same Sphinx
API as used by other extensions at the Kernel.

For instance this loop:

    def do_parse(self, content, node):
        with switch_source_input(self.state, content):
            self.state.nested_parse(content, 0, node, match_titles=1)

was changed on Sphinx 7.4[1], and even nested_parse(match_titles=1) is
not the recommended code for versions < 7.4, as there is this 
replacement function:

	nested_parse_with_titles()

Yet, as they're working fine at least up to version 8.1.3, we can
keep using the old way.

In any case, I'll do a test before sending the final version to see if
it works fine with our minimal version.

[1] See: https://www.sphinx-doc.org/en/master/extdev/markupapi.html

- 

On a separate discussion, I noticed one potential compatibility issue
we may have with future Python versions, due to some ascii texts
formatted as unicode. I'll send later a patch fixing them.

Additionally, automarkup has backward-compatible code with Python 2.7.
 Can I send patches dropping 2.7 support from Sphinx extensions?

[It's probably time to raise our minimum version again, especially now
that current Sphinx has better performance.]

Agreed. 

IMO, we should also increase Python's minimal version.

I don't see a whole lot of reasons not to apply this set shortly after
the merge window; anybody disagree?

Thanks,
Mauro

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help