Thread (6 messages, 6 authors, 2024-12-01)

Re: Extending whitespace checks

From: Jacob Keller <hidden>
Date: 2024-11-25 22:05:12

On Sat, Nov 23, 2024 at 6:25 PM Junio C Hamano [off-list ref] wrote:
We have, via the attributes subsystem, a way to choose from a set of
predefined whitespace rules so that "git diff" can notice that you
are adding trailing whitespaces to your newly written lines, or you
are indenting a newly introduced line in a Python script with an HT.
This can be used, for example, in a pre-commit hook to reject an
attempt to introduce whitespace-damaging changes to the codebase.

Which is great.

I am wondering whether we can add a different kind of check, to help
file types with a fixed format, by extending the same mechanism, or
whether the checks I have in mind are too different from the whitespace
checks for shoehorning them into the existing mechanism to make
sense.  The particular check I have an immediate need for is for a
file type whose lines each have exactly 4 fields separated by HTs,
so the check would ask "does each line have exactly 3 HTs?"  It
could be extended to verify CSV files with a fixed number of fields
(but the validator would need to be aware of the quoting rules for a
comma inside a field value).

I guess the best I could do (outside Git) is

 - write such a validator that can take one line of input and say
   "this line conforms to the rule".

 - add, via .gitattributes, my own attribute to mark the files to
   which these rules apply.  Git does not do anything special for
   this attribute (remember, I said "outside Git").

 - in pre-commit hook, run "git diff ':(attr:myattr)'" to grab
   changes in these files with special formats, and have the
   line-by-line validator (above) check the new lines.

to make sure bad lines do not slip into the history, but it would
be really nice if I could trigger the check as part of "git diff --check",
which means it would be better if we could do this "inside" Git.
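
The outside-Git workflow could look roughly like the following sketch, which scans the text of a unified diff and runs the 4-field check on every added line. It assumes the diff is produced with `git diff --cached -U0 ':(attr:myattr)'` (`--cached` because a pre-commit hook should inspect what is staged, `-U0` to drop context lines), where `myattr` is the custom attribute set in `.gitattributes`, e.g. `*.tsv myattr`. `check_added_lines()` and `has_three_tabs()` are hypothetical helpers, not Git code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical validator: exactly 3 HTs, i.e. exactly 4 fields. */
static bool has_three_tabs(const char *line, size_t len)
{
	size_t tabs = 0;

	for (size_t i = 0; i < len; i++)
		if (line[i] == '\t')
			tabs++;
	return tabs == 3;
}

/*
 * Scan unified-diff text and validate each added line.  A '+' line counts
 * as added only after an "@@" hunk header, so the "+++ b/path" file header
 * is never checked.  Offending lines are printed to `report` if it is not
 * NULL.  Returns the number of offending lines.
 */
int check_added_lines(const char *diff, FILE *report)
{
	int bad = 0;
	bool in_hunk = false;
	const char *p = diff;

	assert(diff != NULL);
	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		if (!strncmp(p, "diff --git ", 11))
			in_hunk = false;	/* next file: its headers come first */
		else if (!strncmp(p, "@@", 2))
			in_hunk = true;
		else if (in_hunk && p[0] == '+' &&
			 !has_three_tabs(p + 1, len - 1)) {
			bad++;
			if (report)
				fprintf(report, "bad line: %.*s\n",
					(int)(len - 1), p + 1);
		}
		p = eol ? eol + 1 : p + len;
	}
	return bad;
}
```

A hook would read the diff from standard input into a buffer, call `check_added_lines(buf, stderr)`, and exit non-zero when the count is positive, which makes `git commit` abort. Tracking hunk headers, rather than skipping every line that starts with `+++`, avoids rejecting a legitimately added line whose content itself begins with `++`.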

Perhaps we could introduce a mechanism that allows me to do the
following:

 - An attribute, like whitespace=..., specifies what line-validation
   function to use to vet each new line introduced to a file with
   the attribute.

 - A line-validation function can be dynamically loaded/linked
   (here, we'd need the same separation we use for smudge/clean
   filters: .gitattributes specifies the logical meaning, while
   .git/config and friends map the 'logical meaning' to a specific
   implementation suitable for the platform).  Perhaps this would
   be a good testbed for using a DLL, possibly written in a foreign
   language like Rust?

 - In the diff machinery, where a '+' line is checked for whitespace
   anomalies in the existing code, add code to call the dynamically
   loaded line-validation function when applicable.

 - Profit?
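
Git has no API of this kind today, so everything in the following sketch is hypothetical: the `line_validator_fn` signature, the logical name `tsv4` (which might be selected by an attribute such as `linecheck=tsv4`, by analogy with `filter=<name>` in `.gitattributes` and `filter.<name>.clean` in the config), and the lookup table. A real implementation might resolve the name to a shared library via `dlopen()`/`dlsym()`; the sketch uses a static table instead so that it stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature every loadable line validator would export. */
typedef bool (*line_validator_fn)(const char *line, size_t len);

/* Example implementation behind the logical name "tsv4": exactly 3 HTs. */
static bool tsv4(const char *line, size_t len)
{
	size_t tabs = 0;

	for (size_t i = 0; i < len; i++)
		if (line[i] == '\t')
			tabs++;
	return tabs == 3;
}

/*
 * Logical name -> implementation.  In the proposal this mapping would come
 * from configuration (possibly naming a shared library to dlopen()); a static
 * table keeps the sketch self-contained.
 */
static const struct {
	const char *name;
	line_validator_fn fn;
} validators[] = {
	{ "tsv4", tsv4 },
};

line_validator_fn lookup_validator(const char *name)
{
	for (size_t i = 0; i < sizeof(validators) / sizeof(validators[0]); i++)
		if (!strcmp(validators[i].name, name))
			return validators[i].fn;
	return NULL;
}

/*
 * What the diff machinery might call next to its whitespace check for a '+'
 * line.  `name` is the attribute value for the path, or NULL when unset.
 * In this sketch an unset or unknown name means "no check".
 */
bool check_new_line(const char *name, const char *line, size_t len)
{
	line_validator_fn fn;

	assert(line != NULL || len == 0);
	if (!name)
		return true;
	fn = lookup_validator(name);
	return fn ? fn(line, len) : true;
}
```

Treating an unknown name as "no check" is only one possible choice; warning about it instead would catch typos in the configuration.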

I like the idea of an extensible check mechanism with an API. I can
think of a couple of other places where such a check could be useful
to ensure formatting. I do think this is slightly more general than
whitespace checking. The concept seems reasonable to me, though.