Re: [GSoC RFC PATCH v2 0/7] repo-info: add new command for retrieving repository info
From: Lucas Seiki Oshiro <hidden>
Date: 2025-06-23 18:49:37
Hi Lucas
Hi, Phillip, thanks for joining this discussion!
I think using an output format generated by 'printf("%s\n%s\0", key,
value)' would be easier to parse. This format matches that used by 'git
config --list -z'.
Thanks for your suggestion! However, this still breaks in the corner case mentioned by Junio in https://lore.kernel.org/git/xmqqikl3mtx2.fsf@gitster.g/: when a value contains a LF, which would be possible in the (yet to be implemented) path values.
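To make the format under discussion concrete, here is a minimal sketch of how a consumer might read "key\nvalue\0" records. It is not part of this series: the names `kv_record` and `next_record` are hypothetical, and it splits each record at the first LF, which is only unambiguous under the assumption that keys never contain a LF.

```c
#include <stddef.h>
#include <string.h>

/* One parsed record; both pointers point into the caller's buffer. */
struct kv_record {
	const char *key;
	size_t key_len;
	const char *value;
	size_t value_len;
};

/*
 * Parse the next "key\nvalue\0" record starting at offset *pos.
 * Returns 1 and advances *pos past the record on success.
 * Returns 0 at end of input or if the record is malformed
 * (no terminating NUL, or no LF between key and value).
 *
 * Assumption: keys never contain LF, so the first LF in a record
 * is always the key/value separator.
 */
static int next_record(const char *buf, size_t len, size_t *pos,
		       struct kv_record *out)
{
	const char *start, *nul, *lf;

	if (*pos >= len)
		return 0;

	start = buf + *pos;
	nul = memchr(start, '\0', len - *pos);
	if (!nul)
		return 0; /* unterminated record */

	lf = memchr(start, '\n', (size_t)(nul - start));
	if (!lf)
		return 0; /* missing key/value separator */

	out->key = start;
	out->key_len = (size_t)(lf - start);
	out->value = lf + 1;
	out->value_len = (size_t)(nul - lf - 1);
	*pos = (size_t)(nul - buf) + 1;
	return 1;
}
```

A caller would loop on next_record() until it returns 0. Whether the "no LF in keys" assumption can hold for every field this command reports is a separate question.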
I've not seen any discussion of how paths are going to be encoded in the JSON output. As I understand it some JSON decoders only accept utf8 input but the paths reported by git are arbitrary NUL terminated byte sequences. How is one expected to parse the output for a non utf8 encoded path using rust's JSON decoding for example?
For now, I'm directly using the jw_* functions, which format strings using the
function append_quoted_string, introduced in 75459410ed (json_writer: new
routines to create JSON data, 2018-07-13). This issue was also discussed when
that function was introduced:
"""
We say "JSON-like" because we do not enforce the Unicode (usually UTF-8)
requirement on string fields. Internally, Git does not necessarily have
Unicode/UTF-8 data for most fields, so it is currently unclear the best
way to enforce that requirement. For example, on Linux pathnames can
contain arbitrary 8-bit character data, so a command like "status" would
not know how to encode the reported pathnames. We may want to revisit
this (or double encode such strings) in the future.
"""
So, it looks like "the future" is soon :-). In this RFC, I'm not handling
paths yet, and I can't propose a proper solution for now as I honestly know
very little about UTF-8 encoding...
The first solution that I can think of is to check if the sequence is a valid
UTF-8 bytestring, aborting the entire command if it's not, which would be
better than just guessing the charset and re-encoding it as UTF-8. However,
I don't know how hard it would be to do.
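As a rough idea of what such a check involves, here is a minimal, standalone sketch of a UTF-8 validity test. The function name `is_valid_utf8` is hypothetical and this is not an existing Git helper. It assumes the strict definition of UTF-8, rejecting overlong encodings, surrogates and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the len bytes at s form a well-formed UTF-8 sequence.
 * Hypothetical helper for illustration only.
 */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t need;           /* continuation bytes expected */
		unsigned long cp, min; /* code point, smallest legal value */

		if (c < 0x80) {        /* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			need = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			need = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;  /* stray continuation or invalid lead byte */
		}

		if (len - i <= need)
			return false;  /* sequence truncated at end of input */

		for (size_t k = 1; k <= need; k++) {
			unsigned char cc = s[i + k];
			if ((cc & 0xC0) != 0x80)
				return false; /* not a continuation byte */
			cp = (cp << 6) | (cc & 0x3F);
		}

		if (cp < min)
			return false;  /* overlong encoding */
		if (cp > 0x10FFFF)
			return false;  /* beyond the Unicode range */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return false;  /* UTF-16 surrogate */

		i += need + 1;
	}
	return true;
}
```

The check itself is a single pass over the bytes. The harder question is what the command should do once it finds a path that fails it.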
On the subject of paths do you plan to support the equivalent of "git rev-parse --git-path"?
Hmmmm... In the way that it works under rev-parse, no, as it may bloat this command with other things that aren't exactly metadata.
I'm not sure what the future plans for this command are but when I'm scripting around git it would be nice to be able to have a single process that I could query for the things currently returned by "git rev-parse", "git var" and "git config".
My concern here is that the main motivation for this new command is that rev-parse has too many responsibilities. Giving too many responsibilities to this new command may turn it into a new rev-parse and create an XKCD 927 [1] situation.
Best Wishes Phillip
Thanks again for bringing more light to this discussion! These first patches are only outputting hardcoded strings from Git, and dealing with Unicode is something that I'll really need to think about how to solve.

[1] https://xkcd.com/927/