Re: [PATCH v5 4/7] submodule: add extension to encode gitdir paths
From: Adrian Ratiu <hidden>
Date: 2025-12-05 19:30:43
On Fri, 05 Dec 2025, Patrick Steinhardt [off-list ref] wrote:
On Wed, Nov 19, 2025 at 11:10:27PM +0200, Adrian Ratiu wrote:
quoted
Add a submoduleEncoding extension which fixes filesystem collisions by encoding gitdir paths. At a high level, this implements a mechanism to encode -> validate -> retry until a working gitdir path is found.

Credit goes to Junio for coming up with this design: encoding is only applied when necessary, e.g. uppercase characters are encoded only on case-folding filesystems and only if a real conflict is detected.

To make this work, we rely on the submodule.<name>.gitdir config as the single source of truth for gitidir paths: the config is always set when

s/gitidir/gitdir/
Ack, will fix.
quoted
diff --git a/Documentation/config/extensions.adoc b/Documentation/config/extensions.adoc
index 532456644b..4861d01894 100644
--- a/Documentation/config/extensions.adoc
+++ b/Documentation/config/extensions.adoc
@@ -73,6 +73,12 @@ relativeWorktrees:::
 	repaired with either the `--relative-paths` option or with the
 	`worktree.useRelativePaths` config set to `true`.
 
+submoduleEncoding:::
+	If enabled, submodule gitdir paths are encoded to avoid filesystem
+	conflicts due to nested gitdirs, case insensitivity or other issues.
+	When enabled, the submodule.<name>.gitdir config is always set for
+	all submodules and is the single point of authority for gitdir paths.
+
 worktreeConfig:::
 	If enabled, then worktrees will load config settings from the
 	`$GIT_DIR/config.worktree` file in addition to the

I think the fact that the submodule gitdir paths are encoded now is secondary to this repository extension. The more important fact is that this changes the source of truth where the submodule gitdir path is actually derived from: before it was derived on the fly, whereas now it is persisted in the gitconfig.

It follows that because the source of truth is now a persistent entry in the configuration, other implementations can read it without having to understand how exactly the value was computed in the first place. So an implementation may arbitrarily change the algorithm it uses to derive that path from now on, and it doesn't necessarily have to encode anything.

So I'd propose to rename the extension and rephrase its description accordingly. It could for example be called something along the lines of "submodulePathConfig".
I think this is a very reasonable suggestion, thanks! If nobody has objections or better suggestions, I will rename the extension to "submodulePathConfig" and reword the description as you suggested.
quoted
diff --git a/submodule.c b/submodule.c
index 8ef028f26b..07cb4694cf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2559,33 +2591,74 @@ int submodule_to_gitdir(struct repository *repo,
 	return ret;
 }
 
+static int validate_and_set_submodule_gitdir(struct strbuf *gitdir_path,
+					     const char *submodule_name)
+{
+	char *key;
+
+	if (validate_submodule_encoded_git_dir(gitdir_path->buf, submodule_name))
+		return -1;
+
+	key = xstrfmt("submodule.%s.gitdir", submodule_name);
+	repo_config_set_gently(the_repository, key, gitdir_path->buf);
+	FREE_AND_NULL(key);

I think a simple call to `free()` should be sufficient here. There is no risk of it being used afterwards.
Ack, will fix.
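For readers following along, here is a minimal standalone sketch of the distinction being discussed. It does not use Git's headers: `make_gitdir_key()`, `key_length()` and `struct key_cache` are hypothetical names invented for illustration, and the macro is redefined locally on the assumption that it matches the shape of Git's `FREE_AND_NULL`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Local redefinition so the sketch stands alone (assumed to match Git's macro). */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical stand-in for xstrfmt("submodule.%s.gitdir", name). */
static char *make_gitdir_key(const char *submodule_name)
{
	int len;
	char *key;

	assert(submodule_name != NULL);
	len = snprintf(NULL, 0, "submodule.%s.gitdir", submodule_name);
	if (len < 0)
		return NULL;
	key = malloc((size_t)len + 1);
	if (!key)
		return NULL;
	snprintf(key, (size_t)len + 1, "submodule.%s.gitdir", submodule_name);
	return key;
}

/* 'key' is a local that dies at function exit, so plain free() is enough. */
static size_t key_length(const char *submodule_name)
{
	char *key = make_gitdir_key(submodule_name);
	size_t n;

	if (!key)
		return 0;
	n = strlen(key);
	free(key);
	return n;
}

/* Here the pointer outlives the call, so resetting it to NULL guards against reuse. */
struct key_cache {
	char *key;
};

static void key_cache_clear(struct key_cache *cache)
{
	FREE_AND_NULL(cache->key);
}
```

The local-variable case corresponds to the hunk under review: nothing can observe `key` after the function returns, so clearing it adds nothing.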
quoted
+	return 0;
+}
+
 void submodule_name_to_gitdir(struct strbuf *buf, struct repository *r,
 			      const char *submodule_name)
 {
+	const char *gitdir;
+	char *key;
+
+	repo_git_path_append(r, buf, "modules/");
+	strbuf_addstr(buf, submodule_name);
+
+	/* If extensions.submoduleEncoding is disabled, use the plain path set above */
+	if (!r->repository_format_submodule_encoding) {
+		if (validate_submodule_git_dir(buf->buf, submodule_name) < 0)
+			die(_("refusing to create/use '%s' in another submodule's "
+			      "git dir"), buf->buf);
+
+		return; /* plain gitdir is valid for use */
+	}
+
+	/* Extension is enabled: use the gitdir config if it exists */
+	key = xstrfmt("submodule.%s.gitdir", submodule_name);
+	if (!repo_config_get_string_tmp(r, key, &gitdir)) {
+		strbuf_reset(buf);
+		strbuf_addstr(buf, gitdir);
+		FREE_AND_NULL(key);
+
+		/* validate because users might have modified the config */
+		if (validate_submodule_encoded_git_dir(buf->buf, submodule_name))
+			die(_("Invalid 'submodule.%s.gitdir' config: '%s' please check "
+			      "if it is unique or conflicts with another module"),

Nit: error messages start with a lower-case character.
Ack, will fix.
quoted
+			    submodule_name, gitdir);
+
+		return;
+	}
+	FREE_AND_NULL(key);
+
 	/*
-	 * NEEDSWORK: The current way of mapping a submodule's name to
-	 * its location in .git/modules/ has problems with some naming
-	 * schemes. For example, if a submodule is named "foo" and
-	 * another is named "foo/bar" (whether present in the same
-	 * superproject commit or not - the problem will arise if both
-	 * superproject commits have been checked out at any point in
-	 * time), or if two submodule names only have different cases in
-	 * a case-insensitive filesystem.
-	 *
-	 * There are several solutions, including encoding the path in
-	 * some way, introducing a submodule.<name>.gitdir config in
-	 * .git/config (not .gitmodules) that allows overriding what the
-	 * gitdir of a submodule would be (and teach Git, upon noticing
-	 * a clash, to automatically determine a non-clashing name and
-	 * to write such a config), or introducing a
-	 * submodule.<name>.gitdir config in .gitmodules that repo
-	 * administrators can explicitly set. Nothing has been decided,
-	 * so for now, just append the name at the end of the path.
+	 * The gitdir config does not exist, even though the extension is enabled.
+	 * Therefore we are in one of the following cases:
 	 */
+
+	/* Case 1: legacy migration of valid plain submodule names */
+	if (!validate_and_set_submodule_gitdir(buf, submodule_name))
+		return;
+
+	/* Case 2: Try URI-safe (RFC3986) encoding first, this fixes nested gitdirs */
+	strbuf_reset(buf);
 	repo_git_path_append(r, buf, "modules/");
-	strbuf_addstr(buf, submodule_name);
+	strbuf_addstr_urlencode(buf, submodule_name, is_rfc3986_unreserved);
+	if (!validate_and_set_submodule_gitdir(buf, submodule_name))
+		return;
 
-	if (validate_submodule_git_dir(buf->buf, submodule_name) < 0)
-		die(_("refusing to create/use '%s' in another submodule's "
-		      "git dir"), buf->buf);
+	/* Case 3: Nothing worked: error out */
+	die(_("Cannot construct a valid gitdir path for submodule '%s': "
+	      "please set a unique git config for 'submodule.%s.gitdir'."),
+	    submodule_name, submodule_name);

It feels somewhat fragile to me that we unconditionally handle these cases and try to find old submodule directories. If the extension is enabled I'd expect that the submodule configuration is the _only_ source of truth.

May I propose that we instead always error out in case the submodule configuration does not exist? In the best case we'd then give the user a nice error message that tells them how to run the migration manually.
Junio told me to not do any kind of manual migration and just attempt new names until one works and then use it consistently. That's why the "submodule.%s.gitdir" path is always used if set and has precedence (no new names are attempted). :)
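To illustrate what "attempting a new name" can mean for the nested-gitdir case, here is a minimal standalone sketch of percent-encoding a submodule name with the RFC 3986 unreserved set. It does not call Git's strbuf helpers: `is_unreserved()` and `encode_name()` are hypothetical names, and the lowercase hex output is a choice of this sketch. The point is that "foo" and "foo/bar" no longer nest once the slash is encoded.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char ch)
{
	return isalnum(ch) || ch == '-' || ch == '.' || ch == '_' || ch == '~';
}

/*
 * Percent-encode 'name' into 'out' (capacity 'outsz', including the NUL).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int encode_name(const char *name, char *out, size_t outsz)
{
	size_t pos = 0;

	if (!outsz)
		return -1;

	for (; *name; name++) {
		unsigned char ch = (unsigned char)*name;
		size_t need = is_unreserved(ch) ? 1 : 3;

		/* keep room for this character plus the trailing NUL */
		if (pos + need + 1 > outsz)
			return -1;

		if (need == 1) {
			out[pos++] = (char)ch;
		} else {
			snprintf(out + pos, outsz - pos, "%%%02x", (unsigned int)ch);
			pos += 3;
		}
	}
	out[pos] = '\0';
	return 0;
}
```

With this, a name such as "foo/bar" would map to a single directory component under modules/ instead of a path inside the gitdir of "foo". Case-folding conflicts are a separate problem that this sketch does not address.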
(Coming back from reading subsequent patches) Maybe what's putting me
off is that this function is seemingly used for two things:
1. To derive the submodule path in case we know it should already
exist.
2. To compute the submodule path so we can end up writing it into the
"submodule.*.gitdir" variable.
I think we should tell these two cases apart. In the first case I expect
that we never fall back to a computed name, but bail out in case the
configuration key does not exist. And in the second case it of course
makes sense to compute the actual path that we want to store in the
configuration.

I think I understand where you're coming from. Even before my patches, the unmodified submodule_name_to_gitdir() is used for both new (non-existing) and old (existing) submodules. It has no way of knowing whether a submodule exists, whether it should exist, or whether a new name is required for a new clone, which will eventually exist in the future.

If I understood your suggestion correctly, you just need an additional check to verify that the path pointed to by "submodule.%s.gitdir" is an existing gitdir, and to error out if it is not?

Or did I misunderstand your suggestion, and you mean to bail out if the config key is missing entirely for any submodule when the extension is enabled? That would imply a manual migration by the user, which is something both Aaron and Junio asked me to avoid. Josh also said they want to avoid setting any kind of config keys (or distributing configs), which is why I also added the compile-time extension option, to ease the transition, together with the "retry-on-fallback" approach for setting the config.

I am in favor of implementing the split you suggested. However, if you mean the latter and not the former, how do we automatically figure out whether a name's gitdir **should** exist? :)
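To make the proposed split concrete, here is a minimal sketch under stated assumptions. It uses a toy in-memory table instead of Git's config API, and every name in it (`struct toy_config`, `lookup_gitdir()`, `compute_and_store_gitdir()`) is hypothetical. It only shows the two behaviours side by side; it does not answer the open question of how a caller would know which of the two applies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8
#define MAX_LEN 128

/* Toy stand-in for the submodule.<name>.gitdir entries; not Git's config API. */
struct toy_config {
	char names[MAX_ENTRIES][MAX_LEN];
	char gitdirs[MAX_ENTRIES][MAX_LEN];
	int nr;
};

static const char *toy_config_get(const struct toy_config *cfg, const char *name)
{
	for (int i = 0; i < cfg->nr; i++)
		if (!strcmp(cfg->names[i], name))
			return cfg->gitdirs[i];
	return NULL;
}

/*
 * Use 1: the submodule is expected to exist already.
 * Never falls back to a computed path; a missing key is an error.
 */
static int lookup_gitdir(const struct toy_config *cfg, const char *name,
			 const char **out)
{
	assert(cfg && name && out);
	*out = toy_config_get(cfg, name);
	return *out ? 0 : -1;
}

/*
 * Use 2: compute a path for a submodule and persist it.
 * An existing entry keeps precedence; otherwise store the plain path.
 */
static int compute_and_store_gitdir(struct toy_config *cfg, const char *name,
				    const char **out)
{
	const char *existing;
	int n;

	assert(cfg && name && out);
	existing = toy_config_get(cfg, name);
	if (existing) {
		*out = existing;
		return 0;
	}
	if (cfg->nr == MAX_ENTRIES)
		return -1;

	n = snprintf(cfg->names[cfg->nr], MAX_LEN, "%s", name);
	if (n < 0 || n >= MAX_LEN)
		return -1;
	n = snprintf(cfg->gitdirs[cfg->nr], MAX_LEN, "modules/%s", name);
	if (n < 0 || n >= MAX_LEN)
		return -1;

	*out = cfg->gitdirs[cfg->nr++];
	return 0;
}
```

In this sketch a lookup for an unknown name fails until the compute path has stored an entry, after which both functions return the same stored value.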