Thread (168 messages), 9 authors, 2026-01-12

Re: [PATCH v5 4/7] submodule: add extension to encode gitdir paths

From: Adrian Ratiu <hidden>
Date: 2025-12-05 19:30:43

On Fri, 05 Dec 2025, Patrick Steinhardt [off-list ref] wrote:
On Wed, Nov 19, 2025 at 11:10:27PM +0200, Adrian Ratiu wrote:
quoted
Add a submoduleEncoding extension which fixes filesystem collisions by
encoding gitdir paths. At a high level, this implements a mechanism to
encode -> validate -> retry until a working gitdir path is found.

Credit goes to Junio for coming up with this design: encoding is only
applied when necessary, e.g. uppercase characters are encoded only on
case-folding filesystems and only if a real conflict is detected.

To make this work, we rely on the submodule.<name>.gitdir config as the
single source of truth for gitidir paths: the config is always set when
s/gitidir/gitdir/
Ack, will fix.
quoted
diff --git a/Documentation/config/extensions.adoc b/Documentation/config/extensions.adoc
index 532456644b..4861d01894 100644
--- a/Documentation/config/extensions.adoc
+++ b/Documentation/config/extensions.adoc
@@ -73,6 +73,12 @@ relativeWorktrees:::
 	repaired with either the `--relative-paths` option or with the
 	`worktree.useRelativePaths` config set to `true`.
 
+submoduleEncoding:::
+	If enabled, submodule gitdir paths are encoded to avoid filesystem
+	conflicts due to nested gitdirs, case insensitivity or other issues.
+	When enabled, the submodule.<name>.gitdir config is always set for
+	all submodules and is the single point of authority for gitdir paths.
+
 worktreeConfig:::
 	If enabled, then worktrees will load config settings from the
 	`$GIT_DIR/config.worktree` file in addition to the
I think the fact that the submodule gitdir paths are encoded now is
secondary to this repository extension. The more important fact is that
this changes the source of truth where the submodule gitdir path is
actually derived from: before it was derived on the fly, whereas now it
is persisted in the gitconfig.

It follows that because the source of truth is now a persistent entry in
the configuration, other implementations can read it without having to
understand how exactly the value was computed in the first place. So an
implementation may arbitrarily change the algorithm it uses to derive
that path from now on, and it doesn't necessarily have to encode
anything.

So I'd propose to rename the extension and rephrase its description
accordingly. It could for example be called something along the lines of
"submodulePathConfig".
I think this is a very reasonable suggestion, thanks!

If nobody has objections or better suggestions, I will rename the
extension to "submodulePathConfig" and reword the description as you
suggested.
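
To illustrate the point about the persisted entry being readable without knowing how it was computed, here is a minimal sketch. It is not Git code: `struct config_entry` and `lookup_submodule_gitdir()` are hypothetical names, and the config is modelled as a flat in-memory array rather than Git's real config machinery.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for parsed config: one key/value pair per entry. */
struct config_entry {
	const char *key;
	const char *value;
};

/*
 * Return the stored gitdir for `name`, or NULL when no
 * "submodule.<name>.gitdir" entry exists. The reader never derives or
 * decodes anything: whatever the writer stored is taken verbatim.
 *
 * Simplification: keys are compared with strcmp(), whereas real Git
 * config treats section and variable names case-insensitively.
 */
static const char *lookup_submodule_gitdir(const struct config_entry *cfg,
					   size_t nr, const char *name)
{
	char key[256];
	int len;
	size_t i;

	assert(cfg || !nr);
	assert(name);

	len = snprintf(key, sizeof(key), "submodule.%s.gitdir", name);
	if (len < 0 || (size_t)len >= sizeof(key))
		return NULL; /* key too long for this sketch's buffer */

	for (i = 0; i < nr; i++)
		if (!strcmp(cfg[i].key, key))
			return cfg[i].value;
	return NULL;
}
```

Because the lookup only compares keys and returns the stored value, the writer is free to change how it picks the path without breaking a reader like this one.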
quoted
diff --git a/submodule.c b/submodule.c
index 8ef028f26b..07cb4694cf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2559,33 +2591,74 @@ int submodule_to_gitdir(struct repository *repo,
 	return ret;
 }
 
+static int validate_and_set_submodule_gitdir(struct strbuf *gitdir_path,
+					     const char *submodule_name)
+{
+	char *key;
+
+	if (validate_submodule_encoded_git_dir(gitdir_path->buf, submodule_name))
+		return -1;
+
+	key = xstrfmt("submodule.%s.gitdir", submodule_name);
+	repo_config_set_gently(the_repository, key, gitdir_path->buf);
+	FREE_AND_NULL(key);
I think a simple call to `free()` should be sufficient here. There is no
risk of it being used afterwards.
Ack, will fix.
quoted
+	return 0;
+}
+
 void submodule_name_to_gitdir(struct strbuf *buf, struct repository *r,
 			      const char *submodule_name)
 {
+	const char *gitdir;
+	char *key;
+
+	repo_git_path_append(r, buf, "modules/");
+	strbuf_addstr(buf, submodule_name);
+
+	/* If extensions.submoduleEncoding is disabled, use the plain path set above */
+	if (!r->repository_format_submodule_encoding) {
+		if (validate_submodule_git_dir(buf->buf, submodule_name) < 0)
+			die(_("refusing to create/use '%s' in another submodule's "
+			      "git dir"), buf->buf);
+
+		return; /* plain gitdir is valid for use */
+	}
+
+	/* Extension is enabled: use the gitdir config if it exists */
+	key = xstrfmt("submodule.%s.gitdir", submodule_name);
+	if (!repo_config_get_string_tmp(r, key, &gitdir)) {
+		strbuf_reset(buf);
+		strbuf_addstr(buf, gitdir);
+		FREE_AND_NULL(key);
+
+		/* validate because users might have modified the config */
+		if (validate_submodule_encoded_git_dir(buf->buf, submodule_name))
+			die(_("Invalid 'submodule.%s.gitdir' config: '%s' please check "
+			      "if it is unique or conflicts with another module"),
Nit: error messages start with a lower-case character.
Ack, will fix.
quoted
+			    submodule_name, gitdir);
+
+		return;
+	}
+	FREE_AND_NULL(key);
+
 	/*
-	 * NEEDSWORK: The current way of mapping a submodule's name to
-	 * its location in .git/modules/ has problems with some naming
-	 * schemes. For example, if a submodule is named "foo" and
-	 * another is named "foo/bar" (whether present in the same
-	 * superproject commit or not - the problem will arise if both
-	 * superproject commits have been checked out at any point in
-	 * time), or if two submodule names only have different cases in
-	 * a case-insensitive filesystem.
-	 *
-	 * There are several solutions, including encoding the path in
-	 * some way, introducing a submodule.<name>.gitdir config in
-	 * .git/config (not .gitmodules) that allows overriding what the
-	 * gitdir of a submodule would be (and teach Git, upon noticing
-	 * a clash, to automatically determine a non-clashing name and
-	 * to write such a config), or introducing a
-	 * submodule.<name>.gitdir config in .gitmodules that repo
-	 * administrators can explicitly set. Nothing has been decided,
-	 * so for now, just append the name at the end of the path.
+	 * The gitdir config does not exist, even though the extension is enabled.
+	 * Therefore we are in one of the following cases:
 	 */
+
+	/* Case 1: legacy migration of valid plain submodule names */
+	if (!validate_and_set_submodule_gitdir(buf, submodule_name))
+		return;
+
+	/* Case 2: Try URI-safe (RFC3986) encoding first, this fixes nested gitdirs */
+	strbuf_reset(buf);
 	repo_git_path_append(r, buf, "modules/");
-	strbuf_addstr(buf, submodule_name);
+	strbuf_addstr_urlencode(buf, submodule_name, is_rfc3986_unreserved);
+	if (!validate_and_set_submodule_gitdir(buf, submodule_name))
+		return;
 
-	if (validate_submodule_git_dir(buf->buf, submodule_name) < 0)
-		die(_("refusing to create/use '%s' in another submodule's "
-		      "git dir"), buf->buf);
+	/* Case 3: Nothing worked: error out */
+	die(_("Cannot construct a valid gitdir path for submodule '%s': "
+	      "please set a unique git config for 'submodule.%s.gitdir'."),
+	    submodule_name, submodule_name);
It feels somewhat fragile to me that we unconditionally handle these
cases and try to find old submodule directories. If the extension is
enabled I'd expect that the submodule configuration is the _only_ source
of truth.

May I propose that we instead always error out in case the submodule
configuration does not exist? In the best case we'd then give the user a
nice error message that tells them how to run the migration manually.
Junio told me not to do any kind of manual migration and to just attempt
new names until one works, then use it consistently.

That's why the "submodule.%s.gitdir" path is always used if set and
has precedence (no new names are attempted). :)
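
For readers following along, the precedence and retry order described here can be sketched outside of Git as below. This is only an illustration under stated assumptions: `choose_gitdir()`, `paths_clash()`, `is_free()` and `percent_encode()` are hypothetical helpers, `taken` stands in for the gitdirs of other submodules, and the clash rule (equal or nested paths) is a simplification of what the patch's validation does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* RFC 3986 "unreserved": ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	       (c >= '0' && c <= '9') ||
	       c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-encode `in` into `out`. Returns 0, or -1 if `out` is too small. */
static int percent_encode(const char *in, char *out, size_t outsz)
{
	size_t pos = 0;

	if (!outsz)
		return -1;
	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;
		size_t need = is_unreserved(c) ? 1 : 3;

		if (pos + need >= outsz)
			return -1;
		if (need == 1)
			out[pos] = (char)c;
		else
			snprintf(out + pos, 4, "%%%02x", (unsigned int)c);
		pos += need;
	}
	out[pos] = '\0';
	return 0;
}

/* Two paths clash if they are equal or one is nested inside the other. */
static int paths_clash(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la == lb)
		return !strcmp(a, b);
	if (la < lb)
		return !strncmp(a, b, la) && b[la] == '/';
	return !strncmp(a, b, lb) && a[lb] == '/';
}

static int is_free(const char *path, const char *const *taken, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (paths_clash(path, taken[i]))
			return 0;
	return 1;
}

static int fits(int len, size_t outsz)
{
	return len >= 0 && (size_t)len < outsz;
}

/*
 * Pick a gitdir path for `name`. `configured` is the value of the
 * submodule's gitdir config, or NULL if unset. Returns 0 and fills
 * `out` on success, -1 if no candidate validates.
 */
static int choose_gitdir(const char *configured, const char *name,
			 const char *const *taken, size_t nr,
			 char *out, size_t outsz)
{
	char encoded[256];

	/* The config has precedence: no other names are attempted. */
	if (configured)
		return fits(snprintf(out, outsz, "%s", configured), outsz) &&
		       is_free(out, taken, nr) ? 0 : -1;

	/* Candidate 1: the plain name. */
	if (fits(snprintf(out, outsz, "modules/%s", name), outsz) &&
	    is_free(out, taken, nr))
		return 0;

	/* Candidate 2: the percent-encoded name, which flattens '/'. */
	if (percent_encode(name, encoded, sizeof(encoded)) < 0)
		return -1;
	if (fits(snprintf(out, outsz, "modules/%s", encoded), outsz) &&
	    is_free(out, taken, nr))
		return 0;

	return -1; /* nothing worked */
}
```

With "modules/foo" already taken, a submodule named "foo/bar" fails the plain candidate because it would nest inside the existing gitdir, and the sketch falls through to the encoded candidate. The caller would then persist whichever path was returned so that later lookups hit the config branch.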
(Coming back from reading subsequent patches) Maybe what's putting me
off is that this function is seemingly used for two things:

  1. To derive the submodule path in case we know it should already
     exist.

  2. To compute the submodule path so we can end up writing it into the
     "submodule.*.gitdir" variable.

I think we should tell these two cases apart. In the first case I expect
that we never fall back to a computed name, but bail out in case the
configuration key does not exist. And in the second case it of course
makes sense to compute the actual path that we want to store in the
configuration.
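
A rough sketch of what telling the two cases apart could look like, written as standalone C rather than against Git's internals. All names here (`existing_submodule_gitdir()`, `new_submodule_gitdir()`, `enum gitdir_status`) are hypothetical, and the computed path is just the plain "modules/<name>" form to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum gitdir_status {
	GITDIR_OK = 0,
	GITDIR_MISSING_CONFIG = -1,
	GITDIR_TOO_LONG = -2,
};

static enum gitdir_status copy_path(char *out, size_t outsz,
				    const char *prefix, const char *s)
{
	int len = snprintf(out, outsz, "%s%s", prefix, s);

	return (len < 0 || (size_t)len >= outsz) ? GITDIR_TOO_LONG : GITDIR_OK;
}

/*
 * Case 1: the submodule is expected to exist already. Only the
 * configured value is accepted; a missing key is an error and no
 * path is ever computed as a fallback.
 */
static enum gitdir_status existing_submodule_gitdir(const char *configured,
						    char *out, size_t outsz)
{
	assert(out && outsz);
	if (!configured)
		return GITDIR_MISSING_CONFIG;
	return copy_path(out, outsz, "", configured);
}

/*
 * Case 2: the path is wanted so it can be stored in the config. A
 * configured value is reused; otherwise a path is computed and
 * `*computed` is set so the caller knows it must persist the result.
 */
static enum gitdir_status new_submodule_gitdir(const char *configured,
					       const char *name,
					       char *out, size_t outsz,
					       int *computed)
{
	assert(name && out && outsz && computed);
	*computed = 0;
	if (configured)
		return copy_path(out, outsz, "", configured);
	*computed = 1;
	return copy_path(out, outsz, "modules/", name);
}
```

Which of the two entry points a given call site should use is left to the caller in this sketch.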
I think I understand where you're coming from.

Even before my patches, the unmodified submodule_name_to_gitdir() is
used for both new (non-existing) and old (existing) submodules.

It has no way of knowing whether a submodule exists, whether it should
exist, or whether a new name is required for a new clone, which will
eventually exist in the future.

If I understood your suggestion correctly, you just need an additional
check that verifies whether the path pointed to by "submodule.%s.gitdir"
is an existing gitdir, and errors out if not?

Or did I misunderstand your suggestion and you mean to bail out if the
config key is missing entirely for any submodule when the extension is
enabled?

That would imply a manual migration by the user, which is something both
Aaron and Junio asked me to avoid. Josh also said they want to avoid
setting any kind of config keys (or distributing configs). That's why
I also added the compile-time extension option, to ease the transition,
together with the "retry-on-fallback" approach for setting the config.

I am in favor of implementing the split you suggested. However, if you
mean the latter rather than the former, how do we automatically figure
out whether a name's gitdir **should** exist? :)
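
For the former reading (checking that the configured path points at an existing gitdir), the check could be as small as the POSIX sketch below. `looks_like_gitdir()` is a hypothetical helper and the test it applies (a directory containing a regular `HEAD` file) is a deliberately crude heuristic chosen for illustration; Git's own notion of a valid gitdir is stricter.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if `path` is a directory that contains a regular file
 * named "HEAD", 0 otherwise. Heuristic only, for illustration.
 */
static int looks_like_gitdir(const char *path)
{
	char head[4096];
	struct stat st;
	int len;

	assert(path);
	if (stat(path, &st) || !S_ISDIR(st.st_mode))
		return 0;

	len = snprintf(head, sizeof(head), "%s/HEAD", path);
	if (len < 0 || (size_t)len >= sizeof(head))
		return 0;

	return !stat(head, &st) && S_ISREG(st.st_mode);
}
```

A caller that expects the submodule to exist would error out when this returns 0, while a caller preparing a fresh clone would skip the check.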