Re: Confusing treatment of "head" in worktree on case-insensitive FS

From: Jeff King <hidden>
Date: 2024-07-01 18:28:26

On Mon, Jul 01, 2024 at 02:17:21PM +0100, Phillip Wood wrote:

On 01/07/2024 04:31, Jeff King wrote:

On Sat, Jun 29, 2024 at 10:39:29AM -0400, Julia Evans wrote:

$ git init
$ git commit --allow-empty -m'test'
$ git worktree add /tmp/myworktree
$ cd /tmp/myworktree
$ git commit --allow-empty -m'test'
$ git rev-parse head
adf59ca8da0ee5c4eb455f87efecc6c79eaf030f
$ git rev-parse hEAd
adf59ca8da0ee5c4eb455f87efecc6c79eaf030f
$ git rev-parse HEAD
fbb28196d08d74aa61f65e5cee3cb11cc24c338a

I admit this is an unexpected case, as I'd expect both on-disk files to
be spelled "HEAD". I didn't dig into the details, though, so it's
possible there's something we could be doing differently or better. But
I do think it's mostly the tip of the iceberg for case-insensitivity
issues with refs.

I think what's happening is that the checks in is_current_worktree_ref() are
case sensitive so "head" is not treated as local to the current worktree and
ends up being resolved in the main worktree. I guess we could make those
checks case-insensitive but as you say it'd only be dealing with the tip of the
iceberg.

Ah, right, that makes perfect sense (well, why it happens that way, not
from the perspective of a user :) ).

So one thing we could do (but I am not sure is wise) is for those checks
to become case-insensitive for a case-insensitive ref store. And then at
least if you use consistent case when writing refs, you should get
reasonable behavior (whereas if you make "hEaD" and "HEAD" yourself, all
bets are off). But I'd worry about opening up even more weird corner
cases. And you can already avoid this problem (I think) by using the
case-sensitive spelling "HEAD" on lookups.
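
To make the mechanism discussed above concrete, here is a deliberately simplified shell model of a case-sensitive "is this ref local to the current worktree?" check. It is a sketch only, not Git's actual code: the function name ref_scope is hypothetical. The rule that root refs are spelled with uppercase letters, '-' and '_' is an assumption, and so is the list of per-worktree namespaces (refs/bisect/, refs/worktree/, refs/rewritten/), which is taken from the git-worktree documentation.

```shell
# Simplified model, NOT Git's implementation: report whether a ref name
# would be looked up in the current worktree's own directory ("worktree")
# or in the shared main repository directory ("shared").
# Assumption: a root ref such as HEAD contains only uppercase letters,
# '-' and '_'; anything else outside the per-worktree namespaces is shared.
ref_scope() {
    case "$1" in
        refs/bisect/*|refs/worktree/*|refs/rewritten/*)
            echo worktree ;;
        ''|*[![:upper:]_-]*)
            # Empty, or contains a character that a root ref may not have.
            echo shared ;;
        *)
            echo worktree ;;
    esac
}

for name in HEAD head hEAd refs/heads/main refs/bisect/bad; do
    printf '%s -> %s\n' "$name" "$(ref_scope "$name")"
done
```

Under this model "head" and "hEAd" are classified as shared, so the lookup goes to the main worktree's directory, where a case-insensitive filesystem happily opens the file spelled "HEAD". Only the exact spelling "HEAD" is classified as per-worktree, which is consistent with the differing hashes in the original report.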

On a related note, do macOS and Windows do any Unicode normalization when
looking up filenames? If so then that's probably another can of worms for
filesystem based refs.

At least macOS does. That's why we have all of the precompose-unicode
code, which tries to normalize arguments to match what the OS will do.
In theory we could do something like that for case normalizing, but I
don't think it's nearly as simple.

For a read, normalizing "head" to "HEAD" on a case-insensitive
filesystem is OK, since the OS will return the same set for each group.
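
As a rough illustration of what read-side normalization could look like, here is a minimal shell sketch. The helper name normalize_root_ref_for_read is hypothetical and nothing like it exists in Git. It also assumes the normalization would be limited to names without a "/", so that ordinary refs under refs/ keep whatever case the user typed.

```shell
# Hypothetical helper, not part of Git: normalize a ref name for LOOKUP only.
# Names without a "/" can only be root refs (HEAD and friends), so they are
# upper-cased; anything containing a "/" is passed through unchanged.
normalize_root_ref_for_read() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;
        *)   printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' ;;
    esac
}

normalize_root_ref_for_read head
normalize_root_ref_for_read hEaD
normalize_root_ref_for_read refs/heads/Main
```

This only covers reads. It does nothing about the write side, where the filesystem preserves whatever case was used to create the file.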

But writing is harder. The Unicode normalization in the filesystem is
not "preserving". So if I pass in a decomposed string, the filesystem is
going to silently convert it to the precomposed form anyway. But case is
usually preserving. So if I write "hEaD", I'll get that in the
filesystem, and not actually "HEAD". I dunno. Maybe it would be OK if we
did that only for root refs which would otherwise be forbidden. But it
really feels like opening up a can of complexity worms and corner cases.

-Peff
