Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
From: Jonathan Nieder <hidden>
Date: 2022-02-19 01:07:05
(cc-ing Jonathan Tan, Jose Lopes, and Jeff Hostetler, vfs experts) Hi Elijah, Elijah Newren wrote[1]:
The fix is short (~30 lines), but the description is not. Sorry.
There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state. This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts. But first, we need to understand the
problems this class presents. A quick outline:
* Problems
* User facing issues
* Problem space complexity
* Maintenance and code correctness challenges
* SKIP_WORKTREE expectations in Git
* Suggested solution
* Pros/Cons of suggested solution
* Notes on testcase modificationsThanks for a clear explanation! This is very helpful.
=== User facing issues ===
There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index. This may come from:
* various git commands not really supporting the SKIP_WORKTREE bit[1,2]
* users grabbing files from elsewhere and writing them to the worktree
(perhaps even cached in their editor)
* users attempting to "abort" a sparse-checkout operation with a
not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
working tree is not atomic)[3].
Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users' confusion.[...]
The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts.
This patch just reached "next", so at $DAYJOB a test for our vfsd[2]
noticed this change. The trick behind a Git-based virtual filesystem
is typically:
- since we control the filesystem, we can pretend all files are
already present. On access, we obtain the file content from the git
object store. On write, we update the sparse-checkout pattern so
that Git knows to start tracking the file.
- by keeping the sparse-checkout pattern narrow, we minimize the time
commands like "git status" need to spend looking for changes in
unmodified files. Controlling the filesystem means we don't need to
worry about changes to files that don't match that pattern (since
any modification would also trigger a sparse-checkout pattern
update).
If I understand the intent behind this change correctly, it's
incompatible with that trick. How would you recommend handling that?
In the not too far away future, I'd expect something like the "VFS
projection hook" to handle this use case, but in the meantime, I would
expect this change to break VFS for Git performance. A few options:
a. We could guard it with a config option. It would still be a
regression for VFS for Git users, but they'd be able to use the
config option to restore the expected behavior. (Or
alternatively, such a config option could be disabled by default,
but I suspect that would defeat the purpose described for the
patch.)
b. We could distinguish between the vfsd and the "interrupted and
forgot to update SKIP_WORKTREE bits in the index" cases somehow.
This sounds complex.
c. Something else?
(b) and (c) aren't sounding obviously good, so (a) seems tempting.
What do you think?
Thanks,
Jonathan
[1] https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ (local)
[2] see
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ (local)
for what I mean by "vfsd"