Re: [PATCH v4 3/4] submodule: support running in multiple worktree setup
From: Stefan Beller <hidden>
Date: 2016-07-26 18:23:06
On Tue, Jul 26, 2016 at 10:20 AM, Duy Nguyen [off-list ref] wrote:
On Tue, Jul 26, 2016 at 1:25 AM, Stefan Beller [off-list ref] wrote:
So what is the design philosophy in worktrees? How much independence does one working tree have?

git-worktree started out as an alternative for git-stash: hmm.. i need to make some changes in another branch, okay let's leave this worktree (with all its messy stuff) as-is, create another worktree, make those changes, then delete the worktree and go back here. There's already another way of doing that without git-stash: you clone the repo, fix your stuff, push back and delete the new repo.

I know I have not really answered your questions. But I think it gives an idea what are the typical use cases for multiple worktrees. How much independence would need to be decided case-by-case, I think.
Thanks!
So here is what I did:

* s/git submodule init/git submodule update --init/
* added a test_pause to the last test on the last line
* Then:

  $ find . |grep da5e6058
  ./addtest/.git/modules/submod/objects/08/da5e6058267d6be703ae058d173ce38ed53066
  ./addtest/.git/worktrees/super-elsewhere/modules/submod/objects/08/da5e6058267d6be703ae058d173ce38ed53066
  ./addtest/.git/worktrees/super-elsewhere/modules/submod2/objects/08/da5e6058267d6be703ae058d173ce38ed53066
  ./.git/objects/08/da5e6058267d6be703ae058d173ce38ed53066

The last entry is the "upstream" for the addtest clone, so that is fine. However inside the ./addtest/ (and its worktrees, which currently are embedded in there?) we only want to have one object store for a given submodule?

How to store stuff in .git is the implementation details that the user does not care about.
They do unfortunately. :(
Some teams here are trying to migrate from the repo[1] tool to submodules,
and they usually have large code bases. For example, the Android Open Source
Project[2], put into a superproject, has a .git dir size of 17G. The
17G are partitioned as follows:
.../.git$ du --max-depth=1 -h
44K ./hooks
32K ./refs
36K ./logs
17G ./modules
4.0K ./branches
8.0K ./info
4.7M ./objects
17G .
i.e. roughly all in submodules.
So our users do care about both what is on disk, as well
as what goes over the wire (network traffic).
My sudden interest in worktrees came up when I learned the
`--reference` flag for submodule operations is broken for
our use case, and instead of fixing the `--reference` flag,
I think the worktree approach is generally saner (i.e. with the
references you may have nasty gc issues IIUC, but in the
worktree world gc knows about all the working trees, detached
heads and branches.)
[1] https://source.android.com/source/developing.html
[2] https://android.googlesource.com/
As long as we keep the behavior the same (they can still "git submodule init" and stuff in the new worktree), sharing the same object store makes sense (pros: lower disk consumption, cons: none).
So I think the current workflow for submodules
may need some redesign anyway as the submodule
commands were designed with a strict "one working
tree only" assumption.
Submodule URLs are stored in 3 places:
A) In the .gitmodules file of the superproject
B) In the option submodule.<name>.url in the superproject's .git/config
C) In remote.origin.url in the submodule
A) is a recommendation from the superproject that makes it easier
for downstream to find and set up the whole thing.
You can ignore that if you want, though generally a caring
upstream provides good URLs here.
C) is where we actually fetch from (and hope it has all
the sha1s that are recorded as gitlinks in the superproject)
B) seems like a hack to enable the workflow as below:
Current workflow for handling submodule URLs:
1) Clone the superproject
2) Run git submodule init on desired submodules
3) Inspect .git/config to see if any submodule URL needs adaptation
4) Run git submodule update to obtain the submodules from
the configured place
5) In case the superproject changes the URL
-> git submodule sync, which overwrites the submodule.<name>.url in the
superproject's .git/config as well as configuring the
remote."$remote".url in the submodule
6) In case the user wants to change the URL
-> No single command solves it; possible workarounds: edit
.gitmodules and run git submodule sync, or configure submodule.<name>.url
in the superproject's .git/config as well as
remote."$remote".url in the submodule separately. Just changing the
submodule's remote works just as well (until you remove and re-clone
the submodule)
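To make steps 1) to 5) concrete, here is a rough sketch using throwaway local repositories. The names (sub-upstream, super-upstream, sub-moved.git, work, submod) and the g helper are made up for the demo; the helper only allows local-path submodule clones, which recent git versions refuse by default.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commits (assumption: none may be configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: allow local-path submodule clones for this demo only.
g() { git -c protocol.file.allow=always "$@"; }

# Throwaway "upstream" repositories.
git init -q sub-upstream
echo hello >sub-upstream/file
git -C sub-upstream add file
git -C sub-upstream commit -q -m "sub: initial"
git init -q super-upstream
g -C super-upstream submodule --quiet add "$tmp/sub-upstream" submod
git -C super-upstream commit -q -m "super: add submod"

# 1) Clone the superproject.
git clone -q super-upstream work
cd work
# 2) Register the submodule; this copies the URL from .gitmodules to .git/config.
git submodule --quiet init submod
# 3) Inspect the URL; "git config submodule.submod.url <new>" would adapt it.
git config submodule.submod.url
# 4) Obtain the submodule from the configured place.
g submodule --quiet update

url_a=$(git config -f .gitmodules submodule.submod.url)  # A) recommendation
url_b=$(git config submodule.submod.url)                 # B) superproject config
url_c=$(git -C submod config remote.origin.url)          # C) actual fetch source

# 5) Upstream moves the submodule and updates .gitmodules; we pull and sync.
git clone -q --bare "$tmp/sub-upstream" "$tmp/sub-moved.git"
git -C "$tmp/super-upstream" config -f .gitmodules submodule.submod.url "$tmp/sub-moved.git"
git -C "$tmp/super-upstream" commit -q -am "super: submod moved"
g pull -q --ff-only
git submodule --quiet sync

new_a=$(git config -f .gitmodules submodule.submod.url)
new_b=$(git config submodule.submod.url)
new_c=$(git -C submod config remote.origin.url)
echo "A: $new_a"; echo "B: $new_b"; echo "C: $new_c"
```

After the sync, B) and C) should both follow the new value in A); without it, only .gitmodules changes and the submodule keeps fetching from the old place.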
One could imagine another workflow:
1) clone the superproject, which creates empty repositories for the
submodules
(2) from the prior workflow is gone
3) instead of inspecting .git/config you can directly manipulate the
remote.$remote.url configuration in the submodule.
4) Run git submodule update to obtain the submodules from
the configured place
The current workflow is set up that way because historically you had
the submodule's .git dir inside the submodule, which would be gone
if you deleted a submodule. So if you later checked out an earlier version
that had a submodule, you were missing the objects and, more importantly,
the configuration of where to get them from.
This is now fixed by keeping the actual submodule's git dir inside
the superproject's git dir.
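A small sketch of that layout, again with made-up local repositories (sub-upstream, super-upstream, work, submod) and a hypothetical g helper that permits local-path submodule clones. It uses deinit to drop the submodule checkout and then checks that the git dir under .git/modules is still there.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commits (assumption: none may be configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: allow local-path submodule clones for this demo only.
g() { git -c protocol.file.allow=always "$@"; }

git init -q sub-upstream
echo hello >sub-upstream/file
git -C sub-upstream add file
git -C sub-upstream commit -q -m "sub: initial"
git init -q super-upstream
g -C super-upstream submodule --quiet add "$tmp/sub-upstream" submod
git -C super-upstream commit -q -m "super: add submod"

git clone -q super-upstream work
cd work
g submodule --quiet update --init

# The submodule's .git is a small pointer file, not a directory ...
cat submod/.git
# ... and the real git dir lives inside the superproject's git dir.
ls .git/modules

# Drop the checkout and the submodule.submod.* config entries.
git submodule --quiet deinit -f submod
url_after=$(git config submodule.submod.url || true)
if [ -d .git/modules/submod/objects ]; then kept=yes; else kept=no; fi
echo "url after deinit: '${url_after}', git dir kept: $kept"

# Bring the submodule back; the kept git dir is still around for this.
g submodule --quiet update --init
```

Whether the second update really avoids all network traffic is not something this sketch verifies; it only shows that the objects and the submodule's own configuration survive the removal of the working tree.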
After playing with this series a bit more, I actually like the UI as it is an easy mental model "submodules behave completely independent". However in 3/4 you said:

+ - `submodule.*` in current state should not be shared because the
+   information is tied to a particular version of .gitmodules in a
+   working directory.

This is already a problem with say different branches/versions. That has been solved by duplicating that information to .git/config as a required step. (I don't like that approach, as it is super confusing IMHO)

Hmm.. I didn't realize this. But then I have never given much thought about submodules, probably because I have an alternative solution for it (or some of its use cases) anyway :)
What is that?
OK so it's already a problem. But if we keep sharing submodule stuff in .git/config, there's a _new_ problem: when you "submodule init" a worktree, .git/config is now tailored for the current worktree, when you move back to the previous worktree, you need to "submodule init" again.
"Moving back" sounds like you use the worktree feature for short lived things only. (e.g. in the man page you refer to the hot fix your boss wants you to make urgently).

I thought the worktree feature is more useful for long term "branches", e.g. I have one worktree of git now that tracks origin/master so I can use that to "make install" to repair my local broken version of git. (I could have a worktree "continuous integration", where I only run tests in. I could have one worktree for Documentation changes only.)

This long lived stuff probably doesn't make sense for a single repository, but in combination with submodules (which is another way to approach the "sparse/narrow" desire of a large project), I think that makes sense, because the "continuous integration" worktree shares a lot of submodules with my "regular everyday hacking" or the "I need to test my colleague's work now" worktree.
So moving to multiple worktrees setup changes how the user uses submodule, not good in my opinion.
Because the submodule user API is built on the strong assumption of "one working tree only", we have to at least slightly adapt. So instead of cloning a submodule in a worktree, we could just set up a submodule worktree there as well? (i.e. that saves us network as well as disk)
If you have a grand plan to make submodule work at switching branches (without reinit) and if it happens to work the same way when we have multiple worktrees, great.
Eh, I am still working on the master plan. ;) The insights on how worktrees handle stuff help me shape it though. :)

If you switch a branch (or to any sha1), the submodule currently stays "as-is" and may be updated using "submodule update", which goes through the list of existing (checked out) submodules and checks them out to the sha1 pointed to by the superproject's gitlink.
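A rough illustration of that behavior, with throwaway local repositories (sub-upstream, super-upstream, work, submod are made-up names) and a hypothetical g helper that permits local-path submodule clones. It assumes submodule.recurse is not enabled, which the checkout below forces explicitly.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commits (assumption: none may be configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: allow local-path submodule clones for this demo only.
g() { git -c protocol.file.allow=always "$@"; }

# Superproject history: first records the submodule at "one", then at "two".
git init -q sub-upstream
echo one >sub-upstream/file
git -C sub-upstream add file
git -C sub-upstream commit -q -m "sub: one"
git init -q super-upstream
g -C super-upstream submodule --quiet add "$tmp/sub-upstream" submod
git -C super-upstream commit -q -m "super: record submod at one"
echo two >sub-upstream/file
git -C sub-upstream commit -q -am "sub: two"
g -C super-upstream/submod pull -q --ff-only
git -C super-upstream commit -q -am "super: record submod at two"

git clone -q super-upstream work
cd work
g submodule --quiet update --init
tip=$(git -C submod rev-parse HEAD)

# Switch the superproject to the older commit; the submodule is left alone.
git -c submodule.recurse=false checkout -q HEAD~1
after_switch=$(git -C submod rev-parse HEAD)

# "submodule update" checks out the sha1 recorded in the gitlink.
g submodule --quiet update
after_update=$(git -C submod rev-parse HEAD)
recorded=$(git rev-parse HEAD:submod)

echo "after switch: $after_switch"
echo "after update: $after_update (gitlink: $recorded)"
```

Between the checkout and the update, the superproject reports the submodule as modified, which is the "as-is" state described above.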
I am back to the drawing board for the submodule side of things, but I guess this series could be used once we figure out how to have just one object database for a submodule.

I would leave this out for now. Let's make submodule work with multiple worktrees first (and see how the users react to this). Then we can try to share object database. Object database and refs are tied closely together so you may run into other problems soon.
I see. The normal for submodules is to be in detached HEAD though. The user can of course checkout branches or things in there, but the "submodule update" operations do not go to a branch for you.

----

Another (slightly offtopic) observation on the similarity of worktree and submodules: There is no good way implemented to remove one.

For submodules there is deinit, which removes both the working tree and the configuration indicating the existence (Note: the git dir still exists for the submodule). That sounds like what we need to save us network traffic the next time we need the submodule, although going through the code I need to test that a bit more later today to see how fail safe it is. On the submodule side, it often gets confusing what you want to remove (local checkout of the submodule, or the gitlink or both).

For worktrees there is no "worktree rm" as it would probably promise a bit more than the man page's suggestion of rm -rf $worktree && git worktree prune.

Thanks,
Stefan
-- Duy