Thread (10 messages), 4 authors, 2022-09-23

Re: Question relate to collaboration on git monorepo

From: ZheNing Hu <hidden>
Date: 2022-09-23 14:32:23

Elijah Newren [off-list ref] wrote on Thu, Sep 22, 2022 at 07:36:
On Wed, Sep 21, 2022 at 8:22 AM ZheNing Hu [off-list ref] wrote:
quoted
Emily Shaffer [off-list ref] wrote on Wed, Sep 21, 2022 at 02:53:
quoted
On Tue, Sep 20, 2022 at 5:42 AM ZheNing Hu [off-list ref] wrote:
quoted
Hey, guys,

If two users of a git monorepo are working on different subprojects,
/project1 and /project2, using partial clone and sparse-checkout, and
user one pushes first, then when user two wants to push too, he must
pull some blobs that were pushed by user one. I guess their repo size
will gradually grow with objects from the other project, so is there any
way to delete unnecessary blobs outside the working project (the
sparse-checkout filterspec), or to make git pull not fetch these
unnecessary blobs at all?
This is exactly what the combination of partial clone and sparse
checkout is for!

Dev A is working on project1/, and excludes project2/ from her sparse
filter; she also cloned with `--filter=blob:none`.
Dev B is working on project2/, and excludes project1/ from his sparse
filter, and similarly is using the blob:none partial clone filter.
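
A minimal sketch of the setup described above for Dev B, assuming a throwaway local repository stands in for the real server. The paths under $work, the file contents, and the demo identity are hypothetical; the two uploadpack settings are only there so that a local file:// remote will serve a filtered clone and later on-demand blob requests.

```shell
# Hypothetical local "server" so the sketch is self-contained.
work=$(mktemp -d)
git init -q "$work/server"
# Allow clients to request --filter and to fetch single objects on demand.
git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true
mkdir "$work/server/project1" "$work/server/project2"
echo one > "$work/server/project1/file1"
echo two > "$work/server/project2/file2"
git -C "$work/server" add .
git -C "$work/server" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial import"

# Dev B: blobless partial clone, working tree limited to project2/.
git clone -q --filter=blob:none --sparse "file://$work/server" "$work/devB"
git -C "$work/devB" sparse-checkout set project2
```

Dev A's clone would look the same with project1 passed to sparse-checkout instead.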

Assuming everybody is contributing by direct push, and not using a
code review tool or something else which handles the push for them...
Dev A finishes first, and pushes.
Dev B needs to pull, like you say - but during that pull he doesn't
need to fetch the objects in project1, because they're excluded by the
combination of his partial clone filter and his sparse checkout
pattern. The pull needs to happen because there is a new commit which
Dev B's commit needs to treat as a parent, and so Dev B's client needs
to know the ID of that commit.
I don't agree here; it does fetch the blobs during git pull. So I made
a small change to the previous test:

(
  cd m2
  git cat-file --batch-check --batch-all-objects | grep blob | wc -l > blob_count1
#  git push
#  git -c pull.rebase=false pull --no-edit #no conflict
  git fetch origin main
  git cat-file --batch-check --batch-all-objects | grep blob | wc -l > blob_count2
  git merge --no-edit origin/main
  git cat-file --batch-check --batch-all-objects | grep blob | wc -l > blob_count3
  printf "blob_count1=%s\n" $(cat blob_count1)
  printf "blob_count2=%s\n" $(cat blob_count2)
  printf "blob_count3=%s\n" $(cat blob_count3)
)
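
The blob-count pipeline repeated in the test above could be wrapped in a small helper. This is only a sketch: count_blobs is a hypothetical name, and the throwaway repository in $demo exists just to exercise it.

```shell
# Count the blob objects present in the local object store of a repository.
# In a partial clone, blobs that are still missing are not listed.
count_blobs() {
  git -C "$1" cat-file --batch-check --batch-all-objects | grep -c ' blob '
}

# Exercise the helper on a scratch repository holding two distinct files.
demo=$(mktemp -d)
git init -q "$demo"
echo one > "$demo/a"
echo two > "$demo/b"
git -C "$demo" add a b   # staging writes the blob objects
echo "blobs=$(count_blobs "$demo")"
```

Calling it before and after each step (fetch, merge) gives the same three numbers the test writes to blob_count1..3.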

warning: This repository uses promisor remotes. Some objects may not be loaded.
remote: Enumerating objects: 32, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 30 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (30/30), 2.61 KiB | 2.61 MiB/s, done.
From /Users/adl/./mono-repo
 * branch            main       -> FETCH_HEAD
   a6a17f2..16a8585  main       -> origin/main
warning: This repository uses promisor remotes. Some objects may not be loaded.
Merge made by the 'ort' strategy.
Note: The merge completed successfully, and we see no evidence of
additional blobs being downloaded before this point.
Agreed. Debugging shows that this is not caused by the merge itself,
but by the finish() stage of git merge, which fetches missing objects
in order to show the diffstat.

(lldb) b fetch_objects
Breakpoint 1: where = git`fetch_objects + 29 at
promisor-remote.c:18:23, address = 0x0000000100275f4d
(lldb) r
Process 62227 launched: '/Users/adl/repos/git/git' (x86_64)
Merge made by the 'ort' strategy.
Process 62227 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100275f4d
git`fetch_objects(repo=0x0000000100406a88, remote_name="origin",
oids=0x0000000101204360, oid_nr=1) at promisor-remote.c:18:23
   15  const struct object_id *oids,
   16  int oid_nr)
   17  {
-> 18  struct child_process child = CHILD_PROCESS_INIT;
   19  int i;
   20  FILE *child_in;
   21
Target 0: (git) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100275f4d
git`fetch_objects(repo=0x0000000100406a88, remote_name="origin",
oids=0x0000000101204360, oid_nr=1) at promisor-remote.c:18:23
    frame #1: 0x0000000100275ea3
git`promisor_remote_get_direct(repo=0x0000000100406a88,
oids=0x0000000101204360, oid_nr=1) at promisor-remote.c:249:7
    frame #2: 0x00000001001a2fe3
git`diff_queued_diff_prefetch(repository=0x0000000100406a88) at
diff.c:6781:2
    frame #3: 0x00000001001a3075
git`diffcore_std(options=0x00007ff7bfefed20) at diff.c:6805:3
    frame #4: 0x000000010009ca11
git`finish(head_commit=0x000000010151f000,
remoteheads=0x0000600000004390, new_head=0x00007ff7bfeff030,
msg="Merge made by the 'ort' strategy.") at merge.c:499:3
    frame #5: 0x000000010009d787
git`finish_automerge(head=0x000000010151f000, head_subsumed=0,
common=0x0000600000004330, remoteheads=0x0000600000004390,
result_tree=0x00007ff7bfeff280, wt_strategy="ort") at merge.c:960:2
    frame #6: 0x000000010009b07b git`cmd_merge(argc=1,
argv=0x00007ff7bfeff660, prefix=0x0000000000000000) at merge.c:1743:9
    frame #7: 0x0000000100005573 git`run_builtin(p=0x00000001003e0e60,
argc=3, argv=0x00007ff7bfeff660) at git.c:466:11
    frame #8: 0x0000000100004098 git`handle_builtin(argc=3,
argv=0x00007ff7bfeff660) at git.c:721:3
    frame #9: 0x0000000100004f76
git`run_argv(argcp=0x00007ff7bfeff4dc, argv=0x00007ff7bfeff4d0) at
git.c:788:4
    frame #10: 0x0000000100003e69 git`cmd_main(argc=3,
argv=0x00007ff7bfeff660) at git.c:921:19
    frame #11: 0x000000010011e8f6 git`main(argc=4,
argv=0x00007ff7bfeff658) at common-main.c:56:11
    frame #12: 0x00000001005b94fe dyld`start + 462
quoted
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 87 bytes | 87.00 KiB/s, done.
Here, we do have an object download, which occurred after the merge
completed, so there must be something happening after the merge which
needs the extra blob; if we keep reading...
quoted
 project1/file1 | 10 ++++++++++
 1 file changed, 10 insertions(+)
Ah, the 'helpful' diffstat.  It downloads blobs from a promisor remote
just so we can see what has changed, including in the area of the
project we don't care about.

(This is yet another reason it'd be nice to have a --restrict mode for
grep/diff/log/etc. for sparse-checkout uses, and an ability to make it
the default in some repo, so you could get just the diffstat within
the region of the project that you care about.  We're discussing such
an idea, but it isn't implemented yet.)
quoted
warning: This repository uses promisor remotes. Some objects may not be loaded.
blob_count1=11
blob_count2=11
blob_count3=12

The result shows that the blob count does not change during git fetch, but does during git merge.
If you add --no-stat to your merge command (or set merge.stat to
false), the extra blob will not be downloaded.
After setting merge.stat to false, the problem is solved. Thanks a lot!
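
For reference, a minimal sketch of the two options mentioned above, assuming a scratch repository in $repo rather than the real monorepo clone:

```shell
# Scratch repository standing in for the partial clone.
repo=$(mktemp -d)
git init -q "$repo"

# Persistent: never show the diffstat after merges in this repository.
git -C "$repo" config merge.stat false
stat_setting=$(git -C "$repo" config --get merge.stat)
echo "merge.stat=$stat_setting"

# One-off alternative for a single merge (run inside the real clone):
#   git merge --no-stat --no-edit origin/main
```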

ZheNing Hu