Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`

[PATCH v4 0/7] merge-ort: implement support for packing objects together · Taylor Blau <hidden> · 2023-10-19
[PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source` · Taylor Blau <hidden> · 2023-10-19
Re: [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source` · Jeff King <hidden> · 2023-10-20
[PATCH v4 2/7] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types · Taylor Blau <hidden> · 2023-10-19
[PATCH v4 3/7] bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source` · Taylor Blau <hidden> · 2023-10-19
[PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` · Taylor Blau <hidden> · 2023-10-19
Re: [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` · Patrick Steinhardt <hidden> · 2023-10-23
Re: [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` · Jeff King <hidden> · 2023-10-23
Re: [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` · Patrick Steinhardt <hidden> · 2023-10-24
[PATCH v4 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` · Taylor Blau <hidden> · 2023-10-19
[PATCH v4 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` · Taylor Blau <hidden> · 2023-10-19
[PATCH v4 7/7] builtin/merge-tree.c: implement support for `--write-pack` · Taylor Blau <hidden> · 2023-10-19
Re: [PATCH v4 0/7] merge-ort: implement support for packing objects together · Jeff King <hidden> · 2023-10-20
Re: [PATCH v4 0/7] merge-ort: implement support for packing objects together · Patrick Steinhardt <hidden> · 2023-10-23
[PATCH v5 0/5] merge-ort: implement support for packing objects together · Taylor Blau <hidden> · 2023-10-23
[PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` · Taylor Blau <hidden> · 2023-10-23
Re: [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` · Jeff King <hidden> · 2023-10-25
Re: [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` · Taylor Blau <hidden> · 2023-10-25
[PATCH v5 2/5] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types · Taylor Blau <hidden> · 2023-10-23
[PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` · Taylor Blau <hidden> · 2023-10-23
Re: [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` · Patrick Steinhardt <hidden> · 2023-10-25
Re: [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` · Taylor Blau <hidden> · 2023-10-25
Re: [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` · Eric Sunshine <hidden> · 2023-10-25
Re: [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` · Patrick Steinhardt <hidden> · 2023-10-26
Re: [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` · Elijah Newren <hidden> · 2023-11-11
[PATCH v5 4/5] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` · Taylor Blau <hidden> · 2023-10-23
[PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Taylor Blau <hidden> · 2023-10-23
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Patrick Steinhardt <hidden> · 2023-10-25
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Taylor Blau <hidden> · 2023-10-25
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Elijah Newren <hidden> · 2023-11-10
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Taylor Blau <hidden> · 2023-11-11
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Jeff King <hidden> · 2023-11-13
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Elijah Newren <hidden> · 2023-11-14
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Jeff King <hidden> · 2023-11-13
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Taylor Blau <hidden> · 2023-11-13
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Elijah Newren <hidden> · 2023-11-14
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Jeff King <hidden> · 2023-11-14
Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` · Jeff King <hidden> · 2023-11-14
Re: [PATCH v5 0/5] merge-ort: implement support for packing objects together · Patrick Steinhardt <hidden> · 2023-10-25

From: Taylor Blau <hidden>
Date: 2023-11-13 22:34:46

On Mon, Nov 13, 2023 at 05:02:54PM -0500, Jeff King wrote:

On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote:

quoted

This is unsafe; the object may need to be read later within the same
merge. [...]

I think (untested) that you could fix this by creating two packs
instead of just one.  In particular, add a call to
flush_odb_transcation() after the "redo_after_renames" block in
merge_ort_nonrecursive_internal().

It might not be too hard to just let in-process callers access the
objects we've written. A quick and dirty patch is below, which works
with the test you suggested (the test still fails because it finds a
conflict, but it gets past the "woah, I can't find that sha1" part).

That's a very slick idea, and I think that this series has some legs to
stand on now as a result.

There are a few of interesting conclusions we can draw here:

  1. This solves the recursive merge problem that Elijah mentioned
     earlier where we could generate up to 2^N packs (where N is the
     maximum depth of the recursive merge).

  2. This also solves the case where the merge-ort code needs to read an
     object that it wrote earlier on in the same process without having
     to flush out intermediate packs. So in the best case we need only a
     single pack (instead of two).

  3. This also solves the 'replay' issue, *and* allows us to avoid the
     tmp-objdir thing there entirely, since we can simulate object reads
     in the bulk-checkin pack.

So I think that this is a direction worth pursuing.

In terms of making those lookups faster, I think that what you'd want is
an oidmap. The overhead is slightly unfortunate, but I think that any
other solution which requires keeping the written array in sorted order
would in practice be more expensive as you have to constantly reallocate
and copy portions of the array around to maintain its invariant.

I don't know if that is sufficient, though. Would other spawned
processes (hooks, external merge drivers, and so on) need to be able to
see these objects, too?

Interesting point. In theory those processes could ask about newly
created objects, and if they were spawned before the bulk-checkin pack
was flushed, those lookups would fail.

But that requires that, e.g. for hooks, that we know a-priori the object
ID of some newly-written objects. If you wanted to make those lookups
succeed, I think there are a couple of options:

  - teach child-processes about the bulk-checkin pack, and let them
    perform the same fake lookup

  - flush (but do not close) the bulk-checkin transaction

In any event, I think that this is a sufficiently rare and niche case
that we'd be OK to declare that you should not expect the above
scenarios to work when using `--write-pack`. If someone does ask for
that feature in the future, we could implement it relatively painlessly
using one of the above options.

Thanks,
Taylor

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help