Thread (64 messages) 64 messages, 18 authors, 2020-05-16

[TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity?

From: James Ramsay <hidden>
Date: 2020-03-12 04:11:41

1. Elijah: ORT stands for Ostensibly Recursive’s Twin. As a merge 
strategy, just like you can call ‘git merge -s recursive’ you can 
call ‘git merge -s ort’.  Git’s option parsing doesn’t require 
the space after the ‘-s’.

2. Major question is about performance & possible layering violations. 
Merge recursive calls unpacks trees to walks trees, then needs to get 
all file and directory names and so walks the trees on the right again, 
and then the trees on the left again. Then diff needs to walk sets of 
trees twice, and then insert_stage_data() does a bunch more narrow tree 
walks for rename detection. Lots of tree walking. Replaced that with two 
tree walks.

3. Using traverse_trees() instead of unpack_trees(), and avoid the index 
entirely (not even touching or creating cache_entry’s), and building 
up information as I need. I’m not calling diffcore_std(), but instead 
directly calling diffcore_rename(). Is this horrifying? Or is it 
justified by the performance gains?

4. Peff: both, some of it sounds like an improvement, but maybe there 
were hidden benefits previously.

5. Elijah: I write to a tree before I do anything.

6. Peff: I like that. Seems like a clean up to me. We have written 
libgit2-like code for merging server-side

7. Elijah: I’ve been adding tests for the past few years, more to add, 
feel good about it.

8. Jonathan N: If you are using a lower-layer thing, I would not say 
you’re not doing anything you shouldn’t. But if you docs say you 
should not to use diffcore_rename(), you can update the docs to say that 
it’s fine to use it.

9. Elijah: three places directly write tree objects. All have different 
data structures they are writing from. Should I pull them out? But then 
my data structure was also different, so I’d have a fourth.

10. Peff: not worried because trees are simple. Worried about policy 
logic. Can’t write a tree entry with a double slash. Want this to be 
enforced everywhere, but no idea how hard that would be to implement. 
Not about lines of code, but consistency of policy. Fearful that only 
one place does it.

11. Elijah: I know merge-ort checks this, but it’s not nearby, so it 
could change.

12. Peff: as bad as it is to round trip through the index, it may bypass 
quality checks, which you will need to manually implement.

13. Elijah: usability side, with the tree I’ve created, I could have 
.git/AUTOMERGE. I have an old tree, a new tree, and a checkout can get 
me there. Fixed a whole bunch of bugs for sparsity and submodules.

14. Elijah: If we use this to on-the-fly remerge as part of git-log in 
order to compare the merge commit to what the automatic merging would 
have done, where/how should we write objects as we go?

15. Jonathan N: can end up with proliferation of packs, would be nice to 
have similar to fast import and have in memory store. Dream not to have 
loose files written ever.

16. Peff: I like your dream. But fast import packs are bad. We assume 
that packs are good, and thus need to use GC aggressively. This 
increases pollution of that problem. I know about objects, but not 
written to disc, risk that you can write objects that are broken, but 
git doesn’t know because git thinks it has the object but it’s only 
in memory. Log is conceptually a read operation, but this would create 
the need for writes.

17. Elijah: you could write into a temporary directory. Worried about 
`gc --auto` in the middle of my operation. If I write to a temp pack I 
could potentially avoid it.

18. Elijah: large files. Rename detection might not work efficiently OR 
correctly for sufficiently large files (binary or not). Limited bucket 
size means that completely different files treated as renames when both 
are over 8MB. Should big files just not be compared?

19. Peff: maybe we should fix the hash…

20. Elijah: present situation is broken, maybe we can cheat in the short 
term, and avoid fixing?

21. Peff: seems more correct for now, but we’d need to document

22. Elijah: checkout --overwrite-ignore flag. Should merge have the same 
flag.

23. Jonathan N: gitignore original use case was build outputs which can 
be regenerate. But then some people want to ignore `.hg` which is much 
more precious.

24. Peff: we can plumb it through later to other commands

25. Brian: CI doesn’t really care. Moving between branches it would 
complain. For checkout and merge it makes sense to support just 
destroying.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help