Thread: 63 messages, 7 authors, 2019-10-12

Re: [RFC PATCH v10 32.5/36] unpack_trees: fix memory corruption with split_index when src != dst

From: Duy Nguyen <hidden>
Date: 2018-04-22 12:39:04

On Sat, Apr 21, 2018 at 9:37 PM, Elijah Newren [off-list ref] wrote:
> Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> Since we create a temporary index in o->result, then discard o->dst_index
> and overwrite it with o->result, when o->src_index == o->dst_index it is
> safe to just reuse o->src_index's split_index for o->result.  However,
> o->src_index and o->dst_index are specified separately in order to allow
> callers to have these be different.  In such a case, reusing
> o->src_index's split_index for o->result will cause the split_index to be
> shared.  If either index then has entries replaced or removed, it will
> result in the other index referring to free()'d memory.
>
> Signed-off-by: Elijah Newren <redacted>
> ---
>
> I still haven't wrapped my head around the split_index stuff entirely, so
> it's possible that
>
>   - the performance optimization isn't even valid when src == dst.  Could
>     the original index be different enough from the result that we don't
>     want its split_index?

This really depends on the use case, of course. But when git checkout
is used for switching branches, unpack-trees will be used, and unless
you switch between two vastly different branches, the updated entries
are likely to be few compared to the entire index, so sharing is still
good. If the result index is so different that it results in a huge
index file anyway, I believe we have code to recreate a new shared
index to keep its size down next time.

>   - there's a better, more performant fix or there is some way to actually
>     share a split_index between two independent index_state objects.

A cleaner way of doing this would be something along the lines of [1]

    move_index_extensions(&o->result, o->dst_index);

near the end of this function. This could be where we compare the
result index with the source index's shared file and see if it's worth
keeping the shared index or not. Shared index is designed to work with
huge index files, though, so any operation that goes through all index
entries will usually not be cheap. But at least it's safer.
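
As a rough illustration of that comparison, here is a toy sketch (not git
code). It assumes entries can be compared position by position, which real
index entries sorted by path cannot once something is inserted or removed,
and it models entries as plain ints. The names toy_worth_sharing and
max_percent_change are hypothetical. Note that it walks every entry, which
is exactly the cost concern mentioned above.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of git: decide whether a result index
 * is still close enough to a shared base to keep sharing it.
 * Returns 1 to keep sharing, 0 otherwise.
 */
static int toy_worth_sharing(const int *base, size_t base_nr,
			     const int *result, size_t result_nr,
			     unsigned max_percent_change)
{
	size_t common = base_nr < result_nr ? base_nr : result_nr;
	size_t changed = 0;
	size_t i;

	if (!result_nr)
		return 0;	/* nothing left to share */

	/* Positions present on both sides count as changed if they differ. */
	for (i = 0; i < common; i++)
		if (base[i] != result[i])
			changed++;

	/* Entries present on only one side always count as changed. */
	changed += (base_nr - common) + (result_nr - common);

	return changed * 100 <= (size_t)max_percent_change * result_nr;
}
```

With a 10-entry base and a 20 percent threshold, one changed entry keeps the
shared base while five changed entries do not.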

> However, with this fix, all the tests pass both normally and under
> GIT_TEST_SPLIT_INDEX=DareISayYes.  Without this patch, when
> GIT_TEST_SPLIT_INDEX is set, my directory rename detection series will fail
> several tests, as reported by SZEDER.

Yes, the change looks good.

[1] To me the second parameter should be src_index, not dst_index.
We're copying entries from _source_ index to "result" and we should
also copy extensions from the source index. That line happens to work
only when dst_index is the same as src_index, which is the common use
case so far.
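
To make the footnote concrete, here is a toy model with hypothetical names
(toy_index, toy_move_extensions), not git's real structs. Each index owns
one extension pointer and a move helper transfers it. Passing dst instead
of src gives the same result only when both point at the same index.

```c
#include <stddef.h>

/* Toy stand-in for an index that carries a single extension. */
struct toy_index {
	const char *ext;
};

/* Transfer ownership of the extension from one index to another. */
static void toy_move_extensions(struct toy_index *to, struct toy_index *from)
{
	to->ext = from->ext;
	from->ext = NULL;	/* "from" no longer owns it */
}
```

If src and dst are distinct objects, moving from dst hands the result
whatever dst happened to carry and leaves the source's extension behind.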

>  unpack-trees.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 79fd97074e..b670415d4c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>         o->result.timestamp.sec = o->src_index->timestamp.sec;
>         o->result.timestamp.nsec = o->src_index->timestamp.nsec;
>         o->result.version = o->src_index->version;
> -       o->result.split_index = o->src_index->split_index;
> -       if (o->result.split_index)
> +       if (!o->src_index->split_index) {
> +               o->result.split_index = NULL;
> +       } else if (o->src_index == o->dst_index) {
> +               /*
> +                * o->dst_index (and thus o->src_index) will be discarded
> +                * and overwritten with o->result at the end of this function,
> +                * so just use src_index's split_index for o->result to avoid
> +                * having to create a new one.
> +                */
> +               o->result.split_index = o->src_index->split_index;
>                 o->result.split_index->refcount++;
> +       } else {
> +               o->result.split_index = init_split_index(&o->result);
> +       }
>         hashcpy(o->result.sha1, o->src_index->sha1);
>         o->merge_size = len;
>         mark_all_ce_unused(o->src_index);
> --
> 2.17.0.296.gaac25b4b81
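
For anyone who wants to poke at the three cases in the hunk in isolation,
here is a toy model. All names (toy_index, toy_split_index,
toy_init_split_index, toy_setup_result) are hypothetical stand-ins, and
the structs are reduced to the refcount alone, so this is a sketch of the
branching logic under those assumptions rather than the real
implementation.

```c
#include <stdlib.h>

/* Toy stand-ins, NOT git's real structs. */
struct toy_split_index {
	int refcount;
};

struct toy_index {
	struct toy_split_index *split_index;
};

/* Give an index its own fresh split_index if it has none yet. */
static struct toy_split_index *toy_init_split_index(struct toy_index *istate)
{
	if (!istate->split_index) {
		istate->split_index = calloc(1, sizeof(*istate->split_index));
		if (!istate->split_index)
			abort();
		istate->split_index->refcount = 1;
	}
	return istate->split_index;
}

/* Mirror the three branches of the hunk on the toy types. */
static void toy_setup_result(struct toy_index *result,
			     const struct toy_index *src,
			     const struct toy_index *dst)
{
	if (!src->split_index) {
		/* Source is not split: result is not split either. */
		result->split_index = NULL;
	} else if (src == dst) {
		/* dst will be overwritten by result, so sharing is safe. */
		result->split_index = src->split_index;
		result->split_index->refcount++;
	} else {
		/* Independent destination: never alias src's split_index. */
		result->split_index = NULL;
		toy_init_split_index(result);
	}
}
```

In the src != dst case the result ends up with its own object at refcount 1,
and the source's refcount is left untouched.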


-- 
Duy