Re: [PATCH v3 16/23] ref-filter: make cat_file_info independent
From: Jeff King <hidden>
Date: 2018-02-16 21:54:51
On Thu, Feb 15, 2018 at 01:33:25PM +0300, Оля Тележная wrote:
2018-02-15 8:53 GMT+03:00 Jeff King [off-list ref]:quoted
On Mon, Feb 12, 2018 at 08:08:54AM +0000, Olga Telezhnaya wrote:quoted
Remove connection between expand_data variable in cat-file and in ref-filter. It will help further to get rid of using expand_data in cat-file.I have to admit I'm confused at this point about what is_cat_file is for, or even why we need cat_file_data. Shouldn't these items be handled by their matching ref-filter atoms at this point?We discussed that earlier outside of mailing list, and I even tried to implement that idea and spent a couple of days to prove that it's not possible. The problem is that the list of atoms is made dynamically, and we can't store pointers to any values in each atom. That's why we need separate cat_file_info variable that is outside of main atom list. We also need is_cat_file because we still have some part of logic that is different for cat-file and for all other commands, and sometimes we need to know that information.
OK, right, I forgot about the pointers thing. This is definitely the
sort of design decision that should go into the commit message.
I went over our off-list discussion again to refresh my memory. Let me
try to summarize a bit (for myself, or for other readerss). Naively, one
would hope that you could do something like:
1. While parsing the %(objectsize) atom, add a pointer like:
the_object_info->sizep = &atom.size;
2. Later when fulfilling the request for a specific object, if
the_object_info is not empty, then call sha1_object_info() with it
to hold the results.
3. Read the value out of atom.size when assembling the formatted
string.
But that has two problems:
a. During parsing, we're resizing the atom array, so the &atom.size
pointer is subject to change. This is the issue you mentioned
above. I think we could solve it by taking a pass over the atoms
after parsing and filling in the pointers then. But...
b. Another problem is that the same atom may appear multiple times. So
parsing "%(objectsize) %(objectsize)", we can't point
the_object_info at _both_ of them. We need our call to
sha1_object_info() to store the result in a single location, and
then as we format each atom they can both use that (which works
even if they might format it differently, say if you had
"%(deltabase) %(deltabase:abbrev)".
So we do need some kind of global storage to hold these results. This is
sort of the same problem we have with parsing the objects at all. There
we get by without a global because all of the logic is crammed into
populate_value(). In theory we could do the same thing here, but there
are just a lot of different items we might be asking for. So instead of
"need_tagged", you'd have a multitude of need_objectsize_disk,
need_deltabase, etc, which gets unwieldy. We may as well fill in a
"struct object_info".
So OK, I agree that something like "struct expand_data" is needed.
But what seems non-ideal to me is the way that ref-filter depends on it
in this cat-file-specific way. If it needs those values it should be
calling sha1_object_info(). And if it doesn't, then it can skip the
call. It shouldn't need to know about cat-file at all.
And likewise cat-file should not know about expand_data. And I think
that's true by the end of the series, though the steps in the middle
left me a little confused.
-Peff