Re: [PATCH v2 06/11] get_short_oid: sort ambiguous objects by type, then SHA-1
From: Derrick Stolee <hidden>
Date: 2018-05-01 14:15:30
On 5/1/2018 10:10 AM, Ævar Arnfjörð Bjarmason wrote:
quoted hunk ↗ jump to hunk
Actually I'm having second thoughts about that and thinking I might keep my original approach (with a better explanation). A few more lines of code seems worthwhile in order to not break the assumptions a documented API is making, no matter how briefly, so I set about documenting this case and supporting it, since e.g. oid_array_lookup() will completely fail with the hack of setting the .sorted member, and came up with this:diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt index b0c11f868d..ff87260220 100644 --- a/Documentation/technical/api-oid-array.txt +++ b/Documentation/technical/api-oid-array.txt@@ -16,6 +16,20 @@ Data Structures the actual data. The `nr` member contains the number of items in the set. The `alloc` and `sorted` members are used internally, and should not be needed by API callers. ++ +Both the `oid_array_lookup` and `oid_array_for_each_unique` functions +rely on the array being sorted. For the former it's an absolute +requirenment that the internal `oid_array_sort` function has been +called on it, bu for the latter it's enough that the elements are +ordered in such a way as to guarantee that identical object IDs are +adjacent in the array.
s/bu/but/
quoted hunk ↗ jump to hunk
++ +This is useful e.g. to print output where commits, tags etc. are +grouped together (barring a hash collision they won't have the same +object ID), in such cases the `custom_sorted` member can be set to `1` +before calling `oid_array_for_each_unique`, and it'll skip its own +sorting. Once it's been set calling e.g. `oid_array_lookup` without it +being cleared again will cause an internal panic, so use it carefully. Functions ---------diff --git a/sha1-array.c b/sha1-array.c index 466a926aa3..cbae07ff78 100644 --- a/sha1-array.c +++ b/sha1-array.c@@ -18,6 +18,7 @@ static void oid_array_sort(struct oid_array *array) { QSORT(array->oid, array->nr, void_hashcmp); array->sorted = 1; + array->custom_sorted = 0; } static const unsigned char *sha1_access(size_t index, void *table)@@ -28,6 +29,13 @@ static const unsigned char *sha1_access(size_t index, void *table) int oid_array_lookup(struct oid_array *array, const struct object_id *oid) { + if (array->custom_sorted) + /* + * We could also just clear custom_sorted here, but if + * the caller is custom sorting and then calling this + * that's likely something they'd like to know about. + */ + BUG("PANIC: Cannot lookup OIDs in arrays with a custom sort!");
Probably don't need the "PANIC: " here.
quoted hunk ↗ jump to hunk
if (!array->sorted) oid_array_sort(array); return sha1_pos(oid->hash, array->oid, array->nr, sha1_access);@@ -39,6 +47,7 @@ void oid_array_clear(struct oid_array *array) array->nr = 0; array->alloc = 0; array->sorted = 0; + array->custom_sorted = 0; } int oid_array_for_each_unique(struct oid_array *array,@@ -47,7 +56,7 @@ int oid_array_for_each_unique(struct oid_array *array, { int i; - if (!array->sorted) + if (!array->sorted && !array->custom_sorted) oid_array_sort(array); for (i = 0; i < array->nr; i++) {diff --git a/sha1-array.h b/sha1-array.h index 1e1d24b009..bfa77ba1e4 100644 --- a/sha1-array.h +++ b/sha1-array.h@@ -6,6 +6,7 @@ struct oid_array { int nr; int alloc; int sorted; + int custom_sorted; }; #define OID_ARRAY_INIT { NULL, 0, 0, 0 }diff --git a/sha1-name.c b/sha1-name.c index b81e07adbb..d190800db0 100644 --- a/sha1-name.c +++ b/sha1-name.c@@ -490,9 +490,11 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data) find_short_packed_object(&ds); QSORT(collect.oid, collect.nr, sort_ambiguous); - collect.sorted = 1; + collect.custom_sorted = 1; ret = oid_array_for_each_unique(&collect, fn, cb_data); + collect.custom_sorted = 0; + oid_array_clear(&collect); return ret; }So maybe I should just stop worrying and YOLO it, it just seems wrong to leave such a fragile setup in place where we set .sorted=1 and some future refactoring reasonably tries to call oid_array_lookup() on it and silently fails. What do you think?
I think this extra custom_sort check is worth keeping the API stable to future changes. Thanks, -Stolee