Re: [PATCH v2 06/11] get_short_oid: sort ambiguous objects by type, then SHA-1

[PATCH 0/9] get_short_oid UI improvements · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
[PATCH 1/9] sha1-name.c: remove stray newline · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
[PATCH 2/9] sha1-array.h: align function arguments · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
[PATCH 3/9] sha1-name.c: move around the collect_ambiguous() function · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
[PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1 · Derrick Stolee <hidden> · 2018-05-01
Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1 · Derrick Stolee <hidden> · 2018-05-01
Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1 · Derrick Stolee <hidden> · 2018-05-01
[PATCH 6/9] get_short_oid: learn to disambiguate by ^{blob} · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
[PATCH 7/9] get_short_oid / peel_onion: ^{tree} should mean tree, not treeish · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
Re: [PATCH 7/9] get_short_oid / peel_onion: ^{tree} should mean tree, not treeish · brian m. carlson <hidden> · 2018-05-01
[PATCH 5/9] get_short_oid: learn to disambiguate by ^{tag} · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
[PATCH 9/9] config doc: document core.disambiguate · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
[PATCH 8/9] get_short_oid / peel_onion: ^{tree} should mean commit, not commitish · Ævar Arnfjörð Bjarmason <hidden> · 2018-04-30
Re: [PATCH 8/9] get_short_oid / peel_onion: ^{tree} should mean commit, not commitish · Eric Sunshine <hidden> · 2018-04-30
Re: [PATCH 0/9] get_short_oid UI improvements · Stefan Beller <hidden> · 2018-04-30
Re: [PATCH 0/9] get_short_oid UI improvements · brian m. carlson <hidden> · 2018-05-01
Re: [PATCH 0/9] get_short_oid UI improvements · Derrick Stolee <hidden> · 2018-05-01
[PATCH v2 00/12] get_short_oid UI improvements · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 01/12] sha1-name.c: remove stray newline · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 02/12] sha1-array.h: align function arguments · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 03/12] git-p4: change "commitish" typo to "committish" · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 04/12] cache.h: add comment explaining the order in object_type · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 05/12] sha1-name.c: move around the collect_ambiguous() function · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 07/12] get_short_oid: learn to disambiguate by ^{tag} · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 06/12] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 09/12] get_short_oid / peel_onion: ^{tree} should be tree, not treeish · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 10/12] get_short_oid / peel_onion: ^{commit} should be commit, not committish · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 12/12] get_short_oid: document & warn if we ignore the type selector · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 11/12] config doc: document core.disambiguate · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 08/12] get_short_oid: learn to disambiguate by ^{blob} · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v2 06/11] get_short_oid: sort ambiguous objects by type, then SHA-1 · Derrick Stolee <hidden> · 2018-05-01
Re: [PATCH v2 06/11] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH v2 06/11] get_short_oid: sort ambiguous objects by type, then SHA-1 · Derrick Stolee <hidden> · 2018-05-01
Re: [PATCH v2 06/11] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH v2 06/11] get_short_oid: sort ambiguous objects by type, then SHA-1 · Derrick Stolee <hidden> · 2018-05-01
[PATCH v3 00/12] get_short_oid UI improvements · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 01/12] sha1-name.c: remove stray newline · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 02/12] sha1-array.h: align function arguments · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 04/12] cache.h: add comment explaining the order in object_type · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH v3 04/12] cache.h: add comment explaining the order in object_type · Duy Nguyen <hidden> · 2018-05-08
[PATCH] pack-format.txt: more details on pack file format · Nguyễn Thái Ngọc Duy <hidden> · 2018-05-08
Re: [PATCH] pack-format.txt: more details on pack file format · Stefan Beller <hidden> · 2018-05-08
Re: [PATCH] pack-format.txt: more details on pack file format · Duy Nguyen <hidden> · 2018-05-08
Re: [PATCH] pack-format.txt: more details on pack file format · Stefan Beller <hidden> · 2018-05-08
Re: [PATCH] pack-format.txt: more details on pack file format · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-08
Re: [PATCH] pack-format.txt: more details on pack file format · Duy Nguyen <hidden> · 2018-05-08
[PATCH v2] pack-format.txt: more details on pack file format · Nguyễn Thái Ngọc Duy <hidden> · 2018-05-10
Re: [PATCH v2] pack-format.txt: more details on pack file format · Stefan Beller <hidden> · 2018-05-10
Re: [PATCH v2] pack-format.txt: more details on pack file format · Duy Nguyen <hidden> · 2018-05-11
[PATCH v3] pack-format.txt: more details on pack file format · Nguyễn Thái Ngọc Duy <hidden> · 2018-05-11
[PATCH v3 03/12] git-p4: change "commitish" typo to "committish" · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 05/12] sha1-name.c: move around the collect_ambiguous() function · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 08/12] get_short_oid: learn to disambiguate by ^{blob} · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 09/12] get_short_oid / peel_onion: ^{tree} should be tree, not treeish · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v4 0/6] get_short_oid UI improvements · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-10
[PATCH v4 1/6] sha1-name.c: remove stray newline · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-10
[PATCH v4 2/6] sha1-array.h: align function arguments · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-10
Re: [PATCH v4 2/6] sha1-array.h: align function arguments · Jeff King <hidden> · 2018-05-10
[PATCH v4 3/6] git-p4: change "commitish" typo to "committish" · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-10
Re: [PATCH v4 3/6] git-p4: change "commitish" typo to "committish" · Luke Diamand <hidden> · 2018-05-10
[PATCH v4 4/6] sha1-name.c: move around the collect_ambiguous() function · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-10
[PATCH v4 5/6] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-10
Re: [PATCH v4 5/6] get_short_oid: sort ambiguous objects by type, then SHA-1 · Jeff King <hidden> · 2018-05-10
[PATCH v4 6/6] get_short_oid: document & warn if we ignore the type selector · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-10
Re: [PATCH v4 6/6] get_short_oid: document & warn if we ignore the type selector · Martin Ågren <hidden> · 2018-05-10
Re: [PATCH v4 6/6] get_short_oid: document & warn if we ignore the type selector · Jeff King <hidden> · 2018-05-10
Re: [PATCH v4 6/6] get_short_oid: document & warn if we ignore the type selector · Jeff King <hidden> · 2018-05-10
Re: [PATCH v4 6/6] get_short_oid: document & warn if we ignore the type selector · Jeff King <hidden> · 2018-05-10
Re: [PATCH v4 0/6] get_short_oid UI improvements · Jeff King <hidden> · 2018-05-10
[PATCH v3 10/12] get_short_oid / peel_onion: ^{commit} should be commit, not committish · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 11/12] config doc: document core.disambiguate · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH v3 11/12] config doc: document core.disambiguate · Jeff King <hidden> · 2018-05-08
[PATCH v3 12/12] get_short_oid: document & warn if we ignore the type selector · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
[PATCH v3 06/12] get_short_oid: sort ambiguous objects by type, then SHA-1 · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH v3 06/12] get_short_oid: sort ambiguous objects by type, then SHA-1 · Jeff King <hidden> · 2018-05-08
[PATCH v3 07/12] get_short_oid: learn to disambiguate by ^{tag} · Ævar Arnfjörð Bjarmason <hidden> · 2018-05-01
Re: [PATCH v3 00/12] get_short_oid UI improvements · Derrick Stolee <hidden> · 2018-05-02
Re: [PATCH v3 00/12] get_short_oid UI improvements · Derrick Stolee <hidden> · 2018-05-02
Re: [PATCH v3 00/12] get_short_oid UI improvements · Jacob Keller <hidden> · 2018-05-03

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2018-05-01 14:10:48
Subsystem: documentation, the rest · Maintainers: Jonathan Corbet, Linus Torvalds

On Tue, May 01 2018, Derrick Stolee wrote:

On 5/1/2018 9:39 AM, Ævar Arnfjörð Bjarmason wrote:

quoted

On Tue, May 01 2018, Derrick Stolee wrote:

quoted

From: Ævar Arnfjörð Bjarmason <redacted>

Here is what I mean by sorting during for_each_abbrev(). This seems to work for
me, so I don't know what the issue is with this one-pass approach.
[...]
+static int sort_ambiguous(const void *a, const void *b)
+{
+	int a_type = oid_object_info(a, NULL);
+	int b_type = oid_object_info(b, NULL);
+	int a_type_sort;
+	int b_type_sort;
+
+	/*
+	 * Sorts by hash within the same object type, just as
+	 * oid_array_for_each_unique() would do.
+	 */
+	if (a_type == b_type)
+		return oidcmp(a, b);
+
+	/*
+	 * Between object types show tags, then commits, and finally
+	 * trees and blobs.
+	 *
+	 * The object_type enum is commit, tree, blob, tag, but we
+	 * want tag, commit, tree blob. Cleverly (perhaps too
+	 * cleverly) do that with modulus, since the enum assigns 1 to
+	 * commit, so tag becomes 0.
+	 */
+	a_type_sort = a_type % 4;
+	b_type_sort = b_type % 4;
+	return a_type_sort > b_type_sort ? 1 : -1;
+}
+
  static int get_short_oid(const char *name, int len, struct object_id *oid,
  			  unsigned flags)
  {

@@ -451,6 +479,9 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
  	find_short_object_filename(&ds);
  	find_short_packed_object(&ds);

+	QSORT(collect.oid, collect.nr, sort_ambiguous);
+	collect.sorted = 1;
+

Yes this works. You're right. I wasn't trying to intentionally omit
stuff in my recent 878t93zh60.fsf@evledraar.gmail.com, I'd just written
this code some days ago and forgotten why I did what I was doing (and
this is hard to test for), but it's all coming back to me now.

The actual requirement for oid_array_for_each_unique() working properly
is that you've got to feed it in hash order,

To work properly, duplicate entries must be consecutive. Since
duplicate entries have the same type, our sort satisfies this
condition.

quoted

but my new sort_ambiguous()
still does that (barring any SHA-1 collisions, at which point we have
bigger problems), so two passes aren't needed. So yes, this apporoach
works and is one-pass.

But that's just an implementation detail of the current sort method,
when I wrote this I was initially playing with other sort orders,
e.g. sorting SHAs regardless of type by the mtime of the file I found
them in. With this approach I'd start printing duplicates if I changed
the internals of sort_ambiguous() like that.

That makes sense.

quoted

But I think it's extremely implausible that we'll start sorting things
like that, so I'll just take this method of doing it and add some
comment saying we must hashcmp() the entries in our own sort function
for the de-duplication to work, I don't see us ever changing that.

Sounds good.

Actually I'm having second thoughts about that and thinking I might keep
my original approach (with a better explanation).

A few more lines of code seems worthwhile in order to not break the
assumptions a documented API is making, no matter how briefly, so I set
about documenting this case and supporting it, since
e.g. oid_array_lookup() will completely fail with the hack of setting
the .sorted member, and came up with this:

diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
index b0c11f868d..ff87260220 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt

@@ -16,6 +16,20 @@ Data Structures
 	the actual data. The `nr` member contains the number of items in
 	the set.  The `alloc` and `sorted` members are used internally,
 	and should not be needed by API callers.
++
+Both the `oid_array_lookup` and `oid_array_for_each_unique` functions
+rely on the array being sorted. For the former it's an absolute
+requirenment that the internal `oid_array_sort` function has been
+called on it, bu for the latter it's enough that the elements are
+ordered in such a way as to guarantee that identical object IDs are
+adjacent in the array.
++
+This is useful e.g. to print output where commits, tags etc. are
+grouped together (barring a hash collision they won't have the same
+object ID), in such cases the `custom_sorted` member can be set to `1`
+before calling `oid_array_for_each_unique`, and it'll skip its own
+sorting. Once it's been set calling e.g. `oid_array_lookup` without it
+being cleared again will cause an internal panic, so use it carefully.

 Functions
 ---------

diff --git a/sha1-array.c b/sha1-array.c
index 466a926aa3..cbae07ff78 100644
--- a/sha1-array.c
+++ b/sha1-array.c

@@ -18,6 +18,7 @@ static void oid_array_sort(struct oid_array *array)
 {
 	QSORT(array->oid, array->nr, void_hashcmp);
 	array->sorted = 1;
+	array->custom_sorted = 0;
 }

 static const unsigned char *sha1_access(size_t index, void *table)

@@ -28,6 +29,13 @@ static const unsigned char *sha1_access(size_t index, void *table)

 int oid_array_lookup(struct oid_array *array, const struct object_id *oid)
 {
+	if (array->custom_sorted)
+		/*
+		 * We could also just clear custom_sorted here, but if
+		 * the caller is custom sorting and then calling this
+		 * that's likely something they'd like to know about.
+		 */
+		BUG("PANIC: Cannot lookup OIDs in arrays with a custom sort!");
 	if (!array->sorted)
 		oid_array_sort(array);
 	return sha1_pos(oid->hash, array->oid, array->nr, sha1_access);

@@ -39,6 +47,7 @@ void oid_array_clear(struct oid_array *array)
 	array->nr = 0;
 	array->alloc = 0;
 	array->sorted = 0;
+	array->custom_sorted = 0;
 }

 int oid_array_for_each_unique(struct oid_array *array,

@@ -47,7 +56,7 @@ int oid_array_for_each_unique(struct oid_array *array,
 {
 	int i;

-	if (!array->sorted)
+	if (!array->sorted && !array->custom_sorted)
 		oid_array_sort(array);

 	for (i = 0; i < array->nr; i++) {

diff --git a/sha1-array.h b/sha1-array.h
index 1e1d24b009..bfa77ba1e4 100644
--- a/sha1-array.h
+++ b/sha1-array.h

@@ -6,6 +6,7 @@ struct oid_array {
 	int nr;
 	int alloc;
 	int sorted;
+	int custom_sorted;
 };

 #define OID_ARRAY_INIT { NULL, 0, 0, 0 }

diff --git a/sha1-name.c b/sha1-name.c
index b81e07adbb..d190800db0 100644
--- a/sha1-name.c
+++ b/sha1-name.c

@@ -490,9 +490,11 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 	find_short_packed_object(&ds);

 	QSORT(collect.oid, collect.nr, sort_ambiguous);
-	collect.sorted = 1;

+	collect.custom_sorted = 1;
 	ret = oid_array_for_each_unique(&collect, fn, cb_data);
+	collect.custom_sorted = 0;
+
 	oid_array_clear(&collect);
 	return ret;
 }

So maybe I should just stop worrying and YOLO it, it just seems wrong to
leave such a fragile setup in place where we set .sorted=1 and some
future refactoring reasonably tries to call oid_array_lookup() on it and
silently fails.

What do you think?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help