Re: hashmap vs khash? Re: [PATCH] packfile.c: speed up loading lots of packfiles.
From: Jeff King <hidden>
Date: 2019-12-02 14:39:46
On Thu, Nov 28, 2019 at 12:42:02AM +0000, Eric Wong wrote:
quoted
Add a hashmap containing the packfile names as we load them so that the average runtime cost of checking for already-loaded packs becomes constant.Btw, would you have time to do a comparison against khash? AFAIK hashmap predates khash in git; and hashmap was optimized for removal. Removals don't seem to be a problem for pack loading.
Actually, they came around simultaneously. I think hashmap.[ch] was mostly a response to our open-coded hashes, like the one in object.c (which still uses neither of the reusable forms!). Those didn't handle removal at all. khash does handle removal, though you pay a price in tombstone entries until the next resize operation.
I'm interested in exploring the removing of hashmap entirely in favor of khash to keep our codebase smaller and easier-to-learn. khash shows up more in other projects, and ought to have better cache-locality.
I have been tempted to push for that, too. Every timing I have ever done shows khash as faster (though for a trivial use like this one, I would be quite surprised if it mattered either way). My hesitation is that khash can be harder to debug because of the macro implementation. But I have rarely needed to look beneath its API. -Peff