hashmap vs khash? Re: [PATCH] packfile.c: speed up loading lots of packfiles.
From: Eric Wong <hidden>
Date: 2019-11-28 00:42:04
Colin Stolley [off-list ref] wrote:
> When loading packfiles on start-up, we traverse the internal packfile list once per file to avoid reloading packfiles that have already been loaded. This check runs in quadratic time, so for poorly maintained repos with a large number of packfiles, it can be pretty slow.
Cool! Thanks for looking into this. I've been having trouble in that department with big alternates files.
> Add a hashmap containing the packfile names as we load them so that the average runtime cost of checking for already-loaded packs becomes constant.
Btw, would you have time to do a comparison against khash? AFAIK hashmap predates khash in git, and hashmap was optimized for removal. Removals don't seem to be a problem for pack loading.
I'm interested in exploring removing hashmap entirely in favor of khash to keep our codebase smaller and easier to learn. khash shows up more in other projects, and ought to have better cache locality.
Thanks again.
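For illustration only, here is a minimal sketch of the idea discussed above: keep the names of already-loaded packs in a hash set so the "have we seen this pack?" check is constant time on average instead of a walk over the whole list. It is neither git's hashmap API nor khash; `struct name_set`, `name_set_add()`, `name_set_release()` and the FNV-1a hash are hypothetical names chosen for the example. It uses open addressing with linear probing (entries stored in one flat array, which is the layout that tends to give better cache locality) and has no removal support, since pack loading does not appear to need it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of pack names; not git's hashmap or khash. */
struct name_set {
	char **slots;	/* NULL marks an empty slot */
	size_t cap;	/* 0 or a power of two */
	size_t len;	/* number of stored names */
};

/* 32-bit FNV-1a over a NUL-terminated string. */
static uint32_t fnv1a(const char *s)
{
	uint32_t h = 2166136261u;
	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Linear probe: returns the slot holding name, or the empty slot for it. */
static size_t find_slot(char **slots, size_t cap, const char *name)
{
	size_t i = fnv1a(name) & (cap - 1);
	while (slots[i] && strcmp(slots[i], name))
		i = (i + 1) & (cap - 1);
	return i;
}

/* Double the table (or create it) and re-insert the existing names. */
static void name_set_grow(struct name_set *set)
{
	size_t i, new_cap = set->cap ? set->cap * 2 : 16;
	char **new_slots = calloc(new_cap, sizeof(*new_slots));

	if (!new_slots)
		abort();
	for (i = 0; i < set->cap; i++)
		if (set->slots[i])
			new_slots[find_slot(new_slots, new_cap, set->slots[i])] =
				set->slots[i];
	free(set->slots);
	set->slots = new_slots;
	set->cap = new_cap;
}

/* Returns 1 if name was newly added, 0 if it was already present. */
static int name_set_add(struct name_set *set, const char *name)
{
	size_t i, n;

	/* Keep the load factor at or below 1/2 so probes stay short. */
	if (set->len * 2 >= set->cap)
		name_set_grow(set);
	i = find_slot(set->slots, set->cap, name);
	if (set->slots[i])
		return 0;
	n = strlen(name) + 1;
	set->slots[i] = malloc(n);
	if (!set->slots[i])
		abort();
	memcpy(set->slots[i], name, n);
	set->len++;
	return 1;
}

/* Free every stored name and the table itself. */
static void name_set_release(struct name_set *set)
{
	size_t i;

	for (i = 0; i < set->cap; i++)
		free(set->slots[i]);
	free(set->slots);
	set->slots = NULL;
	set->cap = set->len = 0;
}
```

A caller would start from a zeroed `struct name_set`, call `name_set_add()` with each pack name before loading it, and skip the load when it returns 0. A real comparison would of course use git's hashmap and khash rather than this toy.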