Re: [PATCH v2] untracked-cache: support '--untracked-files=all' if configured
From: Tao Klerks <hidden>
Date: 2022-02-27 11:21:46
Possibly related (same subject, not in this thread)
- 2022-02-27 · Re: [PATCH v2] untracked-cache: support '--untracked-files=all' if configured · Junio C Hamano <hidden>
- 2022-02-25 · [PATCH v2] untracked-cache: support '--untracked-files=all' if configured · Tao Klerks via GitGitGadget <hidden>
On Fri, Feb 25, 2022 at 8:44 PM Junio C Hamano [off-list ref] wrote:
"Tao Klerks via GitGitGadget" [off-list ref] writes:quoted
If the previously stored flags no longer match the current configuration, but the currently-applicable flags do match the current configuration, then the previously stored untracked cache data is discarded. For most users there will be no change in behavior. Users who need '--untracked-files=all' to perform well will have the option of setting "status.showuntrackedfiles" to "all". Users who need '--untracked-files=all' to perform well for their tooling AND prefer to avoid the verbosity of "all" when running git status explicitly without options... are out of luck for now (no change).So, in short, the root of the problem is that untracked cache can be used to serve only one mode (between normal and all) of operation, and the real solution to that problem must come in a different form, i.e. allowing a single unified untracked cache to be usable in both modes, perhaps by maintaining all the untracked ones in the cache, but filter out upon output when the 'normal' mode is asked for?
I wouldn't call this the root of the problem I was trying to solve with this patch, but rather the root of the remaining problem, yes. The challenge that I can't get my head around for this longer-term approach or objective, is *when* the untracked-folder nested files should be enumerated. Currently, untracked folders are only recursed into if -uall is specified or configured - and that's a significant characteristic: It's perfectly plausible that some users sometimes have huge deep untracked folder hierarchies that take a long time to explore, but git never needs to because they never specify -uall. If we decided to serve both modes with one single untracked cache structure, then we would either need to always fully recurse, or implement some sort of "just-in-time" filling in of the recursive bits when someone specifies -uall for the first time. Either way, I'm pretty sure it's beyond me to do that right. Hence this very-pragmatic approach that makes it *possible* to get good/normal performance with -uall.
quoted
Users who set "status.showuntrackedfiles" to "all" and yet most frequently explicitly call 'git status --untracked-files=normal' (and use the untracked cache) are the only ones who would be disadvantaged by this change. It seems unlikely there are any such users.Given how widely used we are these days, I am afraid that the days we can safely say "users with such a strange use pattern would not exist at all" is long gone.quoted
+static unsigned configured_default_dir_flags(struct index_state *istate) +{ + /* This logic is coordinated with the setting of these flags in + * wt-status.c#wt_status_collect_untracked(), and the evaluation + * of the config setting in commit.c#git_status_config() + */Good thing to note here. Style. /* * Our multi-line comments reads more like this, with * slash-asterisk that opens, and asterisk-slash that closes, * sitting on their own lines. */
Thanks, sorry, I thought I'd corrected them all but clearly missed some.
quoted
+ char *status_untracked_files_config_value; + int config_outcome = repo_config_get_string(istate->repo, + "status.showuntrackedfiles",The indentation looks a bit off. In this project, tab width is 8. The beginning of the second parameter to the function call on the second line should align with the beginning of the first parameter that immediately follows the open parenthesis '('.
Fixed, thx.
quoted
+ &status_untracked_files_config_value); + if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) { + return 0; + } else { + /* + * The default, if "all" is not set, is "normal" - leading us here. + * If the value is "none" then it really doesn't matter. + */ + return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES; + } +}I didn't see the need to pass istate to this function, though. Shouldn't it take a repository instead?
Makes sense, fixed, thx.
quoted
-static void new_untracked_cache(struct index_state *istate) +static void new_untracked_cache(struct index_state *istate, unsigned flags) { struct untracked_cache *uc = xcalloc(1, sizeof(*uc)); strbuf_init(&uc->ident, 100); uc->exclude_per_dir = ".gitignore"; - /* should be the same flags used by git-status */ - uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES; + uc->dir_flags = flags;So we used to hardcode these two flags to match what is done in wt-status.c when show_untracked_files != SHOW_ALL_UNTRACKEDFILES; we allow a different set of flags to be given by the caller.quoted
@@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate) void add_untracked_cache(struct index_state *istate) { if (!istate->untracked) { - new_untracked_cache(istate); + new_untracked_cache(istate, + configured_default_dir_flags(istate)); } else { if (!ident_in_untracked(istate->untracked)) { free_untracked_cache(istate->untracked); - new_untracked_cache(istate); + new_untracked_cache(istate, + configured_default_dir_flags(istate)); } } }OK. That's quite straight-forward to see how the tweak is made.quoted
@@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate) static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir, int base_len, - const struct pathspec *pathspec) + const struct pathspec *pathspec, + struct index_state *istate) { struct untracked_cache_dir *root; static int untracked_cache_disabled = -1; + unsigned configured_dir_flags; if (!dir->untracked) return NULL;@@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d if (base_len || (pathspec && pathspec->nr)) return NULL; - /* Different set of flags may produce different results */ - if (dir->flags != dir->untracked->dir_flags ||This is removed because we are making sure that dir->flags and dir->untracked->dir_flags match?quoted
- /* - * See treat_directory(), case index_nonexistent. Without - * this flag, we may need to also cache .git file content - * for the resolve_gitlink_ref() call, which we don't. - */ - !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||This is removed because...?
Because we *do* now support using untracked cache with -uall... As long as the config matches the runtime flags (new check later).
quoted
- /* We don't support collecting ignore files */ - (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO | - DIR_COLLECT_IGNORED)))quoted
+ /* We don't support collecting ignore files */ + if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO | + DIR_COLLECT_IGNORED)) return NULL; /*@@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d return NULL; } + /* We don't support using or preparing the untracked cache if + * the current effective flags don't match the configured + * flags. + */Style (another one in large comment below).
Thx.
quoted
+ configured_dir_flags = configured_default_dir_flags(istate); + if (dir->flags != configured_dir_flags) + return NULL;Hmph. If this weren't necessary, this function does not need to call configured_default_dir_flags(), and it can lose the configured_dir_flags variable, too. Which means that new_untracked_cache() function does not need to take the flags word as a caller-supplied parameter. Instead, it can make a call to configured_dir_flags() and assign the result to uc->dir_flags itself, which would have been much nicer.
I've tightened this up a little with an inline call to configured_default_dir_flags(), getting rid of the variable, let's see if that makes more sense / is cleaner. Sending new version with these changes.
quoted
+ /* If the untracked structure we received does not have the same flags + * as configured, but the configured flags do match the effective flags, + * then we need to reset / create a new "untracked" structure to match + * the new config. + * Keeping the saved and used untracked cache in-line with the + * configuration provides an opportunity for frequent users of + * "git status -uall" to leverage the untracked cache by aligning their + * configuration (setting "status.showuntrackedfiles" to "all" or + * "normal" as appropriate), where previously this option was + * incompatible with untracked cache and *consistently* caused + * surprisingly bad performance (with fscache and fsmonitor enabled) on + * Windows. + * + * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage + * to not be as bound up with the desired output in a given run, + * and instead iterated through and stored enough information to + * correctly serve both "modes", then users could get peak performance + * with or without '-uall' regardless of their + * "status.showuntrackedfiles" config. + */ + if (dir->flags != dir->untracked->dir_flags) { + free_untracked_cache(istate->untracked); + new_untracked_cache(istate, configured_dir_flags); + dir->untracked = istate->untracked; + }This compensates what we lost in the validate_untracked_cache() above by making them match. Looking reasonable.quoted
if (!dir->untracked->root) FLEX_ALLOC_STR(dir->untracked->root, name, "");@@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, return dir->nr; } - untracked = validate_untracked_cache(dir, len, pathspec); + untracked = validate_untracked_cache(dir, len, pathspec, istate); if (!untracked) /* * make sure untracked cache code path is disabled,base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84