Re: [PATCH 4/5] config: return an empty list, not NULL

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2022-09-28 15:07:51

On Wed, Sep 28 2022, Derrick Stolee wrote:

On 9/27/22 3:18 PM, Ævar Arnfjörð Bjarmason wrote:

quoted

On Tue, Sep 27 2022, Derrick Stolee wrote:

quoted

On 9/27/2022 12:21 PM, Ævar Arnfjörð Bjarmason wrote:

quoted

On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:

quoted

 /**
  * Finds and returns the value list, sorted in order of increasing priority
  * for the configuration variable `key`. When the configuration variable
- * `key` is not found, returns NULL. The caller should not free or modify
- * the returned pointer, as it is owned by the cache.
+ * `key` is not found, returns an empty list. The caller should not free or
+ * modify the returned pointer, as it is owned by the cache.
  */
 const struct string_list *git_config_get_value_multi(const char *key);

Aside from the "DWIM API" aspect of this (which I don't mind) I think
this is really taking the low-level function in the wrong direction, and
that we should just add a new simple wrapper instead.

I.e. both the pre-image API docs & this series gloss over the fact that
we'd not just return NULL here if the config wasn't there, but also if
git_config_parse_key() failed.

So it seems to me that a better direction would be starting with
something like the WIP below (which doesn't compile the whole code, I
stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
and "dest" pattern that most other things in the config API have.

Do you have an example where a caller would benefit from this
distinction? Without such an example, I don't think it is worth
creating such a huge change for purity's sake alone.

Not initially, I started poking at this because the CL/series/commits
says that we don't care about the case of non-existing keys, without
being clear as to why we want to conflate that with other errors we
might get from this API.

But after some digging I found:

	$ for k in a a.b. "'x.y"; do ./git for-each-repo --config=$k;  echo $?; done
	error: key does not contain a section: a
	0
	error: key does not contain variable name: a.b.
	0
	error: invalid key: 'x.y
	0
	
I.e. the repo_config_get_value_multi() you added in for-each-repo
doesn't distinguish between bad keys and non-existing keys, and returns
0 even though it printed an "error".

I can understand wanting to inform the user that they provided an
invalid key using a nonzero exit code. I can also understand that
the command does what is asked: it did nothing because the given
key has no values (because it can't). I think the use of an "error"
message balances things towards wanting a nonzero exit code.

Right, to be clear I think 6c62f015520 (for-each-repo: do nothing on
empty config, 2021-01-08) is sensible, i.e. we want to return 0 on a
non-existing key.

We just shouldn't conflate that with e.g. these parse errors, which the
API squashing the underlying negative return values and the "NULL list"
imposes on the user.

quoted

I'm pretty happy that the diff for this series is an overall
reduction in code, while also not being too large in the interim:

 12 files changed, 39 insertions(+), 57 deletions(-)

If all callers that use the *_multi() methods would only use the
wrapper, then what is the point of doing the low-level manipulations?

I hacked up something that's at least RFC-quality based on this
approach, but CI is running etc., so not submitting it
now:

	https://github.com/git/git/compare/master...avar:git:avar/have-git_configset_get_value-use-dest-and-int-pattern

I think the resulting diff is more idiomatic API use, i.e. you ended up
with:

	        /* submodule.active is set */
	        sl = repo_config_get_value_multi(repo, "submodule.active");
	-       if (sl) {
	+       if (sl && sl->nr) {

You're right that I forgot to change this one to "if (sl->nr)"
in patch 5.

If I am I didn't mean to point that out, I ws just pointing out the
end-API use. I.e. int return value v.s. the "populate dest" pattern, but
yes, in your end-state you'd drop the "sl &&" part.

quoted

But I ended up doing:

	        /* submodule.active is set */
	-       sl = repo_config_get_value_multi(repo, "submodule.active");
	-       if (sl) {
	+       if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {

Note the "const" in the function name, i.e. there's wrappers that handle
the case where we have a hardcoded key name, in which case we can BUG()
out if we'd return < 0, so all we have left is just "does key exist".

The problem here is that the block actually cares that the list is non-empty
and should not run if the list is empty. In that case, you would need to add
"&& sl->nr" to the condition.

I'm of course assuming that an empty list is different from an error. In
your for-each-repo example, we would not want to return a non-zero exit
code on an empty list, only on a bad key (or other I/O problem).

If we return a negative value on an error and the number of matches on
success, then this change could instead be "if (repo_config....() > 0)".

Hrm, I think you're confusing the worldview your series here is
advocating for, and what I'm suggesting as an alternative.

There isn't any way on "master" to have "an empty list", that's a
worldview you're proposing. In particular your 1/5 here removes:

	assert(values->nr > 0);

More generally the config format has no notion of "an empty list", if
you have a valid key-value pair at all you have a list of ".nr >= 1".

The "empty list" is a construct you're introducing in this series,
because you wanted the idiom of passing things to
for_each_string_list_item.

I'm advocating for not going that route, and instead make the *_multi()
method like the rest of the config API. I.e. to use the "return int,
populate dest" pattern.

It's fine if we disagree, but I get the sense that it's not clear what
we're disagreeing *on* :)

quoted

In any case, I'm all for having some simple wrapper for the common cases

A simple wrapper would be nice, and be exactly the method as it is
updated in this series. The error-result version could be adopted when
there is reason to do so.

Well, no :) We ended up with two different "simple wrapper[s]", mine
doesn't have this notion of a "struct string_list *list" with .nr == 0.

quoted

But I didn't find a single case where we actually needed this "never
give me a non-NULL list" behavior, it could just be generalized to
"let's have the API tell us if the key exist".

Most cases want to feed the result into the for_each_string_list_item()
macro. Based on the changes in patch 5, I think the empty list is a
better pattern and leads to prettier code in almost all cases.

I updated the WIP RFC series I linked to upthread a bit since my initial
reply (the link is still good, I force-pushed), I then rebased your
series here on "master", below is a diff of some select files.

The overall diff is much bigger obviously (API changes and all), but the
below demonstrates some of the API changes (yours is "-", mine is
"+"). I've commented inline on some of it:

	diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
	index 635ea5e15fd..16e9a76d04a 100644
	--- a/builtin/for-each-repo.c
	+++ b/builtin/for-each-repo.c
	@@ -29,6 +29,7 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
	 	static const char *config_key = NULL;
	 	int i, result = 0;
	 	const struct string_list *values;
	+	int err;
	 
	 	const struct option options[] = {
	 		OPT_STRING(0, "config", &config_key, N_("config"),
	@@ -42,8 +43,13 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
	 	if (!config_key)
	 		die(_("missing --config=<config>"));
	 
	-	values = repo_config_get_value_multi(the_repository,
	-					     config_key);
	+	err = repo_config_get_value_multi(the_repository, config_key, &values);
	+	if (err < 0)
	+		usage_msg_optf(_("got bad config --config=%s"),
	+			       for_each_repo_usage, options, config_key);
	+	else if (err)
	+		return 0;
	+
	 	for (i = 0; !result && i < values->nr; i++)
	 		result = run_command_on_repo(values->items[i].string, argc, argv);

Here we're relying an error to the user that we couldn't before, because
repo_config_get_value_multi() would return "NULL" for both "key is bad"
and "key doesn't exist". There's a corresponding test modification
below.

	diff --git a/builtin/gc.c b/builtin/gc.c
	index 1e9ac2ac7e3..94b77a88a99 100644
	--- a/builtin/gc.c
	+++ b/builtin/gc.c
	@@ -1472,9 +1472,7 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
	 	};
	 	int found = 0;
	 	const char *key = "maintenance.repo";
	-	char *config_value;
	 	char *maintpath = get_maintpath();
	-	struct string_list_item *item;
	 	const struct string_list *list;
	 
	 	argc = parse_options(argc, argv, prefix, options,
	@@ -1487,18 +1485,11 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
	 	git_config_set("maintenance.auto", "false");
	 
	 	/* Set maintenance strategy, if unset */
	-	if (!git_config_get_string("maintenance.strategy", &config_value))
	-		free(config_value);
	-	else
	+	if (git_config_lookup_value("maintenance.strategy"))
	 		git_config_set("maintenance.strategy", "incremental");

In looking at this I thought we were way overdue for a "does this key
exist?" helper, this and a few other API users use it.
	 
	-	list = git_config_get_value_multi(key);
	-	for_each_string_list_item(item, list) {
	-		if (!strcmp(maintpath, item->string)) {
	-			found = 1;
	-			break;
	-		}
	-	}
	+	if (!git_config_get_const_value_multi(key, &list))
	+		found = unsorted_string_list_has_string(list, maintpath);

So, it turns out that the initial reason you wanted the "pass NULL to
for_each_string_list_item" is actually something we can do with
unsorted_string_list_has_string(), which implements the same loop.

The difference here is *the* API difference we're discussing. I.e. we'll
never get a NULL "list", we'll instead always get a non-NULL list with

= 1 item if we can get this key at all.

The "const value" helper is a wrapper that handles the "err < 0" case. I
cases where we hardcode the key it's a BUG() if we get "err < 0". The
wrapper is just:

	int err = git_configset_get_value_multi(cs, key, dest);
        if (err < 0)
		BUG("failed to parse constant key '%s'!", key);
	return err;

	[...]
	@@ -1547,13 +1537,8 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
	 		usage_with_options(builtin_maintenance_unregister_usage,
	 				   options);
	 
	-	list = git_config_get_value_multi(key);
	-	for_each_string_list_item(item, list) {
	-		if (!strcmp(maintpath, item->string)) {
	-			found = 1;
	-			break;
	-		}
	-	}
	+	if (!git_config_get_const_value_multi(key, &list))
	+		found = unsorted_string_list_has_string(list, maintpath);

Ditto the same git_config_get_const_value_multi() &
unsorted_string_list_has_string() pattern.
	 
	 	if (found) {
	 		int rc;
	diff --git a/builtin/log.c b/builtin/log.c
	index 719ef966045..bdb87f6c42b 100644
	--- a/builtin/log.c
	+++ b/builtin/log.c
	@@ -182,13 +182,15 @@ static void set_default_decoration_filter(struct decoration_filter *decoration_f
	 	int i;
	 	char *value = NULL;
	 	struct string_list *include = decoration_filter->include_ref_pattern;
	-	struct string_list_item *item;
	-	const struct string_list *config_exclude =
	-			git_config_get_value_multi("log.excludeDecoration");
	+	const struct string_list *config_exclude;
	 
	-	for_each_string_list_item(item, config_exclude)
	-		string_list_append(decoration_filter->exclude_ref_config_pattern,
	-				   item->string);
	+	if (!git_config_get_const_value_multi("log.excludeDecoration",
	+					      &config_exclude)) {
	+		struct string_list_item *item;
	+		for_each_string_list_item(item, config_exclude)
	+			string_list_append(decoration_filter->exclude_ref_config_pattern,
	+					   item->string);
	+	}

Here's a case where we need to use for_each_string_list_item(), I think
it's nice how we can now scope the "item" variable.

	 	/*
	 	 * By default, decorate_all is disabled. Enable it if
	diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
	index 5a8b6120157..b758255f816 100644
	--- a/builtin/submodule--helper.c
	+++ b/builtin/submodule--helper.c
	@@ -552,7 +552,7 @@ static int module_init(int argc, const char **argv, const char *prefix)
	 	 * If there are no path args and submodule.active is set then,
	 	 * by default, only initialize 'active' modules.
	 	 */
	-	if (!argc && git_config_get_value_multi("submodule.active")->nr)
	+	if (!argc && !git_config_lookup_value("submodule.active"))
	 		module_list_active(&list);

You changed these in your 2/5, but they really just wanted the new "does
this key exist?" API. No need to construct the string_list just to throw
it away...
	 
	 	info.prefix = prefix;
	@@ -2720,7 +2720,7 @@ static int module_update(int argc, const char **argv, const char *prefix)
	 		 * If there are no path args and submodule.active is set then,
	 		 * by default, only initialize 'active' modules.
	 		 */
	-		if (!argc && git_config_get_value_multi("submodule.active")->nr)
	+		if (!argc && !git_config_lookup_value("submodule.active"))
	 			module_list_active(&list);

Ditto.
	 
	 		info.prefix = opt.prefix;
	@@ -3164,7 +3164,6 @@ static int config_submodule_in_gitmodules(const char *name, const char *var, con
	 static void configure_added_submodule(struct add_data *add_data)
	 {
	 	char *key;
	-	const char *val;
	 	struct child_process add_submod = CHILD_PROCESS_INIT;
	 	struct child_process add_gitmodules = CHILD_PROCESS_INIT;
	 
	@@ -3209,7 +3208,7 @@ static void configure_added_submodule(struct add_data *add_data)
	 	 * is_submodule_active(), since that function needs to find
	 	 * out the value of "submodule.active" again anyway.
	 	 */
	-	if (!git_config_get_string_tmp("submodule.active", &val)) {
	+	if (!git_config_lookup_value("submodule.active")) {
	 		/*
	 		 * If the submodule being added isn't already covered by the
	 		 * current configured pathspec, set the submodule's active flag

Ditto.

	diff --git a/submodule.c b/submodule.c
	index 06230961c80..4474cf9ed2d 100644
	--- a/submodule.c
	+++ b/submodule.c
	@@ -274,8 +274,7 @@ int is_tree_submodule_active(struct repository *repo,
	 	free(key);
	 
	 	/* submodule.active is set */
	-	sl = repo_config_get_value_multi(repo, "submodule.active");
	-	if (sl && sl->nr) {
	+	if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {
	 		struct pathspec ps;
	 		struct strvec args = STRVEC_INIT;
	 		const struct string_list_item *item;

Another "*the* API difference we're discussing". I.e. sure, your end
state would be "if (sl->nr)", but if we make it return "int"...

	diff --git a/t/helper/test-config.c b/t/helper/test-config.c
	index 90810946783..432ad047537 100644
	--- a/t/helper/test-config.c
	+++ b/t/helper/test-config.c
	@@ -95,8 +95,7 @@ int cmd__config(int argc, const char **argv)
	 			goto exit1;
	 		}
	 	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
	-		strptr = git_config_get_value_multi(argv[2]);
	-		if (strptr->nr) {
	+		if (!git_config_get_const_value_multi(argv[2], &strptr)) {
	 			for (i = 0; i < strptr->nr; i++) {
	 				v = strptr->items[i].string;
	 				if (!v)

Ditto, (this one converts away from your preferred API use).

	@@ -159,8 +158,7 @@ int cmd__config(int argc, const char **argv)
	 				goto exit2;
	 			}
	 		}
	-		strptr = git_configset_get_value_multi(&cs, argv[2]);
	-		if (strptr && strptr->nr) {
	+		if (!git_configset_get_const_value_multi(&cs, argv[2], &strptr)) {
	 			for (i = 0; i < strptr->nr; i++) {
	 				v = strptr->items[i].string;
	 				if (!v)

Ditto, sans that you'd presumably want s/strptr && // here.

	diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
	index 4675e852517..115221c9ca5 100755
	--- a/t/t0068-for-each-repo.sh
	+++ b/t/t0068-for-each-repo.sh
	@@ -33,4 +33,10 @@ test_expect_success 'do nothing on empty config' '
	 	git for-each-repo --config=bogus.config -- help --no-such-option
	 '
	 
	+test_expect_success 'error on bad config keys' '
	+	test_expect_code 129 git for-each-repo --config=a &&
	+	test_expect_code 129 git for-each-repo --config=a.b. &&
	+	test_expect_code 129 git for-each-repo --config="'\''.b"
	+'
	+
	 test_done

A test showing behavior change we can implement now that we don't sweep
the "err < 0" under the rug.

That branch also grew to have some other changes we may or may not want,
one thing was to convert the various *_get_*() functionts that now
normalize the non-zero return value with e.g.:

	 int git_configset_get_int(struct config_set *cs, const char *key, int *dest)
	 {
	        const char *value;
	-       if (!git_configset_get_value(cs, key, &value)) {
	-               *dest = git_config_int(key, value);
	-               return 0;
	-       } else
	-               return 1;
	+       int err;
	+
	+       if ((err = git_configset_get_value(cs, key, &value)))
	+               return err;
	+       *dest = git_config_int(key, value);
	+       return 0;
	 }

No caller currently cares about it, but I think it makes sense generally
not to throw away errors if we can (whether that part is worth the churn
is another topic).

Anyway, the reason I started looking at this RFC to begin with was
because this *_multi() part of the config API has often seemed odd to
me, i.e. I wondered why we couldn't just have it use the "return int,
populate dest" pattern. I'd just never tried to see if I could get that
to work.

It's a bit of one-off churn to get to this point, but I think the end
result of having all the API functions act the same way to signal key
existence v.s. validity is worth it.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help