Re: [PATCH 6/6] hash-object: use fsck for object checks
From: Taylor Blau <hidden>
Date: 2023-01-18 21:34:25
On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote:
This is obviously going to be a user-visible behavior change, and the
test changes earlier in this series show the scope of the impact. But
I'd argue that this is OK:
- the documentation for hash-object is already vague about which
checks we might do, saying that --literally will allow "any
garbage[...] which might not otherwise pass standard object parsing
or git-fsck checks". So we are already covered under the documented
behavior.
- users don't generally run hash-object anyway. There are a lot of
spots in the tests that needed to be updated because creating
garbage objects is something that Git's tests disproportionately do.
- it's hard to imagine anyone thinking the new behavior is worse. Any
object we reject would be a potential problem down the road for the
user. And if they really want to create garbage, --literally is
already the escape hatch they need.This is the discussion I was pointing out earlier in the series as evidence for making this behavior the new default without "--literally". That being said, let me play devil's advocate for a second. Do the new fsck checks slow anything in hash-object down significantly? If so, then it's plausible to imagine a hash-object caller who (a) doesn't use `--literally`, but (b) does care about throughput if they're writing a large number of objects at once. I don't know if such a situation exists, or if these new fsck checks even slow hash-object down enough to care. But I didn't catch a discussion of this case in your series, so I figured I'd bring it up here just in case.
- the resulting messages are much better. For example:
[before]
$ echo 'tree 123' | git hash-object -t commit --stdin
error: bogus commit object 0000000000000000000000000000000000000000
fatal: corrupt commit
[after]
$ echo 'tree 123' | git.compile hash-object -t commit --stdin
error: object fails fsck: badTreeSha1: invalid 'tree' line format - bad sha1
fatal: refusing to create malformed objectMuch nicer, well done.
quoted hunk ↗ jump to hunk
Signed-off-by: Jeff King <redacted> --- object-file.c | 55 ++++++++++++++++++------------------------ t/t1007-hash-object.sh | 11 +++++++++ 2 files changed, 34 insertions(+), 32 deletions(-)diff --git a/object-file.c b/object-file.c index 80a0cd3b35..5c96384803 100644 --- a/object-file.c +++ b/object-file.c@@ -33,6 +33,7 @@ #include "object-store.h" #include "promisor-remote.h" #include "submodule.h" +#include "fsck.h" /* The maximum size for an object header. */ #define MAX_HEADER_LEN 32@@ -2298,32 +2299,21 @@ int repo_has_object_file(struct repository *r, return repo_has_object_file_with_flags(r, oid, 0); } -static void check_tree(const void *buf, size_t size) -{ - struct tree_desc desc; - struct name_entry entry; - - init_tree_desc(&desc, buf, size); - while (tree_entry(&desc, &entry)) - /* do nothing - * tree_entry() will die() on malformed entries */ - ; -} - -static void check_commit(const void *buf, size_t size) -{ - struct commit c; - memset(&c, 0, sizeof(c)); - if (parse_commit_buffer(the_repository, &c, buf, size, 0)) - die(_("corrupt commit")); -} - -static void check_tag(const void *buf, size_t size) -{ - struct tag t; - memset(&t, 0, sizeof(t)); - if (parse_tag_buffer(the_repository, &t, buf, size)) - die(_("corrupt tag"));
OK, here we're getting rid of all of the lightweight checks that hash-object used to implement on its own.
quoted hunk ↗ jump to hunk
+/* + * We can't use the normal fsck_error_function() for index_mem(), + * because we don't yet have a valid oid for it to report. Instead, + * report the minimal fsck error here, and rely on the caller to + * give more context. + */ +static int hash_format_check_report(struct fsck_options *opts, + const struct object_id *oid, + enum object_type object_type, + enum fsck_msg_type msg_type, + enum fsck_msg_id msg_id, + const char *message) +{ + error(_("object fails fsck: %s"), message); + return 1; } static int index_mem(struct index_state *istate,@@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate, } } if (flags & HASH_FORMAT_CHECK) { - if (type == OBJ_TREE) - check_tree(buf, size); - if (type == OBJ_COMMIT) - check_commit(buf, size); - if (type == OBJ_TAG) - check_tag(buf, size); + struct fsck_options opts = FSCK_OPTIONS_DEFAULT; + + opts.strict = 1; + opts.error_func = hash_format_check_report; + if (fsck_buffer(null_oid(), type, buf, size, &opts)) + die(_("refusing to create malformed object")); + fsck_finish(&opts); }
And here's the main part of the change, which is delightfully simple and appears correct to me.
quoted hunk ↗ jump to hunk
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh index 2d2148d8fa..ac3d173767 100755 --- a/t/t1007-hash-object.sh +++ b/t/t1007-hash-object.sh@@ -222,6 +222,17 @@ test_expect_success 'empty filename in tree' ' grep "empty filename in tree entry" err ' +test_expect_success 'duplicate filename in tree' ' + hex_oid=$(echo foo | git hash-object --stdin -w) && + bin_oid=$(echo $hex_oid | hex2oct) && + { + printf "100644 file\0$bin_oid" && + printf "100644 file\0$bin_oid" + } >tree-with-duplicate-filename && + test_must_fail git hash-object -t tree tree-with-duplicate-filename 2>err && + grep "duplicateEntries" err +' +
For what it's worth, I think that this is sufficient coverage for the new fsck checks. Thanks, Taylor