Thread (2 messages) 2 messages, 2 authors, 2025-01-31

Re: [PATCH v2 4/8] packed-backend: add "packed-refs" header consistency check

From: shejialuo <hidden>
Date: 2025-01-31 14:22:24

On Thu, Jan 30, 2025 at 10:58:32AM -0800, Junio C Hamano wrote:
shejialuo [off-list ref] writes:
quoted
In "packed-backend.c::create_snapshot", if there is a header (the line
which starts with '#'), we will check whether the line starts with "#
pack-refs with:". As we are going to implement the header consistency
check, we should port this check into "packed_fsck".

However, the above check is not enough, this is because "git pack-refs"
will always write "PACKED_REFS_HEADER" which is a constant string to the
"packed-refs" file. So, we should check the following things for the
header.
I haven't done history digging in this area for a while, but we
should make sure we are not flagging a file that was written in
ancient version of Git whose repository is still supported.
Understood.
quoted
1. If the header does not exist, we may report an error to the user
   because it should exist, but we do allow no header in "packed-refs"
   file. So, create a new fsck message "packedRefMissingHeader(INFO)" to
   warn the user and also keep compatibility.
Are we sure "it should exist"?  I think the header did not exist
before "Git v1.5.0".  I didn't check with other reimplementations of
Git (like jgit or libgit2), but as long as our reading side of the
runtime allows a packed-refs file without the header without
complaint, I do not think it is a good idea to treat it as a
report-worthy event from "git fsck".
OK, let me improve this in the next version.
quoted
2. If the header content does not start with "# packed-ref with:", we
   should report an error just like what "create_snapshot" does. So,
   create a new fsck message "badPackedRefHeader(ERROR)" for this.
This I can agree with.  If the first line begins with "#" but not
with that string (with a trailing SP), that is a sign that it may
not even be a valid packed-refs file, which is a report-worthy
event.
quoted
3. If the header content is not the same as the constant string
   "PACKED_REFS_HEADER", ideally, we should report an error to the user.
NO.  THAT IS NOT IDEAL AT ALL.

The header was written like this:

        /* perhaps other traits later as well */
        fprintf(cbdata.refs_file, "# pack-refs with: peeled \n");

in the older versions of Git before it was made into a separate
preprocessor macro and lost the comment (the above excerpt is from
"git show v1.5.0:builtin-pack-refs.c").

Notice "other traits later" in the comment?

The thing is _designed_ to be extensible.  In fact, these days we
support a few more traits

        static const char PACKED_REFS_HEADER[] =
                "# pack-refs with: peeled fully-peeled sorted \n";

(an excerpt from the current refs/packed-backend.c).

Reporting an error when you see something written by an older
version of Git is far from ideal.
Understood, I think we should be consistency with the runtime check.
quoted
   However, we allow other contents as long as the header content starts
   with "# packed-ref with:". To keep compatibility, create a new fsck
   message "unknownPackedRefHeader(INFO)" to warn about this. We may
   tighten this rule in the future.
Whatever we do, what we do with an unknown trait should be in line
with what the runtime does.  If the runtime failed (we do not, but
this is to illustrate the principle [*]) on a packed-refs file
without "sorted" trait, noticing that "sorted" is not there and
flagging as an error is a good thing to do.  But if the runtime
gracefully degrades and sorts the list of refs read from such a
packed-refs file before continuing, then a packed-refs file that
lack "sorted" trait is not a report-worthy event.
Actually, the runtime won't complain about this. I agree with you here.
I do not offhand recall if we introduced the concept of mandatory vs
optional traits in the packed-refs part of the system (like we have
in the index extension subsystem, where a version of Git that
encounters an unknown *and* mandatory index extension must refuse to
touch the repository), but if there is a mandatory trait declared in
the header that our version of Git does not understand, it is a
report-worthy event that must be flagged with "git refs verify".
I don't think any trait in "packed-refs" is mandatory. Because I have
done some experiments before implementing the code. We should only check
case 2 here.
quoted
+static int packed_fsck_ref_header(struct fsck_options *o, const char *start, const char *eol)
+{
+	const char *err_fmt = NULL;
+	int fsck_msg_id = -1;
+
+	if (!starts_with(start, "# pack-refs with:")) {
+		err_fmt = "'%.*s' does not start with '# pack-refs with:'";
+		fsck_msg_id = FSCK_MSG_BAD_PACKED_REF_HEADER;
+	} else if (strncmp(start, PACKED_REFS_HEADER, strlen(PACKED_REFS_HEADER))) {
+		err_fmt = "'%.*s' is an unknown packed-refs header";
+		fsck_msg_id = FSCK_MSG_UNKNOWN_PACKED_REF_HEADER;
+	}
As I outlined above, this is totally unacceptable.  

Inspecting the header is good, but if this code claims to be a
checker, it should do at least what the runtime does, i.e. parse the
header to tell what traits the packed-file declares, not just
assuming that it is a fixed string.  And error on unknown trait(s)
if they are mandatory (if such a concept is implemented in the
runtime reading side).  Informing on an unknown and optional
trait(s) I can live with, but personally I wouldn't recommend it.
Got it, I don't want to report unknown trait(s) either.
In other words, report loudly if it is an error, but otherwise stay
silent if we know we tolerate it well. 
Thanks for this suggestion.
quoted
+static int packed_fsck_ref_content(struct fsck_options *o,
+				   const char *start, const char *eof)
+{
+	struct strbuf packed_entry = STRBUF_INIT;
+	int line_number = 1;
We limit ourselves with about 1 billion refs in the packed-refs
file, which may be plenty,
Let me change this to `size_t`. This would be better.
but I do not quite understand the use of
this variable.  There is no loop inside this so ...
The reason why I define this variable is that I am going to use loop to
check each entry in the next patch.
quoted
+	const char *eol;
+	int ret = 0;
+
+	strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
... this is always line #1, and then
quoted
+	ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
+	if (*start == '#') {
+		ret |= packed_fsck_ref_header(o, start, eol);
+
+		start = eol + 1;
+		line_number++;
... it may be incremented, but upon returning from the funcition, it
is lost.

Perhaps you wanted to make it a function-scope static, but then you
are allowed to read one single packed-refs file during the life of
your process before you exit, which I am not sure is what you want?
Actually, what I want is use this variable for looping the each ref
entry in the "packed-refs" file.
quoted
+	} else {
+		struct fsck_ref_report report = { 0 };
+		report.path = "packed-refs";
+
+		ret |= fsck_report_ref(o, &report,
+				       FSCK_MSG_PACKED_REF_MISSING_HEADER,
+				       "missing header line");
+	}
+
+	strbuf_release(&packed_entry);
+	return ret;
+}
Thanks,
Jialuo
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help