Re: [PATCH v2] sha1-file: remove OBJECT_INFO_SKIP_CACHED
From: Jeff King <hidden>
Date: 2020-01-07 11:22:15
On Mon, Jan 06, 2020 at 03:47:53PM -0800, Jonathan Nieder wrote:
quoted
quoted
+ * Callers are responsible for calling write_object_file to record the + * object in persistent storage before writing any other new objects + * that reference it. + */ int pretend_object_file(void *, unsigned long, enum object_type, struct object_id *oid);I think this is an improvement over the status quo, but it's still a potential trap for code which happens to run in the same process (see my other email in the thread). Should the message perhaps be even more scary?A pet peeve of mine is warning volume escalation: if it becomes common for us to say * Warning: callers are reponsible for [...] then new warnings trying to stand out might say * WARNING: callers are responsible for [...] and then after we are desensitized to that, we may switch to * WARNING WARNING WARNING, not the usual blah-blah: callers are and so on. The main way I have found to counteract that is to make the "dangerous curve" markers context-specific enough that people don't learn to ignore them. After all, sometimes a concurrency warning is important to me, at other times warnings about clarity may be what attract my interest, and so on.
I meant less about the number of capital letters, and more that we should be saying "this interface is dangerous; don't use it". Because it's not just "callers are responsible". It's "this can cause subtle and hard-to-debug issues because it's writing to global state". My preferred solution would actually be to rip it out entirely, but we'd need some solution for git-blame, the sole caller. Possibly it could insert the value straight into the diff_filespec. But according to the thread that I linked earlier, I poked at that last year but it didn't look trivial.
I don't have a good suggestion here. Perhaps "Callers are responsible for" is too slow and something more terse would help? /* * Adds an object to the in-memory object store, without writing it * to disk. * * Use with caution! Unless you call write_object_file to record the * in-memory object to persistent storage, any other new objects that * reference it will point to a missing (in memory only) object, * resulting in a corrupt repository. */
Yeah, that's more what I had in mind.
It would be even better if we have some automated way to catch this kind of issue. Should tests run "git fsck" after each test? Should write_object_file have a paranoid mode that checks integrity? I don't know an efficient way to do that. Ultimately I am comfortable counting on reviewers to be aware of this kind of pitfall. While nonlocal invariants are always hard to maintain, this pitfall is inherent in the semantics of the function, so I am not too worried that reviewers will overlook it.
Yeah, given the scope of the problem (we have a single caller, and this mechanism is over a decade old) I'm fine with review as the enforcement mechanism, too.
A less error-prone interface would tie the result of pretend_object_file to a short-lived overlay on the_repository without affecting global state. We could even enforce read-only access in that overlay. I don't think the "struct repository" interface and callers are ready for that yet, though.
I agree that would be better, though it's still kind-of global (in that the repository object is effectively a global for most processes). -Peff