Re: ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps

From: Josh Triplett <josh@joshtriplett.org>
Date: 2020-10-07 08:03:21
Also in: lkml

On Tue, Oct 06, 2020 at 09:35:33AM -0400, Theodore Y. Ts'o wrote:

On Mon, Oct 05, 2020 at 10:03:06PM -0700, Josh Triplett wrote:

quoted

I'm not trying to create a problem here; I'm trying to address a whole
family of problems. I was generally under the impression that mounting
existing root filesystems fell under the scope of the kernel<->userspace
or kernel<->existing-system boundary, as defined by what the kernel
accepts and existing userspace has used successfully, and that upgrading
the kernel should work with existing userspace and systems. If there's
some other rule that applies for filesystems, I'm not aware of that.
(I'm also not trying to suggest that every random corner case of what
the kernel *could* accept needs to be the format definition, but rather,
cases that correspond to existing userspace.)

I'm not opposed to the kernel side change; it's *this time*.

I appreciate that. (Does that mean you wouldn't object to this patch
going into 5.9, with Jens' Reviewed-by and your ack?)

I'm more interested in killing off the tool that generated the
malformed file system in the first place.

It was developed for a reason, and it's not intended for general-purpose
use. It serves its purpose, which has nothing to do with e2fsprogs or
any component of e2fsprogs. It does not simply create filesystem images,
in the way that mke2fs or e2fsdroid does.

If I'd found any existing code that would have worked, or could have
been adapted to work, I would have happily reused it; I don't like
reinventing the wheel.

Also, see below regarding e2fsck.

As I keep pointing out, things aren't going to go well if "e2fsck -E
unshare_blocks" is applied to it.

These aren't intended to *ever* become writable. I've ensured that
they're marked with EXT4_FEATURE_RO_COMPAT_READONLY as well, which I
would hope implies that. Would that be sufficient to convince e2fsck
that it should never attempt to apply unshare_blocks to it?

Also, it seems like most of block_validity is trying to carefully avoid
metadata blocks intersecting with the journal, which could cause all
manner of issues; the filesystems I'm working with have no journal
(because they're permanently read-only).

With the sole exception of the shared bitmap block, I have it as a
requirement to pass e2fsck with zero complaints. If you'd be willing to
take a patch to e2fsck along the same lines as this kernel patch, then
I'd be happy to add it to the regression test suite: no filesystem
images that e2fsck is unhappy with. Would that help?

As mentioned before, I'd also be happy to supply some representative
filesystem images.

We had a similar issue with Android.  Many years ago, Andy Rubin was
originally quite allergic to the GPL, and had tried to promulgate the
rule, "no GPL in Android Userspace".  This is why bionic is used as
libc, and this resulted in Android engineers (I think before the
Google acquisition, but I'm not 100% sure), creating an unofficial,
"unauthorized" make_ext4fs which was a BSD-licensed version of mke2fs.

That is indeed a sad reason to develop anything. It's also particularly
sad to develop a different tool to serve exactly the same purpose as an
existing tool.

As opposed to, in this case, developing a different piece of software
to serve different purposes, that happens to make use of ext4 as one
component.

Unfortuantely, it created file systems which the kernel would never
complain about, but which, 50% of the time, would result in a file
system which under some circumstances, would get corrupted (even more)
when e2fsck attempted to repair the file system.  So if a user had a
bit flip caused by an eMMC hiccup, e2fsck could end up making things
worse.

I'd be interested in reading more about that, if you could suggest a
link or the right search terms. I did find
https://bugzilla.kernel.org/show_bug.cgi?id=195561 , which references the
issues in the same way you've done here, but I didn't find any of the
specific details you're mentioning here about what made the images it
created fragile.

I did see you mention, there, the advice of making sure to check
filesystems with e2fsck. That's something I've done from day one: I
didn't stop until e2fsck was happy, not just the kernel.

So that's why I really don't like it when there are "unauthorized",
unofficial tools creating file systems out there which we are now
obliged to support.

ext4 is an incredibly powerful and performant filesystem, with a
reasonably well-documented format and many useful properties. Not all
software that works with ext4 is going to live in e2fsprogs, nor should
it need to.

This would be analogous to the notion that (say) the FreeBSD kernel
belongs in the e2fsprogs repository because it has an ext4 driver. The
tail would be wagging the dog.

Even if it's OK as far as the kernel is
concerned, unless you're planning on forking and/or reimplementing all
of e2fsprogs, and doing so correctly, that way is going to cause
headaches for file system developers.

I haven't recreated or reimplemented e2fsprogs, and I have no desire to
do so. It seems like, as a result of the make_ext4fs debacle, you're
seeing any other software working with ext4 as an instance of
reinventing the wheel (and failing to make that wheel round, because
hey, it still rolls). That's not the case here.

quoted

I don't *want* to rely on what apparently turned out to be an
undocumented bug in the kernel's validator. That's why I was trying to
fix the issue in what seemed like the right way, by detecting the
situation and turning off the validator. That seemed like it would fully
address the issue. If it would help, I could also supply a tiny filesystem
image for regression testing.

I'm OK with working around the problem, and we're lucky that it's this
simple.... this time around.

But can we *please* take your custom tool out back and shoot it in the
head?

Nope. As mentioned, this isn't about creating ext4 filesystem images,
and it isn't even remotely similar to mke2fs.

And perhaps we need to make a policy that makes it clear that for file
systems, it's not just about "whatever the kernel happens to accept".
It should also be, "was it generated using an official version of the
userspace tools", at least as a consideration.

I don't think that's a reasonable policy at all, no.

"Should pass e2fsck" would be a relatively reasonable policy, assuming
that e2fsck doesn't ratchet up strictness in a way that breaks existing
filesystems. I think it'd be reasonable to recommend against trying to
push the boundaries of what the kernel accepts but e2fsck doesn't,
without a concrete plan to improve e2fsck and the filesystem
documentation.

- Josh Triplett

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help