Re: [PATCH] e2fsck: Avoid changes on recovery flags when... | linux-ext4

Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed

From: Haotian Li <hidden>
Date: 2021-03-05 09:49:21

I'm very sorry for the delay. Thanks for your suggestion. As you proposed,
we use an e2fsck.conf option "recovery_error_behavior" to let users
choose the behavior in this situation. The v2 patch will be sent separately.

On 2021/1/6 7:06, harshad shirwadkar wrote:

Sorry for the delay. Thanks for providing more information, Haotian.
So this is happening due to IO errors experienced due to a flaky
network connection. I can imagine that this is perhaps a situation
which is recoverable but I guess when running on physical hardware,
it's less likely for such IO errors to be recoverable. I wonder if
this means we need an e2fsck.conf option - something like
"recovery_error_behavior" with default value of "continue". For
use cases such as this, we can set it to "exit" or perhaps "retry"?
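
The proposed option could be modeled roughly as follows. This is only a sketch: the enum, the function name, and the choice to map unknown strings to the default are assumptions made for illustration, not code from e2fsprogs or from any posted patch.

```c
#include <string.h>

/* Hypothetical policy values for the proposed "recovery_error_behavior"
 * option; these names are illustrative, not taken from e2fsprogs. */
enum recovery_error_behavior {
	RECOVERY_ERR_CONTINUE,	/* proposed default: keep checking */
	RECOVERY_ERR_EXIT,	/* stop e2fsck after a failed recovery */
	RECOVERY_ERR_RETRY	/* try journal recovery again */
};

/* Map a config string to a policy. NULL or unknown values fall back to
 * the proposed default of "continue" (an assumption of this sketch). */
static enum recovery_error_behavior
parse_recovery_error_behavior(const char *value)
{
	if (value == NULL)
		return RECOVERY_ERR_CONTINUE;
	if (strcmp(value, "exit") == 0)
		return RECOVERY_ERR_EXIT;
	if (strcmp(value, "retry") == 0)
		return RECOVERY_ERR_RETRY;
	return RECOVERY_ERR_CONTINUE;
}
```

Falling back to "continue" keeps today's behavior for anyone who never sets the option, which matches the default suggested above.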

On Thu, Dec 24, 2020 at 5:49 PM Zhiqiang Liu [off-list ref] wrote:

quoted

friendly ping...

On 2020/12/15 15:43, Haotian Li wrote:

quoted

Thanks for your review. I agree with you that it's more important
to understand the errors found by e2fsck. We'll describe the case
behind this problem below.

The problem we found actually occurs in a remote storage case, where
e2fsck's reads or writes may fail because of network packet loss.
The first time, some packet loss errors happened during e2fsck's journal
recovery (using fsck -a), and the recovery failed. The second time, we
fixed the network problem and ran e2fsck again, but we still got errors
when we tried to mount. Then we set the jsb->s_start journal flags and reran
e2fsck, and the problem was fixed. So we suspect something is wrong in e2fsck's
journal recovery, probably the bug we described in the patch.

Certainly, exiting directly is not a good way to fix this problem.
Just as Harshad said, we need to tell the user what happened and let
the user decide whether to continue e2fsck or not. If we want to safely use
e2fsck without human intervention (using fsck -a), I wonder if we need
to provide a safe mechanism to complete the fast check but avoid changes
to the journal or anything else which may be fixed in the future (such
as the jsb->s_start flag)?
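
The idea of not touching the recovery state after a failed replay can be sketched as below. The struct is a toy stand-in invented for this example, and the function name is hypothetical; the real superblock handling in e2fsck is considerably more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the on-disk recovery state; the real superblock and
 * journal superblock structures are far larger. */
struct recovery_state {
	bool needs_recovery;	/* models the "needs recovery" feature flag */
	uint32_t journal_start;	/* models jsb->s_start; 0 means empty journal */
};

/* Commit the "journal is clean" state only when replay succeeded.
 * replay_err is 0 on success or a nonzero error code on failure.
 * Returns replay_err so the caller can still decide what to do next. */
static int finish_journal_recovery(struct recovery_state *st, int replay_err)
{
	if (replay_err != 0)
		return replay_err;	/* leave state intact for a later rerun */

	st->needs_recovery = false;
	st->journal_start = 0;
	return 0;
}
```

With this shape, a second e2fsck run after the network problem is fixed would still see a journal that needs replaying, instead of one that was marked clean after a failed attempt.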

Thanks
Haotian

On 2020/12/15 4:27, Theodore Y. Ts'o wrote:

quoted

On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:

quoted

Hi Haotian,

Yeah perhaps these are the only recoverable errors. I also think that
we can't surely say that these errors are recoverable always. That's
because in some setups, these errors may still be unrecoverable (for
example, if the machine is running under low memory). I still feel
that we should ask the user about whether they want to continue or
not. The reason is that firstly if we don't allow running e2fsck in
these cases, I wonder what would the user do with their file system -
they can't mount / can't run fsck, right? Secondly, not doing that
would be a regression. I wonder if some setups would have chosen to
ignore journal recovery if there are errors during journal recovery
and with this fix they may start seeing that their file systems aren't
getting repaired.

It may very well be that there are corrupted file system structures
that could lead to ENOMEM.  If so, I'd consider that something we should
be explicitly checking for in e2fsck, and it's actually relatively
unlikely in the jbd2 recovery code, since that's fairly straight
forward --- except I'd be concerned about potential cases in your Fast
Commit code, since there's quite a bit more complexity when parsing
the fast commit journal.

This isn't a new concern; we've already talked about the fact that
fast commit needs to have a lot more sanity checks to look for
maliciously --- or syzbot generated, which may be the same thing :-)
--- inconsistent fields causing the e2fsck replay code to behave in
unexpected ways, which might include trying to allocate insane amounts
of memory, array buffer overruns, etc.

But assuming that ENOMEM is always due to operational concerns, as
opposed to file system corruption, may not always be a safe
assumption.
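
One way to tell the two cases apart is to validate any on-disk size field before allocating from it. The following is a minimal sketch under stated assumptions: the function name and the `max_bytes` bound are hypothetical, and it is not taken from the jbd2 or fast commit replay code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `count` records of `rec_size` bytes, where `count`
 * was read from disk and may be corrupt. `max_bytes` is a hypothetical
 * upper bound, e.g. the size of the area the records were read from.
 * Returns NULL with *corrupt set to 1 when the field is zero or too
 * large to be plausible; returns NULL with *corrupt left at 0 when
 * malloc itself fails, which is the genuine out-of-memory case. */
static void *alloc_from_disk_count(uint32_t count, size_t rec_size,
				   size_t max_bytes, int *corrupt)
{
	*corrupt = 0;
	if (rec_size == 0 || count == 0 || count > max_bytes / rec_size) {
		*corrupt = 1;
		return NULL;
	}
	return malloc((size_t)count * rec_size);
}
```

Dividing `max_bytes` by `rec_size` rather than multiplying `count` by `rec_size` keeps the bounds check itself free of integer overflow.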

Something else to consider is from the perspective of a naive system
administrator, if there is a bad media sector in the journal, simply
always aborting the e2fsck run may not allow them an easy way to
recover.  Simply ignoring the journal and allowing the next write to
occur, at which point the HDD or SSD will redirect the write to a bad
sector spare pool, will allow for an automatic recovery.  Simply
always causing e2fsck to fail, would actually result in a worse
outcome in this particular case.

(This is especially true for a mobile device, where the owner is not
likely to have access to the serial console to manually run e2fsck,
and where if they can't automatically recover, they will have to take
their phone to the local cell phone carrier store for repairs ---
which is *not* something that a cellular provider will enjoy, and they
will tend to choose other cell phone models to feature as
supported/featured devices.  So an increased number of failures which
can't be automatically recovered cause the carrier to choose to
feature, say, a Xiaomi phone over a ZTE phone.)

quoted

I'm wondering if you saw any situation in your setup where exiting
e2fsck helped? If possible, could you share what kind of errors were
seen in journal recovery and what was the expected behavior? Maybe
that would help us decide on the right behavior.

Seconded; I think we should try to understand why it is that e2fsck is
failing with these sorts of errors.  It may be that there are better
ways of solving the high-level problem.

For example, the new libext2fs bitmap backends were something that I
added because when running a large number of e2fsck processes in
parallel on a server machine with dozens of HDD spindles was causing
e2fsck processes to run slowly due to memory contention.  We fixed it
by making e2fsck more memory efficient, by improving the bitmap
implementations --- but if that hadn't been sufficient, I had also
considered adding support to make /sbin/fsck "smarter" by limiting the
number of fsck.XXX processes that would get started simultaneously,
since that could actually cause the file system check to run faster by
reducing memory thrashing.  (The trick would have been how to make
fsck smart enough to automatically tune the number of parallel fsck
processes to allow, since asking the system administrator to manually
tune the max number of processes would be annoying to the sysadmin,
and would mean that the feature would never get used outside of $WORK
in practice.)
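
A very rough sketch of what such automatic tuning might compute is shown below. Everything here is an assumption for illustration: /sbin/fsck has no such function, and how the two memory estimates would be obtained is exactly the hard part described above.

```c
#include <stddef.h>

/* Pick how many fsck processes to run at once. All inputs are
 * hypothetical estimates that a wrapper would have to supply:
 *   avail_bytes    - memory that can be used without thrashing
 *   per_fsck_bytes - estimated peak memory of one fsck process
 *   nr_filesystems - number of file systems waiting to be checked */
static size_t max_parallel_fsck(size_t avail_bytes, size_t per_fsck_bytes,
				size_t nr_filesystems)
{
	size_t n;

	if (nr_filesystems == 0)
		return 0;
	if (per_fsck_bytes == 0)
		return nr_filesystems;	/* no estimate: do not limit */

	n = avail_bytes / per_fsck_bytes;
	if (n < 1)
		n = 1;			/* always make forward progress */
	if (n > nr_filesystems)
		n = nr_filesystems;
	return n;
}
```

The lower clamp of one process means a machine that is short on memory still checks its file systems serially rather than not at all.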

So is the actual underlying problem that e2fsck is running out of
memory?  If so, is it because there simply isn't enough physical
memory available?  Is it being run in a cgroup container which is too
small?  Or is it because too many file systems are being checked in
parallel at the same time?

Or is it I/O errors that you are concerned with?  And how do you know
that they are not permanent errors; is this caused by something like
fibre channel connections being flaky?

Or is this a hypothetical worry, as opposed to something which is
causing operational problems right now?

Cheers,

                                     - Ted

.
