Re: [PATCH 1/3] btrfs: always abort the transaction if we abort a trans handle
From: David Sterba <hidden>
Date: 2021-05-25 15:30:15
On Thu, May 20, 2021 at 11:21:31AM -0400, Josef Bacik wrote:
While stress testing our error handling I noticed that sometimes we would still commit the transaction even though we had aborted the transaction. Currently we track if a trans handle has dirtied any metadata, and if it hasn't we mark the FS as having an error (so no new transactions can be started), but we will allow the current transaction to complete as we do not mark the transaction itself as having been aborted. This sounds good in theory, but we were not properly tracking IO errors in btrfs_finish_ordered_io, and thus committing the transaction with bogus free space data. This isn't necessarily a problem per-se with the free space cache, as the other guards in place would have kept us from accepting the free space cache as valid, but hi-lights a real world case where we had a bug and could have corrupted the filesystem because of it. This "skip abort on empty trans handle" is nice in theory, but assumes we have perfect error handling everywhere, which we clearly do not. Also we do not allow further transactions to be started, so all this does is save the last transaction that was happening, which doesn't necessarily gain us anything other than the potential for real corruption. Remove this particular bit of code, if we decide we need to abort the transaction then abort the current one and keep us from doing real harm to the file system, regardless of whether this specific trans handle dirtied anything or not. Signed-off-by: Josef Bacik <josef@toxicpanda.com>
I've checked logs what would be the effects of leaving out the message printed by the unused transaction. Getting rid of it sounds like an improvement as it's not adding any information, the first transaction is noisy and that's where the problem happens. Additional messages about the abort are confusing, so yeah. Besides, the updates to trans->dirty lack any serialization so it's quite unreliable anyway.