Re: [PATCH RFC 0/8] btrfs: experimental compression support for subpage
From: Qu Wenruo <hidden>
Date: 2021-06-24 01:10:16
On 2021/6/24 上午5:23, David Sterba wrote:
On Wed, Jun 23, 2021 at 01:55:21PM +0800, Qu Wenruo wrote:quoted
This patchset is based on my previously submitted compression refactor, which is further based on subpage patchset. It can be fetched from the following repo: https://github.com/adam900710/linux/tree/compression It only has been lightly tested, no stress test/full fstests run yet. The reason for such a unstable patchset is to get feedback on how to continue the development, aka the feedback for the last patch. The first 7 patches are mostly cleanups to make compression path to be more subpage compatible. The RFC is for the last patch, which will enable subpage compression in a pretty controversial way. The reason is copied from that patch:This mechanism allows btrfs to continue handling next ranges, without waiting for the time consuming compression. But this has a problem for subpage case, as we could have the following delalloc range for a page: 0 32K 64K | |///////| |///////| \- A \- B In above case, if we pass both range to cow_file_range_async(), both range A and range B will try to unlock the full page [0, 64K). And which finishes later than the other range will try to do other page operatioins like end_page_writeback() on a unlocked page, triggering VM layer BUG_ON(). Currently I don't have any perfect solution to this, but two workarounds: - Only allow compression for fully page aligned range This is what I did in this patch. By this, the compressed range can exclusively lock the first page (and other pages), so they are completely safe to do whatever they want. The problem is, we will not compress a lot of small writes. This is especially problematic as our target page size is 64K, not a small size. - Make cow_file_range_async() to behave like cow_file_range() for subpage This needs several behavier change, and are all subpage specific: * Skip the first page of the range when finished Just like cow_file_range() * Have a way to wait for the async_cow to finish before handling the next delalloc range The biggest problem here is the performance impact. Although by this we can compress all sector aligned ranges, we will waste time waiting for the async_cow to finish. This is completely denying the meaning of "async" part. Now to mention there are tons of code needs to be changed. Thus I choose the current way to only compress ranges which is fully page aligned. The cost is we will skip a lot of small writes for 64K page size.Any feedback on this part would be pretty helpful.As another step, progressing towards full subpage support I think that the limiting compression to the full 64k page is acceptable. Better than nothing and without intrusive changes described above. There still might be some odd case even with the whole page, I'm not sure how exactly it would work with 4k/64k but there are some checks inside compression anyway if the size is getting smaller and then bailing out uncompressed anyway. So in general fallback to uncompressed data happens already.
Well well well, just come by a new shinny and maybe perfect idea to make it better. - For async cow submission The main problem described above can be solved by btrfs_subpage::writers and a change in how we run delalloc range. Currently we find and lock a delalloc range, then run the locked range immediately. This makes any submitted async cow can finish before we reach next range, thus even we increase subage::writers it makes no sense. But if we find and lock *all* delalloc ranges inside a page first, and increase subpage::writers according to all the locked range, we can ensure the full page won't be unlocked until all delalloc ranges are submitted and finished. By this we can allow sector-perfect compressed write. And for exsting delalloc range handlers like cow_file_range(), we need to keep a bitmap to record which range is submitted asynchronized, then unlock the remaining ranges manually. - The bio submission and delalloc range Above change makes me thinking, why we don't submit bio when we run delalloc range? When running delalloc range, we will need to find and lock all pages in the range any way. Why we can't do the submission at the same time, as we already need to do the page operations anyway. I guess there is some performance related reason, but I'm not sure. Any feedback on this part will be appreciated. Despite the better solution, I'll still push the current full page compression, not only as a quick stop gap, but all these preparation patches will benefit later sector-perfect compression. Thanks, Qu