Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation
From: "Theodore Y. Ts'o" <tytso@mit.edu>
Date: 2018-12-29 02:15:01
Also in:
linux-ext4, linux-fsdevel, lkml, qemu-devel
On Fri, Dec 28, 2018 at 11:18:18AM +0000, Peter Maydell wrote:
In general inodes and offsets start from 0 and work up -- so almost all of the time they don't actually overflow. The problem with ext4 directory hash "offsets" is that they overflow all the time and immediately, so instead of "works unless you have a weird edge case" like all the other filesystems, it's "never works".
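To make the contrast above concrete, here is a minimal sketch of why small, counting-up offsets survive a 32-bit interface while hash-style cookies do not. The helper name and all of the sample values are made up for illustration; they are not real ext4 hashes and this is not how the kernel performs the check.

```python
# Illustrative sketch only: helper name and sample values are hypothetical.
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def fits_in_32bit_offset(d_off: int) -> bool:
    """Return True if a directory offset fits in a signed 32-bit field."""
    return 0 <= d_off <= INT32_MAX


# Offsets that start from 0 and work up: fine unless the directory is huge.
sequential_offsets = [0, 1, 2, 4096, 1_000_000]

# Hash-style cookies spread over a 64-bit range (made-up values).
hash_like_offsets = [0x1A2B3C4D5E6F7081, 0x7FEDCBA987654321, 0x0000000100000000]

sequential_ok = all(fits_in_32bit_offset(o) for o in sequential_offsets)
hash_ok = [fits_in_32bit_offset(o) for o in hash_like_offsets]
```

In this sketch every sequential offset passes the check and every hash-like cookie fails it, which is the "never works" case rather than the "weird edge case" one.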
Actually, XFS uses the inode number to encode the location of the inode (it doesn't have a fixed inode table, so it's effectively the block number shifted left by 3 or 4 bits, with the low bits indicating the slot in the 4k block). It has a hack to provide backwards compatibility for 32-bit APIs, but it has the same sort of "oh, we're on a non-paleolithic CPU, let's use the full 64 bits" logic that ext4 has.
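As a rough sketch of the encoding described above (block number shifted left, slot in the low bits), assuming 4 slot bits: the function names are hypothetical and this is a simplified model of the description, not the actual XFS on-disk layout, which has more detail than this.

```python
# Simplified model of "block number shifted left, slot in the low bits".
# Function names and the default of 4 slot bits are assumptions for illustration.
UINT32_MAX = 2**32 - 1


def encode_ino(block: int, slot: int, slot_bits: int = 4) -> int:
    """Pack a block number and a slot index into a single inode number."""
    if not 0 <= slot < (1 << slot_bits):
        raise ValueError("slot does not fit in slot_bits")
    return (block << slot_bits) | slot


def decode_ino(ino: int, slot_bits: int = 4) -> tuple[int, int]:
    """Recover (block, slot) from an inode number built by encode_ino."""
    return ino >> slot_bits, ino & ((1 << slot_bits) - 1)


small = encode_ino(5, 3)        # low block number: small inode number
large = encode_ino(1 << 30, 0)  # block far into a large filesystem
large_needs_64bit = large > UINT32_MAX
```

With this kind of scheme the inode number grows with the inode's position on disk, so on a large enough filesystem it exceeds 32 bits even when the number of files is small.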
The problem is that there is no 32-bit API in some cases (unless I have misunderstood the kernel code) -- not all host architectures implement compat syscalls or allow them to be called from 64-bit processes or implement all the older syscall variants that had smaller offsets. If there was a guaranteed "this syscall always exists and always gives me 32-bit offsets" we could use it.
Are there going to be cases where a process or a thread will sometimes want the 64-bit interface, and sometimes want the 32-bit interface? Or is it always going to be one or the other? I wonder if we could simply add a new flag to the process personality(2) flags.
Yes, that has been suggested, but it seemed a bit dubious to bake in knowledge of ext4's internal implementation details. Can we rely on this as an ABI promise that will always work for all versions of all file systems going forwards?
Yeah, that seems dubious because I'm pretty sure there are other file systems that may have their own 32/64-bit quirks.

- Ted