RE: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems
From: Slava Dubeyko <hidden>
Date: 2017-01-19 02:56:43
Also in:
linux-fsdevel, nvdimm
-----Original Message-----
From: Jeff Moyer [mailto:jmoyer@redhat.com]
Sent: Wednesday, January 18, 2017 12:48 PM
To: Slava Dubeyko <redacted>
Cc: Jan Kara <jack@suse.cz>; linux-nvdimm@lists.01.org <linux-nvdimm@ml01.01.org>; linux-block@vger.kernel.org; Viacheslav Dubeyko [off-list ref]; Linux FS Devel [off-list ref]; lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Well, the situation with NVM is more like with DRAM AFAIU. It is quite reliable but given the size the probability *some* cell has degraded is quite high.
And similar to DRAM you'll get an MCE (Machine Check Exception) when you try to read such a cell. As Vishal wrote, the hardware does some background scrubbing and relocates stuff early if needed but nothing is 100%.
My understanding is that the hardware remaps the affected address range (64 bytes, for example) but doesn't move/migrate the data stored in this address range. That sounds slightly weird, because it means there is no guarantee of retrieving the stored data. It seems that the file system should be aware of this and has to be heavily protected by some replication or erasure coding scheme. Otherwise, if the hardware does everything for us (remaps the affected address region and moves the data into a new address region), then why does the file system need to know about the affected address regions?
The data is lost, that's why you're getting an ECC error. It's tantamount to -EIO for a disk block access.
I see three possible cases here:
(1) bad block has been discovered (no remap, no recovery) -> data is lost; -EIO for a disk block access, block is always bad;
(2) bad block has been discovered and remapped -> data is lost; -EIO for a disk block access;
(3) bad block has been discovered, remapped and recovered -> no data is lost.
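To make the three cases concrete, here is a minimal C sketch that encodes them as a table of outcomes. The enum and function names are hypothetical and are not taken from the kernel's badblocks code; the mapping only restates the list above: the data comes back only in case (3), and the block itself stays bad only in case (1).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states for the three cases listed above. */
enum bb_state {
    BB_FOUND,              /* (1) discovered, not remapped, not recovered */
    BB_REMAPPED,           /* (2) discovered and remapped, data not recovered */
    BB_REMAPPED_RECOVERED, /* (3) discovered, remapped and recovered */
};

/* What a read of the affected block returns: 0 or -EIO. */
static int bb_read_result(enum bb_state s)
{
    switch (s) {
    case BB_FOUND:
    case BB_REMAPPED:
        return -EIO;       /* data is lost */
    case BB_REMAPPED_RECOVERED:
        return 0;          /* data survived */
    }
    return -EIO;           /* unknown state: be conservative */
}

/* True only when the block itself stays bad (case 1). */
static bool bb_block_still_bad(enum bb_state s)
{
    return s == BB_FOUND;
}
```

Note that cases (1) and (2) look identical to a reader of the data (-EIO); they differ only in whether the location can be reused.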
Let's imagine that the affected address range is 64 bytes. It seems to me that, in the case of a block device, it will affect the whole logical block (4 KB).
512 bytes, and yes, that's the granularity at which we track errors in the block layer, so that's the minimum amount of data you lose.
I think it depends on what granularity the hardware supports. It could be 512 bytes, 4 KB, or maybe greater.
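The granularity point comes down to a little arithmetic. A sketch, assuming errors are tracked in fixed-size, aligned units (the 512-byte sector mentioned above, or a 4 KB block): a poisoned byte range costs every unit it touches, so a 64-byte error costs at least one whole unit, and two if it straddles a unit boundary. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range of tracking units (sectors or blocks) hit by an error. */
struct unit_range {
    uint64_t first;
    uint64_t last;
};

/* Units touched by the poisoned byte range [offset, offset + len),
 * assuming aligned units of `unit` bytes (e.g. 512 or 4096). */
static struct unit_range bad_units(uint64_t offset, uint64_t len, uint64_t unit)
{
    assert(len > 0 && unit > 0);
    struct unit_range r = {
        .first = offset / unit,
        .last = (offset + len - 1) / unit,
    };
    return r;
}

/* Bytes that become unreadable when errors are tracked per unit. */
static uint64_t lost_bytes(uint64_t offset, uint64_t len, uint64_t unit)
{
    struct unit_range r = bad_units(offset, len, unit);
    return (r.last - r.first + 1) * unit;
}
```

With a 64-byte error at byte offset 8320, for example, `lost_bytes()` gives 512 at a 512-byte granularity and 4096 at a 4 KB granularity; a 64-byte error at offset 480 straddles two sectors and costs 1024 bytes.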
The situation is more critical in the case of the DAX approach. Correct me if I'm wrong, but my understanding is that the goal of DAX is to provide direct access to a file's memory pages with minimal file system overhead. So it looks like raising a bad block issue at the file system level will affect a user-space application, because ultimately the user-space application will need to handle such trouble (the bad block issue). That sounds like a really weird situation to me. What can protect a user-space application from encountering a partially incorrect memory page?
Applications need to deal with -EIO today. This is the same sort of thing.
If an application trips over a bad block during a load from persistent memory, they will get a signal, and they can either handle it or not. Have a read through this specification and see if it clears anything up for you:
http://www.snia.org/tech_activities/standards/curr_standards/npm
Thank you for sharing this. So, if a user-space application follows the NVM Programming Model, it will be able to survive by catching and processing the exceptions. But such applications have yet to be implemented, and they also need special recovery technique(s). It sounds like legacy user-space applications are unable to survive a failed load/store operation in the NVM.PM.FILE mode.
Thanks,
Vyacheslav Dubeyko.