Re: [LSF/MM TOPIC] block level event logging for storage media management
From: Bart Van Assche <hidden>
Date: 2017-01-19 00:11:41
On Wed, 2017-01-18 at 23:34 +0000, Song Liu wrote:
Media health monitoring is very important for large-scale distributed
storage systems. Traditionally, enterprise storage controllers maintain
event logs for attached storage devices. However, these
controller-managed logs do not scale well for large-scale distributed
systems.

While designing a more flexible and scalable event logging system, we
think it is better to build the log in the block layer. Block-level event
logging covers all major storage media (SCSI, SATA, NVMe), and thus
minimizes redundant work for different protocols.

In this LSF/MM, we would like to discuss the following topics with the
community:
1. Mechanism for drivers to report events (or errors) to the block layer.
   Basically, we will need a traceable function for the drivers to report
   errors (most likely right before calling end_request or bio_endio).

2. Which mechanism (ftrace, BPF, etc.) is preferred for the event logging?

3. How should we categorize different events?
   Currently, there is existing code that translates ATA errors
   (ata_to_sense_error) and NVMe errors (nvme_trans_status_code) to SCSI
   sense codes. So we can leverage the SCSI Key Code Qualifier for event
   categorization.

4. Detailed discussion of the data structures for event logging.
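
To make topics 1, 3 and 4 a little more concrete, here is a minimal
userspace sketch in C. It is not the prototype mentioned in this thread.
The struct blk_media_event, the blk_event_class categories and both
functions are hypothetical names made up for illustration; only the sense
key values are taken from the SCSI specification. In the kernel, the
reporting hook would more likely be a tracepoint than a printf.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* SCSI sense key values (subset), as defined by the SCSI SPC standard. */
enum sense_key {
    SK_NO_SENSE        = 0x0,
    SK_RECOVERED_ERROR = 0x1,
    SK_NOT_READY       = 0x2,
    SK_MEDIUM_ERROR    = 0x3,
    SK_HARDWARE_ERROR  = 0x4,
    SK_ILLEGAL_REQUEST = 0x5,
    SK_ABORTED_COMMAND = 0xb,
};

/* Hypothetical event record: every field here is an assumption. */
struct blk_media_event {
    uint64_t sector;     /* first sector of the failed request */
    uint32_t nr_bytes;   /* length of the failed request */
    uint8_t  sense_key;  /* Key/Code/Qualifier triple after translation */
    uint8_t  asc;
    uint8_t  ascq;
};

/* Hypothetical coarse categories for health monitoring. */
enum blk_event_class {
    BLK_EVENT_NONE,
    BLK_EVENT_RECOVERED,
    BLK_EVENT_MEDIA,
    BLK_EVENT_HARDWARE,
    BLK_EVENT_OTHER,
};

/* Map a sense key to a coarse category. */
static enum blk_event_class
blk_event_classify(const struct blk_media_event *ev)
{
    switch (ev->sense_key) {
    case SK_NO_SENSE:        return BLK_EVENT_NONE;
    case SK_RECOVERED_ERROR: return BLK_EVENT_RECOVERED;
    case SK_MEDIUM_ERROR:    return BLK_EVENT_MEDIA;
    case SK_HARDWARE_ERROR:  return BLK_EVENT_HARDWARE;
    default:                 return BLK_EVENT_OTHER;
    }
}

/*
 * Hypothetical reporting hook. A driver would call something like this
 * right before completing the request. Here it only prints the event.
 */
static enum blk_event_class
blk_report_event(const struct blk_media_event *ev)
{
    enum blk_event_class cls = blk_event_classify(ev);

    printf("sector=%" PRIu64 " bytes=%" PRIu32
           " key=0x%x asc=0x%02x ascq=0x%02x class=%d\n",
           ev->sector, ev->nr_bytes, (unsigned)ev->sense_key,
           (unsigned)ev->asc, (unsigned)ev->ascq, (int)cls);
    return cls;
}
```

For example, an unrecovered read error (sense key 0x3, ASC 0x11, ASCQ
0x00) reported through blk_report_event() would fall into the
BLK_EVENT_MEDIA category, regardless of whether it originated from a
SCSI, SATA or NVMe device.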

We will be able to show a prototype implementation during LSF/MM.

I'd like to participate in this discussion.

Bart.