Thread: 11 messages, 7 authors, 2017-01-25

Re: [LSF/MM TOPIC] block level event logging for storage media management

From: Bart Van Assche <hidden>
Date: 2017-01-19 00:11:41

On Wed, 2017-01-18 at 23:34 +0000, Song Liu wrote:
Media health monitoring is very important for large scale distributed storage systems.
Traditionally, enterprise storage controllers maintain event logs for attached storage
devices. However, these controller managed logs do not scale well for large scale
distributed systems.

While designing a more flexible and scalable event logging system, we think it is better
to build the log in the block layer. Block level event logging covers all major storage media
(SCSI, SATA, NVMe), and thus minimizes redundant work for different protocols.

In this LSF/MM, we would like to discuss the following topics with the community:
    1. Mechanism for drivers to report events (or errors) to the block layer.
       Basically, we will need a traceable function for the drivers to report errors
       (most likely right before calling end_request or bio_endio).

    2. What mechanism (ftrace, BPF, etc.) is most preferred for the event logging?

    3. How should we categorize different events?
       Currently, there is existing code that translates ATA errors (ata_to_sense_error)
       and NVMe errors (nvme_trans_status_code) to SCSI sense codes. So we can
       leverage the SCSI Key Code Qualifier for event categorization.

    4. Detailed discussions on data structure for event logging.

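To make topics 3 and 4 more concrete, here is a minimal sketch of what an event record and a coarse categorization by SCSI sense key might look like. The struct blk_media_event, the enum media_event_class, and categorize_event() are hypothetical names chosen for illustration; they are not existing kernel interfaces and are not taken from the prototype mentioned in this thread. Only the sense key values (0x0, 0x1, 0x3, 0x4) come from the SCSI standard.

```c
#include <stdint.h>

/* Hypothetical event record; not an existing kernel structure. */
struct blk_media_event {
    uint64_t sector;     /* starting sector of the failed request */
    uint8_t  sense_key;  /* SCSI sense key (only the low 4 bits are used) */
    uint8_t  asc;        /* additional sense code */
    uint8_t  ascq;       /* additional sense code qualifier */
};

/* Hypothetical coarse categories an event log could record. */
enum media_event_class {
    EVENT_NONE,
    EVENT_RECOVERED,
    EVENT_MEDIUM,
    EVENT_HARDWARE,
    EVENT_OTHER,
};

/* Map the sense key of an event to one of the coarse categories above. */
static enum media_event_class
categorize_event(const struct blk_media_event *ev)
{
    switch (ev->sense_key & 0x0f) {
    case 0x0: return EVENT_NONE;       /* NO SENSE */
    case 0x1: return EVENT_RECOVERED;  /* RECOVERED ERROR */
    case 0x3: return EVENT_MEDIUM;     /* MEDIUM ERROR */
    case 0x4: return EVENT_HARDWARE;   /* HARDWARE ERROR */
    default:  return EVENT_OTHER;      /* everything else, left unclassified */
    }
}
```

Because ATA and NVMe errors can already be translated to SCSI sense data, a record of this shape could in principle be filled in the same way for all three protocols; the finer ASC/ASCQ breakdown is left open here.
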
We will be able to show a prototype implementation during LSF/MM.

I'd like to participate in this discussion.

Bart.