Thread: 6 messages, 2 authors, 2017-06-30

Re: LightNVM pblk: read/write of random kernel memory

From: Carl-Daniel Hailfinger <hidden>
Date: 2017-06-28 15:14:54

Hi Javier,

On 28.06.2017 16:58, Javier Gonzalez wrote:
quoted
On 28 Jun 2017, at 16.33, Carl-Daniel Hailfinger [off-list ref] wrote:

thanks for the pointer to the github reporting page.
I'll answer your questions here (to make them indexable by search
engines in case someone else stumbles upon this) and link to newly
created github issues for the various problems I encountered.
Ok. I answered each issue directly on GitHub. A couple of things
inline though, for completeness.
Thanks.

 
quoted
On 28.06.2017 13:07, Javier Gonzalez wrote:
quoted
I'll take the question here, but please use our github [1] to report
errors and ask questions instead (including this thread). No need to
spam the rest of the linux-block mailing list for LightNVM specific
matters - unless of course, you want to discuss specific parts of the
code.

[1] https://github.com/OpenChannelSSD
quoted
On 28 Jun 2017, at 01.30, Carl-Daniel Hailfinger [off-list ref] wrote:

I'm currently having trouble with LightNVM pblk with kernel 4.12-rc7 on
Ubuntu 16.04.2 x86_64 in a Qemu VM using latest
https://github.com/OpenChannelSSD/qemu-nvme .

I'm creating a pblk device inside the VM with the following command:
[...]

This might either be a bug in the OpenChannelSSD qemu tree, or it might
be a kernel bug.

I also got warnings like the below:
In the 4.12 patches for pblk we do not have an error state machine. That
is, when writes fail on the device (on qemu in this case), we do not
communicate this to the application. This poor error handling results in
unexpected side effects like the one you are experiencing. In the patches
for 4.13, we have implemented the error state machine, so this type of
error should be handled better.
Oh. Shouldn't a minimal version of those patches get merged into 4.12
(or 4.12-stable once 4.12 is released) to avoid releasing a kernel with
a data corruption bug?
This only concerns how we handle the error on the host when the device
fails. If the device is not accepting writes for some reason, data is
lost anyway. So I don't think we need the fix for stable.
This is odd. AFAICS qemu isn't configured to simulate device failure, so
in theory this should never have happened. Can you think of any reason
why this code path was triggered? Should I open a separate github issue
for that?

quoted
quoted
You can pick up the code from our github (linux.git - branch:
pblk.for-4.13) or take it directly from Jens' for-4.13/core
Thanks. A full kernel compile will take some time, though. Do you happen
to have a Ubuntu-compatible kernel .deb for the new code?
We thought about it, but never actually did it (to share at least). I see
it might be useful :) For the time being, I'll share a minimal .config
for qemu, which takes a couple of minutes to compile.
Thanks!

 
quoted
[various bugs]
Filed as https://github.com/OpenChannelSSD/linux/issues/28
Filed as https://github.com/OpenChannelSSD/linux/issues/29
Filed as https://github.com/OpenChannelSSD/linux/issues/30
Filed as https://github.com/OpenChannelSSD/linux/issues/31

Regards,
Carl-Daniel
Javier
Regards,
Carl-Daniel