Thread: 65 messages, 16 authors, 2024-03-09

Re: [RFC PATCH 00/11] Rust null block driver

From: Jens Axboe <axboe@kernel.dk>
Date: 2023-05-04 20:56:17
Also in: lkml, rust-for-linux

On 5/4/23 1:59 PM, Andreas Hindborg wrote:
> Jens Axboe writes:
>
>> On 5/4/23 12:52 PM, Keith Busch wrote:
>>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
>>>> On 5/4/23 11:15, Andreas Hindborg wrote:
>>>>> If it is still unclear to you why this effort was started, please do let
>>>>> me know and I shall try to clarify further :)
>>>>
>>>> It seems like I was too polite in my previous email. What I meant is that
>>>> rewriting code is useful if it provides a clear advantage to the users of
>>>> a driver. For null_blk, the users are kernel developers. The code that has
>>>> been posted is the start of a rewrite of the null_blk driver. The benefits
>>>> of this rewrite (making low-level memory errors less likely) do not outweigh
>>>> the risks that this effort will introduce functional or performance regressions.
>>>
>>> Instead of replacing, would co-existing be okay? Of course as long as
>>> there's no requirement to maintain feature parity between the two.
>>> Actually, just call it "rust_blk" and declare it has no relationship to
>>> null_blk, despite their functional similarities: it's a developer
>>> reference implementation for a rust block driver.
>>
>> To me, the big discussion point isn't really whether we're doing
>> null_blk or not, it's more if we want to go down this path of
>> maintaining rust bindings for the block code in general. If the answer
>> to that is yes, then doing null_blk seems like a great choice as it's
>> not a critical piece of infrastructure. It might even be a good idea to
>> be able to run both, for performance purposes, as the bindings or core
>> change.
>>
>> But back to the real question... This is obviously extra burden on
>> maintainers, and that needs to be sorted out first. Block drivers in
>> general are not super security sensitive, as it's mostly privileged code
>> and there's not a whole lot of user-visible API. And the stuff we do
>> have is reasonably basic. So what's the long-term win of having rust
>> bindings? This is a legitimate question. I can see a lot of other, more
>> user-exposed subsystems being of higher interest here.
>
> Even though the block layer is not usually exposed in the same way
> that something like the USB stack is, absence of memory safety bugs is
> a very useful property. If this is attainable without sacrificing
> performance, it seems like a nice option to offer future block device
> driver developers. Some would argue that it is worth offering even in
> the face of performance regression.
>
> While memory safety is the primary feature that Rust brings to the
> table, it does come with other nice features as well. The type system,
> language support for stackless coroutines, and language support for
> error handling are all very useful.

We're in violent agreement on this part; I don't think anyone sane would
argue that memory safety with the same performance [1] isn't something
you'd want. And the error handling with rust is so much better than the
C stuff drivers do now that I can't see anyone disagreeing on that being
a great thing as well.

The discussion point here is the price being paid in terms of people
time.

> Regarding maintenance of the bindings, it _is_ an amount of extra work. But
> there is more than one way to structure that work. If Rust is accepted
> into the block layer at some point, maintenance could be structured in
> such a way that it does not get in the way of existing C maintenance
> work. A "rust keeps up or it breaks" model. That could work for a while.

That potentially works for null_blk, but it would not work for anything
that people actually depend on. In other words, anything that isn't
null_blk. And I don't believe we'd be actively discussing these bindings
if just doing null_blk is the end goal, because that isn't useful by
itself, and at that point we'd all just be wasting our time. In the real
world, once we have just one actual driver using it, then we'd be
looking at "this driver regressed because of change X/Y/Z and that needs
to get sorted before the next release". And THAT is the real issue for
me. So a "rust keeps up or it breaks" model is a bit naive in my
opinion; it's just not a viable approach. In fact, even for null_blk,
this doesn't really fly as we rely on blktests to continually vet the
sanity of the IO stack, and null_blk is an integral part of that.

So I really don't think there's much to debate between "rust people vs
jens" here, as we agree on the benefits, but my end of the table has to
stomach the cons. And like I mentioned in an earlier email, that's not
just on me, there are other regular contributors and reviewers that are
relevant to this discussion. This is something we need to discuss.

[1] We obviously need real numbers here; I don't consider the ones posted
stable enough to be useful in saying "yeah, it's fully on par".
If you have an updated rust nvme driver that uses these bindings I'd
be happy to run some testing that will definitively tell us if there's a
performance win, loss, or parity, and how much.

-- 
Jens Axboe