Thread (15 messages) 15 messages, 4 authors, 2017-09-27

Re: [GIT PULL] Block fixes for 4.14-rc2

From: Jens Axboe <axboe@kernel.dk>
Date: 2017-09-25 00:03:54

On 09/24/2017 11:34 AM, Linus Torvalds wrote:
On Sun, Sep 24, 2017 at 6:03 AM, Christoph Hellwig [off-list ref] wrote:
quoted
On Fri, Sep 22, 2017 at 05:18:49PM -1000, Linus Torvalds wrote:
quoted
WTF? Why is this so hard? It used to be that IDE drove people crazy.
Now it's NVMe and generic block layer stuff.
Can you please explain what problems you have with the nvme patches?
This time it wasn't the NVMe parts (that was the last few releases).
This time it was the random writeback patches that had been committed
only days before, and that completely change some fundamental code.
Some of the commits say "this changes no semantics", but that's after
the previous commit just changed the argument values exactly so that
the next commit wouldn't change semantics.

Don't get me wrong - all the commits look perfectly _sane_.  That's
not my issue. But these commits literally showed up on lkml just a
couple of days before getting sent to me, and even when they were sent
the cover letter for the series of six patches was literally

   More graceful flusher thread memory reclaim wakeup

which *really* does not say "this is an important big fix or regression" to me.

What made it ok to send that outside the merge window? It looks like a
nice cleanup, no question about it, and it probably fixes some
behavioral problem at FB. But that behavioral problem is not *new*, as
far as I can tell.

So why was it sent as a bug-fix pull request?

Now, if this was some other subsystem that _hadn't_ had problems in
EVERY SINGLE recent merge window, I'd likely have been more than happy
to take it early in the rc series because that subsystem wasn't on my
list of "increased scrutiny due to historical issues".

But as it is, the block layer _is_ on my list of increased scrutiny
(and part of that reason is NVMe patches - again, "fixes" being sent
that were really just normal on-going development)
Sorry for the late response, been moving this weekend and tending to
things more important than arguing on the Internet.

NVMe fixes have sometimes been accepted for the current series, while
they should have been pushed. I'm not going to argue with that, but I
think we've managed to make that better the last few series. It's still
not squeaky clean, but it is improving substantially.  I don't believe
that's even been the case for core changes, which have always been
scrutinized much more heavily.

I knew you might object to the writeback changes. As mentioned in the
pull request, if you diff the whole thing, it's really not big at all.
It's mostly shuffling some arguments and things around, to make the
final patch trivial. And it does mean that it's super easy to review.
In the original review series, we are hitting this in production, and
have been for a while. So while it's not a regression for this series,
it's something that causes us quite a bit of pain. When attempting to
free memory ends up gobling up tons of extra memory AND causing the
flusher threads to waste tons of CPU, then that's a big problem. Big
enough that we get softlockups from just attempting to process these
identical start-all-writeback requests.

That's why I sent it in for this series. Yes, it's not months old code,
but it has been tested. And it's not like it's a lot of NEW code, it's
mostly removing code, or moving things around.

I can resend without this stuff, but I think it'd be a shame to release
4.14 without it. Let me know, and I'll respin and send a new pull
request.

-- 
Jens Axboe
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help