Thread (16 messages) 16 messages, 6 authors, 2017-02-14

Re: [PATCH 1/5] MD: attach data to each bio

From: Shaohua Li <shli@kernel.org>
Date: 2017-02-13 18:49:42

On Mon, Feb 13, 2017 at 08:49:33PM +1100, Neil Brown wrote:
On Thu, Feb 09 2017, Shaohua Li wrote:
quoted
On Fri, Feb 10, 2017 at 05:08:54PM +1100, Neil Brown wrote:
quoted
On Tue, Feb 07 2017, Shaohua Li wrote:
quoted
Currently MD is rebusing some bio fields. To remove the hack, we attach
extra data to each bio. Each personablity can attach extra data to the
bios, so we don't need to rebuse bio fields.
I must say that I don't really like this approach.
Temporarily modifying ->bi_private and ->bi_end_io seems
.... intrusive.   I suspect it works, but I wonder if it is really
robust in the long term.

How about a different approach..  Your main concern with my first patch
was that it called md_write_start() and md_write_end() much more often,
and these performed atomic ops on "global" variables, particular
writes_pending.

We could change writes_pending to a per-cpu array which we only count
occasionally when needed.  As writes_pending is updated often and
checked rarely, a per-cpu array which is summed on demand seems
appropriate.

The following patch is an early draft - it doesn't obviously fail and
isn't obviously wrong to me.  There is certainly room for improvement
and may be bugs.
Next week I'll work on collection the re-factoring into separate
patches, which are possible good-to-have anyway.
For your first patch, I don't have much concern. It's ok to me. What I don't
like is the bi_phys_segments handling part. The patches add a lot of logic to
handle the reference count. They should work, but I'd say it's not easy to
understand and could be error prone. What we really need is a reference count
for the bio, so let's just add a reference count. That's my logic and it's
simple.
We already have two reference counts, and you want to add a third one.

bi_phys_segments is currently used for two related purposes.
It counts the number of stripe_heads currently attached to the bio so
that when the count reaches zero:
 1/ ->writes_pending can be decremented
 2/ bio_endio() can be called.

When the code was written, the __bi_remaining counter didn't exist.  Now
it does and it is integrated with bio_endio() so it should make the code
easier to understand if we just use bio_endio() rather and doing our own
accounting.

That just leaves '1'.  We can easily decrement ->writes_pending directly
instead of decrementing a per-bio refcount, and then when it reaches
zero, decrement ->writes_pending.  As you pointed out, that comes with a
cost.  If ->writes_pending is changed to a per-cpu array which is summed
on demand, the cost goes away.

Having an extra refcount in the bio just adds a level of indirection
that doesn't (that I can see) provide actual value.
Ok, fair enough. I do think an explict counter in the driver side will help us
a lot, eg, we can better control when to endio and do something there (for
example the current blk trace, or something we want to add in the future). But
I don't insist currently.

For the patches, can you repost? I think:
- patch 2 missed md_write_start for make_discard_request
- It's unnecessary to zero bi_phys_segments in patch 5. And raid5-cache need do
  the same change of bio_endio.
For the md_write_start optimization, we can do it later.

Thanks,
Shaohua
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help