Re: [PATCH V10 09/19] block: introduce bio_bvecs()
From: Sagi Grimberg <sagi@grimberg.me>
Date: 2018-11-21 04:25:46
Also in:
dm-devel, linux-bcache, linux-block, linux-btrfs, linux-fsdevel, linux-mm, linux-raid, linux-xfs, lkml
>> I would like to avoid growing bvec tables and keep everything
>> preallocated. Plus, a bvec_iter operates on a bvec which means
>> we'll need a table there as well... Not liking it so far...
>
> In case of bios in one request, we can't know how many bvecs there
> are except by calling rq_bvecs(), so it may not be suitable to
> preallocate the table. If you have to send the IO request in one
> send(), runtime allocation may be inevitable.

I don't want to do that. I want to work on a single bvec at a time,
like the current implementation does.

> If you aren't required to send the IO request in one send(), you may
> send one bio at a time and just use the bio's bvec table directly,
> such as the single bio case in lo_rw_aio().
I'd need some indication that I need to reinit my iter with the
new bvec. Today we do:
static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
		int len)
{
	req->snd.data_sent += len;
	req->pdu_sent += len;
	iov_iter_advance(&req->snd.iter, len);
	if (!iov_iter_count(&req->snd.iter) &&
	    req->snd.data_sent < req->data_len) {
		req->snd.curr_bio = req->snd.curr_bio->bi_next;
		nvme_tcp_init_send_iter(req);
	}
}
and initialize the send iter. I imagine that now I will need to
switch to the next bvec, and move on to the next bio only once
I'm on the last bvec of the current one...

Do you offer an API for that?
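
To make the "next bvec first, next bio only after the last bvec" rule concrete, here is a minimal userspace sketch. Everything in it is a hypothetical stand-in: struct model_bvec, struct model_bio, struct model_send_state and model_advance() are not kernel types or APIs, and the sketch assumes every bio carries at least one bvec and that a single call never consumes more than the rest of the current bvec.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct bio_vec / struct bio. */
struct model_bvec {
	size_t len;			/* bytes covered by this segment */
};

struct model_bio {
	struct model_bvec *vecs;	/* this bio's bvec table */
	size_t nr_vecs;			/* assumed >= 1 */
	struct model_bio *next;		/* next bio in the request, or NULL */
};

struct model_send_state {
	struct model_bio *bio;		/* current bio */
	size_t idx;			/* current bvec within that bio */
	size_t done;			/* bytes already sent from that bvec */
};

/*
 * Account for len bytes sent from the current bvec.
 * Returns 1 when the cursor now points at a new bvec (the point at which
 * a driver would reinit its send iterator), 0 otherwise.
 */
static int model_advance(struct model_send_state *s, size_t len)
{
	assert(s->bio && s->idx < s->bio->nr_vecs);
	assert(s->done + len <= s->bio->vecs[s->idx].len);

	s->done += len;
	if (s->done < s->bio->vecs[s->idx].len)
		return 0;		/* still inside the same bvec */

	s->done = 0;
	if (++s->idx < s->bio->nr_vecs)
		return 1;		/* next bvec, same bio */

	s->bio = s->bio->next;		/* last bvec done: cross to the next bio */
	s->idx = 0;
	return s->bio != NULL;
}
```

The return value plays the role of the "indication" asked about above: the caller only touches bios indirectly, through the cursor.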
>>> Can this approach avoid your blocking issue? You may see this
>>> example in branch 'rq->bio != rq->biotail' of lo_rw_aio().
>>
>> This is exactly an example of not ignoring the bios...
>
> Yeah, that is the most common example, given merge is enabled
> in most cases. If the driver or device doesn't care about merging,
> you can disable it and always get a single bio request, then the
> bio's bvec table can be reused for send().

Does bvec_iter span bvecs with your patches? I didn't see that change.
>> I'm not sure how this helps me either. Unless we can set a bvec_iter
>> to span bvecs or have an abstract bio crossing when we re-initialize
>> the bvec_iter I don't see how I can ignore bios completely...
>
> rq_for_each_bvec() will iterate over all bvecs from all bios, so you
> needn't look at any bio in this req.

But I don't need this iteration, I need a transparent API like:

	bvec2 = rq_bvec_next(rq, bvec);

This way I can simply always reinit my iter without thinking about how
the request/bios/bvecs are constructed...
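
As an illustration of the semantics meant by rq_bvec_next(), here is a userspace model. rq_bvec_next() does not exist in the kernel as far as this thread shows; model_rq_bvec_next() and the model_* structs below are hypothetical, and the linear search is only there to keep the sketch short (a real helper would presumably carry a cursor instead of searching).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's bio_vec / bio / request. */
struct model_bvec { size_t len; };

struct model_bio {
	struct model_bvec *vecs;	/* this bio's bvec table */
	size_t nr_vecs;
	struct model_bio *next;		/* next bio in the request, or NULL */
};

struct model_rq { struct model_bio *bio; };

/* First bvec at or after bio, skipping empty bios; NULL if there is none. */
static struct model_bvec *model_first_bvec(struct model_bio *bio)
{
	for (; bio; bio = bio->next)
		if (bio->nr_vecs)
			return &bio->vecs[0];
	return NULL;
}

/*
 * Return the bvec following bvec in rq, crossing bio boundaries
 * transparently. Passing NULL yields the first bvec; NULL is returned
 * at the end of the request or if bvec does not belong to rq.
 */
static struct model_bvec *model_rq_bvec_next(struct model_rq *rq,
					     struct model_bvec *bvec)
{
	struct model_bio *bio;
	size_t i;

	assert(rq);
	if (!bvec)
		return model_first_bvec(rq->bio);

	for (bio = rq->bio; bio; bio = bio->next) {
		for (i = 0; i < bio->nr_vecs; i++) {
			if (&bio->vecs[i] != bvec)
				continue;
			if (i + 1 < bio->nr_vecs)
				return &bio->vecs[i + 1];	/* same bio */
			return model_first_bvec(bio->next);	/* next bio */
		}
	}
	return NULL;
}
```

With a helper shaped like this, the caller's advance path never has to look at bi_next itself.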

> rq_bvecs() will return how many bvecs there are in this request
> (covering all bios in this req).

Still not very useful given that I don't want to use a table...
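
For comparison, the table-based approach being declined here can be sketched in userspace as follows. The names model_rq_bvecs() and model_rq_flatten() are hypothetical and only mirror the described role of rq_bvecs() (a count across all bios); the point of the sketch is that the count is known only at runtime, so the flat table has to be allocated per request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the kernel's bio_vec / bio. */
struct model_bvec { size_t len; };

struct model_bio {
	struct model_bvec *vecs;	/* this bio's bvec table */
	size_t nr_vecs;
	struct model_bio *next;		/* next bio in the request, or NULL */
};

/* Count the bvecs across every bio of the request. */
static size_t model_rq_bvecs(const struct model_bio *bio)
{
	size_t n = 0;

	for (; bio; bio = bio->next)
		n += bio->nr_vecs;
	return n;
}

/*
 * Copy all bvecs into one flat, heap-allocated table and store the
 * entry count in *nr. The caller frees the result. Returns NULL for an
 * empty request or on allocation failure.
 */
static struct model_bvec *model_rq_flatten(const struct model_bio *bio,
					   size_t *nr)
{
	struct model_bvec *tbl;
	size_t i, k = 0;

	assert(nr);
	*nr = model_rq_bvecs(bio);
	if (!*nr)
		return NULL;

	tbl = malloc(*nr * sizeof(*tbl));	/* the runtime allocation */
	if (!tbl)
		return NULL;

	for (; bio; bio = bio->next)
		for (i = 0; i < bio->nr_vecs; i++)
			tbl[k++] = bio->vecs[i];
	return tbl;
}
```

The malloc() in the middle is exactly the cost that a one-bvec-at-a-time interface avoids.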
>>> So it looks like the nvme-tcp host driver might be the 2nd driver
>>> which benefits from multi-page bvec directly.
>>>
>>> The multi-page bvec V11 has passed my tests and addressed almost
>>> all the comments during review on V10. I removed bio_bvecs() in V11,
>>> but it won't be a big deal, we can introduce them anytime when
>>> there is the requirement.
>>
>> multipage-bvecs and nvme-tcp are going to conflict, so it would be
>> good to coordinate on this. I think that nvme-tcp host needs some
>> adjustments as setting a bvec_iter. I'm under the impression that
>> the change is rather small and self-contained, but I'm not sure I
>> have the full picture here.
>
> I guess I may not get your exact requirement on block io iterator
> from nvme-tcp too, :-(

They are pretty much listed above. Today nvme-tcp sets an iterator with:

	vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
	nsegs = bio_segments(bio);
	size = bio->bi_iter.bi_size;
	offset = bio->bi_iter.bi_bvec_done;
	iov_iter_bvec(&req->snd.iter, WRITE, vec, nsegs, size);

and when done, iterate to the next bio and do the same.

With multipage bvec it would be great if we can simply have
something like rq_bvec_next() that would pretty much satisfy
the requirements from the nvme-tcp side...