About bio_endio | kernelnewbies

Hello Pranay,

Thanks for your very helpful insights! I hope you don't mind if I continue
with more questions on block layer :-)

On Jun 25, 2014 2:09 AM, "Pranay Srivastava" [off-list ref] wrote:
Hello Alvin,

On Tue, Jun 24, 2014 at 9:53 PM, Alvin Abitria [off-list ref]
wrote:
Hello Pranay!

Thanks for your reply.  I apologize for my very late reply, I was very
preoccupied earlier at work.

On Tue, Jun 24, 2014 at 1:07 PM, Pranay Srivastava [off-list ref]
wrote:
Hello Alvin,

On Mon, Jun 23, 2014 at 10:39 PM, Alvin Abitria <
abitria.alvin@gmail.com>
wrote:
Hello,

I'm developing a block driver using the make_request method, effectively
bypassing the existing SCSI or request stack in the block layer.  That means
I'm working directly with bios.  As prescribed in the Linux documentation and
from referring to similar drivers in the kernel, you close a session with a
bio with the bio_endio function.
So it means you are just passing on the bios without the request
structure, if I'm correct?
I don't know how you are handling blk_finish_plug without having a
request or request queue;
I may be wrong in understanding how you are handling it.
Yes, I'm working at the bio level.  No struct requests, and I haven't used
blk_finish_plug yet.
The block driver method I'm implementing is somewhat along the same lines
as the nvme and mtip32xx drivers in the drivers/block directory (but
differing at the hardware-specific level, of course).
Ok I got it now. But why not use request structures? You can use at
least one for chaining bio(s). See below what I mean..

I usually invoke bio_endio during successful I/O completion, meaning with an
error code of zero.  But there are cases that this is not fulfilled or there
are error cases.  My question is, what are the valid error codes that can be
used with it?  My initial impression is that other than zero as error code,
-EIO is the one that you should use I think,
bio_endio will fail.  I've read somewhere that -EBUSY is not recognized, and
I tried -EIO but my driver crashed.  I got a panic in some dio_xxx function
leading from bio_endio(bio,-EIO). I would like to block subsequent bios sent
If it's okay for you to post the error then can you do that? I was
seeing the code for
dio_end_io but it would be good if you can post the exact crash
backtrace if you've got that.

Here you go:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff811b9a80>] bio_check_pages_dirty+0x50/0xe0
PGD 42e5e4067 PUD 42e6e7067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file:
/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
CPU 7
Modules linked in: block_module(U) fuse ip6table_filter ip6_tables
ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM
iptable_mangle
iptable_filter ip_tables bridge autofs4 sunrpc 8021q garp stp llc
cpufreq_ondemand freq_table pcc_cpufreq ipv6 vhost_net macvtap macvlan
tun
kvm_intel kvm uinput power_meter hpilo hpwdt sg tg3 microcode serio_raw
iTCO_wdt iTCO_vendor_support ioatdma dca shpchp ext4 mbcache jbd2 sd_mod
crc_t10dif hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 3740, comm: fio Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant
DL380p
Gen8
RIP: 0010:[<ffffffff811b9a80>]  [<ffffffff811b9a80>]
bio_check_pages_dirty+0x50/0xe0
RSP: 0018:ffff8804191618c8  EFLAGS: 00010046
RAX: 2000000000000000 RBX: ffff88041909f0c0 RCX: 00000000000011ae
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8804191618f8 R08: ffffffff81c07728 R09: 0000000000000040
R10: 0000000000000002 R11: 0000000000000002 R12: 0000000000000000
R13: ffff8804191b9b80 R14: ffff8804191b9b80 R15: 0000000000000000
FS:  00007fcd43e2d720(0000) GS:ffff8800366e0000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000041e69f000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fio (pid: 3740, threadinfo ffff880419160000, task
ffff88043341f500)
Stack:
 0000000000000000 ffff88043433f400 ffff88043433f520 ffff88041909f0c0
<d> ffff88041909f0c0 ffff8804191b9b80 ffff880419161948 ffffffff811bdc38
<d> ffff880419161968 0000000034236400 00000000fffffffb ffff88043433f400
Call Trace:
 [<ffffffff811bdc38>] dio_bio_complete+0xc8/0xd0
You are using a different kernel than mine. See if the offset within
this function leads you to

                spin_lock_irqsave(&bio_dirty_lock, flags);
                bio->bi_private = bio_dirty_list; /* This is the line number I got from the offset */
                bio_dirty_list = bio;
                spin_unlock_irqrestore(&bio_dirty_lock, flags);
Here are the offsets in my kernel:
(gdb) list *(dio_bio_complete+0xc8)
0xffffffff811bdc38 is in dio_bio_complete (fs/direct-io.c:440).
435         int page_no;
436
437         if (!uptodate)
438               dio->io_error = -EIO;
439
440         if (dio->is_async && dio->rw == READ) {
441               bio_check_pages_dirty(bio);   /* transfers ownership */
442         } else {
443               for (page_no = 0; page_no < bio->bi_vcnt; page_no++) {
444                     struct page *page = bvec[page_no].bv_page;
(gdb)
445
446                     if (dio->rw == READ && !PageCompound(page))
447                           set_page_dirty_lock(page);
448                     page_cache_release(page);
449               }
450               bio_put(bio);
451         }
452         return uptodate ? 0 : -EIO;
453   }

(gdb) list *(bio_check_pages_dirty+0x50)
0xffffffff811b9a80 is in bio_check_pages_dirty (/usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/arch/x86/include/asm/bitops.h:359).
354   #endif
355
356   static __always_inline int constant_test_bit(unsigned int nr, const volatile unsigned long *addr)
357   {
358         return ((1UL << (nr % BITS_PER_LONG)) &
359               (((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
360   }
361
362   static inline int variable_test_bit(int nr, volatile const unsigned long *addr)
363   {
(gdb)
364         int oldbit;
365
366         asm volatile("bt %2,%1\n\t"
367                    "sbb %0,%0"
368                    : "=r" (oldbit)
369                    : "m" (*(unsigned long *)addr), "Ir" (nr));
370
371         return oldbit;
372   }
373

You are not doing bio_put in the driver code, right? Assuming you are
not doing bio_get either. On which filesystem are you testing this?
No, I'm not using either the bio_put or the bio_get kernel functions.
Also, there is no filesystem yet on the drive under test; it's raw.  This
is because I'm using the fio benchmarking tool to send a large amount of I/O
traffic to test how the driver behaves when pushed to its queue depth limits.

Can you see if it happens on a simpler one like ext2, since it uses the
generic direct_IO call? So we can further narrow it down.

 [<ffffffff811bef4f>] dio_bio_end_aio+0x2f/0xd0
 [<ffffffff811b920d>] bio_endio+0x1d/0x40
 [<ffffffffa02c15a1>] block_make_request+0xe1/0x150 [block_module]
 [<ffffffff8125ccce>] generic_make_request+0x25e/0x530
 [<ffffffff811bae72>] ? bvec_alloc_bs+0x62/0x110
 [<ffffffff8125d02d>] submit_bio+0x8d/0x120
 [<ffffffff811bdf6c>] dio_bio_submit+0xbc/0xc0
 [<ffffffff811be951>] __blockdev_direct_IO_newtrunc+0x631/0xb30
 [<ffffffff8111afe3>] ? filemap_fault+0xd3/0x500
 [<ffffffff811beeae>] __blockdev_direct_IO+0x5e/0xd0
 [<ffffffff811bb280>] ? blkdev_get_blocks+0x0/0xc0
 [<ffffffff811bc347>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811bb280>] ? blkdev_get_blocks+0x0/0xc0
 [<ffffffff8111bb8b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff81166a2a>] ? kmem_getpages+0xba/0x170
 [<ffffffff81166f87>] ? cache_grow+0x217/0x320
 [<ffffffff811bb893>] blkdev_aio_read+0x53/0xc0
 [<ffffffff8111c633>] ? mempool_alloc+0x63/0x140
 [<ffffffff811bb840>] ? blkdev_aio_read+0x0/0xc0
 [<ffffffff811cadc4>] aio_rw_vect_retry+0x84/0x200
 [<ffffffff811cc784>] aio_run_iocb+0x64/0x170
 [<ffffffff811cdbb1>] do_io_submit+0x291/0x920
 [<ffffffff811ce250>] sys_io_submit+0x10/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 31 e4 45 31 ff eb 15 0f 1f 40 00 0f b7 43 28 41 83 c4 01 41 83 c7
01
44 39 e0 7e 36 4d 63 ec 49 c1 e5 04 4f 8d 2c 2e 49 8b 7d 00 <48> 8b 07
a8 10
75 06 66 a9 00 c0 74 d3 e8 5e 59 f7 ff 49 c7 45
RIP  [<ffffffff811b9a80>] bio_check_pages_dirty+0x50/0xe0
 RSP <ffff8804191618c8>
CR2: 0000000000000000

Basically, after receiving the bio from generic_make_request, I checked and
found that I already have the maximum outstanding pending I/O (bios) and
couldn't accommodate this bio for now.  Rather than just exiting the
make_request() routine, I decided to close this bio or return it to the
kernel via bio_endio(bio, -EIO).  No processing or modification was done to
the bio, I just used the callback fn above.

to me after reaching my queue depth and with no tags left, and so I want to
use bio_endio with an error code.
If you have a request queue then you could call blk_stop_queue and
blk_start_queue but I don't know if this is
relevant to your case.

I do have a request queue allocated using blk_alloc_queue, but this is more
of a dummy queue in my case - I don't use it as much as the struct requests
would use it.  I use it only to register my make_request function, and to
register the maximum I/O size and max number of buffer segments that can be
sent to my driver.
See, request generation is not in your hands. You can't tell a file
system not to send in more requests, but what your driver can do is
choose to delay their processing when it is able to do so. So I don't
think you should return an error from make_request; at least hold on to
those bios so you can process them later. We don't want to lie to
applications that an I/O error occurred, right?

For this purpose I thought you might use one static request structure
and since you are already overriding the default one just put those
bios in that request, chain them. I guess you'll have to do a bit more
if you work this way so that you don't mess with the request code
anywhere else, but if putting those bios on a list is easier go that
way then.
I get your point, that there is no control over bio generation from the
upper layers.  I'm curious about your suggestion to hold on to the excess bios
for later processing.  Say I have a queue 256 bio entries deep: the 256
bios that make it into the queue get processed, and the 257th bio etc. will
not be returned but instead put on a list (a struct bio_list?) and
processed later on.  Is that correct?  Is there a possibility that, if
traffic is really high and that lasts for an indefinite amount of time,
the bio list will grow bigger and bigger and overflow until it hurts the
kernel?  I'm thinking of the worst case and how to handle that, since upper
layers can't be stopped from sending I/O to my driver anyway.
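
The hold-and-process-later idea discussed above can be modelled outside the
kernel.  What follows is a minimal userspace sketch, not driver code:
struct fake_bio, struct fake_dev, QUEUE_DEPTH and the dev_*/backlog_* helpers
are all hypothetical names made up for illustration, and the head/tail FIFO
only imitates the shape of the kernel's struct bio_list.  It assumes a single
thread, so it leaves out the locking a real driver would need between its
make_request and completion paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 4   /* hypothetical hardware depth (256 in the thread) */

/* Stand-in for struct bio: only an id and the link used for chaining. */
struct fake_bio {
    int id;
    struct fake_bio *next;
};

/* Head/tail FIFO, shaped like the kernel's struct bio_list. */
struct backlog {
    struct fake_bio *head;
    struct fake_bio *tail;
};

struct fake_dev {
    int in_flight;          /* bios currently owned by the "hardware" */
    struct backlog pending; /* bios accepted but not yet issued */
    int pending_count;
};

static void backlog_add(struct backlog *bl, struct fake_bio *bio)
{
    bio->next = NULL;
    if (bl->tail)
        bl->tail->next = bio;
    else
        bl->head = bio;
    bl->tail = bio;
}

static struct fake_bio *backlog_pop(struct backlog *bl)
{
    struct fake_bio *bio = bl->head;

    if (bio) {
        bl->head = bio->next;
        if (!bl->head)
            bl->tail = NULL;
        bio->next = NULL;
    }
    return bio;
}

/* Submission path: true if the bio was issued at once, false if parked.
 * A bio is also parked while older bios are waiting, to keep FIFO order. */
static bool dev_submit(struct fake_dev *dev, struct fake_bio *bio)
{
    if (dev->in_flight < QUEUE_DEPTH && dev->pending.head == NULL) {
        dev->in_flight++;
        return true;
    }
    backlog_add(&dev->pending, bio);
    dev->pending_count++;
    return false;
}

/* Completion path: frees one slot, then issues the oldest parked bio.
 * Returns the bio that was issued, or NULL if the backlog was empty. */
static struct fake_bio *dev_complete_one(struct fake_dev *dev)
{
    struct fake_bio *next;

    assert(dev->in_flight > 0);
    dev->in_flight--;
    next = backlog_pop(&dev->pending);
    if (next) {
        dev->pending_count--;
        dev->in_flight++;
    }
    return next;
}
```

With QUEUE_DEPTH at 4, submitting six bios issues the first four and parks
the other two; each later call to dev_complete_one() issues the oldest parked
bio.  The sketch places no cap on the backlog, which is exactly the worst
case raised above; it does not settle whether the upper layers bound it.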

I also have a related question about the generic block layer.  I see that
it's tagging outstanding requests via blk_queue_start_tag() if the
request is within the block layer queue depth.  By tagging requests it can
process them.  That is similar to what I want to do.  I want to properly
handle those bios over the queue depth, so I first thought of returning them
to the kernel, hoping it will retry sending them until I can accommodate
them, or of being able to tell it I'm busy, so don't send me bios for a while
until the driver has free slots in its queue.  What does the block layer do
with a request that failed to get a tag (because it's over the queue depth)?
Where is it placed?

Thanks for the suggestion, I haven't tried blk_stop_queue and
blk_start_queue but I hope it will work.  I also like the possibility that
you can prevent the upper layers from sending me further bios when my queue
depth is full, then letting them know once my driver can accommodate new
bios - more of telling them I'm busy right now, don't send me further bios,
etc.  Is that the general idea behind the two functions you mentioned?
Yup, that's what those functions are for. But they just stop the processing
part, not the generation part, of requests. I hope you got the point I'm
trying to make.

I also notice that the two kernel APIs you gave either set or clear a certain
request queue flag.  I'd like to think that upper layers
(generic_make_request etc.) check that flag first to decide if they can
dispatch bios to my driver, is that right?
Right, that's for the queue. See blk_queue_stopped: if that's set then
processing won't be done right now, i.e. request_function won't be
called. But it does nothing for the make_request function, since that's not
in your hands. Let the requests be queued but process them at your
discretion.
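
The distinction drawn above, that stopping the queue pauses dispatch but not
submission, can be illustrated with a small userspace model.  This is only a
sketch under stated assumptions: struct model_queue, MODEL_CAPACITY and the
model_* functions are hypothetical and merely mimic the behaviour described
for blk_stop_queue, blk_start_queue and blk_queue_stopped; none of it is
kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CAPACITY 16   /* hypothetical bound, only to keep the model small */

struct model_queue {
    bool stopped;               /* plays the role of the "queue stopped" flag */
    int  ring[MODEL_CAPACITY];  /* ids of requests waiting to be dispatched */
    int  head;                  /* index of the oldest waiting request */
    int  count;                 /* number of waiting requests */
    int  dispatched;            /* how many requests reached the "driver" */
};

/* Submission ignores the stopped flag: callers can always queue more work,
 * at least until this model's fixed capacity runs out. */
static bool model_submit(struct model_queue *q, int id)
{
    assert(q);
    if (q->count == MODEL_CAPACITY)
        return false;
    q->ring[(q->head + q->count) % MODEL_CAPACITY] = id;
    q->count++;
    return true;
}

/* Dispatch honours the flag: nothing is handed over while stopped.
 * Returns the number of requests dispatched by this call. */
static int model_run(struct model_queue *q)
{
    int n = 0;

    assert(q);
    if (q->stopped)
        return 0;
    while (q->count > 0) {
        q->head = (q->head + 1) % MODEL_CAPACITY;
        q->count--;
        q->dispatched++;
        n++;
    }
    return n;
}

/* Set the flag; queued work stays where it is. */
static void model_stop(struct model_queue *q)
{
    q->stopped = true;
}

/* Clear the flag and drain whatever piled up in the meantime. */
static int model_start(struct model_queue *q)
{
    q->stopped = false;
    return model_run(q);
}
```

In this model, two requests submitted after model_stop() are both accepted,
model_run() dispatches none of them, and model_start() then dispatches both.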

What are those error codes, and will they work for my intended function?
Thanks!
I thought -EIOCBQUEUED might help but I don't think it would. You can
try it though.
Thanks! No harm in trying :-) Let me know if you have questions regarding
my replies.
-EIO should work, but first let's find out why you got the crash.
Thanks.  I hope we get to the bottom of why it failed and crashed.  Let me
know if you have questions.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies at kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

--
        ---P.K.S
Alvin