Thread (4 messages) 4 messages, 2 authors, 2017-12-20

Re: random call_single_data alignment

From: Peter Zijlstra <peterz@infradead.org>
Date: 2017-12-20 20:18:47
Subsystem: block layer, the rest · Maintainers: Jens Axboe, Linus Torvalds

On Wed, Dec 20, 2017 at 12:40:25PM -0700, Jens Axboe wrote:
On 12/20/17 12:10 PM, Jens Axboe wrote:
quoted
For some reason, commit 966a967116e6 was added to the tree without
CC'ing relevant maintainers, even though it's touching random subsystems.
One example is struct request, a core structure in the block layer.
After this change, struct request grows from 296 to 320 bytes on my
setup.
https://marc.info/?l=linux-block&m=151379793913822&w=2
Does that actually matter though?, kmalloc is likely to over-allocate in
any case. Sure it introduces a weird hole in the data structure, but
that can be easily fixed by rearranging the thing.

struct request {
        struct list_head {
                struct list_head * next;                                                 /*     0     8 */
                struct list_head * prev;                                                 /*     8     8 */
        } queuelist; /*     0    16 */

        /* XXX 16 bytes hole, try to pack */

        union {
                /* typedef call_single_data_t */ struct __call_single_data {
                        struct llist_node {
                                struct llist_node * next;                                /*    32     8 */
                        } llist; /*    32     8 */
                        /* typedef smp_call_func_t */ void       (*func)(void *);        /*    40     8 */
                        void *     info;                                                 /*    48     8 */
                        unsigned int flags;                                              /*    56     4 */
                } csd; /*          32 */
                /* typedef u64 */ long long unsigned int fifo_time;                      /*           8 */
        };                                                                               /*    32    32 */
        /* --- cacheline 1 boundary (64 bytes) --- */
	....
}

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8089ca17db9a..9d6fb6d59268 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -133,11 +133,11 @@ typedef __u32 __bitwise req_flags_t;
  * especially blk_mq_rq_ctx_init() to take care of the added fields.
  */
 struct request {
-       struct list_head queuelist;
        union {
                call_single_data_t csd;
                u64 fifo_time;
        };
+       struct list_head queuelist;
 
        struct request_queue *q;
        struct blk_mq_ctx *mq_ctx;

quoted
Why are we blindly aligning to 32 bytes? The comment says to avoid
it spanning two cache lines - but if that's the case, look at the
actual use case and see if that's actually a problem. For struct
request, it is not.

Seems to me, this should have been applied in the specific area
where it was needed. Keep struct call_single_data (instead of some
__ version), and just manually align it where it matters.
Without enforcement of some kind, its too easy to get wrong.
https://marc.info/?l=linux-block&m=151379849914002&w=2
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index ccb9975a97fa..e0c44e4efa44 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -33,9 +33,9 @@ static inline u64 mb_per_tick(int mbps)
 }
 
 struct nullb_cmd {
+	call_single_data_t csd;
 	struct list_head list;
 	struct llist_node ll_list;
-	call_single_data_t csd;
 	struct request *rq;
 	struct bio *bio;
 	unsigned int tag;

Gets you the exact same size back.
In the future, please CC the relevant folks before making (and
committing) changes like that.
Yeah, I usually do, sorry about that :/
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help