Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation
From: Peter Zijlstra <peterz@infradead.org>
Date: 2014-01-31 19:46:21
Also in:
lkml
On Fri, Jan 31, 2014 at 01:26:29PM -0500, Waiman Long wrote:
quoted
I don't get why we need the used thing at all; something like: struct qna { int cnt; struct qnode nodes[4]; }; DEFINE_PER_CPU(struct qna, qna); struct qnode *get_qnode(void) { struct qna *qna = this_cpu_ptr(&qna); return qna->nodes[qna->cnt++]; /* RMW */ } void put_qnode(struct qnode *qnode) { struct qna *qna = this_cpu_ptr(&qna); qna->cnt--; } Should do fine, right?Yes, we can do something like that. However I think put_qnode() needs to use atomic dec as well. As a result, we will need 2 additional atomic operations per slowpath invocation. The code may look simpler, but I don't think it will be faster than what I am currently doing as the cases where the used flag is set will be relatively rare.
No, put doesn't need an atomic; nor is it as well; because the inc doesn't need an atomic either.
quoted
If we interrupt the RMW above the interrupted context hasn't yet used the queue and once we return its free again, so all should be well even on load-store archs. The nodes array might as well be 3, because NMIs should never contend on a spinlock, so all we're left with is task, softirq and hardirq context.I am not so sure about NMI not taking a spinlock. I seem to remember seeing code that did that. Actually, I think the NMI code is trying to printk something which, in turn, need to acquire a spinlock.
Yeah I know, terribly broken that, I've been waiting for that to explode :-)