Re: [PATCH v4 01/13] Linux RDMA Core Changes
From: Steve Wise <hidden>
Date: 2007-01-03 20:20:27
Also in:
lkml
On Wed, 2007-01-03 at 21:33 +0200, Michael S. Tsirkin wrote:
quoted
Without extra param (1000 iterations in cycles): ave 101.283 min 91 max 247 With extra param (1000 iterations in cycles): ave 103.311 min 91 max 221A 2% hit then. Not huge, but 0 either.quoted
Convert cycles to ns (3466.727 MHz CPU): Without: 101.283 / 3466.727 = .02922us == 29.22ns With: 103.311 / 3466.727 = .02980us == 29.80ns So I measure a .58ns average increase for passing in the additional parameter.That depends on CPU speed though. Percentage is likely to be more universal.quoted
Here is a snipit of the test: spin_lock_irq(&lock); do_gettimeofday(&start_tv); for (i=0; i<1000; i++) { cycles_start[i] = get_cycles(); ib_req_notify_cq(cb->cq, IB_CQ_NEXT_COMP); cycles_stop[i] = get_cycles(); } do_gettimeofday(&stop_tv); spin_unlock_irq(&lock); if (stop_tv.tv_usec < start_tv.tv_usec) { stop_tv.tv_usec += 1000000; stop_tv.tv_sec -= 1; } for (i=0; i < 1000; i++) { cycles_t v = cycles_stop[i] - cycles_start[i]; sum += v; if (v > max) max = v; if (min == 0 || v < min) min = v; } printk(KERN_ERR PFX "FOO delta sec %lu usec %lu sum %llu min %llu max %llu\n", stop_tv.tv_sec - start_tv.tv_sec, stop_tv.tv_usec - start_tv.tv_usec, (unsigned long long)sum, (unsigned long long)min, (unsigned long long)max);Good job, the test looks good, thanks. So what does this tell you? To me it looks like there's a measurable speed difference, and so we should find a way (e.g. what I proposed) to enable chelsio userspace without adding overhead to other low level drivers or indeed chelsio kernel level code. What do you think? Roland?
I think having a 2nd function to set the udata seems onerous.