Thread (13 messages) 13 messages, 4 authors, 2017-02-13

Re: [PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function

From: Waiman Long <longman@redhat.com>
Date: 2017-02-13 19:41:06

On 02/13/2017 05:47 AM, Peter Zijlstra wrote:
On Fri, Feb 10, 2017 at 12:00:43PM -0500, Waiman Long wrote:
quoted
quoted
quoted
+asm(
+".pushsection .text;"
+".global __raw_callee_save___kvm_vcpu_is_preempted;"
+".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
+"__raw_callee_save___kvm_vcpu_is_preempted:"
+FRAME_BEGIN
+"push %rdi;"
+"push %rdx;"
+"movslq  %edi, %rdi;"
+"movq    $steal_time+16, %rax;"
+"movq    __per_cpu_offset(,%rdi,8), %rdx;"
+"cmpb    $0, (%rdx,%rax);"
Could we not put the $steal_time+16 displacement as an immediate in the
cmpb and save a whole register here?

That way we'd end up with something like:

asm("
push %rdi;
movslq %edi, %rdi;
movq __per_cpu_offset(,%rdi,8), %rax;
cmpb $0, %[offset](%rax);
setne %al;
pop %rdi;
" : : [offset] "i" (((unsigned long)&steal_time) + offsetof(struct steal_time, preempted)));

And if we could get rid of the sign extend on edi we could avoid all the
push-pop nonsense, but I'm not sure I see how to do that (then again,
this asm foo isn't my strongest point).
Yes, I think that can work. I will try to ran this patch to see how
thing goes.
quoted
quoted
quoted
+"setne   %al;"
+"pop %rdx;"
+"pop %rdi;"
+FRAME_END
+"ret;"
+".popsection");
+
+#endif
+
 /*
  * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
  */
That should work for now. I have done something similar for
__pv_queued_spin_unlock. However, this has the problem of creating a
dependency on the exact layout of the steal_time structure. Maybe the
constant 16 can be passed in as a parameter offsetof(struct
kvm_steal_time, preempted) to the asm call.
Yeah it should be well possible to pass that in. But ideally we'd have
GCC grow something like __attribute__((callee_saved)) or somesuch and it
would do all this for us.
That will be really nice too. I am not too fond of working in assembly.
quoted
One more thing, that will improve KVM performance, but it won't help Xen.
People still use Xen? ;-) In any case, their implementation looks very
similar and could easily crib this.
In Red Hat, my focus will be on KVM performance. I do believe that there
are still Xen users out there. So we still need to keep their interest
into consideration. Given that, I am OK to make it work better in KVM
first and then think about Xen later.
quoted
I looked into the assembly code for rwsem_spin_on_owner, It need to save
and restore 2 additional registers with my patch. Doing it your way,
will transfer the save and restore overhead to the assembly code.
However, __kvm_vcpu_is_preempted() is called multiple times per
invocation of rwsem_spin_on_owner. That function is simple enough that
making __kvm_vcpu_is_preempted() callee-save won't produce much compiler
optimization opportunity.
This is because of that noinline, right? Otherwise it would've been
folded and register pressure would be much higher.
Yes, I guess so. The noinline is there so that we know what the CPU time
is for spinning rather than other activities within the slowpath.
quoted
The outer function rwsem_down_write_failed()
does appear to be a bit bigger (from 866 bytes to 884 bytes) though.
I suspect GCC is being clever and since all this is static it plays
games with the calling convention and pushes these clobbers out.
Cheers,
Longman
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help