Thread: 135 messages, 18 authors, 2007-03-20

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

From: Andi Kleen <hidden>
Date: 2007-03-19 11:00:57
Also in: lkml, netdev, xen-devel

On Monday 19 March 2007 00:46, Jeremy Fitzhardinge wrote:
Andi Kleen wrote:
Yes. All inline assembly tells gcc what registers are clobbered
and it fills in the tables. Hand clobbering in inline assembly cannot
be expressed with the current toolchain, so we moved all those
out of line.

But again I'm not sure it will work anyway. For one, you would
need large padding around the calls for inline replacement --
how would you generate that? I expect you would need to put the calls
into asm() again, and with that a custom annotation format looks
reasonable.
Inlining is most important for very small code: sti, cli, pushf;pop eax,
etc (in many cases, no-ops).  We'd have at least 5 bytes to work in, and
maybe more if there are surrounding push/pops to be consumed.

For example, say we wanted to put a general call for sti into entry.S,
where it's expected it won't touch any registers.  In that case, we'd
have a sequence like:

    push %eax
    push %ecx
    push %edx
    call paravirt_cli
    pop %edx
    pop %ecx
    pop %eax
Right now this cannot be expressed as inline assembly for the unwinder at
all, because there is no way to inject the push/pops into the
compiler-generated eh_frame tables.

[BTW I plan to resubmit the unwinder with some changes]

If we parse the relocs, then we'd find the reference to paravirt_cli.
If we look at the byte before and see 0xe8, then we can see if it's a
call.  If we then work out in each direction and see matched push/pops,
then we know what registers can be trashed in the call.  This also
allows us to determine the callsite size, and therefore how much space
we need for inlining.
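
[A rough sketch of the scan described above, for illustration only. It is
not code from the patch series: scan_callsite() and struct callsite are
made-up names, and it assumes 32-bit x86 with single-byte push (0x50+reg)
and pop (0x58+reg) encodings. Walking backwards over x86 bytes is only
trustworthy for hand-written sequences with a known shape, since a byte
like 0x52 could just as well be the tail of a longer instruction.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, not part of the patch series. */
struct callsite {
    size_t start;       /* offset of the first patchable byte */
    size_t len;         /* patchable bytes: push/pops plus the call */
    unsigned clobbers;  /* bit n set => register n is saved around the call */
};

/*
 * reloc_off is the offset of the 4-byte rel32 field the relocation names.
 * Returns 0 on success, -1 if the byte before the reloc is not 0xe8 (call).
 */
int scan_callsite(const uint8_t *code, size_t size, size_t reloc_off,
                  struct callsite *out)
{
    assert(code != NULL && out != NULL);

    if (reloc_off < 1 || reloc_off + 4 > size || code[reloc_off - 1] != 0xe8)
        return -1;

    size_t lo = reloc_off - 1;  /* first byte of the call instruction */
    size_t hi = reloc_off + 4;  /* first byte after the call */
    unsigned mask = 0;

    /* Work outward while a push before the call mirrors a pop after it. */
    while (lo > 0 && hi < size) {
        uint8_t before = code[lo - 1];
        uint8_t after = code[hi];

        if ((before & 0xf8) != 0x50 || (after & 0xf8) != 0x58)
            break;              /* not a push/pop pair */
        if ((before & 7) != (after & 7))
            break;              /* different registers: not matched */

        mask |= 1u << (before & 7);
        lo--;
        hi++;
    }

    out->start = lo;
    out->len = hi - lo;
    out->clobbers = mask;
    return 0;
}
```

[Fed the push/call/pop sequence from the example (50 51 52 e8 <rel32>
5a 59 58) with reloc_off = 4, this reports an 11 byte site with eax, ecx
and edx marked as free to trash.]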
gcc normally doesn't generate push/pops directly around the
call site, but somewhere else due to the way its register allocator works.
It can be anywhere in the function or even not there at all if the register
didn't contain anything useful. And they're not necessarily push/pops of 
course.

So you would need to write it as inline assembly. I'm not sure it would
be significantly cleaner than just having tables then.
So in this case, we see that there are 5 bytes for the call and a
further 6 bytes of push/pops available for inlining.

Of course this is hand-written code anyway, so there's no particular
burden to having some extra metadata stashed away in another section.
For compiler-generated code, we know that it's already expecting
standard C ABI calling conventions.  The downside, of course, is that
only the 5 byte call space is available for inline patching.
It's unlikely you can do much useful in 5 bytes I guess.

Regarding cli/sti: I've actually been thinking about changing it in the
non-paravirt kernel. IIRC most save_flags/restore_flags are inside
spin_lock_irqsave/restore(), and that is a separate function anyway,
so a little larger special case code is ok as long as it is not slower.
There is some evidence that at least on P4 a software cli/sti flag without 
pushf/popf would be faster.
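
[To make the idea concrete, here is a user-space sketch of what a software
interrupt flag could look like. Everything in it is an assumption for
illustration: the names (soft_irq_save, soft_irq_restore, irq_entry) are
invented, and the "remember a pending interrupt and replay it on restore"
scheme is one possible design, not something stated above. A real kernel
version would need per-CPU state and care about the hardware interrupt
still being asserted.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; a kernel would keep this per CPU. */
bool soft_irqs_disabled;
bool irq_pending;
int irqs_handled;

/* Stand-in for the real interrupt handler. */
static void handle_irq(void)
{
    irqs_handled++;
}

/* Replaces pushf; pop; cli: read the old flag, then set it. */
unsigned long soft_irq_save(void)
{
    unsigned long flags = soft_irqs_disabled;

    soft_irqs_disabled = true;
    return flags;
}

/* Replaces push; popf: put the flag back and replay anything deferred. */
void soft_irq_restore(unsigned long flags)
{
    assert(flags == 0 || flags == 1);

    soft_irqs_disabled = (flags != 0);
    if (!soft_irqs_disabled && irq_pending) {
        irq_pending = false;
        handle_irq();
    }
}

/* Simulated low-level interrupt entry: defer if soft-disabled. */
void irq_entry(void)
{
    if (soft_irqs_disabled) {
        irq_pending = true;
        return;
    }
    handle_irq();
}
```

[The save and restore paths only touch memory, which is the point of the
exercise; the cost moves into the interrupt entry path, which has to
check the flag.]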

-Andi