On Monday 19 March 2007 00:46, Jeremy Fitzhardinge wrote:
Andi Kleen wrote:
Yes. All inline assembly tells gcc what registers are clobbered
and it fills in the tables. Hand clobbering in inline assembly cannot
be expressed with the current toolchain, so we moved all those
out of line.
But again I'm not sure it will work anyways. For one, you would
need large padding around the calls anyways for inline replacement --
how would you generate that? I expect you would need to put the calls
into asm() again and with that a custom annotation format looks
reasonable.
Inlining is most important for very small code: sti, cli, pushf;pop eax,
etc (in many cases, no-ops). We'd have at least 5 bytes to work in, and
maybe more if there are surrounding push/pops to be consumed.
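
As a rough illustration of the "at least 5 bytes to work in" point: a minimal
sketch of overwriting a 5-byte call site with a shorter instruction and padding
the rest with one-byte nops. The function name patch_inline and the buffer
layout are assumptions made for this example, not code from any kernel tree.
The opcode bytes are the standard i386 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP 0x90 /* one-byte nop */

/* Standard i386 encodings of the short sequences mentioned above. */
static const uint8_t insn_cli[] = { 0xfa };              /* cli */
static const uint8_t insn_sti[] = { 0xfb };              /* sti */
static const uint8_t insn_save_fl[] = { 0x9c, 0x58 };    /* pushf; pop %eax */

/*
 * Hypothetical helper: replace the site_len bytes at 'site' (for a bare
 * call that is 5 bytes: 0xe8 plus rel32) with 'insn', nop-padding the tail.
 * Returns 0 on success, -1 if the replacement does not fit.
 */
static int patch_inline(uint8_t *site, size_t site_len,
                        const uint8_t *insn, size_t insn_len)
{
    assert(site != NULL && insn != NULL);

    if (insn_len > site_len)
        return -1;

    memcpy(site, insn, insn_len);
    memset(site + insn_len, X86_NOP, site_len - insn_len);
    return 0;
}
```

With a 5-byte site, passing insn_cli leaves 0xfa followed by four nops. Anything
longer than the site is rejected rather than truncated.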
For example, say we wanted to put a general call for sti into entry.S,
where it's expected it won't touch any registers. In that case, we'd
have a sequence like:
push %eax
push %ecx
push %edx
call paravirt_cli
pop %edx
pop %ecx
pop %eax
Right now this cannot be expressed as inline assembly in a way the unwinder
can follow, because there is no way to inject the push/pops into the
compiler-generated ehframe tables.
[BTW I plan to resubmit the unwinder with some changes]
If we parse the relocs, then we'd find the reference to paravirt_cli.
If we look at the byte before and see 0xe8, then we can see if it's a
call. If we then work out in each direction and see matched push/pops,
then we know what registers can be trashed in the call. This also
allows us to determine the callsite size, and therefore how much space
we need for inlining.
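
The scan described above could be sketched roughly as follows. This is only an
illustration under stated assumptions: struct callsite and scan_callsite are
hypothetical names, the relocation offset is taken as already known, and only
the one-byte push/pop encodings (0x50+reg, 0x58+reg) are recognised. Note that
x86 code cannot be decoded backwards reliably in general, so a byte that looks
like a push may really be the tail of an earlier instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, not taken from any kernel header. */
struct callsite {
    size_t start;       /* offset of the first patchable byte */
    size_t len;         /* total patchable bytes, call included */
    unsigned clobbers;  /* bit n set: register n is saved around the call */
};

static int is_push(uint8_t b) { return b >= 0x50 && b <= 0x57; }
static int is_pop(uint8_t b)  { return b >= 0x58 && b <= 0x5f; }

/*
 * reloc_off is the offset of the 4-byte rel32 operand the relocation
 * refers to. Returns 0 on success, -1 if the preceding byte is not the
 * call opcode 0xe8 or the operand does not fit in the buffer.
 */
static int scan_callsite(const uint8_t *code, size_t size,
                         size_t reloc_off, struct callsite *cs)
{
    assert(code != NULL && cs != NULL);

    if (reloc_off < 1 || reloc_off + 4 > size || code[reloc_off - 1] != 0xe8)
        return -1;

    size_t lo = reloc_off - 1;  /* first byte of the call */
    size_t hi = reloc_off + 4;  /* first byte after the call */
    cs->clobbers = 0;

    /* Work outwards while a push before is matched by a pop of the
     * same register after. */
    while (lo > 0 && hi < size &&
           is_push(code[lo - 1]) && is_pop(code[hi]) &&
           code[lo - 1] - 0x50 == code[hi] - 0x58) {
        cs->clobbers |= 1u << (code[lo - 1] - 0x50);
        lo--;
        hi++;
    }

    cs->start = lo;
    cs->len = hi - lo;
    return 0;
}
```

Fed the bytes of the push/call/pop sequence shown earlier (0x50 0x51 0x52,
0xe8 plus four operand bytes, 0x5a 0x59 0x58), this reports 11 patchable
bytes and eax, ecx and edx as saved.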
gcc normally doesn't generate push/pops directly around the
call site, but somewhere else, due to the way its register allocator works.
It can be anywhere in the function or even not there at all if the register
didn't contain anything useful. And they're not necessarily push/pops of
course.
So you would need to write it as inline assembly. I'm not sure it would
be significantly cleaner than just having tables then.
So in this case, we see that there are 5 bytes for the call and a
further 6 bytes of push/pops available for inlining.
Of course this is hand-written code anyway, so there's no particular
burden to having some extra metadata stashed away in another section.
For compiler-generated code, we know that it's already expecting
standard C ABI calling conventions. The downside, of course, is that
only the 5 byte call space is available for inline patching.
It's unlikely you can do much useful in 5 bytes I guess.
Regarding cli/sti: I've actually been thinking about changing it in the
non paravirt kernel. IIRC most save_flags/restore_flags are inside
spin_lock_irqsave/restore() and that is a separate function anyways
so a little larger special case code is ok as long as it is not slower.
There is some evidence that at least on P4 a software cli/sti flag without
pushf/popf would be faster.
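
To make the idea of a software cli/sti flag concrete, here is a small
user-space sketch. Everything in it is an assumption for illustration: the
names soft_irq_save, soft_irq_restore and irq_entry are made up, plain globals
stand in for per-CPU data, and a counter stands in for the real handler. It
only shows the control flow, not how the hardware side would be masked.

```c
#include <assert.h>
#include <stdbool.h>

static bool soft_irq_disabled;  /* stand-in for a per-CPU flag */
static bool irq_pending;        /* an interrupt arrived while disabled */
static int irqs_handled;        /* stand-in for real handler work */

static void handle_irq(void)
{
    /* Handlers must never run while the soft flag is set. */
    assert(!soft_irq_disabled);
    irqs_handled++;
}

/* Replaces pushf; pop; cli: read the old state, then set the flag. */
static inline bool soft_irq_save(void)
{
    bool old = soft_irq_disabled;
    soft_irq_disabled = true;
    return old;
}

/* Replaces push; popf: restore the state, replay anything deferred. */
static inline void soft_irq_restore(bool old)
{
    soft_irq_disabled = old;
    if (!old && irq_pending) {
        irq_pending = false;
        handle_irq();
    }
}

/* Simulated low-level interrupt entry: defer if soft-disabled. */
static void irq_entry(void)
{
    if (soft_irq_disabled) {
        irq_pending = true;
        return;
    }
    handle_irq();
}
```

The save/restore pair only touches memory, which is where any speedup over
pushf/popf would have to come from. The cost moves to the interrupt entry path
and to the replay on restore.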
-Andi