On Mon, May 28, 2007 at 08:00:21PM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2007-05-28 at 17:37 +0800, Liu Dave-r63238 wrote:
BTW, does the x86 processor support broadcasting TLB operations to the
system?
If it does, why do we use the IPI mechanism on x86? What is the
concern?
I don't think it supports them but then, I don't know for sure.
It does not. However, IA64 (aka Itanic) does. Of course, on x86 until
recently, the TLBs were completely flushed (at least the entries mapping
user space) on task switches to a different mm, which automatically
avoids races for single-threaded apps.
Part of the problem is what your workload is. If you have a lot of small
and short-lived processes, such as CGIs on a web server, they are
fairly unlikely to exist on more than one processor, maybe two, during
their lifetime (there is a strong optimisation to only do a local
invalidate when the process has only ever existed on one processor).
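
To make the local-invalidate optimisation concrete, here is a minimal
sketch in C. It is not the kernel's code: the 64-bit mask type and the
helper names mark_cpu() and flush_can_stay_local() are made up for
illustration, and it assumes at most 64 CPUs. The idea is only that an
address space remembers which CPUs it has run on, and an invalidate can
stay local when no other CPU is in that set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-mm CPU mask: bit n is set if the
 * address space has ever run on CPU n. Limited to 64 CPUs here. */
typedef uint64_t cpu_mask_t;

/* Record that the address space has run on `cpu`. */
static cpu_mask_t mark_cpu(cpu_mask_t seen, unsigned cpu)
{
    assert(cpu < 64);
    return seen | (UINT64_C(1) << cpu);
}

/* True if an invalidate issued from CPU `self` can stay local, i.e. no
 * other CPU could hold stale translations for this address space. */
static bool flush_can_stay_local(cpu_mask_t seen, unsigned self)
{
    assert(self < 64);
    return (seen & ~(UINT64_C(1) << self)) == 0;
}
```

A short-lived process that only ever ran on CPU 3 has just bit 3 set, so
flush_can_stay_local(seen, 3) is true and no other processor needs to be
disturbed.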
If you have a massively threaded workload, that is, a given process is
likely to exist on all processors, then it's also fairly unlikely that
you will be doing a lot of fork()s or that those processes will be
short-lived... so it's less of an issue unless you start abusing
mmap/munmap or mprotect.
Also, when you have a large number of processors, having broadcast TLB
invalidations on the bus might become a bottleneck if, at the end of the
day, you really only want to invalidate one or two siblings. In that
case, targeted IPIs are probably a better option.
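
As a rough illustration of that trade-off, the sketch below counts how
many processors get disturbed in each scheme. It only counts
recipients; it says nothing about what one bus broadcast costs compared
with one IPI, which depends on the hardware. The function names and the
64-bit mask of CPUs the address space has run on are assumptions made
up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* CPUs that see a broadcast invalidate: everyone but the sender. */
static unsigned broadcast_recipients(unsigned ncpus)
{
    assert(ncpus >= 1);
    return ncpus - 1;
}

/* CPUs that receive a targeted IPI: those set in `seen` (hypothetical
 * mask of CPUs the address space has run on), minus the sender. */
static unsigned targeted_recipients(uint64_t seen, unsigned self)
{
    assert(self < 64);
    uint64_t others = seen & ~(UINT64_C(1) << self);
    unsigned n = 0;

    while (others) {
        others &= others - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

On a 64-way machine where the address space has only run on the sender
and two siblings, the broadcast reaches 63 processors while the targeted
variant interrupts 2.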
On SMP with a single die and integrated memory controllers (PASemi),
I'd bet that TLB invalidation broadcast is typically much cheaper
since no external signals are involved (from a hardware point of view
it's not very different from a store to a shared cache line that has
to be invalidated in the caches of the other processors).
Gabriel