Thread: 19 messages, 5 authors, 2009-05-30

Re: [PATCH RFC] v5 expedited "big hammer" RCU grace periods

From: Paul E. McKenney <hidden>
Date: 2009-05-30 04:56:47
Also in: lkml, netfilter-devel

On Fri, May 29, 2009 at 05:36:37PM +0530, Gautham R Shenoy wrote:
On Thu, May 28, 2009 at 06:22:51PM -0700, Paul E. McKenney wrote:
Hmmm...  Making the transition work nicely would require some thought.
It might be good to retain the two-phase nature, even when reversing
the order of offline notifications.  This would address one disadvantage
of the past-life version, which was unnecessary migration of processes
off of the CPU in question, only to find that a later notifier aborted
the offlining.
The notifiers handling CPU_DEAD cannot abort it from here since the
operation has already completed, whether they like it or not!
Hello, Gautham,

We are talking past each other -- the past-life (not Linux) CPU-offlining
scheme had but one phase for offlining, which meant that if a very late
notifier-equivalent realized that the offlining could not proceed,
it would have mostly shut the CPU down, only to have to restart it.

For example, it might have needlessly migrated processes off of that
CPU.  This did not happen often, but it was a bit of a disadvantage.

							Thanx, Paul
If there exist notifiers which try to abort it from here, it's a BUG, as
the code says:

	/* CPU is completely dead: tell everyone.  Too late to complain. */
	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | mod,
				    hcpu) == NOTIFY_BAD)
		BUG();

Also, one can thus consider the CPU_DEAD and CPU_POST_DEAD parts to be
extensions of the second phase, in which we do some additional cleanup
once the CPU has actually gone down. Migration of processes (breaking
their affinity if required) is one such cleanup.

But there are other things as well, such as rebuilding the sched-domain,
which have to be done after the CPU has gone down. Currently this
operation accounts for the majority of the time taken to bring a CPU offline.
So only the first phase is permitted to abort the offlining of the CPU,
and this first phase must also set whatever state is necessary to prevent
some later operation from making it impossible to offline the CPU.
The second phase would unconditionally take the CPU out of service.
In theory, this approach would allow incremental conversion of the
notifiers, waiting to remove the stop_machine stuff until all notifiers
had been converted.
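
The two-phase rule described above (only the first phase may abort, the
second is unconditional) can be sketched in userspace C. This is only an
illustration under stated assumptions: the names offline_cpu, enum phase,
enum verdict, always_ok and veto_cpu0 are hypothetical and are not kernel
APIs, and the rollback step stands in for what CPU_DOWN_FAILED-style
handling would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, loosely modelled on NOTIFY_OK / NOTIFY_BAD. */
enum verdict { VERDICT_OK, VERDICT_VETO };

/* Hypothetical phases: only PHASE_PREPARE may be vetoed. */
enum phase { PHASE_PREPARE, PHASE_COMMIT, PHASE_ROLLBACK };

typedef enum verdict (*notifier_fn)(enum phase phase, int cpu);

/*
 * Try to take `cpu` offline using the notifiers in chain[0..n-1].
 * Returns 0 on success, -1 if a prepare-phase notifier vetoed.
 */
int offline_cpu(notifier_fn *chain, size_t n, int cpu)
{
	size_t i;

	/* Phase 1: anyone may veto; undo the notifiers that already prepared. */
	for (i = 0; i < n; i++) {
		if (chain[i](PHASE_PREPARE, cpu) == VERDICT_VETO) {
			while (i-- > 0)
				chain[i](PHASE_ROLLBACK, cpu);
			return -1;
		}
	}

	/* Phase 2: unconditional.  A veto here is a bug, not a request. */
	for (i = 0; i < n; i++) {
		enum verdict v = chain[i](PHASE_COMMIT, cpu);

		assert(v == VERDICT_OK);
		(void)v;	/* keep NDEBUG builds warning-free */
	}
	return 0;
}

/* Example notifiers (hypothetical); counters make the effect observable. */
int commits, rollbacks;

enum verdict always_ok(enum phase phase, int cpu)
{
	(void)cpu;
	if (phase == PHASE_COMMIT)
		commits++;
	else if (phase == PHASE_ROLLBACK)
		rollbacks++;
	return VERDICT_OK;
}

/* Refuses to let CPU 0 go offline; accepts everything else. */
enum verdict veto_cpu0(enum phase phase, int cpu)
{
	if (phase == PHASE_PREPARE && cpu == 0)
		return VERDICT_VETO;
	if (phase == PHASE_COMMIT)
		commits++;
	return VERDICT_OK;
}
```

With a chain of { always_ok, veto_cpu0 }, offlining CPU 0 fails in phase 1
and rolls back always_ok without any commit work having been done, while
offlining CPU 1 runs both commit handlers.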
If this actually works out, the sequence of changes would be as follows:

1.	Reverse the order of the offline notifications, fixing any
	bugs induced/exposed by this change.

2.	Incrementally convert notifiers to the new mechanism.  This
	will require more thought.

3.	Get rid of the stop_machine and the CPU_DEAD once all are
	converted.
I agree with this sequence. It seems quite logical.

However, I am not yet sure if we can completely get rid of stop_machine
and CPU_DEAD in practice, unless we're okay with having a
time-consuming rollback operation. Currently the rollback only consists of
rolling back the actions done during CPU_UP_PREPARE/CPU_DOWN_PREPARE.

And from the notifier profile (see attached file),
UP_PREPARE/DOWN_PREPARE seem to consume far less time
than the post-hotplug notifications.
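
To put numbers on that comparison, the per-phase totals from the attached
profile can be added up as in the sketch below. The grouping is an
assumption made for illustration: CPU_UP_PREPARE and CPU_DOWN_PREPARE are
counted as "prepare" work, and CPU_DEAD, CPU_POST_DEAD and CPU_ONLINE as
"post-hotplug" work. The helper names are hypothetical.

```c
#include <assert.h>

/* Per-phase totals in nanoseconds, copied from the attached profile. */
struct phase_total {
	const char *name;
	unsigned long long ns;
};

const struct phase_total prepare_phases[] = {
	{ "CPU_UP_PREPARE",   249387ULL },
	{ "CPU_DOWN_PREPARE",  23235ULL },
};

const struct phase_total post_phases[] = {
	{ "CPU_DEAD",      11144046ULL },
	{ "CPU_POST_DEAD",    97260ULL },
	{ "CPU_ONLINE",    12372685ULL },
};

/* Add up the ns column of the first n entries of p. */
unsigned long long sum_ns(const struct phase_total *p, int n)
{
	unsigned long long total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += p[i].ns;
	return total;
}

/* Convert nanoseconds to milliseconds. */
double ns_to_ms(unsigned long long ns)
{
	return (double)ns / 1e6;
}

/* How many times longer `b` takes than `a`. */
double ratio(unsigned long long a, unsigned long long b)
{
	assert(a > 0);
	return (double)b / (double)a;
}
```

Under this grouping the prepare notifiers add up to roughly 0.27 ms and the
post-hotplug notifiers to roughly 23.6 ms, with cpuset_track_online_cpus
alone responsible for most of the latter.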
Or we might find that simply reversing the order (#1 above) suffices.
This meant that a given CPU was naturally guaranteed to be 
correctly taking interrupts for the entire time that it was 
capable of running user-level processes. Later in the offlining 
process, it would still take interrupts, but would be unable to 
run user processes.  Still later, it would no longer be taking 
interrupts, and would stop participating in RCU and in the global 
TLB-flush algorithm.  There was no need to stop the whole machine 
to make a given CPU go offline, in fact, most of the work was done 
by the CPU in question.

In the case of RCU, this meant that there was no need for 
double-checking for offlined CPUs, because CPUs could reliably 
indicate a quiescent state on their way out.
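
A toy model of that idea, for illustration only: the grace period keeps a
bitmask of CPUs it is still waiting on, and an outgoing CPU clears its own
bit as part of going offline, so nothing needs to re-check for offlined
CPUs afterwards. All names here (toy_gp, toy_cpu_offline, and so on) are
hypothetical, and this is not how rcutree.c is implemented.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* hypothetical machine size */

struct toy_gp {
	unsigned int need_qs;	/* bit n set: still waiting for CPU n */
};

/* Begin a grace period that waits for every currently online CPU. */
void toy_gp_start(struct toy_gp *gp, unsigned int online_mask)
{
	gp->need_qs = online_mask;
}

/* CPU `cpu` announces that it has passed through a quiescent state. */
void toy_report_qs(struct toy_gp *gp, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	gp->need_qs &= ~(1u << cpu);
}

/*
 * The outgoing CPU runs this itself on its way out: report a quiescent
 * state first, then leave the online mask.
 */
void toy_cpu_offline(struct toy_gp *gp, unsigned int *online_mask, int cpu)
{
	toy_report_qs(gp, cpu);
	*online_mask &= ~(1u << cpu);
}

/* The grace period ends once no CPU is being waited on. */
bool toy_gp_done(const struct toy_gp *gp)
{
	return gp->need_qs == 0;
}
```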

On the other hand, there was no equivalent of dynticks in the old 
days. And it is dynticks that is responsible for most of the 
complexity present in force_quiescent_state(), not CPU hotplug.

So I cannot hold up RCU as something that would be greatly 
simplified by changing the CPU hotplug design, much as I might 
like to.  ;-)
We could probably remove a fair bit of dynticks complexity by 
removing non-dynticks and removing non-hrtimer. People could still 
force a 'periodic' interrupting mode (if they want, or if their hw 
forces that), but that would be a plain periodic hrtimer firing off 
all the time.
Hmmm...  That would not simplify RCU much, but on the other hand (1) the
rcutree.c dynticks approach is already quite a bit simpler than the
rcupreempt.c approach and (2) doing this could potentially simplify
other things.

							Thanx, Paul
-- 
Thanks and Regards
gautham
=============================================================================
statistics for CPU_DOWN_PREPARE
=============================================================================
      410 ns: buffer_cpu_notify             : CPU_DOWN_PREPARE
      441 ns: radix_tree_callback           : CPU_DOWN_PREPARE
      473 ns: relay_hotcpu_callback         : CPU_DOWN_PREPARE
      486 ns: blk_cpu_notify                : CPU_DOWN_PREPARE
      563 ns: cpu_callback                  : CPU_DOWN_PREPARE
      579 ns: hotplug_hrtick                : CPU_DOWN_PREPARE
      594 ns: cpu_callback                  : CPU_DOWN_PREPARE
      605 ns: cpu_numa_callback             : CPU_DOWN_PREPARE
      611 ns: hrtimer_cpu_notify            : CPU_DOWN_PREPARE
      625 ns: flow_cache_cpu                : CPU_DOWN_PREPARE
      625 ns: rcu_barrier_cpu_hotplug       : CPU_DOWN_PREPARE
      639 ns: hotplug_cfd                   : CPU_DOWN_PREPARE
      641 ns: pageset_cpuup_callback        : CPU_DOWN_PREPARE
      656 ns: rb_cpu_notify                 : CPU_DOWN_PREPARE
      670 ns: dev_cpu_callback              : CPU_DOWN_PREPARE
      670 ns: topology_cpu_callback         : CPU_DOWN_PREPARE
      672 ns: remote_softirq_cpu_notify     : CPU_DOWN_PREPARE
      715 ns: ratelimit_handler             : CPU_DOWN_PREPARE
      715 ns: rcu_cpu_notify                : CPU_DOWN_PREPARE
      717 ns: timer_cpu_notify              : CPU_DOWN_PREPARE
      730 ns: page_alloc_cpu_notify         : CPU_DOWN_PREPARE
      746 ns: cpu_callback                  : CPU_DOWN_PREPARE
      821 ns: cpuset_track_online_cpus      : CPU_DOWN_PREPARE
      824 ns: slab_cpuup_callback           : CPU_DOWN_PREPARE
      849 ns: sysfs_cpu_notify              : CPU_DOWN_PREPARE
      884 ns: percpu_counter_hotcpu_callback: CPU_DOWN_PREPARE
      961 ns: update_runtime                : CPU_DOWN_PREPARE
     1323 ns: migration_call                : CPU_DOWN_PREPARE
     1918 ns: vmstat_cpuup_callback         : CPU_DOWN_PREPARE
     2072 ns: workqueue_cpu_callback        : CPU_DOWN_PREPARE
=========================================================================
Total time for CPU_DOWN_PREPARE = .023235000 ms
=========================================================================
=============================================================================
statistics for CPU_DYING
=============================================================================
      365 ns: remote_softirq_cpu_notify     : CPU_DYING
      365 ns: topology_cpu_callback         : CPU_DYING
      381 ns: blk_cpu_notify                : CPU_DYING
      381 ns: cpu_callback                  : CPU_DYING
      381 ns: relay_hotcpu_callback         : CPU_DYING
      381 ns: update_runtime                : CPU_DYING
      394 ns: dev_cpu_callback              : CPU_DYING
      395 ns: hotplug_cfd                   : CPU_DYING
      395 ns: vmstat_cpuup_callback         : CPU_DYING
      397 ns: cpuset_track_online_cpus      : CPU_DYING
      397 ns: flow_cache_cpu                : CPU_DYING
      397 ns: pageset_cpuup_callback        : CPU_DYING
      397 ns: rb_cpu_notify                 : CPU_DYING
      398 ns: hotplug_hrtick                : CPU_DYING
      410 ns: cpu_callback                  : CPU_DYING
      410 ns: page_alloc_cpu_notify         : CPU_DYING
      411 ns: rcu_cpu_notify                : CPU_DYING
      412 ns: slab_cpuup_callback           : CPU_DYING
      412 ns: sysfs_cpu_notify              : CPU_DYING
      412 ns: timer_cpu_notify              : CPU_DYING
      426 ns: buffer_cpu_notify             : CPU_DYING
      426 ns: radix_tree_callback           : CPU_DYING
      441 ns: cpu_callback                  : CPU_DYING
      442 ns: cpu_numa_callback             : CPU_DYING
      473 ns: ratelimit_handler             : CPU_DYING
      531 ns: percpu_counter_hotcpu_callback: CPU_DYING
      562 ns: workqueue_cpu_callback        : CPU_DYING
      730 ns: rcu_barrier_cpu_hotplug       : CPU_DYING
     1536 ns: migration_call                : CPU_DYING
     1873 ns: hrtimer_cpu_notify            : CPU_DYING
=========================================================================
Total time for CPU_DYING = .015331000 ms
=========================================================================
=============================================================================
statistics for CPU_DOWN_CANCELED
=============================================================================
=========================================================================
Total time for CPU_DOWN_CANCELED = 0 ms
=========================================================================
=============================================================================
statistics for __stop_machine
=============================================================================
   357983 ns: __stop_machine                :
=========================================================================
Total time for __stop_machine = .357983000 ms
=========================================================================
=============================================================================
statistics for CPU_DEAD
=============================================================================
      350 ns: update_runtime                : CPU_DEAD
      379 ns: hotplug_hrtick                : CPU_DEAD
      381 ns: cpu_callback                  : CPU_DEAD
      381 ns: rb_cpu_notify                 : CPU_DEAD
      426 ns: hotplug_cfd                   : CPU_DEAD
      426 ns: relay_hotcpu_callback         : CPU_DEAD
      441 ns: rcu_barrier_cpu_hotplug       : CPU_DEAD
      442 ns: remote_softirq_cpu_notify     : CPU_DEAD
      609 ns: ratelimit_handler             : CPU_DEAD
      625 ns: cpu_numa_callback             : CPU_DEAD
      684 ns: dev_cpu_callback              : CPU_DEAD
      686 ns: workqueue_cpu_callback        : CPU_DEAD
      838 ns: rcu_cpu_notify                : CPU_DEAD
      898 ns: pageset_cpuup_callback        : CPU_DEAD
     1202 ns: vmstat_cpuup_callback         : CPU_DEAD
     1295 ns: blk_cpu_notify                : CPU_DEAD
     1554 ns: buffer_cpu_notify             : CPU_DEAD
     2588 ns: hrtimer_cpu_notify            : CPU_DEAD
     3274 ns: radix_tree_callback           : CPU_DEAD
     5246 ns: timer_cpu_notify              : CPU_DEAD
     8587 ns: flow_cache_cpu                : CPU_DEAD
     8645 ns: topology_cpu_callback         : CPU_DEAD
    12454 ns: cpu_callback                  : CPU_DEAD
    12650 ns: cpu_callback                  : CPU_DEAD
    45727 ns: percpu_counter_hotcpu_callback: CPU_DEAD
    55242 ns: page_alloc_cpu_notify         : CPU_DEAD
    56766 ns: sysfs_cpu_notify              : CPU_DEAD
    58241 ns: slab_cpuup_callback           : CPU_DEAD
    78250 ns: migration_call                : CPU_DEAD
 10784759 ns: cpuset_track_online_cpus      : CPU_DEAD
=========================================================================
Total time for CPU_DEAD = 11.144046000 ms
=========================================================================
=============================================================================
statistics for CPU_POST_DEAD
=============================================================================
      350 ns: cpu_callback                  : CPU_POST_DEAD
      365 ns: blk_cpu_notify                : CPU_POST_DEAD
      365 ns: buffer_cpu_notify             : CPU_POST_DEAD
      365 ns: cpu_numa_callback             : CPU_POST_DEAD
      365 ns: dev_cpu_callback              : CPU_POST_DEAD
      365 ns: flow_cache_cpu                : CPU_POST_DEAD
      365 ns: hrtimer_cpu_notify            : CPU_POST_DEAD
      365 ns: page_alloc_cpu_notify         : CPU_POST_DEAD
      365 ns: rb_cpu_notify                 : CPU_POST_DEAD
      365 ns: rcu_cpu_notify                : CPU_POST_DEAD
      365 ns: timer_cpu_notify              : CPU_POST_DEAD
      365 ns: update_runtime                : CPU_POST_DEAD
      366 ns: cpu_callback                  : CPU_POST_DEAD
      366 ns: hotplug_cfd                   : CPU_POST_DEAD
      366 ns: pageset_cpuup_callback        : CPU_POST_DEAD
      366 ns: radix_tree_callback           : CPU_POST_DEAD
      367 ns: hotplug_hrtick                : CPU_POST_DEAD
      367 ns: topology_cpu_callback         : CPU_POST_DEAD
      367 ns: vmstat_cpuup_callback         : CPU_POST_DEAD
      381 ns: cpu_callback                  : CPU_POST_DEAD
      381 ns: cpuset_track_online_cpus      : CPU_POST_DEAD
      381 ns: relay_hotcpu_callback         : CPU_POST_DEAD
      381 ns: sysfs_cpu_notify              : CPU_POST_DEAD
      383 ns: rcu_barrier_cpu_hotplug       : CPU_POST_DEAD
      410 ns: remote_softirq_cpu_notify     : CPU_POST_DEAD
      412 ns: slab_cpuup_callback           : CPU_POST_DEAD
      442 ns: migration_call                : CPU_POST_DEAD
      457 ns: percpu_counter_hotcpu_callback: CPU_POST_DEAD
      502 ns: ratelimit_handler             : CPU_POST_DEAD
    86200 ns: workqueue_cpu_callback        : CPU_POST_DEAD
=========================================================================
Total time for CPU_POST_DEAD = .097260000 ms
=========================================================================
=============================================================================
statistics for CPU_UP_PREPARE
=============================================================================
      336 ns: hotplug_hrtick                : CPU_UP_PREPARE
      350 ns: cpu_callback                  : CPU_UP_PREPARE
      365 ns: blk_cpu_notify                : CPU_UP_PREPARE
      381 ns: vmstat_cpuup_callback         : CPU_UP_PREPARE
      410 ns: buffer_cpu_notify             : CPU_UP_PREPARE
      410 ns: radix_tree_callback           : CPU_UP_PREPARE
      426 ns: dev_cpu_callback              : CPU_UP_PREPARE
      426 ns: remote_softirq_cpu_notify     : CPU_UP_PREPARE
      428 ns: cpuset_track_online_cpus      : CPU_UP_PREPARE
      441 ns: sysfs_cpu_notify              : CPU_UP_PREPARE
      471 ns: hotplug_cfd                   : CPU_UP_PREPARE
      472 ns: rb_cpu_notify                 : CPU_UP_PREPARE
      473 ns: flow_cache_cpu                : CPU_UP_PREPARE
      486 ns: page_alloc_cpu_notify         : CPU_UP_PREPARE
      488 ns: hrtimer_cpu_notify            : CPU_UP_PREPARE
      488 ns: update_runtime                : CPU_UP_PREPARE
      502 ns: rcu_barrier_cpu_hotplug       : CPU_UP_PREPARE
      531 ns: percpu_counter_hotcpu_callback: CPU_UP_PREPARE
      547 ns: ratelimit_handler             : CPU_UP_PREPARE
      594 ns: relay_hotcpu_callback         : CPU_UP_PREPARE
     1125 ns: rcu_cpu_notify                : CPU_UP_PREPARE
     1309 ns: pageset_cpuup_callback        : CPU_UP_PREPARE
     1947 ns: timer_cpu_notify              : CPU_UP_PREPARE
     5389 ns: cpu_numa_callback             : CPU_UP_PREPARE
     6379 ns: topology_cpu_callback         : CPU_UP_PREPARE
     6436 ns: slab_cpuup_callback           : CPU_UP_PREPARE
    19879 ns: cpu_callback                  : CPU_UP_PREPARE
    20227 ns: cpu_callback                  : CPU_UP_PREPARE
    33940 ns: migration_call                : CPU_UP_PREPARE
   143731 ns: workqueue_cpu_callback        : CPU_UP_PREPARE
=========================================================================
Total time for CPU_UP_PREPARE = .249387000 ms
=========================================================================
=============================================================================
statistics for CPU_UP_CANCELED
=============================================================================
=========================================================================
Total time for CPU_UP_CANCELED = 0 ms
=========================================================================
=============================================================================
statistics for __cpu_up
=============================================================================
205868908 ns: __cpu_up                      :
=========================================================================
Total time for __cpu_up = 205.868908000 ms
=========================================================================
=============================================================================
statistics for CPU_STARTING
=============================================================================
      350 ns: hotplug_cfd                   : CPU_STARTING
      352 ns: cpu_callback                  : CPU_STARTING
      352 ns: remote_softirq_cpu_notify     : CPU_STARTING
      363 ns: vmstat_cpuup_callback         : CPU_STARTING
      365 ns: cpu_callback                  : CPU_STARTING
      365 ns: dev_cpu_callback              : CPU_STARTING
      365 ns: hotplug_hrtick                : CPU_STARTING
      365 ns: radix_tree_callback           : CPU_STARTING
      365 ns: rb_cpu_notify                 : CPU_STARTING
      368 ns: update_runtime                : CPU_STARTING
      379 ns: cpu_callback                  : CPU_STARTING
      379 ns: cpu_numa_callback             : CPU_STARTING
      380 ns: rcu_barrier_cpu_hotplug       : CPU_STARTING
      380 ns: relay_hotcpu_callback         : CPU_STARTING
      381 ns: hrtimer_cpu_notify            : CPU_STARTING
      381 ns: pageset_cpuup_callback        : CPU_STARTING
      381 ns: slab_cpuup_callback           : CPU_STARTING
      382 ns: flow_cache_cpu                : CPU_STARTING
      394 ns: blk_cpu_notify                : CPU_STARTING
      397 ns: buffer_cpu_notify             : CPU_STARTING
      397 ns: percpu_counter_hotcpu_callback: CPU_STARTING
      397 ns: sysfs_cpu_notify              : CPU_STARTING
      397 ns: topology_cpu_callback         : CPU_STARTING
      410 ns: rcu_cpu_notify                : CPU_STARTING
      412 ns: page_alloc_cpu_notify         : CPU_STARTING
      426 ns: cpuset_track_online_cpus      : CPU_STARTING
      455 ns: ratelimit_handler             : CPU_STARTING
      471 ns: timer_cpu_notify              : CPU_STARTING
      516 ns: migration_call                : CPU_STARTING
      549 ns: workqueue_cpu_callback        : CPU_STARTING
=========================================================================
Total time for CPU_STARTING = .011874000 ms
=========================================================================
=============================================================================
statistics for CPU_ONLINE
=============================================================================
      365 ns: radix_tree_callback           : CPU_ONLINE
      379 ns: hotplug_hrtick                : CPU_ONLINE
      381 ns: hrtimer_cpu_notify            : CPU_ONLINE
      381 ns: remote_softirq_cpu_notify     : CPU_ONLINE
      410 ns: slab_cpuup_callback           : CPU_ONLINE
      410 ns: timer_cpu_notify              : CPU_ONLINE
      412 ns: blk_cpu_notify                : CPU_ONLINE
      426 ns: dev_cpu_callback              : CPU_ONLINE
      426 ns: flow_cache_cpu                : CPU_ONLINE
      426 ns: topology_cpu_callback         : CPU_ONLINE
      428 ns: rcu_barrier_cpu_hotplug       : CPU_ONLINE
      428 ns: rcu_cpu_notify                : CPU_ONLINE
      440 ns: buffer_cpu_notify             : CPU_ONLINE
      455 ns: pageset_cpuup_callback        : CPU_ONLINE
      457 ns: relay_hotcpu_callback         : CPU_ONLINE
      473 ns: rb_cpu_notify                 : CPU_ONLINE
      518 ns: update_runtime                : CPU_ONLINE
      549 ns: cpu_numa_callback             : CPU_ONLINE
      562 ns: ratelimit_handler             : CPU_ONLINE
      595 ns: page_alloc_cpu_notify         : CPU_ONLINE
      596 ns: hotplug_cfd                   : CPU_ONLINE
      777 ns: percpu_counter_hotcpu_callback: CPU_ONLINE
     1037 ns: cpu_callback                  : CPU_ONLINE
     1280 ns: cpu_callback                  : CPU_ONLINE
     1680 ns: cpu_callback                  : CPU_ONLINE
     2043 ns: vmstat_cpuup_callback         : CPU_ONLINE
     3422 ns: migration_call                : CPU_ONLINE
    12344 ns: workqueue_cpu_callback        : CPU_ONLINE
    52879 ns: sysfs_cpu_notify              : CPU_ONLINE
 12287706 ns: cpuset_track_online_cpus      : CPU_ONLINE
=========================================================================
Total time for CPU_ONLINE = 12.372685000 ms
=========================================================================
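
For anyone wanting to re-check the totals above, a minimal sketch of a
parser for the sample lines follows. It assumes the line layout seen in
this attachment ("<ns> ns: <handler> : <event>"); the struct and function
names are hypothetical, and the tool that produced the profile is not shown
in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sample {
	unsigned long long ns;
	char handler[64];
	char event[32];	/* empty for lines such as __stop_machine */
};

/*
 * Parse one line such as
 *   "    78250 ns: migration_call                : CPU_DEAD"
 * Returns 1 on success, 0 if the line is a separator, header, or total.
 */
int parse_sample(const char *line, struct sample *out)
{
	int n;

	assert(line != NULL && out != NULL);
	out->event[0] = '\0';
	n = sscanf(line, " %llu ns: %63[^: ] : %31s",
		   &out->ns, out->handler, out->event);
	return n >= 2;
}

/* Sum the ns column of every sample line in `text` whose event matches. */
unsigned long long sum_event(const char *text, const char *event)
{
	unsigned long long total = 0;
	char line[160];
	struct sample s;

	while (*text) {
		size_t len = strcspn(text, "\n");
		size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

		memcpy(line, text, copy);
		line[copy] = '\0';
		if (parse_sample(line, &s) && strcmp(s.event, event) == 0)
			total += s.ns;
		text += len;
		if (*text == '\n')
			text++;
	}
	return total;
}
```

Feeding the whole attachment to sum_event() with an event name such as
"CPU_DOWN_PREPARE" should give the nanosecond sum behind the corresponding
"Total time" line.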
  