Thread (156 messages), 8 authors, 2021-08-04

Re: [dpdk-dev] [PATCH v6 5/7] power: support callbacks for multiple Rx queues

From: Ananyev, Konstantin <hidden>
Date: 2021-07-06 18:50:44

Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism are
  shared between all queues polled on a particular lcore, and power saving
  is only activated when all queues in the list have been polled and
  determined to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.

Signed-off-by: Anatoly Burakov <redacted>
---

Notes:
    v6:
    - Track each individual queue sleep status (Konstantin)
    - Fix segfault (Dave)

    v5:
    - Remove the "power save queue" API and replace it with mechanism suggested by
      Konstantin

    v3:
    - Move the list of supported NICs to NIC feature table

    v2:
    - Use a TAILQ for queues instead of a static array
    - Address feedback from Konstantin
    - Add additional checks for stopped queues

 doc/guides/nics/features.rst           |  10 +
 doc/guides/prog_guide/power_man.rst    |  65 ++--
 doc/guides/rel_notes/release_21_08.rst |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 452 +++++++++++++++++++------
 4 files changed, 394 insertions(+), 136 deletions(-)
diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 403c2b03a3..a96e12d155 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
 * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
 * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``.

+.. _nic_features_get_monitor_addr:
+
+PMD power management using monitor addresses
+--------------------------------------------
+
+Supports getting a monitoring condition to use together with Ethernet PMD power
+management (see :doc:`../prog_guide/power_man` for more details).
+
+* **[implements] eth_dev_ops**: ``get_monitor_addr``
+
 .. _nic_features_other:

 Other dev ops not represented by a Feature
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index c70ae128ac..ec04a72108 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -198,34 +198,41 @@ Ethernet PMD Power Management API
 Abstract
 ~~~~~~~~

-Existing power management mechanisms require developers
-to change application design or change code to make use of it.
-The PMD power management API provides a convenient alternative
-by utilizing Ethernet PMD RX callbacks,
-and triggering power saving whenever empty poll count reaches a certain number.
-
-Monitor
-   This power saving scheme will put the CPU into optimized power state
-   and use the ``rte_power_monitor()`` function
-   to monitor the Ethernet PMD RX descriptor address,
-   and wake the CPU up whenever there's new traffic.
-
-Pause
-   This power saving scheme will avoid busy polling
-   by either entering power-optimized sleep state
-   with ``rte_power_pause()`` function,
-   or, if it's not available, use ``rte_pause()``.
-
-Frequency scaling
-   This power saving scheme will use ``librte_power`` library
-   functionality to scale the core frequency up/down
-   depending on traffic volume.
-
-.. note::
-
-   Currently, this power management API is limited to mandatory mapping
-   of 1 queue to 1 core (multiple queues are supported,
-   but they must be polled from different cores).
+Existing power management mechanisms require developers to change application
+design or change code to make use of them. The PMD power management API provides
+a convenient alternative by utilizing Ethernet PMD RX callbacks and triggering
+power saving whenever the empty poll count reaches a certain number.
+
+* Monitor
+   This power saving scheme will put the CPU into optimized power state and
+   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
+   there's new traffic. Support for this scheme may not be available on all
+   platforms, and further limitations may apply (see below).
+
+* Pause
+   This power saving scheme will avoid busy polling by either entering
+   power-optimized sleep state with ``rte_power_pause()`` function, or, if it's
+   not supported by the underlying platform, use ``rte_pause()``.
+
+* Frequency scaling
+   This power saving scheme will use ``librte_power`` library functionality to
+   scale the core frequency up/down depending on traffic volume.
+
+The "monitor" mode is only supported in the following configurations and scenarios:
+
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that
+  ``rte_power_monitor()`` is supported by the platform, then monitoring will be
+  limited to a mapping of 1 core to 1 queue (thus, each Rx queue will have to be
+  monitored from a different lcore).
+
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
+  ``rte_power_monitor()`` function is not supported, then monitor mode will not
+  be supported.
+
+* Not all Ethernet drivers support monitoring, even if the underlying
+  platform may support the necessary CPU instructions. Please refer to
+  :doc:`../nics/overview` for more information.
+
.... 
+static inline void
+queue_reset(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
+{
+	const bool is_ready_to_sleep = qcfg->n_empty_polls > EMPTYPOLL_MAX;
+
+	/* reset empty poll counter for this queue */
+	qcfg->n_empty_polls = 0;
+	/* reset the queue sleep counter as well */
+	qcfg->n_sleeps = 0;
+	/* remove the queue from the count of queues ready to sleep */
+	if (is_ready_to_sleep)
+		cfg->n_queues_ready_to_sleep--;
+	/*
+	 * no need to change the lcore sleep target counter because this lcore will
+	 * reach the n_sleeps anyway, and the other cores are already counted so
+	 * there's no need to do anything else.
+	 */
+}
+
+static inline bool
+queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
+{
+	/* this function is called - that means we have an empty poll */
+	qcfg->n_empty_polls++;
+
+	/* if we haven't reached threshold for empty polls, we can't sleep */
+	if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
+		return false;
+
+	/*
+	 * we've reached a point where we are able to sleep, but we still need
+	 * to check if this queue has already been marked for sleeping.
+	 */
+	if (qcfg->n_sleeps == cfg->sleep_target)
+		return true;
+
+	/* mark this queue as ready for sleep */
+	qcfg->n_sleeps = cfg->sleep_target;
+	cfg->n_queues_ready_to_sleep++;
So, assuming there is no incoming traffic, should it be:
1) poll_all_queues(times=EMPTYPOLL_MAX); sleep; poll_all_queues(times=1); sleep; poll_all_queues(times=1); sleep; ...
OR
2) poll_all_queues(times=EMPTYPOLL_MAX); sleep; poll_all_queues(times=EMPTYPOLL_MAX); sleep; poll_all_queues(times=EMPTYPOLL_MAX); sleep; ...
?

My initial thought was 2), but maybe the intention is 1)?
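
One way to check which pattern the quoted helpers produce is to run them in a small standalone model. The following is only a sketch under several assumptions that are not visible in the quoted hunk: EMPTYPOLL_MAX is taken to be 512, sleep_target is assumed to start at 1, the structs are reduced to the fields the helpers touch, and the callback is assumed to call queue_can_sleep() and then lcore_can_sleep() on every empty poll. simulate_idle() is a hypothetical driver, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EMPTYPOLL_MAX 512	/* assumed value, for illustration only */
#define SIM_MAX_QUEUES 8	/* hypothetical limit of this model */

/* reduced stand-ins: only the fields used by the quoted helpers */
struct queue_list_entry {
	uint64_t n_empty_polls;
	uint64_t n_sleeps;
};

struct pmd_core_cfg {
	uint64_t n_queues;
	uint64_t n_queues_ready_to_sleep;
	uint64_t sleep_target;
};

/* same logic as the quoted queue_can_sleep() */
static bool
queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
{
	qcfg->n_empty_polls++;
	if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
		return false;
	if (qcfg->n_sleeps == cfg->sleep_target)
		return true;
	qcfg->n_sleeps = cfg->sleep_target;
	cfg->n_queues_ready_to_sleep++;
	return true;
}

/* same logic as the quoted lcore_can_sleep() */
static bool
lcore_can_sleep(struct pmd_core_cfg *cfg)
{
	if (cfg->n_queues_ready_to_sleep != cfg->n_queues)
		return false;
	cfg->n_queues_ready_to_sleep = 0;
	cfg->sleep_target++;
	return true;
}

/*
 * Hypothetical driver: poll nq idle queues round-robin until the lcore
 * would have slept n_sleeps times. passes[i] receives the number of
 * passes over all queues that preceded sleep number i.
 */
static int
simulate_idle(unsigned int nq, unsigned int n_sleeps, unsigned int *passes)
{
	struct queue_list_entry q[SIM_MAX_QUEUES] = {{0}};
	struct pmd_core_cfg cfg = {
		.n_queues = nq,
		.n_queues_ready_to_sleep = 0,
		.sleep_target = 1,	/* assumption: initial value */
	};
	unsigned int slept = 0, cur = 0, i;

	if (nq == 0 || nq > SIM_MAX_QUEUES)
		return -1;

	while (slept < n_sleeps) {
		cur++;	/* one more pass over all queues */
		for (i = 0; i < nq; i++) {
			/* every poll is empty in this model */
			if (!queue_can_sleep(&cfg, &q[i]))
				continue;
			if (!lcore_can_sleep(&cfg))
				continue;
			/* the real callback would sleep at this point */
			passes[slept++] = cur;
			cur = 0;
			if (slept == n_sleeps)
				break;
		}
	}
	return 0;
}
```

Under those assumptions, calling simulate_idle(2, 3, passes) from a test main() gives a first sleep after EMPTYPOLL_MAX + 1 passes and then a sleep after every single pass, i.e. pattern 1), because nothing in the two helpers resets n_empty_polls when the lcore sleeps. Whether that is the intended behaviour is still the question.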
+
+	return true;
+}
+
+static inline bool
+lcore_can_sleep(struct pmd_core_cfg *cfg)
+{
+	/* are all queues ready to sleep? */
+	if (cfg->n_queues_ready_to_sleep != cfg->n_queues)
+		return false;
+
+	/* we've reached an iteration where we can sleep, reset sleep counter */
+	cfg->n_queues_ready_to_sleep = 0;
+	cfg->sleep_target++;
+
+	return true;
+}
+
 static uint16_t
 clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
-		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
-		void *addr __rte_unused)
+		uint16_t nb_rx, uint16_t max_pkts __rte_unused, void *arg)
 {
+	struct queue_list_entry *queue_conf = arg;

-	struct pmd_queue_cfg *q_conf;
-
-	q_conf = &port_cfg[port_id][qidx];
-
+	/* this callback can't do more than one queue, omit multiqueue logic */
 	if (unlikely(nb_rx == 0)) {
-		q_conf->empty_poll_stats++;
-		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+		queue_conf->n_empty_polls++;
+		if (unlikely(queue_conf->n_empty_polls > EMPTYPOLL_MAX)) {
 			struct rte_power_monitor_cond pmc;
-			uint16_t ret;
+			int ret;

 			/* use monitoring condition to sleep */
 			ret = rte_eth_get_monitor_addr(port_id, qidx,
@@ -97,60 +231,77 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
-		q_conf->empty_poll_stats = 0;
+		queue_conf->n_empty_polls = 0;

 	return nb_rx;
 }