Re: [dpdk-dev] [PATCH v16 07/11] power: add PMD power management API and callback
From: Ananyev, Konstantin <hidden>
Date: 2021-01-13 12:59:06
-----Original Message-----
From: Burakov, Anatoly <redacted>
Sent: Tuesday, January 12, 2021 5:37 PM
To: dev@dpdk.org
Cc: Ma, Liang J <redacted>; Hunt, David <redacted>; Ray Kinsella <redacted>; Neil Horman
[off-list ref]; thomas@monjalon.net; Ananyev, Konstantin [off-list ref]; McDaniel, Timothy
[off-list ref]; Richardson, Bruce [off-list ref]; Macnamara, Chris [off-list ref]
Subject: [PATCH v16 07/11] power: add PMD power management API and callback
From: Liang Ma <redacted>
Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will wait either
until a TSC timestamp expires or until packets arrive.
This API mandates a core-to-single-queue mapping (that is, multiple
queues per device are supported, but they have to be polled on different
cores).
This design uses PMD RX callbacks.
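An application can check its own queue assignment against the core-to-single-queue rule before enabling power management. This is a hypothetical application-side sketch (`struct queue_assignment` and `mapping_is_one_queue_per_core()` are illustrative names, not part of the patch, and whether the library itself enforces the rule is not shown in this excerpt): it only verifies that no lcore polls more than one power-managed queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of which lcore polls which RX queue. */
struct queue_assignment {
	unsigned int lcore_id;
	uint16_t port_id;
	uint16_t queue_id;
};

/*
 * Returns true if no lcore appears more than once, i.e. every
 * power-managed queue is polled by its own core. Several queues of the
 * same port are fine as long as they land on different lcores.
 * O(n^2), adequate for the handful of queues a typical app manages.
 */
static bool
mapping_is_one_queue_per_core(const struct queue_assignment *a, size_t n)
{
	assert(a != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (a[i].lcore_id == a[j].lcore_id)
				return false;
	return true;
}
```

For example, queues 0 and 1 of port 0 on lcores 1 and 2 pass, while both queues on lcore 1 fail.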
1. UMWAIT/UMONITOR:
When a certain threshold of empty polls is reached, the core will go
into a power-optimized sleep while waiting for a write to the address
of the next RX descriptor.
2. TPAUSE/PAUSE instruction:
This method uses the PAUSE (or TPAUSE, if available) instruction to
avoid busy polling.
3. Frequency scaling:
Reuse the existing DPDK power library to scale core frequency up/down
depending on traffic volume.
Signed-off-by: Liang Ma <redacted>
Signed-off-by: Anatoly Burakov <redacted>
---
Notes:
v15:
- Fix check in UMWAIT callback
v13:
- Rework the synchronization mechanism to not require locking
- Add more parameter checking
- Rework n_rx_queues access to not go through internal PMD structures and use
public API instead
doc/guides/prog_guide/power_man.rst | 44 +++
doc/guides/rel_notes/release_21_02.rst | 10 +
lib/librte_power/meson.build | 5 +-
lib/librte_power/rte_power_pmd_mgmt.c | 359 +++++++++++++++++++++++++
lib/librte_power/rte_power_pmd_mgmt.h | 90 +++++++
lib/librte_power/version.map | 5 +
6 files changed, 511 insertions(+), 2 deletions(-)
create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h...
+
+static uint16_t
+clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
+		void *addr __rte_unused)
+{
+
+	struct pmd_queue_cfg *q_conf;
+
+	q_conf = &port_cfg[port_id][qidx];
+
+	if (unlikely(nb_rx == 0)) {
+		q_conf->empty_poll_stats++;
+		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+			struct rte_power_monitor_cond pmc;
+			uint16_t ret;
+
+			/*
+			 * we might get a cancellation request while being
+			 * inside the callback, in which case the wakeup
+			 * wouldn't work because it would've arrived too early.
+			 *
+			 * to get around this, we notify the other thread that
+			 * we're sleeping, so that it can spin until we're done.
+			 * unsolicited wakeups are perfectly safe.
+			 */
+			q_conf->umwait_in_progress = true;

This write and subsequent read can be reordered by the cpu. I think you need rte_atomic_thread_fence(__ATOMIC_SEQ_CST) here and in disable() code-path below.

+
+			/* check if we need to cancel sleep */
+			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
+				/* use monitoring condition to sleep */
+				ret = rte_eth_get_monitor_addr(port_id, qidx,
+						&pmc);
+				if (ret == 0)
+					rte_power_monitor(&pmc, -1ULL);
+			}
+			q_conf->umwait_in_progress = false;
+		}
+	} else
+		q_conf->empty_poll_stats = 0;
+
+	return nb_rx;
+}
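The store-then-load reordering raised in the comment above is the classic store-buffering pattern. Below is a self-contained C11 model of the handshake, not DPDK code: `struct handshake`, `callback_side()`, `disable_side()` and `run_handshake_once()` are hypothetical names, the flags stand in for `pwr_mgmt_state` and `umwait_in_progress`, and spinning on `wakeup` stands in for `rte_power_monitor()`/`rte_power_monitor_wakeup()`. Each side puts a sequentially consistent fence between its store and its subsequent load, which is the role `rte_atomic_thread_fence(__ATOMIC_SEQ_CST)` plays in the suggestion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-queue state in the patch. */
struct handshake {
	atomic_bool enabled;     /* pwr_mgmt_state == PMD_MGMT_ENABLED */
	atomic_bool in_progress; /* umwait_in_progress */
	atomic_bool wakeup;      /* pending wakeup request */
};

/* Data-plane side, modelled on clb_umwait(). */
static void *
callback_side(void *arg)
{
	struct handshake *h = arg;

	assert(h != NULL);
	atomic_store_explicit(&h->in_progress, true, memory_order_relaxed);
	/* store->load barrier: stops the load below from being satisfied
	 * before the store above is visible to the disabling thread */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&h->enabled, memory_order_relaxed)) {
		/* "sleep" until woken; stands in for rte_power_monitor() */
		while (!atomic_exchange_explicit(&h->wakeup, false,
				memory_order_relaxed))
			;
	}
	atomic_store_explicit(&h->in_progress, false, memory_order_release);
	return NULL;
}

/* Control side, modelled on rte_power_pmd_mgmt_queue_disable(). */
static void *
disable_side(void *arg)
{
	struct handshake *h = arg;

	assert(h != NULL);
	atomic_store_explicit(&h->enabled, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	/* keep waking the data-plane thread until it is out of its sleep */
	while (atomic_load_explicit(&h->in_progress, memory_order_acquire))
		atomic_store_explicit(&h->wakeup, true, memory_order_relaxed);
	return NULL;
}

/* Races one callback against one disable; true if both finished. */
static bool
run_handshake_once(void)
{
	struct handshake h;
	pthread_t cb, ctl;

	atomic_init(&h.enabled, true);
	atomic_init(&h.in_progress, false);
	atomic_init(&h.wakeup, false);
	if (pthread_create(&cb, NULL, callback_side, &h) != 0)
		return false;
	if (pthread_create(&ctl, NULL, disable_side, &h) != 0) {
		/* release the data-plane thread in case it is "sleeping" */
		atomic_store(&h.wakeup, true);
		pthread_join(cb, NULL);
		return false;
	}
	pthread_join(cb, NULL);
	pthread_join(ctl, NULL);
	return !atomic_load(&h.in_progress);
}
```

With both fences, at least one side must observe the other's store: either the callback sees the queue being disabled and skips the sleep, or the disable path sees `in_progress` set and keeps waking it. Without them, both loads may return stale values, and in this model the callback then spins forever with nobody left to wake it.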
+...
+
+int
+rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id)
+{
+	struct pmd_queue_cfg *queue_cfg;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	/* no need to check queue id as wrong queue id would not be enabled */
+	queue_cfg = &port_cfg[port_id][queue_id];
+
+	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
+		return -EINVAL;
+
+	/* let the callback know we're shutting down */
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY;

Same as above - write to pwr_mgmt_state and read from umwait_in_progress could be reordered by cpu. Need to insert rte_atomic_thread_fence(__ATOMIC_SEQ_CST) between them.

BTW, out of curiosity - why do you need this intermediate state (PMD_MGMT_BUSY) at all? Why not directly: queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; ?
+
+	switch (queue_cfg->cb_mode) {
+	case RTE_POWER_MGMT_TYPE_MONITOR:
+	{
+		bool exit = false;
+		do {
+			/*
+			 * we may request cancellation while the other thread
+			 * has just entered the callback but hasn't started
+			 * sleeping yet, so keep waking it up until we know it's
+			 * done sleeping.
+			 */
+			if (queue_cfg->umwait_in_progress)
+				rte_power_monitor_wakeup(lcore_id);
+			else
+				exit = true;
+		} while (!exit);
+	}
+	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_PAUSE:
+		rte_eth_remove_rx_callback(port_id, queue_id,
+				queue_cfg->cur_cb);
+		break;
+	case RTE_POWER_MGMT_TYPE_SCALE:
+		rte_power_freq_max(lcore_id);
+		rte_eth_remove_rx_callback(port_id, queue_id,
+				queue_cfg->cur_cb);
+		rte_power_exit(lcore_id);
+		break;
+	}
+	/*
+	 * we don't free the RX callback here because it is unsafe to do so
+	 * unless we know for a fact that all data plane threads have stopped.
+	 */
+	queue_cfg->cur_cb = NULL;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
+
+	return 0;
+}
diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h
new file mode 100644
index 0000000000..0bfbc6ba69
--- /dev/null
+++ b/lib/librte_power/rte_power_pmd_mgmt.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2020 Intel Corporation
+ */
+
+#ifndef _RTE_POWER_PMD_MGMT_H
+#define _RTE_POWER_PMD_MGMT_H
+
+/**
+ * @file
+ * RTE PMD Power Management
+ */
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_log.h>
+#include <rte_power.h>
+#include <rte_atomic.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * PMD Power Management Type
+ */
+enum rte_power_pmd_mgmt_type {
+	/** Use power-optimized monitoring to wait for incoming traffic */
+	RTE_POWER_MGMT_TYPE_MONITOR = 1,
+	/** Use power-optimized sleep to avoid busy polling */
+	RTE_POWER_MGMT_TYPE_PAUSE,
+	/** Use frequency scaling when traffic is low */
+	RTE_POWER_MGMT_TYPE_SCALE,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enable power management on a specified RX queue and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @param lcore_id
+ *   lcore_id.
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The queue identifier of the Ethernet device.
+ * @param mode
+ *   The power management callback function type.
+
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id,
+		enum rte_power_pmd_mgmt_type mode);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Disable power management on a specified RX queue and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @param lcore_id
+ *   lcore_id.
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The queue identifier of the Ethernet device.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id);
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_power/version.map b/lib/librte_power/version.map
index 69ca9af616..61996b4d11 100644
--- a/lib/librte_power/version.map
+++ b/lib/librte_power/version.map
@@ -34,4 +34,9 @@ EXPERIMENTAL {
 	rte_power_guest_channel_receive_msg;
 	rte_power_poll_stat_fetch;
 	rte_power_poll_stat_update;
+
+	# added in 21.02
+	rte_power_pmd_mgmt_queue_enable;
+	rte_power_pmd_mgmt_queue_disable;
+
 };
-- 
2.25.1