Thread (354 messages), 18 authors, 2021-01-29

Re: [dpdk-dev] [PATCH v16 07/11] power: add PMD power management API and callback

From: Ananyev, Konstantin <hidden>
Date: 2021-01-13 12:59:06

-----Original Message-----
From: Burakov, Anatoly <redacted>
Sent: Tuesday, January 12, 2021 5:37 PM
To: dev@dpdk.org
Cc: Ma, Liang J <redacted>; Hunt, David <redacted>; Ray Kinsella <redacted>; Neil Horman
[off-list ref]; thomas@monjalon.net; Ananyev, Konstantin [off-list ref]; McDaniel, Timothy
[off-list ref]; Richardson, Bruce [off-list ref]; Macnamara, Chris [off-list ref]
Subject: [PATCH v16 07/11] power: add PMD power management API and callback

From: Liang Ma <redacted>

Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that waits until either
a TSC timestamp expires or packets arrive.

This API mandates a core-to-single-queue mapping (that is, multiple
queues per device are supported, but they have to be polled on different
cores).

This design uses PMD RX callbacks.

1. UMWAIT/UMONITOR:

   When a certain threshold of empty polls is reached, the core will go
   into a power optimized sleep while waiting on an address of next RX
   descriptor to be written to.

2. TPAUSE/Pause instruction

   This method uses the pause (or TPAUSE, if available) instruction to
   avoid busy polling.

3. Frequency scaling
   Reuse existing DPDK power library to scale up/down core frequency
   depending on traffic volume.

Signed-off-by: Liang Ma <redacted>
Signed-off-by: Anatoly Burakov <redacted>
---

Notes:
    v15:
    - Fix check in UMWAIT callback

    v13:
    - Rework the synchronization mechanism to not require locking
    - Add more parameter checking
    - Rework n_rx_queues access to not go through internal PMD structures and use
      public API instead

 doc/guides/prog_guide/power_man.rst    |  44 +++
 doc/guides/rel_notes/release_21_02.rst |  10 +
 lib/librte_power/meson.build           |   5 +-
 lib/librte_power/rte_power_pmd_mgmt.c  | 359 +++++++++++++++++++++++++
 lib/librte_power/rte_power_pmd_mgmt.h  |  90 +++++++
 lib/librte_power/version.map           |   5 +
 6 files changed, 511 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
...
+
+static uint16_t
+clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
+		void *addr __rte_unused)
+{
+
+	struct pmd_queue_cfg *q_conf;
+
+	q_conf = &port_cfg[port_id][qidx];
+
+	if (unlikely(nb_rx == 0)) {
+		q_conf->empty_poll_stats++;
+		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+			struct rte_power_monitor_cond pmc;
+			uint16_t ret;
+
+			/*
+			 * we might get a cancellation request while being
+			 * inside the callback, in which case the wakeup
+			 * wouldn't work because it would've arrived too early.
+			 *
+			 * to get around this, we notify the other thread that
+			 * we're sleeping, so that it can spin until we're done.
+			 * unsolicited wakeups are perfectly safe.
+			 */
+			q_conf->umwait_in_progress = true;

This write and the subsequent read of pwr_mgmt_state can be reordered by
the CPU. I think you need rte_atomic_thread_fence(__ATOMIC_SEQ_CST) here
and in the disable() code path below.

+
+			/* check if we need to cancel sleep */
+			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
+				/* use monitoring condition to sleep */
+				ret = rte_eth_get_monitor_addr(port_id, qidx,
+						&pmc);
+				if (ret == 0)
+					rte_power_monitor(&pmc, -1ULL);
+			}
+			q_conf->umwait_in_progress = false;
+		}
+	} else
+		q_conf->empty_poll_stats = 0;
+
+	return nb_rx;
+}
+
...
+
+int
+rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id)
+{
+	struct pmd_queue_cfg *queue_cfg;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	/* no need to check queue id as wrong queue id would not be enabled */
+	queue_cfg = &port_cfg[port_id][queue_id];
+
+	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
+		return -EINVAL;
+
+	/* let the callback know we're shutting down */
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY;

Same as above: the write to pwr_mgmt_state and the read from
umwait_in_progress could be reordered by the CPU. You need to insert
rte_atomic_thread_fence(__ATOMIC_SEQ_CST) between them.
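
To make the concern concrete: both code paths store to one flag and then load the other, which is the classic store-buffering pattern. Without a full barrier between the store and the load on each side, both threads may read the stale value, so the callback goes to sleep while disable() concludes that no wakeup is needed. Below is a minimal sketch of the fenced handshake, using C11 <stdatomic.h> as a stand-in for rte_atomic_thread_fence(__ATOMIC_SEQ_CST). The struct, enum and function names are hypothetical and only model pwr_mgmt_state and umwait_in_progress; this is not the patch's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the patch's per-queue state. */
enum mgmt_state { MGMT_DISABLED, MGMT_BUSY, MGMT_ENABLED };

struct queue_cfg {
	_Atomic int state;             /* models pwr_mgmt_state */
	atomic_bool sleep_in_progress; /* models umwait_in_progress */
};

/*
 * Data-plane side (the RX callback): announce the sleep, fence, then
 * check for cancellation. Returns true if the caller may go to sleep.
 */
static bool
sleeper_may_sleep(struct queue_cfg *q)
{
	atomic_store_explicit(&q->sleep_in_progress, true,
			memory_order_relaxed);
	/* full barrier: the store above is ordered before the load below */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&q->state,
			memory_order_relaxed) == MGMT_ENABLED;
}

/* Data-plane side: clear the flag once the sleep attempt is over. */
static void
sleeper_done(struct queue_cfg *q)
{
	atomic_store_explicit(&q->sleep_in_progress, false,
			memory_order_release);
}

/*
 * Control-plane side (the disable path): request shutdown, fence, then
 * check the sleeper flag. Returns true if a wakeup must be sent.
 */
static bool
canceller_must_wake(struct queue_cfg *q)
{
	atomic_store_explicit(&q->state, MGMT_BUSY, memory_order_relaxed);
	/* full barrier: the store above is ordered before the load below */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&q->sleep_in_progress,
			memory_order_relaxed);
}
```

With a sequentially consistent fence on both sides, at least one thread is guaranteed to observe the other's store: either the callback sees the cancellation and skips the sleep, or disable() sees the sleep in progress and keeps sending wakeups.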

BTW, out of curiosity - why do you need this intermediate
state (PMD_MGMT_BUSY) at all?
Why not directly:
queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
?
+
+	switch (queue_cfg->cb_mode) {
+	case RTE_POWER_MGMT_TYPE_MONITOR:
+	{
+		bool exit = false;
+		do {
+			/*
+			 * we may request cancellation while the other thread
+			 * has just entered the callback but hasn't started
+			 * sleeping yet, so keep waking it up until we know it's
+			 * done sleeping.
+			 */
+			if (queue_cfg->umwait_in_progress)
+				rte_power_monitor_wakeup(lcore_id);
+			else
+				exit = true;
+		} while (!exit);
+	}
+	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_PAUSE:
+		rte_eth_remove_rx_callback(port_id, queue_id,
+				queue_cfg->cur_cb);
+		break;
+	case RTE_POWER_MGMT_TYPE_SCALE:
+		rte_power_freq_max(lcore_id);
+		rte_eth_remove_rx_callback(port_id, queue_id,
+				queue_cfg->cur_cb);
+		rte_power_exit(lcore_id);
+		break;
+	}
+	/*
+	 * we don't free the RX callback here because it is unsafe to do so
+	 * unless we know for a fact that all data plane threads have stopped.
+	 */
+	queue_cfg->cur_cb = NULL;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
+
+	return 0;
+}
diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h
new file mode 100644
index 0000000000..0bfbc6ba69
--- /dev/null
+++ b/lib/librte_power/rte_power_pmd_mgmt.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2020 Intel Corporation
+ */
+
+#ifndef _RTE_POWER_PMD_MGMT_H
+#define _RTE_POWER_PMD_MGMT_H
+
+/**
+ * @file
+ * RTE PMD Power Management
+ */
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_log.h>
+#include <rte_power.h>
+#include <rte_atomic.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * PMD Power Management Type
+ */
+enum rte_power_pmd_mgmt_type {
+	/** Use power-optimized monitoring to wait for incoming traffic */
+	RTE_POWER_MGMT_TYPE_MONITOR = 1,
+	/** Use power-optimized sleep to avoid busy polling */
+	RTE_POWER_MGMT_TYPE_PAUSE,
+	/** Use frequency scaling when traffic is low */
+	RTE_POWER_MGMT_TYPE_SCALE,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enable power management on a specified RX queue and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @param lcore_id
+ *   lcore_id.
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The queue identifier of the Ethernet device.
+ * @param mode
+ *   The power management callback function type.
+ *
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id,
+		enum rte_power_pmd_mgmt_type mode);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Disable power management on a specified RX queue and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @param lcore_id
+ *   lcore_id.
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The queue identifier of the Ethernet device.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id);
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_power/version.map b/lib/librte_power/version.map
index 69ca9af616..61996b4d11 100644
--- a/lib/librte_power/version.map
+++ b/lib/librte_power/version.map
@@ -34,4 +34,9 @@ EXPERIMENTAL {
 	rte_power_guest_channel_receive_msg;
 	rte_power_poll_stat_fetch;
 	rte_power_poll_stat_update;
+
+	# added in 21.02
+	rte_power_pmd_mgmt_queue_enable;
+	rte_power_pmd_mgmt_queue_disable;
+
 };
--
2.25.1