Thread (4 messages) 4 messages, 2 authors, 2021-02-19

Re: [PATCH] nvmet-tcp: enable optional queue idle period tracking

From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-02-15 21:18:40


On 2/11/21 3:42 PM, Wunderlich, Mark wrote:
nvmet-tcp: enable optional queue idle period tracking

Add 'queue idle period' option used by io_work() to support
network devices enabled with advanced interrupt moderation
supporting a relaxed interrupt model. It was discovered that
such a NIC used on the target was unable to support initiator
connection establishment, caused by the existing io_work()
flow that immediately exits and does not re-queue itself at
the first loop with no activity.

With this new option a queue is assigned a period of time
that no activity must occur in order to become 'idle'.  Until
the queue is idle the work item is requeued.

The new module option is defined as changeable making it
flexible for testing purposes.

The pre-existing legacy behavior is preserved when no module option
for queue idle period is specified.

Signed-off-by: Mark Wunderlich <redacted>
---
Testing was performed with a NIC using standard HW interrupt mode, with
and without the new module option enabled.  No measurable performance
drop was seen when the patch was applied and the new option specified
or not.  A side effect of a standard NIC using the new option
will reduce the context switch rate.  We measured a drop from roughly
90K to less than 300 (for 32 active connections).
Nice!
quoted hunk ↗ jump to hunk
For a NIC using a passive advanced interrupt moderation policy, it was
then successfully able to achieve and maintain active connections with
the target.
---
  drivers/nvme/target/tcp.c |   65 ++++++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 64 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index dc1f0f647189..f5e6e2806841 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -29,6 +29,16 @@ static int so_priority;
  module_param(so_priority, int, 0644);
  MODULE_PARM_DESC(so_priority, "nvmet tcp socket optimize priority");
  
+/*
+ * Define a time period (in msecs)
Is this in msecs or usecs? Anyway I think it should be in the name as
well.

  that io_work() shall sample an activated
+ * queue before determining it to be idle.  This optional module behavior
+ * can enable NIC solutions that support socket optimized packet processing
+ * using advanced interrupt moderation techniques.
+ */
+static int queue_idle_period;
+module_param(queue_idle_period, int, 0644);
+MODULE_PARM_DESC(queue_idle_period, "nvmet tcp io_work queue idle time period");
I think this should be called "queue_poll_period" maybe? because this is
the period a queue will keep actively trying to read data from the
socket regardless if there is any activity or not right?

Or does this mean a different semantics with this?
quoted hunk ↗ jump to hunk
+
  #define NVMET_TCP_RECV_BUDGET		8
  #define NVMET_TCP_SEND_BUDGET		8
  #define NVMET_TCP_IO_WORK_BUDGET	64
@@ -96,6 +106,7 @@ struct nvmet_tcp_queue {
  	struct work_struct	io_work;
  	struct nvmet_cq		nvme_cq;
  	struct nvmet_sq		nvme_sq;
+	unsigned long           idle_period;
poll_period?
quoted hunk ↗ jump to hunk
  
  	/* send state */
  	struct nvmet_tcp_cmd	*cmds;
@@ -1198,12 +1209,42 @@ static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
  	spin_unlock(&queue->state_lock);
  }
  
+/*
+ * This worker function will process all send and recv packet
+ * activity for a queue. It will loop on the queue for up to a
+ * given maximum operation budget, or until there is no activity
+ * during a single loop iteration.
+ *
+ * Two exit modes are possible.
+ *
+ * The default 'pending' mode where the worker will re-queue
+ * itself, after exiting the work loop, only if any send or recv
+ * activity was recorded during the last pass within the loop.
+ *
+ * A optional 'idle period' mode where in addition to re-queueing
+ * itself because of activity it also tracks if a queue has not reached an
+ * assigned 'idle' period of time. The worker consumes from the assigned
+ * time period, across many potential invocations, until it is expired.
+ * A queue with activity always being awarded a fresh time
+ * period for processing.
+ */
  static void nvmet_tcp_io_work(struct work_struct *w)
  {
  	struct nvmet_tcp_queue *queue =
  		container_of(w, struct nvmet_tcp_queue, io_work);
  	bool pending;
  	int ret, ops = 0;
+	long period_remaining;
+	unsigned long idle_deadline;
+
+	/* Setup use of optional tracking for idle time period */
Redundant comment.
+	if (queue_idle_period) {
+		/* Assign the queue an idle period if not already set */
+		if (!queue->idle_period)
+			queue->idle_period =
+					usecs_to_jiffies(queue_idle_period);
+		idle_deadline = jiffies + queue->idle_period;
Why do you need to keep a period time?

Why don't you just maintain queue->poll_start and just check if
your past the queue_poll time addition:

	if (queue_idle_period && queue->start_poll == 0)
		queue->start_poll = jiffies;

...
[io_work loop]
...

	if (queue_poll_period) {
		if (!time_after(jiffies, queue->start_poll +
				usecs_to_jiffies(queue_poll_period)))
			pending = true;
		else
			queue->start_poll = 0;
	}

Can this work?
quoted hunk ↗ jump to hunk
+	}
  
  	do {
  		pending = false;
@@ -1222,8 +1263,30 @@ static void nvmet_tcp_io_work(struct work_struct *w)
  
  	} while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);
  
+	/* If optional deadline mode active, determine if queue has reached its
+	 * idle process deadline limit.  Remaining deadline is calculated. Any ops
+	 * activity awards the queue a new deadline period.
+	 */
Comments should stay consistent in style...
+	if (queue_idle_period) {
+		/*
+		 * Clear to award active non-idle queue new period, or
+		 * reset for future queue activity after exit when idle reached.
+		 */
+		queue->idle_period = 0;
+		if (ops > 0) {
+			pending = true;
What is this useful for?
+		} else if (!time_after(jiffies, idle_deadline)) {
+			period_remaining = (long)(idle_deadline - jiffies);
+			if (period_remaining > 0) {
+				pending = true;
+				queue->idle_period = period_remaining;
+			}
+		}
+	}
See above if it is simpler?
+
  	/*
-	 * We exahusted our budget, requeue our selves
+	 * We requeue ourself when pending indicates there was activity
+	 * recorded, or queue has not reached optional idle time period.
  	 */
  	if (pending)
  		queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help