Thread: 91 messages, 15 authors, 2017-07-11

Unexpected issues with 2 NVME initiators using the same target

From: Gruher, Joseph R <hidden>
Date: 2017-03-16 00:03:41

quoted
We tested the patches
with a single target system and a single initiator system connected
via CX4s at 25Gb through an Arista 7060X switch with regular Ethernet
flow control enabled (no PFC/DCB - but the switch has no other traffic
on it).  We connected
8 Intel P3520 1.2 TB SSDs from the target to the initiator with 16 IO
queues per disk.  Then we ran FIO with a 4KB workload, random IO
pattern, 4 jobs per disk, queue depth 32 per job, testing 100% read,
70/30 read/write, and 100% write workloads.  We used the default
4.10-RC8 kernel, then patched the same kernel with Sagi's patch, and
then separately with Max's patch, and then both patches at the same
time (just for fun).  The patches were applied on both target and
initiator.  In general we do seem to see a performance hit on small
block read workloads but it is not massive, looks like about 10%.  We also
tested some large block transfers and didn't see any impact.  Results here are
in 4KB IOPS:
Read/Write   4.10-RC8   Patch 1 (Sagi)   Patch 2 (Max)   Both Patches
100/0        667,158    611,737          619,586         607,080
70/30        941,352    890,962          884,222         876,926
0/100        667,379    666,000          666,093         666,144

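As a rough cross-check of the "about 10%" figure, the change relative to the unpatched kernel can be computed directly from the table above. This is only a sketch: the IOPS values are copied from the table, while the variable and function names are illustrative and not part of the original test setup.

```python
# IOPS from the 25Gb table above, keyed by read/write mix.
# Column order: unpatched 4.10-RC8, Sagi's patch, Max's patch, both patches.
results = {
    "100/0": (667_158, 611_737, 619_586, 607_080),
    "70/30": (941_352, 890_962, 884_222, 876_926),
    "0/100": (667_379, 666_000, 666_093, 666_144),
}
labels = ("Patch 1 (Sagi)", "Patch 2 (Max)", "Both Patches")


def percent_change(baseline, value):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def relative_changes(table):
    """Map each workload mix to its per-patch percent change vs. the baseline."""
    return {
        mix: [percent_change(row[0], v) for v in row[1:]]
        for mix, row in table.items()
    }


changes = relative_changes(results)
for mix, deltas in changes.items():
    cells = ", ".join(f"{name}: {d:+.1f}%" for name, d in zip(labels, deltas))
    print(f"{mix}  {cells}")
```

On these numbers the 100% read case lands roughly 7% to 9% below the baseline depending on the patch, and the 100% write case is essentially unchanged.
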
One additional result from our 25Gb testing - we did do an additional test with
the same configuration as above but we ran just a single disk, and a single FIO
job with queue depth 8.  This is a light workload designed to examine latency
under lower load, when not bottlenecked on network or disk throughput, as
opposed to driving the system to max IOPS.  Here we see about a 30usec (20%)
increase to latency on 4KB random reads when we apply Sagi's patch and a
corresponding dip in IOPS (only about a 2% hit to latency was seen with Max's
patch):

Kernel      IOPS      Latency (usec)
4.10-RC8    49,304    160.3
Patch 1     40,490    192.9

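The single-disk numbers above can be sanity-checked with Little's law: with a fixed number of outstanding IOs, IOPS is approximately queue depth divided by per-IO latency. The sketch below assumes the reported latency is the mean per-IO latency, which is not stated explicitly above; the names are illustrative.

```python
QUEUE_DEPTH = 8  # single FIO job at queue depth 8, as described above

# (measured IOPS, latency in microseconds) copied from the table above
baseline = (49_304, 160.3)  # unpatched 4.10-RC8
patch1 = (40_490, 192.9)    # Sagi's patch


def iops_from_latency(queue_depth, latency_usec):
    """Little's law estimate: outstanding IOs divided by per-IO latency."""
    return queue_depth / (latency_usec * 1e-6)


def percent_change(before, after):
    """Return the change from before to after, in percent."""
    return (after - before) / before * 100.0


latency_delta_usec = patch1[1] - baseline[1]
latency_pct = percent_change(baseline[1], patch1[1])
iops_pct = percent_change(baseline[0], patch1[0])

print(f"latency: +{latency_delta_usec:.1f} usec ({latency_pct:+.1f}%)")
print(f"IOPS:    {iops_pct:+.1f}%")
for name, (measured, latency) in (("4.10-RC8", baseline), ("Patch 1", patch1)):
    estimate = iops_from_latency(QUEUE_DEPTH, latency)
    print(f"{name}: measured {measured} IOPS, estimated {estimate:.0f} IOPS")
```

The estimates come out within a few percent of the measured IOPS, which is consistent with this workload being latency-bound rather than throughput-bound.
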
After moving back to 50Gb CX4 NICs we tested the patches from Sagi and Max.  With Sagi's patch we seem to see a reduced frequency of errors, especially on the target, but errors still definitely occur.  We ran 48 different two-minute workloads and saw roughly 30 errors on the initiator and exactly two on the target.

Target error example:

[ 4336.224633] mlx5_0:dump_cqe:262:(pid 12397): dump error cqe
[ 4336.224636] 00000000 00000000 00000000 00000000
[ 4336.224636] 00000000 00000000 00000000 00000000
[ 4336.224637] 00000000 00000000 00000000 00000000
[ 4336.224637] 00000000 00008813 080000ca 3fb97fd3

Initiator error example:

[ 3134.447002] mlx5_0:dump_cqe:262:(pid 0): dump error cqe
[ 3134.447006] 00000000 00000000 00000000 00000000
[ 3134.447007] 00000000 00000000 00000000 00000000
[ 3134.447008] 00000000 00000000 00000000 00000000
[ 3134.447010] 00000000 08007806 250001a1 55a128d3
[ 3134.447032] nvme nvme0: MEMREG for CQE 0xffff91458a81a650 failed with status memory management operation error (6)
[ 3134.460612] nvme nvme0: reconnecting in 10 seconds
[ 3144.733988] nvme nvme0: Successfully reconnected

Full dmesg output from both systems is attached (it has a few annotations in it about which workloads were running at the time of the errors - please just ignore those).
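For anyone wanting to count these events in the attached logs, a minimal sketch of a tally script follows. The message substrings are taken from the excerpts above; the function name, the pattern table, and the assumption that every relevant line carries a `[ seconds.micros]` timestamp prefix are illustrative. Lines without that prefix (such as the manual annotations) are simply skipped.

```python
import re

# Sample lines copied from the initiator log excerpt above.
SAMPLE = """\
[ 3134.447002] mlx5_0:dump_cqe:262:(pid 0): dump error cqe
[ 3134.447006] 00000000 00000000 00000000 00000000
[ 3134.447032] nvme nvme0: MEMREG for CQE 0xffff91458a81a650 failed with status memory management operation error (6)
[ 3134.460612] nvme nvme0: reconnecting in 10 seconds
[ 3144.733988] nvme nvme0: Successfully reconnected
"""

# Matches a dmesg line of the form "[ 1234.567890] message".
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Event name -> substring that identifies it in the log message.
PATTERNS = {
    "error_cqe": "dump error cqe",
    "memreg_failed": "MEMREG for CQE",
    "reconnecting": "reconnecting in",
    "reconnected": "Successfully reconnected",
}


def tally_events(text):
    """Count known events and collect their timestamps (in seconds)."""
    counts = {name: 0 for name in PATTERNS}
    stamps = {name: [] for name in PATTERNS}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # annotation or other line without a timestamp
        for name, needle in PATTERNS.items():
            if needle in match.group("msg"):
                counts[name] += 1
                stamps[name].append(float(match.group("ts")))
    return counts, stamps


counts, stamps = tally_events(SAMPLE)
outage = stamps["reconnected"][0] - stamps["error_cqe"][0]
print(counts)
print(f"time from error CQE to reconnect: {outage:.3f} s")
```

To run it against a saved log, read the file into a string and pass it to `tally_events` in place of `SAMPLE`.
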

With Max's patch we have so far not produced any errors!  We will continue testing it.  We are also still working to assess the performance impact of Max's patch on the 50Gb configuration.  Since we get the errors without the patch (which then cause the initiator to disconnect and reconnect and thus affect performance) we cannot just run our automated test with and without the patch and compare the two results.  We will do some targeted testing to see if we can capture some unpatched runs that don't have errors and use those to assess the performance impact of Max's patch on the same workloads.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch-test-01-50g-patch1-i03-dmesg.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20170316/6ef575ae/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch-test-01-50g-patch1-t01-dmesg.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20170316/6ef575ae/attachment-0003.txt>