Thread (4 messages), 4 authors, 2018-08-24

Re: [PATCH] Performance Improvement in CRC16 Calculations.

From: Ard Biesheuvel <hidden>
Date: 2018-08-24 15:39:51
Also in: linux-block, linux-scsi, lkml

On 24 August 2018 at 16:32, Jeffrey Lien [off-list ref] wrote:
I rebuilt my 4.18 kernel with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as Martin recommended and got even better performance results than with the CRC Slice by 16 changes. Here's a summary of the results:

FIO Sequential Write, 64K Block Size, Queue Depth 64
PCLMUL = y Kernel:     bw = 2237 MiB/s
Slice by 16 CRC Calc:  bw = 1964 MiB/s
Base Kernel:           bw =  357 MiB/s

FIO Sequential Read, 64K Block Size, Queue Depth 64
PCLMUL = y Kernel:     bw = 3839 MiB/s
Slice by 16 CRC Calc:  bw = 2730 MiB/s
Base Kernel:           bw =  797 MiB/s

So it seems CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y provides the best performance. Are there any negative side effects to this config option? If not, does it make sense to recommend that all the major distros change their config options to have CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as the default?

I think the way the library version of crc_t10dif() invokes the crypto
API should be revised.

Would it be possible to allocate the crypto transform upon first use
instead of from an initcall? If crc_t10dif() is mostly called from
non-process context, that would not really work, but otherwise, we
could simply defer it (and occasional calls from non-process context
that do occur would use the generic code until the point where another
call from process context allocates the transform).
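
The deferred allocation described above could look roughly like the following. This is a userspace sketch in plain C11, not kernel code: crc_t10dif_lazy(), alloc_best_impl(), crc_t10dif_accel() and the may_sleep flag are hypothetical names standing in for the library entry point, the crypto transform allocation, an accelerated driver and the "are we in process context" check. Only the bitwise routine reflects the actual CRC-16/T10-DIF definition (polynomial 0x8BB7, MSB first, zero initial value).

```c
/* Userspace sketch only; every name except the polynomial is hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*crc_fn)(uint16_t crc, const unsigned char *buf, size_t len);

/* Generic fallback: bitwise CRC-16/T10-DIF, polynomial 0x8BB7, MSB first. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const unsigned char *buf,
                                   size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Stand-in for an accelerated driver; here it simply reuses the bitwise code. */
static uint16_t crc_t10dif_accel(uint16_t crc, const unsigned char *buf,
                                 size_t len)
{
    return crc_t10dif_bitwise(crc, buf, len);
}

static _Atomic(crc_fn) active_impl; /* NULL until the first allocation */
static atomic_int alloc_count;      /* counts allocations, for illustration */

/* Stand-in for a transform allocation that may sleep. */
static crc_fn alloc_best_impl(void)
{
    atomic_fetch_add(&alloc_count, 1);
    return crc_t10dif_accel;
}

uint16_t crc_t10dif_lazy(uint16_t crc, const unsigned char *buf, size_t len,
                         bool may_sleep)
{
    crc_fn fn = atomic_load(&active_impl);

    if (!fn) {
        /* Callers that cannot sleep use the generic code for now. */
        if (!may_sleep)
            return crc_t10dif_bitwise(crc, buf, len);

        /* First caller that may sleep allocates and publishes the result. */
        crc_fn expected = NULL;
        fn = alloc_best_impl();
        if (!atomic_compare_exchange_strong(&active_impl, &expected, fn))
            fn = expected; /* lost the race; real code would free its copy */
    }
    return fn(crc, buf, len);
}
```

With this shape, a call with may_sleep == false made before any allocation still returns the correct checksum (0xD0DB for the ASCII string "123456789") through the generic path, and the first call with may_sleep == true switches all later callers to the allocated implementation.
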
-----Original Message-----
From: Christoph Hellwig [mailto:hch@infradead.org]
Sent: Wednesday, August 22, 2018 1:20 AM
To: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jeffrey Lien <redacted>; linux-kernel@vger.kernel.org; linux-crypto@vger.kernel.org; linux-block@vger.kernel.org; linux-scsi@vger.kernel.org; herbert@gondor.apana.org.au; tim.c.chen@linux.intel.com; David Darrington <redacted>; Jeff Furlong <redacted>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
> When crc-t10dif is initialized, the crypto infrastructure will pick
> the algorithm with the highest priority currently registered. Both
> block and SCSI will cause crc-t10dif to be compiled as a built-in so
> this selection happens very early.
Ouch.  This might actually happen in a lot of other users of the crypto functionality as well.
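
To make the ordering problem concrete, here is a small userspace model in C of "pick the highest priority currently registered". It is a sketch under assumptions, not the crypto API: register_alg(), alloc_tfm() and struct alg are hypothetical, and the priority values 100 and 200 are purely illustrative.

```c
/* Userspace model only; register_alg() and alloc_tfm() are hypothetical. */
#include <stddef.h>
#include <string.h>

struct alg {
    const char *alg_name;    /* algorithm name, e.g. "crct10dif" */
    const char *driver_name; /* the specific implementation */
    int priority;            /* higher wins at allocation time */
};

#define MAX_ALGS 8
static struct alg registry[MAX_ALGS];
static size_t n_algs;

/* Add an implementation to the registry; returns -1 if it is full. */
int register_alg(const char *alg_name, const char *driver_name, int priority)
{
    if (n_algs == MAX_ALGS)
        return -1;
    registry[n_algs++] = (struct alg){ alg_name, driver_name, priority };
    return 0;
}

/*
 * Return the highest-priority implementation registered right now, or
 * NULL if there is none. A handle obtained earlier is never revisited,
 * so it keeps pointing at whatever was best at that moment.
 */
const struct alg *alloc_tfm(const char *alg_name)
{
    const struct alg *best = NULL;

    for (size_t i = 0; i < n_algs; i++) {
        if (strcmp(registry[i].alg_name, alg_name) != 0)
            continue;
        if (!best || registry[i].priority > best->priority)
            best = &registry[i];
    }
    return best;
}
```

In this model, a handle allocated while only a generic implementation is registered stays bound to it even after a higher-priority implementation is registered; only a later alloc_tfm() call sees the faster one. That is the behaviour a built-in user allocating from an early initcall runs into.
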
> However, it seems like a bit of a deficiency in crypto that there is
> no way to upgrade existing transformations if higher priority
> algorithms become available. btrfs and a few others work around this
> issue by not using the generic lib/ CRC functions (which defeats the
> purpose of having these in the first place). Instead they are
> registering their own transformation at a later time where any
> accelerator modules are more likely to be loaded.
If we can't fix this in crypto (which doesn't seem that easy), we should at least clearly document the issue somewhere, and fix this in the t10pi code by initializing crct10dif_tfm lazily, only once the first block device starts using it.