Re: [PATCH v4 net-next] net: Implement fast csum_partial for x86_64
From: Alexander Duyck <hidden>
Date: 2016-02-27 08:30:08
+{
+ asm("lea 40f(, %[slen], 4), %%r11\n\t"
+ "clc\n\t"
+ "jmpq *%%r11\n\t"
+ "adcq 7*8(%[src]),%[res]\n\t"
+ "adcq 6*8(%[src]),%[res]\n\t"
+ "adcq 5*8(%[src]),%[res]\n\t"
+ "adcq 4*8(%[src]),%[res]\n\t"
+ "adcq 3*8(%[src]),%[res]\n\t"
+ "adcq 2*8(%[src]),%[res]\n\t"
+ "adcq 1*8(%[src]),%[res]\n\t"
+ "adcq 0*8(%[src]),%[res]\n\t"
+ "nop\n\t"
+ "40: adcq $0,%[res]"
+ : [res] "=r" (sum)
+ : [src] "r" (buff),
+ [slen] "r" (-((unsigned long)(len >> 3))), "[res]" (sum)
+ : "r11");

With this patch I cannot mix/match different length checksums without
things failing. In perf, the target of the jmpq in the block above seems
to be set to a fixed value, so perhaps it is something in how the
compiler is interpreting the inline assembler.

- Alex