Re: [PATCH 1/8] drivers/random: Cache align ip_random better
From: George Spelvin <hidden>
Date: 2011-03-16 18:10:30
Also in:
lkml
> I'm intrigued: please educate me. On what architectures does cache-aligning a 48-byte buffer (previously offset by 4 bytes) speed up copying from it, and why? Does the copying involve 8-byte or 16-byte instructions that benefit from that alignment, rather than cacheline alignment?

I had two thoughts in my head when I wrote that:

1) A smart compiler could note the alignment and issue wider copy instructions. (Especially on alignment-required architectures.)
2) The cacheline fetch would get more data faster. The data would be transferred in the first 6 beats of the load from RAM (assuming a 64-bit data bus) rather than waiting for 7, so you'd finish the copy 1 ns sooner or so. Similar 1-cycle win on a 128-bit Ln->L(n-1) cache transfer.

As I said, "infinitesimal".

The main reason that I bothered to generate a patch was that it appealed to my sense of neatness to keep the 3x16-byte buffer 16-byte aligned.