[GIT PULL] Update LZO compression
From: Markus F.X.J. Oberhumer <hidden>
Date: 2012-08-16 06:27:56
Also in:
linux-btrfs, lkml
On 2012-08-15 16:45, Johannes Stezenbach wrote:
> On Wed, Aug 15, 2012 at 02:02:43PM +0200, Markus F.X.J. Oberhumer wrote:
>> On 2012-08-14 14:39, Johannes Stezenbach wrote:
>>> On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
>>>> On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
>>>>> As stated in the README this version is significantly faster
>>>>> (typically more than 2 times faster!) than the current version,
>>>>> has been thoroughly tested on x86_64/i386/powerpc platforms and
>>>>> is intended to get included into the official Linux 3.6 or 3.7
>>>>> release. I encourage all compression users to test and benchmark
>>>>> this new version, and I also would ask some official LZO
>>>>> maintainer to convert the updated source files into a GIT commit
>>>>> and possibly push it to Linus or linux-next.
>>>
>>> Sorry for not reporting earlier, but I didn't have time to do real
>>> benchmarks, just a quick test on ARM926EJ-S using barebox, and found
>>> in the new version decompression is slower:
>>> http://lists.infradead.org/pipermail/barebox/2012-July/008268.html
>>
>> I can only guess, but maybe your ARM cpu does not have an efficient
>> implementation of {get,put}_unaligned().
>
> Yes, ARMv5 cannot do unaligned access. ARMv6+ could, but I think the
> Linux kernel normally traps it for debug, all ARM seem to use generic
> {get,put}_unaligned() implementation which use byte access and shift.

Hmm - I could imagine that we're wasting a lot of possible speed gain by not exploiting that feature on ARMv6+.

>> Could you please try the following patch and test if you can see any
>> significant speed difference?
>
> It isn't. I made the attached quick hack userspace code using ARM
> kernel headers and barebox unlzop code. (new == your new code,
> old == linux-3.5 git, test == new + your suggested change)
> (sorry I had no time to clean it up)

My suggested COPY4 replacement probably has a lot of load stalls - maybe some ARM expert could have a look and suggest a more efficient implementation.

In any case, I still would like to see the new code in linux-next because of the huge improvements on other modern CPUs.

Cheers,
Markus

> I compressed a Linux Image with lzop (lzop <arch/arm/boot/Image >lzoimage)
> and timed uncompression:
>
> # time ./unlzopold <lzoimage >/dev/null
> real    0m 0.29s
> user    0m 0.19s
> sys     0m 0.10s
> # time ./unlzopold <lzoimage >/dev/null
> real    0m 0.29s
> user    0m 0.20s
> sys     0m 0.09s
>
> # time ./unlzopnew <lzoimage >/dev/null
> real    0m 0.41s
> user    0m 0.30s
> sys     0m 0.10s
> # time ./unlzopnew <lzoimage >/dev/null
> real    0m 0.40s
> user    0m 0.30s
> sys     0m 0.10s
> # time ./unlzopnew <lzoimage >/dev/null
> real    0m 0.40s
> user    0m 0.29s
> sys     0m 0.11s
>
> # time ./unlzoptest <lzoimage >/dev/null
> real    0m 0.39s
> user    0m 0.28s
> sys     0m 0.11s
> # time ./unlzoptest <lzoimage >/dev/null
> real    0m 0.39s
> user    0m 0.27s
> sys     0m 0.11s
> # time ./unlzoptest <lzoimage >/dev/null
> real    0m 0.39s
> user    0m 0.27s
> sys     0m 0.11s
>
> FWIW I also checked the sha1sum to confirm the Image uncompressed OK.
>
> Johannes

-- Markus Oberhumer, [off-list ref], http://www.oberhumer.com/