Re: [PATCH 04/12] bloom: clear each bloom_key after use
From: Andrzej Hunt <hidden>
Date: 2021-04-25 13:17:45
On 11/04/2021 09:26, SZEDER Gábor wrote:
> On Fri, Apr 09, 2021 at 06:47:23PM +0000, Andrzej Hunt via GitGitGadget wrote:
>> From: Andrzej Hunt <redacted>
>>
>> fill_bloom_key() allocates memory into bloom_key, we need to clean that up
>> once the key is no longer needed.
>>
>> This fixes the following leak which was found while running t0002-t0099.
>> Although this leak is happening in code being called from a test-helper,
>> the same code is also used in various locations around git, and could
>> presumably happen during normal usage too.
>
> It does indeed happen: 'git commit-graph write --reachable --changed-paths'
> generates Bloom filters for every commit, with each filter containing all
> paths modified by its associated commit, so it leaks a lot of 7 * 4byte
> hashes. This patch reduces the memory usage of that command:
>
>                        Max RSS
>                    before       after
>   ---------------------------------------------
>   android-base     1275028k    1006576k  -21.1%
>   chromium         3245144k    3127764k   -3.6%
>   cmssw             793996k     699156k  -12.0%
>   cpython           371584k     343480k   -7.6%
>   elasticsearch     748104k     637936k  -14.7%
>   freebsd-src       819020k     741272k   -9.5%
>   gcc               867412k     730332k  -15.8%
>   gecko-dev        2619112k    2457280k   -6.2%
>   git               252684k     216900k  -14.2%
>   glibc             239000k     222228k   -7.0%
>   go                264132k     251344k   -4.9%
>   homebrew-cask     542188k     480588k  -11.4%
>   homebrew-core     805332k     715848k  -11.1%
>   jdk               417832k     342928k  -17.9%
>   libreoff-core    1257296k    1089980k  -13.3%
>   linux            2033296k    1759712k  -13.5%
>   llvm-project     1067216k     956704k  -10.4%
>   mariadb-srv       695172k     559508k  -19.5%
>   postgres          340132k     317416k   -6.7%
>   rails             325432k     294332k   -9.6%
>   rust              655244k     584904k  -10.7%
>   tensorflow        507308k     480848k   -5.2%
>   webkit           2466812k    2237332k   -9.3%
>
> Just out of curiosity, I disabled the questionable hardcoded 512 paths
> limit on the size of modified path Bloom filters, and the memory usage in
> the jdk repository sunk by over 55%, from 849520k to 379760k.
>
> Please feel free to include any of the above data points in the commit
> message.

Thank you for the detailed analysis - these kinds of results are very motivating! I will include a brief summary (something like "10% typical improvement for 'commit-graph write' for large repos") along with a link to your posting for those who want the full picture.
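
For anyone following the thread without the patch at hand, the fill-then-clear pattern under discussion can be sketched roughly as below. This is only a minimal, self-contained illustration and not git's actual code: `demo_key`, `demo_fill_key()`, `demo_clear_key()`, `demo_process_paths()` and the toy hash are hypothetical stand-ins, and the only detail taken from the discussion above is that each key owns 7 hashes of 4 bytes each.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NUM_HASHES 7 /* the "7 * 4byte hashes" mentioned in the thread */

/* Hypothetical stand-in for a Bloom key; not git's actual struct. */
struct demo_key {
	uint32_t *hashes;
};

/* Toy FNV-1a style hash, present only to keep the sketch self-contained. */
uint32_t demo_hash(const char *data, size_t len, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;
	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Allocates key->hashes; the caller owns that memory until it calls
 * demo_clear_key(). Returns 0 on success, -1 if allocation fails.
 */
int demo_fill_key(const char *data, size_t len, struct demo_key *key)
{
	key->hashes = calloc(DEMO_NUM_HASHES, sizeof(*key->hashes));
	if (!key->hashes)
		return -1;
	for (uint32_t i = 0; i < DEMO_NUM_HASHES; i++)
		key->hashes[i] = demo_hash(data, len, i);
	return 0;
}

/* Releases the hashes and resets the pointer so a stale key is obvious. */
void demo_clear_key(struct demo_key *key)
{
	free(key->hashes);
	key->hashes = NULL;
}

/* Fills and clears one key per path; returns the number of keys handled. */
int demo_process_paths(const char **paths, size_t nr)
{
	int processed = 0;
	for (size_t i = 0; i < nr; i++) {
		struct demo_key key = { NULL };
		if (demo_fill_key(paths[i], strlen(paths[i]), &key) < 0)
			return -1;
		/* ... the key would be used for a filter insert or lookup here ... */
		demo_clear_key(&key); /* omit this and 28 bytes leak per path */
		processed++;
	}
	return processed;
}
```

In this sketch a missing `demo_clear_key()` call costs 28 bytes per path, which is small per key but adds up when a key is created for every modified path of every commit.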