Re: [PATCH v10 1/5] kasan: support backing vmalloc space with real shadow memory
From: Daniel Axtens <hidden>
Date: 2019-10-31 09:36:55
Also in:
linux-mm, lkml
Uladzislau Rezki [off-list ref] writes:
> Hello, Daniel
>
> > @@ -1294,14 +1299,19 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >  	spin_lock(&free_vmap_area_lock);
> >  	llist_for_each_entry_safe(va, n_va, valist, purge_list) {
> >  		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
> > +		unsigned long orig_start = va->va_start;
> > +		unsigned long orig_end = va->va_end;
> >
> >  		/*
> >  		 * Finally insert or merge lazily-freed area. It is
> >  		 * detached and there is no need to "unlink" it from
> >  		 * anything.
> >  		 */
> > -		merge_or_add_vmap_area(va,
> > -			&free_vmap_area_root, &free_vmap_area_list);
> > +		va = merge_or_add_vmap_area(va, &free_vmap_area_root,
> > +					    &free_vmap_area_list);
> > +
> > +		kasan_release_vmalloc(orig_start, orig_end,
> > +				      va->va_start, va->va_end);
>
> I have some questions here. I have not analyzed the
> kasan_release_vmalloc() logic in detail, so sorry if I missed something.
>
> __purge_vmap_area_lazy() deals with a big address space, so it does not
> only free vmalloc addresses here. Basically it can be any address, from
> 1 to ULONG_MAX, whereas vmalloc space spans VMALLOC_START - VMALLOC_END:
>
> 1) Should it be checked that only a vmalloc address is freed, or do you
> handle it somewhere else?
>
> if (is_vmalloc_addr(va->va_start))
>     kasan_release_vmalloc(...)
So in kasan_release_vmalloc we only free the shadow covering orig_start
to orig_end, and possibly 1 page to either side. So it will never
attempt to free an enormous area. And it will also do nothing if called
for a region where there is no shadow backing installed.

Having said that, there should be a test on orig_start, and I've added
that in v11 - good catch.
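To make the "possibly 1 page to either side" point concrete, here is a rough user-space sketch of the rounding idea. It is not the kernel code: the names (sketch_release_range, sketch_is_vmalloc_addr), the 4 KiB page size and the 1:8 shadow scale are all assumptions chosen for illustration. One shadow page is taken to cover 32 KiB of address space, and a partially covered 32 KiB granule at either edge is only released if the merged free region spans that whole granule.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only: 4 KiB pages, 1 shadow byte per 8 bytes. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_SCALE     8ULL
/* Address space covered by one page of shadow (32 KiB here). */
#define SKETCH_GRANULE   (SKETCH_PAGE_SIZE * SKETCH_SCALE)

struct sketch_range {
	uint64_t start;
	uint64_t end;	/* exclusive; start == end means "nothing to release" */
};

static uint64_t sketch_align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) / a * a;
}

static uint64_t sketch_align_down(uint64_t x, uint64_t a)
{
	return x / a * a;
}

/* Hypothetical guard in the spirit of is_vmalloc_addr(): bounds are passed in. */
static int sketch_is_vmalloc_addr(uint64_t addr, uint64_t vm_start, uint64_t vm_end)
{
	return addr >= vm_start && addr < vm_end;
}

/*
 * Given the area just freed, [start, end), and the merged free region that
 * now contains it, [free_start, free_end), return the address range whose
 * shadow pages can be dropped.
 */
static struct sketch_range sketch_release_range(uint64_t start, uint64_t end,
						uint64_t free_start, uint64_t free_end)
{
	struct sketch_range r;

	assert(free_start <= start && start <= end && end <= free_end);

	/* Shrink to whole granules lying entirely inside [start, end). */
	r.start = sketch_align_up(start, SKETCH_GRANULE);
	r.end = sketch_align_down(end, SKETCH_GRANULE);

	/* Left edge: take the partial granule only if all of it is free. */
	if (start != r.start &&
	    sketch_align_up(free_start, SKETCH_GRANULE) < r.start)
		r.start -= SKETCH_GRANULE;

	/* Right edge: same reasoning. */
	if (end != r.end &&
	    sketch_align_down(free_end, SKETCH_GRANULE) > r.end)
		r.end += SKETCH_GRANULE;

	if (r.end < r.start)
		r.end = r.start;	/* empty range */

	return r;
}
```

With these assumed constants, freeing [0x9000, 0x17000) on its own releases nothing, because both edge granules may still be shared with live neighbours. If the merged free region is [0x0, 0x20000), the range grows to [0x8000, 0x18000), i.e. one granule of address space (one shadow page) on each side.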
> 2) Have you run any benchmarking just to see how much overhead it adds?
> I am asking because it probably makes sense to add those figures to the
> backlog (commit message).
>
> For example you can run:
>
> <snip>
> sudo ./test_vmalloc.sh performance
> and
> sudo ./test_vmalloc.sh sequential_test_order=1
> <snip>
I have now done that:
Testing with test_vmalloc.sh on an x86 VM with 2 vCPUs shows that:

 - Turning on KASAN, inline instrumentation, without this feature, introduces
   a 4.1x-4.2x slowdown in vmalloc operations.

 - Turning this on introduces the following slowdowns over KASAN:
     * ~1.76x slower single-threaded (test_vmalloc.sh performance)
     * ~2.18x slower when both cpus are performing operations
       simultaneously (test_vmalloc.sh sequential_test_order=1)

This is unfortunate, but given that this is a debug feature only, it is
not the end of the world.
The full figures are:
Performance

                                No KASAN  KASAN original  x baseline  KASAN vmalloc  x baseline  x KASAN

fix_size_alloc_test              1697913        14229459        8.38       22981983       13.54     1.62
full_fit_alloc_test              1841601        15152633        8.23       17902922        9.72     1.18
long_busy_list_alloc_test       17874082        58856758        3.29      103925371        5.81     1.77
random_size_alloc_test           9356047        29544085        3.16       57871338        6.19     1.96
fix_align_alloc_test             3188968        19821620        6.22       37979436       11.91     1.92
random_size_align_alloc_te       3033507        17584339        5.80       32588942       10.74     1.85
align_shift_alloc_test               325            1154        3.55           7263       22.35     6.29
pcpu_alloc_test                   231952          278181        1.20         318977        1.38     1.15
Total Cycles                235852824254    985040965542        4.18  1733258779416        7.35     1.76

Sequential, 2 cpus

                                No KASAN  KASAN original  x baseline  KASAN vmalloc  x baseline  x KASAN

fix_size_alloc_test              2505806        17989253        7.18       39651038       15.82     2.20
full_fit_alloc_test              3579676        18829862        5.26       21142645        5.91     1.12
long_busy_list_alloc_test       21594983        74766736        3.46      140701363        6.52     1.88
random_size_alloc_test          10884695        34282077        3.15       91945108        8.45     2.68
fix_align_alloc_test             4133226        26304745        6.36       76163270       18.43     2.90
random_size_align_alloc_te       4261175        22927883        5.38       55236058       12.96     2.41
align_shift_alloc_test               948            4827        5.09           4144        4.37     0.86
pcpu_alloc_test                   371789          307654        0.83         374412        1.01     1.22
Total Cycles                 99965417402    412710461642        4.13   897968646378        8.98     2.18

fix_size_alloc_test              2502718        17921542        7.16       39893515       15.94     2.23
full_fit_alloc_test              3547996        18675007        5.26       21330495        6.01     1.14
long_busy_list_alloc_test       21522579        74610739        3.47      139822907        6.50     1.87
random_size_alloc_test          10881507        34317349        3.15       91110531        8.37     2.65
fix_align_alloc_test             4119755        26180887        6.35       75818927       18.40     2.90
random_size_align_alloc_te       4297708        23058344        5.37       55969004       13.02     2.43
align_shift_alloc_test               956            5574        5.83           4591        4.80     0.82
pcpu_alloc_test                   306340          347014        1.13         571289        1.86     1.65
Total Cycles                 99642832084    412084074628        4.14   896497227762        9.00     2.18
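
For anyone wanting to re-derive the ratio columns, the arithmetic is simply cycles divided by the chosen reference. A small sketch follows; the helper name sketch_slowdown is made up for this example, and the inputs are the single-threaded "Total Cycles" figures from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many times slower 'cycles' is than 'reference'. */
static double sketch_slowdown(uint64_t cycles, uint64_t reference)
{
	assert(reference != 0);
	return (double)cycles / (double)reference;
}

/* "Total Cycles" row of the single-threaded (performance) run. */
static const uint64_t sketch_no_kasan      = 235852824254ULL;
static const uint64_t sketch_kasan_orig    = 985040965542ULL;
static const uint64_t sketch_kasan_vmalloc = 1733258779416ULL;
```

sketch_slowdown(sketch_kasan_orig, sketch_no_kasan) gives roughly 4.18 ("x baseline" for plain KASAN), sketch_slowdown(sketch_kasan_vmalloc, sketch_no_kasan) roughly 7.35, and sketch_slowdown(sketch_kasan_vmalloc, sketch_kasan_orig) roughly 1.76 ("x KASAN"). The last one is the headline single-threaded number quoted earlier.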
Regards,
Daniel
> Thanks!
>
> --
> Vlad Rezki