Re: [RFC 0/9] Nginx refcount scalability issue with Apparmor enabled and potential solutions
From: Neeraj Upadhyay <hidden>
Date: 2024-05-30 04:20:14
Also in:
lkml
Hi John, Thanks for taking a look at the series! On 5/29/2024 6:07 AM, John Johansen wrote:
On 5/28/24 06:29, Mateusz Guzik wrote:quoted
On Fri, May 24, 2024 at 11:52 PM John Johansen [off-list ref] wrote:quoted
On 5/24/24 14:10, Mateusz Guzik wrote:quoted
On Fri, Mar 8, 2024 at 9:09 PM John Johansen [off-list ref] wrote:quoted
On 3/2/24 02:23, Mateusz Guzik wrote:quoted
On 2/9/24, John Johansen [off-list ref] wrote:quoted
On 2/6/24 20:40, Neeraj Upadhyay wrote:quoted
Gentle ping. John, Could you please confirm that: a. The AppArmor refcount usage described in the RFC is correct? b. Approach taken to fix the scalability issue is valid/correct?Hi Neeraj, I know your patchset has been waiting on review for a long time. Unfortunately I have been very, very busy lately. I will try to get to it this weekend, but I can't promise that I will be able to get the review fully done.Gentle prod. Any chances of this getting reviewed in the foreseeable future? Would be a real bummer if the patchset fell through the cracks.yes, sorry I have been unavailable for the last couple of weeks. I am now back, I have a rather large backlog to try catching up on but this is has an entry on the list.So where do we stand here?sorry I am still trying to dig out of my backlog, I will look at this, this weekend.How was the weekend? ;)lets say it was busy. Have I looked at this, yes. I am still digesting it. I don't have objections to moving towards percpu refcounts, but the overhead of a percpu stuct per label is a problem when we have thousands of labels on the system. That is to say, this would have to be a config option. We moved buffers from kmalloc to percpu to reduce memory overhead to reduce contention. The to percpu, to a global pool because the percpu overhead was too high for some machines, and then from a global pool to a hybrid scheme because of global lock contention. I don't see a way of doing that with the label, which means a config would be the next best thing.
For the buffers, what was the percpu overhead roughly? For thousands of labels, I think, the extra memory overhead roughly would be in the range of few MBs (need to be profiled though). This extra label overhead would be considered high for the machines where percpu buffer overhead was considered high? Please correct me here, so you are proposing that we use a kconfig to use either 'struct percpu_ref' or a 'struct kref' (using a union maybe) inside the 'struct aa_label' and update the refcount operations accordingly? If yes, I will work on a patch with this kconfig based selection of refcounting mode to see how it pans out. @Mateusz can you share the dynamic switching counter mode patch series please? In addition, for long term, there is an ongoing work (by Paul, Boqun and myself) on implementing hazard pointers as a scalable refcounting scheme [1] in kernel, which would not have memory usage overhead as in percpu refcount. At this point the API design/implementation is in early prototype stage. [1] https://docs.google.com/document/d/113WFjGlAW4m72xNbZWHUSE-yU2HIJnWpiXp91ShtgeE/edit?usp=sharing
Not part of your patch but something to be considered is that the label tree needs a rework, its locking needs to move to read side a read side lock less scheme, and the plan was to make it also use a linked list such that new labels are always queued at the end, allowing dynamically created labels to be lazily added to the tree.
Read side would be rcu read lock protected in this scheme? The linked list would store the dynamically created compound labels? What is the advantage of using this lazy addition to the tree? We optimize on the label search, addition/deletion for dynamic labels? The lazy addition to the tree is done when a label find operation on the list succeeds?
I see the use of the kworker as problematic as well, especially if we are talking using kconfig to switch reference counting modes. I am futzing with some ideas, on how to deal with this.
We can disable queuing of label reclaim work for non-percpu case?
Like I said I am still digesting.
Thank you! Thanks Neeraj