Thread: 78 messages, 10 authors, 2021-02-04

Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation

From: Michal Hocko <mhocko@suse.com>
Date: 2021-02-03 09:14:06
Also in: linux-api, linux-arch, linux-fsdevel, linux-kselftest, linux-mm, linux-riscv, lkml, nvdimm

On Tue 02-02-21 21:10:40, Mike Rapoport wrote:
On Tue, Feb 02, 2021 at 02:27:14PM +0100, Michal Hocko wrote:
quoted
On Tue 02-02-21 14:48:57, Mike Rapoport wrote:
quoted
On Tue, Feb 02, 2021 at 10:35:05AM +0100, Michal Hocko wrote:
quoted
On Mon 01-02-21 08:56:19, James Bottomley wrote:

I have also proposed potential ways out of this. Either the pool is not
fixed sized and you make it a regular unevictable memory (if direct map
fragmentation is not considered a major problem)
I think that the direct map fragmentation is not a major problem, and the
data we have confirms it, so I'd be more than happy to entirely drop the
pool, allocate memory page by page and remove each page from the direct
map. 

Still, we cannot prove a negative and it could happen that there is a
workload that would suffer a lot from the direct map fragmentation, so
having a pool of large pages upfront is better than trying to fix it
afterwards. As we get more confidence that the direct map fragmentation is
not the issue it is commonly believed to be, we may remove the pool altogether.
I would drop the pool altogether and instantiate pages to the
unevictable LRU list and internally treat it as ramdisk/mlock so you
will get the accounting right. The feature should still be opt-in
(e.g. a kernel command line parameter) for now. The recent report by
Intel (http://lkml.kernel.org/r/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com)
shows that there is no clear win to have huge mappings in _general_ but
there are still workloads which benefit.
 
quoted
I think that using PMD_ORDER allocations for the pool with a fallback to
order 0 will do the job, but unfortunately I doubt we'll reach a consensus
about this because dogmatic beliefs are hard to shake...
If this is opt-in then those beliefs can be relaxed somehow. Long term
it makes a lot of sense to optimize for a better direct map management
but I do not think this is a hard requirement for an initial
implementation if it is not imposed on everybody by default.
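For readers following the PMD_ORDER discussion, here is a minimal sketch of the "try a PMD-size block, fall back to order 0" idea. It is not kernel code. The shift values assume x86-64 with 4 KiB base pages and 2 MiB PMD mappings, and `refill_order` and its `try_alloc` callback are hypothetical names standing in for the page allocator.

```python
# Assumed x86-64 values; other architectures and page sizes differ.
PAGE_SHIFT = 12                      # 4 KiB base pages
PMD_SHIFT = 21                       # 2 MiB PMD mappings
PMD_ORDER = PMD_SHIFT - PAGE_SHIFT   # order of a PMD-size block


def refill_order(try_alloc):
    """Return the order obtained for a pool refill, or None on failure.

    try_alloc(order) is a hypothetical stand-in for the page allocator:
    it returns True if a free block of that order is available.
    """
    for order in (PMD_ORDER, 0):     # PMD-size first, then a single page
        if try_alloc(order):
            return order
    return None


if __name__ == "__main__":
    print("PMD_ORDER =", PMD_ORDER, "->", 1 << PMD_ORDER, "base pages")
    # A fragmented system that only has single pages left:
    print("fragmented:", refill_order(lambda order: order == 0))
    # A system with free high-order blocks:
    print("unfragmented:", refill_order(lambda order: True))
```

With these assumed values a PMD-size block covers 512 base pages, which is why draining high-order free blocks is a concern when the pool is filled without CMA.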
quoted
A more restrictive possibility is to still use plain PMD_ORDER allocations
to fill the pool, without relying on CMA. In this case there will be no
global secretmem specific pool to exhaust, but then it's possible to drain
high order free blocks in a system, so CMA has an advantage of limiting
secretmem pools to certain amount of memory with somewhat higher
probability for high order allocation to succeed. 
quoted
or you need a careful access control 
Do you mind elaborating on what you mean by "careful access control"?
As already mentioned, a mechanism to control who can use this feature -
e.g. make it a special device whose access you can control by
permissions or higher level security policies. But that is really needed
only if the pool is fixed sized.
  
Let me reiterate to make sure I don't misread your suggestion.

If we make secretmem an opt-in feature with, e.g., a kernel parameter, the
pooling of large pages is unnecessary. In this case there is no limited
resource we need to protect because secretmem will allocate page by page.
Yes.
Since there is no limited resource, we don't need special permissions
to access secretmem so we can move forward with a system call that creates
a mmapable file descriptor and save the hassle of a chardev.
Yes, I assume you implicitly assume mlock rlimit here. Also memcg
accounting should be in place. Wrt the specific syscall, please
document why existing interfaces are not a good fit as well. It would
also be great to describe the interaction with mlock itself (I assume the
two to be incompatible - mlock will fail on it and mlockall will ignore it).
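To make the mlock rlimit point concrete, below is a rough userspace sketch of the kind of check implied if secretmem pages are charged like mlocked memory. That charging model is the assumption stated above, not something this thread settles, and `fits_memlock_limit` is a hypothetical helper rather than a kernel or libc function. It only reads the caller's RLIMIT_MEMLOCK through Python's `resource` module and does not model memcg accounting or capability-based exemptions.

```python
import resource

PAGE_SIZE = resource.getpagesize()


def fits_memlock_limit(locked_bytes, request_bytes, limit=None):
    """Return True if charging request_bytes stays within the memlock limit.

    locked_bytes:  bytes already charged to the caller (caller-tracked).
    request_bytes: size of the new request; rounded up to whole pages.
    limit:         limit in bytes; defaults to the soft RLIMIT_MEMLOCK.
    """
    if limit is None:
        limit, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if limit == resource.RLIM_INFINITY:
        return True
    pages = -(-request_bytes // PAGE_SIZE)   # ceiling division
    return locked_bytes + pages * PAGE_SIZE <= limit


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("RLIMIT_MEMLOCK soft/hard:", soft, hard)
    print("one page fits:", fits_memlock_limit(0, PAGE_SIZE))
```

Because requests are rounded up to whole pages, even a one-byte request is charged a full page against the limit in this sketch.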
I cannot say I don't like this as it cuts roughly half of mm/secretmem.c :)

But I must say I am still a bit concerned that we have no provisions
here for dealing with the direct map fragmentation even with the set goal
to improve the direct map management in the long run...
Yes that is something that will be needed long term. I do not think this
is strictly necessary for the initial submission, though. The
implementation should be as simple as possible now and complexity added
on top.
-- 
Michal Hocko
SUSE Labs

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel