Re: [PATCH 0/24] kernel: add a netlink interface to get information about processes (v2)
From: Andrey Vagin <hidden>
Date: 2015-07-08 22:49:14
Also in: lkml
2015-07-08 20:39 GMT+03:00 Andy Lutomirski [off-list ref]:
> On Wed, Jul 8, 2015 at 9:10 AM, Andrew Vagin [off-list ref] wrote:
>> On Tue, Jul 07, 2015 at 08:56:37AM -0700, Andy Lutomirski wrote:
>>> On Tue, Jul 7, 2015 at 8:43 AM, Andrew Vagin [off-list ref] wrote:
>>>> On Mon, Jul 06, 2015 at 10:10:32AM -0700, Andy Lutomirski wrote:
>>>>> On Mon, Jul 6, 2015 at 1:47 AM, Andrey Vagin [off-list ref] wrote:
>>>>>> Currently we use the proc file system, where all information is presented in text files, which is convenient for humans. But if we need to get information about processes from code (e.g. in C), the procfs doesn't look so cool.
>>>>>>
>>>>>> From code we would prefer to get information in binary format and to be able to specify which information is required and for which tasks. Here is a new interface with all these features, which is called task_diag. In addition it's much faster than procfs.
>>>>>>
>>>>>> task_diag is based on netlink sockets and looks like socket-diag, which is used to get information about sockets.
>>>>>
>>>>> I think I like this in principle, but I can see a few potential problems with using netlink for this:
>>>>>
>>>>> 1. Netlink very naturally handles net namespaces, but it doesn't naturally handle any other kind of namespace. In fact, the taskstats code that you're building on has highly broken user and pid namespace support. (Look for some obviously useless init_user_ns and init_pid_ns references. But that's only the obvious problem. That code calls current_user_ns() and task_active_pid_ns(current) from .doit, which is, in turn, called from sys_write, and looking at current's security state from sys_write is a big no-no.) You could partially fix it by looking at f_cred's namespaces, but that would be a change of what it means to create a netlink socket, and I'm not sure that's a good idea.
>>>>
>>>> If I don't miss something, all problems around pidns and userns are related to multicast functionality. task_diag is using a request/response scheme and doesn't send multicast packets.
>>>
>>> It has nothing to do with multicast. task_diag needs to know what pidns and userns to use for a request, but netlink isn't set up to give you any reasonable way to do that. A netlink socket is fundamentally tied to a *net* ns (it's a socket, after all). But you can send it requests using write(2), and calling current_user_ns() from write(2) is bad.
>>>
>>> There's a long history of bugs and vulnerabilities related to thinking that current_cred() and similar are acceptable things to use in write(2) implementations.
>>
>> As far as I understand, socket_diag doesn't have this problem, because each socket has a link to the namespace where it was created.
>>
>> What if we pin the current pidns and credentials to a task_diag socket at the moment it is created?
>
> That's certainly doable. OTOH, if anything does:
>
> socket(AF_NETLINK, ...);
> unshare(CLONE_PID);
> fork();
>
> then they now have a (minor) security problem.

What do you mean? Isn't it the same as when we open a file and then change
uid and gid? Permissions are checked only in the "open" syscall.
[root@avagin-fc19-cr ~]# ls -l xxx
-rw-r--r-- 1 root root 5 Jul 9 01:42 xxx
open("xxx", O_WRONLY|O_APPEND) = 3
setgid(1000) = 0
setuid(1000) = 0
write(3, "a", 1) = 1
close(1) = 0
--Andy
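
The strace output above relies on root privileges to call setuid() and setgid(). A rough, unprivileged variation of the same point can be sketched in C: instead of changing the caller's credentials, it revokes every permission bit on the file after opening it and then writes through the descriptor obtained earlier. The helper name write_after_revoke() and the scratch path under /tmp are hypothetical and not part of the thread, and the behaviour shown assumes a local Linux filesystem (network filesystems may re-check access on the server).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): open a scratch file, drop all
 * of its permission bits, then write through the already-open descriptor.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_after_revoke(void)
{
    char path[] = "/tmp/open_check_XXXXXX";   /* assumed scratch location */
    struct stat st;
    ssize_t n = -1;

    int fd = mkstemp(path);                   /* access is checked here, at open time */
    if (fd < 0)
        return -1;

    if (fchmod(fd, 0) == 0 && fstat(fd, &st) == 0) {
        assert((st.st_mode & 0777) == 0);     /* the file is now mode 0000 */
        n = write(fd, "a", 1);                /* no new permission check on write */
    }

    close(fd);
    unlink(path);
    return n;
}
```

Calling write_after_revoke() from a small main() is expected to return 1 on a typical local filesystem, since the write goes through the access rights captured when the descriptor was created. This is the file analogue of pinning state to a socket at creation time, and it is also why a descriptor inherited across unshare() and fork() keeps whatever was pinned to it.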