Re: [RESEND RFC] translate_pid API
From: Nagarathnam Muthusamy <hidden>
Date: 2018-03-13 22:50:28
Also in: lkml
On 03/13/2018 03:00 PM, Jann Horn wrote:
> On Tue, Mar 13, 2018 at 2:44 PM, Nagarathnam Muthusamy [off-list ref] wrote:
>> On 03/13/2018 02:28 PM, Jann Horn wrote:
>>> On Tue, Mar 13, 2018 at 2:20 PM, Nagarathnam Muthusamy [off-list ref] wrote:
>>>> On 03/13/2018 01:47 PM, Jann Horn wrote:
>>>>> On Mon, Mar 12, 2018 at 10:18 AM, [off-list ref] wrote:
>>>>>> Resending the RFC with participants of previous discussions in
>>>>>> the list.
>>>>>>
>>>>>> Following patch which is a variation of a solution discussed in
>>>>>> https://lwn.net/Articles/736330/ provides the users of pid
>>>>>> namespace, the functionality of pid translation between
>>>>>> namespaces using a namespace identifier. The topic of pid
>>>>>> translation has been discussed in the community few times but
>>>>>> there has always been a resistance to adding new solution for
>>>>>> this problem. I will outline the planned usecase of pid
>>>>>> namespace by oracle database and explain why any of the existing
>>>>>> solution cannot be used to solve their problem.
>>>>>>
>>>>>> Consider a system in which several PID namespaces with multiple
>>>>>> nested levels exists in parallel with monitor processes managing
>>>>>> all the namespaces. PID translation is required for controlling
>>>>>> and accessing information about the processes by the monitors
>>>>>> and other processes down the hierarchy of namespaces.
>>>>>> Controlling primarily involves sending signals or using ptrace
>>>>>> by a process in parent namespace on any of the processes in its
>>>>>> child namespace. Accessing information deals with the reading
>>>>>> /proc/<pid>/* files of processes in child namespace. None of the
>>>>>> processes have root/CAP_SYS_ADMIN privileges.
>>>>>
>>>>> How are you dealing with PID reuse?
>>>>
>>>> We have a monitor process which keeps track of the aliveness of
>>>> important processes. When a process dies, monitor makes a note of
>>>> it and hence detects if pid is reused.
>>>
>>> How do you do that in a race-free manner?
>>
>> AFAIK, the monitor runs periodically to check the aliveness of the
>> processes and this period is too short for pids to recycle. I will
>> get back with more information on this if any other mechanisms are
>> in place.
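
To make the PID reuse question concrete: one common user-space heuristic is to record a process's start time (field 22 of /proc/<pid>/stat, see proc(5)) together with its PID and to compare both later. The sketch below is only an illustration under that assumption. It is not a description of what the monitor mentioned above does, and the function names parse_starttime() and read_starttime() are made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Extract field 22 (starttime, in clock ticks since boot) from the text
 * of /proc/<pid>/stat. Returns 0 on success, -1 on a malformed line.
 * The comm field (field 2) may itself contain spaces and parentheses,
 * so parsing resumes after the LAST ')' in the line.
 */
int parse_starttime(const char *stat_line, unsigned long long *out)
{
	const char *p = strrchr(stat_line, ')');
	int field;

	if (!p)
		return -1;
	p++;

	/* Skip fields 3..21, which are separated by spaces. */
	for (field = 3; field < 22; field++) {
		p += strspn(p, " ");
		if (*p == '\0')
			return -1;
		p += strcspn(p, " ");
	}
	p += strspn(p, " ");

	return sscanf(p, "%llu", out) == 1 ? 0 : -1;
}

/* Hypothetical helper: read the start time of a live process. */
int read_starttime(pid_t pid, unsigned long long *out)
{
	char path[64], line[1024];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* process gone or not visible */
	if (fgets(line, sizeof(line), f))
		ret = parse_starttime(line, out);
	fclose(f);
	return ret;
}
```

A different start time for the same PID suggests the PID was recycled. This only narrows the window: the process can still exit between the check and a later kill() or ptrace() call, so on its own it does not answer the race-free question.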
>>>>>> + */
>>>>>> +SYSCALL_DEFINE3(translate_pid, pid_t, pid, u64, source,
>>>>>> +		u64, target)
>>>>>> +{
>>>>>> +	struct pid_namespace *source_ns = NULL, *target_ns = NULL;
>>>>>> +	struct pid *struct_pid;
>>>>>> +	struct pid_namespace *ph;
>>>>>> +	struct hlist_bl_head *shead = NULL;
>>>>>> +	struct hlist_bl_head *thead = NULL;
>>>>>> +	struct hlist_bl_node *dup_node;
>>>>>> +	pid_t result;
>>>>>> +
>>>>>> +	if (!source) {
>>>>>> +		source_ns = &init_pid_ns;
>>>>>> +	} else {
>>>>>> +		shead = pid_ns_hash_head(pid_ns_hash, source);
>>>>>> +		hlist_bl_lock(shead);
>>>>>> +		hlist_bl_for_each_entry(ph, dup_node, shead, node) {
>>>>>> +			if (source == ph->ns.ns_id) {
>>>>>> +				source_ns = ph;
>>>>>> +				break;
>>>>>> +			}
>>>>>> +		}
>>>>>> +		if (!source_ns) {
>>>>>> +			hlist_bl_unlock(shead);
>>>>>> +			return -EINVAL;
>>>>>> +		}
>>>>>> +	}
>>>>>> +	if (!ptrace_may_access(source_ns->child_reaper,
>>>>>> +				PTRACE_MODE_READ_FSCREDS)) {
>>>>>
>>>>> AFAICS this proposal breaks the visibility restrictions that
>>>>> namespaces normally create. If there are two namespaces-based
>>>>> containers that use the same UID range, I don't think they should
>>>>> be able to learn information about each other, such as which PIDs
>>>>> are in use in the other container; but as far as I can tell, your
>>>>> proposal makes it possible to do that (unless an LSM or so is
>>>>> interfering). I would prefer it if this API required visibility
>>>>> of the targeted PID namespaces in the caller's PID namespace.
>>>>
>>>> I am trying to simulate the same access restrictions allowed on a
>>>> process's /proc/<pid>/ns/pid file. If the translator has access to
>>>> /proc/<pid>/ns/pid file of both source and destination namespaces,
>>>> shouldn't it be allowed to translate the pid between them?
>>>
>>> But the translator doesn't actually need to have access to those
>>> procfs files, right?
>>
>> I thought it should have access to those procfs files to satisfy the
>> visibility constraint that targeted PID namespaces should be visible
>> in caller's PID namespace and ptrace_may_access checks that
>> constraint.
>
> If there are two containers that use the same UID range,
> ptrace_may_access() checks from a process in one container on a
> process in another container can pass. Normally, you just can't even
> reach the ptrace_may_access() checks because you can't reference
> processes in another container in any way.

If there is no way to reference the process in another container, there is no way to get to the /proc/<pid>/ns/pidns_id file which exports the ID of that container, right? So a translator first has to guess the container ID and then try to translate. Even after translation, unless the translator has proper privileges, I believe it can't do anything with just the PID, right?
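
For comparison, the existing /proc/<pid>/ns/pid symlink already exposes a namespace identity to callers that pass the ptrace read check: its target has the form pid:[<inode>] (see namespaces(7)). A minimal sketch of reading it follows. The helper names parse_pidns_link() and read_pidns_inode() are hypothetical, and the inode number obtained this way is not claimed to be the same value as the ns_id used by the proposed syscall.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Parse a link target such as "pid:[4026531836]" into the inode number.
 * Returns 0 on success, -1 if the text does not have that exact shape.
 */
int parse_pidns_link(const char *target, unsigned long long *inode)
{
	char closing;

	if (sscanf(target, "pid:[%llu%c", inode, &closing) != 2)
		return -1;
	return closing == ']' ? 0 : -1;
}

/*
 * Read the PID namespace link of a process that is visible in the
 * caller's /proc. readlink() fails if the process cannot be referenced
 * or if the ptrace read access check is denied.
 */
int read_pidns_inode(pid_t pid, unsigned long long *inode)
{
	char path[64], target[64];
	ssize_t n;

	snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)pid);
	n = readlink(path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink() does not NUL-terminate */
	return parse_pidns_link(target, inode);
}
```

Note that this path only works for a process the caller can already name by PID in its own /proc, which is the visibility property under discussion here.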

> By the way, a related concern: The use of global identifiers will
> probably also negatively affect Checkpoint/Restore In Userspace?

Will look into this. Can you point me to the specifics of the use case which could be negatively affected?

Thanks,
Nagarathnam.