Re: [PATCH v4] pidns: introduce syscall translate_pid
From: prakash sangappa <hidden>
Date: 2017-10-17 22:53:07
Also in:
lkml
On Sat, 14 Oct 2017 11:17:47 +0300 Konstantin Khlebnikov wrote:

pid_t translate_pid(pid_t pid, int source, int target);

This syscall converts a pid from the source pid-ns into a pid in the target pid-ns. If the pid is unreachable from the target pid-ns, it returns zero.

Pid namespaces are referred to by file descriptors opened on the proc files /proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children. A negative argument refers to the current pid namespace, the same as the file /proc/self/ns/pid.

The kernel exposes virtual pids in /proc/[pid]/status:NSpid, but backward translation requires scanning all tasks. Pids can also be translated by sending them through a unix socket between namespaces, but this method is slow and insecure because the other side is exposed inside the pid namespace.

Andrew asked why we might need this.

Such conversion is required for interaction between processes across pid namespaces. For example, to identify a process in a container by its pid file when looking from outside.

Two years ago I solved this in a project of mine with monstrous code that forks a couple of times just to convert a pid; luckily for me, performance wasn't important.

On 10/16/2017 02:36 PM, Andrew Morton wrote:

That's a single user who needed this a single time, and found a userspace-based solution anyway. This is not exactly compelling!

Is there a stronger case to be made? How does this change benefit our users? Sell it to us!

On 10/16/2017 03:07 PM, Nagarathnam Muthusamy wrote:

Oracle database is planning to use pid namespaces for sandboxing database instances and they need an API similar to translate_pid to effectively translate process IDs from other pid namespaces. Prakash (cced in mail) can provide more details on this use case.

On Mon, Oct 16, 2017 at 3:54 PM, prakash.sangappa wrote:

As Nagarathnam indicated, Oracle Database will be using pid namespaces and needs a direct method of converting pids of processes in the pid namespace hierarchy. In this use case multiple nested PID namespaces will be used. The currently available mechanisms are not very efficient for this use case. For example, as Konstantin described, using /proc/<pid>/status would require the application to scan the status files of all pids to determine the pid of a given process in a child namespace.

Using an SCM_CREDENTIALS socket message is another way, which would require every process starting inside a pid namespace to send this message, and the receiving process in the target namespace would have to save the converted pid and reference it. This mechanism becomes cumbersome, especially if the application has to deal with multiple nested pid namespaces. Also, the Database needs to be able to convert a thread's global pid (gettid()). Passing the thread's pid (gettid()) in an SCM_CREDENTIALS message requires CAP_SYS_ADMIN, which is an issue.

So having a direct method, like the API that Konstantin is proposing, will work best for the Database, since the pid of a process in any of the nested pid namespaces can be converted as and when required. I think with the proposed API, the application should be able to convert the pid of a process or the tid (gettid()) of a thread as well.

On 10/16/17 5:52 PM, Andy Lutomirski wrote:

Can you explain what Oracle's database is planning to do with this information?

On Tue, Oct 17, 2017 at 8:38 AM, Prakash Sangappa wrote:

The database uses the PID to programmatically find out whether a process/thread is alive (kill 0), to send signals to processes requesting that they dump status/debug information, and to kill the processes in case of a shutdown abort of the instance.

On 10/17/2017 3:02 PM, Andy Lutomirski wrote:

What I'm wondering is: how does the caller of kill() end up controlling a task whose pid it doesn't know in its own namespace?

On Tue, Oct 17, 2017 at 3:35 PM, prakash sangappa wrote:

I was describing in general how the DB would use the PID of a process. The description above applies when no namespaces are used. With namespaces, the DB would convert the PID of a process inside one of its child namespaces to a PID in its own namespace and use that pid to issue kill().

On 10/17/2017 3:40 PM, Andy Lutomirski wrote:

Seems vaguely sensible. If I were designing this type of system, I'd have a manager process in each namespace running as PID 1, though -- PID 1 is special and needs to understand what's going on anyway. Then PID 1 would do the kill() calls and wouldn't need translate_pid().
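For readers unfamiliar with the "kill 0" liveness check mentioned above, here is a minimal sketch in C. The helper name pid_is_alive() is made up for illustration and is not taken from any code in this thread. The pid passed in is interpreted in the caller's own pid namespace, which is why a pid learned from another namespace has to be translated first.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe a pid with the null signal.
 * kill(pid, 0) runs the normal existence and permission checks
 * but does not deliver a signal.
 *
 * Returns 1 if the pid exists, 0 if it does not, -1 on any other error.
 * The pid is resolved in the caller's pid namespace.
 */
static int pid_is_alive(pid_t pid)
{
    if (pid <= 0) {
        /* pid <= 0 addresses process groups in kill(2); refuse it here */
        errno = EINVAL;
        return -1;
    }
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;   /* the task exists, we just may not signal it */
    if (errno == ESRCH)
        return 0;   /* no such task in our namespace */
    return -1;
}
```

Calling pid_is_alive(getpid()) returns 1. Note that a result of 1 only says that some task currently holds that pid; pids can be reused, so this is a hint rather than a guarantee that it is the same process.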
Yes, this has been tried out in the prototype use of PID namespaces in the DB. It works, but it is slow, because the manager has to exchange messages with the controlling processes, which are in the parent namespace. With the proposed API, the DB could convert the pid directly instead.
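For comparison, the /proc/[pid]/status:NSpid mechanism mentioned earlier in the thread can be sketched roughly as below. This is only an illustration of why backward translation needs a scan, not code from the DB or from the patch. All function names and MAX_DEPTH are made up for this sketch. It assumes a procfs that shows the NSpid line, and that the caller already knows the target namespace's link text as returned by readlink() on /proc/[pid]/ns/pid (for example "pid:[4026531836]"). It only walks the top-level /proc entries, so it finds processes, not individual threads.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_DEPTH 32   /* upper bound on nesting chosen for this sketch */

/* Parse "NSpid:\t<outermost>\t...\t<innermost>"; returns the number of
 * ids stored in out[], or 0 if this is not an NSpid line. */
static int parse_nspid_line(const char *line, pid_t *out, int max)
{
    if (strncmp(line, "NSpid:", 6) != 0)
        return 0;
    const char *p = line + 6;
    int n = 0;
    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);   /* skips tabs and spaces */
        if (end == p)
            break;                      /* no more numbers */
        out[n++] = (pid_t)v;
        p = end;
    }
    return n;
}

/* Forward translation: read the NSpid ids of a pid visible in our /proc.
 * Returns the number of ids, or -1 on failure. */
static int read_nspid(pid_t pid, pid_t *out, int max)
{
    char path[64], line[256];
    int n = 0;
    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        n = parse_nspid_line(line, out, max);
        if (n > 0)
            break;
    }
    fclose(f);
    return n > 0 ? n : -1;
}

/* Backward translation: scan every process in /proc for one whose
 * innermost id is `inner` and whose pid namespace link equals target_ns.
 * Returns our-namespace pid, 0 if not found, -1 if /proc is unreadable. */
static pid_t find_outer_pid(pid_t inner, const char *target_ns)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    struct dirent *e;
    pid_t found = 0;
    while ((e = readdir(d)) != NULL) {
        char *end;
        long cand = strtol(e->d_name, &end, 10);
        if (end == e->d_name || *end != '\0' || cand <= 0)
            continue;                   /* not a pid directory */
        pid_t ids[MAX_DEPTH];
        int n = read_nspid((pid_t)cand, ids, MAX_DEPTH);
        if (n <= 0 || ids[n - 1] != inner)
            continue;
        /* Sibling namespaces can reuse the same inner pid, so also
         * compare the namespace identity. */
        char path[64], link[64];
        snprintf(path, sizeof(path), "/proc/%ld/ns/pid", cand);
        ssize_t len = readlink(path, link, sizeof(link) - 1);
        if (len < 0)
            continue;                   /* no permission, or task is gone */
        link[len] = '\0';
        if (strcmp(link, target_ns) == 0) {
            found = (pid_t)cand;
            break;
        }
    }
    closedir(d);
    return found;
}
```

The cost is one status read per process on the system for every lookup, and the result can race with process exit and pid reuse, which is the inefficiency described in the quoted messages.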