Re: RFC(v2): Audit Kernel Container IDs

From: Paul Moore <paul@paul-moore.com>
Date: 2017-10-19 15:51:13
Also in: cgroups, linux-fsdevel, lkml, netdev

On Thu, Oct 19, 2017 at 9:32 AM, Casey Schaufler [off-list ref] wrote:

On 10/18/2017 5:05 PM, Richard Guy Briggs wrote:

On 2017-10-17 01:10, Casey Schaufler wrote:

On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:

On 2017-10-12 16:33, Casey Schaufler wrote:

On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:

Containers are a userspace concept.  The kernel knows nothing of them.

The Linux audit system needs a way to be able to track the container
provenance of events and actions.  Audit needs the kernel's help to do
this.

Since the concept of a container is entirely a userspace concept, a
registration from the userspace container orchestration system initiates
this.  This will define a point in time and a set of resources
associated with a particular container with an audit container ID.

The registration is a write, via a pseudo filesystem (proc, since the PID
tree already exists), of a u8[16] UUID representing the container ID to a
file representing a process that will become the first process in a new
container.  This write might place restrictions on mount namespaces
required to define a container, or at least careful checking of
namespaces in the kernel to verify permissions of the orchestrator so it
can't change its own container ID.  A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.
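To make the 128-bit scalar point concrete, here is a minimal userspace C
sketch, assuming the u8[16] written through proc is simply split into two
64-bit halves. The struct and function names (audit_contid,
audit_contid_from_bytes, audit_contid_eq) are hypothetical and not part of
the RFC; the point is only that equality reduces to two integer compares
rather than a string comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical representation of an audit container ID: the u8[16] UUID
 * written by the orchestrator, stored as two 64-bit halves so equality is
 * two scalar compares. Names are illustrative, not from the RFC.
 */
struct audit_contid {
	uint64_t hi;
	uint64_t lo;
};

/* Build an ID from the raw 16 bytes written to the (hypothetical) proc file. */
static struct audit_contid audit_contid_from_bytes(const uint8_t raw[16])
{
	struct audit_contid id;

	assert(raw != NULL);
	/*
	 * memcpy avoids alignment and aliasing problems; byte order does not
	 * matter for equality as long as every ID is built the same way.
	 */
	memcpy(&id.hi, raw, sizeof(id.hi));
	memcpy(&id.lo, raw + 8, sizeof(id.lo));
	return id;
}

/* Equality is two integer comparisons; no parsing, no strcmp(). */
static bool audit_contid_eq(struct audit_contid a, struct audit_contid b)
{
	return a.hi == b.hi && a.lo == b.lo;
}
```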

Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration.

Hang on. If containers are a user space concept, how can
you want CAP_CONTAINER_ANYTHING? If there's no such thing as
a container, how can you be asking for a capability to manage
them?

There is such a thing, but the kernel doesn't know about it yet.

Then how can it be the kernel's place to control access to a
container resource, that is, the containerID?

Ok, let me try to address your objections.

The kernel can know enough to refuse to set it again if it is already
set, or to deny the action if the user doesn't have permission to set it.
How is this different from loginuid and sessionid?
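The set-once behavior described above can be sketched as follows. This is
a hypothetical userspace model, not kernel code: struct task_audit_state
and audit_set_contid are invented names, 'permitted' stands in for
whichever capability check is eventually chosen, and the -EPERM/-EEXIST
return values are assumptions, since the RFC does not specify error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-task audit state; not a real kernel structure. */
struct task_audit_state {
	uint8_t contid[16];  /* the u8[16] UUID written by the orchestrator */
	bool contid_set;     /* true once an ID has been assigned */
};

/*
 * Set-once semantics: refuse the write if the caller is not permitted,
 * and refuse it if an ID is already assigned. 'permitted' is a placeholder
 * for the eventual capability check; the error codes are assumptions.
 */
static int audit_set_contid(struct task_audit_state *t,
			    const uint8_t id[16], bool permitted)
{
	assert(t != NULL && id != NULL);

	if (!permitted)
		return -EPERM;   /* caller lacks permission */
	if (t->contid_set)
		return -EEXIST;  /* already set: never allow a second write */

	memcpy(t->contid, id, sizeof(t->contid));
	t->contid_set = true;
	return 0;
}
```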

This same situation exists for loginuid and sessionid, which are userspace
concepts that the kernel tracks for the convenience of userspace.

Ah, no. Loginuid identifies a user, which is a kernel concept in
that a user is defined by the uid.

This simple explanation doesn't help me.  What makes that a kernel
concept?  The fact that it is stored and compared in more than one
place?

The session ID has well-defined kernel semantics. You're trying to say
that the containerID is an opaque value that is meaningless to the
kernel, but you still want the kernel to protect it. How can the
kernel know if it is protecting it correctly?

How so?  A userspace process triggers this.  Does the kernel know what
these values mean?  Does it do anything with them other than report
them or allow audit to filter them?  It is given some instructions on
how to treat it.

This is what we're trying to do with the containerID.

As for its name, I'm not particularly picky, so if you don't like
CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
don't want to give the ability to set a containerID to any process that
is able to do audit logging (such as vsftpd) and similarly we don't want
to give the orchestrator the ability to control the setup of the audit
daemon.

Sorry, but what aspect of the kernel security policy is this
capability supposed to protect? That's what capabilities are
for, not the undefined support of undefined user-space behavior.

Similarly, loginuids and sessionIDs are only used for audit tracking and
filtering.

Tell me again why you're not reusing either of these?

Ah, granularity arguments, welcome back old friend :)

Once again, we're still trying to sort all this out so I reserve the
right to change my mind, but my current thinking is as follows ...
CAP_AUDIT_WRITE exists to control which applications can submit
userspace generated audit records to the kernel, CAP_AUDIT_CONTROL
exists to control which applications can manage the in-kernel audit
configuration (e.g. filter rules) and the current task's loginuid
value.  Reusing CAP_AUDIT_WRITE here would give any application that
can submit userspace audit records the ability to change the audit
container ID; this would be bad: we don't allow CAP_AUDIT_WRITE to
change the loginuid, and it would be even worse to allow it to change the
audit container ID.  Reusing CAP_AUDIT_CONTROL is less bad than
CAP_AUDIT_WRITE, but it gets sticky once we get to the part where we
want to run auditd instances in containers, complete with their own
queues, filtering rules, etc.  Perhaps we could use CAP_AUDIT_CONTROL
to guard the audit container ID value, but we would always want to do
that check in the init userns in order to prevent container bound
processes from manipulating their own audit container ID.

-- 
paul moore
www.paul-moore.com
