Thread: 60 messages, 6 authors, 2015-10-28

Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource

From: Tejun Heo <tj@kernel.org>
Date: 2015-09-10 20:22:21
Also in: linux-rdma, lkml

Hello, Parav.

On Thu, Sep 10, 2015 at 11:16:49PM +0530, Parav Pandit wrote:
> > > These resources include QP (queue pair) to transfer data, CQ
> > > (completion queue) to indicate completion of a data transfer operation,
> > > and MR (memory region) to represent user application memory as source or
> > > destination for data transfer.
> > > Common resources are QP, SRQ (shared receive queue), CQ, MR, AH
> > > (address handle), FLOW, PD (protection domain), user context, etc.
> >
> > It's kinda bothering that all these are disparate resources.
>
> Actually not. They are linked resources. Every QP needs one or two
> associated CQs and one PD.
> Every QP will use a few MRs for data transfer.

So, if that's the case, let's please implement something higher level.
The goal is providing reasonable isolation or protection.  If that can
be achieved at a higher level of abstraction, please do that.

> Here is a good programming guide to the RDMA APIs exposed to
> user space applications.
>
> http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf
> So the first version of the cgroups patch will address the control
> operations in section 3.4.
>
> > I suppose that each restriction comes from the underlying hardware and
> > there's no accepted higher level abstraction for these things?
>
> There is a higher level abstraction, currently through the verbs layer,
> which does actually expose the hardware resources but in a
> vendor-agnostic way.
> Many vendors support this verbs layer; some that I know of are
> Mellanox, Intel, Chelsio, and Avago/Emulex, whose drivers supporting
> these verbs are in the <drivers/infiniband/hw/> kernel tree.
>
> There are higher level APIs above the verbs layer, such as MPI,
> libfabric, rsocket, rds, pgas, and dapl, which use the underlying verbs layer.
> They all rely on the hardware resources. All of these higher level
> abstractions are accepted and well used by certain application classes. It
> would be a long discussion to go over them here.

Well, the programming interface that userland builds on top doesn't
matter too much here but if there is a common resource abstraction
which can be made in terms of constructs that consumers of the
facility would care about, that likely is a better choice than
exposing whatever hardware exposes.

> > I'm doubtful that these things are gonna be mainstream w/o building up
> > higher level abstractions on top and if we ever get there we won't be
> > talking about MR or CQ or whatever.
>
> Some of the higher level examples I gave above will adapt to resource
> allocation failure. Some actually adapt to a few resource
> allocation failures; they do query resources. But it's not completely
> there yet. Once we have this notion of limited resources in place,
> the abstraction layers would adapt to relatively smaller values of such
> resources.
>
> These higher level abstractions are mainstream. They are shipped at least in
> Red Hat Enterprise Linux.

Again, I was talking more about resource abstraction - e.g. something
along the line of "I want N command buffers".
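
[Editorial illustration, not part of the original message or the patch set.] A resource abstraction of the kind described here ("I want N command buffers") could be modelled as a single abstract counter per group, charged hierarchically, rather than as separate knobs for QP, CQ, MR and so on. The sketch below is only a single-threaded user-space model under that assumption; `struct res_pool`, `res_try_charge()` and `res_uncharge()` are hypothetical names and do not correspond to any kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical abstract resource pool: one counter per group, linked to
 * its parent so that limits apply hierarchically. */
struct res_pool {
    struct res_pool *parent;  /* NULL for the root group */
    unsigned long limit;      /* maximum units this group may hold */
    unsigned long usage;      /* units currently charged */
};

/* Try to charge n abstract units to pool and all of its ancestors.
 * Returns true on success; on failure nothing is charged anywhere. */
static bool res_try_charge(struct res_pool *pool, unsigned long n)
{
    struct res_pool *p;

    /* First pass: check that every level still has room for n units. */
    for (p = pool; p != NULL; p = p->parent)
        if (p->usage > p->limit || n > p->limit - p->usage)
            return false;

    /* Second pass: commit the charge at every level. */
    for (p = pool; p != NULL; p = p->parent)
        p->usage += n;
    return true;
}

/* Return n previously charged units to pool and all of its ancestors. */
static void res_uncharge(struct res_pool *pool, unsigned long n)
{
    struct res_pool *p;

    for (p = pool; p != NULL; p = p->parent) {
        assert(p->usage >= n);  /* uncharging more than charged is a bug */
        p->usage -= n;
    }
}
```

With a root pool limited to 10 units and a child limited to 4, charging 3 units to the child succeeds and a further 2 is refused, regardless of which low-level objects a driver would use to back those units.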

> > Also, whatever next-gen is
> > unlikely to have enough commonalities when the proposed resource knobs
> > are this low level,
>
> I agree that resources won't be common across other next-gen transports
> whenever they arrive.
> But from my existing background working on some of those transports,
> they appear similar in nature and might need similar knobs.

I don't know.  What's proposed in this thread seems way too low level
to be useful anywhere else.  Also, what if there are multiple devices?
Is that a problem to worry about?

> In the past I have had discussions with Liran Liss from Mellanox on
> this topic as well, and we agreed to have such a cgroup controller.
> He gave a recent presentation at a Linux Foundation event proposing
> a cgroup for RDMA.
> Below is the link to it.
> http://events.linuxfoundation.org/sites/events/files/slides/containing_rdma_final.pdf
> Slides 1 to 7 and slide 13 will give you more insight into it.
> Liran and I gave a similar presentation, with fewer slides, to an RDMA
> audience at the RDMA OpenFabrics summit in March 2015.
>
> I am OK with creating a separate cgroup for rdma, if the community thinks that way.
> My preference would still be to use the device cgroup for the above extensions
> unless there are fundamental issues that I am missing.

The thing is that they aren't related at all in any way.  There's no
reason to tie them together.  In fact, the way we did devcg is
backward.  The ideal solution would have been extending the usual ACL
to understand cgroups so that it's a natural growth of the permission
system.

You're talking about actual hardware resources.  That has nothing to
do with access permissions on device nodes.

> I would let you make the call.
> RDMA and similar are just another type of device with different
> characteristics than character or block, so one device cgroup with sub
> functionalities can allow setting knobs.
> Every device category will have its own set of knobs for resources,
> ACL, limits, policy.

I'm kinda doubtful we're gonna have too many of these.  Hardware
details being exposed to userland this directly isn't common.

> And I think cgroup is certainly a better control point than sysfs or
> spinning up new control infrastructure for this.
> That said, I would like to hear your and the community's views on how
> you would like to see this shaping up.

I'd say keep it simple and do the minimum. :)

Thanks.

-- 
tejun