Thread (60 messages), 6 authors, 2015-10-28

Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource.

From: Parav Pandit <hidden>
Date: 2015-09-08 14:13:11
Also in: linux-rdma, lkml

On Tue, Sep 8, 2015 at 7:20 PM, Haggai Eran [off-list ref] wrote:
On 08/09/2015 13:18, Parav Pandit wrote:
+ * RDMA resource limits are hierarchical, so the highest configured limit of
+ * the hierarchy is enforced. Allowing resource limit configuration to default
+ * cgroup allows fair share to kernel space ULPs as well.
In what way is the highest configured limit of the hierarchy enforced? I
would expect all the limits along the hierarchy to be enforced.
In a hierarchy of, say, 3 cgroups, the smallest limit in the hierarchy is applied.

Let's take an example to clarify.
Say cg_A, cg_B, cg_C:
Role            Name                     Limit
Parent          cg_A                     100
Child_level1    cg_B (child of cg_A)     20
Child_level2    cg_C (child of cg_B)     50

If the process allocating an rdma resource belongs to cg_C, the lowest
limit in the hierarchy is applied during the charge() stage.
If the cg_A limit happens to be 10, then since 10 is the lowest, its limit
would be applicable, as you expected.
Looking at the code, the usage in every level is charged. This is what I
would expect. I just think the comment is a bit misleading.
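To make the "charged at every level" behaviour concrete, here is a minimal userspace sketch. It is not the code from the patch: struct cg_node, try_charge() and uncharge() are hypothetical names, and locking is left out. Every level from the task's cgroup up to the root is charged, so the smallest limit on the path is the one that ends up rejecting the allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a cgroup node; this is NOT the
 * kernel's struct dev_cgroup, only an illustration of the idea. */
struct cg_node {
	struct cg_node *parent;	/* NULL for the root of the hierarchy */
	int limit;		/* configured limit at this level */
	int usage;		/* resources currently charged here */
};

/* Charge num resources to cg and to every ancestor of cg.
 * All-or-nothing: if any level would exceed its limit, the levels
 * already charged are rolled back. Returns 0 on success, -1 on failure. */
static int try_charge(struct cg_node *cg, int num)
{
	struct cg_node *p, *q;

	for (p = cg; p != NULL; p = p->parent) {
		if (p->usage + num > p->limit)
			goto rollback;
		p->usage += num;
	}
	return 0;

rollback:
	/* Undo the charge on the levels below the one that refused. */
	for (q = cg; q != p; q = q->parent)
		q->usage -= num;
	return -1;
}

/* Uncharge num resources from cg and from every ancestor. */
static void uncharge(struct cg_node *cg, int num)
{
	struct cg_node *p;

	for (p = cg; p != NULL; p = p->parent) {
		assert(p->usage >= num);
		p->usage -= num;
	}
}
```

With the cg_A/cg_B/cg_C limits from the example (100, 20, 50), a task in cg_C can charge 20 resources in total; the next charge is refused at cg_B even though cg_C itself still has room, and cg_C's usage is rolled back.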
+int devcgroup_rdma_get_max_resource(struct seq_file *sf, void *v)
+{
+     struct dev_cgroup *dev_cg = css_to_devcgroup(seq_css(sf));
+     int type = seq_cft(sf)->private;
+     u32 usage;
+
+     if (dev_cg->rdma.tracker[type].limit == DEVCG_RDMA_MAX_RESOURCES) {
+             seq_printf(sf, "%s\n", DEVCG_RDMA_MAX_RESOURCE_STR);
I'm not sure hiding the actual number is good, especially in the
show_usage case.
This follows what other controllers do, such as the newly added PID
subsystem, when showing the max limit.
Okay.
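For reference, the convention being discussed can be sketched in plain userspace C as below. RES_MAX, RES_MAX_STR, format_limit() and parse_limit() are hypothetical stand-ins; the actual values of DEVCG_RDMA_MAX_RESOURCES and DEVCG_RDMA_MAX_RESOURCE_STR are not shown in this thread, so the sentinel and the "max" string here are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed sentinel and string; the patch's real values are not quoted here. */
#define RES_MAX		INT_MAX
#define RES_MAX_STR	"max"

/* Format a limit the way the show handler does: the sentinel is printed
 * as a string, anything else as a number. Returns snprintf()'s result. */
static int format_limit(char *buf, size_t len, int limit)
{
	assert(buf != NULL && len > 0);

	if (limit == RES_MAX)
		return snprintf(buf, len, "%s\n", RES_MAX_STR);
	return snprintf(buf, len, "%d\n", limit);
}

/* Parse the inverse: accept "max" or a non-negative decimal number below
 * the sentinel. Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_limit(const char *s, int *limit)
{
	char *end;
	long v;

	if (strcmp(s, RES_MAX_STR) == 0) {
		*limit = RES_MAX;
		return 0;
	}
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0' || v < 0 || v >= RES_MAX)
		return -1;
	*limit = (int)v;
	return 0;
}
```

Rejecting a numeric value equal to the sentinel keeps "max" unambiguous on read-back, which is one way to address the concern about hiding the actual number.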
+void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext,
+                                   enum devcgroup_rdma_rt type, int num)
+{
+     struct dev_cgroup *dev_cg, *p;
+     struct task_struct *ctx_task;
+
+     if (!num)
+             return;
+
+     /* get cgroup of ib_ucontext it belong to, to uncharge
+      * so that when its called from any worker tasks or any
+      * other tasks to which this resource doesn't belong to,
+      * it can be uncharged correctly.
+      */
+     if (ucontext)
+             ctx_task = get_pid_task(ucontext->tgid, PIDTYPE_PID);
+     else
+             ctx_task = current;
+     dev_cg = task_devcgroup(ctx_task);
+
+     spin_lock(&ctx_task->rdma_res_counter->lock);
Don't you need an rcu read lock and rcu_dereference to access
rdma_res_counter?
I believe it's not required, because uncharge() can happen only from
3 contexts:
(a) from the caller task context, which has made the allocation call, so
no synchronizing is needed.
(b) from the dealloc resource context; again, this is the same task
context that allocated it, so this is single threaded and there is no
need to synchronize.
I don't think it is true. You can access uverbs from multiple threads.
Yes, that's right. Though I designed the counter structure allocation on
a per-task basis for individual thread access, I totally missed ucontext
sharing among threads. I replied in the other thread that I would make the
counters atomic during charge and uncharge to cover that case.
Therefore I need the rcu lock and dereference as well.
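As a rough illustration of what atomic charge/uncharge could look like when several threads share one ucontext, here is a userspace sketch using C11 atomics. It is only an analogue: the kernel would use its own atomic_t helpers, and struct res_counter, counter_try_charge() and counter_uncharge() are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-resource counter shared by all threads of a process. */
struct res_counter {
	atomic_int usage;	/* resources currently charged */
	int limit;		/* assumed fixed while charges are in flight */
};

/* Charge num resources unless that would exceed the limit.
 * The compare-and-swap loop makes the check and the update one atomic
 * step, so concurrent callers cannot jointly overshoot the limit.
 * Assumes num is small and positive, so old + num cannot overflow. */
static bool counter_try_charge(struct res_counter *c, int num)
{
	int old = atomic_load(&c->usage);

	do {
		if (old + num > c->limit)
			return false;
		/* On failure, old is reloaded with the current usage. */
	} while (!atomic_compare_exchange_weak(&c->usage, &old, old + num));

	return true;
}

/* Return num resources to the counter. */
static void counter_uncharge(struct res_counter *c, int num)
{
	int old = atomic_fetch_sub(&c->usage, num);

	assert(old >= num);
	(void)old;	/* silence unused warning when NDEBUG is defined */
}
```

This only covers the counter updates. Safely reaching the counter through a pointer that another thread may replace is the separate rcu_dereference() question raised above.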
What may help your case here I think is the fact that only when the last
ucontext is released you can change the rdma_res_counter field, and
ucontext release takes the ib_uverbs_file->mutex.

Still, I think it would be best to use rcu_dereference(), if only for
documentation and sparse.
yes.
(c) from the fput() context, when the process is terminated abruptly or as
part of deferred cleanup; when this is happening there cannot be an
allocator task anyway.