Thread: 8 messages, 3 authors, 2011-02-28

Thread scheduling in 2.6 kernels

From: Mandeep Sandhu <hidden>
Date: 2011-02-28 11:38:53

> What is the preemptive level you have set for your kernel,
As mentioned in my first mail, I have set the following options:

- "Preemption Model" option as  - Preemptible Kernel (Low-Latency Desktop).

This, I think, means that even the kernel can be preempted (involuntarily)

- "Preempt The Big Kernel Lock"
> Check that one, and find out from the third party who provided the
> scheduler what the algorithm is and how it modifies the nice values.
The scheduler being used here is Con Kolivas' "Staircase Deadline"
scheduler. It uses a priority matrix, where each process is placed at
its "static prio" position in the matrix. Here's a short description of the
SD design (taken from the patch file):

+Design description
+==================
+
+SD works off the principle of providing each task a quota of runtime that it is
+allowed to run at a number of priority levels determined by its static priority
+(ie. its nice level). If the task uses up its quota it has its priority
+decremented to the next level determined by a priority matrix. Once every
+runtime quota has been consumed of every priority level, a task is queued on the
+"expired" array. When no other tasks exist with quota, the expired array is
+activated and fresh quotas are handed out. This is all done in O(1).
> If the thread scheduling policy was set to SCHED_OTHER then the third
> party scheduler is being used. If you set the thread sched policy to
I'm not sure what you mean here. SCHED_OTHER is the default sched
policy used for normal processes (unless explicitly changed). I think
that irrespective of which sched policy is set, there's only 1
scheduler available for use, i.e. in my case, the SD scheduler. CMIIW.
> SCHED_FIFO for both the decoder and rendering threads and set the rendering
> thread to a higher priority, it will do for you. The other decoder thread
> can be in a busy loop. Why not create a notifier for the decoder thread,
> so that it will wake up only when data is available?
Well, I tried something similar and that seemed to work fairly well!

I set the scheduling policy of the decoder thread to "SCHED_BATCH".
Now I'm observing that the main render/GUI thread completes its
animation and then the decoder gets a chance to run (batch mode
processing).
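
For reference, switching a thread to SCHED_BATCH can be sketched roughly as below. This is a minimal illustration, not the code used in this application: the helper name make_calling_thread_batch is hypothetical, and it assumes a Linux C library that exposes SCHED_BATCH and that the call is made from inside the decoder thread itself (on Linux, pid 0 refers to the calling thread).

```cpp
#include <sched.h>   // sched_setscheduler, sched_getscheduler, SCHED_BATCH (Linux-specific)
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: move the calling thread to the SCHED_BATCH policy.
// Returns 0 on success and -1 on failure (after printing the reason).
int make_calling_thread_batch() {
    sched_param param{};
    param.sched_priority = 0;  // must be 0 for SCHED_OTHER and SCHED_BATCH

    // pid 0 means "the caller"; on Linux that is the calling thread.
    if (sched_setscheduler(0, SCHED_BATCH, &param) != 0) {
        std::fprintf(stderr, "sched_setscheduler: %s\n", std::strerror(errno));
        return -1;
    }
    return 0;
}
```

In a Qt application the natural place for such a call would presumably be the start of the decoder thread's run() function, so that only that thread is affected.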

We're not busy-looping. Rather we're making the decoder thread wait on
a job-queue. It'll sleep as long as the job-queue is empty.
> Also, you need to tune your thread nr time and policies based on bit
> rate of data you are rendering. If you can run in interims of bit rate
> time both the threads, rendering and decoding, that creates a smooth
> picture. That's the catch.
Don't quite follow you here...what is "nr time"? I don't quite
understand what the significance of "bit rate" is for static images.
Also note that these images (JPEG) are quite small in dimensions (~
200 x 150). The memory bandwidth available from the main memory (DRAM)
to the video-rendering subsystem is quite high (~2.6Gbps), so that
won't be a bottleneck.

For me the trick to solving this issue was to NOT do decoding while
the animation was going on. Even a single decode op used to make the
animation suffer as it had fairly strict timing requirements (not hard
real-time, but close). So forcing the decoder thread to sort-of
"pause" decoding while an animation is in progress helped.
> Are you using multi core to do the job or single core?
Single core. The processor has multi-threading support but that
support is disabled in the kernel config. Since this was something set
by the vendor, I'm not changing it.

Thanks,
-mandeep
> --Sri.

On Thu, Feb 24, 2011 at 8:47 AM, Mandeep Sandhu
[off-list ref] wrote:
quoted
quoted
Quite long questions you have below...but I'll try to summarize and answer....
I did try to be as concise as possible! :)
quoted
Btw, your problem description is great....I believe it helps (at least
/me) to get a sense what you gonna do, what you've done and how it
really works. A nice example for every one of us....
Thanks
quoted
quoted
We're working in an MIPS based embedded system, running a fairly old
OK, I take a bold note here. I have only worked with 32-bit x86, so
what I am going to say might be completely wrong when it is brought to
the MIPS realm.
No probs...even I'm no expert in MIPS (rather my first time with MIPS
as well!:))

The only thing that I found which _might_ be pertinent to our
discussion was that the multi-threading option for MIPS was disabled
("MIPS MT options (Disable multithreading support.)"). Since this is
a vendor-provided config option I have not changed it. So no processor
MT support for apps.
quoted
quoted
Linux 2.6.22 kernel (with vendor provided BSP). We write UI
I remember vaguely that CFS (Completely Fair Scheduler) was improved
somewhere after the 2.6.22 version...I can't recall exactly what
the changes were...
The vendor provided linux kernel has the "Staircase Deadline"
scheduler patched into it...so no CFS here...
quoted
In fact, the latest "200 lines famous patch" also affects how scheduling works...
Yeah I read about it (though I couldn't grasp how the thing actually
works)...I have the user-space variant of this soln running on my
ubuntu box :)
quoted
Why not shift the network I/O to the decoder thread? Or IMHO,
better...another separate thread? So they could
overlap...between CPU computation and I/O.
We have tested running the app with just the decoding bit disabled in
the decoder thread. The animation is pretty smooth...though that's also
because there's not much to do w/o the images! :)

QT handles n/w i/o pretty well, in a non-blocking, async
manner...though I'm not sure if it is internally using separate
threads for doing so...will have to find out.
quoted
one is lowest, latter is highest? hmmmm if we put that back to pre CFS
era, that could mean a very different time slice assignment...or in
simpler word...kinda bad idea. I think if it's using nice value, it's
better if the difference is around 5 or 10 by maximum.
The idea of assigning 2 extreme pri's was to ensure that the decode
thread never interferes with the main thread while animation is
going on. It's almost like the main thread needs "real-time" priority
while it's doing animation...and goes back to normal priority when
idle! :)

I think SD sched uses nice values...I'm also not certain whether the
QT wrappers are assigning "nice" values when one tries to set priority
to a thread...will have to check and get back.
quoted
wait, so the decoder just "eats" the content of the buffer without being
signaled before? in other words, it just works all the time?
I'm not sure I follow your question here.

The main thread _copies_ raw data rx'ed from the n/w and adds it to a
"job queue" of the decoder thread...a fxn in the decoder thread simply
checks if there are any jobs in the queue...if there is...it accesses
the data (which was copied earlier when adding the job) and decodes
the image...

This is where we had the 2 types of implementations...i.e. in one...this
job queue is checked continuously like:

while(true) {
    if (job-queue is NOT empty) {
        // do decode
    }
}

And in the second implementation:

while(true) {
    if (job-queue is NOT empty) {
        // do decode
    } else {
        // wait for main thread to signal us when a new job is available
    }
}

The "waiting" (in 2nd implementation) is done via thread
synchronization primitives available in QT
(http://doc.qt.nokia.com/4.6/qwaitcondition.html)
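
The second implementation can be sketched as below. This is only an illustration under stated assumptions: it uses std::mutex and std::condition_variable from the C++ standard library instead of the Qt QMutex/QWaitCondition primitives mentioned above, and the names Job, JobQueue and run_decoder are hypothetical. The "decode" step is a placeholder that merely records the size of each job.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical job type: a copy of the raw bytes received from the network.
using Job = std::string;

class JobQueue {
public:
    // Main thread: hand over a copied job and wake the decoder.
    void push(Job job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cond_.notify_one();
    }

    // Main thread: signal that no more jobs will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cond_.notify_all();
    }

    // Decoder thread: sleeps while the queue is empty and still open.
    // Returns false once the queue is closed and fully drained.
    bool pop(Job& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
        if (jobs_.empty()) return false;
        out = std::move(jobs_.front());
        jobs_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<Job> jobs_;
    bool closed_ = false;
};

// Decoder loop: blocks in pop() instead of spinning on an empty queue.
std::vector<std::size_t> run_decoder(JobQueue& queue) {
    std::vector<std::size_t> decoded_sizes;
    Job job;
    while (queue.pop(job)) {
        decoded_sizes.push_back(job.size());  // placeholder for the real JPEG decode
    }
    return decoded_sizes;
}
```

The main thread would call push() for each image received and close() at shutdown, while the decoder thread runs run_decoder(). Because the wait uses a predicate, a wakeup that arrives before the decoder starts waiting is not lost.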
quoted

I think this is the problem and that's why I proposed to isolate the
network I/O into separate thread. It's like ping pong, main thread
push new data, decoder thread wait...it is then woken
up..decoding...main thread waits....

Technically it is called priority inversion..if I got it correctly
about your situation.
Hmmm...n/w io doesn't seem to be affecting animation perf of main
thread (as pointed above)...it's just that when the decoder thread has
a job to do..I need it to be preempted by the main thread so it can
complete its animation w/o the other thread taking away precious CPU
cycles...

I'm going to try "renice"-ing the decoder thread to a higher value
and see if it changes the behaviour in the 2nd implementation (where
we don't busy-loop)...
quoted
Fixed? I don't think so. CFS is kinda using "delta" i.e if current
task runs for x and other which is waiting is y, then for the next
round, others deserve some kind of weighted x-y.
SD sched, I think, assigns a fixed quota of runtime (= timeslice?) and
if the process uses up this quota...its priority is reduced to the
next level....
quoted
quoted
- How can I find out if the kernel supports NPTL (kernel managed
threads) or plain old linux threads (user-space managed threads)?
I think this trick might work: Check /proc/<pid>/maps or use pmap.
NPTL ones usually maps libtls in its process address space
pmap's not available! :(

And I couldn't see libtls mapped in this process's addr space (is it
really libtls? why would we have TLS library for NPTL?...isn't libtls
used for SSL communications?)
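
One other way to approach the NPTL question, assuming the system uses glibc, is to ask the C library which threading implementation it was built with (the command-line equivalent is "getconf GNU_LIBPTHREAD_VERSION"). The sketch below is an illustration only: the function name pthread_implementation is hypothetical, and on other C libraries (uClibc, for instance) the constant may be missing or the query may return nothing.

```cpp
#include <unistd.h>  // confstr, _CS_GNU_LIBPTHREAD_VERSION (glibc extension)
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the threading implementation reported by the
// C library (on glibc typically a string starting with "NPTL" or
// "linuxthreads"), or an empty string if the library does not report one.
std::string pthread_implementation() {
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    // First call with a null buffer to learn the required size (0 on error).
    std::size_t len = confstr(_CS_GNU_LIBPTHREAD_VERSION, nullptr, 0);
    if (len == 0) return "";
    std::vector<char> buf(len);
    confstr(_CS_GNU_LIBPTHREAD_VERSION, buf.data(), buf.size());
    return std::string(buf.data());
#else
    return "";  // constant not provided by this C library
#endif
}
```

Printing the returned string from a small test program built with the target toolchain would show which implementation the application is linked against.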
quoted
so, no coreutils/util-linux/util-linux-ng?
coreutils is there.....but most commands are stripped down/lightweight
versions of the originals! :)
quoted
quoted
Any other way to get more thread related info about a running application?
everything under /proc/<pid>? have you checked that?
This helped a little!

I can see the threads spawned by the main thread under
"/proc/<pid>/task". This dir lists pid's of all the threads started by
the parent proc...and contents of individual dir (pids) is same as
"/proc/<pid>"...

Here I could find out my decoder thread's ID...but again contents of
that dir does not show info like priority/nice value etc...
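
According to proc(5), the single-line "stat" file inside each task directory should carry this information: field 18 is the kernel's priority value and field 19 is the nice value. The sketch below parses such a line. It is a minimal illustration with hypothetical names (ThreadSchedInfo, parse_stat_line, read_thread_sched_info), and it assumes the vendor kernel keeps the standard field layout.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct ThreadSchedInfo {
    long priority = 0;  // field 18 of the stat line
    long nice = 0;      // field 19 of the stat line
    bool ok = false;    // true only if both fields were found
};

// Parse one stat line. The command name (field 2) is wrapped in parentheses
// and may contain spaces, so the remaining fields are split after the last ')'.
ThreadSchedInfo parse_stat_line(const std::string& line) {
    ThreadSchedInfo info;
    std::size_t close = line.rfind(')');
    if (close == std::string::npos) return info;

    std::istringstream rest(line.substr(close + 1));
    std::vector<std::string> fields;  // fields[0] is field 3 (state)
    std::string token;
    while (rest >> token) fields.push_back(token);
    if (fields.size() < 17) return info;

    info.priority = std::stol(fields[15]);  // field 18
    info.nice = std::stol(fields[16]);      // field 19
    info.ok = true;
    return info;
}

// Read /proc/<pid>/task/<tid>/stat for one thread of a process.
ThreadSchedInfo read_thread_sched_info(const std::string& pid, const std::string& tid) {
    std::ifstream stat("/proc/" + pid + "/task/" + tid + "/stat");
    std::string line;
    if (!std::getline(stat, line)) return ThreadSchedInfo{};
    return parse_stat_line(line);
}
```

Calling read_thread_sched_info with the application's pid and the decoder thread's ID from "/proc/<pid>/task" would then report that thread's values, which makes it possible to check whether a priority change made through the Qt wrappers actually reaches the kernel.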

Thanks again for your inputs. I'll keep posting my findings
here...till I get a satisfactory soln to this issue.

Regards,
-mandeep
quoted
--
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies at kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


--
Regards,
Sri.