Thread scheduling in 2.6 kernels
From: Mandeep Sandhu <hidden>
Date: 2011-02-28 11:38:53
What is the preemptive level you have set for your kernel,
As mentioned in my first mail, I have set the following options:
- "Preemption Model" option as "Preemptible Kernel (Low-Latency Desktop)". This, I think, means that even the kernel can be preempted (involuntarily).
- "Preempt The Big Kernel Lock"
Check that one, and find out from your third party who provided scheduler, the algorithm, and how it modifies the nice values.
The scheduler being used here is Con Kolivas' "Staircase Deadline" scheduler. It uses a priority matrix, where each process is placed at its "static prio" position in the matrix. Here's a short description of the SD design (taken from the patch file):

+Design description
+==================
+
+SD works off the principle of providing each task a quota of runtime that it is
+allowed to run at a number of priority levels determined by its static priority
+(ie. its nice level). If the task uses up its quota it has its priority
+decremented to the next level determined by a priority matrix. Once every
+runtime quota has been consumed of every priority level, a task is queued on the
+"expired" array. When no other tasks exist with quota, the expired array is
+activated and fresh quotas are handed out. This is all done in O(1).
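To make the quota idea in the description above concrete, here is a toy model in C. It is only a sketch of the description's wording, not the SD patch's code: the names (toy_task, LEVELS, QUOTA) and the numbers are made up, and it ignores the priority matrix by letting a task use every level from its static level downward.

```c
#include <stdbool.h>

#define LEVELS 4   /* hypothetical number of priority levels, 0 is highest */
#define QUOTA  3   /* hypothetical runtime quota per level, in ticks */

struct toy_task {
    int  level;       /* current priority level */
    int  quota_left;  /* ticks left at the current level */
    bool expired;     /* true once the quota of every level is used up */
};

/* Start (or restart) a task at its static level with a fresh quota. */
static void toy_task_init(struct toy_task *t, int static_level)
{
    t->level = static_level;
    t->quota_left = QUOTA;
    t->expired = false;
}

/* Charge one tick of runtime to the task. */
static void toy_task_tick(struct toy_task *t)
{
    if (t->expired)
        return;
    if (--t->quota_left > 0)
        return;
    /* Quota at this level is gone: drop one level, or move to "expired". */
    if (t->level + 1 < LEVELS) {
        t->level++;
        t->quota_left = QUOTA;
    } else {
        t->expired = true;
    }
}
```

In this model, "activating the expired array" would amount to calling toy_task_init() again for every expired task.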
If the thread scheduling policy was set to SCHED_OTHER then the third party scheduler is being used. If you set thread sched policy to
I'm not sure what you mean here. SCHED_OTHER is the default sched policy used for normal processes (unless explicitly changed). I think irrespective of what sched policy is being set, there's only 1 scheduler available for use, i.e. in my case, the SD scheduler. CMIIW.
SCHED_FIFO for both decoder and rendering thread and set rendering thread to higher priority it will do for you. The other decoder thread can be in busy loop. Why not create a notifier for the decoder thread, so that it will wake up only when data is available?
Well, I tried something similar and that seemed to work fairly well! I set the scheduling policy of the decoder thread to "SCHED_BATCH". Now I'm observing that the main render/GUI thread completes its animation and then the decoder gets a chance to run (batch mode processing). We're not busy-looping. Rather we're making the decoder thread wait on a job-queue. It'll sleep as long as the job-queue is empty.
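For reference, a minimal sketch of how a thread can be moved to SCHED_BATCH with the plain Linux call, assuming the C library exposes sched_setscheduler() and the kernel knows SCHED_BATCH (mainline has had it since 2.6.16). The helper name is made up, and whether the Qt thread wrappers offer an equivalent is not something this thread establishes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* SCHED_BATCH is a Linux extension */
#endif
#include <sched.h>

#ifndef SCHED_BATCH
#define SCHED_BATCH 3        /* Linux value, in case the headers hide it */
#endif

/*
 * Put the calling thread under SCHED_BATCH.
 * Returns 0 on success, -1 on error (errno is set by the call).
 */
static int make_current_thread_batch(void)
{
    /* The static priority must be 0 for the non-real-time policies. */
    struct sched_param param = { .sched_priority = 0 };

    /* On Linux, pid 0 means "the calling thread". */
    return sched_setscheduler(0, SCHED_BATCH, &param);
}
```

The decoder thread would call this once at the top of its run function, before it starts waiting on the job queue.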
Also, you need to tune your thread nr time and policies based on bit rate of data you are rendering. If you can run in interims of bit rate time both the threads, rendering and decoding, that creates a smooth picture. That's the catch.
Don't quite follow you here...what is "nr time"? I don't quite understand what the significance of "bit rate" is for static images. Also note that these images (JPEG) are quite small in dimensions (~ 200 x 150). The memory bandwidth available from the main memory (DRAM) to the video-rendering subsystem is quite high (~2.6Gbps), so that won't be a bottleneck. For me the trick to solving this issue was to NOT do decoding while the animation was going on. Even a single decode op used to make the animation suffer as it had fairly strict timing requirements (not hard real-time, but close). So forcing the decoder thread to sort-of "pause" decoding while animation is in progress helped.
Are you using multi core to do the job or single core?
Single core. The processor has multi-threading support but that support is disabled in the kernel config. Since this was something set by the vendor, I'm not changing it.

Thanks,
-mandeep
--Sri.

On Thu, Feb 24, 2011 at 8:47 AM, Mandeep Sandhu [off-list ref] wrote:
Quite long questions you have below...but I'll try to summarize and answer....
I did try to be as concise as possible! :)

Btw, your problem description is great....I believe it helps (at least /me) to get a sense what you gonna do, what you've done and how it really works. A nice example for every one of us....
Thanks

We're working in an MIPS based embedded system, running a fairly old
OK, I take a bold note here. I have only been in touch with x86 32 bit, so what I am going to say might be completely wrong when it is brought to the MIPS realm.
No probs...even I'm no expert in MIPS (rather my first time with MIPS as well! :)) The only thing that I found which _might_ be pertinent to our discussion was that the multi-threading option for MIPS was disabled ("MIPS MT options (Disable multithreading support.)"). Since this is a vendor provided config option I have not changed it. So no processor MT support for apps.

Linux 2.6.22 kernel (with vendor provided BSP). We write UI
I remember vaguely that CFS (Completely Fair Scheduler) was improved somewhere after 2.6.22 version...I couldn't recall exactly what changes they are...
The vendor provided linux kernel has the "Staircase Deadline" scheduler patched into it...so no CFS here...

In fact, the latest "200 lines famous patch" also affects how scheduling works...
Yeah I read about it (though I couldn't grasp how the thing actually works)...I have the user-space variant of this soln running on my ubuntu box :)

Why not shift the network I/O to the decoder thread? or IMHO, better...another separate thread? So each other could overlap...between CPU computation and I/O.
We have tested running the app with just the decoding bit disabled in the decoder thread. The animation is pretty smooth...though that's also because there's not much to do w/o the images! :) QT handles n/w i/o pretty well, in a non-blocking, async manner...though I'm not sure if it is internally using separate threads for doing so...will have to find out.

one is lowest, latter is highest? hmmmm if we put that back to pre CFS era, that could mean a very different time slice assignment...or in simpler word...kinda bad idea. I think if it's using nice value, it's better if the difference is around 5 or 10 by maximum.
The idea of assigning 2 extreme pri's was to ensure that the decode thread never interferes with the main thread while animation is going on. It's almost like the main thread needs "real-time" priority while it's doing animation...and goes back to normal priority when idle! :) I think SD sched uses nice values...I'm also not certain whether the QT wrappers are assigning "nice" values when one tries to set priority to a thread...will have to check and get back.

wait, so decoder just "eat" the content of the buffer without being signaled before? in other words, it just works all the time?
I'm not sure I follow your question here. The main thread _copies_ raw data rx'ed from the n/w and adds it to a "job queue" of the decoder thread...a fxn in the decoder thread simply checks if there are any jobs in the queue...if there is...it accesses the data (which was copied earlier when adding the job) and decodes the image...

This is where we had the 2 types of implementations...i.e. in one...this job queue is checked continuously like:

while(true) {
    if (job-queue is NOT empty) {
        // do decode
    }
}

And in the second implementation:

while(true) {
    if (job-queue is NOT empty) {
        // do decode
    } else {
        // wait for main thread to signal us when a new job is available
    }
}

The "waiting" (in 2nd implementation) is done via thread synchronization primitives available in QT (http://doc.qt.nokia.com/4.6/qwaitcondition.html)
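The second implementation could look roughly like this with POSIX threads instead of QWaitCondition. This is a sketch under assumptions, not the application's code: the job_queue layout, the fixed capacity and the int payload standing in for image data are all made up for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 16               /* hypothetical fixed queue capacity */

struct job_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;    /* signalled by the main thread on enqueue */
    int  jobs[MAX_JOBS];          /* stand-in for the copied image data */
    int  head, count;
    bool shutting_down;
    int  decoded;                 /* number of jobs the decoder has handled */
};

static void queue_init(struct job_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->head = q->count = q->decoded = 0;
    q->shutting_down = false;
}

/* Main-thread side: add a job and wake the decoder. Returns false if full. */
static bool queue_push(struct job_queue *q, int job)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < MAX_JOBS) {
        q->jobs[(q->head + q->count) % MAX_JOBS] = job;
        q->count++;
        ok = true;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Main-thread side: ask the decoder to exit once the queue has drained. */
static void queue_shutdown(struct job_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutting_down = true;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Decoder-thread side: sleep while the queue is empty, decode otherwise. */
static void *decoder_thread(void *arg)
{
    struct job_queue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        /* Blocks without using CPU; the loop guards against spurious wakeups. */
        while (q->count == 0 && !q->shutting_down)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0)
            break;                /* shutdown requested and queue drained */
        int job = q->jobs[q->head];
        q->head = (q->head + 1) % MAX_JOBS;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        (void)job;                /* the decode would run here, lock released */
        pthread_mutex_lock(&q->lock);
        q->decoded++;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}
```

The difference from the first implementation is the pthread_cond_wait() call: an idle decoder sits off the run queue entirely instead of competing with the main thread for its timeslice.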

I think this is the problem and that's why I proposed to isolate the network I/O into a separate thread. It's like ping pong, main thread pushes new data, decoder thread waits...it is then woken up..decoding...main thread waits.... Technically it is called priority inversion..if I got it correctly about your situation.
Hmmm...n/w io doesn't seem to be affecting animation perf of main thread (as pointed out above)...it's just that when the decoder thread has a job to do..I need it to be preempted by the main thread so it can complete its animation w/o the other thread taking away precious CPU cycles... I'm going to try "renice"-ing the decoder thread to a higher value and see if it changes the behaviour in the 2nd implementation (where we don't busy-loop)...
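A sketch of renicing just one thread from inside it, assuming Linux with NPTL, where the nice value behaves as a per-thread attribute and setpriority() accepts a kernel thread ID. The helper names are made up; on other threading implementations the call may affect the whole process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for syscall() */
#endif
#include <errno.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Set the nice value of the calling thread only.
 * Returns 0 on success, -1 on error. Raising the value (lower priority)
 * needs no privilege; lowering it usually does.
 */
static int renice_current_thread(int nice_value)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);   /* kernel thread ID, not pthread_t */
    return setpriority(PRIO_PROCESS, (id_t)tid, nice_value);
}

/*
 * Read back the calling thread's nice value. Because -1 is a legal
 * nice value, callers should check errno rather than the return value.
 */
static int current_thread_nice(void)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);
    errno = 0;
    return getpriority(PRIO_PROCESS, (id_t)tid);
}
```

Calling renice_current_thread(19) at the start of the decoder thread would give it the weakest nice level while leaving the main thread untouched.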

Fixed? I don't think so. CFS is kinda using "delta" i.e. if current task runs for x and other which is waiting is y, then for the next round, others deserve some kind of weighted x-y.
SD sched, I think, assigns a fixed quota of runtime (= timeslice?) and if the process uses up this quota...its priority is reduced to the next level....

- How can I find out if the kernel supports NPTL (kernel managed threads) or plain old linux threads (user-space managed threads)?
I think this trick might work: Check /proc/<pid>/maps or use pmap. NPTL ones usually maps libtls in its process address space
pmap's not available! :( and I couldn't see libtls mapped in this process's addr space (is it really libtls? why would we have TLS library for NPTL?...isn't libtls used for SSL communications?)
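Another way to ask the question, assuming the C library is glibc: confstr() with _CS_GNU_LIBPTHREAD_VERSION returns a string such as "NPTL 2.x" or "linuxthreads-0.x". Note that this describes the C library's thread implementation rather than the kernel itself. A uClibc or other libc on an embedded board may not define this name at all, in which case the helper below (a made-up name) just reports failure.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Copy the threading library's self-description into buf.
 * Returns 0 on success, -1 if the C library does not report one
 * or the buffer is too small.
 */
static int thread_library_name(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns the size needed, including the trailing NUL. */
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    return (n > 0 && n <= len) ? 0 : -1;
#else
    (void)buf;
    (void)len;
    return -1;   /* not glibc, or a glibc too old to define the name */
#endif
}
```

From a shell, `getconf GNU_LIBPTHREAD_VERSION` asks the same question, if getconf is installed on the target.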

so, no coreutils/util-linux/util-linux-ng?
coreutils is there.....but most commands are stripped down/lightweight versions of the originals! :)

Any other way to get more thread related info about a running application?
everything under /proc/<pid>? have you checked that?
This helped a little! I can see the threads spawned by the main thread under "/proc/<pid>/task". This dir lists pid's of all the threads started by the parent proc...and contents of individual dir (pids) is same as "/proc/<pid>"... Here I could find out my decoder thread's ID...but again contents of that dir does not show info like priority/nice value etc...

Thanks again for your inputs. I'll keep posting my findings here...till I get a satisfactory soln to this issue.

Regards,
-mandeep
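On the priority/nice question: according to proc(5), the stat file in each /proc/<pid>/task/<tid> directory carries the kernel priority as field 18 and the nice value as field 19. A minimal sketch of pulling them out, with made-up helper names; the field positions are taken from proc(5) and have not been checked against this vendor kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse priority (field 18) and nice (field 19) from the contents of a
 * /proc/<pid>/task/<tid>/stat line. Returns 0 on success, -1 on error.
 */
static int parse_stat_prio(const char *line, long *priority, long *nice_value)
{
    /* The command name (field 2) may contain spaces or ')', so start
     * after the last ')' on the line. */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    /* Skip field 3 (state, one char) and fields 4..17 (14 numbers). */
    int n = sscanf(p + 1,
                   " %*c"
                   " %*s %*s %*s %*s %*s %*s %*s"
                   " %*s %*s %*s %*s %*s %*s %*s"
                   " %ld %ld",
                   priority, nice_value);
    return n == 2 ? 0 : -1;
}

/* Read one stat file, e.g. "/proc/1234/task/1240/stat", and parse it. */
static int read_task_prio(const char *stat_path, long *priority, long *nice_value)
{
    char line[1024];
    FILE *f = fopen(stat_path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? parse_stat_prio(line, priority, nice_value) : -1;
}
```

On a box without pmap or a full ps, `cat /proc/<pid>/task/<tid>/stat` and counting fields by hand gives the same two numbers.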

--
regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

--
Regards,
Sri.