Fwd: Custom Linux Kernel Scheduler issue
From: Greg KH <hidden>
Date: 2016-11-24 18:46:52
On Thu, Nov 24, 2016 at 11:33:04AM -0500, Kenneth Adam Miller wrote:
> On Thu, Nov 24, 2016 at 11:13 AM, Greg KH [off-list ref] wrote:
> > On Thu, Nov 24, 2016 at 10:31:18AM -0500, Kenneth Adam Miller wrote:
> > > On Nov 24, 2016 2:18 AM, "Greg KH" [off-list ref] wrote:
> > > > On Thu, Nov 24, 2016 at 02:01:41AM -0500, Kenneth Adam Miller wrote:
> > > > > Hello, I have a scheduler issue in two different respects:
> > > > >
> > > > > 1) I have a process that is supposed to tight loop, and it is being given very very little time on the system. I don't want that - I want those who would use the processor to be given the resources to run as fast as they each can.
> > > >
> > > > What is causing it to give up its timeslice? Is it waiting for I/O? Doing something else to sleep?
> > >
> > > It's multithreaded, so it reads in a loop in one thread and writes in another thread. What I saw when I ran strace on it is each process would run for too long; the program is designed to try and stay out of the kernel on each side, so it checks some shared variables before it ever goes.
> >
> > So locking/cpu contention for those "shared variables" perhaps?
>
> I don't think that could possibly be it, because the shared variables are controlled by atomics. It's just some memory operation to check to see if it needs to go to the kernel, as in: is there more data in the shm region for me to read? If not, I'll go wait on this OS semaphore. It's lightning fast on my host machine.
Ah, but your "host" and your "test" machines are two totally different things, as you say below. So how do you know that memory accesses and atomic writes/reads are the same?
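One way to check that rather than assume it is to time the atomic operation itself on both machines. A minimal sketch in C, assuming C11 atomics and clock_gettime() are available on the target; ns_per_atomic_add() is a hypothetical helper name, not something from your program, and it only measures the uncontended single-thread case:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: average cost, in nanoseconds, of one sequentially
 * consistent atomic fetch-add executed by a single thread.
 * Returns 0.0 if iterations is 0.
 */
double ns_per_atomic_add(uint64_t iterations)
{
    static atomic_uint_fast64_t counter; /* zero-initialized */
    struct timespec t0, t1;

    if (iterations == 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < iterations; i++)
        atomic_fetch_add_explicit(&counter, 1, memory_order_seq_cst);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Elapsed wall-clock time in nanoseconds. */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iterations;
}
```

Run it with the same iteration count on the host and on the target and compare the two numbers. If they differ by a large factor, the atomics are not as cheap on the target as you think they are.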
> > > > > 2) I am seeing with perf that the maximum overhead at each section does not sum up to be more than 15 percent. Total, probably something like 18% of cpu time is used, and my binary has rocketed in slowness from about 2 seconds or less total to several minutes.
> > > >
> > > > What changed to make things slower? Did you change kernel versions or did you change something in your userspace program?
> > >
> > > The kernel versions specifically couldn't have anything to do with it, but it was different kernels. The test runs in less than 2 seconds on my host. When I copy it to our custom linux, it takes minutes for it to run. I think it's some extra setting that we're missing while building the kernel, and I don't know what that is. I got a huge improvement when I changed the multicore scheduling to allow preemption "(desktop)" but there's still a problem as I've described with one of the processes not using the core as it should.
> >
> > What do you mean by "custom linux"? Is this the exact same hardware as your machine? Or different? If so, what is different? What is different between the different kernel versions you are using? Does the perf output look different from running on the two different machines? If so, where?
>
> I am building with buildroot a linux that is meant to be really stripped down and only have the things we want. In my case, what the bzImage sees is either what QEMU gives it or what it sees in our dedicated hardware, which is just an off-the-shelf i7 and other stuff you get at a market - nothing custom in the sense you are thinking. Custom as in, roll your own linux. The kernel versions between my host and the target are 3.13.x and 3.14.5x; they don't change so much, and certainly don't affect performance on their own. I'm missing some setting or something with how I'm configuring or building linux.
Those are really old and obsolete kernel versions, not much we can do with them here :)
> I haven't had a chance to run perf on my host. I can't find what ubuntu package it is just yet, but I will search for it in a minute. I have to go somewhere and will be right back immediately.
> > Have you changed the priority levels of your application at all? Have you thought about just forcing your app to a specific CPU and getting the kernel off of that CPU in order so that the kernel isn't even an option here at all (Linux allows you to do this, details are somewhere in the documentation, sorry, can't remember off the top of my head...)
>
> No, that may be it or help though. I thought that binding an application to a particular cpu had something to do with affinity and that there was some C api for it or something. That would work for our particular scenario, and we've even talked about it, I just don't know how to do it yet.
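For the affinity part, the C API on Linux is sched_setaffinity(2). A minimal sketch, assuming glibc with _GNU_SOURCE; the helper names first_allowed_cpu(), pin_to_cpu() and is_pinned_to() are made up for illustration and are not part of any library:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to one CPU. Returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the calling thread's affinity mask is exactly {cpu}, else 0. */
int is_pinned_to(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return 0;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

Each thread would call pin_to_cpu() early on with the CPU it should own. This only keeps your thread on that CPU; it does not keep other tasks off it. For that, the isolcpus= boot parameter is one option to look at, and taskset does the same pinning from the shell. Check the documentation for the kernel version you are actually running.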
> > But really, you should track down what the differences are between your two machines/environments, as something is different that is causing the slow down.
>
> True - the kernel configuration is most suspect based on everything I know. The hardware of my host and of the target we're building for are each modern, and well supported by linux. I'm thinking it absolutely must have something to do with the way I've built linux.
>
> > You haven't even said what kernel version you are using, and if you have any of your own kernel patches in those kernels.
>
> For the target hardware it is 3.14.5x, and there aren't any kernel patches at this time; I've disabled grsec while in the process of narrowing down what the problem is.
Woah, grsec does a _lot_ of different things, you have to just not use it if you wish to try to compare anything.
> > > > > I think that the linux scheduler isn't scheduling it, because this process is just some unit tests that double as benchmarks in that they shm_open a file and write into it with memcpy's.
> > > >
> > > > Are you sure that I/O isn't happening here like through swap or something else?
> > >
> > > Well, we're using tmpfs and don't have a disk in the machine, but I will say this process is using a lot of the address space. One problem here is that the kernel has more ram than it thinks it does,
> >
> > What do you mean, is this a hardware issue?
>
> I don't think it's hardware; we're using this proprietary software beneath the linux kernel, but it's still ram of course. I can't say too much, but what I can say is that while how much linux thinks it has could be affecting how it behaves, on our end we have the resources and can just change the configuration to make sure that linux sees and has enough ram. So that we can test on our end, and indeed we will.
Ah, this crazy thing. You are running two totally different hardware platforms here, with memory accesses working totally differently between them. Of course performance is going to be different, why would you expect it not to be?

So try to compare apples to apples, not apples to "the smell of apples".

Oh, and rip out grsec when doing benchmarks of anything, if you want to have a chance of comparing kernels.

best of luck,

greg k-h