Versatile Express randomly fails to boot - Versatile Express to be removed from nightly testing
From: Sudeep Holla <hidden>
Date: 2015-03-31 17:27:30
On 30/03/15 16:39, Sudeep Holla wrote:
> On 30/03/15 16:05, Russell King - ARM Linux wrote:
>> On Mon, Mar 30, 2015 at 03:48:08PM +0100, Sudeep Holla wrote:
>>> Though <2 2 1> works fine most of the time, I did try testing continuous reboot overnight and it failed. I kept increasing the latencies and found out that even the max latency of <8 8 8> could not survive the continuous overnight reboot test; it fails with the exact same issue. So I am not sure if we can consider it a fix. However, if we are OK to have *mostly reliable*, then we can push that change.
>>
>> Okay, the issue I have is this. Versatile Express used to boot reliably in the nightly build tests prior to DT. In that mode, we never configured the latency values.
>
> I have never run in legacy mode, as I am relatively new to the vexpress platform and have used DT from the start. Just to understand better, I had a look at commit 81cc3f868d30 ("ARM: vexpress: Remove non-DT code") and I see the below function in arch/arm/mach-vexpress/ct-ca9x4.c
>
> So I assume we were programming one cycle for all the latencies just like DT.

I was able to boot v3.18 without DT and I compared the L2C settings with and w/o DT, they are identical. Also v3.18 with and w/o DT survived overnight reboot testing.
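
For reference, here is a minimal sketch of how the latency triples mentioned above (<2 2 1>, <8 8 8>) would map to an L2C-310 tag or data RAM latency control register value. It assumes the DT cell order is read, write, setup, and that the register holds setup in bits [2:0], read in bits [6:4] and write in bits [10:8], each stored as cycles minus one. The macro and function names are hypothetical, not the kernel's own.

```c
#include <stdint.h>

/*
 * Assumed field layout of the L2C-310 tag/data RAM latency control
 * registers: each 3-bit field stores (cycles - 1).
 */
#define LATENCY_SETUP(n) ((uint32_t)(n) << 0)
#define LATENCY_RD(n)    ((uint32_t)(n) << 4)
#define LATENCY_WR(n)    ((uint32_t)(n) << 8)

/*
 * Hypothetical helper: encode a <read write setup> triple, given in
 * cycles (1..8 each), into a register value.
 * Returns 0 on success, -1 if any value is out of range.
 */
static int l2c310_encode_latency(unsigned int rd, unsigned int wr,
                                 unsigned int setup, uint32_t *out)
{
    if (rd < 1 || rd > 8 || wr < 1 || wr > 8 || setup < 1 || setup > 8)
        return -1;

    *out = LATENCY_RD(rd - 1) | LATENCY_WR(wr - 1) | LATENCY_SETUP(setup - 1);
    return 0;
}
```

Under these assumptions, <1 1 1> encodes to 0x000 (one cycle everywhere), <2 2 1> to 0x110, and <8 8 8> to 0x777, the largest value the 3-bit fields can hold.
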

>> Then the legacy code was removed, and I had to switch over to DT booting, and shortly after I noticed that the platform was now randomly failing its nightly boot tests. Maybe we should revert the commit removing the superior legacy code, because that seems to be the only thing that was reliable? Maybe it was premature to remove it until DT had proven itself?

Not sure on that as v3.18 with DT seems to be working fine and passed overnight reboot testing.

>> On the other hand, if the legacy code hadn't been removed, I probably would never have tested it - but then, from what I hear, this was a *known* issue prior to the removal of the legacy code. Given that the legacy code worked totally fine, it's utterly idiotic to me to have removed the working legacy code when DT is so unstable. Whatever way I look at this, this problem _is_ a _regression_, and we can't sit around and hope it magically vanishes by some means.
>
> I agree, last time I tested it was fine with v3.18. However, I have not run the continuous overnight reboot test on that. I will first start looking at that, just to see if it's an issue related to DT vs legacy boot.

Since v3.18 works in both boot modes and the problem is reproducible on v3.19-rc1, I am trying to bisect, but I am not sure that is feasible for such a problem. I also found out by accident that even on mainline, with more configs enabled, it's hard to hit the issue.

Regards,
Sudeep