Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
From: Ville Syrjälä <hidden>
Date: 2016-08-09 17:21:13
Also in: linux-acpi, linux-pm, lkml
On Thu, Jul 14, 2016 at 04:29:42PM +0800, Feng Tang wrote:
If you only want it to work, you can try an old patch
https://bugzilla.kernel.org/attachment.cgi?id=76071
from a similar bug
https://bugzilla.kernel.org/show_bug.cgi?id=41932

Alistair Buxton confirmed it works for 3.18 at least
https://bugzilla.kernel.org/show_bug.cgi?id=107151#c16
That patch is a bit too ripe by now. Would need a fresh squeezed one.
Thanks,
Feng

On Wed, Jul 13, 2016 at 10:54 PM, Ville Syrjälä [off-list ref] wrote:
On Tue, May 31, 2016 at 10:26:50AM +0300, Ville Syrjälä wrote:
On Mon, May 30, 2016 at 10:43:51PM +0200, Rafael J. Wysocki wrote:
On Thu, May 26, 2016 at 8:32 PM, Ville Syrjälä [off-list ref] wrote:
On Wed, May 18, 2016 at 10:24:24AM +0300, Ville Syrjälä wrote:
On Wed, May 18, 2016 at 01:14:42AM +0200, Rafael J. Wysocki wrote:
On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
On Wed, 11 May 2016 15:21:16 +0300 Ville Syrjälä [off-list ref] wrote:
Yeah can't get anything from the machine at that point. netconsole didn't help either, and no serial on this machine. And IIRC I've tried ramoops on this thing in the past but unfortunately the memory got cleared on reboot.

Can you look at the documentation in the kernel code at

  Documentation/power/basic-pm-debugging.txt

And follow the procedures for testing suspend to RAM (although it requires mostly running the same tests as for hibernation suspending).

You can also use the tool s2ram for this as well. See

  Documentation/power/s2ram.txt

Perhaps this can give us a bit more light onto the problem. Basically the above does partial suspend and resume, and can pinpoint problem areas down to a more select location.

All the pm_test modes work fine. The only difference between them was that 'platform' required me to manually wake up the machine (hitting a key was sufficient), whereas the others woke up without help.

pm_trace gave me

[ 1.306633] Magic number: 0:185:178
[ 1.322880] hash matches ../drivers/base/power/main.c:1070
[ 1.339270] acpi device:0e: hash matches
[ 1.355414] platform: hash matches

which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help there.

I guess I could try to sprinkle more TRACE_RESUMEs around into some early resume code. If anyone has good ideas where to put them it might speed things up a bit.

So I did a bunch of that and found that it gets stuck somewhere around executing the _WAK method:

  platform_resume_noirq
  acpi_pm_finish
  acpi_leave_sleep_state
  acpi_hw_sleep_dispatch
  acpi_hw_legacy_wake
  acpi_hw_execute_sleep_method
  acpi_evaluate_object
  acpi_ns_evaluate
  acpi_ps_execute_method
  acpi_ps_parse_aml

It also seems that adding a few TRACE_RESUME()s or an msleep() right after enable_nonboot_cpus() can avoid the hang, sometimes.

I've attached the DSDT in case anyone is interested in looking at it.

What if you comment out the execution of _WAK (line 318 of drivers/acpi/acpica/hwsleep.c in 4.6)? Does that make any difference?

Indeed it does.
Tried with acpi_idle and intel_idle, and both appear to resume just fine with that hack.

-       acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
+       //acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
+       printk(KERN_CRIT "skipping _WAK\n");

Continuing with my detective work a bit, I decided to hack the DSDT a bit to see if I can narrow it down further, and looks like I found it on the first guess. The following change stops it from hanging.

@@ -5056,7 +5056,7 @@
 If (LEqual (Arg0, 0x03))
 {
     Store (0x01, \SPNF)
-    TRAP (0x46)
+    //TRAP (0x46)
     P8XH (0x00, 0x03)
 }

So what does that do? Let's see:

OperationRegion (IO_T, SystemIO, 0x0800, 0x10)
Field (IO_T, ByteAcc, NoLock, Preserve)
{
    Offset (0x08),
    TRP0, 8
}

OperationRegion (GNVS, SystemMemory, 0x3F5E0C7C, 0x0200)
Field (GNVS, AnyAcc, Lock, Preserve)
{
    OSYS, 16,
    SMIF, 8,
    ...

Method (TRAP, 1, Serialized)
{
    Store (Arg0, SMIF) /* \SMIF */
    Store (0x00, TRP0) /* \TRP0 */
    Return (SMIF) /* \SMIF */
}

and a dump of the IOTR registers shows:

0x1e80: 0x0000fe01
0x1e84: 0x00020001
0x1e98: 0x000c0801
0x1e9c: 0x000200f0

which seems to be telling me that ports 0x800-0x80f and 0xfe00-0xfe03 would trigger an SMI.

Well, the name of the method kind of suggests that it triggers an SMM trap. :-)

Which is why I wanted to confirm that by looking at the IOTR regs ;)
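
As a rough cross-check of the IOTR reading above, the low dwords from the dump can be decoded with a few lines of Python. This is only a sketch: the bit layout (TRSE in bit 0, IOAD in bits 15:2, ADMA in bits 23:18) is recalled from Intel ICH datasheets and is an assumption, not something stated in this thread, and decode_iotr_low is a made-up helper name. The high dwords (0x1e84, 0x1e9c) are not decoded here.

```python
# Sketch: decode the low dword of an ICH I/O trap register (IOTRn).
# ASSUMPTION (not from the thread): bit layout as recalled from ICH datasheets:
#   bit 0       TRSE  trap and SMI# enable
#   bits 15:2   IOAD  dword-aligned I/O base address
#   bits 23:18  ADMA  mask for address bits 7:2 (masked bits are "don't care")

def decode_iotr_low(value):
    """Return (enabled, first_port, last_port) for an IOTR low dword.

    The range is only contiguous when the mask covers contiguous low
    address bits, which holds for the two values in the dump.
    """
    enabled = bool(value & 0x1)
    base = value & 0xFFFC                 # IOAD, bits 15:2
    mask = ((value >> 18) & 0x3F) << 2    # ADMA shifted back to address bits 7:2
    first = base & ~mask
    last = base | mask | 0x3              # the low two bits select bytes in the dword
    return enabled, first, last

# Low dwords from the dump in the thread (offsets 0x1e80 and 0x1e98).
for offset, value in ((0x1E80, 0x0000FE01), (0x1E98, 0x000C0801)):
    enabled, first, last = decode_iotr_low(value)
    print(f"{offset:#06x}: enabled={enabled} ports {first:#x}-{last:#x}")
```

Under that assumed layout the two entries come out as 0xfe00-0xfe03 and 0x800-0x80f, matching the ranges named above. TRP0 sits at offset 0x08 of the IO_T region based at 0x0800, i.e. port 0x808, which falls inside the second range.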
So the next question is how do the idle drivers and cpu hotplug fit into this picture. Do we need to force the second HT into a specific C state before the SMI or something?

Or you can ask why exactly someone put that SMM trap into _WAK. Apparently, it was regarded as necessary or no one would have bothered.

The only reason I can see why it might be regarded as necessary was that Windows did something Linux doesn't do on that platform, or, which to me is far more interesting, that Windows didn't do something actually done by Linux.

My theory would be that Windows didn't reinitialize the second HT properly during resume and the trap was added to let SMM do that. If that's the case, the trap may trigger by the time the second HT already executes code in Linux and then it will mess with it and crash.

Now, what do idle states have to do with that? IIRC, Windows puts nonboot CPUs into idle states before suspend, so the SMM code triggered by the trap may make assumptions about the CPU being in such a state or similar.

BTW I also tried to move the enable_nonboot_cpus() after _WAK, and I tried to boot with nosmp, but neither trick helped. If someone could throw some patches my way to force things into a specific state before suspend/_WAK I'd be happy to test them out.

Ping. Anyone have any ideas what to try here? Would be nice to get this machine working again...

-- 
Ville Syrjälä
Intel OTC
-- 
Ville Syrjälä
Intel OTC
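
For anyone repeating the pm_test runs mentioned in the quoted thread, the procedure from Documentation/power/basic-pm-debugging.txt can be sketched roughly as below. It assumes a kernel built with CONFIG_PM_DEBUG and root access; pm_test_writes and run_pm_test are hypothetical helper names, and the sketch is illustrative rather than a tested tool.

```python
import os

# Modes accepted by /sys/power/pm_test besides "none", shallowest first.
PM_TEST_MODES = ("freezer", "devices", "platform", "processors", "core")

def pm_test_writes(mode, power_dir="/sys/power"):
    """Return the (path, value) writes for one partial suspend/resume cycle."""
    if mode not in PM_TEST_MODES:
        raise ValueError(f"unknown pm_test mode: {mode}")
    pm_test = os.path.join(power_dir, "pm_test")
    state = os.path.join(power_dir, "state")
    return [
        (pm_test, mode),    # select how deep the test suspend goes
        (state, "mem"),     # start the (partial) suspend to RAM
        (pm_test, "none"),  # restore normal suspend behaviour afterwards
    ]

def run_pm_test(mode, power_dir="/sys/power"):
    """Perform the writes; needs root and CONFIG_PM_DEBUG (assumption)."""
    for path, value in pm_test_writes(mode, power_dir):
        with open(path, "w") as f:
            f.write(value)
```

Calling run_pm_test("devices") as root would then exercise one level. As noted in the thread, the 'platform' level may need a manual wakeup on some machines.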