[v3 3/5] coresight: add support for debug module
From: mathieu.poirier@linaro.org (Mathieu Poirier)
Date: 2017-03-13 16:30:58
Also in:
linux-clk, linux-devicetree, lkml
On Fri, Mar 10, 2017 at 01:59:15AM +0800, Leo Yan wrote:
Hi Suziku, Thanks for reviewing, please see some replying. On Thu, Mar 09, 2017 at 04:53:05PM +0000, Suzuki K Poulose wrote:quoted
On 03/03/17 06:00, Leo Yan wrote:quoted
Coresight includes debug module and usually the module connects with CPU debug logic. ARMv8 architecture reference manual (ARM DDI 0487A.k) has description for related info in "Part H: External Debug". Chapter H7 "The Sample-based Profiling Extension" introduces several sampling registers, e.g. we can check program counter value with combined CPU exception level, secure state, etc. So this is helpful for analysis CPU lockup scenarios, e.g. if one CPU has run into infinite loop with IRQ disabled. In this case the CPU cannot switch context and handle any interrupt (including IPIs), as the result it cannot handle SMP call for stack dump. This patch is to enable coresight debug module, so firstly this driver is to bind apb clock for debug module and this is to ensure the debug module can be accessed from program or external debugger. And the driver uses sample-based registers for debug purpose, e.g. when system detects the CPU lockup and trigger panic, the driver will dump program counter and combined context registers (EDCIDSR, EDVIDSR); by parsing context registers so can quickly get to know CPU secure state, exception level, etc.The problem is, it is not guaranteed that the EDPCSR_Hi, EDCIDSR & EDVIDSR are updated as a side effect of a memory mapped access (which is what we do here) to the EDPCSR_Lo. Section H.7.1.2 : Reads of EDPCSRs (in ARM DDI 0487A.k) : "The indirect writes to EDCIDSR, EDVIDSR, and EDPCSRhi might not occur for a memory-mapped access to the external debug interface. For more information, see Memory-mapped accesses to the external debug interface on page H8-4968." So we cannot really rely on the values in EDVIDSR which we use to make further decisions. So I am wondering if this is really guranteed to be useful.So this is caused by Software lock is locked? Section H8.4.1: "Reads and writes have no side-effects. A side-effect is where a direct read or a direct write of a register creates an indirect write of the same or another register. When the Software Lock is locked, the indirect write does not occur."quoted
quoted
Some of the debug module registers are located in CPU power domain, so in the driver it has checked the power state for CPU before accessing registers within CPU power domain. For most safe way to use this driver, it's suggested to disable CPU low power states, this can simply set "nohlt" in kernel command line. Signed-off-by: Leo Yan <redacted> --- drivers/hwtracing/coresight/Kconfig | 10 + drivers/hwtracing/coresight/Makefile | 1 + drivers/hwtracing/coresight/coresight-debug.c | 377 ++++++++++++++++++++++++++ 3 files changed, 388 insertions(+) create mode 100644 drivers/hwtracing/coresight/coresight-debug.cdiff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig index 130cb21..3ed651e 100644 --- a/drivers/hwtracing/coresight/Kconfig +++ b/drivers/hwtracing/coresight/Kconfig@@ -89,4 +89,14 @@ config CORESIGHT_STMlogging useful software events or data coming from various entities in the system, possibly running different OSs +config CORESIGHT_DEBUGTo make it more specific, may be CORESIGHT_CPU_DEBUG ?Will fix.quoted
quoted
+ bool "CoreSight debug driver""Coresight CPU Debug driver"Will fix.quoted
quoted
+ depends on ARM || ARM64 + help + This driver provides support for coresight debugging module. This + is primarily used to dump sample-based profiling registers for + panic. To avoid lockups when accessing debug module registers, + it is safer to disable CPU low power states (like "nohlt" on the + kernel command line) when using this feature. +quoted
+#define EDPCSR_THUMB BIT(0) +#define EDPCSR_ARM_INST_MASK GENMASK(31, 2) +#define EDPCSR_THUMB_INST_MASK GENMASK(31, 1)We don't need two different masks. {ED/DBG}PCSR has only bit 0 reserved for instruction set indication.I think we need this two different masks. Please review below extra doc for PC offset analysis in ARM ARM.quoted
quoted
+#endif + +/* bits definition for EDPRSR */ +#define EDPRSR_DLK BIT(6) +#define EDPRSR_PU BIT(0) + + +static void debug_read_regs(struct debug_drvdata *drvdata) +{ + drvdata->edprsr = readl_relaxed(drvdata->base + EDPRSR); + + if (!debug_access_permitted(drvdata)) + return; + + if (!drvdata->edpcsr_present) + return; + + CS_UNLOCK(drvdata->base); + + debug_os_unlock(drvdata); + + drvdata->edpcsr = readl_relaxed(drvdata->base + EDPCSR); + + /* + * As described in ARM DDI 0487A.k, if the processing + * element (PE) is in debug state, or sample-based + * profiling is prohibited, EDPCSR reads as 0xFFFFFFFF; + * EDCIDSR, EDVIDSR and EDPCSR_HI registers also become + * UNKNOWN state. So directly bail out for this case. + */ + if (drvdata->edpcsr == EDPCSR_PROHIBITED) { + CS_LOCK(drvdata->base); + return; + } + + /* + * A read of the EDPCSR normally has the side-effect of + * indirectly writing to EDCIDSR, EDVIDSR and EDPCSR_HI; + * at this point it's safe to read value from them. + */See my comment above about the side effects of memory mapped access.Yeah.quoted
quoted
+ drvdata->edcidsr = readl_relaxed(drvdata->base + EDCIDSR); +#ifdef CONFIG_64BIT + drvdata->edpcsr_hi = readl_relaxed(drvdata->base + EDPCSR_HI); +#endifquoted
+ + if (drvdata->edvidsr_present) + drvdata->edvidsr = readl_relaxed(drvdata->base + EDVIDSR); + + CS_LOCK(drvdata->base); +} +quoted
+#ifndef CONFIG_64BITI guess this doesn't help for an ARMv8 32bit only core (e.g, Cortex-A32). And unfortunately, there are conflicting definitions for the values for PCSROffset w.r.t ARMv8 and ARMv7. DBGDEVID1[3:0] For ARMv7 : 0000 - Sample offset applies based on the instruction state. 0001 - No offset applies. EDDEVID1[3:0] For ARMv8 : 0000 - EDPCSR not implemented 0010 - EDPCSR implemented without offsets, but do not use in AArch32 state! So there is no easy way to make sense of the value, unless you know which version of the architecture is in use. Or may be we could co-relate it with the value from DEVID. i.e, EDPCSR is not implemented do not register this device, see comments on debug_probe(). ( And we should also include the following test for 32bit code to see if edpcsr is implemented. See comments on debug_init_arch_data() ) That way, we could use the following inference from the PCSROffset value : 0000 - Sample offset applies based on the instruction state (indicated by PCSR[0]) 0001 - No offset applies. 0010 - No offset applies, but do not use in AArch32 modeJust now I went through ARM ARM and ARMv8 ARM, this makes sense to me. Thanks for good pointing for this.quoted
quoted
+static bool debug_pc_has_offset(struct debug_drvdata *drvdata) +{ + u32 pcsr_offset; + + pcsr_offset = drvdata->eddevid1 & EDDEVID1_PCSR_OFFSET_MASK; + + return (pcsr_offset == EDDEVID1_PCSR_OFFSET_INS_SET); +} + +static unsigned long debug_adjust_pc(struct debug_drvdata *drvdata, + unsigned long pc) +{ + unsigned long arm_inst_offset = 0, thumb_inst_offset = 0; + + if (debug_pc_has_offset(drvdata)) { + arm_inst_offset = 8; + thumb_inst_offset = 4; + } + + /* Handle thumb instruction */ + if (pc & EDPCSR_THUMB) { + pc = (pc & EDPCSR_THUMB_INST_MASK) - thumb_inst_offset; + return pc; + } + + /* + * Handle arm instruction offset, if the arm instruction + * is not 4 byte alignment then it's possible the case + * for implementation defined; keep original value for this + * case and print info for notice. + */ + if (pc & BIT(1)) + pr_emerg("Instruction offset is implementation defined\n");I am struggling to find the any mention about this in the ARM ARM. Please could you point me to it.Sure, please see ARM DDI 0406C.b, chapter C11.11.34 " DBGPCSR, Program Counter Sampling Register": A profiling tool can use the value of the T bit to calculate the instruction address as follows: When an offset is applied to the sampled address - if T is 0 and DBGPCSR[1] is 0, ((DBGPCSR[31:2] << 2) - 8) is the address of the sampled ARM instruction - if T is 0 and DBGPCSR[1] is 1, the instruction address is IMPLEMENTATION DEFINED - if T is 1, ((DBGPCSR[31:1] << 1) - 4) is the address of the sampled Thumb or ThumbEE instruction. When no offset is applied to the sampled address - if T is 0 and DBGPCSR[1] is 0, (DBGPCSR[31:2] << 2) is the address of the sampled ARM instruction - if T is 0 and DBGPCSR[1] is 1, the instruction address is IMPLEMENTATION DEFINED - if T is 1, (DBGPCSR[31:1] << 1) is the address of the sampled Thumb or ThumbEE instruction.quoted
quoted
+static void debug_init_arch_data(void *info) +{ + struct debug_drvdata *drvdata = info; + u32 mode; + + CS_UNLOCK(drvdata->base); + + debug_os_unlock(drvdata); + + /* Read device info */ + drvdata->eddevid = readl_relaxed(drvdata->base + EDDEVID); + drvdata->eddevid1 = readl_relaxed(drvdata->base + EDDEVID1); + + /* Parse implementation feature */ + mode = drvdata->eddevid & EDDEVID_PCSAMPLE_MODE; + if (mode == EDDEVID_IMPL_FULL) { + drvdata->edpcsr_present = true; + drvdata->edvidsr_present = true; + } else if (mode == EDDEVID_IMPL_EDPCSR_EDCIDSR) { + drvdata->edpcsr_present = true; + drvdata->edvidsr_present = false;As discussed above, we need to consult the DEVID1:PCSROffset for AArch32 to decide if we have the edpcsr implemented on ARMv8.Yeah.quoted
quoted
+ } else { + drvdata->edpcsr_present = false; + drvdata->edvidsr_present = false; + } + + CS_LOCK(drvdata->base); +} + +static int debug_probe(struct amba_device *adev, const struct amba_id *id) +{ + void __iomem *base; + struct device *dev = &adev->dev; + struct debug_drvdata *drvdata; + struct resource *res = &adev->res; + struct device_node *np = adev->dev.of_node; + char buf[32]; + static int debug_count; + + drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL); + if (!drvdata) + return -ENOMEM; + + drvdata->cpu = np ? of_coresight_get_cpu(np) : 0; + drvdata->dev = &adev->dev; + + dev_set_drvdata(dev, drvdata); + + /* Validity for the resource is already checked by the AMBA core */ + base = devm_ioremap_resource(dev, res); + if (IS_ERR(base)) + return PTR_ERR(base); + + drvdata->base = base; + + get_online_cpus(); + per_cpu(debug_drvdata, drvdata->cpu) = drvdata; + + if (smp_call_function_single(drvdata->cpu, + debug_init_arch_data, drvdata, 1)) + dev_err(dev, "Debug arch init failed\n");If this fails (say the CPU was offline), should we still return success ? And may be we should check if the drvdata->edpcsr_present to detect if the CPU implements the PC Sampling and return failure here if it doesn't.Will fix.quoted
quoted
+ + put_online_cpus(); + + if (!debug_count++) + atomic_notifier_chain_register(&panic_notifier_list, + &debug_notifier); +quoted
+ sprintf(buf, (char *)id->data, drvdata->cpu); + dev_info(dev, "%s initialized\n", buf);This could simply be : dev_info(dev, "Coresight debug-CPU%d initialized\n", drvdata->cpu); and get rid of the static string and the buffer, see below.quoted
+ return 0; +} + +static struct amba_id debug_ids[] = { + { /* Debug for Cortex-A53 */ + .id = 0x000bbd03, + .mask = 0x000fffff,...quoted
+ .data = "Coresight debug-CPU%d",I think this is pointless, as the debug area we are interested in is always associated with a CPU, we could as well figure out what to print from the drvdata->cpu above.I prefer to follow your suggestion for upper two comments; but I'd like check with Mathieu, due I followed up Mathieu's suggestion to write current code.
The end result is the same - I'm good either way. Thanks, Mathieu
Thanks, Leo Yan