[v3 3/5] coresight: add support for debug module

From: mathieu.poirier@linaro.org (Mathieu Poirier)
Date: 2017-03-13 16:30:58
Also in: linux-clk, linux-devicetree, lkml

On Fri, Mar 10, 2017 at 01:59:15AM +0800, Leo Yan wrote:

Hi Suziku,

Thanks for reviewing, please see some replying.

On Thu, Mar 09, 2017 at 04:53:05PM +0000, Suzuki K Poulose wrote:

quoted

On 03/03/17 06:00, Leo Yan wrote:

quoted

Coresight includes debug module and usually the module connects with CPU
debug logic. ARMv8 architecture reference manual (ARM DDI 0487A.k) has
description for related info in "Part H: External Debug".

Chapter H7 "The Sample-based Profiling Extension" introduces several
sampling registers, e.g. we can check program counter value with
combined CPU exception level, secure state, etc. So this is helpful for
analysis CPU lockup scenarios, e.g. if one CPU has run into infinite
loop with IRQ disabled. In this case the CPU cannot switch context and
handle any interrupt (including IPIs), as the result it cannot handle
SMP call for stack dump.

This patch is to enable coresight debug module, so firstly this driver
is to bind apb clock for debug module and this is to ensure the debug
module can be accessed from program or external debugger. And the driver
uses sample-based registers for debug purpose, e.g. when system detects
the CPU lockup and trigger panic, the driver will dump program counter
and combined context registers (EDCIDSR, EDVIDSR); by parsing context
registers so can quickly get to know CPU secure state, exception level,
etc.

The problem is, it is not guaranteed that the EDPCSR_Hi, EDCIDSR & EDVIDSR are
updated as a side effect of a memory mapped access (which is what we do here) to the
EDPCSR_Lo.

Section H.7.1.2 : Reads of EDPCSRs (in ARM DDI 0487A.k) :

"The indirect writes to EDCIDSR, EDVIDSR, and EDPCSRhi might not occur for a memory-mapped access
to the external debug interface. For more information, see Memory-mapped accesses to the external debug
interface on page H8-4968."

So we cannot really rely on the values in EDVIDSR which we use to make further decisions. So I
am wondering if this is really guranteed to be useful.

So this is caused by Software lock is locked?

Section H8.4.1: 

"Reads and writes have no side-effects. A side-effect is where a
direct read or a direct write of a register creates
an indirect write of the same or another register. When the Software
Lock is locked, the indirect write does
not occur."

quoted

Some of the debug module registers are located in CPU power domain, so
in the driver it has checked the power state for CPU before accessing
registers within CPU power domain. For most safe way to use this driver,
it's suggested to disable CPU low power states, this can simply set
"nohlt" in kernel command line.

Signed-off-by: Leo Yan <redacted>
---
drivers/hwtracing/coresight/Kconfig           |  10 +
drivers/hwtracing/coresight/Makefile          |   1 +
drivers/hwtracing/coresight/coresight-debug.c | 377 ++++++++++++++++++++++++++
3 files changed, 388 insertions(+)
create mode 100644 drivers/hwtracing/coresight/coresight-debug.c

diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index 130cb21..3ed651e 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig

@@ -89,4 +89,14 @@ config CORESIGHT_STM

	  logging useful software events or data coming from various entities
	  in the system, possibly running different OSs

+config CORESIGHT_DEBUG

To make it more specific, may be CORESIGHT_CPU_DEBUG ?

Will fix.

quoted

+	bool "CoreSight debug driver"

"Coresight CPU Debug driver"

Will fix.

quoted

+	depends on ARM || ARM64
+	help
+	  This driver provides support for coresight debugging module. This
+	  is primarily used to dump sample-based profiling registers for
+	  panic. To avoid lockups when accessing debug module registers,
+	  it is safer to disable CPU low power states (like "nohlt" on the
+	  kernel command line) when using this feature.
+

quoted

+#define EDPCSR_THUMB			BIT(0)
+#define EDPCSR_ARM_INST_MASK		GENMASK(31, 2)
+#define EDPCSR_THUMB_INST_MASK		GENMASK(31, 1)

We don't need two different masks. {ED/DBG}PCSR has only bit 0 reserved
for instruction set indication.

I think we need this two different masks. Please review below extra doc
for PC offset analysis in ARM ARM.

quoted

+#endif
+
+/* bits definition for EDPRSR */
+#define EDPRSR_DLK			BIT(6)
+#define EDPRSR_PU			BIT(0)
+
+
+static void debug_read_regs(struct debug_drvdata *drvdata)
+{
+	drvdata->edprsr = readl_relaxed(drvdata->base + EDPRSR);
+
+	if (!debug_access_permitted(drvdata))
+		return;
+
+	if (!drvdata->edpcsr_present)
+		return;
+
+	CS_UNLOCK(drvdata->base);
+
+	debug_os_unlock(drvdata);
+
+	drvdata->edpcsr = readl_relaxed(drvdata->base + EDPCSR);
+
+	/*
+	 * As described in ARM DDI 0487A.k, if the processing
+	 * element (PE) is in debug state, or sample-based
+	 * profiling is prohibited, EDPCSR reads as 0xFFFFFFFF;
+	 * EDCIDSR, EDVIDSR and EDPCSR_HI registers also become
+	 * UNKNOWN state. So directly bail out for this case.
+	 */
+	if (drvdata->edpcsr == EDPCSR_PROHIBITED) {
+		CS_LOCK(drvdata->base);
+		return;
+	}
+
+	/*
+	 * A read of the EDPCSR normally has the side-effect of
+	 * indirectly writing to EDCIDSR, EDVIDSR and EDPCSR_HI;
+	 * at this point it's safe to read value from them.
+	 */

See my comment above about the side effects of memory mapped access.

Yeah.

quoted

+	drvdata->edcidsr = readl_relaxed(drvdata->base + EDCIDSR);
+#ifdef CONFIG_64BIT
+	drvdata->edpcsr_hi = readl_relaxed(drvdata->base + EDPCSR_HI);
+#endif

quoted

+
+	if (drvdata->edvidsr_present)
+		drvdata->edvidsr = readl_relaxed(drvdata->base + EDVIDSR);
+
+	CS_LOCK(drvdata->base);
+}
+

quoted

+#ifndef CONFIG_64BIT

I guess this doesn't help for an ARMv8 32bit only core (e.g, Cortex-A32). And
unfortunately, there are conflicting definitions for the values for PCSROffset w.r.t
ARMv8 and ARMv7.

DBGDEVID1[3:0] For ARMv7 :

0000 - Sample offset applies based on the instruction state.
0001 - No offset applies.

EDDEVID1[3:0] For ARMv8 :
0000 - EDPCSR not implemented
0010 - EDPCSR implemented without offsets, but do not use in AArch32 state!

So there is no easy way to make sense of the value, unless you know which version
of the architecture is in use. Or may be we could co-relate it with the value from
DEVID.

i.e, EDPCSR is not implemented do not register this device, see comments on debug_probe().
( And we should also include the following test for 32bit code to see if edpcsr is implemented.
See comments on debug_init_arch_data() )


That way, we could use the following inference from the PCSROffset value :

0000 - Sample offset applies based on the instruction state (indicated by PCSR[0])
0001 - No offset applies.
0010 - No offset applies, but do not use in AArch32 mode

Just now I went through ARM ARM and ARMv8 ARM, this makes sense to me.
Thanks for good pointing for this.

quoted

+static bool debug_pc_has_offset(struct debug_drvdata *drvdata)
+{
+	u32 pcsr_offset;
+
+	pcsr_offset = drvdata->eddevid1 & EDDEVID1_PCSR_OFFSET_MASK;
+
+	return (pcsr_offset == EDDEVID1_PCSR_OFFSET_INS_SET);
+}
+
+static unsigned long debug_adjust_pc(struct debug_drvdata *drvdata,
+				     unsigned long pc)
+{
+	unsigned long arm_inst_offset = 0, thumb_inst_offset = 0;
+
+	if (debug_pc_has_offset(drvdata)) {
+		arm_inst_offset = 8;
+		thumb_inst_offset = 4;
+	}
+
+	/* Handle thumb instruction */
+	if (pc & EDPCSR_THUMB) {
+		pc = (pc & EDPCSR_THUMB_INST_MASK) - thumb_inst_offset;
+		return pc;
+	}
+
+	/*
+	 * Handle arm instruction offset, if the arm instruction
+	 * is not 4 byte alignment then it's possible the case
+	 * for implementation defined; keep original value for this
+	 * case and print info for notice.
+	 */
+	if (pc & BIT(1))
+		pr_emerg("Instruction offset is implementation defined\n");

I am struggling to find the any mention about this in the ARM ARM. Please could
you point me to it.

Sure, please see ARM DDI 0406C.b, chapter C11.11.34 "
DBGPCSR, Program Counter Sampling Register":

A profiling tool can use the value of the T bit to calculate the
instruction address as follows:

When an offset is applied to the sampled address
- if T is 0 and DBGPCSR[1] is 0, ((DBGPCSR[31:2] << 2) - 8) is the
address of the sampled ARM instruction
- if T is 0 and DBGPCSR[1] is 1, the instruction address is
IMPLEMENTATION DEFINED
- if T is 1, ((DBGPCSR[31:1] << 1) - 4) is the address of the sampled
Thumb or ThumbEE instruction.

When no offset is applied to the sampled address
-  if T is 0 and DBGPCSR[1] is 0, (DBGPCSR[31:2] << 2) is the address
of the sampled ARM instruction
-  if T is 0 and DBGPCSR[1] is 1, the instruction address is
IMPLEMENTATION DEFINED
- if T is 1, (DBGPCSR[31:1] << 1) is the address of the sampled Thumb
or ThumbEE instruction.

quoted

+static void debug_init_arch_data(void *info)
+{
+	struct debug_drvdata *drvdata = info;
+	u32 mode;
+
+	CS_UNLOCK(drvdata->base);
+
+	debug_os_unlock(drvdata);
+
+	/* Read device info */
+	drvdata->eddevid  = readl_relaxed(drvdata->base + EDDEVID);
+	drvdata->eddevid1 = readl_relaxed(drvdata->base + EDDEVID1);
+
+	/* Parse implementation feature */
+	mode = drvdata->eddevid & EDDEVID_PCSAMPLE_MODE;
+	if (mode == EDDEVID_IMPL_FULL) {
+		drvdata->edpcsr_present  = true;
+		drvdata->edvidsr_present = true;
+	} else if (mode == EDDEVID_IMPL_EDPCSR_EDCIDSR) {
+		drvdata->edpcsr_present  = true;
+		drvdata->edvidsr_present = false;

As discussed above, we need to consult the DEVID1:PCSROffset for AArch32 to decide
if we have the edpcsr implemented on ARMv8.

Yeah.

quoted

+	} else {
+		drvdata->edpcsr_present  = false;
+		drvdata->edvidsr_present = false;
+	}
+
+	CS_LOCK(drvdata->base);
+}
+
+static int debug_probe(struct amba_device *adev, const struct amba_id *id)
+{
+	void __iomem *base;
+	struct device *dev = &adev->dev;
+	struct debug_drvdata *drvdata;
+	struct resource *res = &adev->res;
+	struct device_node *np = adev->dev.of_node;
+	char buf[32];
+	static int debug_count;
+
+	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
+	if (!drvdata)
+		return -ENOMEM;
+
+	drvdata->cpu = np ? of_coresight_get_cpu(np) : 0;
+	drvdata->dev = &adev->dev;
+
+	dev_set_drvdata(dev, drvdata);
+
+	/* Validity for the resource is already checked by the AMBA core */
+	base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+
+	drvdata->base = base;
+
+	get_online_cpus();
+	per_cpu(debug_drvdata, drvdata->cpu) = drvdata;
+
+	if (smp_call_function_single(drvdata->cpu,
+				debug_init_arch_data, drvdata, 1))
+		dev_err(dev, "Debug arch init failed\n");

If this fails (say the CPU was offline), should we still return success ?
And may be we should check if the drvdata->edpcsr_present to detect if the CPU
implements the PC Sampling and return failure here if it doesn't.

Will fix.

quoted

+
+	put_online_cpus();
+
+	if (!debug_count++)
+		atomic_notifier_chain_register(&panic_notifier_list,
+					       &debug_notifier);
+

quoted

+	sprintf(buf, (char *)id->data, drvdata->cpu);
+	dev_info(dev, "%s initialized\n", buf);

This could simply be :
	dev_info(dev, "Coresight debug-CPU%d initialized\n", drvdata->cpu);

and get rid of the static string and the buffer, see below.

quoted

+	return 0;
+}
+
+static struct amba_id debug_ids[] = {
+	{       /* Debug for Cortex-A53 */
+		.id	= 0x000bbd03,
+		.mask	= 0x000fffff,

...

quoted

+		.data   = "Coresight debug-CPU%d",

I think this is pointless, as the debug area we are interested in is always associated
with a CPU, we could as well figure out what to print from the drvdata->cpu above.

I prefer to follow your suggestion for upper two comments; but I'd like
check with Mathieu, due I followed up Mathieu's suggestion to write
current code.

The end result is the same - I'm good either way.

Thanks,
Mathieu

Thanks,
Leo Yan

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help