Thread (32 messages) 32 messages, 11 authors, 2018-04-19

[PATCH 2/2] perf: add arm64 smmuv3 pmu driver

From: Leeder, Neil <hidden>
Date: 2017-08-07 21:18:43
Also in: lkml

Hi Robin,
Thank you for your comments.

On 8/7/2017 10:31 AM, Robin Murphy wrote:
On 04/08/17 20:59, Neil Leeder wrote:
quoted
PMUs are named smmu_0_<phys_addr_page> where <phys_addr_page>
is the physical page address of the SMMU PMCG.
For example, the SMMU PMCG at 0xff88840000 is named smmu_0_ff88840
This seems a bit rough - is it at feasible to at least chase the node
reference and namespace them by the associated component, e.g. something
like "arm-smmu-v3.x:pmcg.y"? The user can always dig the associated
physical address out of /proc/iomem if necessary.
That looks like it may be better - I'll look into it. 
quoted
Filtering by stream id is done by specifying filtering parameters
with the event. Options are:
  filter_enable    - 0 = no filtering, 1 = filtering enabled
  filter_span      - 0 = exact match, 1 = pattern match
  filter_sec       - applies to non-secure (0) or secure (1) namespace
I'm a little dubious as to how useful it is to expose this, since we
can't see the true value of SCR.SO so have no way of knowing what we'll
actually end up counting.
I can remove the sec filter.
quoted
+config ARM_SMMUV3_PMU
+	 bool "ARM SMMUv3 PMU"
+	 depends on PERF_EVENTS && ARM64 && ACPI
PERF_EVENTS is already a top-level dependency now.
OK
quoted
+#include <linux/msi.h>
Is MSI support planned?
Not in this patchset. I'll remove the include.
quoted
+#define SMMU_PMCG_EVCNTR0               0x0
+#define SMMU_PMCG_EVCNTR(n, stride)     (SMMU_PMCG_EVCNTR0 + (n) * (stride))
+#define SMMU_PMCG_EVTYPER0              0x400
+#define SMMU_PMCG_EVTYPER(n)            (SMMU_PMCG_EVTYPER0 + (n) * 4)
+#define SMMU_PMCG_EVTYPER_SEC_SID_SHIFT       30
+#define SMMU_PMCG_EVTYPER_SID_SPAN_SHIFT      29
+#define SMMU_PMCG_EVTYPER_EVENT_MASK          GENMASK(15, 0)
+#define SMMU_PMCG_SVR0                  0x600
+#define SMMU_PMCG_SVR(n, stride)        (SMMU_PMCG_SVR0 + (n) * (stride))
+#define SMMU_PMCG_SMR0                  0xA00
+#define SMMU_PMCG_SMR(n)                (SMMU_PMCG_SMR0 + (n) * 4)
+#define SMMU_PMCG_CNTENSET0             0xC00
+#define SMMU_PMCG_CNTENCLR0             0xC20
+#define SMMU_PMCG_INTENSET0             0xC40
+#define SMMU_PMCG_INTENCLR0             0xC60
+#define SMMU_PMCG_OVSCLR0               0xC80
+#define SMMU_PMCG_OVSSET0               0xCC0
+#define SMMU_PMCG_CAPR                  0xD88
+#define SMMU_PMCG_SCR                   0xDF8
+#define SMMU_PMCG_CFGR                  0xE00
+#define SMMU_PMCG_CFGR_SID_FILTER_TYPE        BIT(23)
+#define SMMU_PMCG_CFGR_CAPTURE                BIT(22)
+#define SMMU_PMCG_CFGR_MSI                    BIT(21)
+#define SMMU_PMCG_CFGR_RELOC_CTRS             BIT(20)
+#define SMMU_PMCG_CFGR_SIZE_MASK              GENMASK(13, 8)
+#define SMMU_PMCG_CFGR_SIZE_SHIFT             8
+#define SMMU_PMCG_CFGR_COUNTER_SIZE_32        31
+#define SMMU_PMCG_CFGR_NCTR_MASK              GENMASK(5, 0)
+#define SMMU_PMCG_CFGR_NCTR_SHIFT             0
+#define SMMU_PMCG_CR                    0xE04
+#define SMMU_PMCG_CR_ENABLE                   BIT(0)
+#define SMMU_PMCG_CEID0                 0xE20
+#define SMMU_PMCG_CEID1                 0xE28
+#define SMMU_PMCG_IRQ_CTRL              0xE50
+#define SMMU_PMCG_IRQ_CTRL_IRQEN              BIT(0)
+#define SMMU_PMCG_IRQ_CTRLACK           0xE54
+#define SMMU_PMCG_IRQ_CTRLACK_IRQEN           BIT(0)
+#define SMMU_PMCG_IRQ_CFG0              0xE58
+#define SMMU_PMCG_IRQ_CFG0_ADDR_MASK          GENMASK(51, 2)
+#define SMMU_PMCG_IRQ_CFG1              0xE60
+#define SMMU_PMCG_IRQ_CFG2              0xE64
+#define SMMU_PMCG_IRQ_STATUS            0xE68
+#define SMMU_PMCG_IRQ_STATUS_IRQ_ABT          BIT(0)
+#define SMMU_PMCG_AIDR                  0xE70
Several of these are unused (although at least IRQ0_CFG1 probably should
be, to zero it to a known state if we aren't supporting MSIs).
I can remove the unused defines and clear IRQ_CFG1.
quoted
+	bool reg_size_32;
This guy is redundant...
[...]
quoted
+	if (smmu_pmu->reg_size_32)
...since it would be just as efficient to directly test
smmu_pmu->counter_mask & BIT(32) here and below.
OK
quoted
+
+	for (i = 0; i < smmu_pmu->num_counters; i++) {
+		smmu_pmu_counter_disable(smmu_pmu, i);
+		smmu_pmu_interrupt_disable(smmu_pmu, i);
+	}
Surely it would be far quicker and simpler to do this?

	writeq(smmu_pmu->counter_present_mask,
		smmu_pmu->reg_base + SMMU_PMCG_CNTENCLR0);
	writeq(smmu_pmu->counter_present_mask,
		smmu_pmu->reg_base + SMMU_PMCG_INTENCLR0);
OK
quoted
+static inline bool smmu_pmu_has_overflowed(struct smmu_pmu *smmu_pmu, u64 ovsr)
+{
+	return !!(ovsr & smmu_pmu->counter_present_mask);
+}
Personally, I find these helpers abstracting simple reads/writes
actually make the code harder to follow, especially when they're each
used a grand total of once. That may well just be me, though.
At least this one will go away with the below change to the interrupt handler.
quoted
+	new = SMMU_COUNTER_RELOAD;
Given that we *are* following the "use the top counter bit as an
implicit overflow bit" pattern of arm_pmu, it feels a bit weird to not
use the actual half-maximum value here (especially since it's easily
computable from counter_mask). I'm about 85% sure it probably still
works, but as ever inconsistency breeds uncertainty.
I thought that if we're happy with BIT(31) working fine with 32-bit registers,
it should also work for larger registers, so there was no need to waste more of
their bits. But I can change it to be half-max for all of them.
quoted
+static irqreturn_t smmu_pmu_handle_irq(int irq_num, void *data)
+{
+	struct smmu_pmu *smmu_pmu = data;
+	u64 ovsr;
+	unsigned int idx;
+
+	ovsr = smmu_pmu_getreset_ovsr(smmu_pmu);
+	if (!smmu_pmu_has_overflowed(smmu_pmu, ovsr))
You have an architectural guarantee that unimplemented bits of OVSSET0
are RES0, so checking !ovsr is sufficient.
OK
quoted
+	/* Verify specified event is supported on this PMU */
+	event_id = get_event(event);
+	if ((event_id >= SMMU_MAX_EVENT_ID) ||
What about raw imp-def events?
I can keep the check for common events, but also allow any raw event
in the imp-def range.
quoted
+	    (!test_bit(event_id, smmu_pmu->supported_events))) {
+		dev_dbg_ratelimited(&smmu_pmu->pdev->dev,
+				    "Invalid event %d for this PMU\n",
+				    event_id);
+		return -EINVAL;
+	}
[...]
quoted
+static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
+{
+	struct smmu_pmu *smmu_pmu;
+	unsigned int target;
+
+	smmu_pmu = hlist_entry_safe(node, struct smmu_pmu, node);
Is it ever valid for node to be NULL? If we can't trust it to be one of
the PMUs we registered I think all bets are off anyway.
I was following the logic in arm-ccn.c and arm-cci.c. If it works for them
I would hope it works here too.
quoted
+
+	ceid_64 = readq(smmu_pmu->reg_base + SMMU_PMCG_CEID0);
+	ceid[0] = ceid_64 & GENMASK(31, 0);
It took a second look to determine that that masking does nothing...
quoted
+	ceid[1] = ceid_64 >> 32;
+	ceid_64 = readq(smmu_pmu->reg_base + SMMU_PMCG_CEID1);
+	ceid[2] = ceid_64 & GENMASK(31, 0);
+	ceid[3] = ceid_64 >> 32;
+	bitmap_from_u32array(smmu_pmu->supported_events, SMMU_MAX_EVENT_ID,
+			     ceid, SMMU_NUM_EVENTS_U32);
...but then the whole lot might be cleaner and simpler with a u64[2]
cast to u32* (or unioned to u32[4]) as necessary.
I've rewritten this about 4 different ways and didn't love any of them,
including this one. I can re-do this as you suggest.
quoted
+static struct platform_driver smmu_pmu_driver = {
+	.driver = {
+		.name = "arm-smmu-pmu",
Nit: "arm-smmu-v3-pmu" please, for consistency with the IOMMU driver
naming. There is a SMMUv2 PMU driver in the works, too ;)
ok

Thanks,

Neil
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help