Thread (27 messages) 27 messages, 4 authors, 2021-01-18

Re: [PATCH v3 4/4] arm64: mte: Optimize mte_assign_mem_tag_range()

From: Vincenzo Frascino <vincenzo.frascino@arm.com>
Date: 2021-01-16 17:23:37
Also in: lkml

Hi Mark,

On 1/15/21 3:45 PM, Mark Rutland wrote:
On Fri, Jan 15, 2021 at 12:00:43PM +0000, Vincenzo Frascino wrote:
quoted
mte_assign_mem_tag_range() is called on production KASAN HW hot
paths. It makes sense to optimize it in an attempt to reduce the
overhead.

Optimize mte_assign_mem_tag_range() based on the indications provided at
[1].
... what exactly is the optimization?

I /think/ you're just trying to have it inlined, but you should mention
that explicitly.
Good point, I will change it in the next version. I used "Optimize" as a
continuation of the topic in the previous thread but you are right it is not
immediately obvious.
quoted
[1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/ (local)

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
 arch/arm64/lib/mte.S         | 15 ---------------
 2 files changed, 25 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 1a715963d909..9730f2b07b79 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
 int mte_ptrace_copy_tags(struct task_struct *child, long request,
 			 unsigned long addr, unsigned long data);
 
-void mte_assign_mem_tag_range(void *addr, size_t size);
+static inline void mte_assign_mem_tag_range(void *addr, size_t size)
+{
+	u64 _addr = (u64)addr;
+	u64 _end = _addr + size;
+
+	/*
+	 * This function must be invoked from an MTE enabled context.
+	 *
+	 * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
+	 * size must be non-zero and MTE_GRANULE_SIZE aligned.
+	 */
+	do {
+		/*
+		 * 'asm volatile' is required to prevent the compiler to move
+		 * the statement outside of the loop.
+		 */
+		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
+			     :
+			     : "r" (_addr)
+			     : "memory");
+
+		_addr += MTE_GRANULE_SIZE;
+	} while (_addr < _end);
Is there any chance that this can be used for the last bytes of the
virtual address space? This might need to change to `_addr == _end` if
that is possible, otherwise it'll terminate early in that case.
Theoretically it is a possibility. I will change the condition and add a note
for that.
quoted
+}
What does the code generation look like for this, relative to the
assembly version?
The assembly looks like this:

 390:   8b000022        add     x2, x1, x0
 394:   aa0003e1        mov     x1, x0
 398:   d9200821        stg     x1, [x1]
 39c:   91004021        add     x1, x1, #0x10
 3a0:   eb01005f        cmp     x2, x1
 3a4:   54ffffa8        b.hi    398 <mte_set_mem_tag_range+0x48>

You can see the handcrafted one below.
Thanks,
Mark.
quoted
+
 
 #else /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
index 9e1a12e10053..a0a650451510 100644
--- a/arch/arm64/lib/mte.S
+++ b/arch/arm64/lib/mte.S
@@ -150,18 +150,3 @@ SYM_FUNC_START(mte_restore_page_tags)
 	ret
 SYM_FUNC_END(mte_restore_page_tags)
 
-/*
- * Assign allocation tags for a region of memory based on the pointer tag
- *   x0 - source pointer
- *   x1 - size
- *
- * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
- * size must be non-zero and MTE_GRANULE_SIZE aligned.
- */
-SYM_FUNC_START(mte_assign_mem_tag_range)
-1:	stg	x0, [x0]
-	add	x0, x0, #MTE_GRANULE_SIZE
-	subs	x1, x1, #MTE_GRANULE_SIZE
-	b.gt	1b
-	ret
-SYM_FUNC_END(mte_assign_mem_tag_range)
-- 
2.30.0
-- 
Regards,
Vincenzo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help