Thread: 36 messages, 3 authors, 2014-12-15

[PATCH 0/4] Generic IOMMU page table framework

From: Will Deacon <hidden>
Date: 2014-12-01 12:05:34
Also in: linux-iommu

On Sun, Nov 30, 2014 at 10:03:08PM +0000, Laurent Pinchart wrote:
> Hi Will,

Hi Laurent,

> On Thursday 27 November 2014 11:51:14 Will Deacon wrote:
> > Hi all,
> >
> > This series introduces a generic IOMMU page table allocation framework,
> > implements support for ARM long-descriptors and then ports the arm-smmu
> > driver over to the new code.
> >
> > There are a few reasons for doing this:
> >
> >   - Page table code is hard, and I don't enjoy shopping
> >
> >   - A number of IOMMUs actually use the same table format, but currently
> >     duplicate the code
> >
> >   - It provides a CPU (and architecture) independent allocator, which
> >     may be useful for some systems where the CPU is using a different
> >     table format for its own mappings
> >
> > As illustrated in the final patch, an IOMMU driver interacts with the
> > allocator by passing in a configuration structure describing the
> > input and output address ranges, the supported page sizes and a set of
> > ops for performing various TLB invalidation and PTE flushing routines.
> >
> > The LPAE code implements support for 4k/2M/1G, 16k/32M and 64k/512M
> > mappings, but I decided not to implement the contiguous bit in the
> > interest of trying to keep the code semi-readable. This could always be
> > added later, if needed.
>
> Do you have any idea how much the contiguous bit can improve performance in
> real use cases?

It depends on the TLB, really. Given that the contiguous sizes map directly
onto block sizes available with other granules, I didn't think the
complexity was worth it.

For example:

   4k granule : 16 contiguous entries => {64k, 32M, 16G}
  16k granule : 128 contiguous lvl3 entries => 2M
                32 contiguous lvl2 entries => 1G
  64k granule : 32 contiguous entries => {2M, 16G}

If we use block mappings, then we get:

   4k granule : 2M @ lvl2, 1G @ lvl1
  16k granule : 32M @ lvl2
  64k granule : 512M @ lvl2

so really, we only miss the ability to create 16G mappings. I doubt
that hardware even implements that size in the TLB (the contiguous bit
is only a hint).
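
The arithmetic behind the two tables can be checked with a short sketch. It is only an illustration, not code from the series: the function and table names are hypothetical, and it assumes 8-byte descriptors (so a table of one granule holds granule / 8 entries) together with the block levels and contiguous run lengths listed above.

```python
# Sketch: sizes reachable via block mappings vs. the contiguous hint.
# Assumption: 8-byte long-format descriptors, so one table of `granule`
# bytes holds granule // 8 entries. All names here are illustrative.

KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30


def level_size(granule, level):
    """Bytes mapped by a single entry at `level` (3 = page, 2 and 1 = block)."""
    entries_per_table = granule // 8
    return granule * entries_per_table ** (3 - level)


# Block mappings as listed in the mail: granule -> levels that offer blocks.
BLOCK_LEVELS = {4 * KIB: (2, 1), 16 * KIB: (2,), 64 * KIB: (2,)}

# Contiguous run lengths as listed in the mail: granule -> {level: entries}.
CONT_ENTRIES = {
    4 * KIB: {3: 16, 2: 16, 1: 16},
    16 * KIB: {3: 128, 2: 32},
    64 * KIB: {3: 32, 2: 32},
}


def page_sizes():
    """Plain page sizes, one per granule."""
    return set(BLOCK_LEVELS)


def block_sizes():
    """Sizes obtainable with a single block entry, across all granules."""
    return {level_size(g, lvl) for g, lvls in BLOCK_LEVELS.items() for lvl in lvls}


def contiguous_sizes():
    """Sizes obtainable with a full run of contiguous-hinted entries."""
    return {
        n * level_size(g, lvl)
        for g, runs in CONT_ENTRIES.items()
        for lvl, n in runs.items()
    }


def missing_sizes():
    """Contiguous sizes that no page or block mapping of any granule covers."""
    return contiguous_sizes() - block_sizes() - page_sizes()


if __name__ == "__main__":
    for size in sorted(missing_sizes()):
        print(f"{size // GIB}G")
```

Under those assumptions the only size left over is 16G; every other contiguous size coincides with a page or block size of some granule.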

On top of that, the contiguous bit leads to additional expense on unmap,
since you need extra TLB invalidation to split the range into
non-contiguous pages before you can do anything.

Will