Re: VFIO no-iommu

From: Alex Williamson <hidden>
Date: 2015-12-18 14:38:42

On Fri, 2015-12-18 at 10:43 +0000, Yigit, Ferruh wrote:

On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson wrote:
<...>

quoted

Also I need to disable VFIO_CHECK_EXTENSION ioctl,
because in
vfio
module,
container->noiommu is not set before doing a
vfio_group_set_container()
and vfio_for_each_iommu_driver selects wrong driver.

Running CHECK_EXTENSION on a container without the group
attached is
only going to tell you what extensions vfio is capable of,
not
necessarily what extensions are available to you with that
group.
Is this just a general dpdk- vfio ordering bug?

Yes, that is how VFIO was implemented in DPDK. I was under
the
impression that checking extension before assigning devices
was
the
correct way to do things, so as to not to try anything we
know
would
fail anyway. Does this imply that CHECK_EXTENSION needs to be
called
on both container and groups (or just on groups)?

Hmm, in Documentation/vfio.txt we do give the following
algorithm:

        if (ioctl(container, VFIO_GET_API_VERSION) !=
VFIO_API_VERSION)
                /* Unknown API version */

        if (!ioctl(container, VFIO_CHECK_EXTENSION,
VFIO_TYPE1_IOMMU))
                /* Doesn't support the IOMMU driver we want. */
        ...

That's just going to query each iommu driver and we can't yet
say
whether
the group the user attaches to the container later will
actually
support that
extension until we try to do it, that would come at
VFIO_SET_IOMMU.
 So is
it perhaps a vfio bug that we're not advertising no-iommu until
the
group is
attached?  After all, we are capable of it with just an empty
container, just
like we are with type1, but we're going to fail SET_IOMMU for
the
wrong
combination.
 This is exactly the sort of thing that makes me glad we
reverted
it without
feedback from a working user driver.  Thanks,

Whether it should be considered a "bug" in VFIO or "by design" is
up
to you, of course, but at least according to the VFIO
documentation,
we are meant to check for type 1 extension and then attach
devices,
so it would be expected to get VFIO_NOIOMMU_IOMMU marked as
supported
even without any devices attached to the container (just like we
get
type 1 as supported without any devices attached). Having said
that,
if it was meant to attach devices first and then check the
extensions, then perhaps the documentation should also point out
that
fact (or perhaps I missed that detail in my readings of the docs,
in
which case my apologies).

Hi Anatoly,

Does the below patch make it behave more like you'd expect.  This
applies to v4.4-rc4, I'd fold this into the base patch if we
reincorporate it to a future kernel.  Thanks,

Alex

commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
Author: Alex Williamson [off-list ref]
Date:   Wed Dec 16 19:02:01 2015 -0700

    vfio: Fix no-iommu CHECK_EXTENSION
    
    Previously the no-iommu iommu driver was only visible when the
    container had an attached no-iommu group.  This means that
    CHECK_EXTENSION on and empty container couldn't report the
possibility
    of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not
the user
    can make use of it with the group, so this is
inconsistent.  Add the
    no-iommu iommu to the list of iommu drivers when enabled via
module
    option, but skip all the others if the container is attached to
a
    no-iommu groups.  Note that tainting is now done with the
"unsafe"
    module callback rather than explictly within vfio.
    
    Also fixes module option and module description name
inconsistency.
    
    Also make vfio_noiommu_ops const.
    
    Signed-off-by: Alex Williamson [off-list ref]

Hi Alex,

I got following crash with this update:

[  +0.046973] BUG: unable to handle kernel NULL pointer dereference
at           (null)
[  +0.000031] IP: [<ffffffff813b92cf>] __list_add+V0x1f/0xc0
[  +0.000024] PGD 0 
[  +0.000010] Oops: 0000 [#1] SMP 
...
[  +0.000028] Call Trace:
[  +0.000014]  [<ffffffff81777301>] __mutex_lock_slowpath+0x91/0x110
[  +0.000022]  [<ffffffff817773a3>] mutex_lock+0x23/0x40
[  +0.000020]  [<ffffffffa0370c00>]
vfio_register_iommu_driver+0x40/0xc0 [vfio]
[  +0.000025]  [<ffffffffa0370cd1>] noiommu_param_set+0x51/0x60
[vfio]
[  +0.000022]  [<ffffffff810bbe3e>] parse_args+0x1be/0x4a0
[  +0.000021]  [<ffffffff81121ea0>] load_module+0xe30/0x2730
[  +0.000942]  [<ffffffff8111f500>] ? __symbol_put+0x60/0x60
[  +0.000921]  [<ffffffff81223f60>] ? kernel_read+0x50/0x80
[  +0.000917]  [<ffffffff811239f9>] SyS_finit_module+0xb9/0xf0
[  +0.000925]  [<ffffffff817796ae>]
entry_SYSCALL_64_fastpath+0x12/0x71

quoted

-static struct vfio_iommu_driver vfio_noiommu_driver = {
-	.ops = &vfio_noiommu_ops,
+static int noiommu_param_set(const char *val, const struct
kernel_param *kp)
+{
+	int ret;
+
+	if (!val)
+		val = "1";
+
+	ret = strtobool(val, kp->arg);
+	if (ret)
+		return ret;
+
+	if (noiommu)
+		ret =
vfio_register_iommu_driver(&vfio_noiommu_ops);

Is it possible that kernel_param_ops->set() called before
module_init() that initializes the mutex.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/
kernel/module.c?id=refs/tags/v4.4-rc5#n3517

Yes, guess I didn't think that one through very well.  I'll try again.
 Thank you for testing, sorry it was a dud.  Thanks,

Alex

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help