Thread (26 messages) 26 messages, 9 authors, 2015-11-04

Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device

From: Gabriel L. Somlo <hidden>
Date: 2015-11-04 20:48:43
Also in: kernelnewbies, lkml, qemu-devel

On Mon, Oct 05, 2015 at 03:18:05PM +0200, Paolo Bonzini wrote:

On 05/10/2015 14:50, Peter Maydell wrote:
quoted
If you want to try to support "firmware might also be reading
fw_cfg at the same time as the kernel" this is a (painful)
problem regardless of how the kernel figures out whether a
fw_cfg device is present. I had assumed that one of the design
assumptions of this series was that firmware would only
read the fw_cfg before booting the guest kernel and never touch
it afterwards. If it might touch it later then letting the
guest kernel also mess with fw_cfg seems like a really bad idea.
The idea of tinkering with fw_cfg from the AML code (DSDT/SSDT) has been
proposed many times, and always dropped.  One of the reasons was that
the OS could have a driver for fw_cfg.

So I think that we can define the QEMU0002 id as owned by the OSPM,
similar to the various standard ACPI ids that are usually found in the
x86 world (e.g. PNP0B00 is a mc146818 RTC, PNP0303 is an 8042 keyboard
controller, PNP0501 is a 16550 or similar UART, and so on).  This
basically sanctions _CRS as the way to pass information from the
firmware to the OSPM, also similarly to those standard PNP ids.
OK, so I don't expect to go the "pure ACPI" route in the final
version, mainly because I'm still hoping to fill the requirement
of writing a driver which can use either ACPI or DT to detect the
presence of fw_cfg (how I'm going to compile this on kernels with
no ACPI or no DT support is still TBD, and probably will have to
involve #ifdef, but I digress :)

However, Michael's idea of having ACPI supply an "accessor method" for
the guest kernel driver to call, without having to worry about the
specific details in _CRS, sounded intriguing enough to at least explore
in further detail.

So, assuming we have the following fw_cfg AML in ssdt (i386) or
dsdt (arm) (using cpp-style #ifdef to highlight the arch-specific
bits):

Scope (\_SB)
{
    Device (FWCF)
    {
        Name (_HID, EisaId ("QMU0002"))  // _HID: Hardware ID
        Name (_STA, 0x0B)  // _STA: Status

#ifdef ARCH_X86

        Name (_CRS, ResourceTemplate ()
        {
            IO (Decode16,
                0x0510,             // Range Minimum
                0x0510,             // Range Maximum
                0x01,               // Alignment
                0x02,               // Length
                )
        })

#else /* ARCH_ARM */

        NAME (_CRS, ResourceTemplate ()
        {
            Memory32Fixed (ReadWrite,
                           0x09020000,  // Address Base
                           0x0000000a,  // Address Length
                          )
        })

#endif

    }
}

I can easily patch QEMU to additionally insert the following into
"Device (FWCF)":

#ifdef ARCH_X86

        OperationRegion (FWCR, SystemIO, 0x0510, 0x02)
        Field (FWCR, WordAcc, NoLock, Preserve)
        {
            FWCC,   16
        }
        Field (FWCR, ByteAcc, NoLock, Preserve)
        {
            Offset (0x01),
            FWCD,   8
        }

#else /* ARCH_ARM */

        OperationRegion (FWCR, SystemMemory, 0x09020000, 0x0a)
        Field (FWCR, WordAcc, NoLock, Preserve)
        {
            Offset (0x08),
            FWCC,   16  // not sure about endianness on ARM here, but
                        // I can still address this from the userspace
                        // kernel driver if needed
        }
        Field (FWCR, ByteAcc, NoLock, Preserve)
        {
            FWCD,   8
        }

#endif

        Method (RDBL, 3, Serialized) // (Arg0 = key, Arg1 = skip, Arg2 = count)
        {
            FWCC = Arg0
            Local0 = Zero
            While ((Local0 < Arg1))
            {
                Local1 = FWCD
                Local0++
            }
            Name (BBUF, Buffer (Arg2) {})
            Local0 = Zero
            While ((Local0 < Arg2))
            {
                Index (BBUF, Local0) = FWCD /* \_SB_.FWCF.FWCD */
                Local0++
            }
            Return (BBUF) /* \_SB_.FWCF.RDBL.BBUF */
        }

With the host generating the additional RDBL method above, I could
write a "pure ACPI" driver for the guest kernel, where instead of the
current fw_cfg_read_blob() logic:


  static DEFINE_MUTEX(fw_cfg_dev_lock);
  static bool fw_cfg_is_mmio;

  static inline u16 fw_cfg_sel_endianness(u16 key)
  {
          return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
  }

  static inline void fw_cfg_read_blob(u16 key,
                                      void *buf, loff_t pos, size_t count)
  {
          mutex_lock(&fw_cfg_dev_lock);
          iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
          while (pos-- > 0)
                  ioread8(fw_cfg_reg_data);
          ioread8_rep(fw_cfg_reg_data, buf, count);
          mutex_unlock(&fw_cfg_dev_lock);
  }


I could instead write something like this:


  struct acpi_device *dev;    /* set during module init.  */

  static inline void fw_cfg_read_blob(u16 key,
                                      void *buf, loff_t pos, size_t count)
  {
          union acpi_object arg_elem[3], *obj;
          struct acpi_object_list arg;
          struct acpi_buffer acpi_buf = { ACPI_ALLOCATE_BUFFER, NULL };
          acpi_status status;

          arg.count = 3;
          arg.pointer = &arg_elem[0];

          arg_elem[0].type =
          arg_elem[1].type =
          arg_elem[2].type = ACPI_TYPE_INTEGER;

          arg_elem[0].integer.value = key;
          arg_elem[1].integer.value = pos;
          arg_elem[2].integer.value = count;

          status = acpi_evaluate_object_typed(dev->handle, "RDBL", &arg,
                                              &acpi_buf, ACPI_TYPE_BUFFER);
          if (ACPI_FAILURE(status)) {
              return; /* FIXME: actual error signaling to caller TBD */
          }
 
          obj = (union acpi_object *)acpi_buf.pointer;

          /* FIXME: in lieu of better error signaling to caller: */
          assert(count == obj->buffer.length);

          memcpy(buf, obj->buffer.pointer, obj->buffer.length);
          kfree(acpi_buf.pointer);
  }

Now, if ACPI-less DT-enabled architectures are to be supported, this
wouldn't get me out of having to spell out the original ioread8_rep()
based fw_cfg_read_blob(), so probably not worth doing.

But I simply *had* to try and chase this down before writing it off,
since part of the reason I'm doing all this in the first place is to
teach myself new tricks... :)

Sorry for going off on a somewhat lengthy (and maybe just a *little*
bit off-topic) tangent, but I figured I'd put it out there to at least
facilitate future archaeology :)

Obviously, any comments or feedback much appreciated!

Cheers,
--Gabriel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help