Thread (72 messages) 72 messages, 9 authors, 2021-08-05

Re: [PATCH 06/14] peci: Add core infrastructure

From: "Winiarska, Iwona" <iwona.winiarska@intel.com>
Date: 2021-07-16 21:08:54
Also in: linux-arm-kernel, linux-aspeed, linux-doc, linux-hwmon, lkml, openbmc

On Wed, 2021-07-14 at 17:19 +0000, Williams, Dan J wrote:
On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
quoted
Intel processors provide access for various services designed to support
processor and DRAM thermal management, platform manageability and
processor interface tuning and diagnostics.
Those services are available via the Platform Environment Control
Interface (PECI) that provides a communication channel between the
processor and the Baseboard Management Controller (BMC) or other
platform management device.

This change introduces PECI subsystem by adding the initial core module
and API for controller drivers.

Co-developed-by: Jason M Bills <redacted>
Signed-off-by: Jason M Bills <redacted>
Co-developed-by: Jae Hyun Yoo <redacted>
Signed-off-by: Jae Hyun Yoo <redacted>
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
Reviewed-by: Pierre-Louis Bossart <redacted>
---
 MAINTAINERS             |   9 +++
 drivers/Kconfig         |   3 +
 drivers/Makefile        |   1 +
 drivers/peci/Kconfig    |  14 ++++
 drivers/peci/Makefile   |   5 ++
 drivers/peci/core.c     | 166 ++++++++++++++++++++++++++++++++++++++++
 drivers/peci/internal.h |  20 +++++
 drivers/peci/sysfs.c    |  48 ++++++++++++
 include/linux/peci.h    |  82 ++++++++++++++++++++
 9 files changed, 348 insertions(+)
 create mode 100644 drivers/peci/Kconfig
 create mode 100644 drivers/peci/Makefile
 create mode 100644 drivers/peci/core.c
 create mode 100644 drivers/peci/internal.h
 create mode 100644 drivers/peci/sysfs.c
 create mode 100644 include/linux/peci.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 6f77aaca2a30..47411e2b6336 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14495,6 +14495,15 @@ L:     platform-driver-x86@vger.kernel.org
 S:     Maintained
 F:     drivers/platform/x86/peaq-wmi.c
 
+PECI SUBSYSTEM
+M:     Iwona Winiarska [off-list ref]
+R:     Jae Hyun Yoo [off-list ref]
+L:     openbmc@lists.ozlabs.org (moderated for non-subscribers)
+S:     Supported
+F:     Documentation/devicetree/bindings/peci/
+F:     drivers/peci/
+F:     include/linux/peci.h
+
 PENSANDO ETHERNET DRIVERS
 M:     Shannon Nelson [off-list ref]
 M:     drivers@pensando.io
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 8bad63417a50..f472b3d972b3 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"
 source "drivers/counter/Kconfig"
 
 source "drivers/most/Kconfig"
+
+source "drivers/peci/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 27c018bdf4de..8d96f0c3dde5 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -189,3 +189,4 @@ obj-$(CONFIG_GNSS)          += gnss/
 obj-$(CONFIG_INTERCONNECT)     += interconnect/
 obj-$(CONFIG_COUNTER)          += counter/
 obj-$(CONFIG_MOST)             += most/
+obj-$(CONFIG_PECI)             += peci/
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
new file mode 100644
index 000000000000..601cc3c3c852
--- /dev/null
+++ b/drivers/peci/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menuconfig PECI
+       tristate "PECI support"
+       help
+         The Platform Environment Control Interface (PECI) is an interface
+         that provides a communication channel to Intel processors and
+         chipset components from external monitoring or control devices.
+
+         If you want PECI support, you should say Y here and also to the
+         specific driver for your bus adapter(s) below.
The user is reading this help text to decide if they want PECI
support, so clarifying that if they want PECI support they should turn
it on is not all that helpful. I would say "If you are building a
kernel for a Board Management Controller (BMC) say Y. If unsure say
N".
Since PECI is only available on Intel platforms, perhaps something
like:
"If you are building a Board Management Controller (BMC) kernel for
Intel platform say Y"?
quoted
+
+         This support is also available as a module. If so, the module
+         will be called peci.
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
new file mode 100644
index 000000000000..2bb2f51bcda7
--- /dev/null
+++ b/drivers/peci/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+# Core functionality
+peci-y := core.o sysfs.o
+obj-$(CONFIG_PECI) += peci.o
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
new file mode 100644
index 000000000000..0ad00110459d
--- /dev/null
+++ b/drivers/peci/core.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/bug.h>
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/idr.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/peci.h>
+#include <linux/pm_runtime.h>
+#include <linux/property.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+static DEFINE_IDA(peci_controller_ida);
+
+static void peci_controller_dev_release(struct device *dev)
+{
+       struct peci_controller *controller = to_peci_controller(dev);
+
+       mutex_destroy(&controller->bus_lock);
+}
+
+struct device_type peci_controller_type = {
+       .release        = peci_controller_dev_release,
+};
I have not read further than patch 6 in this set, so I'm hoping there
is an explanation for this. As it stands it looks like a red flag that
the release function is not actually releasing anything?
Ok, that's related to other comments here and in patch 7. I'll try to
refactor this. I'm thinking about splitting the "controller_add" into
separate "alloc" and "add" (or init? register?). And perhaps integrate
that into devm, so that controller can be allocated using devres, tying
that into lifetime of underlying platform device.
quoted
+
+int peci_controller_scan_devices(struct peci_controller *controller)
+{
+       /* Just a stub, no support for actual devices yet */
+       return 0;
+}
Move this to the patch where it is needed.
It's used in this patch (in sysfs and controller add), but at this
point we haven't introduced devices yet.
I would have to move this to patch 8 - but I don't think it belongs
there.
Will it make more sense if I introduce sysfs documentation here?
Or as a completely separate patch?
I wanted to avoid going too far with split granularity, and just go
with high-level concepts starting with the controller.
quoted
+
+/**
+ * peci_controller_add() - Add PECI controller
+ * @controller: the PECI controller to be added
+ * @parent: device object to be registered as a parent
+ *
+ * In final stage of its probe(), peci_controller driver should include calling
s/should include calling/calls/
Ok.
quoted
+ * peci_controller_add() to register itself with the PECI bus.
+ * The caller is responsible for allocating the struct
peci_controller and
+ * managing its lifetime, calling peci_controller_remove() prior
to releasing
+ * the allocation.
+ *
+ * It returns zero on success, else a negative error code
(dropping the
+ * controller's refcount). After a successful return, the caller
is responsible
+ * for calling peci_controller_remove().
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_controller_add(struct peci_controller *controller, struct
device *parent)
+{
+       struct fwnode_handle *node =
fwnode_handle_get(dev_fwnode(parent));
+       int ret;
+
+       if (WARN_ON(!controller->xfer))
Why WARN()? What is 'xfer', and what is likelihood the caller forgets
to set it? For something critical like this the WARN is likely
overkill.
Very unlikely - 'xfer' provides "connection" with hardware so it's
rather mandatory.
It indicates programmer error, so WARN() with all its consequences
(taint and so on) seemed adequate.

Do you suggest to downgrade it to pr_err()?
quoted
+               return -EINVAL;
+
+       ret = ida_alloc_max(&peci_controller_ida, U8_MAX,
GFP_KERNEL);
An '_add' function should just add, this seems to be doing more
"alloc". Speaking of which is there a peci_controller_alloc()?
Please see my previous comment (I'll try to refactor this).
quoted
+       if (ret < 0)
+               return ret;
+
+       controller->id = ret;
+
+       mutex_init(&controller->bus_lock);
+
+       controller->dev.parent = parent;
+       controller->dev.bus = &peci_bus_type;
+       controller->dev.type = &peci_controller_type;
+       controller->dev.fwnode = node;
+       controller->dev.of_node = to_of_node(node);
+
+       ret = dev_set_name(&controller->dev, "peci-%d", controller-
quoted
id);
+       if (ret)
+               goto err_id;
+
+       ret = device_register(&controller->dev);
+       if (ret)
+               goto err_put;
+
+       pm_runtime_no_callbacks(&controller->dev);
+       pm_suspend_ignore_children(&controller->dev, true);
+       pm_runtime_enable(&controller->dev);
+
+       /*
+        * Ignoring retval since failures during scan are non-
critical for
+        * controller itself.
+        */
+       peci_controller_scan_devices(controller);
+
+       return 0;
+
+err_put:
+       put_device(&controller->dev);
+err_id:
+       fwnode_handle_put(controller->dev.fwnode);
+       ida_free(&peci_controller_ida, controller->id);
I'd expect these to be released by ->release().
Ack.
quoted
+
+       return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
I think it's cleaner to declare symbol namespaces in the Makefile. In
this case, add:

cflags-y += -DDEFAULT_SYMBOL_NAMESPACE=PECI

...and just use EXPORT_SYMBOL_GPL as normal in the C file.
  
I kind of prefer the more verbose EXPORT_SYMBOL_NS_GPL - it also
doesn't "hide" the fact that we're using namespaces (everything is in
the C file rather than mixed into Makefile), but it's not a strong
opinion, so sure - I can change this.
quoted
+
+static int _unregister(struct device *dev, void *dummy)
+{
+       /* Just a stub, no support for actual devices yet */
At least for me, I think it wastes review time to consider empty
stubs. Just add the
whole thing back when it's actually used so it can be reviewed
properly for suitability.
Just like with peci_controller_scan_devices - logically it belongs to
the controller, and is used by the controller, it's just that the
devices will be added later in the series.
quoted
+       return 0;
+}
+
+/**
+ * peci_controller_remove - Delete PECI controller
+ * @controller: the PECI controller to be removed
+ *
+ * This call is used only by PECI controller drivers, which are
the only ones
+ * directly touching chip registers.
+ *
+ * Note that this function also drops a reference to the
controller.
+ */
+void peci_controller_remove(struct peci_controller *controller)
+{
+       pm_runtime_disable(&controller->dev);
+       /*
+        * Detach any active PECI devices. This can't fail, thus we
do not
+        * check the returned value.
+        */
+       device_for_each_child_reverse(&controller->dev, NULL,
_unregister);
How does the peci_controller_remove() get called with children still
beneath it? Can that possibility be precluded by arranging for
children to be removed first?
When we're unbinding the controller driver from its backing device (or
just removing the module) with children devices still present in the
system.

Yes, it could be precluded, but I don't think we should prevent this
(forcing the user to manually remove all the children devices first).
For example, given peci_controller_add is called from another's
driver
probe routine, this unregistration could be handled by a devm action.
Ok, I think this should just fall into place naturally after alloc/init
gets split.
quoted
+
+       device_unregister(&controller->dev);
+       fwnode_handle_put(controller->dev.fwnode);
+       ida_free(&peci_controller_ida, controller->id);
Another open coded copy of release code that belongs in ->release()?
Ack.
quoted
+}
+EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
+
+struct bus_type peci_bus_type = {
+       .name           = "peci",
+       .bus_groups     = peci_bus_groups,
+};
+
+static int __init peci_init(void)
+{
+       int ret;
+
+       ret = bus_register(&peci_bus_type);
+       if (ret < 0) {
+               pr_err("failed to register PECI bus type!\n");
+               return ret;
+       }
+
+       return 0;
+}
+subsys_initcall(peci_init);
You can't have subsys_initcall in a module. If you actually need
subsys_initcall then this can't be a module. Are you sure this can't
be module_init()?
Sure, I'll fix this in v2.
quoted
+
+static void __exit peci_exit(void)
+{
+       bus_unregister(&peci_bus_type);
+}
+module_exit(peci_exit);
+
+MODULE_AUTHOR("Jason M Bills [off-list ref]");
+MODULE_AUTHOR("Jae Hyun Yoo [off-list ref]");
+MODULE_AUTHOR("Iwona Winiarska [off-list ref]");
Is MAINTAINERS sufficient? Do you all want to be contacted by end
users, or just kernel developers. If it's the former then keep this,
if it's the latter then MAINTAINERS is sufficient.
It's the former.
quoted
+MODULE_DESCRIPTION("PECI bus core module");
+MODULE_LICENSE("GPL");
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
new file mode 100644
index 000000000000..80c61bcdfc6b
--- /dev/null
+++ b/drivers/peci/internal.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __PECI_INTERNAL_H
+#define __PECI_INTERNAL_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+
+struct peci_controller;
+struct attribute_group;
+
+extern struct bus_type peci_bus_type;
+extern const struct attribute_group *peci_bus_groups[];
+
+extern struct device_type peci_controller_type;
+
+int peci_controller_scan_devices(struct peci_controller
*controller);
+
+#endif /* __PECI_INTERNAL_H */
diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
new file mode 100644
index 000000000000..36c5e2a18a92
--- /dev/null
+++ b/drivers/peci/sysfs.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/peci.h>
+
+#include "internal.h"
+
+static int rescan_controller(struct device *dev, void *data)
+{
+       if (dev->type != &peci_controller_type)
+               return 0;
+
+       return
peci_controller_scan_devices(to_peci_controller(dev));
+}
+
+static ssize_t rescan_store(struct bus_type *bus, const char *buf,
size_t count)
+{
+       bool res;
+       int ret;
+
+       ret = kstrtobool(buf, &res);
+       if (ret)
+               return ret;
+
+       if (!res)
+               return count;
+
+       ret = bus_for_each_dev(&peci_bus_type, NULL, NULL,
rescan_controller);
+       if (ret)
+               return ret;
+
+       return count;
+}
+static BUS_ATTR_WO(rescan);
No Documentation/ABI entry for this attribute, which means I'm not
sure if it's suitable because it's unreviewable what it actually does
reviewing this patch as a standalone.
We're expecting to use "rescan" in the similar way as it is used for
PCIe or USB.
BMC can boot up when the system is still in S5 (without any guarantee
that it will ever change this state - the user can never turn the
platform on :) ). If the controller is loaded and the platform allows
it to discover devices - great (the scan happens as last step of
controller_add), if not - userspace can use rescan.

I'll add documentation in v2.
quoted
+
+static struct attribute *peci_bus_attrs[] = {
+       &bus_attr_rescan.attr,
+       NULL
+};
+
+static const struct attribute_group peci_bus_group = {
+       .attrs = peci_bus_attrs,
+};
+
+const struct attribute_group *peci_bus_groups[] = {
+       &peci_bus_group,
+       NULL
+};
diff --git a/include/linux/peci.h b/include/linux/peci.h
new file mode 100644
index 000000000000..cdf3008321fd
--- /dev/null
+++ b/include/linux/peci.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __LINUX_PECI_H
+#define __LINUX_PECI_H
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct peci_request;
+
+/**
+ * struct peci_controller - PECI controller
+ * @dev: device object to register PECI controller to the device
model
+ * @xfer: PECI transfer function
+ * @bus_lock: lock used to protect multiple callers
+ * @id: PECI controller ID
+ *
+ * PECI controllers usually connect to their drivers using non-
PECI bus,
+ * such as the platform bus.
+ * Each PECI controller can communicate with one or more PECI
devices.
+ */
+struct peci_controller {
+       struct device dev;
+       int (*xfer)(struct peci_controller *controller, u8 addr,
struct peci_request *req);
Each device will have a different way to do a PECI transfer?

I thought PECI was a standard...
The "standard" part only applies to the connection between the
controller and the devices - not the connection between controller and
the rest of the system on which the controller resides in.
xfer is vendor specific. 
quoted
+       struct mutex bus_lock; /* held for the duration of xfer */
What is it actually locking? For example, there is a mantra that goes
"lock data, not code", and this comment seems to imply that no
specific
data is being locked.
PECI-wire interface requires that the response follows the request -
and that should hold for all devices behind a given controller.
In other words, assuming that we have two devices, d1 and d2, we need
to have: d1.req, d1.resp, d2.req, d2.resp. Single xfer takes care of
both request and response.

I would like to eventually move that lock into individual controllers,
but before that happens - I'd like to have a reasoning behind it.
If we have interfaces that allow us to decouple requests from responses
or devices that can handle servicing more than one requests at a time,
the lock will go away from peci-core.
quoted
+       u8 id;
No possible way to have more than 256 controllers per system?
For real world scenarios - I expect single digit number of controllers
per system. The boards with HW compatible with "aspeed,ast2xxx-peci"
contain just one instance of this controller.
I expect more in the future (e.g. different "physical" transport), but
definitely not more than 256 per system.
quoted
+};
+
+int peci_controller_add(struct peci_controller *controller, struct
device *parent);
+void peci_controller_remove(struct peci_controller *controller);
+
+static inline struct peci_controller *to_peci_controller(void *d)
+{
+       return container_of(d, struct peci_controller, dev);
+}
+
+/**
+ * struct peci_device - PECI device
+ * @dev: device object to register PECI device to the device model
+ * @controller: manages the bus segment hosting this PECI device
+ * @addr: address used on the PECI bus connected to the parent
controller
+ *
+ * A peci_device identifies a single device (i.e. CPU) connected
to a PECI bus.
+ * The behaviour exposed to the rest of the system is defined by
the PECI driver
+ * managing the device.
+ */
+struct peci_device {
+       struct device dev;
+       struct peci_controller *controller;
Is the device a child of the controller? If yes, then no need for a a
separate pointer vs "to_peci_controller(peci_dev->parent)"
Yeah, it's redundant - I'll remove it.

Thank you
-Iwona

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help