Re: [RFC PATCH 00/10] perf: add workqueue library and use it in synthetic-events
From: Jiri Olsa <hidden>
Date: 2021-07-19 21:43:42
On Tue, Jul 13, 2021 at 02:11:11PM +0200, Riccardo Mancini wrote:
This patchset introduces a new utility library inside perf/util that provides a work queue abstraction loosely following the kernel workqueue API.

The workqueue abstraction is made up of two components:
 - threadpool: takes care of managing a pool of threads. It is inspired by the prototype for threaded trace in perf-record from Alexey:
   https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@linux.intel.com/
 - workqueue: manages a shared queue and provides the workers implementation.

On top of the workqueue, a simple parallel-for utility is implemented, which is then showcased in synthetic-events.c, replacing the previous manually created pthreads.

Through some experiments with perf bench, I can see that the new workqueue has a higher overhead than manual creation of threads, but partitions work among threads more effectively, yielding a better result with more threads.
Furthermore, the overhead could be tuned by changing `work_size` (currently 1), i.e. the number of dirents that a thread processes before grabbing a lock to get the next work item. I experimented with different sizes but, while bigger sizes reduce overhead as expected, they do not scale as well to more threads.

I tried to keep the patchset as simple as possible, deferring possible improvements and features to future work. Naming a few:
 - in order to achieve better performance, we could consider using work-stealing instead of a common queue.
 - affinities in the thread pool, as in Alexey's prototype for perf-record. Doing so would enable reusing the same threadpool for different purposes (evlist open, threaded trace, synthetic threads), avoiding having to spin up threads multiple times.
 - resizable threadpool, e.g. for lazy spawning of threads.
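To illustrate the `work_size` trade-off described above, here is a minimal sketch in plain pthreads. It is not the code from this patchset: `parallel_for_sketch`, `pf_state`, `pf_worker`, `item_fn`, `mark_item` and `PF_MAX_THREADS` are hypothetical names, and a shared index counter stands in for the queue of dirents. Each worker claims up to `work_size` indices per lock acquisition, so a larger `work_size` means fewer lock round-trips but coarser load balancing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PF_MAX_THREADS 64	/* arbitrary cap chosen for this sketch */

/* Hypothetical callback type: process item number idx. */
typedef void (*item_fn)(size_t idx, void *arg);

struct pf_state {
	pthread_mutex_t lock;	/* protects next */
	size_t next;		/* first index not yet handed out */
	size_t n_items;
	size_t work_size;	/* max indices claimed per lock acquisition */
	item_fn fn;
	void *arg;
};

static void *pf_worker(void *p)
{
	struct pf_state *st = p;

	for (;;) {
		size_t begin, end, remaining;

		/* Claim the next chunk of at most work_size indices. */
		pthread_mutex_lock(&st->lock);
		begin = st->next;
		assert(begin <= st->n_items);
		remaining = st->n_items - begin;
		end = begin + (remaining < st->work_size ? remaining : st->work_size);
		st->next = end;
		pthread_mutex_unlock(&st->lock);

		if (begin == end)
			break;	/* nothing left to hand out */

		/* Process the claimed chunk without holding the lock. */
		for (size_t i = begin; i < end; i++)
			st->fn(i, st->arg);
	}
	return NULL;
}

/*
 * Returns 0 if all threads were created and joined, -1 on bad arguments
 * or if a thread could not be created (items may then be unprocessed).
 */
static int parallel_for_sketch(size_t n_items, unsigned int n_threads,
			       size_t work_size, item_fn fn, void *arg)
{
	pthread_t tids[PF_MAX_THREADS];
	struct pf_state st = {
		.next = 0, .n_items = n_items, .work_size = work_size,
		.fn = fn, .arg = arg,
	};
	unsigned int started, i;
	int err = 0;

	if (!n_threads || n_threads > PF_MAX_THREADS || !work_size || !fn)
		return -1;
	if (pthread_mutex_init(&st.lock, NULL))
		return -1;

	for (started = 0; started < n_threads; started++) {
		if (pthread_create(&tids[started], NULL, pf_worker, &st)) {
			err = -1;
			break;
		}
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	pthread_mutex_destroy(&st.lock);
	return err;
}

/* Example callback: count visits per index in an int array. */
static void mark_item(size_t idx, void *arg)
{
	((int *)arg)[idx]++;
}
```

Calling `parallel_for_sketch(n, 4, 1, mark_item, hits)` takes the lock once per item, while a larger `work_size` amortizes the lock over several items at the cost of letting one thread end up with a long tail of work. Build with `-pthread`.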
@Arnaldo
Since I wanted the workqueue to provide a similar API to the Kernel's workqueue, I followed the naming style I found there, instead of the usual object__method style that is typically found in perf. Let me know if you'd like me to follow perf style instead.

Thanks,
Riccardo

Riccardo Mancini (10):
  perf workqueue: threadpool creation and destruction
  perf tests: add test for workqueue
  perf workqueue: add threadpool start and stop functions
  perf workqueue: add threadpool execute and wait functions
  perf workqueue: add sparse annotation header
  perf workqueue: introduce workqueue struct
  perf workqueue: implement worker thread and management
  perf workqueue: add queue_work and flush_workqueue functions
  perf workqueue: add utility to execute a for loop in parallel
  perf synthetic-events: use workqueue parallel_for

looks great, would it make sense to put this into libperf?

jirka

 tools/perf/tests/Build                 |   1 +
 tools/perf/tests/builtin-test.c        |   9 +
 tools/perf/tests/tests.h               |   3 +
 tools/perf/tests/workqueue.c           | 453 +++++++++++++++++
 tools/perf/util/Build                  |   1 +
 tools/perf/util/synthetic-events.c     | 131 +++--
 tools/perf/util/workqueue/Build        |   2 +
 tools/perf/util/workqueue/sparse.h     |  21 +
 tools/perf/util/workqueue/threadpool.c | 516 ++++++++++++++++++++
 tools/perf/util/workqueue/threadpool.h |  29 ++
 tools/perf/util/workqueue/workqueue.c  | 642 +++++++++++++++++++++++++
 tools/perf/util/workqueue/workqueue.h  |  38 ++
 12 files changed, 1771 insertions(+), 75 deletions(-)
 create mode 100644 tools/perf/tests/workqueue.c
 create mode 100644 tools/perf/util/workqueue/Build
 create mode 100644 tools/perf/util/workqueue/sparse.h
 create mode 100644 tools/perf/util/workqueue/threadpool.c
 create mode 100644 tools/perf/util/workqueue/threadpool.h
 create mode 100644 tools/perf/util/workqueue/workqueue.c
 create mode 100644 tools/perf/util/workqueue/workqueue.h

--
2.31.1