Re: [RFC] IMA Log Snapshotting Design Proposal
From: Sush Shringarputale <hidden>
Date: 2023-08-14 21:43:03
Also in:
kexec, linux-integrity
Hello Mimi,

Thanks for your feedback on this.

On 8/11/2023 6:14 AM, Mimi Zohar wrote:
> Hi Sush, Tushar,
>
> On Tue, 2023-08-01 at 12:12 -0700, Sush Shringarputale wrote:
> > ================================================
> > | A. Problem Statement                         |
> > ================================================
> >
> > Depending on the IMA policy, the IMA log can consume a lot of Kernel memory on the device. For instance, the events for the following IMA policy entries may need to be measured in certain scenarios, but they can also lead to a verbose IMA log when the device is running for a long period of time.
> >
> > ┌───────────────────────────────────────┐
> > │# PROC_SUPER_MAGIC                     │
> > │measure fsmagic=0x9fa0                 │
> > │# SYSFS_MAGIC                          │
> > │measure fsmagic=0x62656572             │
> > │# DEBUGFS_MAGIC                        │
> > │measure fsmagic=0x64626720             │
> > │# TMPFS_MAGIC                          │
> > │measure fsmagic=0x01021994             │
> > │# RAMFS_MAGIC                          │
> > │measure fsmagic=0x858458f6             │
> > │# SECURITYFS_MAGIC                     │
> > │measure fsmagic=0x73636673             │
> > │# OVERLAYFS_MAGIC                      │
> > │measure fsmagic=0x794c7630             │
> > │# log, audit or tmp files              │
> > │measure obj_type=var_log_t             │
> > │measure obj_type=auditd_log_t          │
> > │measure obj_type=tmp_t                 │
> > └───────────────────────────────────────┘
> >
> > Secondly, certain devices are configured to take Kernel updates using Kexec soft-boot. The IMA log from the previous Kernel gets carried over and the Kernel memory consumption problem worsens when such devices undergo multiple Kexec soft-boots over a long period of time.
> >
> > The above two scenarios can cause IMA log to grow and consume Kernel memory.
> >
> > In addition, a large IMA log can add pressure on the network bandwidth when the attestation client sends it to remote-attestation-service.
> >
> > Truncating IMA log to reclaim memory is not feasible, since it makes the log go out of sync with the TPM PCR quote making remote attestation fail.
> >
> > A sophisticated solution is required which will help relieve the memory pressure on the device and continue supporting remote attestation without disruptions.
>
> If the problem is kernel memory, then using a single tmpfs file has already been proposed [1]. As entries are added to the measurement list, they are copied to the tmpfs file and removed from kernel memory.
> Userspace would still access the measurement list via the existing securityfs file. The IMA measurement list is a sequential file, allowing it to be read from an offset. How much or how little of the measurement list is read by the attestation client and sent to the attestation server is up to the attestation client/server.
>
> If the problem is not kernel memory, but memory pressure in general, then instead of a tmpfs file, the measurement list could similarly be copied to a single persistent file [1].
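A minimal sketch of the offset-based reading described above, from the point of view of an attestation client. The function name, its parameters and the idea of remembering a byte offset between calls are assumptions made for illustration; they are not part of the thread or of any existing tool. A real client would also have to parse record boundaries and notice when the log has been reset.

```python
def read_new_entries(path, offset):
    """Return (data, new_offset) for bytes added to `path` after `offset`.

    Hypothetical helper: `path` is the measurement list (or a stand-in file)
    and `offset` is the byte position reached by the previous call.
    """
    with open(path, "rb") as f:
        f.seek(offset)      # skip what was already sent to the verifier
        data = f.read()     # everything appended since the last call
    return data, offset + len(data)
```

On a system with IMA enabled, the path would typically be /sys/kernel/security/ima/binary_runtime_measurements, and the client would keep the returned offset for its next request.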
The vfs_tmpfile approach suggested in that RFC discussion was only discussed; no prototype was created at the time. We are now discussing the approach internally and will respond with more details.
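The point in the problem statement that truncation puts the log out of sync with the PCR quote can be illustrated with a small replay sketch. It assumes a SHA-1 PCR bank that starts at all zeroes and is extended as hash(old PCR value || event digest); the event digests are made up, and the helper names are hypothetical.

```python
import hashlib


def extend(pcr, digest):
    """One PCR extend step: new value = SHA-1(old value || event digest)."""
    return hashlib.sha1(pcr + digest).digest()


def replay(digests, start=bytes(20)):
    """Recompute the PCR value a verifier expects from a list of digests."""
    pcr = start
    for d in digests:
        pcr = extend(pcr, d)
    return pcr


# Made-up event digests standing in for IMA template digests.
events = [hashlib.sha1(f"event-{i}".encode()).digest() for i in range(5)]
quoted_pcr = replay(events)  # what the TPM would hold after all five events

full_ok = replay(events) == quoted_pcr            # complete log verifies
truncated_ok = replay(events[3:]) == quoted_pcr   # truncated log does not
# Replay can resume only if the PCR value at the cut point is known.
resumed_ok = replay(events[3:], start=replay(events[:3])) == quoted_pcr
```

Here `full_ok` and `resumed_ok` are true while `truncated_ok` is false, which is why the dropped entries, or a trustworthy record of the PCR state at the cut, have to remain available to the verifier.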
> > -------------------------------------------------------------------------------
> > ================================================
> > | B. Proposed Solution                         |
> > ================================================
> >
> > In this document, we propose an enhancement to the IMA subsystem to improve the long-running performance by snapshotting the IMA log, while still providing mechanisms to verify its integrity using the PCR quotes.
> >
> > The remainder of the document describes details of the proposed solution in the following sub-sections.
> >
> > - High-level Work-flow
> > - Snapshot Triggering Mechanism
> > - Design Choices for Storing Snapshots
> > - Attestation-Client and Remote-Attestation-Service Side Changes
> > - Example Walk-through
> > - Open Questions
> >
> > -------------------------------------------------------------------------------
> > ================================================
> > | B.1 High-level Work-flow                     |
> > ================================================
> >
> > Pre-requisites:
> > - IMA Integrity guarantees are maintained.
> >
> > The proposed high level work-flow of IMA log snapshotting is as follows:
> >
> > - A user-mode process will trigger the snapshot by opening a file in SysFS say /sys/kernel/security/ima/snapshot (referred to as sysk_ima_snapshot_file here onwards).
>
> Please fix the mailer so that it doesn't wrap sentences. Adding blank lines between bullets would improve readability.
Noted, will do.
> > - The Kernel will get the current TPM PCR values and PCR update counter [2] and store them as template data in a new IMA event "snapshot_aggregate". This event will be measured by IMA using critical data measurement functionality [1]. Recording regular IMA events will be paused while "snapshot_aggregate" is being computed using the existing IMA mutex lock.
> >
> > - Once the "snapshot_aggregate" is computed and measured in IMA log, the prior IMA events will be made available in the sysk_ima_snapshot_file.
> >
> > - The UM process will copy those IMA events from sysk_ima_snapshot_file to a snapshot file on disk chosen by UM (referred to as UM_snapshot_file here onwards). The location, file-system type, access permissions etc. of the UM_snapshot_file would be controlled by UM process itself.
> >
> > - Once UM is done copying the IMA events from sysk_ima_snapshot_file to UM_snapshot_file, it will indicate to the Kernel that the snapshot can be finalized by triggering a write with any data to the sysk_ima_snapshot_file. UM process cannot prevent the IMA log purge operation after this point.
> >
> > - The Kernel will truncate the current IMA log and clear HTable up to the "snapshot_aggregate" marker.
> >
> > - The Kernel will measure the PCR update counter as part of measuring snapshot_aggregate, so that it can be used by the remote attestation service for detecting missing events.
> >
> > - UM can prevent the IMA log purge by closing the sysk_ima_snapshot_file without performing a write operation on it. In this case, while the "snapshot_aggregate" marker may still be in the log, the event can be ignored since the previous entries in the IMA log will not be purged.
> >
> > Note:
> > - This work-flow should work when interleaved with Kexec 'load' and 'execute' events and should not cause IMA log + snapshot to go out of sync with PCR quotes.
> >
> > The implementation details are omitted from this document for brevity.
>
> This design seems overly complex and requires synchronization between the "snapshot" record and exporting the records from the measurement list. None of this would be necessary if the measurements were copied from kernel memory to a backing file (e.g. tmpfs), as described in [1].
>
> What is the real problem - kernel memory pressure, memory pressure in general, or disk space? Is the intention to remove or offload the exported measurements?
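For readers following the quoted work-flow, the user-mode side of the open / copy / write-to-finalize handshake could look roughly like the sketch below. It is hypothetical throughout: the snapshot file is only proposed in the RFC, and the sketch assumes exactly the semantics listed in the bullets (open triggers the snapshot, read returns the prior events, any write finalizes, close without a write keeps the kernel log). The function name and the `finalize` parameter are invented for illustration.

```python
import os
import shutil


def take_snapshot(sysk_ima_snapshot_file, um_snapshot_file, finalize=True):
    """Copy the pending IMA events to a user-chosen file, then optionally
    tell the kernel it may purge them. Returns the number of bytes saved.

    Hypothetical: relies on the interface semantics proposed in the RFC.
    """
    # In the proposal, opening the file is what triggers the snapshot.
    with open(sysk_ima_snapshot_file, "r+b", buffering=0) as src:
        with open(um_snapshot_file, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # ensure the copy is on disk first
        if finalize:
            src.write(b"1")         # any data; signals the copy is complete
        # Leaving without a write would keep the kernel's log intact.
    return os.path.getsize(um_snapshot_file)
```

Syncing the copy to disk before the write matters in this sketch because, per the quoted text, user mode cannot stop the purge once the write has been issued.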
The main concern is the memory pressure on both the kernel and the attestation client when it sends the request. The concern you bring up is valid and we are working on creating a prototype. There is no intention to remove the exported measurements.

- Sush
> Concerns:
> - Pausing extending the measurement list.
>
> [1] https://lore.kernel.org/linux-integrity/CAOQ4uxj4Pv2Wr1wgvBCDR-tnA5dsZT3rvdDzKgAH1aEV_-r9Qg@mail.gmail.com/#t