Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: 2020-08-12 15:51:30
Also in:
dm-devel, linux-block, linux-fsdevel, linux-integrity, lkml
On Wed, 2020-08-12 at 10:15 -0400, Chuck Lever wrote:
quoted
On Aug 11, 2020, at 11:53 AM, James Bottomley [off-list ref] wrote: On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
[...]
quoted
quoted
quoted
and what is nice to have to speed up the verification process. The choice for the latter is cache or reconstruct depending on the resources available. If the tree gets cached on the server, that would be a server implementation detail invisible to the client.We assume that storage targets (for block or file) are not trusted. Therefore storage clients cannot rely on intermediate results (eg, middle nodes in a Merkle tree) unless those results are generated within the client's trust envelope.Yes, they can ... because supplied nodes can be verified. That's the whole point of a merkle tree. As long as I'm sure of the root hash I can verify all the rest even if supplied by an untrusted source. If you consider a simple merkle tree covering 4 blocks: R / \ H11 H12 / \ / \ H21 H22 H23 H24quoted
| | |B1 B2 B3 B4 Assume I have the verified root hash R. If you supply B3 you also supply H24 and H11 as proof. I verify by hashing B3 to produce H23 then hash H23 and H24 to produce H12 and if H12 and your supplied H11 hash to R the tree is correct and the B3 you supplied must likewise be correct.I'm not sure what you are proving here. Obviously this has to work in order for a client to reconstruct the file's Merkle tree given only R and the file content.
You implied the server can't be trusted to generate the merkel tree. I'm showing above it can because of the tree path based verification.
It's the construction of the tree and verification of the hashes that are potentially expensive. The point of caching intermediate hashes is so that the client verifies them as few times as possible. I don't see value in caching those hashes on an untrusted server -- the client will have to reverify them anyway, and there will be no savings.
I'm not making any claim about server caching, I'm just saying the client can request pieces of the tree from the server without having to reconstruct the whole thing itself because it can verify their correctness.
Cache once, as close as you can to where the data will be used.quoted
quoted
So: if the storage target is considered inside the client's trust envelope, it can cache or store durably any intermediate parts of the verification process. If not, the network and file storage is considered untrusted, and the client has to rely on nothing but the signed digest of the tree root. We could build a scheme around, say, fscache, that might save the intermediate results durably and locally.I agree we want caching on the client, but we can always page in from the remote as long as we page enough to verify up to R, so we're always sure the remote supplied genuine information.Agreed.quoted
quoted
quoted
quoted
For this reason, the idea was to save only the signature of the tree's root on durable storage. The client would retrieve that signature possibly at open time, and reconstruct the tree at that time.Right that's the integrity data you must have.quoted
Or the tree could be partially constructed on-demand at the time each unit is to be checked (say, as part of 2. above).Whether it's reconstructed or cached can be an implementation detail. You clearly have to reconstruct once, but whether you have to do it again depends on the memory available for caching and all the other resource calls in the system.quoted
The client would have to reconstruct that tree again if memory pressure caused some or all of the tree to be evicted, so perhaps an on-demand mechanism is preferable.Right, but I think that's implementation detail. Probably what we need is a way to get the log(N) verification hashes from the server and it's up to the client whether it caches them or not.Agreed, these are implementation details. But see above about the trustworthiness of the intermediate hashes. If they are conveyed on an untrusted network, then they can't be trusted either.Yes, they can, provided enough of them are asked for to verify. If you look at the simple example above, suppose I have cached H11 and H12, but I've lost the entire H2X layer. I want to verify B3 so I also ask you for your copy of H24. Then I generate H23 from B3 and Hash H23 and H24. If this doesn't hash to H12 I know either you supplied me the wrong block or lied about H24. However, if it all hashes correctly I know you supplied me with both the correct B3 and the correct H24.My point is there is a difference between a trusted cache and an untrusted cache. I argue there is not much value in a cache where the hashes have to be verified again.
And my point isn't about caching, it's about where the tree comes from. I claim and you agree the client can get the tree from the server a piece at a time (because it can path verify it) and doesn't have to generate it itself. How much of the tree the client has to store and whether the server caches, reads it in from somewhere or reconstructs it is an implementation detail. James