Thread: 36 messages, 11 authors, 2025-09-02

Re: [RFC PATCH v1 0/2] Add O_DENY_WRITE (complement AT_EXECVE_CHECK)

From: Andy Lutomirski <luto@kernel.org>
Date: 2025-09-01 16:01:29
Also in: linux-fsdevel, linux-integrity, linux-security-module, lkml

On Mon, Sep 1, 2025 at 4:06 AM Jann Horn [off-list ref] wrote:
On Thu, Aug 28, 2025 at 11:01 PM Serge E. Hallyn [off-list ref] wrote:
quoted
On Wed, Aug 27, 2025 at 05:32:02PM -0700, Andy Lutomirski wrote:
quoted
On Wed, Aug 27, 2025 at 5:14 PM Aleksa Sarai [off-list ref] wrote:
quoted
On 2025-08-26, Mickaël Salaün [off-list ref] wrote:
quoted
On Tue, Aug 26, 2025 at 11:07:03AM +0200, Christian Brauner wrote:
quoted
Nothing has changed in that regard and I'm not interested in stuffing
the VFS APIs full of special-purpose behavior to work around the fact
that this is work that needs to be done in userspace. Change the apps,
stop pushing more and more cruft into the VFS that has no business
there.
It would be interesting to know how to patch user space to get the same
guarantees...  Do you think I would propose a kernel patch otherwise?
You could mmap the script file with MAP_PRIVATE. This is the *actual*
protection the kernel uses against overwriting binaries (yes, ETXTBSY is
nice but IIRC there are ways to get around it anyway).
Wait, really?  MAP_PRIVATE prevents writes to the mapping from
affecting the file, but I don't think that writes to the file will
break the MAP_PRIVATE CoW if it's not already broken.

IPython says:

In [1]: import mmap, tempfile

In [2]: f = tempfile.TemporaryFile()

In [3]: f.write(b'initial contents')
Out[3]: 16

In [4]: f.flush()

In [5]: map = mmap.mmap(f.fileno(), f.tell(), flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)

In [6]: map[:]
Out[6]: b'initial contents'

In [7]: f.seek(0)
Out[7]: 0

In [8]: f.write(b'changed')
Out[8]: 7

In [9]: f.flush()

In [10]: map[:]
Out[10]: b'changed contents'
That was surprising to me, however, if I split the reader
and writer into different processes, so
Testing this in python is a terrible idea because it obfuscates the
actual syscalls from you.
quoted
P1:
f = open("/tmp/3", "w")
f.write('initial contents')
f.flush()

P2:
import mmap
f = open("/tmp/3", "r")
map = mmap.mmap(f.fileno(), f.tell(), flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)

Back to P1:
f.seek(0)
f.write('changed')

Back to P2:
map[:]

Then P2 gives me:

b'initial contents'
Because when you executed `f.write('changed')`, Python internally
buffered the write. "changed" is never actually written into the file
in your example. If you add a `f.flush()` in P1 after this, running
`map[:]` in P2 again will show you the new data.
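
A minimal sketch of the buffering effect described above, assuming CPython's default buffered text I/O on a regular file. The helper name buffered_write_sizes is hypothetical. It compares the file's on-disk size before and after flush():

```python
import os
import tempfile


def buffered_write_sizes() -> tuple[int, int]:
    """Return the on-disk file size before and after flush() (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            # The short string stays in Python's userspace buffer;
            # no write() syscall has been issued yet.
            f.write("changed")
            size_before_flush = os.stat(path).st_size
            # flush() hands the buffered bytes to the kernel.
            f.flush()
            size_after_flush = os.stat(path).st_size
        return size_before_flush, size_after_flush
    finally:
        os.unlink(path)
```

Until flush() runs, another process reading or mapping the file has nothing new to observe, which matches the P1/P2 result quoted above.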
These days, one can type in Python, ask an LLM to translate to C, and
get almost-correct output :)  Or one can use os.write(), which is
exactly what I should have done.

--Andy
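
For reference, a minimal sketch of the os.write() approach mentioned above, assuming Linux and a page-cache-backed filesystem. The function name is hypothetical. The os-level calls bypass Python's buffering, so each write reaches the file immediately. POSIX leaves it unspecified whether a MAP_PRIVATE mapping observes later writes to the underlying file, so the second returned value may differ on other systems:

```python
import mmap
import os
import tempfile


def private_mapping_after_file_write() -> tuple[bytes, bytes]:
    """Read a read-only MAP_PRIVATE mapping before and after the file is rewritten."""
    fd, path = tempfile.mkstemp()
    try:
        # os.write() issues the write() syscall directly, with no Python buffer.
        os.write(fd, b"initial contents")
        # Length 0 maps the whole file (16 bytes here).
        m = mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
        before = m[:]
        # Overwrite the first 7 bytes of the file through the descriptor,
        # not through the mapping.
        os.pwrite(fd, b"changed", 0)
        after = m[:]
        m.close()
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)
```

On the setup assumed here, the second value is expected to match the IPython session quoted earlier (b'changed contents'), since no copy-on-write has happened for the mapped page.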