Re: [PATCH] attr: avoid recursion when expanding attribute macros
From: Patrick Steinhardt <hidden>
Date: 2025-11-12 06:57:21
On Tue, Nov 11, 2025 at 05:36:47PM -0500, Jeff King wrote:
Given a set of attribute macros like:
[attr]a1 a2
[attr]a2 a3
...
[attr]a300000 -text
file a1
expanding the attributes for "file" requires expanding "a1" to "a2",
"a2" to "a3", and so on until hitting a non-macro expansion ("-text", in
this case). We implement this via recursion: fill_one() calls
macroexpand_one(), which then recurses back to fill_one(). As a result,
very deep macro chains like the one above can run out of stack space and
cause us to segfault.
The required stack space is fairly small; I needed on the order of
200,000 entries to get a segfault on Linux. So it's unlikely anybody
would hit this accidentally, leaving only malicious inputs. There you
can easily construct a repo which will segfault on clone (we look at
attributes during the checkout step, but you'd see the same trying to do
other operations, like diff in a bare repo). It's mostly harmless, since
anybody constructing such a repo is only preventing victims from cloning
their evil garbage, but it could be a nuisance for hosting sites.
One option to prevent this is to limit the depth of recursion we'll
allow. This is conceptually easy to implement, but it raises other
questions: what should the limit be, and do we need a configuration knob
for it?That's fair, and as you demonstrate it's easy enough to turn recursion into iteration. But it doesn't really solve the main problem: given malicious input we'd now still crash eventually, even though we ourselves control how exactly we crash. The main difference is that with iteration it'll both: - take longer for us to crash - require way more memory along the way So the evil garbage would continue to be a nuisance for users who want to clone such a repository, but now it's going to be more of a nuisance for hosting sites given that it could lead to out-of-memory situations. I guess the reasoning here is that for this to become a real problem the ".gitattributes" file would need to be excessively huge. We're probably talking about many millions or even billions of attributes before this could cause an OOM situation. And such a file would be large enough to bust the typical limits that the likes of GitHub and GitLab have in place, so Git hosters already protect themselves against this crafted input, even if only indirectly so. The other angle is of course the wasted compute that an adversary can cause. But I don't really mind that too much: there's enough benign operations that require a bunch of compute, so I don't really see a reason why one would need to craft a "compute waster" with malicious input. So personally I would've probably leaned into the direction of enforcing a hard limit. I don't see a reason why anybody would need more than a couple of recursions, it culls both compute and memory growth, and it allows us to have a proper error message in case the limit is busted. Furthermore, we can demonstrate right now that it wasn't possible to have unlimited recursion anyway, which makes it easier to put a new limit into place. But following my above reasoning I think it's okay to turn this into iteration, as well, though, but I'd like to hear whether my train of thought matches yours. Thanks! Patrick