Re: [PATCH] attr: avoid recursion when expanding attribute macros

From: Patrick Steinhardt <hidden>
Date: 2025-11-12 06:57:21

On Tue, Nov 11, 2025 at 05:36:47PM -0500, Jeff King wrote:

Given a set of attribute macros like:

   [attr]a1 a2
   [attr]a2 a3
   ...
   [attr]a300000 -text
   file a1

expanding the attributes for "file" requires expanding "a1" to "a2",
"a2" to "a3", and so on until hitting a non-macro expansion ("-text", in
this case). We implement this via recursion: fill_one() calls
macroexpand_one(), which then recurses back to fill_one(). As a result,
very deep macro chains like the one above can run out of stack space and
cause us to segfault.

The required stack space is fairly small; I needed on the order of
200,000 entries to get a segfault on Linux. So it's unlikely anybody
would hit this accidentally, leaving only malicious inputs. There you
can easily construct a repo which will segfault on clone (we look at
attributes during the checkout step, but you'd see the same trying to do
other operations, like diff in a bare repo). It's mostly harmless, since
anybody constructing such a repo is only preventing victims from cloning
their evil garbage, but it could be a nuisance for hosting sites.

One option to prevent this is to limit the depth of recursion we'll
allow. This is conceptually easy to implement, but it raises other
questions: what should the limit be, and do we need a configuration knob
for it?

That's fair, and as you demonstrate it's easy enough to turn recursion
into iteration. But it doesn't really solve the main problem: given
malicious input we'd now still crash eventually, even though we
ourselves control how exactly we crash. The main difference is that with
iteration it'll both:

  - take longer for us to crash

  - require way more memory along the way

So the evil garbage would continue to be a nuisance for users who want
to clone such a repository, but now it's going to be more of a nuisance
for hosting sites given that it could lead to out-of-memory situations.

I guess the reasoning here is that for this to become a real problem the
".gitattributes" file would need to be excessively huge. We're probably
talking about many millions or even billions of attributes before this
could cause an OOM situation. And such a file would be large enough to
bust the typical limits that the likes of GitHub and GitLab have in
place, so Git hosters already protect themselves against this crafted
input, even if only indirectly so.

The other angle is of course the wasted compute that an adversary can
cause. But I don't really mind that too much: there's enough benign
operations that require a bunch of compute, so I don't really see a
reason why one would need to craft a "compute waster" with malicious
input.

So personally I would've probably leaned into the direction of enforcing
a hard limit. I don't see a reason why anybody would need more than a
couple of recursions, it culls both compute and memory growth, and it
allows us to have a proper error message in case the limit is busted.
Furthermore, we can demonstrate right now that it wasn't possible to
have unlimited recursion anyway, which makes it easier to put a new
limit into place.

But following my above reasoning I think it's okay to turn this into
iteration, as well, though, but I'd like to hear whether my train of
thought matches yours.

Thanks!

Patrick

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help