Re: [PATCH 6/6] upload-pack: provide a hook for running pack-objects
From: Jeff King <hidden>
Date: 2016-06-16 02:19:31
On Thu, May 19, 2016 at 12:12:43PM +0200, Ævar Arnfjörð Bjarmason wrote:
On Thu, May 19, 2016 at 12:45 AM, Jeff King [off-list ref] wrote:
3. You may want to insert a caching layer around pack-objects; it is the most CPU- and memory-intensive part of serving a fetch, and its output is a pure function[1] of its input, making it an ideal place to consolidate identical requests.
Cool to see this on the list after we talked briefly about this at Git Merge. Being able to cache this so simply is a great optimization. As I recall you guys at GitHub ended up writing your own utility to cache output depending on stdin/argv because none existed already.
Yeah, we do have such a tool internally. It's possible we may one day open-source that, but there aren't plans to do so right now. I don't know whether this kind of caching would be useful to most sites or not. It's good if you have lots of clients asking you for the same thing at roughly the same time (say, somebody using "git pull" as a deploy mechanism from their AWS cluster), but otherwise not.
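For readers wondering what "cache output depending on stdin/argv" could look like, here is a minimal sketch. It is not the internal tool mentioned above; the function names and the cache directory layout are made up for illustration. It assumes the hook receives the pack-objects command line as its arguments and the request on stdin, and it buffers the whole pack in memory, which a real deployment would probably want to avoid.

```python
import hashlib
import os
import subprocess
import tempfile


def cache_key(argv, stdin_bytes):
    """Hash the command line and its input; identical requests share a key."""
    h = hashlib.sha256()
    h.update(str(len(argv)).encode("ascii") + b"\0")  # argument count first
    for arg in argv:
        h.update(arg.encode("utf-8", "surrogateescape"))
        h.update(b"\0")  # separator, so ["ab"] and ["a", "b"] differ
    h.update(stdin_bytes)
    return h.hexdigest()


def cached_run(argv, stdin_bytes, cache_dir):
    """Return the command's stdout, reusing a stored copy when one exists."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(argv, stdin_bytes))
    try:
        with open(path, "rb") as f:
            return f.read()  # cache hit: the command is not run at all
    except FileNotFoundError:
        pass

    # check=True raises on a non-zero exit, so failed runs are never cached.
    proc = subprocess.run(argv, input=stdin_bytes,
                          stdout=subprocess.PIPE, check=True)

    # Write to a temp file and rename it into place, so a concurrent
    # reader never sees a partially written pack.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(proc.stdout)
    os.replace(tmp, path)
    return proc.stdout
```

A hook script built on this would read all of stdin, call cached_run() with its own arguments as argv, and write the result to stdout. Expiring old entries and collapsing requests that arrive while the first one is still running are left out here, and the second of those is where most of the benefit would come from in the "many clients at the same time" case.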
So do I understand correctly that you're trying to guard against the
case where you e.g.:
rsync untrusted.example.com:/tmp/poison.git /tmp/
git clone /tmp/poison.git /tmp/safe.git
and that this should not hose your system if poison.git/config has an
uploadpack.packObjectsHook that's "sudo rm -rf /"?
I'm not that worried about this case, as it's just not that common. I
think we're more concerned with two cases:
1. multi-user servers where you ssh as yourself, but then access
repositories owned by somebody else. This is basically the ssh case
you described later.
2. hosting sites that run git-daemon as the "daemon" user, but serve
repositories owned by random untrusted users (where you would not
want those users to run arbitrary code as "daemon").
We've already accepted that "push" hooks like the pre-receive or update hook can do something malicious like this, so on one hand maybe we should say if you scp raw *.git repositories with hooks this sort of thing might happen, or if you ssh to a remote box and run their per-repo hooks it's really their problem to make sure their users don't run malicious hooks on your behalf.
Yeah, we make no promises for repositories that you push to. It's _only_ for the fetching side. It's kind of a funny distinction, but it's one we have maintained since the beginning of git, and I do think there are real sites that depend on it (see, e.g., the history of the post-upload-pack hook added in the v1.6.x time frame). Rsyncing a repository is generally of questionable safety. It's OK to fetch from the result, but certainly not to run "git log" (which can run arbitrary commands via external diff, etc).
But as you point out this makes the hook interface a bit unusual. Wouldn't this give us the same security and normalize the hook interface:
* Don't do the uploadpack.packObjectsHook variable, just have a normal "pack-objects" hook that works like any other git hook
* By default we don't run this hook unless core.runDangerousHooks (or whatever we call it) is true.
* The core.runDangerousHooks variable cannot be set on a per-repo basis using your new config facility.
* If there's a pack-objects hook and core.runDangerousHooks isn't true we warn "not executing potentially unsafe hook $path_to_hook" and carry on
This is the "could we just set a bool" option I discussed in the commit message. The problem is that it doesn't let the admin say "I don't trust these repositories, but I _do_ want to run just this one hook when serving them, and not any other hooks".
This would allow use-cases that are a bit inconvenient with your patch (again, if I'm understanding it correctly):
* I can set core.runDangerousHooks=true in /etc/gitconfig on my git server because I also control all the repos, and I want to experiment with trying this on a per-repo basis for users that are cloning from me.
* I can similarly play with this locally knowing I'm only cloning repos I trust by setting core.runDangerousHooks=true in ~/.gitconfig
Yes, those use cases are not well served by the git config alone. But you can do them (and much more) once your trusted hook is running (by checking $GIT_DIR, or looking in a database, or whatever you want).
-Peff
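To make the "checking $GIT_DIR" idea a little more concrete, here is a rough sketch of a trusted hook that applies its own per-repository policy. The repository path, the wrapper path, and the function names are all hypothetical; the only things taken from the discussion are that the hook is handed the pack-objects command line and can look at $GIT_DIR.

```python
import os


def choose_command(pack_objects_argv, git_dir, cached_repos, cache_wrapper):
    """Decide what to actually run for this fetch.

    pack_objects_argv: the command upload-pack would have run.
    cached_repos, cache_wrapper: site policy invented for this example.
    """
    repo = os.path.realpath(git_dir) if git_dir else None
    if repo is not None and repo in cached_repos:
        # Opted-in repository: put the wrapper in front of the real command.
        return list(cache_wrapper) + list(pack_objects_argv)
    # Everything else behaves as if no hook were configured.
    return list(pack_objects_argv)


def main(argv, environ=os.environ):
    # Hypothetical policy; a real site might consult a database instead.
    cached_repos = {os.path.realpath("/srv/git/big-project.git")}
    cmd = choose_command(argv, environ.get("GIT_DIR"), cached_repos,
                         ["/usr/local/bin/pack-cache"])
    # Replace this process, so stdin and stdout pass straight through.
    os.execvp(cmd[0], cmd)
```

The point of the split is that the admin's config names only this one script, and the script decides per repository, so nothing inside an untrusted repository gets to choose what runs.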