Re: [PATCH 6/6] upload-pack: provide a hook for running pack-objects

From: Jeff King <hidden>
Date: 2016-06-16 02:19:31

On Thu, May 19, 2016 at 12:12:43PM +0200, Ævar Arnfjörð Bjarmason wrote:

On Thu, May 19, 2016 at 12:45 AM, Jeff King [off-list ref] wrote:

  3. You may want to insert a caching layer around
     pack-objects; it is the most CPU- and memory-intensive
     part of serving a fetch, and its output is a pure
     function[1] of its input, making it an ideal place to
     consolidate identical requests.

Cool to see this on the list after we talked briefly about this at Git
Merge. Being able to cache this so simply is a great optimization.

As I recall you guys at GitHub ended up writing your own utility to
cache output depending on stdin/argv because none existed already.

Yeah, we do have such a tool internally. It's possible we may one day
open-source that, but there aren't plans to do so right now.

I don't know whether this kind of caching would be useful to most sites
or not. It's good if you have lots of clients asking you for the same
thing at roughly the same time (say, somebody using "git pull" as a
deploy mechanism from their AWS cluster), but otherwise not.
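To make the idea concrete, here is a minimal sketch of a wrapper that caches on stdin/argv. It is not the internal tool mentioned above. The function name cache_run, the cache directory layout, and the use of sha256sum are all assumptions made for illustration. It also assumes the hook is handed the pack-objects command line as its arguments, with pack-objects' stdin and stdout attached.

```shell
#!/bin/sh
# Hypothetical helper: cache_run CACHE_DIR CMD [ARGS...]
# Spools stdin, derives a key from the argument list plus stdin, and
# replays the stored stdout when the same request was seen before.
cache_run() {
    cache_dir=$1; shift
    mkdir -p "$cache_dir" || return 1

    # Spool stdin so it can be both hashed and fed to the command.
    input=$(mktemp "$cache_dir/in.XXXXXX") || return 1
    cat >"$input"

    # Key = argument count, NUL-terminated arguments, then raw stdin.
    key=$( { printf '%s\0' "$#" "$@"; cat "$input"; } | sha256sum | cut -d' ' -f1 )
    entry="$cache_dir/$key"

    if [ ! -f "$entry" ]; then
        tmp=$(mktemp "$cache_dir/out.XXXXXX") || { rm -f "$input"; return 1; }
        if "$@" <"$input" >"$tmp"; then
            # Rename into place so readers never see a partial entry.
            mv "$tmp" "$entry"
        else
            status=$?
            rm -f "$input" "$tmp"
            return "$status"
        fi
    fi
    rm -f "$input"
    cat "$entry"
}
```

A hook script built on this would end with something like `cache_run /var/cache/git-pack "$@"` (the path is made up). The sketch leaves out everything that makes a real cache hard: it does not stream output on a miss, it never evicts entries, and two identical requests arriving together will both run the command rather than one waiting on the other.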

So do I understand correctly that you're trying to guard against the
case where you e.g.:

    rsync untrusted.example.com:/tmp/poison.git /tmp/
    git clone /tmp/poison.git /tmp/safe.git

without hosing your system if poison.git/config has an
uploadpack.packObjectsHook that's "sudo rm -rf /"?

I'm not that worried about this case, as it's just not that common.  I
think we're more concerned with two cases:

  1. multi-user servers where you ssh as yourself, but then access
     repositories owned by somebody else. This is basically the ssh case
     you described later.

  2. hosting sites that run git-daemon as the "daemon" user, but serve
     repositories owned by random untrusted users (where you would not
     want those users to run arbitrary code as "daemon").

We've already accepted that "push" hooks like the pre-receive or
update hook can do something malicious like this, so on one hand maybe
we should say if you scp raw *.git repositories with hooks this sort
of thing might happen, or if you ssh to a remote box and run their
per-repo hooks it's really their problem to make sure their users
don't run malicious hooks on your behalf.

Yeah, we make no promises for repositories that you push to. It's _only_
for the fetching side. It's kind of a funny distinction, but it's one we
have maintained since the beginning of git, and I do think there are
real sites that depend on it (see, e.g., the history of the
post-upload-pack hook added in the v1.6.x time frame).

Rsyncing a repository is generally of questionable safety. It's OK to
fetch from the result, but certainly not to run "git log" (which can run
arbitrary commands via external diff, etc).

But as you point out this makes the hook interface a bit unusual.
Wouldn't this give us the same security and normalize the hook
interface:

 * Don't do the uploadpack.packObjectsHook variable, just have a
   normal "pack-objects" hook that works like any other git hook.
 * By default we don't run this hook unless core.runDangerousHooks (or
   whatever we call it) is true.
 * The core.runDangerousHooks variable cannot be set on a per-repo
   basis using your new config facility.
 * If there's a pack-objects hook and core.runDangerousHooks isn't
   true we warn "not executing potentially unsafe hook $path_to_hook"
   and carry on.

This is the "could we just set a bool" option I discussed in the commit
message. The problem is that it doesn't let the admin say "I don't trust
these repositories, but I _do_ want to run just this one hook when
serving them, and not any other hooks".

This would allow use-cases that are a bit inconvenient with your patch
(again, if I'm understanding it correctly):

 * I can set core.runDangerousHooks=true in /etc/gitconfig on my git
   server because I also control all the repos, and I want to
   experiment with trying this on a per-repo basis for users that are
   cloning from me.
 * I can similarly play with this locally, knowing I'm only cloning
   repos I trust, by setting core.runDangerousHooks=true in
   ~/.gitconfig.

Yes, those use cases are not well served by the git config alone. But
you can do them (and much more) once your trusted hook is running (by
checking $GIT_DIR, or looking in a database, or whatever you want).
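As a rough sketch of the $GIT_DIR approach: the trusted hook can consult an admin-owned allowlist and only take the experimental path for repositories listed there. The allowlist path, the EXPERIMENTAL_WRAPPER variable, and the function names are all hypothetical. It also assumes $GIT_DIR is set in the hook's environment, and resolves it first because it may be a relative path.

```shell
#!/bin/sh
# Hypothetical allowlist owned by the admin: one absolute repository
# path per line. Nothing here is read from the served repository.
ALLOWLIST=${ALLOWLIST:-/etc/git/pack-hook-repos}

# True if the repository directory given as $1 is on the allowlist.
repo_opted_in() {
    [ -n "$1" ] && [ -f "$ALLOWLIST" ] || return 1
    # Resolve to an absolute physical path before comparing.
    repo=$(CDPATH= cd -- "$1" 2>/dev/null && pwd -P) || return 1
    grep -Fxq -- "$repo" "$ALLOWLIST"
}

# Run the command given as arguments, through the experimental wrapper
# only when the current repository has been opted in.
hook_main() {
    if repo_opted_in "${GIT_DIR:-}" && [ -n "${EXPERIMENTAL_WRAPPER:-}" ]; then
        "$EXPERIMENTAL_WRAPPER" "$@"
    else
        "$@"
    fi
}
```

A real hook script would finish with `hook_main "$@"`. The point of the design is that the decision lives in files the admin controls, so an untrusted repository cannot opt itself in.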

-Peff
