Thread (35 messages) 35 messages, 5 authors, 2019-01-17

Re: [PATCH 2/2] submodule: munge paths to submodule git directories

From: Stefan Beller <hidden>
Date: 2018-08-14 21:08:15

On Tue, Aug 14, 2018 at 11:57 AM Jonathan Nieder [off-list ref] wrote:
Hi,

Brandon Williams wrote:
quoted
On 08/09, Jeff King wrote:
quoted
quoted
One interesting thing about url-encoding is that it's not one-to-one.
This case could also be %2F, which is a different file (on a
case-sensitive filesystem). I think "%20" and "+" are similarly
interchangeable.

If we were decoding the filenames, that's fine. The round-trip is
lossless.

But that's not quite how the new code behaves. We encode the input and
then check to see if it matches an encoding we previously performed. So
if our urlencode routines ever change, this will subtly break.

I don't know how much it's worth caring about. We're not that likely to
change the routines ourself (though certainly a third-party
implementation would need to know our exact url-encoding decisions).
This is exactly the reason why I wanted to get some opinions on what the
best thing to do here would be.  I _think_ the best thing would probably
be to write a specific routine to do the conversion, and it wouldn't
even have to be all that complex.  Basically I'm just interested in
converting '/' characters so that things no longer behave like
nested directories.
First of all, I think the behavior with this patch is already much
better than the previous status quo.  I'm using the patch now and am
very happy with it.

Second, what if we store the pathname in config?  We already store the
URL there:

        [submodule "plugins/hooks"]
                url = https://gerrit.googlesource.com/plugins/hooks

So we could (as a followup patch) do something like

        [submodule "plugins/hooks"]
                url = https://gerrit.googlesource.com/plugins/hooks
                gitdirname = plugins%2fhooks

and use that for lookups instead of regenerating the directory name.
What do you think?
As I just looked at worktree code, this sounds intriguing for the wrong
reason (again), as a user may want to point the gitdirname to a repository
that they have already on disk outside the actual superproject. They
would be reinventing worktrees in the submodule space. ;-)

This would open up the security hole that we just had, again.
So we'd have to make sure that the gitdirname (instead of the
now meaningless subsection name) is proof to ../ attacks.

I feel uneasy about this as then the user might come in
and move submodules and repoint the gitdirname...
to a not url encoded path. Exposing this knob just
asks for trouble, no?

On the other hand, the only requirement for the "name" is
now uniqueness, and that is implied with subsections,
so I guess it looks elegant.

What would happen if gitdirname is changed as part of
history? (The same problem we have now with changing
the subsection name)

The more I think about it the less appealing this is, but it looks
elegant.

Stefan
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help