Thread (115 messages) 115 messages, 6 authors, 2019-07-11

Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/

From: Thomas Gummerer <hidden>
Date: 2019-03-30 19:27:44

On 03/30, Matheus Tavares Bernardino wrote:
On Fri, Mar 29, 2019 at 5:05 PM Thomas Gummerer [off-list ref] wrote:
quoted
On 03/29, Matheus Tavares Bernardino wrote:
quoted
Ok, what if instead of using linkat() we use 'realpath(const char
*path, char *resolved_path)', which will resolve any symlinks at
'path' and store the canonical path at 'resolved_path'? Then, we can
still keep using link() but now, with the certainty that all platforms
will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
Would that be a better idea?
Yeah, I think that is a good idea.  Note that 'realpath()' itself is
not used anywhere in our codebase either, but there is
'strbuf_realpath()', that from reading the function documentation does
exactly what 'realpath()' would do.  So using 'strbuf_realpath()'
would probably be the right thing to do here.
Thanks. While I was looking for realpath() at git codebase (before I
saw your email), I got a little confused: Besides strbuf_realpath() I
also found real_path(), real_path_if_valid() and real_pathdup(). All
these last three use strbuf_realpath() but they also initialize the
struct strbuf internally and just return a 'char *', which is much
convenient in some cases.
Right, feel free to use whichever is most convenient for you, and
whichever works in the context.
                           What seems weird to me is that, whilst
real_pathdup() releases the internally initialized struct strubuf
(leaving just the returned string to be free'd by the user), the other
two don't. So, if struct strbuf change in the future to have more
dynamic allocated resources, these functions will also have to be
modified. Also, since real_pathdup() can already do what the other two
do, do you know if there is a reason to keep all of them?
Right, '*dup()' functions usually leave the return value to be free'd
by the caller.  And while 'real_pathdup()' could do what the others do
already it also takes more effort to use it.  Users don't need to free
the return value from 'real_path()' to avoid a memory leak.  This
alone justifies its existence I think.
One last question: I found some places which don't free the string
returned by, for example, real_path() (e.g., find_worktree() at
worktree.c). Would it be a valid/good patch (or patches) to add free()
calls in this places? (I'm currently trying to get more people here at
USP to contribute to git, and maybe this could be a nice first
contribution for them...)
Trying to plug memory leaks in the codebase is definitely something
that I think is worthy of doing.  Sometimes it's not worth actually
free'ing the memory, for example just before the program exits, in
which case we can use the UNLEAK annotation.  It was introduced in
0e5bba53af ("add UNLEAK annotation for reducing leak false positives",
2017-09-08) if you want more background.

That said, the memory from 'real_path()' should actually not be
free'd.  The strbuf there has a static lifetime, so it is valid until
git exits.  If we were to free the return value of the function we'd
actually free an internal buffer of the strbuf, that is still valid.
So if someone were to use 'real_path()' after that, the memory that
strbuf still thinks it owns would actually have been free'd, which
would result in undefined behaviour, and probably would make git
segfault.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help