Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
From: Lars Schneider <hidden>
Date: 2016-09-13 22:05:03
On 10 Sep 2016, at 17:40, Torsten Bögershausen [off-list ref] wrote:
[]
One general question up here, more comments inline.
The current order for a clean-filter is like this, I removed the error handling:
int convert_to_git()
{
ret |= apply_filter(path, src, len, -1, dst, filter);
if (ret && dst) {
src = dst->buf;
len = dst->len;
}
ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
return ret | ident_to_git(path, src, len, dst, ca.ident);
}
The first step is the clean filter, the CRLF-LF conversion (if needed),
then ident.
The current implementation streams the whole file content to the filter,
(STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
This is to use the UNIX-like STDIN--STDOUT method for writing a filter.
However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
offer a sort of short-cut:
The filter reads from the file directly, and the output of the filter is
read into a STRBUF.Are you sure? As far as I understand the code the filter does not read from the file in any case today. The functions would_convert_to_git_filter_fd() and convert_to_git_filter_fd() just avoid avoid mapping the file in Git. The content is still streamed via pipes: https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4
It looks as if the multi-filter approach can use this in a similar way: Give the pathname to the filter, the filter opens the file for reading and stream the result via the pkt-line protocol into Git. This needs some more changes, and may be very well go into a separate patch series. (and should). What I am asking for: When a multi-filter is used, the content is handled to the filter via pkt-line, and the result is given to Git via pkt-line ? Nothing wrong with it, I just wonder, if it should be mentioned somewhere.
That is most certainly a good idea and the main reason I added "capabilities" to the protocol. Joey Hess worked on this topic (not merged, yet) and I would like to make this available to the long-running filter protocol as soon as the feature is available: http://public-inbox.org/git/1468277112-9909-1-git-send-email-joeyh@joeyh.name/
quoted
+sub packet_read { + my $buffer; + my $bytes_read = read STDIN, $buffer, 4; + if ( $bytes_read == 0 ) { + + # EOF - Git stopped talking to us! + exit(); + } + elsif ( $bytes_read != 4 ) { + die "invalid packet size '$bytes_read' field"; + }This is half-kosher, I would say, (And I really. really would like to see an implementation in C ;-)
Would you be willing to contribute a patch? :-)
A read function may look like this:
ret = read(0, &buffer, 4);
if (ret < 0) {
/* Error, nothing we can do */
exit(1);
} else if (ret == 0) {
/* EOF */
exit(0);
} else if (ret < 4) {
/*
* Go and read more, until we have 4 bytes or EOF or Error */
} else {
/* Good case, see below */
}I see. However, my intention was to provide an absolute minimal example to teach a reader how the protocol works. I consider all proper error handling an exercise for the reader ;-)
quoted
+#define CAP_CLEAN (1u<<0) +#define CAP_SMUDGE (1u<<1)Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ?
I had something like that but Junio suggested these names in V4: http://public-inbox.org/git/xmqq8twd8uld.fsf@gitster.mtv.corp.google.com/
quoted
+ + err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);Extra () needed ? More () in the code...
I thought it might improve readability, but I will remove them if you think this would be more consistent with existing Git code. Thanks, Lars