Thread (27 messages) 27 messages, 3 authors, 2016-09-13

Re: [PATCH v7 10/10] convert: add filter.<driver>.process option

From: Lars Schneider <hidden>
Date: 2016-09-13 22:05:03

On 10 Sep 2016, at 17:40, Torsten Bögershausen [off-list ref] wrote:

[]

One general question up here, more comments inline.
The current order for a clean-filter is like this, I removed the error handling:

int convert_to_git()
{
	ret |= apply_filter(path, src, len, -1, dst, filter);
	if (ret && dst) {
		src = dst->buf;
		len = dst->len;
	}
	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
	return ret | ident_to_git(path, src, len, dst, ca.ident);
}

The first step is the clean filter, the CRLF-LF conversion (if needed),
then ident.
The current implementation streams the whole file content to the filter,
(STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
This is to use the UNIX-like STDIN--STDOUT method for writing a filter.

However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
offer a sort of short-cut:
The filter reads from the file directly, and the output of the filter is
read into a STRBUF.
Are you sure? As far as I understand the code the filter does not read from 
the file in any case today.  The functions would_convert_to_git_filter_fd() and 
convert_to_git_filter_fd() just avoid avoid mapping the file in Git. The content 
is still streamed via pipes:
https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4

It looks as if the multi-filter approach can use this in a similar way:
Give the pathname to the filter, the filter opens the file for reading
and stream the result via the pkt-line protocol into Git.
This needs some more changes, and may be very well go into a separate patch
series. (and should).

What I am asking for:
When a multi-filter is used, the content is handled to the filter via pkt-line,
and the result is given to Git via pkt-line ?
Nothing wrong with it, I just wonder, if it should be mentioned somewhere.
That is most certainly a good idea and the main reason I added "capabilities"
to the protocol. Joey Hess worked on this topic (not merged, yet) and I would 
like to make this available to the long-running filter protocol as soon as the
feature is available:
http://public-inbox.org/git/1468277112-9909-1-git-send-email-joeyh@joeyh.name/

quoted
+sub packet_read {
+    my $buffer;
+    my $bytes_read = read STDIN, $buffer, 4;
+    if ( $bytes_read == 0 ) {
+
+        # EOF - Git stopped talking to us!
+        exit();
+    }
+    elsif ( $bytes_read != 4 ) {
+        die "invalid packet size '$bytes_read' field";
+    }
This is half-kosher, I would say,
(And I really. really would like to see an implementation in C ;-)
Would you be willing to contribute a patch? :-)

A read function may look like this:

  ret = read(0, &buffer, 4);
  if (ret < 0) {
    /* Error, nothing we can do */
    exit(1);
  } else if (ret == 0) {
    /* EOF  */
    exit(0);
  } else if (ret < 4) {
    /* 
     * Go and read more, until we have 4 bytes or EOF or Error */
  } else {
    /* Good case, see below */
  }
I see. However, my intention was to provide an absolute minimal
example to teach a reader how the protocol works. I consider
all proper error handling an exercise for the reader ;-)

quoted
+#define CAP_CLEAN    (1u<<0)
+#define CAP_SMUDGE   (1u<<1)
Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ?
I had something like that but Junio suggested these names in V4:
http://public-inbox.org/git/xmqq8twd8uld.fsf@gitster.mtv.corp.google.com/

quoted
+
+	err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);
Extra () needed ?
More () in the code...
I thought it might improve readability, but I will remove them
if you think this would be more consistent with existing Git code.


Thanks,
Lars
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help