Thread (329 messages) 329 messages, 12 authors, 2018-03-14

Re: [PATCH v3 34/35] remote-curl: implement stateless-connect command

From: Brandon Williams <hidden>
Date: 2018-02-28 20:21:52

On 02/27, Jonathan Nieder wrote:
Hi,

Brandon Williams wrote:
quoted
Teach remote-curl the 'stateless-connect' command which is used to
establish a stateless connection with servers which support protocol
version 2.  This allows remote-curl to act as a proxy, allowing the git
client to communicate natively with a remote end, simply using
remote-curl as a pass through to convert requests to http.
Cool!  I better look at the spec for that first.

*looks at the previous patch*

Oh, there is no documented spec. :/  I'll muddle through this instead,
then.
I'll make sure to add one :)
[...]
quoted
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -188,7 +188,10 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	heads->version = discover_version(&reader);
 	switch (heads->version) {
 	case protocol_v2:
-		die("support for protocol v2 not implemented yet");
+		/*
+		 * Do nothing.  Client should run 'stateless-connect' and
+		 * request the refs themselves.
+		 */
 		break;
This is the 'list' command, right?  Since we expect the client to run
'stateless-connect' instead, can we make it error out?
Yes and no.  Remote-curl will run this when trying to establish a
stateless-connection.  If the response is v2 then this is a capability
list and not refs.  So the capabilities will be dumped to the client and
they will be able to request the refs themselves at a later point.  The
comment here is just misleading, so i'll make sure to fix it.
[...]
quoted
@@ -1082,6 +1085,184 @@ static void parse_push(struct strbuf *buf)
 	free(specs);
 }
 
+struct proxy_state {
+	char *service_name;
+	char *service_url;
+	struct curl_slist *headers;
+	struct strbuf request_buffer;
+	int in;
+	int out;
+	struct packet_reader reader;
+	size_t pos;
+	int seen_flush;
+};
Can this have a comment describing what it is/does?  It's not obvious
to me at first glance.

It doesn't have to go in a lot of detail since this is an internal
implementation detail, but something saying e.g. that this represents
a connection to an HTTP server (that's an example; I'm not saying
that's what it represents :)) would help.
Always making new code have higher standards than the existing code ;)
Haha, I'll add a simple comment explaining it.
quoted
+
+static void proxy_state_init(struct proxy_state *p, const char *service_name,
+			     enum protocol_version version)
[...]
quoted
+static void proxy_state_clear(struct proxy_state *p)
Looks sensible.

[...]
quoted
+static size_t proxy_in(char *buffer, size_t eltsize,
+		       size_t nmemb, void *userdata)
Can this have a comment describing the interface?  (E.g. does it read
a single pkt_line?  How is the caller expected to use it?  Does it
satisfy the interface of some callback?)
I'll add a comment that its used as a READFUNCTION callback for curl and
that it tries to copy over a packet-line at a time.
libcurl's example https://curl.haxx.se/libcurl/c/ftpupload.html just
calls this read_callback.  Such a name plus a pointer to
CURLOPT_READFUNCTION should do the trick; bonus points if the comment 
says what our implementation of the callback does.

Is this about having peek ability?
No its just that Curl only requests a set about of data at a time so you
need to be able to buffer the data that can't be read yet.
quoted
+	struct proxy_state *p = userdata;
+	size_t avail = p->request_buffer.len - p->pos;
+
+	if (!avail) {
+		if (p->seen_flush) {
+			p->seen_flush = 0;
+			return 0;
+		}
+
+		strbuf_reset(&p->request_buffer);
+		switch (packet_reader_read(&p->reader)) {
+		case PACKET_READ_EOF:
+			die("unexpected EOF when reading from parent process");
+		case PACKET_READ_NORMAL:
+			packet_buf_write_len(&p->request_buffer, p->reader.line,
+					     p->reader.pktlen);
+			break;
+		case PACKET_READ_DELIM:
+			packet_buf_delim(&p->request_buffer);
+			break;
+		case PACKET_READ_FLUSH:
+			packet_buf_flush(&p->request_buffer);
+			p->seen_flush = 1;
+			break;
+		}
+		p->pos = 0;
+		avail = p->request_buffer.len;
+	}
+
+	if (max < avail)
+		avail = max;
+	memcpy(buffer, p->request_buffer.buf + p->pos, avail);
+	p->pos += avail;
+	return avail;
This is a number of bytes, but CURLOPT_READFUNCTION expects a number
of items, fread-style.  That is:

	if (avail < eltsize)
		... handle somehow, maybe by reading in more? ...

	avail_memb = avail / eltsize;
	memcpy(buffer,
	       p->request_buffer.buf + p->pos,
	       st_mult(avail_memb, eltsize));
	p->pos += st_mult(avail_memb, eltsize);
	return avail_memb;

But https://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html says

	Your function must then return the actual number of bytes that
	it stored in that memory area.

Does this mean eltsize is always 1?  This is super confusing...

... ok, a quick grep for fread_func in libcurl reveals that eltsize is
indeed always 1.  Can we add an assertion so we notice if that
changes?

	if (eltsize != 1)
		BUG("curl read callback called with size = %zu != 1", eltsize);
	max = nmemb;
Yeah i can go ahead and do this.  Just note that the v1 path uses logic
identical to this so it would be a problem there.
[...]
quoted
+static size_t proxy_out(char *buffer, size_t eltsize,
+			size_t nmemb, void *userdata)
+{
+	size_t size = eltsize * nmemb;
+	struct proxy_state *p = userdata;
+
+	write_or_die(p->out, buffer, size);
+	return size;
+}
Nice.  Same questions about st_mult or just asserting on eltsize apply
here, too.

[...]
quoted
+static int proxy_post(struct proxy_state *p)
What does this function do?  Can it get a brief comment?
Will do.
quoted
+{
+	struct active_request_slot *slot;
+	int err;
+
+	slot = get_active_slot();
+
+	curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
+	curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
+	curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
+	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers);
+
+	/* Setup function to read request from client */
+	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
+	curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
+
+	/* Setup function to write server response to client */
+	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
+	curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
+
+	err = run_slot(slot, NULL);
+
+	if (err != HTTP_OK)
+		err = -1;
+
+	return err;
HTTP_OK is 0 but kind of obscures that.  How about something like the
following?

	if (run_slot(slot, NULL))
		return -1;
	return 0;

or

	if (run_slot(slot, NULL) != HTTP_OK)
		return -1;
	return 0;

That way it's clearer that this always returns 0 or -1.
Sounds good.
[...]
quoted
+static int stateless_connect(const char *service_name)
+{
+	struct discovery *discover;
+	struct proxy_state p;
+
+	/*
+	 * Run the info/refs request and see if the server supports protocol
+	 * v2.  If and only if the server supports v2 can we successfully
+	 * establish a stateless connection, otherwise we need to tell the
+	 * client to fallback to using other transport helper functions to
+	 * complete their request.
+	 */
+	discover = discover_refs(service_name, 0);
+	if (discover->version != protocol_v2) {
+		printf("fallback\n");
+		fflush(stdout);
+		return -1;
Interesting.  I wonder if we can make remote-curl less smart and drive
this more from the caller.

E.g. if the caller could do a single stateless request, they could do:

	option git-protocol version=2
	stateless-request GET info/refs?service=git-upload-pack
	[pkt-lines, ending with a flush-pkt]

The git-protocol option in this hypothetical example is the value to
be passed in the Git-Protocol header.

Then based on the response, the caller could decide to keep using
stateless-request for further requests or fall back to "fetch".

That way, if we implement some protocol v3, the remote helper would
not have to be changed at all to handle it: the caller would instead
make the new v3-format request and remote-curl would be able to oblige
without knowing why they're doing it.

What do you think?
I do see the draw for wanting this.  I think a change like this requires
a lot more refactoring, simply because with the current setup the
fetch/ls-refs logic doesn't care that its talking through a
remote-helper where if we went down that route it would need to be aware
of that.

-- 
Brandon Williams
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help