Re: [PATCH v3 34/35] remote-curl: implement stateless-connect command
From: Brandon Williams <hidden>
Date: 2018-02-28 20:21:52
On 02/27, Jonathan Nieder wrote:
Hi, Brandon Williams wrote:quoted
Teach remote-curl the 'stateless-connect' command which is used to establish a stateless connection with servers which support protocol version 2. This allows remote-curl to act as a proxy, allowing the git client to communicate natively with a remote end, simply using remote-curl as a pass through to convert requests to http.Cool! I better look at the spec for that first. *looks at the previous patch* Oh, there is no documented spec. :/ I'll muddle through this instead, then.
I'll make sure to add one :)
[...]quoted
--- a/remote-curl.c +++ b/remote-curl.c@@ -188,7 +188,10 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push) heads->version = discover_version(&reader); switch (heads->version) { case protocol_v2: - die("support for protocol v2 not implemented yet"); + /* + * Do nothing. Client should run 'stateless-connect' and + * request the refs themselves. + */ break;This is the 'list' command, right? Since we expect the client to run 'stateless-connect' instead, can we make it error out?
Yes and no. Remote-curl will run this when trying to establish a stateless-connection. If the response is v2 then this is a capability list and not refs. So the capabilities will be dumped to the client and they will be able to request the refs themselves at a later point. The comment here is just misleading, so i'll make sure to fix it.
[...]quoted
@@ -1082,6 +1085,184 @@ static void parse_push(struct strbuf *buf) free(specs); } +struct proxy_state { + char *service_name; + char *service_url; + struct curl_slist *headers; + struct strbuf request_buffer; + int in; + int out; + struct packet_reader reader; + size_t pos; + int seen_flush; +};Can this have a comment describing what it is/does? It's not obvious to me at first glance. It doesn't have to go in a lot of detail since this is an internal implementation detail, but something saying e.g. that this represents a connection to an HTTP server (that's an example; I'm not saying that's what it represents :)) would help.
Always making new code have higher standards than the existing code ;) Haha, I'll add a simple comment explaining it.
quoted
+ +static void proxy_state_init(struct proxy_state *p, const char *service_name, + enum protocol_version version)[...]quoted
+static void proxy_state_clear(struct proxy_state *p)Looks sensible. [...]quoted
+static size_t proxy_in(char *buffer, size_t eltsize, + size_t nmemb, void *userdata)Can this have a comment describing the interface? (E.g. does it read a single pkt_line? How is the caller expected to use it? Does it satisfy the interface of some callback?)
I'll add a comment that its used as a READFUNCTION callback for curl and that it tries to copy over a packet-line at a time.
libcurl's example https://curl.haxx.se/libcurl/c/ftpupload.html just calls this read_callback. Such a name plus a pointer to CURLOPT_READFUNCTION should do the trick; bonus points if the comment says what our implementation of the callback does. Is this about having peek ability?
No its just that Curl only requests a set about of data at a time so you need to be able to buffer the data that can't be read yet.
quoted
+ struct proxy_state *p = userdata; + size_t avail = p->request_buffer.len - p->pos; + + if (!avail) { + if (p->seen_flush) { + p->seen_flush = 0; + return 0; + } + + strbuf_reset(&p->request_buffer); + switch (packet_reader_read(&p->reader)) { + case PACKET_READ_EOF: + die("unexpected EOF when reading from parent process"); + case PACKET_READ_NORMAL: + packet_buf_write_len(&p->request_buffer, p->reader.line, + p->reader.pktlen); + break; + case PACKET_READ_DELIM: + packet_buf_delim(&p->request_buffer); + break; + case PACKET_READ_FLUSH: + packet_buf_flush(&p->request_buffer); + p->seen_flush = 1; + break; + } + p->pos = 0; + avail = p->request_buffer.len; + } + + if (max < avail) + avail = max; + memcpy(buffer, p->request_buffer.buf + p->pos, avail); + p->pos += avail; + return avail;This is a number of bytes, but CURLOPT_READFUNCTION expects a number of items, fread-style. That is: if (avail < eltsize) ... handle somehow, maybe by reading in more? ... avail_memb = avail / eltsize; memcpy(buffer, p->request_buffer.buf + p->pos, st_mult(avail_memb, eltsize)); p->pos += st_mult(avail_memb, eltsize); return avail_memb; But https://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html says Your function must then return the actual number of bytes that it stored in that memory area. Does this mean eltsize is always 1? This is super confusing... ... ok, a quick grep for fread_func in libcurl reveals that eltsize is indeed always 1. Can we add an assertion so we notice if that changes? if (eltsize != 1) BUG("curl read callback called with size = %zu != 1", eltsize); max = nmemb;
Yeah i can go ahead and do this. Just note that the v1 path uses logic identical to this so it would be a problem there.
[...]quoted
+static size_t proxy_out(char *buffer, size_t eltsize, + size_t nmemb, void *userdata) +{ + size_t size = eltsize * nmemb; + struct proxy_state *p = userdata; + + write_or_die(p->out, buffer, size); + return size; +}Nice. Same questions about st_mult or just asserting on eltsize apply here, too. [...]quoted
+static int proxy_post(struct proxy_state *p)What does this function do? Can it get a brief comment?
Will do.
quoted
+{ + struct active_request_slot *slot; + int err; + + slot = get_active_slot(); + + curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0); + curl_easy_setopt(slot->curl, CURLOPT_POST, 1); + curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url); + curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers); + + /* Setup function to read request from client */ + curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in); + curl_easy_setopt(slot->curl, CURLOPT_READDATA, p); + + /* Setup function to write server response to client */ + curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out); + curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p); + + err = run_slot(slot, NULL); + + if (err != HTTP_OK) + err = -1; + + return err;HTTP_OK is 0 but kind of obscures that. How about something like the following? if (run_slot(slot, NULL)) return -1; return 0; or if (run_slot(slot, NULL) != HTTP_OK) return -1; return 0; That way it's clearer that this always returns 0 or -1.
Sounds good.
[...]quoted
+static int stateless_connect(const char *service_name) +{ + struct discovery *discover; + struct proxy_state p; + + /* + * Run the info/refs request and see if the server supports protocol + * v2. If and only if the server supports v2 can we successfully + * establish a stateless connection, otherwise we need to tell the + * client to fallback to using other transport helper functions to + * complete their request. + */ + discover = discover_refs(service_name, 0); + if (discover->version != protocol_v2) { + printf("fallback\n"); + fflush(stdout); + return -1;Interesting. I wonder if we can make remote-curl less smart and drive this more from the caller. E.g. if the caller could do a single stateless request, they could do: option git-protocol version=2 stateless-request GET info/refs?service=git-upload-pack [pkt-lines, ending with a flush-pkt] The git-protocol option in this hypothetical example is the value to be passed in the Git-Protocol header. Then based on the response, the caller could decide to keep using stateless-request for further requests or fall back to "fetch". That way, if we implement some protocol v3, the remote helper would not have to be changed at all to handle it: the caller would instead make the new v3-format request and remote-curl would be able to oblige without knowing why they're doing it. What do you think?
I do see the draw for wanting this. I think a change like this requires a lot more refactoring, simply because with the current setup the fetch/ls-refs logic doesn't care that its talking through a remote-helper where if we went down that route it would need to be aware of that. -- Brandon Williams