Re: [PATCH 2/2] archive: avoid spawning `gzip`
From: René Scharfe <hidden>
Date: 2019-06-10 10:45:31
Subsystem:
kernel build + files below scripts/ (unless maintained elsewhere), the rest · Maintainers:
Nathan Chancellor, Nicolas Schier, Linus Torvalds
Am 01.05.19 um 20:18 schrieb Jeff King:
On Wed, May 01, 2019 at 07:45:05PM +0200, René Scharfe wrote:quoted
quoted
But since the performance is still not quite on par with `gzip`, I would actually rather not, and really, just punt on that one, stating that people interested in higher performance should use `pigz`.Here are my performance numbers for generating .tar.gz files again:
OK, tried one more version, with pthreads (patch at the end). Also redid all measurements for better comparability; everything is faster now for some reason (perhaps due to a compiler update? clang version 7.0.1-8 now): master, using gzip(1): Benchmark #1: git archive --format=tgz HEAD Time (mean ± σ): 15.697 s ± 0.246 s [User: 19.213 s, System: 0.386 s] Range (min … max): 15.405 s … 16.103 s 10 runs using zlib sequentially: Benchmark #1: git archive --format=tgz HEAD Time (mean ± σ): 19.191 s ± 0.408 s [User: 19.091 s, System: 0.100 s] Range (min … max): 18.802 s … 19.877 s 10 runs using a gzip-lookalike: Benchmark #1: git archive --format=tgz HEAD Time (mean ± σ): 16.289 s ± 0.218 s [User: 19.485 s, System: 0.337 s] Range (min … max): 16.020 s … 16.555 s 10 runs using zlib with start_async: Benchmark #1: git archive --format=tgz HEAD Time (mean ± σ): 16.516 s ± 0.334 s [User: 20.282 s, System: 0.383 s] Range (min … max): 16.166 s … 17.283 s 10 runs using zlib in a separate thread (that's the new one): Benchmark #1: git archive --format=tgz HEAD Time (mean ± σ): 16.310 s ± 0.237 s [User: 20.075 s, System: 0.173 s] Range (min … max): 15.983 s … 16.790 s 10 runs
I think the start_async() one seems like a good option. It reclaims most of the (wall-clock) performance, isn't very much code, and doesn't leave any ugly user-visible traces.
The pthreads numbers look a bit better still. The patch is huge though, because it duplicates almost everything. It was easier that way; a real patch series would extract functions that can be used both with static and allocated headers first, and keep everything in archive-tar.c.
I'd be fine to see it come later, though, on top of the patches Dscho is sending. Even though changing to sequential zlib is technically a change in behavior, the existing behavior wasn't really planned. And given the wall-clock versus CPU time tradeoff, it's not entirely clear that one solution is better than the other.
The current behavior is not an accident; the synchronous method was rejected in 2009 because it was slower [1]. Redid the measurements with v1.6.5-rc0 and the old patch [2], but they would only compile with gcc (Debian 8.3.0-6) for me, so it's not directly comparable to the numbers above: v1.6.5-rc0: Benchmark #1: ../git/git-archive HEAD | gzip Time (mean ± σ): 16.051 s ± 0.486 s [User: 19.514 s, System: 0.341 s] Range (min … max): 15.416 s … 17.001 s 10 runs v1.6.5-rc0 + [2]: Benchmark #1: ../git/git-archive --format=tar.gz HEAD Time (mean ± σ): 19.684 s ± 0.374 s [User: 19.601 s, System: 0.060 s] Range (min … max): 19.082 s … 20.177 s 10 runs User time is still slightly higher, but the difference is in the noise. [1] http://public-inbox.org/git/4AAAC8CE.8020302@lsrfire.ath.cx/ [2] http://public-inbox.org/git/4AA97B61.6030301@lsrfire.ath.cx/
quoted
quoted
And who knows, maybe nobody will complain at all about the performance?Probably. And popular tarballs would be cached anyway, I guess.At GitHub we certainly do cache the git-archive output. We'd also be just fine with the sequential solution. We generally turn down pack.threads to 1, and keep our CPUs busy by serving multiple users anyway. So whatever has the lowest overall CPU time is generally preferable, but the times are close enough that I don't think we'd care much either way (and it's probably not worth having a config option or similar).
Moving back to 2009 and reducing the number of utilized cores both feels weird, but the sequential solution *is* the most obvious, easiest and (by a narrow margin) lightest one if gzip(1) is not an option anymore. Anyway, the threading patch: --- Makefile | 1 + archive-tar.c | 11 +- archive-tgz.c | 452 ++++++++++++++++++++++++++++++++++++++++++++++++++ archive.h | 4 + 4 files changed, 465 insertions(+), 3 deletions(-) create mode 100644 archive-tgz.c
diff --git a/Makefile b/Makefile
index 8a7e235352..ed649ac18d 100644
--- a/Makefile
+++ b/Makefile@@ -834,6 +834,7 @@ LIB_OBJS += alloc.o LIB_OBJS += apply.o LIB_OBJS += archive.o LIB_OBJS += archive-tar.o +LIB_OBJS += archive-tgz.o LIB_OBJS += archive-zip.o LIB_OBJS += argv-array.o LIB_OBJS += attr.o
diff --git a/archive-tar.c b/archive-tar.c
index 3e53aac1e6..929eb58235 100644
--- a/archive-tar.c
+++ b/archive-tar.c@@ -15,7 +15,9 @@ static char block[BLOCKSIZE]; static unsigned long offset; -static int tar_umask = 002; +int tar_umask = 002; + +static const char internal_gzip[] = "git archive gzip"; static int write_tar_filter_archive(const struct archiver *ar, struct archiver_args *args);
@@ -445,6 +447,9 @@ static int write_tar_filter_archive(const struct archiver *ar, if (!ar->data) BUG("tar-filter archiver called with no filter defined"); + if (!strcmp(ar->data, internal_gzip)) + return write_tgz_archive(ar, args); + strbuf_addstr(&cmd, ar->data); if (args->compression_level >= 0) strbuf_addf(&cmd, " -%d", args->compression_level);
@@ -483,9 +488,9 @@ void init_tar_archiver(void) int i; register_archiver(&tar_archiver); - tar_filter_config("tar.tgz.command", "gzip -cn", NULL); + tar_filter_config("tar.tgz.command", internal_gzip, NULL); tar_filter_config("tar.tgz.remote", "true", NULL); - tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL); + tar_filter_config("tar.tar.gz.command", internal_gzip, NULL); tar_filter_config("tar.tar.gz.remote", "true", NULL); git_config(git_tar_config, NULL); for (i = 0; i < nr_tar_filters; i++) {
diff --git a/archive-tgz.c b/archive-tgz.c
new file mode 100644
index 0000000000..ae219e1cc0
--- /dev/null
+++ b/archive-tgz.c@@ -0,0 +1,452 @@ +#include "cache.h" +#include "config.h" +#include "tar.h" +#include "archive.h" +#include "object-store.h" +#include "streaming.h" + +#define RECORDSIZE (512) +#define BLOCKSIZE (RECORDSIZE * 20) + +static gzFile gzip; +static size_t offset; + +/* + * This is the max value that a ustar size header can specify, as it is fixed + * at 11 octal digits. POSIX specifies that we switch to extended headers at + * this size. + * + * Likewise for the mtime (which happens to use a buffer of the same size). + */ +#if ULONG_MAX == 0xFFFFFFFF +#define USTAR_MAX_SIZE ULONG_MAX +#else +#define USTAR_MAX_SIZE 077777777777UL +#endif +#if TIME_MAX == 0xFFFFFFFF +#define USTAR_MAX_MTIME TIME_MAX +#else +#define USTAR_MAX_MTIME 077777777777ULL +#endif + +static void tgz_write(const void *data, size_t size) +{ + const char *p = data; + while (size) { + size_t to_write = size; + if (to_write > UINT_MAX) + to_write = UINT_MAX; + if (gzwrite(gzip, p, to_write) != to_write) + die(_("gzwrite failed")); + p += to_write; + size -= to_write; + offset = (offset + to_write) % BLOCKSIZE; + } +} + +static void tgz_finish_record(void) +{ + size_t tail = offset % RECORDSIZE; + if (tail) { + size_t to_seek = RECORDSIZE - tail; + if (gzseek(gzip, to_seek, SEEK_CUR) < 0) + die(_("gzseek failed")); + offset = (offset + to_seek) % BLOCKSIZE; + } +} + +static void tgz_write_trailer(void) +{ + size_t to_seek = BLOCKSIZE - offset; + if (to_seek < 2 * RECORDSIZE) + to_seek += BLOCKSIZE; + if (gzseek(gzip, to_seek, SEEK_CUR) < 0) + die(_("gzseek failed")); + if (gzflush(gzip, Z_FINISH) != Z_OK) + die(_("gzflush failed")); +} + +struct work_item { + void *buffer; + size_t size; + int finish_record; +}; + +#define TODO_SIZE 64 +struct work_item todo[TODO_SIZE]; +static int todo_start; +static int todo_end; +static int todo_done; +static int all_work_added; +static pthread_mutex_t tar_mutex; +static pthread_t thread; + +static void tar_lock(void) +{ + pthread_mutex_lock(&tar_mutex); +} + +static void tar_unlock(void) +{ + pthread_mutex_unlock(&tar_mutex); +} + +static pthread_cond_t cond_add; +static pthread_cond_t cond_write; +static pthread_cond_t cond_result; + +static void add_work(void *buffer, size_t size, int finish_record) +{ + tar_lock(); + + while ((todo_end + 1) % ARRAY_SIZE(todo) == todo_done) + pthread_cond_wait(&cond_write, &tar_mutex); + + todo[todo_end].buffer = buffer; + todo[todo_end].size = size; + todo[todo_end].finish_record = finish_record; + + todo_end = (todo_end + 1) % ARRAY_SIZE(todo); + + pthread_cond_signal(&cond_add); + tar_unlock(); +} + +static struct work_item *get_work(void) +{ + struct work_item *ret = NULL; + + tar_lock(); + while (todo_start == todo_end && !all_work_added) + pthread_cond_wait(&cond_add, &tar_mutex); + + if (todo_start != todo_end || !all_work_added) { + ret = &todo[todo_start]; + todo_start = (todo_start + 1) % ARRAY_SIZE(todo); + } + tar_unlock(); + return ret; +} + +static void work_done(void) +{ + tar_lock(); + todo_done = (todo_done + 1) % ARRAY_SIZE(todo); + pthread_cond_signal(&cond_write); + + if (all_work_added && todo_done == todo_end) + pthread_cond_signal(&cond_result); + tar_unlock(); +} + +static void *run(void *arg) +{ + for (;;) { + struct work_item *w = get_work(); + if (!w) + break; + tgz_write(w->buffer, w->size); + free(w->buffer); + if (w->finish_record) + tgz_finish_record(); + work_done(); + } + return NULL; +} + +static void start_output_thread(void) +{ + int err; + + pthread_mutex_init(&tar_mutex, NULL); + pthread_cond_init(&cond_add, NULL); + pthread_cond_init(&cond_write, NULL); + pthread_cond_init(&cond_result, NULL); + + memset(todo, 0, sizeof(todo)); + + err = pthread_create(&thread, NULL, run, NULL); + if (err) + die(_("failed to create thread: %s"), strerror(err)); +} + +static void wait_for_output_thread(void) +{ + tar_lock(); + all_work_added = 1; + + while (todo_done != todo_end) + pthread_cond_wait(&cond_result, &tar_mutex); + + pthread_cond_broadcast(&cond_add); + tar_unlock(); + + pthread_join(thread, NULL); + + pthread_mutex_destroy(&tar_mutex); + pthread_cond_destroy(&cond_add); + pthread_cond_destroy(&cond_write); + pthread_cond_destroy(&cond_result); +} + +static int stream_blob(const struct object_id *oid) +{ + struct git_istream *st; + enum object_type type; + unsigned long sz; + ssize_t readlen; + size_t chunk_size = BLOCKSIZE * 10; + + st = open_istream(oid, &type, &sz, NULL); + if (!st) + return error(_("cannot stream blob %s"), oid_to_hex(oid)); + for (;;) { + char *buf = xmalloc(chunk_size); + readlen = read_istream(st, buf, chunk_size); + if (readlen <= 0) + break; + sz -= readlen; + add_work(buf, readlen, !sz); + } + close_istream(st); + return readlen; +} + +/* + * pax extended header records have the format "%u %s=%s\n". %u contains + * the size of the whole string (including the %u), the first %s is the + * keyword, the second one is the value. This function constructs such a + * string and appends it to a struct strbuf. + */ +static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, + const char *value, unsigned int valuelen) +{ + int len, tmp; + + /* "%u %s=%s\n" */ + len = 1 + 1 + strlen(keyword) + 1 + valuelen + 1; + for (tmp = len; tmp > 9; tmp /= 10) + len++; + + strbuf_grow(sb, len); + strbuf_addf(sb, "%u %s=", len, keyword); + strbuf_add(sb, value, valuelen); + strbuf_addch(sb, '\n'); +} + +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + +static unsigned int ustar_header_chksum(const struct ustar_header *header) +{ + const unsigned char *p = (const unsigned char *)header; + unsigned int chksum = 0; + while (p < (const unsigned char *)header->chksum) + chksum += *p++; + chksum += sizeof(header->chksum) * ' '; + p += sizeof(header->chksum); + while (p < (const unsigned char *)header + sizeof(struct ustar_header)) + chksum += *p++; + return chksum; +} + +static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) +{ + size_t i = pathlen; + if (i > 1 && path[i - 1] == '/') + i--; + if (i > maxlen) + i = maxlen; + do { + i--; + } while (i > 0 && path[i] != '/'); + return i; +} + +static void prepare_header(struct archiver_args *args, + struct ustar_header *header, + unsigned int mode, unsigned long size) +{ + xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); + xsnprintf(header->size, sizeof(header->size), "%011"PRIoMAX , S_ISREG(mode) ? (uintmax_t)size : (uintmax_t)0); + xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); + + xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); + xsnprintf(header->gid, sizeof(header->gid), "%07o", 0); + strlcpy(header->uname, "root", sizeof(header->uname)); + strlcpy(header->gname, "root", sizeof(header->gname)); + xsnprintf(header->devmajor, sizeof(header->devmajor), "%07o", 0); + xsnprintf(header->devminor, sizeof(header->devminor), "%07o", 0); + + memcpy(header->magic, "ustar", 6); + memcpy(header->version, "00", 2); + + xsnprintf(header->chksum, sizeof(header->chksum), "%07o", ustar_header_chksum(header)); +} + +static void write_extended_header(struct archiver_args *args, + const struct object_id *oid, + struct strbuf *extended_header) +{ + size_t size; + char *buffer = strbuf_detach(extended_header, &size); + struct ustar_header *header = xcalloc(1, sizeof(*header)); + unsigned int mode; + *header->typeflag = TYPEFLAG_EXT_HEADER; + mode = 0100666; + xsnprintf(header->name, sizeof(header->name), "%s.paxheader", + oid_to_hex(oid)); + prepare_header(args, header, mode, size); + add_work(header, sizeof(*header), 1); + add_work(buffer, size, 1); +} + +static int write_tar_entry(struct archiver_args *args, + const struct object_id *oid, + const char *path, size_t pathlen, + unsigned int mode) +{ + struct ustar_header *header = xcalloc(1, sizeof(*header)); + struct strbuf ext_header = STRBUF_INIT; + unsigned int old_mode = mode; + unsigned long size, size_in_header; + void *buffer; + int err = 0; + + if (S_ISDIR(mode) || S_ISGITLINK(mode)) { + *header->typeflag = TYPEFLAG_DIR; + mode = (mode | 0777) & ~tar_umask; + } else if (S_ISLNK(mode)) { + *header->typeflag = TYPEFLAG_LNK; + mode |= 0777; + } else if (S_ISREG(mode)) { + *header->typeflag = TYPEFLAG_REG; + mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask; + } else { + return error(_("unsupported file mode: 0%o (SHA1: %s)"), + mode, oid_to_hex(oid)); + } + if (pathlen > sizeof(header->name)) { + size_t plen = get_path_prefix(path, pathlen, + sizeof(header->prefix)); + size_t rest = pathlen - plen - 1; + if (plen > 0 && rest <= sizeof(header->name)) { + memcpy(header->prefix, path, plen); + memcpy(header->name, path + plen + 1, rest); + } else { + xsnprintf(header->name, sizeof(header->name), "%s.data", + oid_to_hex(oid)); + strbuf_append_ext_header(&ext_header, "path", + path, pathlen); + } + } else + memcpy(header->name, path, pathlen); + + if (S_ISREG(mode) && !args->convert && + oid_object_info(args->repo, oid, &size) == OBJ_BLOB && + size > big_file_threshold) + buffer = NULL; + else if (S_ISLNK(mode) || S_ISREG(mode)) { + enum object_type type; + buffer = object_file_to_archive(args, path, oid, old_mode, &type, &size); + if (!buffer) + return error(_("cannot read %s"), oid_to_hex(oid)); + } else { + buffer = NULL; + size = 0; + } + + if (S_ISLNK(mode)) { + if (size > sizeof(header->linkname)) { + xsnprintf(header->linkname, sizeof(header->linkname), + "see %s.paxheader", oid_to_hex(oid)); + strbuf_append_ext_header(&ext_header, "linkpath", + buffer, size); + } else + memcpy(header->linkname, buffer, size); + } + + size_in_header = size; + if (S_ISREG(mode) && size > USTAR_MAX_SIZE) { + size_in_header = 0; + strbuf_append_ext_header_uint(&ext_header, "size", size); + } + + prepare_header(args, header, mode, size_in_header); + + if (ext_header.len > 0) { + write_extended_header(args, oid, &ext_header); + } + add_work(header, sizeof(*header), 1); + if (S_ISREG(mode) && size > 0) { + if (buffer) + add_work(buffer, size, 1); + else + err = stream_blob(oid); + } else + free(buffer); + return err; +} + +static void write_global_extended_header(struct archiver_args *args) +{ + const struct object_id *oid = args->commit_oid; + struct strbuf ext_header = STRBUF_INIT; + struct ustar_header *header = xcalloc(1, sizeof(*header)); + unsigned int mode; + size_t size; + char *buffer; + + if (oid) + strbuf_append_ext_header(&ext_header, "comment", + oid_to_hex(oid), + the_hash_algo->hexsz); + if (args->time > USTAR_MAX_MTIME) { + strbuf_append_ext_header_uint(&ext_header, "mtime", + args->time); + args->time = USTAR_MAX_MTIME; + } + + if (!ext_header.len) + return; + + buffer = strbuf_detach(&ext_header, &size); + *header->typeflag = TYPEFLAG_GLOBAL_HEADER; + mode = 0100666; + xsnprintf(header->name, sizeof(header->name), "pax_global_header"); + prepare_header(args, header, mode, size); + add_work(header, sizeof(*header), 1); + add_work(buffer, size, 1); +} + +int write_tgz_archive(const struct archiver *ar, struct archiver_args *args) +{ + int level = args->compression_level; + int err = 0; + + gzip = gzdopen(1, "wb"); + if (!gzip) + return error(_("gzdopen failed")); + if (gzsetparams(gzip, level, Z_DEFAULT_STRATEGY) != Z_OK) + return error(_("unable to set compression level %d"), level); + + start_output_thread(); + write_global_extended_header(args); + err = write_archive_entries(args, write_tar_entry); + if (err) + return err; + wait_for_output_thread(); + tgz_write_trailer(); + return err; +}
diff --git a/archive.h b/archive.h
index e60e3dd31c..b00afa1a9f 100644
--- a/archive.h
+++ b/archive.h@@ -45,6 +45,10 @@ void init_tar_archiver(void); void init_zip_archiver(void); void init_archivers(void); +int tar_umask; + +int write_tgz_archive(const struct archiver *ar, struct archiver_args *args); + typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const struct object_id *oid, const char *path, size_t pathlen, --
2.22.0