Re: [PATCH 3/5] parallel-checkout: add configuration options

From: Christian Couder <hidden>
Date: 2021-03-31 04:35:05

On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
[off-list ref] wrote:

The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.

I wonder how many people are still using HDD, and how many will still
use them in a few years.

I think having 1 as the default value for checkout.workers might be
good for a while for backward compatibility and stability, while
people who are interested can test parallel checkout on different
environments. But we might want, in a few releases, after some bugs,
if any, have been fixed, to use a default, maybe 10, that will provide
significant speedup for most people, and will justify the added
complexity, especially as your numbers still show a small speedup for
HDD when using 10.

To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column bellow corresponds to the executions where

s/bellow/below/

checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:

             sequential fallback   10 workers           speedup
10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.

Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.

About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.

Thanks for all these tests and details!

Co-authored-by: Jeff Hostetler [off-list ref]
Signed-off-by: Matheus Tavares <redacted>

[...]

quoted hunk ↗ jump to hunk

@@ -23,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
        return parallel_checkout.status;
 }

+#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100

Using a "static const int" might be a bit better.

+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+       if (git_config_get_int("checkout.workers", num_workers))
+               *num_workers = 1;

I think it's better when an important default value like this "1" is
made more visible using a "static const int" or a "#define".

quoted hunk ↗ jump to hunk

+       else if (*num_workers < 1)
+               *num_workers = online_cpus();
+
+       if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+               *threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
 void init_parallel_checkout(void)
 {
        if (parallel_checkout.status != PC_UNINITIALIZED)

@@ -554,11 +569,9 @@ static void write_items_sequentially(struct checkout *state)
                write_pc_item(&parallel_checkout.items[i], state);
 }

-#define DEFAULT_NUM_WORKERS 2

As I say above, it doesn't look like a good idea to have removed this.

quoted hunk ↗ jump to hunk

-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
 {
-       int ret, num_workers = DEFAULT_NUM_WORKERS;
+       int ret;

        if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
                BUG("cannot run parallel checkout: uninitialized or already running");

@@ -568,7 +581,7 @@ int run_parallel_checkout(struct checkout *state)
        if (parallel_checkout.nr < num_workers)
                num_workers = parallel_checkout.nr;

-       if (num_workers <= 1) {
+       if (num_workers <= 1 || parallel_checkout.nr < threshold) {

Here we check the number of workers...

                write_items_sequentially(state);
        } else {
                struct pc_worker *workers = setup_workers(state, num_workers);

[...]

quoted hunk ↗ jump to hunk

@@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
                }
        }
        stop_progress(&progress);
-       errs |= run_parallel_checkout(&state);
+       if (pc_workers > 1)

...but the number of workers was already checked here.

+               errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
        errs |= finish_delayed_checkout(&state, NULL);
        git_attr_set_direction(GIT_ATTR_CHECKIN);

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help