Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats
From: Lidong Yan <hidden>
Date: 2025-08-05 09:23:56
Junio C Hamano [off-list ref] writes:
But I think the refactoring of diff_flush() codepath may
involve some new mode (perhaps DIFF_FORMAT_DRYRUN or something) that
(1) does not produce any output, like DIFF_FORMAT_NO_OUTPUT, so
that we do not need to play with /dev/null like Peff's
illustration.
(2) knows that the caller is only interested in each path having
any change worth reporting, so that it can short-circuit once a
change is found for each path.
So, just before you want to decide showing name or name-status,
you'd do this extra diff_flush() that is run only to learn if each
path has changes (with various "ignore" criteria) in the dry-run
mode, and it can do as much short-cut as it needs to.

I’m proposing to add a .diff_optimize field to struct diff_options, which
would support three modes: DIFF_OPT_NONE, DIFF_OPT_DRY_RUN, and
DIFF_OPT_BUFFER. The appropriate value would be determined before calling
diff_flush(), potentially in repo_diff_setup().

DIFF_OPT_NONE would be the code Peff provided.

DIFF_OPT_DRY_RUN would optimize for --quiet, --name-only, --name-status,
etc., so that we can return early as soon as we find any change.

DIFF_OPT_BUFFER would first emit the changes, and the context around them,
into a buffer (so there would be a map from file pair to change buffer).
Operations that run after the buffer is built would then use the buffer
instead of calling xdl_diff() again.

However, I’m concerned that DIFF_OPT_BUFFER could lead to high memory usage
in Git, and I’m not entirely sure this trade-off is justified.

Thanks,
Lidong
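To make the three modes concrete, here is a rough, standalone sketch of how they might differ. It is not git code: struct sketch_options, sketch_diff() and line_ignored() are hypothetical names made up for illustration, a naive line-by-line comparison stands in for xdl_diff(), and a simple prefix match stands in for -I<regex>. Only the DIFF_OPT_* names and the diff_optimize field name come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mode names follow the proposal; nothing here exists in git today. */
enum diff_optimize {
	DIFF_OPT_NONE,    /* always run the full comparison */
	DIFF_OPT_DRY_RUN, /* stop at the first change worth reporting */
	DIFF_OPT_BUFFER   /* remember changes so later passes can reuse them */
};

#define MAX_BUFFERED 16

/* Hypothetical stand-in for the relevant part of struct diff_options. */
struct sketch_options {
	enum diff_optimize diff_optimize;
	const char *ignore_prefix;     /* crude stand-in for -I<regex> */
	size_t buffered[MAX_BUFFERED]; /* changed line indexes (BUFFER mode) */
	size_t nr_buffered;
	size_t lines_compared;         /* how much work the last call did */
};

/* A line is ignorable when it starts with the configured prefix. */
int line_ignored(const struct sketch_options *opt, const char *line)
{
	return opt->ignore_prefix &&
	       !strncmp(line, opt->ignore_prefix, strlen(opt->ignore_prefix));
}

/*
 * Compare two "files" of n lines each (a stand-in for xdl_diff()).
 * Returns the number of changes seen that are not ignored; in
 * dry-run mode the loop stops after the first one, so the result
 * is at most 1.
 */
size_t sketch_diff(struct sketch_options *opt,
		   const char **a, const char **b, size_t n)
{
	size_t changes = 0;
	size_t i;

	assert(opt);
	opt->nr_buffered = 0;
	opt->lines_compared = 0;

	for (i = 0; i < n; i++) {
		opt->lines_compared++;
		if (!strcmp(a[i], b[i]))
			continue; /* identical line */
		if (line_ignored(opt, a[i]) && line_ignored(opt, b[i]))
			continue; /* change hidden by the ignore rule */
		changes++;
		if (opt->diff_optimize == DIFF_OPT_DRY_RUN)
			break; /* the caller only needs "changed or not" */
		if (opt->diff_optimize == DIFF_OPT_BUFFER &&
		    opt->nr_buffered < MAX_BUFFERED)
			opt->buffered[opt->nr_buffered++] = i;
	}
	return changes;
}
```

In this toy version the memory concern shows up as the fixed-size buffered[] array. A real DIFF_OPT_BUFFER would have to keep hunks and context for every file pair, which is where the cost would come from.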