diff options
author | Elijah Newren <newren@gmail.com> | 2021-05-20 06:09:41 (GMT) |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2021-05-20 06:40:39 (GMT) |
commit | 25e65b6dd52c987056f1cac00fe6073fbf8ea237 (patch) | |
tree | a2b9698df576ab6412985ed08d1ee6fd63b696c6 /merge-ort.c | |
parent | cbdca289fbb011e7397fecfebeeac3f887ef22d1 (diff) | |
download | git-25e65b6dd52c987056f1cac00fe6073fbf8ea237.zip git-25e65b6dd52c987056f1cac00fe6073fbf8ea237.tar.gz git-25e65b6dd52c987056f1cac00fe6073fbf8ea237.tar.bz2 |
merge-ort, diffcore-rename: employ cached renames when possible
When there are many renames between the old base of a series of commits
and the new base, the way sequencer.c, merge-recursive.c, and
diffcore-rename.c have traditionally split the work resulted in
redetecting the same renames with each and every commit being
transplanted. To address this, the last several commits have been
creating a cache of rename detection results, determining when it was
safe to use such a cache in subsequent merge operations, adding helper
functions, and so on. See the previous half dozen commit messages for
additional discussion of this optimization, particularly the message a
few commits ago entitled "add code to check for whether cached renames
can be reused". This commit finally ties all of that work together,
modifying the merge algorithm to make use of these cached renames.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s
mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s
just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms
That's a fairly small improvement, but mostly because the previous
optimizations were so effective for these particular testcases; this
optimization only kicks in when the others don't. If we undid the
basename-guided rename detection and skip-irrelevant-renames
optimizations, then we'd see that this series by itself improved
performance as follows:
Before Basename Series After Just This Series
no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s
mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s
Since this optimization kicks in to help accelerate cases where the
previous optimizations do not apply, this last comparison shows that
this cached-renames optimization has the potential to help signficantly
in cases that don't meet the requirements for the other optimizations to
be effective.
The changes made in this optimization also lay some important groundwork
for a future optimization around having collect_merge_info() avoid
recursing into subtrees in more cases.
However, for this optimization to be effective, merge_switch_to_result()
should only be called when the rebase or cherry-pick operation has
either completed or hit a case where the user needs to resolve a
conflict or edit the result. If it is called after every commit, as
sequencer.c does, then the working tree and index are needlessly updated
with every commit and the cached metadata is tossed, defeating this
optimization. Refactoring sequencer.c to only call
merge_switch_to_result() at the end of the operation is a bigger
undertaking, and the practical benefits of this optimization will not be
realized until that work is performed. Since `test-tool fast-rebase`
only updates at the end of the operation, it was used to obtain the
timings above.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'merge-ort.c')
-rw-r--r-- | merge-ort.c | 47 |
1 files changed, 42 insertions, 5 deletions
diff --git a/merge-ort.c b/merge-ort.c index de96fe4..ed21dc8 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -763,15 +763,48 @@ static void add_pair(struct merge_options *opt, struct rename_info *renames = &opt->priv->renames; int names_idx = is_add ? side : 0; - if (!is_add) { + if (is_add) { + if (strset_contains(&renames->cached_target_names[side], + pathname)) + return; + } else { unsigned content_relevant = (match_mask == 0); unsigned location_relevant = (dir_rename_mask == 0x07); + /* + * If pathname is found in cached_irrelevant[side] due to + * previous pick but for this commit content is relevant, + * then we need to remove it from cached_irrelevant. + */ + if (content_relevant) + /* strset_remove is no-op if strset doesn't have key */ + strset_remove(&renames->cached_irrelevant[side], + pathname); + + /* + * We do not need to re-detect renames for paths that we already + * know the pairing, i.e. for cached_pairs (or + * cached_irrelevant). However, handle_deferred_entries() needs + * to loop over the union of keys from relevant_sources[side] and + * cached_pairs[side], so for simplicity we set relevant_sources + * for all the cached_pairs too and then strip them back out in + * prune_cached_from_relevant() at the beginning of + * detect_regular_renames(). + */ if (content_relevant || location_relevant) { /* content_relevant trumps location_relevant */ strintmap_set(&renames->relevant_sources[side], pathname, content_relevant ? RELEVANT_CONTENT : RELEVANT_LOCATION); } + + /* + * Avoid creating pair if we've already cached rename results. + * Note that we do this after setting relevant_sources[side] + * as noted in the comment above. + */ + if (strmap_contains(&renames->cached_pairs[side], pathname) || + strset_contains(&renames->cached_irrelevant[side], pathname)) + return; } one = alloc_filespec(pathname); @@ -2360,7 +2393,9 @@ static inline int possible_side_renames(struct rename_info *renames, static inline int possible_renames(struct rename_info *renames) { return possible_side_renames(renames, 1) || - possible_side_renames(renames, 2); + possible_side_renames(renames, 2) || + !strmap_empty(&renames->cached_pairs[1]) || + !strmap_empty(&renames->cached_pairs[2]); } static void resolve_diffpair_statuses(struct diff_queue_struct *q) @@ -2384,7 +2419,6 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q) } } -MAYBE_UNUSED static void prune_cached_from_relevant(struct rename_info *renames, unsigned side) { @@ -2404,7 +2438,6 @@ static void prune_cached_from_relevant(struct rename_info *renames, } } -MAYBE_UNUSED static void use_cached_pairs(struct merge_options *opt, struct strmap *cached_pairs, struct diff_queue_struct *pairs) @@ -2507,6 +2540,7 @@ static void detect_regular_renames(struct merge_options *opt, struct diff_options diff_opts; struct rename_info *renames = &opt->priv->renames; + prune_cached_from_relevant(renames, side_index); if (!possible_side_renames(renames, side_index)) { /* * No rename detection needed for this side, but we still need @@ -2535,7 +2569,8 @@ static void detect_regular_renames(struct merge_options *opt, diffcore_rename_extended(&diff_opts, &renames->relevant_sources[side_index], &renames->dirs_removed[side_index], - &renames->dir_rename_count[side_index]); + &renames->dir_rename_count[side_index], + &renames->cached_pairs[side_index]); trace2_region_leave("diff", "diffcore_rename", opt->repo); resolve_diffpair_statuses(&diff_queued_diff); @@ -2643,6 +2678,8 @@ static int detect_and_process_renames(struct merge_options *opt, trace2_region_enter("merge", "regular renames", opt->repo); detect_regular_renames(opt, MERGE_SIDE1); detect_regular_renames(opt, MERGE_SIDE2); + use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]); + use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]); trace2_region_leave("merge", "regular renames", opt->repo); trace2_region_enter("merge", "directory renames", opt->repo); |