summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--Documentation/technical/remembering-renames.txt671
-rw-r--r--diffcore-rename.c22
-rw-r--r--diffcore.h3
-rw-r--r--merge-ort.c331
-rw-r--r--merge-ort.h2
-rw-r--r--t/helper/test-fast-rebase.c54
-rwxr-xr-xt/t6423-merge-rename-directories.sh58
-rwxr-xr-xt/t6429-merge-sequence-rename-caching.sh700
8 files changed, 1804 insertions, 37 deletions
diff --git a/Documentation/technical/remembering-renames.txt b/Documentation/technical/remembering-renames.txt
new file mode 100644
index 0000000..2fd5cc8
--- /dev/null
+++ b/Documentation/technical/remembering-renames.txt
@@ -0,0 +1,671 @@
+Rebases and cherry-picks involve a sequence of merges whose results are
+recorded as new single-parent commits. The first parent side of those
+merges represent the "upstream" side, and often include a far larger set of
+changes than the second parent side. Traditionally, the renames on the
+first-parent side of that sequence of merges were repeatedly re-detected
+for every merge. This file explains why it is safe and effective during
+rebases and cherry-picks to remember renames on the upstream side of
+history as an optimization, assuming all merges are automatic and clean
+(i.e. no conflicts and not interrupted for user input or editing).
+
+Outline:
+
+ 0. Assumptions
+
+ 1. How rebasing and cherry-picking work
+
+ 2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
+ superset of the renames on MERGE_SIDE1 for the next pick.
+
+ 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
+ a rename on MERGE_SIDE1 for the next pick
+
+ 4. A detailed description of the the counter-examples to #3.
+
+ 5. Why the special cases in #4 are still fully reasonable to use to pair
+ up files for three-way content merging in the merge machinery, and why
+ they do not affect the correctness of the merge.
+
+ 6. Interaction with skipping of "irrelevant" renames
+
+ 7. Additional items that need to be cached
+
+ 8. How directory rename detection interacts with the above and why this
+ optimization is still safe even if merge.directoryRenames is set to
+ "true".
+
+
+=== 0. Assumptions ===
+
+There are two assumptions that will hold throughout this document:
+
+ * The upstream side where commits are transplanted to is treated as the
+ first parent side when rebase/cherry-pick call the merge machinery
+
+ * All merges are fully automatic
+
+and a third that will hold in sections 2-5 for simplicity, that I'll later
+address in section 8:
+
+ * No directory renames occur
+
+
+Let me explain more about each assumption and why I include it:
+
+
+The first assumption is merely for the purposes of making this document
+clearer; the optimization implementation does not actually depend upon it.
+However, the assumption does hold in all cases because it reflects the way
+that both rebase and cherry-pick were implemented; and the implementation
+of cherry-pick and rebase are not readily changeable for backwards
+compatibility reasons (see for example the discussion of the --ours and
+--theirs flag in the documentation of `git checkout`, particularly the
+comments about how they behave with rebase). The optimization avoids
+checking first-parent-ness, though. It checks the conditions that make the
+optimization valid instead, so it would still continue working if someone
+changed the parent ordering that cherry-pick and rebase use. But making
+this assumption does make this document much clearer and prevents me from
+having to repeat every example twice.
+
+If the second assumption is violated, then the optimization simply is
+turned off and thus isn't relevant to consider. The second assumption can
+also be stated as "there is no interruption for a user to resolve conflicts
+or to just further edit or tweak files". While real rebases and
+cherry-picks are often interrupted (either because it's an interactive
+rebase where the user requested to stop and edit, or because there were
+conflicts that the user needs to resolve), the cache of renames is not
+stored on disk, and thus is thrown away as soon as the rebase or cherry
+pick stops for the user to resolve the operation.
+
+The third assumption makes sections 2-5 simpler, and allows people to
+understand the basics of why this optimization is safe and effective, and
+then I can go back and address the specifics in section 8. It is probably
+also worth noting that if directory renames do occur, then the default of
+merge.directoryRenames being set to "conflict" means that the operation
+will stop for users to resolve the conflicts and the cache will be thrown
+away, and thus that there won't be an optimization to apply. So, the only
+reason we need to address directory renames specifically, is that some
+users will have set merge.directoryRenames to "true" to allow the merges to
+continue to proceed automatically. The optimization is still safe with
+this config setting, but we have to discuss a few more cases to show why;
+this discussion is deferred until section 8.
+
+
+=== 1. How rebasing and cherry-picking work ===
+
+Consider the following setup (from the git-rebase manpage):
+
+ A---B---C topic
+ /
+ D---E---F---G main
+
+After rebasing or cherry-picking topic onto main, this will appear as:
+
+ A'--B'--C' topic
+ /
+ D---E---F---G main
+
+The way the commits A', B', and C' are created is through a series of
+merges, where rebase or cherry-pick sequentially uses each of the three
+A-B-C commits in a special merge operation. Let's label the three commits
+in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
+this picture, the three commits for each of the three merges would be:
+
+To create A':
+ MERGE_BASE: E
+ MERGE_SIDE1: G
+ MERGE_SIDE2: A
+
+To create B':
+ MERGE_BASE: A
+ MERGE_SIDE1: A'
+ MERGE_SIDE2: B
+
+To create C':
+ MERGE_BASE: B
+ MERGE_SIDE1: B'
+ MERGE_SIDE2: C
+
+Sometimes, folks are surprised that these three-way merges are done. It
+can be useful in understanding these three-way merges to view them in a
+slightly different light. For example, in creating C', you can view it as
+either:
+
+ * Apply the changes between B & C to B'
+ * Apply the changes between B & B' to C
+
+Conceptually the two statements above are the same as a three-way merge of
+B, B', and C, at least the parts before you decide to record a commit.
+
+
+=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
+=== superset of the renames on MERGE_SIDE1 for the next pick. ===
+
+The merge machinery uses the filenames it is fed from MERGE_BASE,
+MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
+filename under one of three conditions:
+
+ * To make both pieces of a conflict available to a user during conflict
+ resolution (examples: directory/file conflict, add/add type conflict
+ such as symlink vs. regular file)
+
+ * When MERGE_SIDE1 renames the file.
+
+ * When MERGE_SIDE2 renames the file.
+
+First, let's remember what commits are involved in the first and second
+picks of the cherry-pick or rebase sequence:
+
+To create A':
+ MERGE_BASE: E
+ MERGE_SIDE1: G
+ MERGE_SIDE2: A
+
+To create B':
+ MERGE_BASE: A
+ MERGE_SIDE1: A'
+ MERGE_SIDE2: B
+
+So, in particular, we need to show that the renames between E and G are a
+superset of those between A and A'.
+
+A' is created by the first merge. A' will only have renames for one of the
+three reasons listed above. The first case, a conflict, results in a
+situation where the cache is dropped and thus this optimization doesn't
+take effect, so we need not consider that case. The third case, a rename
+on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up
+in A -- therefore when diffing A and A' that path does not show up as a
+rename. The only remaining way for renames to show up in A' is for the
+rename to come from MERGE_SIDE1. Therefore, all renames between A and A'
+are a subset of those between E and G. Equivalently, all renames between E
+and G are a superset of those between A and A'.
+
+
+=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ ===
+=== always also a rename on MERGE_SIDE1 for the next pick. ===
+
+Let's again look at the first two picks:
+
+To create A':
+ MERGE_BASE: E
+ MERGE_SIDE1: G
+ MERGE_SIDE2: A
+
+To create B':
+ MERGE_BASE: A
+ MERGE_SIDE1: A'
+ MERGE_SIDE2: B
+
+Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
+any given rename from E to G. Let's use the filenames 'oldfile' and
+'newfile' for demonstration purposes. That first pick will function as
+follows; when the rename is detected, the merge machinery will do a
+three-way content merge of the following:
+ E:oldfile
+ G:newfile
+ A:oldfile
+and produce a new result:
+ A':newfile
+
+Note above that I've assumed that E->A did not rename oldfile. If that
+side did rename, then we most likely have a rename/rename(1to2) conflict
+that will cause the rebase or cherry-pick operation to halt and drop the
+in-memory cache of renames and thus doesn't need to be considered further.
+In the special case that E->A does rename the file but also renames it to
+newfile, then there is no conflict from the renaming and the merge can
+succeed. In this special case, the rename is not valid to cache because
+the second merge will find A:newfile in the MERGE_BASE (see also the new
+testcases in t6429 with "rename same file identically" in their
+description). So a rename/rename(1to1) needs to be specially handled by
+pruning renames from the cache and decrementing the dir_rename_counts in
+the current and leading directories associated with those renames. Or,
+since these are really rare, one could just take the easy way out and
+disable the remembering renames optimization when a rename/rename(1to1)
+happens.
+
+The previous paragraph handled the cases for E->A renaming oldfile, let's
+continue assuming that oldfile is not renamed in A.
+
+As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
+to A'. So, we are curious whether A:oldfile and A':newfile will be viewed
+as renames. Note that:
+
+ * There will be no A':oldfile (because there could not have been a
+ G:oldfile as we do not do break detection in the merge machinery and
+ G:newfile was detected as a rename, and by the construction of the
+ rename above that merged cleanly, the merge machinery will ensure there
+ is no 'oldfile' in the result).
+
+ * There will be no A:newfile (if there had been, we would have had a
+ rename/add conflict).
+
+ * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
+ clean three-way content merge involving A:oldfile).
+
+We can also expound on the third point above, by noting that three-way
+content merges can also be viewed as applying the differences between the
+base and one side to the other side. Thus we can view A':newfile as
+having been created by taking the changes between E:oldfile and G:newfile
+(which were detected as being related, i.e. <50% changed) to A:oldfile.
+
+Thus A:oldfile and A':newfile are just as related as E:oldfile and
+G:newfile are -- they have exactly identical differences. Since the latter
+were detected as renames, A:oldfile and A':newfile should also be
+detectable as renames almost always.
+
+
+=== 4. A detailed description of the counter-examples to #3. ===
+
+We already noted in section 3 that rename/rename(1to1) (i.e. both sides
+renaming a file the same way) was one counter-example. The more
+interesting bit, though, is why did we need to use the "almost" qualifier
+when stating that A:oldfile and A':newfile are "almost" always detectable
+as renames?
+
+Let's repeat an earlier point that section 3 made:
+
+ A':newfile was created by applying the changes between E:oldfile and
+ G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
+ <50% of the size of E:oldfile.
+
+If those changes that were <50% of the size of E:oldfile are also <50% of
+the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
+renames. However, if there is a dramatic size reduction between E:oldfile
+and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
+still somehow merge cleanly), then traditional rename detection would not
+detect A:oldfile and A':newfile as renames.
+
+Here's an example where that can happen:
+ * E:oldfile had 20 lines
+ * G:newfile added 10 new lines at the beginning of the file
+ * A:oldfile kept the first 3 lines of the file, and deleted all the rest
+then
+ => A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
+E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
+A':newfile would not be.
+
+
+=== 5. Why the special cases in #4 are still fully reasonable to use to ===
+=== pair up files for three-way content merging in the merge machinery, ===
+=== and why they do not affect the correctness of the merge. ===
+
+In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
+since they use the *same* filename. However, files with the same filename
+are obviously fine to pair up for three-way content merging (the merge
+machinery has never employed break detection). The interesting
+counter-example case is thus not the rename/rename(1to1) case, but the case
+where A did not rename oldfile. That was the case that we spent most of
+the time discussing in sections 3 and 4. The remainder of this section
+will be devoted to that case as well.
+
+So, even if A:oldfile and A':newfile aren't detectable as renames, why is
+it still reasonable to pair them up for three-way content merging in the
+merge machinery? There are multiple reasons:
+
+ * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
+ is *exactly* the same as the diff between E:oldfile and G:newfile. The
+ latter pair were detected as renames, so it seems unlikely to surprise
+ users for us to treat A:oldfile and A':newfile as renames.
+
+ * In fact, "oldfile" and "newfile" were at one point detected as renames
+ due to how they were constructed in the E..G chain. And we used that
+ information once already in this rebase/cherry-pick. I think users
+ would be unlikely to be surprised at us continuing to treat the files
+ as renames and would quickly understand why we had done so.
+
+ * Marking or declaring files as renames is *not* the end goal for merges.
+ Merges use renames to determine which files make sense to be paired up
+ for three-way content merges.
+
+ * A:oldfile and A':newfile were _already_ paired up in a three-way
+ content merge; that is how A':newfile was created. In fact, that
+ three-way content merge was clean. So using them again in a later
+ three-way content merge seems very reasonable.
+
+However, the above is focusing on the common scenarios. Let's try to look
+at all possible unusual scenarios and compare without the optimization to
+with the optimization. Consider the following theoretical cases; we will
+then dive into each to determine which of them are possible,
+and if so, what they mean:
+
+ 1. Without the optimization, the second merge results in a conflict.
+ With the optimization, the second merge also results in a conflict.
+ Questions: Are the conflicts confusingly different? Better in one case?
+
+ 2. Without the optimization, the second merge results in NO conflict.
+ With the optimization, the second merge also results in NO conflict.
+ Questions: Are the merges the same?
+
+ 3. Without the optimization, the second merge results in a conflict.
+ With the optimization, the second merge results in NO conflict.
+ Questions: Possible? Bug, bugfix, or something else?
+
+ 4. Without the optimization, the second merge results in NO conflict.
+ With the optimization, the second merge results in a conflict.
+ Questions: Possible? Bug, bugfix, or something else?
+
+I'll consider all four cases, but out of order.
+
+The fourth case is impossible. For the code without the remembering
+renames optimization to not get a conflict, B:oldfile would need to exactly
+match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
+If A:oldfile matches B:oldfile exactly, then a three-way content merge
+between A:oldfile, A':newfile, and B:oldfile would have no conflict and
+just give us the version of newfile from A' as the result.
+
+From the same logic as the above paragraph, the second case would indeed
+result in identical merges. When A:oldfile exactly matches B:oldfile, an
+undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
+and the other side deleted it. I'll delete it. And I see you have this
+brand new file named 'newfile' in A', so I'll keep it." That gives the
+same results as three-way content merging A:oldfile, A':newfile, and
+B:oldfile -- a removal of oldfile with the version of newfile from A'
+showing up in the result.
+
+The third case is interesting. It means that A:oldfile and A':newfile were
+not just similar enough, but that the changes between them did not conflict
+with the changes between A:oldfile and B:oldfile. This would validate our
+hunch that the files were similar enough to be used in a three-way content
+merge, and thus seems entirely correct for us to have used them that way.
+(Sidenote: One particular example here may be enlightening. Let's say that
+B was an immediate revert of A. B clearly would have been a clean revert
+of A, since A was B's immediate parent. One would assume that if you can
+pick a commit, you should also be able to cherry-pick its immediate revert.
+However, this is one of those funny corner cases; without this
+optimization, we just successfully picked a commit cleanly, but we are
+unable to cherry-pick its immediate revert due to the size differences
+between E:oldfile and A:oldfile.)
+
+That leaves only the first case to consider -- when we get conflicts both
+with or without the optimization. Without the optimization, we'll have a
+modify/delete conflict, where both A':newfile and B:oldfile are left in the
+tree for the user to deal with and no hints about the potential similarity
+between the two. With the optimization, we'll have a three-way content
+merged A:oldfile, A':newfile, and B:oldfile with conflict markers
+suggesting we thought the files were related but giving the user the chance
+to resolve. As noted above, I don't think users will find us treating
+'oldfile' and 'newfile' as related as a surprise since they were between E
+and G. In any event, though, this case shouldn't be concerning since we
+hit a conflict in both cases, told the user what we know, and asked them to
+resolve it.
+
+So, in summary, case 4 is impossible, case 2 yields the same behavior, and
+cases 1 and 3 seem to provide as good or better behavior with the
+optimization than without.
+
+
+=== 6. Interaction with skipping of "irrelevant" renames ===
+
+Previous optimizations involved skipping rename detection for paths
+considered to be "irrelevant". See for example the following commits:
+
+ * 32a56dfb99 ("merge-ort: precompute subset of sources for which we
+ need rename detection", 2021-03-11)
+ * 2fd9eda462 ("merge-ort: precompute whether directory rename
+ detection is needed", 2021-03-11)
+ * 9bd342137e ("diffcore-rename: determine which relevant_sources are
+ no longer relevant", 2021-03-13)
+
+Relevance is always determined by what the _other_ side of history has
+done, in terms of modifing a file that our side renamed, or adding a
+file to a directory which our side renamed. This means that a path
+that is "irrelevant" when picking the first commit of a series in a
+rebase or cherry-pick, may suddenly become "relevant" when picking the
+next commit.
+
+The upshot of this is that we can only cache rename detection results
+for relevant paths, and need to re-check relevance in subsequent
+commits. If those subsequent commits have additional paths that are
+relevant for rename detection, then we will need to redo rename
+detection -- though we can limit it to the paths for which we have not
+already detected renames.
+
+
+=== 7. Additional items that need to be cached ===
+
+It turns out we have to cache more than just renames; we also cache:
+
+ A) non-renames (i.e. unpaired deletes)
+ B) counts of renames within directories
+ C) sources that were marked as RELEVANT_LOCATION, but which were
+ downgraded to RELEVANT_NO_MORE
+ D) the toplevel trees involved in the merge
+
+These are all stored in struct rename_info, and respectively appear in
+ * cached_pairs (along side actual renames, just with a value of NULL)
+ * dir_rename_counts
+ * cached_irrelevant
+ * merge_trees
+
+The reason for (A) comes from the irrelevant renames skipping
+optimization discussed in section 6. The fact that irrelevant renames
+are skipped means we only get a subset of the potential renames
+detected and subsequent commits may need to run rename detection on
+the upstream side on a subset of the remaining renames (to get the
+renames that are relevant for that later commit). Since unpaired
+deletes are involved in rename detection too, we don't want to
+repeatedly check that those paths remain unpaired on the upstream side
+with every commit we are transplanting.
+
+The reason for (B) is that diffcore_rename_extended() is what
+generates the counts of renames by directory which is needed in
+directory rename detection, and if we don't run
+diffcore_rename_extended() again then we need to have the output from
+it, including dir_rename_counts, from the previous run.
+
+The reason for (C) is that merge-ort's tree traversal will again think
+those paths are relevant (marking them as RELEVANT_LOCATION), but the
+fact that they were downgraded to RELEVANT_NO_MORE means that
+dir_rename_counts already has the information we need for directory
+rename detection. (A path which becomes RELEVANT_CONTENT in a
+subsequent commit will be removed from cached_irrelevant.)
+
+The reason for (D) is that is how we determine whether the remember
+renames optimization can be used. In particular, remembering that our
+sequence of merges looks like:
+
+ Merge 1:
+ MERGE_BASE: E
+ MERGE_SIDE1: G
+ MERGE_SIDE2: A
+ => Creates A'
+
+ Merge 2:
+ MERGE_BASE: A
+ MERGE_SIDE1: A'
+ MERGE_SIDE2: B
+ => Creates B'
+
+It is the fact that the trees A and A' appear both in Merge 1 and in
+Merge 2, with A as a parent of A' that allows this optimization. So
+we store the trees to compare with what we are asked to merge next
+time.
+
+
+=== 8. How directory rename detection interacts with the above and ===
+=== why this optimization is still safe even if ===
+=== merge.directoryRenames is set to "true". ===
+
+As noted in the assumptions section:
+
+ """
+ ...if directory renames do occur, then the default of
+ merge.directoryRenames being set to "conflict" means that the operation
+ will stop for users to resolve the conflicts and the cache will be
+ thrown away, and thus that there won't be an optimization to apply.
+ So, the only reason we need to address directory renames specifically,
+ is that some users will have set merge.directoryRenames to "true" to
+ allow the merges to continue to proceed automatically.
+ """
+
+Let's remember that we need to look at how any given pick affects the next
+one. So let's again use the first two picks from the diagram in section
+one:
+
+ First pick does this three-way merge:
+ MERGE_BASE: E
+ MERGE_SIDE1: G
+ MERGE_SIDE2: A
+ => creates A'
+
+ Second pick does this three-way merge:
+ MERGE_BASE: A
+ MERGE_SIDE1: A'
+ MERGE_SIDE2: B
+ => creates B'
+
+Now, directory rename detection exists so that if one side of history
+renames a directory, and the other side adds a new file to the old
+directory, then the merge (with merge.directoryRenames=true) can move the
+file into the new directory. There are two qualitatively different ways to
+add a new file to an old directory: create a new file, or rename a file
+into that directory. Also, directory renames can be done on either side of
+history, so there are four cases to consider:
+
+ * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
+ * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
+ * MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
+ * MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir
+
+One last note before we consider these four cases: There are some
+important properties about how we implement this optimization with
+respect to directory rename detection that we need to bear in mind
+while considering all of these cases:
+
+ * rename caching occurs *after* applying directory renames
+
+ * a rename created by directory rename detection is recorded for the side
+ of history that did the directory rename.
+
+ * dir_rename_counts, the nested map of
+ {oldname => {newname => count}},
+ is cached between runs as well. This basically means that directory
+ rename detection is also cached, though only on the side of history
+ that we cache renames for (MERGE_SIDE1 as far as this document is
+ concerned; see the assumptions section). Two interesting sub-notes
+ about these counts:
+
+ * If we need to perform rename-detection again on the given side (e.g.
+ some paths are relevant for rename detection that weren't before),
+ then we clear dir_rename_counts and recompute it, making use of
+ cached_pairs. The reason it is important to do this is optimizations
+ around RELEVANT_LOCATION exist to prevent us from computing
+ unnecessary renames for directory rename detection and from computing
+ dir_rename_counts for irrelevant directories; but those same renames
+ or directories may become necessary for subsequent merges. The
+ easiest way to "fix up" dir_rename_counts in such cases is to just
+ recompute it.
+
+ * If we prune rename/rename(1to1) entries from the cache, then we also
+ need to update dir_rename_counts to decrement the counts for the
+ involved directory and any relevant parent directories (to undo what
+ update_dir_rename_counts() in diffcore-rename.c incremented when the
+ rename was initially found). If we instead just disable the
+ remembering renames optimization when the exceedingly rare
+ rename/rename(1to1) cases occur, then dir_rename_counts will get
+ re-computed the next time rename detection occurs, as noted above.
+
+ * the side with multiple commits to pick, is the side of history that we
+ do NOT cache renames for. Thus, there are no additional commits to
+ change the number of renames in a directory, except for those done by
+ directory rename detection (which always pad the majority).
+
+ * the "renames" we cache are modified slightly by any directory rename,
+ as noted below.
+
+Now, with those notes out of the way, let's go through the four cases
+in order:
+
+Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
+
+ This case looks like this:
+
+ MERGE_BASE: E, Has olddir/
+ MERGE_SIDE1: G, Renames olddir/ -> newdir/
+ MERGE_SIDE2: A, Adds olddir/newfile
+ => creates A', With newdir/newfile
+
+ MERGE_BASE: A, Has olddir/newfile
+ MERGE_SIDE1: A', Has newdir/newfile
+ MERGE_SIDE2: B, Modifies olddir/newfile
+ => expected B', with threeway-merged newdir/newfile from above
+
+ In this case, with the optimization, note that after the first commit:
+ * MERGE_SIDE1 remembers olddir/ -> newdir/
+ * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
+ Given the cached rename noted above, the second merge can proceed as
+ expected without needing to perform rename detection from A -> A'.
+
+Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
+
+ This case looks like this:
+ MERGE_BASE: E oldfile, olddir/
+ MERGE_SIDE1: G oldfile, olddir/ -> newdir/
+ MERGE_SIDE2: A oldfile -> olddir/newfile
+ => creates A', With newdir/newfile representing original oldfile
+
+ MERGE_BASE: A olddir/newfile
+ MERGE_SIDE1: A' newdir/newfile
+ MERGE_SIDE2: B modify olddir/newfile
+ => expected B', with threeway-merged newdir/newfile from above
+
+ In this case, with the optimization, note that after the first commit:
+ * MERGE_SIDE1 remembers olddir/ -> newdir/
+ * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
+ (NOT oldfile -> newdir/newfile; compare to case with
+ (p->status == 'R' && new_path) in possibly_cache_new_pair())
+
+ Given the cached rename noted above, the second merge can proceed as
+ expected without needing to perform rename detection from A -> A'.
+
+Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
+
+ This case looks like this:
+
+ MERGE_BASE: E, Has olddir/
+ MERGE_SIDE1: G, Adds olddir/newfile
+ MERGE_SIDE2: A, Renames olddir/ -> newdir/
+ => creates A', With newdir/newfile
+
+ MERGE_BASE: A, Has newdir/, but no notion of newdir/newfile
+ MERGE_SIDE1: A', Has newdir/newfile
+ MERGE_SIDE2: B, Has newdir/, but no notion of newdir/newfile
+ => expected B', with newdir/newfile from A'
+
+ In this case, with the optimization, note that after the first commit there
+ were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
+ But the second merge didn't need any renames so this is fine.
+
+Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir
+
+ This case looks like this:
+
+ MERGE_BASE: E, Has olddir/
+ MERGE_SIDE1: G, Renames oldfile -> olddir/newfile
+ MERGE_SIDE2: A, Renames olddir/ -> newdir/
+ => creates A', With newdir/newfile representing original oldfile
+
+ MERGE_BASE: A, Has oldfile
+ MERGE_SIDE1: A', Has newdir/newfile
+ MERGE_SIDE2: B, Modifies oldfile
+ => expected B', with threeway-merged newdir/newfile from above
+
+ In this case, with the optimization, note that after the first commit:
+ * MERGE_SIDE1 remembers oldfile -> newdir/newfile
+ (NOT oldfile -> olddir/newfile; compare to case of second
+ block under p->status == 'R' in possibly_cache_new_pair())
+ * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered
+
+ Given the cached rename noted above, the second merge can proceed as
+ expected without needing to perform rename detection from A -> A'.
+
+Finally, I'll just note here that interactions with the
+skip-irrelevant-renames optimization means we sometimes don't detect
+renames for any files within a directory that was renamed, in which
+case we will not have been able to detect any rename for the directory
+itself. In such a case, we do not know whether the directory was
+renamed; we want to be careful to avoid cacheing some kind of "this
+directory was not renamed" statement. If we did, then a subsequent
+commit being rebased could add a file to the old directory, and the
+user would expect it to end up in the correct directory -- something
+our erroneous "this directory was not renamed" cache would preclude.
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 963ca58..3375e24 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -568,7 +568,8 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
static void initialize_dir_rename_info(struct dir_rename_info *info,
struct strintmap *relevant_sources,
struct strintmap *dirs_removed,
- struct strmap *dir_rename_count)
+ struct strmap *dir_rename_count,
+ struct strmap *cached_pairs)
{
struct hashmap_iter iter;
struct strmap_entry *entry;
@@ -633,6 +634,17 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
rename_dst[i].p->two->path);
}
+ /* Add cached_pairs to counts */
+ strmap_for_each_entry(cached_pairs, &iter, entry) {
+ const char *old_name = entry->key;
+ const char *new_name = entry->value;
+ if (!new_name)
+ /* known delete; ignore it */
+ continue;
+
+ update_dir_rename_counts(info, dirs_removed, old_name, new_name);
+ }
+
/*
* Now we collapse
* dir_rename_count: old_directory -> {new_directory -> count}
@@ -1247,7 +1259,8 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
void diffcore_rename_extended(struct diff_options *options,
struct strintmap *relevant_sources,
struct strintmap *dirs_removed,
- struct strmap *dir_rename_count)
+ struct strmap *dir_rename_count,
+ struct strmap *cached_pairs)
{
int detect_rename = options->detect_rename;
int minimum_score = options->rename_score;
@@ -1363,7 +1376,8 @@ void diffcore_rename_extended(struct diff_options *options,
/* Preparation for basename-driven matching. */
trace2_region_enter("diff", "dir rename setup", options->repo);
initialize_dir_rename_info(&info, relevant_sources,
- dirs_removed, dir_rename_count);
+ dirs_removed, dir_rename_count,
+ cached_pairs);
trace2_region_leave("diff", "dir rename setup", options->repo);
/* Utilize file basenames to quickly find renames. */
@@ -1560,5 +1574,5 @@ void diffcore_rename_extended(struct diff_options *options,
void diffcore_rename(struct diff_options *options)
{
- diffcore_rename_extended(options, NULL, NULL, NULL);
+ diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
}
diff --git a/diffcore.h b/diffcore.h
index f5c6de4..533b30e 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -181,7 +181,8 @@ void diffcore_rename(struct diff_options *);
void diffcore_rename_extended(struct diff_options *options,
struct strintmap *relevant_sources,
struct strintmap *dirs_removed,
- struct strmap *dir_rename_count);
+ struct strmap *dir_rename_count,
+ struct strmap *cached_pairs);
void diffcore_merge_broken(void);
void diffcore_pickaxe(struct diff_options *);
void diffcore_order(const char *orderfile);
diff --git a/merge-ort.c b/merge-ort.c
index 4a9ce2a..b954f71 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -53,6 +53,8 @@ enum merge_side {
MERGE_SIDE2 = 2
};
+static unsigned RESULT_INITIALIZED = 0x1abe11ed; /* unlikely accidental value */
+
struct traversal_callback_data {
unsigned long mask;
unsigned long dirmask;
@@ -141,6 +143,72 @@ struct rename_info {
char *callback_data_traverse_path;
/*
+ * merge_trees: trees passed to the merge algorithm for the merge
+ *
+ * merge_trees records the trees passed to the merge algorithm. But,
+ * this data also is stored in merge_result->priv. If a sequence of
+ * merges are being done (such as when cherry-picking or rebasing),
+ * the next merge can look at this and re-use information from
+ * previous merges under certain circumstances.
+ *
+ * See also all the cached_* variables.
+ */
+ struct tree *merge_trees[3];
+
+ /*
+ * cached_pairs_valid_side: which side's cached info can be reused
+ *
+ * See the description for merge_trees. For repeated merges, at most
+ * only one side's cached information can be used. Valid values:
+ * MERGE_SIDE2: cached data from side2 can be reused
+ * MERGE_SIDE1: cached data from side1 can be reused
+ * 0: no cached data can be reused
+ */
+ int cached_pairs_valid_side;
+
+ /*
+ * cached_pairs: Caching of renames and deletions.
+ *
+ * These are mappings recording renames and deletions of individual
+ * files (not directories). They are thus a map from an old
+ * filename to either NULL (for deletions) or a new filename (for
+ * renames).
+ */
+ struct strmap cached_pairs[3];
+
+ /*
+ * cached_target_names: just the destinations from cached_pairs
+ *
+ * We sometimes want a fast lookup to determine if a given filename
+ * is one of the destinations in cached_pairs. cached_target_names
+ * is thus duplicative information, but it provides a fast lookup.
+ */
+ struct strset cached_target_names[3];
+
+ /*
+ * cached_irrelevant: Caching of rename_sources that aren't relevant.
+ *
+ * If we try to detect a rename for a source path and succeed, it's
+ * part of a rename. If we try to detect a rename for a source path
+ * and fail, then it's a delete. If we do not try to detect a rename
+ * for a path, then we don't know if it's a rename or a delete. If
+ * merge-ort doesn't think the path is relevant, then we just won't
+ * cache anything for that path. But there's a slight problem in
+ * that merge-ort can think a path is RELEVANT_LOCATION, but due to
+ * commit 9bd342137e ("diffcore-rename: determine which
+ * relevant_sources are no longer relevant", 2021-03-13),
+ * diffcore-rename can downgrade the path to RELEVANT_NO_MORE. To
+ * avoid excessive calls to diffcore_rename_extended() we still need
+ * to cache such paths, though we cannot record them as either
+ * renames or deletes. So we cache them here as a "turned out to be
+ * irrelevant *for this commit*" as they are often also irrelevant
+ * for subsequent commits, though we will have to do some extra
+ * checking to see whether such paths become relevant for rename
+ * detection when cherry-picking/rebasing subsequent commits.
+ */
+ struct strset cached_irrelevant[3];
+
+ /*
* needed_limit: value needed for inexact rename detection to run
*
* If the current rename limit wasn't high enough for inexact
@@ -382,6 +450,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
reinitialize ? strmap_partial_clear : strmap_clear;
void (*strintmap_func)(struct strintmap *) =
reinitialize ? strintmap_partial_clear : strintmap_clear;
+ void (*strset_func)(struct strset *) =
+ reinitialize ? strset_partial_clear : strset_clear;
/*
* We marked opti->paths with strdup_strings = 0, so that we
@@ -417,15 +487,21 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
/* Free memory used by various renames maps */
for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
strintmap_func(&renames->dirs_removed[i]);
-
- partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
- if (!reinitialize)
- strmap_clear(&renames->dir_rename_count[i], 1);
-
strmap_func(&renames->dir_renames[i], 0);
-
strintmap_func(&renames->relevant_sources[i]);
+ if (!reinitialize)
+ assert(renames->cached_pairs_valid_side == 0);
+ if (i != renames->cached_pairs_valid_side) {
+ strset_func(&renames->cached_target_names[i]);
+ strmap_func(&renames->cached_pairs[i], 1);
+ strset_func(&renames->cached_irrelevant[i]);
+ partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
+ if (!reinitialize)
+ strmap_clear(&renames->dir_rename_count[i], 1);
+ }
}
+ renames->cached_pairs_valid_side = 0;
+ renames->dir_rename_mask = 0;
if (!reinitialize) {
struct hashmap_iter iter;
@@ -448,8 +524,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
strmap_clear(&opti->output, 0);
}
- renames->dir_rename_mask = 0;
-
/* Clean out callback_data as well. */
FREE_AND_NULL(renames->callback_data);
renames->callback_data_nr = renames->callback_data_alloc = 0;
@@ -690,15 +764,48 @@ static void add_pair(struct merge_options *opt,
struct rename_info *renames = &opt->priv->renames;
int names_idx = is_add ? side : 0;
- if (!is_add) {
+ if (is_add) {
+ if (strset_contains(&renames->cached_target_names[side],
+ pathname))
+ return;
+ } else {
unsigned content_relevant = (match_mask == 0);
unsigned location_relevant = (dir_rename_mask == 0x07);
+ /*
+ * If pathname is found in cached_irrelevant[side] due to
+ * previous pick but for this commit content is relevant,
+ * then we need to remove it from cached_irrelevant.
+ */
+ if (content_relevant)
+ /* strset_remove is no-op if strset doesn't have key */
+ strset_remove(&renames->cached_irrelevant[side],
+ pathname);
+
+ /*
+ * We do not need to re-detect renames for paths that we already
+ * know the pairing, i.e. for cached_pairs (or
+ * cached_irrelevant). However, handle_deferred_entries() needs
+ * to loop over the union of keys from relevant_sources[side] and
+ * cached_pairs[side], so for simplicity we set relevant_sources
+ * for all the cached_pairs too and then strip them back out in
+ * prune_cached_from_relevant() at the beginning of
+ * detect_regular_renames().
+ */
if (content_relevant || location_relevant) {
/* content_relevant trumps location_relevant */
strintmap_set(&renames->relevant_sources[side], pathname,
content_relevant ? RELEVANT_CONTENT : RELEVANT_LOCATION);
}
+
+ /*
+ * Avoid creating pair if we've already cached rename results.
+ * Note that we do this after setting relevant_sources[side]
+ * as noted in the comment above.
+ */
+ if (strmap_contains(&renames->cached_pairs[side], pathname) ||
+ strset_contains(&renames->cached_irrelevant[side], pathname))
+ return;
}
one = alloc_filespec(pathname);
@@ -2037,6 +2144,9 @@ static int process_renames(struct merge_options *opt,
VERIFY_CI(side2);
if (!strcmp(pathnames[1], pathnames[2])) {
+ struct rename_info *ri = &opt->priv->renames;
+ int j;
+
/* Both sides renamed the same way */
assert(side1 == side2);
memcpy(&side1->stages[0], &base->stages[0],
@@ -2046,6 +2156,16 @@ static int process_renames(struct merge_options *opt,
base->merged.is_null = 1;
base->merged.clean = 1;
+ /*
+ * Disable remembering renames optimization;
+ * rename/rename(1to1) is incredibly rare, and
+ * just disabling the optimization is easier
+ * than purging cached_pairs,
+ * cached_target_names, and dir_rename_counts.
+ */
+ for (j = 0; j < 3; j++)
+ ri->merge_trees[j] = NULL;
+
/* We handled both renames, i.e. i+1 handled */
i++;
/* Move to next rename */
@@ -2273,7 +2393,9 @@ static inline int possible_side_renames(struct rename_info *renames,
static inline int possible_renames(struct rename_info *renames)
{
return possible_side_renames(renames, 1) ||
- possible_side_renames(renames, 2);
+ possible_side_renames(renames, 2) ||
+ !strmap_empty(&renames->cached_pairs[1]) ||
+ !strmap_empty(&renames->cached_pairs[2]);
}
static void resolve_diffpair_statuses(struct diff_queue_struct *q)
@@ -2297,6 +2419,112 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q)
}
}
+static void prune_cached_from_relevant(struct rename_info *renames,
+ unsigned side)
+{
+ /* Reason for this function described in add_pair() */
+ struct hashmap_iter iter;
+ struct strmap_entry *entry;
+
+ /* Remove from relevant_sources all entries in cached_pairs[side] */
+ strmap_for_each_entry(&renames->cached_pairs[side], &iter, entry) {
+ strintmap_remove(&renames->relevant_sources[side],
+ entry->key);
+ }
+ /* Remove from relevant_sources all entries in cached_irrelevant[side] */
+ strset_for_each_entry(&renames->cached_irrelevant[side], &iter, entry) {
+ strintmap_remove(&renames->relevant_sources[side],
+ entry->key);
+ }
+}
+
+static void use_cached_pairs(struct merge_options *opt,
+ struct strmap *cached_pairs,
+ struct diff_queue_struct *pairs)
+{
+ struct hashmap_iter iter;
+ struct strmap_entry *entry;
+
+ /*
+ * Add to side_pairs all entries from renames->cached_pairs[side_index].
+ * (Info in cached_irrelevant[side_index] is not relevant here.)
+ */
+ strmap_for_each_entry(cached_pairs, &iter, entry) {
+ struct diff_filespec *one, *two;
+ const char *old_name = entry->key;
+ const char *new_name = entry->value;
+ if (!new_name)
+ new_name = old_name;
+
+ /* We don't care about oid/mode, only filenames and status */
+ one = alloc_filespec(old_name);
+ two = alloc_filespec(new_name);
+ diff_queue(pairs, one, two);
+ pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
+ }
+}
+
+static void cache_new_pair(struct rename_info *renames,
+ int side,
+ char *old_path,
+ char *new_path,
+ int free_old_value)
+{
+ char *old_value;
+ new_path = xstrdup(new_path);
+ old_value = strmap_put(&renames->cached_pairs[side],
+ old_path, new_path);
+ strset_add(&renames->cached_target_names[side], new_path);
+ if (free_old_value)
+ free(old_value);
+ else
+ assert(!old_value);
+}
+
+static void possibly_cache_new_pair(struct rename_info *renames,
+ struct diff_filepair *p,
+ unsigned side,
+ char *new_path)
+{
+ int dir_renamed_side = 0;
+
+ if (new_path) {
+ /*
+ * Directory renames happen on the other side of history from
+ * the side that adds new files to the old directory.
+ */
+ dir_renamed_side = 3 - side;
+ } else {
+ int val = strintmap_get(&renames->relevant_sources[side],
+ p->one->path);
+ if (val == RELEVANT_NO_MORE) {
+ assert(p->status == 'D');
+ strset_add(&renames->cached_irrelevant[side],
+ p->one->path);
+ }
+ if (val <= 0)
+ return;
+ }
+
+ if (p->status == 'D') {
+ /*
+ * If we already had this delete, we'll just set it's value
+ * to NULL again, so no harm.
+ */
+ strmap_put(&renames->cached_pairs[side], p->one->path, NULL);
+ } else if (p->status == 'R') {
+ if (!new_path)
+ new_path = p->two->path;
+ else
+ cache_new_pair(renames, dir_renamed_side,
+ p->two->path, new_path, 0);
+ cache_new_pair(renames, side, p->one->path, new_path, 1);
+ } else if (p->status == 'A' && new_path) {
+ cache_new_pair(renames, dir_renamed_side,
+ p->two->path, new_path, 0);
+ }
+}
+
static int compare_pairs(const void *a_, const void *b_)
{
const struct diff_filepair *a = *((const struct diff_filepair **)a_);
@@ -2312,6 +2540,7 @@ static void detect_regular_renames(struct merge_options *opt,
struct diff_options diff_opts;
struct rename_info *renames = &opt->priv->renames;
+ prune_cached_from_relevant(renames, side_index);
if (!possible_side_renames(renames, side_index)) {
/*
* No rename detection needed for this side, but we still need
@@ -2322,6 +2551,7 @@ static void detect_regular_renames(struct merge_options *opt,
return;
}
+ partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]);
repo_diff_setup(opt->repo, &diff_opts);
diff_opts.flags.recursive = 1;
diff_opts.flags.rename_empty = 0;
@@ -2339,7 +2569,8 @@ static void detect_regular_renames(struct merge_options *opt,
diffcore_rename_extended(&diff_opts,
&renames->relevant_sources[side_index],
&renames->dirs_removed[side_index],
- &renames->dir_rename_count[side_index]);
+ &renames->dir_rename_count[side_index],
+ &renames->cached_pairs[side_index]);
trace2_region_leave("diff", "diffcore_rename", opt->repo);
resolve_diffpair_statuses(&diff_queued_diff);
@@ -2379,6 +2610,7 @@ static int collect_renames(struct merge_options *opt,
char *new_path; /* non-NULL only with directory renames */
if (p->status != 'A' && p->status != 'R') {
+ possibly_cache_new_pair(renames, p, side_index, NULL);
diff_free_filepair(p);
continue;
}
@@ -2390,6 +2622,7 @@ static int collect_renames(struct merge_options *opt,
&collisions,
&clean);
+ possibly_cache_new_pair(renames, p, side_index, new_path);
if (p->status != 'R' && !new_path) {
diff_free_filepair(p);
continue;
@@ -2445,6 +2678,8 @@ static int detect_and_process_renames(struct merge_options *opt,
trace2_region_enter("merge", "regular renames", opt->repo);
detect_regular_renames(opt, MERGE_SIDE1);
detect_regular_renames(opt, MERGE_SIDE2);
+ use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]);
+ use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]);
trace2_region_leave("merge", "regular renames", opt->repo);
trace2_region_enter("merge", "directory renames", opt->repo);
@@ -3635,6 +3870,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
assert(opt->obuf.len == 0);
assert(opt->priv == NULL);
+ if (result->_properly_initialized != 0 &&
+ result->_properly_initialized != RESULT_INITIALIZED)
+ BUG("struct merge_result passed to merge_incore_*recursive() must be zeroed or filled with values from a previous run");
+ assert(!!result->priv == !!result->_properly_initialized);
if (result->priv) {
opt->priv = result->priv;
result->priv = NULL;
@@ -3674,8 +3913,22 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
NULL, 1);
strmap_init_with_options(&renames->dir_renames[i],
NULL, 0);
+ /*
+ * relevant_sources uses -1 for the default, because we need
+ * to be able to distinguish not-in-strintmap from valid
+ * relevant_source values from enum file_rename_relevance.
+ * In particular, possibly_cache_new_pair() expects a negative
+ * value for not-found entries.
+ */
strintmap_init_with_options(&renames->relevant_sources[i],
- 0, NULL, 0);
+ -1 /* explicitly invalid */,
+ NULL, 0);
+ strmap_init_with_options(&renames->cached_pairs[i],
+ NULL, 1);
+ strset_init_with_options(&renames->cached_irrelevant[i],
+ NULL, 1);
+ strset_init_with_options(&renames->cached_target_names[i],
+ NULL, 0);
}
/*
@@ -3701,6 +3954,50 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
trace2_region_leave("merge", "allocate/init", opt->repo);
}
+static void merge_check_renames_reusable(struct merge_options *opt,
+ struct merge_result *result,
+ struct tree *merge_base,
+ struct tree *side1,
+ struct tree *side2)
+{
+ struct rename_info *renames;
+ struct tree **merge_trees;
+ struct merge_options_internal *opti = result->priv;
+
+ if (!opti)
+ return;
+
+ renames = &opti->renames;
+ merge_trees = renames->merge_trees;
+
+ /*
+ * Handle case where previous merge operation did not want cache to
+ * take effect, e.g. because rename/rename(1to1) makes it invalid.
+ */
+ if (!merge_trees[0]) {
+ assert(!merge_trees[0] && !merge_trees[1] && !merge_trees[2]);
+ renames->cached_pairs_valid_side = 0; /* neither side valid */
+ return;
+ }
+
+ /*
+ * Handle other cases; note that merge_trees[0..2] will only
+ * be NULL if opti is, or if all three were manually set to
+ * NULL by e.g. rename/rename(1to1) handling.
+ */
+ assert(merge_trees[0] && merge_trees[1] && merge_trees[2]);
+
+ /* Check if we meet a condition for re-using cached_pairs */
+ if (oideq(&merge_base->object.oid, &merge_trees[2]->object.oid) &&
+ oideq(&side1->object.oid, &result->tree->object.oid))
+ renames->cached_pairs_valid_side = MERGE_SIDE1;
+ else if (oideq(&merge_base->object.oid, &merge_trees[1]->object.oid) &&
+ oideq(&side2->object.oid, &result->tree->object.oid))
+ renames->cached_pairs_valid_side = MERGE_SIDE2;
+ else
+ renames->cached_pairs_valid_side = 0; /* neither side valid */
+}
+
/*** Function Grouping: merge_incore_*() and their internal variants ***/
/*
@@ -3751,6 +4048,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
result->clean &= strmap_empty(&opt->priv->conflicted);
if (!opt->priv->call_depth) {
result->priv = opt->priv;
+ result->_properly_initialized = RESULT_INITIALIZED;
opt->priv = NULL;
}
}
@@ -3848,7 +4146,16 @@ void merge_incore_nonrecursive(struct merge_options *opt,
trace2_region_enter("merge", "merge_start", opt->repo);
assert(opt->ancestor != NULL);
+ merge_check_renames_reusable(opt, result, merge_base, side1, side2);
merge_start(opt, result);
+ /*
+ * Record the trees used in this merge, so if there's a next merge in
+ * a cherry-pick or rebase sequence it might be able to take advantage
+ * of the cached_pairs in that next merge.
+ */
+ opt->priv->renames.merge_trees[0] = merge_base;
+ opt->priv->renames.merge_trees[1] = side1;
+ opt->priv->renames.merge_trees[2] = side2;
trace2_region_leave("merge", "merge_start", opt->repo);
merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
diff --git a/merge-ort.h b/merge-ort.h
index d53a0a3..c011864 100644
--- a/merge-ort.h
+++ b/merge-ort.h
@@ -29,6 +29,8 @@ struct merge_result {
* !clean) and to print "CONFLICT" messages. Not for external use.
*/
void *priv;
+ /* Also private */
+ unsigned _properly_initialized;
};
/*
diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 3732122..fc2d460 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -91,7 +91,6 @@ int cmd__fast_rebase(int argc, const char **argv)
struct commit *last_commit = NULL, *last_picked_commit = NULL;
struct object_id head;
struct lock_file lock = LOCK_INIT;
- int clean = 1;
struct strvec rev_walk_args = STRVEC_INIT;
struct rev_info revs;
struct commit *commit;
@@ -124,7 +123,8 @@ int cmd__fast_rebase(int argc, const char **argv)
assert(oideq(&onto->object.oid, &head));
hold_locked_index(&lock, LOCK_DIE_ON_ERROR);
- assert(repo_read_index(the_repository) >= 0);
+ if (repo_read_index(the_repository) < 0)
+ BUG("Could not read index");
repo_init_revisions(the_repository, &revs, NULL);
revs.verbose_header = 1;
@@ -175,11 +175,10 @@ int cmd__fast_rebase(int argc, const char **argv)
free((char*)merge_opt.ancestor);
merge_opt.ancestor = NULL;
if (!result.clean)
- die("Aborting: Hit a conflict and restarting is not implemented.");
+ break;
last_picked_commit = commit;
last_commit = create_commit(result.tree, commit, last_commit);
}
- fprintf(stderr, "\nDone.\n");
/* TODO: There should be some kind of rev_info_free(&revs) call... */
memset(&revs, 0, sizeof(revs));
@@ -188,24 +187,39 @@ int cmd__fast_rebase(int argc, const char **argv)
if (result.clean < 0)
exit(128);
- strbuf_addf(&reflog_msg, "finish rebase %s onto %s",
- oid_to_hex(&last_picked_commit->object.oid),
- oid_to_hex(&last_commit->object.oid));
- if (update_ref(reflog_msg.buf, branch_name.buf,
- &last_commit->object.oid,
- &last_picked_commit->object.oid,
- REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
- error(_("could not update %s"), argv[4]);
- die("Failed to update %s", argv[4]);
+ if (result.clean) {
+ fprintf(stderr, "\nDone.\n");
+ strbuf_addf(&reflog_msg, "finish rebase %s onto %s",
+ oid_to_hex(&last_picked_commit->object.oid),
+ oid_to_hex(&last_commit->object.oid));
+ if (update_ref(reflog_msg.buf, branch_name.buf,
+ &last_commit->object.oid,
+ &last_picked_commit->object.oid,
+ REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
+ error(_("could not update %s"), argv[4]);
+ die("Failed to update %s", argv[4]);
+ }
+ if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
+ die(_("unable to update HEAD"));
+ strbuf_release(&reflog_msg);
+ strbuf_release(&branch_name);
+
+ prime_cache_tree(the_repository, the_repository->index,
+ result.tree);
+ } else {
+ fprintf(stderr, "\nAborting: Hit a conflict.\n");
+ strbuf_addf(&reflog_msg, "rebase progress up to %s",
+ oid_to_hex(&last_picked_commit->object.oid));
+ if (update_ref(reflog_msg.buf, "HEAD",
+ &last_commit->object.oid,
+ &head,
+ REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
+ error(_("could not update %s"), argv[4]);
+ die("Failed to update %s", argv[4]);
+ }
}
- if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
- die(_("unable to update HEAD"));
- strbuf_release(&reflog_msg);
- strbuf_release(&branch_name);
-
- prime_cache_tree(the_repository, the_repository->index, result.tree);
if (write_locked_index(&the_index, &lock,
COMMIT_LOCK | SKIP_IF_UNCHANGED))
die(_("unable to write %s"), get_index_file());
- return (clean == 0);
+ return (result.clean == 0);
}
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index 7134769..be84d22 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -4966,6 +4966,64 @@ test_expect_success '12g: Testcase with two kinds of "relevant" renames' '
)
'
+# Testcase 12h, Testcase with two kinds of "relevant" renames
+# Commit O: olddir/{a_1, b}
+# Commit A: newdir/{a_2, b}
+# Commit B: olddir/{alpha_1, b}
+# Expected: newdir/{alpha_2, b}
+
+test_setup_12h () {
+ test_create_repo 12h &&
+ (
+ cd 12h &&
+
+ mkdir olddir &&
+ test_seq 3 8 >olddir/a &&
+ >olddir/b &&
+ git add olddir &&
+ git commit -m orig &&
+
+ git branch O &&
+ git branch A &&
+ git branch B &&
+
+ git switch A &&
+ test_seq 3 10 >olddir/a &&
+ git add olddir/a &&
+ git mv olddir newdir &&
+ git commit -m A &&
+
+ git switch B &&
+
+ git mv olddir/a olddir/alpha &&
+ git commit -m B
+ )
+}
+
+test_expect_failure '12h: renaming a file within a renamed directory' '
+ test_setup_12h &&
+ (
+ cd 12h &&
+
+ git checkout A^0 &&
+
+ test_might_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
+
+ git ls-files >tracked &&
+ test_line_count = 2 tracked &&
+
+ test_path_is_missing olddir/a &&
+ test_path_is_file newdir/alpha &&
+ test_path_is_file newdir/b &&
+
+ git rev-parse >actual \
+ HEAD:newdir/alpha HEAD:newdir/b &&
+ git rev-parse >expect \
+ A:newdir/a O:oldir/b &&
+ test_cmp expect actual
+ )
+'
+
###########################################################################
# SECTION 13: Checking informational and conflict messages
#
diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
new file mode 100755
index 0000000..035edc4
--- /dev/null
+++ b/t/t6429-merge-sequence-rename-caching.sh
@@ -0,0 +1,700 @@
+#!/bin/sh
+
+test_description="remember regular & dir renames in sequence of merges"
+
+. ./test-lib.sh
+
+#
+# NOTE 1: this testfile tends to not only rename files, but modify on both
+# sides; without modifying on both sides, optimizations can kick in
+# which make rename detection irrelevant or trivial. We want to make
+# sure that we are triggering rename caching rather than rename
+# bypassing.
+#
+# NOTE 2: this testfile uses 'test-tool fast-rebase' instead of either
+# cherry-pick or rebase. sequencer.c is only superficially
+# integrated with merge-ort; it calls merge_switch_to_result()
+# after EACH merge, which updates the index and working copy AND
+# throws away the cached results (because merge_switch_to_result()
+# is only supposed to be called at the end of the sequence).
+# Integrating them more deeply is a big task, so for now the tests
+# use 'test-tool fast-rebase'.
+#
+
+
+#
+# In the following simple testcase:
+# Base: numbers_1, values_1
+# Upstream: numbers_2, values_2
+# Topic_1: sequence_3
+# Topic_2: scruples_3
+# or, in english, rename numbers -> sequence in the first commit, and rename
+# values -> scruples in the second commit.
+#
+# This shouldn't be a challenge, it's just verifying that cached renames isn't
+# preventing us from finding new renames.
+#
+test_expect_success 'caching renames does not preclude finding new ones' '
+ test_create_repo caching-renames-and-new-renames &&
+ (
+ cd caching-renames-and-new-renames &&
+
+ test_seq 2 10 >numbers &&
+ test_seq 2 10 >values &&
+ git add numbers values &&
+ git commit -m orig &&
+
+ git branch upstream &&
+ git branch topic &&
+
+ git switch upstream &&
+ test_seq 1 10 >numbers &&
+ test_seq 1 10 >values &&
+ git add numbers values &&
+ git commit -m "Tweaked both files" &&
+
+ git switch topic &&
+
+ test_seq 2 12 >numbers &&
+ git add numbers &&
+ git mv numbers sequence &&
+ git commit -m A &&
+
+ test_seq 2 12 >values &&
+ git add values &&
+ git mv values scruples &&
+ git commit -m B &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream~1..topic
+
+ git ls-files >tracked-files &&
+ test_line_count = 2 tracked-files &&
+ test_seq 1 12 >expect &&
+ test_cmp expect sequence &&
+ test_cmp expect scruples
+ )
+'
+
+#
+# In the following testcase:
+# Base: numbers_1
+# Upstream: rename numbers_1 -> sequence_2
+# Topic_1: numbers_3
+# Topic_2: numbers_1
+# or, in english, the first commit on the topic branch modifies numbers by
+# shrinking it (dramatically) and the second commit on topic reverts its
+# parent.
+#
+# Can git apply both patches?
+#
+# Traditional cherry-pick/rebase will fail to apply the second commit, the
+# one that reverted its parent, because despite detecting the rename from
+# 'numbers' to 'sequence' for the first commit, it fails to detect that
+# rename when picking the second commit. That's "reasonable" given the
+# dramatic change in size of the file, but remembering the rename and
+# reusing it is reasonable too.
+#
+# We do test here that we expect rename detection to only be run once total
+# (the topic side of history doesn't need renames, and with caching we
+# should be able to only run rename detection on the upstream side one
+# time.)
+test_expect_success 'cherry-pick both a commit and its immediate revert' '
+ test_create_repo pick-commit-and-its-immediate-revert &&
+ (
+ cd pick-commit-and-its-immediate-revert &&
+
+ test_seq 11 30 >numbers &&
+ git add numbers &&
+ git commit -m orig &&
+
+ git branch upstream &&
+ git branch topic &&
+
+ git switch upstream &&
+ test_seq 1 30 >numbers &&
+ git add numbers &&
+ git mv numbers sequence &&
+ git commit -m "Renamed (and modified) numbers -> sequence" &&
+
+ git switch topic &&
+
+ test_seq 11 13 >numbers &&
+ git add numbers &&
+ git commit -m A &&
+
+ git revert HEAD &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream~1..topic &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 1 calls
+ )
+'
+
+#
+# In the following testcase:
+# Base: sequence_1
+# Upstream: rename sequence_1 -> values_2
+# Topic_1: rename sequence_1 -> values_3
+# Topic_2: add unrelated sequence_4
+# or, in english, both sides rename sequence -> values, and then the second
+# commit on the topic branch adds an unrelated file called sequence.
+#
+# This testcase presents no problems for git traditionally, but having both
+# sides do the same rename in effect "uses it up" and if it remains cached,
+# could cause a spurious rename/add conflict.
+#
+test_expect_success 'rename same file identically, then reintroduce it' '
+ test_create_repo rename-rename-1to1-then-add-old-filename &&
+ (
+ cd rename-rename-1to1-then-add-old-filename &&
+
+ test_seq 3 8 >sequence &&
+ git add sequence &&
+ git commit -m orig &&
+
+ git branch upstream &&
+ git branch topic &&
+
+ git switch upstream &&
+ test_seq 1 8 >sequence &&
+ git add sequence &&
+ git mv sequence values &&
+ git commit -m "Renamed (and modified) sequence -> values" &&
+
+ git switch topic &&
+
+ test_seq 3 10 >sequence &&
+ git add sequence &&
+ git mv sequence values &&
+ git commit -m A &&
+
+ test_write_lines A B C D E F G H I J >sequence &&
+ git add sequence &&
+ git commit -m B &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream~1..topic &&
+
+ git ls-files >tracked &&
+ test_line_count = 2 tracked &&
+ test_path_is_file values &&
+ test_path_is_file sequence &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 2 calls
+ )
+'
+
+#
+# In the following testcase:
+# Base: olddir/{valuesZ_1, valuesY_1, valuesX_1}
+# Upstream: rename olddir/valuesZ_1 -> dirA/valuesZ_2
+# rename olddir/valuesY_1 -> dirA/valuesY_2
+# rename olddir/valuesX_1 -> dirB/valuesX_2
+# Topic_1: rename olddir/valuesZ_1 -> dirA/valuesZ_3
+# rename olddir/valuesY_1 -> dirA/valuesY_3
+# Topic_2: add olddir/newfile
+# Expected Pick1: dirA/{valuesZ, valuesY}, dirB/valuesX
+# Expected Pick2: dirA/{valuesZ, valuesY}, dirB/{valuesX, newfile}
+#
+# This testcase presents no problems for git traditionally, but having both
+# sides do the same renames in effect "use it up" but if the renames remain
+# cached, the directory rename could put newfile in the wrong directory.
+#
+test_expect_success 'rename same file identically, then add file to old dir' '
+ test_create_repo rename-rename-1to1-then-add-file-to-old-dir &&
+ (
+ cd rename-rename-1to1-then-add-file-to-old-dir &&
+
+ mkdir olddir/ &&
+ test_seq 3 8 >olddir/valuesZ &&
+ test_seq 3 8 >olddir/valuesY &&
+ test_seq 3 8 >olddir/valuesX &&
+ git add olddir &&
+ git commit -m orig &&
+
+ git branch upstream &&
+ git branch topic &&
+
+ git switch upstream &&
+ test_seq 1 8 >olddir/valuesZ &&
+ test_seq 1 8 >olddir/valuesY &&
+ test_seq 1 8 >olddir/valuesX &&
+ git add olddir &&
+ mkdir dirA &&
+ git mv olddir/valuesZ olddir/valuesY dirA &&
+ git mv olddir/ dirB/ &&
+ git commit -m "Renamed (and modified) values*" &&
+
+ git switch topic &&
+
+ test_seq 3 10 >olddir/valuesZ &&
+ test_seq 3 10 >olddir/valuesY &&
+ git add olddir &&
+ mkdir dirA &&
+ git mv olddir/valuesZ olddir/valuesY dirA &&
+ git commit -m A &&
+
+ >olddir/newfile &&
+ git add olddir/newfile &&
+ git commit -m B &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+ git config merge.directoryRenames true &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream~1..topic &&
+
+ git ls-files >tracked &&
+ test_line_count = 4 tracked &&
+ test_path_is_file dirA/valuesZ &&
+ test_path_is_file dirA/valuesY &&
+ test_path_is_file dirB/valuesX &&
+ test_path_is_file dirB/newfile &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 3 calls
+ )
+'
+
+#
+# In the following testcase, upstream renames a directory, and the topic branch
+# first adds a file to the directory, then later renames the directory
+# differently:
+# Base: olddir/a
+# olddir/b
+# Upstream: rename olddir/ -> newdir/
+# Topic_1: add olddir/newfile
+# Topic_2: rename olddir/ -> otherdir/
+#
+# Here we are just concerned that cached renames might prevent us from seeing
+# the rename conflict, and we want to ensure that we do get a conflict.
+#
+# While at it, though, we do test that we only try to detect renames 2
+# times and not three. (The first merge needs to detect renames on the
+# upstream side. Traditionally, the second merge would need to detect
+# renames on both sides of history, but our caching of upstream renames
+# should avoid the need to re-detect upstream renames.)
+#
+test_expect_success 'cached dir rename does not prevent noticing later conflict' '
+ test_create_repo dir-rename-cache-not-occluding-later-conflict &&
+ (
+ cd dir-rename-cache-not-occluding-later-conflict &&
+
+ mkdir olddir &&
+ test_seq 3 10 >olddir/a &&
+ test_seq 3 10 >olddir/b &&
+ git add olddir &&
+ git commit -m orig &&
+
+ git branch upstream &&
+ git branch topic &&
+
+ git switch upstream &&
+ test_seq 3 10 >olddir/a &&
+ test_seq 3 10 >olddir/b &&
+ git add olddir &&
+ git mv olddir newdir &&
+ git commit -m "Dir renamed" &&
+
+ git switch topic &&
+
+ >olddir/newfile &&
+ git add olddir/newfile &&
+ git commit -m A &&
+
+ test_seq 1 8 >olddir/a &&
+ test_seq 1 8 >olddir/b &&
+ git add olddir &&
+ git mv olddir otherdir &&
+ git commit -m B &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+ git config merge.directoryRenames true &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output &&
+ #git cherry-pick upstream..topic &&
+
+ grep CONFLICT..rename/rename output &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 2 calls
+ )
+'
+
+# Helper for the next two tests
+test_setup_upstream_rename () {
+ test_create_repo $1 &&
+ (
+ cd $1 &&
+
+ test_seq 3 8 >somefile &&
+ test_seq 3 8 >relevant-rename &&
+ git add somefile relevant-rename &&
+ mkdir olddir &&
+ test_write_lines a b c d e f g >olddir/a &&
+ test_write_lines z y x w v u t >olddir/b &&
+ git add olddir &&
+ git commit -m orig &&
+
+ git branch upstream &&
+ git branch topic &&
+
+ git switch upstream &&
+ test_seq 1 8 >somefile &&
+ test_seq 1 8 >relevant-rename &&
+ git add somefile relevant-rename &&
+ git mv relevant-rename renamed &&
+ echo h >>olddir/a &&
+ echo s >>olddir/b &&
+ git add olddir &&
+ git mv olddir newdir &&
+ git commit -m "Dir renamed"
+ )
+}
+
+#
+# In the following testcase, upstream renames a file in the toplevel directory
+# as well as its only directory:
+# Base: relevant-rename_1
+# somefile
+# olddir/a
+# olddir/b
+# Upstream: rename relevant-rename_1 -> renamed_2
+# rename olddir/ -> newdir/
+# Topic_1: relevant-rename_3
+# Topic_2: olddir/newfile_1
+# Topic_3: olddir/newfile_2
+#
+# In this testcase, since the first commit being picked only modifies a
+# file in the toplevel directory, the directory rename is irrelevant for
+# that first merge. However, we need to notice the directory rename for
+# the merge that picks the second commit, and we don't want the third
+# commit to mess up its location either. We want to make sure that
+# olddir/newfile doesn't exist in the result and that newdir/newfile does.
+#
+# We also test that we only do rename detection twice. We never need
+# rename detection on the topic side of history, but we do need it twice on
+# the upstream side of history. For the first topic commit, we only need
+# the
+# relevant-rename -> renamed
+# rename, because olddir is unmodified by Topic_1. For Topic_2, however,
+# the new file being added to olddir means files that were previously
+# irrelevant for rename detection are now relevant, forcing us to repeat
+# rename detection for the paths we don't already have cached. Topic_3 also
+# tweaks olddir/newfile, but the renames in olddir/ will have been cached
+# from the second rename detection run.
+#
+test_expect_success 'dir rename unneeded, then add new file to old dir' '
+ test_setup_upstream_rename dir-rename-unneeded-until-new-file &&
+ (
+ cd dir-rename-unneeded-until-new-file &&
+
+ git switch topic &&
+
+ test_seq 3 10 >relevant-rename &&
+ git add relevant-rename &&
+ git commit -m A &&
+
+ echo foo >olddir/newfile &&
+ git add olddir/newfile &&
+ git commit -m B &&
+
+ echo bar >>olddir/newfile &&
+ git add olddir/newfile &&
+ git commit -m C &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+ git config merge.directoryRenames true &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream..topic &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 2 calls &&
+
+ git ls-files >tracked &&
+ test_line_count = 5 tracked &&
+ test_path_is_missing olddir/newfile &&
+ test_path_is_file newdir/newfile
+ )
+'
+
+#
+# The following testcase is *very* similar to the last one, but instead of
+# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
+# Base: relevant-rename_1
+# somefile_1
+# olddir/a
+# olddir/b
+# Upstream: rename relevant-rename_1 -> renamed_2
+# rename olddir/ -> newdir/
+# Topic_1: relevant-rename_3
+# Topic_2: rename somefile -> olddir/newfile_2
+# Topic_3: modify olddir/newfile_3
+#
+# In this testcase, since the first commit being picked only modifies a
+# file in the toplevel directory, the directory rename is irrelevant for
+# that first merge. However, we need to notice the directory rename for
+# the merge that picks the second commit, and we don't want the third
+# commit to mess up its location either. We want to make sure that
+# neither somefile or olddir/newfile exists in the result and that
+# newdir/newfile does.
+#
+# This testcase needs one more call to rename detection than the last
+# testcase, because of the somefile -> olddir/newfile rename in Topic_2.
+test_expect_success 'dir rename unneeded, then rename existing file into old dir' '
+ test_setup_upstream_rename dir-rename-unneeded-until-file-moved-inside &&
+ (
+ cd dir-rename-unneeded-until-file-moved-inside &&
+
+ git switch topic &&
+
+ test_seq 3 10 >relevant-rename &&
+ git add relevant-rename &&
+ git commit -m A &&
+
+ test_seq 1 10 >somefile &&
+ git add somefile &&
+ git mv somefile olddir/newfile &&
+ git commit -m B &&
+
+ test_seq 1 12 >olddir/newfile &&
+ git add olddir/newfile &&
+ git commit -m C &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+ git config merge.directoryRenames true &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream..topic &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 3 calls &&
+
+ test_path_is_missing somefile &&
+ test_path_is_missing olddir/newfile &&
+ test_path_is_file newdir/newfile &&
+ git ls-files >tracked &&
+ test_line_count = 4 tracked
+ )
+'
+
+# Helper for the next two tests
+test_setup_topic_rename () {
+ test_create_repo $1 &&
+ (
+ cd $1 &&
+
+ test_seq 3 8 >somefile &&
+ mkdir olddir &&
+ test_seq 3 8 >olddir/a &&
+ echo b >olddir/b &&
+ git add olddir somefile &&
+ git commit -m orig &&
+
+ git branch upstream &&
+ git branch topic &&
+
+ git switch topic &&
+ test_seq 1 8 >somefile &&
+ test_seq 1 8 >olddir/a &&
+ git add somefile olddir/a &&
+ git mv olddir newdir &&
+ git commit -m "Dir renamed" &&
+
+ test_seq 1 10 >somefile &&
+ git add somefile &&
+ mkdir olddir &&
+ >olddir/unrelated-file &&
+ git add olddir &&
+ git commit -m "Unrelated file in recreated old dir"
+ )
+}
+
+#
+# In the following testcase, the first commit on the topic branch renames
+# a directory, while the second recreates the old directory and places a
+# file into it:
+# Base: somefile
+# olddir/a
+# olddir/b
+# Upstream: olddir/newfile
+# Topic_1: somefile_2
+# rename olddir/ -> newdir/
+# Topic_2: olddir/unrelated-file
+#
+# Note that the first pick should merge:
+# Base: somefile
+# olddir/{a,b}
+# Upstream: olddir/newfile
+# Topic_1: rename olddir/ -> newdir/
+# For which the expected result (assuming merge.directoryRenames=true) is
+# clearly:
+# Result: somefile
+# newdir/{a, b, newfile}
+#
+# While the second pick does the following three-way merge:
+# Base (Topic_1): somefile
+# newdir/{a,b}
+# Upstream (Result from 1): same files as base, but adds newdir/newfile
+# Topic_2: same files as base, but adds olddir/unrelated-file
+#
+# The second merge is pretty trivial; upstream adds newdir/newfile, and
+# topic_2 adds olddir/unrelated-file. We're just testing that we don't
+# accidentally cache directory renames somehow and rename
+# olddir/unrelated-file to newdir/unrelated-file.
+#
+# This testcase should only need one call to diffcore_rename_extended().
+test_expect_success 'caching renames only on upstream side, part 1' '
+ test_setup_topic_rename cache-renames-only-upstream-add-file &&
+ (
+ cd cache-renames-only-upstream-add-file &&
+
+ git switch upstream &&
+
+ >olddir/newfile &&
+ git add olddir/newfile &&
+ git commit -m "Add newfile" &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+
+ git config merge.directoryRenames true &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream..topic &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 1 calls &&
+
+ git ls-files >tracked &&
+ test_line_count = 5 tracked &&
+ test_path_is_missing newdir/unrelated-file &&
+ test_path_is_file olddir/unrelated-file &&
+ test_path_is_file newdir/newfile &&
+ test_path_is_file newdir/b &&
+ test_path_is_file newdir/a &&
+ test_path_is_file somefile
+ )
+'
+
+#
+# The following testcase is *very* similar to the last one, but instead of
+# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
+# Base: somefile
+# olddir/a
+# olddir/b
+# Upstream: somefile_1 -> olddir/newfile
+# Topic_1: rename olddir/ -> newdir/
+# somefile_2
+# Topic_2: olddir/unrelated-file
+# somefile_3
+#
+# Much like the previous test, this case is actually trivial and we are just
+# making sure there isn't some spurious directory rename caching going on
+# for the wrong side of history.
+#
+#
+# This testcase should only need two calls to diffcore_rename_extended(),
+# both for the first merge, one for each side of history.
+#
+test_expect_success 'caching renames only on upstream side, part 2' '
+ test_setup_topic_rename cache-renames-only-upstream-rename-file &&
+ (
+ cd cache-renames-only-upstream-rename-file &&
+
+ git switch upstream &&
+
+ git mv somefile olddir/newfile &&
+ git commit -m "Add newfile" &&
+
+ #
+ # Actual testing
+ #
+
+ git switch upstream &&
+
+ git config merge.directoryRenames true &&
+
+ GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+ export GIT_TRACE2_PERF &&
+
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ #git cherry-pick upstream..topic &&
+
+ grep region_enter.*diffcore_rename trace.output >calls &&
+ test_line_count = 2 calls &&
+
+ git ls-files >tracked &&
+ test_line_count = 4 tracked &&
+ test_path_is_missing newdir/unrelated-file &&
+ test_path_is_file olddir/unrelated-file &&
+ test_path_is_file newdir/newfile &&
+ test_path_is_file newdir/b &&
+ test_path_is_file newdir/a
+ )
+'
+
+test_done