summaryrefslogtreecommitdiff
path: root/merge-ort.c
AgeCommit message (Collapse)Author
2021-02-16merge-ort: call diffcore_rename() directlyElijah Newren
We want to pass additional information to diffcore_rename() (or some variant thereof) without plumbing that extra information through diff_tree_oid() and diffcore_std(). Further, since we will need to gather additional special information related to diffs and are walking the trees anyway in collect_merge_info(), it seems odd to have diff_tree_oid()/diffcore_std() repeat those tree walks. And there may be times where we can avoid traversing into a subtree in collect_merge_info() (based on additional information at our disposal), that the basic diff logic would be unable to take advantage of. For all these reasons, just create the add and delete pairs ourself and then call diffcore_rename() directly. This change is primarily about enabling future optimizations; the advantage of avoiding extra tree traversals is small compared to the cost of rename detection, and the advantage of avoiding the extra tree traversals is somewhat offset by the extra time spent in collect_merge_info() collecting the additional data anyway. However... For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 13.294 s ± 0.103 s 12.775 s ± 0.062 s mega-renames: 187.248 s ± 0.882 s 188.754 s ± 0.284 s just-one-mega: 5.557 s ± 0.017 s 5.599 s ± 0.019 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24merge-ort: begin performance work; instrument with trace2_region_* callsElijah Newren
Add some timing instrumentation for both merge-ort and diffcore-rename; I used these to measure and optimize performance in both, and several future patch series will build on these to reduce the timings of some select testcases. === Setup === The primary testcase I used involved rebasing a random topic in the linux kernel (consisting of 35 patches) against an older version. I added two variants, one where I rename a toplevel directory, and another where I only rebase one patch instead of the whole topic. The setup is as follows: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34 $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e $ git switch -c 5.4-renames v5.4 $ git mv drivers pilots # Introduce over 26,000 renames $ git commit -m "Rename drivers/ to pilots/" $ git config merge.renameLimit 30000 $ git config merge.directoryRenames true === Testcases === Now with REBASE standing for either "git rebase [--merge]" (using merge-recursive) or "test-tool fast-rebase" (using merge-ort), the testcases are: Testcase #1: no-renames $ git checkout v5.4^0 $ REBASE --onto HEAD base hwmon-updates Note: technically the name is misleading; there are some renames, but very few. Rename detection only takes about half the overall time. Testcase #2: mega-renames $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-updates Testcase #3: just-one-mega $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-just-one === Timing results === Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames, 10 runs for the other two cases): merge-recursive merge-ort no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s A single re-run of each with some breakdowns: --- no-renames --- merge-recursive merge-ort overall runtime: 19.302 s 14.257 s inexact rename detection: 7.603 s 7.906 s everything else: 11.699 s 6.351 s --- mega-renames --- merge-recursive merge-ort overall runtime: 5950.195 s 5499.672 s inexact rename detection: 5746.309 s 5487.120 s everything else: 203.886 s 17.552 s --- just-one-mega --- merge-recursive merge-ort overall runtime: 151.001 s 158.582 s inexact rename detection: 143.448 s 157.835 s everything else: 7.553 s 0.747 s === Timing observations === 0) Maximum speedup The "everything else" row represents the maximum speedup we could achieve if we were to somehow infinitely parallelize inexact rename detection, but leave everything else alone. The fact that this is so much smaller than the real runtime (even in the case with virtually no renames) makes it clear just how overwhelmingly large the time spent on rename detection can be. 1) no-renames 1a) merge-ort is faster than merge-recursive, which is nice. However, this still should not be considered good enough. Although the "merge" backend to rebase (merge-recursive) is sometimes faster than the "apply" backend, this is one of those cases where it is not. In fact, even merge-ort is slower. The "apply" backend can complete this testcase in 6.940 s ± 0.485 s which is about 2x faster than merge-ort and 3x faster than merge-recursive. One goal of the merge-ort performance work will be to make it faster than git-am on this (and similar) testcases. 2) mega-renames 2a) Obviously rename detection is a huge cost; it's where most the time is spent. We need to cut that down. If we could somehow infinitely parallelize it and drive its time to 0, the merge-recursive time would drop to about 204s, and the merge-ort time would drop to about 17s. I think this particular stat shows I've subtly baked a couple performance improvements into merge-ort and into fast-rebase already. 3) just-one-mega 3a) not much to say here, it just gives some flavor for how rebasing only one patch compares to rebasing 35. === Goals === This patch is obviously just the beginning. Here are some of my goals that this measurement will help us achieve: * Drive the cost of rename detection down considerably for merges * After the above has been achieved, see if there are other slowness factors (which would have previously been overshadowed by rename detection costs) which we can then focus on and also optimize. * Ensure our rebase testcase that requires little rename detection is noticeably faster with merge-ort than with apply-based rebase. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <ttaylorr@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24merge-ort: ignore the directory rename split conflict for nowElijah Newren
get_provisional_directory_renames() has code to detect directories being evenly split between different locations. However, as noted previously, if there are no new files added to that directory that was split evenly, our inability to determine where the directory was renamed to doesn't matter since there are no new files to try to move into the new location. Unfortunately, that code is unaware of whether there are new files under the directory in question and we just ignore that, causing us to fail t6423 test 2b but pass test 2a; turn off the error for now, swapping which tests pass and fail. The motivating reason for switching this off as a temporary measure is that as we add optimizations, we'll start looking at only subsets of renames, and subsets of renames can start switching the result we get when this error is (wrongly) on. Once we get enough optimizations, however, we can prevent that code from even running when there are no new files added to the relevant directory, at which point we can revert this commit and then both testcases 2a and 2b will pass simultaneously. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24merge-ort: fix massive leakElijah Newren
When a series of merges was performed (such as for a rebase or series of cherry-picks), only the data structures allocated by the final merge operation were being freed. The problem was that while picking out pieces of merge-ort to upstream, I previously misread a certain section of merge_start() and assumed it was associated with a later optimization. Include that section now, which ensures that if there was a previous merge operation, that we clear out result->priv and then re-use it for opt->priv, and otherwise we allocate opt->priv. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21Merge branch 'en/ort-directory-rename' into en/merge-ort-perfJunio C Hamano
* en/ort-directory-rename: (28 commits) merge-ort: fix a directory rename detection bug merge-ort: process_renames() now needs more defensiveness merge-ort: implement apply_directory_rename_modifications() merge-ort: add a new toplevel_dir field merge-ort: implement handle_path_level_conflicts() merge-ort: implement check_for_directory_rename() merge-ort: implement apply_dir_rename() and check_dir_renamed() merge-ort: implement compute_collisions() merge-ort: modify collect_renames() for directory rename handling merge-ort: implement handle_directory_level_conflicts() merge-ort: implement compute_rename_counts() merge-ort: copy get_renamed_dir_portion() from merge-recursive.c merge-ort: add outline of get_provisional_directory_renames() merge-ort: add outline for computing directory renames merge-ort: collect which directories are removed in dirs_removed merge-ort: initialize and free new directory rename data structures merge-ort: add new data structures for directory rename detection merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions ...
2021-01-21merge-ort: fix a directory rename detection bugElijah Newren
As noted in commit 902c521a35 ("t6423: more involved directory rename test", 2020-10-15), when we have a case where * dir/subdir/ has several files * almost all files in dir/subdir/ are renamed to folder/subdir/ * one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/ * the other side of history (that doesn't do the renames) adds a new file to dir/subdir/ Then for the majority of the file renames, the directory rename of dir/subdir/ -> folder/subdir/ is actually not represented that way but as dir/ -> folder/ We also had one rename that was represented as dir/subdir/ -> folder/subdir/newsubdir/ Now, since there's a new file in dir/subdir/, where does it go? Well, there's only one rule for dir/subdir/, so the code previously noted that this rule had the "majority" of the one "relevant" rename and thus erroneously used it to place the file in folder/subdir/newsubdir/. We really want the heavy weight associated with dir/ -> folder/ to also be treated as dir/subdir/ -> folder/subdir/, so that we correctly place the file in folder/subdir/. Add a bunch of logic to make sure that we use all relevant renamings in directory rename detection. Note that testcase 12f of t6423 still fails after this, but it gets further than merge-recursive does. There are some performance related bits in that testcase (the region_enter messages) that do not yet succeed, but the rest of the testcase works after this patch. Subsequent patch series will fix up the performance side. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: process_renames() now needs more defensivenessElijah Newren
Since directory rename detection adds new paths to opt->priv->paths and removes old ones, process_renames() needs to now check whether pair->one->path actually exists in opt->priv->paths instead of just assuming it does. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: implement apply_directory_rename_modifications()Elijah Newren
This function roughly follows the same outline as the function of the same name from merge-recursive.c, but the code diverges in multiple ways due to some special considerations: * merge-ort's version needs to update opt->priv->paths with any new paths (and opt->priv->paths points to struct conflict_infos which track quite a bit of metadata for each path); merge-recursive's version would directly update the index * merge-ort requires that opt->priv->paths has any leading directories of any relevant files also be included in the set of paths. And due to pointer equality requirements on merged_info.directory_name, we have to be careful how we compute and insert these. * due to the above requirements on opt->priv->paths, merge-ort's version starts with a long comment to explain all the special considerations that need to be handled * merge-ort can use the full data stored in opt->priv->paths to avoid making expensive get_tree_entry() calls to regather the necessary data. * due to messages being deferred automatically in merge-ort, this is the best place to handle conflict messages whereas in merge-recursive.c they are deferred manually so that processing of entries does all the printing Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: add a new toplevel_dir fieldElijah Newren
Due to the string-equality-iff-pointer-equality requirements placed on merged_info.directory_name, apply_directory_rename_modifications() will need to have access to the exact toplevel directory name string pointer and can't just use a new empty string. Store it in a field that we can use. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: implement handle_path_level_conflicts()Elijah Newren
This is copied from merge-recursive.c, with minor tweaks due to: * using strmap API * merge-ort not using the non_unique_new_dir field, since it'll obviate its need entirely later with performance improvements * adding a new path_in_way() function that uses opt->priv->paths instead of doing an expensive tree_has_path() lookup to see if a tree has a given path. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: implement check_for_directory_rename()Elijah Newren
This is copied from merge-recursive.c, with minor tweaks due to using strmap API and the fact that it can use opt->priv->paths to get all pathnames that exist instead of taking a tree object. This depends on a new function, handle_path_level_conflicts(), which just has a placeholder die-not-yet-implemented implementation for now; a subsequent patch will implement it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: implement apply_dir_rename() and check_dir_renamed()Elijah Newren
Both of these are copied from merge-recursive.c, with just minor tweaks due to using strmap API and not having a non_unique_new_dir field. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: implement compute_collisions()Elijah Newren
This is nearly a wholesale copy of compute_collisions() from merge-recursive.c, and the logic remains the same, but it has been tweaked slightly due to: * using strmap.h API (instead of direct hashmaps) * allocation/freeing of data structures were done separately in merge_start() and clear_or_reinit_internal_opts() in an earlier patch in this series * there is no non_unique_new_dir data field in merge-ort; that will be handled a different way It does depend on two new functions, apply_dir_rename() and check_dir_renamed() which were introduced with simple die-not-yet-implemented shells and will be implemented in subsequent patches. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: modify collect_renames() for directory rename handlingElijah Newren
collect_renames() is similar to merge-recursive.c's get_renames(), but lacks the directory rename handling found in the latter. Port that code structure over to merge-ort. This introduces three new die-not-yet-implemented functions that will be defined in future commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: implement handle_directory_level_conflicts()Elijah Newren
This is modelled on the version of handle_directory_level_conflicts() from merge-recursive.c, but is massively simplified due to the following factors: * strmap API provides simplifications over using direct hashmap * we have a dirs_removed field in struct rename_info that we have an easy way to populate from collect_merge_info(); this was already used in compute_rename_counts() and thus we do not need to check for condition #2. * The removal of condition #2 by handling it earlier in the code also obviates the need to check for condition #3 -- if both sides renamed a directory, meaning that the directory no longer exists on either side, then neither side could have added any new files to that directory, and thus there are no files whose locations we need to move due to such a directory rename. In fact, the same logic that makes condition #3 irrelevant means condition #1 is also irrelevant so we could drop this function. However, it is cheap to check if both sides rename the same directory, and doing so can save future computation. So, simply remove any directories that both sides renamed from the list of directory renames. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: implement compute_rename_counts()Elijah Newren
This function is based on the first half of get_directory_renames() from merge-recursive.c; as part of the implementation, factor out a routine, increment_count(), to update the bookkeeping to track the number of items renamed into new directories. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: copy get_renamed_dir_portion() from merge-recursive.cElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: add outline of get_provisional_directory_renames()Elijah Newren
This function is based on merge-recursive.c's get_directory_renames(), except that the first half has been split out into a not-yet-implemented compute_rename_counts(). The primary difference here is our lack of the non_unique_new_dir boolean in our strmap. The lack of that field will at first cause us to fail testcase 2b of t6423; however, future optimizations will obviate the need for that ugly field so we have just left it out. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21merge-ort: add outline for computing directory renamesElijah Newren
Port some directory rename handling changes from merge-recursive.c's detect_and_process_renames() to the same-named function of merge-ort.c. This does not yet add any use or handling of directory renames, just the outline for where we start to compute them. Thus, a future patch will add port additional changes to merge-ort's detect_and_process_renames(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-14Merge branch 'en/ort-conflict-handling' into en/merge-ort-perfJunio C Hamano
* en/ort-conflict-handling: merge-ort: add handling for different types of files at same path merge-ort: copy find_first_merges() implementation from merge-recursive.c merge-ort: implement format_commit() merge-ort: copy and adapt merge_submodule() from merge-recursive.c merge-ort: copy and adapt merge_3way() from merge-recursive.c merge-ort: flesh out implementation of handle_content_merge() merge-ort: handle book-keeping around two- and three-way content merge merge-ort: implement unique_path() helper merge-ort: handle directory/file conflicts that remain merge-ort: handle D/F conflict where directory disappears due to merge
2021-01-07merge-ort: collect which directories are removed in dirs_removedElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-07merge-ort: initialize and free new directory rename data structuresElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-07merge-ort: add new data structures for directory rename detectionElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-07Merge branch 'en/merge-ort-3' into en/ort-directory-renameJunio C Hamano
* en/merge-ort-3: merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions merge-ort: add implementation of rename/delete conflicts merge-ort: add implementation of both sides renaming differently merge-ort: add implementation of both sides renaming identically merge-ort: add basic outline for process_renames() merge-ort: implement compare_pairs() and collect_renames() merge-ort: implement detect_regular_renames() merge-ort: add initial outline for basic rename detection merge-ort: add basic data structures for handling renames
2021-01-04merge-ort: add handling for different types of files at same pathElijah Newren
Add some handling that explicitly considers collisions of the following types: * file/submodule * file/symlink * submodule/symlink Leaving them as conflicts at the same path are hard for users to resolve, so move one or both of them aside so that they each get their own path. Note that in the case of recursive handling (i.e. call_depth > 0), we can just use the merge base of the two merge bases as the merge result much like we do with modify/delete conflicts, binary files, conflicting submodule values, and so on. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: copy find_first_merges() implementation from merge-recursive.cElijah Newren
Code is identical for the function body in the two files, the call signature is just slightly different in merge-ort than merge-recursive as noted a couple commits ago. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: implement format_commit()Elijah Newren
This implementation is based on a mixture of print_commit() and output_commit_title() from merge-recursive.c so that it can be used to take over both functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: copy and adapt merge_submodule() from merge-recursive.cElijah Newren
Take merge_submodule() from merge-recursive.c and make slight adjustments, predominantly around deferring output using path_msg() instead of using merge-recursive's output() and show() functions. There's also a fix for recursive cases (when call_depth > 0) and a slight change to argument order for find_first_merges(). find_first_merges() and format_commit() are left unimplemented for now, but will be added by subsequent commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: copy and adapt merge_3way() from merge-recursive.cElijah Newren
Take merge_3way() from merge-recursive.c and make slight adjustments based on different data structures (direct usage of object_id rather diff_filespec, separate pathnames which based on our careful interning of pathnames in opt->priv->paths can be compared with '!=' rather than 'strcmp'). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: flesh out implementation of handle_content_merge()Elijah Newren
This implementation is based heavily on merge_mode_and_contents() from merge-recursive.c, though it has some fixes for recursive merges (i.e. when call_depth > 0), and has a number of changes throughout based on slight differences in data structures and in how the functions are called. It is, however, based on two new helper functions -- merge_3way() and merge_submodule -- for which we only provide die-not-implemented stubs at this point. Future commits will add implementations of these functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: handle book-keeping around two- and three-way content mergeElijah Newren
In addition to the content merge (which will go in a subsequent commit), we need to worry about conflict messages, placing results in higher order stages in case of a df_conflict, and making sure the results are placed in ci->merged.result so that they will show up in the working tree. Take care of all that external book-keeping, moving the simplistic just-take-HEAD code into the barebones handle_content_merge() function for now. Subsequent commits will flesh out handle_content_merge(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: implement unique_path() helperElijah Newren
Implement unique_path(), based on the one from merge-recursive.c. It is simplified, however, due to: (1) using strmaps, and (2) the fact that merge-ort lets the checkout codepath handle possible collisions with the working tree means that other code locations don't have to. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: handle directory/file conflicts that remainElijah Newren
When a directory/file conflict remains, we can leave the directory where it is, but need to move the information about the file to a different pathname. After moving the file to a different pathname, we allow subsequent process_entry() logic to handle any additional details that might be relevant. This depends on a new helper function, unique_path(), that dies with an unimplemented error currently but will be implemented in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: handle D/F conflict where directory disappears due to mergeElijah Newren
When one side has a directory at a given path and the other side of history has a file at the path, but the merge resolves the directory away (e.g. because no path within that directory was modified and the other side deleted it, or because renaming moved all the files elsewhere), then we don't actually have a conflict anymore. We just need to clear away any information related to the relevant directory, and then the subsequent process_entry() handling can handle the given path. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-17merge-ort: implement merge_incore_recursive()Elijah Newren
Implement merge_incore_recursive(), mostly through the use of a new helper function, merge_ort_internal(), which itself is based off merge_recursive_internal() from merge-recursive.c. This drops the number of failures in the testsuite when run under GIT_TEST_MERGE_ALGORITHM=ort from around 1500 to 647. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-17merge-ort: make clear_internal_opts() aware of partial clearingElijah Newren
In order to handle recursive merges, after merging merge-bases we need to clear away most of the data we had built up but some of it needs to be kept -- in particular the "output" field. Rename the function to reflect its future change in use. Further, since "reinitialize" means we'll be reusing the fields immediately, take advantage of this to only partially clear maps, leaving the hashtable allocated and pre-sized. (This may be slightly out-of-order since the speedups aren't realized until there are far more strmaps in use, but the patch submission process already went out of order because of various questions and requests for strmap. Anyway, see commit 6ccdfc2a20 ("strmap: enable faster clearing and reusing of strmaps", 2020-11-05), for performance details about the use of strmap_partial_clear().) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-17merge-ort: copy a few small helper functions from merge-recursive.cElijah Newren
In a subsequent commit, we will implement the traditional recursiveness that gave merge-recursive its name, namely merging non-unique merge-bases to come up with a single virtual merge base. Copy a few helper functions from merge-recursive.c that we will use in the implementation. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of type-changed rename handlingElijah Newren
Implement cases where renames are involved in type changes (i.e. the side of history that didn't rename the file changed its type from a regular file to a symlink or submodule). There was some code to handle this in merge-recursive but only in the special case when the renamed file had no content changes. The code here works differently -- it knows process_entry() can handle mode conflicts, so it does a few minimal tweaks to ensure process_entry() can just finish the job as needed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of normal rename handlingElijah Newren
Implement handling of normal renames. This code replaces the following from merge-recurisve.c: * the code relevant to RENAME_NORMAL in process_renames() * the RENAME_NORMAL case of process_entry() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_rename_normal() * setup_rename_conflict_info() The consolidation of four separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. (To be fair, the code for handling normal renames wasn't all that complicated beforehand, but it's still much simpler now.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of rename collisionsElijah Newren
Implement rename/rename(2to1) and rename/add handling, i.e. a file is renamed into a location where another file is added (with that other file either being a plain add or itself coming from a rename). Note that rename collisions can also have a special case stacked on top: the file being renamed on one side of history is deleted on the other (yielding either a rename/add/delete conflict or perhaps a rename/rename(2to1)/delete[/delete]) conflict. One thing to note here is that when there is a double rename, the code in question only handles one of them at a time; a later iteration through the loop will handle the other. After they've both been handled, process_entry()'s normal add/add code can handle the collision. This code replaces the following from merge-recurisve.c: * all the 2to1 code in process_renames() * the RENAME_TWO_FILES_TO_ONE case of process_entry() * handle_rename_rename_2to1() * handle_rename_add() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_file_collision() * setup_rename_conflict_info() The consolidation of six separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of rename/delete conflictsElijah Newren
Implement rename/delete conflicts, i.e. one side renames a file and the other deletes the file. This code replaces the following from merge-recurisve.c: * the code relevant to RENAME_DELETE in process_renames() * the RENAME_DELETE case of process_entry() * handle_rename_delete() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_change_delete() * setup_rename_conflict_info() The consolidation of five separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. To be fair, there is a _slight_ tweak to process_entry() here, because rename/delete cases will also trigger the modify/delete codepath. However, we only want a modify/delete message to be printed for a rename/delete conflict if there is a content change in the renamed file in addition to the rename. So process_renames() and process_entry() aren't quite fully orthogonal, but they are pretty close. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of both sides renaming differentlyElijah Newren
Implement rename/rename(1to2) handling, i.e. both sides of history renaming a file and rename it differently. This code replaces the following from merge-recurisve.c: * all the 1to2 code in process_renames() * the RENAME_ONE_FILE_TO_TWO case of process_entry() * handle_rename_rename_1to2() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_file_collision() * setup_rename_conflict_info() The consolidation of five separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. To be fair, there is a _slight_ tweak to process_entry() here to make sure that the two different paths aren't marked as clean but are left in a conflicted state. So process_renames() and process_entry() aren't quite entirely orthogonal, but they are pretty close. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of both sides renaming identicallyElijah Newren
Implement rename/rename(1to1) handling, i.e. both sides of history renaming a file but renaming the same way. This code replaces the following from merge-recurisve.c: * all the 1to1 code in process_renames() * the RENAME_ONE_FILE_TO_ONE case of process_entry() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_rename_normal() * setup_rename_conflict_info() The consolidation of four separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: add basic outline for process_renames()Elijah Newren
Add code which determines which kind of special rename case each rename corresponds to, but leave the handling of each type unimplemented for now. Future commits will implement each one. There is some tenuous resemblance to merge-recursive's process_renames(), but comparing the two is very unlikely to yield any insights. merge-ort's process_renames() is a bit complex and I would prefer if I could simplify it more, but it is far easier to grok than merge-recursive's function of the same name in my opinion. Plus, merge-ort handles more rename conflict types than merge-recursive does. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: implement compare_pairs() and collect_renames()Elijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: implement detect_regular_renames()Elijah Newren
Based heavily on merge-recursive's get_diffpairs() function, and also includes the necessary paired call to diff_warn_rename_limit() so that users will be warned if merge.renameLimit is not sufficiently large for rename detection to run. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: add initial outline for basic rename detectionElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: add basic data structures for handling renamesElijah Newren
This will grow later, but we only need a few fields for basic rename handling. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add modify/delete handling and delayed output processingElijah Newren
The focus here is on adding a path_msg() which will queue up warning/conflict/notice messages about the merge for later processing, storing these in a pathname -> strbuf map. It might seem like a big change, but it really just is: * declaration of necessary map with some comments * initialization and recording of data * a bunch of code to iterate over the map at print/free time * at least one caller in order to avoid an error about having an unused function (which we provide in the form of implementing modify/delete conflict handling). At this stage, it is probably not clear why I am opting for delayed output processing. There are multiple reasons: 1. Merges are supposed to abort if they would overwrite dirty changes in the working tree. We cannot correctly determine whether changes would be overwritten until both rename detection has occurred and full processing of entries with the renames has finalized. Warning/conflict/notice messages come up at intermediate codepaths along the way, so unless we want spurious conflict/warning messages being printed when the merge will be aborted anyway, we need to save these messages and only print them when relevant. 2. There can be multiple messages for a single path, and we want all messages for a give path to appear together instead of having them grouped by conflict/warning type. This was a problem already with merge-recursive.c but became even more important due to the splitting apart of conflict types as discussed in the commit message for 1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) 3. Some callers might want to avoid showing the output in certain cases, such as if the end result is a clean merge. Rebases have typically done this. 4. Some callers might not want the output to go to stdout or even stderr, but might want to do something else with it entirely. For example, a --remerge-diff option to `git show` or `git log -p` that remerges on the fly and diffs merge commits against the remerged version would benefit from stdout/stderr not being written to in the standard form. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add die-not-implemented stub handle_content_merge() functionElijah Newren
This simplistic and weird-looking patch is here to facilitate future patch submissions. Adding this stub allows rename detection code to reference it in one patch series, while a separate patch series can define the implementation, and then both series can merge cleanly and work nicely together at that point. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>