summaryrefslogtreecommitdiff
path: root/merge-ort.c
AgeCommit message (Collapse)Author
2021-02-06Merge branch 'en/ort-conflict-handling'Junio C Hamano
ORT merge strategy learns more support for merge conflicts. * en/ort-conflict-handling: merge-ort: add handling for different types of files at same path merge-ort: copy find_first_merges() implementation from merge-recursive.c merge-ort: implement format_commit() merge-ort: copy and adapt merge_submodule() from merge-recursive.c merge-ort: copy and adapt merge_3way() from merge-recursive.c merge-ort: flesh out implementation of handle_content_merge() merge-ort: handle book-keeping around two- and three-way content merge merge-ort: implement unique_path() helper merge-ort: handle directory/file conflicts that remain merge-ort: handle D/F conflict where directory disappears due to merge
2021-01-25Merge branch 'en/merge-ort-3'Junio C Hamano
Rename detection is added to the "ORT" merge strategy. * en/merge-ort-3: merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions merge-ort: add implementation of rename/delete conflicts merge-ort: add implementation of both sides renaming differently merge-ort: add implementation of both sides renaming identically merge-ort: add basic outline for process_renames() merge-ort: implement compare_pairs() and collect_renames() merge-ort: implement detect_regular_renames() merge-ort: add initial outline for basic rename detection merge-ort: add basic data structures for handling renames
2021-01-04merge-ort: add handling for different types of files at same pathElijah Newren
Add some handling that explicitly considers collisions of the following types: * file/submodule * file/symlink * submodule/symlink Leaving them as conflicts at the same path are hard for users to resolve, so move one or both of them aside so that they each get their own path. Note that in the case of recursive handling (i.e. call_depth > 0), we can just use the merge base of the two merge bases as the merge result much like we do with modify/delete conflicts, binary files, conflicting submodule values, and so on. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: copy find_first_merges() implementation from merge-recursive.cElijah Newren
Code is identical for the function body in the two files, the call signature is just slightly different in merge-ort than merge-recursive as noted a couple commits ago. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: implement format_commit()Elijah Newren
This implementation is based on a mixture of print_commit() and output_commit_title() from merge-recursive.c so that it can be used to take over both functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: copy and adapt merge_submodule() from merge-recursive.cElijah Newren
Take merge_submodule() from merge-recursive.c and make slight adjustments, predominantly around deferring output using path_msg() instead of using merge-recursive's output() and show() functions. There's also a fix for recursive cases (when call_depth > 0) and a slight change to argument order for find_first_merges(). find_first_merges() and format_commit() are left unimplemented for now, but will be added by subsequent commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: copy and adapt merge_3way() from merge-recursive.cElijah Newren
Take merge_3way() from merge-recursive.c and make slight adjustments based on different data structures (direct usage of object_id rather diff_filespec, separate pathnames which based on our careful interning of pathnames in opt->priv->paths can be compared with '!=' rather than 'strcmp'). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: flesh out implementation of handle_content_merge()Elijah Newren
This implementation is based heavily on merge_mode_and_contents() from merge-recursive.c, though it has some fixes for recursive merges (i.e. when call_depth > 0), and has a number of changes throughout based on slight differences in data structures and in how the functions are called. It is, however, based on two new helper functions -- merge_3way() and merge_submodule -- for which we only provide die-not-implemented stubs at this point. Future commits will add implementations of these functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: handle book-keeping around two- and three-way content mergeElijah Newren
In addition to the content merge (which will go in a subsequent commit), we need to worry about conflict messages, placing results in higher order stages in case of a df_conflict, and making sure the results are placed in ci->merged.result so that they will show up in the working tree. Take care of all that external book-keeping, moving the simplistic just-take-HEAD code into the barebones handle_content_merge() function for now. Subsequent commits will flesh out handle_content_merge(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: implement unique_path() helperElijah Newren
Implement unique_path(), based on the one from merge-recursive.c. It is simplified, however, due to: (1) using strmaps, and (2) the fact that merge-ort lets the checkout codepath handle possible collisions with the working tree means that other code locations don't have to. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: handle directory/file conflicts that remainElijah Newren
When a directory/file conflict remains, we can leave the directory where it is, but need to move the information about the file to a different pathname. After moving the file to a different pathname, we allow subsequent process_entry() logic to handle any additional details that might be relevant. This depends on a new helper function, unique_path(), that dies with an unimplemented error currently but will be implemented in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04merge-ort: handle D/F conflict where directory disappears due to mergeElijah Newren
When one side has a directory at a given path and the other side of history has a file at the path, but the merge resolves the directory away (e.g. because no path within that directory was modified and the other side deleted it, or because renaming moved all the files elsewhere), then we don't actually have a conflict anymore. We just need to clear away any information related to the relevant directory, and then the subsequent process_entry() handling can handle the given path. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-17merge-ort: implement merge_incore_recursive()Elijah Newren
Implement merge_incore_recursive(), mostly through the use of a new helper function, merge_ort_internal(), which itself is based off merge_recursive_internal() from merge-recursive.c. This drops the number of failures in the testsuite when run under GIT_TEST_MERGE_ALGORITHM=ort from around 1500 to 647. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-17merge-ort: make clear_internal_opts() aware of partial clearingElijah Newren
In order to handle recursive merges, after merging merge-bases we need to clear away most of the data we had built up but some of it needs to be kept -- in particular the "output" field. Rename the function to reflect its future change in use. Further, since "reinitialize" means we'll be reusing the fields immediately, take advantage of this to only partially clear maps, leaving the hashtable allocated and pre-sized. (This may be slightly out-of-order since the speedups aren't realized until there are far more strmaps in use, but the patch submission process already went out of order because of various questions and requests for strmap. Anyway, see commit 6ccdfc2a20 ("strmap: enable faster clearing and reusing of strmaps", 2020-11-05), for performance details about the use of strmap_partial_clear().) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-17merge-ort: copy a few small helper functions from merge-recursive.cElijah Newren
In a subsequent commit, we will implement the traditional recursiveness that gave merge-recursive its name, namely merging non-unique merge-bases to come up with a single virtual merge base. Copy a few helper functions from merge-recursive.c that we will use in the implementation. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of type-changed rename handlingElijah Newren
Implement cases where renames are involved in type changes (i.e. the side of history that didn't rename the file changed its type from a regular file to a symlink or submodule). There was some code to handle this in merge-recursive but only in the special case when the renamed file had no content changes. The code here works differently -- it knows process_entry() can handle mode conflicts, so it does a few minimal tweaks to ensure process_entry() can just finish the job as needed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of normal rename handlingElijah Newren
Implement handling of normal renames. This code replaces the following from merge-recurisve.c: * the code relevant to RENAME_NORMAL in process_renames() * the RENAME_NORMAL case of process_entry() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_rename_normal() * setup_rename_conflict_info() The consolidation of four separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. (To be fair, the code for handling normal renames wasn't all that complicated beforehand, but it's still much simpler now.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of rename collisionsElijah Newren
Implement rename/rename(2to1) and rename/add handling, i.e. a file is renamed into a location where another file is added (with that other file either being a plain add or itself coming from a rename). Note that rename collisions can also have a special case stacked on top: the file being renamed on one side of history is deleted on the other (yielding either a rename/add/delete conflict or perhaps a rename/rename(2to1)/delete[/delete]) conflict. One thing to note here is that when there is a double rename, the code in question only handles one of them at a time; a later iteration through the loop will handle the other. After they've both been handled, process_entry()'s normal add/add code can handle the collision. This code replaces the following from merge-recurisve.c: * all the 2to1 code in process_renames() * the RENAME_TWO_FILES_TO_ONE case of process_entry() * handle_rename_rename_2to1() * handle_rename_add() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_file_collision() * setup_rename_conflict_info() The consolidation of six separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of rename/delete conflictsElijah Newren
Implement rename/delete conflicts, i.e. one side renames a file and the other deletes the file. This code replaces the following from merge-recurisve.c: * the code relevant to RENAME_DELETE in process_renames() * the RENAME_DELETE case of process_entry() * handle_rename_delete() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_change_delete() * setup_rename_conflict_info() The consolidation of five separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. To be fair, there is a _slight_ tweak to process_entry() here, because rename/delete cases will also trigger the modify/delete codepath. However, we only want a modify/delete message to be printed for a rename/delete conflict if there is a content change in the renamed file in addition to the rename. So process_renames() and process_entry() aren't quite fully orthogonal, but they are pretty close. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of both sides renaming differentlyElijah Newren
Implement rename/rename(1to2) handling, i.e. both sides of history renaming a file and rename it differently. This code replaces the following from merge-recurisve.c: * all the 1to2 code in process_renames() * the RENAME_ONE_FILE_TO_TWO case of process_entry() * handle_rename_rename_1to2() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_file_collision() * setup_rename_conflict_info() The consolidation of five separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. To be fair, there is a _slight_ tweak to process_entry() here to make sure that the two different paths aren't marked as clean but are left in a conflicted state. So process_renames() and process_entry() aren't quite entirely orthogonal, but they are pretty close. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16merge-ort: add implementation of both sides renaming identicallyElijah Newren
Implement rename/rename(1to1) handling, i.e. both sides of history renaming a file but renaming the same way. This code replaces the following from merge-recurisve.c: * all the 1to1 code in process_renames() * the RENAME_ONE_FILE_TO_ONE case of process_entry() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_rename_normal() * setup_rename_conflict_info() The consolidation of four separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit ef52778708 ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: add basic outline for process_renames()Elijah Newren
Add code which determines which kind of special rename case each rename corresponds to, but leave the handling of each type unimplemented for now. Future commits will implement each one. There is some tenuous resemblance to merge-recursive's process_renames(), but comparing the two is very unlikely to yield any insights. merge-ort's process_renames() is a bit complex and I would prefer if I could simplify it more, but it is far easier to grok than merge-recursive's function of the same name in my opinion. Plus, merge-ort handles more rename conflict types than merge-recursive does. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: implement compare_pairs() and collect_renames()Elijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: implement detect_regular_renames()Elijah Newren
Based heavily on merge-recursive's get_diffpairs() function, and also includes the necessary paired call to diff_warn_rename_limit() so that users will be warned if merge.renameLimit is not sufficiently large for rename detection to run. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: add initial outline for basic rename detectionElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-14merge-ort: add basic data structures for handling renamesElijah Newren
This will grow later, but we only need a few fields for basic rename handling. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add modify/delete handling and delayed output processingElijah Newren
The focus here is on adding a path_msg() which will queue up warning/conflict/notice messages about the merge for later processing, storing these in a pathname -> strbuf map. It might seem like a big change, but it really just is: * declaration of necessary map with some comments * initialization and recording of data * a bunch of code to iterate over the map at print/free time * at least one caller in order to avoid an error about having an unused function (which we provide in the form of implementing modify/delete conflict handling). At this stage, it is probably not clear why I am opting for delayed output processing. There are multiple reasons: 1. Merges are supposed to abort if they would overwrite dirty changes in the working tree. We cannot correctly determine whether changes would be overwritten until both rename detection has occurred and full processing of entries with the renames has finalized. Warning/conflict/notice messages come up at intermediate codepaths along the way, so unless we want spurious conflict/warning messages being printed when the merge will be aborted anyway, we need to save these messages and only print them when relevant. 2. There can be multiple messages for a single path, and we want all messages for a give path to appear together instead of having them grouped by conflict/warning type. This was a problem already with merge-recursive.c but became even more important due to the splitting apart of conflict types as discussed in the commit message for 1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) 3. Some callers might want to avoid showing the output in certain cases, such as if the end result is a clean merge. Rebases have typically done this. 4. Some callers might not want the output to go to stdout or even stderr, but might want to do something else with it entirely. For example, a --remerge-diff option to `git show` or `git log -p` that remerges on the fly and diffs merge commits against the remerged version would benefit from stdout/stderr not being written to in the standard form. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add die-not-implemented stub handle_content_merge() functionElijah Newren
This simplistic and weird-looking patch is here to facilitate future patch submissions. Adding this stub allows rename detection code to reference it in one patch series, while a separate patch series can define the implementation, and then both series can merge cleanly and work nicely together at that point. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add function grouping commentsElijah Newren
Commit b658536f59 ("merge-ort: add some high-level algorithm structure", 2020-10-27) added high-level structure of the ort merge algorithm. As we have added more and more functions, that high-level structure has been slightly obscured. Since functions are still grouped according to this high-level structure, add comments denoting sections where all the functions are specifically tied to a piece of the high-level structure. This function groupings include a few sub-divisions of the original high-level structure, including some sub-divisions that are yet to be submitted. Each has (or will have) several functions all serving as helpers to one or two main functions for each section. As an added bonus, the comments will serve to provide a small textual separation between nearby sections and allow the next three patch series to be submitted independently and merge cleanly. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add a paths_to_free field to merge_options_internalElijah Newren
This field will be used in future patches to allow removal of paths from opt->priv->paths. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add a path_conflict field to merge_options_internalElijah Newren
This field is not yet used, but will be used by both the rename handling code, and the conflict type handling code in process_entry(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add a clear_internal_opts helperElijah Newren
Move most of merge_finalize() into a new helper function, clear_internal_opts(). This is a step to facilitate recursive merges, as well as some future optimizations. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add a few includesElijah Newren
Include blob.h for definition of blob_type, and commit-reach.h for declarations of get_merge_bases() and in_merge_bases(). While none of these are used yet, we want to avoid cross-dependencies in the next three series of patches for merge-ort and merge them at the end; adding these "#include"s now avoids textual conflicts. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: free data structures in merge_finalize()Elijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add implementation of record_conflicted_index_entries()Elijah Newren
After checkout(), the working tree has the appropriate contents, and the index matches the working copy. That means that all unmodified and cleanly merged files have correct index entries, but conflicted entries need to be updated. We do this by looping over the conflicted entries, marking the existing index entry for the path with CE_REMOVE, adding new higher order staged for the path at the end of the index (ignoring normal index sort order), and then at the end of the loop removing the CE_REMOVED-marked cache entries and sorting the index. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add implementation of checkout()Elijah Newren
Since merge-ort creates a tree for its output, when there are no conflicts, updating the working tree and index is as simple as using the unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent of a "checkout" operation). If there were conflicts in the merge, then since the tree we created included all the conflict markers, then using the unpack_trees machinery in this manner will still update the working tree correctly. Further, all index entries corresponding to cleanly merged files will also be updated correctly by this procedure. Index entries corresponding to conflicted entries will appear as though the user had run "git add -u" after the merge to accept all files as-is with conflict markers. Thus, after running unpack_trees(), there needs to be a separate step for updating the entries in the index corresponding to conflicted files. This will be the job for the function record_conflicted_index_entris(), which will be implemented in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: basic outline for merge_switch_to_result()Elijah Newren
This adds a basic implementation for merge_switch_to_result(), though just in terms of a few new empty functions that will be defined in subsequent commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: step 3 of tree writing -- handling subdirectories as we goElijah Newren
Our order for processing of entries means that if we have a tree of files that looks like Makefile src/moduleA/foo.c src/moduleA/bar.c src/moduleB/baz.c src/moduleB/umm.c tokens.txt Then we will process paths in the order of the leftmost column below. I have added two additional columns that help explain the algorithm that follows; the 2nd column is there to remind us we have oid & mode info we are tracking for each of these paths (which differs between the paths which I'm not representing well here), and the third column annotates the parent directory of the entry: tokens.txt <version_info> "" src/moduleB/umm.c <version_info> src/moduleB src/moduleB/baz.c <version_info> src/moduleB src/moduleB <version_info> src src/moduleA/foo.c <version_info> src/moduleA src/moduleA/bar.c <version_info> src/moduleA src/moduleA <version_info> src src <version_info> "" Makefile <version_info> "" When the parent directory changes, if it's a subdirectory of the previous parent directory (e.g. "" -> src/moduleB) then we can just keep appending. If the parent directory differs from the previous parent directory and is not a subdirectory, then we should process that directory. So, for example, when we get to this point: tokens.txt <version_info> "" src/moduleB/umm.c <version_info> src/moduleB src/moduleB/baz.c <version_info> src/moduleB and note that the next entry (src/moduleB) has a different parent than the last one that isn't a subdirectory, we should write out a tree for it 100644 blob <HASH> umm.c 100644 blob <HASH> baz.c then pop all the entries under that directory while recording the new hash for that directory, leaving us with tokens.txt <version_info> "" src/moduleB <new version_info> src This process repeats until at the end we get to tokens.txt <version_info> "" src <new version_info> "" Makefile <version_info> "" and then we can write out the toplevel tree. Since we potentially have entries in our string_list corresponding to multiple different toplevel directories, e.g. a slightly different repository might have: whizbang.txt <version_info> "" tokens.txt <version_info> "" src/moduleD <new version_info> src src/moduleC <new version_info> src src/moduleB <new version_info> src src/moduleA/foo.c <version_info> src/moduleA src/moduleA/bar.c <version_info> src/moduleA When src/moduleA is popped off, we need to know that the "last directory" reverts back to src, and how many entries in our string_list are associated with that parent directory. So I use an auxiliary offsets string_list which would have (parent_directory,offset) information of the form "" 0 src 2 src/moduleA 5 Whenever I write out a tree for a subdirectory, I set versions.nr to the final offset value and then decrement offsets.nr...and then add an entry to versions with a hash for the new directory. The idea is relatively simple, there's just a lot of accounting to implement this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: step 2 of tree writing -- function to create tree objectElijah Newren
Create a new function, write_tree(), which will take a list of basenames, modes, and oids for a single directory and create a tree object in the object-store. We do not yet have just basenames, modes, and oids for just a single directory (we have a mixture of entries from all directory levels in the hierarchy) so we still die() before the current call to write_tree(), but the next patch will rectify that. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: step 1 of tree writing -- record basenames, modes, and oidsElijah Newren
As a step towards transforming the processed path->conflict_info entries into an actual tree object, start recording basenames, modes, and oids in a dir_metadata structure. Subsequent commits will make use of this to actually write a tree. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: have process_entries operate in a defined orderElijah Newren
We want to handle paths below a directory before needing to handle the directory itself. Also, we want to handle the directory immediately after the paths below it, so we can't use simple lexicographic ordering from strcmp (which would insert foo.txt between foo and foo/file.c). Copy string_list_df_name_compare() from merge-recursive.c, and set up a string list of paths sorted by that function so that we can iterate in the desired order. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add a preliminary simple process_entries() implementationElijah Newren
Add a process_entries() implementation that just loops over the paths and processes each one individually with an auxiliary process_entry() call. Add a basic process_entry() as well, which handles several cases but leaves a few of the more involved ones with die-not-implemented messages. Also, although process_entries() is supposed to create a tree, it does not yet have code to do so -- except in the special case of merging completely empty trees. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: avoid recursing into identical treesElijah Newren
When all three trees have the same oid, there is no need to recurse into these trees to find that all files within them happen to match. We can just record any one of the trees as the resolution of merging that particular path. Immediately resolving trees for other types of trivial tree merges (such as one side matches the merge base, or the two sides match each other) would prevent us from detecting renames for some paths, and thus prevent us from doing three-way content merges for those paths whose renames we did not detect. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: record stage and auxiliary info for every pathElijah Newren
Create a helper function, setup_path_info(), which can be used to record all the information we want in a merged_info or conflict_info. While there is currently only one caller of this new function, and some of its particular parameters are fixed, future callers of this function will be added later. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: compute a few more useful fields for collect_merge_infoElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: avoid repeating fill_tree_descriptor() on the same treeElijah Newren
Three-way merges, by their nature, are going to often have two or more trees match at a given subdirectory. We can avoid calling fill_tree_descriptor() on the same tree by checking when these trees match. Noting when various oids match will also be useful in other calculations and optimizations as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: implement a very basic collect_merge_info()Elijah Newren
This does not actually collect any necessary info other than the pathnames involved, since it just allocates an all-zero conflict_info and stuffs that into paths. However, it invokes the traverse_trees() machinery to walk over all the paths and sets up the basic infrastructure we need. I have left out a few obvious optimizations to try to make this patch as short and obvious as possible. A subsequent patch will add some of those back in with some more useful data fields before we introduce a patch that actually sets up the conflict_info fields. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: add an err() function similar to one from merge-recursiveElijah Newren
Various places in merge-recursive used an err() function when it hit some kind of unrecoverable error. That code was from the reusable bits of merge-recursive.c that we liked, such as merge_3way, writing object files to the object store, reading blobs from the object store, etc. So create a similar function to allow us to port that code over, and use it for when we detect problems returned from collect_merge_info()'s traverse_trees() call, which we will be adding next. While we are at it, also add more documentation for the "clean" field from struct merge_result, particularly since the name suggests a boolean but it is not quite one and this is our first non-boolean usage. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: use histogram diffElijah Newren
In my cursory investigation, histogram diffs are about 2% slower than Myers diffs. Others have probably done more detailed benchmarks. But, in short, histogram diffs have been around for years and in a number of cases provide obviously better looking diffs where Myers diffs are unintelligible but the performance hit has kept them from becoming the default. However, there are real merge bugs we know about that have triggered on git.git and linux.git, which I don't have a clue how to address without the additional information that I believe is provided by histogram diffs. See the following: https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/ https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/ https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/ I don't like mismerges. I really don't like silent mismerges. While I am sometimes willing to make performance and correctness tradeoff, I'm much more interested in correctness in general. I want to fix the above bugs. I have not yet started doing so, but I believe histogram diff at least gives me an angle. Unfortunately, I can't rely on using the information from histogram diff unless it's in use. And it hasn't been used because of a few percentage performance hit. In testcases I have looked at, merge-ort is _much_ faster than merge-recursive for non-trivial merges/rebases/cherry-picks. As such, this is a golden opportunity to switch out the underlying diff algorithm (at least the one used by the merge machinery; git-diff and git-log are separate questions); doing so will allow me to get additional data and improved diffs, and I believe it will help me fix the above bugs at some point in the future. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13merge-ort: port merge_start() from merge-recursiveElijah Newren
merge_start() basically does a bunch of sanity checks, then allocates and initializes opt->priv -- a struct merge_options_internal. Most of the sanity checks are usable as-is. The allocation/intialization is a bit different since merge-ort has a very different merge_options_internal than merge-recursive, but the idea is the same. The weirdest part here is that merge-ort and merge-recursive use the same struct merge_options, even though merge_options has a number of fields that are oddly specific to merge-recursive's internal implementation and don't even make sense with merge-ort's high-level design (e.g. buffer_output, which merge-ort has to always do). I reused the same data structure because: * most the fields made sense to both merge algorithms * making a new struct would have required making new enums or somehow externalizing them, and that was getting messy. * it simplifies converting the existing callers by not having to have different code paths for merge_options setup. I also marked detect_renames as ignored. We can revisit that later, but in short: merge-recursive allowed turning off rename detection because it was sometimes glacially slow. When you speed something up by a few orders of magnitude, it's worth revisiting whether that justification is still relevant. Besides, if folks find it's still too slow, perhaps they have a better scaling case than I could find and maybe it turns up some more optimizations we can add. If it still is needed as an option, it is easy to add later. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>