summaryrefslogtreecommitdiff
path: root/revision.h
AgeCommit message (Collapse)Author
2022-11-23Merge branch 'ps/receive-use-only-advertised'Junio C Hamano
"git receive-pack" used to use all the local refs as the boundary for checking connectivity of the data "git push" sent, but now it uses only the refs that it advertised to the pusher. In a repository with the .hideRefs configuration, this reduces the resources needed to perform the check. cf. <221028.86bkpw805n.gmgdl@evledraar.gmail.com> cf. <xmqqr0yrizqm.fsf@gitster.g> * ps/receive-use-only-advertised: receive-pack: only use visible refs for connectivity check rev-parse: add `--exclude-hidden=` option revision: add new parameter to exclude hidden refs revision: introduce struct to handle exclusions revision: move together exclusion-related functions refs: get rid of global list of hidden refs refs: fix memory leak when parsing hideRefs config
2022-11-17revision: add new parameter to exclude hidden refsPatrick Steinhardt
Users can optionally hide refs from remote users in git-upload-pack(1), git-receive-pack(1) and others via the `transfer.hideRefs`, but there is not an easy way to obtain the list of all visible or hidden refs right now. We'll require just that though for a performance improvement in our connectivity check. Add a new option `--exclude-hidden=` that excludes any hidden refs from the next pseudo-ref like `--all` or `--branches`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17revision: introduce struct to handle exclusionsPatrick Steinhardt
The functions that handle exclusion of refs work on a single string list. We're about to add a second mechanism for excluding refs though, and it makes sense to reuse much of the same architecture for both kinds of exclusion. Introduce a new `struct ref_exclusions` that encapsulates all the logic related to excluding refs and move the `struct string_list` that holds all wildmatch patterns of excluded refs into it. Rename functions that operate on this struct to match its name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08revisions API: extend the nascent REV_INFO_INIT macroÆvar Arnfjörð Bjarmason
Have the REV_INFO_INIT macro added in [1] declare more members of "struct rev_info" that we can initialize statically, and have repo_init_revisions() do so with the memcpy(..., &blank) idiom introduced in [2]. As the comment for the "REV_INFO_INIT" macro notes this still isn't sufficient to initialize a "struct rev_info" for use yet. But we are getting closer to that eventual goal. Even though we can't fully initialize a "struct rev_info" with REV_INFO_INIT it's useful for readability to clearly separate those things that we can statically initialize, and those that we can't. This change could replace the: list_objects_filter_init(&revs->filter); In the repo_init_revisions() with this line, at the end of the REV_INFO_INIT deceleration in revisions.h: .filter = LIST_OBJECTS_FILTER_INIT, \ But doing so would produce a minor conflict with an outstanding topic[3]. Let's skip that for now. I have follow-ups to initialize more of this statically, e.g. changes to get rid of grep_init(). We can initialize more members with the macro in a future series. 1. f196c1e908d (revisions API users: use release_revisions() needing REV_INFO_INIT, 2022-04-13) 2. 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT macro, 2021-07-01) 3. https://lore.kernel.org/git/265b292ed5c2de19b7118dfe046d3d9d932e2e89.1667901510.git.ps@pks.im/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-09-09Merge branch 'jc/format-patch-force-in-body-from'Junio C Hamano
"git format-patch --from=<ident>" can be told to add an in-body "From:" line even for commits that are authored by the given <ident> with "--force-in-body-from"option. * jc/format-patch-force-in-body-from: format-patch: learn format.forceInBodyFrom configuration variable format-patch: allow forcing the use of in-body From: header pretty: separate out the logic to decide the use of in-body from
2022-08-29format-patch: allow forcing the use of in-body From: headerJunio C Hamano
Users may be authoring and committing their commits under the same e-mail address they use to send their patches from, in which case they shouldn't need to use the in-body From: line in their outgoing e-mails. At the receiving end, "git am" will use the address on the "From:" header of the incoming e-mail and all should be well. Some mailing lists, however, mangle the From: address from what the original sender had; in such a situation, the user may want to add the in-body "From:" header even for their own patches. "git format-patch --[no-]force-in-body-from" was invented for such users. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19revision: allow --ancestry-path to take an argumentElijah Newren
We have long allowed users to run e.g. git log --ancestry-path master..seen which shows all commits which satisfy all three of these criteria: * are an ancestor of seen * are not an ancestor of master * have master as an ancestor This commit allows another variant: git log --ancestry-path=$TOPIC master..seen which shows all commits which satisfy all of these criteria: * are an ancestor of seen * are not an ancestor of master * have $TOPIC in their ancestry-path that last bullet can be defined as commits meeting any of these criteria: * are an ancestor of $TOPIC * have $TOPIC as an ancestor * are $TOPIC This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-03revisions API: don't leak memory on argv elements that need free()-ingÆvar Arnfjörð Bjarmason
Add a "free_removed_argv_elements" member to "struct setup_revision_opt", and use it to fix several memory leaks. We have various memory leaks in APIs that take and munge "const char **argv", e.g. parse_options(). Sometimes these APIs are given the "argv" we get to the "main" function, in which case we don't leak memory, but other times we're giving it the "v" member of a "struct strvec" we created. There's several potential ways to fix those sort of leaks, we could add a "nodup" mode to "struct strvec", which would work for the cases where we push constant strings to it. But that wouldn't work as soon as we used strvec_pushf(), or otherwise needed to duplicate or create a string for that "struct strvec". Let's instead make it the responsibility of the revisions API. If it's going to clobber elements of argv it can also free() them, which it will now do if instructed to do so via "free_removed_argv_elements". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07Merge branch 'ab/plug-leak-in-revisions'Junio C Hamano
Plug the memory leaks from the trickiest API of all, the revision walker. * ab/plug-leak-in-revisions: (27 commits) revisions API: add a TODO for diff_free(&revs->diffopt) revisions API: have release_revisions() release "topo_walk_info" revisions API: have release_revisions() release "date_mode" revisions API: call diff_free(&revs->pruning) in revisions_release() revisions API: release "reflog_info" in release revisions() revisions API: clear "boundary_commits" in release_revisions() revisions API: have release_revisions() release "prune_data" revisions API: have release_revisions() release "grep_filter" revisions API: have release_revisions() release "filter" revisions API: have release_revisions() release "cmdline" revisions API: have release_revisions() release "mailmap" revisions API: have release_revisions() release "commits" revisions API users: use release_revisions() for "prune_data" users revisions API users: use release_revisions() with UNLEAK() revisions API users: use release_revisions() in builtin/log.c revisions API users: use release_revisions() in http-push.c revisions API users: add "goto cleanup" for release_revisions() stash: always have the owner of "stash_info" free it revisions API users: use release_revisions() needing REV_INFO_INIT revision.[ch]: document and move code declared around "init" ...
2022-04-23log: "--since-as-filter" option is a non-terminating "--since" variantMiklos Vajna
The "--since=<time>" option of "git log" limits the commits displayed by the command by stopping the traversal once it sees a commit whose timestamp is older than the given time and not digging further into its parents. This is OK in a history where a commit always has a newer timestamp than any of its parents'. Once you see a commit older than the given <time>, all ancestor commits of it are even older than the time anyway. It poses, however, a problem when there is a commit with a wrong timestamp that makes it appear older than its parents. Stopping traversal at the "incorrectly old" commit will hide its ancestors that are newer than that wrong commit and are newer than the cut-off time given with the --since option. --max-age and --after being the synonyms to --since, they share the same issue. Add a new "--since-as-filter" option that is a variant of "--since=<time>". Instead of stopping the traversal to hide an old enough commit and its all ancestors, exclude commits with an old timestamp from the output but still keep digging the history. Without other traversal stopping options, this will force the command in "git log" family to dig down the history to the root. It may be an acceptable cost for a small project with short history and many commits with screwy timestamps. It is quite unlikely for us to add traversal stopper other than since, so have this as a --since-as-filter option, rather than a separate --as-filter, that would be probably more confusing. Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-14revisions API users: use release_revisions() needing REV_INFO_INITÆvar Arnfjörð Bjarmason
Use release_revisions() to various users of "struct rev_list" which need to have their "struct rev_info" zero-initialized before we can start using it. For the bundle.c code see the early exit case added in 3bbbe467f29 (bundle verify: error out if called without an object database, 2019-05-27). For the relevant bisect.c code see 45b6370812c (bisect: libify `check_good_are_ancestors_of_bad` and its dependents, 2020-02-17). For the submodule.c code see the "goto" on "(!left || !right || !sub)" added in 8e6df65015f (submodule: refactor show_submodule_summary with helper function, 2016-08-31). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-14revision.[ch]: document and move code declared around "init"Ævar Arnfjörð Bjarmason
A subsequent commit will add "REV_INFO_INIT" macro adjacent to repo_init_revisions(), unfortunately between the "struct rev_info" itself and that function we've added various miscellaneous code between the two over the years. Let's move that code either lower in revision.h, giving it API docs while we're at it, or in cases where it wasn't public API at all move it into revision.c No lines of code are changed here, only moved around. The only changes are the addition of new API comments. The "tree_difference" variable could also be declared like this, which I think would be a lot clearer, but let's leave that for now to keep this a move-only change: static enum { REV_TREE_SAME, REV_TREE_NEW, /* Only new files */ REV_TREE_OLD, /* Only files removed */ REV_TREE_DIFFERENT, /* Mixed changes */ } tree_difference = REV_TREE_SAME; Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-14revision.[ch]: provide and start using a release_revisions()Ævar Arnfjörð Bjarmason
The users of the revision.[ch] API's "struct rev_info" are a major source of memory leaks in the test suite under SANITIZE=leak, which in turn adds a lot of noise when trying to mark up tests with "TEST_PASSES_SANITIZE_LEAK=true". The users of that API are largely one-shot, e.g. "git rev-list" or "git log", or the "git checkout" and "git stash" being modified here For these callers freeing the memory is arguably a waste of time, but in many cases they've actually been trying to free the memory, and just doing that in a buggy manner. Let's provide a release_revisions() function for these users, and start migrating them over per the plan outlined in [1]. Right now this only handles the "pending" member of the struct, but more will be added in subsequent commits. Even though we only clear the "pending" member now, let's not leave a trap in code like the pre-image of index_differs_from(), where we'd start doing the wrong thing as soon as the release_revisions() learned to clear its "diffopt". I.e. we need to call release_revisions() after we've inspected any state in "struct rev_info". This leaves in place e.g. clear_pathspec(&rev.prune_data) in stash_working_tree() in builtin/stash.c, subsequent commits will teach release_revisions() to free "prune_data" and other members that in some cases are individually cleared by users of "struct rev_info" by reaching into its members. Those subsequent commits will remove the relevant calls to e.g. clear_pathspec(). We avoid amending code in index_differs_from() in diff-lib.c as well as wt_status_collect_changes_index(), has_unstaged_changes() and has_uncommitted_changes() in wt-status.c in a way that assumes that we are already clearing the "diffopt" member. That will be handled in a subsequent commit. 1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09revision: put object filter into struct rev_infoDerrick Stolee
Placing a 'struct list_objects_filter_options' within 'struct rev_info' will assist making some bookkeeping around object filters in the future. For now, let's use this new member to remove a static global instance of the struct from builtin/rev-list.c. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-24Merge branch 'ah/log-no-graph'Junio C Hamano
"git log --graph --graph" used to leak a graph structure, and there was no way to countermand "--graph" that appear earlier on the command line. A "--no-graph" option has been added and resource leakage has been plugged. * ah/log-no-graph: log: add a --no-graph option log: fix memory leak if --graph is passed multiple times
2022-02-18Merge branch 'jz/rev-list-exclude-first-parent-only'Junio C Hamano
"git log" and friends learned an option --exclude-first-parent-only to propagate UNINTERESTING bit down only along the first-parent chain, just like --first-parent option shows commits that lack the UNINTERESTING bit only along the first-parent chain. * jz/rev-list-exclude-first-parent-only: git-rev-list: add --exclude-first-parent-only flag
2022-02-16Merge branch 'en/remerge-diff'Junio C Hamano
"git log --remerge-diff" shows the difference from mechanical merge result and the result that is actually recorded in a merge commit. * en/remerge-diff: diff-merges: avoid history simplifications when diffing merges merge-ort: mark conflict/warning messages from inner merges as omittable show, log: include conflict/warning messages in --remerge-diff headers diff: add ability to insert additional headers for paths merge-ort: format messages slightly different for use in headers merge-ort: mark a few more conflict messages as omittable merge-ort: capture and print ll-merge warnings in our preferred fashion ll-merge: make callers responsible for showing warnings log: clean unneeded objects during `log --remerge-diff` show, log: provide a --remerge-diff capability
2022-02-11log: add a --no-graph optionAlex Henrie
It's useful to be able to countermand a previous --graph option, for example if `git log --graph` is run via an alias. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02log: clean unneeded objects during `log --remerge-diff`Elijah Newren
The --remerge-diff option will need to create new blobs and trees representing the "automatic merge" state. If one is traversing a long project history, one can easily get hundreds of thousands of loose objects generated during `log --remerge-diff`. However, none of those loose objects are needed after we have completed our diff operation; they can be summarily deleted. Add a new helper function to tmp_objdir to discard all the contained objects, and call it after each merge is handled. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02show, log: provide a --remerge-diff capabilityElijah Newren
When this option is specified, we remerge all (two parent) merge commits and diff the actual merge commit to the automatically created version, in order to show how users removed conflict markers, resolved the different conflict versions, and potentially added new changes outside of conflict regions in order to resolve semantic merge problems (or, possibly, just to hide other random changes). This capability works by creating a temporary object directory and marking it as the primary object store. This makes it so that any blobs or trees created during the automatic merge are easily removable afterwards by just deleting all objects from the temporary object directory. There are a few ways that this implementation is suboptimal: * `log --remerge-diff` becomes slow, because the temporary object directory can fill with many loose objects while running * the log output can be muddied with misplaced "warning: cannot merge binary files" messages, since ll-merge.c unconditionally writes those messages to stderr while running instead of allowing callers to manage them. * important conflict and warning messages are simply dropped; thus for conflicts like modify/delete or rename/rename or file/directory which are not representable with content conflict markers, there may be no way for a user of --remerge-diff to know that there had been a conflict which was resolved (and which possibly motivated other changes in the merge commit). * when fixing the previous issue, note that some unimportant conflict and warning messages might start being included. We should instead make sure these remain dropped. Subsequent commits will address these issues. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-12git-rev-list: add --exclude-first-parent-only flagJerry Zhang
It is useful to know when a branch first diverged in history from some integration branch in order to be able to enumerate the user's local changes. However, these local changes can include arbitrary merges, so it is necessary to ignore this merge structure when finding the divergence point. In order to do this, teach the "rev-list" family to accept "--exclude-first-parent-only", which restricts the traversal of excluded commits to only follow first parent links. -A-----E-F-G--main \ / / B-C-D--topic In this example, the goal is to return the set {B, C, D} which represents a topic branch that has been merged into main branch. `git rev-list topic ^main` will end up returning no commits since excluding main will end up traversing the commits on topic as well. `git rev-list --exclude-first-parent-only topic ^main` however will return {B, C, D} as desired. Add docs for the new flag, and clarify the doc for --first-parent to indicate that it applies to traversing the set of included commits only. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-17log: let --invert-grep only invert --grepRené Scharfe
The option --invert-grep is documented to filter out commits whose messages match the --grep filters. However, it also affects the header matches (--author, --committer), which is not intended. Move the handling of that option to grep.c, as only the code there can distinguish between matches in the header from those in the message body. If --invert-grep is given then enable extended expressions (not the regex type, we just need git grep's --not to work), negate the body patterns and check if any of them match by piggy-backing on the collect_hits mechanism of grep_source_1(). Collecting the matches in struct grep_opt is a bit iffy, but with "last_shown" we have a precedent for writing state information to that struct. Reported-by: Dotan Cohen <dotancohen@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-06Merge branch 'jt/add-submodule-odb-clean-up'Junio C Hamano
More code paths that use the hack to add submodule's object database to the set of alternate object store have been cleaned up. * jt/add-submodule-odb-clean-up: revision: remove "submodule" from opt struct repository: support unabsorbed in repo_submodule_init submodule: remove unnecessary unabsorbed fallback
2021-09-09revision: remove "submodule" from opt structJonathan Tan
Clean up a TODO in revision.h by removing the "submodule" field from struct setup_revision_opt. This field is only used to specify the ref store to use, so use rev_info->repo to determine the ref store instead. The only users of this field are merge-ort.c and merge-recursive.c. However, both these files specify the superproject as rev_info->repo and the submodule as setup_revision_opt->submodule. In order to be able to pass the submodule as rev_info->repo, all commits must be parsed with the submodule explicitly specified; this patch does that as well. (An incremental solution in which only some commits are parsed with explicit submodule will not work, because if the same commit is parsed twice in different repositories, there will be 2 heap-allocated object structs corresponding to that commit, and any flag set by the revision walking mechanism on one of them will not be reflected onto the other.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05revision: separate walk and unsorted flagsPatrick Steinhardt
The `--no-walk` flag supports two modes: either it sorts the revisions given as input input or it doesn't. This is reflected in a single `no_walk` flag, which reflects one of the three states "walk", "don't walk but without sorting" and "don't walk but with sorting". Split up the flag into two separate bits, one indicating whether we should walk or not and one indicating whether the input should be sorted or not. This will allow us to more easily introduce a new flag `--unsorted-input`, which only impacts the sorting bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-22Merge branch 'bc/rev-list-without-commit-line'Junio C Hamano
"git rev-list" learns to omit the "commit <object-name>" header lines from the output with the `--no-commit-header` option. * bc/rev-list-without-commit-line: rev-list: add option for --pretty=format without header
2021-07-12rev-list: add option for --pretty=format without headerbrian m. carlson
In general, we encourage users to use plumbing commands, like git rev-list, over porcelain commands, like git log, when scripting. However, git rev-list has one glaring problem that prevents it from being used in certain cases: when --pretty is used with a custom format, it always prints out a line containing "commit" and the object ID. This makes it unsuitable for many scripting needs, and forces users to use git log instead. While we can't change this behavior for backwards compatibility, we can add an option to suppress this behavior, so let's do so, and call it "--no-commit-header". Additionally, add the corresponding positive option to switch it back on. Note that this option doesn't affect the built-in formats, only custom formats. This is exactly the same behavior as users already have from git log and is what most users will be used to. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-08Merge branch 'jk/bitmap-tree-optim'Junio C Hamano
Avoid duplicated work while building reachability bitmaps. * jk/bitmap-tree-optim: bitmaps: don't recurse into trees already in the bitmap
2021-06-15bitmaps: don't recurse into trees already in the bitmapJeff King
If an object is already mentioned in a reachability bitmap we are building, then by definition so are all of the objects it can reach. We have an optimization to stop traversing commits when we see they are already in the bitmap, but we don't do the same for trees. It's generally unavoidable to recurse into trees for commits not yet covered by bitmaps (since most commits generally do have unique top-level trees). But they usually have subtrees that are shared with other commits (i.e., all of the subtrees the commit _didn't_ touch). And some of those commits (and their trees) may be covered by the bitmap. Usually this isn't _too_ big a deal, because we'll visit those subtrees only once in total for the whole walk. But if you have a large number of unbitmapped commits, and if your tree is big, then you may end up opening a lot of sub-trees for no good reason. We can use the same optimization we do for commits here: when we are about to open a tree, see if it's in the bitmap (either the one we are building, or the "seen" bitmap which covers the UNINTERESTING side of the bitmap when doing a set-difference). This works especially well because we'll visit all commits before hitting any trees. So even in a history like: A -- B if "A" has a bitmap on disk but "B" doesn't, we'll already have OR-ed in the results from A before looking at B's tree (so we really will only look at trees touched by B). For most repositories, the timings produced by p5310 are unspectacular. Here's linux.git: Test HEAD^ HEAD -------------------------------------------------------------------- 5310.4: simulated clone 6.00(5.90+0.10) 5.98(5.90+0.08) -0.3% 5310.5: simulated fetch 2.98(5.45+0.18) 2.85(5.31+0.18) -4.4% 5310.7: rev-list (commits) 0.32(0.29+0.03) 0.33(0.30+0.03) +3.1% 5310.8: rev-list (objects) 1.48(1.44+0.03) 1.49(1.44+0.05) +0.7% Any improvement there is within the noise (the +3.1% on test 7 has to be noise, since we are not recursing into trees, and thus the new code isn't even run). The results for git.git are likewise uninteresting. But here are numbers from some other real-world repositories (that are not public). This one's tree is comparable in size to linux.git, but has ~16k refs (and so less complete bitmap coverage): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5% 5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9% 5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0% 5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8% And here's another with a very large tree (~340k entries), and a fairly large number of refs (~10k): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1% 5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6% 5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6% 5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5% This patch provides substantial improvements in these larger cases, and have any drawbacks for smaller ones (the cost of the bitmap check is quite small compared to an actual tree traversal). Note that we have to add a version of revision.c's include_check callback which handles non-commits. We could possibly consolidate this into a single callback for all objects types, as there's only one user of the feature which would need converted (pack-bitmap.c:should_include). That would in theory let us avoid duplicating any logic. But when I tried it, the code ended up much worse to read, with lots of repeated "if it's a commit do this, otherwise do that". Having two separate callbacks splits that naturally, and matches the existing split of show_commit/show_object callbacks. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-14Merge branch 'so/log-m-implies-p'Junio C Hamano
The "-m" option in "git log -m" that does not specify which format, if any, of diff is desired did not have any visible effect; it now implies some form of diff (by default "--patch") is produced. * so/log-m-implies-p: diff-merges: let "-m" imply "-p" diff-merges: rename "combined_imply_patch" to "merges_imply_patch" stash list: stop passing "-m" to "git log" git-svn: stop passing "-m" to "git rev-list" diff-merges: move specific diff-index "-m" handling to diff-index t4013: test "git diff-index -m" t4013: test "git diff-tree -m" t4013: test "git log -m --stat" t4013: test "git log -m --raw" t4013: test that "-m" alone has no effect in "git log"
2021-05-21diff-merges: rename "combined_imply_patch" to "merges_imply_patch"Sergey Organov
This is refactoring change in preparation for the next commit that will let -m imply -p. The old name doesn't match the intention to let not only -c/-cc imply -p, but also -m, that is not a "combined" format, so we rename the flag accordingly. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-11revision: mark commit parents as NOT_USER_GIVENPatrick Steinhardt
The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly provided by the user or not. The most important use case for this is when filtering objects: only objects that were not explicitly requested will get filtered. The flag is currently only set for blobs and trees, which has been fine given that there are no filters for tags or commits currently. We're about to extend filtering capabilities to add object type filter though, which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's not set, the object wouldn't get filtered at all. Mark unseen commit parents as NOT_USER_GIVEN when processing parents. Like this, explicitly provided parents stay user-given and thus unfiltered, while parents which get loaded as part of the graph walk can be filtered. This commit shouldn't have any user-visible impact yet as there is no logic to filter commits yet. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-02Merge branch 'zh/format-patch-fractional-reroll-count'Junio C Hamano
"git format-patch -v<n>" learned to allow a reroll count that is not an integer. * zh/format-patch-fractional-reroll-count: format-patch: allow a non-integral version numbers
2021-03-23format-patch: allow a non-integral version numbersZheNing Hu
The `-v<n>` option of `format-patch` can give nothing but an integral iteration number to patches in a series.  Some people, however, prefer to mark a new iteration with only a small fixup with a non integral iteration number (e.g. an "oops, that was wrong" fix-up patch for v4 iteration may be labeled as "v4.1"). Allow `format-patch` to take such a non-integral iteration number. `<n>` can be any string, such as '3.1' or '4rev2'. In the case where it is a non-integral value, the "Range-diff" and "Interdiff" headers will not include the previous version. Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23revision: learn '--no-kept-objects'Taylor Blau
A future caller will want to be able to perform a reachability traversal which terminates when visiting an object found in a kept pack. The closest existing option is '--honor-pack-keep', but this isn't quite what we want. Instead of halting the traversal midway through, a full traversal is always performed, and the results are only trimmed afterwords. Besides needing to introduce a new flag (since culling results post-facto can be different than halting the traversal as it's happening), there is an additional wrinkle handling the distinction in-core and on-disk kept packs. That is: what kinds of kept pack should stop the traversal? Introduce '--no-kept-objects[=<on-disk|in-core>]' to specify which kinds of kept packs, if any, should stop a traversal. This can be useful for callers that want to perform a reachability analysis, but want to leave certain packs alone (for e.g., when doing a geometric repack that has some "large" packs which are kept in-core that it wants to leave alone). Note that this option is not guaranteed to produce exactly the set of objects that aren't in kept packs, since it's possible the traversal order may end up in a situation where a non-kept ancestor was "cut off" by a kept object (at which point we would stop traversing). But, we don't care about absolute correctness here, since this will eventually be used as a purely additive guide in an upcoming new repack mode. Explicitly avoid documenting this new flag, since it is only used internally. In theory we could avoid even adding it rev-list, but being able to spell this option out on the command-line makes some special cases easier to test without promising to keep it behaving consistently forever. Those tricky cases are exercised in t6114. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-06Merge branch 'so/log-diff-merge'Junio C Hamano
"git log" learned a new "--diff-merges=<how>" option. * so/log-diff-merge: (32 commits) t4013: add tests for --diff-merges=first-parent doc/git-show: include --diff-merges description doc/rev-list-options: document --first-parent changes merges format doc/diff-generate-patch: mention new --diff-merges option doc/git-log: describe new --diff-merges options diff-merges: add '--diff-merges=1' as synonym for 'first-parent' diff-merges: add old mnemonic counterparts to --diff-merges diff-merges: let new options enable diff without -p diff-merges: do not imply -p for new options diff-merges: implement new values for --diff-merges diff-merges: make -m/-c/--cc explicitly mutually exclusive diff-merges: refactor opt settings into separate functions diff-merges: get rid of now empty diff_merges_init_revs() diff-merges: group diff-merge flags next to each other inside 'rev_info' diff-merges: split 'ignore_merges' field diff-merges: fix -m to properly override -c/--cc t4013: add tests for -m failing to override -c/--cc t4013: support test_expect_failure through ':failure' magic diff-merges: revise revs->diff flag handling diff-merges: handle imply -p on -c/--cc logic for log.c ...
2020-12-21diff-merges: let new options enable diff without -pSergey Organov
New options don't have any visible effect unless -p is either given or implied, as unlike -c/-cc we don't imply -p with --diff-merges. To fix this, this patch adds new functionality by letting new options enable output of diffs for merge commits only. Add 'merges_need_diff' field and set it whenever diff output for merges is enabled by any of the new options. Extend diff output logic accordingly, to output diffs for merges when 'merges_need_diff' is set even when no -p has been provided. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: do not imply -p for new optionsSergey Organov
Add 'combined_imply_patch' field and set it only for old --cc/-c options, then imply -p if this flag is set instead of implying -p whenever 'combined_merge' flag is set. We don't want new --diff-merge options to imply -p, to make it possible to enable output of diffs for merges independently from non-merge commits. At the same time we want to preserve behavior of old --c/-c/-m options and their interactions with --first-parent, to stay backward-compatible. This patch is first step in this direction: it separates old "--cc/-c imply -p" logic from the rest of the options. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: group diff-merge flags next to each other inside 'rev_info'Sergey Organov
The relevant flags were somewhat scattered over definition of 'struct rev_info'. Rearrange them to group them together. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: split 'ignore_merges' fieldSergey Organov
'ignore_merges' was 3-way field that served two distinct purposes that we now assign to 2 new independent flags: 'separate_merges', and 'explicit_diff_merges'. 'separate_merges' tells that we need to output diff format containing separate diff for every parent (as opposed to 'combine_merges'). 'explicit_diff_merges' tells that at least one of diff-merges options has been explicitly specified on the command line, so no defaults should apply. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: introduce revs->first_parent_merges flagSergey Organov
This new field allows us to separate format of diff for merges from 'first_parent_only' flag which primary purpose is limiting history traversal. This change further localizes diff format selection logic into the diff-merges.c file. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21revision: move diff merges functions to its own diff-merges.cSergey Organov
Create separate diff-merges.c and diff-merges.h files, and move all the code related to handling of diff merges there. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21revision: provide implementation for diff merges tweaksSergey Organov
Use these implementations from show_setup_revisions_tweak() and log_setup_revisions_tweak() in builtin/log.c. This completes moving of management of diff merges parameters to a single place, where we can finally observe them simultaneously. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-10format-patch: make output filename configurableJunio C Hamano
For the past 15 years, we've used the hardcoded 64 as the length limit of the filename of the output from the "git format-patch" command. Since the value is shorter than the 80-column terminal, it could grow without line wrapping a bit. At the same time, since the value is longer than half of the 80-column terminal, we could fit two or more of them in "ls" output on such a terminal if we allowed to lower it. Introduce a new command line option --filename-max-length=<n> and a new configuration variable format.filenameMaxLength to override the hardcoded default. While we are at it, remove a check that the name of output directory does not exceed PATH_MAX---this check is pointless in that by the time control reaches the function, the caller would already have done an equivalent of "mkdir -p", so if the system does not like an overly long directory name, the control wouldn't have reached here, and otherwise, we know that the system allowed the output directory to exist. In the worst case, we will get an error when we try to open the output file and handle the error correctly anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-31revision: add separate field for "-m" of "diff-index -m"Sergey Organov
Add separate 'match_missing' field for diff-index to use and set it when we encounter "-m" option. This field won't then be cleared when another meaning of "-m" is reverted (e.g., by "--no-diff-merges"), nor it will be affected by future option(s) that might drive 'ignore_merges' field. Use this new field from diff-lib:do_oneway_diff() instead of reusing 'ignore_merges' field. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18Merge branch 'jk/log-fp-implies-m'Junio C Hamano
"git log --first-parent -p" showed patches only for single-parent commits on the first-parent chain; the "--first-parent" option has been made to imply "-m". Use "--no-diff-merges" to restore the previous behaviour to omit patches for merge commits. * jk/log-fp-implies-m: doc/git-log: clarify handling of merge commit diffs doc/git-log: move "-t" into diff-options list doc/git-log: drop "-r" diff option doc/git-log: move "Diff Formatting" from rev-list-options log: enable "-m" automatically with "--first-parent" revision: add "--no-diff-merges" option to counteract "-m" log: drop "--cc implies -m" logic
2020-07-30Merge branch 'ds/commit-graph-bloom-updates' into masterJunio C Hamano
Updates to the changed-paths bloom filter. * ds/commit-graph-bloom-updates: commit-graph: check all leading directories in changed path Bloom filters revision: empty pathspecs should not use Bloom filters revision.c: fix whitespace commit-graph: check chunk sizes after writing commit-graph: simplify chunk writes into loop commit-graph: unify the signatures of all write_graph_chunk_*() functions commit-graph: persist existence of changed-paths bloom: fix logic in get_bloom_filter() commit-graph: change test to die on parse, not load commit-graph: place bloom_settings in context
2020-07-29revision: add "--no-diff-merges" option to counteract "-m"Jeff King
The "-m" option sets revs->ignore_merges to "0", but there's no way to undo it. This probably isn't something anybody overly cares about, since "1" is already the default, but it will serve as an escape hatch when we flip the default for ignore_merges to "0" in more situations. We'll also add a few extra niceties: - initialize the value to "-1" to indicate "not set", and then resolve it to the normal 0/1 bool in setup_revisions(). This lets any tweak functions, as well as setup_revisions() itself, avoid clobbering the user's preference (which until now they couldn't actually express). - since we now have --no-diff-merges, let's add the matching --diff-merges, which is just a synonym for "-m". Then we don't even need to document --no-diff-merges separately; it countermands the long form of "-m" in the usual way. The new test shows that this behaves just the same as the current behavior without "-m". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01commit-graph: check all leading directories in changed path Bloom filtersSZEDER Gábor
The file 'dir/subdir/file' can only be modified if its leading directories 'dir' and 'dir/subdir' are modified as well. So when checking modified path Bloom filters looking for commits modifying a path with multiple path components, then check not only the full path in the Bloom filters, but all its leading directories as well. Take care to check these paths in "deepest first" order, because it's the full path that is least likely to be modified, and the Bloom filter queries can short circuit sooner. This can significantly reduce the average false positive rate, by about an order of magnitude or three(!), and can further speed up pathspec-limited revision walks. The table below compares the average false positive rate and runtime of git rev-list HEAD -- "$path" before and after this change for 5000+ randomly* selected paths from each repository: Average false Average Average positive rate runtime runtime before after before after difference ------------------------------------------------------------------ git 3.220% 0.7853% 0.0558s 0.0387s -30.6% linux 2.453% 0.0296% 0.1046s 0.0766s -26.8% tensorflow 2.536% 0.6977% 0.0594s 0.0420s -29.2% *Path selection was done with the following pipeline: git ls-tree -r --name-only HEAD | sort -R | head -n 5000 The improvements in runtime are much smaller than the improvements in average false positive rate, as we are clearly reaching diminishing returns here. However, all these timings depend on that accessing tree objects is reasonably fast (warm caches). If we had a partial clone and the tree objects had to be fetched from a promisor remote, e.g.: $ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git $ git -C webkit.git -c core.modifiedPathBloomFilters=1 \ commit-graph write --reachable $ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/ $ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \ rev-list HEAD -- "$path" then checking all leading path component can reduce the runtime from over an hour to a few seconds (and this is with the clone and the promisor on the same machine). This adjusts the tracing values in t4216-log-bloom.sh, which provides a concrete way to notice the improvement. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24revision: reallocate TOPO_WALK object flagsRené Scharfe
The bit fields in struct object have an unfortunate layout. Here's what pahole reports on x86_64 GNU/Linux: struct object { unsigned int parsed:1; /* 0: 0 4 */ unsigned int type:3; /* 0: 1 4 */ /* XXX 28 bits hole, try to pack */ /* Force alignment to the next boundary: */ unsigned int :0; unsigned int flags:29; /* 4: 0 4 */ /* XXX 3 bits hole, try to pack */ struct object_id oid; /* 8 32 */ /* size: 40, cachelines: 1, members: 4 */ /* sum members: 32 */ /* sum bitfield members: 33 bits, bit holes: 2, sum bit holes: 31 bits */ /* last cacheline: 40 bytes */ }; Notice the 1+3+29=33 bits in bit fields and 28+3=31 bits in holes. There are holes inside the flags bit field as well -- while some object flags are used for more than one purpose, 22, 23 and 24 are still free. Use 23 and 24 instead of 27 and 28 for TOPO_WALK_EXPLORED and TOPO_WALK_INDEGREE. This allows us to reduce FLAG_BITS by one so that all bitfields combined fit into a single 32-bit slot: struct object { unsigned int parsed:1; /* 0: 0 4 */ unsigned int type:3; /* 0: 1 4 */ unsigned int flags:28; /* 0: 4 4 */ struct object_id oid; /* 4 32 */ /* size: 36, cachelines: 1, members: 4 */ /* last cacheline: 36 bytes */ }; With this tight packing the size of struct object is reduced by 10%. Other architectures probably benefit as well. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>