summaryrefslogtreecommitdiff
path: root/ref-filter.c
AgeCommit message (Collapse)Author
14 hoursref-filter: properly distinuish pseudo and root refsPatrick Steinhardt
The ref-filter interfaces currently define root refs as either a detached HEAD or a pseudo ref. Pseudo refs aren't root refs though, so let's properly distinguish those ref types. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
14 hoursrefs: rename `is_pseudoref()` to `is_root_ref()`Patrick Steinhardt
Rename `is_pseudoref()` to `is_root_ref()` to adapt to the newly defined terminology in our gitglossary(7). Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-05date: make DATE_MODE thread-safeRené Scharfe
date_mode_from_type() modifies a static variable and returns a pointer to it. This is not thread-safe. Most callers of date_mode_from_type() use it via the macro DATE_MODE and pass its result on to functions like show_date(), which take a const pointer and don't modify the struct. Avoid the static storage by putting the variable on the stack and returning the whole struct date_mode. Change functions that take a constant pointer to expect the whole struct instead. Reduce the cost of passing struct date_mode around on 64-bit systems by reordering its members to close the hole between the 32-bit wide .type and the 64-bit aligned .strftime_fmt as well as the alignment hole at the end. sizeof reports 24 before and 16 with this change on x64. Keep .type at the top to still allow initialization without designator -- though that's only done in a single location, in builtin/blame.c. Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-14Merge branch 'la/trailer-api'Junio C Hamano
Trailer API updates. Acked-by: Christian Couder <christian.couder@gmail.com> cf. <CAP8UFD1Zd+9q0z1JmfOf60S2vn5-sD3SafDvAJUzRFwHJKcb8A@mail.gmail.com> * la/trailer-api: format_trailers_from_commit(): indirectly call trailer_info_get() format_trailer_info(): move "fast path" to caller format_trailers(): use strbuf instead of FILE trailer_info_get(): reorder parameters trailer: move interpret_trailers() to interpret-trailers.c trailer: reorder format_trailers_from_commit() parameters trailer: rename functions to use 'trailer' shortlog: add test for de-duplicating folded trailers trailer: free trailer_info _after_ all related usage
2024-03-05Merge branch 'kn/for-all-refs'Junio C Hamano
"git for-each-ref" learned "--include-root-refs" option to show even the stuff outside the 'refs/' hierarchy. * kn/for-all-refs: for-each-ref: add new option to include root refs ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR' refs: introduce `refs_for_each_include_root_refs()` refs: extract out `loose_fill_ref_dir_regular_file()` refs: introduce `is_pseudoref()` and `is_headref()`
2024-03-01trailer: reorder format_trailers_from_commit() parametersLinus Arver
Currently there are two functions for formatting trailers in <trailer.h>: void format_trailers(const struct process_trailer_options *, struct list_head *trailers, FILE *outfile); void format_trailers_from_commit(struct strbuf *out, const char *msg, const struct process_trailer_options *opts); and although they are similar enough (even taking the same process_trailer_options struct pointer) they are used quite differently. One might intuitively think that format_trailers_from_commit() builds on top of format_trailers(), but this is not the case. Instead format_trailers_from_commit() calls format_trailer_info() and format_trailers() is never called in that codepath. This is a preparatory refactor to help us deprecate format_trailers() in favor of format_trailer_info() (at which point we can rename the latter to the former). When the deprecation is complete, both format_trailers_from_commit(), and the interpret-trailers builtin will be able to call into the same helper function (instead of format_trailers() and format_trailer_info(), respectively). Unifying the formatters is desirable because it simplifies the API. Reorder parameters for format_trailers_from_commit() to prefer const struct process_trailer_options *opts as the first parameter, because these options are intimately tied to formatting trailers. And take struct strbuf *out last, because it's an "out parameter" (something that the caller wants to use as the output of this function). Similarly, reorder parameters for format_trailer_info(), because later on we will unify the two together. Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23for-each-ref: add new option to include root refsKarthik Nayak
The git-for-each-ref(1) command doesn't provide a way to print root refs i.e pseudorefs and HEAD with the regular "refs/" prefixed refs. This commit adds a new option "--include-root-refs" to git-for-each-ref(1). When used this would also print pseudorefs and HEAD for the current worktree. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR'Karthik Nayak
The flag 'FILTER_REFS_ALL' is a bit ambiguous, where ALL doesn't specify if it means to contain refs from all worktrees or whether all types of refs (regular, HEAD & pseudorefs) or all of the above. Since here it is actually referring to all refs with the "refs/" prefix, let's rename it to 'FILTER_REFS_REGULAR' to indicate that this is specifically for regular refs. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-08ref-filter.c: sort formatted dates by byte valueVictoria Dye
Update the ref sorting functions of 'ref-filter.c' so that when date fields are specified with a format string (such as in 'git for-each-ref --sort=creatordate:<something>'), they are sorted by their formatted string value rather than by the underlying numeric timestamp. Currently, date fields are always sorted by timestamp, regardless of whether formatting information is included in the '--sort' key. Leaving the default (unformatted) date sorting unchanged, sorting by the formatted date string adds some flexibility to 'for-each-ref' by allowing for behavior like "sort by year, then by refname within each year" or "sort by time of day". Because the inclusion of a format string previously had no effect on sort behavior, this change likely will not affect existing usage of 'for-each-ref' or other ref listing commands. Additionally, update documentation & tests to document the new sorting mechanism. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-02Merge branch 'rj/status-bisect-while-rebase'Junio C Hamano
"git status" is taught to show both the branch being bisected and being rebased when both are in effect at the same time. * rj/status-bisect-while-rebase: status: fix branch shown when not only bisecting
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-10Merge branch 'vd/for-each-ref-unsorted-optimization'Junio C Hamano
"git for-each-ref --no-sort" still sorted the refs alphabetically which paid non-trivial cost. It has been redefined to show output in an unspecified order, to allow certain optimizations to take advantage of. * vd/for-each-ref-unsorted-optimization: t/perf: add perf tests for for-each-ref ref-filter.c: use peeled tag for '*' format fields for-each-ref: clean up documentation of --format ref-filter.c: filter & format refs in the same callback ref-filter.c: refactor to create common helper functions ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' ref-filter.h: add functions for filter/format & format-only ref-filter.h: move contains caches into filter ref-filter.h: add max_count and omit_empty to ref_format ref-filter.c: really don't sort when using --no-sort
2023-11-16ref-filter.c: use peeled tag for '*' format fieldsVictoria Dye
In most builtins ('rev-parse <revision>^{}', 'show-ref --dereference'), "dereferencing" a tag refers to a recursive peel of the tag object. Unlike these cases, the dereferencing prefix ('*') in 'for-each-ref' format specifiers triggers only a single, non-recursive dereference of a given tag object. For most annotated tags, a single dereference is all that is needed to access the tag's associated commit or tree; "recursive" and "non-recursive" dereferencing are functionally equivalent in these cases. However, nested tags (annotated tags whose target is another annotated tag) dereferenced once return another tag, where a recursive dereference would return the commit or tree. Currently, if a user wants to filter & format refs and include information about a recursively-dereferenced tag, they can do so with something like 'cat-file --batch-check': git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> | git cat-file --batch-check="%(objectname) %(rest)" But the combination of commands is inefficient. So, to improve the performance of this use case and align the defererencing behavior of 'for-each-ref' with that of other commands, update the ref formatting code to use the peeled tag (from 'peel_iterated_oid()') to populate '*' fields rather than the tag's immediate target object (from 'get_tagged_oid()'). Additionally, add a test to 't6300-for-each-ref' to verify new nested tag behavior and update 't6302-for-each-ref-filter.sh' to print the correct value for nested dereferenced fields. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16ref-filter.c: filter & format refs in the same callbackVictoria Dye
Update 'filter_and_format_refs()' to try to perform ref filtering & formatting in a single ref iteration, without an intermediate 'struct ref_array'. This can only be done if no operations need to be performed on a pre-filtered array; specifically, if the refs are - filtered on reachability, - sorted, or - formatted with ahead-behind information they cannot be filtered & formatted in the same iteration. In that case, fall back on the current filter-then-sort-then-format flow. This optimization substantially improves memory usage due to no longer storing a ref array in memory. In some cases, it also dramatically reduces runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads all refs into a 'struct ref_array' to printing only the first ref). Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16ref-filter.c: refactor to create common helper functionsVictoria Dye
Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and 'filter_refs()' into new helper functions: * Extract the code to grow a 'struct ref_array' and append a given 'struct ref_array_item *' to it from 'ref_array_push()' into 'ref_array_append()'. * Extract the code to filter a given ref by refname & object ID then create a new 'struct ref_array_item *' from 'filter_one()' into 'apply_ref_filter()'. * Extract the code for filter pre-processing, contains cache creation, and ref iteration from 'filter_refs()' into 'do_filter_refs()'. In later patches, these helpers will be used by new ref-filter API functions. This patch does not result in any user-facing behavior changes or changes to callers outside of 'ref-filter.c'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'Victoria Dye
Rename 'ref_filter_handler()' to 'filter_one()' to more clearly distinguish it from other ref filtering callbacks that will be added in later patches. The "*_one()" naming convention is common throughout the codebase for iteration callbacks. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16ref-filter.h: add functions for filter/format & format-onlyVictoria Dye
Add two new public methods to 'ref-filter.h': * 'print_formatted_ref_array()' which, given a format specification & array of ref items, formats and prints the items to stdout. * 'filter_and_format_refs()' which combines 'filter_refs()', 'ref_array_sort()', and 'print_formatted_ref_array()' into a single function. This consolidates much of the code used to filter and format refs in 'builtin/for-each-ref.c', 'builtin/tag.c', and 'builtin/branch.c', reducing duplication and simplifying the future changes needed to optimize the filter & format process. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16ref-filter.h: move contains caches into filterVictoria Dye
Move the 'contains_cache' and 'no_contains_cache' used in filter_refs into an 'internal' struct of the 'struct ref_filter'. In later patches, the 'struct ref_filter *' will be a common data structure across multiple filtering functions. These caches are part of the common functionality the filter struct will support, so they are updated to be internally accessible wherever the filter is used. The design used here mirrors what was introduced in 576de3d956 (unpack_trees: start splitting internal fields from public API, 2023-02-27) for 'unpack_trees_options'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16ref-filter.c: really don't sort when using --no-sortVictoria Dye
When '--no-sort' is passed to 'for-each-ref', 'tag', and 'branch', the printed refs are still sorted by ascending refname. Change the handling of sort options in these commands so that '--no-sort' to truly disables sorting. '--no-sort' does not disable sorting in these commands is because their option parsing does not distinguish between "the absence of '--sort'" (and/or values for tag.sort & branch.sort) and '--no-sort'. Both result in an empty 'sorting_options' string list, which is parsed by 'ref_sorting_options()' to create the 'struct ref_sorting *' for the command. If the string list is empty, 'ref_sorting_options()' interprets that as "the absence of '--sort'" and returns the default ref sorting structure (equivalent to "refname" sort). To handle '--no-sort' properly while preserving the "refname" sort in the "absence of --sort'" case, first explicitly add "refname" to the string list *before* parsing options. This alone doesn't actually change any behavior, since 'compare_refs()' already falls back on comparing refnames if two refs are equal w.r.t all other sort keys. Now that the string list is populated by default, '--no-sort' is the only way to empty the 'sorting_options' string list. Update 'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if the string list is empty, and add a condition to 'ref_array_sort()' to skip the sort altogether if the sort structure is NULL. Note that other functions using 'struct ref_sorting *' do not need any changes because they already ignore NULL values. Finally, remove the condition around sorting in 'ls-remote', since it's no longer necessary. Unlike 'for-each-ref' et. al., it does *not* do any sorting by default. This default is preserved by simply leaving its sort key string list empty before parsing options; if no additional sort keys are set, 'struct ref_sorting *' is NULL and sorting is skipped. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-16status: fix branch shown when not only bisectingRubén Justo
In 83c750acde (wt-status.*: better advice for git status added, 2012-06-05), git-status received new informative messages to describe the ongoing work in a worktree. These messages were enhanced in 0722c805d6 (status: show the branch name if possible in in-progress info, 2013-02-03), to show, if possible, the branch where the operation was initiated. Since then, we show incorrect information when several operations are in progress and one of them is bisect: $ git checkout -b foo $ GIT_SEQUENCE_EDITOR='echo break >' git rebase -i HEAD~ $ git checkout -b bar $ git bisect start $ git status ... You are currently editing a commit while rebasing branch 'bar' on '...'. You are currently bisecting, started from branch 'bar'. ... Note that we erroneously say "while rebasing branch 'bar'" when we should be referring to "foo". This must have gone unnoticed for so long because it must be unusual to start a bisection while another operation is in progress. And even less usual to involve different branches. It caught my attention reviewing a leak introduced in 8b87cfd000 (wt-status: move strbuf into read_and_strip_branch(), 2013-03-16). A simple change to deal with this situation can be to record in struct wt_status_state, the branch where the bisect starts separately from the branch related to other operations. Let's do it and so we'll be able to display correct information and we'll avoid the leak as well. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-25ref-filter: add mailmap supportKousik Sanagavarapu
Add mailmap support to ref-filter formats which are similar in pretty. This support is such that the following pretty placeholders are equivalent to the new ref-filter atoms: %aN = authorname:mailmap %cN = committername:mailmap %aE = authoremail:mailmap %aL = authoremail:mailmap,localpart %cE = committeremail:mailmap %cL = committeremail:mailmap,localpart Additionally, mailmap can also be used with ":trim" option for email by doing something like "authoremail:mailmap,trim". The above also applies for the "tagger" atom, that is, "taggername:mailmap", "taggeremail:mailmap", "taggeremail:mailmap,trim" and "taggername:mailmap,localpart". The functionality of ":trim" and ":localpart" remains the same. That is, ":trim" gives the email, but without the angle brackets and ":localpart" gives the part of the email before the '@' character (if such a character is not found then we directly grab everything between the angle brackets). Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-14Merge branch 'ks/ref-filter-sort-numerically'Junio C Hamano
"git for-each-ref --sort='contents:size'" sorts the refs according to size numerically, giving a ref that points at a blob twelve-byte (12) long before showing a blob hundred-byte (100) long. * ks/ref-filter-sort-numerically: ref-filter: sort numerically when ":size" is used
2023-09-05ref-filter: sort numerically when ":size" is usedKousik Sanagavarapu
Atoms like "raw" and "contents" have a ":size" option which can be used to know the size of the data. Since these atoms have the cmp_type FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to '9'. Meaning, even when the ":size" option is used and what we ultimatlely have is numbers, we still sort alphabetically. For example, consider the the following case in a repo refname contents:size raw:size ======= ============= ======== refs/heads/branch1 1130 1210 refs/heads/master 300 410 refs/tags/v1.0 140 260 Sorting with "--format="%(refname) %(contents:size) --sort=contents:size" would give refs/heads/branch1 1130 refs/tags/v1.0.0 140 refs/heads/master 300 which is an alphabetic sort, while what one might really expect is refs/tags/v1.0.0 140 refs/heads/master 300 refs/heads/branch1 1130 which is a numeric sort (that is, a "$ sort -n file" as opposed to a "$ sort file", where "file" contains only the "contents:size" or "raw:size" info, each of which is on a newline). Same is the case with "--sort=raw:size". So, sort numerically whenever the sort is done with "contents:size" or "raw:size" and do it the normal alphabetic way when "contents" or "raw" are used with some other option (they are FIELD_STR anyways). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-30ref-filter: mark unused parameters in parser callbacksJeff King
These are similar to the cases annotated in 5fe9e1ce2f (ref-filter: mark unused callback parameters, 2023-02-24), but were added after that commit. Note that the ahead/behind callback ignores its "atom" parameter, which is a little unusual, since that struct usually stores the result. But in this case, the data is stored centrally in ref_array->counts, since we want to compute all ahead/behinds at once, not per ref. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-02Merge branch 'ks/ref-filter-describe'Junio C Hamano
"git branch --list --format=<format>" and friends are taught a new "%(describe)" placeholder. * ks/ref-filter-describe: ref-filter: add new "describe" atom ref-filter: add multiple-option parsing functions
2023-07-25Merge branch 'jk/nested-points-at'Junio C Hamano
"git tag --list --points-at X" showed tags that directly refers to object X, but did not list a tag that points at such a tag, which has been corrected. * jk/nested-points-at: ref-filter: simplify return type of match_points_at ref-filter: avoid parsing non-tags in match_points_at() ref-filter: avoid parsing tagged objects in match_points_at() ref-filter: handle nested tags in --points-at option
2023-07-24ref-filter: add new "describe" atomKousik Sanagavarapu
Duplicate the logic of %(describe) and friends from pretty to ref-filter. In the future, this change helps in unifying both the formats as ref-filter will be able to do everything that pretty is doing and we can have a single interface. The new atom "describe" and its friends are equivalent to the existing pretty formats with the same name. Helped-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-24ref-filter: add multiple-option parsing functionsKousik Sanagavarapu
The functions match_placeholder_arg_value() match_placeholder_bool_arg() were added in pretty 4f732e0fd7 (pretty: allow %(trailers) options with explicit value, 2019-01-29) to parse multiple options in an argument to --pretty. For example, git log --pretty="%(trailers:key=Signed-Off-By,separator=%x2C )" will output all the trailers matching the key and seperates them by a comma followed by a space per commit. Add similar functions, match_atom_arg_value() match_atom_bool_arg() in ref-filter. There is no atom yet that can use these functions in ref-filter, but we are going to add a new %(describe) atom in a subsequent commit where we parse options like tags=<bool-value> or match=<pattern> given to it. Helped-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-21Merge branch 'tb/refs-exclusion-and-packed-refs'Junio C Hamano
Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`
2023-07-17ref-filter: simplify return type of match_points_atJeff King
We return the oid that matched, but the sole caller only cares whether we matched anything at all. This is mostly academic, since there's only one caller, but the lifetime of the returned pointer is not immediately clear. Sometimes it points to an oid in a tag struct, which should live forever. And sometimes to the oid passed in, which only lives as long as the each_ref_fn callback we're called from. Simplify this to a boolean return which is more direct and obvious. As a bonus, this lets us avoid the weird pattern of overwriting our "oid" parameter in the loop (since we now only refer to the tagged oid one time, and can just inline the call to get it). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17ref-filter: avoid parsing non-tags in match_points_at()Jeff King
When handling --points-at, we have to try to peel each ref to see if it's a tag that points at a requested oid. We start this process by calling parse_object() on the oid pointed to by each ref. The cost of parsing each object adds up, especially in an output that doesn't otherwise need to open the objects at all. Ideally we'd use peel_iterated_oid() here, which uses the cached information in the packed-refs file. But we can't, because our --points-at must match not only the fully peeled value, but any interim values (so if tag A points to tag B which points to commit C, we should match --points-at=B, but peel_iterated_oid() will only tell us about C). So the best we can do (absent changes to the packed-refs peel traits) is to avoid parsing non-tags. The obvious way to do that is to call oid_object_info() to check the type before parsing. But there are a few gotchas there, like checking if the object has already been parsed. Instead we can just tell parse_object() that we are OK skipping the hash check, which lets it turn on several optimizations. Commits can be loaded via the commit graph (so it's both fast and we have the benefit of the parsed data if we need it later at the output stage). Blobs are not loaded at all. Trees are still loaded, but it's rather rare to have a ref point directly to a tree (and since this is just an optimization, kicking in 99% of the time is OK). Even though we're paying for an extra lookup, the cost to avoid parsing the non-tags is a net benefit. In my git.git repository with 941 tags and 1440 other refs pointing to commits, this significantly cuts the runtime: Benchmark 1: ./git.old for-each-ref --points-at=HEAD Time (mean ± σ): 26.8 ms ± 0.5 ms [User: 24.5 ms, System: 2.2 ms] Range (min … max): 25.9 ms … 29.2 ms 107 runs Benchmark 2: ./git.new for-each-ref --points-at=HEAD Time (mean ± σ): 9.1 ms ± 0.3 ms [User: 6.8 ms, System: 2.2 ms] Range (min … max): 8.6 ms … 10.2 ms 308 runs Summary './git.new for-each-ref --points-at=HEAD' ran 2.96 ± 0.10 times faster than './git.old for-each-ref --points-at=HEAD' In a repository that is mostly annotated tags, we'd expect less improvement (we might still skip a few object loads, but that's balanced by the extra lookups). In my clone of linux.git, which has 782 tags and 3 branches, the run-time is about the same (it's actually ~1% faster on average after this patch, but that's within the run-to-run noise). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17ref-filter: avoid parsing tagged objects in match_points_at()Jeff King
When we peel tags to check if they match a --points-at oid, we recursively parse the tagged object to see if it is also a tag. But since the tag itself tells us the type of the object it points to (and even gives us the appropriate object struct via its "tagged" member), we can use that directly. We do still have to make sure to call parse_tag() before looking at each tag. This is redundant for the outermost tag (since we did call parse_object() to find its type), but that's OK; parse_tag() is smart enough to make this a noop when the tag has already been parsed. In my clone of linux.git, with 782 tags (and only 3 non-tags), this yields a significant speedup (bringing us back where we were before the commit before this one started recursively dereferencing tags): Benchmark 1: ./git.old for-each-ref --points-at=HEAD --format="%(refname)" Time (mean ± σ): 20.3 ms ± 0.5 ms [User: 11.1 ms, System: 9.1 ms] Range (min … max): 19.6 ms … 21.5 ms 141 runs Benchmark 2: ./git.new for-each-ref --points-at=HEAD --format="%(refname)" Time (mean ± σ): 11.4 ms ± 0.2 ms [User: 6.3 ms, System: 5.0 ms] Range (min … max): 11.0 ms … 12.2 ms 250 runs Summary './git.new for-each-ref --points-at=HEAD --format="%(refname)"' ran 1.79 ± 0.05 times faster than './git.old for-each-ref --points-at=HEAD --format="%(refname)"' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17ref-filter: handle nested tags in --points-at optionJan Klötzke
Tags are dereferenced until reaching a different object type to handle nested tags, e.g. on checkout. In contrast, "git tag --points-at=..." fails to list such nested tags because only one level of indirection is obtained in filter_refs(). Implement the recursive dereferencing for the "--points-at" option when filtering refs to unify the behaviour. Signed-off-by: Jan Klötzke <jan@kloetzke.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-14Merge branch 'ks/ref-filter-signature'Junio C Hamano
The "git for-each-ref" family of commands learned placeholders related to GPG signature verification. * ks/ref-filter-signature: ref-filter: add new "signature" atom t/lib-gpg: introduce new prereq GPG2
2023-07-10refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)Taylor Blau
When iterating through the `packed-refs` file in order to answer a query like: $ git for-each-ref --exclude=refs/__hidden__ it would be useful to avoid walking over all of the entries in `refs/__hidden__/*` when possible, since we know that the ref-filter code is going to throw them away anyways. In certain circumstances, doing so is possible. The algorithm for doing so is as follows: - For each excluded pattern, find the first record that matches it, and the first record that *doesn't* match it (i.e. the location you'd next want to consider when excluding that pattern). - Sort the set of excluded regions from the previous step in ascending order of the first location within the `packed-refs` file that matches. - Clean up the results from the previous step: discard empty regions, and combine adjacent regions. The set of regions which remains is referred to as the "jump list", and never contains any references which should be included in the result set. Then when iterating through the `packed-refs` file, if `iter->pos` is ever contained in one of the regions from the previous steps, advance `iter->pos` past the end of that region, and continue enumeration. Note that we only perform this optimization when none of the excluded pattern(s) have special meta-characters in them. For a pattern like "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and everything underneath them) are not connected. A future implementation that handles this case may split the character class (pretending as if two patterns were excluded: "refs/fooa", and "refs/fooc"). There are a few other gotchas worth considering. First, note that the jump list is sorted, so once we jump past a region, we can avoid considering it (or any regions preceding it) again. The member `jump_pos` is used to track the first next-possible region to jump through. Second, note that the jump list is best-effort, since we do not handle loose references, and because of the meta-character issue above. The jump list may not skip past all references which won't appear in the results, but will never skip over a reference which does appear in the result set. In repositories with a large number of hidden references, the speed-up can be significant. Tests here are done with a copy of linux.git with a reference "refs/pull/N" pointing at every commit, as in: $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' | git update-ref --stdin $ git pack-refs --all , it is significantly faster to have `for-each-ref` jump over the excluded references, as opposed to filtering them out after the fact: $ hyperfine \ 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \ 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/" Time (mean ± σ): 798.1 ms ± 3.3 ms [User: 687.6 ms, System: 146.4 ms] Range (min … max): 794.5 ms … 805.5 ms 10 runs Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 98.9 ms ± 1.4 ms [User: 93.1 ms, System: 5.7 ms] Range (min … max): 97.0 ms … 104.0 ms 29 runs Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 0.7 ms, System: 3.8 ms] Range (min … max): 4.1 ms … 5.8 ms 524 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran 21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' 176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' (Comparing stock git and this patch isn't quite fair, since an earlier commit in this series adds a naive implementation of the `--exclude` option. `git.prev` is built from the previous commit and includes this naive implementation). Using the jump list is fairly straightforward (see the changes to `refs/packed-backend.c::next_record()`), but constructing the list is not. To ensure that the construction is correct, add a new suite of tests in t1419 covering various corner cases (overlapping regions, partially overlapping regions, adjacent regions, etc.). Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs: plumb `exclude_patterns` argument throughoutTaylor Blau
The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10builtin/for-each-ref.c: add `--exclude` optionTaylor Blau
When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/*` references, the caller may want to emit all references *except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10ref-filter.c: parameterize match functions over patternsJeff King
`match_pattern()` and `match_name_as_path()` both take a `struct ref_filter *`, and then store a stack variable `patterns` pointing at `filter->patterns`. The subsequent patch will add a new array of patterns to match over (the excluded patterns, via a new `git for-each-ref --exclude` option), treating the return value of these functions differently depending on which patterns are being used to match. Tweak `match_pattern()` and `match_name_as_path()` to take an array of patterns to prepare for passing either in. Once we start passing either in, `match_pattern()` will have little to do with a particular `struct ref_filter *` instance. To clarify this, drop it from the argument list, and replace it with the only bit of the `ref_filter` that we care about (`filter->ignore_case`). Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10ref-filter: add `ref_filter_clear()`Jeff King
We did not bother to clean up at all in `git branch` or `git tag`, and `git for-each-ref` only cleans up a couple of members. Add and call `ref_filter_clear()` when cleaning up a `struct ref_filter`. Running this patch (without any test changes) indicates a couple of now leak-free tests. This was found by running: $ make SANITIZE=leak $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate (Note that the `reachable_from` and `unreachable_from` lists should be cleaned as they are used. So this is just covering any case where we might bail before running the reachability check.) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10ref-filter: clear reachable list pointers after freeingJeff King
In `reach_filter()`, we pop all commits from the reachable lists, leaving them empty. But because we're operating on a list pointer that was passed by value, the original `filter.reachable_from` and `filter.unreachable_from` pointers are left dangling. As is the case with the previous commit, nobody touches either of these fields after calling `reach_filter()`, so leaving them dangling is OK. But this future proofs against dangerous situations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-06ref-filter: add new "signature" atomKousik Sanagavarapu
Duplicate the code for outputting the signature and its other parameters for commits and tags in ref-filter from pretty. In the future, this will help in getting rid of the current duplicate implementations of such logic everywhere, when ref-filter can do everything that pretty is doing. The new atom "signature" and its friends are equivalent to the existing pretty formats as follows: %(signature) = %GG %(signature:grade) = %G? %(siganture:signer) = %GS %(signature:key) = %GK %(signature:fingerprint) = %GF %(signature:primarykeyfingerprint) = %GP %(signature:trustlevel) = %GT Co-authored-by: Hariom Verma <hariom18599@gmail.com> Co-authored-by: Jaydeep Das <jaydeepjd.8914@gmail.com> Co-authored-by: Nsengiyumva Wilberforce <nsengiyumvawilberforce@gmail.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-25Merge branch 'en/header-split-cache-h'Junio C Hamano
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-24commit.h: reduce unnecessary includesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24treewide: remove cache.h inclusion due to previous changesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>