summaryrefslogtreecommitdiff
path: root/builtin/name-rev.c
AgeCommit message (Collapse)Author
2024-02-26name-rev: use mem_pool_strfmt()René Scharfe
1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04) got a big performance boost in an unusual repository by calculating the name length in advance. This is a bit awkward, as it references the name components twice. Use a memory pool to store the strings for the struct rev_name member tip_name. Using mem_pool_strfmt() allows efficient allocation without explicit size calculation. This simplifies the formatting part of the code without giving up performance: Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all Time (mean ± σ): 1.231 s ± 0.013 s [User: 1.082 s, System: 0.136 s] Range (min … max): 1.214 s … 1.252 s 10 runs Benchmark 2: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.220 s ± 0.020 s [User: 1.083 s, System: 0.130 s] Range (min … max): 1.197 s … 1.254 s 10 runs Don't bother discarding the memory pool just before exiting. The effort for that would be very low, but actually measurable in the above example, with no benefit to users. At least UNLEAK it to calm down leak checkers. This addresses the leaks that 45a14f578e (Revert "name-rev: release unused name strings", 2022-04-22) brought back. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05name-rev: use OPT_HIDDEN_BOOL for --peel-tagRené Scharfe
adfc1857bd (describe: fix --contains when a tag is given as input, 2013-07-18) added the option --peel-tag, defining it using a positional struct option initializer and a comment indicating that it's intended to be a hidden OPT_BOOL. 4741edd549 (Remove deprecated OPTION_BOOLEAN for parsing arguments, 2013-08-03) added the macro OPT_HIDDEN_BOOL, which allows to express this more succinctly. Use it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21git-compat-util.h: remove unneccessary include of wildmatch.hElijah Newren
The include of wildmatch.h in git-compat-util.h was added in cebcab189aa (Makefile: add USE_WILDMATCH to use wildmatch as fnmatch, 2013-01-01) as a way to be able to compile-time force any calls to fnmatch() to instead invoke wildmatch(). The defines and inline function were removed in 70a8fc999d9 (stop using fnmatch (either native or compat), 2014-02-15), and this include in git-compat-util.h has been unnecessary ever since. Remove the include from git-compat-util.h, but add it to the .c files that had omitted the direct #include they needed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06name-rev: make --stdin hiddenJohn Cai
In 34ae3b70 (name-rev: deprecate --stdin in favor of --annotate-stdin), we renamed --stdin to --annotate-stdin for the sake of a clearer name for the option, and added text that indicates --stdin is deprecated. The next step is to hide --stdin completely. Make the option hidden. Also, update documentation to remove all mentions of --stdin. Signed-off-by: "John Cai" <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11pager.h: move declarations for pager.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "commit.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "commit.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "cache.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-22Merge branch 'ab/various-leak-fixes'Junio C Hamano
Leak fixes. * ab/various-leak-fixes: push: free_refs() the "local_refs" in set_refspecs() push: refactor refspec_append_mapped() for subsequent leak-fix receive-pack: release the linked "struct command *" list grep API: plug memory leaks by freeing "header_list" grep.c: refactor free_grep_patterns() builtin/merge.c: free "&buf" on "Your local changes..." error builtin/merge.c: use fixed strings, not "strbuf", fix leak show-branch: free() allocated "head" before return commit-graph: fix a parse_options_concat() leak http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}() http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main() worktree: fix a trivial leak in prune_worktrees() repack: fix leaks on error with "goto cleanup" name-rev: don't xstrdup() an already dup'd string various: add missing clear_pathspec(), fix leaks clone: use free() instead of UNLEAK() commit-graph: use free_commit_graph() instead of UNLEAK() bundle.c: don't leak the "args" in the "struct child_process" tests: mark tests as passing with SANITIZE=leak
2023-02-09name-rev: fix names by dropping taggerdate workaroundElijah Newren
Commit 7550424804 ("name-rev: include taggerdate in considering the best name", 2016-04-22) introduced the idea of using taggerdate in the criteria for selecting the best name. At the time, a certain commit in linux.git -- namely, aed06b9cfcab -- was being named by name-rev as v4.6-rc1~9^2~792 which, while correct, was very suboptimal. Some investigation found that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give alternate answers such as v3.13-rc7~9^2~14^2~42 or v3.13~5^2~4^2~2^2~1^2~42 A manual solution involving looking at tagger dates came up with v3.13-rc1~65^2^2~42 which is much nicer. That workaround was then implemented in name-rev. Unfortunately, the taggerdate heuristic is causing bugs. I was pointed to a case in a private repository where name-rev reports a name of the form v2022.10.02~86 when users expected to see one of the form v2022.10.01~2 (I've modified the names and numbers a bit from the real testcase.) As you can probably guess, v2022.10.01 was created after v2022.10.02 (by a few hours), even though it pointed to an older commit. While the condition is unusual even in the repository in question, it is not the only problematic set of tags in that repository. The taggerdate logic is causing problems. Further, it turns out that this taggerdate heuristic isn't even helping anymore. Due to the fix to naming logic in 3656f84278 ("name-rev: prefer shorter names over following merges", 2021-12-04), we get improved names without the taggerdate heuristic. For the original commit of interest in linux.git, a modern git without the taggerdate heuristic still provides the same optimal answer of interest, namely: v3.13-rc1~65^2^2~42 So, the taggerdate is no longer providing benefit, and it is causing problems. Simply get rid of it. However, note that "taggerdate" as a variable is used to store things besides a taggerdate these days. Ever since commit ef1e74065c ("name-rev: favor describing with tags and use committer date to tiebreak", 2017-03-29), this has been used to store committer dates and there it is used as a fallback tiebreaker (as opposed to a primary criteria overriding effective distance calculations). We do not want to remove that fallback tiebreaker, so not all instances of "taggerdate" are removed in this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06name-rev: don't xstrdup() an already dup'd stringÆvar Arnfjörð Bjarmason
When "add_to_tip_table()" is called with a non-zero "shorten_unambiguous" we always return an xstrdup()'d string, which we'd then xstrdup() again, leaking memory. See [1] and [2] for how this leak came about. We could xstrdup() only if "shorten_unambiguous" wasn't true, but let's instead inline this code, so that information on whether we need to xstrdup() is contained within add_to_tip_table(). 1. 98c5c4ad015 (name-rev: allow to specify a subpath for --refs option, 2013-06-18) 2. b23e0b9353e (name-rev: allow converting the exact object name at the tip of a ref, 2013-07-07) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-20name-rev: prefix annotate-stdin with '--' in messageAlexander Shopov
This is an option rather than command. Make the message convey this similar to the other messages in the file. Signed-off-by: Alexander Shopov <ash@kambanaria.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-28Merge branch 'rs/name-rev-fix-free-after-use'Junio C Hamano
Regression fix for 2.36 where "git name-rev" started to sometimes reference strings after they are freed. * rs/name-rev-fix-free-after-use: Revert "name-rev: release unused name strings"
2022-04-23Revert "name-rev: release unused name strings"René Scharfe
This reverts commit 2d53975488df195e1431c3f90bfb5b60018d5bf6. 3656f84278 (name-rev: prefer shorter names over following merges, 2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused name strings, 2020-02-04) that a better name for a child is a better name for all of its ancestors as well, because it added a penalty for generation > 0. This leads to strings being free(3)'d that are still needed. 079f970971 (name-rev: sort tip names before applying, 2020-02-05) already reduced the number of free(3) calls for the use case that motivated the original patch (name-rev --all in the Chromium repository) from ca. 44000 to 5, and 3656f84278 eliminated even those few. So this revert won't affect name-rev's performance on that particular repo. Reported-by: Thomas Hurst <tom@hur.st> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-13name-rev: use generation numbers if availableJacob Keller
If a commit in a sequence of linear history has a non-monotonically increasing commit timestamp, git name-rev might not properly name the commit. This occurs because name-rev uses a heuristic of the commit date to avoid searching down tags which lead to commits that are older than the named commit. This is intended to avoid work on larger repositories. This heuristic impacts git name-rev, and by extension git describe --contains which is built on top of name-rev. Further more, if --all or --annotate-stdin is used, the heuristic is not enabled because the full history has to be analyzed anyways. This results in some confusion if a user sees that --annotate-stdin works but a normal name-rev does not. If the repository has a commit graph, we can use the generation numbers instead of using the commit dates. This is essentially the same check except that generation numbers make it exact, where the commit date heuristic could be incorrect due to clock errors. Since we're extending the notion of cutoff to more than one variable, create a series of functions for setting and checking the cutoff. This avoids duplication and moves access of the global cutoff and generation_cutoff to as few functions as possible. Add several test cases including a test that covers the new commitGraph behavior, as well as tests for --all and --annotate-stdin with and without commitGraphs. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16name-rev: replace --stdin with --annotate-stdin in synopsisJohn Cai
34ae3b70 (name-rev: deprecate --stdin in favor of --annotate-stdin, 2022-01-05) added --annotate-stdin to replace --stdin as a clearer flag name. Since --stdin is to be deprecated, we should replace --stdin in the output from "git name-rev -h". Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-10name-rev.c: use strbuf_getline instead of limited size bufferJohn Cai
Using a buffer limited to 2048 is unnecessarily limiting. Switch to using a string buffer to read in stdin for annotation. Signed-off-by: "John Cai" <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-10name-rev: deprecate --stdin in favor of --annotate-stdinJohn Cai
Introduce a --annotate-stdin that is functionally equivalent of --stdin. --stdin does not behave as --stdin in other subcommands, such as pack-objects whereby it takes one argument per line. Since --stdin can be a confusing and misleading name, rename it to --annotate-stdin. This change adds a warning to --stdin warning that it will be removed in the future. Signed-off-by: "John Cai" <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-05name-rev: prefer shorter names over following mergesElijah Newren
name-rev has a MERGE_TRAVERSAL_WEIGHT to say that traversing a second or later parent of a merge should be 65535 times more expensive than a first-parent traversal, as per ac076c29ae8d (name-rev: Fix non-shortest description, 2007-08-27). The point of this weight is to prefer names like v2.32.0~1471^2 over names like v2.32.0~43^2~15^2~11^2~20^2~31^2 which are two equally valid names in git.git for the same commit. Note that the first follows 1472 parent traversals compared to a mere 125 for the second. Weighting all traversals equally would clearly prefer the second name since it has fewer parent traversals, but humans aren't going to be traversing commits and they tend to have an easier time digesting names with fewer segments. The fact that the former only has two segments (~1471, ^2) makes it much simpler than the latter which has six segments (~43, ^2, ~15, etc.). Since name-rev is meant to "find symbolic names suitable for human digestion", we prefer fewer segments. However, the particular rule implemented in name-rev would actually prefer v2.33.0-rc0~11^2~1 over v2.33.0-rc0~20^2 because both have precisely one second parent traversal, and it gives the tie breaker to shortest number of total parent traversals. Fewer segments is more important for human consumption than number of hops, so we'd rather see the latter which has one fewer segment. Include the generation in is_better_name() and use a new effective_distance() calculation so that we prefer fewer segments in the printed name over fewer total parent traversals performed to get the answer. == Side-note on tie-breakers == When there are the same number of segments for two different names, we actually use the name of an ancestor commit as a tie-breaker as well. For example, for the commit cbdca289fb in the git.git repository, we prefer the name v2.33.0-rc0~112^2~1 over v2.33.0-rc0~57^2~5. This is because: * cbdca289fb is the parent of 25e65b6dd5, which implies the name for cbdca289fb should be the first parent of the preferred name for 25e65b6dd5 * 25e65b6dd5 could be named either v2.33.0-rc0~112^2 or v2.33.0-rc0~57^2~4, but the former is preferred over the latter due to fewer segments * combine the two previous facts, and the name we get for cbdca289fb is "v2.33.0-rc0~112^2~1" rather than "v2.33.0-rc0~57^2~5". Technically, we get this for free out of the implementation since we only keep track of one name for each commit as we walk history (and re-add parents to the queue if we find a better name for those parents), but the first bullet point above ensures users get results that feel more consistent. == Alternative Ideas and Meanings Discussed == One suggestion that came up during review was that shortest string-length might be easiest for users to consume. However, such a scheme would be rather computationally expensive (we'd have to track all names for each commit as we traversed the graph) and would additionally come with the possibly perplexing result that on a linear segment of history we could rapidly swap back and forth on names: MYTAG~3^2 would be preferred over MYTAG~9998 MYTAG~3^2~1 would NOT be preferred over MYTAG~9999 MYTAG~3^2~2 might be preferred over MYTAG~10000 Another item that came up was possible auxiliary semantic meanings for name-rev results either before or after this patch. The basic answer was that the previous implementation had no known useful auxiliary semantics, but that for many repositories (most in my experience), the new scheme does. In particular, the new name-rev output can often be used to answer the question, "How or when did this commit get merged?" Since that usefulness depends on how merges happen within the repository and thus isn't universally applicable, details are omitted here but you can see them at [1]. [1] https://lore.kernel.org/git/CABPp-BEeUM+3NLKDVdak90_UUeNghYCx=Dgir6=8ixvYmvyq3Q@mail.gmail.com/ Finally, it was noted that the algorithm could be improved by just explicitly tracking the number of segments and using both it and distance in the comparison, instead of giving a magic number that tries to blend the two (and which therefore might give suboptimal results in repositories with really huge numbers of commits that periodically merge older code). However, "[this patch] seems to give us a much better results than the current code, so let's take it and leave further futzing outside the scope." Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-28oid_pos(): access table through const pointersJeff King
When we are looking up an oid in an array, we obviously don't need to write to the array. Let's mark it as const in the function interfaces, as well as in the local variables we use to derference the void pointer (note a few cases use pointers-to-pointers, so we mark everything const). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-28hash_pos(): convert to oid_pos()Jeff King
All of our callers are actually looking up an object_id, not a bare hash. Likewise, the arrays they are looking in are actual arrays of object_id (not just raw bytes of hashes, as we might find in a pack .idx; those are handled by bsearch_hash()). Using an object_id gives us more type safety, and makes the callers slightly shorter. It also gets rid of the word "sha1" from several access functions, though we could obviously also rename those with s/sha1/hash/. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04hash-lookup: rename from sha1-lookupMartin Ågren
Change all remnants of "sha1" in hash-lookup.c and .h and rename them to reflect that we're not just able to handle SHA-1 these days. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04sha1-lookup: rename `sha1_pos()` as `hash_pos()`Martin Ågren
Rename this function to reflect that we're not just able to handle SHA-1 these days. There are a few instances of "sha1" left in sha1-lookup.[ch] after this, but those will be addressed in the next commit. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14messages: avoid SHA-1 in end-user facing messagesJunio C Hamano
There are still a handful mentions of SHA-1 when we meant the (hexadecimal) object names in end-user facing messages. Rewrite them. I was hoping that this can mostly be s/SHA-1/object name/, but a few messages needed rephrasing to keep the result readable. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: sort tip names before applyingRené Scharfe
name_ref() is called for each ref and checks if its a better name for the referenced commit. If that's the case it remembers it and checks if a name based on it is better for its ancestors as well. This in done in the the order for_each_ref() imposes on us. That might not be optimal. If bad names happen to be encountered first (as defined by is_better_name()), names derived from them may spread to a lot of commits, only to be replaced by better names later. Setting better names first can avoid that. is_better_name() prefers tags, short distances and old references. The distance is a measure that we need to calculate for each candidate commit, but the other two properties are not dependent on the relationships of commits. Sorting the refs by them should yield better performance than the essentially random order we currently use. And applying older references first should also help to reduce rework due to the fact that older commits have less ancestors than newer ones. So add all details of names to the tip table first, then sort them to prefer tags and older references and then apply them in this order. Here's the performance as measures by hyperfine for the Linux repo before: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 851.1 ms ± 4.5 ms [User: 806.7 ms, System: 44.4 ms] Range (min … max): 845.9 ms … 859.5 ms 10 runs ... and with this patch: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 736.2 ms ± 8.7 ms [User: 688.4 ms, System: 47.5 ms] Range (min … max): 726.0 ms … 755.2 ms 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: release unused name stringsRené Scharfe
name_rev() assigns a name to a commit and its parents and grandparents and so on. Commits share their name string with their first parent, which in turn does the same, recursively to the root. That saves a lot of allocations. When a better name is found, the old name is replaced, but its memory is not released. That leakage can become significant. Can we release these old strings exactly once even though they are referenced multiple times? Yes, indeed -- we can make use of the fact that name_rev() visits the ancestors of a commit after it set a new name for it and tries to update their names as well. Members of the first ancestral line have the same taggerdate and from_tag values, but a higher distance value than their child commit at generation 0. These are the only criteria used by is_better_name(). Lower distance values are considered better, so a name that is better for a child will also be better for its parent and grandparent etc. That means we can free(3) an inferior name at generation 0 and rely on name_rev() to replace all references in ancestors as well. If we do that then we need to stop using the string pointer alone to distinguish new empty rev_name slots from initialized ones, though, as it technically becomes invalid after the free(3) call -- even though its value is still different from NULL. We can check the generation value first, as empty slots will have it initialized to 0, and for the actual generation 0 we'll set a new valid name right after the create_or_update_name() call that releases the string. For the Chromium repo, releasing superceded names reduces the memory footprint of name-rev --all significantly. Here's the output of GNU time before: 0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k 0inputs+0outputs (0major+571470minor)pagefaults 0swaps ... and with this patch: 1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k 0inputs+0outputs (0major+314370minor)pagefaults 0swaps It also gets faster; hyperfine before: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s] Range (min … max): 1.522 s … 1.542 s 10 runs ... and with this patch: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s] Range (min … max): 1.327 s … 1.346 s 10 runs For the Linux repo it doesn't pay off; memory usage only gets down from: 0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k 0inputs+0outputs (0major+44579minor)pagefaults 0swaps ... to: 0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k 0inputs+0outputs (0major+44892minor)pagefaults 0swaps The runtime actually increases slightly from: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms] Range (min … max): 824.1 ms … 838.9 ms 10 runs ... to: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms] Range (min … max): 843.4 ms … 854.3 ms 10 runs Why is that? In the Chromium repo, ca. 44000 free(3) calls in create_or_update_name() release almost 1GB, while in the Linux repo 240000+ calls release a bit more than 5MB, so the average discarded name is ca. 1000x longer in the latter. Overall I think it's the right tradeoff to make, as it helps curb the memory usage in repositories with big discarded names, and the added overhead is small. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: generate name strings only if they are betterRené Scharfe
Leave setting the tip_name member of struct rev_name to callers of create_or_update_name(). This avoids allocations for names that are rejected by that function. Here's how this affects the runtime when working with a fresh clone of Git's own repository; performance numbers by hyperfine before: Benchmark #1: ./git -C ../git-pristine/ name-rev --all Time (mean ± σ): 437.8 ms ± 4.0 ms [User: 422.5 ms, System: 15.2 ms] Range (min … max): 432.8 ms … 446.3 ms 10 runs ... and with this patch: Benchmark #1: ./git -C ../git-pristine/ name-rev --all Time (mean ± σ): 408.5 ms ± 1.4 ms [User: 387.2 ms, System: 21.2 ms] Range (min … max): 407.1 ms … 411.7 ms 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: pre-size buffer in get_parent_name()René Scharfe
We can calculate the size of new name easily and precisely. Open-code the xstrfmt() calls and grow the buffers as needed before filling them. This provides a surprisingly large benefit when working with the Chromium repository; here are the numbers measured using hyperfine before: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 5.822 s ± 0.013 s [User: 5.304 s, System: 0.516 s] Range (min … max): 5.803 s … 5.837 s 10 runs ... and with this patch: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.527 s ± 0.003 s [User: 1.015 s, System: 0.511 s] Range (min … max): 1.524 s … 1.535 s 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: factor out get_parent_name()René Scharfe
Reduce nesting by moving code to come up with a name for the parent into its own function. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: put struct rev_name into commit slabRené Scharfe
The commit slab commit_rev_name contains a pointer to a struct rev_name, and the actual struct is allocated separatly. Avoid that allocation and pointer indirection by storing the full struct in the commit slab. Use the tip_name member pointer to determine if the returned struct is initialized. Performance in the Linux repository measured with hyperfine before: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 953.5 ms ± 6.3 ms [User: 901.2 ms, System: 52.1 ms] Range (min … max): 945.2 ms … 968.5 ms 10 runs ... and with this patch: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 851.0 ms ± 3.1 ms [User: 807.4 ms, System: 43.6 ms] Range (min … max): 846.7 ms … 857.0 ms 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: don't _peek() in create_or_update_name()René Scharfe
Look up the commit slab slot for the commit once using commit_rev_name_at() and populate it in case it is empty, instead of checking for emptiness in a separate step using commit_rev_name_peek() via get_commit_rev_name(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: don't leak path copy in name_ref()René Scharfe
name_ref() duplicates the path string and passes it to name_rev(), which either puts it into a commit slab or ignores it if there is already a better name, leaking it. Move the duplication to name_rev() and release the copy in the latter case. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: respect const qualifierRené Scharfe
Keep the const qualifier of the first parameter of get_rev_name() even when casting the object pointer to a commit pointer, and further for the parameter of get_commit_rev_name(), as all these uses are read-only. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: remove unused typedefRené Scharfe
The type alias became unused with bf43abc6e6 (name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation, 2019-11-12); remove it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05name-rev: rewrite create_or_update_name()Martin Ågren
This code was moved straight out of name_rev(). As such, we inherited the "goto" to jump from an if into an else-if. We also inherited the fact that "nothing to do -- return NULL" is handled last. Rewrite the function to first handle the "nothing to do" case. Then we can handle the conditional allocation early before going on to populate the struct. No need for goto-ing. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-25Merge branch 'sg/name-rev-wo-recursion'Junio C Hamano
Redo "git name-rev" to avoid recursive calls. * sg/name-rev-wo-recursion: name-rev: cleanup name_ref() name-rev: eliminate recursion in name_rev() name-rev: use 'name->tip_name' instead of 'tip_name' name-rev: drop name_rev()'s 'generation' and 'distance' parameters name-rev: restructure creating/updating 'struct rev_name' instances name-rev: restructure parsing commits and applying date cutoff name-rev: pull out deref handling from the recursion name-rev: extract creating/updating a 'struct name_rev' into a helper t6120: add a test to cover inner conditions in 'git name-rev's name_rev() name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation name-rev: avoid unnecessary cast in name_ref() name-rev: use strbuf_strip_suffix() in get_rev_name() t6120-describe: modernize the 'check_describe' helper t6120-describe: correct test repo history graph in comment
2019-12-09name-rev: cleanup name_ref()SZEDER Gábor
Earlier patches in this series moved a couple of conditions from the recursive name_rev() function into its caller name_ref(), for no other reason than to make eliminating the recursion a bit easier to follow. Since the previous patch name_rev() is not recursive anymore, so let's move all those conditions back into name_rev(). Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09name-rev: eliminate recursion in name_rev()SZEDER Gábor
The name_rev() function calls itself recursively for each interesting parent of the commit it got as parameter, and, consequently, it can segfault when processing a deep history if it exhausts the available stack space. E.g. running 'git name-rev --all' and 'git name-rev HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories results in segfaults on my machine ('ulimit -s' reports 8192kB of stack size limit), and nowadays the former segfaults in the Linux repo as well (it reached the necessasry depth sometime between v5.3-rc4 and -rc5). Eliminate the recursion by inserting the interesting parents into a LIFO 'prio_queue' [1] and iterating until the queue becomes empty. Note that the parent commits must be added in reverse order to the LIFO 'prio_queue', so their relative order is preserved during processing, i.e. the first parent should come out first from the queue, because otherwise performance greatly suffers on mergy histories [2]. The stacksize-limited test 'name-rev works in a deep repo' in 't6120-describe.sh' demonstrated this issue and expected failure. Now the recursion is gone, so flip it to expect success. Also gone are the dmesg entries logging the segfault of that segfaulting 'git name-rev' process on every execution of the test suite. Note that this slightly changes the order of lines in the output of 'git name-rev --all', usually swapping two lines every 35 lines in git.git or every 150 lines in linux.git. This shouldn't matter in practice, because the output has always been unordered anyway. This patch is best viewed with '--ignore-all-space'. [1] Early versions of this patch used a 'commit_list', resulting in ~15% performance penalty for 'git name-rev --all' in 'linux.git', presumably because of the memory allocation and release for each insertion and removal. Using a LIFO 'prio_queue' has basically no effect on performance. [2] We prefer shorter names, i.e. 'v0.1~234' is preferred over 'v0.1^2~5', meaning that usually following the first parent of a merge results in the best name for its ancestors. So when later we follow the remaining parent(s) of a merge, and reach an already named commit, then we usually find that we can't give that commit a better name, and thus we don't have to visit any of its ancestors again. OTOH, if we were to follow the Nth parent of the merge first, then the name of all its ancestors would include a corresponding '^N'. Those are not the best names for those commits, so when later we reach an already named commit following the first parent of that merge, then we would have to update the name of that commit and the names of all of its ancestors as well. Consequently, we would have to visit many commits several times, resulting in a significant slowdown. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09name-rev: use 'name->tip_name' instead of 'tip_name'SZEDER Gábor
Following the previous patches in this series we can get the value of 'name_rev()'s 'tip_name' parameter from the 'struct rev_name' associated with the commit as well. So let's use 'name->tip_name' instead, which makes the patch eliminating the recursion of name_rev() a bit easier to follow. Note that at this point we could drop the 'tip_name' parameter as well, but that parameter will be necessary later, after the recursion is eliminated. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06name-rev: drop name_rev()'s 'generation' and 'distance' parametersSZEDER Gábor
Following the previous patches in this series we can get the values of name_rev()'s 'generation' and 'distance' parameters from the 'stuct rev_name' associated with the commit as well. Let's simplify the function's signature and remove these two unnecessary parameters. Note that at this point we could do the same with the 'tip_name', 'taggerdate' and 'from_tag' parameters as well, but those parameters will be necessary later, after the recursion is eliminated. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06name-rev: restructure creating/updating 'struct rev_name' instancesSZEDER Gábor
At the beginning of the recursive name_rev() function it creates a new 'struct rev_name' instance for each previously unvisited commit or, if this visit results in better name for an already visited commit, then updates the 'struct rev_name' instance attached to the commit, or returns early. Restructure this so it's caller creates or updates the 'struct rev_name' instance associated with the commit to be passed as parameter, i.e. both name_ref() before calling name_rev() and name_rev() itself as it iterates over the parent commits. This makes eliminating the recursion a bit easier to follow, and the condition moved to name_ref() will be moved back to name_rev() after the recursion is eliminated. This change also plugs the memory leak that was temporarily unplugged in the earlier "name-rev: pull out deref handling from the recursion" patch in this series. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06name-rev: restructure parsing commits and applying date cutoffSZEDER Gábor
At the beginning of the recursive name_rev() function it parses the commit it got as parameter, and returns early if the commit is older than a cutoff limit. Restructure this so the caller parses the commit and checks its date, and doesn't invoke name_rev() if the commit to be passed as parameter is older than the cutoff, i.e. both name_ref() before calling name_rev() and name_rev() itself as it iterates over the parent commits. This makes eliminating the recursion a bit easier to follow, and the condition moved to name_ref() will be moved back to name_rev() after the recursion is eliminated. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>