summaryrefslogtreecommitdiff
path: root/revision.c
AgeCommit message (Collapse)Author
2019-12-25Merge branch 'dl/format-patch-notes-config-fixup'Junio C Hamano
"git format-patch" can take a set of configured format.notes values to specify which notes refs to use in the log message part of the output. The behaviour of this was not consistent with multiple --notes command line options, which has been corrected. * dl/format-patch-notes-config-fixup: notes.h: fix typos in comment notes: break set_display_notes() into smaller functions config/format.txt: clarify behavior of multiple format.notes format-patch: move git_config() before repo_init_revisions() format-patch: use --notes behavior for format.notes notes: extract logic into set_display_notes() notes: create init_display_notes() helper notes: rename to load_display_notes()
2019-12-13notes: break set_display_notes() into smaller functionsDenton Liu
In 8164c961e1 (format-patch: use --notes behavior for format.notes, 2019-12-09), we introduced set_display_notes() which was a monolithic function with three mutually exclusive branches. Break the function up into three small and simple functions that each are only responsible for one task. This family of functions accepts an `int *show_notes` instead of returning a value suitable for assignment to `show_notes`. This is for two reasons. First of all, this guarantees that the external `show_notes` variable changes in lockstep with the `struct display_notes_opt`. Second, this prompts future developers to be careful about doing something meaningful with this value. In fact, a NULL check is intentionally omitted because causing a segfault here would tell the future developer that they are meant to use the value for something meaningful. One alternative was making the family of functions accept a `struct rev_info *` instead of the `struct display_notes_opt *`, since the former contains the `show_notes` field as well. This does not work because we have to call git_config() before repo_init_revisions(). However, if we had a `struct rev_info`, we'd need to initialize it before it gets assigned values from git_config(). As a result, we break the circular dependency by having standalone `int show_notes` and `struct display_notes_opt notes_opt` variables which temporarily hold values from git_config() until the values are copied over to `rev`. To implement this change, we need to get a pointer to `rev_info::show_notes`. Unfortunately, this is not possible with bitfields and only direct-assignment is possible. Change `rev_info::show_notes` to a non-bitfield int so that we can get its address. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-10Merge branch 'dl/pretty-reference'Junio C Hamano
"git log" family learned "--pretty=reference" that gives the name of a commit in the format that is often used to refer to it in log messages. * dl/pretty-reference: SubmittingPatches: use `--pretty=reference` pretty: implement 'reference' format pretty: add struct cmt_fmt_map::default_date_mode_type pretty: provide short date format t4205: cover `git log --reflog -z` blindspot pretty.c: inline initalize format_context revision: make get_revision_mark() return const pointer completion: complete `tformat:` pretty format SubmittingPatches: remove dq from commit reference pretty-formats.txt: use generic terms for hash SubmittingPatches: use generic terms for hash
2019-12-09notes: extract logic into set_display_notes()Denton Liu
Instead of open coding the logic that tweaks the variables in `struct display_notes_opt` within handle_revision_opt(), abstract away the logic into set_display_notes() so that it can be reused. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09notes: create init_display_notes() helperDenton Liu
We currently open code the initialization for revs->notes_opt. Abstract this away into a helper function so that the logic can be reused in a future commit. This is slightly wasteful as we memset the struct twice but this is only run once so it shouldn't have any major effect. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-05Merge branch 'mh/clear-topo-walk-upon-reset'Junio C Hamano
The revision walking machinery uses resources like per-object flag bits that need to be reset before a new iteration of walking begins, but the resources related to topological walk were not cleared correctly, which has been corrected. * mh/clear-topo-walk-upon-reset: revision: free topo_walk_info before creating a new one in init_topo_walk revision: clear the topo-walk flags in reset_revision_walk
2019-11-25revision: free topo_walk_info before creating a new one in init_topo_walkMike Hommey
init_topo_walk doesn't reuse an existing topo_walk_info, and currently leaks the one that might exist on the current rev_info if it was already used for a topo walk beforehand. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-25revision: clear the topo-walk flags in reset_revision_walkMike Hommey
Not doing so can lead to wrong topo-walks when using the revision walk API consecutively. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-20revision: make get_revision_mark() return const pointerDenton Liu
get_revision_mark() used to return a `char *`, even though all of the strings it was returning were string literals. Make get_revision_mark() return a `const char *` so that callers won't be tempted to modify the returned string. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-15Merge branch 'ew/hashmap'Junio C Hamano
Code clean-up of the hashmap API, both users and implementation. * ew/hashmap: hashmap_entry: remove first member requirement from docs hashmap: remove type arg from hashmap_{get,put,remove}_entry OFFSETOF_VAR macro to simplify hashmap iterators hashmap: introduce hashmap_free_entries hashmap: hashmap_{put,remove} return hashmap_entry * hashmap: use *_entry APIs for iteration hashmap_cmp_fn takes hashmap_entry params hashmap_get{,_from_hash} return "struct hashmap_entry *" hashmap: use *_entry APIs to wrap container_of hashmap_get_next returns "struct hashmap_entry *" introduce container_of macro hashmap_put takes "struct hashmap_entry *" hashmap_remove takes "const struct hashmap_entry *" hashmap_get takes "const struct hashmap_entry *" hashmap_add takes "struct hashmap_entry *" hashmap_get_next takes "const struct hashmap_entry *" hashmap_entry_init takes "struct hashmap_entry *" packfile: use hashmap_entry in delta_base_cache_entry coccicheck: detect hashmap_entry.hash assignment diff: use hashmap_entry_init on moved_entry.ent
2019-10-11Merge branch 'ab/pcre-jit-fixes'Junio C Hamano
A few simplification and bugfixes to PCRE interface. * ab/pcre-jit-fixes: grep: under --debug, show whether PCRE JIT is enabled grep: do not enter PCRE2_UTF mode on fixed matching grep: stess test PCRE v2 on invalid UTF-8 data grep: create a "is_fixed" member in "grep_pat" grep: consistently use "p->fixed" in compile_regexp() grep: stop using a custom JIT stack with PCRE v1 grep: stop "using" a custom JIT stack with PCRE v2 grep: remove overly paranoid BUG(...) code grep: use PCRE v2 for optimized fixed-string search grep: remove the kwset optimization grep: drop support for \0 in --fixed-strings <pattern> grep: make the behavior for NUL-byte in patterns sane grep tests: move binary pattern tests into their own file grep tests: move "grep binary" alongside the rest grep: inline the return value of a function call used only once t4210: skip more command-line encoding tests on MinGW grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>" log tests: test regex backends in "--encode=<enc>" tests
2019-10-07Merge branch 'rs/simplify-by-deco-with-deco-refs-exclude'Junio C Hamano
"git log --decorate-refs-exclude=<pattern>" was incorrectly overruled when the "--simplify-by-decoration" option is used, which has been corrected. * rs/simplify-by-deco-with-deco-refs-exclude: log-tree: call load_ref_decorations() in get_name_decoration() log: test --decorate-refs-exclude with --simplify-by-decoration
2019-10-07hashmap: remove type arg from hashmap_{get,put,remove}_entryEric Wong
Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07OFFSETOF_VAR macro to simplify hashmap iteratorsEric Wong
While we cannot rely on a `__typeof__' operator being portable to use with `offsetof'; we can calculate the pointer offset using an existing pointer and the address of a member using pointer arithmetic for compilers without `__typeof__'. This allows us to simplify usage of hashmap iterator macros by not having to specify a type when a pointer of that type is already given. In the future, list iterator macros (e.g. list_for_each_entry) may also be implemented using OFFSETOF_VAR to save hackers the trouble of using container_of/list_entry macros and without relying on non-portable `__typeof__'. v3: use `__typeof__' to avoid clang warnings Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap: introduce hashmap_free_entriesEric Wong
`hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap: use *_entry APIs for iterationEric Wong
Inspired by list_for_each_entry in the Linux kernel. Once again, these are somewhat compromised usability-wise by compilers lacking __typeof__ support. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_cmp_fn takes hashmap_entry paramsEric Wong
Another step in eliminating the requirement of hashmap_entry being the first member of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_get{,_from_hash} return "struct hashmap_entry *"Eric Wong
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_put takes "struct hashmap_entry *"Eric Wong
This is less error-prone than "void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_get takes "const struct hashmap_entry *"Eric Wong
This is less error-prone than "const void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_entry_init takes "struct hashmap_entry *"Eric Wong
C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry *" to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-30Merge branch 'rs/get-tagged-oid'Junio C Hamano
Code cleanup. * rs/get-tagged-oid: use get_tagged_oid() tag: factor out get_tagged_oid()
2019-09-09log-tree: call load_ref_decorations() in get_name_decoration()René Scharfe
Load a default set of ref name decorations at the first lookup. This frees direct and indirect callers from doing so. They can still do it if they want to use a filter or are interested in full decorations instead of the default short ones -- the first load_ref_decorations() call wins. This means that the load in builtin/log.c::cmd_log_init_finish() is respected even if --simplify-by-decoration is given, as the previously dominating earlier load in handle_revision_opt() is gone. So a filter given with --decorate-refs-exclude is used for simplification in that case, as expected. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05tag: factor out get_tagged_oid()René Scharfe
Add a function for accessing the ID of the object referenced by a tag safely, i.e. without causing a segfault when encountering a broken tag where ->tagged is NULL. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-06revision: allow --end-of-options to end option parsingJeff King
There's currently no robust way to tell Git that a particular option is meant to be a revision, and not an option. So if you have a branch "refs/heads/--foo", you cannot just say: git rev-list --foo You can say: git rev-list refs/heads/--foo But that breaks down if you don't know the refname, and in particular if you're a script passing along a value from elsewhere. In most programs, you can use "--" to end option parsing, like this: some-prog -- "$revision" But that doesn't work for the revision parser, because "--" is already meaningful there: it separates revisions from pathspecs. So we need some other marker to separate options from revisions. This patch introduces "--end-of-options", which serves that purpose: git rev-list --oneline --end-of-options "$revision" will work regardless of what's in "$revision" (well, if you say "--" it may fail, but it won't do something dangerous, like triggering an unexpected option). The name is verbose, but that's probably a good thing; this is meant to be used for scripted invocations where readability is more important than terseness. One alternative would be to introduce an explicit option to mark a revision, like: git rev-list --oneline --revision="$revision" That's slightly _more_ informative than this commit (because it makes even something silly like "--" unambiguous). But the pattern of using a separator like "--" is well established in git and in other commands, and it makes some scripting tasks simpler like: git rev-list --end-of-options "$@" There's no documentation in this patch, because it will make sense to describe the feature once it is available everywhere (and support will be added in further patches). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-19Merge branch 'jk/check-connected-with-alternates'Junio C Hamano
The tips of refs from the alternate object store can be used as starting point for reachability computation now. * jk/check-connected-with-alternates: check_everything_connected: assume alternate ref tips are valid object-store.h: move for_each_alternate_ref() from transport.h
2019-07-01check_everything_connected: assume alternate ref tips are validJeff King
When we receive a remote ref update to sha1 "X", we want to check that we have all of the objects needed by "X". We can assume that our repository is not currently corrupted, and therefore if we have a ref pointing at "Y", we have all of its objects. So we can stop our traversal from "X" as soon as we hit "Y". If we make the same non-corruption assumption about any repositories we use to store alternates, then we can also use their ref tips to shorten the traversal. This is especially useful when cloning with "--reference", as we otherwise do not have any local refs to check against, and have to traverse the whole history, even though the other side may have sent us few or no objects. Here are results for the included perf test (which shows off more or less the maximal savings, getting one new commit and sharing the whole history): Test HEAD^ HEAD -------------------------------------------------------------------- [on git.git] 5600.3: clone --reference 2.94(2.86+0.08) 0.09(0.08+0.01) -96.9% [on linux.git] 5600.3: clone --reference 45.74(45.34+0.41) 0.36(0.30+0.08) -99.2% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"Ævar Arnfjörð Bjarmason
Fix a bug introduced in 18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was missed due to a blindspot in our tests, as discussed in the previous commit. I then blindly copied the same bug in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) when adding the PCRE v2 code. We should not tell PCRE that we're processing UTF-8 just because we're dealing with non-ASCII. In the case of e.g. "log --encoding=<...>" under is_utf8_locale() the haystack might be in ISO-8859-1, and the needle might be in a non-UTF-8 encoding. Maybe we should be more strict here and die earlier? Should we also be converting the needle to the encoding in question, and failing if it's not a string that's valid in that encoding? Maybe. But for now matching this as non-UTF8 at least has some hope of producing sensible results, since we know that our default heuristic of assuming the text to be matched is in the user locale encoding isn't true when we've explicitly encoded it to be in a different encoding. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28revision: keep topo-walk free of unintersting commitsDerrick Stolee
When updating the topo-order walk in b454241 (revision.c: generation-based topo-order algorithm, 2018-11-01), the logic was a huge rewrite of the walk logic. In that massive change, we accidentally included the UNINTERESTING commits in expand_topo_walk(). This means that a simple query like git rev-list --topo-order HEAD~1..HEAD will expand the topo walk for all commits reachable from HEAD, and not just one commit. This change should speed up these cases, but there is still a need for corrected commit-date for some A..B queries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28revision: use generation for A..B --topo-order queriesDerrick Stolee
If a commit-graph exists with computed generation numbers, then a 'git rev-list --topo-order -n <N> <rev>' query will use those generation numbers to reduce the number of commits walked before writing N commits. One caveat put in b454241 (revision.c: generation-based topo-order algorithm, 2018-11-01) was to not enable the new algorithm for queries with a revision range "A..B". The logic was placed to walk from "A" and mark those commits as uninteresting, but the performance was actually worse than the existing logic in some cases. The root cause of this performance degradation is that generation numbers _increase_ the number of commits we walk relative to the existing heuristic of walking by commit date. While generation numbers actually guarantee that the algorithm is correct, the existing logic is very rarely wrong and that added requirement is not worth the cost. This motivates the planned "corrected commit date" to replace generation numbers in a future version of Git. The current change enables the logic to use whatever reachability index is currently in the commit-graph (generation numbers or corrected commit date). The limited flag in struct rev_info forces a full walk of the commit history (after discovering the A..B range). Previosuly, it is enabled whenever we see an uninteresting commit. We prevent enabling the parameter when we are planning to use the reachability index for a topo-order. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'jk/revision-rewritten-parents-in-prio-queue'Junio C Hamano
Performance fix for "rev-list --parents -- pathspec". * jk/revision-rewritten-parents-in-prio-queue: revision: use a prio_queue to hold rewritten parents
2019-04-25Merge branch 'jk/unused-params-even-more'Junio C Hamano
Code cleanup. * jk/unused-params-even-more: parse_opt_ref_sorting: always use with NONEG flag pretty: drop unused strbuf from parse_padding_placeholder() pretty: drop unused "type" parameter in needs_rfc2047_encoding() parse-options: drop unused ctx parameter from show_gitcomp() fetch_pack(): drop unused parameters report_path_error(): drop unused prefix parameter unpack-trees: drop unused error_type parameters unpack-trees: drop name_entry from traverse_by_cache_tree() test-date: drop unused "now" parameter from parse_dates() update-index: drop unused prefix_length parameter from do_reupdate() log: drop unused "len" from show_tagger() log: drop unused rev_info from early output revision: drop some unused "revs" parameters
2019-04-09Merge branch 'jk/line-log-with-patch'Junio C Hamano
"git log -L<from>,<to>:<path>" with "-s" did not suppress the patch output as it should. This has been corrected. * jk/line-log-with-patch: line-log: detect unsupported formats line-log: suppress diff output with "-s"
2019-04-04revision: use a prio_queue to hold rewritten parentsJeff King
This patch fixes a quadratic list insertion in rewrite_one() when pathspec limiting is combined with --parents. What happens is something like this: 1. We see that some commit X touches the path, so we try to rewrite its parents. 2. rewrite_one() loops forever, rewriting parents, until it finds a relevant parent (or hits the root and decides there are none). The heavy lifting is done by process_parent(), which uses try_to_simplify_commit() to drop parents. 3. process_parent() puts any intermediate parents into the &revs->commits list, inserting by commit date as usual. So if commit X is recent, and then there's a large chunk of history that doesn't touch the path, we may add a lot of commits to &revs->commits. And insertion by commit date is O(n) in the worst case, making the whole thing quadratic. We tried to deal with this long ago in fce87ae538 (Fix quadratic performance in rewrite_one., 2008-07-12). In that scheme, we cache the oldest commit in the list; if the new commit to be added is older, we can start our linear traversal there. This often works well in practice because parents are older than their descendants, and thus we tend to add older and older commits as we traverse. But this isn't guaranteed, and in fact there's a simple case where it is not: merges. Imagine we look at the first parent of a merge and see a very old commit (let's say 3 years old). And on the second parent, as we go back 3 years in history, we might have many commits. That one first-parent commit has polluted our oldest-commit cache; it will remain the oldest while we traverse a huge chunk of history, during which we have to fall back to the slow, linear method of adding to the list. Naively, one might imagine that instead of caching the oldest commit, we'd start at the last-added one. But that just makes some cases faster while making others slower (and indeed, while it made a real-world test case much faster, it does quite poorly in the perf test include here). Fundamentally, these are just heuristics; our worst case is still quadratic, and some cases will approach that. Instead, let's use a data structure with better worst-case performance. Swapping out revs->commits for something else would have repercussions all over the code base, but we can take advantage of one fact: for the rewrite_one() case, nobody actually needs to see those commits in revs->commits until we've finished generating the whole list. That leaves us with two obvious options: 1. We can generate the list _unordered_, which should be O(n), and then sort it afterwards, which would be O(n log n) total. This is "sort-after" below. 2. We can insert the commits into a separate data structure, like a priority queue. This is "prio-queue" below. I expected that sort-after would be the fastest (since it saves us the extra step of copying the items into the linked list), but surprisingly the prio-queue seems to be a bit faster. Here are timings for the new p0001.6 for all three techniques across a few repositories, as compared to master: master cache-last sort-after prio-queue -------------------------------------------------------------------------------------------- GIT_PERF_REPO=git.git 0.52(0.50+0.02) 0.53(0.51+0.02) +1.9% 0.37(0.33+0.03) -28.8% 0.37(0.32+0.04) -28.8% GIT_PERF_REPO=linux.git 20.81(20.74+0.07) 20.31(20.24+0.07) -2.4% 0.94(0.86+0.07) -95.5% 0.91(0.82+0.09) -95.6% GIT_PERF_REPO=llvm-project.git 83.67(83.57+0.09) 4.23(4.15+0.08) -94.9% 3.21(3.15+0.06) -96.2% 2.98(2.91+0.07) -96.4% A few items to note: - the cache-list tweak does improve the bad case for llvm-project.git that started my digging into this problem. But it performs terribly on linux.git, barely helping at all. - the sort-after and prio-queue techniques work well. They approach the timing for running without --parents at all, which is what you'd expect (see below for more data). - prio-queue just barely outperforms sort-after. As I said, I'm not really sure why this is the case, but it is. You can see it even more prominently in this real-world case on llvm-project.git: git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll where prio-queue routinely outperforms sort-after by about 7%. One guess is that the prio-queue may just be more efficient because it uses a compact array. There are three new perf tests: - "rev-list --parents" gives us a baseline for running with --parents. This isn't sped up meaningfully here, because the bad case is triggered only with simplification. But it's good to make sure we don't screw it up (now, or in the future). - "rev-list -- dummy" gives us a baseline for just traversing with pathspec limiting. This gives a lower bound for the next test (and it's also a good thing for us to be checking in general for regressions, since we don't seem to have any existing tests). - "rev-list --parents -- dummy" shows off the problem (and our fix) Here are the timings for those three on llvm-project.git, before and after the fix: Test master prio-queue ------------------------------------------------------------------------------ 0001.3: rev-list --parents 2.24(2.12+0.12) 2.22(2.11+0.11) -0.9% 0001.5: rev-list -- dummy 2.89(2.82+0.07) 2.92(2.89+0.03) +1.0% 0001.6: rev-list --parents -- dummy 83.67(83.57+0.09) 2.98(2.91+0.07) -96.4% Changes in the first two are basically noise, and you can see we approach our lower bound in the final one. Note that we can't fully get rid of the list argument from process_parents(). Other callers do have lists, and it would be hard to convert them. They also don't seem to have this problem (probably because they actually remove items from the list as they loop, meaning it doesn't grow so large in the first place). So this basically just drops the "cache_ptr" parameter (which was used only by the one caller we're fixing here) and replaces it with a prio_queue. Callers are free to use either data structure, depending on what they're prepared to handle. Reported-by: Björn Pettersson A <bjorn.a.pettersson@ericsson.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20revision: drop some unused "revs" parametersJeff King
There are several internal helpers that take a rev_info struct but don't actually look at it. While one could argue that all helpers in revision.c should take a rev_info struct for consistency, dropping the unused parameter makes it clear that they don't actually depend on any other rev options. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-11line-log: detect unsupported formatsJeff King
If you use "log -L" with an output format like "--raw" or "--stat", we'll silently ignore the format and just output the normal patch. Let's detect and complain about this, which at least tells the user what's going on. The tests here aren't exhaustive over the set of all formats, but it should at least let us know if somebody breaks the format-checking. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07Merge branch 'en/combined-all-paths'Junio C Hamano
Output from "diff --cc" did not show the original paths when the merge involved renames. A new option adds the paths in the original trees to the output. * en/combined-all-paths: log,diff-tree: add --combined-all-paths option
2019-02-08log,diff-tree: add --combined-all-paths optionElijah Newren
The combined diff format for merges will only list one filename, even if rename or copy detection is active. For example, with raw format one might see: ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM describe.c ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM bar.sh ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR phooey.c This doesn't let us know what the original name of bar.sh was in the first parent, and doesn't let us know what either of the original names of phooey.c were in either of the parents. In contrast, for non-merge commits, raw format does provide original filenames (and a rename score to boot). In order to also provide original filenames for merge commits, add a --combined-all-paths option (which must be used with either -c or --cc, and is likely only useful with rename or copy detection active) so that we can print tab-separated filenames when renames are involved. This transforms the above output to: ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM desc.c desc.c desc.c ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM foo.sh bar.sh bar.sh ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR fooey.c fuey.c phooey.c Further, in patch format, this changes the from/to headers so that instead of just having one "from" header, we get one for each parent. For example, instead of having --- a/phooey.c +++ b/phooey.c we would see --- a/fooey.c --- a/fuey.c +++ b/phooey.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-07Merge branch 'ds/push-sparse-tree-walk'Junio C Hamano
"git pack-objects" learned another algorithm to compute the set of objects to send, that trades the resulting packfile off to save traversal cost to favor small pushes. * ds/push-sparse-tree-walk: pack-objects: create GIT_TEST_PACK_SPARSE pack-objects: create pack.useSparse setting revision: implement sparse algorithm list-objects: consume sparse tree walk revision: add mark_tree_uninteresting_sparse
2019-02-07Merge branch 'nd/the-index-final'Junio C Hamano
The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument
2019-02-05Merge branch 'jt/get-reference-with-commit-graph'Junio C Hamano
Micro-optimize the code that prepares commit objects to be walked by "git rev-list" when the commit-graph is available. * jt/get-reference-with-commit-graph: revision: use commit graph in get_reference()
2019-01-29Merge branch 'bc/tree-walk-oid'Junio C Hamano
The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use
2019-01-17revision: implement sparse algorithmDerrick Stolee
When enumerating objects to place in a pack-file during 'git pack-objects --revs', we discover the "frontier" of commits that we care about and the boundary with commit we find uninteresting. From that point, we walk trees to discover which trees and blobs are uninteresting. Finally, we walk trees from the interesting commits to find the interesting objects that are placed in the pack. This commit introduces a new, "sparse" way to discover the uninteresting trees. We use the perspective of a single user trying to push their topic to a large repository. That user likely changed a very small fraction of the paths in their working directory, but we spend a lot of time walking all reachable trees. The way to switch the logic to work in this sparse way is to start caring about which paths introduce new trees. While it is not possible to generate a diff between the frontier boundary and all of the interesting commits, we can simulate that behavior by inspecting all of the root trees as a whole, then recursing down to the set of trees at each path. We already had taken the first step by passing an oidset to mark_trees_uninteresting_sparse(). We now create a dictionary whose keys are paths and values are oidsets. We consider the set of trees that appear at each path. While we inspect a tree, we add its subtrees to the oidsets corresponding to the tree entry's path. We also mark trees as UNINTERESTING if the tree we are parsing is UNINTERESTING. To actually improve the performance, we need to terminate our recursion. If the oidset contains only UNINTERESTING trees, then we do not continue the recursion. This avoids walking trees that are likely to not be reachable from interesting trees. If the oidset contains only interesting trees, then we will walk these trees in the final stage that collects the intersting objects to place in the pack. Thus, we only recurse if the oidset contains both interesting and UNINITERESTING trees. There are a few ways that this is not a universally better option. First, we can pack extra objects. If someone copies a subtree from one tree to another, the first tree will appear UNINTERESTING and we will not recurse to see that the subtree should also be UNINTERESTING. We will walk the new tree and see the subtree as a "new" object and add it to the pack. A test is modified to demonstrate this behavior and to verify that the new logic is being exercised. Second, we can have extra memory pressure. If instead of being a single user pushing a small topic we are a server sending new objects from across the entire working directory, then we will gain very little (the recursion will rarely terminate early) but will spend extra time maintaining the path-oidset dictionaries. Despite these potential drawbacks, the benefits of the algorithm are clear. By adding a counter to 'add_children_by_path' and 'mark_tree_contents_uninteresting', I measured the number of parsed trees for the two algorithms in a variety of repos. For git.git, I used the following input: v2.19.0 ^v2.19.0~10 Objects to pack: 550 Walked (old alg): 282 Walked (new alg): 130 For the Linux repo, I used the following input: v4.18 ^v4.18~10 Objects to pack: 518 Walked (old alg): 4,836 Walked (new alg): 188 The two repos above are rather "wide and flat" compared to other repos that I have used in the past. As a comparison, I tested an old topic branch in the Azure DevOps repo, which has a much deeper folder structure than the Linux repo. Objects to pack: 220 Walked (old alg): 22,804 Walked (new alg): 129 I used the number of walked trees the main metric above because it is consistent across multiple runs. When I ran my tests, the performance of the pack-objects command with the same options could change the end-to-end time by 10x depending on the file system being warm. However, by repeating the same test on repeat I could get more consistent timing results. The git.git and Linux tests were too fast overall (less than 0.5s) to measure an end-to-end difference. The Azure DevOps case was slow enough to see the time improve from 15s to 1s in the warm case. The cold case was 90s to 9s in my testing. These improvements will have even larger benefits in the super- large Windows repository. In our experiments, we see the "Enumerate objects" phase of pack-objects taking 60-80% of the end-to-end time of non-trivial pushes, taking longer than the network time to send the pack and the server time to verify the pack. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17revision: add mark_tree_uninteresting_sparseDerrick Stolee
In preparation for a new algorithm that walks fewer trees when creating a pack from a set of revisions, create a method that takes an oidset of tree oids and marks reachable objects as UNINTERESTING. The current implementation uses the existing mark_tree_uninteresting to recursively walk the trees and blobs. This will walk the same number of trees as the old mechanism. To ensure that mark_tree_uninteresting walks the tree, we need to remove the UNINTERESTING flag before calling the method. This implementation will be replaced entirely in a later commit. There is one new assumption in this approach: we are also given the oids of the interesting trees. This implementation does not use those trees at the moment, but we will use them in a later rewrite of this method. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15tree-walk: store object_id in a separate memberbrian m. carlson
When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'nd/indentation-fix'Junio C Hamano
Code cleanup. * nd/indentation-fix: Indent code with TABs
2019-01-14Merge branch 'md/exclude-promisor-objects-fix-cleanup'Junio C Hamano
Code clean-up. * md/exclude-promisor-objects-fix-cleanup: revision.c: put promisor option in specialized struct
2019-01-14Merge branch 'nd/attr-pathspec-in-tree-walk'Junio C Hamano
The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree*() take 'struct repository *'
2019-01-14Merge branch 'md/list-lazy-objects-fix'Junio C Hamano
"git rev-list --exclude-promisor-objects" had to take an object that does not exist locally (and is lazily available) from the command line without barfing, but the code dereferenced NULL. * md/list-lazy-objects-fix: list-objects.c: don't segfault for missing cmdline objects
2019-01-14sha1-name.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy
This kills the_index dependency in get_oid_with_context() but for get_oid() and friends, they still assume the_repository (which also means the_index). Unfortunately the widespread use of get_oid() will make it hard to make the conversion now. We probably will add repo_get_oid() at some point and limit the use of get_oid() in builtin/ instead of forcing all get_oid() call sites to carry struct repository. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>