summaryrefslogtreecommitdiff
path: root/commit-graph.h
AgeCommit message (Collapse)Author
2019-02-05Merge branch 'ab/commit-graph-write-progress'Junio C Hamano
The codepath to show progress meter while writing out commit-graph file has been improved. * ab/commit-graph-write-progress: commit-graph write: emit a percentage for all progress commit-graph write: add itermediate progress commit-graph write: remove empty line for readability commit-graph write: add more descriptive progress output commit-graph write: show progress for object search commit-graph write: more descriptive "writing out" output commit-graph write: add "Writing out" progress output commit-graph: don't call write_graph_chunk_extra_edges() unnecessarily commit-graph: rename "large edges" to "extra edges"
2019-01-22commit-graph: rename "large edges" to "extra edges"SZEDER Gábor
The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-16commit-graph, fuzz: add fuzzer for commit-graphJosh Steadmon
Break load_commit_graph_one() into a new function, parse_commit_graph(). The latter function operates on arbitrary buffers, which makes it suitable as a fuzzing target. Since parse_commit_graph() is only called by load_commit_graph_one() (and the fuzzer described below), we omit error messages that would be duplicated by the caller. Adds fuzz-commit-graph.c, which provides a fuzzing entry point compatible with libFuzzer (and possibly other fuzzing engines). Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-16Merge branch 'ds/commit-graph-with-grafts'Junio C Hamano
The recently introduced commit-graph auxiliary data is incompatible with mechanisms such as replace & grafts that "breaks" immutable nature of the object reference relationship. Disable optimizations based on its use (and updating existing commit-graph) when these incompatible features are in use in the repository. * ds/commit-graph-with-grafts: commit-graph: close_commit_graph before shallow walk commit-graph: not compatible with uninitialized repo commit-graph: not compatible with grafts commit-graph: not compatible with replace objects test-repository: properly init repo commit-graph: update design document refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback refs.c: migrate internal ref iteration to pass thru repository argument
2018-10-16Merge branch 'ab/commit-graph-progress'Junio C Hamano
Generation of (experimental) commit-graph files have so far been fairly silent, even though it takes noticeable amount of time in a meaningfully large repository. The users will now see progress output. * ab/commit-graph-progress: gc: fix regression in 7b0f229222 impacting --quiet commit-graph verify: add progress output commit-graph write: add progress output
2018-09-17Merge branch 'ds/commit-graph-tests'Junio C Hamano
We can now optionally run tests with commit-graph enabled. * ds/commit-graph-tests: commit-graph: define GIT_TEST_COMMIT_GRAPH
2018-09-17Merge branch 'ds/reachable'Junio C Hamano
The code for computing history reachability has been shuffled, obtained a bunch of new tests to cover them, and then being improved. * ds/reachable: commit-reach: correct accidental #include of C file commit-reach: use can_all_from_reach commit-reach: make can_all_from_reach... linear commit-reach: replace ref_newer logic test-reach: test commit_contains test-reach: test can_all_from_reach_with_flags test-reach: test reduce_heads test-reach: test get_merge_bases_many test-reach: test is_descendant_of test-reach: test in_merge_bases test-reach: create new test tool for ref_newer commit-reach: move can_all_from_reach_with_flags upload-pack: generalize commit date cutoff upload-pack: refactor ok_to_give_up() upload-pack: make reachable() more generic commit-reach: move commit_contains from ref-filter commit-reach: move ref_newer from remote.c commit.h: remove method declarations commit-reach: move walk methods from commit.c
2018-09-17commit-graph write: add progress outputÆvar Arnfjörð Bjarmason
Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29commit-graph: define GIT_TEST_COMMIT_GRAPHDerrick Stolee
The commit-graph feature is tested in isolation by t5318-commit-graph.sh and t6600-test-reach.sh, but there are many more interesting scenarios involving commit walks. Many of these scenarios are covered by the existing test suite, but we need to maintain coverage when the optional commit-graph structure is not present. To allow running the full test suite with the commit-graph present, add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try to load the commit-graph when parsing commits, and writes the commit-graph file after every 'git commit' command. There are a few tests that rely on commits not existing in pack-files to trigger important events, so manually set GIT_TEST_COMMIT_GRAPH to false for the necessary commands. There is one test in t6024-recursive-merge.sh that relies on the merge-base algorithm picking one of two ambiguous merge-bases, and the commit-graph feature changes which merge-base is picked. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21commit-graph: close_commit_graph before shallow walkDerrick Stolee
Call close_commit_graph() when about to start a rev-list walk that includes shallow commits. This is necessary in code paths that "fake" shallow commits for the sake of fetch. Specifically, test 351 in t5500-fetch-pack.sh runs git fetch --shallow-exclude one origin with a file-based transfer. When the "remote" has a commit-graph, we do not prevent the commit-graph from being loaded, but then the commits are intended to be dynamically transferred into shallow commits during get_shallow_commits_by_rev_list(). By closing the commit-graph before this call, we prevent this interaction. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-15Add missing includes and forward declarationsElijah Newren
I looped over the toplevel header files, creating a temporary two-line C program for each consisting of #include "git-compat-util.h" #include $HEADER This patch is the result of manually fixing errors in compiling those tiny programs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20commit-reach: use can_all_from_reachDerrick Stolee
The is_descendant_of method previously used in_merge_bases() to check if the commit can reach any of the commits in the provided list. This had two performance problems: 1. The performance is quadratic in worst-case. 2. A single in_merge_bases() call requires walking beyond the target commit in order to find the full set of boundary commits that may be merge-bases. The can_all_from_reach method avoids this quadratic behavior and can limit the search beyond the target commits using generation numbers. It requires a small prototype adjustment to stop using commit-date as a cutoff, as that optimization is no longer appropriate here. Since in_merge_bases() uses paint_down_to_common(), is_descendant_of() naturally found cutoffs to avoid walking the entire commit graph. Since we want to always return the correct result, we cannot use the min_commit_date cutoff in can_all_from_reach. We then rely on generation numbers to provide the cutoff. Since not all repos will have a commit-graph file, nor will we always have generation numbers computed for a commit-graph file, create a new method, generation_numbers_enabled(), that checks for a commit-graph file and sees if the first commit in the file has a non-zero generation number. In the case that we do not have generation numbers, use the old logic for is_descendant_of(). Performance was meausured on a copy of the Linux repository using the 'test-tool reach is_descendant_of' command using this input: A:v4.9 X:v4.10 X:v4.11 X:v4.12 X:v4.13 X:v4.14 X:v4.15 X:v4.16 X:v4.17 X.v3.0 Note that this input is tailored to demonstrate the quadratic nature of the previous method, as it will compute merge-bases for v4.9 versus all of the later versions before checking against v4.1. Before: 0.26 s After: 0.21 s Since we previously used the is_descendant_of method in the ref_newer method, we also measured performance there using 'test-tool reach ref_newer' with this input: A:v4.9 B:v3.19 Before: 0.10 s After: 0.08 s By adding a new commit with parent v3.19, we test the non-reachable case of ref_newer: Before: 0.09 s After: 0.08 s Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17commit-graph: add repo arg to graph readersJonathan Tan
Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17commit-graph: add free_commit_graphJonathan Tan
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17commit-graph: add missing forward declarationJonathan Tan
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27commit-graph: add '--reachable' optionDerrick Stolee
When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27commit-graph: use string-list API for inputDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27commit-graph: add 'verify' subcommandDerrick Stolee
If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22commit-graph: always load commit-graph informationDerrick Stolee
Most code paths load commits using lookup_commit() and then parse_commit(). In some cases, including some branch lookups, the commit is parsed using parse_object_buffer() which side-steps parse_commit() in favor of parse_commit_buffer(). With generation numbers in the commit-graph, we need to ensure that any commit that exists in the commit-graph file has its generation number loaded. Create new load_commit_graph_info() method to fill in the information for a commit that exists only in the commit-graph file. Call it from parse_commit_buffer() after loading the other commit information from the given buffer. Only fill this information when specified by the 'check_graph' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: lazy-load trees for commitsDerrick Stolee
The commit-graph file provides quick access to commit data, including the OID of the root tree for each commit in the graph. When performing a deep commit-graph walk, we may not need to load most of the trees for these commits. Delay loading the tree object for a commit loaded from the graph until requested via get_commit_tree(). Do not lazy-load trees for commits not in the graph, since that requires duplicate parsing and the relative peformance improvement when trees are not needed is small. On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: implement "--append" optionDerrick Stolee
Teach git-commit-graph to add all commits from the existing commit-graph file to the file about to be written. This should be used when adding new commits without performing garbage collection. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: build graph from starting commitsDerrick Stolee
Teach git-commit-graph to read commits from stdin when the --stdin-commits flag is specified. Commits reachable from these commits are added to the graph. This is a much faster way to construct the graph than inspecting all packed objects, but is restricted to known tips. For the Linux repository, 700,000+ commits were added to the graph file starting from 'master' in 7-9 seconds, depending on the number of packfiles in the repo (1, 24, or 120). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: read only from specific pack-indexesDerrick Stolee
Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit: integrate commit graph with commit parsingDerrick Stolee
Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. | Command | Before | After | Rel % | |----------------------------------|--------|--------|-------| | log --oneline --topo-order -1000 | 8.31s | 0.94s | -88% | | branch -vv | 1.02s | 0.14s | -86% | | rev-list --all | 5.89s | 1.07s | -81% | | rev-list --all --objects | 66.15s | 58.45s | -11% | Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: implement git commit-graph readDerrick Stolee
Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02commit-graph: implement write_commit_graph()Derrick Stolee
Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>