summaryrefslogtreecommitdiff
path: root/builtin/fsck.c
AgeCommit message (Collapse)Author
2024-02-21refs: drop unused params from the reflog iterator callbackPatrick Steinhardt
The ref and reflog iterators share much of the same underlying code to iterate over the corresponding entries. This results in some weird code because the reflog iterator also exposes an object ID as well as a flag to the callback function. Neither of these fields do refer to the reflog though -- they refer to the corresponding ref with the same name. This is quite misleading. In practice at least the object ID cannot really be implemented in any other way as a reflog does not have a specific object ID in the first place. This is further stressed by the fact that none of the callbacks except for our test helper make use of these fields. Split up the infrastucture so that ref and reflog iterators use separate callback signatures. This allows us to drop the nonsensical fields from the reflog iterator. Note that internally, the backends still use the same shared infra to iterate over both types. As the backends should never end up being called directly anyway, this is not much of a problem and thus kept as-is for simplicity's sake. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-20fsck: use enum object_type for fsck_walk callbackJeff King
We switched the function interface for fsck callbacks in a1aad71601 (fsck.h: use "enum object_type" instead of "int", 2021-03-28). However, we accidentally flipped the type back to "int" as part of 0b4e9013f1 (fsck: mark unused parameters in various fsck callbacks, 2023-07-03). The mistake happened because that commit was written before a1aad71601 and rebased forward, and I screwed up while resolving the conflict. Curiously, the compiler does not warn about this mismatch, at least not when using gcc and clang on Linux (nor in any of our CI environments). Based on 28abf260a5 (builtin/fsck.c: don't conflate "int" and "enum" in callback, 2021-06-01), I'd guess that this would cause the AIX xlc compiler to complain. I noticed because clang-18's UBSan now identifies mis-matched function calls at runtime, and does complain of this case when running the test suite. I'm not entirely clear on whether this mismatch is a problem in practice. Compilers are certainly free to make enums smaller than "int" if they don't need the bits, but I suspect that they have to promote back to int for function calls (though I didn't dig in the standard, and I won't be surprised if I'm simply wrong and the real-world impact would depend on the ABI). Regardless, switching it back to enum is obviously the right thing to do here; the switch to "int" was simply a mistake. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-25Merge branch 'jk/unused-parameter'Junio C Hamano
Mark-up unused parameters in the code so that we can eventually enable -Wunused-parameter by default. * jk/unused-parameter: t/helper: mark unused callback void data parameters tag: mark unused parameters in each_tag_name_fn callbacks rev-parse: mark unused parameter in for_each_abbrev callback replace: mark unused parameter in each_mergetag_fn callback replace: mark unused parameter in ref callback merge-tree: mark unused parameter in traverse callback fsck: mark unused parameters in various fsck callbacks revisions: drop unused "opt" parameter in "tweak" callbacks count-objects: mark unused parameter in alternates callback am: mark unused keep_cr parameters http-push: mark unused parameter in xml callback http: mark unused parameters in curl callbacks do_for_each_ref_helper(): mark unused repository parameter test-ref-store: drop unimplemented reflog-expire command
2023-07-18Merge branch 'tb/fsck-no-progress'Junio C Hamano
"git fsck --no-progress" still spewed noise from the commit-graph subsystem, which has been corrected. * tb/fsck-no-progress: commit-graph.c: avoid duplicated progress output during `verify` commit-graph.c: pass progress to `verify_one_commit_graph()` commit-graph.c: iteratively verify commit-graph chains commit-graph.c: extract `verify_one_commit_graph()` fsck: suppress MIDX output with `--no-progress` fsck: suppress commit-graph output with `--no-progress`
2023-07-14fsck: mark unused parameters in various fsck callbacksJeff King
There are a few callback functions which are used with the fsck code, but it's natural that not all callbacks need all parameters. For reporting, even something as obvious as "the oid of the object which had a problem" is not always used, as some callers are only checking a single object in the first place. And for both reporting and walking, things like void data pointers and the fsck_options aren't always necessary. But since each such parameter is used by _some_ callback, we have to keep them in the interface. Mark the unused ones in specific callbacks to avoid triggering -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10fsck: suppress MIDX output with `--no-progress`Taylor Blau
In a similar spirit as the previous commit, address a bug where `git fsck` produces output when calling `git multi-pack-index verify` even when invoked with `--no-progress`. $ git.compile fsck --connectivity-only --no-progress --no-dangling Verifying OID order in multi-pack-index: 100% (605677/605677), done. Sorting objects by packfile: 100% (605678/605678), done. Verifying object offsets: 100% (605678/605678), done. The three lines produced by `git fsck` come from `git multi-pack-index verify`, but should be squelched due to `--no-progress`. The MIDX machinery learned to generate these progress messages as early as 430efb8a74b (midx: add progress indicators in multi-pack-index verify, 2019-03-21), but did not respect `--progress` or `--no-progress` until ad60096d1c8 (midx: honor the MIDX_PROGRESS flag in verify_midx_file, 2019-10-21). But the `git multi-pack-index verify` step was added to fsck in 66ec0390e75 (fsck: verify multi-pack-index, 2018-09-13), pre-dating any of the above patches. Pass `--[no-]progress` as appropriate to ensure that we don't produce output when told not to. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10fsck: suppress commit-graph output with `--no-progress`Taylor Blau
Since e0fd51e1d7 (fsck: verify commit-graph, 2018-06-27), `fsck` runs `git commit-graph verify` to check the integrity of any commit-graph(s). Originally, the `git commit-graph verify` step would always print to stdout/stderr, regardless of whether or not `fsck` was invoked with `--[no-]progress` or not. But in 7371612255 (commit-graph: add --[no-]progress to write and verify, 2019-08-26), the commit-graph machinery learned the `--[no-]progress` option, though `fsck` was not updated to pass this new flag (or not). This led to seeing output from running `git fsck`, even with `--no-progress` on repositories that have a commit-graph: $ git.compile fsck --connectivity-only --no-progress --no-dangling Verifying commits in commit graph: 100% (4356/4356), done. Verifying commits in commit graph: 100% (131912/131912), done. Ensure that `fsck` passes `--[no-]progress` as appropriate when calling `git commit-graph verify`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-08Merge branch 'jk/fsck-indices-in-worktrees'Junio C Hamano
Code clarification. * jk/fsck-indices-in-worktrees: fsck: avoid misleading variable name
2023-06-29Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano
Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-29fsck: avoid misleading variable nameEric Sunshine
When reporting a problem, `git fsck` emits a message such as: missing blob 1234abcd (:file) However, this can be ambiguous when the problem is detected in the index of a worktree other than the one in which `git fsck` was invoked. To address this shortcoming, 592ec63b38 (fsck: mention file path for index errors, 2023-02-24) enhanced the output to mention the path of the index when the problem is detected in some other worktree: missing blob 1234abcd (.git/worktrees/wt/index:file) Unfortunately, the variable in fsck_index() which controls whether the index path should be shown is misleadingly named "is_main_index" which can be misunderstood as referring to the main worktree (i.e. the one housing the .git/ repository) rather than to the current worktree (i.e. the one in which `git fsck` was invoked). Avoid such potential confusion by choosing a name more reflective of its actual purpose. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-22Merge branch 'ds/disable-replace-refs'Junio C Hamano
Introduce a mechanism to disable replace refs globally and per repository. * ds/disable-replace-refs: repository: create read_replace_refs setting replace-objects: create wrapper around setting repository: create disable_replace_refs()
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21sparse-index.h: move declarations for sparse-index.c from cache.hElijah Newren
Note in particular that this reverses the decision made in 118a2e8bde0 ("cache: move ensure_full_index() to cache.h", 2021-04-01). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12repository: create disable_replace_refs()Derrick Stolee
Several builtins depend on being able to disable the replace references so we actually operate on each object individually. These currently do so by directly mutating the 'read_replace_refs' global. A future change will move this global into a different place, so it will be necessary to change all of these lines. However, we can simplify that transition by abstracting the purpose of these global assignments with a method call. We will need to keep this read_replace_refs global forever, as we want to make sure that we never use replace refs throughout the life of the process if this method is called. Future changes may present a repository-scoped version of the variable to represent that repository's core.useReplaceRefs config value, but a zero-valued read_replace_refs will always override such a setting. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02fsck: use local repositoryDerrick Stolee
In 0d30feef3c5 (fsck: create scaffolding for rev-index checks, 2023-04-17) and later 5a6072f631d (fsck: validate .rev file header, 2023-04-17), the check_pack_rev_indexes() method was created with a 'struct repository *r' parameter. However, this parameter was unused and instead 'the_repository' was used in its place. Fix this situation with the obvious replacement. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02fsck: verify checksums of all .bitmap filesDerrick Stolee
If a filesystem-level corruption occurs in a .bitmap file, Git can react poorly. This could take the form of a run-time error due to failing to parse an EWAH bitmap or be more subtle such as returning the wrong set of objects to a fetch or clone. A natural first response to either of these kinds of errors is to run 'git fsck' to see if any files are corrupt. This currently ignores all .bitmap files. Add checks to 'git fsck' for all .bitmap files that are currently associated with a multi-pack-index or pack file. Verify their checksums using the hashfile API. We iterate through all multi-pack-indexes and pack-files to be sure to check all .bitmap files, not just the one that would be read by the process. For example, a multi-pack-index bitmap overrules a pack-bitmap. However, if the multi-pack-index is removed, the pack-bitmap may be selected instead. Be thorough to include every file that could become active in such a way. This includes checking files in alternates. There is potential that we could extend this effort to check the structure of the reachability bitmaps themselves, but it is very expensive to do so. At minimum, it's as expensive as generating the bitmaps in the first place, and that's assuming that we don't use the trivial algorithm of verifying each bitmap individually. The trivial algorithm will result in quadratic behavior (number of objects times number of bitmapped commits) while the bitmap building operation constructs a lattice of commits to build bitmaps incrementally and then generate the final bitmaps from a subset of those commits. If we were to extend 'git fsck' to check .bitmap file contents more closely like this, then we would likely want to hide it behind an option that signals the user is more willing to do expensive operations such as this. For testing, set up a repository with a pack-bitmap _and_ a multi-pack-index bitmap. This requires some file movement to avoid deleting the pack-bitmap during the repack that creates the multi-pack-index bitmap. We can then verify that 'git fsck' is checking all files, not just the "active" bitmap. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27Merge branch 'ds/fsck-pack-revindex'Junio C Hamano
"git fsck" learned to validate the on-disk pack reverse index files. * ds/fsck-pack-revindex: fsck: validate .rev file header fsck: check rev-index position values fsck: check rev-index checksums fsck: create scaffolding for rev-index checks
2023-04-17fsck: validate .rev file headerDerrick Stolee
While parsing a .rev file, we check the header information to be sure it makes sense. This happens before doing any additional validation such as a checksum or value check. In order to differentiate between a bad header and a non-existent file, we need to update the API for loading a reverse index. Make load_pack_revindex_from_disk() non-static and specify that a positive value means "the file does not exist" while other errors during parsing are negative values. Since an invalid header prevents setting up the structures we would use for further validations, we can stop at that point. The place where we can distinguish between a missing file and a corrupt file is inside load_revindex_from_disk(), which is used both by pack rev-indexes and multi-pack-index rev-indexes. Some tests in t5326 demonstrate that it is critical to take some conditions to allow positive error signals. Add tests that check the three header values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17fsck: create scaffolding for rev-index checksDerrick Stolee
The 'fsck' builtin checks many of Git's on-disk data structures, but does not currently validate the pack rev-index files (a .rev file to pair with a .pack and .idx file). Before doing a more-involved check process, create the scaffolding within builtin/fsck.c to have a new error type and add that error type when the API method verify_pack_revindex() returns an error. That method does nothing currently, but we will add checks to it in later changes. For now, check that 'git fsck' succeeds without any errors in the normal case. Future checks will be paired with tests that corrupt the .rev file appropriately. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "cache.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-03-17Merge branch 'en/header-cleanup'Junio C Hamano
Code clean-up to clarify the rule that "git-compat-util.h" must be the first to be included. * en/header-cleanup: diff.h: remove unnecessary include of object.h Remove unnecessary includes of builtin.h treewide: replace cache.h with more direct headers, where possible replace-object.h: move read_replace_refs declaration from cache.h to here object-store.h: move struct object_info from cache.h dir.h: refactor to no longer need to include cache.h object.h: stop depending on cache.h; make cache.h depend on object.h ident.h: move ident-related declarations out of cache.h pretty.h: move has_non_ascii() declaration from commit.h cache.h: remove dependence on hex.h; make other files include it explicitly hex.h: move some hex-related declarations from cache.h hash.h: move some oid-related declarations from cache.h alloc.h: move ALLOC_GROW() functions from cache.h treewide: remove unnecessary cache.h includes in source files treewide: remove unnecessary cache.h includes treewide: remove unnecessary git-compat-util.h includes in headers treewide: ensure one of the appropriate headers is sourced first
2023-02-27fsck: check even zero-entry index filesJeff King
In fb64ca526a (fsck: check index files in all worktrees, 2023-02-24), we swapped out a call to vanilla repo_read_index() for a series of read_index_from() calls, one per worktree. The code for the latter was copied from add_index_objects_to_pending(), which checks for a positive return value from the index reading function, and we do the same here in fsck now. But this is probably the wrong thing. I had interpreted the check as "don't operate on the index struct if there was an error". But in reality, if there is an error then the index-reading code will simply die (which admittedly is not great for fsck, but that is not a new problem). The return value here is actually the number of entries read. So it makes sense for add_index_objects_to_pending() to ignore a zero-entry index (there is nothing to add). But for fsck, we would still want to check any extensions, etc (though presumably it is unlikely to have them in an empty index, I don't think it's impossible). So we should ignore the return value from read_index_from() entirely. This matches the behavior before fb64ca526a, when we ignored the return value from repo_read_index(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24fsck: mention file path for index errorsJeff King
If we encounter an error in an index file, we may say something like: error: 1234abcd: invalid sha1 pointer in resolve-undo But if you have multiple worktrees, each with its own index, it can be very helpful to know which file had the problem. So let's pass that path down through the various index-fsck functions and use it where appropriate. After this patch you should get something like: error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index That's a bit verbose, but since the point is that you shouldn't see this normally, we're better to err on the side of more details. I've also added the index filename to the name used by "fsck --name-objects", which will show up if we find the object to be missing, etc. This is bending the rules a little there, as the option claims to write names that can be fed to rev-parse. But there is no revision syntax to access the index of another worktree, so the best we can do is make up something that a human will probably understand. I did take care to retain the existing ":file" syntax for the current worktree. So the uglier output should kick in only when it's actually necessary. See the included tests for examples of both forms. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24fsck: check index files in all worktreesJeff King
We check the index file for the main worktree, but completely ignore the index files in other worktrees. These should be checked, too, as they are part of the repository state (and in particular, errors in those index files may cause repo-wide operations like "git gc" to complain). Reported-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24fsck: factor out index fsckJeff King
The code to fsck an index operates directly on the_index. Let's move it into its own function in preparation for handling the index files from other worktrees. Since we now have only a single reference to the_index, let's drop our USE_THE_INDEX_VARIABLE definition and just use the_repository.index directly. That's a minor cleanup, but also ensures that we didn't miss any references when moving the code into fsck_index(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24for_each_object: mark unused callback parametersJeff King
The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24replace-object.h: move read_replace_refs declaration from cache.h to hereElijah Newren
Adjust several files to be more explicit about their dependency on replace-objects to accommodate this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21cocci: apply "pending" index-compatibility to some "builtin/*.c"Ævar Arnfjörð Bjarmason
Apply "index-compatibility.pending.cocci" rule to "builtin/*", but exclude those where we conflict with in-flight changes. As a result some of them end up using only "the_index", so let's have them use the more narrow "USE_THE_INDEX_VARIABLE" rather than "USE_THE_INDEX_COMPATIBILITY_MACROS". Manual changes not made by coccinelle, that were squashed in: * Whitespace-wrap argument lists for repo_hold_locked_index(), repo_read_index_preload() and repo_refresh_and_write_index(), in cases where the line became too long after the transformation. * Change "refresh_cache()" to "refresh_index()" in a comment in "builtin/update-index.c". * For those whose call was followed by perror("<macro-name>"), change it to perror("<function-name>"), referring to the new function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21cocci & cache.h: apply variable section of "pending" index-compatibilityÆvar Arnfjörð Bjarmason
Mostly apply the part of "index-compatibility.pending.cocci" that renames the global variables like "active_nr", which are a shorthand to referencing (in that case) a struct member as "the_index.cache_nr". In doing so move more of "index-compatibility.pending.cocci" to "index-compatibility.cocci". In the case of "active_nr" we'd have a textual conflict with "ab/various-leak-fixes" in "next"[1]. Let's exclude that specific case while moving the rule over from "pending". 1. 407b94280f8 (commit: discard partial cache before (re-)reading it, 2022-11-08) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-28Merge branch 'ab/doc-synopsis-and-cmd-usage'Junio C Hamano
The short-help text shown by "git cmd -h" and the synopsis text shown at the beginning of "git help cmd" have been made more consistent. * ab/doc-synopsis-and-cmd-usage: (34 commits) tests: assert consistent whitespace in -h output tests: start asserting that *.txt SYNOPSIS matches -h output doc txt & -h consistency: make "worktree" consistent worktree: define subcommand -h in terms of command -h reflog doc: list real subcommands up-front doc txt & -h consistency: make "commit" consistent doc txt & -h consistency: make "diff-tree" consistent doc txt & -h consistency: use "[<label>...]" for "zero or more" doc txt & -h consistency: make "annotate" consistent doc txt & -h consistency: make "stash" consistent doc txt & -h consistency: add missing options doc txt & -h consistency: use "git foo" form, not "git-foo" doc txt & -h consistency: make "bundle" consistent doc txt & -h consistency: make "read-tree" consistent doc txt & -h consistency: make "rerere" consistent doc txt & -h consistency: add missing options and labels doc txt & -h consistency: make output order consistent doc txt & -h consistency: add or fix optional "--" syntax doc txt & -h consistency: fix mismatching labels doc SYNOPSIS & -h: use "-" to separate words in labels, not "_" ...
2022-10-13doc txt & -h consistency: add missing optionsÆvar Arnfjörð Bjarmason
Change those built-in commands that were attempting to exhaustively list the options in the "-h" output to actually do so, and always have *.txt documentation know about the exhaustive list of options. Let's also fix the documentation and -h output for those built-in commands where the *.txt and -h output was a mismatch of missing options on both sides. In the case of "interpret-trailers" fixing the missing options reveals that the *.txt version was implicitly claiming that the command had two operating modes, which a look at the -h version (and studying the documentation) will show is not the case. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-10Merge branch 'jk/fsck-on-diet'Junio C Hamano
"git fsck" failed to release contents of tree objects already used from the memory, which has been fixed. * jk/fsck-on-diet: parse_object_buffer(): respect save_commit_buffer fsck: turn off save_commit_buffer fsck: free tree buffers after walking unreachable objects
2022-09-22parse_object_buffer(): respect save_commit_bufferJeff King
If the global variable "save_commit_buffer" is set to 0, then parse_commit() will throw away the commit object data after parsing it, rather than sticking it into a commit slab. This goes all the way back to 60ab26de99 ([PATCH] Avoid wasting memory in git-rev-list, 2005-09-15). But there's another code path which may similarly stash the buffer: parse_object_buffer(). This is where we end up if we parse a commit via parse_object(), and it's used directly in a few other code paths like git-fsck. The original goal of 60ab26de99 was avoiding extra memory usage for rev-list. And there it's not all that important to catch parse_object(). We use that function only for looking at the tips of the traversal, and the majority of the commits are parsed by following parent links, where we use parse_commit() directly. So we were wasting some memory, but only a small portion. It's much easier to see the effect with fsck. Since we now turn off save_commit_buffer by default there, we _should_ be able to drop the freeing of the commit buffer in fsck_obj(). But if we do so (taking the first hunk of this patch without the rest), then the peak heap of "git fsck" in a clone of git.git goes from 136MB to 194MB. Teaching parse_object_buffer() to respect save_commit_buffer brings that down to 134.5MB (it's hard to tell from massif's output, but I suspect the savings comes from avoiding the overhead of the mostly-empty commit slab). Other programs should see a small improvement. Both "rev-list --all" and "fsck --connectivity-only" improve by a few hundred kilobytes, as they'd avoid loading the tip objects of their traversals. Most importantly, no code should be hurt by doing this. Any program that turns off save_commit_buffer is already making the assumption that any commit it sees may need to have its object data loaded on demand, as it doesn't know which ones were parsed by parse_commit() versus parse_object(). Not to mention that anything parsed by the commit graph may be in the same boat, even if save_commit_buffer was not disabled. This should be the only spot that needs to be fixed. Grepping for set_commit_buffer() shows that this and parse_commit() are the only relevant calls. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22fsck: turn off save_commit_bufferJeff King
When parsing a commit, the default behavior is to stuff the original buffer into a commit_slab (which takes ownership of it). But for a tool like fsck, this isn't useful. While we may look at the buffer further as part of fsck_commit(), we'll always do so through a separate pointer; attaching the buffer to the slab doesn't help. Worse, it means we have to remember to free the commit buffer in all call paths. We do so in fsck_obj(), which covers a regular "git fsck". But with "--connectivity-only", we forget to do so in both traverse_one_object(), which covers reachable objects, and mark_unreachable_referents(), which covers unreachable ones. As a result, that mode ends up storing an uncompressed copy of every commit on the heap at once. We could teach the code paths for --connectivity-only to also free commit buffers. But there's an even easier fix: we can just turn off the save_commit_buffer flag, and then we won't attach them to the commits in the first place. This reduces the peak heap of running "git fsck --connectivity-only" in a clone of linux.git from ~2GB to ~1GB. According to massif, the remaining memory goes where you'd expect: the object structs themselves, the obj_hash containing them, and the delta base cache. Note that we'll leave the call to free commit buffers in fsck_obj() for now; it's not quite redundant because of a related bug that we'll fix in a subsequent commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22fsck: free tree buffers after walking unreachable objectsJeff King
After calling fsck_walk(), a tree object struct may be left in the parsed state, with the full tree contents available via tree->buffer. It's the responsibility of the caller to free these when it's done with the object to avoid having many trees allocated at once. In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which does call free_tree_buffer(). Likewise for "--connectivity-only", we see most objects via traverse_one_object(), which makes a similar call. The exception is in mark_unreachable_referents(). When using both "--connectivity-only" and "--dangling" (the latter of which is the default), we walk all of the unreachable objects, and there we forget to free. Most cases would not notice this, because they don't have a lot of unreachable objects, but you can make a pathological case like this: git clone --bare /path/to/linux.git repo.git cd repo.git rm packed-refs ;# now everything is unreachable! git fsck --connectivity-only That ends up with peak heap usage ~18GB, which is (not coincidentally) close to the size of all uncompressed trees in the repository. After this patch, the peak heap is only ~2GB. A few things to note: - it might seem like fsck_walk(), if it is parsing the trees, should be responsible for freeing them. But the situation is quite tricky. In the non-connectivity mode, after we call fsck_walk() we then proceed with fsck_object() which actually does the type-specific sanity checks on the object contents. We do pass our own separate buffer to fsck_object(), but there's a catch: our earlier call to parse_object_buffer() may have attached that buffer to the object struct! So by freeing it, we leave the rest of the code with a dangling pointer. Likewise, the call to fsck_walk() in index-pack is subtle. It attaches a buffer to the tree object that must not be freed! And so rather than calling free_tree_buffer(), it actually detaches it by setting tree->buffer to NULL. These cases would _probably_ be fixable by having fsck_walk() free the tree buffer only when it was the one who allocated it via parse_tree(). But that would still leave the callers responsible for freeing other cases, so they wouldn't be simplified. While the current semantics for fsck_walk() make it easy to accidentally leak in new callers, at least they are simple to explain, and it's not a function that's likely to get a lot of new call-sites. And in any case, it's probably sensible to fix the leak first with this simple patch, and try any more complicated refactoring separately. - a careful reader may notice that fsck_obj() also frees commit buffers, but neither the call in traverse_one_object() nor the one touched in this patch does so. And indeed, this is another problem for --connectivity-only (and accounts for most of the 2GB heap after this patch), but it's one we'll fix in a separate commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused reflog callback parametersJeff King
Functions used with for_each_reflog_ent() need to conform to a particular interface, but not every function needs all of the parameters. Mark the unused ones to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>