summaryrefslogtreecommitdiff
path: root/t/t6102-rev-list-unexpected-objects.sh
AgeCommit message (Collapse)Author
2023-11-02tests: teach callers of test_i18ngrep to use test_grepJunio C Hamano
They are equivalents and the former still exists, so as long as the only change this commit makes are to rewrite test_i18ngrep to test_grep, there won't be any new bug, even if there still are callers of test_i18ngrep remaining in the tree, or when merged to other topics that add new uses of test_i18ngrep. This patch was produced more or less with git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' | xargs perl -p -i -e 's/test_i18ngrep /test_grep /' and a good way to sanity check the result yourself is to run the above in a checkout of c4603c1c (test framework: further deprecate test_i18ngrep, 2023-10-31) and compare the resulting working tree contents with the result of applying this patch to the same commit. You'll see that test_i18ngrep in a few t/lib-*.sh files corrected, in addition to the manual reproduction. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-18parse_object(): check on-disk type of suspected blobJeff King
In parse_object(), we try to handle blobs by streaming rather than loading them entirely into memory. The most common case here will be that we haven't seen the object yet and check oid_object_info(), which tells us we have a blob. But we trigger this code on one other case: when we have an in-memory object struct with type OBJ_BLOB (and without its "parsed" flag set, since otherwise we'd return early from the function). This indicates that some other part of the code suspected we have a blob (e.g., it was mentioned by a tree or tag) but we haven't yet looked at the on-disk copy. In this case before hitting the streaming path, we check if we have the object on-disk at all. This is mostly pointless extra work, as the streaming path would complain if it couldn't open the object (albeit with the message "hash mismatch", which is a little misleading). But it's also insufficient to catch all problems. The streaming code will only tell us "yes, the on-disk object matches the oid". But it doesn't actually confirm that what we found was indeed a blob, and neither does repo_has_object_file(). One way to improve this would be to teach stream_object_signature() to check the type (either by returning it to us to check, or taking an "expected" type). But there's an even simpler fix here: if we suspect the object is a blob, just call oid_object_info() to confirm that we have it on-disk, and that it really is a blob. This is slightly less efficient than teaching stream_object_signature() to do it (since it has to open the object already). But this case very rarely comes up. In practice, we usually don't have any clue what the type is, in which case we already call oid_object_info(). This "suspected" case happens only when some other code created an object struct but didn't actually parse the blob, which is actually tricky to trigger at all (see the discussion of the test below). I reworked the conditional a bit so that instead of: if ((suspected_blob && oid_object_info() == OBJ_BLOB) (no_clue && oid_object_info() == OBJ_BLOB) we have the simpler: if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB) This is shorter, but also reflects what we really want say, which is "have we ruled out this being a blob; if not, check it on-disk". In either case, if oid_object_info() fails to tell us it's a blob, we'll skip the streaming code path and call repo_read_object_file(), just as before. And if we really do have a mismatch with the existing object struct, we'll eventually call lookup_commit(), etc, via parse_object_buffer(), which will complain that it doesn't match our existing obj->type. So this fixes one of the lingering expect_failure cases from 0616617c7e (t: introduce tests for unexpected object types, 2019-04-09). That test works by peeling a tag that claims to point to a blob (triggering us to create the struct), but really points to something else, which we later discover when we call parse_object() as part of the actual traversal). Prior to this commit, we'd quietly check the sha1 and mark the blob as "parsed". Now we correctly complain about the mismatch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-07-27leak tests: don't skip some tests under SANITIZE=leakÆvar Arnfjörð Bjarmason
The '!SANITIZE_LEAK' prerequisite added in 956d2e4639b (tests: add a test mode for SANITIZE=leak, run it in CI, 2021-09-23) has been used in various tests to skip individual tests in otherwise leak-free tests. Let's change the cases that have become leak-free since then to run under SANITIZE=leak. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-07rev-list tests: don't hide abort() in "test_expect_failure"Ævar Arnfjörð Bjarmason
Change a couple of uses of "test_expect_failure" to use a "test_expect_success" to positively assert the current behavior, and replace the intent of "test_expect_failure" with a "TODO" comment int the description. As noted in [1] the "test_expect_failure" feature is overly eager to accept any failure as OK, and thus by design hides segfaults, abort() etc. Because of that I didn't notice in dd9cede9136 (leak tests: mark some rev-list tests as passing with SANITIZE=leak, 2021-10-31) that this test leaks memory under SANITIZE=leak. I have some larger local changes to add a better "test_expect_failure", which would work just like "test_expect_success", but would allow us say "test_todo" here (and "success" would emit a "not ok [...] # TODO", not "ok [...]". So even though using "test_expect_success" here comes with its own problems[2], let's use it as a narrow change to fix the problem at hand here and stop conflating the current "success" with actual SANITIZE=leak failures. 1. https://lore.kernel.org/git/87tuhmk19c.fsf@evledraar.gmail.com/ 2. https://lore.kernel.org/git/xmqq4k9kj15p.fsf@gitster.g/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-01leak tests: mark some rev-list tests as passing with SANITIZE=leakÆvar Arnfjörð Bjarmason
Mark some tests that match "*rev-list*" as passing when git is compiled with SANITIZE=leak. They'll now be listed as running under the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks" CI target). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-21parse_commit_buffer(): treat lookup_commit() failure as parse errorJeff King
While parsing the parents of a commit, if we are able to parse an actual oid but lookup_commit() fails on it (because we previously saw it in this process as a different object type), we silently omit the parent and do not report any error to the caller. The caller has no way of knowing this happened, because even an empty parent list is a valid parse result. As a result, it's possible to fool our "rev-list" connectivity check into accepting a corrupted set of objects. There's a test for this case already in t6102, but unfortunately it has a slight error. It creates a broken commit with a parent line pointing to a blob, and then checks that rev-list notices the problem in two cases: 1. the "lone" case: we traverse the broken commit by itself (here we try to actually load the blob from disk and find out that it's not a commit) 2. the "seen" case: we parse the blob earlier in the process, and then when calling lookup_commit() we realize immediately that it's not a commit The "seen" variant for this test mistakenly parsed another commit instead of the blob, meaning that we were actually just testing the "lone" case again. Changing that reveals the breakage (and shows that this fixes it). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10rev-list: detect broken root treesJeff King
When the traversal machinery sees a commit without a root tree, it assumes that the tree was part of a BOUNDARY commit, and quietly ignores the tree. But it could also be caused by a commit whose root tree is broken or missing. Instead, let's die() when we see a NULL root tree. We can differentiate it from the BOUNDARY case by seeing if the commit was actually parsed. This covers that case, plus future-proofs us against any others where we might try to show an unparsed commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10rev-list: let traversal die when --missing is not in useJeff King
Commit 7c0fe330d5 (rev-list: handle missing tree objects properly, 2018-10-05) taught the traversal machinery used by git-rev-list to ignore missing trees, so that rev-list could handle them itself. However, it does so only by checking via oid_object_info_extended() that the object exists at all. This can miss several classes of errors that were previously detected by rev-list: - type mismatches (e.g., we expected a tree but got a blob) - failure to read the object data (e.g., due to bitrot on disk) This is especially important because we use "rev-list --objects" as our connectivity check to admit new objects to the repository, and it will now miss these cases (though the bitrot one is less important here, because we'd typically have just hashed and stored the object). There are a few options to fix this: 1. we could check these properties in rev-list when we do the existence check. This is probably too expensive in practice (perhaps even for a type check, but definitely for checking the whole content again, which implies loading each object into memory twice). 2. teach the traversal machinery to differentiate between a missing object, and one that could not be loaded as expected. This probably wouldn't be too hard to detect type mismatches, but detecting bitrot versus a truly missing object would require deep changes to the object-loading code. 3. have the traversal machinery communicate the failure to the caller, so that it can decide how to proceed without re-evaluting the object itself. Of those, I think (3) is probably the best path forward. However, this patch does none of them. In the name of expediently fixing the regression to a normal "rev-list --objects" that we use for connectivity checks, this simply restores the pre-7c0fe330d5 behavior of having the traversal die as soon as it fails to load a tree (when --missing is set to MA_ERROR, which is the default). Note that we can't get rid of the object-existence check in finish_object(), because this also handles blobs (which are not otherwise checked at all by the traversal code). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10list-objects.c: handle unexpected non-tree entriesTaylor Blau
Apply similar treatment as the previous commit for non-tree entries, too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10list-objects.c: handle unexpected non-blob entriesTaylor Blau
Fix one of the cases described in the previous commit where a tree-entry that is promised to a blob is in fact a non-blob. When 'lookup_blob()' returns NULL, it is because Git has cached the requested object as a non-blob. In this case, prevent a SIGSEGV by 'die()'-ing immediately before attempting to dereference the result. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10t: introduce tests for unexpected object typesTaylor Blau
Call an object's type "unexpected" when the actual type of an object does not match Git's contextual expectation. For example, a tree entry whose mode differs from the object's actual type, or a commit's parent which is not another commit, and so on. This can manifest itself in various unfortunate ways, including Git SIGSEGV-ing under specific conditions. Consider the following example: Git traverses a blob (say, via `git rev-list`), and then tries to read out a tree-entry which lists that object as something other than a blob. In this case, `lookup_blob()` will return NULL, and the subsequent dereference will result in a SIGSEGV. Introduce tests that present objects of "unexpected" type in the above fashion to 'git rev-list'. Mark as failures the combinations that are already broken (i.e., they exhibit the segfault described above). In the cases that are not broken (i.e., they have NULL-ness checks or similar), mark these as expecting success. We might hit an unexpected type in two different ways (imagine we have a tree entry that claims to be a tree but actually points to a blob): - when we call lookup_tree(), we might find that we've already seen the object referenced as a blob, in which case we'd get NULL. We can exercise this with "git rev-list --objects $blob $tree", which guarantees that the blob will have been parsed before we look in the tree. These tests are marked as "seen" in the test script. - we call lookup_tree() successfully, but when we try to read the object, we find out it's something else. We construct our tests such that $blob is not otherwise mentioned in $tree. These tests are marked as "lone" in the script. We should check that we behave sensibly in both cases (especially because it is easy for a malicious actor to provoke one case or the other). Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>