summaryrefslogtreecommitdiff
path: root/t/t1060-object-corruption.sh
AgeCommit message (Collapse)Author
2019-03-20Merge branch 'jk/virtual-objects-do-exist'Junio C Hamano
A recent update broke "is this object available to us?" check for well-known objects like an empty tree (which should yield "yes", even when there is no on-disk object for an empty tree), which has been corrected. * jk/virtual-objects-do-exist: rev-list: allow cached objects in existence check
2019-03-05rev-list: allow cached objects in existence checkJeff King
This fixes a regression in 7c0fe330d5 (rev-list: handle missing tree objects properly, 2018-10-05) where rev-list will now complain about the empty tree when it doesn't physically exist on disk. Before that commit, we relied on the traversal code in list-objects.c to walk through the trees. Since it uses parse_tree(), we'd do a normal object lookup that includes looking in the set of "cached" objects (which is where our magic internal empty-tree kicks in). After that commit, we instead tell list-objects.c not to die on any missing trees, and we check them ourselves using has_object_file(). But that function uses OBJECT_INFO_SKIP_CACHED, which means we won't use our internal empty tree. This normally wouldn't come up. For most operations, Git will try to write out the empty tree object as it would any other object. And pack-objects in a push or fetch will send the empty tree (even if it's virtual on the sending side). However, there are cases where this can matter. One I found in the wild: 1. The root tree of a commit became empty by deleting all files, without using an index. In this case it was done using libgit2's tree builder API, but as the included test shows, it can easily be done with regular git using hash-object. The resulting repo works OK, as we'd avoid walking over our own reachable commits for a connectivity check. 2. Cloning with --reference pointing to the repository from (1) can trigger the problem, because we tell the other side we already have that commit (and hence the empty tree), but then walk over it during the connectivity check (where we complain about it missing). Arguably the workflow in step (1) should be more careful about writing the empty tree object if we're referencing it. But this workflow did work prior to 7c0fe330d5, so let's restore it. This patch makes the minimal fix, which is to swap out a direct call to oid_object_info_extended(), minus the SKIP_CACHED flag, instead of calling has_object_file(). This is all that has_object_file() is doing under the hood. And there's little danger of unrelated fallout from other unexpected "cached" objects, since there's only one call site that ends such a cached object, and it's in git-blame. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31index-pack tests: don't leave test repo dirty at endÆvar Arnfjörð Bjarmason
Change a test added in 51054177b3 ("index-pack: detect local corruption in collision check", 2017-04-01) so that the repository isn't left dirty at the end. Due to the caveats explained in 720dae5a19 ("config doc: elaborate on fetch.fsckObjects security", 2018-07-27) even a "fetch" that fails will write to the local object store, so let's copy the bit-error test directory before running this test. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01index-pack: detect local corruption in collision checkJeff King
When we notice that we have a local copy of an incoming object, we compare the two objects to make sure we haven't found a collision. Before we get to the actual object bytes, though, we compare the type and size from sha1_object_info(). If our local object is corrupted, then the type will be OBJ_BAD, which obviously will not match the incoming type, and we'll report "SHA1 COLLISION FOUND" (with capital letters and everything). This is confusing, as the problem is not a collision but rather local corruption. We should report that instead (just like we do if reading the rest of the object content fails a few lines later). Note that we _could_ just ignore the error and mark it as a non-collision. That would let you "git fetch" to replace a corrupted object. But it's not a very reliable method for repairing a repository. The earlier want/have negotiation tries to get the other side to omit objects we already have, and it would not realize that we are "missing" this corrupted object. So we're better off complaining loudly when we see corruption, and letting the user take more drastic measures to repair (like making a full clone elsewhere and copying the pack into place). Note that the test sets transfer.unpackLimit in the receiving repository so that we use index-pack (which is what does the collision check). Normally for such a small push we'd use unpack-objects, which would simply try to write the loose object, and discard the new one when we see that there's already an old one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01sha1_loose_object_info: return error for corrupted objectsJeff King
When sha1_loose_object_info() finds that a loose object file cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an error to the caller. However, if it found that the loose object file is corrupt and the object data cannot be used from it, it stuffs OBJ_BAD into "type" field of the object_info, but returns zero (i.e., success), which can confuse callers. This is due to 052fe5eac (sha1_loose_object_info: make type lookup optional, 2013-07-12), which switched the return to a strict success/error, rather than returning the type (but botched the return). Callers of regular sha1_object_info() don't notice the difference, as that function returns the type (which is OBJ_BAD in this case). However, direct callers of sha1_object_info_extended() see the function return success, but without setting any meaningful values in the object_info struct, leading them to access potentially uninitialized memory. The easiest way to see the bug is via "cat-file -s", which will happily ignore the corruption and report whatever value happened to be in the "size" variable. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-29clone: leave repo in place after checkout errorsJeff King
If we manage to clone a remote repository but run into an error in the checkout, it is probably sane to leave the repo directory in place. That lets the user examine the situation without spending time to re-clone from the remote (which may be a lengthy process). Rather than try to convert each die() from the checkout code path into an error(), we simply set a flag that tells the "remove_junk" atexit function to print a helpful message and leave the repo in place. Note that the test added in this patch actually passes without the code change. The reason is that the cleanup code is buggy; we chdir into the working tree for the checkout, but still may use relative paths to remove the directories (which means if you cloned into "foo", we would accidentally remove "foo" from the working tree!). There's no point in fixing it now, since this patch means we will never try to remove anything after the chdir, anyway. [jc: replaced the message with a more succinct version from Jonathan] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27clone: run check_everything_connectedJeff King
When we fetch from a remote, we do a revision walk to make sure that what we received is connected to our existing history. We do not do the same check for clone, which should be able to check that we received an intact history graph. The upside of this patch is that it will make clone more resilient against propagating repository corruption. The downside is that we will now traverse "rev-list --objects --all" down to the roots, which may take some time (it is especially noticeable for a "--local --bare" clone). Note that we need to adjust t5710, which tries to make such a bogus clone. Rather than checking after the fact that our clone is bogus, we can simplify it to just make sure "git clone" reports failure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27clone: die on errors from unpack_treesJeff King
When clone is populating the working tree, it ignores the return status from unpack_trees; this means we may report a successful clone, even when the checkout fails. When checkout fails, we may want to leave the $GIT_DIR in place, as it might be possible to recover the data through further use of "git checkout" (e.g., if the checkout failed due to a transient error, disk full, etc). However, we already die on a number of other checkout-related errors, so this patch follows that pattern. In addition to marking a now-passing test, we need to adjust t5710, which blindly assumed it could make bogus clones of very deep alternates hierarchies. By using "--bare", we can avoid it actually touching any objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27add tests for cloning corrupted repositoriesJeff King
We try not to let corruption pass unnoticed over fetches and clones. For the most part, this works, but there are some broken corner cases, including: 1. We do not detect missing objects over git-aware transports. This is a little hard to test, because the sending side will actually complain about the missing object. To fool it, we corrupt a repository such that we have a "misnamed" object: it claims to be sha1 X, but is really Y. This lets the sender blindly transmit it, but it is the receiver's responsibility to verify that what it got is sane (and it does not). 2. We do not detect missing or misnamed blobs during the checkout phase of clone. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27streaming_write_entry: propagate streaming errorsJeff King
When we are streaming an index blob to disk, we store the error from stream_blob_to_fd in the "result" variable, and then immediately overwrite that with the return value of "close". That means we catch errors on close (e.g., problems committing the file to disk), but miss anything which happened before then. We can fix this by using bitwise-OR to accumulate errors in our result variable. While we're here, we can also simplify the error handling with an early return, which makes it easier to see under which circumstances we need to clean up. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27add test for streaming corrupt blobsJeff King
We do not have many tests for handling corrupt objects. This new test at least checks that we detect a byte error in a corrupt blob object while streaming it out with cat-file. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>