summaryrefslogtreecommitdiff
path: root/object.c
diff options
context:
space:
mode:
authorJeff King <peff@peff.net>2022-11-17 22:41:16 (GMT)
committerTaylor Blau <me@ttaylorr.com>2022-11-18 18:59:31 (GMT)
commit8db2dad7a045e376b9c4f51ddd33da43c962e3a4 (patch)
treeb57a96d09e87d8d1579de99571fd1f4bd8fb3a24 /object.c
parent04fb96219abc0cbe46ba084997dc9066de3ac889 (diff)
downloadgit-8db2dad7a045e376b9c4f51ddd33da43c962e3a4.zip
git-8db2dad7a045e376b9c4f51ddd33da43c962e3a4.tar.gz
git-8db2dad7a045e376b9c4f51ddd33da43c962e3a4.tar.bz2
parse_object(): check on-disk type of suspected blob
In parse_object(), we try to handle blobs by streaming rather than loading them entirely into memory. The most common case here will be that we haven't seen the object yet and check oid_object_info(), which tells us we have a blob. But we trigger this code on one other case: when we have an in-memory object struct with type OBJ_BLOB (and without its "parsed" flag set, since otherwise we'd return early from the function). This indicates that some other part of the code suspected we have a blob (e.g., it was mentioned by a tree or tag) but we haven't yet looked at the on-disk copy. In this case before hitting the streaming path, we check if we have the object on-disk at all. This is mostly pointless extra work, as the streaming path would complain if it couldn't open the object (albeit with the message "hash mismatch", which is a little misleading). But it's also insufficient to catch all problems. The streaming code will only tell us "yes, the on-disk object matches the oid". But it doesn't actually confirm that what we found was indeed a blob, and neither does repo_has_object_file(). One way to improve this would be to teach stream_object_signature() to check the type (either by returning it to us to check, or taking an "expected" type). But there's an even simpler fix here: if we suspect the object is a blob, just call oid_object_info() to confirm that we have it on-disk, and that it really is a blob. This is slightly less efficient than teaching stream_object_signature() to do it (since it has to open the object already). But this case very rarely comes up. In practice, we usually don't have any clue what the type is, in which case we already call oid_object_info(). This "suspected" case happens only when some other code created an object struct but didn't actually parse the blob, which is actually tricky to trigger at all (see the discussion of the test below). I reworked the conditional a bit so that instead of: if ((suspected_blob && oid_object_info() == OBJ_BLOB) (no_clue && oid_object_info() == OBJ_BLOB) we have the simpler: if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB) This is shorter, but also reflects what we really want say, which is "have we ruled out this being a blob; if not, check it on-disk". In either case, if oid_object_info() fails to tell us it's a blob, we'll skip the streaming code path and call repo_read_object_file(), just as before. And if we really do have a mismatch with the existing object struct, we'll eventually call lookup_commit(), etc, via parse_object_buffer(), which will complain that it doesn't match our existing obj->type. So this fixes one of the lingering expect_failure cases from 0616617c7e (t: introduce tests for unexpected object types, 2019-04-09). That test works by peeling a tag that claims to point to a blob (triggering us to create the struct), but really points to something else, which we later discover when we call parse_object() as part of the actual traversal). Prior to this commit, we'd quietly check the sha1 and mark the blob as "parsed". Now we correctly complain about the mismatch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
Diffstat (limited to 'object.c')
-rw-r--r--object.c4
1 files changed, 2 insertions, 2 deletions
diff --git a/object.c b/object.c
index 16eb944..fad1a5a 100644
--- a/object.c
+++ b/object.c
@@ -286,8 +286,8 @@ struct object *parse_object_with_flags(struct repository *r,
return &commit->object;
}
- if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
- (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+ if ((!obj || (obj && obj->type == OBJ_BLOB)) &&
+ oid_object_info(r, oid, NULL) == OBJ_BLOB) {
if (!skip_hash && stream_object_signature(r, repl) < 0) {
error(_("hash mismatch %s"), oid_to_hex(oid));
return NULL;