ref-filter: avoid parsing non-tags in match_points_at()

When handling --points-at, we have to try to peel each ref to see if
it's a tag that points at a requested oid. We start this process by
calling parse_object() on the oid pointed to by each ref.

The cost of parsing each object adds up, especially in an output that
doesn't otherwise need to open the objects at all. Ideally we'd use
peel_iterated_oid() here, which uses the cached information in the
packed-refs file. But we can't, because our --points-at must match not
only the fully peeled value, but any interim values (so if tag A points
to tag B which points to commit C, we should match --points-at=B, but
peel_iterated_oid() will only tell us about C).

So the best we can do (absent changes to the packed-refs peel traits) is
to avoid parsing non-tags. The obvious way to do that is to call
oid_object_info() to check the type before parsing. But there are a few
gotchas there, like checking if the object has already been parsed.

Instead we can just tell parse_object() that we are OK skipping the hash
check, which lets it turn on several optimizations. Commits can be
loaded via the commit graph (so it's both fast and we have the benefit
of the parsed data if we need it later at the output stage). Blobs are
not loaded at all. Trees are still loaded, but it's rather rare to have
a ref point directly to a tree (and since this is just an optimization,
kicking in 99% of the time is OK).

Even though we're paying for an extra lookup, the cost to avoid parsing
the non-tags is a net benefit. In my git.git repository with 941 tags
and 1440 other refs pointing to commits, this significantly cuts the
runtime:

  Benchmark 1: ./git.old for-each-ref --points-at=HEAD
    Time (mean ± σ):      26.8 ms ±   0.5 ms    [User: 24.5 ms, System: 2.2 ms]
    Range (min … max):    25.9 ms …  29.2 ms    107 runs

  Benchmark 2: ./git.new for-each-ref --points-at=HEAD
    Time (mean ± σ):       9.1 ms ±   0.3 ms    [User: 6.8 ms, System: 2.2 ms]
    Range (min … max):     8.6 ms …  10.2 ms    308 runs

  Summary
    './git.new for-each-ref --points-at=HEAD' ran
      2.96 ± 0.10 times faster than './git.old for-each-ref --points-at=HEAD'

In a repository that is mostly annotated tags, we'd expect less
improvement (we might still skip a few object loads, but that's balanced
by the extra lookups). In my clone of linux.git, which has 782 tags and
3 branches, the run-time is about the same (it's actually ~1% faster on
average after this patch, but that's within the run-to-run noise).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Jeff King <peff@peff.net> 2023-07-02 22:37:47 (GMT)
committer: Junio C Hamano <gitster@pobox.com> 2023-07-17 21:16:05 (GMT)
commit: 870eb53ab20e9ff453e3b89b4927c154c2b7211a (patch)
tree: 8af53dec2974da97cc830a21991466cb61990e6e /ref-filter.c
parent: b9584c5858799d5603851af5f0dbad5e7af29b22 (diff)
1 file changed, 2 insertions, 1 deletion
diff --git a/ref-filter.c b/ref-filter.c
index 2eb41ac..948a253 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2344,7 +2344,8 @@ static const struct object_id *match_points_at(struct oid_array *points_at,
 
 	if (oid_array_lookup(points_at, oid) >= 0)
 		return oid;
-	obj = parse_object(the_repository, oid);
+	obj = parse_object_with_flags(the_repository, oid,
+				      PARSE_OBJECT_SKIP_HASH_CHECK);
 	while (obj && obj->type == OBJ_TAG) {
 		struct tag *tag = (struct tag *)obj;
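
The peeling requirement described in the commit message can be illustrated
with a small, self-contained C model. This is only a sketch under stated
assumptions: the toy_* names, the index-based object table and the acyclic
tag chain are all hypothetical and are not git's code or API. It shows why a
fully peeled value alone cannot answer --points-at=B when tag A points to
tag B, which points to commit C, while a step-by-step walk can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object kinds; they only mirror the idea of object types. */
enum toy_type { TOY_COMMIT, TOY_TAG };

/* A toy object: a tag stores the table index of its target. */
struct toy_object {
	enum toy_type type;
	int target; /* index into the table; unused (-1) for commits */
};

/* Example table: index 0 is commit C, 1 is tag B -> C, 2 is tag A -> B. */
static const struct toy_object toy_objs[] = {
	{ TOY_COMMIT, -1 },
	{ TOY_TAG, 0 },
	{ TOY_TAG, 1 },
};

/*
 * Follow the chain to its end, as a cached "fully peeled" value would.
 * Assumes the chain is acyclic and every target index is valid.
 */
static int toy_peel_fully(const struct toy_object *objs, int start)
{
	int cur = start;

	while (objs[cur].type == TOY_TAG)
		cur = objs[cur].target;
	return cur;
}

/*
 * Walk the chain one link at a time and return the index of the first
 * object on the way (including the start) that equals `wanted`, or -1
 * if the chain ends without a match.
 */
static int toy_match_points_at(const struct toy_object *objs, int nr,
			       int start, int wanted)
{
	int cur = start;

	assert(start >= 0 && start < nr);
	if (cur == wanted)
		return cur;
	while (objs[cur].type == TOY_TAG) {
		cur = objs[cur].target;
		if (cur == wanted)
			return cur;
	}
	return -1;
}
```

With this table, toy_peel_fully(toy_objs, 2) ends at commit C (index 0) and
never reports B, whereas toy_match_points_at(toy_objs, 3, 2, 1) finds tag B
on the way. A ref that starts at a commit exits before the loop, which is the
case the patch makes cheap by not fully parsing non-tags.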