path: root/builtin/pack-objects.c
diff options
authorJeff King <>2016-09-05 21:52:26 (GMT)
committerJunio C Hamano <>2016-09-07 18:45:31 (GMT)
commitb773ddea2cd3b08cb5654884f26c69b46e2d0c50 (patch)
treeb160ecb85a806cca4b71aaa5c979f53f5d49a92b /builtin/pack-objects.c
parentab5178356cc47ea9f1725993d45941d7c41fa471 (diff)
pack-objects: walk tag chains for --include-tag
When pack-objects is given --include-tag, it peels each tag ref down to a non-tag object, and if that non-tag object is going to be packed, we include the tag, too. But what happens if we have a chain of tags (e.g., tag "A" points to tag "B", which points to commit "C")? We'll peel down to "C" and realize that we want to include tag "A", but we do not ever consider tag "B", leading to a broken pack (assuming "B" was not otherwise selected). Instead, we have to walk the whole chain, adding any tags we find to the pack. Interestingly, it doesn't seem possible to trigger this problem with "git fetch", but you can with "git clone --single-branch". The reason is that we generate the correct pack when the client explicitly asks for "A" (because we do a real reachability analysis there), and "fetch" is more willing to do so. There are basically two cases: 1. If "C" is already a ref tip, then the client can deduce that it needs "A" itself (via find_non_local_tags), and will ask for it explicitly rather than relying on the include-tag capability. Everything works. 2. If "C" is not already a ref tip, then we hope for include-tag to send us the correct tag. But it doesn't; it generates a broken pack. However, the next step is to do a follow-up run of find_non_local_tags(), followed by fetch_refs() to backfill any tags we learned about. In the normal case, fetch_refs() calls quickfetch(), which does a connectivity check and sees we have no new objects to fetch. We just write the refs. But for the broken-pack case, the connectivity check fails, and quickfetch will follow-up with the remote, asking explicitly for each of the ref tips. This picks up the missing object in a new pack. For a regular "git clone", we are similarly OK, because we explicitly request all of the tag refs, and get a correct pack. But with "--single-branch", we kick in tag auto-following via "include-tag", but do _not_ do a follow-up backfill. We just take whatever the server sent us via include-tag and write out tag refs for any tag objects we were sent. So prior to c6807a4 (clone: open a shortcut for connectivity check, 2013-05-26), we actually claimed the clone was a success, but the result was silently corrupted! Since c6807a4, index-pack's connectivity check catches this case, and we correctly complain. The included test directly checks that pack-objects does not generate a broken pack, but also confirms that "clone --single-branch" does not hit the bug. Note that tag chains introduce another interesting question: if we are packing the tag "B" but not the commit "C", should "A" be included? Both before and after this patch, we do not include "A", because the initial peel_ref() check only knows about the bottom-most level, "C". To realize that "B" is involved at all, we would have to switch to an incremental peel, in which we examine each tagged object, asking if it is being packed (and including the outer tag if so). But that runs contrary to the optimizations in peel_ref(), which avoid accessing the objects at all, in favor of using the value we pull from packed-refs. It's OK to walk the whole chain once we know we're going to include the tag (we have to access it anyway, so the effort is proportional to the pack we're generating). But for the initial selection, we have to look at every ref. If we're only packing a few objects, we'd still have to parse every single referenced tag object just to confirm that it isn't part of a tag chain. This could be addressed if packed-refs stored the complete tag chain for each peeled ref (in most cases, this would be the same cost as now, as each "chain" is only a single link). But given the size of that project, it's out of scope for this fix (and probably nobody cares enough anyway, as it's such an obscure situation). This commit limits itself to just avoiding the creation of a broken pack. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
Diffstat (limited to 'builtin/pack-objects.c')
1 files changed, 30 insertions, 1 deletions
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f854ca4..ea5367e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2104,6 +2104,35 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
#define ll_find_deltas(l, s, w, d, p) find_deltas(l, &s, w, d, p)
+static void add_tag_chain(const struct object_id *oid)
+ struct tag *tag;
+ /*
+ * We catch duplicates already in add_object_entry(), but we'd
+ * prefer to do this extra check to avoid having to parse the
+ * tag at all if we already know that it's being packed (e.g., if
+ * it was included via bitmaps, we would not have parsed it
+ * previously).
+ */
+ if (packlist_find(&to_pack, oid->hash, NULL))
+ return;
+ tag = lookup_tag(oid->hash);
+ while (1) {
+ if (!tag || parse_tag(tag) || !tag->tagged)
+ die("unable to pack objects reachable from tag %s",
+ oid_to_hex(oid));
+ add_object_entry(tag->object.oid.hash, OBJ_TAG, NULL, 0);
+ if (tag->tagged->type != OBJ_TAG)
+ return;
+ tag = (struct tag *)tag->tagged;
+ }
static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data)
struct object_id peeled;
@@ -2111,7 +2140,7 @@ static int add_ref_tag(const char *path, const struct object_id *oid, int flag,
if (starts_with(path, "refs/tags/") && /* is a tag? */
!peel_ref(path, peeled.hash) && /* peelable? */
packlist_find(&to_pack, peeled.hash, NULL)) /* object packed? */
- add_object_entry(oid->hash, OBJ_TAG, NULL, 0);
+ add_tag_chain(oid);
return 0;