path: root/pack-objects.c
diff options
authorJeff King <>2018-08-21 19:07:05 (GMT)
committerJunio C Hamano <>2018-08-21 19:45:49 (GMT)
commit6a1e32d532c5948071e322cefc7052be6228adc3 (patch)
tree17612452b57edd80c5822deed9e724252a2d6efb /pack-objects.c
parent30cdc33fba361528c317042535fad9d5f0a2c6e6 (diff)
pack-objects: reuse on-disk deltas for thin "have" objects
When we serve a fetch, we pass the "wants" and "haves" from the fetch negotiation to pack-objects. That tells us not only which objects we need to send, but we also use the boundary commits as "preferred bases": their trees and blobs are candidates for delta bases, both for reusing on-disk deltas and for finding new ones. However, this misses some opportunities. Modulo some special cases like shallow or partial clones, we know that every object reachable from the "haves" could be a preferred base. We don't use all of them for two reasons: 1. It's expensive to traverse the whole history and enumerate all of the objects the other side has. 2. The delta search is expensive, so we want to keep the number of candidate bases sane. The boundary commits are the most likely to work. When we have reachability bitmaps, though, reason 1 no longer applies. We can efficiently compute the set of reachable objects on the other side (and in fact already did so as part of the bitmap set-difference to get the list of interesting objects). And using this set conveniently covers the shallow and partial cases, since we have to disable the use of bitmaps for those anyway. The second reason argues against using these bases in the search for new deltas. But there's one case where we can use this information for free: when we have an existing on-disk delta that we're considering reusing, we can do so if we know the other side has the base object. This in fact saves time during the delta search, because it's one less delta we have to compute. And that's exactly what this patch does: when we're considering whether to reuse an on-disk delta, if bitmaps tell us the other side has the object (and we're making a thin-pack), then we reuse it. Here are the results on p5311 using linux.git, which simulates a client fetching after `N` days since their last fetch: Test origin HEAD -------------------------------------------------------------------------- 5311.3: server (1 days) 0.27(0.27+0.04) 0.12(0.09+0.03) -55.6% 5311.4: size (1 days) 0.9M 237.0K -73.7% 5311.5: client (1 days) 0.04(0.05+0.00) 0.10(0.10+0.00) +150.0% 5311.7: server (2 days) 0.34(0.42+0.04) 0.13(0.10+0.03) -61.8% 5311.8: size (2 days) 1.5M 347.7K -76.5% 5311.9: client (2 days) 0.07(0.08+0.00) 0.16(0.15+0.01) +128.6% 5311.11: server (4 days) 0.56(0.77+0.08) 0.13(0.10+0.02) -76.8% 5311.12: size (4 days) 2.8M 566.6K -79.8% 5311.13: client (4 days) 0.13(0.15+0.00) 0.34(0.31+0.02) +161.5% 5311.15: server (8 days) 0.97(1.39+0.11) 0.30(0.25+0.05) -69.1% 5311.16: size (8 days) 4.3M 1.0M -76.0% 5311.17: client (8 days) 0.20(0.22+0.01) 0.53(0.52+0.01) +165.0% 5311.19: server (16 days) 1.52(2.51+0.12) 0.30(0.26+0.03) -80.3% 5311.20: size (16 days) 8.0M 2.0M -74.5% 5311.21: client (16 days) 0.40(0.47+0.03) 1.01(0.98+0.04) +152.5% 5311.23: server (32 days) 2.40(4.44+0.20) 0.31(0.26+0.04) -87.1% 5311.24: size (32 days) 14.1M 4.1M -70.9% 5311.25: client (32 days) 0.70(0.90+0.03) 1.81(1.75+0.06) +158.6% 5311.27: server (64 days) 11.76(26.57+0.29) 0.55(0.50+0.08) -95.3% 5311.28: size (64 days) 89.4M 47.4M -47.0% 5311.29: client (64 days) 5.71(9.31+0.27) 15.20(15.20+0.32) +166.2% 5311.31: server (128 days) 16.15(36.87+0.40) 0.91(0.82+0.14) -94.4% 5311.32: size (128 days) 134.8M 100.4M -25.5% 5311.33: client (128 days) 9.42(16.86+0.49) 25.34(25.80+0.46) +169.0% In all cases we save CPU time on the server (sometimes significant) and the resulting pack is smaller. We do spend more CPU time on the client side, because it has to reconstruct more deltas. But that's the right tradeoff to make, since clients tend to outnumber servers. It just means the thin pack mechanism is doing its job. From the user's perspective, the end-to-end time of the operation will generally be faster. E.g., in the 128-day case, we saved 15s on the server at a cost of 16s on the client. Since the resulting pack is 34MB smaller, this is a net win if the network speed is less than 270Mbit/s. And that's actually the worst case. The 64-day case saves just over 11s at a cost of just under 11s. So it's a slight win at any network speed, and the 40MB saved is pure bonus. That trend continues for the smaller fetches. The implementation itself is mostly straightforward, with the new logic going into check_object(). But there are two tricky bits. The first is that check_object() needs access to the relevant information (the thin flag and bitmap result). We can do this by pushing these into program-lifetime globals. The second is that the rest of the code assumes that any reused delta will point to another "struct object_entry" as its base. But of course the case we are interested in here is the one where don't have such an entry! I looked at a number of options that didn't quite work: - we could use a flag to signal a reused delta, but it's not a single bit. We have to actually store the oid of the base, which is normally done by pointing to the existing object_entry. And we'd have to modify all the code which looks at deltas. - we could add the reused bases to the end of the existing object_entry array. While this does create some extra work as later stages consider the extra entries, it's actually not too bad (we're not sending them, so they don't cost much in the delta search, and at most we'd have 2*N of them). But there's a more subtle problem. Adding to the existing array means we might need to grow it with realloc, which could move the earlier entries around. While many of the references to other entries are done by integer index, some (including ones on the stack) use pointers, which would become invalidated. This isn't insurmountable, but it would require quite a bit of refactoring (and it's hard to know that you've got it all, since it may work _most_ of the time and then fail subtly based on memory allocation patterns). - we could allocate a new one-off entry for the base. In fact, this is what an earlier version of this patch did. However, since the refactoring brought in by ad635e82d6 (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23), the delta_idx code requires that both entries be in the main packing list. So taking all of those options into account, what I ended up with is a separate list of "external bases" that are not part of the main packing list. Each delta entry that points to an external base has a single-bit flag to do so; we have a little breathing room in the bitfield section of object_entry. This lets us limit the change primarily to the oe_delta() and oe_set_delta_ext() functions. And as a bonus, most of the rest of the code does not consider these dummy entries at all, saving both runtime CPU and code complexity. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
Diffstat (limited to 'pack-objects.c')
1 files changed, 19 insertions, 0 deletions
diff --git a/pack-objects.c b/pack-objects.c
index 9270852..9ae0cec 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -177,3 +177,22 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
return new_entry;
+void oe_set_delta_ext(struct packing_data *pdata,
+ struct object_entry *delta,
+ const unsigned char *sha1)
+ struct object_entry *base;
+ ALLOC_GROW(pdata->ext_bases, pdata->nr_ext + 1, pdata->alloc_ext);
+ base = &pdata->ext_bases[pdata->nr_ext++];
+ memset(base, 0, sizeof(*base));
+ hashcpy(base->idx.oid.hash, sha1);
+ /* These flags mark that we are not part of the actual pack output. */
+ base->preferred_base = 1;
+ base->filled = 1;
+ delta->ext_base = 1;
+ delta->delta_idx = base - pdata->ext_bases + 1;