path: root/Documentation
diff options
authorJunio C Hamano <>2018-09-17 20:53:55 (GMT)
committerJunio C Hamano <>2018-09-17 20:53:55 (GMT)
commitf3504ea3dd21b0a6d38bcd369efa0663cdc05416 (patch)
treec8eb1191474dc37e25f082917e803915525c15d1 /Documentation
parentfba9654364a523bf146df26052ab05dd52c1d719 (diff)
parentfe0ac2fb7f8e87d37ef83dcee2d93901d58d8277 (diff)
Merge branch 'cc/delta-islands'
Lift code from GitHub to restrict delta computation so that an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository. * cc/delta-islands: pack-objects: move 'layer' into 'struct packing_data' pack-objects: move tree_depth into 'struct packing_data' t5320: tests for delta islands repack: add delta-islands support pack-objects: add delta-islands support pack-objects: refactor code into compute_layer_order() Add delta-islands.{c,h}
Diffstat (limited to 'Documentation')
3 files changed, 121 insertions, 0 deletions
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6ecd70d..112041f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2684,6 +2684,21 @@ Note that changing the compression level will not automatically recompress
all existing objects. You can force recompression by passing the -F option
to linkgit:git-repack[1].
+ An extended regular expression configuring a set of delta
+ islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
+ for details.
+ Specify an island name which gets to have its objects be
+ packed first. This creates a kind of pseudo-pack at the front
+ of one pack, so that the objects from the specified island are
+ hopefully faster to copy into any pack that should be served
+ to a user requesting these objects. In practice this means
+ that the island specified should likely correspond to what is
+ the most commonly cloned in the repo. See also "DELTA ISLANDS"
+ in linkgit:git-pack-objects[1].
The maximum memory in bytes used for caching deltas in
linkgit:git-pack-objects[1] before writing them out to a pack.
@@ -3218,6 +3233,10 @@ repack.packKeptObjects::
index is being written (either via `--write-bitmap-index` or
+ If set to true, makes `git repack` act as if `--delta-islands`
+ was passed. Defaults to `false`.
When true, git will write a bitmap index when packing all
objects to disk (e.g., when `git repack -a` is run). This
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index d95b472..40c825c 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -289,6 +289,103 @@ Unexpected missing object will raise an error.
Keep unreachable objects in loose form. This implies `--revs`.
+ Restrict delta matches based on "islands". See DELTA ISLANDS
+ below.
+When possible, `pack-objects` tries to reuse existing on-disk deltas to
+avoid having to search for new ones on the fly. This is an important
+optimization for serving fetches, because it means the server can avoid
+inflating most objects at all and just send the bytes directly from
+disk. This optimization can't work when an object is stored as a delta
+against a base which the receiver does not have (and which we are not
+already sending). In that case the server "breaks" the delta and has to
+find a new one, which has a high CPU cost. Therefore it's important for
+performance that the set of objects in on-disk delta relationships match
+what a client would fetch.
+In a normal repository, this tends to work automatically. The objects
+are mostly reachable from the branches and tags, and that's what clients
+fetch. Any deltas we find on the server are likely to be between objects
+the client has or will have.
+But in some repository setups, you may have several related but separate
+groups of ref tips, with clients tending to fetch those groups
+independently. For example, imagine that you are hosting several "forks"
+of a repository in a single shared object store, and letting clients
+view them as separate repositories through `GIT_NAMESPACE` or separate
+repos using the alternates mechanism. A naive repack may find that the
+optimal delta for an object is against a base that is only found in
+another fork. But when a client fetches, they will not have the base
+object, and we'll have to find a new delta on the fly.
+A similar situation may exist if you have many refs outside of
+`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
+`refs/pull` or `refs/changes` used by some hosting providers). By
+default, clients fetch only heads and tags, and deltas against objects
+found only in those other groups cannot be sent as-is.
+Delta islands solve this problem by allowing you to group your refs into
+distinct "islands". Pack-objects computes which objects are reachable
+from which islands, and refuses to make a delta from an object `A`
+against a base which is not present in all of `A`'s islands. This
+results in slightly larger packs (because we miss some delta
+opportunities), but guarantees that a fetch of one island will not have
+to recompute deltas on the fly due to crossing island boundaries.
+When repacking with delta islands the delta window tends to get
+clogged with candidates that are forbidden by the config. Repacking
+with a big --window helps (and doesn't take as long as it otherwise
+might because we can reject some object pairs based on islands before
+doing any computation on the content).
+Islands are configured via the `pack.island` option, which can be
+specified multiple times. Each value is a left-anchored regular
+expressions matching refnames. For example:
+island = refs/heads/
+island = refs/tags/
+puts heads and tags into an island (whose name is the empty string; see
+below for more on naming). Any refs which do not match those regular
+expressions (e.g., `refs/pull/123`) is not in any island. Any object
+which is reachable only from `refs/pull/` (but not heads or tags) is
+therefore not a candidate to be used as a base for `refs/heads/`.
+Refs are grouped into islands based on their "names", and two regexes
+that produce the same name are considered to be in the same
+island. The names are computed from the regexes by concatenating any
+capture groups from the regex, with a '-' dash in between. (And if
+there are no capture groups, then the name is the empty string, as in
+the above example.) This allows you to create arbitrary numbers of
+islands. Only up to 14 such capture groups are supported though.
+For example, imagine you store the refs for each fork in
+`refs/virtual/ID`, where `ID` is a numeric identifier. You might then
+island = refs/virtual/([0-9]+)/heads/
+island = refs/virtual/([0-9]+)/tags/
+island = refs/virtual/([0-9]+)/(pull)/
+That puts the heads and tags for each fork in their own island (named
+"1234" or similar), and the pull refs for each go into their own
+Note that we pick a single island for each regex to go into, using "last
+one wins" ordering (which allows repo-specific config to take precedence
+over user-wide config, and so forth).
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index d056250..aa0cc8b 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -160,6 +160,11 @@ depth is 4095.
being removed. In addition, any unreachable loose objects will
be packed (and their loose counterparts removed).
+ Pass the `--delta-islands` option to `git-pack-objects`, see
+ linkgit:git-pack-objects[1].