summaryrefslogtreecommitdiff
path: root/Documentation/technical/commit-graph.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical/commit-graph.txt')
-rw-r--r--Documentation/technical/commit-graph.txt93
1 files changed, 72 insertions, 21 deletions
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index 808fa30..2c26e95 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -1,4 +1,4 @@
-Git Commit Graph Design Notes
+Git Commit-Graph Design Notes
=============================
Git walks the commit graph for many reasons, including:
@@ -17,7 +17,7 @@ There are two main costs here:
The commit-graph file is a supplemental data structure that accelerates
commit graph walks. If a user downgrades or disables the 'core.commitGraph'
-config setting, then the existing ODB is sufficient. The file is stored
+config setting, then the existing object database is sufficient. The file is stored
as "commit-graph" either in the .git/objects/info directory or in the info
directory of an alternate.
@@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph:
Values 1-4 satisfy the requirements of parse_commit_gently().
-Define the "generation number" of a commit recursively as follows:
+There are two definitions of generation number:
+1. Corrected committer dates (generation number v2)
+2. Topological levels (generation number v1)
- * A commit with no parents (a root commit) has generation number one.
+Define "corrected committer date" of a commit recursively as follows:
- * A commit with at least one parent has generation number one more than
- the largest generation number among its parents.
+ * A commit with no parents (a root commit) has corrected committer date
+ equal to its committer date.
-Equivalently, the generation number of a commit A is one more than the
+ * A commit with at least one parent has corrected committer date equal to
+ the maximum of its committer date and one more than the largest corrected
+ committer date among its parents.
+
+ * As a special case, a root commit with timestamp zero has corrected commit
+ date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO
+ (that is, an uncomputed corrected commit date).
+
+Define the "topological level" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has topological level of one.
+
+ * A commit with at least one parent has topological level one more than
+ the largest topological level among its parents.
+
+Equivalently, the topological level of a commit A is one more than the
length of a longest path from A to a root commit. The recursive definition
is easier to use for computation and observing the following property:
@@ -60,6 +77,9 @@ is easier to use for computation and observing the following property:
generation numbers, then we always expand the boundary commit with highest
generation number and can easily detect the stopping condition.
+The property applies to both versions of generation number, that is both
+corrected committer dates and topological levels.
+
This property can be used to significantly reduce the time it takes to
walk commits and determine topological relationships. Without generation
numbers, the general heuristic is the following:
@@ -67,17 +87,19 @@ numbers, the general heuristic is the following:
If A and B are commits with commit time X and Y, respectively, and
X < Y, then A _probably_ cannot reach B.
-This heuristic is currently used whenever the computation is allowed to
+In absence of corrected commit dates (for example, old versions of Git or
+mixed generation graph chains),
+this heuristic is currently used whenever the computation is allowed to
violate topological relationships due to clock skew (such as "git log"
with default order), but is not used when the topological order is
required (such as merge base calculations, "git log --graph").
In practice, we expect some commits to be created recently and not stored
-in the commit graph. We can treat these commits as having "infinite"
+in the commit-graph. We can treat these commits as having "infinite"
generation number and walk until reaching commits with known generation
number.
-We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
+We use the macro GENERATION_NUMBER_INFINITY to mark commits not
in the commit-graph file. If a commit-graph file was written by a version
of Git that did not compute generation numbers, then those commits will
have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
@@ -93,12 +115,12 @@ fully-computed generation numbers. Using strict inequality may result in
walking a few extra commits, but the simplicity in dealing with commits
with generation number *_INFINITY or *_ZERO is valuable.
-We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
-generation numbers are computed to be at least this value. We limit at
-this value since it is the largest value that can be stored in the
-commit-graph file using the 30 bits available to generation numbers. This
-presents another case where a commit can have generation number equal to
-that of a parent.
+We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose
+topological levels (generation number v1) are computed to be at least
+this value. We limit at this value since it is the largest value that
+can be stored in the commit-graph file using the 30 bits available
+to topological levels. This presents another case where a commit can
+have generation number equal to that of a parent.
Design Details
--------------
@@ -114,7 +136,7 @@ Design Details
- Commit grafts and replace objects can change the shape of the commit
history. The latter can also be enabled/disabled on the fly using
- `--no-replace-objects`. This leads to difficultly storing both possible
+ `--no-replace-objects`. This leads to difficulty storing both possible
interpretations of a commit id, especially when computing generation
numbers. The commit-graph will not be read or written when
replace-objects or grafts are present.
@@ -127,7 +149,7 @@ Design Details
helpful for these clones, anyway. The commit-graph will not be read or
written when shallow commits are present.
-Commit Graphs Chains
+Commit-Graphs Chains
--------------------
Typically, repos grow with near-constant velocity (commits per day). Over time,
@@ -210,12 +232,12 @@ file.
+---------------------+
| |
+-----------------------+ +---------------------+
- | graph-{hash2} |->| |
+ | graph-{hash2} |->| |
+-----------------------+ +---------------------+
| | |
+-----------------------+ +---------------------+
| | | |
- | graph-{hash1} |->| |
+ | graph-{hash1} |->| |
| | | |
+-----------------------+ +---------------------+
| tmp_graphXXX
@@ -223,7 +245,7 @@ file.
| |
| |
| |
- | graph-{hash0} |
+ | graph-{hash0} |
| |
| |
| |
@@ -267,6 +289,35 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
number of commits) could be extracted into config settings for full
flexibility.
+## Handling Mixed Generation Number Chains
+
+With the introduction of generation number v2 and generation data chunk, the
+following scenario is possible:
+
+1. "New" Git writes a commit-graph with the corrected commit dates.
+2. "Old" Git writes a split commit-graph on top without corrected commit dates.
+
+A naive approach of using the newest available generation number from
+each layer would lead to violated expectations: the lower layer would
+use corrected commit dates which are much larger than the topological
+levels of the higher layer. For this reason, Git inspects the topmost
+layer to see if the layer is missing corrected commit dates. In such a case
+Git only uses topological level for generation numbers.
+
+When writing a new layer in split commit-graph, we write corrected commit
+dates if the topmost layer has corrected commit dates written. This
+guarantees that if a layer has corrected commit dates, all lower layers
+must have corrected commit dates as well.
+
+When merging layers, we do not consider whether the merged layers had corrected
+commit dates. Instead, the new layer will have corrected commit dates if the
+layer below the new layer has corrected commit dates.
+
+While writing or merging layers, if the new layer is the only layer, it will
+have corrected commit dates when written by compatible versions of Git. Thus,
+rewriting split commit-graph as a single file (`--split=replace`) creates a
+single layer with corrected commit dates.
+
## Deleting graph-{hash} files
After a new tip file is written, some `graph-{hash}` files may no longer