summaryrefslogtreecommitdiff
path: root/Documentation/technical/commit-graph-format.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical/commit-graph-format.txt')
-rw-r--r--Documentation/technical/commit-graph-format.txt32
1 files changed, 31 insertions, 1 deletions
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index a4f1744..4405410 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -17,6 +17,9 @@ metadata, including:
- The parents of the commit, stored using positional references within
the graph file.
+- The Bloom filter of the commit carrying the paths that were changed between
+ the commit and its first parent, if requested.
+
These positional references are stored as unsigned 32-bit integers
corresponding to the array position within the list of commit OIDs. Due
to some special constants we use to track parents, we can store at most
@@ -29,7 +32,7 @@ the body into "chunks" and provide a binary lookup table at the beginning
of the body. The header includes certain values, such as number of chunks
and hash type.
-All 4-byte numbers are in network order.
+All multi-byte numbers are in network byte order.
HEADER:
@@ -93,6 +96,33 @@ CHUNK DATA:
positions for the parents until reaching a value with the most-significant
bit on. The other bits correspond to the position of the last parent.
+ Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
+ * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters
+ from commit 0 to commit i (inclusive) in lexicographic order. The Bloom
+ filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header
+ length), where BIDX[-1] is 0.
+ * The BIDX chunk is ignored if the BDAT chunk is not present.
+
+ Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
+ * It starts with header consisting of three unsigned 32-bit integers:
+ - Version of the hash algorithm being used. We currently only support
+ value 1 which corresponds to the 32-bit version of the murmur3 hash
+ implemented exactly as described in
+ https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
+ hashing technique using seed values 0x293ae76f and 0x7e646e2 as
+ described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
+ in Probabilistic Verification"
+ - The number of times a path is hashed and hence the number of bit positions
+ that cumulatively determine whether a file is present in the commit.
+ - The minimum number of bits 'b' per entry in the Bloom filter. If the filter
+ contains 'n' entries, then the filter size is the minimum number of 64-bit
+ words that contain n*b bits.
+ * The rest of the chunk is the concatenation of all the computed Bloom
+ filters for the commits in lexicographic order.
+ * Note: Commits with no changes or more than 512 changes have Bloom filters
+ of length zero.
+ * The BDAT chunk is present if and only if BIDX is present.
+
Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
This list of H-byte hashes describe a set of B commit-graph files that
form a commit-graph chain. The graph position for the ith commit in this