summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorJunio C Hamano <gitster@pobox.com>2020-05-01 20:39:53 (GMT)
committerJunio C Hamano <gitster@pobox.com>2020-05-01 20:39:53 (GMT)
commit9b6606f43d55bbf33b9924d16e02e60e1c09660a (patch)
tree0082094df8d1e99873fa1e72628fa4e02ec3a282 /Documentation
parentcf054f817a30cf4a6531548f52cd7d5cbed6f4fc (diff)
parentcaf388caa101be90b7ec43d7f78ca4e935fc0150 (diff)
downloadgit-9b6606f43d55bbf33b9924d16e02e60e1c09660a.zip
git-9b6606f43d55bbf33b9924d16e02e60e1c09660a.tar.gz
git-9b6606f43d55bbf33b9924d16e02e60e1c09660a.tar.bz2
Merge branch 'gs/commit-graph-path-filter'
Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/git-commit-graph.txt5
-rw-r--r--Documentation/technical/commit-graph-format.txt30
2 files changed, 35 insertions, 0 deletions
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 46f7f7c..53a6502 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -57,6 +57,11 @@ or `--stdin-packs`.)
With the `--append` option, include all commits that are present in the
existing commit-graph file.
+
+With the `--changed-paths` option, compute and write information about the
+paths changed between a commit and it's first parent. This operation can
+take a while on large repositories. It provides significant performance gains
+for getting history of a directory or a file with `git log -- <path>`.
++
With the `--split[=<strategy>]` option, write the commit-graph as a
chain of multiple commit-graph files stored in
`<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index a4f1744..de56f9f 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -17,6 +17,9 @@ metadata, including:
- The parents of the commit, stored using positional references within
the graph file.
+- The Bloom filter of the commit carrying the paths that were changed between
+ the commit and its first parent, if requested.
+
These positional references are stored as unsigned 32-bit integers
corresponding to the array position within the list of commit OIDs. Due
to some special constants we use to track parents, we can store at most
@@ -93,6 +96,33 @@ CHUNK DATA:
positions for the parents until reaching a value with the most-significant
bit on. The other bits correspond to the position of the last parent.
+ Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
+ * The ith entry, BIDX[i], stores the number of 8-byte word blocks in all
+ Bloom filters from commit 0 to commit i (inclusive) in lexicographic
+ order. The Bloom filter for the i-th commit spans from BIDX[i-1] to
+ BIDX[i] (plus header length), where BIDX[-1] is 0.
+ * The BIDX chunk is ignored if the BDAT chunk is not present.
+
+ Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
+ * It starts with header consisting of three unsigned 32-bit integers:
+ - Version of the hash algorithm being used. We currently only support
+ value 1 which corresponds to the 32-bit version of the murmur3 hash
+ implemented exactly as described in
+ https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
+ hashing technique using seed values 0x293ae76f and 0x7e646e2 as
+ described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
+ in Probabilistic Verification"
+ - The number of times a path is hashed and hence the number of bit positions
+ that cumulatively determine whether a file is present in the commit.
+ - The minimum number of bits 'b' per entry in the Bloom filter. If the filter
+ contains 'n' entries, then the filter size is the minimum number of 64-bit
+ words that contain n*b bits.
+ * The rest of the chunk is the concatenation of all the computed Bloom
+ filters for the commits in lexicographic order.
+ * Note: Commits with no changes or more than 512 changes have Bloom filters
+ of length zero.
+ * The BDAT chunk is present if and only if BIDX is present.
+
Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
This list of H-byte hashes describe a set of B commit-graph files that
form a commit-graph chain. The graph position for the ith commit in this