summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorDerrick Stolee <dstolee@microsoft.com>2020-05-11 11:56:11 (GMT)
committerJunio C Hamano <gitster@pobox.com>2020-05-11 16:33:56 (GMT)
commit88093289cdcfe99c5a3d7ef6f36ee45aa3018824 (patch)
treede313c91da6df5a188929293717ac501a10cf229 /Documentation
parent891c17c95497712126700cdf5cd887bdb0ff3d55 (diff)
downloadgit-88093289cdcfe99c5a3d7ef6f36ee45aa3018824.zip
git-88093289cdcfe99c5a3d7ef6f36ee45aa3018824.tar.gz
git-88093289cdcfe99c5a3d7ef6f36ee45aa3018824.tar.bz2
Documentation: changed-path Bloom filters use byte words
In Documentation/technical/commit-graph-format.txt, the definition of the BIDX chunk specifies the length is a number of 8-byte words. During development we discovered that using 8-byte words in the Murmur3 hash algorithm causes issues with big-endian versus little- endian machines. Thus, the hash algorithm was adapted to work on a byte-by-byte basis. However, this caused a change in the definition of a "word" in bloom.h. Now, a "word" is a single byte, which allows filters to be as small as two bytes. These length-two filters are demonstrated in t0095-bloom.sh, and a larger filter of length 25 is demonstrated as well. The original point of using 8-byte words was for alignment reasons. It also presented opportunities for extremely sparse Bloom filters when there were a small number of changes at a commit, creating a very low false-positive rate. However, modifying the format at this point is unlikely to be a valuable exercise. Also, this use of single-byte granularity does present opportunities to save space. It is unclear if 8-byte alignment of the filters would present any meaningful performance benefits. Modify the format document to reflect reality. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/technical/commit-graph-format.txt8
1 files changed, 4 insertions, 4 deletions
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index de56f9f..1beef17 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -97,10 +97,10 @@ CHUNK DATA:
bit on. The other bits correspond to the position of the last parent.
Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
- * The ith entry, BIDX[i], stores the number of 8-byte word blocks in all
- Bloom filters from commit 0 to commit i (inclusive) in lexicographic
- order. The Bloom filter for the i-th commit spans from BIDX[i-1] to
- BIDX[i] (plus header length), where BIDX[-1] is 0.
+ * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters
+ from commit 0 to commit i (inclusive) in lexicographic order. The Bloom
+ filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header
+ length), where BIDX[-1] is 0.
* The BIDX chunk is ignored if the BDAT chunk is not present.
Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]