author Garima Singh <garima.singh@microsoft.com> 2020-04-06 16:59:52 (GMT)
committer Junio C Hamano <gitster@pobox.com> 2020-04-06 18:08:37 (GMT)
commit a56b9464cd0a49317fafde080ae4e73c5430ac9b
tree 73a69c869e0dc5dabc5655940479510fafe09426
parent d38e07b8c44ffdb73e7eba1b7f6a73eb7eb0d5f9
revision.c: use Bloom filters to speed up path based revision walks

Revision walk will now use Bloom filters for commits to speed up
revision walks for a particular path (for computing history for
that path), if they are present in the commit-graph file.

We load the Bloom filters during the prepare_revision_walk step,
currently only when dealing with a single pathspec. Extending
it to work with multiple pathspecs can be explored and built on top
of this series in the future.

While comparing trees in rev_compare_trees(), if the Bloom filter
says that the file is not different between the two trees, we don't
need to compute the expensive diff. This is where we get our
performance gains. The other response of the Bloom filter is 'maybe',
in which case we fall back to the full diff calculation to determine
if the path was changed in the commit.

We do not try to use Bloom filters when the '--walk-reflogs' option
is specified. The '--walk-reflogs' option does not walk the commit
ancestry chain like the rest of the options. Incorporating the
performance gains when walking reflog entries would add more
complexity, and can be explored in a later series.

Performance Gains:
We tested the performance of `git log -- <path>` on the git repo,
the linux and some internal large repos, with a variety of paths
of varying depths.

On the git and linux repos:
- we observed a 2x to 5x speed up.

On a large internal repo with files seated 6-10 levels deep in the
tree:
- we observed 10x to 20x speed ups, with some paths going up to
  28 times faster.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
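
The "definitely not" versus "maybe" answer described in the commit
message can be illustrated with a minimal, self-contained sketch.
This is not the code from bloom.c: the sketch_* types and functions
are hypothetical stand-ins for struct bloom_filter, struct bloom_key
and struct bloom_filter_settings, the bit layout is an assumption,
and treating an empty filter as "maybe" is a choice made for the
sketch only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the definitions in bloom.h. */
struct sketch_filter {
	unsigned char *data;	/* bit array */
	size_t len;		/* number of bytes in data */
};

struct sketch_key {
	const uint32_t *hashes;	/* one precomputed hash per probe */
};

struct sketch_settings {
	uint32_t num_hashes;	/* number of probes per key */
};

/* Record a key by setting one bit per hash. */
static void sketch_filter_add(struct sketch_filter *filter,
			      const struct sketch_key *key,
			      const struct sketch_settings *settings)
{
	uint64_t nbits = (uint64_t)filter->len * 8;
	uint32_t i;

	if (!nbits)
		return;
	for (i = 0; i < settings->num_hashes; i++) {
		uint64_t pos = key->hashes[i] % nbits;
		filter->data[pos / 8] |= (unsigned char)(1u << (pos % 8));
	}
}

/*
 * Returns 0 when the key is definitely not in the filter, and 1 when
 * it may be (the caller must then fall back to the full tree diff).
 * An empty filter carries no information, so this sketch answers
 * "maybe" for it (an assumption of the sketch).
 */
static int sketch_filter_contains(const struct sketch_filter *filter,
				  const struct sketch_key *key,
				  const struct sketch_settings *settings)
{
	uint64_t nbits = (uint64_t)filter->len * 8;
	uint32_t i;

	if (!nbits)
		return 1;
	for (i = 0; i < settings->num_hashes; i++) {
		uint64_t pos = key->hashes[i] % nbits;
		if (!(filter->data[pos / 8] & (1u << (pos % 8))))
			return 0;	/* one clear bit rules the key out */
	}
	return 1;
}
```

A caller in the spirit of rev_compare_trees() would skip the diff
only when the check returns 0; any other answer still requires the
full comparison, so a false positive costs time but never changes
the result of the walk.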
Diffstat (limited to 'bloom.h')
-rw-r--r-- bloom.h 4
1 files changed, 4 insertions, 0 deletions
diff --git a/bloom.h b/bloom.h
index 760d712..b935186 100644
--- a/bloom.h
+++ b/bloom.h
@@ -83,4 +83,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 				      struct commit *c,
 				      int compute_if_not_present);
 
+int bloom_filter_contains(const struct bloom_filter *filter,
+			  const struct bloom_key *key,
+			  const struct bloom_filter_settings *settings);
+
 #endif
\ No newline at end of file