summaryrefslogtreecommitdiff
path: root/diff.h
diff options
context:
space:
mode:
authorJeff King <peff@peff.net>2017-10-13 15:27:45 (GMT)
committerJunio C Hamano <gitster@pobox.com>2017-10-14 02:43:49 (GMT)
commita937b37e766479c8e780b17cce9c4b252fd97e40 (patch)
treef03220482464e5c88330e3b87f663cd17bab25b9 /diff.h
parent9752ad0bb79f680bca48db7adc45338b298304b0 (diff)
downloadgit-a937b37e766479c8e780b17cce9c4b252fd97e40.zip
git-a937b37e766479c8e780b17cce9c4b252fd97e40.tar.gz
git-a937b37e766479c8e780b17cce9c4b252fd97e40.tar.bz2
revision: quit pruning diff more quickly when possible
When the revision traversal machinery is given a pathspec, we must compute the parent-diff for each commit to determine which ones are TREESAME. We set the QUICK diff flag to avoid looking at more entries than we need; we really just care whether there are any changes at all. But there is one case where we want to know a bit more: if --remove-empty is set, we care about finding cases where the change consists only of added entries (in which case we may prune the parent in try_to_simplify_commit()). To cover that case, our file_add_remove() callback does not quit the diff upon seeing an added entry; it keeps looking for other types of entries. But this means when --remove-empty is not set (and it is not by default), we compute more of the diff than is necessary. You can see this in a pathological case where a commit adds a very large number of entries, and we limit based on a broad pathspec. E.g.: perl -e ' chomp(my $blob = `git hash-object -w --stdin </dev/null`); for my $a (1..1000) { for my $b (1..1000) { print "100644 $blob\t$a/$b\n"; } } ' | git update-index --index-info git commit -qm add git rev-list HEAD -- . This case takes about 100ms now, but after this patch only needs 6ms. That's not a huge improvement, but it's easy to get and it protects us against even more pathological cases (e.g., going from 1 million to 10 million files would take ten times as long with the current code, but not increase at all after this patch). This is reported to minorly speed-up pathspec limiting in real world repositories (like the 100-million-file Windows repository), but probably won't make a noticeable difference outside of pathological setups. This patch actually covers the case without --remove-empty, and the case where we see only deletions. See the in-code comment for details. Note that we have to add a new member to the diff_options struct so that our callback can see the value of revs->remove_empty_trees. This callback parameter could be passed to the "add_remove" and "change" callbacks, but there's not much point. They already receive the diff_options struct, and doing it this way avoids having to update the function signature of the other callbacks (arguably the format_callback and output_prefix functions could benefit from the same simplification). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'diff.h')
-rw-r--r--diff.h1
1 files changed, 1 insertions, 0 deletions
diff --git a/diff.h b/diff.h
index e9ccb38..fe5c287 100644
--- a/diff.h
+++ b/diff.h
@@ -180,6 +180,7 @@ struct diff_options {
pathchange_fn_t pathchange;
change_fn_t change;
add_remove_fn_t add_remove;
+ void *change_fn_data;
diff_format_fn_t format_callback;
void *format_callback_data;
diff_prefix_fn_t output_prefix;