Do exact rename detection regardless of rename limits

Now that the exact rename detection is linear-time (with a very small constant factor to boot), there is no longer any reason to limit it by the number of files involved. In some trivial testing, I created a repository with a directory that had a hundred thousand files in it (all with different contents), and then moved that directory to show the effects of renaming 100,000 files. With the new code, that resulted in [torvalds@woody big-rename]$ time ~/git/git show -C | wc -l 400006 real 0m2.071s user 0m1.520s sys 0m0.576s ie the code can correctly detect the hundred thousand renames in about 2 seconds (the number "400006" comes from four lines for each rename: diff --git a/really-big-dir/file-1-1-1-1-1 b/moved-big-dir/file-1-1-1-1-1 similarity index 100% rename from really-big-dir/file-1-1-1-1-1 rename to moved-big-dir/file-1-1-1-1-1 and the extra six lines is from a one-liner commit message and all the commit information and spacing). Most of those two seconds weren't even really the rename detection, it's really all the other stuff needed to get there. With the old code, this wouldn't have been practically possible. Doing a pairwise check of the ten billion possible pairs would have been prohibitively expensive. In fact, even with the rename limiter in place, the old code would waste a lot of time just on the diff_filespec checks, and despite not even trying to find renames, it used to look like: [torvalds@woody big-rename]$ time git show -C | wc -l 1400006 real 0m12.337s user 0m12.285s sys 0m0.192s ie we used to take 12 seconds for this load and not even do any rename detection! (The number 1400006 comes from fourteen lines per file moved: seven lines each for the delete and the create of a one-liner file, and the same extra six lines of commit information). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Linus Torvalds <torvalds@linux-foundation.org> 2007-10-25 18:24:47 (GMT)
committer: Junio C Hamano <gitster@pobox.com> 2007-10-27 06:18:06 (GMT)
commit: 17559a643ecef94834d930790498c6babe3e89a8 (patch)
tree: a51109dfff25cad9d819e4b61c6d583da08695ed /diffcore-rename.c
parent: 9027f53cb5051bf83a0254e7f8aeb5d1a206de0b (diff)
download: git-17559a643ecef94834d930790498c6babe3e89a8.zip
git-17559a643ecef94834d930790498c6babe3e89a8.tar.gz
git-17559a643ecef94834d930790498c6babe3e89a8.tar.bz2
1 files changed, 6 insertions, 6 deletions
diff --git a/diffcore-rename.c b/diffcore-rename.c
index e7e370b..3946932 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -429,6 +429,12 @@ void diffcore_rename(struct diff_options *options)
 		goto cleanup; /* nothing to do */
 
 	/*
+	 * We really want to cull the candidates list early
+	 * with cheap tests in order to avoid doing deltas.
+	 */
+	rename_count = find_exact_renames();
+
+	/*
 	 * This basically does a test for the rename matrix not
 	 * growing larger than a "rename_limit" square matrix, ie:
 	 *
@@ -444,12 +450,6 @@ void diffcore_rename(struct diff_options *options)
 	if (rename_dst_nr * rename_src_nr > rename_limit * rename_limit)
 		goto cleanup;
 
-	/*
-	 * We really want to cull the candidates list early
-	 * with cheap tests in order to avoid doing deltas.
-	 */
-	rename_count = find_exact_renames();
-
 	/* Have we run out the created file pool?  If so we can avoid
 	 * doing the delta matrix altogether.
 	 */
author	Linus Torvalds <torvalds@linux-foundation.org>	2007-10-25 18:24:47 (GMT)
committer	Junio C Hamano <gitster@pobox.com>	2007-10-27 06:18:06 (GMT)
commit	17559a643ecef94834d930790498c6babe3e89a8 (patch)
tree	a51109dfff25cad9d819e4b61c6d583da08695ed /diffcore-rename.c
parent	9027f53cb5051bf83a0254e7f8aeb5d1a206de0b (diff)
download	git-17559a643ecef94834d930790498c6babe3e89a8.zip git-17559a643ecef94834d930790498c6babe3e89a8.tar.gz git-17559a643ecef94834d930790498c6babe3e89a8.tar.bz2