From 7288e12cce5db87216eba16441a2c37206e5dcad Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 14 Mar 2017 12:30:24 -0400 Subject: add--interactive: do not expand pathspecs with ls-files MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When we want to get the list of modified files, we first expand any user-provided pathspecs with "ls-files", and then feed the resulting list of paths as arguments to "diff-index" and "diff-files". If your pathspec expands into a large number of paths, you may run into one of two problems: 1. The OS may complain about the size of the argument list, and refuse to run. For example: $ (ulimit -s 128 && git add -p drivers) Can't exec "git": Argument list too long at .../git-add--interactive line 177. Died at .../git-add--interactive line 177. That's on the linux.git repository, which has about 20K files in the "drivers" directory (none of them modified in this case). The "ulimit -s" trick is necessary to show the problem on Linux even for such a gigantic set of paths. Other operating systems have much smaller limits (e.g., a real-world case was seen with only 5K files on OS X). 2. Even when it does work, it's really slow. The pathspec code is not optimized for huge numbers of paths. Here's the same case without the ulimit: $ time git add -p drivers No changes. real 0m16.559s user 0m53.140s sys 0m0.220s We can improve this by skipping "ls-files" completely, and just feeding the original pathspecs to the diff commands. This solution was discussed in 2010: http://public-inbox.org/git/20100105041438.GB12574@coredump.intra.peff.net/ but at the time the diff code's pathspecs were more primitive than those used by ls-files (e.g., they did not support globs). Making the change would have caused a user-visible regression, so we didn't. Since then, the pathspec code has been unified, and the diff commands natively understand pathspecs like '*.c'. This patch implements that solution. That skips the argument-list limits, and the result runs much faster: $ time git add -p drivers No changes. real 0m0.149s user 0m0.116s sys 0m0.080s There are two new tests. The first just exercises the globbing behavior to confirm that we are not causing a regression there. The second checks the actual argument behavior using GIT_TRACE. We _could_ do it with the "ulimit -s" trick, as above. But that would mean the test could only run where "ulimit -s" works. And tests of that sort are expensive, because we have to come up with enough files to actually bust the limit (we can't just shrink the "128" down infinitely, since it is also the in-program stack size). Finally, two caveats and possibilities for future work: a. This fixes one argument-list expansion, but there may be others. In fact, it's very likely that if you run "git add -i" and select a large number of modified files that the script would try to feed them all to a single git command. In practice this is probably fine. The real issue here is that the argument list was growing with the _total_ number of files, not the number of modified or selected files. b. If the repository contains filenames with literal wildcard characters (e.g., "foo*"), the original code expanded them via "ls-files" and then fed those wildcard names to "diff-index", which would have treated them as wildcards. This was a bug, which is now fixed (though unless you really go through some contortions with ":(literal)", it's likely that your original pathspec would match whatever the accidentally-expanded wildcard would anyway). So this takes us one step closer to working correctly with files whose names contain wildcard characters, but it's likely that others remain (e.g., if "git add -i" feeds the selected paths to "git add"). Reported-by: Wincent Colaiuta Reported-by: Mislav Marohnić Signed-off-by: Jeff King Signed-off-by: Junio C Hamano diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 982593c..131c2a4 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -275,20 +275,11 @@ sub list_modified { my ($only) = @_; my (%data, @return); my ($add, $del, $adddel, $file); - my @tracked = (); - - if (@ARGV) { - @tracked = map { - chomp $_; - unquote_path($_); - } run_cmd_pipe(qw(git ls-files --), @ARGV); - return if (!@tracked); - } my $reference = get_diff_reference($patch_mode_revision); for (run_cmd_pipe(qw(git diff-index --cached --numstat --summary), $reference, - '--', @tracked)) { + '--', @ARGV)) { if (($add, $del, $file) = /^([-\d]+) ([-\d]+) (.*)/) { my ($change, $bin); @@ -313,7 +304,7 @@ sub list_modified { } } - for (run_cmd_pipe(qw(git diff-files --numstat --summary --raw --), @tracked)) { + for (run_cmd_pipe(qw(git diff-files --numstat --summary --raw --), @ARGV)) { if (($add, $del, $file) = /^([-\d]+) ([-\d]+) (.*)/) { $file = unquote_path($file); diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 5ffe78e..d8dcc97 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -394,4 +394,47 @@ test_expect_success 'diffs can be colorized' ' grep "$(printf "\\033")" output ' +test_expect_success 'add -p handles globs' ' + git reset --hard && + + mkdir -p subdir && + echo base >one.c && + echo base >subdir/two.c && + git add "*.c" && + git commit -m base && + + echo change >one.c && + echo change >subdir/two.c && + git add -p "*.c" <<-\EOF && + y + y + EOF + + cat >expect <<-\EOF && + one.c + subdir/two.c + EOF + git diff --cached --name-only >actual && + test_cmp expect actual +' + +test_expect_success 'add -p does not expand argument lists' ' + git reset --hard && + + echo content >not-changed && + git add not-changed && + git commit -m "add not-changed file" && + + echo change >file && + GIT_TRACE=$(pwd)/trace.out git add -p . <<-\EOF && + y + EOF + + # we know that "file" must be mentioned since we actually + # update it, but we want to be sure that our "." pathspec + # was not expanded into the argument list of any command. + # So look only for "not-changed". + ! grep not-changed trace.out +' + test_done -- cgit v0.10.2-6-g49f6