From 7541cc530239b14edacfa5750f8ab3a8a62dc545 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:34:53 -0700 Subject: t7300: add testcases showing failure to clean specified pathspecs Someone brought me a testcase where multiple git-clean invocations were required to clean out unwanted files: mkdir d{1,2} touch d{1,2}/ut touch d1/t && git add d1/t With this setup, the user would need to run git clean -ffd */ut twice to delete both ut files. A little testing showed some interesting variants: * If only one of those two ut files existed (either one), then only one clean command would be necessary. * If both directories had tracked files, then only one git clean would be necessary to clean both files. * If both directories had no tracked files then the clean command above would never clean either of the untracked files despite the pathspec explicitly calling both of them out. A bisect showed that the failure to clean out the files started with commit cf424f5fd89b ("clean: respect pathspecs with "-d", 2014-03-10). However, that pointed to a separate issue: while the "-d" flag was used by the original user who showed me this problem, that flag should have been irrelevant to this problem. Testing again without the "-d" flag showed that the same buggy behavior exists without using that flag, and has in fact existed since before cf424f5fd89b. Although these problems at first are perceived to be different (e.g. never clearing out the requested files vs. taking multiple invocations to get everything cleared out), they are actually just different manifestations of the same problem. The case with multiple directories that have no tracked files is the more general case; solving it will solve all the others. So, I concentrate on it. Add testcases showing that multiple untracked files within entirely untracked directories cannot be cleaned when specifying these files to git clean via pathspecs. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index d01fd12..2c254c7 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -691,6 +691,38 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' ' test_path_is_file nested-repo-with-ignored-file/file ' +test_expect_failure 'git clean handles being told what to clean' ' + mkdir -p d1 d2 && + touch d1/ut d2/ut && + git clean -f */ut && + test_path_is_missing d1/ut && + test_path_is_missing d2/ut +' + +test_expect_failure 'git clean handles being told what to clean, with -d' ' + mkdir -p d1 d2 && + touch d1/ut d2/ut && + git clean -ffd */ut && + test_path_is_missing d1/ut && + test_path_is_missing d2/ut +' + +test_expect_failure 'git clean works if a glob is passed without -d' ' + mkdir -p d1 d2 && + touch d1/ut d2/ut && + git clean -f "*ut" && + test_path_is_missing d1/ut && + test_path_is_missing d2/ut +' + +test_expect_failure 'git clean works if a glob is passed with -d' ' + mkdir -p d1 d2 && + touch d1/ut d2/ut && + git clean -ffd "*ut" && + test_path_is_missing d1/ut && + test_path_is_missing d2/ut +' + test_expect_success MINGW 'handle clean & core.longpaths = false nicely' ' test_config core.longpaths false && a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa && -- cgit v0.10.2-6-g49f6 From bbbb6b0b898d0a03175a902524fb3a371fbd5b3f Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:34:54 -0700 Subject: dir: fix typo in comment Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/dir.c b/dir.c index d021c90..a9168be 100644 --- a/dir.c +++ b/dir.c @@ -139,7 +139,7 @@ static size_t common_prefix_len(const struct pathspec *pathspec) * ":(icase)path" is treated as a pathspec full of * wildcard. In other words, only prefix is considered common * prefix. If the pathspec is abc/foo abc/bar, running in - * subdir xyz, the common prefix is still xyz, not xuz/abc as + * subdir xyz, the common prefix is still xyz, not xyz/abc as * in non-:(icase). */ GUARD_PATHSPEC(pathspec, -- cgit v0.10.2-6-g49f6 From a5e916c7453bc022cb86d7f8528952ccda6a81ce Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:34:55 -0700 Subject: dir: fix off-by-one error in match_pathspec_item For a pathspec like 'foo/bar' comparing against a path named "foo/", namelen will be 4, and match[namelen] will be 'b'. The correct location of the directory separator is namelen-1. However, other callers of match_pathspec_item() such as builtin/grep.c's submodule_path_match() will compare against a path named "foo" instead of "foo/". It might be better to change all the callers to be consistent, as discussed at https://public-inbox.org/git/xmqq7e6cdnkr.fsf@gitster-ct.c.googlers.com/ and https://public-inbox.org/git/CABPp-BERWUPCPq-9fVW1LNocqkrfsoF4BPj3gJd9+En43vEkTQ@mail.gmail.com/ but there are many cases to audit, so for now just make sure we handle both cases with and without a trailing slash. The reason the code worked despite this sometimes-off-by-one error was that the subsequent code immediately checked whether the first matchlen characters matched (which they do) and then bailed and return MATCHED_RECURSIVELY anyway since wildmatch doesn't have the ability to check if "name" can be matched as a directory (or prefix) against the pathspec. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/dir.c b/dir.c index a9168be..bf1a747 100644 --- a/dir.c +++ b/dir.c @@ -356,8 +356,9 @@ static int match_pathspec_item(const struct index_state *istate, /* Perform checks to see if "name" is a super set of the pathspec */ if (flags & DO_MATCH_SUBMODULE) { /* name is a literal prefix of the pathspec */ + int offset = name[namelen-1] == '/' ? 1 : 0; if ((namelen < matchlen) && - (match[namelen] == '/') && + (match[namelen-offset] == '/') && !ps_strncmp(item, match, name, namelen)) return MATCHED_RECURSIVELY; -- cgit v0.10.2-6-g49f6 From 404ebceda01c5af68926c3ee78934cd95bda2602 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:34:56 -0700 Subject: dir: also check directories for matching pathspecs Even if a directory doesn't match a pathspec, it is possible, depending on the precise pathspecs, that some file underneath it might. So we special case and recurse into the directory for such situations. However, we previously always added any untracked directory that we recursed into to the list of untracked paths, regardless of whether the directory itself matched the pathspec. For the case of git-clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both of "dir" "dir/file" Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir" Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed. (Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.) Fix this by actually checking whether the directory we are about to add to the list of dir entries actually matches the pathspec; only do this matching check after we have already returned from recursing into the directory. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/dir.c b/dir.c index bf1a747..76a3c38 100644 --- a/dir.c +++ b/dir.c @@ -1951,6 +1951,11 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, check_only, stop_at_first_file, pathspec); if (subdir_state > dir_state) dir_state = subdir_state; + + if (!match_pathspec(istate, pathspec, path.buf, path.len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; } if (check_only) { diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 2c254c7..1261715 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -699,7 +699,7 @@ test_expect_failure 'git clean handles being told what to clean' ' test_path_is_missing d2/ut ' -test_expect_failure 'git clean handles being told what to clean, with -d' ' +test_expect_success 'git clean handles being told what to clean, with -d' ' mkdir -p d1 d2 && touch d1/ut d2/ut && git clean -ffd */ut && @@ -715,7 +715,7 @@ test_expect_failure 'git clean works if a glob is passed without -d' ' test_path_is_missing d2/ut ' -test_expect_failure 'git clean works if a glob is passed with -d' ' +test_expect_success 'git clean works if a glob is passed with -d' ' mkdir -p d1 d2 && touch d1/ut d2/ut && git clean -ffd "*ut" && -- cgit v0.10.2-6-g49f6 From a3d89d8f7681c61c7fe04a0c01c7310a375f3d0e Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:34:57 -0700 Subject: dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE case are useful for other cases which have nothing to do with submodules. Rename this constant; a subsequent commit will make use of this change. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/dir.c b/dir.c index 76a3c38..b4d6561 100644 --- a/dir.c +++ b/dir.c @@ -273,7 +273,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat, #define DO_MATCH_EXCLUDE (1<<0) #define DO_MATCH_DIRECTORY (1<<1) -#define DO_MATCH_SUBMODULE (1<<2) +#define DO_MATCH_LEADING_PATHSPEC (1<<2) /* * Does 'match' match the given name? @@ -354,7 +354,7 @@ static int match_pathspec_item(const struct index_state *istate, return MATCHED_FNMATCH; /* Perform checks to see if "name" is a super set of the pathspec */ - if (flags & DO_MATCH_SUBMODULE) { + if (flags & DO_MATCH_LEADING_PATHSPEC) { /* name is a literal prefix of the pathspec */ int offset = name[namelen-1] == '/' ? 1 : 0; if ((namelen < matchlen) && @@ -498,7 +498,7 @@ int submodule_path_match(const struct index_state *istate, strlen(submodule_name), 0, seen, DO_MATCH_DIRECTORY | - DO_MATCH_SUBMODULE); + DO_MATCH_LEADING_PATHSPEC); return matched; } -- cgit v0.10.2-6-g49f6 From 89a1f4aaf7650288b976c6022b2c5854950d52c6 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:34:58 -0700 Subject: dir: if our pathspec might match files under a dir, recurse into it For git clean, if a directory is entirely untracked and the user did not specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do not want to remove that directory and thus do not recurse into it. However, if the user manually specified specific (or even globbed) paths somewhere under that directory to remove, then we need to recurse into the directory to make sure we remove the relevant paths under that directory as the user requested. Note that this does not mean that the recursed-into directory will be added to dir->entries for later removal; as of a few commits earlier in this series, there is another more strict match check that is run after returning from a recursed-into directory before deciding to add it to the list of entries. Therefore, this will only result in files underneath the given directory which match one of the pathspecs being added to the entries list. Two notes of potential interest to future readers: * If we wanted to only recurse into a directory when it is specifically matched rather than matched-via-glob (e.g. '*.c'), then we could do so via making the final non-zero return in match_pathspec_item be MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC. (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and MATCHED_RECURSIVELY are important for such a change.) I was leaving open that possibility while writing an RFC asking for the behavior we want, but even though we don't want it, that knowledge might help you understand the code flow better. * There is a growing amount of logic in read_directory_recursive() for deciding whether to recurse into a subdirectory. However, there is a comment immediately preceding this logic that says to recurse if instructed by treat_path(). It may be better for the logic in read_directory_recursive() to ultimately be moved to treat_path() (or another function it calls, such as treat_directory()), but I have left that for someone else to tackle in the future. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/dir.c b/dir.c index b4d6561..47c0a99 100644 --- a/dir.c +++ b/dir.c @@ -360,7 +360,7 @@ static int match_pathspec_item(const struct index_state *istate, if ((namelen < matchlen) && (match[namelen-offset] == '/') && !ps_strncmp(item, match, name, namelen)) - return MATCHED_RECURSIVELY; + return MATCHED_RECURSIVELY_LEADING_PATHSPEC; /* name" doesn't match up to the first wild character */ if (item->nowildcard_len < item->len && @@ -377,7 +377,7 @@ static int match_pathspec_item(const struct index_state *istate, * The submodules themselves will be able to perform more * accurate matching to determine if the pathspec matches. */ - return MATCHED_RECURSIVELY; + return MATCHED_RECURSIVELY_LEADING_PATHSPEC; } return 0; @@ -1939,8 +1939,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, /* recurse into subdir if instructed by treat_path */ if ((state == path_recurse) || ((state == path_untracked) && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR))) { + (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) && + ((dir->flags & DIR_SHOW_IGNORED_TOO) || + do_match_pathspec(istate, pathspec, path.buf, path.len, + baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, diff --git a/dir.h b/dir.h index 680079b..46c238a 100644 --- a/dir.h +++ b/dir.h @@ -211,8 +211,9 @@ int count_slashes(const char *s); * when populating the seen[] array. */ #define MATCHED_RECURSIVELY 1 -#define MATCHED_FNMATCH 2 -#define MATCHED_EXACTLY 3 +#define MATCHED_RECURSIVELY_LEADING_PATHSPEC 2 +#define MATCHED_FNMATCH 3 +#define MATCHED_EXACTLY 4 int simple_length(const char *match); int no_wildcard(const char *string); char *common_prefix(const struct pathspec *pathspec); diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 1261715..d83aeb7 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -691,7 +691,7 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' ' test_path_is_file nested-repo-with-ignored-file/file ' -test_expect_failure 'git clean handles being told what to clean' ' +test_expect_success 'git clean handles being told what to clean' ' mkdir -p d1 d2 && touch d1/ut d2/ut && git clean -f */ut && @@ -707,7 +707,7 @@ test_expect_success 'git clean handles being told what to clean, with -d' ' test_path_is_missing d2/ut ' -test_expect_failure 'git clean works if a glob is passed without -d' ' +test_expect_success 'git clean works if a glob is passed without -d' ' mkdir -p d1 d2 && touch d1/ut d2/ut && git clean -f "*ut" && -- cgit v0.10.2-6-g49f6 From 29b577b96082d87b7237fdc57bf95cfe0914c53d Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:34:59 -0700 Subject: dir: add commentary explaining match_pathspec_item's return value The way match_pathspec_item() handles names and pathspecs with trailing slash characters, in conjunction with special options like DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC were non-obvious, and broken until this patch series. Add a table in a comment explaining the intent of how these work. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/dir.c b/dir.c index 47c0a99..3b2fe17 100644 --- a/dir.c +++ b/dir.c @@ -276,16 +276,27 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat, #define DO_MATCH_LEADING_PATHSPEC (1<<2) /* - * Does 'match' match the given name? - * A match is found if + * Does the given pathspec match the given name? A match is found if * - * (1) the 'match' string is leading directory of 'name', or - * (2) the 'match' string is a wildcard and matches 'name', or - * (3) the 'match' string is exactly the same as 'name'. + * (1) the pathspec string is leading directory of 'name' ("RECURSIVELY"), or + * (2) the pathspec string has a leading part matching 'name' ("LEADING"), or + * (3) the pathspec string is a wildcard and matches 'name' ("WILDCARD"), or + * (4) the pathspec string is exactly the same as 'name' ("EXACT"). * - * and the return value tells which case it was. + * Return value tells which case it was (1-4), or 0 when there is no match. * - * It returns 0 when there is no match. + * It may be instructive to look at a small table of concrete examples + * to understand the differences between 1, 2, and 4: + * + * Pathspecs + * | a/b | a/b/ | a/b/c + * ------+-----------+-----------+------------ + * a/b | EXACT | EXACT[1] | LEADING[2] + * Names a/b/ | RECURSIVE | EXACT | LEADING[2] + * a/b/c | RECURSIVE | RECURSIVE | EXACT + * + * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match. + * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match. */ static int match_pathspec_item(const struct index_state *istate, const struct pathspec_item *item, int prefix, @@ -353,7 +364,7 @@ static int match_pathspec_item(const struct index_state *istate, item->nowildcard_len - prefix)) return MATCHED_FNMATCH; - /* Perform checks to see if "name" is a super set of the pathspec */ + /* Perform checks to see if "name" is a leading string of the pathspec */ if (flags & DO_MATCH_LEADING_PATHSPEC) { /* name is a literal prefix of the pathspec */ int offset = name[namelen-1] == '/' ? 1 : 0; -- cgit v0.10.2-6-g49f6 From 3aca58045f4fa4ad25042f0850706b1d00fd77f1 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:35:00 -0700 Subject: git-clean.txt: do not claim we will delete files with -n/--dry-run It appears that the wrong option got included in the list of what will cause git-clean to actually take action. Correct the list. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index db876f7..e84ffc9 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -35,7 +35,7 @@ OPTIONS --force:: If the Git configuration variable clean.requireForce is not set to false, 'git clean' will refuse to delete files or directories - unless given -f, -n or -i. Git will refuse to delete directories + unless given -f or -i. Git will refuse to delete directories with .git sub directory or file unless a second -f is given. -- cgit v0.10.2-6-g49f6 From e86bbcf987faad8e487beb814351a404fa80957f Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:35:01 -0700 Subject: clean: disambiguate the definition of -d The -d flag pre-dated git-clean's ability to have paths specified. As such, the default for git-clean was to only remove untracked files in the current directory, and -d existed to allow it to recurse into subdirectories. The interaction of paths and the -d option appears to not have been carefully considered, as evidenced by numerous bugs and a dearth of tests covering such pairings in the testsuite. The definition turns out to be important, so let's look at some of the various ways one could interpret the -d option: A) Without -d, only look in subdirectories which contain tracked files under them; with -d, also look in subdirectories which are untracked for files to clean. B) Without specified paths from the user for us to delete, we need to have some kind of default, so...without -d, only look in subdirectories which contain tracked files under them; with -d, also look in subdirectories which are untracked for files to clean. The important distinction here is that choice B says that the presence or absence of '-d' is irrelevant if paths are specified. The logic behind option B is that if a user explicitly asked us to clean a specified pathspec, then we should clean anything that matches that pathspec. Some examples may clarify. Should git clean -f untracked_dir/file remove untracked_dir/file or not? It seems crazy not to, but a strict reading of option A says it shouldn't be removed. How about git clean -f untracked_dir/file1 tracked_dir/file2 or git clean -f untracked_dir_1/file1 untracked_dir_2/file2 ? Should it remove either or both of these files? Should it require multiple runs to remove both the files listed? (If this sounds like a crazy question to even ask, see the commit message of "t7300: Add some testcases showing failure to clean specified pathspecs" added earlier in this patch series.) What if -ffd were used instead of -f -- should that allow these to be removed? Should it take multiple invocations with -ffd? What if a glob (such as '*tracked*') were used instead of spelling out the directory names? What if the filenames involved globs, such as git clean -f '*.o' or git clean -f '*/*.o' ? The current documentation actually suggests a definition that is slightly different than choice A, and the implementation prior to this series provided something radically different than either choices A or B. (The implementation, though, was clearly just buggy). There may be other choices as well. However, for almost any given choice of definition for -d that I can think of, some of the examples above will appear buggy to the user. The only case that doesn't have negative surprises is choice B: treat a user-specified path as a request to clean all untracked files which match that path specification, including recursing into any untracked directories. Change the documentation and basic implementation to use this definition. There were two regression tests that indirectly depended on the current implementation, but neither was about subdirectory handling. These two tests were introduced in commit 5b7570cfb41c ("git-clean: add tests for relative path", 2008-03-07) which was solely created to add coverage for the changes in commit fb328947c8e ("git-clean: correct printing relative path", 2008-03-07). Both tests specified a directory that happened to have an untracked subdirectory, but both were only checking that the resulting printout of a file that was removed was shown with a relative path. Update these tests appropriately. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index e84ffc9..3ab749b 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -26,10 +26,12 @@ are affected. OPTIONS ------- -d:: - Remove untracked directories in addition to untracked files. - If an untracked directory is managed by a different Git - repository, it is not removed by default. Use -f option twice - if you really want to remove such a directory. + Normally, when no is specified, git clean will not + recurse into untracked directories to avoid removing too much. + Specify -d to have it recurse into such directories as well. + If any paths are specified, -d is irrelevant; all untracked + files matching the specified paths (with exceptions for nested + git directories mentioned under `--force`) will be removed. -f:: --force:: diff --git a/builtin/clean.c b/builtin/clean.c index d5579da..68d70e4 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -949,6 +949,14 @@ int cmd_clean(int argc, const char **argv, const char *prefix) dir.flags |= DIR_SHOW_OTHER_DIRECTORIES; + if (argc) { + /* + * Remaining args implies pathspecs specified, and we should + * recurse within those. + */ + remove_directories = 1; + } + if (remove_directories) dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS; diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index d83aeb7..530dfda 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -117,6 +117,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with relative prefix' ' would_clean=$( cd docs && git clean -n ../src | + grep part3 | sed -n -e "s|^Would remove ||p" ) && verbose test "$would_clean" = ../src/part3.c @@ -129,6 +130,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with absolute path' ' would_clean=$( cd docs && git clean -n "$(pwd)/../src" | + grep part3 | sed -n -e "s|^Would remove ||p" ) && verbose test "$would_clean" = ../src/part3.c -- cgit v0.10.2-6-g49f6 From 09487f2cbad36d5da42e204c9e0d201e8607acb8 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:35:02 -0700 Subject: clean: avoid removing untracked files in a nested git repository Users expect files in a nested git repository to be left alone unless sufficiently forced (with two -f's). Unfortunately, in certain circumstances, git would delete both tracked (and possibly dirty) files and untracked files within a nested repository. To explain how this happens, let's contrast a couple cases. First, take the following example setup (which assumes we are already within a git repo): git init nested cd nested >tracked git add tracked git commit -m init >untracked cd .. In this setup, everything works as expected; running 'git clean -fd' will result in fill_directory() returning the following paths: nested/ nested/tracked nested/untracked and then correct_untracked_entries() would notice this can be compressed to nested/ and then since "nested/" is a directory, we would call remove_dirs("nested/", ...), which would check is_nonbare_repository_dir() and then decide to skip it. However, if someone also creates an ignored file: >nested/ignored then running 'git clean -fd' would result in fill_directory() returning the same paths: nested/ nested/tracked nested/untracked but correct_untracked_entries() will notice that we had ignored entries under nested/ and thus simplify this list to nested/tracked nested/untracked Since these are not directories, we do not call remove_dirs() which was the only place that had the is_nonbare_repository_dir() safety check -- resulting in us deleting both the untracked file and the tracked (and possibly dirty) file. One possible fix for this issue would be walking the parent directories of each path and checking if they represent nonbare repositories, but that would be wasteful. Even if we added caching of some sort, it's still a waste because we should have been able to check that "nested/" represented a nonbare repository before even descending into it in the first place. Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use it to prevent fill_directory() and friends from descending into nested git repos. With this change, we also modify two regression tests added in commit 91479b9c72f1 ("t7300: add tests to document behavior of clean and nested git", 2015-06-15). That commit, nor its series, nor the six previous iterations of that series on the mailing list discussed why those tests coded the expectation they did. In fact, it appears their purpose was simply to test _existing_ behavior to make sure that the performance changes didn't change the behavior. However, these two tests directly contradicted the manpage's claims that two -f's were required to delete files/directories under a nested git repository. While one could argue that the user gave an explicit path which matched files/directories that were within a nested repository, there's a slippery slope that becomes very difficult for users to understand once you go down that route (e.g. what if they specified "git clean -f -d '*.c'"?) It would also be hard to explain what the exact behavior was; avoid such problems by making it really simple. Also, clean up some grammar errors describing this functionality in the git-clean manpage. Finally, there are still a couple bugs with -ffd not cleaning out enough (e.g. missing the nested .git) and with -ffdX possibly cleaning out the wrong files (paying attention to outer .gitignore instead of inner). This patch does not address these cases at all (and does not change the behavior relative to those flags), it only fixes the handling when given a single -f. See https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more discussion of the -ffd[X?] bugs. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index 3ab749b..ba31d8d 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -37,9 +37,9 @@ OPTIONS --force:: If the Git configuration variable clean.requireForce is not set to false, 'git clean' will refuse to delete files or directories - unless given -f or -i. Git will refuse to delete directories - with .git sub directory or file unless a second -f - is given. + unless given -f or -i. Git will refuse to modify untracked + nested git repositories (directories with a .git subdirectory) + unless a second -f is given. -i:: --interactive:: diff --git a/builtin/clean.c b/builtin/clean.c index 68d70e4..3a7a63a 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -946,6 +946,8 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (force > 1) rm_flags = 0; + else + dir.flags |= DIR_SKIP_NESTED_GIT; dir.flags |= DIR_SHOW_OTHER_DIRECTORIES; diff --git a/dir.c b/dir.c index 3b2fe17..7ff7917 100644 --- a/dir.c +++ b/dir.c @@ -1451,6 +1451,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir, return path_none; case index_nonexistent: + if (dir->flags & DIR_SKIP_NESTED_GIT) { + int nested_repo; + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(&sb, dirname); + nested_repo = is_nonbare_repository_dir(&sb); + strbuf_release(&sb); + if (nested_repo) + return path_none; + } + if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) break; if (exclude && diff --git a/dir.h b/dir.h index 46c238a..739aea7 100644 --- a/dir.h +++ b/dir.h @@ -156,7 +156,8 @@ struct dir_struct { DIR_SHOW_IGNORED_TOO = 1<<5, DIR_COLLECT_KILLED_ONLY = 1<<6, DIR_KEEP_UNTRACKED_CONTENTS = 1<<7, - DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8 + DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8, + DIR_SKIP_NESTED_GIT = 1<<9 } flags; struct dir_entry **entries; struct dir_entry **ignored; diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 530dfda..6e6d24c 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -549,7 +549,7 @@ test_expect_failure 'nested (non-empty) bare repositories should be cleaned even test_path_is_missing strange_bare ' -test_expect_success 'giving path in nested git work tree will remove it' ' +test_expect_success 'giving path in nested git work tree will NOT remove it' ' rm -fr repo && mkdir repo && ( @@ -561,7 +561,7 @@ test_expect_success 'giving path in nested git work tree will remove it' ' git clean -f -d repo/bar/baz && test_path_is_file repo/.git/HEAD && test_path_is_dir repo/bar/ && - test_path_is_missing repo/bar/baz + test_path_is_file repo/bar/baz/hello.world ' test_expect_success 'giving path to nested .git will not remove it' ' @@ -579,7 +579,7 @@ test_expect_success 'giving path to nested .git will not remove it' ' test_path_is_dir untracked/ ' -test_expect_success 'giving path to nested .git/ will remove contents' ' +test_expect_success 'giving path to nested .git/ will NOT remove contents' ' rm -fr repo untracked && mkdir repo untracked && ( @@ -589,7 +589,7 @@ test_expect_success 'giving path to nested .git/ will remove contents' ' ) && git clean -f -d repo/.git/ && test_path_is_dir repo/.git && - test_dir_is_empty repo/.git && + test_path_is_file repo/.git/HEAD && test_path_is_dir untracked/ ' @@ -671,7 +671,7 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files' test_path_is_missing foo/b/bb ' -test_expect_failure 'git clean -d skips nested repo containing ignored files' ' +test_expect_success 'git clean -d skips nested repo containing ignored files' ' test_when_finished "rm -rf nested-repo-with-ignored-file" && git init nested-repo-with-ignored-file && -- cgit v0.10.2-6-g49f6 From ca8b5390db928014af9aadc73c876198501bfb35 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:35:03 -0700 Subject: clean: rewrap overly long line Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/builtin/clean.c b/builtin/clean.c index 3a7a63a..6030842 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -158,7 +158,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag, *dir_gone = 1; - if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_nonbare_repository_dir(path)) { + if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && + is_nonbare_repository_dir(path)) { if (!quiet) { quote_path_relative(path->buf, prefix, "ed); printf(dry_run ? _(msg_would_skip_git_dir) : _(msg_skip_git_dir), -- cgit v0.10.2-6-g49f6 From 902b90cf42bc3c05f9fd124edac42ef1fc4ffafa Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 17 Sep 2019 09:35:04 -0700 Subject: clean: fix theoretical path corruption cmd_clean() had the following code structure: struct strbuf abs_path = STRBUF_INIT; for_each_string_list_item(item, &del_list) { strbuf_addstr(&abs_path, prefix); strbuf_addstr(&abs_path, item->string); PROCESS(&abs_path); strbuf_reset(&abs_path); } where I've elided a bunch of unnecessary details and PROCESS(&abs_path) represents a big chunk of code rather than an actual function call. One piece of PROCESS was: if (lstat(abs_path.buf, &st)) continue; which would cause the strbuf_reset() to be missed -- meaning that the next path to be handled would have two paths concatenated. This path used to use die_errno() instead of continue prior to commit 396049e5fb62 ("git-clean: refactor git-clean into two phases", 2013-06-25), but my understanding of how correct_untracked_entries() works is that it will prevent both dir/ and dir/file from being in the list to clean so this should be dead code and the die_errno() should be safe. But I hesitate to remove it since I am not certain. However, we can fix both this bug and possible similar future bugs by simply moving the strbuf_reset(&abs_path) to the beginning of the loop. It'll result in N calls to strbuf_reset() instead of N-1, but that's a small price to pay to avoid sneaky bugs like this. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/builtin/clean.c b/builtin/clean.c index 6030842..4cf2399 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -1018,6 +1018,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix) for_each_string_list_item(item, &del_list) { struct stat st; + strbuf_reset(&abs_path); if (prefix) strbuf_addstr(&abs_path, prefix); @@ -1051,7 +1052,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix) printf(dry_run ? _(msg_would_remove) : _(msg_remove), qname); } } - strbuf_reset(&abs_path); } strbuf_release(&abs_path); -- cgit v0.10.2-6-g49f6 From 69f272b922df153c86db520bf9b6018a9808c2a6 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 1 Oct 2019 11:55:24 -0700 Subject: dir: special case check for the possibility that pathspec is NULL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commits 404ebceda01c ("dir: also check directories for matching pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might match files under a dir, recurse into it", 2019-09-17) added calls to match_pathspec() and do_match_pathspec() passing along their pathspec parameter. Both match_pathspec() and do_match_pathspec() assume the pathspec argument they are given is non-NULL. It turns out that unpack-tree.c's verify_clean_subdirectory() calls read_directory() with pathspec == NULL, and it is possible on case insensitive filesystems for that NULL to make it to these new calls to match_pathspec() and do_match_pathspec(). Add appropriate checks on the NULLness of pathspec to avoid a segfault. In case the negation throws anyone off (one of the calls was to do_match_pathspec() while the other was to !match_pathspec(), yet no negation of the NULLness of pathspec is used), there are two ways to understand the differences: * The code already handled the pathspec == NULL cases before this series, and this series only tried to change behavior when there was a pathspec, thus we only want to go into the if-block if pathspec is non-NULL. * One of the calls is for whether to recurse into a subdirectory, the other is for after we've recursed into it for whether we want to remove the subdirectory itself (i.e. the subdirectory didn't match but something under it could have). That difference in situation leads to the slight differences in logic used (well, that and the slightly unusual fact that we don't want empty pathspecs to remove untracked directories by default). Denton found and analyzed one issue and provided the patch for the match_pathspec() call, SZEDER figured out why the issue only reproduced for some folks and not others and provided the testcase, and I looked through the remainder of the series and noted the do_match_pathspec() call that should have the same check. Co-authored-by: Denton Liu Co-authored-by: SZEDER Gábor Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano diff --git a/dir.c b/dir.c index 7ff7917..bd39b86 100644 --- a/dir.c +++ b/dir.c @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, ((state == path_untracked) && (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) && ((dir->flags & DIR_SHOW_IGNORED_TOO) || - do_match_pathspec(istate, pathspec, path.buf, path.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) { + (pathspec && + do_match_pathspec(istate, pathspec, path.buf, path.len, + baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, if (subdir_state > dir_state) dir_state = subdir_state; - if (!match_pathspec(istate, pathspec, path.buf, path.len, + if (pathspec && + !match_pathspec(istate, pathspec, path.buf, path.len, 0 /* prefix */, NULL, 0 /* do NOT special case dirs */)) state = path_none; diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh index 192c94e..608673f 100755 --- a/t/t0050-filesystem.sh +++ b/t/t0050-filesystem.sh @@ -131,4 +131,24 @@ $test_unicode 'merge (silent unicode normalization)' ' git merge topic ' +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' ' + git init repo && + ( + cd repo && + + >Gitweb && + git add Gitweb && + git commit -m "add Gitweb" && + + git checkout --orphan todo && + git reset --hard && + mkdir -p gitweb/subdir && + >gitweb/subdir/file && + git add gitweb && + git commit -m "add gitweb/subdir/file" && + + git checkout master + ) +' + test_done -- cgit v0.10.2-6-g49f6