summaryrefslogtreecommitdiff
path: root/builtin/clean.c
AgeCommit message (Collapse)Author
2024-03-04clean: further clean-up of implementation around "--force"Junio C Hamano
We clarified how "clean.requireForce" interacts with the "--dry-run" option in the previous commit, both in the implementation and in the documentation. Even when "git clean" (without other options) is required to be used with "--force" (i.e. either clean.requireForce is unset, or explicitly set to true) to protect end-users from casual invocation of the command by mistake, "--dry-run" does not require "--force" to be used, because it is already its own protection mechanism by being a no-op to the working tree files. The previous commit, however, missed another clean-up opportunity around the same area. Just like in the "--dry-run" mode, the command in the "--interactive" mode does not require "--force", either. This is because by going interactive and giving the end user one more chance to confirm, the mode itself is serving as its own protection mechanism. Let's take things one step further, and unify the code that defines interaction between "--force" and these two other options. Just like we added explanation for the reason why "--dry-run" does not honor "clean.requireForce", give an explanation for the reason why "--interactive" makes "clean.requireForce" to be ignored. Finally, add some tests to show the interaction between "--force" and "--interactive". We already have tests that show interaction between "--force" and "--dry-run", but didn't test "--interactive". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-03clean: improve -n and -f implementation and documentationSergey Organov
What -n actually does in addition to its documented behavior is ignoring of configuration variable clean.requireForce, that makes sense provided -n prevents files removal anyway. So, first, document this in the manual, and then modify implementation to make this more explicit in the code. Improved implementation also stops to share single internal variable 'force' between command-line -f option and configuration variable clean.requireForce, resulting in more clear logic. Two error messages with slightly different text depending on if clean.requireForce was explicitly set or not, are merged into a single one. The resulting error message now does not mention -n as well, as it neither matches intended clean.requireForce usage nor reflects clarified implementation. Documentation of clean.requireForce is changed accordingly. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-08clean: factorize incompatibility messageRené Scharfe
Use the standard parameterized message for reporting incompatible options to inform users that they can't use -x and -X together. This reduces the number of strings to translate and makes the UI slightly more consistent. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-06Merge branch 'gc/config-context'Junio C Hamano
Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config
2023-06-28config: add ctx arg to config_fn_tGlen Choo
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: inline git_color_default_configGlen Choo
git_color_default_config() is a shorthand for calling two other config callbacks. There are no other non-static functions that do this and it will complicate our refactoring of config_fn_t so inline it instead. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24hash-ll.h: split out of hash.h to remove dependency on repository.hElijah Newren
hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21abspath.h: move absolute path functions from cache.hElijah Newren
This is another step towards letting us remove the include of cache.h in strbuf.c. It does mean that we also need to add includes of abspath.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-22Merge branch 'ab/various-leak-fixes'Junio C Hamano
Leak fixes. * ab/various-leak-fixes: push: free_refs() the "local_refs" in set_refspecs() push: refactor refspec_append_mapped() for subsequent leak-fix receive-pack: release the linked "struct command *" list grep API: plug memory leaks by freeing "header_list" grep.c: refactor free_grep_patterns() builtin/merge.c: free "&buf" on "Your local changes..." error builtin/merge.c: use fixed strings, not "strbuf", fix leak show-branch: free() allocated "head" before return commit-graph: fix a parse_options_concat() leak http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}() http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main() worktree: fix a trivial leak in prune_worktrees() repack: fix leaks on error with "goto cleanup" name-rev: don't xstrdup() an already dup'd string various: add missing clear_pathspec(), fix leaks clone: use free() instead of UNLEAK() commit-graph: use free_commit_graph() instead of UNLEAK() bundle.c: don't leak the "args" in the "struct child_process" tests: mark tests as passing with SANITIZE=leak
2023-02-06various: add missing clear_pathspec(), fix leaksÆvar Arnfjörð Bjarmason
Fix memory leaks resulting from a missing clear_pathspec(). - archive.c: Plug a leak in the "struct archiver_args", and clear_pathspec() the "pathspec" member that the "parse_pathspec_arg()" call in this function populates. - builtin/clean.c: Fix a memory leak that's been with us since 893d839970c (clean: convert to use parse_pathspec, 2013-07-14). - builtin/reset.c: Add clear_pathspec() calls to cmd_reset(), including to the codepaths where we'd return early. - builtin/stash.c: Call clear_pathspec() on the pathspec initialized in push_stash(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06docs & comments: replace mentions of "git-add--interactive.perl"Ævar Arnfjörð Bjarmason
Now that we've removed "git-add--interactive.perl" let's replace mentions of it with "add-interactive.c". In the case of the "git add" documentation we were using it as an example filename, so the mention wasn't wrong, but using a dead file is slightly confusing. The "borrowed" comment here likewise isn't wrong, but let's mention the successor file instead. In the case of pathspec.c the implied TODO item should refer to the current code (and the comment may not even be current, I didn't check). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21cocci: apply "pending" index-compatibility to some "builtin/*.c"Ævar Arnfjörð Bjarmason
Apply "index-compatibility.pending.cocci" rule to "builtin/*", but exclude those where we conflict with in-flight changes. As a result some of them end up using only "the_index", so let's have them use the more narrow "USE_THE_INDEX_VARIABLE" rather than "USE_THE_INDEX_COMPATIBILITY_MACROS". Manual changes not made by coccinelle, that were squashed in: * Whitespace-wrap argument lists for repo_hold_locked_index(), repo_read_index_preload() and repo_refresh_and_write_index(), in cases where the line became too long after the transformation. * Change "refresh_cache()" to "refresh_index()" in a comment in "builtin/update-index.c". * For those whose call was followed by perror("<macro-name>"), change it to perror("<function-name>"), referring to the new function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21cocci & cache.h: remove rarely used "the_index" compat macrosÆvar Arnfjörð Bjarmason
Since 4aab5b46f44 (Make read-cache.c "the_index" free., 2007-04-01) we've been undergoing a slow migration away from these macros, but haven't made much progress since f8adbec9fea (cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch, 2019-01-24). Let's move forward a bit by changing the users of those macros that are rare enough that we can convert them in one go, and then remove the compatibility shim. The only manual change to the C code here is to "cache.h", the rest is all the result of applying the new "index-compatibility.cocci". Even though it's a one-off, let's keep the coccinelle rules for now. We'll extend them in subsequent commits, and this will help anything that's in-flight or out-of-tree to migrate. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-13doc txt & -h consistency: use "[<label>...]" for "zero or more"Ævar Arnfjörð Bjarmason
Correct uses of "<label>..." where we really meant to say "[<label>...]", i.e. the command in question taken an optional set of "<label>". As the CodingGuidelines notes "[o]ptional parts [should be] enclosed in square brackets". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-13doc txt & -h consistency: fix mismatching labelsÆvar Arnfjörð Bjarmason
Fix various inconsistencies between command SYNOPSIS and the corresponding -h output where our translatable labels didn't match up. In some cases we need to adjust the prose that follows the SYNOPSIS accordingly, as it refers back to the changed label. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-18Merge branch 'vd/sparse-clean-etc'Junio C Hamano
"git update-index", "git checkout-index", and "git clean" are taught to work better with the sparse checkout feature. * vd/sparse-clean-etc: update-index: reduce scope of index expansion in do_reupdate update-index: integrate with sparse index update-index: add tests for sparse-checkout compatibility checkout-index: integrate with sparse index checkout-index: add --ignore-skip-worktree-bits option checkout-index: expand sparse checkout compatibility tests clean: integrate with sparse index reset: reorder wildcard pathspec conditions reset: fix validation in sparse index test
2022-01-13clean: integrate with sparse indexVictoria Dye
Remove full index requirement for `git clean` and test to ensure the index is not expanded in `git clean`. Add to existing test for `git clean` to verify cleanup of untracked files in sparse directories is consistent between sparse index and non-sparse index checkouts. Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09clean: do not attempt to remove startup_info->original_cwdElijah Newren
Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-01dir.[ch]: replace dir_init() with DIR_INITÆvar Arnfjörð Bjarmason
Remove the dir_init() function and replace it with a DIR_INIT macro. In many cases in the codebase we need to initialize things with a function for good reasons, e.g. needing to call another function on initialization. The "dir_init()" function was not one such case, and could trivially be replaced with a more idiomatic macro initialization pattern. The only place where we made use of its use of memset() was in dir_clear() itself, which resets the contents of an an existing struct pointer. Let's use the new "memcpy() a 'blank' struct on the stack" idiom to do that reset. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19Merge branch 'en/dir-traversal'Junio C Hamano
"git clean" and "git ls-files -i" had confusion around working on or showing ignored paths inside an ignored directory, which has been corrected. * en/dir-traversal: dir: introduce readdir_skip_dot_and_dotdot() helper dir: update stale description of treat_directory() dir: traverse into untracked directories if they may have ignored subfiles dir: avoid unnecessary traversal into ignored directory t3001, t7300: add testcase showcasing missed directory traversal t7300: add testcase showing unnecessary traversal into ignored directory ls-files: error out on -i unless -o or -c are specified dir: report number of visited directories and paths with trace2 dir: convert trace calls to trace2 equivalents
2021-05-13Merge branch 'mt/clean-clean'Junio C Hamano
Code clean-up. * mt/clean-clean: clean: remove unnecessary variable
2021-05-12dir: introduce readdir_skip_dot_and_dotdot() helperElijah Newren
Many places in the code were doing while ((d = readdir(dir)) != NULL) { if (is_dot_or_dotdot(d->d_name)) continue; ...process d... } Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner: while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) { ...process d... } This helper particularly simplifies checks for empty directories. Also use this helper in read_cached_dir() so that our statistics are consistent across platforms. (In other words, read_cached_dir() should have been using is_dot_or_dotdot() and skipping such entries, but did not and left it to treat_path() to detect and mark such entries as path_none.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-06clean: remove unnecessary variableMatheus Tavares
The variable `matches` used to hold the return of a `dir_path_match()` call that was removed in 95c11ecc73 ("Fix error-prone fill_directory() API; make it only return matches", 2020-04-01). Now `matches` will always hold 0, which is the value it's initialized with; and the condition `matches != MATCHED_EXACTLY` will always evaluate to true. So let's remove this unnecessary variable. Interestingly, it seems that `matches != MATCHED_EXACTLY` was already unnecessary before 95c11ecc73. That's because `remove_directories` is always set to 1 when we have pathspecs; So, in the condition `!remove_directories && matches != MATCHED_EXACTLY`, we would either: - have pathspecs (or have been given `-d`) and ignore `matches` because `remove_directories` is 1; or - not have pathspecs (nor `-d`) and end up just checking that `0 != MATCHED_EXACTLY`, as `matches` would never get reassigned after its zero initialization (because there is no pathspec to match). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-14use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-10quote_path: give flags parameter to quote_path()Junio C Hamano
The quote_path() function computes a path (relative to its base directory) and c-quotes the result if necessary. Teach it to take a flags parameter to allow its behaviour to be enriched later. No behaviour change intended. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-10quote_path: rename quote_path_relative() to quote_path()Junio C Hamano
There is no quote_path_absolute() or anything that causes confusion, and one of the two large consumers already rename the long name locally with a preprocessor macro. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19dir: fix problematic API to avoid memory leaksElijah Newren
The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19dir: make clear_directory() free all relevant memoryElijah Newren
The calling convention for the dir API is supposed to end with a call to clear_directory() to free up no longer needed memory. However, clear_directory() didn't free dir->entries or dir->ignored. I believe this was an oversight, but a number of callers noticed memory leaks and started free'ing these. Unfortunately, they did so somewhat haphazardly (sometimes freeing the entries in the arrays, and sometimes only free'ing the arrays themselves). This suggests the callers weren't trying to make sure any possible memory used might be free'd, but just the memory they noticed their usecase definitely had allocated. Fix this mess by moving all the duplicated free'ing logic into clear_directory(). End by resetting dir to a pristine state so it could be reused if desired. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-13clean: optimize and document cases where we recurse into subdirectoriesElijah Newren
Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths", 2017-05-23) added the following code block (among others) to git-clean: if (remove_directories) dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS; The reason for these flags is well documented in the commit message, but isn't obvious just from looking at the code. Add some explanations to the code to make it clearer. Further, it appears git-2.26 did not correctly handle this combination of flags from git-clean. With both these flags and without DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into all untracked AND ignored directories. git-2.26.0 clearly was not doing that. I don't know the full reasons for that or whether git < 2.27.0 had additional unknown bugs because of that misbehavior, because I don't feel it's worth digging into. As per the huge changes and craziness documented in commit 8d92fb2927 ("dir: replace exponential algorithm with a linear one", 2020-04-01), the old algorithm was a mess and was thrown out. What I can say is that git-2.27.0 correctly recurses into untracked AND ignored directories with that combination. However, in clean's case we don't need to recurse into ignored directories; that is just a waste of time. Thus, when git-2.27.0 started correctly handling those flags, we got a performance regression report. Rather than relying on other bugs in fill_directory()'s former logic to provide the behavior of skipping ignored directories, make use of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in commit eec0f7f2b7 ("status: add option to show ignored files differently", 2017-10-30) for this purpose. Reported-by: Brian Malehorn <bmalehorn@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-13clean: consolidate handling of ignored parametersElijah Newren
I spent a long time trying to figure out how and whether the code worked with different values of ignore, ignore_only, and remove_directories. After lots of time setting up lots of testcases, sifting through lots of print statements, and walking through the debugger, I finally realized that one piece of code related to how it was all setup was found in clean.c rather than dir.c. Make a change that would have made it easier for me to do the extra testing by putting this handling in one spot. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-13dir, clean: avoid disallowed behaviorElijah Newren
dir.h documented quite clearly that DIR_SHOW_IGNORED and DIR_SHOW_IGNORED_TOO are mutually exclusive, with a big comment to this effect by the definition of both enum values. However, a command like git clean -fx $DIR would set both values for dir.flags. I _think_ it happened to work because: * As dir.h points out, DIR_KEEP_UNTRACKED_CONTENTS only takes effect if DIR_SHOW_IGNORED_TOO is set. * As coded, I believe DIR_SHOW_IGNORED would just happen to take precedence over DIR_SHOW_IGNORED_TOO in the code as currently constructed. Which is a long way of saying "we just got lucky". Fix clean.c to avoid setting these mutually exclusive values at the same time, and add a check to dir.c that will throw a BUG() to prevent anyone else from making this mistake. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-05Merge branch 'dl/opt-callback-cleanup'Junio C Hamano
Code cleanup. * dl/opt-callback-cleanup: Use OPT_CALLBACK and OPT_CALLBACK_F
2020-04-29Merge branch 'en/fill-directory-exponential'Junio C Hamano
The directory traversal code had redundant recursive calls which made its performance characteristics exponential with respect to the depth of the tree, which was corrected. * en/fill-directory-exponential: completion: fix 'git add' on paths under an untracked directory Fix error-prone fill_directory() API; make it only return matches dir: replace double pathspec matching with single in treat_directory() dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() dir: replace exponential algorithm with a linear one dir: refactor treat_directory to clarify control flow dir: fix confusion based on variable tense dir: fix broken comment dir: consolidate treat_path() and treat_one_path() dir: fix simple typo in comment t3000: add more testcases testing a variety of ls-files issues t7063: more thorough status checking
2020-04-28Use OPT_CALLBACK and OPT_CALLBACK_FDenton Liu
In the codebase, there are many options which use OPTION_CALLBACK in a plain ol' struct definition. However, we have the OPT_CALLBACK and OPT_CALLBACK_F macros which are meant to abstract these plain struct definitions away. These macros are useful as they semantically signal to developers that these are just normal callback option with nothing fancy happening. Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or OPT_CALLBACK_F where applicable. The heavy lifting was done using the following (disgusting) shell script: #!/bin/sh do_replacement () { tr '\n' '\r' | sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\s*0,\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' | sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' | tr '\r' '\n' } for f in $(git ls-files \*.c) do do_replacement <"$f" >"$f.tmp" mv "$f.tmp" "$f" done The result was manually inspected and then reformatted to match the style of the surrounding code. Finally, using `git grep OPTION_CALLBACK \*.c`, leftover results which were not handled by the script were manually transformed. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10interactive: refactor code asking the user for interactive inputJohannes Schindelin
There are quite a few code locations (e.g. `git clean --interactive`) where Git asks the user for an answer. In preparation for fixing a bug shared by all of them, and also to DRY up the code, let's refactor it. Please note that most of these callers trimmed white-space both at the beginning and at the end of the answer, instead of trimming only the end (as the caller in `add-patch.c` does). Therefore, technically speaking, we change behavior in this patch. At the same time, it can be argued that this is actually a bug fix. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-01Fix error-prone fill_directory() API; make it only return matchesElijah Newren
Traditionally, the expected calling convention for the dir.c API was: fill_directory(&dir, ..., pathspec) foreach entry in dir->entries: if (dir_path_match(entry, pathspec)) process_or_display(entry) This may have made sense once upon a time, because the fill_directory() call could use cheap checks to avoid doing full pathspec matching, and an external caller may have wanted to do other post-processing of the results anyway. However: * this structure makes it easy for users of the API to get it wrong * this structure actually makes it harder to understand fill_directory() and the functions it uses internally. It has tripped me up several times while trying to fix bugs and restructure things. * relying on post-filtering was already found to produce wrong results; pathspec matching had to be added internally for multiple cases in order to get the right results (see commits 404ebceda01c (dir: also check directories for matching pathspecs, 2019-09-17) and 89a1f4aaf765 (dir: if our pathspec might match files under a dir, recurse into it, 2019-09-17)) * it's bad for performance: fill_directory() already has to do lots of checks and knows the subset of cases where it still needs to do more checks. Forcing external callers to do full pathspec matching means they must re-check _every_ path. So, add the pathspec matching within the fill_directory() internals, and remove it from external callers. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-11Merge branch 'en/clean-nested-with-ignored'Junio C Hamano
"git clean" fixes. * en/clean-nested-with-ignored: dir: special case check for the possibility that pathspec is NULL clean: fix theoretical path corruption clean: rewrap overly long line clean: avoid removing untracked files in a nested git repository clean: disambiguate the definition of -d git-clean.txt: do not claim we will delete files with -n/--dry-run dir: add commentary explaining match_pathspec_item's return value dir: if our pathspec might match files under a dir, recurse into it dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case dir: also check directories for matching pathspecs dir: fix off-by-one error in match_pathspec_item dir: fix typo in comment t7300: add testcases showing failure to clean specified pathspecs
2019-09-30Merge branch 'ds/include-exclude'Junio C Hamano
The internal code originally invented for ".gitignore" processing got reshuffled and renamed to make it less tied to "excluding" and stress more that it is about "matching", as it has been reused for things like sparse checkout specification that want to check if a path is "included". * ds/include-exclude: unpack-trees: rename 'is_excluded_from_list()' treewide: rename 'exclude' methods to 'pattern' treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_' treewide: rename 'struct exclude_list' to 'struct pattern_list' treewide: rename 'struct exclude' to 'struct path_pattern'
2019-09-17clean: fix theoretical path corruptionElijah Newren
cmd_clean() had the following code structure: struct strbuf abs_path = STRBUF_INIT; for_each_string_list_item(item, &del_list) { strbuf_addstr(&abs_path, prefix); strbuf_addstr(&abs_path, item->string); PROCESS(&abs_path); strbuf_reset(&abs_path); } where I've elided a bunch of unnecessary details and PROCESS(&abs_path) represents a big chunk of code rather than an actual function call. One piece of PROCESS was: if (lstat(abs_path.buf, &st)) continue; which would cause the strbuf_reset() to be missed -- meaning that the next path to be handled would have two paths concatenated. This path used to use die_errno() instead of continue prior to commit 396049e5fb62 ("git-clean: refactor git-clean into two phases", 2013-06-25), but my understanding of how correct_untracked_entries() works is that it will prevent both dir/ and dir/file from being in the list to clean so this should be dead code and the die_errno() should be safe. But I hesitate to remove it since I am not certain. However, we can fix both this bug and possible similar future bugs by simply moving the strbuf_reset(&abs_path) to the beginning of the loop. It'll result in N calls to strbuf_reset() instead of N-1, but that's a small price to pay to avoid sneaky bugs like this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17clean: rewrap overly long lineElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17clean: avoid removing untracked files in a nested git repositoryElijah Newren
Users expect files in a nested git repository to be left alone unless sufficiently forced (with two -f's). Unfortunately, in certain circumstances, git would delete both tracked (and possibly dirty) files and untracked files within a nested repository. To explain how this happens, let's contrast a couple cases. First, take the following example setup (which assumes we are already within a git repo): git init nested cd nested >tracked git add tracked git commit -m init >untracked cd .. In this setup, everything works as expected; running 'git clean -fd' will result in fill_directory() returning the following paths: nested/ nested/tracked nested/untracked and then correct_untracked_entries() would notice this can be compressed to nested/ and then since "nested/" is a directory, we would call remove_dirs("nested/", ...), which would check is_nonbare_repository_dir() and then decide to skip it. However, if someone also creates an ignored file: >nested/ignored then running 'git clean -fd' would result in fill_directory() returning the same paths: nested/ nested/tracked nested/untracked but correct_untracked_entries() will notice that we had ignored entries under nested/ and thus simplify this list to nested/tracked nested/untracked Since these are not directories, we do not call remove_dirs() which was the only place that had the is_nonbare_repository_dir() safety check -- resulting in us deleting both the untracked file and the tracked (and possibly dirty) file. One possible fix for this issue would be walking the parent directories of each path and checking if they represent nonbare repositories, but that would be wasteful. Even if we added caching of some sort, it's still a waste because we should have been able to check that "nested/" represented a nonbare repository before even descending into it in the first place. Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use it to prevent fill_directory() and friends from descending into nested git repos. With this change, we also modify two regression tests added in commit 91479b9c72f1 ("t7300: add tests to document behavior of clean and nested git", 2015-06-15). That commit, nor its series, nor the six previous iterations of that series on the mailing list discussed why those tests coded the expectation they did. In fact, it appears their purpose was simply to test _existing_ behavior to make sure that the performance changes didn't change the behavior. However, these two tests directly contradicted the manpage's claims that two -f's were required to delete files/directories under a nested git repository. While one could argue that the user gave an explicit path which matched files/directories that were within a nested repository, there's a slippery slope that becomes very difficult for users to understand once you go down that route (e.g. what if they specified "git clean -f -d '*.c'"?) It would also be hard to explain what the exact behavior was; avoid such problems by making it really simple. Also, clean up some grammar errors describing this functionality in the git-clean manpage. Finally, there are still a couple bugs with -ffd not cleaning out enough (e.g. missing the nested .git) and with -ffdX possibly cleaning out the wrong files (paying attention to outer .gitignore instead of inner). This patch does not address these cases at all (and does not change the behavior relative to those flags), it only fixes the handling when given a single -f. See https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more discussion of the -ffd[X?] bugs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17clean: disambiguate the definition of -dElijah Newren
The -d flag pre-dated git-clean's ability to have paths specified. As such, the default for git-clean was to only remove untracked files in the current directory, and -d existed to allow it to recurse into subdirectories. The interaction of paths and the -d option appears to not have been carefully considered, as evidenced by numerous bugs and a dearth of tests covering such pairings in the testsuite. The definition turns out to be important, so let's look at some of the various ways one could interpret the -d option: A) Without -d, only look in subdirectories which contain tracked files under them; with -d, also look in subdirectories which are untracked for files to clean. B) Without specified paths from the user for us to delete, we need to have some kind of default, so...without -d, only look in subdirectories which contain tracked files under them; with -d, also look in subdirectories which are untracked for files to clean. The important distinction here is that choice B says that the presence or absence of '-d' is irrelevant if paths are specified. The logic behind option B is that if a user explicitly asked us to clean a specified pathspec, then we should clean anything that matches that pathspec. Some examples may clarify. Should git clean -f untracked_dir/file remove untracked_dir/file or not? It seems crazy not to, but a strict reading of option A says it shouldn't be removed. How about git clean -f untracked_dir/file1 tracked_dir/file2 or git clean -f untracked_dir_1/file1 untracked_dir_2/file2 ? Should it remove either or both of these files? Should it require multiple runs to remove both the files listed? (If this sounds like a crazy question to even ask, see the commit message of "t7300: Add some testcases showing failure to clean specified pathspecs" added earlier in this patch series.) What if -ffd were used instead of -f -- should that allow these to be removed? Should it take multiple invocations with -ffd? What if a glob (such as '*tracked*') were used instead of spelling out the directory names? What if the filenames involved globs, such as git clean -f '*.o' or git clean -f '*/*.o' ? The current documentation actually suggests a definition that is slightly different than choice A, and the implementation prior to this series provided something radically different than either choices A or B. (The implementation, though, was clearly just buggy). There may be other choices as well. However, for almost any given choice of definition for -d that I can think of, some of the examples above will appear buggy to the user. The only case that doesn't have negative surprises is choice B: treat a user-specified path as a request to clean all untracked files which match that path specification, including recursing into any untracked directories. Change the documentation and basic implementation to use this definition. There were two regression tests that indirectly depended on the current implementation, but neither was about subdirectory handling. These two tests were introduced in commit 5b7570cfb41c ("git-clean: add tests for relative path", 2008-03-07) which was solely created to add coverage for the changes in commit fb328947c8e ("git-clean: correct printing relative path", 2008-03-07). Both tests specified a directory that happened to have an untracked subdirectory, but both were only checking that the resulting printout of a file that was removed was shown with a relative path. Update these tests appropriately. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05treewide: rename 'exclude' methods to 'pattern'Derrick Stolee
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames several methods defined in dir.h to make more sense with the renamed 'struct exclude_list' to 'struct pattern_list' and 'struct exclude' to 'struct path_pattern': * last_exclude_matching() -> last_matching_pattern() * parse_exclude() -> parse_path_pattern() In addition, the word 'exclude' was replaced with 'pattern' in the methods below: * add_exclude_list() * add_excludes_from_file_to_list() * add_excludes_from_file() * add_excludes_from_blob_to_list() * add_exclude() * clear_exclude_list() A few methods with the word "exclude" remain. These will be handled seperately. In particular, the method "is_excluded()" is concretely about the .gitignore file relative to a specific directory. This is the important boundary between library and consumer: is_excluded() cares about .gitignore, but is_excluded() calls last_matching_pattern() to make that decision. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05treewide: rename 'struct exclude_list' to 'struct pattern_list'Derrick Stolee
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames 'struct exclude_list' to 'struct pattern_list' and renames several variables called 'el' to 'pl'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-19clean: show an error message when the path is too longJohannes Schindelin
When `lstat()` failed, `git clean` would abort without an error message, leaving the user quite puzzled. In particular on Windows, where the default maximum path length is quite small (yet there are ways to circumvent that limit in many cases), it is very important that users be given an indication why their command failed because of too long paths when it did. This test case makes sure that a warning is issued that would have helped the user who reported this issue: https://github.com/git-for-windows/git/issues/521 Note that we temporarily set `core.longpaths = false` in the regression test; this ensures forward-compatibility with the `core.longpaths` feature that has not yet been upstreamed from Git for Windows. Helped-by: René Scharfe <l.s.r@web.de> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>