path: root/dir.c
Age | Commit message | Author
2013-09-17 | dir: revert work-around for retired dangerous behavior | Eric Sunshine
directory_exists_in_index_icase() dangerously assumed that it could access one character beyond the end of its directory argument, and that that character would unconditionally be '/'. 2eac2a4c (ls-files -k: a directory only can be killed if the index has a non-directory, 2013-08-15) added a caller which did not respect this undocumented assumption, and 680be044 (dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage, 2013-08-23) added a work-around which temporarily appends a '/' before invoking directory_exists_in_index_icase(). Since the dangerous behavior of directory_exists_in_index_icase() has been eliminated, the work-around is now redundant, so retire it (but not the tests added by the same commit). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 | name-hash: stop storing trailing '/' on paths in index_state.dir_hash | Eric Sunshine
When 5102c617 (Add case insensitivity support for directories when using git status, 2010-10-03) added directories to the name-hash there was only a single hash table in which both real cache entries and leading directory prefixes were registered. To distinguish between the two types of entries, directories were stored with a trailing '/'. 2092678c (name-hash.c: fix endless loop with core.ignorecase=true, 2013-02-28), however, moved directories to a separate hash table (index_state.dir_hash) but retained the (now) redundant trailing '/', thus callers continue to bear the burden of ensuring the slash's presence before searching the index for a directory. Eliminate this redundancy by storing paths in the dir-hash without the trailing '/'. An important benefit of this change is that it eliminates undocumented and dangerous behavior of dir.c:directory_exists_in_index_icase() in which it assumes not only that it can validly access one character beyond the end of its incoming directory argument, but also that that character will unconditionally be a '/'. This perilous behavior was "tolerated" because the string passed in by its lone caller always had a '/' in that position, however, things broke [1] when 2eac2a4c (ls-files -k: a directory only can be killed if the index has a non-directory, 2013-08-15) added a new caller which failed to respect the undocumented assumption. [1]: http://thread.gmane.org/gmane.comp.version-control.git/232727 Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
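The hazard described above can be illustrated with a minimal sketch. It is not the code from name-hash.c or dir.c: `dir_table` and `dir_exists_sketch()` are hypothetical names, and a linear scan stands in for the real hash table. Directory names are stored without a trailing '/', and a lookup compares exactly `len` bytes, so the byte at `name[len]` is never read.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the dir-hash: names are stored WITHOUT a trailing '/'. */
static const char *const dir_table[] = { "src", "src/lib", "docs" };

/*
 * Return 1 if the first `len` bytes of `name` name a stored directory.
 * Only name[0..len-1] is read; name[len] is never examined, so the
 * caller does not have to guarantee that a '/' follows the directory.
 */
static int dir_exists_sketch(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(dir_table) / sizeof(dir_table[0]); i++) {
		if (strlen(dir_table[i]) == len &&
		    !memcmp(dir_table[i], name, len))
			return 1;
	}
	return 0;
}
```

With this shape a caller can pass a pointer into a longer path, such as "src/lib/x.c" with a length of 7, without first appending or checking for a slash.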
2013-09-17 | employ new explicit "exists in index?" API | Eric Sunshine
Each caller of index_name_exists() knows whether it is looking for a directory or a file, and can avoid the unnecessary indirection of index_name_exists() by instead calling index_dir_exists() or index_file_exists() directly. Invoking the appropriate search function explicitly will allow a subsequent patch to relieve callers of the artificial burden of having to add a trailing '/' to the pathname given to index_dir_exists(). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-11 | Merge branch 'jc/ls-files-killed-optim' | Junio C Hamano
"git ls-files -k" needs to crawl only the part of the working tree that may overlap the paths in the index to find killed files, but shared code with the logic to find all the untracked files, which made it unnecessarily inefficient.

* jc/ls-files-killed-optim:
  dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
  t3010: update to demonstrate "ls-files -k" optimization pitfalls
  ls-files -k: a directory only can be killed if the index has a non-directory
  dir.c: use the cache_* macro to access the current index
2013-09-09 | Merge branch 'jl/submodule-mv' | Junio C Hamano
"git mv A B" when moving a submodule A does "the right thing", including relocating its working tree and adjusting the paths in the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-08-23 | dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage | Eric Sunshine
directory_exists_in_index() takes pathname and its length, but its helper function directory_exists_in_index_icase() reads one byte beyond the end of the pathname and expects there to be a '/'. This needs to be fixed, as that one-byte-beyond-the-end location may not even be readable, possibly by not registering directories to name hashes with trailing slashes. In the meantime, update the new caller added recently to treat_one_path() to make sure that the path buffer it gives the function is one byte longer than the path it is asking the function about by appending a slash to it. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-15 | ls-files -k: a directory only can be killed if the index has a non-directory | Junio C Hamano
"ls-files -o" and "ls-files -k" both traverse the working tree down to find either all untracked paths or those that will be "killed" (removed from the working tree to make room) when the paths recorded in the index are checked out. It is necessary to traverse the working tree fully when enumerating all the "other" paths, but when we are only interested in "killed" paths, we can take advantage of the fact that paths that do not overlap with entries in the index can never be killed.

The treat_one_path() helper function, which is called during the recursive traversal, is the ideal place to implement an optimization.

When we are looking at a directory P in the working tree, there are three cases:

 (1) P exists in the index. Everything inside the directory P in the working tree needs to go when P is checked out from the index.

 (2) P does not exist in the index, but there is P/Q in the index. We know P will stay a directory when we check out the contents of the index, but we do not know yet if there is a directory P/Q in the working tree to be killed, so we need to recurse.

 (3) P does not exist in the index, and there is no P/Q in the index to require P to be a directory, either. Only in this case, we know that everything inside P will not be killed without recursing.

Note that this helper is called by treat_leading_path() that decides if we need to traverse only subdirectories of a single common leading directory, which is essential for this optimization to be correct. This caller checks each level of the leading path component from shallower directory to deeper ones, and that is what allows us to only check if the path appears in the index. If the call to treat_one_path() weren't there, given a path P/Q/R, the real traversal may start from directory P/Q/R, even when the index records P as a regular file, and we would end up having to check if any leading subpath in P/Q/R, e.g. P, appears in the index.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-15 | dir.c: use the cache_* macro to access the current index | Junio C Hamano
These codepaths always start from the_index and use index_* functions, but there is no reason to do so. Use the compatibility cache_* macro to access the current in-core index like everybody else. While at it, fix typo in the comment for a function to check if a path within a directory appears in the index. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-22 | Merge branch 'nd/const-struct-cache-entry' | Junio C Hamano
* nd/const-struct-cache-entry:
  Convert "struct cache_entry *" to "const ..." wherever possible
2013-07-15 | parse_pathspec: accept :(icase)path syntax | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | pathspec: support :(glob) syntax | Nguyễn Thái Ngọc Duy
:(glob)path differs from plain pathspec in that it uses wildmatch with WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The difference lies in how '*' (and '**') is processed.

With the introduction of :(glob) and :(literal) and their global options --[no]glob-pathspecs, the user can:

- make everything literal by default via --noglob-pathspecs
  (--literal-pathspecs cannot be used for this purpose as it disables _all_ pathspec magic)
- individually turn on globbing with :(glob)
- make everything globbing by default via --glob-pathspecs
- individually turn off globbing with :(literal)

The implication behind this is that there is no way to gain the default matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get new globbing or literal. The old fnmatch behavior is considered deprecated and its use is discouraged.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
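The difference in how '*' is processed can be seen with POSIX fnmatch() alone. This is only an approximation: git's wildmatch is not used here, '**' is not covered, and the assumption is that FNM_PATHNAME treats '*' the same way WM_PATHNAME does. The two helper names are hypothetical.

```c
#include <fnmatch.h>

/* Plain-pathspec style: without FNM_PATHNAME, '*' may match across '/'. */
static int match_without_pathname(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, 0) == 0;
}

/* Glob style (approximation): with FNM_PATHNAME, '*' stops at '/'. */
static int match_with_pathname(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

For example, the pattern "*.c" matches "dir/a.c" through the first helper but not through the second, which needs "dir/*.c" to reach into the subdirectory.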
2013-07-15 | pathspec: support :(literal) syntax for noglob pathspec | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | kill limit_pathspec_to_literal() as it's only used by parse_pathspec() | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | rename field "raw" to "_raw" in struct pathspec | Nguyễn Thái Ngọc Duy
This patch is essentially a no-op. It helps catch new uses of this field, though. This field is introduced as an intermediate step for the pathspec conversion and will be removed eventually. At this stage no more access sites should be introduced. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | remove match_pathspec() in favor of match_pathspec_depth() | Nguyễn Thái Ngọc Duy
match_pathspec_depth was created to replace match_pathspec (see 61cf282 (pathspec: add match_pathspec_depth() - 2010-12-15)). It took more than two years, but the replacement finally happens :-) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | remove init_pathspec() in favor of parse_pathspec() | Nguyễn Thái Ngọc Duy
While at it, move free_pathspec() to pathspec.c. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | convert common_prefix() to use struct pathspec | Nguyễn Thái Ngọc Duy
The code now takes advantage of nowildcard_len field. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | convert {read,fill}_directory to take struct pathspec | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | guard against new pathspec magic in pathspec matching code | Nguyễn Thái Ngọc Duy
GUARD_PATHSPEC() marks pathspec-sensitive code, basically all those that touch anything in 'struct pathspec' except fields "nr" and "original". GUARD_PATHSPEC() is not supposed to fail. It's mainly to help the designers catch unsupported codepaths. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
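One plausible shape for such a guard is sketched below. The commit message does not spell out the mechanism, so this assumes the check compares the magic bits set on a pathspec against a mask of bits the surrounding code supports. The bit names and `guard_pathspec_sketch()` are hypothetical, and the sketch returns a status where a real guard would more likely abort.

```c
/* Hypothetical magic bits; the real definitions in git may differ. */
#define SKETCH_MAGIC_LITERAL (1u << 0)
#define SKETCH_MAGIC_GLOB    (1u << 1)
#define SKETCH_MAGIC_ICASE   (1u << 2)

/*
 * Return 1 when every magic bit set on the pathspec is one that the
 * calling code path declared support for, 0 otherwise.
 */
static int guard_pathspec_sketch(unsigned int magic, unsigned int supported)
{
	return (magic & ~supported) == 0;
}
```

A code path that has only been audited for :(glob) and :(literal) would pass `SKETCH_MAGIC_GLOB | SKETCH_MAGIC_LITERAL` as the mask, and a pathspec carrying an unaudited bit would trip the guard.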
2013-07-15 | parse_pathspec: add special flag for max_depth feature | Nguyễn Thái Ngọc Duy
match_pathspec_depth() and tree_entry_interesting() check max_depth field in order to support "git grep --max-depth". The feature activation is tied to "recursive" field, which led to some unwanted activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec matching in unpack_trees - 2012-01-15). This patch decouples the activation from "recursive" field, puts it in "magic" field instead. This makes sure that only "git grep" can activate this feature. And because parse_pathspec knows when the feature is not used, it does not need to sort pathspec (required for max_depth to work correctly). A small win for non-grep cases. Even though a new magic flag is introduced, no magic syntax is. The magic can be only enabled by parse_pathspec() caller. We might someday want to support ":(maxdepth:10)src." It all depends on actual use cases. max_depth feature cannot be enabled via init_pathspec() anymore. But that's ok because init_pathspec() is on its way to /dev/null. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | parse_pathspec: save original pathspec for reporting | Nguyễn Thái Ngọc Duy
We usually use pathspec_item's match field for pathspec error reporting. However "match" (or "raw") does not show the magic part, which will play a more important role later on. Preserve exact user input for reporting. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | add parse_pathspec() that converts cmdline args to struct pathspec | Nguyễn Thái Ngọc Duy
Currently to fill a struct pathspec, we do:

    const char **paths;
    paths = get_pathspec(prefix, argv);
    ...
    init_pathspec(&pathspec, paths);

"paths" can only carry bare strings, which loses information from command line arguments such as pathspec magic or the prefix part's length for each argument.

parse_pathspec() is introduced to combine the two calls into one. The plan is to gradually replace all get_pathspec() and init_pathspec() with parse_pathspec(). get_pathspec() now becomes a thin wrapper of parse_pathspec().

parse_pathspec() allows the caller to reject the pathspec magics that it does not support. When a new pathspec magic is introduced, we can enable it per command after making sure that all underlying code has no problem with the new magic.

"flags" parameter is currently unused. But it would allow callers to pass certain instructions to parse_pathspec, for example forcing literal pathspec when no magic is used.

With the introduction of parse_pathspec, there are now two functions that can initialize struct pathspec: init_pathspec and parse_pathspec. Any semantic changes in struct pathspec must be reflected in both functions. init_pathspec() will be phased out in favor of parse_pathspec().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 | move struct pathspec and related functions to pathspec.[ch] | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 | Convert "struct cache_entry *" to "const ..." wherever possible | Nguyễn Thái Ngọc Duy
I attempted to make index_state->cache[] a "const struct cache_entry **" to find out how existing entries in index are modified and where. The question I have is what do we do if we really need to keep track of on-disk changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via fill_stat_cache_info(). This causes "non-const struct cache_entry *" in builtin/apply.c, builtin/checkout-index.c and builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set CE_UPDATE

Of these, write_entry() and its call sites are probably most interesting because it modifies on-disk info. But this is stat info and can be retrieved via refresh, at least for porcelain commands. Others just use ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a flag in index modification functions exposed by read-cache.c. Except unpack-trees, the rest of the code base does not do anything funny behind read-cache's back.

The actual patch is less valuable than the summary above. But if anyone wants to re-identify the above sites, applying this patch, then this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-01 | treat_directory(): do not declare submodules to be untracked | Junio C Hamano
When the working tree walker encounters a directory, it asks the function treat_directory() if it should descend into it, show it as an untracked directory, or do something else. When the directory is the top of the submodule working tree, we used to say "That is an untracked directory", which was bogus. It is an entity that is tracked in the index of the repository we are looking at, and it is not to be descended into. Return path_none, not path_untracked, to report that. The existing case in which path_untracked is returned for a newly discovered submodule that is not tracked in the index (this only happens when DIR_NO_GITLINKS option is not used) is unchanged, but that is exactly because the submodule is not tracked in the index. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 | Merge branch 'kb/status-ignored-optim-2' | Junio C Hamano
Fix 1.8.3 regressions in the .gitignore path exclusion logic.

* kb/status-ignored-optim-2:
  dir.c: fix ignore processing within not-ignored directories
2013-06-02 | dir.c: fix ignore processing within not-ignored directories | Karsten Blees
As of 95c6f271 "dir.c: unify is_excluded and is_path_excluded APIs", the is_excluded API no longer recurses into directories that match an ignore pattern, and returns the directory's ignored state for all contained paths. This is OK for normal ignore patterns, i.e. ignoring a directory affects the entire contents recursively. Unfortunately, this also "works" for negated ignore patterns ('!dir'), i.e. the entire contents is "not-ignored" recursively, regardless of ignore patterns that match the contents directly. In prep_exclude, skip recursing into a directory only if it is really ignored (i.e. the ignore pattern is not negated). Signed-off-by: Karsten Blees <blees@dcon.de> Tested-by: Øystein Walle <oystwa@gmail.com> Reviewed-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
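The rule stated in the last paragraph can be sketched as a small predicate. This is not the code in prep_exclude: `struct exclude_sketch` and `can_skip_dir_sketch()` are hypothetical, and the single `negative` field stands in for whatever the real pattern structure records about a leading '!'.

```c
#include <stddef.h>

/* Hypothetical stand-in for a matched ignore pattern. */
struct exclude_sketch {
	int negative; /* non-zero for a negated pattern such as '!dir' */
};

/*
 * Return 1 only if the directory matched a pattern that really ignores
 * it. No match (NULL) or a negated match means the contents still have
 * to be checked against the ignore patterns individually.
 */
static int can_skip_dir_sketch(const struct exclude_sketch *match)
{
	return match != NULL && !match->negative;
}
```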
2013-05-29 | Merge branch 'jn/config-ignore-inaccessible' | Junio C Hamano
When $HOME is misconfigured to point at an unreadable directory, we used to complain and die. This loosens the check.

* jn/config-ignore-inaccessible:
  config: allow inaccessible configuration under $HOME
2013-04-15 | dir.c: git-status --ignored: don't scan the work tree twice | Karsten Blees
'git-status --ignored' still scans the work tree twice to collect untracked and ignored files, respectively.

fill_directory / read_directory already supports collecting untracked and ignored files in a single directory scan. However, the DIR_COLLECT_IGNORED flag to enable this has some git-add specific side-effects (e.g. it doesn't recurse into ignored directories, so listing ignored files with --untracked=all doesn't work).

The DIR_SHOW_IGNORED flag doesn't list untracked files and returns ignored files in dir_struct.entries[] (instead of dir_struct.ignored[] as DIR_COLLECT_IGNORED). DIR_SHOW_IGNORED is used all throughout git.

We don't want to break the existing API, so let's introduce a new flag DIR_SHOW_IGNORED_TOO that lists untracked as well as ignored files similar to DIR_COLLECT_FILES, but will recurse into sub-directories based on the other flags as DIR_SHOW_IGNORED does.

In dir.c::read_directory_recursive, add ignored files to either dir_struct.entries[] or dir_struct.ignored[] based on the flags. Also move the DIR_COLLECT_IGNORED case here so that filling result lists is in a common place.

In wt-status.c::wt_status_collect_untracked, use the new flag and read results from dir_struct.ignored[]. Remove the extra fill_directory call.

builtin/check-ignore.c doesn't call fill_directory, so setting the git-add specific DIR_COLLECT_IGNORED flag has no effect here. Remove for clarity.

Update API documentation to reflect the changes.

Performance: with this patch, 'git-status --ignored' is typically as fast as 'git-status'.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: git-status --ignored: don't scan the work tree three times | Karsten Blees
'git-status --ignored' recursively scans directories up to three times:

 1. To collect untracked files.

 2. To collect ignored files.

 3. When collecting ignored files, to check that an untracked directory that potentially contains ignored files doesn't also contain untracked files (i.e. isn't already listed as untracked).

Let's get rid of case 3 first.

Currently, read_directory_recursive returns a boolean whether a directory contains the requested files or not (actually, it returns the number of files, but no caller actually needs that), and DIR_SHOW_IGNORED specifies what we're looking for.

To be able to test for both untracked and ignored files in a single scan, we need to return a bit more info, and the result must be independent of the DIR_SHOW_IGNORED flag.

Reuse the path_treatment enum as return value of read_directory_recursive. Split path_handled in two separate values path_excluded and path_untracked that don't change their meaning with the DIR_SHOW_IGNORED flag. We don't need an extra value path_untracked_and_excluded, as directories with both untracked and ignored files should be listed as untracked.

Rename path_ignored to path_none for clarity (i.e. "don't treat that path" in contrast to "the path is ignored and should be treated according to DIR_SHOW_IGNORED").

Replace enum directory_treatment with path_treatment. That's just another enum with the same meaning, no need to translate back and forth.

In treat_directory, get rid of the extra read_directory_recursive call and all the DIR_SHOW_IGNORED-specific code.

In read_directory_recursive, decide whether to dir_add_name path_excluded or path_untracked paths based on the DIR_SHOW_IGNORED flag.

The return value of read_directory_recursive is the maximum path_treatment of all files and sub-directories. In the check_only case, abort when we've reached the most significant value (path_untracked).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
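The "maximum path_treatment" rule from the final paragraph can be sketched as follows. This is an illustration, not the function in dir.c: the enum and function names are hypothetical, the three values are ordered as the message implies (none, then excluded, then untracked as the most significant), and a flat array stands in for the recursive directory walk.

```c
#include <stddef.h>

/* Ordered so that a plain comparison picks the more significant value. */
enum path_treatment_sketch {
	SKETCH_PATH_NONE = 0,
	SKETCH_PATH_EXCLUDED,
	SKETCH_PATH_UNTRACKED
};

/* Sample per-entry results, as a real scan might produce them. */
static const enum path_treatment_sketch sample_mixed[] = {
	SKETCH_PATH_NONE, SKETCH_PATH_EXCLUDED,
	SKETCH_PATH_UNTRACKED, SKETCH_PATH_EXCLUDED
};
static const enum path_treatment_sketch sample_ignored_only[] = {
	SKETCH_PATH_NONE, SKETCH_PATH_EXCLUDED
};

/*
 * Return the maximum treatment over all entries. With check_only set,
 * stop as soon as the most significant value has been seen, because
 * nothing that follows can change the result.
 */
static enum path_treatment_sketch
scan_sketch(const enum path_treatment_sketch *entries, size_t nr, int check_only)
{
	enum path_treatment_sketch result = SKETCH_PATH_NONE;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (entries[i] > result)
			result = entries[i];
		if (check_only && result == SKETCH_PATH_UNTRACKED)
			break;
	}
	return result;
}
```

A directory holding both ignored and untracked entries therefore reports untracked, which matches the statement that no separate path_untracked_and_excluded value is needed.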
2013-04-15 | dir.c: git-status: avoid is_excluded checks for tracked files | Karsten Blees
Checking if a file is in the index is much faster (hashtable lookup) than checking if the file is excluded (linear search over exclude patterns).

Skip is_excluded checks for tracked files: move the cache_name_exists check from treat_file to treat_one_path and return early if the file is tracked.

This can safely be done as all other code paths also return path_ignored for tracked files, and dir_add_ignored skips tracked files as well.

There's just one line left in treat_file, so move this to treat_one_path as well.

Here's some performance data for git-status from the linux and WebKit repos (best of 10 runs on a Debian Linux on SSD, core.preloadIndex=true):

           | status         | status --ignored
           | linux | WebKit | linux | WebKit
    -------+-------+--------+-------+---------
    before | 0.218 |  1.583 | 0.321 |  2.579
    after  | 0.156 |  0.988 | 0.202 |  1.279
    gain   | 1.397 |  1.602 | 1.589 |  2.016

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
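The reordering can be sketched as below. None of this is the code in treat_one_path: every name is hypothetical, a string comparison stands in for the index lookup, a suffix test stands in for the linear pattern scan, and the counter exists only to show how often the expensive check runs.

```c
#include <stddef.h>
#include <string.h>

enum sketch_treatment { SKETCH_NONE, SKETCH_EXCLUDED, SKETCH_UNTRACKED };

/* Counts how often the (notionally slow) exclude check is invoked. */
static unsigned int exclude_checks_sketch;

/* Stand-in for the index lookup: only "Makefile" is tracked. */
static int is_tracked_sketch(const char *path)
{
	return !strcmp(path, "Makefile");
}

/* Stand-in for the pattern scan: anything ending in ".o" is ignored. */
static int is_excluded_sketch(const char *path)
{
	size_t len = strlen(path);

	exclude_checks_sketch++;
	return len >= 2 && !strcmp(path + len - 2, ".o");
}

static enum sketch_treatment treat_path_sketch(const char *path)
{
	/* Cheap check first: tracked paths return before any pattern scan. */
	if (is_tracked_sketch(path))
		return SKETCH_NONE;
	return is_excluded_sketch(path) ? SKETCH_EXCLUDED : SKETCH_UNTRACKED;
}
```

In a work tree where most files are tracked, most calls take the early return, which is consistent with the direction of the gains reported in the table.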
2013-04-15 | dir.c: replace is_path_excluded with now equivalent is_excluded API | Karsten Blees
Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: unify is_excluded and is_path_excluded APIs | Karsten Blees
The is_excluded and is_path_excluded APIs are very similar, except for a few noteworthy differences: is_excluded doesn't handle ignored directories, results for paths within ignored directories are incorrect. This is probably based on the premise that recursive directory scans should stop at ignored directories, which is no longer true (in certain cases, read_directory_recursive currently calls is_excluded *and* is_path_excluded to get correct ignored state). is_excluded caches parsed .gitignore files of the last directory in struct dir_struct. If the directory changes, it finds a common parent directory and is very careful to drop only as much state as necessary. On the other hand, is_excluded will also read and parse .gitignore files in already ignored directories, which are completely irrelevant. is_path_excluded correctly handles ignored directories by checking if any component in the path is excluded. As it uses is_excluded internally, this unfortunately forces is_excluded to drop and re-read all .gitignore files, as there is no common parent directory for the root dir. is_path_excluded tracks state in a separate struct path_exclude_check, which is essentially a wrapper of dir_struct with two more fields. However, as is_path_excluded also modifies dir_struct, it is not possible to e.g. use multiple path_exclude_check structures with the same dir_struct in parallel. The additional structure just unnecessarily complicates the API. Teach is_excluded / prep_exclude about ignored directories: whenever entering a new directory, first check if the entire directory is excluded. Remember the excluded state in dir_struct. Don't traverse into already ignored directories (i.e. don't read irrelevant .gitignore files). Directories could also be excluded by exclude patterns specified on the command line or .git/info/exclude, so we cannot simply skip prep_exclude entirely if there's no .gitignore file name (dir_struct.exclude_per_dir). 
Move this check to just before actually reading the file.

is_path_excluded is now equivalent to is_excluded, so we can simply redirect to it (the public API is cleaned up in the next patch).

The performance impact of the additional ignored check per directory is hardly noticeable when reading directories recursively (e.g. 'git status'). However, performance of git commands using the is_path_excluded API (e.g. 'git ls-files --cached --ignored --exclude-standard') is greatly improved as this no longer re-reads .gitignore files on each call.

Here's some performance data from the linux and WebKit repos (best of 10 runs on a Debian Linux on SSD, core.preloadIndex=true):

           | ls-files -ci   | status         | status --ignored
           | linux | WebKit | linux | WebKit | linux | WebKit
    -------+-------+--------+-------+--------+-------+---------
    before | 0.506 |  6.539 | 0.212 |  1.555 | 0.323 |  2.541
    after  | 0.080 |  1.191 | 0.218 |  1.583 | 0.321 |  2.579
    gain   | 6.325 |  5.490 | 0.972 |  0.982 | 1.006 |  0.985

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
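The idea of "checking if any component in the path is excluded", which this commit folds into is_excluded, can be sketched like this. It is only an illustration: both function names are hypothetical, the predicate is a toy that treats exactly the leading path "build" as ignored rather than evaluating real ignore patterns, and no .gitignore state is cached.

```c
#include <stddef.h>
#include <string.h>

/* Toy predicate: the first `len` bytes of `path` are exactly "build". */
static int component_excluded_sketch(const char *path, size_t len)
{
	return len == 5 && !memcmp(path, "build", 5);
}

/*
 * Walk the leading directories of `path` from shallowest to deepest and
 * report the path as excluded as soon as one of them is; otherwise fall
 * back to checking the full path itself.
 */
static int path_excluded_sketch(const char *path)
{
	const char *slash = path;

	while ((slash = strchr(slash, '/')) != NULL) {
		if (component_excluded_sketch(path, (size_t)(slash - path)))
			return 1;
		slash++;
	}
	return component_excluded_sketch(path, strlen(path));
}
```

Here "build/out/a.o" is excluded because its leading directory is, while "src/a.c" is not.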
2013-04-15 | dir.c: move prep_exclude | Karsten Blees
Move prep_exclude in preparation for the next patch. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: factor out parts of last_exclude_matching for later reuse | Karsten Blees
Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: git-clean -d -X: don't delete tracked directories | Karsten Blees
The notion of "ignored tracked" directories introduced in 721ac4ed "dir.c: Make git-status --ignored more consistent" has a few unwanted side effects:

- git-clean -d -X: deletes ignored tracked directories. git-clean should never delete tracked content.

- git-ls-files --ignored --other --directory: lists ignored tracked directories instead of "other" directories.

- git-status --ignored: lists ignored tracked directories while contained files may be listed as modified. Paths listed by git-status should be disjoint (except in long format where a path may be listed in both the staged and unstaged section).

Additionally, the current behaviour violates documentation in gitignore(5) ("Specifies intentionally *untracked* files to ignore") and Documentation/technical/api-directory-listing.txt ("DIR_SHOW_OTHER_DIRECTORIES: Include a directory that is *not tracked*.").

In dir.c::treat_directory, remove the special handling of ignored tracked directories, so that the DIR_SHOW_OTHER_DIRECTORIES flag only affects "other" (i.e. untracked) directories.

In dir.c::dir_add_name, check that added paths are untracked even if DIR_SHOW_IGNORED is set.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: make 'git-status --ignored' work within leading directories | Karsten Blees
'git-status --ignored path/' doesn't list ignored files and directories within 'path' if some component of 'path' is classified as untracked. Disable the DIR_SHOW_OTHER_DIRECTORIES flag while traversing leading directories. This prevents treat_leading_path() with DIR_SHOW_IGNORED flag from aborting at the top level untracked directory. As a side effect, this also eliminates a recursive directory scan per leading directory level, as treat_directory() can no longer call read_directory_recursive() when called from treat_leading_path(). Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: git-status --ignored: don't list empty directories as ignored | Karsten Blees
'git-status --ignored' lists empty untracked directories as ignored, even though they don't have any ignored files. When checking if a directory is already listed as untracked (i.e. shouldn't be listed as ignored as well), don't assume that the directory has only ignored files if it doesn't have untracked files, as the directory may be empty. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: git-ls-files --directories: don't hide empty directories | Karsten Blees
'git-ls-files --ignored --directories' hides empty directories even though --no-empty-directory was not specified. Treat the DIR_HIDE_EMPTY_DIRECTORIES flag independently from DIR_SHOW_IGNORED to make all git-ls-files options work as expected. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: git-status --ignored: don't list empty ignored directories | Karsten Blees
'git-status --ignored' lists ignored tracked directories without any ignored files if a tracked file happens to match an exclude pattern. Always exclude tracked files. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: git-status --ignored: don't list files in ignored directories | Karsten Blees
'git-status --ignored' lists both the ignored directory and the ignored files if the files are in a tracked sub directory. When recursing into sub directories in read_directory_recursive, pass on the check_only parameter so that we don't accidentally add the files. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 | dir.c: git-status --ignored: don't drop ignored directories | Karsten Blees
'git-status --ignored' drops ignored directories if they contain untracked files in an untracked sub directory. Fix it by getting exact (recursive) excluded status in treat_directory. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15config: allow inaccessible configuration under $HOMEJonathan Nieder
The changes v1.7.12.1~2^2~4 (config: warn on inaccessible files, 2012-08-21) and v1.8.1.1~22^2~2 (config: treat user and xdg config permission problems as errors, 2012-10-13) were intended to prevent important configuration (think "[transfer] fsckobjects") from being ignored when the configuration is unintentionally unreadable (for example with EIO on a flaky filesystem, or with ENOMEM due to a DoS attack). Usually ~/.gitconfig and ~/.config/git are readable by the current user, and if they aren't then it would be easy to fix those permissions, so the damage from adding this check should have been minimal.
Unfortunately the access() check often trips when git is being run as a server. A daemon (such as inetd or git-daemon) starts as "root", creates a listening socket, and then drops privileges, meaning that when git commands are invoked they cannot access $HOME and die with
    fatal: unable to access '/root/.config/git/config': Permission denied
Any patch to fix this would have one of three problems:
1. We annoy sysadmins who need to take an extra step to handle HOME when dropping privileges (the current behavior, or any other proposal that they have to opt into).
2. We annoy sysadmins who want to set HOME when dropping privileges, either by making what they want to do impossible, or making them set an extra variable or option to accomplish what used to work (e.g., a patch to git-daemon to set HOME when --user is passed).
3. We loosen the check, so some cases which might be noteworthy are not caught.
This patch is of type (3).
Treat user and xdg configuration that are inaccessible due to permissions (EACCES) as though no user configuration was provided at all.
An alternative method would be to check if $HOME is readable, but that would not help in cases where the user who dropped privileges had a globally readable HOME with only .config or .gitconfig being private.
This does not change the behavior when /etc/gitconfig or .git/config is unreadable (since those are more serious configuration errors), nor when ~/.gitconfig or ~/.config/git is unreadable due to problems other than permissions. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Improved-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
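The rule in the commit message above can be sketched in C as follows. This is a minimal illustration, not git's actual code: the enum and both function names are hypothetical, and the sketch assumes that a missing file (ENOENT) is also treated as "no user configuration", which is the normal case for a user without a ~/.gitconfig.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; not part of git. */
enum config_access {
	CONFIG_USABLE,	/* file can be read */
	CONFIG_SKIP,	/* behave as if no user configuration was given */
	CONFIG_ERROR	/* noteworthy problem, should be reported */
};

/*
 * Classify the errno left behind by a failed access() on a user or
 * xdg configuration file. EACCES is tolerated, as the commit message
 * describes; ENOENT is assumed to mean "no such file", which is not
 * an error either. Anything else (EIO, ENOMEM, ...) stays an error.
 */
static enum config_access classify_access_errno(int err)
{
	if (err == ENOENT || err == EACCES)
		return CONFIG_SKIP;
	return CONFIG_ERROR;
}

/* Probe one user-level configuration path (hypothetical helper). */
static enum config_access check_user_config(const char *path)
{
	assert(path != NULL);
	if (access(path, R_OK) == 0)
		return CONFIG_USABLE;
	return classify_access_errno(errno);
}
```

A caller following the commit's intent would apply this looser check only to ~/.gitconfig and the xdg configuration, and keep the stricter check for /etc/gitconfig and .git/config.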
2013-04-07Merge branch 'jk/rm-removed-paths'Junio C Hamano
A handful of test cases and a corner case bugfix for "git rm". * jk/rm-removed-paths: t3600: document failure of rm across symbolic links t3600: test behavior of reverse-d/f conflict rm: do not complain about d/f conflicts during deletion
2013-04-07Sync with 1.8.1.6Junio C Hamano
2013-04-07Merge branch 'jc/directory-attrs-regression-fix' into maint-1.8.1Junio C Hamano
A pattern "dir" (without trailing slash) in the attributes file stopped matching a directory "dir" by mistake with an earlier change that wanted to allow pattern "dir/" to also match. * jc/directory-attrs-regression-fix: t: check that a pattern without trailing slash matches a directory dir.c::match_pathname(): pay attention to the length of string parameters dir.c::match_pathname(): adjust patternlen when shifting pattern dir.c::match_basename(): pay attention to the length of string parameters attr.c::path_matches(): special case paths that end with a slash attr.c::path_matches(): the basename is part of the pathname
2013-04-04rm: do not complain about d/f conflicts during deletionJeff King
If we used to have an index entry "d/f", but "d" has been replaced by a non-directory entry, the user may still want to run "git rm" to delete the stale index entry. They could use "git rm --cached" to just touch the index, but "git rm" should also work: we explicitly try to handle the case that the file has already been removed from the working tree. However, because unlinking "d/f" in this case will not yield ENOENT, but rather ENOTDIR, we do not notice that the file is already gone. Instead, we report it as an error. The simple solution is to treat ENOTDIR in this case exactly like ENOENT; all we want to know is whether the file is already gone, and if a leading path is no longer a directory, then by definition the sub-path is gone. Reported-by: jpinheiro <7jpinheiro@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
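The errno handling described above can be illustrated with a small C sketch. The helper name is hypothetical and this is not the code from builtin/rm.c; it only shows the idea of treating ENOTDIR exactly like ENOENT when deciding whether a path is already gone.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove `path` from the working tree and treat
 * "already gone" as success. Returns 0 if the file was removed or did
 * not exist, -1 on any other failure (errno is left set by unlink()).
 */
static int remove_if_present(const char *path)
{
	assert(path != NULL);
	if (unlink(path) == 0)
		return 0;
	/*
	 * ENOENT: the file itself is missing.
	 * ENOTDIR: a leading component (the "d" in "d/f") is no longer a
	 * directory, so the sub-path cannot exist either.
	 */
	if (errno == ENOENT || errno == ENOTDIR)
		return 0;
	return -1;
}
```

With a regular file named "d" in the working tree, a call such as remove_if_present("d/f") fails inside unlink() with ENOTDIR and is reported as success by this helper, which matches the behavior the commit asks for.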
2013-04-03Merge branch 'jc/directory-attrs-regression-fix'Junio C Hamano
Fix 1.8.1.x regression that stopped matching "dir" (without trailing slash) to a directory "dir". * jc/directory-attrs-regression-fix: t: check that a pattern without trailing slash matches a directory dir.c::match_pathname(): pay attention to the length of string parameters dir.c::match_pathname(): adjust patternlen when shifting pattern dir.c::match_basename(): pay attention to the length of string parameters attr.c::path_matches(): special case paths that end with a slash attr.c::path_matches(): the basename is part of the pathname
2013-03-29dir.c::match_pathname(): pay attention to the length of string parametersJeff King
This function takes two counted strings: a <pattern, patternlen> pair and a <pathname, pathlen> pair. But we end up feeding the result to fnmatch, which expects NUL-terminated strings. We can fix this by calling the fnmatch_icase_mem function, which handles re-allocating into a NUL-terminated string if necessary. While we're at it, we can avoid even calling fnmatch in some cases. In addition to patternlen, we get "prefix", the size of the pattern that contains no wildcard characters. We do a straight match of the prefix part first, and then use fnmatch to cover the rest. But if there are no wildcards in the pattern at all, we do not even need to call fnmatch; we would simply be comparing two empty strings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
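The mismatch between counted strings and fnmatch() can be shown with a short C sketch. It is only a stand-in for the idea behind fnmatch_icase_mem(): the function names below are hypothetical, case-insensitive matching is left out, and unlike the description above (which re-allocates only if necessary) this version always copies both strings.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Copy a counted string into a freshly allocated NUL-terminated buffer. */
static char *dup_counted(const char *s, size_t len)
{
	char *out = malloc(len + 1);
	if (!out)
		return NULL;
	memcpy(out, s, len);
	out[len] = '\0';
	return out;
}

/*
 * Hypothetical helper: match <pattern, patternlen> against
 * <name, namelen> with fnmatch(), which only understands
 * NUL-terminated strings. Returns 0 on a match, FNM_NOMATCH on a
 * mismatch, and -1 if memory could not be allocated.
 */
static int fnmatch_counted(const char *pattern, size_t patternlen,
			   const char *name, size_t namelen, int flags)
{
	char *pat, *str;
	int ret;

	assert(pattern != NULL && name != NULL);
	pat = dup_counted(pattern, patternlen);
	str = dup_counted(name, namelen);
	if (!pat || !str)
		ret = -1;
	else
		ret = fnmatch(pat, str, flags);
	free(pat);
	free(str);
	return ret;
}
```

Passing the buffers straight to fnmatch() would make it read past patternlen or namelen up to the next NUL byte, which is the bug class the commit addresses.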
2013-03-29dir.c::match_pathname(): adjust patternlen when shifting patternJeff King
If we receive a pattern that starts with "/", we shift it forward to avoid looking at the "/" part. Since the prefix and patternlen parameters are counts of what is in the pattern, we must decrement them as we increment the pointer. We remembered to handle prefix, but not patternlen. This didn't cause any bugs, though, because the patternlen parameter is not actually used. Since it will be used in future patches, let's correct this oversight. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>