summaryrefslogtreecommitdiff
path: root/dir.c
AgeCommit message (Collapse)Author
2015-10-29Merge branch 'dt/name-hash-dir-entry-fix'Junio C Hamano
The name-hash subsystem that is used to cope with case insensitive filesystems keeps track of directories and their on-filesystem cases for all the paths in the index by holding a pointer to a randomly chosen cache entry that is inside the directory (for its ce->ce_name component). This pointer was not updated even when the cache entry was removed from the index, leading to use after free. This was fixed by recording the path for each directory instead of borrowing cache entries and restructuring the API somewhat. * dt/name-hash-dir-entry-fix: name-hash: don't reuse cache_entry in dir_entry
2015-10-21name-hash: don't reuse cache_entry in dir_entryDavid Turner
Stop reusing cache_entry in dir_entry; doing so causes a use-after-free bug. During merges, we free entries that we no longer need in the destination index. But those entries might have also been stored in the dir_entry cache, and when a later call to add_to_index found them, they would be used after being freed. To prevent this, change dir_entry to store a copy of the name instead of a pointer to a cache_entry. This entails some refactoring of code that expects the cache_entry. Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote the initial patch, but this version does not use any of Keith's code. Helped-by: Keith McGuigan <kmcguigan@twitter.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-20Merge branch 'jk/war-on-sprintf'Junio C Hamano
Many allocations that is manually counted (correctly) that are followed by strcpy/sprintf have been replaced with a less error prone constructs such as xstrfmt. Macintosh-specific breakage was noticed and corrected in this reroll. * jk/war-on-sprintf: (70 commits) name-rev: use strip_suffix to avoid magic numbers use strbuf_complete to conditionally append slash fsck: use for_each_loose_file_in_objdir Makefile: drop D_INO_IN_DIRENT build knob fsck: drop inode-sorting code convert strncpy to memcpy notes: document length of fanout path with a constant color: add color_set helper for copying raw colors prefer memcpy to strcpy help: clean up kfmclient munging receive-pack: simplify keep_arg computation avoid sprintf and strcpy with flex arrays use alloc_ref rather than hand-allocating "struct ref" color: add overflow checks for parsing colors drop strcpy in favor of raw sha1_to_hex use sha1_to_hex_r() instead of strcpy daemon: use cld->env_array when re-spawning stat_tracking_info: convert to argv_array http-push: use an argv_array for setup_revisions fetch-pack: use argv_array for index-pack / unpack-objects ...
2015-10-15Merge branch 'js/icase-wt-detection'Junio C Hamano
On a case insensitive filesystems, setting GIT_WORK_TREE variable using a random cases that does not agree with what the filesystem thinks confused Git that it wasn't inside the working tree. * js/icase-wt-detection: setup: fix "inside work tree" detection on case-insensitive filesystems
2015-10-07Merge branch 'nd/ignore-then-not-ignore'Junio C Hamano
Allow a later "!/abc/def" to override an earlier "/abc" that appears in the same .gitignore file to make it easier to express "everything in /abc directory is ignored, except for ...". * nd/ignore-then-not-ignore: dir.c: don't exclude whole dir prematurely if neg pattern may match dir.c: make last_exclude_matching_from_list() run til the end
2015-10-05use strbuf_complete to conditionally append slashJeff King
When working with paths in strbufs, we frequently want to ensure that a directory contains a trailing slash before appending to it. We can shorten this code (and make the intent more obvious) by calling strbuf_complete. Most of these cases are trivially identical conversions, but there are two things to note: - in a few cases we did not check that the strbuf is non-empty (which would lead to an out-of-bounds memory access). These were generally not triggerable in practice, either from earlier assertions, or typically because we would have just fed the strbuf to opendir(), which would choke on an empty path. - in a few cases we indexed the buffer with "original_len" or similar, rather than the current sb->len, and it is not immediately obvious from the diff that they are the same. In all of these cases, I manually verified that the strbuf does not change between the assignment and the strbuf_complete call. This does not convert cases which look like: if (sb->len && !is_dir_sep(sb->buf[sb->len - 1])) strbuf_addch(sb, '/'); as those are obviously semantically different. Some of these cases arguably should be doing that, but that is out of scope for this change, which aims purely for cleanup with no behavior change (and at least it will make such sites easier to find and examine in the future, as we can grep for strbuf_complete). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28setup: fix "inside work tree" detection on case-insensitive filesystemsJohannes Schindelin
Git has a config variable to indicate that it is operating on a file system that is case-insensitive: core.ignoreCase. But the `dir_inside_of()` function did not respect that. As a result, if Git's idea of the current working directory disagreed in its upper/lower case with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the user would be greeted by the error message fatal: git-am cannot be used without a working tree. when trying to run a rebase. This fixes https://github.com/git-for-windows/git/issues/402 (reported by Daniel Harding). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-21dir.c: don't exclude whole dir prematurely if neg pattern may matchNguyễn Thái Ngọc Duy
If there is a pattern "!foo/bar", this patch makes it not exclude "foo" right away. This gives us a chance to examine "foo" and re-include "foo/bar". In order for it to detect that the directory under examination should not be excluded right away, in other words it is a parent directory of a negative pattern, the "directory path" of the negative pattern must be literal. Patterns like "!f?o/bar" can't stop "foo" from being excluded. Basename matching (i.e. "no slashes in the pattern") or must-be-dir matching (i.e. "trailing slash in the pattern") does not work well with this. For example, if we descend in "foo" and are examining "foo/abc", current code for "foo/" pattern will check if path "foo/abc", not "foo", is a directory. The same problem with basename matching. These may need big code reorg to make it work. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-21dir.c: make last_exclude_matching_from_list() run til the endNguyễn Thái Ngọc Duy
The next patch adds some post processing to the result value before it's returned to the caller. Keep all branches reach the end of the function, so we can do it all in one place. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-17Merge branch 'dt/untracked-subdir' into maintJunio C Hamano
The experimental untracked-cache feature were buggy when paths with a few levels of subdirectories are involved. * dt/untracked-subdir: untracked cache: fix entry invalidation untracked-cache: fix subdirectory handling t7063: use --force-untracked-cache to speed up a bit untracked-cache: support sparse checkout
2015-08-28Merge branch 'dt/untracked-subdir'Junio C Hamano
The experimental untracked-cache feature were buggy when paths with a few levels of subdirectories are involved. * dt/untracked-subdir: untracked cache: fix entry invalidation untracked-cache: fix subdirectory handling
2015-08-19Merge branch 'jk/git-path'Junio C Hamano
git_path() and mkpath() are handy helper functions but it is easy to misuse, as the callers need to be careful to keep the number of active results below 4. Their uses have been reduced. * jk/git-path: memoize common git-path "constant" files get_repo_path: refactor path-allocation find_hook: keep our own static buffer refs.c: remove_empty_directories can take a strbuf refs.c: avoid git_path assignment in lock_ref_sha1_basic refs.c: avoid repeated git_path calls in rename_tmp_log refs.c: simplify strbufs in reflog setup and writing path.c: drop git_path_submodule refs.c: remove extra git_path calls from read_loose_refs remote.c: drop extraneous local variable from migrate_file prefer mkpathdup to mkpath in assignments prefer git_pathdup to git_path in some possibly-dangerous cases add_to_alternates_file: don't add duplicate entries t5700: modernize style cache.h: complete set of git_path_submodule helpers cache.h: clarify documentation for git_path, et al
2015-08-19Merge branch 'cb/uname-in-untracked' into maintJunio C Hamano
An experimental "untracked cache" feature used uname(2) in a slightly unportable way. * cb/uname-in-untracked: untracked: fix detection of uname(2) failure
2015-08-19untracked cache: fix entry invalidationNguyễn Thái Ngọc Duy
First, the current code in untracked_cache_invalidate_path() is wrong because it can only handle paths "a" or "a/b", not "a/b/c" because lookup_untracked() only looks for entries directly under the given directory. In the last case, it will look for the entry "b/c" in directory "a" instead. This means if you delete or add an entry in a subdirectory, untracked cache may become out of date because it does not invalidate properly. This is noticed by David Turner. The second problem is about invalidation inside a fully untracked/excluded directory. In this case we may have to invalidate back to root. See the comment block for detail. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-19untracked-cache: fix subdirectory handlingDavid Turner
Previously, some calls lookup_untracked would pass a full path. But lookup_untracked assumes that the portion of the path up to and including to the untracked_cache_dir has been removed. So lookup_untracked would be looking in the untracked_cache for 'foo' for 'foo/bar' (instead of just looking for 'bar'). This would cause untracked cache corruption. Instead, treat_directory learns to track the base length of the parent directory, so that only the last path component is passed to lookup_untracked. Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-17Merge branch 'dt/untracked-sparse'Junio C Hamano
Allow untracked cache (experimental) to be used when sparse checkout (experimental) is also in use. * dt/untracked-sparse: untracked-cache: support sparse checkout
2015-08-10memoize common git-path "constant" filesJeff King
One of the most common uses of git_path() is to pass a constant, like git_path("MERGE_MSG"). This has two drawbacks: 1. The return value is a static buffer, and the lifetime is dependent on other calls to git_path, etc. 2. There's no compile-time checking of the pathname. This is OK for a one-off (after all, we have to spell it correctly at least once), but many of these constant strings appear throughout the code. This patch introduces a series of functions to "memoize" these strings, which are essentially globals for the lifetime of the program. We compute the value once, take ownership of the buffer, and return the cached value for subsequent calls. cache.h provides a helper macro for defining these functions as one-liners, and defines a few common ones for global use. Using a macro is a little bit gross, but it does nicely document the purpose of the functions. If we need to touch them all later (e.g., because we learned how to change the git_dir variable at runtime, and need to invalidate all of the stored values), it will be much easier to have the complete list. Note that the shared-global functions have separate, manual declarations. We could do something clever with the macros (e.g., expand it to a declaration in some places, and a declaration _and_ a definition in path.c). But there aren't that many, and it's probably better to stay away from too-magical macros. Likewise, if we abandon the C preprocessor in favor of generating these with a script, we could get much fancier. E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz". But the small amount of saved typing is probably not worth the resulting confusion to readers who want to grep for the function's definition. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-03Merge branch 'cb/uname-in-untracked'Junio C Hamano
An experimental "untracked cache" feature used uname(2) in a slightly unportable way. * cb/uname-in-untracked: untracked: fix detection of uname(2) failure
2015-07-31untracked-cache: support sparse checkoutDavid Turner
Remove a check that would disable the untracked cache for sparse checkouts. Add tests that ensure that the untracked cache works with sparse checkouts -- specifically considering the case that a file foo/bar is checked out, but foo/.gitignore is not. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-17untracked: fix detection of uname(2) failureCharles Bailey
According to POSIX specification uname(2) must return -1 on failure and a non-negative value on success. Although many implementations do return 0 on success it is valid to return any positive value for success. In particular, Solaris returns 1. Signed-off-by: Charles Bailey <cbailey32@bloomberg.net> Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01Merge branch 'rs/janitorial'Junio C Hamano
Code clean-up. * rs/janitorial: dir: remove unused variable sb clean: remove unused variable buf use file_exists() to check if a file exists in the worktree
2015-05-26Merge branch 'nd/untracked-cache'Junio C Hamano
Teach the index to optionally remember already seen untracked files to speed up "git status" in a working tree with tons of cruft. * nd/untracked-cache: (24 commits) git-status.txt: advertisement for untracked cache untracked cache: guard and disable on system changes mingw32: add uname() t7063: tests for untracked cache update-index: test the system before enabling untracked cache update-index: manually enable or disable untracked cache status: enable untracked cache untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE untracked cache: mark index dirty if untracked cache is updated untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS untracked cache: avoid racy timestamps read-cache.c: split racy stat test to a separate function untracked cache: invalidate at index addition or removal untracked cache: load from UNTR index extension untracked cache: save to an index extension ewah: add convenient wrapper ewah_serialize_strbuf() untracked cache: don't open non-existent .gitignore untracked cache: mark what dirs should be recursed/saved untracked cache: record/validate dir mtime and reuse cached output untracked cache: make a wrapper around {open,read,close}dir() ...
2015-05-20dir: remove unused variable sbRené Scharfe
It had never been used. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-19Merge branch 'jc/gitignore-precedence'Junio C Hamano
core.excludesfile (defaulting to $XDG_HOME/git/ignore) is supposed to be overridden by repository-specific .git/info/exclude file, but the order was swapped from the beginning. This belatedly fixes it. * jc/gitignore-precedence: ignore: info/exclude should trump core.excludesfile
2015-05-13Merge branch 'cn/bom-in-gitignore' into maintJunio C Hamano
Teach the codepaths that read .gitignore and .gitattributes files that these files encoded in UTF-8 may have UTF-8 BOM marker at the beginning; this makes it in line with what we do for configuration files already. * cn/bom-in-gitignore: attr: skip UTF8 BOM at the beginning of the input file config: use utf8_bom[] from utf.[ch] in git_parse_source() utf8-bom: introduce skip_utf8_bom() helper add_excludes_from_file: clarify the bom skipping logic dir: allow a BOM at the beginning of exclude files
2015-05-11Merge branch 'pt/xdg-config-path'Junio C Hamano
Code clean-up for xdg configuration path support. * pt/xdg-config-path: path.c: remove home_config_paths() git-config: replace use of home_config_paths() git-commit: replace use of home_config_paths() credential-store.c: replace home_config_paths() with xdg_config_home() dir.c: replace home_config_paths() with xdg_config_home() attr.c: replace home_config_paths() with xdg_config_home() path.c: implement xdg_config_home()
2015-05-06dir.c: replace home_config_paths() with xdg_config_home()Paul Tan
Since only the xdg excludes file path is required, simplify the code by replacing use of home_config_paths() with xdg_config_home(). Signed-off-by: Paul Tan <pyokagan@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06Merge branch 'cn/bom-in-gitignore'Junio C Hamano
Teach the codepaths that read .gitignore and .gitattributes files that these files encoded in UTF-8 may have UTF-8 BOM marker at the beginning; this makes it in line with what we do for configuration files already. * cn/bom-in-gitignore: attr: skip UTF8 BOM at the beginning of the input file config: use utf8_bom[] from utf.[ch] in git_parse_source() utf8-bom: introduce skip_utf8_bom() helper add_excludes_from_file: clarify the bom skipping logic dir: allow a BOM at the beginning of exclude files
2015-04-22ignore: info/exclude should trump core.excludesfileJunio C Hamano
$GIT_DIR/info/exclude and core.excludesfile (which falls back to $XDG_HOME/git/ignore) are both ways to override the ignore pattern lists given by the project in .gitignore files. The former, which is per-repository personal preference, should take precedence over the latter, which is a personal preference default across different repositories that are accessed from that machine. The existing documentation also agrees. However, the precedence order was screwed up between these two from the very beginning when 896bdfa2 (add: Support specifying an excludes file with a configuration variable, 2007-02-27) introduced core.excludesfile variable. Noticed-by: Yohei Endo <yoheie@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-16utf8-bom: introduce skip_utf8_bom() helperJunio C Hamano
With the recent change to ignore the UTF8 BOM at the beginning of .gitignore files, we now have two codepaths that do such a skipping (the other one is for reading the configuration files). Introduce utf8_bom[] constant string and skip_utf8_bom() helper and teach .gitignore code how to use it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-16add_excludes_from_file: clarify the bom skipping logicJunio C Hamano
Even though the previous step shifts where the "entry" begins, we still iterate over the original buf[], which may begin with the UTF-8 BOM we are supposed to be skipping. At the end of the first line, the code grabs the contents of it starting at "entry", so there is nothing wrong per-se, but the logic looks really confused. Instead, move the buf pointer and shrink its size, to truly pretend that UTF-8 BOM did not exist in the input. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-16dir: allow a BOM at the beginning of exclude filesCarlos Martín Nieto
Some text editors like Notepad or LibreOffice write an UTF-8 BOM in order to indicate that the file is Unicode text rather than whatever the current locale would indicate. If someone uses such an editor to edit a gitignore file, we are left with those three bytes at the beginning of the file. If we do not skip them, we will attempt to match a filename with the BOM as prefix, which won't match the files the user is expecting. Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-31Merge branch 'jc/report-path-error-to-dir' into maintJunio C Hamano
Code clean-up. * jc/report-path-error-to-dir: report_path_error(): move to dir.c
2015-03-26Merge branch 'jc/report-path-error-to-dir'Junio C Hamano
Code clean-up. * jc/report-path-error-to-dir: report_path_error(): move to dir.c
2015-03-24report_path_error(): move to dir.cJunio C Hamano
The expected call sequence is for the caller to use match_pathspec() repeatedly on a set of pathspecs, accumulating the "hits" in a separate array, and then call this function to diagnose a pathspec that never matched anything, as that can indicate a typo from the command line, e.g. "git commit Maekfile". Many builtin commands use this function from builtin/ls-files.c, which is not a very healthy arrangement. ls-files might have been the first command to feel the need for such a helper, but the need is shared by everybody who uses the "match and then report" pattern. Move it to dir.c where match_pathspec() is defined. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: guard and disable on system changesNguyễn Thái Ngọc Duy
If the user enables untracked cache, then - move worktree to an unsupported filesystem - or simply upgrade OS - or move the whole (portable) disk from one machine to another - or access a shared fs from another machine there's no guarantee that untracked cache can still function properly. Record the worktree location and OS footprint in the cache. If it changes, err on the safe side and disable the cache. The user can 'update-index --untracked-cache' again to make sure all conditions are met. This adds a new requirement that setup_git_directory* must be called before read_cache() because we need worktree location by then, or the cache is dropped. This change does not cover all bases, you can fool it if you try hard. The point is to stop accidents. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHENguyễn Thái Ngọc Duy
This can be used to double check if results with untracked cache are correctly, compared to vanilla version. Untracked cache remains in index, but not used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: mark index dirty if untracked cache is updatedNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATSNguyễn Thái Ngọc Duy
This could be used to verify correct behavior in tests Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: avoid racy timestampsNguyễn Thái Ngọc Duy
When a directory is updated within the same second that its timestamp is last saved, we cannot realize the directory has been updated by checking timestamps. Assume the worst (something is update). See 29e4d36 (Racy GIT - 2005-12-20) for more information. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: invalidate at index addition or removalNguyễn Thái Ngọc Duy
Ideally we should implement untracked_cache_remove_from_index() and untracked_cache_add_to_index() so that they update untracked cache right away instead of invalidating it and wait for read_directory() next time to deal with it. But that may need some more work in unpack-trees.c. So stay simple as the first step. The new call in add_index_entry_with_check() may look strange because new calls usually stay close to cache_tree_invalidate_path(). We do it a bit later than c_t_i_p() in this function because if it's about replacing the entry with the same name, we don't care (but cache-tree does). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: load from UNTR index extensionNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: save to an index extensionNguyễn Thái Ngọc Duy
Helped-by: Stefan Beller <sbeller@google.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: don't open non-existent .gitignoreNguyễn Thái Ngọc Duy
This cuts down a signficant number of open(.gitignore) because most directories usually don't have .gitignore files. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: mark what dirs should be recursed/savedNguyễn Thái Ngọc Duy
If we redo this thing in a functional style, we would have one struct untracked_dir as input tree and another as output. The input is used for verification. The output is a brand new tree, reflecting current worktree. But that means recreate a lot of dir nodes even if a lot could be shared between input and output trees in good cases. So we go with the messy but efficient way, combining both input and output trees into one. We need a way to know which node in this combined tree belongs to the output. This is the purpose of this "recurse" flag. "valid" bit can't be used for this because it's about data of the node except the subdirs. When we invalidate a directory, we want to keep cached data of the subdirs intact even though we don't really know what subdir still exists (yet). Then we check worktree to see what actual subdir remains on disk. Those will have 'recurse' bit set again. If cached data for those are still valid, we may be able to avoid computing exclude files for them. Those subdirs that are deleted will have 'recurse' remained clear and their 'valid' bits do not matter. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: record/validate dir mtime and reuse cached outputNguyễn Thái Ngọc Duy
The main readdir loop in read_directory_recursive() is replaced with a new one that checks if cached results of a directory is still valid. If a file is added or removed from the index, the containing directory is invalidated (but not its subdirs). If directory's mtime is changed, the same happens. If a .gitignore is updated, the containing directory and all subdirs are invalidated recursively. If dir_struct#flags or other conditions change, the cache is ignored. If a directory is invalidated, we opendir/readdir/closedir and run the exclude machinery on that directory listing as usual. If untracked cache is also enabled, we'll update the cache along the way. If a directory is validated, we simply pull the untracked listing out from the cache. The cache also records the list of direct subdirs that we have to recurse in. Fully excluded directories are seen as "untracked files". In the best case when no dirs are invalidated, read_directory() becomes a series of stat(dir), open(.gitignore), fstat(), read(), close() and optionally hash_sha1_file() For comparison, standard read_directory() is a sequence of opendir(), readdir(), open(.gitignore), fstat(), read(), close(), the expensive last_exclude_matching() and closedir(). We already try not to open(.gitignore) if we know it does not exist, so open/fstat/read/close sequence does not apply to every directory. The sequence could be reduced further, as noted in prep_exclude() in another patch. So in theory, the entire best-case read_directory sequence could be reduced to a series of stat() and nothing else. This is not a silver bullet approach. When you compile a C file, for example, the old .o file is removed and a new one with the same name created, effectively invalidating the containing directory's cache (but not its subdirectories). If your build process touches every directory, this cache adds extra overhead for nothing, so it's a good idea to separate generated files from tracked files.. Editors may use the same strategy for saving files. And of course you're out of luck running your repo on an unsupported filesystem and/or operating system. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: make a wrapper around {open,read,close}dir()Nguyễn Thái Ngọc Duy
This allows us to feed different info to read_directory_recursive() based on untracked cache in the next patch. Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: invalidate dirs recursively if .gitignore changesNguyễn Thái Ngọc Duy
It's easy to see that if an existing .gitignore changes, its SHA-1 would be different and invalidate_gitignore() is called. If .gitignore is removed, add_excludes() will treat it like an empty .gitignore, which again should invalidate the cached directory data. if .gitignore is added, lookup_untracked() already fills initial .gitignore SHA-1 as "empty file", so again invalidate_gitignore() is called. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: initial untracked cache validationNguyễn Thái Ngọc Duy
Make sure the starting conditions and all global exclude files are good to go. If not, either disable untracked cache completely, or wipe out the cache and start fresh. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: record .gitignore information and dir hierarchyNguyễn Thái Ngọc Duy
The idea is if we can capture all input and (non-rescursive) output of read_directory_recursive(), and can verify later that all the input is the same, then the second r_d_r() should produce the same output as in the first run. The requirement for this to work is stat info of a directory MUST change if an entry is added to or removed from that directory (and should not change often otherwise). If your OS and filesystem do not meet this requirement, untracked cache is not for you. Most file systems on *nix should be fine. On Windows, NTFS is fine while FAT may not be [1] even though FAT on Linux seems to be fine. The list of input of r_d_r() is in the big comment block in dir.h. In short, the output of a directory (not counting subdirs) mainly depends on stat info of the directory in question, all .gitignore leading to it and the check_only flag when r_d_r() is called recursively. This patch records all this info (and the output) as r_d_r() runs. Two hash_sha1_file() are required for $GIT_DIR/info/exclude and core.excludesfile unless their stat data matches. hash_sha1_file() is only needed when .gitignore files in the worktree are modified, otherwise their SHA-1 in index is used (see the previous patch). We could store stat data for .gitignore files so we don't have to rehash them if their content is different from index, but I think .gitignore files are rarely modified, so not worth extra cache data (and hashing penalty read-cache.c:verify_hdr(), as we will be storing this as an index extension). The implication is, if you change .gitignore, you better add it to the index soon or you lose all the benefit of untracked cache because a modified .gitignore invalidates all subdirs recursively. This is especially bad for .gitignore at root. This cached output is about untracked files only, not ignored files because the number of tracked files is usually small, so small cache overhead, while the number of ignored files could go really high (e.g. *.o files mixing with source code). [1] "Description of NTFS date and time stamps for files and folders" http://support.microsoft.com/kb/299648 Helped-by: Torsten Bögershausen <tboegi@web.de> Helped-by: David Turner <dturner@twopensource.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>