summaryrefslogtreecommitdiff
path: root/preload-index.c
AgeCommit message (Collapse)Author
2018-02-02trace: measure where the time is spent in the index-heavy operationsNguyễn Thái Ngọc Duy
All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not. An unoptimized git-status would give something like below: 0.001791141 s: read cache ... 0.004011363 s: preload index 0.000516161 s: refresh index 0.003139257 s: git command: ... 'status' '--porcelain=2' 0.006788129 s: diff-files 0.002090267 s: diff-index 0.001885735 s: initialize name hash 0.032013138 s: read directory 0.051781209 s: git command: './git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01fsmonitor: teach git to optionally utilize a file system monitor to speed up ↵Ben Peart
detecting new or changed files. When the index is read from disk, the fsmonitor index extension is used to flag the last known potentially dirty index entries. The registered core.fsmonitor command is called with the time the index was last updated and returns the list of files changed since that time. This list is used to flag any additional dirty cache entries and untracked cache directories. We can then use this valid state to speed up preload_index(), ie_match_stat(), and refresh_cache_ent() as they do not need to lstat() files to detect potential changes for those entries marked CE_FSMONITOR_VALID. In addition, if the untracked cache is turned on valid_cached_dir() can skip checking directories for new or changed files as fsmonitor will invalidate the cache only for those directories that have been identified as having potential changes. To keep the CE_FSMONITOR_VALID state accurate during git operations; when git updates a cache entry to match the current state on disk, it will now set the CE_FSMONITOR_VALID bit. Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit is cleared and the corresponding untracked cache directory is marked invalid. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-24preload-index: add override to enable testing preload-indexBen Peart
By default, the preload index code path doesn't run unless there is a minimum of 1000 files. To enable running the test suite and having it execute the preload-index path, add an environment variable (GIT_FORCE_PRELOAD_TEST) which will override that minimum and set it to 2. This enables you run existing tests and have the core.preloadindex code path execute as long as the test has at least 2 files by setting GIT_FORCE_PRELOAD_TEXT=1 before running the test. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-10preload-index: avoid lstat for skip-worktree itemsJeff Hostetler
Teach preload-index to avoid lstat() calls for index-entries with skip-worktree bit set. This is a performance optimization. During a sparse-checkout, the skip-worktree bit is set on items that were not populated and therefore are not present in the worktree. The per-thread preload-index loop performs a series of tests on each index-entry as it attempts to compare the worktree version with the index and mark them up-to-date. This patch short-cuts that work. On a Windows 10 system with a very large repo (450MB index) and various levels of sparseness, performance was improved in the {preloadindex=true, fscache=false} case by 80% and in the {preloadindex=true, fscache=true} case by 20% for various commands. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13cache.h: rename cache_def_free to cache_def_clearKarsten Blees
Rename cache_def_free to cache_def_clear as it doesn't free the struct cache_def, but just clears its content. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07symlinks: remove PATH_MAX limitationKarsten Blees
'git checkout' fails if a directory is longer than PATH_MAX, because the lstat_cache in symlinks.c checks if the leading directory exists using PATH_MAX-bounded string operations. Remove the limitation by using strbuf instead. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24pathspec: convert some match_pathspec_depth() to ce_path_match()Nguyễn Thái Ngọc Duy
This helps reduce the number of match_pathspec_depth() call sites and show how match_pathspec_depth() is used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15convert read_cache_preload() to take struct pathspecNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15move struct pathspec and related functions to pathspec.[ch]Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03Convert ce_path_match() to use struct pathspecNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-24Make ce_uptodate() trustworthy againJunio C Hamano
The rule has always been that a cache entry that is ce_uptodate(ce) means that we already have checked the work tree entity and we know there is no change in the work tree compared to the index, and nobody should have to double check. Note that false ce_uptodate(ce) does not mean it is known to be dirty---it only means we don't know if it is clean. There are a few codepaths (refresh-index and preload-index are among them) that mark a cache entry as up-to-date based solely on the return value from ie_match_stat(); this function uses lstat() to see if the work tree entity has been touched, and for a submodule entry, if its HEAD points at the same commit as the commit recorded in the index of the superproject (a submodule that is not even cloned is considered clean). A submodule is no longer considered unmodified merely because its HEAD matches the index of the superproject these days, in order to prevent people from forgetting to commit in the submodule and updating the superproject index with the new submodule commit, before commiting the state in the superproject. However, the patch to do so didn't update the codepath that marks cache entries up-to-date based on the updated definition and instead worked it around by saying "we don't trust the return value of ce_uptodate() for submodules." This makes ce_uptodate() trustworthy again by not marking submodule entries up-to-date. The next step _could_ be to introduce a few "in-core" flag bits to cache_entry structure to record "this entry is _known_ to be dirty", call is_submodule_modified() from ie_match_stat(), and use these new bits to avoid running this rather expensive check more than once, but that can be a separate patch. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-23janitor: add DIV_ROUND_UP and use it.Pierre Habouzit
Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-10Make index preloading check the whole path to the fileLinus Torvalds
This uses the new thread-safe 'threaded_has_symlink_leading_path()' function to efficiently verify that the whole path leading up to the filename is a proper path, and does not contain symlinks. This makes 'ce_uptodate()' a much stronger guarantee: it no longer just guarantees that the 'lstat()' of the path would match, it also means that we know that people haven't played games with moving directories around and covered it up with symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-02Makefile: introduce NO_PTHREADSJunio C Hamano
This introduces make variable NO_PTHREADS for platforms that lack the support for pthreads library or people who do not want to use it for whatever reason. When defined, it makes the multi-threaded index preloading into a no-op, and also disables threaded delta searching by pack-objects. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Mike Ralphson <mike@abacus.co.uk> Tested-by: Johannes Sixt <j6t@kdbg.org> (AIX 4.3.x) Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-17Fix index preloading for racy dirty caseLinus Torvalds
In the threaded index preloading case, we must be sure to always use the CE_MATCH_RACY_IS_DIRTY flag when calling ie_match_stat(), in order to make sure that we only ever look at the stat() data, and don't try to do anything fancy. Because most of git internals are not thread-safe, and must not be called in parallel. Otherwise, what happens is that if the timestamps indicate that an entry _might_ be dirty, we might start actually comparing filesystem data with the object database. And we mustn't do that, because that would involve looking up and creating the object structure, and that whole code sequence with read_sha1_file() where we look up and add objects to the hashes is definitely not thread-safe. Nor do we want to add locking, because the whole point of the preload was to be simple and not affect anything else. With CE_MATCH_RACY_IS_DIRTY, we get what we wanted, and we'll just leave the hard cases well alone, to be done later in the much simpler serial case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-15Add cache preload facilityLinus Torvalds
This can do the lstat() storm in parallel, giving potentially much improved performance for cold-cache cases or things like NFS that have weak metadata caching. Just use "read_cache_preload()" instead of "read_cache()" to force an optimistic preload of the index stat data. The function takes a pathspec as its argument, allowing us to preload only the relevant portion of the index. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>