summaryrefslogtreecommitdiff
path: root/read-cache.c
AgeCommit message (Collapse)Author
2019-03-07Merge branch 'jh/trace2'Junio C Hamano
A more structured way to obtain execution trace has been added. * jh/trace2: trace2: add for_each macros to clang-format trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh trace2:data: add subverb for rebase trace2:data: add subverb to reset command trace2:data: add subverb to checkout command trace2:data: pack-objects: add trace2 regions trace2:data: add trace2 instrumentation to index read/write trace2:data: add trace2 hook classification trace2:data: add trace2 transport child classification trace2:data: add trace2 sub-process classification trace2:data: add editor/pager child classification trace2:data: add trace2 regions to wt-status trace2: collect Windows-specific process information trace2: create new combined trace facility trace2: Documentation/technical/api-trace2.txt
2019-03-07Merge branch 'nd/split-index-null-base-fix'Junio C Hamano
Split-index fix. * nd/split-index-null-base-fix: read-cache.c: fix writing "link" index ext with null base oid
2019-03-07Merge branch 'tg/checkout-no-overlay'Junio C Hamano
"git checkout --no-overlay" can be used to trigger a new mode of checking out paths out of the tree-ish, that allows paths that match the pathspec that are in the current index and working tree and are not in the tree-ish. * tg/checkout-no-overlay: revert "checkout: introduce checkout.overlayMode config" checkout: introduce checkout.overlayMode config checkout: introduce --{,no-}overlay option checkout: factor out mark_cache_entry_for_checkout function checkout: clarify comment read-cache: add invalidate parameter to remove_marked_cache_entries entry: support CE_WT_REMOVE flag in checkout_entry entry: factor out unlink_entry function move worktree tests to t24*
2019-02-22trace2:data: add trace2 instrumentation to index read/writeJeff Hostetler
Add trace2 events to measure reading and writing the index. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-13read-cache.c: fix writing "link" index ext with null base oidNguyễn Thái Ngọc Duy
Since commit 7db118303a (unpack_trees: fix breakage when o->src_index != o->dst_index - 2018-04-23) and changes in merge code to use separate index_state for source and destination, when doing a merge with split index activated, we may run into this line in unpack_trees(): o->result.split_index = init_split_index(&o->result); This is by itself not wrong. But this split index information is not fully populated (and it's only so when move_cache_to_base_index() is called, aka force splitting the index, or loading index_state from a file). Both "base_oid" and "base" in this case remain null. So when writing the main index down, we link to this index with null oid (default value after init_split_index()), which also means "no split index" internally. This triggers an incorrect base index refresh: warning: could not freshen shared index '.../sharedindex.0{40}' This patch makes sure we will not refresh null base_oid (because the file is never there). It also makes sure not to write "link" extension with null base_oid in the first place (no point having it at all). Read code already has protection against null base_oid. There is also another side fix in remove_split_index() that causes a crash when doing "git update-index --no-split-index" when base_oid in the index file is null. In this case we will not load istate->split_index->base but we dereference it anyway and are rewarded with a segfault. This should not happen anymore, but it's still wrong to dereference a potential NULL pointer, especially when we do check for NULL pointer in the next code. Reported-by: Luke Diamand <luke@diamand.org> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-07Merge branch 'jk/add-ignore-errors-bit-assignment-fix'Junio C Hamano
A hotfix to an incomplete fix made earlier. * jk/add-ignore-errors-bit-assignment-fix: add_to_index(): convert forgotten HASH_RENORMALIZE check
2019-02-07Merge branch 'nd/the-index-final'Junio C Hamano
The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument
2019-02-07add_to_index(): convert forgotten HASH_RENORMALIZE checkJeff King
Commit 9e5da3d055 (add: use separate ADD_CACHE_RENORMALIZE flag, 2019-01-17) switched out using HASH_RENORMALIZE in our flags field for a new ADD_CACHE_RENORMALIZE flag. However, it forgot to convert one of the checks for HASH_RENORMALIZE into the new flag, which totally broke "git add --renormalize". To make matters even more confusing, the resulting code would racily pass the tests! The forgotten check was responsible for defeating the up-to-date check of the index entry. That meant that "git add --renormalize" would refuse to renormalize things that appeared stat-clean. But most of the time the test commands run fast enough that the file mtime is the same as the index mtime. And thus we err on the conservative side and re-hash the file, which is what "--renormalize" would have wanted. But if you're unlucky and cross that one-second boundary between writing the file and writing the index (which is more likely to happen on a slow or heavily-loaded system), then the file appears stat-clean. And "--renormalize" would effectively do nothing. The fix is straightforward: convert this check to use the right flag. Noticed-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-05Merge branch 'jk/add-ignore-errors-bit-assignment-fix'Junio C Hamano
"git add --ignore-errors" did not work as advertised and instead worked as an unintended synonym for "git add --renormalize", which has been fixed. * jk/add-ignore-errors-bit-assignment-fix: add: use separate ADD_CACHE_RENORMALIZE flag
2019-01-24cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switchNguyễn Thái Ngọc Duy
By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17add: use separate ADD_CACHE_RENORMALIZE flagJeff King
Commit 9472935d81 (add: introduce "--renormalize", 2017-11-16) taught git-add to pass HASH_RENORMALIZE to add_to_index(), which then passes the flag along to index_path(). However, the flags taken by add_to_index() and the ones taken by index_path() are distinct namespaces. We cannot take HASH_* flags in add_to_index(), because they overlap with the ADD_CACHE_* flags we already take (in this case, HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS). We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using it to set HASH_RENORMALIZE within add_to_index(). In order to make it clear that these two flags come from distinct sets, let's also change the name "newflags" in the function to "hash_flags". Reported-by: Dmitriy Smirnov <dmitriy.smirnov@jetbrains.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'nd/indentation-fix'Junio C Hamano
Code cleanup. * nd/indentation-fix: Indent code with TABs
2019-01-14read-cache.c: remove the_* from index_has_changes()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14read-cache.c: replace update_index_if_able with repo_&Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14read-cache.c: kill read_index()Nguyễn Thái Ngọc Duy
read_index() shares the same problem as hold_locked_index(): it assumes $GIT_DIR/index. Move all call sites to repo_read_index() instead. read_index_preload() and read_index_unmerged() are also killed as a consequence. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14repository.c: replace hold_locked_index() with repo_hold_locked_index()Nguyễn Thái Ngọc Duy
hold_locked_index() assumes the index path at $GIT_DIR/index. This is not good for places that take an arbitrary index_state instead of the_index, which is basically everywhere except builtin/. Replace it with repo_hold_locked_index(). hold_locked_index() remains as a wrapper around repo_hold_locked_index() to reduce changes in builtin/ Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-04Merge branch 'nd/the-index'Junio C Hamano
More codepaths become aware of working with in-core repository instance other than the default "the_repository". * nd/the-index: (22 commits) rebase-interactive.c: remove the_repository references rerere.c: remove the_repository references pack-*.c: remove the_repository references pack-check.c: remove the_repository references notes-cache.c: remove the_repository references line-log.c: remove the_repository reference diff-lib.c: remove the_repository references delta-islands.c: remove the_repository references cache-tree.c: remove the_repository references bundle.c: remove the_repository references branch.c: remove the_repository reference bisect.c: remove the_repository reference blame.c: remove implicit dependency the_repository sequencer.c: remove implicit dependency on the_repository sequencer.c: remove implicit dependency on the_index transport.c: remove implicit dependency on the_index notes-merge.c: remove implicit dependency the_repository notes-merge.c: remove implicit dependency on the_index list-objects.c: reduce the_repository references list-objects-filter.c: remove implicit dependency on the_index ...
2019-01-04Merge branch 'nd/i18n'Junio C Hamano
More _("i18n") markings. * nd/i18n: fsck: mark strings for translation fsck: reduce word legos to help i18n parse-options.c: mark more strings for translation parse-options.c: turn some die() to BUG() parse-options: replace opterror() with optname() repack: mark more strings for translation remote.c: mark messages for translation remote.c: turn some error() or die() to BUG() reflog: mark strings for translation read-cache.c: add missing colon separators read-cache.c: mark more strings for translation read-cache.c: turn die("internal error") to BUG() attr.c: mark more string for translation archive.c: mark more strings for translation alias.c: mark split_cmdline_strerror() strings for translation git.c: mark more strings for translation
2019-01-02read-cache: add invalidate parameter to remove_marked_cache_entriesThomas Gummerer
When marking cache entries for removal, and later removing them all at once using 'remove_marked_cache_entries()', cache entries currently have to be invalidated manually in the cache tree and in the untracked cache. Add an invalidate flag to the function. With the flag set, the function will take care of invalidating the path in the cache tree and in the untracked cache. Note that the current callsites already do the invalidation properly in other places, so we're just passing 0 from there to keep the status quo. This will be useful in a subsequent commit. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-09Indent code with TABsNguyễn Thái Ngọc Duy
We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close as upstream as possible. The latter pretty much has separate maintainers, it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21Merge branch 'cc/shared-index-permbits'Junio C Hamano
The way .git/index and .git/sharedindex* files were initially created gave these files different perm bits until they were adjusted for shared repository settings. This was made consistent. * cc/shared-index-permbits: read-cache: make the split index obey umask settings
2018-11-21index: make index.threads=true enable ieot and eoieJonathan Nieder
If a user explicitly sets [index] threads = true to read the index using multiple threads, ensure that index writes include the offset table by default to make that possible. This ensures that the user's intent of turning on threading is respected. In other words, permit the following configurations: - index.threads and index.recordOffsetTable unspecified: do not write the offset table yet (to avoid alarming the user with "ignoring IEOT extension" messages when an older version of Git accesses the repository) but do make use of multiple threads to read the index if the supporting offset table is present. This can also be requested explicitly by setting index.threads=true, 0, or >1 and index.recordOffsetTable=false. - index.threads=false or 1: do not write the offset table, and do not make use of the offset table. One can set index.recordOffsetTable=false as well, to be more explicit. - index.threads=true, 0, or >1 and index.recordOffsetTable unspecified: write the offset table and make use of threads at read time. This can also be requested by setting index.threads=true, 0, >1, or unspecified and index.recordOffsetTable=true. Fortunately the complication is temporary: once most Git installations have upgraded to a version with support for the IEOT and EOIE extensions, we can flip the defaults for index.recordEndOfIndexEntries and index.recordOffsetTable to true and eliminate the settings. Helped-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21ieot: default to not writing IEOT sectionJonathan Nieder
As with EOIE, popular versions of Git do not support the new IEOT extension yet. When accessing a Git repository written by a more modern version of Git, they correctly ignore the unrecognized section, but in the process they loudly warn ignoring IEOT extension resulting in confusion for users. Introduce the index extension more gently by not writing it yet in this first version with support for it. Soon, once sufficiently many users are running a modern version of Git, we can flip the default so users benefit from this index extension by default. Introduce a '[index] recordOffsetTable' configuration variable to control whether the new index extension is written. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21eoie: default to not writing EOIE sectionJonathan Nieder
Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension, 2018-10-10) Git defaults to writing the new EOIE section when writing out an index file. Usually that is a good thing because it improves threaded performance, but when a Git repository is shared with older versions of Git, it produces a confusing warning: $ git status ignoring EOIE extension HEAD detached at 371ed0defa nothing to commit, working tree clean Let's introduce the new index extension more gently. First we'll roll out the new version of Git that understands it, and then once sufficiently many users are using such a version, we can flip the default to writing it by default. Introduce a '[index] recordEndOfIndexEntries' configuration variable to allow interested users to benefit from this index extension early. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-18read-cache: make the split index obey umask settingsÆvar Arnfjörð Bjarmason
Make the split index write out its .git/sharedindex_* files with the same permissions as .git/index. This only changes the behavior when core.sharedRepository isn't set, i.e. the user's umask settings will be respected. This hasn't been the case ever since the split index was originally implemented in c18b80a0e8 ("update-index: new options to enable/disable split index mode", 2014-06-13). A mkstemp()-like function has always been used to create it. First mkstemp() itself, and then later our own mkstemp()-like in f6ecc62dbf ("write_shared_index(): use tempfile module", 2015-08-10) A related bug was fixed in df801f3f9f ("read-cache: use shared perms when writing shared index", 2017-06-25). Since then the split index has respected core.sharedRepository. However, using that setting should not be required simply to make git obey the user's umask setting. It's intended for the use-case of overriding whatever that umask is set to. This fixes cases where the user has e.g. set his umask to 022 on a shared server in anticipation of other user's needing to run "status", "log" etc. in his repository. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-18Merge branch 'nd/pthreads'Junio C Hamano
The codebase has been cleaned up to reduce "#ifndef NO_PTHREADS". * nd/pthreads: Clean up pthread_create() error handling read-cache.c: initialize copy_len to shut up gcc 8 read-cache.c: reduce branching based on HAVE_THREADS read-cache.c: remove #ifdef NO_PTHREADS pack-objects: remove #ifdef NO_PTHREADS preload-index.c: remove #ifdef NO_PTHREADS grep: clean up num_threads handling grep: remove #ifdef NO_PTHREADS attr.c: remove #ifdef NO_PTHREADS name-hash.c: remove #ifdef NO_PTHREADS index-pack: remove #ifdef NO_PTHREADS send-pack.c: move async's #ifdef NO_PTHREADS back to run-command.c run-command.h: include thread-utils.h instead of pthread.h thread-utils: macros to unconditionally compile pthreads API
2018-11-13Merge branch 'bp/refresh-index-using-preload'Junio C Hamano
The helper function to refresh the cached stat information in the in-core index has learned to perform the lstat() part of the operation in parallel on multi-core platforms. * bp/refresh-index-using-preload: refresh_index: remove unnecessary calls to preload_index() speed up refresh_index() by utilizing preload_index()
2018-11-12cache-tree.c: remove the_repository referencesNguyễn Thái Ngọc Duy
This case is more interesting than other boring "remove the_repo" commits because while we need access to the object database, we cannot simply use r->index because unpack-trees.c can operate on a temporary index, not $GIT_DIR/index. Ideally we should be able to pass an object database to lookup_tree() but that ship has sailed. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12read-cache.c: add missing colon separatorsNguyễn Thái Ngọc Duy
typechange_fmt and added_fmt should have a colon before "needs update". Align the statements to make it easier to read and see. Also drop the unnecessary (). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12read-cache.c: mark more strings for translationNguyễn Thái Ngọc Duy
There are a couple other improvements on these strings as well: - add missing colon (as separator) - quote paths - provide more information on error messages - keep first word in lowercase Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12read-cache.c: turn die("internal error") to BUG()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-06Merge branch 'nd/unpack-trees-with-cache-tree'Junio C Hamano
Trivial bugfix. * nd/unpack-trees-with-cache-tree: read-cache: use of memory after it is freed
2018-11-05read-cache.c: initialize copy_len to shut up gcc 8Nguyễn Thái Ngọc Duy
It was reported that when building with NO_PTHREADS=1, -Wmaybe-uninitialized is triggered. Just initialize the variable from the beginning to shut the compiler up (because this warning is enabled in config.dev) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05read-cache.c: reduce branching based on HAVE_THREADSNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05read-cache.c: remove #ifdef NO_PTHREADSNguyễn Thái Ngọc Duy
This is a faithful conversion with no attempt to clean up whatsoever. Code indentation is left broken. There will be another commit to clean it up and un-indent if we just indent now. It's just more code noise. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30speed up refresh_index() by utilizing preload_index()Ben Peart
Speed up refresh_index() by utilizing preload_index() to do most of the work spread across multiple threads. This works because most cache entries will get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when called from within refresh_index(). On a Windows repo with ~200K files, this drops refresh times from 6.64 seconds to 2.87 seconds for a savings of 57%. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-26Merge branch 'sg/split-index-racefix'Junio C Hamano
The codepath to support the experimental split-index mode had remaining "racily clean" issues fixed. * sg/split-index-racefix: split-index: BUG() when cache entry refers to non-existing shared entry split-index: smudge and add racily clean cache entries to split index split-index: don't compare cached data of entries already marked for split index split-index: count the number of deleted entries t1700-split-index: date back files to avoid racy situations split-index: add tests to demonstrate the racy split index problem t1700-split-index: document why FSMONITOR is disabled in this test script
2018-10-22read-cache: use of memory after it is freedCarlo Marcelo Arenas Belón
introduced with c46c406ae1e (trace.h: support nested performance tracing) on Aug 18, 2018 but not affecting maint Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Merge branch 'nd/status-refresh-progress'Junio C Hamano
"git status" learns to show progress bar when refreshing the index takes a long time. * nd/status-refresh-progress: status: show progress bar if refreshing the index takes too long
2018-10-19Merge branch 'bp/read-cache-parallel'Junio C Hamano
A new extension to the index file has been introduced, which allows the file to be read in parallel. * bp/read-cache-parallel: read-cache: load cache entries on worker threads ieot: add Index Entry Offset Table (IEOT) extension read-cache: load cache extensions on a worker thread config: add new index.threads config setting eoie: add End of Index Entry (EOIE) extension read-cache: clean up casting and byte decoding read-cache.c: optimize reading index format v4
2018-10-19Merge branch 'nd/the-index'Junio C Hamano
Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...
2018-10-11split-index: smudge and add racily clean cache entries to split indexSZEDER Gábor
Ever since the split index feature was introduced [1], refreshing a split index is prone to a variant of the classic racy git problem. Consider the following sequence of commands updating the split index when the shared index contains a racily clean cache entry, i.e. an entry whose cached stat data matches with the corresponding file in the worktree and the cached mtime matches that of the index: echo "cached content" >file git update-index --split-index --add file echo "dirty worktree" >file # size stays the same! # ... wait ... git update-index --add other-file Normally, when a non-split index is updated, then do_write_index() (the function responsible for writing all kinds of indexes, "regular", split, and shared) recognizes racily clean cache entries, and writes them with smudged stat data, i.e. with file size set to 0. When subsequent git commands read the index, they will notice that the smudged stat data doesn't match with the file in the worktree, and then go on to check the file's content and notice its dirtiness. In the above example, however, in the second 'git update-index' prepare_to_write_split_index() decides which cache entries stored only in the shared index should be replaced in the new split index. Alas, this function never looks out for racily clean cache entries, and since the file's stat data in the worktree hasn't changed since the shared index was written, it won't be replaced in the new split index. Consequently, do_write_index() doesn't even get this racily clean cache entry, and can't smudge its stat data. Subsequent git commands will then see that the index has more recent mtime than the file and that the (not smudged) cached stat data still matches with the file in the worktree, and, ultimately, will erroneously consider the file clean. Modify prepare_to_write_split_index() to recognize racily clean cache entries, and mark them to be added to the split index. Note that there are two places where it should check raciness: first those cache entries that are only stored in the shared index, and then those that have been copied by unpack_trees() from the shared index while it constructed a new index. This way do_write_index() will get these racily clean cache entries as well, and will then write them with smudged stat data to the new split index. This change makes all tests in 't1701-racy-split-index.sh' pass, so flip the two 'test_expect_failure' tests to success. Also add the '#' (as in nr. of trial) to those tests' description that were omitted when the tests expected failure. Note that after this change if the index is split when it contains a racily clean cache entry, then a smudged cache entry will be written both to the new shared and to the new split indexes. This doesn't affect regular git commands: as far as they are concerned this is just an entry in the split index replacing an outdated entry in the shared index. It did affect a few tests in 't1700-split-index.sh', though, because they actually check which entries are stored in the split index; a previous patch in this series has already made the necessary adjustments in 't1700'. And racily clean cache entries and index splitting are rare enough to not worry about the resulting duplicated smudged cache entries, and the additional complexity required to prevent them is not worth it. Several tests failed occasionally when the test suite was run with 'GIT_TEST_SPLIT_INDEX=yes'. Here are those that I managed to trace back to this racy split index problem, starting with those failing more frequently, with a link to a failing Travis CI build job for each. The highlighted line [2] shows when the racy file was written, which is not always in the failing test but in a preceeding setup test. t3903-stash.sh: https://travis-ci.org/git/git/jobs/385542084#L5858 t4024-diff-optimize-common.sh: https://travis-ci.org/git/git/jobs/386531969#L3174 t4015-diff-whitespace.sh: https://travis-ci.org/git/git/jobs/360797600#L8215 t2200-add-update.sh: https://travis-ci.org/git/git/jobs/382543426#L3051 t0090-cache-tree.sh: https://travis-ci.org/git/git/jobs/416583010#L3679 There might be others, e.g. perhaps 't1000-read-tree-m-3way.sh' and others using 'lib-read-tree-m-3way.sh', but I couldn't confirm yet. [1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge branch 'nd/split-index', 2014-07-16). [2] Note that those highlighted lines are in the 'after failure' fold, and your browser might unhelpfully fold it up before you could take a good look. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: load cache entries on worker threadsBen Peart
This patch helps address the CPU cost of loading the index by utilizing the Index Entry Offset Table (IEOT) to divide loading and conversion of the cache entries across multiple threads in parallel. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 files reduced the time by 32.24% Test w/1,000,000 files reduced the time by -4.77% Note that on the 1,000,000 files case, multi-threading the cache entry parsing does not yield a performance win. This is because the cost to parse the index extensions in this repo, far outweigh the cost of loading the cache entries. The high cost of parsing the index extensions is driven by the cache tree and the untracked cache extensions. As this is currently the longest pole, any reduction in this time will reduce the overall index load times so is worth further investigation in another patch series. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11ieot: add Index Entry Offset Table (IEOT) extensionBen Peart
This patch enables addressing the CPU cost of loading the index by adding additional data to the index that will allow us to efficiently multi- thread the loading and conversion of cache entries. It accomplishes this by adding an (optional) index extension that is a table of offsets to blocks of cache entries in the index file. To make this work for V4 indexes, when writing the cache entries, it periodically "resets" the prefix-compression by encoding the current entry as if the path name for the previous entry is completely different and saves the offset of that entry in the IEOT. Basically, with V4 indexes, it generates offsets into blocks of prefix-compressed entries. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: load cache extensions on a worker threadBen Peart
This patch helps address the CPU cost of loading the index by loading the cache extensions on a worker thread in parallel with loading the cache entries. In some cases, loading the extensions takes longer than loading the cache entries so this patch utilizes the new EOIE to start the thread to load the extensions before loading all the cache entries in parallel. This is possible because the current extensions don't access the cache entries in the index_state structure so are OK that they don't all exist yet. The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED extensions don't even get a pointer to the index so don't have access to the cache entries. CACHE_EXT_LINK only uses the index_state to initialize the split index. CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last update and dirty flags. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 files reduced the time by 0.53% Test w/1,000,000 files reduced the time by 27.78% Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11eoie: add End of Index Entry (EOIE) extensionBen Peart
The End of Index Entry (EOIE) is used to locate the end of the variable length index entries and the beginning of the extensions. Code can take advantage of this to quickly locate the index extensions without having to parse through all of the index entries. The EOIE extension is always written out to the index file including to the shared index when using the split index feature. Because it is always written out, the SHA checksums in t/t1700-split-index.sh were updated to reflect its inclusion. It is written as an optional extension to ensure compatibility with other git implementations that do not yet support it. It is always written out to ensure it is available as often as possible to speed up index operations. Because it must be able to be loaded before the variable length cache entries and other index extensions, this extension must be written last. The signature for this extension is { 'E', 'O', 'I', 'E' }. The extension consists of: - 32-bit offset to the end of the index entries - 160-bit SHA-1 over the extension types and their sizes (but not their contents). E.g. if we have "TREE" extension that is N-bytes long, "REUC" extension that is M-bytes long, followed by "EOIE", then the hash would be: SHA-1("TREE" + <binary representation of N> + "REUC" + <binary representation of M>) Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: clean up casting and byte decodingBen Peart
This patch does a clean up pass to minimize the casting required to work with the memory mapped index (mmap). It also makes the decoding of network byte order more consistent by using get_be32() where possible. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-26read-cache.c: optimize reading index format v4Nguyễn Thái Ngọc Duy
Index format v4 requires some more computation to assemble a path based on a previous one. The current code is not very efficient because - it doubles memory copy, we assemble the final path in a temporary first before putting it back to a cache_entry - strbuf_remove() in expand_name_field() is not exactly a good fit for stripping a part at the end, _setlen() would do the same job and is much cheaper. - the open-coded loop to find the end of the string in expand_name_field() can't beat an optimized strlen() This patch avoids the temporary buffer and writes directly to the new cache_entry, which addresses the first two points. The last point could also be avoided if the total string length fits in the first 12 bits of ce_flags, if not we fall back to strlen(). Running "test-tool read-cache 100" on webkit.git (275k files), reading v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more time. The patch reduces read time on v4 to 4.319 seconds. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21sha1-file.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21read-cache.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>