path: root/sparse-index.c
Age | Commit message | Author
2021-09-20 | Merge branch 'ds/sparse-index-ignored-files' | Junio C Hamano
In cone mode, the sparse-index code path learned to remove ignored files (like build artifacts) outside the sparse cone, allowing the entire directory outside the sparse cone to be removed, which is especially useful when the sparse patterns change.

* ds/sparse-index-ignored-files:
  sparse-checkout: clear tracked sparse dirs
  sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag
  attr: be careful about sparse directories
  sparse-checkout: create helper methods
  sparse-index: use WRITE_TREE_MISSING_OK
  sparse-index: silently return when cache tree fails
  unpack-trees: fix nested sparse-dir search
  sparse-index: silently return when not using cone-mode patterns
  t7519: rewrite sparse index test
2021-09-08 | sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag | Derrick Stolee
The convert_to_sparse() method checks for the GIT_TEST_SPARSE_INDEX environment variable or the "index.sparse" config setting before converting the index to a sparse one. This is for ease of use since all current consumers are preparing to compress the index before writing it to disk. If these settings are not enabled, then convert_to_sparse() silently returns without doing anything. We will add a consumer in the next change that wants to use the sparse index as an in-memory data structure, regardless of whether the on-disk format should be sparse. To that end, create the SPARSE_INDEX_MEMORY_ONLY flag that will skip these config checks when enabled. All current consumers are modified to pass '0' in the new 'flags' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
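The flag-gated check described above might look roughly like the following sketch. It is not the code from sparse-index.c: the flag value, the `sketch_settings` struct and `sketch_should_convert()` are hypothetical names chosen for illustration, and the two booleans merely model the GIT_TEST_SPARSE_INDEX variable and the "index.sparse" setting.

```c
#include <stdbool.h>

/* Hypothetical flag value; the real definition lives in Git's headers. */
#define SKETCH_MEMORY_ONLY (1 << 0)

/* Hypothetical stand-in for the settings consulted before conversion. */
struct sketch_settings {
	bool test_env_enabled;   /* models GIT_TEST_SPARSE_INDEX */
	bool config_enabled;     /* models index.sparse */
};

/*
 * Returns true when conversion should proceed. With the memory-only
 * flag set, the settings checks are skipped entirely.
 */
static bool sketch_should_convert(const struct sketch_settings *s, int flags)
{
	if (flags & SKETCH_MEMORY_ONLY)
		return true;
	return s->test_env_enabled || s->config_enabled;
}
```

Callers that pass `0` keep the old behaviour, which matches the statement that all current consumers pass '0' in the new parameter.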
2021-09-08 | sparse-checkout: create helper methods | Derrick Stolee
As we integrate the sparse index into more builtins, we occasionally need to check the sparse-checkout patterns to see if a path is within the sparse-checkout cone. Create some helper methods that help initialize the patterns and check for pattern matching to make this easier. The existing callers of commands like get_sparse_checkout_patterns() use a custom 'struct pattern_list' that is not necessarily the one in the 'struct index_state', so there are not many previous uses that could adopt these helpers. There are just two in builtin/add.c and sparse-index.c that can use path_in_sparse_checkout(). We add a path_in_cone_mode_sparse_checkout() as well that will only return false if the path is outside of the sparse-checkout definition _and_ the sparse-checkout patterns are in cone mode. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
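The return rule stated for path_in_cone_mode_sparse_checkout() can be written out as a small truth function. This is only a model of that one sentence: the function name and its two boolean inputs are hypothetical, and the real helper works on a path and an index rather than on precomputed booleans.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the cone-mode helper: the answer is "outside"
 * (false) only when the patterns are in cone mode AND the path is
 * outside the sparse-checkout definition.
 */
static bool sketch_path_in_cone_mode_sparse_checkout(bool in_definition,
						      bool cone_mode)
{
	if (!cone_mode)
		return true;
	return in_definition;
}
```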
2021-09-08 | sparse-index: use WRITE_TREE_MISSING_OK | Derrick Stolee
When updating the cache tree in convert_to_sparse(), the WRITE_TREE_MISSING_OK flag indicates that trees might be computed that do not already exist within the object database. This happens in cases such as 'git add' creating new trees that it wants to store in anticipation of a following 'git commit'. If this flag is not specified, then it might trigger a promisor fetch or a failure due to the object not existing locally. Use WRITE_TREE_MISSING_OK during convert_to_sparse() to avoid these possible reasons for the cache_tree_update() to fail. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 | sparse-index: silently return when cache tree fails | Derrick Stolee
If cache_tree_update() returns a non-zero value, then it could not create the cache tree. This is likely due to a path having a merge conflict. Since we are already returning early, let's return silently to avoid making it seem like we failed to write the index at all. If we remove our dependence on the cache tree within convert_to_sparse(), then we could still recover from this scenario and have a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 | sparse-index: silently return when not using cone-mode patterns | Derrick Stolee
While the sparse-index is only enabled when core.sparseCheckoutCone is also enabled, it is possible for the user to modify the sparse-checkout file manually in a way that does not match cone-mode patterns. In this case, we should refuse to convert an index into a sparse index, since the sparse_checkout_patterns will not be initialized with recursive and parent path hashsets. Also silently return if there are no cache entries, which is a simple case: there are no paths to make sparse! Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 | sparse-index: copy dir_hash in ensure_full_index() | Jeff Hostetler
Copy the 'index_state->dir_hash' back to the real istate after expanding a sparse index. A crash was observed in 'git status' during some hashmap lookups with corrupted hashmap entries. During an index expansion, new cache-entries are added to the 'index_state->name_hash' and the 'dir_hash' in a temporary 'index_state' variable 'full'. However, only the 'name_hash' hashmap from this temp variable was copied back into the real 'istate' variable. The original copy of the 'dir_hash' was incorrectly preserved. If the table in the 'full->dir_hash' hashmap were realloced, the stale version (in 'istate') would be corrupted. The test suite does not operate on index sizes sufficiently large to trigger this reallocation, so they do not cover this behavior. Increasing the test suite to cover such scale is fragile and likely wasteful. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
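The shape of this bug can be modelled with two tables that live in a temporary struct and are copied back by value. The sketch below is an illustration under assumptions, not Git's hashmap code: `sketch_table`, `sketch_hashes`, `sketch_grow()` and `sketch_copy_back()` are hypothetical names. The point is that once a table in the temporary has been reallocated, any table that is not copied back still points at the old storage.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hashmap whose table can be reallocated. */
struct sketch_table {
	int *slots;
	size_t size;
};

/* Hypothetical stand-in for the two lookup tables in an index. */
struct sketch_hashes {
	struct sketch_table name_hash;
	struct sketch_table dir_hash;
};

/* Grow a table; realloc() may move the storage to a new address. */
static int sketch_grow(struct sketch_table *t, size_t new_size)
{
	int *p = realloc(t->slots, new_size * sizeof(*p));
	if (!p)
		return -1;
	t->slots = p;
	t->size = new_size;
	return 0;
}

/* Copy BOTH tables back; omitting dir_hash models the original bug. */
static void sketch_copy_back(struct sketch_hashes *istate,
			     const struct sketch_hashes *full)
{
	istate->name_hash = full->name_hash;
	istate->dir_hash = full->dir_hash;
}
```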
2021-07-14 | sparse-index: recompute cache-tree | Derrick Stolee
When some commands run with command_requires_full_index=1, then the index can get in a state where the in-memory cache tree is actually equal to the sparse index's cache tree instead of the full one. This results in incorrect entry_count values. By clearing the cache tree before converting to sparse, we avoid this issue. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 | fsmonitor: integrate with sparse index | Derrick Stolee
If we need to expand a sparse-index into a full one, then the FS Monitor bitmap is going to be incorrect. Ensure that we start fresh at such an event. While this is currently a performance drawback, the eventual hope of the sparse-index feature is that these expansions will be rare and hence we will be able to keep the FS Monitor data accurate across multiple Git commands. These tests are added to demonstrate that the behavior is the same across a full index and a sparse index, but also that file modifications to a tracked directory outside of the sparse cone will trigger ensure_full_index(). Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 | sparse-index: include EXTENDED flag when expanding | Derrick Stolee
When creating a full index from a sparse one, we create cache entries for every blob within a given sparse directory entry. These are correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED flag is not included. The CE_EXTENDED flag would exist if we loaded a full index from disk with these entries marked with CE_SKIP_WORKTREE, so we can add the flag here to be consistent. This allows us to directly compare the flags present in cache entries when testing the sparse-index feature, but has no significance to its correctness in the user-facing functionality. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 | sparse-index: skip indexes with unmerged entries | Derrick Stolee
The sparse-index format is designed to be compatible with merge conflicts, even those outside the sparse-checkout definition. The reason is that when converting a full index to a sparse one, a cache entry with nonzero stage will not be collapsed into a sparse directory entry. However, this behavior was not tested, and a different behavior within convert_to_sparse() fails in this scenario. Specifically, cache_tree_update() will fail when unmerged entries exist. convert_to_sparse_rec() uses the cache-tree data to recursively walk the tree structure, but also to compute the OIDs used in the sparse-directory entries. Add an index scan to convert_to_sparse() that will detect if these merge conflict entries exist and skip the conversion before trying to update the cache-tree. This is marked as NEEDSWORK because this can be removed with a suitable update to cache_tree_update() or a similar method that can construct a cache-tree with invalid nodes, but still allow creating the nodes necessary for creating sparse directory entries. It is possible that in the future we will not need to make such an update, since if we do not expand a sparse-index into a full one, this conversion does not need to happen. Thus, this can be deferred until the merge machinery is made to integrate with the sparse-index. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
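The added index scan can be pictured as a pass over the entry stages that runs before any cache-tree work. The following is a simplified sketch, assuming each entry is reduced to its stage number (0 for merged, nonzero for conflict entries); `sketch_has_unmerged()` and `sketch_convert()` are hypothetical names, and the real conversion does far more than return a status.

```c
#include <stdbool.h>
#include <stddef.h>

/* A stage of 0 means merged; a nonzero stage marks a conflict entry. */
static bool sketch_has_unmerged(const int *stages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (stages[i] != 0)
			return true;
	return false;
}

/* Returns 1 if the (modelled) conversion ran, 0 if it was skipped. */
static int sketch_convert(const int *stages, size_t nr)
{
	if (sketch_has_unmerged(stages, nr))
		return 0;   /* skip before touching the cache-tree */
	/* The cache-tree update and the collapse would happen here. */
	return 1;
}
```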
2021-05-20 | Merge branch 'ds/sparse-index-protections' | Junio C Hamano
Fix access to uninitialized piece of memory, introduced during this cycle.

* ds/sparse-index-protections:
  sparse-index: fix uninitialized jump
2021-05-17 | sparse-index: fix uninitialized jump | Derrick Stolee
While testing the sparse-index, I verified a test with --valgrind and it complained about an uninitialized value being used in a jump in the path_matches_pattern_list() method. The line was this one:

	if (*dtype == DT_UNKNOWN)

In the call stack, the culprit was the initialization of the dtype variable in convert_to_sparse_rec().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-06 | sparse-index.c: remove set_index_sparse_config() | Ævar Arnfjörð Bjarmason
Remove the set_index_sparse_config() function by folding it into set_sparse_index_config(), which was its only user. Since 122ba1f7b52 (sparse-checkout: toggle sparse index from builtin, 2021-03-30) the flow of this code hasn't made much sense, we'd get "enabled" in set_sparse_index_config(), proceed to call set_index_sparse_config() with it. There we'd call prepare_repo_settings() and set "repo->settings.sparse_index = 1", only to needlessly call prepare_repo_settings() again in set_sparse_index_config() (where it would early abort), and finally setting "repo->settings.sparse_index = enabled". Instead we can just call prepare_repo_settings() once, and set the variable to "enabled" in the first place. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14 | sparse-index: expand_to_path() | Derrick Stolee
Some users of the index API have a specific path they are looking for, but choose to use index_file_exists() to rely on the name-hash hashtable instead of doing binary search with index_name_pos(). These users only need to know a yes/no answer, not a position within the cache array. When the index is sparse, the name-hash hash table does not contain the full list of paths within sparse directories. It _does_ contain the directory names for the sparse-directory entries. Create a helper function, expand_to_path(), for intended use with the name-hash hashtable functions. The integration with name-hash.c will follow in a later change. The solution here is to use ensure_full_index() when we determine that the requested path is within a sparse directory entry. This will populate the name-hash hashtable as the index is recomputed from scratch. There may be cases where the caller is trying to find an untracked path that is not in the index but also is not within a sparse directory entry. We want to minimize the overhead for these requests. If we used index_name_pos() to find the insertion order of the path, then we could determine from that position if a sparse-directory exists. (In fact, just calling index_name_pos() in that case would lead to expanding the index to a full index.) However, this takes O(log N) time where N is the number of cache entries. To keep the performance of this call based mostly on the input string, use index_file_exists() to look for the ancestors of the path. Using the heuristic that a sparse directory is likely to have a small number of parent directories, we start from the bottom and build up. Use a string buffer to allow mutating the path name to terminate after each slash for each hashset test. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
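The ancestor walk described above can be sketched as follows. This is an illustration, not the expand_to_path() implementation: the linear search in `sketch_is_sparse_dir()` is a hypothetical stand-in for the name-hash lookup, and the caller-provided scratch buffer stands in for the string buffer mentioned in the message. The path is terminated after each slash in turn, shortest ancestor first, and each prefix is tested for membership.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical membership test standing in for a name-hash lookup. */
static bool sketch_is_sparse_dir(const char *const *sparse_dirs, size_t nr,
				 const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(sparse_dirs[i], name))
			return true;
	return false;
}

/*
 * Returns true if some ancestor directory of 'path' is listed as a
 * sparse directory. 'buf' is scratch space of at least
 * strlen(path) + 1 bytes.
 */
static bool sketch_path_in_sparse_dir(const char *const *sparse_dirs,
				      size_t nr, const char *path, char *buf)
{
	strcpy(buf, path);
	for (char *slash = strchr(buf, '/'); slash;
	     slash = strchr(slash + 1, '/')) {
		char saved = slash[1];
		bool found;

		slash[1] = '\0';   /* terminate right after this slash */
		found = sketch_is_sparse_dir(sparse_dirs, nr, buf);
		slash[1] = saved;  /* restore before moving on */
		if (found)
			return true;
	}
	return false;
}
```

The number of lookups depends on the number of slashes in the input path rather than on the number of cache entries, which is the property the message is after.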
2021-03-30 | cache-tree: integrate with sparse directory entries | Derrick Stolee
The cache-tree extension was previously disabled with sparse indexes. However, the cache-tree is an important performance feature for commands like 'git status' and 'git add'. Integrate it with sparse directory entries. When writing a sparse index, completely clear and recalculate the cache tree. By starting from scratch, the only integration necessary is to check if we hit a sparse directory entry and create a leaf of the cache-tree that has an entry_count of one and no subtrees. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 | sparse-checkout: toggle sparse index from builtin | Derrick Stolee
The sparse index extension is used to signal that index writes should be in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1. Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies if the sparse index should be used. It also updates the index to use the correct format, either way. Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extension.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools. Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 | sparse-index: add index.sparse config option | Derrick Stolee
When enabled, this config option signals that index writes should attempt to use sparse-directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 | submodule: sparse-index should not collapse links | Derrick Stolee
A submodule is stored as a "Git link" that actually points to a commit within a submodule. Submodules are populated or not depending on submodule configuration, not sparse-checkout. To ensure that the sparse-index feature integrates correctly with submodules, we should not collapse a directory if there is a Git link within its range. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
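A check for "is there a Git link within this range" could look like the sketch below. It assumes the entries of one directory have been reduced to their mode values; the `SKETCH_` macros and both functions are hypothetical names, with the octal constants chosen to match the mode Git uses for gitlinks (0160000).

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_IFMT      0170000  /* file-type bits of a mode */
#define SKETCH_IFGITLINK 0160000  /* mode Git uses for gitlinks */

static bool sketch_is_gitlink(unsigned int mode)
{
	return (mode & SKETCH_IFMT) == SKETCH_IFGITLINK;
}

/* True if any entry in the range is a gitlink, so it must not collapse. */
static bool sketch_range_has_gitlink(const unsigned int *modes, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (sketch_is_gitlink(modes[i]))
			return true;
	return false;
}
```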
2021-03-30 | sparse-index: convert from full to sparse | Derrick Stolee
If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate.

For now, we avoid converting the index to a sparse index if:

 1. the index is split.
 2. the index is already sparse.
 3. sparse-checkout is disabled.
 4. sparse-checkout does not use cone mode.

Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change.

The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper.

The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index.

Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware."

We can compare the behavior of the sparse-index in t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries.

The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
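The per-directory rule from this message (no unmerged entries, every file marked CE_SKIP_WORKTREE, otherwise keep looking deeper) can be sketched as a prefix scan. This is a simplified model under assumptions: `sketch_entry` and `sketch_can_collapse()` are hypothetical, the real code walks the cache-tree instead of comparing string prefixes, and the check that the directory lies outside the sparse cone is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced view of a cache entry. */
struct sketch_entry {
	const char *path;
	int stage;            /* nonzero means unmerged */
	bool skip_worktree;   /* models CE_SKIP_WORKTREE */
};

/*
 * Returns true if every entry under 'dir' (which must end in '/')
 * is merged and marked skip-worktree, so the directory could be
 * replaced by a single sparse directory entry.
 */
static bool sketch_can_collapse(const struct sketch_entry *entries,
				size_t nr, const char *dir)
{
	size_t dir_len = strlen(dir);

	for (size_t i = 0; i < nr; i++) {
		if (strncmp(entries[i].path, dir, dir_len))
			continue;   /* not under this directory */
		if (entries[i].stage != 0 || !entries[i].skip_worktree)
			return false;
	}
	return true;
}
```

When a directory fails the check, a caller can retry with each of its subdirectories, which mirrors the "keep looking deeper" behaviour.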
2021-03-30 | sparse-index: implement ensure_full_index() | Derrick Stolee
We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it.

Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository.

Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are:

 1. ce->ce_mode == 0040000
 2. ce->flags & CE_SKIP_WORKTREE is true
 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
 4. ce->oid references a tree object.

These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
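The listed properties of a sparse-directory entry translate into a simple predicate. The sketch below checks the first three; the fourth (the object ID referencing a tree) needs an object database and is not modelled. `sketch_cache_entry`, the `SKETCH_` macros and the bit chosen for the skip-worktree flag are hypothetical and do not reproduce Git's struct cache_entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SPARSE_DIR_MODE 0040000
#define SKETCH_SKIP_WORKTREE   (1u << 0)  /* hypothetical bit position */

/* Hypothetical, reduced view of a cache entry. */
struct sketch_cache_entry {
	unsigned int mode;
	unsigned int flags;
	const char *name;
	size_t namelen;
};

/* Checks properties 1 to 3 from the list; property 4 is not modelled. */
static bool sketch_looks_like_sparse_dir(const struct sketch_cache_entry *ce)
{
	return ce->mode == SKETCH_SPARSE_DIR_MODE &&
	       (ce->flags & SKETCH_SKIP_WORKTREE) &&
	       ce->namelen > 0 &&
	       ce->name[ce->namelen - 1] == '/';
}
```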
2021-03-30 | sparse-index: add guard to ensure full index | Derrick Stolee
Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded. This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time. Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes. If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
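The guard described here amounts to "expand right after reading, unless the command has opted out". A minimal sketch, assuming a reduced index state with just a sparse bit and a counter added for illustration; all names are hypothetical and the tree walk that a real expansion performs is only indicated by a comment.

```c
#include <stdbool.h>

/* Hypothetical, reduced index state. */
struct sketch_index_state {
	bool sparse_index;   /* models the sparse_index bit */
	int expansions;      /* counts how often expansion ran */
};

/* Expanding an index that is already full is a no-op. */
static void sketch_ensure_full_index(struct sketch_index_state *istate)
{
	if (!istate->sparse_index)
		return;
	/* A real implementation would walk trees here. */
	istate->sparse_index = false;
	istate->expansions++;
}

/* Models the step taken immediately after reading the index from disk. */
static void sketch_after_read(struct sketch_index_state *istate,
			      bool command_requires_full_index)
{
	if (command_requires_full_index)
		sketch_ensure_full_index(istate);
}
```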