summaryrefslogtreecommitdiff
path: root/unpack-trees.c
AgeCommit message (Collapse)Author
2021-04-30Merge branch 'mt/parallel-checkout-part-2'Junio C Hamano
The checkout machinery has been taught to perform the actual write-out of the files in parallel when able. * mt/parallel-checkout-part-2: parallel-checkout: add design documentation parallel-checkout: support progress displaying parallel-checkout: add configuration options parallel-checkout: make it truly parallel unpack-trees: add basic support for parallel checkout
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-19parallel-checkout: support progress displayingMatheus Tavares
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19parallel-checkout: add configuration optionsMatheus Tavares
Make parallel checkout configurable by introducing two new settings: checkout.workers and checkout.thresholdForParallelism. The first defines the number of workers (where one means sequential checkout), and the second defines the minimum number of entries to attempt parallel checkout. To decide the default value for checkout.workers, the parallel version was benchmarked during three operations in the linux repo, with cold cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The four tables below show the mean run times and standard deviations for 5 runs in: a local file system on SSD, a local file system on HDD, a Linux NFS server, and Amazon EFS (all on Linux). Each parallel checkout test was executed with the number of workers that brings the best overall results in that environment. Local SSD: Sequential 10 workers Speedup Clone 8.805 s ± 0.043 s 3.564 s ± 0.041 s 2.47 ± 0.03 Checkout I 9.678 s ± 0.057 s 4.486 s ± 0.050 s 2.16 ± 0.03 Checkout II 5.034 s ± 0.072 s 3.021 s ± 0.038 s 1.67 ± 0.03 Local HDD: Sequential 10 workers Speedup Clone 32.288 s ± 0.580 s 30.724 s ± 0.522 s 1.05 ± 0.03 Checkout I 54.172 s ± 7.119 s 54.429 s ± 6.738 s 1.00 ± 0.18 Checkout II 40.465 s ± 2.402 s 38.682 s ± 1.365 s 1.05 ± 0.07 Linux NFS server (v4.1, on EBS, single availability zone): Sequential 32 workers Speedup Clone 240.368 s ± 6.347 s 57.349 s ± 0.870 s 4.19 ± 0.13 Checkout I 242.862 s ± 2.215 s 58.700 s ± 0.904 s 4.14 ± 0.07 Checkout II 65.751 s ± 1.577 s 23.820 s ± 0.407 s 2.76 ± 0.08 EFS (v4.1, replicated over multiple availability zones): Sequential 32 workers Speedup Clone 922.321 s ± 2.274 s 210.453 s ± 3.412 s 4.38 ± 0.07 Checkout I 1011.300 s ± 7.346 s 297.828 s ± 0.964 s 3.40 ± 0.03 Checkout II 294.104 s ± 1.836 s 126.017 s ± 1.190 s 2.33 ± 0.03 The above benchmarks show that parallel checkout is most effective on repositories located on an SSD or over a distributed file system. For local file systems on spinning disks, and/or older machines, the parallelism does not always bring a good performance. For this reason, the default value for checkout.workers is one, a.k.a. sequential checkout. To decide the default value for checkout.thresholdForParallelism, another benchmark was executed in the "Local SSD" setup, where parallel checkout showed to be beneficial. This time, we compared the runtime of a `git checkout -f`, with and without parallelism, after randomly removing an increasing number of files from the Linux working tree. The "sequential fallback" column below corresponds to the executions where checkout.workers was 10 but checkout.thresholdForParallelism was equal to the number of to-be-updated files plus one (so that we end up writing sequentially). Each test case was sampled 15 times, and each sample had a randomly different set of files removed. Here are the results: sequential fallback 10 workers speedup 10 files 772.3 ms ± 12.6 ms 769.0 ms ± 13.6 ms 1.00 ± 0.02 20 files 780.5 ms ± 15.8 ms 775.2 ms ± 9.2 ms 1.01 ± 0.02 50 files 806.2 ms ± 13.8 ms 767.4 ms ± 8.5 ms 1.05 ± 0.02 100 files 833.7 ms ± 21.4 ms 750.5 ms ± 16.8 ms 1.11 ± 0.04 200 files 897.6 ms ± 30.9 ms 730.5 ms ± 14.7 ms 1.23 ± 0.05 500 files 1035.4 ms ± 48.0 ms 677.1 ms ± 22.3 ms 1.53 ± 0.09 1000 files 1244.6 ms ± 35.6 ms 654.0 ms ± 38.3 ms 1.90 ± 0.12 2000 files 1488.8 ms ± 53.4 ms 658.8 ms ± 23.8 ms 2.26 ± 0.12 From the above numbers, 100 files seems to be a reasonable default value for the threshold setting. Note: Up to 1000 files, we observe a drop in the execution time of the parallel code with an increase in the number of files. This is a rather odd behavior, but it was observed in multiple repetitions. Above 1000 files, the execution time increases according to the number of files, as one would expect. About the test environments: Local SSD tests were executed on an i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing, the linux repository was removed (or checked out back to its previous state), and `sync && sysctl vm.drop_caches=3` was executed. Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19unpack-trees: add basic support for parallel checkoutMatheus Tavares
This new interface allows us to enqueue some of the entries being checked out to later uncompress them, apply in-process filters, and write out the files in parallel. For now, the parallel checkout machinery is enabled by default and there is no user configuration, but run_parallel_checkout() just writes the queued entries in sequence (without spawning additional workers). The next patch will actually implement the parallelism and, later, we will make it configurable. Note that, to avoid potential data races, not all entries are eligible for parallel checkout. Also, paths that collide on disk (e.g. case-sensitive paths in case-insensitive file systems), are detected by the parallel checkout code and skipped, so that they can be safely sequentially handled later. The collision detection works like the following: - If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework detects it by looking for EEXIST and EISDIR errors after an open(O_CREAT | O_EXCL) failure. - If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected at the has_dirs_only_path() check, which is done for the leading path of each item in the parallel checkout queue. Both verifications rely on the fact that, before enqueueing an entry for parallel checkout, checkout_entry() makes sure that there is no file at the entry's path and that its leading components are all real directories. So, any later change in these conditions indicates that there was a collision (either between two parallel-eligible entries or between an eligible and an ineligible one). After all parallel-eligible entries have been processed, the collided (and thus, skipped) entries are sequentially fed to checkout_entry() again. This is similar to the way the current code deals with collisions, overwriting the previously checked out entries with the subsequent ones. The only difference is that, since we no longer create the files in the same order that they appear on index, we are not able to determine which of the colliding entries will survive on disk (for the classic code, it is always the last entry). Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-02Merge branch 'mt/parallel-checkout-part-1'Junio C Hamano
Preparatory API changes for parallel checkout. * mt/parallel-checkout-part-1: entry: add checkout_entry_ca() taking preloaded conv_attrs entry: move conv_attrs lookup up to checkout_entry() entry: extract update_ce_after_write() from write_entry() entry: make fstat_output() and read_blob_entry() public entry: extract a header file for entry.c functions convert: add classification for conv_attrs struct convert: add get_stream_filter_ca() variant convert: add [async_]convert_to_working_tree_ca() variants convert: make convert_attrs() and convert structs public
2021-03-30Merge branch 'mt/checkout-remove-nofollow'Junio C Hamano
When "git checkout" removes a path that does not exist in the commit it is checking out, it wasn't careful enough not to follow symbolic links, which has been corrected. * mt/checkout-remove-nofollow: checkout: don't follow symlinks when removing entries symlinks: update comment on threaded_check_leading_path()
2021-03-30unpack-trees: allow sparse directoriesDerrick Stolee
The index_pos_by_traverse_info() currently throws a BUG() when a directory entry exists exactly in the index. We need to consider that it is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can be an exact match, it is no longer an error to have a nonnegative 'pos' value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30unpack-trees: ensure full indexDerrick Stolee
The next change will translate full indexes into sparse indexes at write time. The existing logic provides a way for every sparse index to be expanded to a full index at read time. However, there are cases where an index is written and then continues to be used in-memory to perform further updates. unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' do this double-update of the index. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will later relax that after unpack_trees() is updated to handle sparse directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-23entry: extract a header file for entry.c functionsMatheus Tavares
The declarations of entry.c's public functions and structures currently reside in cache.h. Although not many, they contribute to the size of cache.h and, when changed, cause the unnecessary recompilation of modules that don't really use these functions. So let's move them to a new entry.h header. While at it let's also move a comment related to checkout_entry() from entry.c to entry.h as it's more useful to describe the function there. Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-22Merge branch 'dl/stash-show-untracked'Junio C Hamano
"git stash show" learned to optionally show untracked part of the stash. * dl/stash-show-untracked: stash show: learn stash.showIncludeUntracked stash show: teach --include-untracked and --only-untracked
2021-03-19Merge branch 'js/fsmonitor-unpack-fix'Junio C Hamano
The data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc. * js/fsmonitor-unpack-fix: fsmonitor: do not forget to release the token in `discard_index()` fsmonitor: fix memory corruption in some corner cases
2021-03-18checkout: don't follow symlinks when removing entriesMatheus Tavares
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20), symlink.c:check_leading_path() started returning different codes for FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was not adjusted for this change, so it started to follow symlinks on the leading path of to-be-removed entries. Fix that and add a regression test. Note that since 1d718a5108 check_leading_path() no longer differentiates the case where it found a symlink in the path's leading components from the cases where it found a regular file or failed to lstat() the component. So, a side effect of this current patch is that unlink_entry() now returns early in all of these three cases. And because we no longer try to unlink such paths, we also don't get the warning from remove_or_warn(). For the regular file and symlink cases, it's questionable whether the warning was useful in the first place: unlink_entry() removes tracked paths that should no longer be present in the state we are checking out to. If the path had its leading dir replaced by another file, it means that the basename already doesn't exist, so there is no need for a warning. Sure, we are leaving a regular file or symlink behind at the path's dirname, but this file is either untracked now (so again, no need to warn), or it will be replaced by a tracked file during the next phase of this checkout operation. As for failing to lstat() one of the leading components, the basename might still exist only we cannot unlink it (e.g. due to the lack of the required permissions). Since the user expect it to be removed (especially with checkout's --no-overlay option), add back the warning in this more relevant case. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-17fsmonitor: fix memory corruption in some corner casesJohannes Schindelin
In 56c6910028a (fsmonitor: change last update timestamp on the index_state to opaque token, 2020-01-07), we forgot to adjust the part of `unpack_trees()` that copies the FSMonitor "last-update" information that we copy from the source index to the result index since 679f2f9fdd2 (unpack-trees: skip stat on fsmonitor-valid files, 2019-11-20). Since the "last-update" information is no longer a 64-bit number, but a free-form string that has been allocated, we need to duplicate it rather than just copying it. This is important because there _are_ cases when `unpack_trees()` will perform a oneway merge that implicitly calls `refresh_fsmonitor()` (which will allocate that "last-update" token). This happens _after_ that token was copied into the result index. However, we _then_ call `check_updates()` on that index, which will _also_ call `refresh_fsmonitor()`, accessing the "last-update" string, which by now would be released already. In the instance that lead to this patch, this caused a segmentation fault during a lengthy, complicated rebase involving the todo command `reset` that (crucially) had to updated many files. Unfortunately, it seems very hard to trigger that crash, therefore this patch is not accompanied by a regression test. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-09Sync with Git 2.30.2 for CVE-2021-21300Junio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05stash show: teach --include-untracked and --only-untrackedDenton Liu
Stash entries can be made with untracked files via `git stash push --include-untracked`. However, because the untracked files are stored in the third parent of the stash entry and not the stash entry itself, running `git stash show` does not include the untracked files as part of the diff. With --include-untracked, untracked paths, which are recorded in the third-parent if it exists, are shown in addition to the paths that have modifications between the stash base and the working tree in the stash. It is possible to manually craft a malformed stash entry where duplicate untracked files in the stash entry will mask tracked files. We detect and error out in that case via a custom unpack_trees() callback: stash_worktree_untracked_merge(). Also, teach stash the --only-untracked option which only shows the untracked files of a stash entry. This is similar to `git show stash^3` but it is nice to provide a convenient abstraction for it so that users do not have to think about the underlying implementation. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-12Sync with 2.28.1Johannes Schindelin
* maint-2.28: Git 2.28.1 Git 2.27.1 Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.26.3Johannes Schindelin
* maint-2.26: Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.25.5Johannes Schindelin
* maint-2.25: Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.24.4Johannes Schindelin
* maint-2.24: Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.23.4Johannes Schindelin
* maint-2.23: Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.22.5Johannes Schindelin
* maint-2.22: Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.21.4Johannes Schindelin
* maint-2.21: Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.20.5Johannes Schindelin
* maint-2.20: Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.19.6Johannes Schindelin
* maint-2.19: Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.18.5Johannes Schindelin
* maint-2.18: Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.17.6Johannes Schindelin
* maint-2.17: Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12unpack_trees(): start with a fresh lstat cacheMatheus Tavares
We really want to avoid relying on stale information. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2021-01-24sparse-checkout: load sparse-checkout patternsDerrick Stolee
A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24cache-tree: clean up cache_tree_update()Derrick Stolee
Make the method safer by allocating a cache_tree member for the given index_state if it is not already present. This is preferrable to a BUG() statement or returning with an error because future callers will want to populate an empty cache-tree using this method. Callers can also remove their conditional allocations of cache_tree. Also drop local variables that can be found directly from the 'istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04unpack-trees: add trace2 regionsDerrick Stolee
The unpack_trees() method is quite complicated and its performance can change dramatically depending on how it is used. We already have some performance tracing regions, but they have not been updated to the trace2 API. Do so now. We already have trace2 regions in unpack_trees.c:clear_ce_flags(), which uses a linear scan through the index without recursing into trees. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert remaining callers away from argv_array nameJeff King
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the remaining files, as the resulting diff is reasonably sized. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: rename files from argv-array to strvecJeff King
This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '*.c' '*.h' | xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-20Merge branch 'en/sparse-checkout'Junio C Hamano
Consistency fix to a topic already in 'master'. * en/sparse-checkout: unpack-trees: also allow get_progress() to work on a different index
2020-05-15unpack-trees: also allow get_progress() to work on a different indexElijah Newren
commit b0a5a12a60 ("unpack-trees: allow check_updates() to work on a different index", 2020-03-27) allowed check_updates() to work on a different index, but it called get_progress() which was hardcoded to work on o->result much like check_updates() had been. Update it to also accept an index parameter and have check_updates() pass that parameter along so that both are working on the same index. Noticed-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13Merge branch 'ds/sparse-updates-oob-access-fix'Junio C Hamano
The code to skip unmerged paths in the index when sparse checkout is in use would have made out-of-bound access of the in-core index when the last path was unmerged, which has been corrected. * ds/sparse-updates-oob-access-fix: unpack-trees: avoid array out-of-bounds error
2020-05-08Merge branch 'ds/sparse-allow-empty-working-tree'Junio C Hamano
The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time. This limitation has been lifted. * ds/sparse-allow-empty-working-tree: sparse-checkout: stop blocking empty workdirs
2020-05-08unpack-trees: avoid array out-of-bounds errorDerrick Stolee
The loop in warn_conflicted_path() that checks for the count of entries with the same path uses "i+count" for the array entry. However, the loop only verifies that the value of count is below the array size. Fix this by adding i to the condition. I hit this condition during a test of the in-tree sparse-checkout feature, so it is exercised by the end of the series. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> [jc: readability fix] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04sparse-checkout: stop blocking empty workdirsDerrick Stolee
Remove the error condition when updating the sparse-checkout leaves an empty working directory. This behavior was added in 9e1afb167 (sparse checkout: inhibit empty worktree, 2009-08-20). The comment was added in a7bc906f2 (Add explanation why we do not allow to sparse checkout to empty working tree, 2011-09-22) in response to a "dubious" comment in 84563a624 (unpack-trees.c: cosmetic fix, 2010-12-22). With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of /* !/*/ which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail. Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory. If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode. Reported-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29Merge branch 'en/sparse-checkout'Junio C Hamano
"sparse-checkout" UI improvements. * en/sparse-checkout: sparse-checkout: provide a new reapply subcommand unpack-trees: failure to set SKIP_WORKTREE bits always just a warning unpack-trees: provide warnings on sparse updates for unmerged paths too unpack-trees: make sparse path messages sound like warnings unpack-trees: split display_error_msgs() into two unpack-trees: rename ERROR_* fields meant for warnings to WARNING_* unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlier sparse-checkout: use improved unpack_trees porcelain messages sparse-checkout: use new update_sparsity() function unpack-trees: add a new update_sparsity() function unpack-trees: pull sparse-checkout pattern reading into a new function unpack-trees: do not mark a dirty path with SKIP_WORKTREE unpack-trees: allow check_updates() to work on a different index t1091: make some tests a little more defensive against failures unpack-trees: simplify pattern_list freeing unpack-trees: simplify verify_absent_sparse() unpack-trees: remove unused error type unpack-trees: fix minor typo in comment
2020-04-28Merge branch 'jt/avoid-prefetch-when-able-in-diff'Junio C Hamano
"git diff" in a partial clone learned to avoid lazy loading blob objects in more casese when they are not needed. * jt/avoid-prefetch-when-able-in-diff: diff: restrict when prefetching occurs diff: refactor object read diff: make diff_populate_filespec_options struct promisor-remote: accept 0 as oid_nr in function
2020-04-02promisor-remote: accept 0 as oid_nr in functionJonathan Tan
There are 3 callers to promisor_remote_get_direct() that first check if the number of objects to be fetched is equal to 0. Fold that check into promisor_remote_get_direct(), and in doing so, be explicit as to what promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success, immediately). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: failure to set SKIP_WORKTREE bits always just a warningElijah Newren
Setting and clearing of the SKIP_WORKTREE bit is not only done when users run 'sparse-checkout'; other commands such as 'checkout' also run through unpack_trees() which has logic for handling this special bit. As such, we need to consider how they handle special cases. A couple comparison points should help explain the rationale for changing how unpack_trees() handles these bits: Ignoring sparse checkouts for a moment, if you are switching branches and have dirty changes, it is only considered an error that will prevent the branch switching from being successful if the dirty file happens to be one of the paths with different contents. SKIP_WORKTREE has always been considered advisory; for example, if rebase or merge need or even want to materialize a path as part of their work, they have always been allowed to do so regardless of the SKIP_WORKTREE setting. This has been used for unmerged paths, but it was often used for paths it wasn't needed just because it made the code simpler. It was a best-effort consideration, and when it materialized paths contrary to the SKIP_WORKTREE setting, it was never required to even print a warning message. In the past if you trying to run e.g. 'git checkout' and: 1) you had a path that was materialized and had some dirty changes 2) the path was listed in $GITDIR/info/sparse-checkout 3) this path did not different between the current and target branches then despite the comparison points above, the inability to set SKIP_WORKTREE was treated as a *hard* error that would abort the checkout operation. This is completely inconsistent with how SKIP_WORKTREE is handled elsewhere, and rather annoying for users as leaving the paths materialized in the working copy (with a simple warning) should present no problem at all. Downgrade any errors from inability to toggle the SKIP_WORKTREE bit to a warning and allow the operations to continue. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: provide warnings on sparse updates for unmerged paths tooElijah Newren
When sparse-checkout runs to update the list of sparsity patterns, it gives warnings if it can't remove paths from the working tree because those files have dirty changes. Add a similar warning for unmerged paths as well. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: make sparse path messages sound like warningsElijah Newren
The messages for problems with sparse paths are phrased as errors that cause the operation to abort, even though we are not making the operation abort. Reword the messages to make sense in their new context. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: split display_error_msgs() into twoElijah Newren
display_error_msgs() is never called to show messages of both ERROR_* and WARNING_* types at the same time; it is instead called multiple times, separately for each type. Since we want to display these types differently, make two slightly different versions of this function. A subsequent commit will further modify unpack_trees() and how it calls the new display_warning_msgs(). Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: rename ERROR_* fields meant for warnings to WARNING_*Elijah Newren
We want to treat issues with setting the SKIP_WORKTREE bit as a warning rather than an error; rename the enum values to reflect this intent as a simple step towards that goal. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlierElijah Newren
A minor change, but we want to convert the sparse messages to warnings and this allows us to group warnings and errors. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: add a new update_sparsity() functionElijah Newren
Previously, the only way to update the SKIP_WORKTREE bits for various paths was invoking `git read-tree -mu HEAD` or calling the same code that this codepath invoked. This however had a number of problems if the index or working directory were not clean. First, let's consider the case: Flipping SKIP_WORKTREE -> !SKIP_WORKTREE (materializing files) If the working tree was clean this was fine, but if there were files or directories or symlinks or whatever already present at the given path then the operation would abort with an error. Let's label this case for later discussion: A) There is an untracked path in the way Now let's consider the opposite case: Flipping !SKIP_WORKTREE -> SKIP_WORKTREE (removing files) If the index and working tree was clean this was fine, but if there were any unclean paths we would run into problems. There are three different cases to consider: B) The path is unmerged C) The path has unstaged changes D) The path has staged changes (differs from HEAD) If any path fell into case B or C, then the whole operation would be aborted with an error. With sparse-checkout, the whole operation would be aborted for case D as well, but for its predecessor of using `git read-tree -mu HEAD` directly, any paths that fell into case D would be removed from the working copy and the index entry for that path would be reset to match HEAD -- which looks and feels like data loss to users (only a few are even aware to ask whether it can be recovered, and even then it requires walking through loose objects trying to match up the right ones). Refusing to remove files that have unsaved user changes is good, but refusing to work on any other paths is very problematic for users. If the user is in the middle of a rebase or has made modifications to files that bring in more dependencies, then for their build to work they need to update the sparse paths. This logic has been preventing them from doing so. Sometimes in response, the user will stage the files and re-try, to no avail with sparse-checkout or to the horror of losing their changes if they are using its predecessor of `git read-tree -mu HEAD`. Add a new update_sparsity() function which will not error out in any of these cases but behaves as follows for the special cases: A) Leave the file in the working copy alone, clear the SKIP_WORKTREE bit, and print a warning (thus leaving the path in a state where status will report the file as modified, which seems logical). B) Do NOT mark this path as SKIP_WORKTREE, and leave it as unmerged. C) Do NOT mark this path as SKIP_WORKTREE and print a warning about the dirty path. D) Mark the path as SKIP_WORKTREE, but do not revert the version stored in the index to match HEAD; leave the contents alone. I tried a different behavior for A (leave the SKIP_WORKTREE bit set), but found it very surprising and counter-intuitive (e.g. the user sees it is present along with all the other files in that directory, tries to stage it, but git add ignores it since the SKIP_WORKTREE bit is set). A & C seem like optimal behavior to me. B may be as well, though I wonder if printing a warning would be an improvement. Some might be slightly surprised by D at first, but given that it does the right thing with `git commit` and even `git commit -a` (`git add` ignores entries that are marked SKIP_WORKTREE and thus doesn't delete them, and `commit -a` is similar), it seems logical to me. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: pull sparse-checkout pattern reading into a new functionElijah Newren
Create a populate_from_existing_patterns() function for reading the path_patterns from $GIT_DIR/info/sparse-checkout so that we can re-use it elsewhere. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>