summaryrefslogtreecommitdiff
path: root/read-cache.c
AgeCommit message (Collapse)Author
2021-03-19Merge branch 'rs/calloc-array'Junio C Hamano
CALLOC_ARRAY() macro replaces many uses of xcalloc(). * rs/calloc-array: cocci: allow xcalloc(1, size) use CALLOC_ARRAY git-compat-util.h: drop trailing semicolon from macro definition
2021-03-19Merge branch 'js/fsmonitor-unpack-fix'Junio C Hamano
The data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc. * js/fsmonitor-unpack-fix: fsmonitor: do not forget to release the token in `discard_index()` fsmonitor: fix memory corruption in some corner cases
2021-03-17fsmonitor: do not forget to release the token in `discard_index()`Johannes Schindelin
In 56c6910028a (fsmonitor: change last update timestamp on the index_state to opaque token, 2020-01-07), we forgot to adjust `discard_index()` to release the "last-update" token: it is no longer a 64-bit number, but a free-form string that has been allocated. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-14use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-01Merge branch 'ns/raise-write-index-buffer-size'Junio C Hamano
Raise the buffer size used when writing the index file out from (obviously too small) 8kB to (clearly sufficiently large) 128kB. * ns/raise-write-index-buffer-size: read-cache: make the index write buffer size 128K
2021-03-01Merge branch 'jh/fsmonitor-prework'Junio C Hamano
Preliminary changes to fsmonitor integration. * jh/fsmonitor-prework: fsmonitor: refactor initialization of fsmonitor_last_update token fsmonitor: allow all entries for a folder to be invalidated fsmonitor: log FSMN token when reading and writing the index fsmonitor: log invocation of FSMonitor hook to trace2 read-cache: log the number of scanned files to trace2 read-cache: log the number of lstat calls to trace2 preload-index: log the number of lstat calls to trace2 p7519: add trace logging during perf test p7519: move watchman cleanup earlier in the test p7519: fix watchman watch-list test on Windows p7519: do not rely on "xargs -d" in test
2021-02-24read-cache: make the index write buffer size 128KNeeraj Singh
Writing an index 8K at a time invokes the OS filesystem and caching code very frequently, introducing noticeable overhead while writing large indexes. When experimenting with different write buffer sizes on Windows writing the Windows OS repo index (260MB), most of the benefit came by bumping the index write buffer size to 64K. I picked 128K to ensure that we're past the knee of the curve. With this change, the time under do_write_index for an index with 3M files goes from ~1.02s to ~0.72s. Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17read-cache: log the number of scanned files to trace2Jeff Hostetler
Report the number of files in the working directory that were read and their hashes verified in `refresh_index()`. FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17read-cache: log the number of lstat calls to trace2Jeff Hostetler
Report the total number of calls made to lstat() inside of refresh_index(). FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. This can be seen in `refresh_index()`. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-06read-cache: try not to peek into `struct {lock_,temp}file`Martin Ågren
Similar to the previous commits, try to avoid peeking into the `struct lock_file`. We also have some `struct tempfile`s -- let's avoid looking into those as well. Note that `do_write_index()` takes a tempfile and that when we call it, we either have a tempfile which we can easily hand down, or we have a lock file, from which we need to somehow obtain the internal tempfile. So we need to leave that one instance of peeking-into. Nevertheless, this commit leaves us not relying on exactly how the path of the tempfile / lock file is stored internally. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06read-cache: fix mem-pool allocation for multi-threaded index loadingRené Scharfe
44c7e1a7e0 (mem-pool: use more standard initialization and finalization, 2020-08-15) moved the allocation of the mem-pool structure to callers. It also added an allocation to load_cache_entries_threaded(), but for an unrelated mem-pool. Fix that by allocating the correct one instead -- the one that is initialized two lines later. Reported-by: Sandor Bodo-Merle <sbodomerle@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18mem-pool: use more standard initialization and finalizationElijah Newren
A typical memory type, such as strbuf, hashmap, or string_list can be stored on the stack or embedded within another structure. mem_pool cannot be, because of how mem_pool_init() and mem_pool_discard() are written. mem_pool_init() does essentially the following (simplified for purposes of explanation here): void mem_pool_init(struct mem_pool **pool...) { *pool = xcalloc(1, sizeof(*pool)); It seems weird to require that mem_pools can only be accessed through a pointer. It also seems slightly dangerous: unlike strbuf_release() or strbuf_reset() or string_list_clear(), all of which put the data structure into a state where it can be re-used after the call, mem_pool_discard(pool) will leave pool pointing at free'd memory. read-cache (and split-index) are the only current users of mem_pools, and they haven't fallen into a use-after-free mistake here, but it seems likely to be problematic for future users especially since several of the current callers of mem_pool_init() will only call it when the mem_pool* is not already allocated (i.e. is NULL). This type of mechanism also prevents finding synchronization points where one can free existing memory and then resume more operations. It would be natural at such points to run something like mem_pool_discard(pool...); and, if necessary, mem_pool_init(&pool...); and then carry on continuing to use the pool. However, this fails badly if several objects had a copy of the value of pool from before these commands; in such a case, those objects won't get the updated value of pool that mem_pool_init() overwrites pool with and they'll all instead be reading and writing from free'd memory. Modify mem_pool_init()/mem_pool_discard() to behave more like strbuf_init()/strbuf_release() or string_list_init()/string_list_clear() In particular: (1) make mem_pool_init() just take a mem_pool* and have it only worry about allocating struct mp_blocks, not the struct mem_pool itself, (2) make mem_pool_discard() free the memory that the pool was responsible for, but leave it in a state where it can be used to allocate more memory afterward (without the need to call mem_pool_init() again). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16read-cache: remove bogus shortcutRené Scharfe
has_dir_name() has some optimizations for the case where entries are added to an index in the correct order. They kick in if the new entry sorts after the last one. One of them exits early if the last entry has a longer name than the directory of the new entry. Here's its comment: /* * The directory prefix lines up with part of * a longer file or directory name, but sorts * after it, so this sub-directory cannot * collide with a file. * * last: xxx/yy-file (because '-' sorts before '/') * this: xxx/yy/abc */ However, a file named xxx/yy would be sorted before xxx/yy-file because '-' sorts after NUL, so the length check against the last entry is not sufficient to rule out a collision. Remove it. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-10Merge branch 'js/mingw-loosen-overstrict-tree-entry-checks'Junio C Hamano
Further tweak to a "no backslash in indexed paths" for Windows port we applied earlier. * js/mingw-loosen-overstrict-tree-entry-checks: mingw: safeguard better against backslashes in file names
2020-01-10mingw: safeguard better against backslashes in file namesJohannes Schindelin via GitGitGadget
In 224c7d70fa1 (mingw: only test index entries for backslashes, not tree entries, 2019-12-31), we relaxed the check for backslashes in tree entries to check only index entries. However, the code change was incorrect: it was added to `add_index_entry_with_check()`, not to `add_index_entry()`, so under certain circumstances it was possible to side-step the protection. Besides, the description of that commit purported that all index entries would be checked when in fact they were only checked when being added to the index (there are code paths that do not do that, constructing "transient" index entries). In any case, it was pointed out in one insightful review at https://github.com/git-for-windows/git/pull/2437#issuecomment-566771835 that it would be a much better idea to teach `verify_path()` to perform the check for a backslash. This is safer, even if it comes with two notable drawbacks: - `verify_path()` cannot say _what_ is wrong with the path, therefore the user will no longer be told that there was a backslash in the path, only that the path was invalid. - The `git apply` command also calls the `verify_path()` function, and might have been able to handle Windows-style paths (i.e. with backslashes instead of forward slashes). This will no longer be possible unless the user (temporarily) sets `core.protectNTFS=false`. Note that `git add <windows-path>` will _still_ work because `normalize_path_copy_len()` will convert the backslashes to forward slashes before hitting the code path that creates an index entry. The clear advantage is that `verify_path()`'s purpose is to check the validity of the file name, therefore we naturally tap into all the code paths that need safeguarding, also implicitly into future code paths. The benefits of that approach outweigh the downsides, so let's move the check from `add_index_entry_with_check()` to `verify_path()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-06Merge branch 'js/mingw-loosen-overstrict-tree-entry-checks'Junio C Hamano
An earlier update to Git for Windows declared that a tree object is invalid if it has a path component with backslash in it, which was overly strict, which has been corrected. The only protection the Windows users need is to prevent such path (or any path that their filesystem cannot check out) from entering the index. * js/mingw-loosen-overstrict-tree-entry-checks: mingw: only test index entries for backslashes, not tree entries
2020-01-02mingw: only test index entries for backslashes, not tree entriesJohannes Schindelin
During a clone of a repository that contained a file with a backslash in its name in the past, as of v2.24.1(2), Git for Windows prints errors like this: error: filename in tree entry contains backslash: '\' The idea is to prevent Git from even trying to write files with backslashes in their file names: while these characters are valid in file names on other platforms, on Windows it is interpreted as directory separator (which would obviously lead to ambiguities, e.g. when there is a file `a\b` and there is also a file `a/b`). Arguably, this is the wrong layer for that error: As long as the user never checks out the files whose names contain backslashes, there should not be any problem in the first place. So let's loosen the requirements: we now leave tree entries with backslashes in their file names alone, but we do require any entries that are added to the Git index to contain no backslashes on Windows. Note: just as before, the check is guarded by `core.protectNTFS` (to allow overriding the check by toggling that config setting), and it is _only_ performed on Windows, as the backslash is not a directory separator elsewhere, even when writing to NTFS-formatted volumes. An alternative approach would be to try to prevent creating files with backslashes in their file names. However, that comes with its own set of problems. For example, `git config -f C:\ProgramData\Git\config ...` is a very valid way to specify a custom config location, and we obviously do _not_ want to prevent that. Therefore, the approach chosen in this patch would appear to be better. This addresses https://github.com/git-for-windows/git/issues/2435 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-10Sync with Git 2.24.1Junio C Hamano
2019-12-06Sync with 2.23.1Johannes Schindelin
* maint-2.23: (44 commits) Git 2.23.1 Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters ...
2019-12-06Sync with 2.22.2Johannes Schindelin
* maint-2.22: (43 commits) Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors ...
2019-12-06Sync with 2.21.1Johannes Schindelin
* maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...
2019-12-06Sync with 2.20.2Johannes Schindelin
* maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...
2019-12-06Sync with 2.19.3Johannes Schindelin
* maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...
2019-12-06Sync with 2.18.2Johannes Schindelin
* maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...
2019-12-06Sync with 2.17.3Johannes Schindelin
* maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...
2019-12-06Sync with 2.16.6Johannes Schindelin
* maint-2.16: (31 commits) Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses ...
2019-12-06Sync with 2.15.4Johannes Schindelin
* maint-2.15: (29 commits) Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment ...
2019-12-06Sync with 2.14.6Johannes Schindelin
* maint-2.14: (28 commits) Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment test-path-utils: offer to run a protectNTFS/protectHFS benchmark ...
2019-12-05mingw: refuse to access paths with trailing spaces or periodsJohannes Schindelin
When creating a directory on Windows whose path ends in a space or a period (or chains thereof), the Win32 API "helpfully" trims those. For example, `mkdir("abc ");` will return success, but actually create a directory called `abc` instead. This stems back to the DOS days, when all file names had exactly 8 characters plus exactly 3 characters for the file extension, and the only way to have shorter names was by padding with spaces. Sadly, this "helpful" behavior is a bit inconsistent: after a successful `mkdir("abc ");`, a `mkdir("abc /def")` will actually _fail_ (because the directory `abc ` does not actually exist). Even if it would work, we now have a serious problem because a Git repository could contain directories `abc` and `abc `, and on Windows, they would be "merged" unintentionally. As these paths are illegal on Windows, anyway, let's disallow any accesses to such paths on that Operating System. For practical reasons, this behavior is still guarded by the config setting `core.protectNTFS`: it is possible (and at least two regression tests make use of it) to create commits without involving the worktree. In such a scenario, it is of course possible -- even on Windows -- to create such file names. Among other consequences, this patch disallows submodules' paths to end in spaces on Windows (which would formerly have confused Git enough to try to write into incorrect paths, anyway). While this patch does not fix a vulnerability on its own, it prevents an attack vector that was exploited in demonstrations of a number of recently-fixed security bugs. The regression test added to `t/t7417-submodule-path-url.sh` reflects that attack vector. Note that we have to adjust the test case "prevent git~1 squatting on Windows" in `t/t7415-submodule-names.sh` because of a very subtle issue. It tries to clone two submodules whose names differ only in a trailing period character, and as a consequence their git directories differ in the same way. Previously, when Git tried to clone the second submodule, it thought that the git directory already existed (because on Windows, when you create a directory with the name `b.` it actually creates `b`), but with this patch, the first submodule's clone will fail because of the illegal name of the git directory. Therefore, when cloning the second submodule, Git will take a different code path: a fresh clone (without an existing git directory). Both code paths fail to clone the second submodule, both because the the corresponding worktree directory exists and is not empty, but the error messages are worded differently. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05is_ntfs_dotgit(): only verify the leading segmentJohannes Schindelin
The config setting `core.protectNTFS` is specifically designed to work not only on Windows, but anywhere, to allow for repositories hosted on, say, Linux servers to be protected against NTFS-specific attack vectors. As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated paths (but does not do the same for paths separated by forward slashes), under the assumption that the backslash might not be a valid directory separator on the _current_ Operating System. However, the two callers, `verify_path()` and `fsck_tree()`, are supposed to feed only individual path segments to the `is_ntfs_dotgit()` function. This causes a lot of duplicate scanning (and very inefficient scanning, too, as the inner loop of `is_ntfs_dotgit()` was optimized for readability rather than for speed. Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of splitting the paths by backslashes as directory separators on the callers of said function. Consequently, the `verify_path()` function, which already splits the path by directory separators, now treats backslashes as directory separators _explicitly_ when `core.protectNTFS` is turned on, even on platforms where the backslash is _not_ a directory separator. Note that we have to repeat some code in `verify_path()`: if the backslash is not a directory separator on the current Operating System, we want to allow file names like `\`, but we _do_ want to disallow paths that are clearly intended to cause harm when the repository is cloned on Windows. The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now needs to look for backslashes in tree entries' names specifically when `core.protectNTFS` is turned on. While it would be tempting to completely disallow backslashes in that case (much like `fsck` reports names containing forward slashes as "full paths"), this would be overzealous: when `core.protectNTFS` is turned on in a non-Windows setup, backslashes are perfectly valid characters in file names while we _still_ want to disallow tree entries that are clearly designed to exploit NTFS-specific behavior. This simplification will make subsequent changes easier to implement, such as turning `core.protectNTFS` on by default (not only on Windows) or protecting against attack vectors involving NTFS Alternate Data Streams. Incidentally, this change allows for catching malicious repositories that contain tree entries of the form `dir\.gitmodules` already on the server side rather than only on the client side (and previously only on Windows): in contrast to `is_ntfs_dotgit()`, the `is_ntfs_dotgitmodules()` function already expects the caller to split the paths by directory separators. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-11-10Fix spelling errors in code commentsElijah Newren
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-15Merge branch 'js/azure-pipelines-msvc'Junio C Hamano
CI updates. * js/azure-pipelines-msvc: ci: also build and test with MS Visual Studio on Azure Pipelines ci: really use shallow clones on Azure Pipelines tests: let --immediate and --write-junit-xml play well together test-tool run-command: learn to run (parts of) the testsuite vcxproj: include more generated files vcxproj: only copy `git-remote-http.exe` once it was built msvc: work around a bug in GetEnvironmentVariable() msvc: handle DEVELOPER=1 msvc: ignore some libraries when linking compat/win32/path-utils.h: add #include guards winansi: use FLEX_ARRAY to avoid compiler warning msvc: avoid using minus operator on unsigned types push: do not pretend to return `int` from `die_push_simple()`
2019-10-07Merge branch 'tg/stash-refresh-index'Junio C Hamano
"git stash" learned to write refreshed index back to disk. * tg/stash-refresh-index: stash: make sure to write refreshed cache merge: use refresh_and_write_cache factor out refresh_and_write_cache function
2019-10-06msvc: avoid using minus operator on unsigned typesJohannes Schindelin
MSVC complains about this with `-Wall`, which can be taken as a sign that this is indeed a real bug. The symptom is: C4146: unary minus operator applied to unsigned type, result still unsigned Let's avoid this warning in the minimal way, e.g. writing `-1 - <unsigned value>` instead of `-<unsigned value> - 1`. Note that the change in the `estimate_cache_size()` function is needed because MSVC considers the "return type" of the `sizeof()` operator to be `size_t`, i.e. unsigned, and therefore it cannot be negated using the unary minus operator. Even worse, that arithmetic is doing extra work, in vain. We want to calculate the entry extra cache size as the difference between the size of the `cache_entry` structure minus the size of the `ondisk_cache_entry` structure, padded to the appropriate alignment boundary. To that end, we start by assigning that difference to the `per_entry` variable, and then abuse the `len` parameter of the `align_padding_size()` macro to take the negative size of the ondisk entry size. Essentially, we try to avoid passing the already calculated difference to that macro by passing the operands of that difference instead, when the macro expects operands of an addition: #define align_padding_size(size, len) \ ((size + (len) + 8) & ~7) - (size + len) Currently, we pass A and -B to that macro instead of passing A - B and 0, where A - B is already stored in the `per_entry` variable, ready to be used. This is neither necessary, nor intuitive. Let's fix this, and have code that is both easier to read and that also does not trigger MSVC's warning. While at it, we take care of reporting overflows (which are unlikely, but hey, defensive programming is good!). We _also_ take pains of casting the unsigned value to signed: otherwise, the signed operand (i.e. the `-1`) would be cast to unsigned before doing the arithmetic. Helped-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-20factor out refresh_and_write_cache functionThomas Gummerer
Getting the lock for the index, refreshing it and then writing it is a pattern that happens more than once throughout the codebase, and isn't trivial to get right. Factor out the refresh_and_write_cache function from builtin/am.c to read-cache.c, so it can be re-used in other places in a subsequent commit. Note that we return different error codes for failing to refresh the cache, and failing to write the index. The current caller only cares about failing to write the index. However for other callers we're going to convert in subsequent patches we will need this distinction. Helped-by: Martin Ågren <martin.agren@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-09Merge branch 'ds/feature-macros'Junio C Hamano
A mechanism to affect the default setting for a (related) group of configuration variables is introduced. * ds/feature-macros: repo-settings: create feature.experimental setting repo-settings: create feature.manyFiles setting repo-settings: parse core.untrackedCache commit-graph: turn on commit-graph by default t6501: use 'git gc' in quiet mode repo-settings: consolidate some config settings
2019-08-13repo-settings: parse core.untrackedCacheDerrick Stolee
The core.untrackedCache config setting is slightly complicated, so clarify its use and centralize its parsing into the repo settings. The default value is "keep" (returned as -1), which persists the untracked cache if it exists. If the value is set as "false" (returned as 0), then remove the untracked cache if it exists. If the value is set as "true" (returned as 1), then write the untracked cache and persist it. Instead of relying on magic values of -1, 0, and 1, split these options into an enum. This allows the use of "-1" as a default value. After parsing the config options, if the value is unset we can initialize it to UNTRACKED_CACHE_KEEP. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13repo-settings: consolidate some config settingsDerrick Stolee
There are a few important config settings that are not loaded during git_default_config. These are instead loaded on-demand. Centralize these config options to a single scan, and store all of the values in a repo_settings struct. The values for each setting are initialized as negative to indicate "unset". This centralization will be particularly important in a later change to introduce "meta" config settings that change the defaults for these config settings. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-29Merge branch 'rs/avoid-overflow-in-midpoint-computation' into maintJunio C Hamano
Code clean-up to avoid signed integer overlaps during binary search. * rs/avoid-overflow-in-midpoint-computation: cleanup: fix possible overflow errors in binary search, part 2
2019-07-29Merge branch 'vn/xmmap-gently' into maintJunio C Hamano
Clean-up an error codepath. * vn/xmmap-gently: read-cache.c: do not die if mmap fails
2019-07-25Merge branch 'vn/xmmap-gently'Junio C Hamano
Clean-up an error codepath. * vn/xmmap-gently: read-cache.c: do not die if mmap fails
2019-07-14read-cache.c: do not die if mmap failsVarun Naik
do_read_index() mmaps the index, or tries to die with an error message on failure. It should call xmmap_gently(), which returns MAP_FAILED, rather than xmmap(), which dies with its own error message. An easy way to cause this mmap to fail is by setting $GIT_INDEX_FILE to a path to a directory and then invoking any command that reads from the index. Signed-off-by: Varun Naik <vcnaik94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-09Merge branch 'rs/avoid-overflow-in-midpoint-computation'Junio C Hamano
Code clean-up to avoid signed integer overlaps during binary search. * rs/avoid-overflow-in-midpoint-computation: cleanup: fix possible overflow errors in binary search, part 2
2019-06-17Merge branch 'js/fsmonitor-unflake'Junio C Hamano
The data collected by fsmonitor was not properly written back to the on-disk index file, breaking t7519 tests occasionally, which has been corrected. * js/fsmonitor-unflake: mark_fsmonitor_valid(): mark the index as changed if needed fill_stat_cache_info(): prepare for an fsmonitor fix
2019-06-13Merge branch 'jk/unused-params-final-batch'Junio C Hamano
* jk/unused-params-final-batch: verify-commit: simplify parameters to run_gpg_verify() show-branch: drop unused parameter from show_independent() rev-list: drop unused void pointer from finish_commit() remove_all_fetch_refspecs(): drop unused "remote" parameter receive-pack: drop unused "commands" from prepare_shallow_update() pack-objects: drop unused rev_info parameters name-rev: drop unused parameters from is_better_name() mktree: drop unused length parameter wt-status: drop unused status parameter read-cache: drop unused parameter from threaded load clone: drop dest parameter from copy_alternates() submodule: drop unused prefix parameter from some functions builtin: consistently pass cmd_* prefix to parse_options cmd_{read,write}_tree: rename "unused" variable that is used
2019-06-13cleanup: fix possible overflow errors in binary search, part 2René Scharfe
Calculating the sum of two array indexes to find the midpoint between them can overflow, i.e. code like this is unsafe for big arrays: mid = (first + last) >> 1; Make sure the intermediate value stays within the boundaries instead, like this: mid = first + ((last - first) >> 1); The loop condition of the binary search makes sure that 'last' is always greater than 'first', so this is safe as long as 'first' is not negative. And that can be verified easily using the pre-context of each change, except for name-hash.c, so add an assertion to that effect there. The unsafe calculations were found with: git grep '(.*+.*) *>> *1' This is a continuation of 19716b21a4 (cleanup: fix possible overflow errors in binary search, 2017-10-08). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28mark_fsmonitor_valid(): mark the index as changed if neededJohannes Schindelin
Without this bug fix, t7519's four "status doesn't detect unreported modifications" test cases would fail occasionally (and, oddly enough, *a lot* more frequently on Windows). The reason is that these test cases intentionally use the side effect of `git status` to re-write the index if any updates were detected: they first clean the worktree, run `git status` to update the index as well as show the output to the casual reader, then make the worktree dirty again and expect no changes to reported if running with a mocked fsmonitor hook. The problem with this strategy was that the index was written during said `git status` on the clean worktree for the *wrong* reason: not because the index was marked as changed (it wasn't), but because the recorded mtimes were racy with the index' own mtime. As the mtime granularity on Windows is 100 nanoseconds (see e.g. https://docs.microsoft.com/en-us/windows/desktop/SysInfo/file-times), the mtimes of the files are often enough *not* racy with the index', so that that `git status` call currently does not always update the index (including the fsmonitor extension), causing the test case to fail. The obvious fix: if we change *any* index entry's `CE_FSMONITOR_VALID` flag, we should also mark the index as changed. That will cause the index to be written upon `git status`, *including* an updated fsmonitor extension. Side note: Even though the reader might think that the t7519 issue should be *much* more prevalent on Linux, given that the ext4 filesystem (that seems to be used by every Linux distribution) stores mtimes in nanosecond precision. However, ext4 uses `current_kernel_time()` (see https://unix.stackexchange.com/questions/11599#comment762968_11599; it is *amazingly* hard to find any proper source of information about such ext4 questions) whose accuracy seems to depend on many factors but is safely worse than the 100-nanosecond granularity of NTFS (again, it is *horribly* hard to find anything remotely authoritative about this question). So it seems that the racy index condition that hid the bug fixed by this patch simply is a lot more likely on Linux than on Windows. But not impossible ;-) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28fill_stat_cache_info(): prepare for an fsmonitor fixJohannes Schindelin
We will need to pass down the `struct index_state` to `mark_fsmonitor_valid()` for an upcoming bug fix, and this here function calls that there function, so we need to extend the signature of `fill_stat_cache_info()` first. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19Merge branch 'js/fsmonitor-refresh-after-discarding-index'Junio C Hamano
The fsmonitor interface got out of sync after the in-core index file gets discarded, which has been corrected. * js/fsmonitor-refresh-after-discarding-index: fsmonitor: force a refresh after the index was discarded fsmonitor: demonstrate that it is not refreshed after discard_index()
2019-05-13Merge branch 'jh/trace2'Junio C Hamano
A few embarrassing bugfixes. * jh/trace2: trace2: fix up a missing "leave" entry point trace2: fix incorrect function pointer check