path: root/symlinks.c
Age | Commit message | Author
2021-03-18 | checkout: don't follow symlinks when removing entries | Matheus Tavares
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20), symlinks.c:check_leading_path() started returning different codes for FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was not adjusted for this change, so it started to follow symlinks on the leading path of to-be-removed entries. Fix that and add a regression test. Note that since 1d718a5108 check_leading_path() no longer differentiates the case where it found a symlink in the path's leading components from the cases where it found a regular file or failed to lstat() the component. So, a side effect of this current patch is that unlink_entry() now returns early in all three of these cases. And because we no longer try to unlink such paths, we also don't get the warning from remove_or_warn(). For the regular file and symlink cases, it's questionable whether the warning was useful in the first place: unlink_entry() removes tracked paths that should no longer be present in the state we are checking out to. If the path had its leading dir replaced by another file, it means that the basename already doesn't exist, so there is no need for a warning. Sure, we are leaving a regular file or symlink behind at the path's dirname, but this file is either untracked now (so again, no need to warn), or it will be replaced by a tracked file during the next phase of this checkout operation. As for failing to lstat() one of the leading components, the basename might still exist, only we cannot unlink it (e.g. due to the lack of the required permissions). Since the user expects it to be removed (especially with checkout's --no-overlay option), add back the warning in this more relevant case. Signed-off-by: Matheus Tavares <> Signed-off-by: Junio C Hamano <>
2021-03-18 | symlinks: update comment on threaded_check_leading_path() | Matheus Tavares
Since 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20), the comment on top of threaded_check_leading_path() is outdated and no longer reflects the behavior of this function. Let's update it to avoid confusion. While we are here, also remove some duplicated comments to avoid similar maintenance problems. Signed-off-by: Matheus Tavares <> Signed-off-by: Junio C Hamano <>
2021-02-12 | Sync with 2.20.5 | Johannes Schindelin
* maint-2.20:
  Git 2.20.5
  Git 2.19.6
  Git 2.18.5
  Git 2.17.6
  unpack_trees(): start with a fresh lstat cache
  run-command: invalidate lstat cache after a command finished
  checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12 | checkout: fix bug that makes checkout follow symlinks in leading path | Matheus Tavares
Before checking out a file, we have to confirm that all of its leading components are real existing directories. And to reduce the number of lstat() calls in this process, we cache the last leading path known to contain only directories. However, when a path collision occurs (e.g. when checking out case-sensitive files in case-insensitive file systems), a cached path might have its file type changed on disk, leaving the cache in an invalid state. Normally, this doesn't bring any bad consequences as we usually check out files in index order, and therefore, by the time the cached path becomes outdated, we no longer need it anyway (because all files in that directory would have already been written). But, there are some users of the checkout machinery that do not always follow the index order. In particular: checkout-index writes the paths in the same order that they appear on the CLI (or stdin); and the delayed checkout feature -- used when a long-running filter process replies with "status=delayed" -- postpones the checkout of some entries, thus modifying the checkout order. When we have to check out an out-of-order entry and the lstat() cache is invalid (due to a previous path collision), checkout_entry() may end up using the invalid data and trusting that the leading components are real directories when, in reality, they are not. In the best case scenario, where the directory was replaced by a regular file, the user will get an error: "fatal: unable to create file 'foo/bar': Not a directory". But if the directory was replaced by a symlink, checkout could actually end up following the symlink and writing the file at a wrong place, even outside the repository. Since delayed checkout is affected by this bug, it could be used by an attacker to write arbitrary files during the clone of a maliciously crafted repository.
Some candidate solutions considered were to disable the lstat() cache during unordered checkouts or sort the entries before passing them to the checkout machinery. But both ideas include some performance penalty and they don't future-proof the code against new unordered use cases. Instead, we now manually reset the lstat cache whenever we successfully remove a directory. Note: We are not even checking whether the directory was the same as the lstat cache points to because we might face a scenario where the paths refer to the same location but differ due to case folding, precomposed UTF-8 issues, or the presence of `..` components in the path. Two regression tests, with case-collisions and utf8-collisions, are also added for both checkout-index and delayed checkout. Note: to make the previously mentioned clone attack unfeasible, it would be sufficient to reset the lstat cache only after the remove_subtree() call inside checkout_entry(). This is the place where we would remove a directory whose path collides with the path of another entry that we are currently trying to check out (possibly a symlink). However, in the interest of a thorough fix that does not leave Git open to similar-but-not-identical attack vectors, we decided to intercept all `rmdir()` calls in one fell swoop. This addresses CVE-2021-21300. Co-authored-by: Johannes Schindelin <> Signed-off-by: Matheus Tavares <>
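The fix described above (reset the cache whenever a directory is successfully removed, without comparing paths) can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: `struct leading_dir_cache`, `leading_dir_cache_reset()` and `cache_aware_rmdir()` are hypothetical names; only `rmdir()` is a real POSIX call.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the cache of the last verified leading path. */
struct leading_dir_cache {
	char path[4096];
	size_t len;
};

/* Forget everything the cache believes about the file system. */
static void leading_dir_cache_reset(struct leading_dir_cache *cache)
{
	cache->path[0] = '\0';
	cache->len = 0;
}

/*
 * rmdir() wrapper that resets the cache after every successful removal.
 * The removed path is deliberately not compared with the cached one:
 * the two may name the same location while differing textually (case
 * folding, Unicode precomposition, ".." components).
 */
static int cache_aware_rmdir(struct leading_dir_cache *cache, const char *path)
{
	int ret = rmdir(path);

	if (!ret)
		leading_dir_cache_reset(cache);
	return ret;
}
```

Resetting unconditionally costs a few extra lstat() calls after a removal, which seems a small price compared with trusting a stale entry.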
2018-12-09 | Indent code with TABs | Nguyễn Thái Ngọc Duy
We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close to upstream as possible. The latter pretty much has separate maintainers; it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2014-07-07 | symlinks: remove PATH_MAX limitation | Karsten Blees
'git checkout' fails if a directory is longer than PATH_MAX, because the lstat_cache in symlinks.c checks if the leading directory exists using PATH_MAX-bounded string operations. Remove the limitation by using strbuf instead. Signed-off-by: Karsten Blees <> Signed-off-by: Junio C Hamano <>
2012-09-16 | symlinks.c: mark private file-scope symbols as static | Junio C Hamano
Signed-off-by: Junio C Hamano <>
2012-03-03 | Add threaded versions of functions in symlinks.c. | Jared Hance
check_leading_path() and has_dirs_only_path() both always use the default cache, which could be a caveat for adding parallelism (which is a concern and even a GSoC proposal). Reimplement these two in terms of new threaded_check_leading_path() and threaded_has_dirs_only_path() that take their own copy of the cache. Signed-off-by: Jared Hance <> Signed-off-by: Junio C Hamano <>
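The pattern described here, a function that takes an explicit cache plus a thin wrapper that passes a shared default, might look like the following sketch. All names (`struct path_cache`, `threaded_seen_last()`, `seen_last()`) are made up for illustration, and the "cache" only remembers the last name it was asked about.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified cache: just the last name and a hit count. */
struct path_cache {
	char last[4096];
	size_t len;
	int hits;
};

/* Shared by all callers of the plain wrapper, hence not thread-safe. */
static struct path_cache default_cache;

/* Variant taking an explicit cache, so each thread can bring its own. */
static int threaded_seen_last(struct path_cache *cache, const char *name)
{
	size_t len = strlen(name);

	if (len == cache->len && !memcmp(cache->last, name, len)) {
		cache->hits++;
		return 1;
	}
	if (len < sizeof(cache->last)) {
		memcpy(cache->last, name, len + 1);
		cache->len = len;
	}
	return 0;
}

/* Original interface, reimplemented on top of the threaded variant. */
static int seen_last(const char *name)
{
	return threaded_seen_last(&default_cache, name);
}
```

A caller that owns its own `struct path_cache` never touches `default_cache`, so no locking is needed between such callers.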
2011-02-22 | do not overwrite untracked symlinks | Clemens Buchacher
Git traditionally overwrites untracked symlinks silently. This will generally not cause massive data loss, but it is inconsistent with the behavior for regular files, which are not silently overwritten. With this change, git refuses to overwrite untracked symlinks by default. If the user really wants to overwrite the untracked symlink, he has git-clean and git-checkout -f at his disposal. Signed-off-by: Clemens Buchacher <> Signed-off-by: Junio C Hamano <>
2010-10-13 | do not overwrite files in leading path | Clemens Buchacher
If the work tree contains an untracked file x, and unpack-trees wants to checkout a path x/*, the file x is removed unconditionally. Instead, apply the same checks that are normally used for untracked files, and abort if the file cannot be removed. Signed-off-by: Clemens Buchacher <>
2010-10-13 | lstat_cache: optionally return match_len | Clemens Buchacher
Return match_len so that the caller can know which leading path component matched. Signed-off-by: Clemens Buchacher <>
2010-01-18 | symlinks.c: remove unused functions | Junio C Hamano
invalidate_lstat_cache() and clear_lstat_cache() are not used anywhere. Signed-off-by: Junio C Hamano <>
2009-07-30 | lstat_cache: guard against full match of length of 'name' parameter | Kjetil Barvik
longest_path_match() in symlinks.c does exactly what its name says, but in some cases that match can be too long, since the has_*_leading_path() functions assume that the match will never be as long as the name string given to the function. Fix this by adding an extra if test which checks if the match length is equal to the 'len' parameter. Signed-off-by: Junio C Hamano <>
2009-07-10 | Export thread-safe version of 'has_symlink_leading_path()' | Linus Torvalds
The threaded index preloading will want it, so that it can avoid locking by simply using a per-thread symlink/directory cache. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2009-07-10 | Prepare symlink caching for thread-safety | Linus Torvalds
This doesn't actually change the external interfaces, so they are still thread-unsafe, but it makes the code internally pass a pointer to a local 'struct cache_def' around, so that the core code can be made thread-safe. The threaded index preloading will want to verify that the paths leading up to a pathname are all real directories. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2009-06-07 | symlinks.c: small style cleanup | Kjetil Barvik
Add {}-braces around an else-part, where the if-part already has {}-braces. And, also remove some unnecessary "return;"-statements at the end of "void foo()"-functions. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-03-17 | Revert "lstat_cache(): print a warning if doing ping-pong between cache types" | Kjetil Barvik
This reverts commit 7734f04873cfaddd0b148074a633f1f824fd961f. I guess that the reverted commit, 7734f048, has been in test long enough, and should now be reverted. I have not received any info regarding any debug output of the reverted commit, so let's hope that the lstat_cache() function does not cause any ping-pong. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | lstat_cache(): print a warning if doing ping-pong between cache types | Kjetil Barvik
This is a debug patch which is only to be used while the lstat_cache() is in the test stage, and should be removed/reverted before the final release. I think it should be useful to catch these warnings, as they could indicate that the cache would not be very effective if it is doing ping-pong by switching between different cache types too many times. Also, if someone is experimenting with the lstat_cache(), this patch may be useful while debugging. If someone is able to trigger the warning, then send a mail to the GIT mailing list, containing the first 15 lines of the warning, and a short description of the GIT commands to trigger the warnings. I hope someone is willing to use this patch for a while, to be able to catch possible ping-pongs. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | unlink_entry(): introduce schedule_dir_for_removal() | Kjetil Barvik
Currently inside unlink_entry() if we get a successful removal of one file with unlink(), we try to remove the leading directories each and every time. So if one directory containing 200 files is moved to another location we get 199 failed calls to rmdir() and 1 successful call. To fix this and avoid some unnecessary calls to rmdir(), we schedule each directory for removal and wait much longer before we do the real call to rmdir(). Since the unlink_entry() function is called with alphabetically sorted names, this new function ends up being very effective at avoiding unnecessary calls to rmdir(). In some cases over 95% of all calls to rmdir() are removed with this patch. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
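The deferral idea can be sketched as below. This is a deliberately simplified model and not the function from the commit: it tracks a single pending directory, counts rmdir() attempts instead of touching the file system, and does not walk up to parent directories. `struct pending_dir`, `schedule_dir()` and `flush_pending_dir()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping: one directory waiting for its rmdir(). */
struct pending_dir {
	char path[4096];
	size_t len;
	int rmdir_attempts; /* counted instead of calling rmdir() */
};

/* Length of the directory part of 'name' (0 if it has none). */
static size_t dirname_len(const char *name)
{
	const char *slash = strrchr(name, '/');

	return slash ? (size_t)(slash - name) : 0;
}

/* Perform the delayed removal attempt, if any directory is pending. */
static void flush_pending_dir(struct pending_dir *pending)
{
	if (!pending->len)
		return;
	pending->rmdir_attempts++; /* real code would rmdir(pending->path) */
	pending->path[0] = '\0';
	pending->len = 0;
}

/* Called after each removed file; defers rmdir() while the directory repeats. */
static void schedule_dir(struct pending_dir *pending, const char *name)
{
	size_t len = dirname_len(name);

	if (len == pending->len && !memcmp(name, pending->path, len))
		return; /* same directory as the previous file: keep waiting */
	flush_pending_dir(pending);
	if (!len || len >= sizeof(pending->path))
		return;
	memcpy(pending->path, name, len);
	pending->path[len] = '\0';
	pending->len = len;
}
```

With sorted input, every file of one directory arrives consecutively, so the directory is attempted once, after its last file, rather than once per file.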
2009-02-10 | lstat_cache(): swap func(length, string) into func(string, length) | Kjetil Barvik
Swap function argument pair (length, string) into (string, length) to conform with the commonly used order inside the GIT source code. Also, add a note about this fact into the coding guidelines. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | lstat_cache(): generalise longest_match_lstat_cache() | Kjetil Barvik
Rename the function to longest_path_match() and generalise it such that it can also be used by other functions. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | lstat_cache(): small cleanup and optimisation | Kjetil Barvik
Simplify the if-else test in longest_match_lstat_cache() such that we only have one simple if test. Instead of testing for 'i == cache.len' or 'i == len', we transform this to a common test for 'i == max_len'. And to further optimise we use 'i >= max_len' instead of 'i == max_len', the reason is that it is now the exact opposite of one part inside the while-loop termination expression 'i < max_len && name[i] == cache.path[i]', and then the compiler can probably reuse a test instruction from it. We also throw away the arguments to reset_lstat_cache(), such that all the safeguard logic inside lstat_cache() is handled at one place. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
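The shape of such a matching loop, with the single `i >= max_len` test after it, can be sketched as follows. This is an illustrative reconstruction under assumptions, not the function from the commit: `common_leading_path_len()` is a hypothetical name, and it compares two arbitrary paths rather than a name against a global cache.

```c
#include <stddef.h>

/*
 * Return the length of the longest common prefix of 'a' and 'b' that
 * ends on a path-component boundary (a '/' or the end of a string).
 */
static size_t common_leading_path_len(const char *a, size_t len_a,
				      const char *b, size_t len_b)
{
	size_t max_len = len_a < len_b ? len_a : len_b;
	size_t match_len = 0, i = 0;

	while (i < max_len && a[i] == b[i]) {
		if (a[i] == '/')
			match_len = i;
		i++;
	}
	/*
	 * 'i >= max_len' is the exact negation of the first half of the
	 * loop condition: the shorter string was consumed completely.
	 * That only counts as a match if the longer string continues
	 * with '/' at that point, or if both strings are equal.
	 */
	if (i >= max_len &&
	    ((len_a > len_b && a[len_b] == '/') ||
	     (len_a < len_b && b[len_a] == '/') ||
	     len_a == len_b))
		match_len = i;
	return match_len;
}
```

For "A/B/C/D" and "A/B/C/E/file.c" the result is 5 (the prefix "A/B/C"); for "A/B" and "A/Bx" it is 1, because "A/B" is not a whole component of the second path.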
2009-01-18 | lstat_cache(): introduce clear_lstat_cache() function | Kjetil Barvik
If you want to completely clear the contents of the lstat_cache(), then call this new function. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): introduce invalidate_lstat_cache() function | Kjetil Barvik
In some cases it may be necessary to tell the cache: "Hey, I deleted/changed the type of this pathname and if you currently have it inside your cache, you should delete it". This patch introduces a function which supports this. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): introduce has_dirs_only_path() function | Kjetil Barvik
The create_directories() function in entry.c currently calls stat() or lstat() for each path component of the pathname 'path' each and every time. For the 'git checkout' command, this function is called on each file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get lots and lots of calls. To fix this, we make a new wrapper to the lstat_cache() function, and call the wrapper function instead of the calls to the stat() or the lstat() functions. Since the paths given to the create_directories() function are sorted alphabetically, the new wrapper would be very cache effective in this situation. To support it we must update the lstat_cache() function to be able to say "please test the complete length of 'name'", and also to give it the length of a prefix, where the cache should use the stat() function instead of the lstat() function to test each path component. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): introduce has_symlink_or_noent_leading_path() function | Kjetil Barvik
In some cases, especially inside the unpack-trees.c file, and inside the verify_absent() function, we can avoid some unnecessary calls to lstat(), if the lstat_cache() function can also be told to keep track of non-existing directories. So we update the lstat_cache() function to handle this new fact, introduce a new wrapper function, and the result is that we save lots of lstat() calls for a removed directory which previously contained lots of files, when we call this new wrapper of lstat_cache() instead of the old one. We do similar changes inside the unlink_entry() function, since if we can already say that the leading directory component of a pathname does not exist, it is not necessary to try to remove a pathname below it! Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): more cache effective symlink/directory detection | Kjetil Barvik
Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
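The "only probe what the cache does not already cover" behaviour can be sketched like this. It is a simplified model under stated assumptions, not the code from the commit: it caches directories only (the commit also caches symlinks), the file-system probe is passed in as a callback, and `struct last_dir_cache`, `check_leading_dirs()` and `assume_directory()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache: the last leading path found to be all directories. */
struct last_dir_cache {
	char path[4096];
	size_t len;
};

/* Stand-in probe for the sketch; real code would lstat() the path. */
static int assume_directory(const char *path)
{
	(void)path;
	return 1;
}

/* Longest prefix of 'name' shared with the cache, cut at a '/' boundary. */
static size_t shared_prefix_len(const struct last_dir_cache *cache,
				const char *name, size_t len)
{
	size_t max_len = cache->len < len ? cache->len : len;
	size_t match_len = 0, i = 0;

	while (i < max_len && name[i] == cache->path[i]) {
		if (name[i] == '/')
			match_len = i;
		i++;
	}
	if (i == cache->len && i < len && name[i] == '/')
		match_len = i; /* the whole cached path is a prefix */
	return match_len;
}

/*
 * Probe the leading directories of 'name' that the cache does not cover,
 * copying each component straight into the cache as it is tested.
 * Returns the number of probes made, or -1 if 'name' is too long.
 */
static int check_leading_dirs(struct last_dir_cache *cache, const char *name,
			      int (*is_dir)(const char *path))
{
	size_t len = strlen(name);
	size_t i;
	int probes = 0;

	if (len >= sizeof(cache->path))
		return -1;
	i = shared_prefix_len(cache, name, len);
	cache->len = i;
	while (i < len) {
		do {
			cache->path[i] = name[i];
			i++;
		} while (i < len && name[i] != '/');
		if (i >= len)
			break; /* the last component is the file itself */
		cache->path[i] = '\0';
		probes++;
		if (!is_dir(cache->path))
			break;
		cache->len = i;
	}
	cache->path[cache->len] = '\0';
	return probes;
}
```

Starting from an empty cache, "A/B/C/D/file.c" needs four probes; a following "A/B/C/E/file.c" needs only one, for "A/B/C/E", which is the case the message describes.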
2008-05-11 | Optimize symlink/directory detection | Linus Torvalds
This is the base for making symlink detection in the middle of a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding an lstat() of all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, i.e. when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2007-05-12 | Add has_symlink_leading_path() function. | Junio C Hamano
When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the arch/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The has_symlink_leading_path() function takes a path, and sees if any of the leading directory components is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be:
    char last_symlink[PATH_MAX];

    *last_symlink = '\0';
    for each index entry {
        if (!lose)
            continue;
        if (lstat(it))
            if (errno == ENOENT)
                ; /* happy */
            else
                error;
        else if (has_symlink_leading_path(it, last_symlink))
            ; /* happy */
        else
            error; /* would lose local changes */
        unlink_entry(it, last_symlink);
    }
Signed-off-by: Junio C Hamano <>
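The core check, whether any leading directory component of a path is a symbolic link, can be sketched without the caching as follows. This is a minimal illustration under assumptions and not the function added by the commit: `leading_path_has_symlink()` is a hypothetical name, the fixed 4096-byte buffer is an arbitrary choice, and a missing or non-directory component is simply treated as "no symlink found".

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Return 1 if some leading directory component of 'path' is a symbolic
 * link, 0 otherwise. The final component (the file itself) is not tested.
 */
static int leading_path_has_symlink(const char *path)
{
	char buf[4096];
	size_t len = strlen(path);
	size_t i;
	struct stat st;

	if (len >= sizeof(buf))
		return 0; /* sketch only: too long to examine */
	for (i = 1; i < len; i++) {
		if (path[i] != '/')
			continue;
		/* Test the prefix that ends just before this slash. */
		memcpy(buf, path, i);
		buf[i] = '\0';
		if (lstat(buf, &st))
			return 0; /* component missing: nothing to see through */
		if (S_ISLNK(st.st_mode))
			return 1;
		if (!S_ISDIR(st.st_mode))
			return 0;
	}
	return 0;
}
```

Using lstat() rather than stat() on each prefix is what keeps the check from following the very link it is looking for.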