path: root/symlinks.c
Age | Commit message | Author
2018-12-09 | Indent code with TABs | Nguyễn Thái Ngọc Duy
We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close to upstream as possible. The latter pretty much has separate maintainers, it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2014-07-07 | symlinks: remove PATH_MAX limitation | Karsten Blees
'git checkout' fails if a directory is longer than PATH_MAX, because the lstat_cache in symlinks.c checks if the leading directory exists using PATH_MAX-bounded string operations. Remove the limitation by using strbuf instead. Signed-off-by: Karsten Blees <> Signed-off-by: Junio C Hamano <>
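The idea of replacing a PATH_MAX-bounded array with a growable buffer can be sketched as follows. This is a minimal illustration and not Git's strbuf API: `struct growbuf` and `growbuf_add()` are hypothetical names invented for this example, and size overflow checks are omitted for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; Git itself uses its own strbuf type. */
struct growbuf {
	char *buf;	/* NUL-terminated contents, or NULL when empty */
	size_t len;	/* bytes in use, not counting the NUL */
	size_t alloc;	/* bytes allocated */
};

/* Append 'n' bytes of 'data', growing the allocation as needed. */
static int growbuf_add(struct growbuf *gb, const char *data, size_t n)
{
	if (gb->len + n + 1 > gb->alloc) {
		size_t want = (gb->len + n + 1) * 2;
		char *p = realloc(gb->buf, want);

		if (!p)
			return -1;	/* old buffer is still valid */
		gb->buf = p;
		gb->alloc = want;
	}
	memcpy(gb->buf + gb->len, data, n);
	gb->len += n;
	gb->buf[gb->len] = '\0';
	return 0;
}
```

With a buffer like this, the cached leading path is limited by available memory instead of a compile-time constant, so a path longer than PATH_MAX no longer has to be rejected or truncated.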
2012-09-16 | symlinks.c: mark private file-scope symbols as static | Junio C Hamano
Signed-off-by: Junio C Hamano <>
2012-03-03 | Add threaded versions of functions in symlinks.c. | Jared Hance
check_leading_path() and has_dirs_only_path() both always use the default cache, which could be a caveat for adding parallelism (which is a concern and even a GSoC proposal). Reimplement these two in terms of new threaded_check_leading_path() and threaded_has_dirs_only_path() that take their own copy of the cache. Signed-off-by: Jared Hance <> Signed-off-by: Junio C Hamano <>
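The pattern described here, a public function reimplemented as a thin wrapper around a variant that takes its own cache, might look roughly like the sketch below. The struct and function names are hypothetical stand-ins, and the "cache" only remembers the last directory it was asked about; the real cache in symlinks.c stores more state.

```c
#include <string.h>

/* Hypothetical one-entry cache; a caller may own one per thread. */
struct dir_cache_sketch {
	char last[256];	/* last directory looked up */
	int misses;	/* lookups that were not answered from the cache */
};

/* Shared by all callers of the non-threaded wrapper below. */
static struct dir_cache_sketch default_cache;

/* Threaded variant: touches only the cache the caller passes in. */
static int threaded_seen_before(struct dir_cache_sketch *cache, const char *dir)
{
	if (!strcmp(cache->last, dir))
		return 1;
	cache->misses++;
	strncpy(cache->last, dir, sizeof(cache->last) - 1);
	cache->last[sizeof(cache->last) - 1] = '\0';
	return 0;
}

/* Original interface, kept for existing callers; uses the default cache. */
static int seen_before(const char *dir)
{
	return threaded_seen_before(&default_cache, dir);
}
```

A worker thread that keeps its own `struct dir_cache_sketch` never reads or writes `default_cache`, so it needs no locking around lookups.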
2011-02-22 | do not overwrite untracked symlinks | Clemens Buchacher
Git traditionally overwrites untracked symlinks silently. This will generally not cause massive data loss, but it is inconsistent with the behavior for regular files, which are not silently overwritten. With this change, git refuses to overwrite untracked symlinks by default. If the user really wants to overwrite the untracked symlink, he has git-clean and git-checkout -f at his disposal. Signed-off-by: Clemens Buchacher <> Signed-off-by: Junio C Hamano <>
2010-10-13 | do not overwrite files in leading path | Clemens Buchacher
If the work tree contains an untracked file x, and unpack-trees wants to check out a path x/*, the file x is removed unconditionally. Instead, apply the same checks that are normally used for untracked files, and abort if the file cannot be removed. Signed-off-by: Clemens Buchacher <>
2010-10-13 | lstat_cache: optionally return match_len | Clemens Buchacher
Return match_len so that the caller can know which leading path component matched. Signed-off-by: Clemens Buchacher <>
2010-01-18 | symlinks.c: remove unused functions | Junio C Hamano
invalidate_lstat_cache() and clear_lstat_cache() are not used anywhere. Signed-off-by: Junio C Hamano <>
2009-07-30 | lstat_cache: guard against full match of length of 'name' parameter | Kjetil Barvik
longest_path_match() in symlinks.c does exactly what its name says, but in some cases that match can be too long, since the has_*_leading_path() functions assume that the match will never be as long as the name string given to the function. Fix this by adding an extra if test which checks if the match length is equal to the 'len' parameter. Signed-off-by: Junio C Hamano <>
2009-07-10 | Export thread-safe version of 'has_symlink_leading_path()' | Linus Torvalds
The threaded index preloading will want it, so that it can avoid locking by simply using a per-thread symlink/directory cache. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2009-07-10 | Prepare symlink caching for thread-safety | Linus Torvalds
This doesn't actually change the external interfaces, so they are still thread-unsafe, but it makes the code internally pass a pointer to a local 'struct cache_def' around, so that the core code can be made thread-safe. The threaded index preloading will want to verify that the paths leading up to a pathname are all real directories. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2009-06-07 | symlinks.c: small style cleanup | Kjetil Barvik
Add {}-braces around an else-part, where the if-part already has {}-braces. And, also remove some unnecessary "return;"-statements at the end of "void foo()"-functions. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-03-17 | Revert "lstat_cache(): print a warning if doing ping-pong between cache types" | Kjetil Barvik
This reverts commit 7734f04873cfaddd0b148074a633f1f824fd961f. I guess that the reverted commit, 7734f048, has been in test long enough, and should now be reverted. I have not received any info regarding any debug output of the reverted commit, so let's hope that the lstat_cache() function does not cause any ping-pong. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | lstat_cache(): print a warning if doing ping-pong between cache types | Kjetil Barvik
This is a debug patch which is only to be used while the lstat_cache() is in the test stage, and should be removed/reverted before the final release. I think it should be useful to catch these warnings, as it could be an indication that the cache would not be very effective if it is doing ping-pong by switching between different cache types too many times. Also, if someone is experimenting with the lstat_cache(), this patch will maybe be useful while debugging. If someone is able to trigger the warning, then send a mail to the GIT mailing list, containing the first 15 lines of the warning, and a short description of the GIT commands to trigger the warnings. I hope someone is willing to use this patch for a while, to be able to catch possible ping-pongs. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | unlink_entry(): introduce schedule_dir_for_removal() | Kjetil Barvik
Currently inside unlink_entry() if we get a successful removal of one file with unlink(), we try to remove the leading directories each and every time. So if one directory containing 200 files is moved to another location we get 199 failed calls to rmdir() and 1 successful call. To fix this and avoid some unnecessary calls to rmdir(), we schedule each directory for removal and wait much longer before we do the real call to rmdir(). Since the unlink_entry() function is called with alphabetically sorted names, this new function ends up being very effective at avoiding unnecessary calls to rmdir(). In some cases over 95% of all calls to rmdir() are removed with this patch. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
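The deferred-removal idea can be sketched as below. This is a simplified illustration and not the actual schedule_dir_for_removal(): all names are hypothetical, only the immediate parent directory is tracked (the real code also walks up through parent directories), and rmdir() is replaced by a counting stub so the sketch has no filesystem side effects.

```c
#include <string.h>

static char pending_dir[4096];	/* directory waiting to be removed */
static int rmdir_attempts;	/* how many removals were attempted */

/* Stand-in for rmdir(2); only counts the attempts. */
static void try_rmdir(const char *dir)
{
	(void)dir;
	rmdir_attempts++;
}

/* Attempt the removal that has been postponed so far, if any. */
static void flush_pending_dir(void)
{
	if (pending_dir[0]) {
		try_rmdir(pending_dir);
		pending_dir[0] = '\0';
	}
}

/* Call after unlinking 'path'; postpones removal of its parent directory. */
static void schedule_parent_for_removal(const char *path)
{
	const char *slash = strrchr(path, '/');
	size_t dir_len = slash ? (size_t)(slash - path) : 0;

	if (dir_len >= sizeof(pending_dir))
		return;	/* too long for this fixed-size sketch */
	if (dir_len == strlen(pending_dir) &&
	    !strncmp(path, pending_dir, dir_len))
		return;	/* same directory as last time: nothing to do yet */
	flush_pending_dir();	/* we moved on, so the old one may be empty */
	memcpy(pending_dir, path, dir_len);
	pending_dir[dir_len] = '\0';
}
```

Because the names arrive sorted, all files of one directory are seen back to back, so unlinking 200 files from one directory leads to a single removal attempt once the caller moves on or flushes, instead of one attempt per file.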
2009-02-10 | lstat_cache(): swap func(length, string) into func(string, length) | Kjetil Barvik
Swap function argument pair (length, string) into (string, length) to conform with the commonly used order inside the GIT source code. Also, add a note about this fact into the coding guidelines. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | lstat_cache(): generalise longest_match_lstat_cache() | Kjetil Barvik
Rename the function to longest_path_match() and generalise it such that it can also be used by other functions. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-10 | lstat_cache(): small cleanup and optimisation | Kjetil Barvik
Simplify the if-else test in longest_match_lstat_cache() such that we only have one simple if test. Instead of testing for 'i == cache.len' or 'i == len', we transform this to a common test for 'i == max_len'. And to further optimise we use 'i >= max_len' instead of 'i == max_len', the reason is that it is now the exact opposite of one part inside the while-loop termination expression 'i < max_len && name[i] == cache.path[i]', and then the compiler can probably reuse a test instruction from it. We also throw away the arguments to reset_lstat_cache(), such that all the safeguard logic inside lstat_cache() is handled at one place. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): introduce clear_lstat_cache() function | Kjetil Barvik
If you want to completely clear the contents of the lstat_cache(), then call this new function. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): introduce invalidate_lstat_cache() function | Kjetil Barvik
In some cases it could maybe be necessary to say to the cache that "Hey, I deleted/changed the type of this pathname and if you currently have it inside your cache, you should delete it". This patch introduces a function which supports this. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): introduce has_dirs_only_path() function | Kjetil Barvik
The create_directories() function in entry.c currently calls stat() or lstat() for each path component of the pathname 'path' each and every time. For the 'git checkout' command, this function is called on each file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get lots and lots of calls. To fix this, we make a new wrapper to the lstat_cache() function, and call the wrapper function instead of the calls to the stat() or the lstat() functions. Since the paths given to the create_directories() function are sorted alphabetically, the new wrapper would be very cache effective in this situation. To support it we must update the lstat_cache() function to be able to say that "please test the complete length of 'name'", and also to give it the length of a prefix, where the cache should use the stat() function instead of the lstat() function to test each path component. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): introduce has_symlink_or_noent_leading_path() function | Kjetil Barvik
In some cases, especially inside the unpack-trees.c file, and inside the verify_absent() function, we can avoid some unnecessary calls to lstat(), if the lstat_cache() function can also be told to keep track of non-existing directories. So we update the lstat_cache() function to handle this new fact, introduce a new wrapper function, and the result is that we save lots of lstat() calls for a removed directory which previously contained lots of files, when we call this new wrapper of lstat_cache() instead of the old one. We do similar changes inside the unlink_entry() function, since if we can already say that the leading directory component of a pathname does not exist, it is not necessary to try to remove a pathname below it! Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-01-18 | lstat_cache(): more cache effective symlink/directory detection | Kjetil Barvik
Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so to me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink-leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
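The reuse of already-tested leading components comes down to finding how much of the new name matches the cached path on component boundaries. The function below is a hedged sketch of that comparison; `shared_leading_len()` is a hypothetical name and the code is an illustration, not a copy of what symlinks.c contains.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the longest prefix of 'name'
 * that ends on a path-component boundary and is also a prefix of 'cached'.
 */
static size_t shared_leading_len(const char *name, size_t name_len,
				 const char *cached, size_t cached_len)
{
	size_t max_len = name_len < cached_len ? name_len : cached_len;
	size_t match_len = 0, i = 0;

	while (i < max_len && name[i] == cached[i]) {
		if (name[i] == '/')
			match_len = i;	/* everything before this '/' matched */
		i++;
	}
	/* One string is a whole-component prefix of the other, or they are equal. */
	if (i >= max_len &&
	    ((name_len > cached_len && name[cached_len] == '/') ||
	     (name_len < cached_len && cached[name_len] == '/') ||
	     name_len == cached_len))
		match_len = i;
	return match_len;
}
```

For the cached path "A/B/C/D" and the new name "A/B/C/E/file.c", the shared part is "A/B/C" (5 characters), so only the components after that point, starting with "A/B/C/E", still need an lstat() call.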
2008-05-11 | Optimize symlink/directory detection | Linus Torvalds
This is the base for making symlink detection in the middle of a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding an lstat() of all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2007-05-12 | Add has_symlink_leading_path() function. | Junio C Hamano
When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arch/x86_64/boot yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The has_symlink_leading_path() function takes a path, and sees if any leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be:

    char last_symlink[PATH_MAX];

    *last_symlink = '\0';
    for each index entry {
            if (!lose)
                    continue;
            if (lstat(it))
                    if (errno == ENOENT)
                            ; /* happy */
                    else
                            error;
            else
                    if (has_symlink_leading_path(it, last_symlink))
                            ; /* happy */
                    else
                            error; /* would lose local changes */
            unlink_entry(it, last_symlink);
    }

Signed-off-by: Junio C Hamano <>
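The underlying check, whether any leading directory component of a path is a symbolic link, can be sketched without any caching as shown below. This is an assumption-laden illustration for a POSIX system: `leading_path_has_symlink()` is a hypothetical name, the fixed 4096-byte buffer is chosen only to keep the sketch short, and the caching through `last_symlink` that the commit describes is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for lstat() and S_ISLNK() */
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical, uncached check: returns 1 if some leading directory
 * component of 'path' is a symbolic link, 0 if none is, and -1 if the
 * path does not fit into the fixed-size buffer of this sketch.
 */
static int leading_path_has_symlink(const char *path)
{
	char buf[4096];
	size_t len = strlen(path);
	struct stat st;
	size_t i;

	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, path, len + 1);
	/* Start at 1 so that the leading '/' of an absolute path is skipped. */
	for (i = 1; i < len; i++) {
		if (buf[i] != '/')
			continue;
		buf[i] = '\0';		/* cut the path after this component */
		if (lstat(buf, &st) < 0)
			return 0;	/* missing component: no symlink below it */
		if (S_ISLNK(st.st_mode))
			return 1;
		buf[i] = '/';		/* restore and move to the next one */
	}
	return 0;
}
```

In the scenario from the commit message, calling this on arch/x86_64/boot/Makefile while arch/x86_64/boot is still a symbolic link would return 1, which tells the caller that the file lstat() reported is seen through the link and is not a local file at that path.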