summaryrefslogtreecommitdiff
path: root/path.c
AgeCommit message (Collapse)Author
2021-05-04t0060: test ntfs/hfs-obscured dotfilesJeff King
We have tests that cover various filesystem-specific spellings of ".gitmodules", because we need to reliably identify that path for some security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules: add tests, 2018-05-12), with the actual code coming from e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02). Those latter two commits also added similar matching functions for .gitattributes and .gitignore. These ended up not being used in the final series, and are currently dead code. But in preparation for them being used in some fsck checks, let's make sure they actually work by throwing a few basic tests at them. Likewise, let's cover .mailmap (which does need matching code added). I didn't bother with the whole battery of tests that we cover for .gitmodules. These functions are all based on the same generic matcher, so it's sufficient to test most of the corner cases just once. Note that the ntfs magic prefix names in the tests come from the algorithm described in e7cb0b4455 (and are different for each file). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-20merge-ort: write $GIT_DIR/AUTO_MERGE whenever we hit a conflictElijah Newren
There are a variety of questions users might ask while resolving conflicts: * What changes have been made since the previous (first) parent? * What changes are staged? * What is still unstaged? (or what is still conflicted?) * What changes did I make to resolve conflicts so far? The first three of these have simple answers: * git diff HEAD * git diff --cached * git diff There was no way to answer the final question previously. Adding one is trivial in merge-ort, since it works by creating a tree representing what should be written to the working copy complete with conflict markers. Simply write that tree to .git/AUTO_MERGE, allowing users to answer the fourth question with * git diff AUTO_MERGE I avoided using a name like "MERGE_AUTO", because that would be merge-specific (much like MERGE_HEAD, REBASE_HEAD, REVERT_HEAD, CHERRY_PICK_HEAD) and I wanted a name that didn't change depending on which type of operation the merge was part of. Ensure that paths which clean out other temporary operation-specific files (e.g. CHERRY_PICK_HEAD, MERGE_MSG, rebase-merge/ state directory) also clean out this AUTO_MERGE file. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21sequencer: treat REVERT_HEAD as a pseudo refHan-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21sequencer: treat CHERRY_PICK_HEAD as a pseudo refHan-Wen Nienhuys
Check for existence and delete CHERRY_PICK_HEAD through ref functions. This will help cherry-pick work with alternate ref storage backends. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29Merge branch 'dl/merge-autostash'Junio C Hamano
"git merge" learns the "--autostash" option. * dl/merge-autostash: (22 commits) pull: pass --autostash to merge t5520: make test_pull_autostash() accept expect_parent_num merge: teach --autostash option sequencer: implement apply_autostash_oid() sequencer: implement save_autostash() sequencer: unlink autostash in apply_autostash() sequencer: extract perform_autostash() from rebase rebase: generify create_autostash() rebase: extract create_autostash() reset: extract reset_head() from rebase rebase: generify reset_head() rebase: use apply_autostash() from sequencer.c sequencer: rename stash_sha1 to stash_oid sequencer: make apply_autostash() accept a path rebase: use read_oneliner() sequencer: make read_oneliner() extern sequencer: configurably warn on non-existent files sequencer: make read_oneliner() accept flags sequencer: make file exists check more efficient sequencer: stop leaking buf ...
2020-04-10merge: teach --autostash optionDenton Liu
In rebase, one can pass the `--autostash` option to cause the worktree to be automatically stashed before continuing with the rebase. This option is missing in merge, however. Implement the `--autostash` option and corresponding `merge.autoStash` option in merge which stashes before merging and then pops after. This option is useful when a developer has some local changes on a topic branch but they realize that their work depends on another branch. Previously, they had to run something like git fetch ... git stash push git merge FETCH_HEAD git stash pop but now, that is reduced to git fetch ... git merge --autostash FETCH_HEAD When an autostash is generated, it is automatically reapplied to the worktree only in three explicit situations: 1. An incomplete merge is commit using `git commit`. 2. A merge completes successfully. 3. A merge is aborted using `git merge --abort`. In all other situations where the merge state is removed using remove_merge_branch_state() such as aborting a merge via `git reset --hard`, the autostash is saved into the stash reflog instead keeping the worktree clean. Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Suggested-by: Alban Gruin <alban.gruin@gmail.com> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27Merge branch 'bc/sha-256-part-1-of-4'Junio C Hamano
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-10real_path: remove unsafe APIAlexandr Miloslavskiy
Returning a shared buffer invites very subtle bugs due to reentrancy or multi-threading, as demonstrated by the previous patch. There was an unfinished effort to abolish this [1]. Let's finally rid of `real_path()`, using `strbuf_realpath()` instead. This patch uses a local `strbuf` for most places where `real_path()` was previously called. However, two places return the value of `real_path()` to the caller. For them, a `static` local `strbuf` was added, effectively pushing the problem one level higher: read_gitfile_gently() get_superproject_working_tree() [1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/ Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-06set_git_dir: fix crash when used with real_path()Alexandr Miloslavskiy
`real_path()` returns result from a shared buffer, inviting subtle reentrance bugs. One of these bugs occur when invoked this way: set_git_dir(real_path(git_dir)) In this case, `real_path()` has reentrance: real_path read_gitfile_gently repo_set_gitdir setup_git_env set_git_dir_1 set_git_dir Later, `set_git_dir()` uses its now-dead parameter: !is_absolute_path(path) Fix this by using a dedicated `strbuf` to hold `strbuf_realpath()`. Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24setup: allow check_repository_format to read repository formatbrian m. carlson
In some cases, we will want to not only check the repository format, but extract the information that we've gained. To do so, allow check_repository_format to take a pointer to struct repository_format. Allow passing NULL for this argument if we're not interested in the information, and pass NULL for all existing callers. A future patch will make use of this information. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30normalize_path_copy(): document "dst" size expectationsJeff King
We take a "dst" buffer to write into, but there's no matching "len" parameter. The hidden assumption is that normalizing always makes things smaller, so we're OK as long as "dst" is at least as big as "src". Let's document that explicitly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-10Sync with Git 2.24.1Junio C Hamano
2019-12-06Sync with 2.23.1Johannes Schindelin
* maint-2.23: (44 commits) Git 2.23.1 Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters ...
2019-12-06Sync with 2.21.1Johannes Schindelin
* maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...
2019-12-06Sync with 2.20.2Johannes Schindelin
* maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...
2019-12-06Sync with 2.19.3Johannes Schindelin
* maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...
2019-12-06Sync with 2.18.2Johannes Schindelin
* maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...
2019-12-06Sync with 2.17.3Johannes Schindelin
* maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...
2019-12-06Sync with 2.14.6Johannes Schindelin
* maint-2.14: (28 commits) Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment test-path-utils: offer to run a protectNTFS/protectHFS benchmark ...
2019-12-05is_ntfs_dotgit(): speed it upJohannes Schindelin
Previously, this function was written without focusing on speed, intending to make reviewing the code as easy as possible, to avoid any bugs in this critical code. Turns out: we can do much better on both accounts. With this patch, we make it as fast as this developer can make it go: - We avoid the call to `is_dir_sep()` and make all the character comparisons explicit. - We avoid the cost of calling `strncasecmp()` and unroll the test for `.git` and `git~1`, not even using `tolower()` because it is faster to compare against two constant values. - We look for `.git` and `.git~1` first thing, and return early if not found. - We also avoid calling a separate function for detecting chains of spaces and periods. Each of these improvements has a noticeable impact on the speed of `is_ntfs_dotgit()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05path: also guard `.gitmodules` against NTFS Alternate Data StreamsJohannes Schindelin
We just safe-guarded `.git` against NTFS Alternate Data Stream-related attack vectors, and now it is time to do the same for `.gitmodules`. Note: In the added regression test, we refrain from verifying all kinds of variations between short names and NTFS Alternate Data Streams: as the new code disallows _all_ Alternate Data Streams of `.gitmodules`, it is enough to test one in order to know that all of them are guarded against. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05path: safeguard `.git` against NTFS Alternate Streams AccessesJohannes Schindelin
Probably inspired by HFS' resource streams, NTFS supports "Alternate Data Streams": by appending `:<stream-name>` to the file name, information in addition to the file contents can be written and read, information that is copied together with the file (unless copied to a non-NTFS location). These Alternate Data Streams are typically used for things like marking an executable as having just been downloaded from the internet (and hence not necessarily being trustworthy). In addition to a stream name, a stream type can be appended, like so: `:<stream-name>:<stream-type>`. Unless specified, the default stream type is `$DATA` for files and `$INDEX_ALLOCATION` for directories. In other words, `.git::$INDEX_ALLOCATION` is a valid way to reference the `.git` directory! In our work in Git v2.2.1 to protect Git on NTFS drives under `core.protectNTFS`, we focused exclusively on NTFS short names, unaware of the fact that NTFS Alternate Data Streams offer a similar attack vector. Let's fix this. Seeing as it is better to be safe than sorry, we simply disallow paths referring to *any* NTFS Alternate Data Stream of `.git`, not just `::$INDEX_ALLOCATION`. This also simplifies the implementation. This closes CVE-2019-1352. Further reading about NTFS Alternate Data Streams: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c54dec26-1551-4d3a-a0ea-4fa40f848eb3 Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05is_ntfs_dotgit(): only verify the leading segmentJohannes Schindelin
The config setting `core.protectNTFS` is specifically designed to work not only on Windows, but anywhere, to allow for repositories hosted on, say, Linux servers to be protected against NTFS-specific attack vectors. As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated paths (but does not do the same for paths separated by forward slashes), under the assumption that the backslash might not be a valid directory separator on the _current_ Operating System. However, the two callers, `verify_path()` and `fsck_tree()`, are supposed to feed only individual path segments to the `is_ntfs_dotgit()` function. This causes a lot of duplicate scanning (and very inefficient scanning, too, as the inner loop of `is_ntfs_dotgit()` was optimized for readability rather than for speed. Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of splitting the paths by backslashes as directory separators on the callers of said function. Consequently, the `verify_path()` function, which already splits the path by directory separators, now treats backslashes as directory separators _explicitly_ when `core.protectNTFS` is turned on, even on platforms where the backslash is _not_ a directory separator. Note that we have to repeat some code in `verify_path()`: if the backslash is not a directory separator on the current Operating System, we want to allow file names like `\`, but we _do_ want to disallow paths that are clearly intended to cause harm when the repository is cloned on Windows. The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now needs to look for backslashes in tree entries' names specifically when `core.protectNTFS` is turned on. While it would be tempting to completely disallow backslashes in that case (much like `fsck` reports names containing forward slashes as "full paths"), this would be overzealous: when `core.protectNTFS` is turned on in a non-Windows setup, backslashes are perfectly valid characters in file names while we _still_ want to disallow tree entries that are clearly designed to exploit NTFS-specific behavior. This simplification will make subsequent changes easier to implement, such as turning `core.protectNTFS` on by default (not only on Windows) or protecting against attack vectors involving NTFS Alternate Data Streams. Incidentally, this change allows for catching malicious repositories that contain tree entries of the form `dir\.gitmodules` already on the server side rather than only on the client side (and previously only on Windows): in contrast to `is_ntfs_dotgit()`, the `is_ntfs_dotgitmodules()` function already expects the caller to split the paths by directory separators. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-04path.c: document the purpose of `is_ntfs_dotgit()`Johannes Schindelin
Previously, this function was completely undocumented. It is worth, though, to explain what is going on, as it is not really obvious at all. Suggested-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-01Merge branch 'js/git-path-head-dot-lock-fix'Junio C Hamano
"git rev-parse --git-path HEAD.lock" did not give the right path when run in a secondary worktree. * js/git-path-head-dot-lock-fix: git_path(): handle `.lock` files correctly t1400: wrap setup code in test case
2019-11-10Merge branch 'sg/dir-trie-fixes'Junio C Hamano
Code clean-up and a bugfix in the logic used to tell worktree local and repository global refs apart. * sg/dir-trie-fixes: path.c: don't call the match function without value in trie_find() path.c: clarify two field names in 'struct common_dir' path.c: mark 'logs/HEAD' in 'common_list' as file path.c: clarify trie_find()'s in-code comment Documentation: mention more worktree-specific exceptions
2019-10-29git_path(): handle `.lock` files correctlyJohannes Schindelin
Ever since worktrees were introduced, the `git_path()` function _really_ needed to be called e.g. to get at the path to `logs/HEAD` (`HEAD` is specific to the worktree, and therefore so is its reflog). However, the wrong path is returned for `logs/HEAD.lock`. This does not matter as long as the Git executable is doing the asking, as the path for that `logs/HEAD.lock` file is constructed from `git_path("logs/HEAD")` by appending the `.lock` suffix. However, Git GUI just learned to use `--git-path` instead of appending relative paths to what `git rev-parse --git-dir` returns (and as a consequence not only using the correct hooks directory, but also using the correct paths in worktrees other than the main one). While it does not seem as if Git GUI in particular is asking for `logs/HEAD.lock`, let's be safe rather than sorry. Side note: Git GUI _does_ ask for `index.lock`, but that is already resolved correctly, due to `update_common_dir()` preferring to leave unknown paths in the (worktree-specific) git directory. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23path.c: don't call the match function without value in trie_find()SZEDER Gábor
'logs/refs' is not a working tree-specific path, but since commit b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07) 'git rev-parse --git-path' has been returning a bogus path if a trailing '/' is present: $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/ /home/szeder/src/git/.git/logs/refs /home/szeder/src/git/.git/worktrees/WT/logs/refs/ We use a trie data structure to efficiently decide whether a path belongs to the common dir or is working tree-specific. As it happens b9317d55a3 triggered a bug that is as old as the trie implementation itself, added in 4e09cf2acf (path: optimize common dir checking, 2015-08-31). - According to the comment describing trie_find(), it should only call the given match function 'fn' for a "/-or-\0-terminated prefix of the key for which the trie contains a value". This is not true: there are three places where trie_find() calls the match function, but one of them is missing the check for value's existence. - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' and 'logs/refs/worktree', next to the already existing 'logs/refs/bisect'. This resulted in a trie node with the path 'logs/refs/', which didn't exist before, and which doesn't have a value attached. A query for 'logs/refs/' finds this node and then hits that one callsite of the match function which doesn't check for the value's existence, and thus invokes the match function with NULL as value. - When the match function check_common() is invoked with a NULL value, it returns 0, which indicates that the queried path doesn't belong to the common directory, ultimately resulting the bogus path shown above. Add the missing condition to trie_find() so it will never invoke the match function with a non-existing value. check_common() will then no longer have to check that it got a non-NULL value, so remove that condition. I believe that there are no other paths that could cause similar bogus output. AFAICT the only other key resulting in the match function being called with a NULL value is 'co' (because of the keys 'common' and 'config'). However, as they are not in a directory that belongs to the common directory the resulting working tree-specific path is expected. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23path.c: clarify two field names in 'struct common_dir'SZEDER Gábor
An array of 'struct common_dir' instances is used to specify whether various paths in $GIT_DIR are specific to a worktree, or are common, i.e. belong to main worktree. The names of two fields in this struct are somewhat confusing or ambigious: - The path is recorded in the struct's 'dirname' field, even though several entries are regular files e.g. 'gc.pid', 'packed-refs', etc. Rename this field to 'path' to reduce confusion. - The field 'exclude' tells whether the path is excluded... from where? Excluded from the common dir or from the worktree? It means the former, but it's ambigious. Rename this field to 'is_common' to make it unambigious what it means. This, however, means the exact opposite of what 'exclude' meant, so we have to negate the field's value in all entries as well. The diff is best viewed with '--color-words'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23path.c: mark 'logs/HEAD' in 'common_list' as fileSZEDER Gábor
'logs/HEAD', i.e. HEAD's reflog, is a file, but its entry in 'common_list' has the 'is_dir' bit set. Unset that bit to make it consistent with what 'logs/HEAD' is supposed to be. This doesn't make a difference in behavior: check_common() is the only function that looks at the 'is_dir' bit, and that function either returns 0, or '!exclude', which for 'logs/HEAD' results in 0 as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23path.c: clarify trie_find()'s in-code commentSZEDER Gábor
A fairly long comment describes trie_find()'s behavior and shows examples, but it's slightly incomplete/inaccurate. Update this comment to specify how trie_find() handles a negative return value from the given match function. Furthermore, update the list of examples to include not only two but three levels of path components. This makes the examples slightly more complicated, but it can illustrate the behavior in more corner cases. Finally, basically everything refers to the data stored for a key as "value", with two confusing exceptions: - The type definition of the match function calls its corresponding parameter 'data'. Rename that parameter to 'value'. (check_common(), the only function of this type already calls it 'value'). - The table of examples above trie_find() has a "val from node" column, which has nothing to do with the value stored in the trie: it's a "prefix of the key for which the trie contains a value" that led to that node. Rename that column header to "prefix to node". Note that neither the original nor the updated description and examples correspond 100% to the current implementation, because the implementation is a bit buggy, but the comment describes the desired behavior. The bug will be fixed in the last patch of this series. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-26path: add a function to check for path suffixbrian m. carlson
We have a function to strip the path suffix from a commit, but we don't have one to check for a path suffix. For a plain filename, we can use basename, but that requires an allocation, since POSIX allows it to modify its argument. Refactor strip_path_suffix into a helper function and a new function, ends_with_path_components, to meet this need. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-09Merge branch 'nd/rewritten-ref-is-per-worktree'Junio C Hamano
"git rebase" uses the refs/rewritten/ hierarchy to store its intermediate states, which inherently makes the hierarchy per worktree, but it didn't quite work well. * nd/rewritten-ref-is-per-worktree: Make sure refs/rewritten/ is per-worktree files-backend.c: reduce duplication in add_per_worktree_entries_to_dir() files-backend.c: factor out per-worktree code in loose_fill_ref_dir()
2019-03-08Make sure refs/rewritten/ is per-worktreeNguyễn Thái Ngọc Duy
a9be29c981 (sequencer: make refs generated by the `label` command worktree-local, 2018-04-25) adds refs/rewritten/ as per-worktree reference space. Unfortunately (my bad) there are a couple places that need update to make sure it's really per-worktree. - add_per_worktree_entries_to_dir() is updated to make sure ref listing look at per-worktree refs/rewritten/ instead of per-repo one [1] - common_list[] is updated so that git_path() returns the correct location. This includes "rev-parse --git-path". This mess is created by me. I started trying to fix it with the introduction of refs/worktree, where all refs will be per-worktree without special treatments. Unfortunate refs/rewritten came before refs/worktree so this is all we can do. This also fixes logs/refs/worktree not being per-worktree. [1] note that ref listing still works sometimes. For example, if you have .git/worktrees/foo/refs/rewritten/bar AND the directory .git/worktrees/refs/rewritten, refs/rewritten/bar will show up. add_per_worktree_entries_to_dir() is only needed when the directory .git/worktrees/refs/rewritten is missing. Reported-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-04Merge branch 'jk/loose-object-cache'Junio C Hamano
Code clean-up with optimization for the codepath that checks (non-)existence of loose objects. * jk/loose-object-cache: odb_load_loose_cache: fix strbuf leak fetch-pack: drop custom loose object cache sha1-file: use loose object cache for quick existence check object-store: provide helpers for loose_objects_cache sha1-file: use an object_directory for the main object dir handle alternates paths the same as the main object dir sha1_file_name(): overwrite buffer instead of appending rename "alternate_object_database" to "object_directory" submodule--helper: prefer strip_suffix() to ends_with() fsck: do not reuse child_process structs
2018-11-21Merge branch 'tb/char-may-be-unsigned' into maintJunio C Hamano
Build portability fix. * tb/char-may-be-unsigned: path.c: char is not (always) signed
2018-11-13Merge branch 'nd/per-worktree-ref-iteration'Junio C Hamano
The code to traverse objects for reachability, used to decide what objects are unreferenced and expendable, have been taught to also consider per-worktree refs of other worktrees as starting points to prevent data loss. * nd/per-worktree-ref-iteration: git-worktree.txt: correct linkgit command name reflog expire: cover reflog from all worktrees fsck: check HEAD and reflog from other worktrees fsck: move fsck_head_link() to get_default_heads() to avoid some globals revision.c: better error reporting on ref from different worktrees revision.c: correct a parameter name refs: new ref types to make per-worktree refs visible to all worktrees Add a place for (not) sharing stuff between worktrees refs.c: indent with tabs, not spaces
2018-11-13sha1-file: use an object_directory for the main object dirJeff King
Our handling of alternate object directories is needlessly different from the main object directory. As a result, many places in the code basically look like this: do_something(r->objects->objdir); for (odb = r->objects->alt_odb_list; odb; odb = odb->next) do_something(odb->path); That gets annoying when do_something() is non-trivial, and we've resorted to gross hacks like creating fake alternates (see find_short_object_filename()). Instead, let's give each raw_object_store a unified list of object_directory structs. The first will be the main store, and everything after is an alternate. Very few callers even care about the distinction, and can just loop over the whole list (and those who care can just treat the first element differently). A few observations: - we don't need r->objects->objectdir anymore, and can just mechanically convert that to r->objects->odb->path - object_directory's path field needs to become a real pointer rather than a FLEX_ARRAY, in order to fill it with expand_base_dir() - we'll call prepare_alt_odb() earlier in many functions (i.e., outside of the loop). This may result in us calling it even when our function would be satisfied looking only at the main odb. But this doesn't matter in practice. It's not a very expensive operation in the first place, and in the majority of cases it will be a noop. We call it already (and cache its results) in prepare_packed_git(), and we'll generally check packs before loose objects. So essentially every program is going to call it immediately once per program. Arguably we should just prepare_alt_odb() immediately upon setting up the repository's object directory, which would save us sprinkling calls throughout the code base (and forgetting to do so has been a source of subtle bugs in the past). But I've stopped short of that here, since there are already a lot of other moving parts in this patch. - Most call sites just get shorter. The check_and_freshen() functions are an exception, because they have entry points to handle local and nonlocal directories separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-06Merge branch 'tb/char-may-be-unsigned'Junio C Hamano
Build portability fix. * tb/char-may-be-unsigned: path.c: char is not (always) signed
2018-10-26path.c: char is not (always) signedTorsten Bögershausen
If a "char" in C is signed or unsigned is not specified, because it is out of tradition "implementation dependent". Therefore constructs like "if (name[i] < 0)" are not portable, use "if (name[i] & 0x80)" instead. Detected by "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" when setting DEVELOPER = 1 DEVOPTS = extra-all Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-06Add a place for (not) sharing stuff between worktreesNguyễn Thái Ngọc Duy
When multiple worktrees are used, we need rules to determine if something belongs to one worktree or all of them. Instead of keeping adding rules when new stuff comes (*), have a generic rule: - Inside $GIT_DIR, which is per-worktree by default, add $GIT_DIR/common which is always shared. New features that want to share stuff should put stuff under this directory. - Inside refs/, which is shared by default except refs/bisect, add refs/worktree/ which is per-worktree. We may eventually move refs/bisect to this new location and remove the exception in refs code. (*) And it may also include stuff from external commands which will have no way to modify common/per-worktree rules. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18Merge branch 'sb/object-store-grafts'Junio C Hamano
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-05-29Sync with Git 2.17.1Junio C Hamano
* maint: (25 commits) Git 2.17.1 Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant ...
2018-05-22Sync with Git 2.14.4Junio C Hamano
* maint-2.14: Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Sync with Git 2.13.7Junio C Hamano
* maint-2.13: Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22is_ntfs_dotgit: match other .git filesJohannes Schindelin
When we started to catch NTFS short names that clash with .git, we only looked for GIT~1. This is sufficient because we only ever clone into an empty directory, so .git is guaranteed to be the first subdirectory or file in that directory. However, even with a fresh clone, .gitmodules is *not* necessarily the first file to be written that would want the NTFS short name GITMOD~1: a malicious repository can add .gitmodul0000 and friends, which sorts before `.gitmodules` and is therefore checked out *first*. For that reason, we have to test not only for ~1 short names, but for others, too. It's hard to just adapt the existing checks in is_ntfs_dotgit(): since Windows 2000 (i.e., in all Windows versions still supported by Git), NTFS short names are only generated in the <prefix>~<number> form up to number 4. After that, a *different* prefix is used, calculated from the long file name using an undocumented, but stable algorithm. For example, the short name of .gitmodules would be GITMOD~1, but if it is taken, and all of ~2, ~3 and ~4 are taken, too, the short name GI7EBA~1 will be used. From there, collisions are handled by incrementing the number, shortening the prefix as needed (until ~9999999 is reached, in which case NTFS will not allow the file to be created). We'd also want to handle .gitignore and .gitattributes, which suffer from a similar problem, using the fall-back short names GI250A~1 and GI7D29~1, respectively. To accommodate for that, we could reimplement the hashing algorithm, but it is just safer and simpler to provide the known prefixes. This algorithm has been reverse-engineered and described at https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but still available via https://web.archive.org/. These can be recomputed by running the following Perl script: -- snip -- use warnings; use strict; sub compute_short_name_hash ($) { my $checksum = 0; foreach (split('', $_[0])) { $checksum = ($checksum * 0x25 + ord($_)) & 0xffff; } $checksum = ($checksum * 314159269) & 0xffffffff; $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000); $checksum -= (($checksum * 1152921497) >> 60) * 1000000007; return scalar reverse sprintf("%x", $checksum & 0xffff); } print compute_short_name_hash($ARGV[0]); -- snap -- E.g., running that with the argument ".gitignore" will result in "250a" (which then becomes "gi250a" in the code). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff King <peff@peff.net>
2018-05-22is_ntfs_dotgit: use a size_t for traversing stringJeff King
We walk through the "name" string using an int, which can wrap to a negative value and cause us to read random memory before our array (e.g., by creating a tree with a name >2GB, since "int" is still 32 bits even on most 64-bit platforms). Worse, this is easy to trigger during the fsck_tree() check, which is supposed to be protecting us from malicious garbage. Note one bit of trickiness in the existing code: we sometimes assign -1 to "len" at the end of the loop, and then rely on the "len++" in the for-loop's increment to take it back to 0. This is still legal with a size_t, since assigning -1 will turn into SIZE_MAX, which then wraps around to 0 on increment. Signed-off-by: Jeff King <peff@peff.net>
2018-05-17path.c: migrate global git_path_* to take a repository argumentStefan Beller
Migrate all git_path_* functions that are defined in path.c to take a repository argument. Unlike other patches in this series, do not use the #define trick, as we rewrite the whole function, which is rather small. This doesn't migrate all the functions, as other builtins have their own local path functions defined using GIT_PATH_FUNC. So keep that macro around to serve the other locations. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-23repository: introduce raw object store fieldStefan Beller
The raw object store field will contain any objects needed for access to objects in a given repository. This patch introduces the raw object store and populates it with the `objectdir`, which used to be part of the repository struct. As the struct gains members, we'll also populate the function to clear the memory for these members. In a later step, we'll introduce a struct object_parser, that will complement the object parsing in a repository struct: The raw object parser is the layer that will provide access to raw object content, while the higher level object parser code will parse raw objects and keeps track of parenthood and other object relationships using 'struct object'. For now only add the lower level to the repository struct. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-28Merge branch 'ao/path-use-xmalloc'Junio C Hamano
A possible oom error is now caught as a fatal error, instead of continuing and dereferencing NULL. * ao/path-use-xmalloc: path.c: use xmalloc() in add_to_trie()