path: root/builtin/init-db.c
AgeCommit message (Collapse)Author
2021-05-25Merge branch 'mt/init-template-userpath-fix'Junio C Hamano
Regression fix. * mt/init-template-userpath-fix: init: fix bug regarding ~/ expansion in init.templateDir
2021-05-25init: fix bug regarding ~/ expansion in init.templateDirMatheus Tavares
We used to read the init.templateDir setting at builtin/init-db.c using a git_config() callback that, in turn, called git_config_pathname(). To simplify the config reading logic at this file and plug a memory leak, this was replaced by a direct call to git_config_get_value() at e4de4502e6 ("init: remove git_init_db_config() while fixing leaks", 2021-03-14). However, this function doesn't provide path expanding semantics, like git_config_pathname() does, so paths with '~/' and '~user/' are treated literally. This makes 'git init' fail to handle init.templateDir paths using these constructs: $ git config init.templateDir '~/templates_dir' $ git init 'warning: templates not found in ~/templates_dir' Replace the git_config_get_value() call by git_config_get_pathname(), which does the '~/' and '~user/' expansions. Also add a regression test. Note that unlike git_config_get_value(), the config cache does not own the memory for the path returned by git_config_get_pathname(), so we must free() it. Reported on IRC by rkta. Signed-off-by: Matheus Tavares <> Signed-off-by: Junio C Hamano <>
2021-04-07Merge branch 'ah/plugleaks'Junio C Hamano
Plug or annotate remaining leaks that trigger while running the very basic set of tests. * ah/plugleaks: transport: also free remote_refs in transport_disconnect() parse-options: don't leak alias help messages parse-options: convert bitfield values to use binary shift init-db: silence template_dir leak when converting to absolute path init: remove git_init_db_config() while fixing leaks worktree: fix leak in dwim_branch() clone: free or UNLEAK further pointers when finished reset: free instead of leaking unneeded ref symbolic-ref: don't leak shortened refname in check_symref()
2021-03-14init-db: silence template_dir leak when converting to absolute pathAndrzej Hunt
template_dir starts off pointing to either argv or nothing. However if the value supplied in argv is a relative path, absolute_pathdup() is used to turn it into an absolute path. absolute_pathdup() allocates a new string, and we then "leak" it when cmd_init_db() completes. We don't bother to actually free the return value (instead we UNLEAK it), because there's no significant advantage to doing so here. Correctly freeing it would require more significant changes to code flow which would be more noisy than beneficial. Signed-off-by: Andrzej Hunt <> Signed-off-by: Junio C Hamano <>
2021-03-14init: remove git_init_db_config() while fixing leaksAndrzej Hunt
The primary goal of this change is to stop leaking init_db_template_dir. This leak can happen because: 1. git_init_db_config() allocates new memory into init_db_template_dir without first freeing the existing value. 2. init_db_template_dir might already contain data, either because: 2.1 git_config() can be invoked twice with this callback in a single process - at least 2 allocations are likely. 2.2 A single git_config() allocation can invoke the callback multiple times for a given key (see further explanation in the function docs) - each of those calls will trigger another leak. The simplest fix for the leak would be to free(init_db_template_dir) before overwriting it. Instead we choose to convert to fetching init.templatedir via git_config_get_value() as that is more explicit, more efficient, and avoids allocations (the returned result is owned by the config cache, so we aren't responsible for freeing it). If we remove init_db_template_dir, git_init_db_config() ends up being responsible only for forwarding core.* config values to platform_core_config(). However platform_core_config() already ignores non-core.* config values, so we can safely remove git_init_db_config() and invoke git_config() directly with platform_core_config() as the callback. The platform_core_config forwarding was originally added in: 287853392a (mingw: respect core.hidedotfiles = false in git-init again, 2019-03-11 And I suspect the potential for a leak existed since the original implementation of git_init_db_config in: 90b45187ba (Add `init.templatedir` configuration variable., 2010-02-17) LSAN output from t0001: Direct leak of 73 byte(s) in 1 object(s) allocated from: #0 0x49a859 in realloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0x9a7276 in xrealloc /home/ahunt/oss-fuzz/git/wrapper.c:126:8 #2 0x9362ad in strbuf_grow /home/ahunt/oss-fuzz/git/strbuf.c:98:2 #3 0x936eaa in strbuf_add /home/ahunt/oss-fuzz/git/strbuf.c:295:2 #4 0x868112 in strbuf_addstr /home/ahunt/oss-fuzz/git/./strbuf.h:304:2 #5 0x86a8ad in expand_user_path /home/ahunt/oss-fuzz/git/path.c:758:2 #6 0x720bb1 in git_config_pathname /home/ahunt/oss-fuzz/git/config.c:1287:10 #7 0x5960e2 in git_init_db_config /home/ahunt/oss-fuzz/git/builtin/init-db.c:161:11 #8 0x7255b8 in configset_iter /home/ahunt/oss-fuzz/git/config.c:1982:7 #9 0x7253fc in repo_config /home/ahunt/oss-fuzz/git/config.c:2311:2 #10 0x725ca7 in git_config /home/ahunt/oss-fuzz/git/config.c:2399:2 #11 0x593e8d in create_default_files /home/ahunt/oss-fuzz/git/builtin/init-db.c:225:2 #12 0x5935c6 in init_db /home/ahunt/oss-fuzz/git/builtin/init-db.c:449:11 #13 0x59588e in cmd_init_db /home/ahunt/oss-fuzz/git/builtin/init-db.c:714:9 #14 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11 #15 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3 #16 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4 #17 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19 #18 0x69c4de in main /home/ahunt/oss-fuzz/git/common-main.c:52:11 #19 0x7f23552d6349 in __libc_start_main (/lib64/ Signed-off-by: Andrzej Hunt <> Signed-off-by: Junio C Hamano <>
2021-03-10builtin/init-db: handle bare clones when core.bare set to falsebrian m. carlson
In 552955ed7f ("clone: use more conventional config/option layering", 2020-10-01), clone learned to read configuration options earlier in its execution, before creating the new repository. However, that led to a problem: if the core.bare setting is set to false in the global config, cloning a bare repository segfaults. This happens because the repository is falsely thought to be non-bare, but clone has set the work tree to NULL, which is then dereferenced. The code to initialize the repository already considers the fact that a user might want to override the --bare option for git init, but it doesn't take into account clone, which uses a different option. Let's just check that the work tree is not NULL, since that's how clone indicates that the repository is bare. This is also the case for git init, so we won't be regressing that case. Reported-by: Joseph Vusich <> Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2020-12-13get_default_branch_name(): prepare for showing some adviceJohannes Schindelin
We are about to introduce a message giving users running `git init` some advice about `init.defaultBranch`. This will necessarily be done in `repo_default_branch_name()`. Not all code paths want to show that advice, though. In particular, the `git clone` codepath _specifically_ asks for `init_db()` to be quiet, via the `INIT_DB_QUIET` flag. In preparation for showing users above-mentioned advice, let's change the function signature of `get_default_branch_name()` to accept the parameter `quiet`. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2020-09-22builtin/clone: avoid failure with GIT_DEFAULT_HASHbrian m. carlson
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to "sha256", then we can end up with a repository where the repository format version is 0 but the extensions.objectformat key is set to "sha256". This is both wrong (the user has a SHA-1 repository) and nonfunctional (because the extension cannot be used in a v0 repository). This happens because in a clone, we initially set up the repository, and then change its algorithm based on what the remote side tells us it's using. We've initially set up the repository as SHA-256 in this case, and then later on reset the repository version without clearing the extension. We could just always set the extension in this case, but that would mean that our SHA-1 repositories weren't compatible with older Git versions, even though there's no reason why they shouldn't be. And we also don't want to initialize the repository as SHA-1 initially, since that means if we're cloning an empty repository, we'll have failed to honor the GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a SHA-256 repository. Neither of those are appealing, so let's tell the repository initialization code if we're doing a reinit like this, and if so, to clear the extension if we're using SHA-1. This makes sure we produce a valid and functional repository and doesn't break any of our other use cases. Reported-by: Matheus Tavares <> Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2020-08-31init: make --separate-git-dir work from within linked worktreeEric Sunshine
The intention of `git init --separate-work-dir=<path>` is to move the .git/ directory to a location outside of the main worktree. When used within a linked worktree, however, rather than moving the .git/ directory as intended, it instead incorrectly moves the worktree's .git/worktrees/<id> directory to <path>, thus disconnecting the linked worktree from its parent repository and breaking the worktree in the process since its local .git file no longer points at a location at which it can find the object database. Fix this broken behavior. An intentional side-effect of this change is that it also closes a loophole not caught by ccf236a23a (init: disallow --separate-git-dir with bare repository, 2020-08-09) in which the check to prevent --separate-git-dir being used in conjunction with a bare repository was unable to detect the invalid combination when invoked from within a linked worktree. Therefore, add a test to verify that this loophole is closed, as well. Reported-by: Henré Botha <> Signed-off-by: Eric Sunshine <> Signed-off-by: Junio C Hamano <>
2020-08-31init: teach --separate-git-dir to repair linked worktreesEric Sunshine
A linked worktree's .git file is a "gitfile" pointing at the .git/worktrees/<id> directory within the repository. When `git init --separate-git-dir=<path>` is used on an existing repository to relocate the repository's .git/ directory to a different location, it neglects to update the .git files of linked worktrees, thus breaking the worktrees by making it impossible for them to locate the repository. Fix this by teaching --separate-git-dir to repair the .git file of each linked worktree to point at the new repository location. Reported-by: Henré Botha <> Signed-off-by: Eric Sunshine <> Signed-off-by: Junio C Hamano <>
2020-08-24Merge branch 'es/init-no-separate-git-dir-in-bare'Junio C Hamano
The purpose of "git init --separate-git-dir" is to initialize a new project with the repository separate from the working tree, or, in the case of an existing project, to move the repository (the .git/ directory) out of the working tree. It does not make sense to use --separate-git-dir with a bare repository for which there is no working tree, so disallow its use with bare repositories. * es/init-no-separate-git-dir-in-bare: init: disallow --separate-git-dir with bare repository
2020-08-10init: disallow --separate-git-dir with bare repositoryEric Sunshine
The purpose of "git init --separate-git-dir" is to separate the repository from the worktree. This is true even when --separate-git-dir is used on an existing worktree, in which case, it moves the .git/ subdirectory to a new location outside the worktree. However, an outright bare repository (such as one created by "git init --bare"), has no worktree, so using --separate-git-dir to separate it from its non-existent worktree is nonsensical. Therefore, make it an error to use --separate-git-dir on a bare repository. Implementation note: "git init" considers a repository bare if told so explicitly via --bare or if it guesses it to be so based upon heuristics. In the explicit --bare case, a conflict with --separate-git-dir is easy to detect early. In the guessed case, however, the conflict can only be detected once "bareness" is guessed, which happens after "git init" has begun creating the repository. Technically, we can get by with a single late check which would cover both cases, however, erroring out early, when possible, without leaving detritus provides a better user experience. Signed-off-by: Eric Sunshine <> Signed-off-by: Junio C Hamano <>
2020-07-30repository: enable SHA-256 support by defaultbrian m. carlson
Now that we have a complete SHA-256 implementation in Git, let's enable it so people can use it. Remove the ENABLE_SHA256 define constant everywhere it's used. Add tests for initializing a repository with SHA-256. Signed-off-by: brian m. carlson <> Reviewed-by: Eric Sunshine <> Signed-off-by: Junio C Hamano <>
2020-06-24init: allow setting the default for the initial branch name via the configDon Goodman-Wilson
We just introduced the command-line option `--initial-branch=<branch-name>` to allow initializing a new repository with a different initial branch than the hard-coded one. To allow users to override the initial branch name more permanently (i.e. without having to specify the name manually for each and every `git init` invocation), let's introduce the `init.defaultBranch` config setting. Helped-by: Johannes Schindelin <> Helped-by: Derrick Stolee <> Signed-off-by: Don Goodman-Wilson <> Signed-off-by: Junio C Hamano <>
2020-06-24init: allow specifying the initial branch name for the new repositoryJohannes Schindelin
There is a growing number of projects and companies desiring to change the main branch name of their repositories (see e.g. for background on this). To change that branch name for new repositories, currently the only way to do that automatically is by copying all of Git's template directory, then hard-coding the desired default branch name into the `.git/HEAD` file, and then configuring `init.templateDir` to point to those copied template files. To make this process much less cumbersome, let's introduce a new option: `--initial-branch=<branch-name>`. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2020-03-27Merge branch 'bc/sha-256-part-1-of-4'Junio C Hamano
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-06set_git_dir: fix crash when used with real_path()Alexandr Miloslavskiy
`real_path()` returns result from a shared buffer, inviting subtle reentrance bugs. One of these bugs occur when invoked this way: set_git_dir(real_path(git_dir)) In this case, `real_path()` has reentrance: real_path read_gitfile_gently repo_set_gitdir setup_git_env set_git_dir_1 set_git_dir Later, `set_git_dir()` uses its now-dead parameter: !is_absolute_path(path) Fix this by using a dedicated `strbuf` to hold `strbuf_realpath()`. Signed-off-by: Alexandr Miloslavskiy <> Signed-off-by: Junio C Hamano <>
2020-02-24init-db: move writing repo version into a functionbrian m. carlson
When we perform a clone, we won't know the remote side's hash algorithm until we've read the heads. Consequently, we'll need to rewrite the repository format version and hash algorithm once we know what the remote side has. Move the code that does this into its own function so that we can call it from clone in the future. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2020-02-24builtin/init-db: add environment variable for new repo hashbrian m. carlson
For the foreseeable future, SHA-1 will be the default algorithm for Git. However, when running the testsuite, we want to be able to test an arbitrary algorithm. It would be quite burdensome and very untidy to have to specify the algorithm we'd like to test every time we initialized a new repository somewhere in the testsuite, so add an environment variable to allow us to specify the default hash algorithm for Git. This has the benefit that we can set it once for the entire testsuite and not have to think about it. In the future, users can also use it to set the default for their repositories if they would like to do so. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2020-02-24builtin/init-db: allow specifying hash algorithm on command linebrian m. carlson
Allow the user to specify the hash algorithm on the command line by using the --object-format option to git init. Validate that the user is not attempting to reinitialize a repository with a different hash algorithm. Ensure that if we are writing a non-SHA-1 repository that we set the repository version to 1 and write the objectFormat extension. Restrict this option to work only when ENABLE_SHA256 is set until the codebase is in a situation to fully support this. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2020-02-24setup: allow check_repository_format to read repository formatbrian m. carlson
In some cases, we will want to not only check the repository format, but extract the information that we've gained. To do so, allow check_repository_format to take a pointer to struct repository_format. Allow passing NULL for this argument if we're not interested in the information, and pass NULL for all existing callers. A future patch will make use of this information. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2019-06-13Merge branch 'nd/init-relative-template-fix'Junio C Hamano
A relative pathname given to "git init --template=<path> <repo>" ought to be relative to the directory "git init" gets invoked in, but it instead was made relative to the repository, which has been corrected. * nd/init-relative-template-fix: init: make --template path relative to $CWD
2019-05-13init: make --template path relative to $CWDNguyễn Thái Ngọc Duy
During git-init we chdir() to the target directory, but --template is not adjusted. So it's relative to the target directory instead of current directory. It would be ok if it's documented, but --template in git-init.txt mentions nothing about this behavior. Change it to be relative to $CWD, which is much more intuitive. The changes in the test suite show that this relative-to-target behavior is actually used. I just hope that it's only used in the test suite and it's safe to change. Otherwise, the other option is just document it (i.e. relative to target dir) and move on. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2019-04-16Merge branch 'js/init-db-update-for-mingw'Junio C Hamano
"git init" forgot to read platform-specific repository configuration, which made Windows port to ignore settings of core.hidedotfiles, for example. * js/init-db-update-for-mingw: mingw: respect core.hidedotfiles = false in git-init again
2019-03-20Merge branch 'ma/clear-repository-format'Junio C Hamano
The setup code has been cleaned up to avoid leaks around the repository_format structure. * ma/clear-repository-format: setup: fix memory leaks with `struct repository_format` setup: free old value before setting `work_tree`
2019-03-12mingw: respect core.hidedotfiles = false in git-init againJohannes Schindelin
This is a brown paper bag. When adding the tests, we actually failed to verify that the config variable is heeded in git-init at all. And when changing the original patch that marked the .git/ directory as hidden after reading the config, it was lost on this developer that the new code would use the hide_dotfiles variable before the config was read. The fix is obvious: read the (limited, pre-init) config *before* creating the .git/ directory. Please note that we cannot remove the identical-looking `git_config()` call from `create_default_files()`: we create the `.git/` directory between those calls. If we removed it, and if the parent directory is in a Git worktree, and if that worktree's `.git/config` contained any `init.templatedir` setting, we would all of a sudden pick that up. This fixes Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2019-02-28setup: fix memory leaks with `struct repository_format`Martin Ågren
After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <> Signed-off-by: Junio C Hamano <>
2019-01-29Merge branch 'jk/save-getenv-result'Junio C Hamano
There were many places the code relied on the string returned from getenv() to be non-volatile, which is not true, that have been corrected. * jk/save-getenv-result: builtin_diff(): read $GIT_DIFF_OPTS closer to use merge-recursive: copy $GITHEAD strings init: make a copy of $GIT_DIR string config: make a copy of $GIT_CONFIG string commit: copy saved getenv() result get_super_prefix(): copy getenv() result
2019-01-12init: make a copy of $GIT_DIR stringJeff King
We pass the result of getenv("GIT_DIR") to init_db() and assume that the string remains valid. But that's not guaranteed across calls to setenv() or even getenv(), although it often works in practice. Let's make a copy of the string so that we follow the rules. Note that we need to mark it with UNLEAK(), since the value persists until the end of program (but we have no opportunity to free it). This patch also handles $GIT_WORK_TREE the same way. It actually doesn't have as long a lifetime and is probably fine, but it's simpler to just treat the two side-by-side variables the same. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2018-11-06assert NOARG/NONEG behavior of parse-options callbacksJeff King
When we define a parse-options callback, the flags we put in the option struct must match what the callback expects. For example, a callback which does not handle the "unset" parameter should only be used with PARSE_OPT_NONEG. But since the callback and the option struct are not defined next to each other, it's easy to get this wrong (as earlier patches in this series show). Fortunately, the compiler can help us here: compiling with -Wunused-parameters can show us which callbacks ignore their "unset" parameters (and likewise, ones that ignore "arg" expect to be triggered with PARSE_OPT_NOARG). But after we've inspected a callback and determined that all of its callers use the right flags, what do we do next? We'd like to silence the compiler warning, but do so in a way that will catch any wrong calls in the future. We can do that by actually checking those variables and asserting that they match our expectations. Because this is such a common pattern, we'll introduce some helper macros. The resulting messages aren't as descriptive as we could make them, but the file/line information from BUG() is enough to identify the problem (and anyway, the point is that these should never be seen). Each of the annotated callbacks in this patch triggers -Wunused-parameters, and was manually inspected to make sure all callers use the correct options (so none of these BUGs should be triggerable). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2018-07-24pass st.st_size as hint for strbuf_readlink()Jeff King
When we initially added the strbuf_readlink() function in b11b7e13f4 (Add generic 'strbuf_readlink()' helper function, 2008-12-17), the point was that we generally have a _guess_ as to the correct size based on the stat information, but we can't necessarily trust it. Over the years, a few callers have grown up that simply pass in 0, even though they have the stat information. Let's have them pass in their hint for consistency (and in theory efficiency, since it may avoid an extra resize/syscall loop, but neither location is probably performance critical). Note that st.st_size is actually an off_t, so in theory we need xsize_t() here. But none of the other callsites use it, and since this is just a hint, it doesn't matter either way (if we wrap we'll simply start with a too-small hint and then eventually complain when we cannot allocate the memory). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2018-06-01Merge branch 'rd/init-typo'Junio C Hamano
Message fix. * rd/init-typo: init: fix grammar in "templates not found" msg
2018-05-30init: fix grammar in "templates not found" msgRobert P. J. Day
Signed-off-by: Robert P. J. Day <> Signed-off-by: Junio C Hamano <>
2018-05-06Replace all die("BUG: ...") calls by BUG() onesJohannes Schindelin
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in 588a538ae55 (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2018-04-11exec_cmd: rename to use dash in file nameStefan Beller
This is more consistent with the project style. The majority of Git's source files use dashes in preference to underscores in their file names. Signed-off-by: Stefan Beller <>
2018-02-22init-db: rename 'template' variablesBrandon Williams
Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <> Signed-off-by: Junio C Hamano <>
2017-09-08add UNLEAK annotation for reducing leak false positivesJeff King
It's a common pattern in git commands to allocate some memory that should last for the lifetime of the program and then not bother to free it, relying on the OS to throw it away. This keeps the code simple, and it's fast (we don't waste time traversing structures or calling free at the end of the program). But it also triggers warnings from memory-leak checkers like valgrind or LSAN. They know that the memory was still allocated at program exit, but they don't know _when_ the leaked memory stopped being useful. If it was early in the program, then it's probably a real and important leak. But if it was used right up until program exit, it's not an interesting leak and we'd like to suppress it so that we can see the real leaks. This patch introduces an UNLEAK() macro that lets us do so. To understand its design, let's first look at some of the alternatives. Unfortunately the suppression systems offered by leak-checking tools don't quite do what we want. A leak-checker basically knows two things: 1. Which blocks were allocated via malloc, and the callstack during the allocation. 2. Which blocks were left un-freed at the end of the program (and which are unreachable, but more on that later). Their suppressions work by mentioning the function or callstack of a particular allocation, and marking it as OK to leak. So imagine you have code like this: int cmd_foo(...) { /* this allocates some memory */ char *p = some_function(); printf("%s", p); return 0; } You can say "ignore allocations from some_function(), they're not leaks". But that's not right. That function may be called elsewhere, too, and we would potentially want to know about those leaks. So you can say "ignore the callstack when main calls some_function". That works, but your annotations are brittle. In this case it's only two functions, but you can imagine that the actual allocation is much deeper. If any of the intermediate code changes, you have to update the suppression. What we _really_ want to say is that "the value assigned to p at the end of the function is not a real leak". But leak-checkers can't understand that; they don't know about "p" in the first place. However, we can do something a little bit tricky if we make some assumptions about how leak-checkers work. They generally don't just report all un-freed blocks. That would report even globals which are still accessible when the leak-check is run. Instead they take some set of memory (like BSS) as a root and mark it as "reachable". Then they scan the reachable blocks for anything that looks like a pointer to a malloc'd block, and consider that block reachable. And then they scan those blocks, and so on, transitively marking anything reachable from a global as "not leaked" (or at least leaked in a different category). So we can mark the value of "p" as reachable by putting it into a variable with program lifetime. One way to do that is to just mark "p" as static. But that actually affects the run-time behavior if the function is called twice (you aren't likely to call main() twice, but some of our cmd_*() functions are called from other commands). Instead, we can trick the leak-checker by putting the value into _any_ reachable bytes. This patch keeps a global linked-list of bytes copied from "unleaked" variables. That list is reachable even at program exit, which confers recursive reachability on whatever values we unleak. In other words, you can do: int cmd_foo(...) { char *p = some_function(); printf("%s", p); UNLEAK(p); return 0; } to annotate "p" and suppress the leak report. But wait, couldn't we just say "free(p)"? In this toy example, yes. But UNLEAK()'s byte-copying strategy has several advantages over actually freeing the memory: 1. It's recursive across structures. In many cases our "p" is not just a pointer, but a complex struct whose fields may have been allocated by a sub-function. And in some cases (e.g., dir_struct) we don't even have a function which knows how to free all of the struct members. By marking the struct itself as reachable, that confers reachability on any pointers it contains (including those found in embedded structs, or reachable by walking heap blocks recursively. 2. It works on cases where we're not sure if the value is allocated or not. For example: char *p = argc > 1 ? argv[1] : some_function(); It's safe to use UNLEAK(p) here, because it's not freeing any memory. In the case that we're pointing to argv here, the reachability checker will just ignore our bytes. 3. Likewise, it works even if the variable has _already_ been freed. We're just copying the pointer bytes. If the block has been freed, the leak-checker will skip over those bytes as uninteresting. 4. Because it's not actually freeing memory, you can UNLEAK() before we are finished accessing the variable. This is helpful in cases like this: char *p = some_function(); return another_function(p); Writing this with free() requires: int ret; char *p = some_function(); ret = another_function(p); free(p); return ret; But with unleak we can just write: char *p = some_function(); UNLEAK(p); return another_function(p); This patch adds the UNLEAK() macro and enables it automatically when Git is compiled with SANITIZE=leak. In normal builds it's a noop, so we pay no runtime cost. It also adds some UNLEAK() annotations to show off how the feature works. On top of other recent leak fixes, these are enough to get t0000 and t0001 to pass when compiled with LSAN. Note the case in commit.c which actually converts a strbuf_release() into an UNLEAK. This code was already non-leaky, but the free didn't do anything useful, since we're exiting. Converting it to an annotation means that non-leak-checking builds pay no runtime cost. The cost is minimal enough that it's probably not worth going on a crusade to convert these kinds of frees to UNLEAKS. I did it here for consistency with the "sb" leak (though it would have been equally correct to go the other way, and turn them both into strbuf_release() calls). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2017-06-15config: don't include config.h by defaultBrandon Williams
Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <> Signed-off-by: Junio C Hamano <>
2017-03-08real_pathdup(): fix callsites that wanted it to die on errorJohannes Schindelin
In 4ac9006f832 (real_path: have callers use real_pathdup and strbuf_realpath, 2016-12-12), we changed the xstrdup(real_path()) pattern to use real_pathdup() directly. The problem with this change is that real_path() calls strbuf_realpath() with die_on_error = 1 while real_pathdup() calls it with die_on_error = 0. Meaning that in cases where real_path() causes Git to die() with an error message, real_pathdup() is silent and returns NULL instead. The callers, however, are ill-prepared for that change, as they expect the return value to be non-NULL (and otherwise the function died with an appropriate error message). Fix this by extending real_pathdup()'s signature to accept the die_on_error flag and simply pass it through to strbuf_realpath(), and then adjust all callers after a careful audit whether they would handle NULLs well. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2017-01-31refs: add option core.logAllRefUpdates = alwaysCornelius Weig
When core.logallrefupdates is true, we only create a new reflog for refs that are under certain well-known hierarchies. The reason is that we know that some hierarchies (like refs/tags) are not meant to change, and that unknown hierarchies might not want reflogs at all (e.g., a hypothetical refs/foo might be meant to change often and drop old history immediately). However, sometimes it is useful to override this decision and simply log for all refs, because the safety and audit trail is more important than the performance implications of keeping the log around. This patch introduces a new "always" mode for the core.logallrefupdates option which will log updates to everything under refs/, regardless where in the hierarchy it is (we still will not log things like ORIG_HEAD and FETCH_HEAD, which are known to be transient). Based-on-patch-by: Jeff King <> Signed-off-by: Cornelius Weig <> Reviewed-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2016-12-12real_path: have callers use real_pathdup and strbuf_realpathBrandon Williams
Migrate callers of real_path() who duplicate the retern value to use real_pathdup or strbuf_realpath. Signed-off-by: Brandon Williams <> Signed-off-by: Junio C Hamano <>
2016-10-03Merge branch 'nd/init-core-worktree-in-multi-worktree-world'Junio C Hamano
"git init" tried to record core.worktree in the repository's 'config' file when GIT_WORK_TREE environment variable was set and it was different from where GIT_DIR appears as ".git" at its top, but the logic was faulty when .git is a "gitdir:" file that points at the real place, causing trouble in working trees that are managed by "git worktree". This has been corrected. * nd/init-core-worktree-in-multi-worktree-world: init: kill git_link variable init: do not set unnecessary core.worktree init: kill set_git_dir_init() init: call set_git_dir_init() from within init_db() init: correct re-initialization from a linked worktree
2016-09-25init: kill git_link variableNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2016-09-25init: do not set unnecessary core.worktreeNguyễn Thái Ngọc Duy
The function needs_work_tree_config() that is called from create_default_files() is supposed to be fed the path to ".git" that looks as if it is at the top of the working tree, and decide if that location matches the actual worktree being used. This comparison allows "git init" to decide if core.worktree needs to be recorded in the working tree. In the current code, however, we feed the return value from get_git_dir(), which can be totally different from what the function expects when "gitdir" file is involved. Instead of giving the path to the ".git" at the top of the working tree, we end up feeding the actual path that the file points at. This original location of ".git" however is only known to init_db(). Make init_db() save it and have it passed to create_default_files() as a new parameter, which passes the correct location down to needs_work_tree_config() to fix this. Noticed-by: Max Nordlund <> Helped-by: Michael J Gruber <> Helped-by: Junio C Hamano <> Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2016-09-25init: kill set_git_dir_init()Nguyễn Thái Ngọc Duy
This is a pure code move, necessary to kill the global variable git_link later (and also helps a bit in the next patch). Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2016-09-25init: call set_git_dir_init() from within init_db()Nguyễn Thái Ngọc Duy
The next commit requires that set_git_dir_init() must be called before init_db(). Let's make sure nobody can do otherwise. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2016-09-25init: correct re-initialization from a linked worktreeNguyễn Thái Ngọc Duy
When 'git init' is called from a linked worktree, we treat '.git' dir (which is $GIT_COMMON_DIR/worktrees/something) as the main '.git' (i.e. $GIT_COMMON_DIR) and populate the whole repository skeleton in there. It does not harm anything (*) but it is still wrong. Since 'git init' calls set_git_dir() at preparation time, which indirectly calls get_common_dir() and correctly detects multiple worktree setup, all git_path_buf() calls in create_default_files() will return correct paths in both single and multiple worktree setups. The only thing left is copy_templates(), which targets $GIT_DIR, not $GIT_COMMON_DIR. Fix that with get_git_common_dir(). This function will return $GIT_DIR in single-worktree setup, so we don't have to make a special case for multiple-worktree here. (*) It does in fact, thanks to another bug. More on that later. Noticed-by: Max Nordlund <> Helped-by: Michael J Gruber <> Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2016-09-21Merge branch 'jk/setup-sequence-update'Junio C Hamano
There were numerous corner cases in which the configuration files are read and used or not read at all depending on the directory a Git command was run, leading to inconsistent behaviour. The code to set-up repository access at the beginning of a Git process has been updated to fix them. * jk/setup-sequence-update: t1007: factor out repeated setup init: reset cached config when entering new repo init: expand comments explaining config trickery config: only read .git/config from configured repos test-config: setup git directory t1302: use "git -C" pager: handle early config pager: use callbacks instead of configset pager: make pager_program a file-local static pager: stop loading git_default_config() pager: remove obsolete comment diff: always try to set up the repository diff: handle --no-index prefixes consistently diff: skip implicit no-index check when given --no-index patch-id: use RUN_SETUP_GENTLY hash-object: always try to set up the git repository
2016-09-19Merge branch 'mh/ref-store'Junio C Hamano
The ref-store abstraction was introduced to the refs API so that we can plug in different backends to store references. * mh/ref-store: (38 commits) refs: implement iteration over only per-worktree refs refs: make lock generic refs: add method to rename refs refs: add methods to init refs db refs: make delete_refs() virtual refs: add method for initial ref transaction commit refs: add methods for reflog refs: add method iterator_begin files_ref_iterator_begin(): take a ref_store argument split_symref_update(): add a files_ref_store argument lock_ref_sha1_basic(): add a files_ref_store argument lock_ref_for_update(): add a files_ref_store argument commit_ref_update(): add a files_ref_store argument lock_raw_ref(): add a files_ref_store argument repack_without_refs(): add a files_ref_store argument refs: make peel_ref() virtual refs: make create_symref() virtual refs: make pack_refs() virtual refs: make verify_refname_available() virtual refs: make read_raw_ref() virtual ...
2016-09-13init: reset cached config when entering new repoJeff King
After we copy the templates into place, we re-read the config in case we copied in a default config file. But since git_config() is backed by a cache these days, it's possible that the call will not actually touch the filesystem at all; we need to tell it that something has changed behind the scenes. Note that we also need to reset the shared_repository config. At first glance, it seems like this should probably just be folded into git_config_clear(). But unfortunately that is not quite right. The shared repository value may come from config, _or_ it may have been set manually. So only the caller who knows whether or not they set it is the one who can clear it (and indeed, if you _do_ put it into git_config_clear(), then many tests fail, as we have to clear the config cache any time we set a new config variable). There are three tests here. The first two actually pass already, though it's largely luck: they just don't happen to actually read any config before we enter the new repo. But the third one does fail without this patch; we look at core.sharedrepository while creating the directory, but need to make sure the value from the template config overrides it. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>