path: root/config.c
Age | Commit message | Author
8 days | Merge branch 'ps/config-global-override' | Junio C Hamano

Replace the GIT_CONFIG_NOSYSTEM mechanism, which declines to read the system-wide configuration file, with GIT_CONFIG_SYSTEM, which lets users specify the file from which to read the system-wide configuration (setting it to an empty file would essentially be the same as setting NOSYSTEM), and introduce GIT_CONFIG_GLOBAL to override the per-user configuration in $HOME/.gitconfig.

* ps/config-global-override:
  t1300: fix unset of GIT_CONFIG_NOSYSTEM leaking into subsequent tests
  config: allow overriding of global and system configuration
  config: unify code paths to get global config paths
  config: rename `git_etc_config()`
2021-04-19 | config: allow overriding of global and system configuration | Patrick Steinhardt

In order to have git run in a fully controlled environment without any misconfiguration, it may be desirable for users or scripts to override global- and system-level configuration files. We already have a way of doing this, which is to unset both HOME and XDG_CONFIG_HOME environment variables and to set `GIT_CONFIG_NOSYSTEM=true`. This is quite kludgy, and unsetting the first two variables likely has an impact on other executables spawned by such a script.

The obvious way to fix this would be to introduce `GIT_CONFIG_NOGLOBAL` as an equivalent to `GIT_CONFIG_NOSYSTEM`. But in the past, it has turned out that this design is inflexible: we cannot test system-level parsing of the git configuration in our test harness because there is no way to change its location, so all tests run with `GIT_CONFIG_NOSYSTEM` set.

Instead of making the same mistake with `GIT_CONFIG_NOGLOBAL`, introduce two new variables `GIT_CONFIG_GLOBAL` and `GIT_CONFIG_SYSTEM`:

  - If unset, git continues to use the usual locations.

  - If set to a specific path, we skip reading the normal configuration files and instead take the path. By setting the path to `/dev/null`, no configuration will be loaded for the respective level.

This implements the use case where we want to execute code in a sanitized environment without any potential misconfigurations via `/dev/null`, but is more flexible and allows for more use cases than simply adding `GIT_CONFIG_NOGLOBAL`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
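The override rule in this commit (use the variable's value if it is set, otherwise fall back to the usual location) can be sketched in a few lines of C. This is only an illustration: `pick_config_path()` is a hypothetical helper, not a function from config.c, and the default path passed in is whatever the caller considers the usual location.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not git's actual code): pick the config file for
 * one level. If the override variable is set, its value replaces the
 * usual location; otherwise the default path is returned unchanged.
 */
const char *pick_config_path(const char *override_var, const char *default_path)
{
	const char *override = getenv(override_var);

	return override ? override : default_path;
}
```

With `GIT_CONFIG_GLOBAL=/dev/null` in the environment, `pick_config_path("GIT_CONFIG_GLOBAL", "/home/u/.gitconfig")` would return `/dev/null`, which matches the "load nothing for this level" use case described in the message.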
2021-04-19 | config: unify code paths to get global config paths | Patrick Steinhardt

There are two call sites that assemble global config paths: one in the config loading code and one in the git-config(1) builtin. We're about to implement a way to override global config paths via an environment variable, which would require us to adjust both sites.

Unify both code paths into a single `git_global_config()` function which returns both paths for `~/.gitconfig` and the XDG config file. This will make the subsequent patch which introduces the new envvar easier to implement.

No functional changes are expected from this patch.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 | config: rename `git_etc_config()` | Patrick Steinhardt
The `git_etc_gitconfig()` function retrieves the system-level path of the configuration file. We're about to introduce a way to override it via an environment variable, at which point the name of this function would start to become misleading. Rename the function to `git_system_config()` as a preparatory step. While at it, the function is also refactored to pass memory ownership to the caller. This is done to better match semantics of `git_global_config()`, which is going to be introduced in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08 | config.c: remove last remnant of GIT_TEST_GETTEXT_POISON | Ævar Arnfjörð Bjarmason
Remove a use of GIT_TEST_GETTEXT_POISON added in f276e2a4694 (config: improve error message for boolean config, 2021-02-11). This was simultaneously in-flight with my d162b25f956 (tests: remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20) which removed the rest of the GIT_TEST_GETTEXT_POISON code. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-14 | use CALLOC_ARRAY | René Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 | Merge branch 'ak/config-bad-bool-error' | Junio C Hamano

The error message given when a configuration variable that is expected to have a boolean value is set to a non-boolean value has been improved.

* ak/config-bad-bool-error:
  config: improve error message for boolean config
2021-02-11 | config: improve error message for boolean config | Andrew Klotz

Currently invalid boolean config values return messages about 'bad numeric', which is slightly misleading when the error was due to a boolean value. We can improve the developer experience by returning a boolean error message when we know the value is neither boolean text nor an int.

Before, with an invalid boolean value of `non-boolean`, it's unclear what "numeric" is referring to:

    fatal: bad numeric config value 'non-boolean' for 'commit.gpgsign': invalid unit

Now the error message mentions that `non-boolean` is a bad boolean value:

    fatal: bad boolean config value 'non-boolean' for 'commit.gpgsign'

Signed-off-by: Andrew Klotz <agc.klotz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
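The idea of "try boolean text, then an integer, and only then complain about a bad boolean" might look roughly like the sketch below. The function names are hypothetical, the accepted spellings are an assumption of this sketch, and git's real parser handles more cases (unit suffixes, for instance) than shown here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>

/* Sketch: 1 for true, 0 for false, -1 if `value` is not boolean text. */
int toy_parse_bool_text(const char *value)
{
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on"))
		return 1;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off"))
		return 0;
	return -1;
}

/*
 * Try boolean text first, then a plain integer (non-zero means true).
 * If both fail, write a boolean-specific message into `err` and return -1.
 */
int toy_config_bool(const char *name, const char *value,
		    char *err, size_t err_size)
{
	char *end;
	long n;
	int b = toy_parse_bool_text(value);

	if (b >= 0)
		return b;
	n = strtol(value, &end, 10);
	if (end != value && !*end)
		return n != 0;
	snprintf(err, err_size, "bad boolean config value '%s' for '%s'",
		 value, name);
	return -1;
}
```

The point of the ordering is that the numeric parser is never the one to report the failure, so its "invalid unit" wording does not reach the user.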
2021-02-10 | Merge branch 'ab/detox-gettext-tests' | Junio C Hamano

Get rid of "GETTEXT_POISON" support altogether, which may or may not be controversial.

* ab/detox-gettext-tests:
  tests: remove uses of GIT_TEST_GETTEXT_POISON=false
  tests: remove support for GIT_TEST_GETTEXT_POISON
  ci: remove GETTEXT_POISON jobs
2021-01-25 | Merge branch 'ps/config-env-pairs' | Junio C Hamano

Introduce two new ways to feed configuration variable-value pairs via environment variables, and tweak the way GIT_CONFIG_PARAMETERS encodes variable/value pairs to make it more robust.

* ps/config-env-pairs:
  config: allow specifying config entries via envvar pairs
  environment: make `getenv_safe()` a public function
  config: store "git -c" variables using more robust format
  config: parse more robust format in GIT_CONFIG_PARAMETERS
  config: extract function to parse config pairs
  quote: make sq_dequote_step() a public function
  config: add new way to pass config via `--config-env`
  git: add `--super-prefix` to usage string
2021-01-21 | tests: remove support for GIT_TEST_GETTEXT_POISON | Ævar Arnfjörð Bjarmason

This removes the ability to inject "poison" gettext() messages via the GIT_TEST_GETTEXT_POISON special test setup.

I initially added this as a compile-time option in bb946bba761 (i18n: add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and most recently modified it to be toggleable at runtime in 6cdccfce1e0 (i18n: make GETTEXT_POISON a runtime option, 2018-11-08).

The reason for its removal is that the trade-off of maintaining it vs. what it's getting us has long since flipped. When gettext was integrated in 5e9637c6297 (i18n: add infrastructure for translating Git with gettext, 2011-11-18) there was understandable concern on the Git ML that in marking messages for translation en masse we'd inadvertently mark plumbing messages. The GETTEXT_POISON facility was a way to smoke those out via our test suite.

Nowadays however we're done (or almost entirely done) with any marking of messages for translation. New messages are usually marked by their authors, who'll know whether it makes sense to translate them or not. If not, any errors in marking the messages are much more likely to be spotted in review than in the initial deluge of i18n patches in the 2011-2012 era.

So let's just remove this. This leaves the test suite in a state where we still have a lot of test_i18n, C_LOCALE_OUTPUT etc. uses. Subsequent commits will remove those too.

The change to t/lib-rebase.sh is a selective revert of the relevant part of f2d17068fd (i18n: rebase-interactive: mark comments of squash for translation, 2016-06-17), and the comment in t/t3406-rebase-message.sh is from c7108bf9ed (i18n: rebase: mark messages for translation, 2012-07-25).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 | Merge branch 'ew/decline-core-abbrev' | Junio C Hamano

The configuration variable 'core.abbrev' can be set to 'no' to force no abbreviation regardless of the hash algorithm.

* ew/decline-core-abbrev:
  core.abbrev=no disables abbreviations
2021-01-15 | config: allow specifying config entries via envvar pairs | Patrick Steinhardt
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable which can be used to pass runtime configuration data to git processes, it's an internal implementation detail and not supposed to be used by end users. Next to being for internal use only, this way of passing config entries has a major downside: the config keys need to be parsed as they contain both key and value in a single variable. As such, it is left to the user to escape any potentially harmful characters in the value, which is quite hard to do if values are controlled by a third party. This commit thus adds a new way of adding config entries via the environment which gets rid of this shortcoming. If the user passes the `GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each `i` in `[0,n)`. While the same can be achieved with `git -c <name>=<value>`, one may wish to not do so for potentially sensitive information. E.g. if one wants to set `http.extraHeader` to contain an authentication token, doing so via `-c` would trivially leak those credentials via e.g. ps(1), which typically also shows command arguments. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
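To make the `GIT_CONFIG_COUNT` scheme concrete, here is a minimal sketch of the loop over `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for `i` in `[0,n)`. The function and callback names are hypothetical, and the error handling (a missing key or value aborts the walk) is an assumption of this sketch rather than a description of git's behaviour.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*pair_fn)(const char *key, const char *value, void *data);

/* Example callback: print each pair and count it through `data`. */
void print_pair(const char *key, const char *value, void *data)
{
	printf("%s=%s\n", key, value);
	++*(int *)data;
}

/*
 * Sketch only: walk GIT_CONFIG_KEY_<i> / GIT_CONFIG_VALUE_<i> for i in
 * [0, n), where n comes from GIT_CONFIG_COUNT. Returns the number of
 * pairs handed to `fn`, or -1 if the count is malformed or a pair is
 * incomplete.
 */
int for_each_env_pair(pair_fn fn, void *data)
{
	const char *count = getenv("GIT_CONFIG_COUNT");
	char name[64];
	char *end;
	long n, i;

	if (!count || !*count)
		return 0;
	n = strtol(count, &end, 10);
	if (*end || n < 0 || n > INT_MAX)
		return -1;
	for (i = 0; i < n; i++) {
		const char *key, *value;

		snprintf(name, sizeof(name), "GIT_CONFIG_KEY_%ld", i);
		key = getenv(name);
		snprintf(name, sizeof(name), "GIT_CONFIG_VALUE_%ld", i);
		value = getenv(name);
		if (!key || !value)
			return -1;
		fn(key, value, data);
	}
	return (int)n;
}
```

Because key and value travel in separate variables, nothing in the value needs escaping, which is the shortcoming of the single-variable format that the message describes.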
2021-01-15 | config: store "git -c" variables using more robust format | Patrick Steinhardt

The previous commit added a new format for $GIT_CONFIG_PARAMETERS which is able to robustly handle subsections with "=" in them. Let's start writing the new format. Unfortunately, this does much less than you'd hope, because "git -c" itself has the same ambiguity problem! But it's still worth doing:

  - we've now pushed the problem from the inter-process communication into the "-c" command-line parser. This would free us up to later add an unambiguous format there (e.g., separate arguments like "git --config key value", etc).

  - for --config-env, the parser already disallows "=" in the environment variable name. So:

        git --config-env section.with=equals.key=ENVVAR

    will robustly set section.with=equals.key to the contents of $ENVVAR.

The new test shows the improvement for --config-env.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 | config: parse more robust format in GIT_CONFIG_PARAMETERS | Jeff King

When we stuff config options into GIT_CONFIG_PARAMETERS, we shell-quote each one as a single unit, like:

    'section.one=value1' 'section.two=value2'

On the reading side, we de-quote to get the individual strings, and then parse them by splitting on the first "=" we find. This format is ambiguous, because an "=" may appear in a subsection. So the config represented in a file by both:

    [section "subsection=with=equals"]
    key = value

and:

    [section]
    subsection = with=equals.key=value

ends up in this flattened format like:

    'section.subsection=with=equals.key=value'

and we can't tell which was desired. We have traditionally resolved this by taking the first "=" we see starting from the left, meaning that we allowed arbitrary content in the value, but not in the subsection.

Let's make our environment format a bit more robust by separately quoting the key and value. That turns those examples into:

    'section.subsection=with=equals.key'='value'

and:

    'section.subsection'='with=equals.key=value'

respectively, and we can tell the difference between them. We can detect which format is in use for any given element of the list based on the presence of the unquoted "=". That means we can continue to allow the old format to work to support any callers which manually used the old format, and we can even intermingle the two formats. The old format wasn't documented, and nobody was supposed to be using it. But it's likely that such callers exist in the wild, so it's nice if we can avoid breaking them. Likewise, it may be possible to trigger an older version of "git -c" that runs a script that calls into a newer version of "git -c"; that new version would see the intermingled format.

This does create one complication, which is that the obvious format in the new scheme for

    [section]
    some-bool

is:

    'section.some-bool'

with no equals. We'd mistake that for an old-style variable. And it even has the same meaning in the old style, but:

    [section "with=equals"]
    some-bool

does not. It would be:

    'section.with=equals=some-bool'

which we'd take to mean:

    [section]
    with = equals=some-bool

in the old, ambiguous style. Likewise, we can't use:

    'section.some-bool'=''

because that's ambiguous with an actual empty string. Instead, we'll again use the shell-quoting to give us a hint, and use:

    'section.some-bool'=

to show that we have no value.

Note that this commit just expands the reading side. We'll start writing the new format via "git -c" in a future patch. In the meantime, the existing "git -c" tests will make sure we didn't break reading the old format. But we'll also add some explicit coverage of the two formats to make sure we continue to handle the old one after we move the writing side over.

And one final note: since we're now using the shell-quoting as a semantically meaningful hint, this closes the door to us ever allowing arbitrary shell quoting, like:

    'a'shell'would'be'ok'with'this'.key=value

But we have never supported that (only what sq_quote() would produce), and we are probably better off keeping things simple, robust, and backwards-compatible, than trying to make it easier for humans. We'll continue not to advertise the format of the variable to users, and instead keep "git -c" as the recommended mechanism for setting config (even if we are trying to be kind not to break users who may be relying on the current undocumented format).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
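The detection rule from this message (an unquoted "=" right after the first quoted token means new style; otherwise split the de-quoted string on the first "=") can be sketched as follows. Both functions are hypothetical, and the de-quoting is deliberately simplified: it assumes tokens contain no escaped quotes, which sq_quote() output can in fact contain.

```c
#include <string.h>

/*
 * Copy the contents of a leading '...' token of `in` into `out` and
 * return a pointer just past the closing quote, or NULL on error.
 * Simplification: escaped quotes ('\'') are not handled.
 */
static const char *toy_dequote(const char *in, char *out, size_t out_size)
{
	const char *end;
	size_t len;

	if (*in != '\'')
		return NULL;
	end = strchr(in + 1, '\'');
	if (!end)
		return NULL;
	len = (size_t)(end - in - 1);
	if (len >= out_size)
		return NULL;
	memcpy(out, in + 1, len);
	out[len] = '\0';
	return end + 1;
}

/*
 * Parse one element into `key` and `value` (both of capacity `size`).
 * Sets *has_value to 0 for a valueless boolean.
 * Returns 0 on success, -1 on malformed input.
 */
int toy_parse_element(const char *elem, char *key, char *value,
		      size_t size, int *has_value)
{
	const char *rest = toy_dequote(elem, key, size);
	char *eq;

	if (!rest)
		return -1;
	if (*rest == '=') {
		/* New style: unquoted '=' follows the quoted key. */
		*has_value = rest[1] != '\0';
		if (!*has_value)
			return 0;
		rest = toy_dequote(rest + 1, value, size);
		return rest && !*rest ? 0 : -1;
	}
	if (*rest)
		return -1;
	/* Old style: split the de-quoted string on the first '='. */
	eq = strchr(key, '=');
	*has_value = eq != NULL;
	if (eq) {
		*eq = '\0';
		strcpy(value, eq + 1);
	}
	return 0;
}
```

Feeding it `'section.subsection=with=equals.key'='value'` keeps the "=" inside the key, while the old-style `'section.subsection=with=equals.key=value'` still splits at the first "=", so both formats can be intermingled as the message describes.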
2021-01-12 | config: extract function to parse config pairs | Patrick Steinhardt

The function `git_config_parse_parameter` is responsible for parsing a `foo.bar=baz`-formatted configuration key, sanitizing the key and then processing it via the given callback function. Given that we're about to add a second user, which will process entries whose keys and values are already separated, this commit extracts a function `config_parse_pair` which only does the sanitization and processing part as a preparatory step.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12 | config: add new way to pass config via `--config-env` | Patrick Steinhardt
While it's already possible to pass runtime configuration via `git -c <key>=<value>`, it may be undesirable to use when the value contains sensitive information. E.g. if one wants to set `http.extraHeader` to contain an authentication token, doing so via `-c` would trivially leak those credentials via e.g. ps(1), which typically also shows command arguments. To enable this usecase without leaking credentials, this commit introduces a new switch `--config-env=<key>=<envvar>`. Instead of directly passing a value for the given key, it instead allows the user to specify the name of an environment variable. The value of that variable will then be used as value of the key. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
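A rough sketch of how a `<key>=<envvar>` spec could be resolved is shown below. The helper name is hypothetical, and splitting on the last "=" is an assumption of this sketch; it relies on the fact that an environment variable name cannot contain "=", so everything before the last "=" can be treated as the key.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch, assuming the spec has the form "<key>=<envvar>". Splitting on
 * the LAST '=' keeps keys whose subsection contains '=' intact.
 * On success, copies the key into `key` and returns the variable's
 * value; returns NULL if the spec is malformed, the key does not fit,
 * or the variable is unset.
 */
const char *resolve_config_env(const char *spec, char *key, size_t key_size)
{
	const char *eq = strrchr(spec, '=');
	size_t key_len;

	if (!eq || eq == spec || !eq[1])
		return NULL;
	key_len = (size_t)(eq - spec);
	if (key_len >= key_size)
		return NULL;
	memcpy(key, spec, key_len);
	key[key_len] = '\0';
	return getenv(eq + 1);
}
```

The value itself never appears on the command line, only the variable's name, so a process listing shows nothing sensitive.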
2020-12-23 | core.abbrev=no disables abbreviations | Eric Wong
This allows users to write hash-agnostic scripts and configs by disabling abbreviations. Using "-c core.abbrev=40" will be insufficient with SHA-256, and "-c core.abbrev=64" won't work with SHA-1 repos today. Signed-off-by: Eric Wong <e@80x24.org> [jc: tweaked implementation, added doc and a test] Signed-off-by: Junio C Hamano <gitster@pobox.com>
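The hash-agnostic behaviour can be illustrated with a small sketch in which "no" maps to the full hex length of whichever hash is in use. The function name and the bounds applied to numeric values are assumptions of this sketch, not git's actual limits.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch: map a core.abbrev value to an abbreviation length. `hexsz` is
 * the full hex length of the hash in use (40 for SHA-1, 64 for SHA-256).
 * Only "no" is special-cased here; other values are read as numbers.
 * Returns -1 for values this sketch does not accept.
 */
int abbrev_len_from_config(const char *value, int hexsz)
{
	char *end;
	long n;

	if (!strcmp(value, "no"))
		return hexsz;
	n = strtol(value, &end, 10);
	if (end == value || *end || n < 1 || n > hexsz)
		return -1;
	return (int)n;
}
```

Here `abbrev_len_from_config("no", 40)` and `abbrev_len_from_config("no", 64)` both yield the full length, whereas a hard-coded "64" is rejected when `hexsz` is 40.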
2020-12-08 | Merge branch 'ds/config-literal-value' | Junio C Hamano

Various subcommands of "git config" that take value_regex learn the "--fixed-value" option to take the value_regex option as a literal string.

* ds/config-literal-value:
  config doc: value-pattern is not necessarily a regexp
  config: implement --fixed-value with --get*
  config: plumb --fixed-value into config API
  config: add --fixed-value option, un-implemented
  t1300: add test for --replace-all with value-pattern
  t1300: test "set all" mode with value-pattern
  config: replace 'value_regex' with 'value_pattern'
  config: convert multi_replace to flags
2020-11-25 | config: plumb --fixed-value into config API | Derrick Stolee
The git_config_set_multivar_in_file_gently() and related methods now take a 'flags' bitfield, so add a new bit representing the --fixed-value option from 'git config'. This alters the purpose of the value_pattern parameter to be an exact string match. This requires some initialization changes in git_config_set_multivar_in_file_gently() and a new strcmp() call in the matches() method. The new CONFIG_FLAGS_FIXED_VALUE flag is initialized in builtin/config.c based on the --fixed-value option, and that needs to be updated in several callers. This patch only affects some of the modes of 'git config', and the rest will be completed in the next change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
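As an illustration of a flags bitfield that switches value matching between an ERE and an exact string comparison, consider the sketch below. The flag names come from the commit messages, but their bit values and the `value_matches()` helper are assumptions made for this example.

```c
#include <regex.h>
#include <string.h>

/* Flag names follow the commit messages; the bit values are assumed. */
#define CONFIG_FLAGS_MULTI_REPLACE (1u << 0)
#define CONFIG_FLAGS_FIXED_VALUE   (1u << 1)

/*
 * Sketch of a value matcher: with CONFIG_FLAGS_FIXED_VALUE the pattern
 * must equal the value exactly; otherwise it is compiled as an ERE.
 * Returns 1 on match, 0 on no match, -1 if the regex fails to compile.
 */
int value_matches(const char *value, const char *value_pattern,
		  unsigned flags)
{
	regex_t re;
	int ret;

	if (flags & CONFIG_FLAGS_FIXED_VALUE)
		return !strcmp(value, value_pattern);
	if (regcomp(&re, value_pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	ret = !regexec(&re, value, 0, NULL, 0);
	regfree(&re);
	return ret;
}
```

With the pattern `a.c`, the value `abc` matches in regex mode but not in fixed-value mode, which is the difference the option is meant to expose.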
2020-11-25 | config: replace 'value_regex' with 'value_pattern' | Derrick Stolee
The 'value_regex' argument in the 'git config' builtin is poorly named, especially related to an upcoming change that allows exact string matches instead of ERE pattern matches. Perform a mostly mechanical change of every instance of 'value_regex' to 'value_pattern' in the codebase. This is only critical for documentation and error messages, but it is best to be consistent inside the codebase, too. For documentation, use 'value-pattern' which is better punctuation. This affects Documentation/git-config.txt and the usage in builtin/config.c, which was already mixed between 'value_regex' and 'value-regex'. I gave some thought to leaving the value_regex variables inside config.c that are regex_t pointers. However, it is probably best to keep the name consistent with the rest of the variables. This does not update the translations inside the po/ directory, as that creates conflicts with ongoing work. The input strings should automatically update through automation, and a few of the output strings currently use "[value_regex]" directly. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25 | config: convert multi_replace to flags | Derrick Stolee
We will extend the flexibility of the config API. Before doing so, let's take an existing 'int multi_replace' parameter and replace it with a new 'unsigned flags' parameter that can take multiple options as a bit field. Update all callers that specified multi_replace to now specify the CONFIG_FLAGS_MULTI_REPLACE flag. To add more clarity, extend the documentation of git_config_set_multivar_in_file() including a clear labeling of its arguments. Other config API methods in config.h require only a change of the final parameter from 'int' to 'unsigned'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 | hashmap: provide deallocation function names | Elijah Newren

hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 | Merge branch 'jk/leakfix' | Junio C Hamano

Code clean-up.

* jk/leakfix:
  submodule--helper: fix leak of core.worktree value
  config: fix leak in git_config_get_expiry_in_days()
  config: drop git_config_get_string_const()
  config: fix leaks from git_config_get_string_const()
  checkout: fix leak of non-existent branch names
  submodule--helper: use strbuf_release() to free strbufs
  clear_pattern_list(): clear embedded hashmaps
2020-08-17 | config: fix leak in git_config_get_expiry_in_days() | Jeff King
We use git_config_get_string() to retrieve the expiry value in a newly allocated string. But after parsing it, we never free it, leaking the memory. We could fix this with a free() obviously, but there's an even better solution: we can use the non-allocating "tmp" variant of the function; we only need it to be valid for the lifetime of our parse function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 | config: drop git_config_get_string_const() | Jeff King

As evidenced by the leak fixes in the previous commit, the "const" in git_config_get_string_const() clearly misleads people into thinking that it does not allocate a copy of the string. We can fix this by renaming it, but it's easier still to just drop it. Of the four remaining callers:

  - The one in git_config_parse_expiry() still needs to allocate, since that's what its callers expect. We can just use the non-const version and cast our pointer. Slightly ugly, but the damage is contained in one spot.

  - The two in apply are writing to global "const char *" variables, and need to continue allocating. We often mark these as const because we assign default string literals to them. But in this case we don't do that, so we can just declare them as real "char *" pointers and use the non-const version.

  - The call in checkout doesn't actually need a copy; it can just use the non-allocating "tmp" version of the function.

The function is also mentioned in the MyFirstContribution document. We can swap that call out for the non-allocating "tmp" variant, which fits well in the example given.

We'll drop the "configset" and "repo" variants, as well (which are unused).

Note that this frees up the "const" name, so we could rename the "tmp" variant back to that. But let's give some time for topics in flight to adapt to the new code before doing so (if we do it too soon, the function semantics will change but the compiler won't alert us).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14 | config: fix leaks from git_config_get_string_const() | Jeff King

There are two functions to get a single config string:

  - git_config_get_string()

  - git_config_get_string_const()

One might naively think that the first one allocates a new string and the second one just points us to the internal configset storage. But in fact they both allocate a new copy; the second one exists only to avoid having to cast when using it with a const global which we never intend to free.

The documentation for the function explains that clearly, but it seems I'm not alone in being surprised by this. Of 17 calls to the function, 13 of them leak the resulting value.

We could obviously fix these by adding the appropriate free(). But it would be simpler still if we actually had a non-allocating way to get the string. There's git_config_get_value() but that doesn't quite do what we want. If the config key is present but is a boolean with no value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the string versions will print an error and die).

So let's introduce a new variant, git_config_get_string_tmp(), that behaves as these callers expect. We need a new name because we have new semantics but the same function signature (so even if we converted the four remaining callers, topics in flight might be surprised). The "tmp" is because this value should only be held onto for a short time. In practice it's rare for us to clear and refresh the configset, invalidating the pointer, but hopefully the "tmp" makes callers think about the lifetime. In each of the converted cases here the value only needs to last within the local function or its immediate caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
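The ownership difference between an allocating getter and a "tmp" getter can be shown with a toy model. Everything below is hypothetical (a one-entry `struct toy_configset` and two helper functions); it only mirrors the lifetime rules described in the message and is not git's configset API.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a config set holding a single key; not git's types. */
struct toy_configset {
	const char *key;
	char *value;	/* owned by the set */
};

/* Allocating getter: the caller owns *dest and must free() it. */
int toy_get_string(const struct toy_configset *cs, const char *key,
		   char **dest)
{
	if (strcmp(cs->key, key))
		return 1;
	*dest = malloc(strlen(cs->value) + 1);
	if (!*dest)
		return -1;
	strcpy(*dest, cs->value);
	return 0;
}

/* Non-allocating getter: *dest borrows the set's storage; do not free. */
int toy_get_string_tmp(const struct toy_configset *cs, const char *key,
		       const char **dest)
{
	if (strcmp(cs->key, key))
		return 1;
	*dest = cs->value;
	return 0;
}
```

A caller that only inspects the value inside one function can use the borrowing variant and has nothing to free; forgetting the free() after the allocating variant is exactly the kind of leak the commit fixes.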
2020-08-04 | config: work around gcc-10 -Wstringop-overflow warning | Jeff King

Compiling with gcc-10, -O2, and -fsanitize=undefined results in a compiler warning:

    config.c: In function ‘git_config_copy_or_rename_section_in_file’:
    config.c:3170:17: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
     3170 |       output[0] = '\t';
          |       ~~~~~~~~~~^~~~~~
    config.c:3076:7: note: at offset -1 to object ‘buf’ with size 1024 declared here
     3076 |  char buf[1024];
          |       ^~~

This is a false positive. The interesting lines of code are:

    int i;
    char *output = buf;
    ...
    for (i = 0; buf[i] && isspace(buf[i]); i++)
            ; /* do nothing */
    ...
    int offset;
    offset = section_name_match(&buf[i], old_name);
    if (offset > 0) {
            ...
            output += offset + i;
            if (strlen(output) > 0) {
                    /*
                     * More content means there's
                     * a declaration to put on the
                     * next line; indent with a
                     * tab
                     */
                    output -= 1;
                    output[0] = '\t';
            }
    }

So we do assign output to buf initially. Later we increment it based on "offset" and "i" and then subtract "1" from it. That latter step is what the compiler is complaining about; it could lead to going off the left side of the array if "output == buf" at the moment of the subtraction. For that to be the case, then "offset + i" would have to be 0. But that can't happen:

  - we know that "offset" is at least 1, since we're in a conditional block that checks that

  - we know that "i" is not negative, since it started at 0 and only incremented over whitespace

So the sum must be at least 1, and therefore it's OK to subtract one from "output".

But that's not quite the whole story. Since "i" is an int, it could in theory be possible to overflow to negative (when counting whitespace on a very large string). But we know that's impossible because we're counting the 1024-byte buffer we just fed to fgets(), so it can never be larger than that.

Switching the type of "i" to "unsigned" makes the warning go away, so let's do that.

Arguably size_t is an even better type (for this and for the other length fields), but switching to it produces a similar but distinct warning:

    config.c: In function ‘git_config_copy_or_rename_section_in_file’:
    config.c:3170:13: error: array subscript -1 is outside array bounds of ‘char[1024]’ [-Werror=array-bounds]
     3170 |       output[0] = '\t';
          |       ~~~~~~^~~
    config.c:3076:7: note: while referencing ‘buf’
     3076 |  char buf[1024];
          |       ^~~

If we were to ever switch off of fgets() to strbuf_getline() or similar, we'd probably need to use size_t to avoid other overflow problems. But for now we know we're safe because of the small fixed size of our buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 | config: reject parsing of files over INT_MAX | Jeff King

While the last few commits have made it possible for the config parser to handle config files up to the limits of size_t, the rest of the code isn't really ready for this. In particular, we often feed the keys as strings into printf "%s" format specifiers. And because the printf family of functions must return an int to specify the result, they complain. Here are two concrete examples (using glibc; we're in uncharted territory here so results may vary):

Generate a gigantic .gitmodules file like this:

    git submodule add /some/other/repo foo
    {
            printf '[submodule "'
            perl -e 'print "a" x 2**31'
            echo '"]path = foo'
    } >.gitmodules
    git commit -m 'huge gitmodule'

then try this:

    $ git show
    BUG: strbuf.c:397: your vsnprintf is broken (returned -1)

The problem is that we end up calling:

    strbuf_addf(&sb, "submodule.%s.ignore", submodule_name);

which relies on vsnprintf(), and that function has no way to report back a size larger than INT_MAX.

Taking that same file, try this:

    git config --file=.gitmodules --list --name-only

On my system it produces an output with exactly 4GB of spaces. I confirmed in a debugger that we reach the config callback with the key intact: it's 2147483663 bytes and full of a's. But when we print it with this call:

    printf("%s%c", key_, term);

we just get the spaces.

So given the fact that these are insane cases which we have no need to support, the weird behavior from feeding the results to printf even if the code is careful, and the possibility of uncareful code introducing its own integer truncation issues, let's just declare INT_MAX as a limit for parsing config files.

We'll enforce the limit in get_next_char(), which generalizes over all sources (blobs, files, etc) and covers any element we're parsing (whether section, key, value, etc). For simplicity, the limit is over the length of the _whole_ file, so you couldn't have two 1GB values in the same file. This should be perfectly fine, as the expected size for config files is generally kilobytes at most.

With this patch both cases above will yield:

    fatal: bad config line 1 in file .gitmodules

That's not an amazing error message, but the parser isn't set up to provide specific messages (it just breaks out of the parsing loop and gives that generic error even if we see a syntactic issue). And we really wouldn't expect to see this case outside of somebody maliciously probing the limits of the config system.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
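Enforcing a whole-file size limit at the single point where characters are read might look like the sketch below. The `struct toy_source` type and `toy_next_char()` are hypothetical stand-ins; the `limit` field plays the role of INT_MAX so that the behaviour can be exercised with a tiny input.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy in-memory source; `limit` stands in for INT_MAX so it is testable. */
struct toy_source {
	const char *buf;
	size_t len, pos;
	size_t total_len;	/* bytes consumed so far, across the whole input */
	size_t limit;
};

/*
 * Return the next byte, or EOF at end of input. Once more than `limit`
 * bytes have been requested, report EOF as well and set *too_big so the
 * caller can fail the parse.
 */
int toy_next_char(struct toy_source *src, int *too_big)
{
	if (src->pos >= src->len)
		return EOF;
	if (++src->total_len > src->limit) {
		*too_big = 1;
		return EOF;
	}
	return (unsigned char)src->buf[src->pos++];
}
```

Because every section, key, and value passes through the same reader, one counter covers all of them, at the cost of limiting the file as a whole rather than each element.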
2020-04-10 | config: use size_t to store parsed variable baselen | Jeff King

Most of the config parsing infrastructure is limited in what it can parse only by the size of memory, because it parses character by character, building up strbufs for keys, values, etc.

One exception is the "baselen" value we keep in git_parse_source(), which is an int. That stores the length of the section.subsection base, to which we can then append individual key names (by truncating back to the baselen with strbuf_setlen(), and then appending characters for the key name).

But because it's an int, if we see an absurdly long section or subsection, we may overflow the integer, wrapping negative. That negative value is then implicitly cast to a size_t when we pass it to strbuf_setlen(), creating a very large value and triggering a BUG. For example:

    $ {
            printf '[foo "'
            perl -e 'print "a" x 2**31'
            echo '"]bar = value'
      } >huge
    $ git config --file=huge --list
    fatal: BUG: strbuf_setlen() beyond buffer

While this is obviously a silly case that we don't care about supporting, it's worth fixing it by switching to a size_t for a few reasons:

  - we should try to avoid hitting BUG assertions at all

  - avoiding integer truncation or overflow sets a good example and makes it easier to audit the code for more important issues

  - the BUG outcome is what happens in _this_ instance, because we wrap negative. If we used a 2**32 subsection, we'd wrap to a small positive value and actually generate wrong output (the subsection of our key would be truncated).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 | git_config_parse_key(): return baselen as size_t | Jeff King
As with the recent change to parse_config_key(), the best type to return a string length is a size_t, as it won't cause integer truncation for a gigantic key. And as with that change, this is mostly a clarity / hygiene issue for now, as our config parser would choke on such a large key anyway. There are a few ripple effects within the config code, as callers switch to using size_t. I also adjusted a few related variables that iterate over strings. The most unexpected change is that a call to strbuf_addf() had to switch to strbuf_add(). We can't use a size_t with "%.*s", because printf precisions must have type "int" (we could cast, of course, but that would miss the point of using size_t in the first place). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
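The "%.*s" constraint mentioned above can be sketched in plain C. The helpers below are hypothetical stand-ins rather than the strbuf API: one appends by explicit size_t length (the role strbuf_add() plays in the message), the other goes through a printf-style precision and therefore has to narrow the length to int.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append the first len bytes of src to the
 * NUL-terminated contents of dst (dst_len bytes used, dst_size bytes
 * available). The length stays a size_t all the way through.
 */
static size_t append_bytes(char *dst, size_t dst_size, size_t dst_len,
			   const char *src, size_t len)
{
	assert(dst_len + len < dst_size);
	memcpy(dst + dst_len, src, len);
	dst[dst_len + len] = '\0';
	return dst_len + len;
}

/*
 * Hypothetical helper: the same prefix via "%.*s". The precision
 * argument must be an int, so the size_t has to be range-checked and
 * cast, which is the narrowing the commit wanted to avoid.
 */
static int format_prefix(char *dst, size_t dst_size,
			 const char *src, size_t len)
{
	assert(len <= (size_t)INT_MAX);
	return snprintf(dst, dst_size, "%.*s", (int)len, src);
}
```

For short strings both helpers produce the same bytes; they differ only in whether the length ever has to fit in an int.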
2020-04-10  config: drop useless length variable in write_pair()  Jeff King
We compute the length of a subset of a string, but then use that length only to feed a "%.*s" printf placeholder for the same string. We can just use "%s" to achieve the same thing. The variable became useless in cb891a5989 (Use a strbuf for building up section header and key/value pair strings., 2007-12-14), which swapped out a write() which _did_ use the length for a strbuf_addf() call. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10  parse_config_key(): return subsection len as size_t  Jeff King
We return the length to a subset of a string using an "int *" out-parameter. This is fine most of the time, as we'd expect config keys to be relatively short, but it could behave oddly if we had a gigantic config key. A more appropriate type is size_t. Let's switch over, which lets our callers use size_t as appropriate (they are bound by our type because they must pass the out-parameter as a pointer). This is mostly just a cleanup to make it clear this code handles long strings correctly. In practice, our config parser already chokes on long key names (because of a similar int/size_t mixup!). When doing an int/size_t conversion, we have to be careful that nobody was trying to assign a negative value to the variable. I manually confirmed that for each case here. They tend to just feed the result to xmemdupz() or similar; in a few cases I adjusted the parameter types for helper functions to make sure the size_t is preserved. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-25  Merge branch 'bw/remote-rename-update-config'  Junio C Hamano
"git remote rename X Y" needs to adjust configuration variables (e.g. branch.<name>.remote) whose value used to be X to Y. branch.<name>.pushRemote is now also updated.

* bw/remote-rename-update-config:
  remote rename/remove: gently handle remote.pushDefault config
  config: provide access to the current line number
  remote rename/remove: handle branch.<name>.pushRemote config values
  remote: clean-up config callback
  remote: clean-up by returning early to avoid one indentation
  pull --rebase/remote rename: document and honor single-letter abbreviations rebase types
2020-02-17  Merge branch 'mr/show-config-scope'  Junio C Hamano
"git config" learned to show in which "scope", in addition to in which file, each config setting comes from.

* mr/show-config-scope:
  config: add '--show-scope' to print the scope of a config value
  submodule-config: add subomdule config scope
  config: teach git_config_source to remember its scope
  config: preserve scope in do_git_config_sequence
  config: clarify meaning of command line scoping
  config: split repo scope to local and worktree
  config: make scope_name non-static and rename it
  t1300: create custom config file without special characters
  t1300: fix over-indented HERE-DOCs
  config: fix typo in variable name
2020-02-10  config: provide access to the current line number  Bert Wesarg
Users are nowadays trained to see messages from CLI tools in the form <file>:<lno>: … Until now, code that wants to give such messages when notifying the user about settings in a config file could only obtain the file name (if the value originates from a file to begin with) via `current_config_name()`. Now it is also possible to query the current line number for the configuration. Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
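As a rough illustration of the message format described here, the sketch below builds a "<file>:<lno>: <message>" string in plain C. The struct and function names are hypothetical; in git itself the file name would come from `current_config_name()` and the line number from the accessor this commit adds, neither of which is called here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of where a config value was read from. */
struct demo_origin {
	const char *file; /* NULL when the value did not come from a file */
	int line;
};

/*
 * Format "<file>:<lno>: <msg>" when the origin is a file, and fall
 * back to the bare message otherwise. Returns what snprintf() returns.
 */
static int format_diagnostic(char *out, size_t out_size,
			     const struct demo_origin *origin,
			     const char *msg)
{
	if (origin->file)
		return snprintf(out, out_size, "%s:%d: %s",
				origin->file, origin->line, msg);
	return snprintf(out, out_size, "%s", msg);
}

/* Worked examples: one value from a file, one from elsewhere. */
static void format_diagnostic_examples(void)
{
	char buf[64];
	struct demo_origin from_file = { ".git/config", 12 };
	struct demo_origin no_file = { NULL, 0 };

	format_diagnostic(buf, sizeof(buf), &from_file, "unknown value");
	assert(strcmp(buf, ".git/config:12: unknown value") == 0);

	format_diagnostic(buf, sizeof(buf), &no_file, "unknown value");
	assert(strcmp(buf, "unknown value") == 0);
}
```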
2020-02-10  submodule-config: add subomdule config scope  Matthew Rogers
Before the changes to teach git_config_source to remember scope information, submodule-config.c never needed to consider the question of config scope. Even though zeroing out git_config_source is still correct and preserves the previous behavior of setting the scope to CONFIG_SCOPE_UNKNOWN, it's better to be explicit about such situations by explicitly setting the scope. As none of the current config_scope enumerations make sense, we create CONFIG_SCOPE_SUBMODULE to describe the situation. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10  config: teach git_config_source to remember its scope  Matthew Rogers
There are many situations where the scope of a config command is known beforehand, such as when '--local', '--file', etc. are passed to an invocation of git config. However, this information is lost when moving from builtin/config.c to config.c. This historically hasn't been a big deal, but to prepare for the upcoming --show-scope option we teach git_config_source to keep track of the source and the config machinery to use that information to set current_parsing_scope appropriately. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10  config: preserve scope in do_git_config_sequence  Matthew Rogers
do_git_config_sequence operated under the assumption that it was correct to set current_parsing_scope to CONFIG_SCOPE_UNKNOWN as part of the cleanup it does after it finishes execution. This is incorrect, as it blows away the current_parsing_scope if do_git_config_sequence is called recursively. As such situations are rare (git config running with the '--blob' option is one example), this has yet to cause a problem, but the upcoming '--show-scope' option will experience issues in that case. Let's teach do_git_config_sequence to preserve the current_parsing_scope from before it started execution. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
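The difference between resetting and restoring a piece of global state around a recursive call can be sketched as follows. This is a simplified model, not the code in config.c: the enum, the global, and the `preserve` switch are hypothetical and exist only to show both behaviors side by side.

```c
#include <assert.h>

/* Hypothetical scopes; git's real enum has different names and members. */
enum demo_scope {
	DEMO_SCOPE_UNKNOWN = 0,
	DEMO_SCOPE_GLOBAL,
	DEMO_SCOPE_LOCAL
};

/* Global "currently parsing" state, standing in for current_parsing_scope. */
static enum demo_scope demo_current_scope = DEMO_SCOPE_UNKNOWN;

/* What the outer reader saw right after its nested call returned. */
static enum demo_scope demo_scope_after_nested_call;

/*
 * Read one "sequence" of config at the given scope. With nest != 0 the
 * reader triggers one nested read, as a recursive call would. With
 * preserve != 0 the previous scope is restored on exit; otherwise the
 * scope is reset to UNKNOWN, which is the old, buggy cleanup.
 */
static void demo_read_sequence(enum demo_scope scope, int nest, int preserve)
{
	enum demo_scope prev = demo_current_scope; /* remember caller's scope */

	assert(scope != DEMO_SCOPE_UNKNOWN);
	demo_current_scope = scope;

	if (nest) {
		demo_read_sequence(DEMO_SCOPE_GLOBAL, 0, preserve);
		demo_scope_after_nested_call = demo_current_scope;
	}

	demo_current_scope = preserve ? prev : DEMO_SCOPE_UNKNOWN;
}
```

With `preserve` set, the outer reader still sees its own scope after the nested read returns; without it, the nested read's cleanup has already wiped the scope out.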
2020-02-10  config: clarify meaning of command line scoping  Matthew Rogers
CONFIG_SCOPE_CMDLINE is generally used in the code to refer to config values passed in via the -c option. Options passed in using this mechanism share similar scoping characteristics with the --file and --blob options of the 'config' command, namely that they are only in use for that single invocation of git, and that they supersede the normal system/global/local hierarchy. This patch introduces CONFIG_SCOPE_COMMAND to reflect this new idea, which also makes CONFIG_SCOPE_CMDLINE redundant. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10  config: split repo scope to local and worktree  Matthew Rogers
Previously when iterating through git config variables, worktree config and local config were both considered "CONFIG_SCOPE_REPO". This was never a problem before as no one had needed to differentiate between the two cases, but future functionality may care whether or not the config options come from a worktree or from the repository's actual local config file. For example, the planned feature to add a '--show-scope' to config to allow a user to see which scope listed config options come from would confuse users if it just printed 'repo' rather than 'local' or 'worktree' as the documentation would lead them to expect. The split has the additional benefit of making the implementation look more like how the documentation describes the interface. To accomplish this we split out what was previously considered repo scope to be local and worktree. The clients of 'current_config_scope()' who cared about CONFIG_SCOPE_REPO are also modified to similarly care about CONFIG_SCOPE_WORKTREE and CONFIG_SCOPE_LOCAL to preserve previous behavior. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10  config: make scope_name non-static and rename it  Matthew Rogers
To prepare for the upcoming --show-scope option, we require the ability to convert a config_scope enum to a string. As this was originally implemented as a static function 'scope_name()' in t/helper/test-config.c, we expose it via config.h and give it a less ambiguous name, 'config_scope_name()'. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
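An enum-to-string helper of the kind described here usually amounts to a switch statement. The sketch below is illustrative only: the `demo_` names and the exact set of members and strings are assumptions modeled on the scopes these commit messages mention, not a copy of git's `config_scope_name()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scope enum, loosely modeled on the scopes named above. */
enum demo_config_scope {
	DEMO_CONFIG_SCOPE_UNKNOWN = 0,
	DEMO_CONFIG_SCOPE_SYSTEM,
	DEMO_CONFIG_SCOPE_GLOBAL,
	DEMO_CONFIG_SCOPE_LOCAL,
	DEMO_CONFIG_SCOPE_WORKTREE,
	DEMO_CONFIG_SCOPE_COMMAND,
	DEMO_CONFIG_SCOPE_SUBMODULE
};

/* Map a scope to a printable name; unrecognized values map to "unknown". */
static const char *demo_config_scope_name(enum demo_config_scope scope)
{
	switch (scope) {
	case DEMO_CONFIG_SCOPE_SYSTEM:
		return "system";
	case DEMO_CONFIG_SCOPE_GLOBAL:
		return "global";
	case DEMO_CONFIG_SCOPE_LOCAL:
		return "local";
	case DEMO_CONFIG_SCOPE_WORKTREE:
		return "worktree";
	case DEMO_CONFIG_SCOPE_COMMAND:
		return "command";
	case DEMO_CONFIG_SCOPE_SUBMODULE:
		return "submodule";
	default:
		return "unknown";
	}
}

/* Worked examples: local and worktree are reported separately. */
static void demo_config_scope_name_examples(void)
{
	assert(strcmp(demo_config_scope_name(DEMO_CONFIG_SCOPE_LOCAL),
		      "local") == 0);
	assert(strcmp(demo_config_scope_name(DEMO_CONFIG_SCOPE_WORKTREE),
		      "worktree") == 0);
	assert(strcmp(demo_config_scope_name(DEMO_CONFIG_SCOPE_UNKNOWN),
		      "unknown") == 0);
}
```

Making such a helper non-static and declaring it in a shared header is what lets both a test helper and a command such as `git config --show-scope` reuse the same mapping.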
2020-02-10  strbuf: add and use strbuf_insertstr()  René Scharfe
Add a function for inserting a C string into a strbuf. Use it throughout the source to get rid of magic string length constants and explicit strlen() calls. Like strbuf_addstr(), implement it as an inline function to keep the implicit strlen() calls from causing runtime overhead. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
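The shape of such a wrapper can be sketched with a tiny fixed-size buffer standing in for strbuf. Everything named `tiny_` below is hypothetical; the point is only that the string variant is a thin inline wrapper that supplies strlen() so callers no longer pass a hand-counted length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for a strbuf; buf is NUL-terminated. */
struct tiny_buf {
	char buf[64];
	size_t len;
};

/* Insert len bytes of data at byte offset pos, shifting the tail right. */
static void tiny_insert(struct tiny_buf *sb, size_t pos,
			const void *data, size_t len)
{
	assert(pos <= sb->len);
	assert(sb->len + len < sizeof(sb->buf));
	/* Move the tail, including its terminating NUL, out of the way. */
	memmove(sb->buf + pos + len, sb->buf + pos, sb->len - pos + 1);
	memcpy(sb->buf + pos, data, len);
	sb->len += len;
}

/*
 * String variant: an inline wrapper, so a compiler can fold strlen()
 * of a string literal into a constant at the call site.
 */
static inline void tiny_insertstr(struct tiny_buf *sb, size_t pos,
				  const char *s)
{
	tiny_insert(sb, pos, s, strlen(s));
}
```

A caller that previously wrote the equivalent of `tiny_insert(&sb, 0, "git-", 4)` can write `tiny_insertstr(&sb, 0, "git-")` and drop the magic constant.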
2019-11-22  sparse-checkout: add 'cone' mode  Derrick Stolee
The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow. If there are 1,000 patterns and 1,000,000 entries, this time can be very significant. Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns. This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option. The config option does nothing right now, but will be expanded upon in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-15  Merge branch 'ew/hashmap'  Junio C Hamano
Code clean-up of the hashmap API, both users and implementation.

* ew/hashmap:
  hashmap_entry: remove first member requirement from docs
  hashmap: remove type arg from hashmap_{get,put,remove}_entry
  OFFSETOF_VAR macro to simplify hashmap iterators
  hashmap: introduce hashmap_free_entries
  hashmap: hashmap_{put,remove} return hashmap_entry *
  hashmap: use *_entry APIs for iteration
  hashmap_cmp_fn takes hashmap_entry params
  hashmap_get{,_from_hash} return "struct hashmap_entry *"
  hashmap: use *_entry APIs to wrap container_of
  hashmap_get_next returns "struct hashmap_entry *"
  introduce container_of macro
  hashmap_put takes "struct hashmap_entry *"
  hashmap_remove takes "const struct hashmap_entry *"
  hashmap_get takes "const struct hashmap_entry *"
  hashmap_add takes "struct hashmap_entry *"
  hashmap_get_next takes "const struct hashmap_entry *"
  hashmap_entry_init takes "struct hashmap_entry *"
  packfile: use hashmap_entry in delta_base_cache_entry
  coccicheck: detect hashmap_entry.hash assignment
  diff: use hashmap_entry_init on moved_entry.ent
2019-10-11  Merge branch 'bc/object-id-part17'  Junio C Hamano
Preparation for SHA-256 upgrade continues.

* bc/object-id-part17: (26 commits)
  midx: switch to using the_hash_algo
  builtin/show-index: replace sha1_to_hex
  rerere: replace sha1_to_hex
  builtin/receive-pack: replace sha1_to_hex
  builtin/index-pack: replace sha1_to_hex
  packfile: replace sha1_to_hex
  wt-status: convert struct wt_status to object_id
  cache: remove null_sha1
  builtin/worktree: switch null_sha1 to null_oid
  builtin/repack: write object IDs of the proper length
  pack-write: use hash_to_hex when writing checksums
  sequencer: convert to use the_hash_algo
  bisect: switch to using the_hash_algo
  sha1-lookup: switch hard-coded constants to the_hash_algo
  config: use the_hash_algo in abbrev comparison
  combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
  bundle: switch to use the_hash_algo
  connected: switch GIT_SHA1_HEXSZ to the_hash_algo
  show-index: switch hard-coded constants to the_hash_algo
  blame: remove needless comparison with GIT_SHA1_HEXSZ
  ...
2019-10-07  hashmap: remove type arg from hashmap_{get,put,remove}_entry  Eric Wong
Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07  OFFSETOF_VAR macro to simplify hashmap iterators  Eric Wong
While we cannot rely on a `__typeof__' operator being portable to use with `offsetof', we can calculate the pointer offset using an existing pointer and the address of a member using pointer arithmetic for compilers without `__typeof__'. This allows us to simplify usage of hashmap iterator macros by not having to specify a type when a pointer of that type is already given. In the future, list iterator macros (e.g. list_for_each_entry) may also be implemented using OFFSETOF_VAR to save hackers the trouble of using container_of/list_entry macros and without relying on non-portable `__typeof__'. v3: use `__typeof__' to avoid clang warnings Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
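The pointer-arithmetic fallback described above can be sketched as follows. The macro is named DEMO_OFFSETOF_VAR to make clear it is an illustration of the idea rather than git's definition, and the structs are hypothetical. It relies on converting pointers to uintptr_t, which behaves as expected on mainstream platforms but is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical embedded link, playing the role of a hashmap entry. */
struct demo_link {
	unsigned int hash;
};

/* Hypothetical container whose link is deliberately not the first member. */
struct demo_item {
	int id;
	struct demo_link link;
};

/*
 * Offset of `member` inside the object `ptr` points to, computed from
 * the variable itself instead of from a type name, so the caller does
 * not need __typeof__ or an explicit type argument.
 */
#define DEMO_OFFSETOF_VAR(ptr, member) \
	((uintptr_t)&(ptr)->member - (uintptr_t)(ptr))

/* Recover the containing item from a pointer to its embedded link. */
static struct demo_item *demo_item_from_link(struct demo_link *link,
					     size_t offset)
{
	assert(link != NULL);
	return (struct demo_item *)((char *)link - offset);
}
```

Because the offset comes from a pointer the caller already has, an iterator macro built this way only needs the loop variable and the member name.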
2019-10-07  hashmap: introduce hashmap_free_entries  Eric Wong
`hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07  hashmap: use *_entry APIs for iteration  Eric Wong
Inspired by list_for_each_entry in the Linux kernel. Once again, these are somewhat compromised usability-wise by compilers lacking __typeof__ support. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>