summaryrefslogtreecommitdiff
path: root/urlmatch.c
AgeCommit message (Collapse)Author
2023-09-29hex-ll: separate out non-hash-algo functionsCalvin Wan
In order to further reduce all-in-one headers, separate out functions in hex.h that do not operate on object hashes into its own file, hex-ll.h, and update the include directives in the .c files that need only such functions accordingly. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: add ctx arg to config_fn_tGlen Choo
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24treewide: replace cache.h with more direct headers, where possibleElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-08convert trivial uses of strncmp() to starts_with()Jeff King
It's more readable to use starts_with() instead of strncmp() to match a prefix, as the latter requires a manually-computed length, and has the funny "matching is zero" return value common to cmp functions. This patch converts several cases which were found with: git grep 'strncmp(.*, [0-9]*)' But note that it doesn't convert all such cases. There are several where the magic length number is repeated elsewhere in the code, like: /* handle "buf" which isn't NUL-terminated and might be too small */ if (len >= 3 && !strncmp(buf, "foo", 3)) or: /* exact match for "foo", but within a larger string */ if (end - buf == 3 && !strncmp(buf, "foo", 3)) While it would not produce the wrong outcome to use starts_with() in these cases, we'd still be left with one instance of "3". We're better to leave them for now, as the repeated "3" makes it clear that the two are linked (there may be other refactorings that handle both, but they're out of scope for this patch). A few things to note while reading the patch: - all cases but one are trying to match, and so lose the extra "!". The case in the first hunk of urlmatch.c is not-matching, and hence gains a "!". - the case in remote-fd.c is matching the beginning of "connect foo", but we never look at str+8 to parse the "foo" part (which would make this a candidate for skip_prefix(), not starts_with()). This seems at first glance like a bug, but is a limitation of how remote-fd works. - the second hunk in urlmatch.c shows some cases adjacent to other strncmp() calls that are left. These are of the "exact match within a larger string" type, as described above. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-04urlmatch.c: add and use a *_release() functionÆvar Arnfjörð Bjarmason
Plug a memory leak in credential_apply_config() by adding and using a new urlmatch_config_release() function. This just does a string_list_clear() on the "vars" member. This finished up work on normalizing the init/free pattern in this API, started in 73ee449bbf2 (urlmatch.[ch]: add and use URLMATCH_CONFIG_INIT, 2021-10-01). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-13urlmatch: add underscore to URL_HOST_CHARSJeff King
When parsing a URL to normalize it, we allow hostnames to contain only dot (".") or dash ("-"), plus brackets and colons for IPv6 literals. This matches the old URL standard in RFC 1738, which says: host = hostname | hostnumber hostname = *[ domainlabel "." ] toplabel domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit But this was later updated by RFC 3986, which is more liberal: host = IP-literal / IPv4address / reg-name reg-name = *( unreserved / pct-encoded / sub-delims ) unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" While names with underscore in them are not common and possibly violate some DNS rules, they do work in practice, and we will happily contact them over http://, git://, or ssh://. It seems odd to ignore them for purposes of URL matching, especially when the URL RFC seems to allow them. There shouldn't be any downside here. It's not a syntactically significant character in a URL, so we won't be confused about parsing; we'd have simply rejected such a URL previously (the test here checks the url code directly, but the obvious user-visible effect would be failing to match credential.http://foo_bar.example.com.helper, or similar config in http.<url>.*). Arguably we'd want to allow tilde ("~") here, too. There's likewise probably no downside, but I didn't add it simply because it seems like an even less likely character to appear in a hostname. Reported-by: Alex Waite <alex@waite.eu> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-24credential: handle `credential.<partial-URL>.<key>` againJohannes Schindelin
In the patches for CVE-2020-11008, the ability to specify credential settings in the config for partial URLs got lost. For example, it used to be possible to specify a credential helper for a specific protocol: [credential "https://"] helper = my-https-helper Likewise, it used to be possible to configure settings for a specific host, e.g.: [credential "dev.azure.com"] useHTTPPath = true Let's reinstate this behavior. While at it, increase the test coverage to document and verify the behavior with a couple other categories of partial URLs. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-20credential: allow wildcard patterns when matching configbrian m. carlson
In some cases, a user will want to use a specific credential helper for a wildcard pattern, such as https://*.corp.example.com. We have code that handles this already with the urlmatch code, so let's use that instead of our custom code. Since the urlmatch code is a superset of our current matching in terms of capabilities, there shouldn't be any cases of things that matched previously that don't match now. However, in addition to wildcard matching, we now use partial path matching, which can cause slightly different behavior in the case that a helper applies to the prefix (considering path components) of the remote URL. While different, this is probably the behavior people were wanting anyway. Since we're using the urlmatch code, we need to encode the components we've gotten into a URL to match, so add a function to percent-encode data and format the URL with it. We now also no longer need to the custom code to match URLs, so let's remove it. Additionally, the urlmatch code always looks for the best match, whereas we want all matches for credential helpers to preserve existing behavior. Let's add an optional field, select_fn, that lets us control which items we want (in this case, all of them) and default it to the best-match code that already exists for other users. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-09urlmatch: use hex2chr() in append_normalized_escapes()René Scharfe
Simplify the code by using hex2chr() to convert and check for invalid characters at the same time instead of doing that sequentially with one table lookup for each. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01urlmatch: allow globbing for the URL host partPatrick Steinhardt
The URL matching function computes for two URLs whether they match not. The match is performed by splitting up the URL into different parts and then doing an exact comparison with the to-be-matched URL. The main user of `urlmatch` is the configuration subsystem. It allows to set certain configurations based on the URL which is being connected to via keys like `http.<url>.*`. A common use case for this is to set proxies for only some remotes which match the given URL. Unfortunately, having exact matches for all parts of the URL can become quite tedious in some setups. Imagine for example a corporate network where there are dozens or even hundreds of subdomains, which would have to be configured individually. Allow users to write an asterisk '*' in place of any 'host' or 'subdomain' label as part of the host name. For example, "http.https://*.example.com.proxy" sets "http.proxy" for all direct subdomains of "https://example.com", e.g. "https://foo.example.com", but not "https://foo.bar.example.com". Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01urlmatch: include host in urlmatch rankingPatrick Steinhardt
In order to be able to rank positive matches by `urlmatch`, we inspect the path length and user part to decide whether a match is better than another match. As all other parts are matched exactly between both URLs, this is the correct thing to do right now. In the future, though, we want to introduce wild cards for the domain part. When doing this, it does not make sense anymore to only compare the path lengths. Instead, we also want to compare the domain lengths to determine which of both URLs matches the host part more closely. Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-31urlmatch: split host and port fields in `struct url_info`Patrick Steinhardt
The `url_info` structure contains information about a normalized URL with the URL's components being represented by different fields. The host and port part though are to be accessed by the same `host` field, so that getting the host and/or port separately becomes more involved than really necessary. To make the port more readily accessible, split up the host and port fields. Namely, the `host_len` will not include the port length anymore and a new `port_off` field has been added which includes the offset to the port, if available. The only user of these fields is `url_normalize_1`. This change makes it easier later on to treat host and port differently when introducing globs for domains. Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-31urlmatch: enable normalization of URLs with globsPatrick Steinhardt
The `url_normalize` function is used to validate and normalize URLs. As such, it does not allow for some special characters to be part of the URLs that are to be normalized. As we want to allow using globs in some configuration keys making use of URLs, namely `http.<url>.<key>`, but still normalize them, we need to somehow enable some additional allowed characters. To do this without having to change all callers of `url_normalize`, where most do not actually want globbing at all, we split off another function `url_normalize_1`. This function accepts an additional parameter `allow_globs`, which is subsequently called by `url_normalize` with `allow_globs=0`. As of now, this function is not used with globbing enabled. A caller will be added in the following commit. Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-15urlmatch.c: make match_urls() staticJunio C Hamano
No external callers exist. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16isxdigit: cast input to unsigned charJeff King
Otherwise, callers must do so or risk triggering warnings -Wchar-subscript (and rightfully so; a signed char might cause us to use a bogus negative index into the hexval_table). While we are dropping the now-unnecessary casts from the caller in urlmatch.c, we can get rid of similar casts in actually parsing the hex by using the hexval() helper, which implicitly casts to unsigned (but note that we cannot implement isxdigit in terms of hexval(), as it also casts its return value to unsigned). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20refactor skip_prefix to return a booleanJeff King
The skip_prefix() function returns a pointer to the content past the prefix, or NULL if the prefix was not found. While this is nice and simple, in practice it makes it hard to use for two reasons: 1. When you want to conditionally skip or keep the string as-is, you have to introduce a temporary variable. For example: tmp = skip_prefix(buf, "foo"); if (tmp) buf = tmp; 2. It is verbose to check the outcome in a conditional, as you need extra parentheses to silence compiler warnings. For example: if ((cp = skip_prefix(buf, "foo")) /* do something with cp */ Both of these make it harder to use for long if-chains, and we tend to use starts_with() instead. However, the first line of "do something" is often to then skip forward in buf past the prefix, either using a magic constant or with an extra strlen(3) (which is generally computed at compile time, but means we are repeating ourselves). This patch refactors skip_prefix() to return a simple boolean, and to provide the pointer value as an out-parameter. If the prefix is not found, the out-parameter is untouched. This lets you write: if (skip_prefix(arg, "foo ", &arg)) do_foo(arg); else if (skip_prefix(arg, "bar ", &arg)) do_bar(arg); Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-12urlmatch.c: recompute pointer after append_normalized_escapesThomas Rast
When append_normalized_escapes is called, its internal strbuf_add* calls can cause the strbuf's buf to be reallocated changing the value of the buf pointer. Do not use the strbuf buf pointer from before any append_normalized_escapes calls afterwards. Instead recompute the needed pointer. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Kyle J. McKay <mackyle@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31config: add generic callback wrapper to parse section.<url>.keyJunio C Hamano
Existing configuration parsing functions (e.g. http_options() in http.c) know how to parse two-level configuration variable names. We would like to exploit them and parse something like this: [http] sslVerify = true [http "https://weak.example.com"] sslVerify = false and pretend as if http.sslVerify were set to false when talking to "https://weak.example.com/path". Introduce `urlmatch_config_entry()` wrapper that: - is called with the target URL (e.g. "https://weak.example.com/path"), and the two-level variable parser (e.g. `http_options`); - uses `url_normalize()` and `match_urls()` to see if configuration data matches the target URL; and - calls the traditional two-level configuration variable parser only for the configuration data whose <url> part matches the target URL (and if there are multiple matches, only do so if the current match is a better match than the ones previously seen). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31config: add helper to normalize and match URLsKyle J. McKay
Some http.* configuration variables need to take values customized for the URL we are talking to. We may want to set http.sslVerify to true in general but to false only for a certain site, for example, with a configuration file like this: [http] sslVerify = true [http "https://weak.example.com"] sslVerify = false and let the configuration machinery pick up the latter only when talking to "https://weak.example.com". The latter needs to kick in not only when the URL is exactly "https://weak.example.com", but also is anything that "match" it, e.g. https://weak.example.com/test https://me@weak.example.com/test The <url> in the configuration key consists of the following parts, and is considered a match to the URL we are attempting to access under certain conditions: . Scheme (e.g., `https` in `https://example.com/`). This field must match exactly between the config key and the URL. . Host/domain name (e.g., `example.com` in `https://example.com/`). This field must match exactly between the config key and the URL. . Port number (e.g., `8080` in `http://example.com:8080/`). This field must match exactly between the config key and the URL. Omitted port numbers are automatically converted to the correct default for the scheme before matching. . Path (e.g., `repo.git` in `https://example.com/repo.git`). The path field of the config key must match the path field of the URL either exactly or as a prefix of slash-delimited path elements. A config key with path `foo/` matches URL path `foo/bar`. A prefix can only match on a slash (`/`) boundary. Longer matches take precedence (so a config key with path `foo/bar` is a better match to URL path `foo/bar` than a config key with just path `foo/`). . User name (e.g., `me` in `https://me@example.com/repo.git`). If the config key has a user name, it must match the user name in the URL exactly. If the config key does not have a user name, that config key will match a URL with any user name (including none), but at a lower precedence than a config key with a user name. Longer matches take precedence over shorter matches. This step adds two helper functions `url_normalize()` and `match_urls()` to help implement the above semantics. The normalization rules are based on RFC 3986 and should result in any two equivalent urls being a match. Signed-off-by: Kyle J. McKay <mackyle@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>