path: root/builtin/fetch.c
AgeCommit message (Collapse)Author
14 hoursMerge branch 'js/run-command-close-packs'Junio C Hamano
The run-command API has been updated so that the callers can easily ask the file descriptors open for packfiles to be closed immediately before spawning commands that may trigger auto-gc. * js/run-command-close-packs: Close object store closer to spawning child processes run_auto_maintenance(): implicitly close the object store run-command: offer to close the object store before running run-command: prettify the `RUN_COMMAND_*` flags pull: release packs before fetching commit-graph: when closing the graph, also release the slab
14 hoursMerge branch 'ps/fetch-optim'Junio C Hamano
Optimize code that handles large number of refs in the "git fetch" code path. * ps/fetch-optim: fetch: avoid second connectivity check if we already have all objects fetch: merge fetching and consuming refs fetch: refactor fetch refs to be more extendable fetch-pack: optimize loading of refs via commit graph connected: refactor iterator to return next object ID directly fetch: avoid unpacking headers in object existence check fetch: speed up lookup of want refs via commit-graph
11 daysMerge branch 'ab/retire-advice-config'Junio C Hamano
Code clean up to migrate callers from older advice_config[] based API to newer advice_if_enabled() and advice_enabled() API. * ab/retire-advice-config: advice: move advice.graftFileDeprecated squashing to commit.[ch] advice: remove use of global advice_add_embedded_repo advice: remove read uses of most global `advice_` variables advice: add enum variants for missing advice variables
11 daysMerge branch 'ps/fetch-omit-formatting-under-quiet'Junio C Hamano
"git fetch --quiet" optimization to avoid useless computation of info that will never be displayed. * ps/fetch-omit-formatting-under-quiet: fetch: skip formatting updated refs with `--quiet`
12 daysrun_auto_maintenance(): implicitly close the object storeJohannes Schindelin
Before spawning the auto maintenance, we need to make sure that we release all open file handles to all the `.pack` files (and MIDX files and commit-graph files and...) so that the maintenance process has the freedom to delete those files. So far, we did this manually every time before calling `run_auto_maintenance()`. With the new `close_object_store` flag, we can do that implicitly in that function, which is more robust because future callers won't be able to forget to close the object store. Note: this changes behavior slightly, as we previously _always_ closed the object store, but now we only close the object store when actually running the auto maintenance. In practice, this should not matter (if anything, it might speed up operations where auto maintenance is disabled). Suggested-by: Junio C Hamano <> Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2021-09-01fetch: avoid second connectivity check if we already have all objectsPatrick Steinhardt
When fetching refs, we are doing two connectivity checks: - The first one is done such that we can skip fetching refs in the case where we already have all objects referenced by the updated set of refs. - The second one verifies that we have all objects after we have fetched objects. We always execute both connectivity checks, but this is wasteful in case the first connectivity check already notices that we have all objects locally available. Skip the second connectivity check in case we already had all objects available. This gives us a nice speedup when doing a mirror-fetch in a repository with about 2.3M refs where the fetching repo already has all objects: Benchmark #1: HEAD~: git-fetch Time (mean ± σ): 30.025 s ± 0.081 s [User: 27.070 s, System: 4.933 s] Range (min … max): 29.900 s … 30.111 s 5 runs Benchmark #2: HEAD: git-fetch Time (mean ± σ): 25.574 s ± 0.177 s [User: 22.855 s, System: 4.683 s] Range (min … max): 25.399 s … 25.765 s 5 runs Summary 'HEAD: git-fetch' ran 1.17 ± 0.01 times faster than 'HEAD~: git-fetch' Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-09-01fetch: merge fetching and consuming refsPatrick Steinhardt
The functions `fetch_refs()` and `consume_refs()` must always be called together such that we first obtain all missing objects and then update our local refs to match the remote refs. In a subsequent patch, we'll further require that `fetch_refs()` must always be called before `consume_refs()` such that it can correctly assert that we have all objects after the fetch given that we're about to move the connectivity check. Make this requirement explicit by merging both functions into a single `fetch_and_consume_refs()` function. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-09-01fetch: refactor fetch refs to be more extendablePatrick Steinhardt
Refactor `fetch_refs()` code to make it more extendable by explicitly handling error cases. The refactored code should behave the same. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-09-01connected: refactor iterator to return next object ID directlyPatrick Steinhardt
The object ID iterator used by the connectivity checks returns the next object ID via an out-parameter and then uses a return code to indicate whether an item was found. This is a bit roundabout: instead of a separate error code, we can just return the next object ID directly and use `NULL` pointers as indicator that the iterator got no items left. Furthermore, this avoids a copy of the object ID. Refactor the iterator and all its implementations to return object IDs directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs: Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s] Range (min … max): 29.934 s … 30.406 s 10 runs Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s] Range (min … max): 29.696 s … 29.996 s 10 runs Summary '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch' While this 1% speedup could be labelled as statistically insignificant, the speedup is consistent on my machine. Furthermore, this is an end to end test, so it is expected that the improvement in the connectivity check itself is more significant. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-09-01fetch: avoid unpacking headers in object existence checkPatrick Steinhardt
When updating local refs after the fetch has transferred all objects, we do an object existence test as a safety guard to avoid updating a ref to an object which we don't have. We do so via `oid_object_info()`: if it returns an error, then we know the object does not exist. One side effect of `oid_object_info()` is that it parses the object's type, and to do so it must unpack the object header. This is completely pointless: we don't care for the type, but only want to assert that the object exists. Refactor the code to use `repo_has_object_file()`, which both makes the code's intent clearer and is also faster because it does not unpack object headers. In a real-world repo with 2.3M refs, this results in a small speedup when doing a mirror-fetch: Benchmark #1: HEAD~: git-fetch Time (mean ± σ): 33.686 s ± 0.176 s [User: 30.119 s, System: 5.262 s] Range (min … max): 33.512 s … 33.944 s 5 runs Benchmark #2: HEAD: git-fetch Time (mean ± σ): 31.247 s ± 0.195 s [User: 28.135 s, System: 5.066 s] Range (min … max): 30.948 s … 31.472 s 5 runs Summary 'HEAD: git-fetch' ran 1.08 ± 0.01 times faster than 'HEAD~: git-fetch' Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-09-01fetch: speed up lookup of want refs via commit-graphPatrick Steinhardt
When updating our local refs based on the refs fetched from the remote, we need to iterate through all requested refs and load their respective commits such that we can determine whether they need to be appended to FETCH_HEAD or not. In cases where we're fetching from a remote with exceedingly many refs, resolving these refs can be quite expensive given that we repeatedly need to unpack object headers for each of the referenced objects. Speed this up by opportunistically trying to resolve object IDs via the commit graph. We only do so for any refs which are not in "refs/tags": more likely than not, these are going to be a commit anyway, and this lets us avoid having to unpack object headers completely in case the object is a commit that is part of the commit-graph. This significantly speeds up mirror-fetches in a real-world repository with 2.3M refs: Benchmark #1: HEAD~: git-fetch Time (mean ± σ): 56.482 s ± 0.384 s [User: 53.340 s, System: 5.365 s] Range (min … max): 56.050 s … 57.045 s 5 runs Benchmark #2: HEAD: git-fetch Time (mean ± σ): 33.727 s ± 0.170 s [User: 30.252 s, System: 5.194 s] Range (min … max): 33.452 s … 33.871 s 5 runs Summary 'HEAD: git-fetch' ran 1.67 ± 0.01 times faster than 'HEAD~: git-fetch' Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-08-30fetch: skip formatting updated refs with `--quiet`Patrick Steinhardt
When fetching, Git will by default print a list of all updated refs in a nicely formatted table. In order to come up with this table, Git needs to iterate refs twice: first to determine the maximum column width, and a second time to actually format these changed refs. While this table will not be printed in case the user passes `--quiet`, we still go out of our way and do all these steps. In fact, we even do more work compared to not passing `--quiet`: without the flag, we will skip all references in the column width computation which have not been updated, but if it is set we will now compute widths for all refs. Fix this issue by completely skipping both preparation of the format and formatting data for display in case the user passes `--quiet`, improving performance especially with many refs. The following benchmark shows a nice speedup for a quiet mirror-fetch in a repository with 2.3M refs: Benchmark #1: HEAD~: git-fetch Time (mean ± σ): 26.929 s ± 0.145 s [User: 24.194 s, System: 4.656 s] Range (min … max): 26.692 s … 27.068 s 5 runs Benchmark #2: HEAD: git-fetch Time (mean ± σ): 25.189 s ± 0.094 s [User: 22.556 s, System: 4.606 s] Range (min … max): 25.070 s … 25.314 s 5 runs Summary 'HEAD: git-fetch' ran 1.07 ± 0.01 times faster than 'HEAD~: git-fetch' While at it, this patch also fixes `adjust_refcol_width()` such that it skips unchanged refs in case the user passed `--quiet`, where verbosity will be negative. While this function won't be called anymore if so, this brings the comment in line with actual code. Furthermore, needless `verbosity >= 0` checks are now removed in `store_updated_refs()`: we never print to the `note` buffer anymore in case `verbosity < 0`, so we won't end up in that code block anyway. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-08-25advice: remove read uses of most global `advice_` variablesBen Boeckel
In c4a09cc9ccb (Merge branch 'hw/advise-ng', 2020-03-25), a new API for accessing advice variables was introduced and deprecated `advice_config` in favor of a new array, `advice_setting`. This patch ports all but two uses which read the status of the global `advice_` variables over to the new `advice_enabled` API. We'll deal with advice_add_embedded_repo and advice_graft_file_deprecated separately. Signed-off-by: Ben Boeckel <> Signed-off-by: Ævar Arnfjörð Bjarmason <> Signed-off-by: Junio C Hamano <>
2021-08-24Merge branch 'jt/push-negotiation-fixes'Junio C Hamano
Bugfix for common ancestor negotiation recently introduced in "git push" code path. * jt/push-negotiation-fixes: fetch: die on invalid --negotiation-tip hash send-pack: fix push nego. when remote has refs send-pack: fix push.negotiate with remote helper
2021-07-17Merge branch 'ab/fetch-negotiate-segv-fix'Junio C Hamano
Code recently added to support common ancestry negotiation during "git push" did not sanity check its arguments carefully enough. * ab/fetch-negotiate-segv-fix: fetch: fix segfault in --negotiate-only without --negotiation-tip=* fetch: document the --negotiate-only option send-pack.c: move "no refs in common" abort earlier
2021-07-15fetch: die on invalid --negotiation-tip hashJonathan Tan
If a full hexadecimal hash is given as a --negotiation-tip to "git fetch", and that hash does not correspond to an object, "git fetch" will segfault if --negotiate-only is given and will silently ignore that hash otherwise. Make these cases fatal errors, just like the case when an invalid ref name or abbreviated hash is given. While at it, mark the error messages as translatable. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2021-07-08fetch: fix segfault in --negotiate-only without --negotiation-tip=*Ævar Arnfjörð Bjarmason
The recent --negotiate-only option would segfault in the call to oid_array_for_each() in negotiate_using_fetch() unless one or more --negotiation-tip=* options were provided. All of the other tests for the feature combine both, but nothing was checking this assumption, let's do that and add a test for it. Fixes a bug in 9c1e657a8fd (fetch: teach independent negotiation (no packfile), 2021-05-04). Signed-off-by: Ævar Arnfjörð Bjarmason <> Signed-off-by: Junio C Hamano <>
2021-05-20fetch: improve grammar of "shallow roots" messageAlex Henrie
Signed-off-by: Alex Henrie <> Signed-off-by: Junio C Hamano <>
2021-05-16Merge branch 'jt/push-negotiation'Junio C Hamano
"git push" learns to discover common ancestor with the receiving end over protocol v2. * jt/push-negotiation: send-pack: support push negotiation fetch: teach independent negotiation (no packfile) fetch-pack: refactor command and capability write fetch-pack: refactor add_haves() fetch-pack: refactor process_acks()
2021-05-05fetch: teach independent negotiation (no packfile)Jonathan Tan
Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2021-04-16fetch: add --prefetch optionDerrick Stolee
The --prefetch option will be used by the 'prefetch' maintenance task instead of sending refspecs explicitly across the command-line. The intention is to modify the refspec to place all results in refs/prefetch/ instead of anywhere else. Create helper method filter_prefetch_refspec() to modify a given refspec to fit the rules expected of the prefetch task: * Negative refspecs are preserved. * Refspecs without a destination are removed. * Refspecs whose source starts with "refs/tags/" are removed. * Other refspecs are placed within "refs/prefetch/". Finally, we add the 'force' option to ensure that prefetch refs are replaced as necessary. There are some interesting cases that are worth testing. An earlier version of this change dropped the "i--" from the loop that deletes a refspec item and shifts the remaining entries down. This allowed some refspecs to not be modified. The subtle part about the first --prefetch test is that the "refs/tags/*" refspec appears directly before the "refs/heads/bogus/*" refspec. Without that "i--", this ordering would remove the "refs/tags/*" refspec and leave the last one unmodified, placing the result in "refs/heads/*". It is possible to have an empty refspec. This is typically the case for remotes other than the origin, where users want to fetch a specific tag or branch. To correctly test this case, we need to further remove the upstream remote for the local branch. Thus, we are testing a refspec that will be deleted, leaving nothing to fetch. Helped-by: Tom Saeger <> Helped-by: Ramsay Jones <> Signed-off-by: Derrick Stolee <> Signed-off-by: Junio C Hamano <>
2021-02-18Merge branch 'jt/clone-unborn-head'Junio C Hamano
"git clone" tries to locally check out the branch pointed at by HEAD of the remote repository after it is done, but the protocol did not convey the information necessary to do so when copying an empty repository. The protocol v2 learned how to do so. * jt/clone-unborn-head: clone: respect remote unborn HEAD connect, transport: encapsulate arg in struct ls-refs: report unborn targets of symrefs
2021-02-05connect, transport: encapsulate arg in structJonathan Tan
In a future patch we plan to return the name of an unborn current branch from deep in the callchain to a caller via a new pointer parameter that points at a variable in the caller when the caller calls get_remote_refs() and transport_get_remote_refs(). In preparation for that, encapsulate the existing ref_prefixes parameter into a struct. The aforementioned unborn current branch will go into this new struct in the future patch. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2021-01-12fetch: implement support for atomic reference updatesPatrick Steinhardt
When executing a fetch, then git will currently allocate one reference transaction per reference update and directly commit it. This means that fetches are non-atomic: even if some of the reference updates fail, others may still succeed and modify local references. This is fine in many scenarios, but this strategy has its downsides. - The view of remote references may be inconsistent and may show a bastardized state of the remote repository. - Batching together updates may improve performance in certain scenarios. While the impact probably isn't as pronounced with loose references, the upcoming reftable backend may benefit as it needs to write less files in case the update is batched. - The reference-update hook is currently being executed twice per updated reference. While this doesn't matter when there is no such hook, we have seen severe performance regressions when doing a git-fetch(1) with reference-transaction hook when the remote repository has hundreds of thousands of references. Similar to `git push --atomic`, this commit thus introduces atomic fetches. Instead of allocating one reference transaction per updated reference, it causes us to only allocate a single transaction and commit it as soon as all updates were received. If locking of any reference fails, then we abort the complete transaction and don't update any reference, which gives us an all-or-nothing fetch. Note that this may not completely fix the first of above downsides, as the consistent view also depends on the server-side. If the server doesn't have a consistent view of its own references during the reference negotiation phase, then the client would get the same inconsistent view the server has. This is a separate problem though and, if it actually exists, can be fixed at a later point. This commit also changes the way we write FETCH_HEAD in case `--atomic` is passed. Instead of writing changes as we go, we need to accumulate all changes first and only commit them at the end when we know that all reference updates succeeded. Ideally, we'd just do so via a temporary file so that we don't need to carry all updates in-memory. This isn't trivially doable though considering the `--append` mode, where we do not truncate the file but simply append to it. And given that we support concurrent processes appending to FETCH_HEAD at the same time without any loss of data, seeding the temporary file with current contents of FETCH_HEAD initially and then doing a rename wouldn't work either. So this commit implements the simple strategy of buffering all changes and appending them to the file on commit. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-01-12fetch: allow passing a transaction to `s_update_ref()`Patrick Steinhardt
The handling of ref updates is completely handled by `s_update_ref()`, which will manage the complete lifecycle of the reference transaction. This is fine right now given that git-fetch(1) does not support atomic fetches, so each reference gets its own transaction. It is quite inflexible though, as `s_update_ref()` only knows about a single reference update at a time, so it doesn't allow us to alter the strategy. This commit prepares `s_update_ref()` and its only caller `update_local_ref()` to allow passing an external transaction. If none is given, then the existing behaviour is triggered which creates a new transaction and directly commits it. Otherwise, if the caller provides a transaction, then we only queue the update but don't commit it. This optionally allows the caller to manage when a transaction will be committed. Given that `update_local_ref()` is always called with a `NULL` transaction for now, no change in behaviour is expected from this change. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-01-12fetch: refactor `s_update_ref` to use common exit pathPatrick Steinhardt
The cleanup code in `s_update_ref()` is currently duplicated for both succesful and erroneous exit paths. This commit refactors the function to have a shared exit path for both cases to remove the duplication. Suggested-by: Christian Couder <> Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-01-12fetch: use strbuf to format FETCH_HEAD updatesPatrick Steinhardt
This commit refactors `append_fetch_head()` to use a `struct strbuf` for formatting the update which we're about to append to the FETCH_HEAD file. While the refactoring doesn't have much of a benefit right now, it serves as a preparatory step to implement atomic fetches where we need to buffer all updates to FETCH_HEAD and only flush them out if all reference updates succeeded. No change in behaviour is expected from this commit. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2021-01-12fetch: extract writing to FETCH_HEADPatrick Steinhardt
When performing a fetch with the default `--write-fetch-head` option, we write all updated references to FETCH_HEAD while the updates are performed. Given that updates are not performed atomically, it means that we we write to FETCH_HEAD even if some or all of the reference updates fail. Given that we simply update FETCH_HEAD ad-hoc with each reference, the logic is completely contained in `store_update_refs` and thus quite hard to extend. This can already be seen by the way we skip writing to the FETCH_HEAD: instead of having a conditional which simply skips writing, we instead open "/dev/null" and needlessly write all updates there. We are about to extend git-fetch(1) to accept an `--atomic` flag which will make the fetch an all-or-nothing operation with regards to the reference updates. This will also require us to make the updates to FETCH_HEAD an all-or-nothing operation, but as explained doing so is not easy with the current layout. This commit thus refactors the wa we write to FETCH_HEAD and pulls out the logic to open, append to, commit and close the file. While this may seem rather over-the top at first, pulling out this logic will make it a lot easier to update the code in a subsequent commit. It also allows us to easily skip writing completely in case `--no-write-fetch-head` was passed. Signed-off-by: Patrick Steinhardt <> Signed-off-by: Junio C Hamano <>
2020-11-02hashmap: provide deallocation function namesElijah Newren
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] Signed-off-by: Elijah Newren <> Signed-off-by: Junio C Hamano <>
2020-10-05Merge branch 'jk/refspecs-negative'Junio C Hamano
"git fetch" and "git push" support negative refspecs. * jk/refspecs-negative: refspec: add support for negative refspecs
2020-10-05Merge branch 'jt/keep-partial-clone-filter-upon-lazy-fetch'Junio C Hamano
The lazy fetching done internally to make missing objects available in a partial clone incorrectly made permanent damage to the partial clone filter in the repository, which has been corrected. * jt/keep-partial-clone-filter-upon-lazy-fetch: fetch: do not override partial clone filter promisor-remote: remove unused variable
2020-09-30refspec: add support for negative refspecsJacob Keller
Both fetch and push support pattern refspecs which allow fetching or pushing references that match a specific pattern. Because these patterns are globs, they have somewhat limited ability to express more complex situations. For example, suppose you wish to fetch all branches from a remote except for a specific one. To allow this, you must setup a set of refspecs which match only the branches you want. Because refspecs are either explicit name matches, or simple globs, many patterns cannot be expressed. Add support for a new type of refspec, referred to as "negative" refspecs. These are prefixed with a '^' and mean "exclude any ref matching this refspec". They can only have one "side" which always refers to the source. During a fetch, this refers to the name of the ref on the remote. During a push, this refers to the name of the ref on the local side. With negative refspecs, users can express more complex patterns. For example: git fetch origin refs/heads/*:refs/remotes/origin/* ^refs/heads/dontwant will fetch all branches on origin into remotes/origin, but will exclude fetching the branch named dontwant. Refspecs today are commutative, meaning that order doesn't expressly matter. Rather than forcing an implied order, negative refspecs will always be applied last. That is, in order to match, a ref must match at least one positive refspec, and match none of the negative refspecs. This is similar to how negative pathspecs work. Signed-off-by: Jacob Keller <> Signed-off-by: Junio C Hamano <>
2020-09-28fetch: do not override partial clone filterJonathan Tan
When a fetch with the --filter argument is made, the configured default filter is set even if one already exists. This change was made in 5e46139376 ("builtin/fetch: remove unique promisor remote limitation", 2019-06-25) - in particular, changing from: * If this is the FIRST partial-fetch request, we enable partial * on this repo and remember the given filter-spec as the default * for subsequent fetches to this remote. to: * If this is a partial-fetch request, we enable partial on * this repo if not already enabled and remember the given * filter-spec as the default for subsequent fetches to this * remote. (The given filter-spec is "remembered" even if there is already an existing one.) This is problematic whenever a lazy fetch is made, because lazy fetches are made using "git fetch --filter=blob:none", but this will also happen if the user invokes "git fetch --filter=<filter>" manually. Therefore, restore the behavior prior to 5e46139376, which writes a filter-spec only if the current fetch request is the first partial-fetch one (for that remote). Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2020-09-25Merge branch 'ds/maintenance-part-1'Junio C Hamano
A "git gc"'s big brother has been introduced to take care of more repository maintenance tasks, not limited to the object database cleaning. * ds/maintenance-part-1: maintenance: add trace2 regions for task execution maintenance: add auto condition for commit-graph task maintenance: use pointers to check --auto maintenance: create maintenance.<task>.enabled config maintenance: take a lock on the objects directory maintenance: add --task option maintenance: add commit-graph task maintenance: initialize task array maintenance: replace run_auto_gc() maintenance: add --quiet option maintenance: create basic maintenance runner
2020-09-22Merge branch 'ar/fetch-ipversion-in-all'Junio C Hamano
"git fetch --all --ipv4/--ipv6" forgot to pass the protocol options to instances of the "git fetch" that talk to individual remotes, which has been corrected. * ar/fetch-ipversion-in-all: fetch: pass --ipv4 and --ipv6 options to sub-fetches
2020-09-22Merge branch 'os/fetch-submodule-optim'Junio C Hamano
Optimization around submodule handling. * os/fetch-submodule-optim: fetch: do not look for submodule changes in unchanged refs
2020-09-17maintenance: replace run_auto_gc()Derrick Stolee
The run_auto_gc() method is used in several places to trigger a check for repo maintenance after some Git commands, such as 'git commit' or 'git fetch'. To allow for extra customization of this maintenance activity, replace the 'git gc --auto [--quiet]' call with one to 'git maintenance run --auto [--quiet]'. As we extend the maintenance builtin with other steps, users will be able to select different maintenance activities. Rename run_auto_gc() to run_auto_maintenance() to be clearer what is happening on this call, and to expose all callers in the current diff. Rewrite the method to use a struct child_process to simplify the calls slightly. Since 'git fetch' already allows disabling the 'git gc --auto' subprocess, add an equivalent option with a different name to be more descriptive of the new behavior: '--[no-]maintenance'. Update the documentation to include these options at the same time. Signed-off-by: Derrick Stolee <> Signed-off-by: Junio C Hamano <>
2020-09-15fetch: pass --ipv4 and --ipv6 options to sub-fetchesAlex Riesen
The options indicate user intent for the whole fetch operation, and ignoring them in sub-fetches (i.e. "--all" and recursive fetching of submodules) is quite unexpected when, for instance, it is intended to limit all of the communication to a specific transport protocol for some reason. Signed-off-by: Alex Riesen <> Signed-off-by: Junio C Hamano <>
2020-09-06refspec: add and use refspec_appendf()René Scharfe
Add a function for building a refspec using printf-style formatting. It frees callers from managing their own buffer. Use it throughout the tree to shorten and simplify its callers. Signed-off-by: René Scharfe <> Signed-off-by: Junio C Hamano <>
2020-09-06fetch: do not look for submodule changes in unchanged refsOrgad Shaneh
When fetching recursively with submodules, for each ref in the superproject, we call check_for_new_submodule_commits() which collects all the objects that have to be checked for submodule changes on calculate_changed_submodule_paths(). On the first call, it also collects all the existing refs for excluding them from the scan. calculate_changed_submodule_paths() creates an argument array with all the collected new objects, followed by --not and all the old objects. This argv is passed to setup_revisions, which parses each argument, converts it back to an oid and resolves the object. The parsing itself also does redundant work, because it is treated like user input, while in fact it is a full oid. So it needlessly attempts to look it up as ref (checks if it has ^, ~ etc.), checks if it is a file name etc. For a repository with many refs, all of this is expensive. But if the fetch in the superproject did not update the ref (i.e. the objects that are required to exist in the submodule did not change), there is no need to include it in the list. Before commit be76c212 (fetch: ensure submodule objects fetched, 2018-12-06), submodule reference changes were only detected for refs that were changed, but not for new refs. This commit covered also this case, but what it did was to just include every ref. This change should reduce the number of scanned refs by about half (except the case of a no-op fetch, which will not scan any ref), because all the existing refs will still be listed after --not. The regression was reported here: Signed-off-by: Orgad Shaneh <> Signed-off-by: Junio C Hamano <>
2020-09-03Merge branch 'jt/lazy-fetch'Junio C Hamano
Updates to on-demand fetching code in lazily cloned repositories. * jt/lazy-fetch: fetch: no FETCH_HEAD display if --no-write-fetch-head fetch-pack: remove no_dependents code promisor-remote: lazy-fetch objects in subprocess fetch-pack: do not lazy-fetch during ref iteration fetch: only populate existing_refs if needed fetch: avoid reading submodule config until needed fetch: allow refspecs specified through stdin negotiator/noop: add noop fetch negotiator
2020-09-02fetch: no FETCH_HEAD display if --no-write-fetch-headJonathan Tan
887952b8c6 ("fetch: optionally allow disabling FETCH_HEAD update", 2020-08-18) introduced the ability to disable writing to FETCH_HEAD during fetch, but did not suppress the "<source> -> FETCH_HEAD" message when this ability is used. This message is misleading in this case, because FETCH_HEAD is not written. Also, because "fetch" is used to lazy-fetch missing objects in a partial clone, this significantly clutters up the output in that case since the objects to be fetched are potentially numerous. Therefore, suppress this message when --no-write-fetch-head is passed (but not when --dry-run is set). Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2020-08-27Merge branch 'jk/leakfix'Junio C Hamano
Code clean-up. * jk/leakfix: submodule--helper: fix leak of core.worktree value config: fix leak in git_config_get_expiry_in_days() config: drop git_config_get_string_const() config: fix leaks from git_config_get_string_const() checkout: fix leak of non-existent branch names submodule--helper: use strbuf_release() to free strbufs clear_pattern_list(): clear embedded hashmaps
2020-08-18fetch: only populate existing_refs if neededJonathan Tan
In "fetch", get_ref_map() iterates over all refs to populate "existing_refs" in order to populate peer_ref->old_oid in the returned refmap, even if the refmap has no peer_ref set - which is the case when only literal hashes (i.e. no refs by name) are fetched. Iterating over refs causes the targets of those refs to be checked for existence. Avoiding this is especially important when we use "git fetch" to perform lazy fetches in a partial clone because a target of such a ref may need to be itself lazy-fetched (and otherwise causing an infinite loop). Therefore, avoid populating "existing_refs" until necessary. With this patch, because Git lazy-fetches objects by literal hashes (to be done in a subsequent commit), it will then be able to guarantee avoiding reading targets of refs. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2020-08-18fetch: avoid reading submodule config until neededJonathan Tan
In "fetch", there are two parameters submodule_fetch_jobs_config and recurse_submodules that can be set in a variety of ways: through .gitmodules, through .git/config, and through the command line. Currently "fetch" handles this by first reading .gitmodules, then reading .git/config (allowing it to overwrite existing values), then reading the command line (allowing it to overwrite existing values). Notice that we can avoid reading .gitmodules if .git/config and/or the command line already provides us with what we need. In addition, if recurse_submodules is found to be "no", we do not need the value of submodule_fetch_jobs_config. Avoiding reading .gitmodules is especially important when we use "git fetch" to perform lazy fetches in a partial clone because the .gitmodules file itself might need to be lazy fetched (and otherwise causing an infinite loop). In light of all this, avoid reading .gitmodules until necessary. When reading it, we may only need one of the two parameters it provides, so teach fetch_config_from_gitmodules() to support NULL arguments. With this patch, users (including Git itself when invoking "git fetch" to lazy-fetch) will be able to guarantee avoiding reading .gitmodules by passing --recurse-submodules=no. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2020-08-18fetch: allow refspecs specified through stdinJonathan Tan
In a subsequent patch, partial clones will be taught to fetch missing objects using a "git fetch" subprocess. Because the number of objects fetched may be too numerous to fit on the command line, teach "fetch" to accept refspecs passed through stdin. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2020-08-18fetch: optionally allow disabling FETCH_HEAD updateJunio C Hamano
If you run fetch but record the result in remote-tracking branches, and either if you do nothing with the fetched refs (e.g. you are merely mirroring) or if you always work from the remote-tracking refs (e.g. you fetch and then merge origin/branchname separately), you can get away with having no FETCH_HEAD at all. Teach "git fetch" a command line option "--[no-]write-fetch-head". The default is to write FETCH_HEAD, and the option is primarily meant to be used with the "--no-" prefix to override this default, because there is no matching fetch.writeFetchHEAD configuration variable to flip the default to off (in which case, the positive form may become necessary to defeat it). Note that under "--dry-run" mode, FETCH_HEAD is never written; otherwise you'd see list of objects in the file that you do not actually have. Passing `--write-fetch-head` does not force `git fetch` to write the file. Signed-off-by: Derrick Stolee <> Signed-off-by: Junio C Hamano <>
2020-08-14config: fix leaks from git_config_get_string_const()Jeff King
There are two functions to get a single config string: - git_config_get_string() - git_config_get_string_const() One might naively think that the first one allocates a new string and the second one just points us to the internal configset storage. But in fact they both allocate a new copy; the second one exists only to avoid having to cast when using it with a const global which we never intend to free. The documentation for the function explains that clearly, but it seems I'm not alone in being surprised by this. Of 17 calls to the function, 13 of them leak the resulting value. We could obviously fix these by adding the appropriate free(). But it would be simpler still if we actually had a non-allocating way to get the string. There's git_config_get_value() but that doesn't quite do what we want. If the config key is present but is a boolean with no value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the string versions will print an error and die). So let's introduce a new variant, git_config_get_string_tmp(), that behaves as these callers expect. We need a new name because we have new semantics but the same function signature (so even if we converted the four remaining callers, topics in flight might be surprised). The "tmp" is because this value should only be held onto for a short time. In practice it's rare for us to clear and refresh the configset, invalidating the pointer, but hopefully the "tmp" makes callers think about the lifetime. In each of the converted cases here the value only needs to last within the local function or its immediate caller. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2020-07-31strvec: rename struct fieldsJeff King
The "argc" and "argv" names made sense when the struct was argv_array, but now they're just confusing. Let's rename them to "nr" (which we use for counts elsewhere) and "v" (which is rather terse, but reads well when combined with typical variable names like "args.v"). Note that we have to update all of the callers immediately. Playing tricks with the preprocessor is hard here, because we wouldn't want to rewrite unrelated tokens. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2020-07-28strvec: fix indentation in renamed callsJeff King
Code which split an argv_array call across multiple lines, like: argv_array_pushl(&args, "one argument", "another argument", "and more", NULL); was recently mechanically renamed to use strvec, which results in mis-matched indentation like: strvec_pushl(&args, "one argument", "another argument", "and more", NULL); Let's fix these up to align the arguments with the opening paren. I did this manually by sifting through the results of: git jump grep 'strvec_.*,$' and liberally applying my editor's auto-format. Most of the changes are of the form shown above, though I also normalized a few that had originally used a single-tab indentation (rather than our usual style of aligning with the open paren). I also rewrapped a couple of obvious cases (e.g., where previously too-long lines became short enough to fit on one), but I wasn't aggressive about it. In cases broken to three or more lines, the grouping of arguments is sometimes meaningful, and it wasn't worth my time or reviewer time to ponder each case individually. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>