summaryrefslogtreecommitdiff
path: root/fetch-pack.c
AgeCommit message (Collapse)Author
2018-07-18Merge branch 'sb/object-store-grafts'Junio C Hamano
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-16negotiator/skipping: skip commits during fetchJonathan Tan
Introduce a new negotiation algorithm used during fetch that skips commits in an effort to find common ancestors faster. The skips grow similarly to the Fibonacci sequence as the commit walk proceeds further away from the tips. The skips may cause unnecessary commits to be included in the packfile, but the negotiation step typically ends more quickly. Usage of this algorithm is guarded behind the configuration flag fetch.negotiationAlgorithm. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03fetch-pack: support negotiation tip whitelistJonathan Tan
During negotiation, fetch-pack eventually reports as "have" lines all commits reachable from all refs. Allow the user to restrict the commits sent in this way by providing a whitelist of tips; only the tips themselves and their ancestors will be sent. Both globs and single objects are supported. This feature is only supported for protocols that support connect or stateless-connect (such as HTTP with protocol v2). This will speed up negotiation when the repository has multiple relatively independent branches (for example, when a repository interacts with multiple repositories, such as with linux-next [1] and torvalds/linux [2]), and the user knows which local branch is likely to have commits in common with the upstream branch they are fetching. [1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/ [2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03fetch-pack: write shallow, then check connectivityJonathan Tan
When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29tag: add repository argument to deref_tagStefan Beller
Add a repository argument to allow the callers of deref_tag to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29commit: add repository argument to lookup_commitStefan Beller
Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to lookup_objectStefan Beller
Add a repository argument to allow callers of lookup_object to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to parse_objectStefan Beller
Add a repository argument to allow the callers of parse_object to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29Merge branch 'sb/object-store-grafts' into sb/object-store-lookupJunio C Hamano
* sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-06-28Merge branch 'jk/fetch-all-peeled-fix'Junio C Hamano
"git fetch-pack --all" used to unnecessarily fail upon seeing an annotated tag that points at an object other than a commit. * jk/fetch-all-peeled-fix: fetch-pack: test explicitly that --all can fetch tag references pointing to non-commits fetch-pack: don't try to fetch peel values with --all
2018-06-28fetch-pack: implement ref-in-wantBrandon Williams
Implement ref-in-want on the client side so that when a server supports the "ref-in-want" feature, a client will send "want-ref" lines for each reference the client wants to fetch. This feature allows clients to tolerate inconsistencies that exist when a remote repository's refs change during the course of negotiation. This allows a client to request to request a particular ref without specifying the OID of the ref. This means that instead of hitting an error when a ref no longer points at the OID it did at the beginning of negotiation, negotiation can continue and the value of that ref will be sent at the termination of negotiation, just before a packfile is sent. More information on the ref-in-want feature can be found in Documentation/technical/protocol-v2.txt. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-28fetch-pack: put shallow info in output parameterBrandon Williams
Expand the transport fetch method signature, by adding an output parameter, to allow transports to return information about the refs they have fetched. Then communicate shallow status information through this mechanism instead of by modifying the input list of refs. This does require clients to sometimes generate the ref map twice: once from the list of refs provided by the remote (as is currently done) and potentially once from the new list of refs that the fetch mechanism provides. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: introduce negotiator APIJonathan Tan
Introduce the new files fetch-negotiator.{h,c}, which contains an API behind which the details of negotiation are abstracted. Currently, only one algorithm is available: the existing one. This patch is written to be easily reviewed: static functions are moved verbatim from fetch-pack.c to negotiator/default.c, and it can be seen that the lines replaced by negotiator->X() calls are present in the X() functions respectively. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: move common check and marking togetherJonathan Tan
When receiving 'ACK <object-id> continue' for a common commit, check if the commit was already known to be common and mark it as such if not up front. This should make future refactoring of how the information about common commits is stored more straightforward. No visible change intended. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: make negotiation-related vars localJonathan Tan
Reduce the number of global variables by making the priority queue and the count of non-common commits in it local, passing them as a struct to various functions where necessary. This also helps in the case that fetch_pack() is invoked twice in the same process (when tag following is required when using a transport that does not support tag following), in that different priority queues will now be used in each invocation, instead of reusing the possibly non-empty one. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: use ref adv. to prune "have" sentJonathan Tan
In negotiation using protocol v2, fetch-pack sometimes does not make full use of the information obtained in the ref advertisement: specifically, that if the server advertises a commit that the client also has, the client never needs to inform the server that it has the commit's parents, since it can just tell the server that it has the advertised commit and it knows that the server can and will infer the rest. This is because, in do_fetch_pack_v2(), rev_list_insert_ref_oid() is invoked before mark_complete_and_common_ref(). This means that if we have a commit that is both our ref and their ref, it would be enqueued by rev_list_insert_ref_oid() as SEEN, and since it is thus already SEEN, mark_complete_and_common_ref() would not enqueue it. If mark_complete_and_common_ref() were invoked first, as it is in do_fetch_pack() for protocol v0, then mark_complete_and_common_ref() would enqueue it with COMMON_REF | SEEN. The addition of COMMON_REF ensures that its parents are not sent as "have" lines. Change the order in do_fetch_pack_v2() to be consistent with do_fetch_pack(), and to avoid sending unnecessary "have" lines. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: directly end negotiation if ACK readyJonathan Tan
When "ACK %s ready" is received, find_common() clears rev_list in an attempt to stop further "have" lines from being sent [1]. It is much more readable to explicitly break from the loop instead. So explicitly break from the loop, and make the clearing of the rev_list happen unconditionally. [1] The rationale is further described in the originating commit f2cba9299b ("fetch-pack: Finish negotation if remote replies "ACK %s ready"", 2011-03-14). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-14fetch-pack: clear marks before re-markingJonathan Tan
If tag following is required when using a transport that does not support tag following, fetch_pack() will be invoked twice in the same process, necessitating a clearing of the object flags used by fetch_pack() sometime during the second invocation. This is currently done in find_common(), which means that the invocation of mark_complete_and_common_ref() in do_fetch_pack() is useless. (This cannot be reproduced with Git alone, because all transports that come with Git support tag following.) Therefore, move this clearing from find_common() to its parent function do_fetch_pack(), right before it calls mark_complete_and_common_ref(). This has been occurring since the commit that introduced the clearing of marks, 420e9af498 ("Fix tag following", 2008-03-19). The corresponding code for protocol v2 in do_fetch_pack_v2() does not have this problem, as the clearing of flags is done before any marking (whether by rev_list_insert_ref_oid() or mark_complete_and_common_ref()). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-14fetch-pack: split up everything_local()Jonathan Tan
The function everything_local(), despite its name, also (1) marks commits as COMPLETE and COMMON_REF and (2) invokes filter_refs() as important side effects. Extract (1) into its own function (mark_complete_and_common_ref()) and remove (2). The restoring of save_commit_buffer, which was introduced in a1c6d7c1a7 ("fetch-pack: restore save_commit_buffer after use", 2017-12-08), is a concern of the parse_object() call in mark_complete_and_common_ref(), so it has been moved from the end of everything_local() to the end of mark_complete_and_common_ref(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-11fetch-pack: don't try to fetch peel values with --allJeff King
When "fetch-pack --all" sees a tag-to-blob on the remote, it tries to fetch both the tag itself ("refs/tags/foo") and the peeled value that the remote advertises ("refs/tags/foo^{}"). Asking for the object pointed to by the latter can cause upload-pack to complain with "not our ref", since it does not mark the peeled objects with the OUR_REF (unless they were at the tip of some other ref). Arguably upload-pack _should_ be marking those peeled objects. But it never has in the past, since clients would generally just ask for the tag and expect to get the peeled value along with it. And that's how "git fetch" works, as well as older versions of "fetch-pack --all". The problem was introduced by 5f0fc64513 (fetch-pack: eliminate spurious error messages, 2012-09-09). Before then, the matching logic was something like: if (refname is ill-formed) do nothing else if (doing --all) always consider it matched else look through list of sought refs for a match That commit wanted to flip the order of the second two arms of that conditional. But we ended up with: if (refname is ill-formed) do nothing else look through list of sought refs for a match if (--all and no match so far) always consider it matched That means tha an ill-formed ref will trigger the --all conditional block, even though we should just be ignoring it. We can fix that by having a single "else" with all of the well-formed logic, that checks the sought refs and "--all" in the correct order. Reported-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-30Merge branch 'jt/partial-clone-proto-v2'Junio C Hamano
Transfer protocol v2 learned to support the partial clone. * jt/partial-clone-proto-v2: {fetch,upload}-pack: support filter in protocol v2 upload-pack: read config when serving protocol v2 upload-pack: fix error message typo
2018-05-23Merge branch 'bw/server-options'Junio C Hamano
The transport protocol v2 is getting updated further. * bw/server-options: fetch: send server options when using protocol v2 ls-remote: send server options when using protocol v2 serve: introduce the server-option capability
2018-05-17path.c: migrate global git_path_* to take a repository argumentStefan Beller
Migrate all git_path_* functions that are defined in path.c to take a repository argument. Unlike other patches in this series, do not use the #define trick, as we rewrite the whole function, which is rather small. This doesn't migrate all the functions, as other builtins have their own local path functions defined using GIT_PATH_FUNC. So keep that macro around to serve the other locations. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-17shallow: add repository argument to is_repository_shallowStefan Beller
Add a repository argument to allow callers of is_repository_shallow to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-17shallow: add repository argument to register_shallowStefan Beller
Add a repository argument to allow callers of register_shallow to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16object-store: move object access functions to object-store.hStefan Beller
This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08Merge branch 'bw/protocol-v2'Junio C Hamano
The beginning of the next-gen transfer protocol. * bw/protocol-v2: (35 commits) remote-curl: don't request v2 when pushing remote-curl: implement stateless-connect command http: eliminate "# service" line when using protocol v2 http: don't always add Git-Protocol header http: allow providing extra headers for http requests remote-curl: store the protocol version the server responded with remote-curl: create copy of the service name pkt-line: add packet_buf_write_len function transport-helper: introduce stateless-connect transport-helper: refactor process_connect_service transport-helper: remove name parameter connect: don't request v2 when pushing connect: refactor git_connect to only get the protocol version once fetch-pack: support shallow requests fetch-pack: perform a fetch using v2 upload-pack: introduce fetch server command push: pass ref prefixes when pushing fetch: pass ref prefixes when fetching ls-remote: pass ref prefixes when requesting a remote's refs transport: convert transport_get_remote_refs to take a list of ref prefixes ...
2018-05-06{fetch,upload}-pack: support filter in protocol v2Jonathan Tan
The fetch-pack/upload-pack protocol v2 was developed independently of the filter parameter (used in partial fetches), thus it did not include support for it. Add support for the filter parameter. Like in the legacy protocol, the server advertises and supports "filter" only if uploadpack.allowfilter is configured. Like in the legacy protocol, the client continues with a warning if "--filter" is specified, but the server does not advertise it. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2Junio C Hamano
The beginning of the next-gen transfer protocol. * bw/protocol-v2: (35 commits) remote-curl: don't request v2 when pushing remote-curl: implement stateless-connect command http: eliminate "# service" line when using protocol v2 http: don't always add Git-Protocol header http: allow providing extra headers for http requests remote-curl: store the protocol version the server responded with remote-curl: create copy of the service name pkt-line: add packet_buf_write_len function transport-helper: introduce stateless-connect transport-helper: refactor process_connect_service transport-helper: remove name parameter connect: don't request v2 when pushing connect: refactor git_connect to only get the protocol version once fetch-pack: support shallow requests fetch-pack: perform a fetch using v2 upload-pack: introduce fetch server command push: pass ref prefixes when pushing fetch: pass ref prefixes when fetching ls-remote: pass ref prefixes when requesting a remote's refs transport: convert transport_get_remote_refs to take a list of ref prefixes ...
2018-04-24fetch: send server options when using protocol v2Brandon Williams
Teach fetch to optionally accept server options by specifying them on the cmdline via '-o' or '--server-option'. These server options are sent to the remote end when performing a fetch communicating using protocol version 2. If communicating using a protocol other than v2 the provided options are ignored and not sent to the remote end. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-24Merge branch 'bw/protocol-v2' into HEADJunio C Hamano
* bw/protocol-v2: (35 commits) remote-curl: don't request v2 when pushing remote-curl: implement stateless-connect command http: eliminate "# service" line when using protocol v2 http: don't always add Git-Protocol header http: allow providing extra headers for http requests remote-curl: store the protocol version the server responded with remote-curl: create copy of the service name pkt-line: add packet_buf_write_len function transport-helper: introduce stateless-connect transport-helper: refactor process_connect_service transport-helper: remove name parameter connect: don't request v2 when pushing connect: refactor git_connect to only get the protocol version once fetch-pack: support shallow requests fetch-pack: perform a fetch using v2 upload-pack: introduce fetch server command push: pass ref prefixes when pushing fetch: pass ref prefixes when fetching ls-remote: pass ref prefixes when requesting a remote's refs transport: convert transport_get_remote_refs to take a list of ref prefixes ...
2018-04-11exec_cmd: rename to use dash in file nameStefan Beller
This is more consistent with the project style. The majority of Git's source files use dashes in preference to underscores in their file names. Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11Merge branch 'sb/packfiles-in-repository'Junio C Hamano
Refactoring of the internal global data structure continues. * sb/packfiles-in-repository: packfile: keep prepare_packed_git() private packfile: allow find_pack_entry to handle arbitrary repositories packfile: add repository argument to find_pack_entry packfile: allow reprepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_one to handle arbitrary repositories packfile: add repository argument to reprepare_packed_git packfile: add repository argument to prepare_packed_git packfile: add repository argument to prepare_packed_git_one packfile: allow install_packed_git to handle arbitrary repositories packfile: allow rearrange_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_mru to handle arbitrary repositories
2018-04-09Merge branch 'ti/fetch-everything-local-optim'Junio C Hamano
A "git fetch" from a repository with insane number of refs into a repository that is already up-to-date still wasted too many cycles making many lstat(2) calls to see if these objects at the tips exist as loose objects locally. These lstat(2) calls are optimized away by enumerating all loose objects beforehand. It is unknown if the new strategy negatively affects existing use cases, fetching into a repository with many loose objects from a repository with small number of refs. * ti/fetch-everything-local-optim: fetch-pack.c: use oidset to check existence of loose object
2018-03-26packfile: add repository argument to reprepare_packed_gitStefan Beller
See previous patch for explanation. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15fetch-pack: support shallow requestsBrandon Williams
Enable shallow clones and deepen requests using protocol version 2 if the server 'fetch' command supports the 'shallow' feature. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15fetch-pack: perform a fetch using v2Brandon Williams
When communicating with a v2 server, perform a fetch by requesting the 'fetch' command. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15fetch-pack: do not check links for partial fetchJonathan Tan
When doing a partial clone or fetch with transfer.fsckobjects=1, use the --fsck-objects instead of the --strict flag when invoking index-pack so that links are not checked, only objects. This is because incomplete links are expected when doing a partial clone or fetch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14fetch-pack.c: use oidset to check existence of loose objectTakuto Ikuta
When fetching from a repository with large number of refs, because to check existence of each refs in local repository to packed and loose objects, 'git fetch' ends up doing a lot of lstat(2) to non-existing loose form, which makes it slow. Instead of making as many lstat(2) calls as the refs the remote side advertised to see if these objects exist in the loose form, first enumerate all the existing loose objects in hashmap beforehand and use it to check existence of them if the number of refs is larger than the number of loose objects. With this patch, the number of lstat(2) calls in `git fetch` is reduced from 411412 to 13794 for chromium repository, it has more than 480000 remote refs. I took time stat of `git fetch` when fetch-pack happens for chromium repository 3 times on linux with SSD. * with this patch 8.105s 8.309s 7.640s avg: 8.018s * master 12.287s 11.175s 12.227s avg: 11.896s On my MacBook Air which has slower lstat(2). * with this patch 14.501s * master 1m16.027s `git fetch` on slow disk will be improved largely. Signed-off-by: Takuto Ikuta <tikuta@chromium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-27Merge branch 'js/packet-read-line-check-null'Junio C Hamano
Some low level protocol codepath could crash when they get an unexpected flush packet, which is now fixed. * js/packet-read-line-check-null: always check for NULL return from packet_read_line() correct error messages for NULL packet_read_line()
2018-02-13Merge branch 'jh/partial-clone'Junio C Hamano
The machinery to clone & fetch, which in turn involves packing and unpacking objects, have been told how to omit certain objects using the filtering mechanism introduced by the jh/object-filtering topic, and also mark the resulting pack as a promisor pack to tolerate missing objects, taking advantage of the mechanism introduced by the jh/fsck-promisors topic. * jh/partial-clone: t5616: test bulk prefetch after partial fetch fetch: inherit filter-spec from partial clone t5616: end-to-end tests for partial clone fetch-pack: restore save_commit_buffer after use unpack-trees: batch fetching of missing blobs clone: partial clone partial-clone: define partial clone settings in config fetch: support filters fetch: refactor calculation of remote list fetch-pack: test support excluding large blobs fetch-pack: add --no-filter fetch-pack, index-pack, transport: partial clone upload-pack: add object filtering for partial clone
2018-02-13Merge branch 'jh/fsck-promisors'Junio C Hamano
In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension
2018-02-08correct error messages for NULL packet_read_line()Jeff King
The packet_read_line() function dies if it gets an unexpected EOF. It only returns NULL if we get a flush packet (or technically, a zero-length "0004" packet, but nobody is supposed to send those, and they are indistinguishable from a flush in this interface). Let's correct error messages which claim an unexpected EOF; it's really an unexpected flush packet. While we're here, let's also check "!line" instead of "!len" in the second case. The two events should always coincide, but checking "!line" makes it more obvious that we are not about to dereference NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08fetch-pack: restore save_commit_buffer after useJonathan Tan
In fetch-pack, the global variable save_commit_buffer is set to 0, but not restored to its original value after use. In particular, if show_log() (in log-tree.c) is invoked after fetch_pack() in the same process, show_log() will return before printing out the commit message (because the invocation to get_cached_commit_buffer() returns NULL, because the commit buffer was not saved). I discovered this when attempting to run "git log -S" in a partial clone, triggering the case where revision walking lazily loads missing objects. Therefore, restore save_commit_buffer to its original value after use. An alternative to solve the problem I had is to replace get_cached_commit_buffer() with get_commit_buffer(). That invocation was introduced in commit a97934d ("use get_cached_commit_buffer where appropriate", 2014-06-13) to replace "commit->buffer" introduced in commit 3131b71 ("Add "--show-all" revision walker flag for debugging", 2008-02-13). In the latter commit, the commit author seems to be deciding between not showing an unparsed commit at all and showing an unparsed commit without the message (which is what the commit does), and did not mention parsing the unparsed commit, so I prefer to preserve the existing behavior. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08fetch-pack, index-pack, transport: partial cloneJeff Hostetler
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-06Merge branch 'jk/fewer-pack-rescan'Junio C Hamano
Internaly we use 0{40} as a placeholder object name to signal the codepath that there is no such object (e.g. the fast-forward check while "git fetch" stores a new remote-tracking ref says "we know there is no 'old' thing pointed at by the ref, as we are creating it anew" by passing 0{40} for the 'old' side), and expect that a codepath to locate an in-core object to return NULL as a sign that the object does not exist. A look-up for an object that does not exist however is quite costly with a repository with large number of packfiles. This access pattern has been optimized. * jk/fewer-pack-rescan: sha1_file: fast-path null sha1 as a missing object everything_local: use "quick" object existence check p5551: add a script to test fetch pack-dir rescans t/perf/lib-pack: use fast-import checkpoint to create packs p5550: factor out nonsense-pack creation
2017-12-05introduce fetch-object: fetch one promisor objectJonathan Tan
Introduce fetch-object, providing the ability to fetch one object from a promisor remote. This uses fetch-pack. To do this, the transport mechanism has been updated with 2 flags, "from-promisor" to indicate that the resulting pack comes from a promisor remote (and thus should be annotated as such by index-pack), and "no-dependents" to indicate that only the objects themselves need to be fetched (but fetching additional objects is nevertheless safe). Whenever "no-dependents" is used, fetch-pack will refrain from using any object flags, because it is most likely invoked as part of a dynamic object fetch by another Git command (which may itself use object flags). An alternative to this is to leave fetch-pack alone, and instead update the allocation of flags so that fetch-pack's flags never overlap with any others, but this will end up shrinking the number of flags available to nearly every other Git command (that is, every Git command that accesses objects), so the approach in this commit was used instead. This will be tested in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-21everything_local: use "quick" object existence checkJeff King
In b495697b82 (fetch-pack: avoid repeatedly re-scanning pack directory, 2013-01-26), we noticed that everything_local() could waste time trying to find and parse objects which we _expect_ to be missing. The solution was to put has_sha1_file() in front of parse_object() to skip the more-expensive parse attempt. That optimization was negated later when has_sha1_file() learned to do the same re-scan in 45e8a74873 (has_sha1_file: re-check pack directory before giving up, 2013-08-30). We can restore it by using the "quick" flag to tell has_sha1_file (actually has_object_file these days) that we prefer speed to thoroughness for this call. See also the fixes in 5827a0354 and 0eeb077be7 for prior art and discussion on using the "quick" flag for these cases. The recently-added performance regression test in p5551 demonstrates the problem. You can see the original fix: Test b495697b82^ b495697b82 -------------------------------------------------------- 5551.4: fetch 1.68(1.33+0.35) 0.87(0.69+0.18) -48.2% and then the regression: Test 45e8a74873^ 45e8a74873 --------------------------------------------------------- 5551.4: fetch 0.96(0.77+0.19) 2.55(2.04+0.50) +165.6% and now our fix: Test HEAD^ HEAD -------------------------------------------------------- 5551.4: fetch 7.21(6.58+0.63) 5.47(5.04+0.43) -24.1% You can also see that other things have gotten a lot slower since 2013. We'll deal with those in separate patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01oidmap: map with OID as keyJonathan Tan
This is similar to using the hashmap in hashmap.c, but with an easier-to-use API. In particular, custom entry comparisons no longer need to be written, and lookups can be done without constructing a temporary entry structure. This is implemented as a thin wrapper over the hashmap API. In particular, this means that there is an additional 4-byte overhead due to the fact that the first 4 bytes of the hash is redundantly stored. For now, I'm taking the simpler approach, but if need be, we can reimplement oidmap without affecting the callers significantly. oidset has been updated to use oidmap. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23pack: move {,re}prepare_packed_git and approximate_object_countJonathan Tan
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>