summaryrefslogtreecommitdiff
path: root/fetch-pack.c
AgeCommit message (Collapse)Author
2021-05-05fetch: teach independent negotiation (no packfile)Jonathan Tan
Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-09fetch-pack: refactor command and capability writeJonathan Tan
A subsequent commit will need this functionality independent of the rest of send_fetch_request(), so put this into its own function. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-09fetch-pack: refactor add_haves()Jonathan Tan
A subsequent commit will need part, but not all, of the functionality in add_haves(), so move some of its functionality to its sole caller send_fetch_request(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-09fetch-pack: refactor process_acks()Jonathan Tan
A subsequent commit will need part, but not all, of the functionality in process_acks(), so move some of its functionality to its sole caller do_fetch_pack_v2(). As a side effect, the resulting code is also shorter. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-09Merge branch 'jt/fetch-pack-request-fix' into jt/push-negotiationJunio C Hamano
* jt/fetch-pack-request-fix: fetch-pack: buffer object-format with other args
2021-04-09fetch-pack: buffer object-format with other argsJonathan Tan
In send_fetch_request(), "object-format" is written directly to the file descriptor, as opposed to the other arguments, which are buffered. Buffer "object-format" as well. "object-format" must be buffered; in particular, it must appear after "command=fetch" in the request. This divergence was introduced in 4b831208bb ("fetch-pack: parse and advertise the object-format capability", 2020-05-27), perhaps as an oversight (the surrounding code at the point of this commit has already been using a request buffer.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08Merge branch 'll/clone-reject-shallow'Junio C Hamano
"git clone --reject-shallow" option fails the clone as soon as we notice that we are cloning from a shallow repository. * ll/clone-reject-shallow: builtin/clone.c: add --reject-shallow option
2021-04-07Merge branch 'ab/fsck-api-cleanup'Junio C Hamano
Fsck API clean-up. * ab/fsck-api-cleanup: fetch-pack: use new fsck API to printing dangling submodules fetch-pack: use file-scope static struct for fsck_options fetch-pack: don't needlessly copy fsck_options fsck.c: move gitmodules_{found,done} into fsck_options fsck.c: add an fsck_set_msg_type() API that takes enums fsck.c: pass along the fsck_msg_id in the fsck_error callback fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h fsck.c: give "FOREACH_MSG_ID" a more specific name fsck.c: undefine temporary STR macro after use fsck.c: call parse_msg_type() early in fsck_set_msg_type() fsck.h: re-order and re-assign "enum fsck_msg_type" fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" fsck.c: rename remaining fsck_msg_id "id" to "msg_id" fsck.c: remove (mostly) redundant append_msg_id() function fsck.c: rename variables in fsck_set_msg_type() for less confusion fsck.h: use "enum object_type" instead of "int" fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} fsck.c: refactor and rename common config callback
2021-04-01builtin/clone.c: add --reject-shallow optionLi Linchao
In some scenarios, users may want more history than the repository offered for cloning, which happens to be a shallow repository, can give them. But because users don't know it is a shallow repository until they download it to local, we may want to refuse to clone this kind of repository, without creating any unnecessary files. The '--depth=x' option cannot be used as a solution; the source may be deep enough to give us 'x' commits when cloned, but the user may later need to deepen the history to arbitrary depth. Teach '--reject-shallow' option to "git clone" to abort as soon as we find out that we are cloning from a shallow repository. Signed-off-by: Li Linchao <lilinchao@oschina.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-29fetch-pack: use new fsck API to printing dangling submodulesÆvar Arnfjörð Bjarmason
Refactor the check added in 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22) to make use of us now passing the "msg_id" to the user defined "error_func". We can now compare against the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated message. Let's also replace register_found_gitmodules() with directly manipulating the "gitmodules_found" member. A recent commit moved it into "fsck_options" so we could do this here. I'm sticking this callback in fsck.c. Perhaps in the future we'd like to accumulate such callbacks into another file (maybe fsck-cb.c, similar to parse-options-cb.c?), but while we've got just the one let's just put it into fsck.c. A better alternative in this case would be some library some more obvious library shared by fetch-pack.c ad builtin/index-pack.c, but there isn't such a thing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-29fetch-pack: use file-scope static struct for fsck_optionsÆvar Arnfjörð Bjarmason
Change code added in 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22) so that we use a file-scoped "static struct fsck_options" instead of defining one in the "fsck_gitmodules_oids()" function. We use this pattern in all of builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see fetch-pack be the odd one out. One might think that we're using other fsck_options structs in fetch-pack, or doing on fsck twice there, but we're not. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-29fsck.c: move gitmodules_{found,done} into fsck_optionsÆvar Arnfjörð Bjarmason
Move the gitmodules_{found,done} static variables added in 159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the fsck_options struct. It makes sense to keep all the context in the same place. This requires changing the recently added register_found_gitmodules() function added in 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22) to take fsck_options. That function will be removed in a subsequent commit, but as it'll require the new gitmodules_found attribute of "fsck_options" we need this intermediate step first. An earlier version of this patch removed the small amount of duplication we now have between FSCK_OPTIONS_{DEFAULT,STRICT} with a FSCK_OPTIONS_COMMON macro. I don't think such de-duplication is worth it for this amount of copy/pasting. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-14use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-09Merge branch 'jt/transfer-fsck-across-packs-fix'Junio C Hamano
The code to fsck objects received across multiple packs during a single git fetch session has been broken when the packfile URI feature was in use. A workaround has been added by disabling the codepath to avoid keeping a packfile that is too small. * jt/transfer-fsck-across-packs-fix: fetch-pack: do not mix --pack_header and packfile uri
2021-03-05fetch-pack: do not mix --pack_header and packfile uriJonathan Tan
When fetching (as opposed to cloning) from a repository with packfile URIs enabled, an error like this may occur: fatal: pack has bad object at offset 12: unknown object type 5 fatal: finish_http_pack_request gave result -1 fatal: fetch-pack: expected keep then TAB at start of http-fetch output This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs, use index-pack arg", 2021-02-22), when the index-pack args used when processing the inline packfile of a fetch response and when processing packfile URIs were unified. This bug happens because fetch, by default, partially reads (and consumes) the header of the inline packfile to determine if it should store the downloaded objects as a packfile or loose objects, and thus passes --pack_header=<...> to index-pack to inform it that some bytes are missing. However, when it subsequently fetches the additional packfiles linked by URIs, it reuses the same index-pack arguments, thus wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are missing. This does not happen when cloning because "git clone" always passes do_keep, which instructs the fetch mechanism to always retain the packfile, eliminating the need to read the header. There are a few ways to fix this, including filtering out pack_header arguments when downloading the additional packfiles, but I decided to stick to always using index-pack throughout when packfile URIs are present - thus, Git no longer needs to read the bytes, and no longer needs --pack_header here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-01Merge branch 'jt/transfer-fsck-across-packs'Junio C Hamano
The approach to "fsck" the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but fundamentally cannot be applied fully when we receive more than one pack stream, as a tree object in one pack may refer to a blob object in another pack as ".gitmodules", when we want to inspect blobs that are used as ".gitmodules" file, for example. Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process. * jt/transfer-fsck-across-packs: fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args
2021-02-22fetch-pack: print and use dangling .gitmodulesJonathan Tan
Teach index-pack to print dangling .gitmodules links after its "keep" or "pack" line instead of declaring an error, and teach fetch-pack to check such lines printed. This allows the tree side of the .gitmodules link to be in one packfile and the blob side to be in another without failing the fsck check, because it is now fetch-pack which checks such objects after all packfiles have been downloaded and indexed (and not index-pack on an individual packfile, as it is before this commit). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22fetch-pack: with packfile URIs, use index-pack argJonathan Tan
Unify the index-pack arguments used when processing the inline pack and when downloading packfiles referenced by URIs. This is done by teaching get_pack() to also store the index-pack arguments whenever at least one packfile URI is given, and then when processing the packfile URI(s), using the stored arguments. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22http-fetch: allow custom index-pack argsJonathan Tan
This is the next step in teaching fetch-pack to pass its index-pack arguments when processing packfiles referenced by URIs. The "--keep" in fetch-pack.c will be replaced with a full message in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13fetch-pack: refactor writing promisor fileChristian Couder
Let's replace the 2 different pieces of code that write a promisor file in 'builtin/repack.c' and 'fetch-pack.c' with a new function called 'write_promisor_file()' in 'pack-write.c' and 'pack.h'. This might also help us in the future, if we want to put back the ref names and associated hashes that were in the promisor files we are repacking in 'builtin/repack.c' as suggested by a NEEDSWORK comment just above the code we are refactoring. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13fetch-pack: rename helper to create_promisor_file()Christian Couder
As we are going to refactor the code that actually writes the promisor file into a separate function in a following commit, let's rename the current write_promisor_file() function to create_promisor_file(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08Merge branch 'rs/fetch-pack-invalid-lockfile'Junio C Hamano
"fetch-pack" could pass NULL pointer to unlink(2) when it sees an invalid filename; the error checking has been tightened to make this impossible. * rs/fetch-pack-invalid-lockfile: fetch-pack: disregard invalid pack lockfiles
2020-11-30fetch-pack: disregard invalid pack lockfilesRené Scharfe
9da69a6539 (fetch-pack: support more than one pack lockfile, 2020-06-10) started to use a string_list for pack lockfile names instead of a single string pointer. It removed a NULL check from transport_unlock_pack() as well, which is the function that eventually deletes these lockfiles and releases their name strings. index_pack_lockfile() can return NULL if it doesn't like the contents it reads from the file descriptor passed to it. unlink(2) is declared to not accept NULL pointers (at least with glibc). Undefined Behavior Sanitizer together with Address Sanitizer detects a case where a NULL lockfile name is passed to unlink(2) by transport_unlock_pack() in t1060 (make SANITIZE=address,undefined; cd t; ./t1060-object-corruption.sh). Reinstate the NULL check to avoid undefined behavior, but put it right at the source, so that the number of items in the string_list reflects the number of valid lockfiles. Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-12fetch-pack: advertise session ID in capabilitiesJosh Steadmon
When the server sent a session-id capability and transfer.advertiseSID is true, advertise fetch-pack's own session ID back to the server. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-03Merge branch 'jt/lazy-fetch'Junio C Hamano
Updates to on-demand fetching code in lazily cloned repositories. * jt/lazy-fetch: fetch: no FETCH_HEAD display if --no-write-fetch-head fetch-pack: remove no_dependents code promisor-remote: lazy-fetch objects in subprocess fetch-pack: do not lazy-fetch during ref iteration fetch: only populate existing_refs if needed fetch: avoid reading submodule config until needed fetch: allow refspecs specified through stdin negotiator/noop: add noop fetch negotiator
2020-09-03Merge branch 'jt/fetch-pack-loosen-validation-with-packfile-uri'Junio C Hamano
Bugfix for "git fetch" when the packfile URI capability is in use. * jt/fetch-pack-loosen-validation-with-packfile-uri: fetch-pack: make packfile URIs work with transfer.fsckobjects fetch-pack: document only_packfile in get_pack() (various): document from_promisor parameter
2020-08-25fetch-pack: make packfile URIs work with transfer.fsckobjectsJonathan Tan
When fetching with packfile URIs and transfer.fsckobjects=1, use the --fsck-objects instead of the --strict flag when invoking index-pack so that links are not checked, only objects. This is because incomplete links are expected. (A subsequent connectivity check will be done when all the packs have been downloaded regardless of whether transfer.fsckobjects is set.) This is similar to 98a2ea46c2 ("fetch-pack: do not check links for partial fetch", 2018-03-15), but for packfile URIs instead of partial clones. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-25fetch-pack: document only_packfile in get_pack()Jonathan Tan
dd4b732df7 ("upload-pack: send part of packfile response as uri", 2020-06-10) added the "only_packfile" parameter to get_pack() but did not document it. Add documentation. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-20fetch-pack: in partial clone, pass --promisorJonathan Tan
When fetching a pack from a promisor remote, the corresponding .promisor file needs to be created. "fetch-pack" originally did this by passing "--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do this itself instead, because it needed to store ref information in the .promisor file. This causes a problem with superprojects when transfer.fsckobjects is set, because in the current implementation, it is "index-pack" that calls fsck_finish() to check the objects; before 5374a290aa, fsck_finish() would see that .gitmodules is a promisor object and tolerate it being missing, but after, there is no .promisor file (at the time of the invocation of fsck_finish() by "index-pack") to tell it that .gitmodules is a promisor object, so it returns an error. Therefore, teach "fetch-pack" to pass "--promisor" to index pack once again. "fetch-pack" will subsequently overwrite this file with the ref information. An alternative is to instead move object checking to "fetch-pack", and let "index-pack" only index the files. However, since "index-pack" has to inflate objects in order to index them, it seems reasonable to also let it check the objects (which also require inflated files). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18fetch-pack: remove no_dependents codeJonathan Tan
Now that Git has switched to using a subprocess to lazy-fetch missing objects, remove the no_dependents code as it is no longer used. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18fetch-pack: do not lazy-fetch during ref iterationJonathan Tan
In order to determine negotiation tips, "fetch-pack" iterates over all refs and dereferences all annotated tags found. This causes the existence of targets of refs and annotated tags to be checked. Avoiding this is especially important when we use "git fetch" (which invokes "fetch-pack") to perform lazy fetches in a partial clone because a target of such a ref or annotated tag may need to be itself lazy-fetched (and otherwise causing an infinite loop). Therefore, teach "fetch-pack" not to lazy fetch whenever iterating over refs. This is done by using the raw form of ref iteration and by dereferencing tags ourselves. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: fix indentation in renamed callsJeff King
Code which split an argv_array call across multiple lines, like: argv_array_pushl(&args, "one argument", "another argument", "and more", NULL); was recently mechanically renamed to use strvec, which results in mis-matched indentation like: strvec_pushl(&args, "one argument", "another argument", "and more", NULL); Let's fix these up to align the arguments with the opening paren. I did this manually by sifting through the results of: git jump grep 'strvec_.*,$' and liberally applying my editor's auto-format. Most of the changes are of the form shown above, though I also normalized a few that had originally used a single-tab indentation (rather than our usual style of aligning with the open paren). I also rewrapped a couple of obvious cases (e.g., where previously too-long lines became short enough to fit on one), but I wasn't aggressive about it. In cases broken to three or more lines, the grouping of arguments is sometimes meaningful, and it wasn't worth my time or reviewer time to ponder each case individually. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert more callers away from argv_array nameJeff King
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts remaining files from the first half of the alphabet, to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add '[abcdefghjkl]*'". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07Merge branch 'bc/sha-256-part-2'Junio C Hamano
SHA-256 migration work continues. * bc/sha-256-part-2: (44 commits) remote-testgit: adapt for object-format bundle: detect hash algorithm when reading refs t5300: pass --object-format to git index-pack t5704: send object-format capability with SHA-256 t5703: use object-format serve option t5702: offer an object-format capability in the test t/helper: initialize the repository for test-sha1-array remote-curl: avoid truncating refs with ls-remote t1050: pass algorithm to index-pack when outside repo builtin/index-pack: add option to specify hash algorithm remote-curl: detect algorithm for dumb HTTP by size builtin/ls-remote: initialize repository based on fetch t5500: make hash independent serve: advertise object-format capability for protocol v2 connect: parse v2 refs with correct hash algorithm connect: pass full packet reader when parsing v2 refs Documentation/technical: document object-format for protocol v2 t1302: expect repo format version 1 for SHA-256 builtin/show-index: provide options to determine hash algo t5302: modernize test formatting ...
2020-06-25Merge branch 'jt/cdn-offload'Junio C Hamano
The "fetch/clone" protocol has been updated to allow the server to instruct the clients to grab pre-packaged packfile(s) in addition to the packed object data coming over the wire. * jt/cdn-offload: upload-pack: fix a sparse '0 as NULL pointer' warning upload-pack: send part of packfile response as uri fetch-pack: support more than one pack lockfile upload-pack: refactor reading of pack-objects out Documentation: add Packfile URIs design doc Documentation: order protocol v2 sections http-fetch: support fetching packfiles by URL http-fetch: refactor into function http: refactor finish_http_pack_request() http: use --stdin when indexing dumb HTTP pack
2020-06-11upload-pack: send part of packfile response as uriJonathan Tan
Teach upload-pack to send part of its packfile response as URIs. An administrator may configure a repository with one or more "uploadpack.blobpackfileuri" lines, each line containing an OID, a pack hash, and a URI. A client may configure fetch.uriprotocols to be a comma-separated list of protocols that it is willing to use to fetch additional packfiles - this list will be sent to the server. Whenever an object with one of those OIDs would appear in the packfile transmitted by upload-pack, the server may exclude that object, and instead send the URI. The client will then download the packs referred to by those URIs before performing the connectivity check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11fetch-pack: support more than one pack lockfileJonathan Tan
Whenever a fetch results in a packfile being downloaded, a .keep file is generated, so that the packfile can be preserved (from, say, a running "git repack") until refs are written referring to the contents of the packfile. In a subsequent patch, a successful fetch using protocol v2 may result in more than one .keep file being generated. Therefore, teach fetch_pack() and the transport mechanism to support multiple .keep files. Implementation notes: - builtin/fetch-pack.c normally does not generate .keep files, and thus is unaffected by this or future changes. However, it has an undocumented "--lock-pack" feature, used by remote-curl.c when implementing the "fetch" remote helper command. In keeping with the remote helper protocol, only one "lock" line will ever be written; the rest will result in warnings to stderr. However, in practice, warnings will never be written because the remote-curl.c "fetch" is only used for protocol v0/v1 (which will not generate multiple .keep files). (Protocol v2 uses the "stateless-connect" command, not the "fetch" command.) - connected.c has an optimization in that connectivity checks on a ref need not be done if the target object is in a pack known to be self-contained and connected. If there are multiple packfiles, this optimization can no longer be done. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27fetch-pack: parse and advertise the object-format capabilitybrian m. carlson
Parse the server's object-format capability and respond accordingly, dying if there is a mismatch. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27fetch-pack: detect when the server doesn't support our hashbrian m. carlson
Detect when the server doesn't support our hash algorithm and abort. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24stateless-connect: send response end packetDenton Liu
Currently, remote-curl acts as a proxy and blindly forwards packets between an HTTP server and fetch-pack. In the case of a stateless RPC connection where the connection is terminated before the transaction is complete, remote-curl will blindly forward the packets before waiting on more input from fetch-pack. Meanwhile, fetch-pack will read the transaction and continue reading, expecting more input to continue the transaction. This results in a deadlock between the two processes. This can be seen in the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... whereas the v1 version does terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly Instead of blindly forwarding packets, make remote-curl insert a response end packet after proxying the responses from the remote server when using stateless_connect(). On the RPC client side, ensure that each response ends as described. A separate control packet is chosen because we need to be able to differentiate between what the remote server sends and remote-curl's control packets. By ensuring in the remote-curl code that a server cannot send response end packets, we prevent a malicious server from being able to perform a denial of service attack in which they spoof a response end packet and cause the described deadlock to happen. Reported-by: Force Charlie <charlieio@outlook.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13Merge branch 'tb/shallow-cleanup'Junio C Hamano
Code cleanup. * tb/shallow-cleanup: shallow: use struct 'shallow_lock' for additional safety shallow.h: document '{commit,rollback}_shallow_file' shallow: extract a header file for shallow-related functions commit: make 'commit_graft_pos' non-static
2020-05-01Merge branch 'jt/v2-fetch-nego-fix'Junio C Hamano
The upload-pack protocol v2 gave up too early before finding a common ancestor, resulting in a wasteful fetch from a fork of a project. This has been corrected to match the behaviour of v0 protocol. * jt/v2-fetch-nego-fix: fetch-pack: in protocol v2, reset in_vain upon ACK fetch-pack: in protocol v2, in_vain only after ACK fetch-pack: return enum from process_acks()
2020-05-01Merge branch 'tb/reset-shallow'Junio C Hamano
Fix in-core inconsistency after fetching into a shallow repository that broke the code to write out commit-graph. * tb/reset-shallow: shallow.c: use '{commit,rollback}_shallow_file' t5537: use test_write_lines and indented heredocs for readability
2020-04-30shallow: use struct 'shallow_lock' for additional safetyTaylor Blau
In previous patches, the functions 'commit_shallow_file' and 'rollback_shallow_file' were introduced to reset the shallowness validity checks on a repository after potentially modifying '.git/shallow'. These functions can be made safer by wrapping the 'struct lockfile *' in a new type, 'shallow_lock', so that they cannot be called with a raw lock (and potentially misused by other code that happens to possess a lockfile, but has nothing to do with shallowness). This patch introduces that type as a thin wrapper around 'struct lockfile', and updates the two aforementioned functions and their callers to use it. Suggested-by: Junio C Hamano <gitster@pobox.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30shallow: extract a header file for shallow-related functionsTaylor Blau
There are many functions in commit.h that are more related to shallow repositories than they are to any sort of generic commit machinery. Likely this began when there were only a few shallow-related functions, and commit.h seemed a reasonable enough place to put them. But, now there are a good number of shallow-related functions, and placing them all in 'commit.h' doesn't make sense. This patch extracts a 'shallow.h', which takes all of the declarations from 'commit.h' for functions which already exist in 'shallow.c'. We will bring the remaining shallow-related functions defined in 'commit.c' in a subsequent patch. For now, move only the ones that already are implemented in 'shallow.c', and update the necessary includes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28fetch-pack: in protocol v2, reset in_vain upon ACKJonathan Tan
In the function process_acks() in fetch-pack.c, the variable received_ack is meant to track that an ACK was received, but it was never set. This results in negotiation terminating prematurely through the in_vain counter, when the counter should have been reset upon every ACK. Therefore, reset the in_vain counter upon every ACK. Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28fetch-pack: in protocol v2, in_vain only after ACKJonathan Tan
When fetching, Git stops negotiation when it has sent at least MAX_IN_VAIN (which is 256) "have" lines without having any of them ACK-ed. But this is supposed to trigger only after the first ACK, as pack-protocol.txt says: However, the 256 limit *only* turns on in the canonical client implementation if we have received at least one "ACK %s continue" during a prior round. This helps to ensure that at least one common ancestor is found before we give up entirely. The code path for protocol v0 observes this, but not protocol v2, resulting in shorter negotiation rounds but significantly larger packfiles. Teach the code path for protocol v2 to check this criterion only after at least one ACK was received. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28fetch-pack: return enum from process_acks()Jonathan Tan
process_acks() returns 0, 1, or 2, depending on whether "ready" was received and if not, whether at least one commit was found to be common. Replace these magic numbers with a documented enum. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-24shallow.c: use '{commit,rollback}_shallow_file'Taylor Blau
In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily, 2019-01-10), the author noted that 'is_repository_shallow' produces visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'. This is a problem for e.g., fetching with '--update-shallow' in a shallow repository with 'fetch.writeCommitGraph' enabled, since the update to '.git/shallow' will cause Git to think that the repository isn't shallow when it is, thereby circumventing the commit-graph compatibility check. This causes problems in shallow repositories with at least shallow refs that have at least one ancestor (since the client won't have those objects, and therefore can't take the reachability closure over commits when writing a commit-graph). Address this by introducing thin wrappers over 'commit_lock_file' and 'rollback_lock_file' for use specifically when the lock is held over '.git/shallow'. These wrappers (appropriately called 'commit_shallow_file' and 'rollback_shallow_file') call into their respective functions in 'lockfile.h', but additionally reset validity checks used by the shallow machinery. Replace each instance of 'commit_lock_file' and 'rollback_lock_file' with 'commit_shallow_file' and 'rollback_shallow_file' when the lock being held is over the '.git/shallow' file. As a result, 'prune_shallow' can now only be called once (since 'check_shallow_file_for_update' will die after calling 'reset_repository_shallow'). But, this is OK since we only call 'prune_shallow' at most once per process. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30oid_array: rename source file from sha1-arrayJeff King
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to oid_array, 2017-03-31), but the file is still called sha1-array. Besides being slightly confusing, it makes it more annoying to grep for leftover occurrences of "sha1" in various files, because the header is included in so many places. Let's complete the transition by renaming the source and header files (and fixing up a few comment references). I kept the "-" in the name, as that seems to be our style; cf. fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10). We also have oidmap.h and oidset.h without any punctuation, but those are "struct oidmap" and "struct oidset" in the code. We _could_ make this "oidarray" to match, but somehow it looks uglier to me because of the length of "array" (plus it would be a very invasive patch for little gain). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>