summaryrefslogtreecommitdiff
path: root/upload-pack.c
AgeCommit message (Collapse)Author
2020-05-13Merge branch 'cc/upload-pack-v2-fetch-fix'Junio C Hamano
Serving a "git fetch" client over "git://" and "ssh://" protocols using the on-wire protocol version 2 was buggy on the server end when the client needs to make a follow-up request to e.g. auto-follow tags. * cc/upload-pack-v2-fetch-fix: upload-pack: clear filter_options for each v2 fetch command
2020-05-13Merge branch 'tb/shallow-cleanup'Junio C Hamano
Code cleanup. * tb/shallow-cleanup: shallow: use struct 'shallow_lock' for additional safety shallow.h: document '{commit,rollback}_shallow_file' shallow: extract a header file for shallow-related functions commit: make 'commit_graft_pos' non-static
2020-05-08upload-pack: clear filter_options for each v2 fetch commandChristian Couder
Because of the request/response model of protocol v2, the upload_pack_v2() function is sometimes called twice in the same process, while 'struct list_objects_filter_options filter_options' was declared as static at the beginning of 'upload-pack.c'. This made the check in list_objects_filter_die_if_populated(), which is called by process_args(), fail the second time upload_pack_v2() is called, as filter_options had already been populated the first time. To fix that, filter_options is not static any more. It's now owned directly by upload_pack(). It's now also part of 'struct upload_pack_data', so that it's owned indirectly by upload_pack_v2(). In the long term, the goal is to also have upload_pack() use 'struct upload_pack_data', so adding filter_options to this struct makes more sense than to have it owned directly by upload_pack_v2(). This fixes the first of the 2 bugs documented by d0badf8797 (partial-clone: demonstrate bugs in partial fetch, 2020-02-21). Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30shallow: extract a header file for shallow-related functionsTaylor Blau
There are many functions in commit.h that are more related to shallow repositories than they are to any sort of generic commit machinery. Likely this began when there were only a few shallow-related functions, and commit.h seemed a reasonable enough place to put them. But, now there are a good number of shallow-related functions, and placing them all in 'commit.h' doesn't make sense. This patch extracts a 'shallow.h', which takes all of the declarations from 'commit.h' for functions which already exist in 'shallow.c'. We will bring the remaining shallow-related functions defined in 'commit.c' in a subsequent patch. For now, move only the ones that already are implemented in 'shallow.c', and update the necessary includes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27upload-pack: handle unexpected delim packetsJeff King
When processing the arguments list for a v2 ls-refs or fetch command, we loop like this: while (packet_reader_read(request) != PACKET_READ_FLUSH) { const char *arg = request->line; ...handle arg... } to read and handle packets until we see a flush. The hidden assumption here is that anything except PACKET_READ_FLUSH will give us valid packet data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF will leave packet->line as NULL, and we'll segfault trying to look at it. Instead, we should follow the more careful model demonstrated on the client side (e.g., in process_capabilities_v2): keep looping as long as we get normal packets, and then make sure that we broke out of the loop due to a real flush. That fixes the segfault and correctly diagnoses any unexpected input from the client. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10config: split repo scope to local and worktreeMatthew Rogers
Previously when iterating through git config variables, worktree config and local config were both considered "CONFIG_SCOPE_REPO". This was never a problem before as no one had needed to differentiate between the two cases, but future functionality may care whether or not the config options come from a worktree or from the repository's actual local config file. For example, the planned feature to add a '--show-scope' to config to allow a user to see which scope listed config options come from would confuse users if it just printed 'repo' rather than 'local' or 'worktree' as the documentation would lead them to expect. As well as the additional benefit of making the implementation look more like how the documentation describes the interface. To accomplish this we split out what was previously considered repo scope to be local and worktree. The clients of 'current_config_scope()' who cared about CONFIG_SCOPE_REPO are also modified to similarly care about CONFIG_SCOPE_WORKTREE and CONFIG_SCOPE_LOCAL to preserve previous behavior. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07Merge branch 'jk/disable-commit-graph-during-upload-pack'Junio C Hamano
The "upload-pack" (the counterpart of "git fetch") needs to disable commit-graph when responding to a shallow clone/fetch request, but the way this was done made Git panic, which has been corrected. * jk/disable-commit-graph-during-upload-pack: upload-pack: disable commit graph more gently for shallow traversal commit-graph: bump DIE_ON_LOAD check to actual load-time
2019-09-18Merge branch 'md/list-objects-filter-combo'Junio C Hamano
The list-objects-filter API (used to create a sparse/lazy clone) learned to take a combined filter specification. * md/list-objects-filter-combo: list-objects-filter-options: make parser void list-objects-filter-options: clean up use of ALLOC_GROW list-objects-filter-options: allow mult. --filter strbuf: give URL-encoding API a char predicate fn list-objects-filter-options: make filter_spec a string_list list-objects-filter-options: move error check up list-objects-filter: implement composite filters list-objects-filter-options: always supply *errbuf list-objects-filter: put omits set in filter struct list-objects-filter: encapsulate filter components
2019-09-12upload-pack: disable commit graph more gently for shallow traversalJeff King
When the client has asked for certain shallow options like "deepen-since", we do a custom rev-list walk that pretends to be shallow. Before doing so, we have to disable the commit-graph, since it is not compatible with the shallow view of the repository. That's handled by 829a321569 (commit-graph: close_commit_graph before shallow walk, 2018-08-20). That commit literally closes and frees our repo->objects->commit_graph struct. That creates an interesting problem for commits that have _already_ been parsed using the commit graph. Their commit->object.parsed flag is set, their commit->graph_pos is set, but their commit->maybe_tree may still be NULL. When somebody later calls repo_get_commit_tree(), we see that we haven't loaded the tree oid yet and try to get it from the commit graph. But since it has been freed, we segfault! So the root of the issue is a data dependency between the commit's lazy-load of the tree oid and the fact that the commit graph can go away mid-process. How can we resolve it? There are a couple of general approaches: 1. The obvious answer is to avoid loading the tree from the graph when we see that it's NULL. But then what do we return for the tree oid? If we return NULL, our caller in do_traverse() will rightly complain that we have no tree. We'd have to fallback to loading the actual commit object and re-parsing it. That requires teaching parse_commit_buffer() to understand re-parsing (i.e., not starting from a clean slate and not leaking any allocated bits like parent list pointers). 2. When we close the commit graph, walk through the set of in-memory objects and clear any graph_pos pointers. But this means we also have to "unparse" any such commits so that we know they still need to open the commit object to fill in their trees. So it's no less complicated than (1), and is more expensive (since we clear objects we might not later need). 3. Stop freeing the commit-graph struct. Continue to let it be used for lazy-loads of tree oids, but let upload-pack specify that it shouldn't be used for further commit parsing. 4. Push the whole shallow rev-list out to its own sub-process, with the commit-graph disabled from the start, giving it a clean memory space to work from. I've chosen (3) here. Options (1) and (2) would work, but are non-trivial to implement. Option (4) is more expensive, and I'm not sure how complicated it is (shelling out for the actual rev-list part is easy, but we do then parse the resulting commits internally, and I'm not clear which parts need to be handling shallow-ness). The new test in t5500 triggers this segfault, but see the comments there for how horribly intimate it has to be with how both upload-pack and commit graphs work. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-09Merge branch 'jk/oidhash'Junio C Hamano
Code clean-up to remove hardcoded SHA-1 hash from many places. * jk/oidhash: hashmap: convert sha1hash() to oidhash() hash.h: move object_id definition from cache.h khash: rename oid helper functions khash: drop sha1-specific map types pack-bitmap: convert khash_sha1 maps into kh_oid_map delta-islands: convert island_marks khash to use oids khash: rename kh_oid_t to kh_oid_set khash: drop broken oid_map typedef object: convert create_object() to use object_id object: convert internal hash_obj() to object_id object: convert lookup_object() to use object_id object: convert lookup_unknown_object() to use object_id pack-objects: convert locate_object_entry_hash() to object_id pack-objects: convert packlist_find() to use object_id pack-bitmap-write: convert some helpers to use object_id upload-pack: rename a "sha1" variable to "oid" describe: fix accidental oid/hash type-punning
2019-07-09Merge branch 'ds/close-object-store'Junio C Hamano
The commit-graph file is now part of the "files that the runtime may keep open file descriptors on, all of which would need to be closed when done with the object store", and the file descriptor to an existing commit-graph file now is closed before "gc" finalizes a new instance to replace it. * ds/close-object-store: packfile: rename close_all_packs to close_object_store packfile: close commit-graph in close_all_packs commit-graph: use raw_object_store when closing
2019-06-28list-objects-filter-options: allow mult. --filterMatthew DeVore
Allow combining of multiple filters by simply repeating the --filter flag. Before this patch, the user had to combine them in a single flag somewhat awkwardly (e.g. --filter=combine:FOO+BAR), including URL-encoding the individual filters. To make this work, in the --filter flag parsing callback, rather than error out when we detect that the filter_options struct is already populated, we modify it in-place to contain the added sub-filter. The existing sub-filter becomes the lhs of the combined filter, and the next sub-filter becomes the rhs. We also have to URL-encode the LHS and RHS sub-filters. We can simplify the operation if the LHS is already a combine: filter. In that case, we just append the URL-encoded RHS sub-filter to the LHS spec to get the new spec. Helped-by: Emily Shaffer <emilyshaffer@google.com> Helped-by: Jeff Hostetler <git@jeffhostetler.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28list-objects-filter-options: make filter_spec a string_listMatthew DeVore
Make the filter_spec string a string_list rather than a raw C string. The list of strings must be concatted together to make a complete filter_spec. A future patch will use this capability to build "combine:" filter specs gradually. A strbuf would seem to be a more natural choice for this object, but it unfortunately requires initialization besides just zero'ing out the memory. This results in all container structs, and all containers of those structs, etc., to also require initialization. Initializing them all would be more cumbersome that simply using a string_list, which behaves properly when its contents are zero'd. For the purposes of code simplification, change behavior in how filter specs are conveyed over the protocol: do not normalize the tree:<depth> filter specs since there should be no server in existence that supports tree:# but not tree:#k etc. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20object: convert lookup_object() to use object_idJeff King
There are no callers left of lookup_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20object: convert lookup_unknown_object() to use object_idJeff King
There are no callers left of lookup_unknown_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20upload-pack: rename a "sha1" variable to "oid"Jeff King
This variable is a "struct object_id", but uses the old-style name "sha1". Let's call it oid to match more modern code (and make it clear that it can handle any algorithm, since it uses parse_oid_hex() properly). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-17Merge branch 'jk/HEAD-symref-in-xfer-namespaces'Junio C Hamano
The server side support for "git fetch" used to show incorrect value for the HEAD symbolic ref when the namespace feature is in use, which has been corrected. * jk/HEAD-symref-in-xfer-namespaces: upload-pack: strip namespace from symref data
2019-06-12commit-graph: use raw_object_store when closingDerrick Stolee
The close_commit_graph() method took a repository struct, but then only uses the raw_object_store within. Change the function prototype to make the method more flexible. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28upload-pack: strip namespace from symref dataJeff King
Since 7171d8c15f (upload-pack: send symbolic ref information as capability, 2013-09-17), we've sent cloning and fetching clients special information about which branch HEAD is pointing to, so that they don't have to guess based on matching up commit ids. However, this feature has never worked properly with the GIT_NAMESPACE feature. Because upload-pack uses head_ref_namespaced(find_symref), we do find and report on refs/namespaces/foo/HEAD instead of the actual HEAD of the repo. This makes sense, since the branch pointed to by the top-level HEAD may not be advertised at all. But we do two things wrong: 1. We report the full name refs/namespaces/foo/HEAD, instead of just HEAD. Meaning no client is going to bother doing anything with that symref, since we're not otherwise advertising it. 2. We report the symref destination using its full name (e.g., refs/namespaces/foo/refs/heads/master). That's similarly useless to the client, who only saw "refs/heads/master" in the advertisement. We should be stripping the namespace prefix off of both places (which this patch fixes). Likely nobody noticed because we tend to do the right thing anyway. Bug (1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD). And so the client half of the code, from a45b5f0552 (connect: annotate refs with their symref information in get_remote_head(), 2013-09-17), does not annotate HEAD, and we use the fallback in guess_remote_head(), matching refs by object id. Which is usually right. It only falls down in ambiguous cases, like the one laid out in the included test. This also means that we don't have to worry about breaking anybody who was putting pre-stripped names into their namespace symrefs when we fix bug (2). Because of bug (1), nobody would have been using the symref we advertised in the first place (not to mention that those symrefs would have appeared broken for any non-namespaced access). Note that we have separate fixes here for the v0 and v2 protocols. The symref advertisement moved in v2 to be a part of the ls-refs command. This actually gets part (1) right, since the symref annotation piggy-backs on the existing ref advertisement, which is properly stripped. But it still needs a fix for part (2). The included tests cover both protocols. Reported-by: Bryan Turner <bturner@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19Merge branch 'en/unicode-in-refnames'Junio C Hamano
On a filesystem like HFS+, the names of the refs stored as filesystem entities may become different from what the end-user expects, just like files in the working tree get "renamed". Work around the mismatch by paying attention to the core.precomposeUnicode configuration. * en/unicode-in-refnames: Honor core.precomposeUnicode in more places
2019-05-08Merge branch 'nd/sha1-name-c-wo-the-repository'Junio C Hamano
Further code clean-up to allow the lowest level of name-to-object mapping layer to work with a passed-in repository other than the default one. * nd/sha1-name-c-wo-the-repository: (34 commits) sha1-name.c: remove the_repo from get_oid_mb() sha1-name.c: remove the_repo from other get_oid_* sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name submodule-config.c: use repo_get_oid for reading .gitmodules sha1-name.c: add repo_get_oid() sha1-name.c: remove the_repo from get_oid_with_context_1() sha1-name.c: remove the_repo from resolve_relative_path() sha1-name.c: remove the_repo from diagnose_invalid_index_path() sha1-name.c: remove the_repo from handle_one_ref() sha1-name.c: remove the_repo from get_oid_1() sha1-name.c: remove the_repo from get_oid_basic() sha1-name.c: remove the_repo from get_describe_name() sha1-name.c: remove the_repo from get_oid_oneline() sha1-name.c: add repo_interpret_branch_name() sha1-name.c: remove the_repo from interpret_branch_mark() sha1-name.c: remove the_repo from interpret_nth_prior_checkout() sha1-name.c: remove the_repo from get_short_oid() sha1-name.c: add repo_for_each_abbrev() sha1-name.c: store and use repo in struct disambiguate_state sha1-name.c: add repo_find_unique_abbrev_r() ...
2019-04-26Honor core.precomposeUnicode in more placesElijah Newren
On Mac's HFS where git sets core.precomposeUnicode to true automatically by git init/clone, when a user creates a simple unicode refname (in NFC format) such as españa: $ git branch españa different commands would display the branch name differently. For example, git branch, git log --decorate, and git fast-export all used 65 73 70 61 c3 b1 61 (or "espa\xc3\xb1a") (NFC form) while show-ref would use 65 73 70 61 6e cc 83 61 (or "espan\xcc\x83a") (NFD form). A stress test for git filter-repo was tripped up by this inconsistency, though digging in I found that the problems could compound; for example, if the user ran $ git pack-refs --all and then tried to check out the branch, they would be met with: $ git checkout españa error: pathspec 'españa' did not match any file(s) known to git $ git checkout españa -- fatal: invalid reference: españa $ git branch españa * master Note that the user could run the `git branch` command first and copy and paste the `españa` portion of the output and still see the same two errors. Also, if the user added --no-prune to the pack-refs command, then they would see three branches: master, españa, and españa (those last two are NFC vs. NFD forms, even if they render the same). Further, if the user had the `españa` branch checked out before running `git pack-refs --all`, the user would be greeted with (note that I'm trimming trailing output with an ellipsis): $ git rev-parse HEAD fatal: ambiguous argument 'HEAD': unknown revision or path... $ git status On branch españa No commits yet... Or worse, if the user didn't check this stuff first, running `git commit` will create a new commit with all changes of all of history being squashed into it. In addition to pack-refs, one could also get into this state with upload-pack or anything that calls either pack-refs or upload-pack (e.g. gc or clone). Add code in a few places (pack-refs, show-ref, upload-pack) to check and honor the setting of core.precomposeUnicode to avoid these bugs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-15upload-pack: send ERR packet for non-tip objectsJeff King
Commit bdb31eada7 (upload-pack: report "not our ref" to client, 2017-02-23) catches the case where a client asks for an object we don't have, and issues a message that the client can show to the user (in addition to dying and writing to stderr). There's a similar case (with the same message) when the client asks for an object which we _do_ have, but which isn't a ref tip (or isn't reachable, when uploadpack.allowReachableSHA1InWant is true). Let's give that one the same treatment, for the same reason (namely that it's more informative to the client than just hanging up, since they won't see our stderr over some protocols). There are two tests here. We cover it most directly in t5530 by invoking upload-pack, which matches the existing "not our ref" test. But a more end-to-end check is that "git fetch" actually shows the message to the client. We're already checking in t5516 that this case fails, so we can just check stderr there, too. Note that even after we started ignoring SIGPIPE in 8bf4becf0c, this could in theory still be racy as described in that commit (because we die() on write failures before pumping the connection for any ERR packets). In practice this should be OK for this case. The server will not actually check reachability until it has received our whole group of "want" lines. And since we have no objects in the repository, we won't send any "have" lines, meaning we're always waiting to read the server response. Note also that this case cannot happen in the v2 protocol, since it allows any available object to be requested. However, we don't have to take any steps to protect against the upcoming GIT_TEST_PROTOCOL_VERSION in our tests: - the tests in t5516 would already need to be skipped under v2, and that is covered by ab0c5f5096 (tests: always test fetch of unreachable with v0, 2019-02-25) - the tests in t5530 invoke upload-pack directly, which will continue to default to v0. Eventually we may have a test setting which uses v2 even for bare upload-pack calls, but we can't override it here until we know what the setting looks like. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08refs.c: remove the_repo from expand_ref()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-05Merge branch 'jt/fetch-v2-sideband'Junio C Hamano
"git fetch" and "git upload-pack" learned to send all exchange over the sideband channel while talking the v2 protocol. * jt/fetch-v2-sideband: tests: define GIT_TEST_SIDEBAND_ALL {fetch,upload}-pack: sideband v2 fetch response sideband: reverse its dependency on pkt-line pkt-line: introduce struct packet_writer pack-protocol.txt: accept error packets in any context Use packet_reader instead of packet_read_line
2019-02-05Merge branch 'js/filter-options-should-use-plain-int'Junio C Hamano
Update the protocol message specification to allow only the limited use of scaled quantities. This is ensure potential compatibility issues will not go out of hand. * js/filter-options-should-use-plain-int: filter-options: expand scaled numbers tree:<depth>: skip some trees even when collecting omits list-objects-filter: teach tree:# how to handle >0
2019-01-17tests: define GIT_TEST_SIDEBAND_ALLJonathan Tan
Define a GIT_TEST_SIDEBAND_ALL environment variable meant to be used from tests. When set to true, this overrides uploadpack.allowsidebandall to true, allowing the entire test suite to be run as if this configuration is in place for all repositories. As of this patch, all tests pass whether GIT_TEST_SIDEBAND_ALL is unset or set to 1. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17{fetch,upload}-pack: sideband v2 fetch responseJonathan Tan
Currently, a response to a fetch request has sideband support only while the packfile is being sent, meaning that the server cannot send notices until the start of the packfile. Extend sideband support in protocol v2 fetch responses to the whole response. upload-pack will advertise it if the uploadpack.allowsidebandall configuration variable is set, and fetch-pack will automatically request it if advertised. If the sideband is to be used throughout the whole response, upload-pack will use it to send errors instead of prefixing a PKT-LINE payload with "ERR ". This will be tested in a subsequent patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15filter-options: expand scaled numbersJosh Steadmon
When communicating with a remote server or a subprocess, use expanded numbers rather than numbers with scaling suffix in the object filter spec (e.g. "limit:blob=1k" becomes "limit:blob=1024"). Update the protocol docs to note that clients should always perform this expansion, to allow for more compatibility between server implementations. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15pkt-line: introduce struct packet_writerJonathan Tan
A future patch will allow the client to request multiplexing of the entire fetch response (and not only during packfile transmission), which in turn allows the server to send progress and keepalive messages at any time during the response. It will be convenient for a future patch if writing options (specifically, whether the written data is to be multiplexed) could be controlled from a single place, so create struct packet_writer to serve as that place, and modify upload-pack to use it. Currently, it only stores the output fd, but a subsequent patch will (as described above) introduce an option to determine if the written data is to be multiplexed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10upload-pack: teach deepen-relative in protocol v2Jonathan Tan
Commit 685fbd3291 ("fetch-pack: perform a fetch using v2", 2018-03-15) attempted to teach Git deepen-relative in protocol v2 (among other things), but it didn't work: (1) fetch-pack.c needs to emit "deepen-relative". (2) upload-pack.c needs to ensure that the correct deepen_relative variable is passed to deepen() (there are two - the static variable and the one in struct upload_pack_data). (3) Before deepen() computes the list of reachable shallows, it first needs to mark all "our" refs as OUR_REF. v2 currently does not do this, because unlike v0, it is not needed otherwise. Fix all this and include a test demonstrating that it works now. For (2), the static variable deepen_relative is also eliminated, removing a source of confusion. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02pack-protocol.txt: accept error packets in any contextMasaya Suzuki
In the Git pack protocol definition, an error packet may appear only in a certain context. However, servers can face a runtime error (e.g. I/O error) at an arbitrary timing. This patch changes the protocol to allow an error packet to be sent instead of any packet. Without this protocol spec change, when a server cannot process a request, there's no way to tell that to a client. Since the server cannot produce a valid response, it would be forced to cut a connection without telling why. With this protocol spec change, the server can be more gentle in this situation. An old client may see these error packets as an unexpected packet, but this is not worse than having an unexpected EOF. Following this protocol spec change, the error packet handling code is moved to pkt-line.c. Implementation wise, this implementation uses pkt-line to communicate with a subprocess. Since this is not a part of Git protocol, it's possible that a packet that is not supposed to be an error packet is mistakenly parsed as an error packet. This error packet handling is enabled only for the Git pack protocol parsing code considering this. Signed-off-by: Masaya Suzuki <masayasuzuki@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02Use packet_reader instead of packet_read_lineMasaya Suzuki
By using and sharing a packet_reader while handling a Git pack protocol request, the same reader option is used throughout the code. This makes it easy to set a reader option to the request parsing code. Signed-off-by: Masaya Suzuki <masayasuzuki@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-06Merge branch 'jt/upload-pack-v2-fix-shallow'Junio C Hamano
"git fetch" over protocol v2 into a shallow repository failed to fetch full history behind a new tip of history that was diverged before the cut-off point of the history that was previously fetched shallowly. * jt/upload-pack-v2-fix-shallow: upload-pack: clear flags before each v2 request upload-pack: make want_obj not global upload-pack: make have_obj not global
2018-10-30Merge branch 'jk/uploadpack-packobjectshook-fix'Junio C Hamano
Code clean-up that results in a small bugfix. * jk/uploadpack-packobjectshook-fix: upload-pack: fix broken if/else chain in config callback
2018-10-30Merge branch 'bc/hash-transition-part-15'Junio C Hamano
More codepaths are moving away from hardcoded hash sizes. * bc/hash-transition-part-15: rerere: convert to use the_hash_algo submodule: make zero-oid comparison hash function agnostic apply: rename new_sha1_prefix and old_sha1_prefix apply: replace hard-coded constants tag: express constant in terms of the_hash_algo transport: use parse_oid_hex instead of a constant upload-pack: express constants in terms of the_hash_algo refs/packed-backend: express constants using the_hash_algo packfile: express constants in terms of the_hash_algo pack-revindex: express constants in terms of the_hash_algo builtin/fetch-pack: remove constants with parse_oid_hex builtin/mktree: remove hard-coded constant builtin/repack: replace hard-coded constants pack-bitmap-write: use GIT_MAX_RAWSZ for allocation object_id.cocci: match only expressions of type 'struct object_id'
2018-10-26upload-pack: fix broken if/else chain in config callbackJeff King
The upload_pack_config() callback uses an if/else chain like: if (!strcmp(var, "a")) ... else if (!strcmp(var, "b")) ... etc This works as long as the conditions are mutually exclusive, but one of them is not. 20b20a22f8 (upload-pack: provide a hook for running pack-objects, 2016-05-18) added: else if (current_config_scope() != CONFIG_SCOPE_REPO) { ... check some more options ... } That was fine in that commit, because it came at the end of the chain. But later, 10ac85c785 (upload-pack: add object filtering for partial clone, 2017-12-08) did this: else if (current_config_scope() != CONFIG_SCOPE_REPO) { ... check some more options ... } else if (!strcmp("uploadpack.allowfilter", var)) ... We'd always check the scope condition first, meaning we'd _only_ respect allowfilter when it's in the repo config. You can see this with: git -c uploadpack.allowfilter=true upload-pack . | head -1 which will not advertise the filter capability (but will after this patch). We never noticed because: - our tests always set it in the repo config - in protocol v2, we use a different code path that actually calls repo_config_get_bool() separately, so that _does_ work. Real-world people experimenting with this may be using v2. The more recent uploadpack.allowrefinwant option is in the same boat. There are a few possible fixes: 1. Bump the scope conditional back to the bottom of the chain. But that just means somebody else is likely to make the same mistake later. 2. Make the conditional more like the others. I.e.: else if (!current_config_scope() != CONFIG_SCOPE_REPO && !strcmp(var, "uploadpack.notallowedinrepo")) This works, but the idea of the original structure was that we may grow multiple sensitive options like this. 3. Pull it out of the chain entirely. The chain mostly serves to avoid extra strcmp() calls after we've found a match. But it's not worth caring about those. In the worst case, when there isn't a match, we're already hitting every strcmp (and this happens regularly for stuff like "core.bare", etc). This patch does (3). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19upload-pack: clear flags before each v2 requestJonathan Tan
Suppose a server has the following commit graph: A B \ / O We create a client by cloning A from the server with depth 1, and add many commits to it (so that future fetches span multiple requests due to lengthy negotiation). If it then fetches B using protocol v2, the fetch spanning multiple requests, the resulting packfile does not contain O even though the client did report that A is shallow. This is because upload_pack_v2() can be called multiple times while processing the same session. During the 2nd and all subsequent invocations, some object flags remain from the previous invocations. In particular, CLIENT_SHALLOW remains, preventing process_shallow() from adding client-reported shallows to the "shallows" array, and hence pack-objects not knowing about these client-reported shallows. Therefore, teach upload_pack_v2() to clear object flags at the start of each invocation. This has some other results: - THEY_HAVE gates addition of objects to have_obj in process_haves(). Previously in upload_pack_v2(), have_obj needed to be static because once an object is added to have_obj, it is never readded and thus we needed to retain the contents of have_obj between invocations. Now that flags are cleared, this is no longer necessary. This patch does not change the behavior of ok_to_give_up() (THEY_HAVE is still set on each "have") and got_oid() (used only in non-v2)); THEY_HAVE is not used in any other function. - WANTED gates addition of objects to want_obj in parse_want() and parse_want_ref(). It is also used in receive_needs(), but that is only used in non-v2. For the same reasons as THEY_HAVE, want_obj no longer needs to be static in upload_pack_v2(). - CLIENT_SHALLOW is changed as discussed above. Clearing of the other 5 flags does not affect functionality in v2. (Note that in non-v2, upload_pack() is only called once per process, so each invocation starts with blank flags anyway.) - OUR_REF is only used in non-v2. - COMMON_KNOWN is only used as a scratch flag in ok_to_give_up(). - SHALLOW is passed to invocations in deepen() and deepen_by_rev_list(), but upload-pack doesn't use it. - NOT_SHALLOW is used by send_shallow() and send_unshallow(), but invocations of those functions are always preceded by code that sets NOT_SHALLOW on the appropriate objects. - HIDDEN_REF is only used in non-v2. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19upload-pack: make want_obj not globalJonathan Tan
Because upload_pack_v2() can be invoked multiple times in the same process, the static variable want_obj may not be empty when it is invoked. To make further analysis of this situation easier, make the variable local; analysis will be done in a subsequent patch. The new local variable in upload_pack_v2() is static to preserve existing behavior; this is not necessary in upload_pack() because upload_pack() is only invoked once per process. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19upload-pack: make have_obj not globalJonathan Tan
Because upload_pack_v2() can be invoked multiple times in the same process, the static variable have_obj may not be empty when it is invoked. To make further analysis of this situation easier, make the variable local; analysis will be done in a subsequent patch. The new local variable in upload_pack_v2() is static to preserve existing behavior; this is not necessary in upload_pack() because upload_pack() is only invoked once per process. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-16Merge branch 'ds/commit-graph-with-grafts'Junio C Hamano
The recently introduced commit-graph auxiliary data is incompatible with mechanisms such as replace & grafts that "breaks" immutable nature of the object reference relationship. Disable optimizations based on its use (and updating existing commit-graph) when these incompatible features are in use in the repository. * ds/commit-graph-with-grafts: commit-graph: close_commit_graph before shallow walk commit-graph: not compatible with uninitialized repo commit-graph: not compatible with grafts commit-graph: not compatible with replace objects test-repository: properly init repo commit-graph: update design document refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback refs.c: migrate internal ref iteration to pass thru repository argument
2018-10-15upload-pack: express constants in terms of the_hash_algobrian m. carlson
Convert all uses of the GIT_SHA1_HEXSZ to use the_hash_algo so that they are appropriate for any given hash length. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'ds/reachable'Junio C Hamano
The code for computing history reachability has been shuffled, obtained a bunch of new tests to cover them, and then being improved. * ds/reachable: commit-reach: correct accidental #include of C file commit-reach: use can_all_from_reach commit-reach: make can_all_from_reach... linear commit-reach: replace ref_newer logic test-reach: test commit_contains test-reach: test can_all_from_reach_with_flags test-reach: test reduce_heads test-reach: test get_merge_bases_many test-reach: test is_descendant_of test-reach: test in_merge_bases test-reach: create new test tool for ref_newer commit-reach: move can_all_from_reach_with_flags upload-pack: generalize commit date cutoff upload-pack: refactor ok_to_give_up() upload-pack: make reachable() more generic commit-reach: move commit_contains from ref-filter commit-reach: move ref_newer from remote.c commit.h: remove method declarations commit-reach: move walk methods from commit.c
2018-08-21commit-graph: close_commit_graph before shallow walkDerrick Stolee
Call close_commit_graph() when about to start a rev-list walk that includes shallow commits. This is necessary in code paths that "fake" shallow commits for the sake of fetch. Specifically, test 351 in t5500-fetch-pack.sh runs git fetch --shallow-exclude one origin with a file-based transfer. When the "remote" has a commit-graph, we do not prevent the commit-graph from being loaded, but then the commits are intended to be dynamically transferred into shallow commits during get_shallow_commits_by_rev_list(). By closing the commit-graph before this call, we prevent this interaction. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02Merge branch 'sb/object-store-lookup'Junio C Hamano
lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...
2018-07-24Merge branch 'jt/connectivity-check-after-unshallow'Junio C Hamano
"git fetch" failed to correctly validate the set of objects it received when making a shallow history deeper, which has been corrected. * jt/connectivity-check-after-unshallow: fetch-pack: write shallow, then check connectivity fetch-pack: implement ref-in-want fetch-pack: put shallow info in output parameter fetch: refactor to make function args narrower fetch: refactor fetch_refs into two functions fetch: refactor the population of peer ref OIDs upload-pack: test negotiation with changing repository upload-pack: implement ref-in-want test-pkt-line: add unpack-sideband subcommand
2018-07-20commit-reach: make can_all_from_reach... linearDerrick Stolee
The can_all_from_reach_with_flags() algorithm is currently quadratic in the worst case, because it calls the reachable() method for every 'from' without tracking which commits have already been walked or which can already reach a commit in 'to'. Rewrite the algorithm to walk each commit a constant number of times. We also add some optimizations that should work for the main consumer of this method: fetch negotitation (haves/wants). The first step includes using a depth-first-search (DFS) from each 'from' commit, sorted by ascending generation number. We do not walk beyond the minimum generation number or the minimum commit date. This DFS is likely to be faster than the existing reachable() method because we expect previous ref values to be along the first-parent history. If we find a target commit, then we mark everything in the DFS stack as a RESULT. This expands the set of targets for the other 'from' commits. We also mark the visited commits using 'assign_flag' to prevent re- walking the same commits. We still need to clear our flags at the end, which is why we will have a total of three visits to each commit. Performance was measured on the Linux repository using 'test-tool reach can_all_from_reach'. The input included rows seeded by tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as v3.[0-9]*. This mimics a (very large) fetch that says "I have all major v3 releases and want all major v4 releases." The "large" case included X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate tags to the set, which does not greatly increase the number of objects that are considered, but does increase the number of 'from' commits, demonstrating the quadratic nature of the previous code. Small Case: Before: 1.52 s After: 0.26 s Large Case: Before: 3.50 s After: 0.27 s Note how the time increases between the two cases in the two versions. The new code increases relative to the number of commits that need to be walked, but not directly relative to the number of 'from' commits. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20commit-reach: move can_all_from_reach_with_flagsDerrick Stolee
There are several commit walks in the codebase. Group them together into a new commit-reach.c file and corresponding header. After we group these walks into one place, we can reduce duplicate logic by calling equivalent methods. The can_all_from_reach_with_flags method is used in a stateful way by upload-pack.c. The parameters are very flexible, so we will be able to use its commit walking logic for many other callers. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20upload-pack: generalize commit date cutoffDerrick Stolee
The ok_to_give_up() method uses the commit date as a cutoff to avoid walking the entire reachble set of commits. Before moving the reachable() method to commit-reach.c, pull out the dependence on the global constant 'oldest_have' with a 'min_commit_date' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20upload-pack: refactor ok_to_give_up()Derrick Stolee
In anticipation of consolidating all commit reachability algorithms, refactor ok_to_give_up() in order to allow splitting its logic into an external method. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>