summaryrefslogtreecommitdiff
path: root/fetch-pack.h
AgeCommit message (Collapse)Author
2014-01-17Merge branch 'nd/shallow-clone'Junio C Hamano
Fetching from a shallow-cloned repository used to be forbidden, primarily because the codepaths involved were not carefully vetted and we did not bother supporting such usage. This attempts to allow object transfer out of a shallow-cloned repository in a controlled way (i.e. the receiver become a shallow repository with truncated history). * nd/shallow-clone: (31 commits) t5537: fix incorrect expectation in test case 10 shallow: remove unused code send-pack.c: mark a file-local function static git-clone.txt: remove shallow clone limitations prune: clean .git/shallow after pruning objects clone: use git protocol for cloning shallow repo locally send-pack: support pushing from a shallow clone via http receive-pack: support pushing to a shallow clone via http smart-http: support shallow fetch/clone remote-curl: pass ref SHA-1 to fetch-pack as well send-pack: support pushing to a shallow clone receive-pack: allow pushes that update .git/shallow connected.c: add new variant that runs with --shallow-file add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses receive/send-pack: support pushing from a shallow clone receive-pack: reorder some code in unpack() fetch: add --update-shallow to accept refs that update .git/shallow upload-pack: make sure deepening preserves shallow roots fetch: support fetching from a shallow repository clone: support remote shallow repository ...
2013-12-11fetch: add --update-shallow to accept refs that update .git/shallowNguyễn Thái Ngọc Duy
The same steps are done as in when --update-shallow is not given. The only difference is we now add all shallow commits in "ours" and "theirs" to .git/shallow (aka "step 8"). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-11clone: support remote shallow repositoryNguyễn Thái Ngọc Duy
Cloning from a shallow repository does not follow the "8 steps for new .git/shallow" because if it does we need to get through step 6 for all refs. That means commit walking down to the bottom. Instead the rule to create .git/shallow is simpler and, more importantly, cheap: if a shallow commit is found in the pack, it's probably used (i.e. reachable from some refs), so we add it. Others are dropped. One may notice this method seems flawed by the word "probably". A shallow commit may not be reachable from any refs at all if it's attached to an object island (a group of objects that are not reachable by any refs). If that object island is not complete, a new fetch request may send more objects to connect it to some ref. At that time, because we incorrectly installed the shallow commit in this island, the user will not see anything after that commit (fsck is still ok). This is not desired. Given that object islands are rare (C Git never sends such islands for security reasons) and do not really harm the repository integrity, a tradeoff is made to surprise the user occasionally but work faster everyday. A new option --strict could be added later that follows exactly the 8 steps. "git prune" can also learn to remove dangling objects _and_ the shallow commits that are attached to them from .git/shallow. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-11fetch-pack.h: one statement per bitfield declarationNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-09git fetch-pack: add --diag-urlTorsten Bögershausen
The main purpose is to trace the URL parser called by git_connect() in connect.c The main features of the parser can be listed as this: - parse out host and path for URLs with a scheme (git:// file:// ssh://) - parse host names embedded by [] correctly - extract the port number, if present - separate URLs like "file" (which are local) from URLs like "host:repo" which should use ssh Add the new parameter "--diag-url" to "git fetch-pack", which prints the value for protocol, host and path to stderr and exits. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-08cache.h: move remote/connect API out of itJunio C Hamano
The definition of "struct ref" in "cache.h", a header file so central to the system, always confused me. This structure is not about the local ref used by sha1-name API to name local objects. It is what refspecs are expanded into, after finding out what refs the other side has, to define what refs are updated after object transfer succeeds to what values. It belongs to "remote.h" together with "struct refspec". While we are at it, also move the types and functions related to the Git transport connection to a new header file connect.h Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28clone: open a shortcut for connectivity checkNguyễn Thái Ngọc Duy
In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-07fetch: use struct ref to represent refs to be fetchedJunio C Hamano
Even though "git fetch" has full infrastructure to parse refspecs to be fetched and match them against the list of refs to come up with the final list of refs to be fetched, the list of refs that are requested to be fetched were internally converted to a plain list of strings at the transport layer and then passed to the underlying fetch-pack driver. Stop this conversion and instead pass around an array of refs. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12filter_refs(): delete matched refs from sought listMichael Haggerty
Remove any references that are available from the remote from the sought list (rather than overwriting their names with NUL characters, as previously). Mark matching entries by writing a non-NULL pointer to string_list_item::util during the iteration, then use filter_string_list() later to filter out the entries that have been marked. Document this aspect of fetch_pack() in a comment in the header file. (More documentation is obviously still needed.) Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12Change fetch_pack() and friends to take string_list argumentsMichael Haggerty
Instead of juggling <nr_heads,heads> (sometimes called <nr_match,match>), pass around the list of references to be sought in a single string_list variable called "sought". Future commits will make more use of string_list functionality. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12fetch_pack(): reindent function decl and defnMichael Haggerty
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-02fetch-pack: new --stdin option to read refs from stdinIvan Todoroski
If a remote repo has too many tags (or branches), cloning it over the smart HTTP transport can fail because remote-curl.c puts all the refs from the remote repo on the fetch-pack command line. This can make the command line longer than the global OS command line limit, causing fetch-pack to fail. This is especially a problem on Windows where the command line limit is orders of magnitude shorter than Linux. There are already real repos out there that msysGit cannot clone over smart HTTP due to this problem. Here is an easy way to trigger this problem: git init too-many-refs cd too-many-refs echo bla > bla.txt git add . git commit -m test sha=$(git rev-parse HEAD) tag=$(perl -e 'print "bla" x 30') for i in `seq 50000`; do echo $sha refs/tags/$tag-$i >> .git/packed-refs done Then share this repo over the smart HTTP protocol and try cloning it: $ git clone http://localhost/.../too-many-refs/.git Cloning into 'too-many-refs'... fatal: cannot exec 'fetch-pack': Argument list too long 50k tags is obviously an absurd number, but it is required to demonstrate the problem on Linux because it has a much more generous command line limit. On Windows the clone fails with as little as 500 tags in the above loop, which is getting uncomfortably close to the number of tags you might see in real long lived repos. This is not just theoretical, msysGit is already failing to clone our company repo due to this. It's a large repo converted from CVS, nearly 10 years of history. Four possible solutions were discussed on the Git mailing list (in no particular order): 1) Call fetch-pack multiple times with smaller batches of refs. This was dismissed as inefficient and inelegant. 2) Add option --refs-fd=$n to pass a an fd from where to read the refs. This was rejected because inheriting descriptors other than stdin/stdout/stderr through exec() is apparently problematic on Windows, plus it would require changes to the run-command API to open extra pipes. 3) Add option --refs-from=$tmpfile to pass the refs using a temp file. This was not favored because of the temp file requirement. 4) Add option --stdin to pass the refs on stdin, one per line. In the end this option was chosen as the most efficient and most desirable from scripting perspective. There was however a small complication when using stdin to pass refs to fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for communication with the remote server. If we are going to sneak refs on stdin line by line, it would have to be done very carefully in the presence of --stateless-rpc, because when reading refs line by line we might read ahead too much data into our buffer and eat some of the remote protocol data which is also coming on stdin. One way to solve this would be to refactor get_remote_heads() in fetch-pack.c to accept a residual buffer from our stdin line parsing above, but this function is used in several places so other callers would be burdened by this residual buffer interface even when most of them don't need it. In the end we settled on the following solution: If --stdin is specified without --stateless-rpc, fetch-pack would read the refs from stdin one per line, in a script friendly format. However if --stdin is specified together with --stateless-rpc, fetch-pack would read the refs from stdin in packetized format (pkt-line) with a flush packet terminating the list of refs. This way we can read the exact number of bytes that we need from stdin, and then get_remote_heads() can continue reading from the same fd without losing a single byte of remote protocol data. This way the --stdin option only loses generality and scriptability when used together with --stateless-rpc, which is not easily scriptable anyway because it also uses pkt-line when talking to the remote server. Signed-off-by: Ivan Todoroski <grnch@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16standardize brace placement in struct definitionsJonathan Nieder
In a struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so: struct foo { int bar; char *baz; }; Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-05Smart fetch over HTTP: client sideShawn O. Pearce
The git-remote-curl backend detects if the remote server supports the git-upload-pack service, and if so, runs git-fetch-pack locally in a pipe to generate the want/have commands. The advertisements from the server that were obtained during the discovery are passed into git-fetch-pack before the POST request starts, permitting server capability discovery and enablement. Common objects that are discovered are appended onto the request as have lines and are sent again on the next request. This allows the remote side to reinitialize its in-memory list of common objects during the next request. Because all requests are relatively short, below git-remote-curl's 1 MiB buffer limit, requests will use the standard Content-Length header and be valid HTTP/1.0 POST requests. This makes the fetch client more tolerant of proxy servers which don't support HTTP/1.1 or the chunked transfer encoding. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-05Teach fetch-pack/upload-pack about --include-tagShawn O. Pearce
The new protocol extension "include-tag" allows the client side of the connection (fetch-pack) to request that the server side of the native git protocol (upload-pack / pack-objects) use --include-tag as it prepares the packfile, thus ensuring that an annotated tag object will be included in the resulting packfile if the object it refers to was also included into the packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-05Reduce the number of connects when fetchingDaniel Barkalow
This shares the connection between getting the remote ref list and getting objects in the first batch. (A second connection is still used to follow tags). When we do not fetch objects (i.e. either ls-remote disconnects after getting list of refs, or we decide we are already up-to-date), we clean up the connection properly; otherwise the connection is left open in need of cleaning up to avoid getting an error message from the remote end when ssh is used. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-19Always obtain fetch-pack arguments from struct fetch_pack_argsShawn O. Pearce
Copying the arguments from a fetch_pack_args into static globals within the builtin-fetch-pack module is error-prone and may lead rise to cases where arguments supplied via the struct from the new fetch_pack() API may not be honored by the implementation. Here we reorganize all of the static globals into a single static struct fetch_pack_args instance and use memcpy() to move the data from the caller supplied structure into the globals before we execute our pack fetching implementation. This strategy is more robust to additions and deletions of properties. As keep_pack is a single bit we have also introduced lock_pack to mean not only download and store the packfile via index-pack but also to lock it against repacking by creating a .keep file when the packfile itself is stored. The caller must remove the .keep file when it is safe to do so. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-19Use 'unsigned:1' when we mean boolean optionsShawn O. Pearce
These options are all strictly boolean (true/false). Its easier to document this implicitly by making their storage type a single bit. There is no compelling memory space reduction reason for this change, it just makes the structure definition slightly more readable. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-09-19Remove pack.keep after ref updates in git-fetchShawn O. Pearce
If we are using a native packfile to perform a git-fetch invocation and the received packfile contained more than the configured limits of fetch.unpackLimit/transfer.unpackLimit then index-pack will output a single line saying "keep\t$sha1\n" to stdout. This line needs to be captured and retained so we can delete the corresponding .keep file ("$GIT_DIR/objects/pack/pack-$sha1.keep") once all refs have been safely updated. This trick has long been in use with git-fetch.sh and its lower level helper git-fetch--tool as a way to allow index-pack to save the new packfile before the refs have been updated and yet avoid a race with any concurrently running git-repack process. It was unfortunately lost when git-fetch.sh was converted to pure C and fetch--tool was no longer being invoked. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-19Make fetch-pack a builtin with an internal APIDaniel Barkalow
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>