path: root/builtin/cat-file.c
Age | Commit message | Author
2016-07-28 | Merge branch 'nd/pack-ofs-4gb-limit' | Junio C Hamano
"git pack-objects" and "git index-pack" mostly operate with off_t when talking about the offset of objects in a packfile, but there were a handful of places that used "unsigned long" to hold that value, leading to an unintended truncation. * nd/pack-ofs-4gb-limit: fsck: use streaming interface for large blobs in pack pack-objects: do not truncate result in-pack object size on 32-bit systems index-pack: correct "offset" type in unpack_entry_data() index-pack: report correct bad object offsets even if they are large index-pack: correct "len" type in unpack_data() sha1_file.c: use type off_t* for object_info->disk_sizep pack-objects: pass length to check_pack_crc() without truncation
2016-07-13 | sha1_file.c: use type off_t* for object_info->disk_sizep | Nguyễn Thái Ngọc Duy
This field, filled by sha1_object_info() contains the on-disk size of an object, which could go over 4GB limit of unsigned long on 32-bit systems. Use off_t for it instead and update all callers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
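The truncation this commit guards against can be illustrated with fixed-width types. This is only a sketch: `uint32_t` stands in for a 32-bit `unsigned long`, `int64_t` stands in for a 64-bit `off_t`, and both helper names are hypothetical rather than anything in Git.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit on-disk size in a 32-bit unsigned
 * variable, as code using "unsigned long" would on a 32-bit system.
 * uint32_t stands in for that 32-bit unsigned long, int64_t for off_t.
 */
uint32_t store_in_32bit_ulong(int64_t disk_size)
{
	assert(disk_size >= 0);
	return (uint32_t)disk_size; /* high 32 bits are silently dropped */
}

/* Returns 1 if the value survives the round trip, 0 if it was truncated. */
int fits_in_32bit_ulong(int64_t disk_size)
{
	return (int64_t)store_in_32bit_ulong(disk_size) == disk_size;
}
```

A size just over 4GB comes back as a small number, which is why the field has to be widened rather than merely range-checked.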
2016-05-31 | Merge branch 'jk/cat-file-buffered-batch-all' | Junio C Hamano
"git cat-file --batch-all" has been sped up, by taking advantage of the fact that it does not have to read a list of objects, in two ways. * jk/cat-file-buffered-batch-all: cat-file: default to --buffer when --batch-all-objects is used cat-file: avoid noop calls to sha1_object_info_extended
2016-05-18 | cat-file: default to --buffer when --batch-all-objects is used | Jeff King
Traditionally cat-file's batch-mode does not do any output buffering. The reason is that a caller may have pipes connected to its input and output, and would want to use cat-file interactively, getting output immediately for each input it sends. This may involve a lot of small write() calls, which can be slow. So we introduced --buffer to improve this, but we can't turn it on by default, as it would break the interactive case above. However, when --batch-all-objects is used, we do not read stdin at all. We generate the output ourselves as quickly as possible, and then exit. In this case buffering is a strict win, and it is simply a hassle for the user to have to remember to specify --buffer. This patch makes --buffer the default when --batch-all-objects is used. Specifying "--buffer" manually is still OK, and you can even override it with "--no-buffer" if you're a masochist (or debugging). For some real numbers, running: git cat-file --batch-all-objects --batch-check='%(objectname)' on torvalds/linux goes from: real 0m1.464s user 0m1.208s sys 0m0.252s to: real 0m1.230s user 0m1.172s sys 0m0.056s for a 16% speedup. Suggested-by: Charles Bailey <charles@hashpling.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
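The defaulting rule described above can be sketched with a tri-state option. The struct and function names here are hypothetical stand-ins, not the actual definitions in builtin/cat-file.c; the only assumption is that "not given" is tracked separately from an explicit --buffer or --no-buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the options struct in builtin/cat-file.c. */
struct batch_opts {
	int buffer_output; /* -1: not given, 0: --no-buffer, 1: --buffer */
	int all_objects;   /* 1 if --batch-all-objects was given */
};

/* Resolve the effective buffering mode once option parsing is done. */
int resolve_buffering(const struct batch_opts *opts)
{
	assert(opts != NULL);
	if (opts->buffer_output < 0)
		return opts->all_objects; /* default follows --batch-all-objects */
	return opts->buffer_output;       /* an explicit choice always wins */
}
```

Keeping the "unset" state distinct is what lets --no-buffer still override the new default.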
2016-05-18 | cat-file: avoid noop calls to sha1_object_info_extended | Jeff King
It is not unreasonable to ask cat-file for a batch-check format of simply "%(objectname)". At first glance this seems like a noop (you are generally already feeding the object names on stdin!), but it has a few uses: 1. With --batch-all-objects, you can generate a listing of the sha1s present in the repository, without any input. 2. You do not have to feed sha1s; you can feed arbitrary sha1 expressions and have git resolve them en masse. 3. You can even feed a raw sha1, with the result that git will tell you whether we actually have the object or not. In case 3, the call to sha1_object_info is useful; it tells us whether the object exists or not (technically we could swap this out for has_sha1_file, but the cost is roughly the same). In case 2, the existence check is of debatable value. A mass-resolution might prefer performance to safety (against outputting a value for a corrupted ref, for example). However, the object lookup cost is likely not as noticeable compared to the resolution cost. And since we have provided that safety in the past, the conservative choice is to keep it. In case 1, though, the object lookup is a definite noop; we know about the object because we found it in the object database. There is no new information gained by making the call. This patch detects that case and optimizes out the call. Here are best-of-five timings for linux.git: [before] $ time git cat-file --buffer \ --batch-all-objects \ --batch-check='%(objectname)' real 0m2.117s user 0m2.044s sys 0m0.072s [after] $ time git cat-file --buffer \ --batch-all-objects \ --batch-check='%(objectname)' real 0m1.230s user 0m1.176s sys 0m0.052s There are two implementation details to note here. One is that we detect the noop case by seeing that "struct object_info" does not request any information. But besides object existence, there is one other piece of information which sha1_object_info may fill in: whether the object is cached, loose, or packed. 
We don't currently provide that information in the output, but if we were to do so later, we'd need to take note and disable the optimization in that case. And that leads to the second note. If we were to output that information, a better implementation would be to remember where we saw the object in --batch-all-objects in the first place, and avoid looking it up again by sha1. In fact, we could probably squeeze out some extra performance for less-trivial cases, too, by remembering the pack location where we saw the object, and going directly there to find its information (like type, size, etc). That would in theory make this optimization unnecessary. I didn't pursue that path here for two reasons: 1. It's non-trivial to implement, and has memory implications. Because we sort and de-dup the list of output sha1s, we'd have to record the pack information for each object, too. 2. It doesn't save as much as you might hope. It saves the find_pack_entry() call, but getting the size and type for deltified objects requires walking down the delta chain (for the real type) or reading the delta data header (for the size). These costs tend to dominate the non-trivial cases. By contrast, this optimization is easy and self-contained, and speeds up a real-world case I've used. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
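The noop detection can be sketched as a check that the request struct asks for nothing. The struct below is a hypothetical, trimmed-down stand-in for "struct object_info" and the function names are illustrative. The sketch assumes that each requested item is represented by a non-NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, trimmed-down request struct: each non-NULL pointer asks
 * the lookup to fill in that piece of information.
 */
struct info_request {
	int *typep;
	unsigned long *sizep;
	long long *disk_sizep;
};

/* Returns 1 when nothing was requested, so the lookup can be skipped. */
int request_is_empty(const struct info_request *req)
{
	assert(req != NULL);
	return !req->typep && !req->sizep && !req->disk_sizep;
}

/*
 * The lookup is a no-op only when the object is already known to exist
 * (it came from enumerating the object database) and nothing is requested.
 */
int can_skip_lookup(const struct info_request *req, int from_enumeration)
{
	return from_enumeration && request_is_empty(req);
}
```

Names read from stdin still need the lookup even with an empty request, because the lookup doubles as the existence check in that case.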
2016-01-15 | cat-file: read batch stream with strbuf_getline() | Junio C Hamano
It is possible to prepare a text file with a DOS editor and feed it as a batch command stream to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>
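A file saved by a DOS editor ends each line with CR LF, so a reader that strips only LF would leave a stray CR on every object name. A minimal sketch of the tolerant behaviour follows. The helper name is hypothetical and it works on a plain C string rather than a strbuf.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: strip one trailing LF, and a CR just before it,
 * from a NUL-terminated line in place. Returns the new length.
 */
size_t chomp_lf_or_crlf(char *line)
{
	size_t len;

	assert(line != NULL);
	len = strlen(line);
	if (len > 0 && line[len - 1] == '\n') {
		len--;
		if (len > 0 && line[len - 1] == '\r')
			len--;
	}
	line[len] = '\0';
	return len;
}
```

A CR that is not followed by LF is left alone in this sketch, since it is not part of a line terminator.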
2016-01-15 | strbuf: introduce strbuf_getline_{lf,nul}() | Junio C Hamano
The strbuf_getline() interface allows a byte other than LF or NUL as the line terminator, but this is only because I wrote these codepaths anticipating that there might be a value other than NUL and LF that could be useful when I introduced line_termination a long time ago. No useful caller that uses another value has emerged.

By now, it is clear that the interface is overly broad without a good reason. Many codepaths have a hardcoded preference to read either LF terminated or NUL terminated records from their input, and then call strbuf_getline() with LF or NUL as the third parameter.

This step introduces two thin wrappers around strbuf_getline(), namely, strbuf_getline_lf() and strbuf_getline_nul(), and mechanically rewrites these call sites to call either one of them. The changes contained in this patch are:

 * introduction of these two functions in strbuf.[ch]

 * mechanical conversion of all callers to strbuf_getline() with either '\n' or '\0' as the third parameter to instead call the respective thin wrapper.

After this step, output from "git grep 'strbuf_getline('" would become a lot smaller. An interim goal of this series is to make this an empty set, so that we can have strbuf_getline_crlf() take over the shorter name strbuf_getline().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-16 | usage: do not insist that standard input must come from a file | Junio C Hamano
The synopsis text and the usage string of subcommands that read a list of things from the standard input are often shown like this:

    git gostak [--distim] < <list-of-doshes>

This is problematic in a number of ways:

 * The way to use these commands is more often to feed them the output from another command, not feed them from a file.

 * In manual pages outside Git, commands that operate on the data read from the standard input, e.g. "sort", "grep", "sed", etc., are not described with such a "< redirection-from-file" in their synopsis text. Our doing so introduces inconsistency.

 * We do not insist on where the output should go, by saying

    git gostak [--distim] < <list-of-doshes> > <output>

 * As it is our convention to enclose placeholders inside <brackets>, the redirection operator followed by a placeholder filename becomes very hard to read, both in the documentation and in the help text.

Let's clean them all up, after making sure that the documentation clearly describes the modes that take information from the standard input and what kind of things are expected on the input.

[jc: stole example for fmt-merge-msg from Jonathan]

Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-26 | cat-file: sort and de-dup output of --batch-all-objects | Jeff King
The sorting we could probably live without, but printing duplicates is just a hassle for the user, who must then de-dup themselves (or risk a wrong answer if they are doing something like counting objects with a particular property). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
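Duplicates can arise because the same object may be present in more than one place (for example, both loose and in a pack). One common way to remove them is to sort first and then drop adjacent repeats, sketched below. This is an illustration only: the helper names are hypothetical, and it assumes the names are held as C strings, which is not necessarily how cat-file stores them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int compare_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical helper: sort object names and drop duplicates in place.
 * Returns the number of unique entries left at the front of the array.
 */
size_t sort_and_dedup(const char **names, size_t nr)
{
	size_t i, kept = 0;

	assert(names != NULL || nr == 0);
	if (nr == 0)
		return 0;
	qsort(names, nr, sizeof(*names), compare_names);
	for (i = 0; i < nr; i++) {
		if (kept > 0 && !strcmp(names[kept - 1], names[i]))
			continue; /* same as the last kept entry */
		names[kept++] = names[i];
	}
	return kept;
}
```

Sorting makes the de-dup a single linear pass, which is why the two come as a pair even though only the de-dup is strictly needed.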
2015-06-22 | cat-file: add --batch-all-objects option | Jeff King
It can sometimes be useful to examine all objects in the repository. Normally this is done with "git rev-list --all --objects", but: 1. That shows only reachable objects. You may want to look at all available objects. 2. It's slow. We actually open each object to walk the graph. If your operation is OK with seeing unreachable objects, it's an order of magnitude faster to just enumerate the loose directories and pack indices. You can do this yourself using "ls" and "git show-index", but it's non-obvious. This patch adds an option to "cat-file --batch-check" to operate on all available objects (rather than reading names from stdin). This is based on a proposal by Charles Bailey to provide a separate "git list-all-objects" command. That is more orthogonal, as it splits enumerating the objects from getting information about them. However, in practice you will either: a. Feed the list of objects directly into cat-file anyway, so you can find out information about them. Keeping it in a single process is more efficient. b. Ask the listing process to start telling you more information about the objects, in which case you will reinvent cat-file's batch-check formatter. Adding a cat-file option is simple and efficient. And if you really do want just the object names, you can always do: git cat-file --batch-check='%(objectname)' --batch-all-objects Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 | cat-file: split batch_one_object into two stages | Jeff King
There are really two things going on in this function: 1. We convert the name we got on stdin to a sha1. 2. We look up and print information on the sha1. Let's split out the second half so that we can call it separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 | cat-file: stop returning value from batch_one_object | Jeff King
If batch_one_object returns an error code, we stop reading input. However, it will only do so if we feed it NULL, which cannot happen; we give it the "buf" member of a strbuf, which is always non-NULL. We did originally stop on other errors (like a missing object), but this was changed in 3c076db (cat-file --batch / --batch-check: do not exit if hashes are missing, 2008-06-09). These days we keep going for any per-object error (and print "missing" when necessary). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 | cat-file: add --buffer option | Jeff King
We use a direct write() to output the results of --batch and --batch-check. This is good for processes feeding the input and reading the output interactively, but it introduces measurable overhead if you do not want this feature. For example, on linux.git: $ git rev-list --objects --all | cut -d' ' -f1 >objects $ time git cat-file --batch-check='%(objectsize)' \ <objects >/dev/null real 0m5.440s user 0m5.060s sys 0m0.384s This patch adds an option to use regular stdio buffering: $ time git cat-file --batch-check='%(objectsize)' \ --buffer <objects >/dev/null real 0m4.975s user 0m4.888s sys 0m0.092s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 | cat-file: move batch_options definition to top of file | Jeff King
That way all of the functions can make use of it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 | cat-file: minor style fix in options list | Jeff King
We do not put extra whitespace before the first macro argument. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01 | Merge branch 'dt/cat-file-follow-symlinks' | Junio C Hamano
"git cat-file --batch(-check)" learned the "--follow-symlinks" option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at Documentation/RelNotes/2.5.0.txt. With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead. * dt/cat-file-follow-symlinks: cat-file: add --follow-symlinks to --batch sha1_name: get_sha1_with_context learns to follow symlinks tree-walk: learn get_tree_entry_follow_symlinks
2015-05-20 | cat-file: add --follow-symlinks to --batch | David Turner
This wires the in-repo-symlink following code through to the cat-file builtin. In the event of an out-of-repo link, cat-file will print the link in a new format. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06 | cat-file: teach cat-file a '--allow-unknown-type' option | Karthik Nayak
'git cat-file' throws an error while trying to print the type or size of a broken/corrupt object. This is because these objects are usually of unknown types. Teach git cat-file a '--allow-unknown-type' option where it prints the type or size of a broken/corrupt object without throwing an error. Modify '-t' and '-s' options to call sha1_object_info_extended() directly to support the '--allow-unknown-type' option. Add documentation for 'cat-file --allow-unknown-type'. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> cat-file: add documentation for '--allow-unknown-type' option. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06 | cat-file: make the options mutually exclusive | Karthik Nayak
We only parse the options if 2 or 3 arguments are specified. Update 'struct option options[]' to use OPT_CMDMODE rather than OPT_SET_INT to allow only one mutually exclusive option and avoid the need for checking number of arguments. This was written by Junio C Hamano, tested by me. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-11 | Merge branch 'ah/usage-strings' | Junio C Hamano
* ah/usage-strings:
  standardize usage info string format
2015-01-22 | Merge branch 'ak/cat-file-clean-up' | Junio C Hamano
* ak/cat-file-clean-up:
  cat-file: use "type" and "size" from outer scope
2015-01-14 | standardize usage info string format | Alex Henrie
This patch puts the usage info strings that were not already in docopt-like format into docopt-like format, which will be a little easier for end users and a lot easier for translators. Changes include:

- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-13 | cat-file: use "type" and "size" from outer scope | Alexander Kuleshov
In cat_one_file(), "type" and "size" variables are defined in the function scope, and then two variables of the same name are defined in a block in one of the if/else statement, hiding the definitions in the outer scope. Because the values of the outer variables before the control enters this scope, however, do not have to be preserved, we can remove useless definitions of variables from the inner scope safely without breaking anything. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-10 | cat-file: remove unused includes | Alexander Kuleshov
- "exec_cmd.h" became unnecessary at b931aa5a (Call builtin ls-tree in git-cat-file -p, 2006-05-26), when it changed an earlier code that delegated tree display to "ls-tree" via the run_command() API (hence needing "exec_cmd.h") to call cmd_ls_tree() directly. We should have removed the include in the same commit, but we forgot to do so. - "diff.h" was added at e5fba602 (textconv: support for cat_file, 2010-06-15), together with "userdiff.h", but "userdiff.h" can be included without including "diff.h"; the header was unnecessary from the beginning. - "tag.h" and "tree.h" were necessary since 8e440259 (Use blob_, commit_, tag_, and tree_type throughout., 2006-04-02) to check the type of object by comparing typename with tree_type and tag_type (pointers to extern strings). 21666f1a (convert object type handling from a string to a number, 2007-02-26) made these <type>_type strings unnecessary, and it could have switched to include "object.h", which is necessary to use typename(), but it forgot to do so. Because "tag.h" and "tree.h" include "object.h", it did not need to explicitly include "object.h" in order to start using typename() itself. We do not even have to include "object.h" after removing these two #includes, because "builtin.h" includes "commit.h" which in turn includes "object.h" these days. This happened at 7b9c0a69 (git-commit-tree: make it usable from other builtins, 2008-07-01). Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-07 | use skip_prefix() to avoid more magic numbers | René Scharfe
Continue where ae021d87 (use skip_prefix to avoid magic numbers) left off and use skip_prefix() in more places for determining the lengths of prefix strings to avoid using dependent constants and other indirect methods. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
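The "magic number" pattern being replaced is comparing against a literal prefix with strncmp() and then skipping a hand-counted length such as 8. A helper that returns the remainder removes the dependent constant. The function below is a sketch modeled on that idea, with a hypothetical name. It is not a copy of Git's own skip_prefix().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of a skip_prefix()-style helper: if "str" starts with "prefix",
 * store a pointer to the remainder in *out and return 1; otherwise leave
 * *out untouched and return 0.
 */
int skip_prefix_sketch(const char *str, const char *prefix, const char **out)
{
	size_t len;

	assert(str && prefix && out);
	len = strlen(prefix);
	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}
```

Because the length is derived from the prefix itself, editing the prefix string can no longer leave a stale offset behind.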
2014-03-25 | Merge branch 'jk/warn-on-object-refname-ambiguity' | Junio C Hamano
* jk/warn-on-object-refname-ambiguity:
  rev-list: disable object/refname ambiguity check with --stdin
  cat-file: restore warn_on_object_refname_ambiguity flag
  cat-file: fix a minor memory leak in batch_objects
  cat-file: refactor error handling of batch_objects
2014-03-13 | cat-file: restore warn_on_object_refname_ambiguity flag | Jeff King
Commit 25fba78 turned off the object/refname ambiguity check during "git cat-file --batch" operations. However, this is a global flag, so let's restore it when we are done. This shouldn't make any practical difference, as cat-file exits immediately afterwards, but is good code hygiene and would prevent an unnecessary surprise if somebody starts to call cmd_cat_file later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
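The save-and-restore pattern for a global flag can be sketched as follows. The variable and function names are hypothetical stand-ins, and the per-line work is reduced to observing the flag so that the effect is visible.

```c
#include <assert.h>

/* Hypothetical stand-in for the global ambiguity-check flag. */
int warn_on_ambiguity = 1;

/* Hypothetical per-line work; reports the flag value it observed. */
int process_line(void)
{
	return warn_on_ambiguity;
}

/*
 * Disable the check for the duration of the batch loop, then restore the
 * caller's setting. Returns how many lines ran with the check disabled.
 */
int run_batch(int nr_lines)
{
	int saved = warn_on_ambiguity;
	int i, unchecked = 0;

	assert(nr_lines >= 0);
	warn_on_ambiguity = 0;
	for (i = 0; i < nr_lines; i++)
		if (!process_line())
			unchecked++;
	warn_on_ambiguity = saved;
	return unchecked;
}
```

Restoring the saved value, rather than unconditionally setting the flag back to 1, keeps the function safe for a caller that had already disabled the check.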
2014-01-10 | Merge branch 'jk/oi-delta-base' | Junio C Hamano
Teach "cat-file --batch" to show delta-base object name for a packed object that is represented as a delta.

* jk/oi-delta-base:
  cat-file: provide %(deltabase) batch format
  sha1_object_info_extended: provide delta base sha1s
2014-01-10 | Merge branch 'cc/replace-object-info' | Junio C Hamano
read_sha1_file(), the workhorse that reads the contents of an object given its name, honoured object replacements, but there was no corresponding mechanism in sha1_object_info(), which is used to obtain the metainfo (e.g. type & size) about the object, leading callers to weird inconsistencies.

* cc/replace-object-info:
  replace info: rename 'full' to 'long' and clarify in-code symbols
  Documentation/git-replace: describe --format option
  builtin/replace: unset read_replace_refs
  t6050: add tests for listing with --format
  builtin/replace: teach listing using short, medium or full formats
  sha1_file: perform object replacement in sha1_object_info_extended()
  t6050: show that git cat-file --batch fails with replace objects
  sha1_object_info_extended(): add an "unsigned flags" parameter
  sha1_file.c: add lookup_replace_object_extended() to pass flags
  replace_object: don't check read_replace_refs twice
  rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT
2014-01-07 | cat-file: fix a minor memory leak in batch_objects | Jeff King
We should always have been freeing our strbuf, but doing so consistently was annoying until the refactoring in the previous patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-07 | cat-file: refactor error handling of batch_objects | Jeff King
This just pulls the return value for the function out of the inner loop, so we can break out of the loop rather than do an early return. This will make it easier to put any cleanup for the function in one place. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
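The shape of this refactoring, recording the error and breaking out of the loop so that one cleanup path serves every exit, can be sketched as below. The function is hypothetical: it uses a malloc'd scratch buffer in place of a strbuf and treats an empty line as the error condition purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical batch loop: copy each input line into a scratch buffer and
 * "process" it; an empty line counts as an error. Returns 0 on success and
 * -1 on error. *processed reports how many lines were handled.
 */
int process_lines(const char **lines, size_t nr, size_t *processed)
{
	char *buf = NULL;
	int retval = 0;
	size_t i;

	assert(lines != NULL && processed != NULL);
	*processed = 0;
	for (i = 0; i < nr; i++) {
		size_t len = strlen(lines[i]);
		char *grown = realloc(buf, len + 1);

		if (!grown) {
			retval = -1;
			break;
		}
		buf = grown;
		memcpy(buf, lines[i], len + 1);
		if (!*buf) {
			retval = -1; /* record the error ... */
			break;       /* ... and leave the loop, no early return */
		}
		(*processed)++;
	}
	free(buf); /* single cleanup point for every exit path */
	return retval;
}
```

With an early return inside the loop, the buffer would have to be freed at each return site, which is how a leak of this kind can slip in.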
2013-12-26 | cat-file: provide %(deltabase) batch format | Jeff King
It can be useful for debugging or analysis to see which objects are stored as deltas on top of others. This information is available by running `git verify-pack`, but that is extremely expensive (and is harder than necessary to parse). Instead, let's make it available as a cat-file query format, which makes it fast and simple to get the bases for a subset of the objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 | sha1_object_info_extended(): add an "unsigned flags" parameter | Christian Couder
This parameter is not used yet, but it will be used to tell sha1_object_info_extended() if it should perform object replacement or not. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 | cat-file: handle --batch format with missing type/size | Jeff King
Commit 98e2092 taught cat-file to stream blobs with --batch, which requires that we look up the object type before loading it into memory. As a result, we now print the object header from information in sha1_object_info, and the actual contents from the read_sha1_file. We double-check that the information we printed in the header matches the content we are about to show. Later, commit 93d2a60 allowed custom header lines for --batch, and commit 5b08640 made type lookups optional. As a result, specifying a header line without the type or size means that we will not look up those items at all. This causes our double-checking to erroneously die with an error; we think the type or size has changed, when in fact it was simply left at "0". For the size, we can fix this by only doing the consistency double-check when we have retrieved the size via sha1_object_info. In the case that we have not retrieved the value, that means we also did not print it, so there is nothing for us to check that we are consistent with. We could do the same for the type. However, besides our consistency check, we also care about the type in deciding whether to stream or not. So instead of handling the case where we do not know the type, this patch instead makes sure that we always trigger a type lookup when we are printing, so that even a format without the type will stream as we would in the normal case. Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 | cat-file: pass expand_data to print_object_or_die | Jeff King
We currently individually pass the sha1, type, and size fields calculated by sha1_object_info. However, if we pass the whole struct, the called function can make more intelligent decisions about which fields were actually filled by sha1_object_info. This patch takes that first refactoring step, passing the whole struct, so further patches can make those decisions with less noise in their diffs. There should be no functional change to this patch (aside from a minor typo fix in the error message). As a side effect, we can rename the local variables in the function to "type" and "size", since the names are no longer taken. Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-23 | Merge branch 'mg/more-textconv' | Junio C Hamano
Make "git grep" and "git show" pay attention to --textconv when dealing with blob objects.

* mg/more-textconv:
  grep: honor --textconv for the case rev:path
  grep: allow to use textconv filters
  t7008: demonstrate behavior of grep with textconv
  cat-file: do not die on --textconv without textconv filters
  show: honor --textconv for blobs
  diff_opt: track whether flags have been set explicitly
  t4030: demonstrate behavior of show with textconv
2013-08-05 | cat-file: only split on whitespace when %(rest) is used | Jeff King
Commit c334b87b (cat-file: split --batch input lines on whitespace, 2013-07-11) taught `cat-file --batch-check` to split input lines on the first whitespace, and stash everything after the first token into the %(rest) output format element. It claimed: Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. But that is not correct. Refs, object sha1s, and various peeling suffixes cannot contain spaces, but some object names can. In particular: 1. Tree paths like "[<tree>]:path with whitespace" 2. Reflog specifications like "@{2 days ago}" 3. Commit searches like "rev^{/grep me}" or ":/grep me" To remain backwards compatible, we cannot split on whitespace by default, hence we will ship 1.8.4 with the commit reverted. Resurrect its attempt but in a weaker form; only do the splitting when "%(rest)" is used in the output format. Since that element did not exist at all before c334b87, old scripts cannot be affected. The existence of object names with spaces does mean that you cannot reliably do: echo ":path with space and other data" | git cat-file --batch-check="%(objectname) %(rest)" as it would split the path and feed only ":path" to get_sha1. But that command is nonsensical. If you wanted to see "and other data" in "%(rest)", git cannot possibly know where the filename ends and the "rest" begins. It might be more robust to have something like "-z" to separate the input elements. But this patch is still a reasonable step before having that. It makes the easy cases easy; people who do not care about %(rest) do not have to consider it, and the %(rest) code handles the spaces and newlines of "rev-list --objects" correctly. Hard cases remain hard but possible (if you might get whitespace in your input, you do not get to use %(rest) and must split and join the output yourself using more flexible tools). 
And most importantly, it does not preclude us from having different splitting rules later if a "-z" (or similar) option is added. So we can make the hard cases easier later, if we choose to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
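The conditional split can be sketched as a helper that cuts the line only when the caller has seen "%(rest)" in the format. The function name and the flag parameter are hypothetical. The sketch assumes the separator is a run of spaces or tabs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: when split_on_whitespace is set, cut "line" at the
 * first run of spaces or tabs and return a pointer to what follows it;
 * otherwise leave the line intact and return NULL.
 */
char *split_rest(char *line, int split_on_whitespace)
{
	char *p;

	assert(line != NULL);
	if (!split_on_whitespace)
		return NULL;
	p = strpbrk(line, " \t");
	if (!p)
		return NULL;
	while (*p == ' ' || *p == '\t')
		*p++ = '\0';
	return p;
}
```

With the flag off, a name such as "HEAD:path with space" reaches the name resolver unmodified, which is the backwards-compatible behaviour the message argues for.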
2013-08-02 | Revert "cat-file: split --batch input lines on whitespace" | Junio C Hamano
This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the update assumed that people only used the command to read from "rev-list --objects" output, whose lines begin with a 40-hex object name followed by a whitespace, but it turns out that scripts feed random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which a whitespace has to be kept.
2013-07-12 | sha1_object_info_extended: make type calculation optional | Jeff King
Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info). This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a "cat-file --batch-check" that does not include %(objecttype). This patch adds a "typep" query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error. As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper: 1. The simpler sha1_object_info wrapper continues to always ask for and return the type field. 2. The istream_source function wants to know the type, and so always asks for it. 3. The cat-file batch code asks for the type only when %(objecttype) is part of the format string. On linux.git, the best-of-five for running: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectsize:disk)' on a fully packed repository goes from: real 0m8.680s user 0m8.160s sys 0m0.512s to: real 0m7.205s user 0m6.580s sys 0m0.608s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 | cat-file: disable object/refname ambiguity check for batch mode | Jeff King
A common use of "cat-file --batch-check" is to feed a list of objects from "rev-list --objects" or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1(). Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by ca91993 (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref. With no patches, this is the time it takes to run: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectname)' <objects on the linux.git repository: real 1m13.494s user 0m25.924s sys 0m47.532s If we revert ca91993, the packed-refs up-to-date check, it gets a little better: real 0m54.697s user 0m21.692s sys 0m32.916s but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert 798c35f, disabling the warning entirely, we get a much more reasonable time: real 0m7.452s user 0m6.836s sys 0m0.608s This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12cat-file: split --batch input lines on whitespaceJeff King
If we get an input line to --batch or --batch-check that looks like "HEAD foo bar", we will currently feed the whole thing to get_sha1(). This means that to use --batch-check with `rev-list --objects`, one must pre-process the input, like:

  git rev-list --objects HEAD |
  cut -d' ' -f1 |
  git cat-file --batch-check

Besides being more typing and slightly less efficient to invoke `cut`, the result loses information: we no longer know which path each object was found at.

This patch teaches cat-file to split input lines at the first whitespace. Everything to the left of the whitespace is considered an object name, and everything to the right is made available as the %(rest) atom. So you can now do:

  git rev-list --objects HEAD |
  git cat-file --batch-check='%(objectsize) %(rest)'

to collect object sizes at particular paths.

Even if %(rest) is not used, we always do the whitespace split (which means you can simply eliminate the `cut` command from the first example above).

This whitespace split is backwards compatible for any reasonable input. Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. The only input hurt is if somebody really expected input of the form "HEAD is a fine-looking ref!" to fail; it will now parse HEAD, and make "is a fine-looking ref!" available as %(rest).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
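The split itself is small. The following is a minimal sketch of splitting a line in place at the first whitespace; the function name is hypothetical, and treating only space and tab as whitespace is an assumption of this sketch rather than something the message above states.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "line" in place at the first run of spaces or tabs.
 * Afterwards "line" holds only the object name. Returns a pointer
 * to the remainder (what %(rest) would expand to), or NULL when
 * the line contains no whitespace.
 */
static char *demo_split_rest(char *line)
{
	char *p;

	assert(line != NULL);
	p = strpbrk(line, " \t");
	if (!p)
		return NULL;
	/* Overwrite the whole whitespace run with NULs, then stop. */
	while (*p && strchr(" \t", *p))
		*p++ = '\0';
	return p;
}
```

With the input "HEAD is a fine-looking ref!", the buffer is cut down to "HEAD" and the returned pointer refers to "is a fine-looking ref!".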
2013-07-12  cat-file: add %(objectsize:disk) format atom  Jeff King
This atom is just like %(objectsize), except that it shows the on-disk size of the object rather than the object's true size. In other words, it makes the "disk_size" query of sha1_object_info_extended available via the command-line. This can be used for rough attribution of disk usage to particular refs, though see the caveats in the documentation. This patch does not include any tests, as the exact numbers returned are volatile and subject to zlib and packing decisions. We cannot even reliably guarantee that the on-disk size is smaller than the object content (though in general this should be the case for non-trivial objects). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12  cat-file: add --batch-check=<format>  Jeff King
The `cat-file --batch-check` command can be used to quickly get information about a large number of objects. However, it provides a fixed set of information. This patch adds an optional <format> option to --batch-check to allow a caller to specify which items they are interested in, and in which order to output them. This is not very exciting for now, since we provide the same limited set that you could already get. However, it opens the door to adding new format items in the future without breaking backwards compatibility (or forcing callers to pay the cost to calculate uninteresting items). Since the --batch option shares code with --batch-check, it receives the same feature, though it is less likely to be of interest there. The format atom names are chosen to match their counterparts in for-each-ref. Though we do not (yet) share any code with for-each-ref's formatter, this keeps the interface as consistent as possible, and may help later on if the implementations are unified. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
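To make the format mechanism concrete, here is a rough sketch of expanding %(atom) placeholders into a fixed-size buffer. It is not the cat-file implementation: the struct, the function names, and the buffer handling are all hypothetical, and only the three atoms objectname, objecttype and objectsize are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one object. */
struct demo_object {
	const char *name;
	const char *type;
	unsigned long size;
};

static int demo_is_atom(const char *want, const char *atom, size_t len)
{
	return strlen(want) == len && !memcmp(want, atom, len);
}

/* Write one atom's value into out; returns bytes written or -1. */
static int demo_atom_value(const char *atom, size_t len,
			   const struct demo_object *obj,
			   char *out, size_t avail)
{
	int n;

	if (demo_is_atom("objectname", atom, len))
		n = snprintf(out, avail, "%s", obj->name);
	else if (demo_is_atom("objecttype", atom, len))
		n = snprintf(out, avail, "%s", obj->type);
	else if (demo_is_atom("objectsize", atom, len))
		n = snprintf(out, avail, "%lu", obj->size);
	else
		return -1; /* unknown atom */
	if (n < 0 || (size_t)n >= avail)
		return -1; /* output would not fit */
	return n;
}

/* Expand fmt into out; returns 0 on success, -1 on any error. */
static int demo_expand_format(const char *fmt, const struct demo_object *obj,
			      char *out, size_t outsz)
{
	size_t used = 0;

	assert(fmt && obj && out && outsz > 0);
	while (*fmt) {
		if (fmt[0] == '%' && fmt[1] == '(') {
			const char *start = fmt + 2;
			const char *end = strchr(start, ')');
			int n;

			if (!end)
				return -1; /* unterminated atom */
			n = demo_atom_value(start, (size_t)(end - start), obj,
					    out + used, outsz - used);
			if (n < 0)
				return -1;
			used += (size_t)n;
			fmt = end + 1;
		} else {
			if (used + 1 >= outsz)
				return -1;
			out[used++] = *fmt++; /* copy literal text */
		}
	}
	out[used] = '\0';
	return 0;
}
```

Because each atom is computed only when it appears in the format string, a caller that omits an atom never pays for it, which is the property the commit message points to when it mentions uninteresting items.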
2013-07-11  cat-file: refactor --batch option parsing  Jeff King
We currently use an int to tell us whether --batch parsing is on, and if so, whether we should print the full object contents. Let's instead factor this into a struct, filled in by callback, which will make further batch-related options easy to add. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-11  cat-file: teach --batch to stream blob objects  Jeff King
The regular "git cat-file -p" and "git cat-file blob" code paths already learned to stream large blobs. Let's do the same here. Note that this means we look up the type and size before making a decision of whether to load the object into memory or stream (just like the "-p" code path does). That can lead to extra work, but it should be dwarfed by the cost of actually accessing the object itself. In my measurements, there was a 1-2% slowdown when using "--batch" on a large number of objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-10  cat-file: do not die on --textconv without textconv filters  Michael J Gruber
When a command is supposed to use textconv filters (by default or with "--textconv") and none are configured then the blob is output without conversion; the only exception to this rule is "cat-file --textconv". Make it behave like the rest of textconv aware commands. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17  cat-file: print tags raw for "cat-file -p"  Jeff King
When "cat-file -p" prints commits, it shows them in their raw format, since git's format is already human-readable. For tags, however, we print the whole thing raw except for one thing: we convert the timestamp on the tagger line into a human-readable date.

This dates all the way back to a0f15fa (Pretty-print tagger dates, 2006-03-01). At that time there was no other way to pretty-print a tag. These days, however, neither of those matters much. The normal way to pretty-print a tag is with "git show", which is much more flexible than "cat-file -p".

Commit a0f15fa also built "verify-tag --verbose" (and subsequently "tag -v") around the "cat-file -p" output. However, that behavior was lost in commit 62e09ce (Make git tag a builtin, 2007-07-20), and we went back to printing the raw tag contents. Nobody seems to have noticed the bug since then (and it is arguably a saner behavior anyway, as it shows the actual bytes for which we verified the signature).

Let's drop the tagger-date formatting for "cat-file -p". It makes us more consistent with cat-file's commit pretty-printer, and as a bonus, we can drop the hand-rolled tag parsing code in cat-file (which happened to behave inconsistently with the tag pretty-printing code elsewhere).

This is a change of output format, so it's possible that some callers could consider this a regression. However, the original behavior was arguably a bug (due to the inconsistency with commits), likely nobody was relying on it (even we do not use it ourselves these days), and anyone relying on the "-p" pretty-printer should be able to expect a change in the output format (i.e., while "cat-file" is plumbing, the output format of "-p" was never guaranteed to be stable).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-30  cat-file: Fix an gcc -Wuninitialized warning  Ramsay Jones
After commit cbfd5e1c ("drop some obsolete "x = x" compiler warning hacks", 21-03-2013) removed a gcc-specific hack, older versions of gcc now issue an "'contents' might be used uninitialized" warning. In order to suppress the warning, we simply initialize the variable to NULL in its declaration.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21  drop some obsolete "x = x" compiler warning hacks  Jeff King
In cases where the setting and access of a variable are protected by the same conditional flag, older versions of gcc would generate a "might be used uninitialized" warning. We silence the warning by initializing the variable to itself, a hack that gcc recognizes.

Modern versions of gcc are smart enough to get this right, going back to at least version 4.3.5. gcc 4.1 does get it wrong in both cases, but is sufficiently old that we probably don't need to care about it anymore.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-07  Merge branch 'nd/i18n-parseopt-help'  Junio C Hamano
A lot of i18n mark-up for the help text from "git <cmd> -h".

* nd/i18n-parseopt-help: (66 commits)
  Use imperative form in help usage to describe an action
  Reduce translations by using same terminologies
  i18n: write-tree: mark parseopt strings for translation
  i18n: verify-tag: mark parseopt strings for translation
  i18n: verify-pack: mark parseopt strings for translation
  i18n: update-server-info: mark parseopt strings for translation
  i18n: update-ref: mark parseopt strings for translation
  i18n: update-index: mark parseopt strings for translation
  i18n: tag: mark parseopt strings for translation
  i18n: symbolic-ref: mark parseopt strings for translation
  i18n: show-ref: mark parseopt strings for translation
  i18n: show-branch: mark parseopt strings for translation
  i18n: shortlog: mark parseopt strings for translation
  i18n: rm: mark parseopt strings for translation
  i18n: revert, cherry-pick: mark parseopt strings for translation
  i18n: rev-parse: mark parseopt strings for translation
  i18n: reset: mark parseopt strings for translation
  i18n: rerere: mark parseopt strings for translation
  i18n: status: mark parseopt strings for translation
  i18n: replace: mark parseopt strings for translation
  ...