path: root/builtin/cat-file.c
AgeCommit message (Collapse)Author
2013-10-23Merge branch 'mg/more-textconv'Junio C Hamano
Make "git grep" and "git show" pay attention to --textconv when dealing with blob objects. * mg/more-textconv: grep: honor --textconv for the case rev:path grep: allow to use textconv filters t7008: demonstrate behavior of grep with textconv cat-file: do not die on --textconv without textconv filters show: honor --textconv for blobs diff_opt: track whether flags have been set explicitly t4030: demonstrate behavior of show with textconv
2013-08-05cat-file: only split on whitespace when %(rest) is usedJeff King
Commit c334b87b (cat-file: split --batch input lines on whitespace, 2013-07-11) taught `cat-file --batch-check` to split input lines on the first whitespace, and stash everything after the first token into the %(rest) output format element. It claimed: Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. But that is not correct. Refs, object sha1s, and various peeling suffixes cannot contain spaces, but some object names can. In particular: 1. Tree paths like "[<tree>]:path with whitespace" 2. Reflog specifications like "@{2 days ago}" 3. Commit searches like "rev^{/grep me}" or ":/grep me" To remain backwards compatible, we cannot split on whitespace by default, hence we will ship 1.8.4 with the commit reverted. Resurrect its attempt but in a weaker form; only do the splitting when "%(rest)" is used in the output format. Since that element did not exist at all before c334b87, old scripts cannot be affected. The existence of object names with spaces does mean that you cannot reliably do: echo ":path with space and other data" | git cat-file --batch-check="%(objectname) %(rest)" as it would split the path and feed only ":path" to get_sha1. But that command is nonsensical. If you wanted to see "and other data" in "%(rest)", git cannot possibly know where the filename ends and the "rest" begins. It might be more robust to have something like "-z" to separate the input elements. But this patch is still a reasonable step before having that. It makes the easy cases easy; people who do not care about %(rest) do not have to consider it, and the %(rest) code handles the spaces and newlines of "rev-list --objects" correctly. Hard cases remain hard but possible (if you might get whitespace in your input, you do not get to use %(rest) and must split and join the output yourself using more flexible tools). And most importantly, it does not preclude us from having different splitting rules later if a "-z" (or similar) option is added. So we can make the hard cases easier later, if we choose to. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-08-02Revert "cat-file: split --batch input lines on whitespace"Junio C Hamano
This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the update assumed that people only used the command to read from "rev-list --objects" output, whose lines begin with a 40-hex object name followed by a whitespace, but it turns out that scripts feed random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which a whitespace has to be kept.
2013-07-12sha1_object_info_extended: make type calculation optionalJeff King
Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info). This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a "cat-file --batch-check" that does not include %(objecttype). This patch adds a "typep" query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error. As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper: 1. The simpler sha1_object_info wrapper continues to always ask for and return the type field. 2. The istream_source function wants to know the type, and so always asks for it. 3. The cat-file batch code asks for the type only when %(objecttype) is part of the format string. On linux.git, the best-of-five for running: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectsize:disk)' on a fully packed repository goes from: real 0m8.680s user 0m8.160s sys 0m0.512s to: real 0m7.205s user 0m6.580s sys 0m0.608s Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-07-12cat-file: disable object/refname ambiguity check for batch modeJeff King
A common use of "cat-file --batch-check" is to feed a list of objects from "rev-list --objects" or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1(). Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by ca91993 (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref. With no patches, this is the time it takes to run: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectname)' <objects on the linux.git repository: real 1m13.494s user 0m25.924s sys 0m47.532s If we revert ca91993, the packed-refs up-to-date check, it gets a little better: real 0m54.697s user 0m21.692s sys 0m32.916s but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert 798c35f, disabling the warning entirely, we get a much more reasonable time: real 0m7.452s user 0m6.836s sys 0m0.608s This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-07-12cat-file: split --batch input lines on whitespaceJeff King
If we get an input line to --batch or --batch-check that looks like "HEAD foo bar", we will currently feed the whole thing to get_sha1(). This means that to use --batch-check with `rev-list --objects`, one must pre-process the input, like: git rev-list --objects HEAD | cut -d' ' -f1 | git cat-file --batch-check Besides being more typing and slightly less efficient to invoke `cut`, the result loses information: we no longer know which path each object was found at. This patch teaches cat-file to split input lines at the first whitespace. Everything to the left of the whitespace is considered an object name, and everything to the right is made available as the %(reset) atom. So you can now do: git rev-list --objects HEAD | git cat-file --batch-check='%(objectsize) %(rest)' to collect object sizes at particular paths. Even if %(rest) is not used, we always do the whitespace split (which means you can simply eliminate the `cut` command from the first example above). This whitespace split is backwards compatible for any reasonable input. Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. The only input hurt is if somebody really expected input of the form "HEAD is a fine-looking ref!" to fail; it will now parse HEAD, and make "is a fine-looking ref!" available as %(rest). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-07-12cat-file: add %(objectsize:disk) format atomJeff King
This atom is just like %(objectsize), except that it shows the on-disk size of the object rather than the object's true size. In other words, it makes the "disk_size" query of sha1_object_info_extended available via the command-line. This can be used for rough attribution of disk usage to particular refs, though see the caveats in the documentation. This patch does not include any tests, as the exact numbers returned are volatile and subject to zlib and packing decisions. We cannot even reliably guarantee that the on-disk size is smaller than the object content (though in general this should be the case for non-trivial objects). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-07-12cat-file: add --batch-check=<format>Jeff King
The `cat-file --batch-check` command can be used to quickly get information about a large number of objects. However, it provides a fixed set of information. This patch adds an optional <format> option to --batch-check to allow a caller to specify which items they are interested in, and in which order to output them. This is not very exciting for now, since we provide the same limited set that you could already get. However, it opens the door to adding new format items in the future without breaking backwards compatibility (or forcing callers to pay the cost to calculate uninteresting items). Since the --batch option shares code with --batch-check, it receives the same feature, though it is less likely to be of interest there. The format atom names are chosen to match their counterparts in for-each-ref. Though we do not (yet) share any code with for-each-ref's formatter, this keeps the interface as consistent as possible, and may help later on if the implementations are unified. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-07-11cat-file: refactor --batch option parsingJeff King
We currently use an int to tell us whether --batch parsing is on, and if so, whether we should print the full object contents. Let's instead factor this into a struct, filled in by callback, which will make further batch-related options easy to add. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-07-11cat-file: teach --batch to stream blob objectsJeff King
The regular "git cat-file -p" and "git cat-file blob" code paths already learned to stream large blobs. Let's do the same here. Note that this means we look up the type and size before making a decision of whether to load the object into memory or stream (just like the "-p" code path does). That can lead to extra work, but it should be dwarfed by the cost of actually accessing the object itself. In my measurements, there was a 1-2% slowdown when using "--batch" on a large number of objects. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-05-10cat-file: do not die on --textconv without textconv filtersMichael J Gruber
When a command is supposed to use textconv filters (by default or with "--textconv") and none are configured then the blob is output without conversion; the only exception to this rule is "cat-file --textconv". Make it behave like the rest of textconv aware commands. Signed-off-by: Michael J Gruber <> Signed-off-by: Junio C Hamano <>
2013-04-17cat-file: print tags raw for "cat-file -p"Jeff King
When "cat-file -p" prints commits, it shows them in their raw format, since git's format is already human-readable. For tags, however, we print the whole thing raw except for one thing: we convert the timestamp on the tagger line into a human-readable date. This dates all the way back to a0f15fa (Pretty-print tagger dates, 2006-03-01). At that time there was no other way to pretty-print a tag. These days, however, neither of those matters much. The normal way to pretty-print a tag is with "git show", which is much more flexible than "cat-file -p". Commit a0f15fa also built "verify-tag --verbose" (and subsequently "tag -v") around the "cat-file -p" output. However, that behavior was lost in commit 62e09ce (Make git tag a builtin, 2007-07-20), and we went back to printing the raw tag contents. Nobody seems to have noticed the bug since then (and it is arguably a saner behavior anyway, as it shows the actual bytes for which we verified the signature). Let's drop the tagger-date formatting for "cat-file -p". It makes us more consistent with cat-file's commit pretty-printer, and as a bonus, we can drop the hand-rolled tag parsing code in cat-file (which happened to behave inconsistently with the tag pretty-printing code elsewhere). This is a change of output format, so it's possible that some callers could considered this a regression. However, the original behavior was arguably a bug (due to the inconsistency with commits), likely nobody was relying on it (even we do not use it ourselves these days), and anyone relying on the "-p" pretty-printer should be able to expect a change in the output format (i.e., while "cat-file" is plumbing, the output format of "-p" was never guaranteed to be stable). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-03-30cat-file: Fix an gcc -Wuninitialized warningRamsay Jones
After commit cbfd5e1c ("drop some obsolete "x = x" compiler warning hacks", 21-03-2013) removed a gcc specific hack, older versions of gcc now issue an "'contents' might be used uninitialized" warning. In order to suppress the warning, we simply initialize the variable to NULL in it's declaration. Signed-off-by: Ramsay Jones <> Signed-off-by: Junio C Hamano <>
2013-03-21drop some obsolete "x = x" compiler warning hacksJeff King
In cases where the setting and access of a variable are protected by the same conditional flag, older versions of gcc would generate a "might be used unitialized" warning. We silence the warning by initializing the variable to itself, a hack that gcc recognizes. Modern versions of gcc are smart enough to get this right, going back to at least version 4.3.5. gcc 4.1 does get it wrong in both cases, but is sufficiently old that we probably don't need to care about it anymore. Signed-off-by: Jeff King <> Reviewed-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2012-09-07Merge branch 'nd/i18n-parseopt-help'Junio C Hamano
A lot of i18n mark-up for the help text from "git <cmd> -h". * nd/i18n-parseopt-help: (66 commits) Use imperative form in help usage to describe an action Reduce translations by using same terminologies i18n: write-tree: mark parseopt strings for translation i18n: verify-tag: mark parseopt strings for translation i18n: verify-pack: mark parseopt strings for translation i18n: update-server-info: mark parseopt strings for translation i18n: update-ref: mark parseopt strings for translation i18n: update-index: mark parseopt strings for translation i18n: tag: mark parseopt strings for translation i18n: symbolic-ref: mark parseopt strings for translation i18n: show-ref: mark parseopt strings for translation i18n: show-branch: mark parseopt strings for translation i18n: shortlog: mark parseopt strings for translation i18n: rm: mark parseopt strings for translation i18n: revert, cherry-pick: mark parseopt strings for translation i18n: rev-parse: mark parseopt strings for translation i18n: reset: mark parseopt strings for translation i18n: rerere: mark parseopt strings for translation i18n: status: mark parseopt strings for translation i18n: replace: mark parseopt strings for translation ...
2012-08-27Merge branch 'jk/maint-null-in-trees'Junio C Hamano
We do not want a link to 0{40} object stored anywhere in our objects. * jk/maint-null-in-trees: fsck: detect null sha1 in tree entries do not write null sha1s to on-disk index diff: do not use null sha1 as a sentinel value
2012-08-20i18n: cat-file: mark parseopt strings for translationNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2012-07-29diff: do not use null sha1 as a sentinel valueJeff King
The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2012-07-22Merge branch 'jc/sha1-name-more'Junio C Hamano
Teaches the object name parser things like a "git describe" output is always a commit object, "A" in "git log A" must be a committish, and "A" and "B" in "git log A...B" both must be committish, etc., to prolong the lifetime of abbreviated object names. * jc/sha1-name-more: (27 commits) t1512: match the "other" object names t1512: ignore whitespaces in wc -l output rev-parse --disambiguate=<prefix> rev-parse: A and B in "rev-parse A..B" refer to committish reset: the command takes committish commit-tree: the command wants a tree and commits apply: --build-fake-ancestor expects blobs sha1_name.c: add support for disambiguating other types revision.c: the "log" family, except for "show", takes committish revision.c: allow handle_revision_arg() to take other flags sha1_name.c: introduce get_sha1_committish() sha1_name.c: teach lookup context to get_sha1_with_context() sha1_name.c: many short names can only be committish sha1_name.c: get_sha1_1() takes lookup flags sha1_name.c: get_describe_name() by definition groks only commits sha1_name.c: teach get_short_sha1() a commit-only option sha1_name.c: allow get_short_sha1() to take other flags get_sha1(): fix error status regression sha1_name.c: restructure disambiguation of short names sha1_name.c: correct misnamed "canonical" and "res" ...
2012-07-09sha1_name.c: teach lookup context to get_sha1_with_context()Junio C Hamano
The function takes user input string and returns the object name (binary SHA-1) with mode bits and path when the object was looked up in a tree. Additionally give hints to help disambiguation of abbreviated object names when the caller knows what it is looking for. Signed-off-by: Junio C Hamano <>
2012-03-07cat-file: use streaming API to print blobsNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2012-02-07drop odd return value semantics from userdiff_configJeff King
When the userdiff_config function was introduced in be58e70 (diff: unify external diff and funcname parsing code, 2008-10-05), it used a return value convention unlike any other config callback. Like other callbacks, it used "-1" to signal error. But it returned "1" to indicate that it found something, and "0" otherwise; other callbacks simply returned "0" to indicate that no error occurred. This distinction was necessary at the time, because the userdiff namespace overlapped slightly with the color configuration namespace. So "" could mean "the 'foo' slot of diff coloring" or "the 'foo' component of the "color" userdiff driver". Because the color-parsing code would die on an unknown color slot, we needed the userdiff code to indicate that it had matched the variable, letting us bypass the color-parsing code entirely. Later, in 8b8e862 (ignore unknown color configuration, 2009-12-12), the color-parsing code learned to silently ignore unknown slots. This means we no longer need to protect userdiff-matched variables from reaching the color-parsing code. We can therefore change the userdiff_config calling convention to a more normal one. This drops some code from each caller, which is nice. But more importantly, it reduces the cognitive load for readers who may wonder why userdiff_config is unlike every other config callback. There's no need to add a new test confirming that this works; t4020 already contains a test that sets diff.color.external. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2011-06-20plug a few coverity-spotted leaksJim Meyering
Signed-off-by: Jim Meyering <> Signed-off-by: Junio C Hamano <>
2010-09-29blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664''Kirill Smelkov
We need to get the correct mode when blame reads the source from the working tree, the index, or trees. This allows us to omit running textconv filters on symbolic links. Signed-off-by: Kirill Smelkov <> Reviewed-by: Matthieu Moy <> Signed-off-by: Junio C Hamano <>
2010-06-27Merge branch 'cp/textconv-cat-file'Junio C Hamano
* cp/textconv-cat-file: git-cat-file.txt: Document --textconv t/t8007: test textconv support for cat-file textconv: support for cat_file sha1_name: add get_sha1_with_context()
2010-06-18textconv: support for cat_fileClément Poulain
Make the textconv_object function public, and add --textconv option to cat-file to perform conversion on blob objects. Using --textconv implies that we are working on a blob. As files drivers need to be initialized, a new config is required in addition to git_default_config. Therefore git_cat_file_config() is introduced Signed-off-by: Clément Poulain <> Signed-off-by: Diane Gasselin <> Signed-off-by: Axel Bonnet <> Signed-off-by: Junio C Hamano <>
2010-05-31Rewrite dynamic structure initializations to runtime assignmentGary V. Vaughan
Unfortunately, there are still plenty of production systems with vendor compilers that choke unless all compound declarations can be determined statically at compile time, for example hpux10.20 (I can provide a comprehensive list of our supported platforms that exhibit this problem if necessary). This patch simply breaks apart any compound declarations with dynamic initialisation expressions, and moves the initialisation until after the last declaration in the same block, in all the places necessary to have the offending compilers accept the code. Signed-off-by: Gary V. Vaughan <> Signed-off-by: Junio C Hamano <>
2010-02-22Move 'builtin-*' into a 'builtin/' subdirectoryLinus Torvalds
This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '*.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editory, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>