summaryrefslogtreecommitdiff
path: root/t/t1006-cat-file.sh
AgeCommit message (Collapse)Author
2021-01-05mktag: use fsck instead of custom verify_tag()Ævar Arnfjörð Bjarmason
Change the validation logic in "mktag" to use fsck's fsck_tag() instead of its own custom parser. Curiously the logic for both dates back to the same commit[1]. Let's unify them so we're not maintaining two sets functions to verify that a tag is OK. The behavior of fsck_tag() and the old "mktag" code being removed here is different in few aspects. I think it makes sense to remove some of those checks, namely: A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag code disallowed values larger than 1400. Yes there's currently no timezone with a greater offset[2], but since we allow any number of non-offical timezones (e.g. +1234) passing this through seems fine. Git also won't break in the future if e.g. French Polynesia decides it needs to outdo the Line Islands when it comes to timezone extravagance. B. fsck allows missing author names such as "tagger <email>", mktag wouldn't, but would allow e.g. "tagger [2 spaces] <email>" (but not "tagger [1 space] <email>"). Now we allow all of these. C. Like B, but "mktag" disallowed spaces in the <email> part, fsck allows it. In some ways fsck_tag() is stricter than "mktag" was, namely: D. fsck disallows zero-padded dates, but mktag didn't care. So e.g. the timestamp "0000000000 +0000" produces an error now. A test in "t1006-cat-file.sh" relied on this, it's been changed to use "hash-object" (without fsck) instead. There was one check I deemed worth keeping by porting it over to fsck_tag(): E. "mktag" did not allow any custom headers, and by extension (as an empty commit is allowed) also forbade an extra stray trailing newline after the headers it knew about. Add a new check in the "ignore" category to fsck and use it. This somewhat abuses the facility added in efaba7cc77f (fsck: optionally ignore specific fsck issues completely, 2015-06-22). This is somewhat of hack, but probably the least invasive change we can make here. The fsck command will shuffle these categories around, e.g. under --strict the "info" becomes a "warn" and "warn" becomes "error". Existing users of fsck's (and others, e.g. index-pack) --strict option rely on this. So we need to put something into a category that'll be ignored by all existing users of the API. Pretending that fsck.extraHeaderEntry=error ("ignore" by default) was set serves to do this for us. 1. ec4465adb38 (Add "tag" objects that can be used to sign other objects., 2005-04-25) 2. https://en.wikipedia.org/wiki/List_of_UTC_time_offsets Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30t: remove test_oid_init in testsbrian m. carlson
Now that we call test_oid_init in the setup for all test scripts, there's no point in calling it individually. Remove all of the places where we've done so to help keep tests tidy. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-16Merge branch 'md/test-cleanup'Junio C Hamano
Various test scripts have been updated for style and also correct handling of exit status of various commands. * md/test-cleanup: tests: order arguments to git-rev-list properly t9109: don't swallow Git errors upstream of pipes tests: don't swallow Git errors upstream of pipes t/*: fix ordering of expected/observed arguments tests: standardize pipe placement Documentation: add shell guidelines t/README: reformat Do, Don't, Keep in mind lists
2018-10-06tests: standardize pipe placementMatthew DeVore
Instead of using a line-continuation and pipe on the second line, take advantage of the shell's implicit line continuation after a pipe character. So for example, instead of some long line \ | next line use some long line | next line And add a blank line before and after the pipe where it aids readability (it usually does). This better matches the coding style documented in Documentation/CodingGuidelines and used in shell scripts elsewhere in the tree. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17t1006: make hash size independentbrian m. carlson
Compute the size of the tree and commit objects we're creating by checking for the size of an object ID and computing the resulting sizes accordingly. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13cat-file: support "unordered" output for --batch-all-objectsJeff King
If you're going to access the contents of every object in a packfile, it's generally much more efficient to do so in pack order, rather than in hash order. That increases the locality of access within the packfile, which in turn is friendlier to the delta base cache, since the packfile puts related deltas next to each other. By contrast, hash order is effectively random, since the sha1 has no discernible relationship to the content. This patch introduces an "--unordered" option to cat-file which iterates over packs in pack-order under the hood. You can see the results when dumping all of the file content: $ time ./git cat-file --batch-all-objects --buffer --batch | wc -c 6883195596 real 0m44.491s user 0m42.902s sys 0m5.230s $ time ./git cat-file --unordered \ --batch-all-objects --buffer --batch | wc -c 6883195596 real 0m6.075s user 0m4.774s sys 0m3.548s Same output, different order, way faster. The same speed-up applies even if you end up accessing the object content in a different process, like: git cat-file --batch-all-objects --buffer --batch-check | grep blob | git cat-file --batch='%(objectname) %(rest)' | wc -c Adding "--unordered" to the first command drops the runtime in git.git from 24s to 3.5s. Side note: there are actually further speedups available for doing it all in-process now. Since we are outputting the object content during the actual pack iteration, we know where to find the object and could skip the extra lookup done by oid_object_info(). This patch stops short of that optimization since the underlying API isn't ready for us to make those sorts of direct requests. So if --unordered is so much better, why not make it the default? Two reasons: 1. We've promised in the documentation that --batch-all-objects outputs in hash order. Since cat-file is plumbing, people may be relying on that default, and we can't change it. 2. It's actually _slower_ for some cases. We have to compute the pack revindex to walk in pack order. And our de-duplication step uses an oidset, rather than a sort-and-dedup, which can end up being more expensive. If we're just accessing the type and size of each object, for example, like: git cat-file --batch-all-objects --buffer --batch-check my best-of-five warm cache timings go from 900ms to 1100ms using --unordered. Though it's possible in a cold-cache or under memory pressure that we could do better, since we'd have better locality within the packfile. And one final question: why is it "--unordered" and not "--pack-order"? The answer is again two-fold: 1. "pack order" isn't a well-defined thing across the whole set of objects. We're hitting loose objects, as well as objects in multiple packs, and the only ordering we're promising is _within_ a single pack. The rest is apparently random. 2. The point here is optimization. So we don't want to promise any particular ordering, but only to say that we will choose an ordering which is likely to be efficient for accessing the object content. That leaves the door open for further changes in the future without having to add another compatibility option. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13t1006: test cat-file --batch-all-objects with duplicatesJeff King
The test for --batch-all-objects in t1006 covers a variety of object storage situations, but one thing it doesn't cover is that we avoid mentioning duplicate objects. We won't have any because running "git repack -ad" will have packed them all and deleted the loose ones. This does work (because we sort and de-dup the output list), but it's good to include it in our test. And doubly so for when we add an unordered mode which has to de-dup in a different way. Note that we cannot just re-create one of the objects, as Git will omit the write of an object that is already present. However, we can create a new pack with one of the objects, which forces the duplication. One alternative would be to just use "git repack -a" instead of "-ad". But then _every_ object would be duplicated as loose and packed, and we might miss a bug that omits packed objects (because we'd show their loose counterparts). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-14t: switch $_z40 to $ZERO_OIDbrian m. carlson
Switch all uses of $_z40 to $ZERO_OID so that they work correctly with larger hashes. This commit was created by using the following sed command to modify all files in the t directory except t/test-lib.sh: sed -i 's/\$_z40/$ZERO_OID/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-27t/helper: merge test-genrandom into test-toolNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11Spelling fixesVille Skyttä
<BAD> <CORRECTED> accidently accidentally commited committed dependancy dependency emtpy empty existance existence explicitely explicitly git-upload-achive git-upload-archive hierachy hierarchy indegee indegree intial initial mulitple multiple non-existant non-existent precendence. precedence. priviledged privileged programatically programmatically psuedo-binary pseudo-binary soemwhere somewhere successfull successful transfering transferring uncommited uncommitted unkown unknown usefull useful writting writing Signed-off-by: Ville Skyttä <ville.skytta@iki.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-26cat-file: sort and de-dup output of --batch-all-objectsJeff King
The sorting we could probably live without, but printing duplicates is just a hassle for the user, who must then de-dup themselves (or risk a wrong answer if they are doing something like counting objects with a particular property). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22cat-file: add --batch-all-objects optionJeff King
It can sometimes be useful to examine all objects in the repository. Normally this is done with "git rev-list --all --objects", but: 1. That shows only reachable objects. You may want to look at all available objects. 2. It's slow. We actually open each object to walk the graph. If your operation is OK with seeing unreachable objects, it's an order of magnitude faster to just enumerate the loose directories and pack indices. You can do this yourself using "ls" and "git show-index", but it's non-obvious. This patch adds an option to "cat-file --batch-check" to operate on all available objects (rather than reading names from stdin). This is based on a proposal by Charles Bailey to provide a separate "git list-all-objects" command. That is more orthogonal, as it splits enumerating the objects from getting information about them. However, in practice you will either: a. Feed the list of objects directly into cat-file anyway, so you can find out information about them. Keeping it in a single process is more efficient. b. Ask the listing process to start telling you more information about the objects, in which case you will reinvent cat-file's batch-check formatter. Adding a cat-file option is simple and efficient. And if you really do want just the object names, you can always do: git cat-file --batch-check='%(objectname)' --batch-all-objects Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01Merge branch 'dt/cat-file-follow-symlinks'Junio C Hamano
"git cat-file --batch(-check)" learned the "--follow-symlinks" option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at Documentation/RelNotes/2.5.0.txt. With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead. * dt/cat-file-follow-symlinks: cat-file: add --follow-symlinks to --batch sha1_name: get_sha1_with_context learns to follow symlinks tree-walk: learn get_tree_entry_follow_symlinks
2015-05-20cat-file: add --follow-symlinks to --batchDavid Turner
This wires the in-repo-symlink following code through to the cat-file builtin. In the event of an out-of-repo link, cat-file will print the link in a new format. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06t1006: add tests for git cat-file --allow-unknown-typeKarthik Nayak
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20t: fix trivial &&-chain breakageJeff King
These are tests which are missing a link in their &&-chain, but during a setup phase. We may fail to notice failure in commands that build the test environment, but these are typically not expected to fail at all (but it's still good to double-check that our test environment is what we expect). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-10Merge branch 'jk/oi-delta-base'Junio C Hamano
Teach "cat-file --batch" to show delta-base object name for a packed object that is represented as a delta. * jk/oi-delta-base: cat-file: provide %(deltabase) batch format sha1_object_info_extended: provide delta base sha1s
2013-12-27Merge branch 'jk/cat-file-regression-fix'Junio C Hamano
"git cat-file --batch=", an admittedly useless command, did not behave very well. * jk/cat-file-regression-fix: cat-file: handle --batch format with missing type/size cat-file: pass expand_data to print_object_or_die
2013-12-26cat-file: provide %(deltabase) batch formatJeff King
It can be useful for debugging or analysis to see which objects are stored as delta bases on top of others. This information is available by running `git verify-pack`, but that is extremely expensive (and is harder than necessary to parse). Instead, let's make it available as a cat-file query format, which makes it fast and simple to get the bases for a subset of the objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12cat-file: handle --batch format with missing type/sizeJeff King
Commit 98e2092 taught cat-file to stream blobs with --batch, which requires that we look up the object type before loading it into memory. As a result, we now print the object header from information in sha1_object_info, and the actual contents from the read_sha1_file. We double-check that the information we printed in the header matches the content we are about to show. Later, commit 93d2a60 allowed custom header lines for --batch, and commit 5b08640 made type lookups optional. As a result, specifying a header line without the type or size means that we will not look up those items at all. This causes our double-checking to erroneously die with an error; we think the type or size has changed, when in fact it was simply left at "0". For the size, we can fix this by only doing the consistency double-check when we have retrieved the size via sha1_object_info. In the case that we have not retrieved the value, that means we also did not print it, so there is nothing for us to check that we are consistent with. We could do the same for the type. However, besides our consistency check, we also care about the type in deciding whether to stream or not. So instead of handling the case where we do not know the type, this patch instead makes sure that we always trigger a type lookup when we are printing, so that even a format without the type will stream as we would in the normal case. Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05Merge branch 'sb/sha1-loose-object-info-check-existence'Junio C Hamano
"git cat-file --batch-check=ok" did not check the existence of the named object. * sb/sha1-loose-object-info-check-existence: sha1_loose_object_info(): do not return success on missing object
2013-11-06sha1_loose_object_info(): do not return success on missing objectJunio C Hamano
Since 052fe5ea (sha1_loose_object_info: make type lookup optional, 2013-07-12), sha1_loose_object_info() returns happily without checking if the object in question exists, which is not what the the caller sha1_object_info_extended() expects; the caller does not even bother checking the existence of the object itself. Noticed-by: Sven Brauch <svenbrauch@googlemail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05cat-file: only split on whitespace when %(rest) is usedJeff King
Commit c334b87b (cat-file: split --batch input lines on whitespace, 2013-07-11) taught `cat-file --batch-check` to split input lines on the first whitespace, and stash everything after the first token into the %(rest) output format element. It claimed: Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. But that is not correct. Refs, object sha1s, and various peeling suffixes cannot contain spaces, but some object names can. In particular: 1. Tree paths like "[<tree>]:path with whitespace" 2. Reflog specifications like "@{2 days ago}" 3. Commit searches like "rev^{/grep me}" or ":/grep me" To remain backwards compatible, we cannot split on whitespace by default, hence we will ship 1.8.4 with the commit reverted. Resurrect its attempt but in a weaker form; only do the splitting when "%(rest)" is used in the output format. Since that element did not exist at all before c334b87, old scripts cannot be affected. The existence of object names with spaces does mean that you cannot reliably do: echo ":path with space and other data" | git cat-file --batch-check="%(objectname) %(rest)" as it would split the path and feed only ":path" to get_sha1. But that command is nonsensical. If you wanted to see "and other data" in "%(rest)", git cannot possibly know where the filename ends and the "rest" begins. It might be more robust to have something like "-z" to separate the input elements. But this patch is still a reasonable step before having that. It makes the easy cases easy; people who do not care about %(rest) do not have to consider it, and the %(rest) code handles the spaces and newlines of "rev-list --objects" correctly. Hard cases remain hard but possible (if you might get whitespace in your input, you do not get to use %(rest) and must split and join the output yourself using more flexible tools). And most importantly, it does not preclude us from having different splitting rules later if a "-z" (or similar) option is added. So we can make the hard cases easier later, if we choose to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-02Revert "cat-file: split --batch input lines on whitespace"Junio C Hamano
This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the update assumed that people only used the command to read from "rev-list --objects" output, whose lines begin with a 40-hex object name followed by a whitespace, but it turns out that scripts feed random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which a whitespace has to be kept.
2013-07-12cat-file: split --batch input lines on whitespaceJeff King
If we get an input line to --batch or --batch-check that looks like "HEAD foo bar", we will currently feed the whole thing to get_sha1(). This means that to use --batch-check with `rev-list --objects`, one must pre-process the input, like: git rev-list --objects HEAD | cut -d' ' -f1 | git cat-file --batch-check Besides being more typing and slightly less efficient to invoke `cut`, the result loses information: we no longer know which path each object was found at. This patch teaches cat-file to split input lines at the first whitespace. Everything to the left of the whitespace is considered an object name, and everything to the right is made available as the %(reset) atom. So you can now do: git rev-list --objects HEAD | git cat-file --batch-check='%(objectsize) %(rest)' to collect object sizes at particular paths. Even if %(rest) is not used, we always do the whitespace split (which means you can simply eliminate the `cut` command from the first example above). This whitespace split is backwards compatible for any reasonable input. Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. The only input hurt is if somebody really expected input of the form "HEAD is a fine-looking ref!" to fail; it will now parse HEAD, and make "is a fine-looking ref!" available as %(rest). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12cat-file: add --batch-check=<format>Jeff King
The `cat-file --batch-check` command can be used to quickly get information about a large number of objects. However, it provides a fixed set of information. This patch adds an optional <format> option to --batch-check to allow a caller to specify which items they are interested in, and in which order to output them. This is not very exciting for now, since we provide the same limited set that you could already get. However, it opens the door to adding new format items in the future without breaking backwards compatibility (or forcing callers to pay the cost to calculate uninteresting items). Since the --batch option shares code with --batch-check, it receives the same feature, though it is less likely to be of interest there. The format atom names are chosen to match their counterparts in for-each-ref. Though we do not (yet) share any code with for-each-ref's formatter, this keeps the interface as consistent as possible, and may help later on if the implementations are unified. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-11t1006: modernize output comparisonsJeff King
In modern tests, we typically put output into a file and compare it with test_cmp. This is nicer than just comparing via "test", and much shorter than comparing via "test" and printing a custom message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17cat-file: print tags raw for "cat-file -p"Jeff King
When "cat-file -p" prints commits, it shows them in their raw format, since git's format is already human-readable. For tags, however, we print the whole thing raw except for one thing: we convert the timestamp on the tagger line into a human-readable date. This dates all the way back to a0f15fa (Pretty-print tagger dates, 2006-03-01). At that time there was no other way to pretty-print a tag. These days, however, neither of those matters much. The normal way to pretty-print a tag is with "git show", which is much more flexible than "cat-file -p". Commit a0f15fa also built "verify-tag --verbose" (and subsequently "tag -v") around the "cat-file -p" output. However, that behavior was lost in commit 62e09ce (Make git tag a builtin, 2007-07-20), and we went back to printing the raw tag contents. Nobody seems to have noticed the bug since then (and it is arguably a saner behavior anyway, as it shows the actual bytes for which we verified the signature). Let's drop the tagger-date formatting for "cat-file -p". It makes us more consistent with cat-file's commit pretty-printer, and as a bonus, we can drop the hand-rolled tag parsing code in cat-file (which happened to behave inconsistently with the tag pretty-printing code elsewhere). This is a change of output format, so it's possible that some callers could considered this a regression. However, the original behavior was arguably a bug (due to the inconsistency with commits), likely nobody was relying on it (even we do not use it ourselves these days), and anyone relying on the "-p" pretty-printer should be able to expect a change in the output format (i.e., while "cat-file" is plumbing, the output format of "-p" was never guaranteed to be stable). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12Correct common spelling mistakes in comments and testsStefano Lattarini
Most of these were found using Lucas De Marchi's codespell tool. Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-09cat-file --batch / --batch-check: do not exit if hashes are missingLea Wiemann
Previously, cat-file --batch / --batch-check would silently exit if it was passed a non-existent SHA1 on stdin. Now it prints "<SHA1> missing" as in all other cases (and as advertised in the documentation). Note that cat-file --batch-check (but not --batch) will still output "error: unable to find <SHA1>" on stderr if a non-existent SHA1 is passed, but this does not affect parsing its stdout. Also, type <= 0 was previously using the potentially uninitialized type variable (relying on it being 0); it is now being initialized. Signed-off-by: Lea Wiemann <LeWiemann@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-09t1006-cat-file.sh: typoLea Wiemann
Previously timestamps were removed unconditionally (though this didn't seem to break this test). Now they are only removed if $no_ts is non-empty. Signed-off-by: Lea Wiemann <LeWiemann@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-23change quoting in test t1006-cat-file.shMichele Ballabio
Signed-off-by: Michele Ballabio <barra_cuda@katamail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-06git-cat-file: Add --batch optionAdam Roben
--batch is similar to --batch-check, except that the contents of each object is also printed. The output's form is: <sha1> SP <type> SP <size> LF <contents> LF Signed-off-by: Adam Roben <aroben@apple.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-06git-cat-file: Add --batch-check optionAdam Roben
This new option allows multiple objects to be specified on stdin. For each object specified, a line of the following form is printed: <sha1> SP <type> SP <size> LF If the object does not exist in the repository, a line of the following form is printed: <object> SP missing LF Signed-off-by: Adam Roben <aroben@apple.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-06Add tests for git cat-fileAdam Roben
Signed-off-by: Adam Roben <aroben@apple.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>