path: root/pack-check.c
AgeCommit message (Collapse)Author
2021-06-29csum-file: introduce checksum_valid()Taylor Blau
Introduce a new function which checks the validity of a file's trailing checksum. This is similar to hashfd_check(), but different since it is intended to be used by callers who aren't writing the same data (like `git index-pack --verify`), but who instead want to validate the integrity of data that they are reading. Rewrite the first of two callers which could benefit from this new function in pack-check.c. Subsequent callers will be added in the following patches. Helped-by: Jeff King <> Signed-off-by: Jeff King <> Signed-off-by: Taylor Blau <> Signed-off-by: Junio C Hamano <>
2020-11-16fsck: correctly compute checksums on idx files larger than 4GBJeff King
When checking the trailing checksum hash of a .idx file, we pass the whole buffer (minus the trailing hash) into a single call to the_hash_algo->update_fn(). But we cast it to an "unsigned int". This comes from c4001d92be (Use off_t when we really mean a file offset., 2007-03-06). That commit started storing the index_size variable as an off_t, but our mozilla-sha1 implementation from the time was limited to a smaller size. Presumably the cast was a way of annotating that we expected .idx files to be small, and so we didn't need to loop (as we do for arbitrarily-large .pack files). Though as an aside it was still wrong, because the mozilla function actually took a signed int. These days our hash-update functions are defined to take a size_t, so we can pass the whole buffer in directly. The cast is actually causing a buggy truncation! While we're here, though, let's drop the confusing off_t variable in the first place. We're getting the size not from the filesystem anyway, but from p->index_size, which is a size_t. In fact, we can make the code a bit more readable by dropping our local variable duplicating p->index_size, and instead have one that stores the size of the actual index data, minus the trailing hash. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2020-11-16compute pack .idx byte offsets using size_tJeff King
A pack and its matching .idx file are limited to 2^32 objects, because the pack format contains a 32-bit field to store the number of objects. Hence we use uint32_t in the code. But the byte count of even a .idx file can be much larger than that, because it stores at least a hash and an offset for each object. So using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650 objects. This confuses load_idx(), which computes the minimum size like this: unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz; Even though min_size will be big enough on most 64-bit platforms, the actual arithmetic is done as a uint32_t, resulting in a truncation. We actually exceed that min_size, but then we do: unsigned long max_size = min_size; if (nr) max_size += (nr - 1)*8; to account for the variable-sized table. That computation doesn't overflow quite so low, but with the truncation for min_size, we end up with a max_size that is much smaller than our actual size. So we complain that the idx is invalid, and can't find any of its objects. We can fix this case by casting "nr" to a size_t, which will do the multiplication in 64-bits (assuming you're on a 64-bit platform; this will never work on a 32-bit system since we couldn't map the whole .idx anyway). Likewise, we don't have to worry about further additions, because adding a smaller number to a size_t will convert the other side to a size_t. A few notes: - obviously we could just declare "nr" as a size_t in the first place (and likewise, packed_git.num_objects). But it's conceptually a uint32_t because of the on-disk format, and we correctly treat it that way in other contexts that don't need to compute byte offsets (e.g., iterating over the set of objects should and generally does use a uint32_t). Switching to size_t would make all of those other cases look wrong. - it could be argued that the proper type is off_t to represent the file offset. But in practice the .idx file must fit within memory, because we mmap the whole thing. And the rest of the code (including the idx_size variable we're comparing against) uses size_t. - we'll add the same cast to the max_size arithmetic line. Even though we're adding to a larger type, which will convert our result, the multiplication is still done as a 32-bit value and can itself overflow. I didn't check this with my test case, since it would need an even larger pack (~530M objects), but looking at compiler output shows that it works this way. The standard should agree, but I couldn't find anything explicit in ("usual arithmetic conversions"). The case in load_idx() was the most immediate one that I was able to trigger. After fixing it, looking up actual objects (including the very last one in sha1 order) works in a test repo with 153,725,110 objects. That's because bsearch_hash() works with uint32_t entry indices, and the actual byte access: int cmp = hashcmp(table + mi * stride, sha1); is done with "stride" as a size_t, causing the uint32_t "mi" to be promoted to a size_t. This is the way most code will access the index data. However, I audited all of the other byte-wise accesses of packed_git.index_data, and many of the others are suspect (they are similar to the max_size one, where we are adding to a properly sized offset or directly to a pointer, but the multiplication in the sub-expression can overflow). I didn't trigger any of these in practice, but I believe they're potential problems, and certainly adding in the cast is not going to hurt anything here. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2020-02-24pack-check: push oid lookup into loopJeff King
When we're checking a pack with fsck or verify-pack, we first sort the idx entries by offset, since accessing them in pack order is more efficient. To do so, we loop over them and fill in an array of structs with the offset, object_id, and index position of each, sort the result, and only then do we iterate over the sorted array and process each entry. In order to avoid the memory cost of storing the hash of each object, we just store a pointer into the copy in the mmap'd pack index file. To keep that property even as the rest of the code converted to "struct object_id", commit 9fd750461b (Convert the verify_pack callback to struct object_id, 2017-05-06) introduced a union in order to type-pun the pointer-to-hash into an object_id struct. But we can make this even simpler by observing that the sort operation doesn't need the object id at all! We only need them one at a time while we actually process each entry. So we can just omit the oid from the struct entirely and load it on the fly into a local variable in the second loop. This gets rid of the type-punning, and lets us directly use the more type-safe nth_packed_object_id(), simplifying the code. And as a bonus, it saves 8 bytes of memory per object. Note that this does mean we'll do the offset lookup for each object before the oid lookup. The oid lookup has more safety checks in it (e.g., for looking past p->num_objects) which in theory protected the offset lookup. But since violating those checks was already a BUG() condition (as described in the previous commit), it's not worth worrying about. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2020-02-24pack-check: convert "internal error" die to a BUG()Jeff King
If we fail to load the oid from the index of a packfile, we'll die() with an "internal error". But this should never happen: we'd fail here only if the idx needed to be lazily opened (but we've already opened it) or if we asked for an out-of-range index (but we're iterating using the same count that we'd check the range against). A corrupted index might have a bogus count (e.g., too large for its size), but we'd have complained and aborted already when opening the index initially. While we're here, we can add a few details so that if this bug ever _does_ trigger, we'll have a bit more information. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2020-01-31sha1-file: allow check_object_signature() to handle any repoMatheus Tavares
Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <> Signed-off-by: Junio C Hamano <>
2020-01-31pack-check: use given repo's hash_algo at verify_packfile()Matheus Tavares
At verify_packfile(), use the git_hash_algo from the provided repository instead of the_hash_algo, for consistency. Like the previous patch, this shouldn't bring any behavior changes, since this function is currently only receiving the_repository. Signed-off-by: Matheus Tavares <> Signed-off-by: Junio C Hamano <>
2018-11-12pack-check.c: remove the_repository referencesNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2018-08-29convert "hashcmp() != 0" to "!hasheq()"Jeff King
This rounds out the previous three patches, covering the inequality logic for the "hash" variant of the functions. As with the previous three, the accompanying code changes are the mechanical result of applying the coccinelle patch; see those patches for more discussion. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2018-04-26packfile: add repository argument to unpack_entryStefan Beller
Add a repository argument to allow the callers of unpack_entry to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <> Signed-off-by: Stefan Beller <> Reviewed-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2018-04-11Merge branch 'sb/object-store'Junio C Hamano
Refactoring the internal global data structure to make it possible to open multiple repositories, work with and then close them. Rerolled by Duy on top of a separate preliminary clean-up topic. The resulting structure of the topics looked very sensible. * sb/object-store: (27 commits) sha1_file: allow sha1_loose_object_info to handle arbitrary repositories sha1_file: allow map_sha1_file to handle arbitrary repositories sha1_file: allow map_sha1_file_1 to handle arbitrary repositories sha1_file: allow open_sha1_file to handle arbitrary repositories sha1_file: allow stat_sha1_file to handle arbitrary repositories sha1_file: allow sha1_file_name to handle arbitrary repositories sha1_file: add repository argument to sha1_loose_object_info sha1_file: add repository argument to map_sha1_file sha1_file: add repository argument to map_sha1_file_1 sha1_file: add repository argument to open_sha1_file sha1_file: add repository argument to stat_sha1_file sha1_file: add repository argument to sha1_file_name sha1_file: allow prepare_alt_odb to handle arbitrary repositories sha1_file: allow link_alt_odb_entries to handle arbitrary repositories sha1_file: add repository argument to prepare_alt_odb sha1_file: add repository argument to link_alt_odb_entries sha1_file: add repository argument to read_info_alternates sha1_file: add repository argument to link_alt_odb_entry sha1_file: add raw_object_store argument to alt_odb_usable pack: move approximate object count to object store ...
2018-03-26object-store: move packed_git and packed_git_mru to object storeStefan Beller
In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <> Signed-off-by: Jonathan Nieder <> Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2018-03-14sha1_file: convert check_sha1_signature to struct object_idbrian m. carlson
Convert this function to take a pointer to struct object_id and rename it check_object_signature. Introduce temporaries to convert the return values of lookup_replace_object and lookup_replace_object_extended into struct object_id. The temporaries are needed because in order to convert lookup_replace_object, open_istream needs to be converted, and open_istream needs check_sha1_signature to be converted, causing a loop of dependencies. The temporaries will be removed in a future patch. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2018-03-06Merge branch 'bw/c-plus-plus'Junio C Hamano
Avoid using identifiers that clash with C++ keywords. Even though it is not a goal to compile Git with C++ compilers, changes like this help use of code analysis tools that targets C++ on our codebase. * bw/c-plus-plus: (37 commits) replace: rename 'new' variables trailer: rename 'template' variables tempfile: rename 'template' variables wrapper: rename 'template' variables environment: rename 'namespace' variables diff: rename 'template' variables environment: rename 'template' variables init-db: rename 'template' variables unpack-trees: rename 'new' variables trailer: rename 'new' variables submodule: rename 'new' variables split-index: rename 'new' variables remote: rename 'new' variables ref-filter: rename 'new' variables read-cache: rename 'new' variables line-log: rename 'new' variables imap-send: rename 'new' variables http: rename 'new' variables entry: rename 'new' variables diffcore-delta: rename 'new' variables ...
2018-02-14object: rename function 'typename' to 'type_name'Brandon Williams
Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <> Signed-off-by: Junio C Hamano <>
2018-02-02pack-check: convert various uses of SHA-1 to abstract formsbrian m. carlson
Convert various explicit calls to use SHA-1 functions and constants to references to the_hash_algo. Make several strings more generic with respect to the hash algorithm used. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2017-08-23pack: move open_pack_index(), parse_pack_index()Jonathan Tan
alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a subsequent commit, alloc_packed_git() will be removed from sha1_file.c. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2017-05-08Convert the verify_pack callback to struct object_idbrian m. carlson
Make the verify_pack_callback take a pointer to struct object_id. Change the pack checksum to use GIT_MAX_RAWSZ, even though it is not strictly an object ID. Doing so ensures resilience against future hash size changes, and allows us to remove hard-coded assumptions about how big the buffer needs to be. Also, use a union to convert the pointer from nth_packed_object_sha1 to to a pointer to struct object_id. This behavior is compatible with GCC and clang and explicitly sanctioned by C11. The alternatives are to just perform a cast, which would run afoul of strict aliasing rules, but should just work, and changing the pointer into an instance of struct object_id and copying the value. The latter operation could seriously bloat memory usage on fsck, which already uses a lot of memory on some repositories. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2016-10-10Merge branch 'rs/qsort'Junio C Hamano
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of the time third parameter is redundant. A new QSORT() macro lets us omit it. * rs/qsort: show-branch: use QSORT use QSORT, part 2 coccicheck: use --all-includes by default remove unnecessary check before QSORT use QSORT add QSORT
2016-09-29Merge branch 'jk/verify-packfile-gently'Junio C Hamano
A low-level function verify_packfile() was meant to show errors that were detected without dying itself, but under some conditions it didn't and died instead, which has been fixed. * jk/verify-packfile-gently: verify_packfile: check pack validity before accessing data
2016-09-29use QSORTRené Scharfe
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code base, replacing calls of qsort(3) with QSORT. The resulting code is shorter and supports empty arrays with NULL pointers. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2016-09-22verify_packfile: check pack validity before accessing dataJeff King
The verify_packfile() does not explicitly open the packfile; instead, it starts with a sha1 checksum over the whole pack, and relies on use_pack() to open the packfile as a side effect. If the pack cannot be opened for whatever reason (either because its header information is corrupted, or perhaps because a simultaneous repack deleted it), then use_pack() will die(), as it has no way to return an error. This is not ideal, as verify_packfile() otherwise tries to gently return an error (this lets programs like git-fsck go on to check other packs). Instead, let's check is_pack_valid() up front, and return an error if it fails. This will open the pack as a side effect, and then use_pack() will later rely on our cached descriptor, and avoid calling die(). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2016-07-13fsck: use streaming interface for large blobs in packNguyễn Thái Ngọc Duy
For blobs, we want to make sure the on-disk data is not corrupted (i.e. can be inflated and produce the expected SHA-1). Blob content is opaque, there's nothing else inside to check for. For really large blobs, we may want to avoid unpacking the entire blob in memory, just to check whether it produces the same SHA-1. On 32-bit systems, we may not have enough virtual address space for such memory allocation. And even on 64-bit where it's not a problem, allocating a lot more memory could result in kicking other parts of systems to swap file, generating lots of I/O and slowing everything down. For this particular operation, not unpacking the blob and letting check_sha1_signature, which supports streaming interface, do the job is sufficient. check_sha1_signature() is not shown in the diff, unfortunately. But if will be called when "data_valid && !data" is false. We will call the callback function "fn" with NULL as "data". The only callback of this function is fsck_obj_buffer(), which does not touch "data" at all if it's a blob. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2016-02-22convert trivial cases to ALLOC_ARRAYJeff King
Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2015-12-01verify_pack: do not ignore return value of verification functionDavid Turner
In verify_pack, a caller-supplied verification function is called. The function returns an int. If that return value is non-zero, verify_pack should fail. The only caller of verify_pack is in builtin/fsck.c, whose verify_fn returns a meaningful error code (which was then ignored). Now, fsck might return a different error code (with more detail). This would happen in the unlikely event that a commit or tree that is a valid git object but not a valid instance of its type gets into a pack. Signed-off-by: David Turner <> Signed-off-by: Jeff King <>
2011-11-07fsck: print progressNguyễn Thái Ngọc Duy
fsck is usually a long process and it would be nice if it prints progress from time to time. Progress meter is not printed when --verbose is given because --verbose prints a lot, there's no need for "alive" indicator. Progress meter may provide "% complete" information but it would be lost anyway in the flood of text. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2011-11-07fsck: avoid reading every object twiceNguyễn Thái Ngọc Duy
During verify_pack() all objects are read for SHA-1 check. Then fsck_sha1() is called on every object, which read the object again (fsck_sha1 -> parse_object -> read_sha1_file). Avoid reading an object twice, do fsck_sha1 while we have an object uncompressed data in verify_pack. On git.git, with this patch I got: $ /usr/bin/time ./git fsck >/dev/null 98.97user 0.90system 1:40.01elapsed 99%CPU (0avgtext+0avgdata 616624maxresident)k 0inputs+0outputs (0major+194186minor)pagefaults 0swaps Without it: $ /usr/bin/time ./git fsck >/dev/null 231.23user 2.35system 3:53.82elapsed 99%CPU (0avgtext+0avgdata 636688maxresident)k 0inputs+0outputs (0major+461629minor)pagefaults 0swaps Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2011-11-07verify_packfile(): check as many object as possible in a packNguyễn Thái Ngọc Duy
verify_packfile() checks for whole pack integerity first, then each object individually. Once we get past whole pack check, we can identify all objects in the pack. If there's an error with one object, we should continue to check the next objects to salvage as many objects as possible instead of stopping the process. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2011-06-10zlib: zlib can only process 4GB at a timeJunio C Hamano
The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <>
2011-04-03sparse: Fix errors and silence warningsStephen Boyd
* load_file() returns a void pointer but is using 0 for the return value * builtin/receive-pack.c forgot to include builtin.h * packet_trace_prefix can be marked static * ll_merge takes a pointer for its last argument, not an int * crc32 expects a pointer as the second argument but Z_NULL is defined to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer, 2006-11-18 for more info) Signed-off-by: Stephen Boyd <> Signed-off-by: Junio C Hamano <>
2011-03-16standardize brace placement in struct definitionsJonathan Nieder
In a struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so: struct foo { int bar; char *baz; }; Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2010-08-22Typos in code comments, an error message, documentationRalf Wildenhues
Signed-off-by: Ralf Wildenhues <> Signed-off-by: Junio C Hamano <>
2010-04-20Extract verify_pack_index for reuse from verify_packShawn O. Pearce
The dumb HTTP transport should verify an index is completely valid before trying to use it. That requires checking the header/footer but also checking the complete content SHA-1. All of this logic is already in the front half of verify_pack, so pull it out into a new function that can be reused. Signed-off-by: Shawn O. Pearce <> Signed-off-by: Junio C Hamano <>
2009-06-06Don't expect verify_pack() callers to set pack_sizeMike Hommey
Since use_pack() will end up populating pack_size if it is not already set, we can just adapt the code in verify_packfile() such that it doesn't require pack_size to be set beforehand. This allows callers not to have to set pack_size themselves, and we can thus revert changes from 1c23d794 (Don't die in git-http-fetch when fetching packs). Signed-off-by: Mike Hommey <> Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2008-10-03fix openssl headers conflicting with custom SHA1 implementationsNicolas Pitre
On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <> Signed-off-by: Shawn O. Pearce <>
2008-06-25verify-pack: check packed object CRC when using index version 2Nicolas Pitre
To do so, check_pack_crc() moved from builtin-pack-objects.c to pack-check.c where it is more logical to share. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2008-06-25move show_pack_info() where it belongsNicolas Pitre
This is called when verify_pack() has its verbose argument set, and verbose in this context makes sense only for the actual 'git verify-pack' command. Therefore let's move show_pack_info() to builtin-verify-pack.c instead and remove useless verbose argument from verify_pack(). Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2008-06-25optimize verify-pack a bitNicolas Pitre
Using find_pack_entry_one() to get object offsets is rather suboptimal when nth_packed_object_offset() can be used directly. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2008-06-24call init_pack_revindex() lazilyNicolas Pitre
This makes life much easier for next patch, as well as being more efficient when the revindex is actually not used. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2008-06-02make verify-pack a bit more useful with bad packsNicolas Pitre
When a pack gets corrupted, its SHA1 checksum will fail. However, this is more useful to let the test go on in order to find the actual problem location than only complain about the SHA1 mismatch and bail out. Also, it is more useful to compare the stored pack SHA1 with the one in the index file instead of the computed SHA1 since the computed SHA1 from a corrupted pack won't match the one stored in the index either. Finally a few code and message cleanups were thrown in as a bonus. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2008-03-01add storage size output to 'git verify-pack -v'Nicolas Pitre
This can possibly break external scripts that depend on the previous output, but those script can't possibly be critical to Git usage, and fixing them should be trivial. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2008-03-01fix unimplemented packed_object_info_detail() featuresNicolas Pitre
Since commit eb32d236df0c16b936b04f0c5402addb61cdb311, there was a TODO comment in packed_object_info_detail() about the SHA1 of base object to OBJ_OFS_DELTA objects. So here it is at last. While at it, providing the actual storage size information as well is now trivial. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2007-06-06pack-check: Sort entries by pack offset before unpacking them.Alexandre Julliard
Because of the way objects are sorted in a pack, unpacking them in disk order is much more efficient than random access. Tests on the Wine repository show a gain in pack validation time of about 35%. Signed-off-by: Alexandre Julliard <> Signed-off-by: Junio C Hamano <>
2007-05-27Lazily open pack index files on demandShawn O. Pearce
In some repository configurations the user may have many packfiles, but all of the recent commits/trees/tags/blobs are likely to be in the most recent packfile (the one with the newest mtime). It is therefore common to be able to complete an entire operation by accessing only one packfile, even if there are 25 packfiles available to the repository. Rather than opening and mmaping the corresponding .idx file for every pack found, we now only open and map the .idx when we suspect there might be an object of interest in there. Of course we cannot known in advance which packfile contains an object, so we still need to scan the entire packed_git list to locate anything. But odds are users want to access objects in the most recently created packfiles first, and that may be all they ever need for the current operation. Junio observed in b867092f that placing recent packfiles before older ones can slightly improve access times for recent objects, without degrading it for historical object access. This change improves upon Junio's observations by trying even harder to avoid the .idx files that we won't need. Signed-off-by: Shawn O. Pearce <> Signed-off-by: Junio C Hamano <>
2007-05-26fixes to output of git-verify-pack -vNicolas Pitre
Now that the default delta depth is 50, it is a good idea to also bump MAX_CHAIN to 50. While at it, make the display a bit prettier by making the MAX_CHAIN limit inclusive, and display the number of deltas that are above that limit at the end instead of the beginning. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2007-04-10get rid of num_packed_objects()Nicolas Pitre
The coming index format change doesn't allow for the number of objects to be determined from the size of the index file directly. Instead, Let's initialize a field in the packed_git structure with the object count when the index is validated since the count is always known at that point. While at it let's reorder some struct packed_git fields to avoid padding due to needed 64-bit alignment for some of them. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2007-04-05clean up and optimize nth_packed_object_sha1() usageNicolas Pitre
Let's avoid the open coded pack index reference in pack-object and use nth_packed_object_sha1() instead. This will help encapsulating index format differences in one place. And while at it there is no reason to copy SHA1's over and over while a direct pointer to it in the index will do just fine. Signed-off-by: Nicolas Pitre <> Acked-by: Shawn O. Pearce <> Signed-off-by: Junio C Hamano <>
2007-03-17[PATCH] clean up pack index handling a bitNicolas Pitre
Especially with the new index format to come, it is more appropriate to encapsulate more into check_packed_git_idx() and assume less of the index format in struct packed_git. To that effect, the index_base is renamed to index_data with void * type so it is not used directly but other pointers initialized with it. This allows for a couple pointer cast removal, as well as providing a better generic name to grep for when adding support for new index versions or formats. And index_data is declared const too while at it. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2007-03-07Use off_t when we really mean a file offset.Shawn O. Pearce
Not all platforms have declared 'unsigned long' to be a 64 bit value, but we want to support a 64 bit packfile (or close enough anyway) in the near future as some projects are getting large enough that their packed size exceeds 4 GiB. By using off_t, the POSIX type that is declared to mean an offset within a file, we support whatever maximum file size the underlying operating system will handle. For most modern systems this is up around 2^60 or higher. Signed-off-by: Shawn O. Pearce <> Signed-off-by: Junio C Hamano <>
2007-03-07Use uint32_t for all packed object counts.Shawn O. Pearce
As we permit up to 2^32-1 objects in a single packfile we cannot use a signed int to represent the object offset within a packfile, after 2^31-1 objects we will start seeing negative indexes and error out or compute bad addresses within the mmap'd index. This is a minor cleanup that does not introduce any significant logic changes. It is roach free. Signed-off-by: Shawn O. Pearce <> Signed-off-by: Junio C Hamano <>