summaryrefslogtreecommitdiff
path: root/pack.h
AgeCommit message (Collapse)Author
2016-07-13fsck: use streaming interface for large blobs in packNguyễn Thái Ngọc Duy
For blobs, we want to make sure the on-disk data is not corrupted (i.e. can be inflated and produce the expected SHA-1). Blob content is opaque, there's nothing else inside to check for. For really large blobs, we may want to avoid unpacking the entire blob in memory, just to check whether it produces the same SHA-1. On 32-bit systems, we may not have enough virtual address space for such memory allocation. And even on 64-bit where it's not a problem, allocating a lot more memory could result in kicking other parts of systems to swap file, generating lots of I/O and slowing everything down. For this particular operation, not unpacking the blob and letting check_sha1_signature, which supports streaming interface, do the job is sufficient. check_sha1_signature() is not shown in the diff, unfortunately. But if will be called when "data_valid && !data" is false. We will call the callback function "fn" with NULL as "data". The only callback of this function is fsck_obj_buffer(), which does not touch "data" at all if it's a blob. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03finish_tmp_packfile():use strbuf for pathname constructionSun He
The old version fixes a maximum length on the buffer, which could be a problem if one is not certain of the length of get_object_directory(). Using strbuf can avoid the protential bug. Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Sun He <sunheehnus@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05pack-objects: name pack files after trailer hashJeff King
Our current scheme for naming packfiles is to calculate the sha1 hash of the sorted list of objects contained in the packfile. This gives us a unique name, so we are reasonably sure that two packs with the same name will contain the same objects. It does not, however, tell us that two such packs have the exact same bytes. This makes things awkward if we repack the same set of objects. Due to run-to-run variations, the bytes may not be identical (e.g., changed zlib or git versions, different source object reuse due to new packs in the repository, or even different deltas due to races during a multi-threaded delta search). In theory, this could be helpful to a program that cares that the packfile contains a certain set of objects, but does not care about the particular representation. In practice, no part of git makes use of that, and in many cases it is potentially harmful. For example, if a dumb http client fetches the .idx file, it must be sure to get the exact .pack that matches it. Similarly, a partial transfer of a .pack file cannot be safely resumed, as the actual bytes may have changed. This could also affect a local client which opened the .idx and .pack files, closes the .pack file (due to memory or file descriptor limits), and then re-opens a changed packfile. In all of these cases, git can detect the problem, as we have the sha1 of the bytes themselves in the pack trailer (which we verify on transfer), and the .idx file references the trailer from the matching packfile. But it would be simpler and more efficient to actually get the correct bytes, rather than noticing the problem and having to restart the operation. This patch simply uses the pack trailer sha1 as the pack name. It should be similarly unique, but covers the exact representation of the objects. Other parts of git should not care, as the pack name is returned by pack-objects and is essentially opaque. One test needs to be updated, because it actually corrupts a pack and expects that re-packing the corrupted bytes will use the same name. It won't anymore, but we can easily just use the name that pack-objects hands back. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-17Merge branch 'jc/stream-to-pack'Junio C Hamano
* jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h
2011-12-05Merge branch 'jc/index-pack-reject-dups'Junio C Hamano
* jc/index-pack-reject-dups: receive-pack, fetch-pack: reject bogus pack that records objects twice
2011-11-17receive-pack, fetch-pack: reject bogus pack that records objects twiceJunio C Hamano
When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-07fsck: print progressNguyễn Thái Ngọc Duy
fsck is usually a long process and it would be nice if it prints progress from time to time. Progress meter is not printed when --verbose is given because --verbose prints a lot, there's no need for "alive" indicator. Progress meter may provide "% complete" information but it would be lost anyway in the flood of text. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-07fsck: avoid reading every object twiceNguyễn Thái Ngọc Duy
During verify_pack() all objects are read for SHA-1 check. Then fsck_sha1() is called on every object, which read the object again (fsck_sha1 -> parse_object -> read_sha1_file). Avoid reading an object twice, do fsck_sha1 while we have an object uncompressed data in verify_pack. On git.git, with this patch I got: $ /usr/bin/time ./git fsck >/dev/null 98.97user 0.90system 1:40.01elapsed 99%CPU (0avgtext+0avgdata 616624maxresident)k 0inputs+0outputs (0major+194186minor)pagefaults 0swaps Without it: $ /usr/bin/time ./git fsck >/dev/null 231.23user 2.35system 3:53.82elapsed 99%CPU (0avgtext+0avgdata 636688maxresident)k 0inputs+0outputs (0major+461629minor)pagefaults 0swaps Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-28finish_tmp_packfile(): a helper functionJunio C Hamano
Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c. This changes the order of finishing multi-pack generation slightly. The code used to - adjust shared perm of temporary packfile - rename temporary packfile to the final name - update mtime of the packfile under the final name - adjust shared perm of temporary idxfile - rename temporary idxfile to the final name but because the helper does not want to do the mtime thing, the updated code does that step first and then all the rest. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-28create_tmp_packfile(): a helper functionJunio C Hamano
Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-28write_pack_header(): a helper functionJunio C Hamano
Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28index-pack --verify: read anomalous offsets from v2 idx fileJunio C Hamano
A pack v2 .idx file usually records offset using 64-bit representation only when the offset does not fit within 31-bit, but you can handcraft your .idx file to record smaller offset using 64-bit, storing all zero in the upper 4-byte. By inspecting the original idx file when running index-pack --verify, encode such low offsets that do not need to be in 64-bit but are encoded using 64-bit just like the original idx file so that we can still validate the pack/idx pair by comparing the idx file recomputed with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28index-pack: --verifyJunio C Hamano
Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28write_idx_file: introduce a struct to hold idx customization optionsJunio C Hamano
Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-21Merge branch 'sp/maint-dumb-http-pack-reidx'Junio C Hamano
* sp/maint-dumb-http-pack-reidx: http.c::new_http_pack_request: do away with the temp variable filename http-fetch: Use temporary files for pack-*.idx until verified http-fetch: Use index-pack rather than verify-pack to check packs Allow parse_pack_index on temporary files Extract verify_pack_index for reuse from verify_pack Introduce close_pack_index to permit replacement http.c: Remove unnecessary strdup of sha1_to_hex result http.c: Don't store destination name in request structures http.c: Drop useless != NULL test in finish_http_pack_request http.c: Tiny refactoring of finish_http_pack_request t5550-http-fetch: Use subshell for repository operations http.c: Remove bad free of static block
2010-04-20Extract verify_pack_index for reuse from verify_packShawn O. Pearce
The dumb HTTP transport should verify an index is completely valid before trying to use it. That requires checking the header/footer but also checking the complete content SHA-1. All of this logic is already in the front half of verify_pack, so pull it out into a new function that can be reused. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-23move encode_in_pack_object_header() to a better placeNicolas Pitre
Commit 1b22b6c897 made duplicated versions of encode_header() into a common version called encode_in_pack_object_header(). There is however a better location that sha1_file.c for such a function though, as sha1_file.c contains nothing related to the creation of packs, and it is quite populated already. Also the comment that was moved to the header file should really remain near the function as it covers implementation details and provides no information about the actual function interface. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-22make "index-pack" a built-inLinus Torvalds
This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-30improve reliability of fixup_pack_header_footer()Nicolas Pitre
Currently, this function has the potential to read corrupted pack data from disk and give it a valid SHA1 checksum. Let's add the ability to validate SHA1 checksum of existing data along the way, including before and after any arbitrary point in the pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-25verify-pack: check packed object CRC when using index version 2Nicolas Pitre
To do so, check_pack_crc() moved from builtin-pack-objects.c to pack-check.c where it is more logical to share. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-25move show_pack_info() where it belongsNicolas Pitre
This is called when verify_pack() has its verbose argument set, and verbose in this context makes sense only for the actual 'git verify-pack' command. Therefore let's move show_pack_info() to builtin-verify-pack.c instead and remove useless verbose argument from verify_pack(). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-02Merge branch 'np/progress'Junio C Hamano
* np/progress: Show total transferred as part of throughput progress make sure throughput display gets updated even if progress doesn't move return the prune-packed progress display to the inner loop add throughput display to git-push add some copyright notice to the progress display code add throughput display to index-pack add throughput to progress display relax usage of the progress API make struct progress an opaque type prune-packed: don't call display_progress() for every file Stop displaying "Pack pack-$ID created." during git-gc Teach prune-packed to use the standard progress meter Change 'Deltifying objects' to 'Compressing objects' fix for more minor memory leaks fix const issues with some functions pack-objects.c: fix some global variable abuse and memory leaks pack-objects: no delta possible with only one object in the list cope with multiple line breaks within sideband progress messages more compact progress display
2007-10-17fix const issues with some functionsNicolas Pitre
Two functions, namely write_idx_file() and open_pack_file(), currently return a const pointer. However that pointer is either a copy of the first argument, or set to a malloc'd buffer when that first argument is null. In the later case it is wrong to qualify that pointer as const since ownership of the buffer is transferred to the caller to dispose of, and obviously the free() function is not meant to be passed const pointers. Making the return pointer not const causes a warning when the first argument is returned since that argument is also marked const. The correct thing to do is therefore to remove the const qualifiers, avoiding the need for ugly casts only to silence some warnings. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-09-19Refactor index-pack "keep $sha1" handling for reuseShawn O. Pearce
There is a subtle (but important) linkage between receive-pack and index-pack that allows index-pack to create a packfile but protect it from being deleted by a concurrent `git repack -a -d` operation. The linkage works by having index-pack mark the newly created pack with a ".keep" file and then it passes the SHA-1 name of that new packfile to receive-pack along its stdout channel. The receive-pack process must unkeep the packfile by deleting the .keep file, but can it can only do so after all elgible refs have been updated in the receiving repository. This ensures that the packfile is either kept or its objects are reachable, preventing a concurrent repacker from deleting the packfile before it can determine that its objects are actually needed by the repository. The new builtin-fetch code needs to perform the same actions if it choose to run index-pack rather than unpack-objects, so I am moving this code out to its own function where both receive-pack and fetch-pack are able to invoke it when necessary. The caller is responsible for deleting the returned ".keep" and freeing the path if the returned path is not NULL. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-02Unify write_index_file functionsGeert Bosch
This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-02Create pack-write.c for common pack writing codeDana L. How
Include a generalized fixup_pack_header_footer() in this new file. Needed by git-repack --max-pack-size feature in a later patchset. [sp: Moved close(pack_fd) to callers, to support index-pack, and changed name to better indicate it is for packfiles.] Signed-off-by: Dana L. How <danahow@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-17[PATCH] clean up pack index handling a bitNicolas Pitre
Especially with the new index format to come, it is more appropriate to encapsulate more into check_packed_git_idx() and assume less of the index format in struct packed_git. To that effect, the index_base is renamed to index_data with void * type so it is not used directly but other pointers initialized with it. This allows for a couple pointer cast removal, as well as providing a better generic name to grep for when adding support for new index versions or formats. And index_data is declared const too while at it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-25Refactor the pack header reading function out of receive-pack.cJunio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-18Use fixed-size integers for the on-disk pack structure.Simon 'corecode' Schubert
Plain integer types without a fixed size can vary between platforms. Even though all common platforms use 32-bit ints, there is no guarantee that this won't change at some point. Furthermore, specifying an integer type with explicit size makes the definition of structures more obvious. Signed-off-by: Simon 'corecode' Schubert <corecode@fs.ei.tum.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-18Document pack .idx file format upgrade strategy.Shawn O. Pearce
Way back when Junio developed the 64 bit index topic he came up with a means of changing the .idx file format so that older Git clients would recognize that they don't understand the file and refuse to read it, while newer clients could tell the difference between the old-style and new-style .idx files. Unfortunately this wasn't recorded anywhere. This change documents how we might go about changing the .idx file format by using a special signature in the first four bytes. Credit (and possible blame) goes completely to Junio for thinking up this technique. The change also modifies the error message of the current Git code so that users get a recommendation to upgrade their Git software should this version or later encounter a new-style .idx which it cannot process. We already do this for the .pack files, but since we usually process the .idx files first its important that these files are recognized and encourage an upgrade. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-23Merge branch 'np/pack'Junio C Hamano
* np/pack: add the capability for index-pack to read from a stream index-pack: compare only the first 20-bytes of the key. git-repack: repo.usedeltabaseoffset pack-objects: document --delta-base-offset option allow delta data reuse even if base object is a preferred base zap a debug remnant let the GIT native protocol use offsets to delta base when possible make pack data reuse compatible with both delta types make git-pack-objects able to create deltas with offset to base teach git-index-pack about deltas with offset to base teach git-unpack-objects about deltas with offset to base introduce delta objects with offset to base
2006-10-15Revert "move pack creation to version 3"Junio C Hamano
This reverts commit 16854571aae6302f457c5fbee41ac64669b09595. Git as recent as v1.1.6 do not understand version 3 delta. v1.2.0 is Ok and I personally would say it is old enough, but the improvement between version 2 and version 3 delta is not bit enough to justify breaking older clients. We should resurrect this later, but when we do so we shold make it conditional. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27make pack data reuse compatible with both delta typesNicolas Pitre
This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-23move pack creation to version 3Nicolas Pitre
It's been quite a while now that GIT is able to read version 3 packs. Let's create them at last. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-13Remove TYPE_* constant macros and use object_type enums consistently.Linus Torvalds
This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17pack-objects: reuse data from existing packs.Junio C Hamano
When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-10remove delta-against-self bitNicolas Pitre
After experimenting with code to add the ability to encode a delta against part of the deltified file, it turns out that resulting packs are _bigger_ than when this ability is not used. The raw delta output might be smaller, but it doesn't compress as well using gzip with a negative net saving on average. Said bit would in fact be more useful to allow for encoding the copying of chunks larger than 64KB providing more savings with large files. This will correspond to packs version 3. While the current code still produces packs version 2, it is made future proof so pack versions 2 and 3 are accepted. Any pack version 2 are compatible with version 3 since the redefined bit was never used before. When enough time has passed, code to use that bit to produce version 3 packs could be added. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-07-01[PATCH] verify-pack updates.Junio C Hamano
Nico pointed out that having verify_pack.c and verify-pack.c was confusing. Rename verify_pack.c to pack-check.c as suggested, and enhances the verification done quite a bit. - Built-in sha1_file unpacking knows that a base object of a deltified object _must_ be in the same pack, and takes advantage of that fact. - Earlier verify-pack command only checked the SHA1 sum for the entire pack file and did not look into its contents. It now checks everything idx file claims to have unpacks correctly. - It now has a hook to give more detailed information for objects contained in the pack under -v flag. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29[PATCH] Add git-verify-pack command.Junio C Hamano
Given a list of <pack>.idx files, this command validates the index file and the corresponding .pack file for consistency. This patch also uses the same validation mechanism in fsck-cache when the --full flag is used. During normal operation, sha1_file.c verifies that a given .idx file matches the .pack file by comparing the SHA1 checksum stored in .idx file and .pack file as a minimum sanity check. We may further want to check the pack signature and version when we map the pack, but that would be a separate patch. Earlier, errors to map a pack file was not flagged fatal but led to a random fatal error later. This version explicitly die()s when such an error is detected. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29Make git pack files use little-endian size encodingLinus Torvalds
This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format.
2005-06-28Change pack file format. Hopefully for the last time.Linus Torvalds
This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently.