summaryrefslogtreecommitdiff
path: root/pack-objects.c
AgeCommit message (Collapse)Author
2006-03-02Re-fix compilation warnings.Luck, Tony
Commit 8fcf1ad9c68e15d881194c8544e7c11d33529c2b has a combination of double cast and Andreas' switch to using unsigned long ... just the latter is sufficient (and a lot less ugly than using the double cast). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26Use setenv(), fix warningsTimo Hirvonen
- Fix -Wundef -Wold-style-definition warnings - Make pll_free() static [jc: original patch by Timo had another unrelated bits: - Use setenv() instead of putenv() I'm postponing that part for now.] Signed-off-by: Timo Hirvonen <tihirvon@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-25fix warning from pack-objects.cLuck, Tony
When compiling on ia64 I get this warning (from gcc 3.4.3): gcc -o pack-objects.o -c -g -O2 -Wall -DSHA1_HEADER='<openssl/sha.h>' pack-objects.c pack-objects.c: In function `pack_revindex_ix': pack-objects.c:94: warning: cast from pointer to integer of different size A double cast (first to long, then to int) shuts gcc up, but is there a better way? [jc: Andreas Ericsson suggests to use ulong instead. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-25Merge branches 'jc/rev-list' and 'jc/pack-thin'Junio C Hamano
* jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next")
2006-02-24pack-objects: hash basename and direname a bit differently.Junio C Hamano
...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-24pack-objects: allow "thin" packs to exceed depth limitsJunio C Hamano
When creating a new pack to be used in .git/objects/pack/ directory, we carefully count the depth of deltified objects to be reused, so that the generated pack does not to exceed the specified depth limit for runtime efficiency. However, when we are generating a thin pack that does not contain base objects, such a pack can only be used during network transfer that is expanded on the other end upon reception, so being careful and artificially cutting the delta chain does not buy us anything except increased bandwidth requirement. This patch disables the delta chain depth limit check when reusing an existing delta. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23pack-objects: use full pathname to help hashing with "thin" pack.Junio C Hamano
This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23pack-objects: thin pack micro-optimization.Junio C Hamano
Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23pack-objects eye-candy: finishing touches.Junio C Hamano
This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22also adds progress when actually writing a packNicolas Pitre
If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22nicer eye candies for pack-objectsNicolas Pitre
This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: avoid delta chains that are too long.Junio C Hamano
This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: finishing touches.Junio C Hamano
This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: reuse data from existing packs.Junio C Hamano
When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20Thin pack - create packfile with missing delta base.Junio C Hamano
This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-18pack-objects: avoid delta chains that are too long.Junio C Hamano
This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17pack-objects: finishing touches.Junio C Hamano
This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17pack-objects: reuse data from existing packs.Junio C Hamano
When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12Make pack-objects chattier.Junio C Hamano
You could give -q to squelch it, but currently no tool does it. This would make 'git clone host:repo here' over ssh not silent again. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12fetch-clone progress: finishing touches.Junio C Hamano
This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-29code comments: spellJunio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-08Document the --non-empty command-line option to git-pack-objects.Nikolai Weibull
This provides (minimal) documentation for the --non-empty command-line option to the pack-objects command. Signed-off-by: Nikolai Weibull <nikolai@bitwi.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-29Make the rest of commands work from a subdirectory.Junio C Hamano
These commands are converted to run from a subdirectory. commit-tree convert-objects merge-base merge-index mktag pack-objects pack-redundant prune-packed read-tree tar-tree unpack-file unpack-objects update-server-info write-tree Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-21git-repack: Properly abort in corrupt repositoryLinus Torvalds
In a corrupt repository, git-repack produces a pack that does not contain needed objects without complaining, and the result of this combined with -d flag can be very painful -- e.g. a lossage of one tree object can lead to lossage of blobs reachable only through that tree. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26pack-objects: Allow use of pre-generated pack.Junio C Hamano
git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does *not* keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-15Unlocalized isspace and friendsLinus Torvalds
Do our own ctype.h, just to get the sane semantics: we want locale-independence, _and_ we want the right signed behaviour. Plus we only use a very small subset of ctype.h anyway (isspace, isalpha, isdigit and isalnum). Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-13Add support for "local" packingLinus Torvalds
This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-13Fix packname hash generation.Junio C Hamano
This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-09[PATCH] Plug memory leak in git-pack-objectsSergey Vlasov
find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-07-03Make the name of a pack-file depend on the objects packed there-in.Linus Torvalds
This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes.
2005-07-03Add "--non-empty" flag to git-pack-objectsLinus Torvalds
It skips writing the pack-file if it ends up being empty.
2005-07-03Add "--incremental" flag to git-pack-objectsLinus Torvalds
It won't add an object that is already in a pack to the new pack.
2005-06-29[PATCH] assorted delta code cleanupNicolas Pitre
This is a wrap-up patch including all the cleanups I've done to the delta code and its usage. The most important change is the factorization of the delta header handling code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29Make git pack files use little-endian size encodingLinus Torvalds
This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format.
2005-06-29[PATCH] Emit base objects of a delta chain when the delta is output.Junio C Hamano
Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29[PATCH] Fix unpack-objects for header length information.Junio C Hamano
Standalone unpack-objects command was not adjusted for header length encoding change when dealing with deltified entry. This fixes it. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28Change pack file format. Hopefully for the last time.Linus Torvalds
This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently.
2005-06-28git-pack-objects: add "--stdout" flag to write the pack file to stdoutLinus Torvalds
This also suppresses creation of the index file.
2005-06-28Teach packing about "tag" objectsLinus Torvalds
(And teach sha1_file and unpack-object know how to unpack them too, of course)
2005-06-27[PATCH] Enhance sha1_file_size() into sha1_object_info()Junio C Hamano
This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27[PATCH] Remove "delta" object representation.Junio C Hamano
Packed delta files created by git-pack-objects seems to be the way to go, and existing "delta" object handling code has exposed the object representation details to too many places. Remove it while we refactor code to come up with a proper interface in sha1_file.c. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27csum-file interface updates: return resulting SHA1Linus Torvalds
Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to.
2005-06-27git-pack-objects: write the pack files with a SHA1 csumLinus Torvalds
We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index.
2005-06-26git-pack-objects: use name information (if any) to sort objects for packing.Linus Torvalds
This is incredibly cheezy. But it's cheap, and it works pretty well.
2005-06-26git-pack-objects: do the delta search in reverse size orderLinus Torvalds
Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller.
2005-06-26Fix object packing/unpacking.Linus Torvalds
This actually successfully packed and unpacked a git archive down to 1.3MB (17MB unpacked). Right now unpacking is way too noisy, lots of debug messages left.
2005-06-26[PATCH] Finish initial cut of git-pack-object/git-unpack-object pair.Junio C Hamano
This finishes the initial round of git-pack-object / git-unpack-object pair. They are now good enough to be used as a transport medium: - Fix delta direction in pack-objects; the original was computing delta to create the base object from the object to be squashed, which was quite unfriendly for unpacker ;-). - Add a script to test the very basics. - Implement unpacker for both regular and deltified objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-26Add "--depth=N" parameter to git-pack-objects to limit maximum delta depthLinus Torvalds
It too defaults to 10. A nice round random number.
2005-06-26git-pack-objects: make "--window=x" semantics more logical.Linus Torvalds
A zero disables delta generation (like before), but we make the window be one bigger than specified, since we use one entry for the one to be tested (it used to be that "--window=1" was meaningless, since we'd have used up the single-entry window with the entry to be tested, and had no chance of actually ever finding a delta). The default window remains at 10, but now it really means "test the 10 closest objects", not "test the 9 closest objects".
2005-06-26Add a "max_size" parameter to diff_delta()Linus Torvalds
Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens.