path: root/pack-objects.c
AgeCommit message (Collapse)Author
2014-09-18use REALLOC_ARRAY for changing the allocation size of arraysRené Scharfe
Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2014-07-07hashmap: factor out getting a hash code from a SHA1Karsten Blees
Copying the first bytes of a SHA1 is duplicated in six places, however, the implications (the actual value would depend on the endianness of the platform) is documented only once. Add a properly documented API for this. Signed-off-by: Karsten Blees <> Signed-off-by: Junio C Hamano <>
2014-06-02pack-objects: use free()+xcalloc() instead of xrealloc()+memset()René Scharfe
Whenever the hash table becomes too small then its size is increased, the original part (and the added space) is zerod out using memset(), and the table is rebuilt from scratch. Simplify this proceess by returning the old memory using free() and allocating the new buffer using xcalloc(), which already clears the buffer for us. That way we avoid copying the old hash table contents needlessly inside xrealloc(). While at it, use the first array member with sizeof instead of a specific type. The old code used uint32_t and int, while index is actually an array of int32_t. Their sizes are the same basically everywhere, so it's not actually a problem, but the new code is cleaner and doesn't have to be touched should the type be changed. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2013-10-24pack-objects: refactor the packing listVicent Marti
The hash table that stores the packing list for a given `pack-objects` run was tightly coupled to the pack-objects code. In this commit, we refactor the hash table and the underlying storage array into a `packing_data` struct. The functionality for accessing and adding entries to the packing list is hence accessible from other parts of Git besides the `pack-objects` builtin. This refactoring is a requirement for further patches in this series that will require accessing the commit packing list from outside of `pack-objects`. The hash table implementation has been minimally altered: we now use table sizes which are always a power of two, to ensure a uniform index distribution in the array. Signed-off-by: Vicent Marti <> Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2006-08-04Make git-pack-objects a builtinMatthias Kestenholz
Signed-off-by: Matthias Kestenholz <> Signed-off-by: Junio C Hamano <>
2006-07-24pack-objects: check pack.window for default window sizeJeff King
For some repositories, deltas simply don't make sense. One can disable them for git-repack by adding --window, but git-push insists on making the deltas which can be very CPU-intensive for little benefit. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2006-07-10Fix more typos, primarily in the codePavel Roskin
The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <> Signed-off-by: Junio C Hamano <>
2006-07-01don't load objects needlessly when repackingNicolas Pitre
If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-06-30consider previous pack undeltified object state only when reusing delta dataNicolas Pitre
Without this there would never be a chance to improve packing for previously undeltified objects. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-06-29Do not try futile object pairs when repacking.Linus Torvalds
In the repacking window, if both objects we are looking at already came from the same (old) pack-file, don't bother delta'ing them against each other. That means that we'll still always check for better deltas for (and against!) _unpacked_ objects, but assuming incremental repacks, you'll avoid the delta creation 99% of the time. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-06-21Merge branch 'ff/c99' into nextJunio C Hamano
* ff/c99: Remove all void-pointer arithmetic.
2006-06-21upload-pack: prepare for sideband message support.Junio C Hamano
This does not implement sideband for propagating the status to the downloader yet, but add code to capture the standard error output from the pack-objects process in preparation for sending it off to the client when the protocol extension allows us to do so. Signed-off-by: Junio C Hamano <>
2006-06-20Remove all void-pointer arithmetic.Florian Forster
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <> Signed-off-by: Junio C Hamano <>
2006-06-06pack-objects: improve path grouping heuristics.Linus Torvalds
This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-05-31tree_entry(): new tree-walking helper functionLinus Torvalds
This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-05-17Merge branch 'np/pack'Junio C Hamano
* np/pack: improve depth heuristic for maximum delta size pack-object: slightly more efficient simple euristic for further free packing improvements
2006-05-16improve depth heuristic for maximum delta sizeNicolas Pitre
This provides a linear decrement on the penalty related to delta depth instead of being an 1/x function. With this another 5% reduction is observed on packs for both the GIT repo and the Linux kernel repo, as well as fixing a pack size regression in another sample repo I have. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-05-15Merge branch 'fix'Junio C Hamano
* fix: Fix pack-index issue on 64-bit platforms a bit more portably. Install git-send-email by default Fix compilation on newer NetBSD systems git config syntax updates Another config file parsing fix. checkout: use --aggressive when running a 3-way merge (-m).
2006-05-15Fix pack-index issue on 64-bit platforms a bit more portably.v1.3.3Junio C Hamano
Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use "unsigned int" -- hopefully that would stay 32-bit on every platform we care about, at least until we update the pack-index file format. Our sha1 routines optimized for architectures use uint32_t and expects '#include <stdint.h>' to be enough, so OpenBSD on arm or ppc might have similar issues down the road, I dunno. Signed-off-by: Junio C Hamano <>
2006-05-15pack-object: slightly more efficientNicolas Pitre
Avoid creating a delta index for objects with maximum depth since they are not going to be used as delta base anyway. This also reduce peak memory usage slightly as the current object's delta index is not useful until the next object in the loop is considered for deltification. This saves a bit more than 1% on CPU usage. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-05-15simple euristic for further free packing improvementsNicolas Pitre
Given that the early eviction of objects with maximum delta depth may exhibit bad packing on its own, why not considering a bias against deep base objects in try_delta() to mitigate that bad behavior. This patch adjust the MAX_size allowed for a delta based on the depth of the base object as well as enabling the early eviction of max depth objects from the object window. When used separately, those two things produce slightly better and much worse results respectively. But their combined effect is a surprising significant packing improvement. With this really simple patch the GIT repo gets nearly 15% smaller, and the Linux kernel repo about 5% smaller, with no significantly measurable CPU usage difference. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-05-14Merge branch 'fix'Junio C Hamano
* fix: include header to define uint32_t, necessary on Mac OS X
2006-05-14include header to define uint32_t, necessary on Mac OS XBen Clifford
Signed-off-by: Junio C Hamano <>
2006-05-14Merge branch 'fix'Junio C Hamano
* fix: Fix git-pack-objects for 64-bit platforms
2006-05-13Fix git-pack-objects for 64-bit platformsDennis Stosberg
The offset of an object in the pack is recorded as a 4-byte integer in the index file. When reading the offset from the mmap'ed index in prepare_pack_revindex(), the address is dereferenced as a long*. This works fine as long as the long type is four bytes wide. On NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing the offset produces garbage. [jc: taking suggestion by Linus to use uint32_t] Signed-off-by: Dennis Stosberg <> Signed-off-by: Junio C Hamano <>
2006-05-09Merge branch 'np/delta'Junio C Hamano
* np/delta: improve diff-delta with sparse and/or repetitive data tiny optimization to diff-delta replace adler32 with Rabin's polynomial in diff-delta use delta index data when finding best delta matches split the diff-delta interface
2006-05-05pack-object: squelch eye-candy on non-ttyJunio C Hamano
One of my post-update scripts runs a git-fetch into a separate repository and sends the results back to me (2>&1); I end up getting this in the mail: Generating pack... Done counting 180 objects. Result has 131 objects. Deltifying 131 objects. 0% (0/131) done^M 1% (2/131) done^M... This defaults not to do the progress report when not on a tty. You could give --progress to force the progress report, but let's not bother even documenting it nor mentioning it in the usage string. Signed-off-by: Junio C Hamano <>
2006-04-28pack-objects: update size heuristucs.Junio C Hamano
We used to omit delta base candidates that is much bigger than the target, but delta size does not grow when we delete more, so that was not a very good heuristics. Signed-off-by: Junio C Hamano <>
2006-04-27use delta index data when finding best delta matchesNicolas Pitre
This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-04-21fix pack-object buffer sizeNicolas Pitre
The input line has 40 _chars_ of sha1 and no 20 _bytes_. It should also account for the space before the pathname, and the terminating \n and \0. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-04-21pack-objects: do not stop at object that is "too small"Junio C Hamano
Because we sort the delta window by name-hash and then size, hitting an object that is too small to consider as a delta base for the current object does not mean we do not have better candidate in the window beyond it. Noticed by Shawn Pearce, analyzed by Nico, Linus and me. Signed-off-by: Junio C Hamano <>
2006-04-07Thin pack generation: optimization.Junio C Hamano
Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <>
2006-04-04Merge branch 'pe/cleanup'Junio C Hamano
* pe/cleanup: Replace xmalloc+memset(0) with xcalloc. Use blob_, commit_, tag_, and tree_type throughout.
2006-04-04Merge branch 'lt/fix-sol-pack'Junio C Hamano
* lt/fix-sol-pack: Use sigaction and SA_RESTART in read-tree.c; add option in Makefile. safe_fgets() - even more anal fgets() pack-objects: be incredibly anal about stdio semantics Fix Solaris stdio signal handling stupidities
2006-04-04Use blob_, commit_, tag_, and tree_type throughout.Peter Eriksen
This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <> Signed-off-by: Junio C Hamano <>
2006-04-04safe_fgets() - even more anal fgets()Junio C Hamano
This is from Linus -- the previous round forgot to clear error after EINTR case. Signed-off-by: Junio C Hamano <>
2006-04-02pack-objects: be incredibly anal about stdio semanticsLinus Torvalds
This is the "letter of the law" version of using fgets() properly in the face of incredibly broken stdio implementations. We can work around the Solaris breakage with SA_RESTART, but in case anybody else is ever that stupid, here's the "safe" (read: "insanely anal") way to use fgets. It probably goes without saying that I'm not terribly impressed by Solaris libc. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-04-02Fix Solaris stdio signal handling stupiditiesLinus Torvalds
This uses sigaction() to install the SIGALRM handler with SA_RESTART, so that Solaris stdio doesn't break completely when a signal interrupts a read. Thanks to Jason Riedy for confirming the silly Solaris signal behaviour. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-03-30tree/diff header cleanup.Junio C Hamano
Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <>
2006-03-06pack-objects: simplify "thin" pack.Junio C Hamano
There was a misguided logic to overly prefer using objects that we are not going to pack as the base object. This was unnecessary. It does not matter to the unpacking side where the base object is -- it matters more to make the resulting delta smaller. Signed-off-by: Junio C Hamano <>
2006-03-02Re-fix compilation warnings.Luck, Tony
Commit 8fcf1ad9c68e15d881194c8544e7c11d33529c2b has a combination of double cast and Andreas' switch to using unsigned long ... just the latter is sufficient (and a lot less ugly than using the double cast). Signed-off-by: Tony Luck <> Signed-off-by: Junio C Hamano <>
2006-02-26Use setenv(), fix warningsTimo Hirvonen
- Fix -Wundef -Wold-style-definition warnings - Make pll_free() static [jc: original patch by Timo had another unrelated bits: - Use setenv() instead of putenv() I'm postponing that part for now.] Signed-off-by: Timo Hirvonen <> Signed-off-by: Junio C Hamano <>
2006-02-25fix warning from pack-objects.cLuck, Tony
When compiling on ia64 I get this warning (from gcc 3.4.3): gcc -o pack-objects.o -c -g -O2 -Wall -DSHA1_HEADER='<openssl/sha.h>' pack-objects.c pack-objects.c: In function `pack_revindex_ix': pack-objects.c:94: warning: cast from pointer to integer of different size A double cast (first to long, then to int) shuts gcc up, but is there a better way? [jc: Andreas Ericsson suggests to use ulong instead. ] Signed-off-by: Tony Luck <> Signed-off-by: Junio C Hamano <>
2006-02-25Merge branches 'jc/rev-list' and 'jc/pack-thin'Junio C Hamano
* jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next")
2006-02-24pack-objects: hash basename and direname a bit differently.Junio C Hamano that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <>
2006-02-24pack-objects: allow "thin" packs to exceed depth limitsJunio C Hamano
When creating a new pack to be used in .git/objects/pack/ directory, we carefully count the depth of deltified objects to be reused, so that the generated pack does not to exceed the specified depth limit for runtime efficiency. However, when we are generating a thin pack that does not contain base objects, such a pack can only be used during network transfer that is expanded on the other end upon reception, so being careful and artificially cutting the delta chain does not buy us anything except increased bandwidth requirement. This patch disables the delta chain depth limit check when reusing an existing delta. Signed-off-by: Junio C Hamano <>
2006-02-23pack-objects: use full pathname to help hashing with "thin" pack.Junio C Hamano
This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <>
2006-02-23pack-objects: thin pack micro-optimization.Junio C Hamano
Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <>
2006-02-23pack-objects eye-candy: finishing touches.Junio C Hamano
This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <>
2006-02-22also adds progress when actually writing a packNicolas Pitre
If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>