summaryrefslogtreecommitdiff
path: root/index-pack.c
AgeCommit message (Collapse)Author
2006-12-20clarify some error messages wrt unknown object typesNicolas Pitre
If ever new object types are added for future extensions then better have current git version report them as "unknown" instead of "corrupted". Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20simplify inclusion of system header files.Junio C Hamano
This is a mechanical clean-up of the way *.c files include system header files. (1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules; (2) the first #include must be "git-compat-util.h" or one of our own header file that includes it first (e.g. config.h, builtin.h, pkt-line.h); (3) system headers that are included in "git-compat-util.h" need not be included in individual C source files. (4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20index-pack usage of mmap() is unacceptably slower on many OSes other than LinuxNicolas Pitre
It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that indexing the Linux repository ~150MB pack takes about an hour on OS x while it's a minute on Linux. It seems that the OS X mmap() implementation is more than 2 orders of magnitude slower than the Linux one. Linus proposed a patch replacing mmap() with pread() bringing index-pack performance on OS X in line with the Linux one. The performances on Linux also improved by a small margin. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-11Don't use memcpy when source and dest. buffers may overlapJim Meyering
git-index-pack can call memcpy with overlapping source and destination buffers. The patch below makes it use memmove instead. If you want to demonstrate a failure, add the following two lines + if (input_offset < input_len) + abort (); before the existing memcpy call (shown in the patch below), and then run this: (cd t; sh ./t5500-fetch-pack.sh) Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-18sparse fix: non-ANSI function declarationRene Scharfe
The declaration of discard_cache() in cache.h already has its "void". Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-03remove .keep pack lock files when done with refs updateNicolas Pitre
This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with aparallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-03have index-pack create .keep file more carefullyNicolas Pitre
If by chance we receive a pack which content (list of objects) matches another pack that we already have, and if that pack is marked with a .keep file, then we should not overwrite it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-03Allow pack header preprocessing before unpack-objects/index-pack.Nicolas Pitre
Some applications which invoke unpack-objects or index-pack --stdin may want to examine the pack header to determine the number of objects contained in the pack and use that value to determine which executable to invoke to handle the rest of the pack stream. However if the caller consumes the pack header from the input stream then its no longer available for unpack-objects or index-pack --stdin, both of which need the version and object count to process the stream. This change introduces --pack_header=ver,cnt as a command line option that the caller can supply to indicate it has already consumed the pack header and what version and object count were found in that header. As this option is only meant for low level applications such as receive-pack we are not documenting it at this time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-29Teach git-index-pack how to keep a pack file.Shawn Pearce
To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-27index-pack: minor fixes to comment and function nameNicolas Pitre
Use proper english. Be more exact in one comment. [jc: I threw in a bit of style clean-up as well] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-27mimic unpack-objects when --stdin is used with index-packNicolas Pitre
It appears that git-unpack-objects writes the last part of the input buffer to stdout after the pack has been parsed. This looks a bit suspicious since the last fill() might have filled the buffer up to the 4096 byte limit and more data might still be pending on stdin, but since this is about being a drop-in replacement for unpack-objects let's simply duplicate the same behavior for now. [jc: with fix-up appeared in Nico's sleep] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-26add progress status to index-packNicolas Pitre
This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-26make index-pack able to complete thin packs.Nicolas Pitre
A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-25enable index-pack streaming capabilityNicolas Pitre
A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-20add the capability for index-pack to read from a streamNicolas Pitre
This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-18index-pack: compare only the first 20-bytes of the key.Nicolas Pitre
The "union delta_base" is a strange beast. It is a 20-byte binary blob key to search a binary searchable deltas[] array, each element of which uses it to represent its base object with either a full 20-byte SHA-1 or an offset in the pack. Which representation is used is determined by another field of the deltas[] array element, obj->type, so there is no room for confusion, as long as we make sure we compare the keys for the same type only with appropriate length. The code compared the full union with memcmp(). When storing the in-pack offset, the union was first cleared before storing an unsigned long, so comparison worked fine. On 64-bit architectures, however, the union typically is 24-byte long; the code did not clear the remaining 4-byte alignment padding when storing a full 20-byte SHA-1 representation. Using memcmp() to compare the whole union was wrong. This fixes the comparison to look at the first 20-bytes of the union, regardless of the architecture. As long as ulong is smaller than 20-bytes this works fine. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27teach git-index-pack about deltas with offset to baseNicolas Pitre
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27introduce delta objects with offset to baseNicolas Pitre
This adds a new object, namely OBJ_OFS_DELTA, renames OBJ_DELTA to OBJ_REF_DELTA to better make the distinction between those two delta objects, and adds support for the handling of those new delta objects in sha1_file.c only. The OBJ_OFS_DELTA contains a relative offset from the delta object's position in a pack instead of the 20-byte SHA1 reference to identify the base object. Since the base is likely to be not so far away, the relative offset is more likely to have a smaller encoding on average than an absolute offset. And for those delta objects the base must always be stored first because there is no way to know the distance of later objects when streaming a pack. Hence this relative offset is always meant to be negative. The offset encoding is slightly denser than the one used for object size -- credits to <linux@horizon.com> (whoever this is) for bringing it to my attention. This allows for pack size reduction between 3.2% (Linux-2.6) to over 5% (linux-historic). Runtime pack access should be faster too since delta replay does skip a search in the pack index for each delta in a chain. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23Convert memcpy(a,b,20) to hashcpy(a,b).Shawn Pearce
This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void*. This is a reasonable tradeoff as most call sites already use unsigned char* and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-17Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length.David Rientjes
Introduces global inline: hashcmp(const unsigned char *sha1, const unsigned char *sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-11drop length argument of has_extensionRene Scharfe
As Fredrik points out the current interface of has_extension() is potentially confusing. Its parameters include both a nul-terminated string and a length-limited string. This patch drops the length argument, requiring two nul-terminated strings; all callsites are updated. I checked that all of them indeed provide nul-terminated strings. Filenames need to be nul-terminated anyway if they are to be passed to open() etc. The performance penalty of the additional strlen() is negligible compared to the system calls which inevitably surround has_extension() calls. Additionally, change has_extension() to use size_t inside instead of int, as that is the exact type strlen() returns and memcmp() expects. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-10Add has_extension()Rene Scharfe
The little helper has_extension() documents through its name what we are trying to do and makes sure we don't forget the underrun check. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04Use blob_, commit_, tag_, and tree_type throughout.Peter Eriksen
This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-10remove delta-against-self bitNicolas Pitre
After experimenting with code to add the ability to encode a delta against part of the deltified file, it turns out that resulting packs are _bigger_ than when this ability is not used. The raw delta output might be smaller, but it doesn't compress as well using gzip with a negative net saving on average. Said bit would in fact be more useful to allow for encoding the copying of chunks larger than 64KB providing more savings with large files. This will correspond to packs version 3. While the current code still produces packs version 2, it is made future proof so pack versions 2 and 3 are accepted. Any pack version 2 are compatible with version 3 since the redefined bit was never used before. When enough time has passed, code to use that bit to produce version 3 packs could be added. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-27avoid asking ?alloc() for zero bytes.Junio C Hamano
Avoid asking for zero bytes when that change simplifies overall logic. Later we would change the wrapper to ask for 1 byte on platforms that return NULL for zero byte request. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-21An off-by-one bug found by valgrindPavel Roskin
Insufficient memory is allocated in index-pack.c to hold the *.idx name. One more byte should be allocated to hold the terminating 0. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-13Fix packname hash generation.Junio C Hamano
This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-13Add git-index-pack utilitySergey Vlasov
git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>