summaryrefslogtreecommitdiff
path: root/index-pack.c
AgeCommit message (Collapse)Author
2008-02-29index-pack: introduce checking modeMartin Koegler
Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-14Merge branch 'np/progress'Junio C Hamano
* np/progress: nicer display of thin pack completion make display of total transferred fully accurate remove dead code from the csum-file interface git-fetch: be even quieter. make display of total transferred more accurate sideband.c: ESC is spelled '\033' not '\e' for portability. fix display overlap between remote and local progress
2007-11-11Merge branch 'maint'Junio C Hamano
* maint: fix index-pack with packs >4GB containing deltas on 32-bit machines git-hash-object should honor config variables gitweb: correct month in date display for atom feeds
2007-11-11fix index-pack with packs >4GB containing deltas on 32-bit machinesNicolas Pitre
This probably hasn't been properly tested before. Here's a script to create a 8GB repo with the necessary characteristics (copy the test-genrandom executable from the Git build tree to /tmp first): ----- #!/bin/bash git init git config core.compression 0 # create big objects with no deltas for i in $(seq -w 1 2 63) do echo $i /tmp/test-genrandom $i 268435456 > file_$i git add file_$i rm file_$i echo "file_$i -delta" >> .gitattributes done # create "deltifiable" objects in between big objects for i in $(seq -w 2 2 64) do echo "$i $i $i" >> grow cp grow file_$i git add file_$i rm file_$i done rm grow # create a pack with them git commit -q -m "commit of big objects interlaced with small deltas" git repack -a -d ----- Then clone this repo over the Git protocol. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-08nicer display of thin pack completionNicolas Pitre
In the same spirit of prettifying Git's output display for mere mortals, here's a simple extension to the progress API allowing for a final message to be provided when terminating a progress line, and use it for the display of the number of objects needed to complete a thin pack, saving yet one more line of screen display. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05make display of total transferred more accurateNicolas Pitre
The throughput display needs a delay period before accounting and displaying anything. Yet it might be called after some amount of data has already been transferred. The display of total data is therefore accounted late and therefore smaller than the reality. Let's call display_throughput() with an absolute amount of transferred data instead of a relative number, and let the throughput code find the relative amount of data by itself as needed. This way the displayed total is always exact. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-04Merge branch 'np/pack'Junio C Hamano
* np/pack: pack-objects: get rid of an ugly cast make the pack index version configurable Conflicts: builtin-pack-objects.c
2007-11-02make the pack index version configurableNicolas Pitre
It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30add throughput display to index-packNicolas Pitre
... and call it "Receiving objects" when over stdin to look clearer to end users. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30relax usage of the progress APINicolas Pitre
Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30make struct progress an opaque typeNicolas Pitre
This allows for better management of progress "object" existence, as well as making the progress display implementation more independent from its callers. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-17fix for more minor memory leaksNicolas Pitre
Now that some pointers have lost their const attribute, we can free their associated memory when done with them. This is more a correctness issue about the rule for freeing those pointers which isn't completely trivial more than the leak itself which didn't matter as the program is exiting anyway. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-17fix const issues with some functionsNicolas Pitre
Two functions, namely write_idx_file() and open_pack_file(), currently return a const pointer. However that pointer is either a copy of the first argument, or set to a malloc'd buffer when that first argument is null. In the later case it is wrong to qualify that pointer as const since ownership of the buffer is transferred to the caller to dispose of, and obviously the free() function is not meant to be passed const pointers. Making the return pointer not const causes a warning when the first argument is returned since that argument is also marked const. The correct thing to do is therefore to remove the const qualifiers, avoiding the need for ugly casts only to silence some warnings. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-17more compact progress displayNicolas Pitre
Each progress can be on a single line instead of two. [sp: Changed "Checking files out" to "Checking out files" at Johannes Sixt's suggestion as it better explains the action that is taking place] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-15Use xmkstemp() instead of mkstemp()Luiz Fernando N. Capitulino
xmkstemp() performs error checking and prints a standard error message when an error occur. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-27detect close failure on just-written file handlesJim Meyering
I audited git for potential undetected write failures. In the cases fixed below, the diagnostics I add mimic the diagnostics used in surrounding code, even when that means not reporting the precise strerror(errno) cause of the error. Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-02Unify write_index_file functionsGeert Bosch
This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-16Ensure return value from xread() is always stored into an ssize_tJohan Herland
This patch fixes all calls to xread() where the return value is not stored into an ssize_t. The patch should not have any effect whatsoever, other than putting better/more appropriate type names on variables. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-02Reuse fixup_pack_header_footer in index-packShawn O. Pearce
Now that fast-import is using a "library function" to handle correcting its packfile's object count and trailing SHA-1 we should reuse the same function in index-pack, to reduce the size of the code we must maintain. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-04-23make progress "title" part of the common progress interfaceNicolas Pitre
If the progress bar ends up in a box, better provide a title for it too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-23common progress display supportNicolas Pitre
Instead of having this code duplicated in multiple places, let's have a common interface for progress display. If someday someone wishes to display a cheezy progress bar instead then only one file will have to be changed. Note: I left merge-recursive.c out since it has a strange notion of progress as it apparently increase the expected total number as it goes. Someone with more intimate knowledge of what that is supposed to mean might look at converting it to the common progress interface. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10allow forcing index v2 and 64-bit offset tresholdNicolas Pitre
This is necessary for testing the new capabilities in some automated way without having an actual 4GB+ pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10index-pack: learn about pack index version 2Nicolas Pitre
Like previous patch but for index-pack. [ There is quite some code duplication between pack-objects and index-pack for generating a pack index (and fast-import as well I suppose). This should be reworked into a common function eventually. But not now. ] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10compute object CRC32 with index-packNicolas Pitre
Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10add overflow tests on pack offset variablesNicolas Pitre
Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10make overflow test on delta base offset work regardless of variable sizeNicolas Pitre
This patch introduces the MSB() macro to obtain the desired number of most significant bits from a given variable independently of the variable type. It is then used to better implement the overflow test on the OBJ_OFS_DELTA base offset variable with the property of always working correctly regardless of the type/size of that variable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04Plug memory leak in index-pack collision checking codepath.Nicolas Pitre
2007-03-25make it more obvious that temporary files are temporary filesNicolas Pitre
When some operations are interrupted (or "die()'d" or crashed) then the partial object/pack/index file may remain around. Make it more obvious in their name that those files are temporary stuff and can be cleaned up if no operation is in progress. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21index-pack: more validation checks and cleanupsNicolas Pitre
When appending objects to a pack, make sure the appended data is really what we expect instead of simply loading potentially corrupted objects and legitimating them by computing a SHA1 of that corrupt data. With this the sha1_object() can lose its test_for_collision parameter which is now redundent. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21index-pack: use hash_sha1_file()Nicolas Pitre
Use hash_sha1_file() instead of duplicating code to compute object SHA1. While at it make it accept a const pointer. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21don't ever allow SHA1 collisions to exist by fetching a packNicolas Pitre
Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have *very* different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made *even more likely* by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07General const correctness fixesShawn O. Pearce
We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-28Merge branch 'js/bundle'Junio C Hamano
* js/bundle: bundle: reword missing prerequisite error message git-bundle: record commit summary in the prerequisite data git-bundle: fix 'create --all' git-bundle: avoid fork() in verify_bundle() git-bundle: assorted fixes Add git-bundle: move objects and references by archive
2007-02-28Merge branch 'np/types'Junio C Hamano
* np/types: Cleanup check_valid in commit-tree. make sure enum object_type is signed get rid of lookup_object_type() convert object type handling from a string to a number formalize typename(), and add its reverse type_from_string() sha1_file.c: don't ignore an error condition in sha1_loose_object_info() sha1_file.c: cleanup "offset" usage sha1_file.c: cleanup hdr usage
2007-02-28Merge branch 'maint'Junio C Hamano
* maint: builtin-fmt-merge-msg: fix bugs in --file option index-pack: Loop over pread until data loading is complete. blameview: Fix the browse behavior in blameview Fix minor typos/grammar in user-manual.txt Correct ordering in git-cvsimport's option documentation git-show: Reject native ref Fix git-show man page formatting in the EXAMPLES section
2007-02-28index-pack: Loop over pread until data loading is complete.Shawn O. Pearce
A filesystem might not be able to completely supply our pread request in one system call, such as if we are reading data from a network file system and the requested length is just simply huge. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27convert object type handling from a string to a numberNicolas Pitre
We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27formalize typename(), and add its reverse type_from_string()Nicolas Pitre
Sometime typename() is used, sometimes type_names[] is accessed directly. Let's enforce typename() all the time which allows for validating the type. Also let's add a function to go from a name to a type and use it instead of manual memcpy() when appropriate. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-23git-bundle: assorted fixesJohannes Schindelin
This patch fixes issues mentioned by Junio, Nico and Simon: - I forgot to convert the usage string when removing the "--" from the subcommands, - a style fix in the bundle_header, - use xread() instead of read(), - use write_or_die() instead of write(), - make the bundle header extensible, - fail if the whitespace after a sha1 of a reference is missing, - close() the fds passed to a subprocess, - in verify_bundle(), do not use "rev-list --stdin", but rather pass the revs directly (avoiding a fork()), - fix a corrupted comment in show_object(), and - fix the size check in index_pack. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-23Add git-bundle: move objects and references by archiveJohannes Schindelin
Some workflows require use of repositories on machines that cannot be connected, preventing use of git-fetch / git-push to transport objects and references between the repositories. git-bundle provides an alternate transport mechanism, effectively allowing git-fetch and git-pull to operate using sneakernet transport. `git-bundle create` allows the user to create a bundle containing one or more branches or tags, but with specified basis assumed to exist on the target repository. At the receiving end, git-bundle acts like git-fetch-pack, allowing the user to invoke git-fetch or git-pull using the bundle file as the URL. git-fetch and git-ls-remote determine they have a bundle URL by checking that the URL points to a file, but are otherwise unchanged in operation with bundles. The original patch was done by Mark Levedahl <mdl123@verizon.net>. It was updated to make git-bundle a builtin, and get rid of the tar format: now, the first line is supposed to say "# v2 git bundle", the next lines either contain a prerequisite ("-" followed by the hash of the needed commit), or a ref (the hash of a commit, followed by the name of the ref), and finally the pack. As a result, the bundle argument can be "-" now. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-21Mechanical conversion to use prefixcmp()Junio C Hamano
This mechanically converts strncmp() to use prefixcmp(), but only when the parameters match specific patterns, so that they can be verified easily. Leftover from this will be fixed in a separate step, including idiotic conversions like if (!strncmp("foo", arg, 3)) => if (!(-prefixcmp(arg, "foo"))) This was done by using this script in px.perl #!/usr/bin/perl -i.bak -p if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) { s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|; } if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) { s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|; } and running: $ git grep -l strncmp -- '*.c' | xargs perl px.perl Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-11index-pack: write-or-die instead of unchecked write-in-full.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-08short i/o: fix calls to write to use xwrite or write_in_fullAndy Whitcroft
We have a number of badly checked write() calls. Often we are expecting write() to write exactly the size we requested or fail, this fails to handle interrupts or short writes. Switch to using the new write_in_full(). Otherwise we at a minimum need to check for EINTR and EAGAIN, where this is appropriate use xwrite(). Note, the changes to config handling are much larger and handled in the next patch in the sequence. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-08short i/o: fix calls to read to use xread or read_in_fullAndy Whitcroft
We have a number of badly checked read() calls. Often we are expecting read() to read exactly the size we requested or fail, this fails to handle interrupts or short reads. Add a read_in_full() providing those semantics. Otherwise we at a minimum need to check for EINTR and EAGAIN, where this is appropriate use xread(). Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20clarify some error messages wrt unknown object typesNicolas Pitre
If ever new object types are added for future extensions then better have current git version report them as "unknown" instead of "corrupted". Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20simplify inclusion of system header files.Junio C Hamano
This is a mechanical clean-up of the way *.c files include system header files. (1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules; (2) the first #include must be "git-compat-util.h" or one of our own header file that includes it first (e.g. config.h, builtin.h, pkt-line.h); (3) system headers that are included in "git-compat-util.h" need not be included in individual C source files. (4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20index-pack usage of mmap() is unacceptably slower on many OSes other than LinuxNicolas Pitre
It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that indexing the Linux repository ~150MB pack takes about an hour on OS x while it's a minute on Linux. It seems that the OS X mmap() implementation is more than 2 orders of magnitude slower than the Linux one. Linus proposed a patch replacing mmap() with pread() bringing index-pack performance on OS X in line with the Linux one. The performances on Linux also improved by a small margin. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-11Don't use memcpy when source and dest. buffers may overlapJim Meyering
git-index-pack can call memcpy with overlapping source and destination buffers. The patch below makes it use memmove instead. If you want to demonstrate a failure, add the following two lines + if (input_offset < input_len) + abort (); before the existing memcpy call (shown in the patch below), and then run this: (cd t; sh ./t5500-fetch-pack.sh) Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-18sparse fix: non-ANSI function declarationRene Scharfe
The declaration of discard_cache() in cache.h already has its "void". Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-03remove .keep pack lock files when done with refs updateNicolas Pitre
This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with aparallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>