summaryrefslogtreecommitdiff
path: root/index-pack.c
AgeCommit message (Collapse)Author
2008-07-17Merge branch 'sb/dashless'Junio C Hamano
* sb/dashless: Make usage strings dash-less t/: Use "test_must_fail git" instead of "! git" t/test-lib.sh: exit with small negagive int is ok with test_must_fail Conflicts: builtin-blame.c builtin-mailinfo.c builtin-mailsplit.c builtin-shortlog.c git-am.sh t/t4150-am.sh t/t4200-rerere.sh
2008-07-17Merge branch 'sp/maint-index-pack'Junio C Hamano
* sp/maint-index-pack: index-pack: Honor core.deltaBaseCacheLimit when resolving deltas index-pack: Track the object_entry that creates each base_data index-pack: Chain the struct base_data on the stack for traversal index-pack: Refactor base arguments of resolve_delta into a struct
2008-07-15index-pack: Honor core.deltaBaseCacheLimit when resolving deltasShawn O. Pearce
If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-15index-pack: Track the object_entry that creates each base_dataShawn O. Pearce
If we free the data stored within a base_data we need the struct object_entry to get the data back again for use with another dependent delta. Storing the object_entry* in base_data makes it simple to call get_data_from_pack() to recover the compressed information. This however means that we must add the missing base object to the end of our packfile prior to calling resolve_delta() on each of the dependent deltas. Adding the base first ensures we can read the base back from the pack we are indexing, as if it had been included by the remote side. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-15index-pack: Chain the struct base_data on the stack for traversalShawn O. Pearce
We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-15index-pack: Refactor base arguments of resolve_delta into a structShawn O. Pearce
We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-13Make usage strings dash-lessStephan Beyer
When you misuse a git command, you are shown the usage string. But this is currently shown in the dashed form. So if you just copy what you see, it will not work, when the dashed form is no longer supported. This patch makes git commands show the dash-less version. For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh generates a dash-less usage string now. Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-06Fix some warnings (on cygwin) to allow -WerrorRamsay Jones
When printing valuds of type uint32_t, we should use PRIu32, and should not assume that it is unsigned int. On 32-bit platforms, it could be defined as unsigned long. The same caution applies to ntohl(). Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-31Make pack creation always fsync() the resultLinus Torvalds
This means that we can depend on packs always being stable on disk, simplifying a lot of the object serialization worries. And unlike loose objects, serializing pack creation IO isn't going to be a performance killer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14Provide git_config with a callback-data parameterJohannes Schindelin
git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-29index-pack: introduce checking modeMartin Koegler
Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-14Merge branch 'np/progress'Junio C Hamano
* np/progress: nicer display of thin pack completion make display of total transferred fully accurate remove dead code from the csum-file interface git-fetch: be even quieter. make display of total transferred more accurate sideband.c: ESC is spelled '\033' not '\e' for portability. fix display overlap between remote and local progress
2007-11-11Merge branch 'maint'Junio C Hamano
* maint: fix index-pack with packs >4GB containing deltas on 32-bit machines git-hash-object should honor config variables gitweb: correct month in date display for atom feeds
2007-11-11fix index-pack with packs >4GB containing deltas on 32-bit machinesNicolas Pitre
This probably hasn't been properly tested before. Here's a script to create a 8GB repo with the necessary characteristics (copy the test-genrandom executable from the Git build tree to /tmp first): ----- #!/bin/bash git init git config core.compression 0 # create big objects with no deltas for i in $(seq -w 1 2 63) do echo $i /tmp/test-genrandom $i 268435456 > file_$i git add file_$i rm file_$i echo "file_$i -delta" >> .gitattributes done # create "deltifiable" objects in between big objects for i in $(seq -w 2 2 64) do echo "$i $i $i" >> grow cp grow file_$i git add file_$i rm file_$i done rm grow # create a pack with them git commit -q -m "commit of big objects interlaced with small deltas" git repack -a -d ----- Then clone this repo over the Git protocol. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-08nicer display of thin pack completionNicolas Pitre
In the same spirit of prettifying Git's output display for mere mortals, here's a simple extension to the progress API allowing for a final message to be provided when terminating a progress line, and use it for the display of the number of objects needed to complete a thin pack, saving yet one more line of screen display. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05make display of total transferred more accurateNicolas Pitre
The throughput display needs a delay period before accounting and displaying anything. Yet it might be called after some amount of data has already been transferred. The display of total data is therefore accounted late and therefore smaller than the reality. Let's call display_throughput() with an absolute amount of transferred data instead of a relative number, and let the throughput code find the relative amount of data by itself as needed. This way the displayed total is always exact. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-04Merge branch 'np/pack'Junio C Hamano
* np/pack: pack-objects: get rid of an ugly cast make the pack index version configurable Conflicts: builtin-pack-objects.c
2007-11-02make the pack index version configurableNicolas Pitre
It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30add throughput display to index-packNicolas Pitre
... and call it "Receiving objects" when over stdin to look clearer to end users. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30relax usage of the progress APINicolas Pitre
Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30make struct progress an opaque typeNicolas Pitre
This allows for better management of progress "object" existence, as well as making the progress display implementation more independent from its callers. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-17fix for more minor memory leaksNicolas Pitre
Now that some pointers have lost their const attribute, we can free their associated memory when done with them. This is more a correctness issue about the rule for freeing those pointers which isn't completely trivial more than the leak itself which didn't matter as the program is exiting anyway. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-17fix const issues with some functionsNicolas Pitre
Two functions, namely write_idx_file() and open_pack_file(), currently return a const pointer. However that pointer is either a copy of the first argument, or set to a malloc'd buffer when that first argument is null. In the later case it is wrong to qualify that pointer as const since ownership of the buffer is transferred to the caller to dispose of, and obviously the free() function is not meant to be passed const pointers. Making the return pointer not const causes a warning when the first argument is returned since that argument is also marked const. The correct thing to do is therefore to remove the const qualifiers, avoiding the need for ugly casts only to silence some warnings. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-17more compact progress displayNicolas Pitre
Each progress can be on a single line instead of two. [sp: Changed "Checking files out" to "Checking out files" at Johannes Sixt's suggestion as it better explains the action that is taking place] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-15Use xmkstemp() instead of mkstemp()Luiz Fernando N. Capitulino
xmkstemp() performs error checking and prints a standard error message when an error occur. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-27detect close failure on just-written file handlesJim Meyering
I audited git for potential undetected write failures. In the cases fixed below, the diagnostics I add mimic the diagnostics used in surrounding code, even when that means not reporting the precise strerror(errno) cause of the error. Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-02Unify write_index_file functionsGeert Bosch
This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-16Ensure return value from xread() is always stored into an ssize_tJohan Herland
This patch fixes all calls to xread() where the return value is not stored into an ssize_t. The patch should not have any effect whatsoever, other than putting better/more appropriate type names on variables. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-02Reuse fixup_pack_header_footer in index-packShawn O. Pearce
Now that fast-import is using a "library function" to handle correcting its packfile's object count and trailing SHA-1 we should reuse the same function in index-pack, to reduce the size of the code we must maintain. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-04-23make progress "title" part of the common progress interfaceNicolas Pitre
If the progress bar ends up in a box, better provide a title for it too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-23common progress display supportNicolas Pitre
Instead of having this code duplicated in multiple places, let's have a common interface for progress display. If someday someone wishes to display a cheezy progress bar instead then only one file will have to be changed. Note: I left merge-recursive.c out since it has a strange notion of progress as it apparently increase the expected total number as it goes. Someone with more intimate knowledge of what that is supposed to mean might look at converting it to the common progress interface. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10allow forcing index v2 and 64-bit offset tresholdNicolas Pitre
This is necessary for testing the new capabilities in some automated way without having an actual 4GB+ pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10index-pack: learn about pack index version 2Nicolas Pitre
Like previous patch but for index-pack. [ There is quite some code duplication between pack-objects and index-pack for generating a pack index (and fast-import as well I suppose). This should be reworked into a common function eventually. But not now. ] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10compute object CRC32 with index-packNicolas Pitre
Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10add overflow tests on pack offset variablesNicolas Pitre
Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10make overflow test on delta base offset work regardless of variable sizeNicolas Pitre
This patch introduces the MSB() macro to obtain the desired number of most significant bits from a given variable independently of the variable type. It is then used to better implement the overflow test on the OBJ_OFS_DELTA base offset variable with the property of always working correctly regardless of the type/size of that variable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04Plug memory leak in index-pack collision checking codepath.Nicolas Pitre
2007-03-25make it more obvious that temporary files are temporary filesNicolas Pitre
When some operations are interrupted (or "die()'d" or crashed) then the partial object/pack/index file may remain around. Make it more obvious in their name that those files are temporary stuff and can be cleaned up if no operation is in progress. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21index-pack: more validation checks and cleanupsNicolas Pitre
When appending objects to a pack, make sure the appended data is really what we expect instead of simply loading potentially corrupted objects and legitimating them by computing a SHA1 of that corrupt data. With this the sha1_object() can lose its test_for_collision parameter which is now redundent. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21index-pack: use hash_sha1_file()Nicolas Pitre
Use hash_sha1_file() instead of duplicating code to compute object SHA1. While at it make it accept a const pointer. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21don't ever allow SHA1 collisions to exist by fetching a packNicolas Pitre
Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have *very* different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made *even more likely* by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07General const correctness fixesShawn O. Pearce
We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-28Merge branch 'js/bundle'Junio C Hamano
* js/bundle: bundle: reword missing prerequisite error message git-bundle: record commit summary in the prerequisite data git-bundle: fix 'create --all' git-bundle: avoid fork() in verify_bundle() git-bundle: assorted fixes Add git-bundle: move objects and references by archive
2007-02-28Merge branch 'np/types'Junio C Hamano
* np/types: Cleanup check_valid in commit-tree. make sure enum object_type is signed get rid of lookup_object_type() convert object type handling from a string to a number formalize typename(), and add its reverse type_from_string() sha1_file.c: don't ignore an error condition in sha1_loose_object_info() sha1_file.c: cleanup "offset" usage sha1_file.c: cleanup hdr usage
2007-02-28Merge branch 'maint'Junio C Hamano
* maint: builtin-fmt-merge-msg: fix bugs in --file option index-pack: Loop over pread until data loading is complete. blameview: Fix the browse behavior in blameview Fix minor typos/grammar in user-manual.txt Correct ordering in git-cvsimport's option documentation git-show: Reject native ref Fix git-show man page formatting in the EXAMPLES section
2007-02-28index-pack: Loop over pread until data loading is complete.Shawn O. Pearce
A filesystem might not be able to completely supply our pread request in one system call, such as if we are reading data from a network file system and the requested length is just simply huge. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27convert object type handling from a string to a numberNicolas Pitre
We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27formalize typename(), and add its reverse type_from_string()Nicolas Pitre
Sometime typename() is used, sometimes type_names[] is accessed directly. Let's enforce typename() all the time which allows for validating the type. Also let's add a function to go from a name to a type and use it instead of manual memcpy() when appropriate. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-23git-bundle: assorted fixesJohannes Schindelin
This patch fixes issues mentioned by Junio, Nico and Simon: - I forgot to convert the usage string when removing the "--" from the subcommands, - a style fix in the bundle_header, - use xread() instead of read(), - use write_or_die() instead of write(), - make the bundle header extensible, - fail if the whitespace after a sha1 of a reference is missing, - close() the fds passed to a subprocess, - in verify_bundle(), do not use "rev-list --stdin", but rather pass the revs directly (avoiding a fork()), - fix a corrupted comment in show_object(), and - fix the size check in index_pack. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-23Add git-bundle: move objects and references by archiveJohannes Schindelin
Some workflows require use of repositories on machines that cannot be connected, preventing use of git-fetch / git-push to transport objects and references between the repositories. git-bundle provides an alternate transport mechanism, effectively allowing git-fetch and git-pull to operate using sneakernet transport. `git-bundle create` allows the user to create a bundle containing one or more branches or tags, but with specified basis assumed to exist on the target repository. At the receiving end, git-bundle acts like git-fetch-pack, allowing the user to invoke git-fetch or git-pull using the bundle file as the URL. git-fetch and git-ls-remote determine they have a bundle URL by checking that the URL points to a file, but are otherwise unchanged in operation with bundles. The original patch was done by Mark Levedahl <mdl123@verizon.net>. It was updated to make git-bundle a builtin, and get rid of the tar format: now, the first line is supposed to say "# v2 git bundle", the next lines either contain a prerequisite ("-" followed by the hash of the needed commit), or a ref (the hash of a commit, followed by the name of the ref), and finally the pack. As a result, the bundle argument can be "-" now. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>