summaryrefslogtreecommitdiff
path: root/builtin/index-pack.c
AgeCommit message (Collapse)Author
2013-09-17Merge branch 'jk/free-tree-buffer'Junio C Hamano
* jk/free-tree-buffer: clear parsed flag when we free tree buffers
2013-06-06clear parsed flag when we free tree buffersJeff King
Many code paths will free a tree object's buffer and set it to NULL after finishing with it in order to keep memory usage down during a traversal. However, out of 8 sites that do this, only one actually unsets the "parsed" flag back. Those sites that don't are setting a trap for later users of the tree object; even after calling parse_tree, the buffer will remain NULL, causing potential segfaults. It is not known whether this is triggerable in the current code. Most commands do not do an in-memory traversal followed by actually using the objects again. However, it does not hurt to be safe for future callers. In most cases, we can abstract this out to a "free_tree_buffer" helper. However, there are two exceptions: 1. The fsck code relies on the parsed flag to know that we were able to parse the object at one point. We can switch this to using a flag in the "flags" field. 2. The index-pack code sets the buffer to NULL but does not free it (it is freed by a caller). We should still unset the parsed flag here, but we cannot use our helper, as we do not want to free the buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28clone: open a shortcut for connectivity checkNguyễn Thái Ngọc Duy
In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28index-pack: remove dead code (it should never happen)Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-01Merge branch 'nd/index-pack-threaded-fixes'Junio C Hamano
"index-pack --verify-stat" used a few counters outside protection of mutex, possibly showing incorrect numbers. * nd/index-pack-threaded-fixes: index-pack: guard nr_resolved_deltas reads by lock index-pack: protect deepest_delta in multithread code
2013-04-01Merge branch 'jk/index-pack-correct-depth-fix'Junio C Hamano
"index-pack --fix-thin" used uninitialize value to compute delta depths of objects it appends to the resulting pack. * jk/index-pack-correct-depth-fix: index-pack: always zero-initialize object_entry list
2013-03-20index-pack: always zero-initialize object_entry listJeff King
Commit 38a4556 (index-pack: start learning to emulate "verify-pack -v", 2011-06-03) added a "delta_depth" counter to each "struct object_entry". Initially, all object entries have their depth set to 0; in resolve_delta, we then set the depth of each delta to "base + 1". Base entries never have their depth touched, and remain at 0. To ensure that all depths start at 0, that commit changed calls to xmalloc the object_entry list into calls to xcalloc. However, it forgot that we grow the list with xrealloc later. These extra entries are used when we add an object from elsewhere to complete a thin pack. If we add a non-delta object, its depth value will just be uninitialized heap data. This patch fixes it by zero-initializing entries we add to the objects list via the xrealloc. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19index-pack: guard nr_resolved_deltas reads by lockThomas Rast
The threaded parts of index-pack increment the number of resolved deltas in nr_resolved_deltas guarded by counter_mutex. However, the per-thread outer loop accessed nr_resolved_deltas without any locks. This is not wrong as such, since it doesn't matter all that much whether we get an outdated value. However, unless someone proves that this one lock makes all the performance difference, it would be much cleaner to guard _all_ accesses to the variable with the lock. The only such use is display_progress() in the threaded section (all others are in the conclude_pack() callchain outside the threaded part). To make it obvious that it cannot deadlock, move it out of work_mutex. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19index-pack: protect deepest_delta in multithread codeNguyễn Thái Ngọc Duy
deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-17index-pack: fix buffer overflow caused by translationsNguyễn Thái Ngọc Duy
The translation of "completed with %d local objects" is put in a 48-byte buffer, which may be enough for English but not true for any translations. Convert it to use strbuf (i.e. no hard limit on translation length). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-31i18n: mark more index-pack strings for translationNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-16Merge branch 'jk/index-pack-streaming-fix'Junio C Hamano
The streaming index-pack introduced in 1.7.11 had a data corruption bug, and this should fix it. * jk/index-pack-streaming-fix: index-pack: loop while inflating objects in unpack_data
2012-07-10index-pack: loop while inflating objects in unpack_dataJeff King
When the unpack_data function is given a consume() callback, it unpacks only 64K of the input at a time, feeding it to git_inflate along with a 64K output buffer. However, because we are inflating, there is a good chance that the output buffer will fill before consuming all of the input. In this case, we need to loop on git_inflate until we have fed the whole input buffer, feeding each chunk of output to the consume buffer. The current code does not do this, and as a result, will fail the loop condition and trigger a fatal "serious inflate inconsistency" error in this case. While we're rearranging the loop, let's get rid of the extra last_out pointer. It is meant to point to the beginning of the buffer that we feed to git_inflate, but in practice this is always the beginning of our same 64K buffer, because: 1. At the beginning of the loop, we are feeding the buffer. 2. At the end of the loop, if we are using a consume() function, we reset git_inflate's pointer to the beginning of the buffer. If we are not using a consume() function, then we do not care about the value of last_out at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09Merge branch 'rj/platform-pread-may-be-thread-unsafe'Junio C Hamano
On Cygwin, the platform pread(3) is not thread safe, just like our own compat/ emulation, and cannot be used in the index-pack program. * rj/platform-pread-may-be-thread-unsafe: index-pack: Disable threading on cygwin
2012-06-26index-pack: Disable threading on cygwinJunio C Hamano
The Cygwin implementation of pread() is not thread-safe since, just like the emulation provided by compat/pread.c, it uses a sequence of seek-read-seek calls. In order to avoid failues due to thread-safety issues, commit b038a61 disables threading when NO_PREAD is defined. (ie when using the emulation code in compat/pread.c). We introduce a new build variable, NO_THREAD_SAFE_PREAD, which allows use to disable the threaded index-pack code on cygwin, in addition to the above NO_PREAD case. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-24index-pack: use streaming interface for collision test on large blobsNguyễn Thái Ngọc Duy
When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23index-pack: factor out unpack core from get_data_from_packNguyễn Thái Ngọc Duy
This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23index-pack: use streaming interface on large blobs (most of the time)Nguyễn Thái Ngọc Duy
unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23index-pack: hash non-delta objects while reading from streamNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-14Merge branch 'nd/threaded-index-pack'Junio C Hamano
Enables threading in index-pack to resolve base data in parallel. By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1) * nd/threaded-index-pack: index-pack: disable threading if NO_PREAD is defined index-pack: support multithreaded delta resolving index-pack: restructure pack processing into three main functions compat/win32/pthread.h: Add an pthread_key_delete() implementation
2012-05-07index-pack: disable threading if NO_PREAD is definedNguyễn Thái Ngọc Duy
NO_PREAD simulates pread() as a sequence of seek, read, seek in compat/pread.c. The simulation is not thread-safe because another thread could move the file offset away in the middle of pread operation. Do not allow threading in that case. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-07index-pack: support multithreaded delta resolvingNguyễn Thái Ngọc Duy
This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-07index-pack: restructure pack processing into three main functionsNguyễn Thái Ngọc Duy
The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-24i18n: index-pack: mark strings for translationNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-16index-pack: eliminate unlimited recursion in get_base_data()Nguyễn Thái Ngọc Duy
Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-16index-pack: eliminate recursion in find_unresolved_deltasNguyễn Thái Ngọc Duy
Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21Appease Sun Studio by renaming "tmpfile"Ævar Arnfjörð Bjarmason
On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-17receive-pack, fetch-pack: reject bogus pack that records objects twiceJunio C Hamano
When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-19Merge branch 'jc/index-pack'Junio C Hamano
* jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c
2011-07-19Merge branch 'jc/zlib-wrap'Junio C Hamano
* jc/zlib-wrap: zlib: allow feeding more than 4GB in one go zlib: zlib can only process 4GB at a time zlib: wrap deflateBound() too zlib: wrap deflate side of the API zlib: wrap inflateInit2 used to accept only for gzip format zlib: wrap remaining calls to direct inflate/inflateEnd zlib wrapper: refactor error message formatter Conflicts: sha1_file.c
2011-06-10zlib: zlib can only process 4GB at a timeJunio C Hamano
The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: wrap deflate side of the APIJunio C Hamano
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-06index-pack: show histogram when emulating "verify-pack -v"Junio C Hamano
The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-06index-pack: start learning to emulate "verify-pack -v"Junio C Hamano
The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-06index-pack: a miniscule refactorJunio C Hamano
Introduce a helper function that takes the type of an object and tell if it is a delta, as we seem to use this check in many places. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-11sparse: Fix an "symbol 'cmd_index_pack' not declared" warningRamsay Jones
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-03sparse: Fix errors and silence warningsStephen Boyd
* load_file() returns a void pointer but is using 0 for the return value * builtin/receive-pack.c forgot to include builtin.h * packet_trace_prefix can be marked static * ll_merge takes a pointer for its last argument, not an int * crc32 expects a pointer as the second argument but Z_NULL is defined to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer, 2006-11-18 for more info) Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22Fix sparse warningsStephen Boyd
Fix warnings from 'make check'. - These files don't include 'builtin.h' causing sparse to complain that cmd_* isn't declared: builtin/clone.c:364, builtin/fetch-pack.c:797, builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78, builtin/merge-index.c:69, builtin/merge-recursive.c:22 builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426 builtin/notes.c:822, builtin/pack-redundant.c:596, builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149, builtin/remote.c:1512, builtin/remote-ext.c:240, builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384, builtin/unpack-file.c:25, builtin/var.c:75 - These files have symbols which should be marked static since they're only file scope: submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13, submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79, unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123, url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48 - These files redeclare symbols to be different types: builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571, usage.c:49, usage.c:58, usage.c:63, usage.c:72 - These files use a literal integer 0 when they really should use a NULL pointer: daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362 While we're in the area, clean up some unused #includes in builtin files (mostly exec_cmd.h). Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16standardize brace placement in struct definitionsJonathan Nieder
In a struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so: struct foo { int bar; char *baz; }; Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28index-pack --verify: read anomalous offsets from v2 idx fileJunio C Hamano
A pack v2 .idx file usually records offset using 64-bit representation only when the offset does not fit within 31-bit, but you can handcraft your .idx file to record smaller offset using 64-bit, storing all zero in the upper 4-byte. By inspecting the original idx file when running index-pack --verify, encode such low offsets that do not need to be in 64-bit but are encoded using 64-bit just like the original idx file so that we can still validate the pack/idx pair by comparing the idx file recomputed with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28index-pack: --verifyJunio C Hamano
Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28write_idx_file: introduce a struct to hold idx customization optionsJunio C Hamano
Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28index-pack: group the delta-base array entries also by typeJunio C Hamano
Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-08Use parentheses and `...' where appropriateŠtěpán Němec
Remove some stray usage of other bracket types and asterisks for the same purpose. Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06do not depend on signed integer overflowErik Faye-Lund
Signed integer overflow is not defined in C, so do not depend on it. This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would consider "consumed_bytes > consumed_bytes + bytes" as a constant expression, and never execute the die()-call. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31Merge branch 'jn/paginate-fix'Junio C Hamano
* jn/paginate-fix: t7006 (pager): add missing TTY prerequisites merge-file: run setup_git_directory_gently() sooner var: run setup_git_directory_gently() sooner ls-remote: run setup_git_directory_gently() sooner index-pack: run setup_git_directory_gently() sooner config: run setup_git_directory_gently() sooner bundle: run setup_git_directory_gently() sooner apply: run setup_git_directory_gently() sooner grep: run setup_git_directory_gently() sooner shortlog: run setup_git_directory_gently() sooner git wrapper: allow setup_git_directory_gently() be called earlier setup: remember whether repository was found git wrapper: introduce startup_info struct Conflicts: builtin/index-pack.c
2010-08-31Merge branch 'jn/maint-setup-fix'Junio C Hamano
* jn/maint-setup-fix: setup: split off a function to handle ordinary .git directories Revert "rehabilitate 'git index-pack' inside the object store" setup: do not forget working dir from subdir of gitdir t4111 (apply): refresh index before applying patches to it setup: split off get_device_or_die helper setup: split off a function to handle hitting ceiling in repo search setup: split off code to handle stumbling upon a repository setup: split off a function to checks working dir for .git file setup: split off $GIT_DIR-set case from setup_git_directory_gently tests: try git apply from subdir of toplevel t1501 (rev-parse): clarify Conflicts: builtin/index-pack.c
2010-08-16index-pack: run setup_git_directory_gently() soonerNguyễn Thái Ngọc Duy
index-pack already runs a repository search unconditionally; running such a search earlier is not risky and ensures GIT_DIR will be set correctly if the configuration needs to be accessed from run_builtin(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-12index-pack: Don't follow replace refs.Nelson Elhage
Without this, attempting to index a pack containing objects that have been replaced results in a fatal error that looks like: fatal: SHA1 COLLISION FOUND WITH <replaced-object> ! Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Acked-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-04Revert "rehabilitate 'git index-pack' inside the object store"Nguyễn Thái Ngọc Duy
Now setup_git_directory_gently behaves sanely even from subdirs of .git, so simplify index-pack by no longer protecting against that. This reverts commit a672ea6ac5a1b876bc7adfe6534b16fa2a32c94b (excluding tests). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>