summaryrefslogtreecommitdiff
path: root/builtin/index-pack.c
AgeCommit message (Collapse)Author
2015-06-22fsck: introduce fsck optionsJohannes Schindelin
Just like the diff machinery, we are about to introduce more settings, therefore it makes sense to carry them around as a (pointer to a) struct containing all of them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-16Merge branch 'nd/slim-index-pack-memory-usage'Junio C Hamano
An earlier optimization broke index-pack for a large object transfer; this fixes it before the breakage hits any released version. * nd/slim-index-pack-memory-usage: index-pack: fix truncation of off_t in comparison
2015-06-09index-pack: avoid excessive re-reading of pack directoryJeff King
Since 45e8a74 (has_sha1_file: re-check pack directory before giving up, 2013-08-30), we spend extra effort for has_sha1_file to give the right answer when somebody else is repacking. Usually this effort does not matter, because after finding that the object does not exist, the next step is usually to die(). However, some code paths make a large number of has_sha1_file checks which are _not_ expected to return 1. The collision test in index-pack.c is such a case. On a local system, this can cause a performance slowdown of around 5%. But on a system with high-latency system calls (like NFS), it can be much worse. This patch introduces a "quick" flag to has_sha1_file which callers can use when they would prefer high performance at the cost of false negatives during repacks. There may be other code paths that can use this, but the index-pack one is the most obviously critical, so we'll start with switching that one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-04index-pack: fix truncation of off_t in comparisonJeff King
Commit c6458e6 (index-pack: kill union delta_base to save memory, 2015-04-18) refactored the comparison functions used in sorting and binary searching our delta list. The resulting code does something like: int cmp_offsets(off_t a, off_t b) { return a - b; } This works most of the time, but produces nonsensical results when the difference between the two offsets is larger than what can be stored in an "int". This can lead to unresolved deltas if the packsize is larger than 2G (even on 64-bit systems, an int is still typically 32 bits): $ git clone git://github.com/mozilla/gecko-dev Cloning into 'gecko-dev'... remote: Counting objects: 4800161, done. remote: Compressing objects: 100% (178/178), done. remote: Total 4800161 (delta 88), reused 0 (delta 0), pack-reused 4799978 Receiving objects: 100% (4800161/4800161), 2.21 GiB | 3.26 MiB/s, done. Resolving deltas: 99% (3808820/3811944), completed with 0 local objects. fatal: pack has 3124 unresolved deltas fatal: index-pack failed We can fix it by doing direct comparisons between the offsets and returning constants; the callers only care about the sign of the comparison, not the magnitude. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-11Merge branch 'nd/slim-index-pack-memory-usage'Junio C Hamano
Memory usage of "git index-pack" has been trimmed by tens of per-cent. * nd/slim-index-pack-memory-usage: index-pack: kill union delta_base to save memory index-pack: reduce object_entry size to save memory
2015-04-19index-pack: kill union delta_base to save memoryNguyễn Thái Ngọc Duy
Once we know the number of objects in the input pack, we allocate an array of nr_objects of struct delta_entry. On x86-64, this struct is 32 bytes long. The union delta_base, which is part of struct delta_entry, provides enough space to store either ofs-delta (8 bytes) or ref-delta (20 bytes). Because ofs-delta encoding is more efficient space-wise and more performant at runtime than ref-delta encoding, Git packers try to use ofs-delta whenever possible, and it is expected that objects encoded as ref-delta are minority. In the best clone case where no ref-delta object is present, we waste (20-8) * nr_objects bytes because of this union. That's about 38MB out of 100MB for deltas[] with 3.4M objects, or 38%. deltas[] would be around 62MB without the waste. This patch attempts to eliminate that. deltas[] array is split into two: one for ofs-delta and one for ref-delta. Many functions are also duplicated because of this split. With this patch, ofs_deltas[] array takes 51MB. ref_deltas[] should remain unallocated in clone case (0 bytes). This array grows as we see ref-delta. We save about half in this case, or 25% of total bookkeeping. The saving is more than the calculation above because some padding in the old delta_entry struct is removed. ofs_delta_entry is 16 bytes, including the 4 bytes padding. That's 13MB for padding, but packing the struct could break platforms that do not support unaligned access. If someone on 32-bit is really low on memory and only deals with packs smaller than 2G, using 32-bit off_t would eliminate the padding and save 27MB on top. A note about ofs_deltas allocation. We could use ref_deltas memory allocation strategy for ofs_deltas. But that probably just adds more overhead on top. ofs-deltas are generally the majority (1/2 to 2/3) in any pack. Incremental realloc may lead to too many memcpy. And if we preallocate, say 1/2 or 2/3 of nr_objects initially, the growth rate of ALLOC_GROW() could make this array larger than nr_objects, wasting more memory. Brought-up-by: Matthew Sporleder <msporleder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-17Merge branch 'rs/deflate-init-cleanup'Junio C Hamano
Code simplification. * rs/deflate-init-cleanup: zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-05zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}René Scharfe
Clear the git_zstream variable at the start of git_deflate_init() etc. so that callers don't have to do that. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-27index-pack: reduce object_entry size to save memoryNguyễn Thái Ngọc Duy
For each object in the input pack, we need one struct object_entry. On x86-64, this struct is 64 bytes long. Although: - The 8 bytes for delta_depth and base_object_no are only useful when show_stat is set. And it's never set unless someone is debugging. - The three fields hdr_size, type and real_type take 4 bytes each even though they never use more than 4 bits. By moving delta_depth and base_object_no out of struct object_entry and make the other 3 fields one byte long instead of 4, we shrink 25% of this struct. On a 3.4M object repo (*) that's about 53MB. The saving is less impressive compared to index-pack memory use for basic bookkeeping (**), about 16%. (*) linux-2.6.git already has 4M objects as of v3.19-rc7 so this is not an unrealistic number of objects that we have to deal with. (**) 3.4M * (sizeof(object_entry) + sizeof(delta_entry)) = 311MB Brought-up-by: Matthew Sporleder <msporleder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-22Merge branch 'js/fsck-tag-validation'Junio C Hamano
New tag object format validation added in 2.2 showed garbage after a tagname it reported in its error message. * js/fsck-tag-validation: index-pack: terminate object buffers with NUL fsck: properly bound "invalid tag name" error message
2014-12-09index-pack: terminate object buffers with NULDuy Nguyen
We have some tricky checks in fsck that rely on a side effect of require_end_of_header(), and would otherwise easily run outside non-NUL-terminated buffers. This is a bit brittle, so let's make sure that only NUL-terminated buffers are passed around to begin with. Jeff "Peff" King contributed the detailed analysis which call paths are involved and pointed out that we also have to patch the get_data() function in unpack-objects.c, which is what Johannes "Dscho" Schindelin implemented. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Analyzed-by: Jeff King <peff@peff.net> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-13index-pack: fix compilation with NO_PTHREADSEtienne Buira
type_cas_lock/unlock() should be defined as no-op for NO_PTHREADS build, just like all the other locking primitives. Signed-off-by: Etienne Buira <etienne.buira@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-26Merge branch 'rs/realloc-array'Junio C Hamano
Code cleanup. * rs/realloc-array: use REALLOC_ARRAY for changing the allocation size of arrays add macro REALLOC_ARRAY
2014-09-26Merge branch 'js/fsck-tag-validation'Junio C Hamano
Teach "git fsck" to inspect the contents of annotated tag objects. * js/fsck-tag-validation: Make sure that index-pack --strict checks tag objects Add regression tests for stricter tag fsck'ing fsck: check tag objects' headers Make sure fsck_commit_buffer() does not run out of the buffer fsck_object(): allow passing object data separately from the object itself Refactor type_from_string() to allow continuing after detecting an error
2014-09-19Merge branch 'jk/index-pack-threading-races'Junio C Hamano
When receiving an invalid pack stream that records the same object twice, multiple threads got confused due to a race. We should reject or correct such a stream upon receiving, but that will be a larger change. * jk/index-pack-threading-races: index-pack: fix race condition with duplicate bases
2014-09-18use REALLOC_ARRAY for changing the allocation size of arraysRené Scharfe
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10fsck_object(): allow passing object data separately from the object itselfJohannes Schindelin
When fsck'ing an incoming pack, we need to fsck objects that cannot be read via read_sha1_file() because they are not local yet (and might even be rejected if transfer.fsckobjects is set to 'true'). For commits, there is a hack in place: we basically cache commit objects' buffers anyway, but the same is not true, say, for tag objects. By refactoring fsck_object() to take the object buffer and size as optional arguments -- optional, because we still fall back to the previous method to look at the cached commit objects if the caller passes NULL -- we prepare the machinery for the upcoming handling of tag objects. The assumption that such buffers are inherently NUL terminated is now wrong, of course, hence we pass the size of the buffer so that we can add a sanity check later, to prevent running past the end of the buffer. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29index-pack: fix race condition with duplicate basesJeff King
When we are resolving deltas in an indexed pack, we do it by first selecting a potential base (either one stored in full in the pack, or one created by resolving another delta), and then resolving any deltas that use that base. When we resolve a particular delta, we flip its "real_type" field from OBJ_{REF,OFS}_DELTA to whatever the real type is. We assume that traversing the objects this way will visit each delta only once. This is correct for most packs; we visit the delta only when we process its base, and each object (and thus each base) appears only once. However, if a base object appears multiple times in the pack, we will try to resolve any deltas based on it once for each instance. We can detect this case by noting that a delta we are about to resolve has already had its real_type field flipped, and we already do so with an assert(). However, if multiple threads are in use, we may race with another thread on comparing and flipping the field. We need to synchronize the access. The right mechanism for doing this is a compare-and-swap (we atomically "claim" the delta for our own and find out whether our claim was successful). We can implement this in C by using a pthread mutex to protect the operation. This is not the fastest way of doing a compare-and-swap; many processors provide instructions for this, and gcc and other compilers provide builtins to access them. However, some experiments showed that lock contention does not cause a significant slowdown here. Adding c-a-s support for many compilers would increase the maintenance burden (and we would still end up including the pthread version as a fallback). Note that we only need to touch the OBJ_REF_DELTA codepath here. An OBJ_OFS_DELTA object points to its base using an offset, and therefore has only one base, even if another copy of that base object appears in the pack (we do still touch it briefly because the setting of real_type is factored out of resolve_data). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-21Merge branch 'maint'Junio C Hamano
* maint: use xmemdupz() to allocate copies of strings given by start and length use xcalloc() to allocate zero-initialized memory
2014-07-21use xcalloc() to allocate zero-initialized memoryRené Scharfe
Use xcalloc() instead of xmalloc() followed by memset() to allocate and zero out memory because it's shorter and avoids duplicating the function parameters. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-16Merge branch 'jk/strip-suffix'Junio C Hamano
* jk/strip-suffix: prepare_packed_git_one: refactor duplicate-pack check verify-pack: use strbuf_strip_suffix strbuf: implement strbuf_strip_suffix index-pack: use strip_suffix to avoid magic numbers use strip_suffix instead of ends_with in simple cases replace has_extension with ends_with implement ends_with via strip_suffix add strip_suffix function sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
2014-07-16Merge branch 'jk/commit-buffer-length' into maintJunio C Hamano
A handful of code paths had to read the commit object more than once when showing header fields that are usually not parsed. The internal data structure to keep track of the contents of the commit object has been updated to reduce the need for this double-reading, and to allow the caller find the length of the object. * jk/commit-buffer-length: reuse cached commit buffer when parsing signatures commit: record buffer length in cache commit: convert commit->buffer to a slab commit-slab: provide a static initializer use get_commit_buffer everywhere convert logmsg_reencode to get_commit_buffer use get_commit_buffer to avoid duplicate code use get_cached_commit_buffer where appropriate provide helpers to access the commit buffer provide a helper to set the commit buffer provide a helper to free commit buffer sequencer: use logmsg_reencode in get_message logmsg_reencode: return const buffer do not create "struct commit" with xcalloc commit: push commit_index update into alloc_commit_node alloc: include any-object allocations in alloc_report replace dangerous uses of strbuf_attach commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02Merge branch 'jk/commit-buffer-length'Junio C Hamano
Move "commit->buffer" out of the in-core commit object and keep track of their lengths. Use this to optimize the code paths to validate GPG signatures in commit objects. * jk/commit-buffer-length: reuse cached commit buffer when parsing signatures commit: record buffer length in cache commit: convert commit->buffer to a slab commit-slab: provide a static initializer use get_commit_buffer everywhere convert logmsg_reencode to get_commit_buffer use get_commit_buffer to avoid duplicate code use get_cached_commit_buffer where appropriate provide helpers to access the commit buffer provide a helper to set the commit buffer provide a helper to free commit buffer sequencer: use logmsg_reencode in get_message logmsg_reencode: return const buffer do not create "struct commit" with xcalloc commit: push commit_index update into alloc_commit_node alloc: include any-object allocations in alloc_report replace dangerous uses of strbuf_attach commit_tree: take a pointer/len pair rather than a const strbuf
2014-06-30index-pack: use strip_suffix to avoid magic numbersJeff King
We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30replace has_extension with ends_withJeff King
These two are almost the same function, with the exception that has_extension only matches if there is content before the suffix. So ends_with(".exe", ".exe") is true, but has_extension would not be. This distinction does not matter to any of the callers, though, and we can just replace uses of has_extension with ends_with. We prefer the "ends_with" name because it is more generic, and there is nothing about the function that requires it to be used for file extensions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-25Merge branch 'ym/fix-opportunistic-index-update-race' into maintJunio C Hamano
"git status", even though it is a read-only operation, tries to update the index with refreshed lstat(2) info to optimize future accesses to the working tree opportunistically, but this could race with a "read-write" operation that modify the index while it is running. Detect such a race and avoid overwriting the index. * ym/fix-opportunistic-index-update-race: read-cache.c: verify index file before we opportunistically update it wrapper.c: add xpread() similar to xread()
2014-06-25Merge branch 'jk/index-pack-report-missing' into maintJunio C Hamano
The error reporting from "git index-pack" has been improved to distinguish missing objects from type errors. * jk/index-pack-report-missing: index-pack: distinguish missing objects from type errors
2014-06-25Merge branch 'nd/index-pack-one-fd-per-thread' into maintJunio C Hamano
We used to disable threaded "git index-pack" on platforms without thread-safe pread(); use a different workaround for such platforms to allow threaded "git index-pack". * nd/index-pack-one-fd-per-thread: index-pack: work around thread-unsafe pread()
2014-06-13commit: record buffer length in cacheJeff King
Most callsites which use the commit buffer try to use the cached version attached to the commit, rather than re-reading from disk. Unfortunately, that interface provides only a pointer to the NUL-terminated buffer, with no indication of the original length. For the most part, this doesn't matter. People do not put NULs in their commit messages, and the log code is happy to treat it all as a NUL-terminated string. However, some code paths do care. For example, when checking signatures, we want to be very careful that we verify all the bytes to avoid malicious trickery. This patch just adds an optional "size" out-pointer to get_commit_buffer and friends. The existing callers all pass NULL (there did not seem to be any obvious sites where we could avoid an immediate strlen() call, though perhaps with some further refactoring we could). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13provide a helper to free commit bufferJeff King
This converts two lines into one at each caller. But more importantly, it abstracts the concept of freeing the buffer, which will make it easier to change later. Note that we also need to provide a "detach" mechanism for a tricky case in index-pack. We are passed a buffer for the object generated by processing the incoming pack. If we are not using --strict, we just calculate the sha1 on that buffer and return, leaving the caller to free it. But if we are using --strict, we actually attach that buffer to an object, pass the object to the fsck functions, and then detach the buffer from the object again (so that the caller can free it as usual). In this case, we don't want to free the buffer ourselves, but just make sure it is no longer associated with the commit. Note that we are making the assumption here that the attach/detach process does not impact the buffer at all (e.g., it is never reallocated or modified). That holds true now, and we have no plans to change that. However, as we abstract the commit_buffer code, this dependency becomes less obvious. So when we detach, let's also make sure that we get back the same buffer that we gave to the commit_buffer code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-06Merge branch 'jk/index-pack-report-missing'Junio C Hamano
* jk/index-pack-report-missing: index-pack: distinguish missing objects from type errors
2014-06-03Merge branch 'nd/index-pack-one-fd-per-thread'Junio C Hamano
Enable threaded index-pack on platforms without thread-unsafe pread() emulation. * nd/index-pack-one-fd-per-thread: index-pack: work around thread-unsafe pread()
2014-06-03Merge branch 'ym/fix-opportunistic-index-update-race'Junio C Hamano
Read-only operations such as "git status" that internally refreshes the index write out the refreshed index to the disk to optimize future accesses to the working tree, but this could race with a "read-write" operation that modify the index while it is running. Detect such a race and avoid overwriting the index. Duy raised a good point that we may need to do the same for the normal writeout codepath, not just the "opportunistic" update codepath. While that is true, nobody sane would be running two simultaneous operations that are clearly write-oriented competing with each other against the same index file. So in that sense that can be done as a less urgent follow-up for this topic. * ym/fix-opportunistic-index-update-race: read-cache.c: verify index file before we opportunistically update it wrapper.c: add xpread() similar to xread()
2014-05-12index-pack: distinguish missing objects from type errorsJeff King
When we fetch a pack that does not contain an object we expected to receive, we get an error like: $ git init --bare tmp.git && cd tmp.git $ git fetch ../parent.git [...] error: Could not read 964953ec7bcc0245cb1d0db4095455edd21a2f2e fatal: Failed to traverse parents of commit b8247b40caf6704fe52736cdece6d6aae87471aa error: ../parent.git did not send all necessary objects This comes from the check_everything_connected rev-list. If we try cloning the same repo (rather than a fetch), we end up using index-pack's --check-self-contained-and-connected option instead, which produces output like: $ git clone --no-local --bare parent.git tmp.git [...] fatal: object of unexpected type fatal: index-pack failed Not only is the sha1 missing, but it's a misleading message. There's no type problem, but rather a missing object problem; we don't notice the difference because we simply compare OBJ_BAD != OBJ_BLOB. Let's provide a different message for this case: $ git clone --no-local --bare parent.git tmp.git fatal: did not receive expected object 6b00a8c61ed379d5f925a72c1987c9c52129d364 fatal: index-pack failed While we're at it, let's also improve a true type mismatch error to look like fatal: object 6b00a8c61ed379d5f925a72c1987c9c52129d364: expected type blob, got tree Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-16index-pack: work around thread-unsafe pread()Nguyễn Thái Ngọc Duy
Multi-threaing of index-pack was disabled with c0f8654 (index-pack: Disable threading on cygwin - 2012-06-26), because pread() implementations for Cygwin and MSYS were not thread safe. Recent Cygwin does offer usable pread() and we enabled multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread, 2013-07-19). Work around this problem on platforms with a thread-unsafe pread() emulation by opening one file handle per thread; it would prevent parallel pread() on different file handles from stepping on each other. Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654 because it's no longer used anywhere. This workaround is unconditional, even for platforms with thread-safe pread() because the overhead is small (a couple file handles more) and not worth fragmenting the code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Tested-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-10wrapper.c: add xpread() similar to xread()Yiannis Marangos
It is a common mistake to call read(2)/pread(2) and forget to anticipate that they may return error with EAGAIN/EINTR when the system call is interrupted. We have xread() helper to relieve callers of read(2) from having to worry about it; add xpread() helper to do the same for pread(2). Update the caller in the builtin/index-pack.c and the mmap emulation in compat/. Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-03Merge branch 'nd/index-pack-error-message' into maintJunio C Hamano
* nd/index-pack-error-message: index-pack: report error using the correct variable
2014-03-25Merge branch 'nd/index-pack-error-message'Junio C Hamano
* nd/index-pack-error-message: index-pack: report error using the correct variable
2014-03-17index-pack: report error using the correct variableJunio C Hamano
We feed a string pointer that is potentially NULL to die() when showing the message. Don't. Noticed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20rename read_replace_refs to check_replace_refsMichael Haggerty
The semantics of this flag was changed in commit e1111cef23 inline lookup_replace_object() calls but wasn't renamed at the time to minimize code churn. Rename it now, and add a comment explaining its use. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Christian Couder
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17Merge branch 'jk/free-tree-buffer'Junio C Hamano
* jk/free-tree-buffer: clear parsed flag when we free tree buffers
2013-06-06clear parsed flag when we free tree buffersJeff King
Many code paths will free a tree object's buffer and set it to NULL after finishing with it in order to keep memory usage down during a traversal. However, out of 8 sites that do this, only one actually unsets the "parsed" flag back. Those sites that don't are setting a trap for later users of the tree object; even after calling parse_tree, the buffer will remain NULL, causing potential segfaults. It is not known whether this is triggerable in the current code. Most commands do not do an in-memory traversal followed by actually using the objects again. However, it does not hurt to be safe for future callers. In most cases, we can abstract this out to a "free_tree_buffer" helper. However, there are two exceptions: 1. The fsck code relies on the parsed flag to know that we were able to parse the object at one point. We can switch this to using a flag in the "flags" field. 2. The index-pack code sets the buffer to NULL but does not free it (it is freed by a caller). We should still unset the parsed flag here, but we cannot use our helper, as we do not want to free the buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28clone: open a shortcut for connectivity checkNguyễn Thái Ngọc Duy
In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28index-pack: remove dead code (it should never happen)Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-01Merge branch 'nd/index-pack-threaded-fixes'Junio C Hamano
"index-pack --verify-stat" used a few counters outside protection of mutex, possibly showing incorrect numbers. * nd/index-pack-threaded-fixes: index-pack: guard nr_resolved_deltas reads by lock index-pack: protect deepest_delta in multithread code
2013-04-01Merge branch 'jk/index-pack-correct-depth-fix'Junio C Hamano
"index-pack --fix-thin" used uninitialize value to compute delta depths of objects it appends to the resulting pack. * jk/index-pack-correct-depth-fix: index-pack: always zero-initialize object_entry list
2013-03-20index-pack: always zero-initialize object_entry listJeff King
Commit 38a4556 (index-pack: start learning to emulate "verify-pack -v", 2011-06-03) added a "delta_depth" counter to each "struct object_entry". Initially, all object entries have their depth set to 0; in resolve_delta, we then set the depth of each delta to "base + 1". Base entries never have their depth touched, and remain at 0. To ensure that all depths start at 0, that commit changed calls to xmalloc the object_entry list into calls to xcalloc. However, it forgot that we grow the list with xrealloc later. These extra entries are used when we add an object from elsewhere to complete a thin pack. If we add a non-delta object, its depth value will just be uninitialized heap data. This patch fixes it by zero-initializing entries we add to the objects list via the xrealloc. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19index-pack: guard nr_resolved_deltas reads by lockThomas Rast
The threaded parts of index-pack increment the number of resolved deltas in nr_resolved_deltas guarded by counter_mutex. However, the per-thread outer loop accessed nr_resolved_deltas without any locks. This is not wrong as such, since it doesn't matter all that much whether we get an outdated value. However, unless someone proves that this one lock makes all the performance difference, it would be much cleaner to guard _all_ accesses to the variable with the lock. The only such use is display_progress() in the threaded section (all others are in the conclude_pack() callchain outside the threaded part). To make it obvious that it cannot deadlock, move it out of work_mutex. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19index-pack: protect deepest_delta in multithread codeNguyễn Thái Ngọc Duy
deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>