summaryrefslogtreecommitdiff
path: root/sha1_file.c
AgeCommit message (Collapse)Author
2014-04-08Merge branch 'jl/nor-or-nand-and'Junio C Hamano
Eradicate mistaken use of "nor" (that is, essentially "nor" used not in "neither A nor B" ;-)) from in-code comments, command output strings, and documentations. * jl/nor-or-nand-and: code and test: fix misuses of "nor" comments: fix misuses of "nor" contrib: fix misuses of "nor" Documentation: fix misuses of "nor"
2014-03-31code and test: fix misuses of "nor"Justin Lebar
Signed-off-by: Justin Lebar <jlebar@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31comments: fix misuses of "nor"Justin Lebar
Signed-off-by: Justin Lebar <jlebar@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-18Merge branch 'dd/use-alloc-grow'Junio C Hamano
Replace open-coded reallocation with ALLOC_GROW() macro. * dd/use-alloc-grow: sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() read-cache.c: use ALLOC_GROW() in add_index_entry() builtin/mktree.c: use ALLOC_GROW() in append_to_tree() attr.c: use ALLOC_GROW() in handle_attr_line() dir.c: use ALLOC_GROW() in create_simplify() reflog-walk.c: use ALLOC_GROW() replace_object.c: use ALLOC_GROW() in register_replace_object() patch-ids.c: use ALLOC_GROW() in add_commit() diffcore-rename.c: use ALLOC_GROW() diff.c: use ALLOC_GROW() commit.c: use ALLOC_GROW() in register_commit_graft() cache-tree.c: use ALLOC_GROW() in find_subtree() bundle.c: use ALLOC_GROW() in add_to_ref_list() builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
2014-03-18Merge branch 'nd/sha1-file-delta-stack-leakage-fix'Junio C Hamano
Fix a small leak in the delta stack used when resolving a long delta chain at runtime. * nd/sha1-file-delta-stack-leakage-fix: sha1_file: fix delta_stack memory leak in unpack_entry
2014-03-14Merge branch 'mh/object-code-cleanup'Junio C Hamano
* mh/object-code-cleanup: sha1_file.c: document a bunch of functions defined in the file sha1_file_name(): declare to return a const string find_pack_entry(): document last_found_pack replace_object: use struct members instead of an array
2014-03-03sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()Dmitry S. Dolzhenko
Helped-by: He Sun <sunheehnus@gmail.com> Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27Merge branch 'jk/pack-bitmap'Junio C Hamano
Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...
2014-02-25sha1_file.c: document a bunch of functions defined in the fileMichael Haggerty
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24sha1_file_name(): declare to return a const stringMichael Haggerty
Change the return value of sha1_file_name() to (const char *). (Callers have no business mucking about here.) Change callers accordingly, deleting a few superfluous temporary variables along the way. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24find_pack_entry(): document last_found_packMichael Haggerty
Add a comment at the declaration of last_found_pack and where it is used in find_pack_entry(). In the latter, separate the cases (1) to make a place for the new comment and (2) to turn the success case into affirmative logic. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24sha1_file: fix delta_stack memory leak in unpack_entryNguyễn Thái Ngọc Duy
This delta_stack array can grow to any length depending on the actual delta chain, but we forget to free it. Normally it does not matter because we use small_delta_stack[] from stack and small_delta_stack can hold 64-delta chains, more than standard --depth=50 in pack-objects. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-27Merge branch 'ss/safe-create-leading-dir-with-slash'Junio C Hamano
"git clone $origin foo\bar\baz" on Windows failed to create the leading directories (i.e. a moral-equivalent of "mkdir -p"). * ss/safe-create-leading-dir-with-slash: safe_create_leading_directories(): on Windows, \ can separate path components
2014-01-27Merge branch 'mh/safe-create-leading-directories'Junio C Hamano
Code clean-up and protection against concurrent write access to the ref namespace. * mh/safe-create-leading-directories: rename_tmp_log(): on SCLD_VANISHED, retry rename_tmp_log(): limit the number of remote_empty_directories() attempts rename_tmp_log(): handle a possible mkdir/rmdir race rename_ref(): extract function rename_tmp_log() remove_dir_recurse(): handle disappearing files and directories remove_dir_recurse(): tighten condition for removing unreadable dir lock_ref_sha1_basic(): if locking fails with ENOENT, retry lock_ref_sha1_basic(): on SCLD_VANISHED, retry safe_create_leading_directories(): add new error value SCLD_VANISHED cmd_init_db(): when creating directories, handle errors conservatively safe_create_leading_directories(): introduce enum for return values safe_create_leading_directories(): always restore slash at end of loop safe_create_leading_directories(): split on first of multiple slashes safe_create_leading_directories(): rename local variable safe_create_leading_directories(): add explicit "slash" pointer safe_create_leading_directories(): reduce scope of local variable safe_create_leading_directories(): fix format of "if" chaining
2014-01-22safe_create_leading_directories(): on Windows, \ can separate path componentsMichael Haggerty
When cloning to a directory "C:\foo\bar" from Windows' cmd.exe where "foo" does not exist yet, Git would throw an error like fatal: could not create work tree dir 'c:\foo\bar'.: No such file or directory Fix this by not hard-coding a platform specific directory separator into safe_create_leading_directories(). This patch, including its entire commit message, is derived from a patch by Sebastian Schuberth. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-16do not discard revindex when re-preparing packfilesJeff King
When an object lookup fails, we re-read the objects/pack directory to pick up any new packfiles that may have been created since our last read. We also discard any pack revindex structs we've allocated. The discarding is a problem for the pack-bitmap code, which keeps a pointer to the revindex for the bitmapped pack. After the discard, the pointer is invalid, and we may read free()d memory. Other revindex users do not keep a bare pointer to the revindex; instead, they always access it through revindex_for_pack(), which lazily builds the revindex. So one solution is to teach the pack-bitmap code a similar trick. It would be slightly less efficient, but probably not all that noticeable. However, it turns out this discarding is not actually necessary. When we call reprepare_packed_git, we do not throw away our old pack list. We keep the existing entries, and only add in new ones. So there is no safety problem; we will still have the pack struct that matches each revindex. The packfile itself may go away, of course, but we are already prepared to handle that, and it may happen outside of reprepare_packed_git anyway. Throwing away the revindex may save some RAM if the pack never gets reused (about 12 bytes per object). But it also wastes some CPU time (to regenerate the index) if the pack does get reused. It's hard to say which is more valuable, but in either case, it happens very rarely (only when we race with a simultaneous repack). Just leaving the revindex in place is simple and safe both for current and future code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-10Merge branch 'jk/oi-delta-base'Junio C Hamano
Teach "cat-file --batch" to show delta-base object name for a packed object that is represented as a delta. * jk/oi-delta-base: cat-file: provide %(deltabase) batch format sha1_object_info_extended: provide delta base sha1s
2014-01-10Merge branch 'jh/rlimit-nofile-fallback'Junio C Hamano
When we figure out how many file descriptors to allocate for keeping packfiles open, a system with non-working getrlimit() could cause us to die(), but because we make this call only to get a rough estimate of how many is available and we do not even attempt to use up all file descriptors available ourselves, it is nicer to fall back to a reasonable low value rather than dying. * jh/rlimit-nofile-fallback: get_max_fd_limit(): fall back to OPEN_MAX upon getrlimit/sysconf failure
2014-01-10Merge branch 'cc/replace-object-info'Junio C Hamano
read_sha1_file() that is the workhorse to read the contents given an object name honoured object replacements, but there is no corresponding mechanism to sha1_object_info() that is used to obtain the metainfo (e.g. type & size) about the object, leading callers to weird inconsistencies. * cc/replace-object-info: replace info: rename 'full' to 'long' and clarify in-code symbols Documentation/git-replace: describe --format option builtin/replace: unset read_replace_refs t6050: add tests for listing with --format builtin/replace: teach listing using short, medium or full formats sha1_file: perform object replacement in sha1_object_info_extended() t6050: show that git cat-file --batch fails with replace objects sha1_object_info_extended(): add an "unsigned flags" parameter sha1_file.c: add lookup_replace_object_extended() to pass flags replace_object: don't check read_replace_refs twice rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT
2014-01-06safe_create_leading_directories(): add new error value SCLD_VANISHEDMichael Haggerty
Add a new possible error result that can be returned by safe_create_leading_directories() and safe_create_leading_directories_const(): SCLD_VANISHED. This value indicates that a file or directory on the path existed at one point (either it already existed or the function created it), but then it disappeared. This probably indicates that another process deleted the directory while we were working. If SCLD_VANISHED is returned, the caller might want to retry the function call, as there is a chance that a new attempt will succeed. Why doesn't safe_create_leading_directories() do the retrying internally? Because an empty directory isn't really ever safe until it holds a file. So even if safe_create_leading_directories() were absolutely sure that the directory existed before it returned, there would be no guarantee that the directory still existed when the caller tried to write something in it. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06safe_create_leading_directories(): introduce enum for return valuesMichael Haggerty
Instead of returning magic integer values (which a couple of callers go to the trouble of distinguishing), return values from an enum. Add a docstring. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06safe_create_leading_directories(): always restore slash at end of loopMichael Haggerty
Always restore the slash that we scribbled over at the end of the loop, rather than also fixing it up at each premature exit from the loop. This makes it harder to forget to do the cleanup as new paths are added to the code. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06safe_create_leading_directories(): split on first of multiple slashesMichael Haggerty
If the input path has multiple slashes between path components (e.g., "foo//bar"), then the old code was breaking the path at the last slash, not the first one. So in the above example, the second slash was overwritten with NUL, resulting in the parent directory being sought as "foo/". When stat() is called on "foo/", it fails with ENOTDIR if "foo" exists but is not a directory. This caused the wrong path to be taken in the subsequent logic. So instead, split path components at the first intercomponent slash rather than the last one. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06safe_create_leading_directories(): rename local variableMichael Haggerty
Rename "pos" to "next_component", because now it always points at the next component of the path name that has to be processed. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06safe_create_leading_directories(): add explicit "slash" pointerMichael Haggerty
Keep track of the position of the slash character independently of "pos", thereby making the purpose of each variable clearer and working towards other upcoming changes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06safe_create_leading_directories(): reduce scope of local variableMichael Haggerty
This makes it more obvious that values of "st" don't persist across loop iterations. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06safe_create_leading_directories(): fix format of "if" chainingMichael Haggerty
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30count-objects: recognize .bitmap in garbage-checkingNguyễn Thái Ngọc Duy
Count-objects will report any "garbage" files in the packs directory, including files whose extensions it does not know (case 1), and files whose matching ".pack" file is missing (case 2). Without having learned about ".bitmap" files, the current code reports all such files as garbage (case 1), even if their pack exists. Instead, they should be treated as case 2. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26sha1_object_info_extended: provide delta base sha1sJeff King
A caller of sha1_object_info_extended technically has enough information to determine the base sha1 from the results of the call. It knows the pack, offset, and delta type of the object, which is sufficient to find the base. However, the functions to do so are not publicly available, and the code itself is intimate enough with the pack details that it should be abstracted away. We could add a public helper to allow callers to query the delta base separately, but it is simpler and slightly more efficient to optionally grab it along with the rest of the object_info data. For cases where the object is not stored as a delta, we write the null sha1 into the query field. A careful caller could check "oi.whence == OI_PACKED && oi.u.packed.is_delta" before looking at the base sha1, but using the null sha1 provides a simple alternative (and gives a better sanity check for a non-careful caller than simply returning random bytes). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-18get_max_fd_limit(): fall back to OPEN_MAX upon getrlimit/sysconf failureJunio C Hamano
On broken systems where RLIMIT_NOFILE is visible by the compliers but underlying getrlimit() system call does not behave, we used to simply die() when we are trying to decide how many file descriptors to allocate for keeping packfiles open. Instead, allow the fallback codepath to take over when we get such a failure from getrlimit(). The same issue exists with _SC_OPEN_MAX and sysconf(); restructure the code in a similar way to prepare for a broken sysconf() as well. Noticed-by: Joey Hess <joey@kitenet.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-17Merge branch 'jh/loose-object-dirs-creation-race' into maintJunio C Hamano
Two processes creating loose objects at the same time could have failed unnecessarily when the name of their new objects started with the same byte value, due to a race condition. * jh/loose-object-dirs-creation-race: sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs
2013-12-17Merge branch 'sb/sha1-loose-object-info-check-existence' into maintJunio C Hamano
"git cat-file --batch-check=ok" did not check the existence of the named object. * sb/sha1-loose-object-info-check-existence: sha1_loose_object_info(): do not return success on missing object
2013-12-12sha1_file: perform object replacement in sha1_object_info_extended()Christian Couder
sha1_object_info_extended() should perform object replacement if it is needed. The simplest way to do that is to make it call lookup_replace_object_extended(). And now its "unsigned flags" parameter is used as it is passed to lookup_replace_object_extended(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12sha1_object_info_extended(): add an "unsigned flags" parameterChristian Couder
This parameter is not used yet, but it will be used to tell sha1_object_info_extended() if it should perform object replacement or not. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12sha1_file.c: add lookup_replace_object_extended() to pass flagsChristian Couder
Currently, there is only one caller to lookup_replace_object() that can benefit from passing it some flags, but we expect that there could be more. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECTChristian Couder
The READ_SHA1_FILE_REPLACE flag is more related to using the lookup_replace_object() function rather than the read_sha1_file() function. We also need such a flag to be used with sha1_object_info() instead of read_sha1_file(). The name LOOKUP_REPLACE_OBJECT is therefore better for this flag. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-06Merge branch 'jk/remove-experimental-loose-object-support'Junio C Hamano
* jk/remove-experimental-loose-object-support: drop support for "experimental" loose objects
2013-12-05Merge branch 'sb/sha1-loose-object-info-check-existence'Junio C Hamano
"git cat-file --batch-check=ok" did not check the existence of the named object. * sb/sha1-loose-object-info-check-existence: sha1_loose_object_info(): do not return success on missing object
2013-12-05Merge branch 'jh/loose-object-dirs-creation-race'Junio C Hamano
When two processes created one loose object file each, which fell into the same fan-out bucket that previously did not have any objects, they both tried to do an equivalent of mkdir .git/objects/$fanout && chmod $shared_perm .git/objects/$fanout before writing into their file .git/objects/$fanout/$remainder, one of which could have failed unnecessarily when the second invocation of mkdir found that the directory already has been created by the first one. * jh/loose-object-dirs-creation-race: sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs
2013-11-21drop support for "experimental" loose objectsJeff King
In git v1.4.3, we introduced a new loose object format that encoded some object information outside of the zlib stream. Ultimately the format was dropped in v1.5.3, but we kept the reading side around to help people migrate objects. Each time we open a loose object, we use a heuristic to check whether it is in the normal loose format, or the experimental one. This heuristic is robust in the face of valid data, but it tends to treat corrupted or garbage data as an experimental object. With the regular format, we would notice quickly that zlib's crc does not check out and complain. With the experimental object, we are likely to extract a nonsensical object size and try to allocate a huge buffer, resulting in xmalloc calling "die". This latter behavior is much worse, for two reasons. One, git reports an allocation error when the real error is corruption. And two, the program dies unconditionally, so you cannot even run fsck (which would otherwise ignore the broken object and keep going). We could try to improve the heuristic to err on the side of normal objects in the face of corruption, but there is really little point. The experimental format is long-dead, and was never enabled by default to begin with. We can instead simply remove it. The only affected repository would be one that explicitly set core.legacyheaders in 2007, and then never repacked in the intervening 6 years. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-06sha1_loose_object_info(): do not return success on missing objectJunio C Hamano
Since 052fe5ea (sha1_loose_object_info: make type lookup optional, 2013-07-12), sha1_loose_object_info() returns happily without checking if the object in question exists, which is not what the the caller sha1_object_info_extended() expects; the caller does not even bother checking the existence of the object itself. Noticed-by: Sven Brauch <svenbrauch@googlemail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-28Sync with v1.8.4.2Junio C Hamano
2013-10-28sha1_file.c:create_tmpfile(): Fix race when creating loose object dirsJohan Herland
There are cases (e.g. when running concurrent fetches in a repo) where multiple Git processes concurrently attempt to create loose objects within the same objects/XX/ dir. The creation of the loose object files is (AFAICS) safe from races, but the creation of the objects/XX/ dir in which the loose objects reside is unsafe, for example: Two concurrent fetches - A and B. As part of its fetch, A needs to store 12aaaaa as a loose object. B, on the other hand, needs to store 12bbbbb as a loose object. The objects/12 directory does not already exist. Concurrently, both A and B determine that they need to create the objects/12 directory (because their first call to git_mkstemp_mode() within create_tmpfile() fails witn ENOENT). One of them - let's say A - executes the following mkdir() call before the other. This first call returns success, and A moves on. When B gets around to calling mkdir(), it fails with EEXIST, because A won the race. The mkdir() error causes B to return -1 from create_tmpfile(), which propagates all the way, resulting in the fetch failing with: error: unable to create temporary file: File exists fatal: failed to write object fatal: unpack-objects failed Although it's hard to add a testcase reproducing this issue, it's easy to provoke if we insert a sleep after the if (mkdir(buffer, 0777) || adjust_shared_perm(buffer)) return -1; block, and then run two concurrent "git fetch"es against the same repo. The fix is to simply handle mkdir() failing with EEXIST as a success. If EEXIST is somehow returned for the wrong reasons (because the relevant objects/XX is not a directory, or is otherwise unsuitable for object storage), the following call to adjust_shared_perm(), or ultimately the retried call to git_mkstemp_mode() will fail, and we end up returning error from create_tmpfile() in any case. Note that there are still cases where two users with unsuitable umasks in a shared repo can end up in two races where one user first wins the mkdir() race to create an objects/XX/ directory, and then the other user wins the adjust_shared_perms() race to chmod() that directory, but fails because it is (transiently, until the first users completes its chmod()) unwriteable to the other user. However, (an equivalent of) this race also exists before this patch, and is made no worse by this patch. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-28sha1_file: move comment about return value where it belongsChristian Couder
Commit 5b0864070 (sha1_object_info_extended: make type calculation optional, Jul 12 2013) changed the return value of the sha1_object_info_extended function to 0/-1 for success/error. Previously this function returned the object type for success or -1 for error. But unfortunately the above commit forgot to change or move the comment above this function that says "returns enum object_type or negative". To fix this inconsistency, let's move the comment above the sha1_object_info function where it is still true. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24sha1_file: export `git_open_noatime`Vicent Marti
The `git_open_noatime` helper can be of general interest for other consumers of git's different on-disk formats. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-25Merge branch 'nd/unpack-entry-optim-in-pack-objects'Jonathan Nieder
* nd/unpack-entry-optim-in-pack-objects: pack-objects: no crc check when the cached version is used
2013-09-17Merge branch 'jk/has-sha1-file-retry-packed'Junio C Hamano
When an object is not found after checking the packfiles and then loose object directory, read_sha1_file() re-checks the packfiles to prevent racing with a concurrent repacker; teach the same logic to has_sha1_file(). * jk/has-sha1-file-retry-packed: has_sha1_file: re-check pack directory before giving up
2013-09-13pack-objects: no crc check when the cached version is usedNguyễn Thái Ngọc Duy
Current code makes pack-objects always do check_pack_crc() in unpack_entry() even if right after that we find out there's a cached version and pack access is not needed. Swap two code blocks, search for cached version first, then check crc. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-04Merge branch 'bc/unuse-packfile'Junio C Hamano
Handle memory pressure and file descriptor pressure separately when deciding to release pack windows to honor resource limits. * bc/unuse-packfile: Don't close pack fd when free'ing pack windows sha1_file: introduce close_one_pack() to close packs on fd pressure
2013-08-30has_sha1_file: re-check pack directory before giving upJeff King
When we read a sha1 file, we first look for a packed version, then a loose version, and then re-check the pack directory again before concluding that we cannot find it. This lets us handle a process that is writing to the repository simultaneously (e.g., receive-pack writing a new pack followed by a ref update, or git-repack packing existing loose objects into a new pack). However, we do not do the same trick with has_sha1_file; we only check the packed objects once, followed by loose objects. This means that we might incorrectly report that we do not have an object, even though we could find it if we simply re-checked the pack directory. By itself, this is usually not a big deal. The other process is running simultaneously, so we may run has_sha1_file before it writes, anyway. It is a race whether we see the object or not. However, we may also see other things the writing process has done (like updating refs); and in that case, we must be able to also see the new objects. For example, imagine we are doing a for_each_ref iteration, and somebody simultaneously pushes. Receive-pack may write the pack and update a ref after we have examined the objects/pack directory, but before the iteration gets to the updated ref. When we do finally see the updated ref, for_each_ref will call has_sha1_file to check whether the ref is broken. If has_sha1_file returns the wrong answer, we erroneously will think that the ref is broken. For a normal iteration without DO_FOR_EACH_INCLUDE_BROKEN, this means that the caller does not see the ref at all (neither the old nor the new value). So not only will we fail to see the new value of the ref (which is acceptable, since we are running simultaneously with the writer, and we might well read the ref before the writer commits its write), but we will not see the old value either. For programs that act on reachability like pack-objects or prune, this can cause data loss, as we may see the objects referenced by the original ref value as dangling (and either omit them from the pack, or delete them via prune). There's no test included here, because the success case is two processes running simultaneously forever. But you can replicate the issue with: # base.sh # run this in one terminal; it creates and pushes # repeatedly to a repository git init parent && (cd parent && # create a base commit that will trigger us looking at # the objects/pack directory before we hit the updated ref echo content >file && git add file && git commit -m base && # set the unpack limit abnormally low, which # lets us simulate full-size pushes using tiny ones git config receive.unpackLimit 1 ) && git clone parent child && cd child && n=0 && while true; do echo $n >file && git add file && git commit -m $n && git push origin HEAD:refs/remotes/child/master && n=$(($n + 1)) done # fsck.sh # now run this simultaneously in another terminal; it # repeatedly fscks, looking for us to consider the # newly-pushed ref broken. We cannot use for-each-ref # here, as it uses DO_FOR_EACH_INCLUDE_BROKEN, which # skips the has_sha1_file check (and if it wants # more information on the object, it will actually read # the object, which does the proper two-step lookup) cd parent && while true; do broken=`git fsck 2>&1 | grep remotes/child` if test -n "$broken"; then echo $broken exit 1 fi done Without this patch, the fsck loop fails within a few seconds (and almost instantly if the test repository actually has a large number of refs). With it, the two can run indefinitely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>