summaryrefslogtreecommitdiff
path: root/sha1-file.c
AgeCommit message (Collapse)Author
2020-08-13Merge branch 'jt/has_object'Junio C Hamano
A new helper function has_object() has been introduced to make it easier to mark object existence checks that do and don't want to trigger lazy fetches, and a few such checks are converted using it. * jt/has_object: fsck: do not lazy fetch known non-promisor object pack-objects: no fetch when allow-{any,promisor} apply: do not lazy fetch when applying binary sha1-file: introduce no-lazy-fetch has_object()
2020-08-10Merge branch 'jk/strvec'Junio C Hamano
The argv_array API is useful for not just managing argv but any "vector" (NULL-terminated array) of strings, and has seen adoption to a certain degree. It has been renamed to "strvec" to reduce the barrier to adoption. * jk/strvec: strvec: rename struct fields strvec: drop argv_array compatibility layer strvec: update documention to avoid argv_array strvec: fix indentation in renamed calls strvec: convert remaining callers away from argv_array name strvec: convert more callers away from argv_array name strvec: convert builtin/ callers away from argv_array name quote: rename sq_dequote_to_argv_array to mention strvec strvec: rename files from argv-array to strvec argv-array: rename to strvec argv-array: use size_t for count and alloc
2020-08-06sha1-file: introduce no-lazy-fetch has_object()Jonathan Tan
There have been a few bugs wherein Git fetches missing objects whenever the existence of an object is checked, even though it does not need to perform such a fetch. To resolve these bugs, we could look at all the places that has_object_file() (or a similar function) is used. As a first step, introduce a new function has_object() that checks for the existence of an object, with a default behavior of not fetching if the object is missing and the repository is a partial clone. As we verify each has_object_file() (or similar) usage, we can replace it with has_object(), and we will know that we are done when we can delete has_object_file() (and the other similar functions). Also, the new function has_object() has more appropriate defaults: besides not fetching, it also does not recheck packed storage. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04Merge branch 'jt/pretend-object-never-come-from-elsewhere'Junio C Hamano
The pretend-object mechanism checks if the given object already exists in the object store before deciding to keep the data in-core, but the check would have triggered lazy fetching of such an object from a promissor remote. * jt/pretend-object-never-come-from-elsewhere: sha1-file: make pretend_object_file() not prefetch
2020-07-28strvec: convert remaining callers away from argv_array nameJeff King
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the remaining files, as the resulting diff is reasonably sized. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-21sha1-file: make pretend_object_file() not prefetchJonathan Tan
When pretend_object_file() is invoked with an object that does not exist (as is the typical case), there is no need to fetch anything from the promisor remote, because the caller already knows what the object is supposed to contain. Therefore, suppress the fetch. (The OBJECT_INFO_QUICK flag is added for the same reason.) This was noticed at $DAYJOB when "blame" was run on a file that had uncommitted modifications. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28Merge branch 'lr/freshen-file-fix'Junio C Hamano
The code that refreshes the last access and modified time of on-disk packfiles and loose object files have been updated. * lr/freshen-file-fix: freshen_file(): use NULL `times' for implicit current-time
2020-04-15freshen_file(): use NULL `times' for implicit current-timeluciano.rocha@booking.com
Update freshen_file() to use a NULL `times', semantically equivalent to the currently setup, with an explicit `actime' and `modtime' set to the "current time", but with the advantage that it works with other files not owned by the current user. Fixes an issue on shared repos with a split index, where eventually a user's operation creates a shared index, and another user will later do an operation that will try to update its freshness, but will instead raise a warning: $ git status warning: could not freshen shared index '.git/sharedindex.bd736fa10e0519593fefdb2aec253534470865b2' Signed-off-by: Luciano Miguel Ferreira Rocha <luciano.rocha@booking.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27Merge branch 'bc/sha-256-part-1-of-4'Junio C Hamano
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-10real_path_if_valid(): remove unsafe APIAlexandr Miloslavskiy
This commit continues the work started with previous commit. Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24packed_object_info(): use object_id for returning delta baseJeff King
If a caller sets the object_info.delta_base_sha1 to a non-NULL pointer, we'll write the oid of the object's delta base to it. But we can increase our type safety by switching this to a real object_id struct. All of our callers are just pointing into the hash member of an object_id anyway, so there's no inconvenience. Note that we do still keep it as a pointer-to-struct, because the NULL sentinel value tells us whether the caller is even interested in the information. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24hash: implement and use a context cloning functionbrian m. carlson
For all of our SHA-1 implementations and most of our SHA-256 implementations, the hash context we use is a real struct. For these implementations, it's possible to copy a hash context by making a copy of the struct. However, for our libgcrypt implementation, our hash context is a pointer. Consequently, copying it does not lead to an independent hash context like we intended. Fortunately, however, libgcrypt provides us with a handy function to copy hash contexts. Let's add a cloning function to the hash algorithm API, and use it in the one place we need to make a hash context copy. With this change, our libgcrypt SHA-256 implementation is fully functional with all of our other hash implementations. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14Merge branch 'mt/use-passed-repo-more-in-funcs'Junio C Hamano
Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument
2020-02-14Merge branch 'mt/threaded-grep-in-object-store'Junio C Hamano
Traditionally, we avoided threaded grep while searching in objects (as opposed to files in the working tree) as accesses to the object layer is not thread-safe. This limitation is getting lifted. * mt/threaded-grep-in-object-store: grep: use no. of cores as the default no. of threads grep: move driver pre-load out of critical section grep: re-enable threads in non-worktree case grep: protect packed_git [re-]initialization grep: allow submodule functions to run in parallel submodule-config: add skip_if_read option to repo_read_gitmodules() grep: replace grep_read_mutex by internal obj read lock object-store: allow threaded access to object reading replace-object: make replace operations thread-safe grep: fix racy calls in grep_objects() grep: fix race conditions at grep_submodule() grep: fix race conditions on userdiff calls
2020-01-31sha1-file: allow check_object_signature() to handle any repoMatheus Tavares
Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sha1-file: pass git_hash_algo to hash_object_file()Matheus Tavares
Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo. For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file(). This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sha1-file: pass git_hash_algo to write_object_file_prepare()Matheus Tavares
Allow write_object_file_prepare() to receive arbitrary 'struct git_hash_algo's instead of always using the_hash_algo. The added parameter will be used in the next commit to make hash_object_file() be able to work with arbitrary git_hash_algo's, as well. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31streaming: allow open_istream() to handle any repoMatheus Tavares
Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17object-store: allow threaded access to object readingMatheus Tavares
Allow object reading to be performed by multiple threads protecting it with an internal lock, the obj_read_mutex. The lock usage can be toggled with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the functions which can be safely called in parallel are: read_object_file_extended(), repo_read_object_file(), read_object_file(), read_object_with_reference(), read_object(), oid_object_info() and oid_object_info_extended(). It's also possible to use obj_read_lock() and obj_read_unlock() to protect other sections that cannot execute in parallel with object reading. Probably there are many spots in the functions listed above that could be executed unlocked (and thus, in parallel). But, for now, we are most interested in allowing parallel access to zlib inflation. This is one of the sections where object reading spends most of the time in (e.g. up to one-third of git-grep's execution time in the chromium repo corresponds to inflation) and it's already thread-safe. So, to take advantage of that, the obj_read_mutex is released when calling git_inflate() and re-acquired right after, for every calling spot in oid_object_info_extended()'s call chain. We may refine this lock to also exploit other possible parallel spots in the future, but for now, threaded zlib inflation should already give great speedups for threaded object reading callers. Note that add_delta_base_cache() was also modified to skip adding already present entries to the cache. This wasn't possible before, but it would be now, with the parallel inflation. Take for example the following situation, where two threads - A and B - are executing the code at unpack_entry(): 1. Thread A is performing the decompression of a base O (which is not yet in the cache) at PHASE II. Thread B is simultaneously trying to unpack O, but just starting at PHASE I. 2. Since O is not yet in the cache, B will go to PHASE II to also perform the decompression. 3. When they finish decompressing, one of them will get the object reading mutex and go to PHASE III while the other waits for the mutex. Let’s say A got the mutex first. 4. Thread A will add O to the cache, go throughout the rest of PHASE III and return. 5. Thread B gets the mutex, also add O to the cache (if the check wasn't there) and returns. Finally, it is also important to highlight that the object reading lock can only ensure thread-safety in the mentioned functions thanks to two complementary mechanisms: the use of 'struct raw_object_store's replace_mutex, which guards sections in the object reading machinery that would otherwise be thread-unsafe; and the 'struct pack_window's inuse_cnt, which protects window reading operations (such as the one performed during the inflation of a packed object), allowing them to execute without the acquisition of the obj_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-02sha1-file: remove OBJECT_INFO_SKIP_CACHEDJonathan Tan
In a partial clone, if a user provides the hash of the empty tree ("git mktree </dev/null" - for SHA-1, this is 4b825d...) to a command which requires that that object be parsed, for example: git diff-tree 4b825d <a non-empty tree> then Git will lazily fetch the empty tree, unnecessarily, because parsing of that object invokes repo_has_object_file(), which does not special-case the empty tree. Instead, teach repo_has_object_file() to consult find_cached_object() (which handles the empty tree), thus bringing it in line with the rest of the object-store-accessing functions. A cost is that repo_has_object_file() will now need to oideq upon each invocation, but that is trivial compared to the filesystem lookup or the pack index search required anyway. (And if find_cached_object() needs to do more because of previous invocations to pretend_object_file(), all the more reason to be consistent in whether we present cached objects.) As a historical note, the function now known as repo_read_object_file() was taught the empty tree in 346245a1bb ("hard-code the empty tree object", 2008-02-13), and the function now known as oid_object_info() was taught the empty tree in c4d9986f5f ("sha1_object_info: examine cached_object store too", 2011-02-07). repo_has_object_file() was never updated, perhaps due to oversight. The flag OBJECT_INFO_SKIP_CACHED, introduced later in dfdd4afcf9 ("sha1_file: teach sha1_object_info_extended more flags", 2017-06-26) and used in e83e71c5e1 ("sha1_file: refactor has_sha1_file_with_flags", 2017-06-26), was introduced to preserve this difference in empty-tree handling, but now it can be removed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-11Merge branch 'bc/object-id-part17'Junio C Hamano
Preparation for SHA-256 upgrade continues. * bc/object-id-part17: (26 commits) midx: switch to using the_hash_algo builtin/show-index: replace sha1_to_hex rerere: replace sha1_to_hex builtin/receive-pack: replace sha1_to_hex builtin/index-pack: replace sha1_to_hex packfile: replace sha1_to_hex wt-status: convert struct wt_status to object_id cache: remove null_sha1 builtin/worktree: switch null_sha1 to null_oid builtin/repack: write object IDs of the proper length pack-write: use hash_to_hex when writing checksums sequencer: convert to use the_hash_algo bisect: switch to using the_hash_algo sha1-lookup: switch hard-coded constants to the_hash_algo config: use the_hash_algo in abbrev comparison combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo bundle: switch to use the_hash_algo connected: switch GIT_SHA1_HEXSZ to the_hash_algo show-index: switch hard-coded constants to the_hash_algo blame: remove needless comparison with GIT_SHA1_HEXSZ ...
2019-09-18Merge branch 'cc/multi-promisor'Junio C Hamano
Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands
2019-08-19cache: remove null_sha1brian m. carlson
All of the existing uses of null_sha1 can be converted into uses of null_oid, so do so. Remove null_sha1 and is_null_sha1, and define is_null_oid in terms of null_oid. This also has the additional benefit of removing several uses of sha1_to_hex. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13packfile: drop release_pack_memory()Jeff King
Long ago, in 97bfeb34df (Release pack windows before reporting out of memory., 2006-12-24), we taught xmalloc() and friends to try unmapping pack windows when malloc() failed. It's unlikely that his helps a lot in practice, and it has some downsides. First, the downsides: 1. It makes xmalloc() not thread-safe. We've worked around this in pack-objects.c, which installs its own locking version of the try_to_free_routine(). But other threaded code doesn't. 2. It makes the system as a whole harder to reason about. Functions which allocate heap memory under the hood may have farther-reaching effects than expected. That might be worth the tradeoff if there's a benefit. But in practice, it seems unlikely. We're generally dealing with mmap'd files, so the OS is going to do a much better job at responding to memory pressure by dropping individual pages (the exception is systems with NO_MMAP, but even there the OS can probably respond just as well with swapping). So the only thing we're really freeing is address space. On 64-bit systems, we have plenty of that to go around. On 32-bit systems, it could possibly help. But around the same time we made two other changes: 77ccc5bbd1 (Introduce new config option for mmap limit., 2006-12-23) and 60bb8b1453 (Fully activate the sliding window pack access., 2006-12-23). Together that means that a 32-bit system should have no more than 256MB total of packed-git mmaps at one time, split between a few 32MB windows. It's unlikely we have any address space problems since then, but we don't have any data since the features were all added at the same time. Likewise, xmmap() will try to free memory. At first glance, it seems like we'd need this (when we try to mmap a new window, we might need to close an old one to save address space on a 32-bit system). But we're saved again by core.packedGitLimit: if we're going to exceed our 256MB limit, we'll close an existing window before we even call mmap(). So it seems unlikely that this feature is actually doing anything useful. And while we don't have reports of it harming anything (probably because it rarely if ever kicks in), it would be nice to simplify the system overall. This patch drops the whole try_to_free system from xmalloc(), as well as the manual pack memory release in xmmap(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-07sha1-file: release strbuf after useRené Scharfe
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-19Merge branch 'jk/check-connected-with-alternates'Junio C Hamano
The tips of refs from the alternate object store can be used as starting point for reachability computation now. * jk/check-connected-with-alternates: check_everything_connected: assume alternate ref tips are valid object-store.h: move for_each_alternate_ref() from transport.h
2019-07-01object-store.h: move for_each_alternate_ref() from transport.hJeff King
There's nothing inherently transport-related about enumerating the alternate ref tips. The code has lived in transport.[ch] because the only use so far had been advertising available tips during transport. But it could be used for more, and a future patch will teach rev-list to access these refs. Let's move it alongside the other alt-odb code, declaring it in object-store.h with the implementation in sha1-file.c. This lets us drop the inclusion of transport.h from receive-pack, which perhaps shows how it was misplaced (though receive-pack is about transporting objects, transport.h is mostly about the client side). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-27sha1-file.c: remove the_repo from read_object_with_reference()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25Use promisor_remote_get_direct() and has_promisor_remote()Christian Couder
Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25fetch-object: make functions return an error codeChristian Couder
The callers of the fetch_object() and fetch_objects() might be interested in knowing if these functions succeeded or not. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-17Merge branch 'ds/object-info-for-prefetch-fix'Junio C Hamano
Code cleanup and futureproof. * ds/object-info-for-prefetch-fix: sha1-file: split OBJECT_INFO_FOR_PREFETCH
2019-05-28sha1-file: split OBJECT_INFO_FOR_PREFETCHDerrick Stolee
The OBJECT_INFO_FOR_PREFETCH bitflag was added to sha1-file.c in 0f4a4fb1 (sha1-file: support OBJECT_INFO_FOR_PREFETCH, 2019-03-29) and is used to prevent the fetch_objects() method when enabled. However, there is a problem with the current use. The definition of OBJECT_INFO_FOR_PREFETCH is given by adding 32 to OBJECT_INFO_QUICK. This is clearly stated above the definition (in a comment) that this is so OBJECT_INFO_FOR_PREFETCH implies OBJECT_INFO_QUICK. The problem is that using "flag & OBJECT_INFO_FOR_PREFETCH" means that OBJECT_INFO_QUICK also implies OBJECT_INFO_FOR_PREFETCH. Split out the single bit from OBJECT_INFO_FOR_PREFETCH into a new OBJECT_INFO_SKIP_FETCH_OBJECT as the single bit and keep OBJECT_INFO_FOR_PREFETCH as the union of two flags. This allows a clearer use of flag checking while also keeping the implication of OBJECT_INFO_QUICK. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'jt/batch-fetch-blobs-in-diff'Junio C Hamano
While running "git diff" in a lazy clone, we can upfront know which missing blobs we will need, instead of waiting for the on-demand machinery to discover them one by one. Aim to achieve better performance by batching the request for these promised blobs. * jt/batch-fetch-blobs-in-diff: diff: batch fetching of missing blobs sha1-file: support OBJECT_INFO_FOR_PREFETCH
2019-04-01sha1-file: support OBJECT_INFO_FOR_PREFETCHJonathan Tan
Teach oid_object_info_extended() to support a new flag that inhibits fetching of missing objects. This is equivalent to setting fetch_is_missing to 0, calling oid_object_info_extended(), then setting fetch_if_missing to whatever it was before. Update unpack-trees.c to use this new flag instead of repeatedly setting fetch_if_missing. This new flag complicates things slightly in that there are now 2 ways to do the same thing. But this eliminates the need to repeatedly set a global variable, and more importantly, allows prefetching to be done in parallel (in the future); hence, this patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01hash: add a function to lookup hash algorithm by lengthbrian m. carlson
There are some cases, such as the dumb HTTP transport and bundles, where we can only determine the hash algorithm in use by the length of the object IDs. Provide a function that looks up the algorithm by length. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-07Merge branch 'jk/loose-object-cache-oid'Junio C Hamano
Code clean-up. * jk/loose-object-cache-oid: prefer "hash mismatch" to "sha1 mismatch" sha1-file: avoid "sha1 file" for generic use in messages sha1-file: prefer "loose object file" to "sha1 file" in messages sha1-file: drop has_sha1_file() convert has_sha1_file() callers to has_object_file() sha1-file: convert pass-through functions to object_id sha1-file: modernize loose header/stream functions sha1-file: modernize loose object file functions http: use struct object_id instead of bare sha1 update comment references to sha1_object_info() sha1-file: fix outdated sha1 comment references
2019-02-05Merge branch 'sb/more-repo-in-api'Junio C Hamano
The in-core repository instances are passed through more codepaths. * sb/more-repo-in-api: (23 commits) t/helper/test-repository: celebrate independence from the_repository path.h: make REPO_GIT_PATH_FUNC repository agnostic commit: prepare free_commit_buffer and release_commit_memory for any repo commit-graph: convert remaining functions to handle any repo submodule: don't add submodule as odb for push submodule: use submodule repos for object lookup pretty: prepare format_commit_message to handle arbitrary repositories commit: prepare logmsg_reencode to handle arbitrary repositories commit: prepare repo_unuse_commit_buffer to handle any repo commit: prepare get_commit_buffer to handle any repo commit-reach: prepare in_merge_bases[_many] to handle any repo commit-reach: prepare get_merge_bases to handle any repo commit-reach.c: allow get_merge_bases_many_0 to handle any repo commit-reach.c: allow remove_redundant to handle any repo commit-reach.c: allow merge_bases_many to handle any repo commit-reach.c: allow paint_down_to_common to handle any repo commit: allow parse_commit* to handle any repo object: parse_object to honor its repository argument object-store: prepare has_{sha1, object}_file to handle any repo object-store: prepare read_object_file to deal with any repo ...
2019-01-29Merge branch 'bc/sha-256'Junio C Hamano
Add sha-256 hash and plug it through the code to allow building Git with the "NewHash". * bc/sha-256: hash: add an SHA-256 implementation using OpenSSL sha256: add an SHA-256 implementation using libgcrypt Add a base implementation of SHA-256 support commit-graph: convert to using the_hash_algo t/helper: add a test helper to compute hash speed sha1-file: add a constant for hash block size t: make the sha1 test-tool helper generic t: add basic tests for our SHA-1 implementation cache: make hashcmp and hasheq work with larger hashes hex: introduce functions to print arbitrary hashes sha1-file: provide functions to look up hash algorithms sha1-file: rename algorithm to "sha1"
2019-01-18Merge branch 'rs/sha1-file-close-mapped-file-on-error'Junio C Hamano
Code clean-up. * rs/sha1-file-close-mapped-file-on-error: sha1-file: close fd of empty file in map_sha1_file_1()
2019-01-08prefer "hash mismatch" to "sha1 mismatch"Jeff King
To future-proof ourselves against a change in the hash, let's use the more generic "hash mismatch" to refer to integrity problems. Note that we do advertise this exact string in git-fsck(1). However, the message itself is marked for translation, meaning we do not expect it to be machine-readable. While we're touching that documentation, let's also update it for grammar and clarity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08sha1-file: avoid "sha1 file" for generic use in messagesJeff King
These error messages say "sha1 file", which is vague and not common in user-facing documentation. Unlike the conversions from the previous commit, these do not always refer to loose objects. In finalize_object_file() we could be dealing with a packfile. Let's just say "unable to write file" instead; since we include the filename, the nature of the file is clear from the rest of the message. In force_object_loose(), we're calling into read_object(), which could actually be _any_ type of object. Just say "object". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08sha1-file: prefer "loose object file" to "sha1 file" in messagesJeff King
When we're reporting an error for a loose object, let's use that term. It's more consistent with other parts of Git, and it is future-proof against changes to the hash function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08sha1-file: drop has_sha1_file()Jeff King
There are no callers left of has_sha1_file() or its with_flags() variant. Let's drop them, and convert has_object_file() from a wrapper into the "real" function. Ironically, the sha1 variant was just copying into an object_id internally, so the resulting code is actually shorter! We can also drop the coccinelle rules for catching has_sha1_file() callers. Since the function no longer exists, the compiler will do that for us. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08convert has_sha1_file() callers to has_object_file()Jeff King
The only remaining callers of has_sha1_file() actually have an object_id already. They can use the "object" variant, rather than dereferencing the hash themselves. The code changes here were completely generated by the included coccinelle patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08sha1-file: convert pass-through functions to object_idJeff King
These two static functions, read_object() and quick_has_loose(), both have to hashcpy() their bare-sha1 arguments into object_id structs to pass them along. Since all of their callers actually have object_id structs in the first place, we can eliminate the copying by adjusting their input parameters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08object-store: retire odb_load_loose_cache()René Scharfe
Inline odb_load_loose_cache() into its only remaining caller, odb_loose_cache(). The latter offers a nicer interface for loading the cache, as it doesn't require callers to deal with fanout directory numbers directly. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08sha1-file: modernize loose header/stream functionsJeff King
As with the open/map/close functions for loose objects that were recently converted, the functions for parsing the loose object stream use the name "sha1" and a bare "unsigned char *". Let's fix that so that unpack_sha1_header() becomes unpack_loose_header(), etc. These conversions are less clear-cut than the file access functions. You could argue that the they are parsing Git's canonical object format (i.e., "type size\0contents", over which we compute the hash), which is not strictly tied to loose storage. But in practice these functions are used only for loose objects, and using the term "loose_header" (instead of "object_header") distinguishes it from the object header found in packfiles (which contains the same information in a different format). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08object-store: use one oid_array per subdirectory for loose cacheRené Scharfe
The loose objects cache is filled one subdirectory at a time as needed. It is stored in an oid_array, which has to be resorted after each add operation. So when querying a wide range of objects, the partially filled array needs to be resorted up to 255 times, which takes over 100 times longer than sorting once. Use one oid_array for each subdirectory. This ensures that entries have to only be sorted a single time. It also avoids eight binary search steps for each cache lookup as a small bonus. The cache is used for collision checks for the log placeholders %h, %t and %p, and we can see the change speeding them up in a repository with ca. 100 objects per subdirectory: $ git count-objects 26733 objects, 68808 kilobytes Test HEAD^ HEAD -------------------------------------------------------------------- 4205.1: log with %H 0.51(0.47+0.04) 0.51(0.49+0.02) +0.0% 4205.2: log with %h 0.84(0.82+0.02) 0.60(0.57+0.03) -28.6% 4205.3: log with %T 0.53(0.49+0.04) 0.52(0.48+0.03) -1.9% 4205.4: log with %t 0.84(0.80+0.04) 0.60(0.59+0.01) -28.6% 4205.5: log with %P 0.52(0.48+0.03) 0.51(0.50+0.01) -1.9% 4205.6: log with %p 0.85(0.78+0.06) 0.61(0.56+0.05) -28.2% 4205.7: log with %h-%h-%h 0.96(0.92+0.03) 0.69(0.64+0.04) -28.1% Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08sha1-file: modernize loose object file functionsJeff King
The loose object access code in sha1-file.c is some of the oldest in Git, and could use some modernizing. It mostly uses "unsigned char *" for object ids, which these days should be "struct object_id". It also uses the term "sha1_file" in many functions, which is confusing. The term "loose_objects" is much better. It clearly distinguishes them from packed objects (which didn't even exist back when the name "sha1_file" came into being). And it also distinguishes it from the checksummed-file concept in csum-file.c (which until recently was actually called "struct sha1file"!). This patch converts the functions {open,close,map,stat}_sha1_file() into open_loose_object(), etc, and switches their sha1 arguments for object_id structs. Similarly, path functions like fill_sha1_path() become fill_loose_path() and use object_ids. The function sha1_loose_object_info() already says "loose", so we can just drop the "sha1" (and teach it to use object_id). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08object-store: factor out odb_clear_loose_cache()René Scharfe
Add and use a function for emptying the loose object cache, so callers don't have to know any of its implementation details. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>