summaryrefslogtreecommitdiff
path: root/unpack-trees.c
AgeCommit message (Collapse)Author
2019-09-18Merge branch 'cc/multi-promisor'Junio C Hamano
Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands
2019-08-22Merge branch 'jk/tree-walk-overflow'Junio C Hamano
Codepaths to walk tree objects have been audited for integer overflows and hardened. * jk/tree-walk-overflow: tree-walk: harden make_traverse_path() length computations tree-walk: add a strbuf wrapper for make_traverse_path() tree-walk: accept a raw length for traverse_path_len() tree-walk: use size_t consistently tree-walk: drop oid from traverse_info setup_traverse_info(): stop copying oid
2019-08-01tree-walk: harden make_traverse_path() length computationsJeff King
The make_traverse_path() function isn't very careful about checking its output buffer boundaries. In fact, it doesn't even _know_ the size of the buffer it's writing to, and just assumes that the caller used traverse_path_len() correctly. And even then we assume that our traverse_info.pathlen components are all correct, and just blindly write into the buffer. Let's improve this situation a bit: - have the caller pass in their allocated buffer length, which we'll check against our own computations - check for integer underflow as we do our backwards-insertion of pathnames into the buffer - check that we do not run out items in our list to traverse before we've filled the expected number of bytes None of these should be triggerable in practice (especially since our switch to size_t everywhere in a previous commit), but it doesn't hurt to check our assumptions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-01tree-walk: add a strbuf wrapper for make_traverse_path()Jeff King
All but one of the callers of make_traverse_path() allocate a new heap buffer to store the path. Let's give them an easy way to write to a strbuf, which saves them from computing the length themselves (which is especially tricky when they want to add to the path). It will also make it easier for us to change the make_traverse_path() interface in a future patch to improve its bounds-checking. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-01tree-walk: accept a raw length for traverse_path_len()Jeff King
We take a "struct name_entry", but only care about the length of the path name. Let's just take that length directly, making it easier to use the function from callers that sometimes do not have a name_entry at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-01tree-walk: use size_t consistentlyJeff King
We store and manipulate the cumulative traverse_info.pathlen as an "int", which can overflow when we are fed ridiculously long pathnames (e.g., ones at the edge of 2GB or 4GB, even if the individual tree entry names are smaller than that). The results can be confusing, though after some prodding I was not able to use this integer overflow to cause an under-allocated buffer. Let's consistently use size_t to generate and store these, and make sure our addition doesn't overflow. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31tree-walk: drop oid from traverse_infoJeff King
As the previous commit shows, the presence of an oid in each level of the traverse_info is confusing and ultimately not necessary. Let's drop it to make it clear that it will not always be set (as well as convince us that it's unused, and let the compiler catch any merges with other branches that do add new uses). Since the oid is part of name_entry, we'll actually stop embedding a name_entry entirely, and instead just separately hold the pathname, its length, and the mode. This makes the resulting code slightly more verbose as we have to pass those elements around individually. But it also makes it more clear what each code path is going to use (and in most of the paths, we really only care about the pathname itself). A few of these conversions are noisier than they need to be, as they also take the opportunity to rename "len" to "namelen" for clarity (especially where we also have "pathlen" or "ce_len" alongside). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-19Merge branch 'nd/tree-walk-with-repo'Junio C Hamano
The tree-walk API learned to pass an in-core repository instance throughout more codepaths. * nd/tree-walk-with-repo: t7814: do not generate same commits in different repos Use the right 'struct repository' instead of the_repository match-trees.c: remove the_repo from shift_tree*() tree-walk.c: remove the_repo from get_tree_entry_follow_symlinks() tree-walk.c: remove the_repo from get_tree_entry() tree-walk.c: remove the_repo from fill_tree_descriptor() sha1-file.c: remove the_repo from read_object_with_reference()
2019-07-09Merge branch 'nd/switch-and-restore'Junio C Hamano
Two new commands "git switch" and "git restore" are introduced to split "checking out a branch to work on advancing its history" and "checking out paths out of the index and/or a tree-ish to work on advancing the current history" out of the single "git checkout" command. * nd/switch-and-restore: (46 commits) completion: disable dwim on "git switch -d" switch: allow to switch in the middle of bisect t2027: use test_must_be_empty Declare both git-switch and git-restore experimental help: move git-diff and git-reset to different groups doc: promote "git restore" user-manual.txt: prefer 'merge --abort' over 'reset --hard' completion: support restore t: add tests for restore restore: support --patch restore: replace --force with --ignore-unmerged restore: default to --source=HEAD when only --staged is specified restore: reject invalid combinations with --staged restore: add --worktree and --staged checkout: factor out worktree checkout code restore: disable overlay mode by default restore: make pathspec mandatory restore: take tree-ish from --source option instead checkout: split part of it to new command 'restore' doc: promote "git switch" ...
2019-06-27tree-walk.c: remove the_repo from fill_tree_descriptor()Nguyễn Thái Ngọc Duy
While at there, clean up the_repo usage in builtin/merge-tree.c a tiny bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25Use promisor_remote_get_direct() and has_promisor_remote()Christian Couder
Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'jt/batch-fetch-blobs-in-diff'Junio C Hamano
While running "git diff" in a lazy clone, we can upfront know which missing blobs we will need, instead of waiting for the on-demand machinery to discover them one by one. Aim to achieve better performance by batching the request for these promised blobs. * jt/batch-fetch-blobs-in-diff: diff: batch fetching of missing blobs sha1-file: support OBJECT_INFO_FOR_PREFETCH
2019-04-25Merge branch 'nd/checkout-m'Junio C Hamano
"git checkout -m <other>" was about carrying the differences between HEAD and the working-tree files forward while checking out another branch, and ignored the differences between HEAD and the index. The command has been taught to abort when the index and the HEAD are different. * nd/checkout-m: checkout: prevent losing staged changes with --merge read-tree: add --quiet unpack-trees: rename "gently" flag to "quiet" unpack-trees: keep gently check inside add_rejected_path
2019-04-25Merge branch 'jk/unused-params-even-more'Junio C Hamano
Code cleanup. * jk/unused-params-even-more: parse_opt_ref_sorting: always use with NONEG flag pretty: drop unused strbuf from parse_padding_placeholder() pretty: drop unused "type" parameter in needs_rfc2047_encoding() parse-options: drop unused ctx parameter from show_gitcomp() fetch_pack(): drop unused parameters report_path_error(): drop unused prefix parameter unpack-trees: drop unused error_type parameters unpack-trees: drop name_entry from traverse_by_cache_tree() test-date: drop unused "now" parameter from parse_dates() update-index: drop unused prefix_length parameter from do_reupdate() log: drop unused "len" from show_tagger() log: drop unused rev_info from early output revision: drop some unused "revs" parameters
2019-04-25Merge branch 'bp/post-index-change-hook'Junio C Hamano
A new hook "post-index-change" is called when the on-disk index file changes, which can help e.g. a virtualized working tree implementation. * bp/post-index-change-hook: read-cache: add post-index-change hook
2019-04-02doc: promote "git switch"Nguyễn Thái Ngọc Duy
The new command "git switch" is added to avoid the confusion of one-command-do-all "git checkout" for new users. They are also helpful to avoid ambiguation context. For these reasons, promote it everywhere possible. This includes documentation, suggestions/advice from other commands... The "Checking out files" progress line in unpack-trees.c is also updated to "Updating files" to be neutral to both git-checkout and git-switch. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01sha1-file: support OBJECT_INFO_FOR_PREFETCHJonathan Tan
Teach oid_object_info_extended() to support a new flag that inhibits fetching of missing objects. This is equivalent to setting fetch_is_missing to 0, calling oid_object_info_extended(), then setting fetch_if_missing to whatever it was before. Update unpack-trees.c to use this new flag instead of repeatedly setting fetch_if_missing. This new flag complicates things slightly in that there are now 2 ways to do the same thing. But this eliminates the need to repeatedly set a global variable, and more importantly, allows prefetching to be done in parallel (in the future); hence, this patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24unpack-trees: rename "gently" flag to "quiet"Nguyễn Thái Ngọc Duy
The gently flag was added in 17e4642667 (Add flag to make unpack_trees() not print errors. - 2008-02-07) to suppress error messages. The name "gently" does not quite express that. Granted, being quiet is gentle but it could mean not performing some other actions. Rename the flag to "quiet" to be more on point. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24unpack-trees: keep gently check inside add_rejected_pathNguyễn Thái Ngọc Duy
This basically follows the footsteps of 6a143aa2b2 (checkout -m: attempt merge when deletion of path was staged - 2014-08-12) where there gently check is moved inside reject_merge() so that callers do not accidentally forget it. add_rejected_path() has the same usage pattern. All call sites check gently first, then decide to call add_rejected_path() if needed. Move the check inside. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-21unpack-trees: fix oneway_merge accidentally carry over stage indexNguyễn Thái Ngọc Duy
Phillip found out that 'git checkout -f <branch>' does not restore conflict/unmerged files correctly. All tracked files should be taken from <branch> and all non-zero stages removed. Most of this is true, except that the final file could be in stage one instead of zero. "checkout -f" (among other commands) does this with one-way merge, which is supposed to take stat info from the index and everything else from the given tree. The add_entry(.., old, ...) call in oneway_merge() though will keep stage index from the index. This is normally not a problem if the entry from the index is normal (stage #0). But if there is a conflict, stage #0 does not exist and we'll get stage #1 entry as "old" variable, which gets recorded in the final index. Fix it by clearing stage mask. This bug probably comes from b5b425074e (git-read-tree: make one-way merge also honor the "update" flag, 2005-06-07). Before this commit, we may create the final ("dst") index entry from the one in index, but we do clear CE_STAGEMASK. I briefly checked two- and three-way merge functions. I think we don't have the same problem in those. Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20unpack-trees: drop unused error_type parametersJeff King
The verify_clean_subdirectory() helper takes an error_type parameter from the caller, but doesn't actually use it. Instead, when it calls add_rejected_path() it passes NOT_UPTODATE_DIR, its own custom error type which is more specific than what the caller provides. Likewise for verify_clean_submodule(), which always passes WOULD_LOSE_SUBMODULE. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20unpack-trees: drop name_entry from traverse_by_cache_tree()Jeff King
We pull the names from the existing index rather than the tree entry, which is after all the point of this function. Let's drop the unused "names" parameter. Note that we leave the "nr_names" parameter, as it tells us how many trees we are traversing (and thus how many index stages to set up). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07Merge branch 'tg/checkout-no-overlay'Junio C Hamano
"git checkout --no-overlay" can be used to trigger a new mode of checking out paths out of the tree-ish, that allows paths that match the pathspec that are in the current index and working tree and are not in the tree-ish. * tg/checkout-no-overlay: revert "checkout: introduce checkout.overlayMode config" checkout: introduce checkout.overlayMode config checkout: introduce --{,no-}overlay option checkout: factor out mark_cache_entry_for_checkout function checkout: clarify comment read-cache: add invalidate parameter to remove_marked_cache_entries entry: support CE_WT_REMOVE flag in checkout_entry entry: factor out unlink_entry function move worktree tests to t24*
2019-02-15read-cache: add post-index-change hookBen Peart
Add a post-index-change hook that is invoked after the index is written in do_write_locked_index(). This hook is meant primarily for notification, and cannot affect the outcome of git commands that trigger the index write. The hook is passed a flag to indicate whether the working directory was updated or not and a flag indicating if a skip-worktree bit could have changed. These flags enable the hook to optimize its response to the index change notification. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-07Merge branch 'nd/the-index-final'Junio C Hamano
The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument
2019-01-29Merge branch 'bc/tree-walk-oid'Junio C Hamano
The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use
2019-01-24cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switchNguyễn Thái Ngọc Duy
By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15tree-walk: store object_id in a separate memberbrian m. carlson
When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'nd/checkout-noisy'Junio C Hamano
"git checkout [<tree-ish>] path..." learned to report the number of paths that have been checked out of the index or the tree-ish, which gives it the same degree of noisy-ness as the case in which the command checks out a branch. * nd/checkout-noisy: t0027: squelch checkout path run outside test_expect_* block checkout: print something when checking out paths
2019-01-14Merge branch 'nd/attr-pathspec-in-tree-walk'Junio C Hamano
The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree*() take 'struct repository *'
2019-01-02read-cache: add invalidate parameter to remove_marked_cache_entriesThomas Gummerer
When marking cache entries for removal, and later removing them all at once using 'remove_marked_cache_entries()', cache entries currently have to be invalidated manually in the cache tree and in the untracked cache. Add an invalidate flag to the function. With the flag set, the function will take care of invalidating the path in the cache tree and in the untracked cache. Note that the current callsites already do the invalidation properly in other places, so we're just passing 0 from there to keep the status quo. This will be useful in a subsequent commit. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02entry: factor out unlink_entry functionThomas Gummerer
Factor out the 'unlink_entry()' function from unpack-trees.c to entry.c. It will be used in other places as well in subsequent steps. As it's no longer a static function, also move the documentation to the header file to make it more discoverable. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-19tree-walk.c: make tree_entry_interesting() take an indexNguyễn Thái Ngọc Duy
In order to support :(attr) when matching pathspec on a tree, tree_entry_interesting() needs to take an index (because git_check_attr() needs it). This is the preparation step for it. This also makes it clearer what index we fall back to when looking up attributes during an unpack-trees operation: the source index. This also fixes revs->pruning.repo initialization that should have been done in 2abf350385 (revision.c: remove implicit dependency on the_index - 2018-09-21). Without it, skip_uninteresting() will dereference a NULL pointer through this call chain get_revision(revs) get_revision_internal get_revision_1 try_to_simplify_commit rev_compare_tree diff_tree_oid(..., &revs->pruning) ll_diff_tree_oid diff_tree_paths ll_diff_tree skip_uninteresting Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14checkout: print something when checking out pathsNguyễn Thái Ngọc Duy
One of the problems with "git checkout" is that it does so many different things and could confuse people specially when we fail to handle ambiguation correctly. One way to help with that is tell the user what sort of operation is actually carried out. When switching branches, we always print something unless --quiet, either - "HEAD is now at ..." - "Reset branch ..." - "Already on ..." - "Switched to and reset ..." - "Switched to a new branch ..." - "Switched to branch ..." Checking out paths however is silent. Print something so that if we got the user intention wrong, they won't waste too much time to find that out. For the remaining cases of checkout we now print either - "Checked out ... paths out of the index" - "Checked out ... paths out of <abbrev hash>" Since the purpose of printing this is to help disambiguate. Only do it when "--" is missing. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12cache-tree.c: remove the_repository referencesNguyễn Thái Ngọc Duy
This case is more interesting than other boring "remove the_repo" commits because while we need access to the object database, we cannot simply use r->index because unpack-trees.c can operate on a temporary index, not $GIT_DIR/index. Ideally we should be able to pass an object database to lookup_tree() but that ship has sailed. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19unpack-trees: avoid dead store for struct progressCarlo Marcelo Arenas Belón
it is unconditionally initialized a few lines below Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-24Merge branch 'jt/lazy-object-fetch-fix'Junio C Hamano
The code to backfill objects in lazily cloned repository did not work correctly, which has been corrected. * jt/lazy-object-fetch-fix: fetch-object: set exact_oid when fetching fetch-object: unify fetch_object[s] functions
2018-09-17Merge branch 'jk/cocci'Junio C Hamano
spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion
2018-09-17Merge branch 'nd/unpack-trees-with-cache-tree'Junio C Hamano
The unpack_trees() API used in checking out a branch and merging walks one or more trees along with the index. When the cache-tree in the index tells us that we are walking a tree whose flattened contents is known (i.e. matches a span in the index), as linearly scanning a span in the index is much more efficient than having to open tree objects recursively and listing their entries, the walk can be optimized, which is done in this topic. * nd/unpack-trees-with-cache-tree: Document update for nd/unpack-trees-with-cache-tree cache-tree: verify valid cache-tree in the test suite unpack-trees: add missing cache invalidation unpack-trees: reuse (still valid) cache-tree from src_index unpack-trees: reduce malloc in cache-tree walk unpack-trees: optimize walking same trees with cache-tree unpack-trees: add performance tracing trace.h: support nested performance tracing
2018-09-17Merge branch 'nd/clone-case-smashing-warning'Junio C Hamano
Running "git clone" against a project that contain two files with pathnames that differ only in cases on a case insensitive filesystem would result in one of the files lost because the underlying filesystem is incapable of holding both at the same time. An attempt is made to detect such a case and warn. * nd/clone-case-smashing-warning: clone: report duplicate entries on case-insensitive filesystems
2018-09-13fetch-object: unify fetch_object[s] functionsJonathan Tan
There are fetch_object() and fetch_objects() helpers in fetch-object.h; as the latter takes "struct oid_array", the former cannot be made into a thin wrapper around the latter without an extra allocation and set-up cost. Update fetch_objects() to take an array of "struct object_id" and number of elements in it as separate parameters, remove fetch_object(), and adjust all existing callers of these functions to use the new fetch_objects(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() == 0" to oideq()Jeff King
Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-27Document update for nd/unpack-trees-with-cache-treeNguyễn Thái Ngọc Duy
Fix an incorrect comment in the new code added in b4da37380b (unpack-trees: optimize walking same trees with cache-tree - 2018-08-18) and document about the new test variable that is enabled by default in test-lib.sh in 4592e6080f (cache-tree: verify valid cache-tree in the test suite - 2018-08-18) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18cache-tree: verify valid cache-tree in the test suiteNguyễn Thái Ngọc Duy
This makes sure that cache-tree is consistent with the index. The main purpose is to catch potential problems by saving the index in unpack_trees() but the line in write_index() would also help spot missing invalidation in other code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18unpack-trees: add missing cache invalidationNguyễn Thái Ngọc Duy
Any changes to the output index should be (confusingly) marked in the source index with invalidate_ce_path(). This is used to make sure we still have valid untracked cache and cache-tree extensions in the end. We do a pretty good job of invalidating except in two places. verify_clean_subdirectory() is part of verify_absent() and verify_absent_sparse(). The former is usually called by merged_entry() or directly in threeway_merge(). The latter is obviously used by sparse checkout. In these three call sites, only merged_entry() follows up with invalidate_ce_path(). The other two don't, but they should not trigger this ce removal because this is about D/F conflicts [1]. But let's be safe and invalidate_ce_path() here as well. The second place is keep_entry() which is also used by threeway_merge() to keep higher stage entries. In order to reuse cache-tree we need to invalidate these paths as well. It's not a problem in the past because whenever a higher stage entry is present, cache-tree will not be created [2]. Now we salvage cache-tree even when higher stage entries are present, we need more invalidation. [1] c81935348b (Fix switching to a branch with D/F when current branch has file D. - 2007-03-15) [2] This is probably too strict. We should be able to create and save cache-tree for the directories that do not have conflict entries in cache_tree_update(). And this becomes more important when cache-tree plays bigger role in terms of performance. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18unpack-trees: reuse (still valid) cache-tree from src_indexNguyễn Thái Ngọc Duy
We do n-way merge by walking the source index and n trees at the same time and add merge results to a new temporary index called o->result. The merge result for any given path could be either - keep_entry(): same old index entry in o->src_index is reused - merged_entry(): either a new entry is added, or an existing one updated - deleted_entry(): one entry from o->src_index is removed For some reason [1] we keep making sure that the source index's cache-tree is still valid if used by o->result: for all those merged/deleted entries, we invalidate the same path in o->src_index, so only cache-trees covering the "keep_entry" parts remain good. Because of this, the cache-tree from o->src_index can be perfectly reused in o->result. And in fact we already rely on this logic to reuse untracked cache in edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08). Move the cache-tree to o->result before doing cache_tree_update() to reduce hashing cost. Since cache_tree_update() has risen up as one of the most expensive parts in unpack_trees() after the last few patches. This does help reduce unpack_trees() time significantly (on webkit.git): before after -------------------------------------------------------------------- 0.080394752 0.051258167 s: read cache .git/index 0.216010838 0.212106298 s: preload index 0.008534301 0.280521764 s: refresh index 0.251992198 0.218160442 s: traverse_trees 0.377031383 0.374948191 s: check_updates 0.372768105 0.037040114 s: cache_tree_update 1.045887251 0.672031609 s: unpack_trees 0.314983512 0.317456290 s: write index, changed mask = 2e 0.062572653 0.038382654 s: traverse_trees 0.000022544 0.000042731 s: check_updates 0.073795585 0.050930053 s: unpack_trees 0.073807557 0.051099735 s: diff-index 1.938191592 1.614241153 s: git command: git checkout - [1] I'm pretty sure the reason is an oversight in 34110cd4e3 (Make 'unpack_trees()' have a separate source and destination index - 2008-03-06). That patch aims to _not_ update the source index at all. The invalidation should have been done on o->result in that patch. But then there was no cache-tree on o->result even then so it's pointless to do so. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18unpack-trees: reduce malloc in cache-tree walkNguyễn Thái Ngọc Duy
This is a micro optimization that probably only shines on repos with deep directory structure. Instead of allocating and freeing a new cache_entry in every iteration, we reuse the last one and only update the parts that are new each iteration. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18unpack-trees: optimize walking same trees with cache-treeNguyễn Thái Ngọc Duy
In order to merge one or many trees with the index, unpack-trees code walks multiple trees in parallel with the index and performs n-way merge. If we find out at start of a directory that all trees are the same (by comparing OID) and cache-tree happens to be available for that directory as well, we could avoid walking the trees because we already know what these trees contain: it's flattened in what's called "the index". The upside is of course a lot less I/O since we can potentially skip lots of trees (think subtrees). We also save CPU because we don't have to inflate and apply the deltas. The downside is of course more fragile code since the logic in some functions are now duplicated elsewhere. "checkout -" with this patch on webkit.git (275k files): baseline new -------------------------------------------------------------------- 0.056651714 0.080394752 s: read cache .git/index 0.183101080 0.216010838 s: preload index 0.008584433 0.008534301 s: refresh index 0.633767589 0.251992198 s: traverse_trees 0.340265448 0.377031383 s: check_updates 0.381884638 0.372768105 s: cache_tree_update 1.401562947 1.045887251 s: unpack_trees 0.338687914 0.314983512 s: write index, changed mask = 2e 0.411927922 0.062572653 s: traverse_trees 0.000023335 0.000022544 s: check_updates 0.423697246 0.073795585 s: unpack_trees 0.423708360 0.073807557 s: diff-index 2.559524127 1.938191592 s: git command: git checkout - Another measurement from Ben's running "git checkout" with over 500k trees (on the whole series): baseline new ---------------------------------------------------------------------- 0.535510167 0.556558733 s: read cache .git/index 0.3057373 0.3147105 s: initialize name hash 0.0184082 0.023558433 s: preload index 0.086910967 0.089085967 s: refresh index 7.889590767 2.191554433 s: unpack trees 0.120760833 0.131941267 s: update worktree after a merge 2.2583504 2.572663167 s: repair cache-tree 0.8916137 0.959495233 s: write index, changed mask = 28 3.405199233 0.2710663 s: unpack trees 0.000999667 0.0021554 s: update worktree after a merge 3.4063306 0.273318333 s: diff-index 16.9524923 9.462943133 s: git command: git.exe checkout This command calls unpack_trees() twice, the first time on 2way merge and the second 1way merge. In both times, "unpack trees" time is reduced to one third. Overall time reduction is not that impressive of course because index operations take a big chunk. And there's that repair cache-tree line. PS. A note about cache-tree invalidation and the use of it in this code. We do invalidate cache-tree in _source_ index when we add new entries to the (temporary) "result" index. But we also use the cache-tree from source index in this optimization. Does this mean we end up having no cache-tree in the source index to activate this optimization? The answer is twisted: the order of finding a good cache-tree and invalidating it matters. In this case we check for a good cache-tree first in all_trees_same_as_cache_tree(), then we start to merge things and potentially invalidate that same cache-tree in the process. Since cache-tree invalidation happens after the optimization kicks in, we're still good. But we may lose that cache-tree at the very first call_unpack_fn() call in traverse_by_cache_tree(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18unpack-trees: add performance tracingNguyễn Thái Ngọc Duy
We're going to optimize unpack_trees() a bit in the following patches. Let's add some tracing to measure how long it takes before and after. This is the baseline ("git checkout -" on webkit.git, 275k files on worktree) performance: 0.056651714 s: read cache .git/index performance: 0.183101080 s: preload index performance: 0.008584433 s: refresh index performance: 0.633767589 s: traverse_trees performance: 0.340265448 s: check_updates performance: 0.381884638 s: cache_tree_update performance: 1.401562947 s: unpack_trees performance: 0.338687914 s: write index, changed mask = 2e performance: 0.411927922 s: traverse_trees performance: 0.000023335 s: check_updates performance: 0.423697246 s: unpack_trees performance: 0.423708360 s: diff-index performance: 2.559524127 s: git command: git checkout - Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-17clone: report duplicate entries on case-insensitive filesystemsDuy Nguyen
Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>