summaryrefslogtreecommitdiff
path: root/unpack-trees.c
AgeCommit message (Collapse)Author
2010-01-12Remove empty directories when checking out a commit with fewer submodulesPeter Collingbourne
Change the unlink_entry function to use rmdir to remove submodule directories. Currently we try to use unlink, which will never succeed. Of course rmdir will only succeed for empty (i.e. not checked out) submodule directories. Behaviour if a submodule is checked out stays essentially the same: print a warning message and keep the submodule directory. Signed-off-by: Peter Collingbourne <peter@pcc.me.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07read-tree --debug-unpackJunio C Hamano
A debugging patch. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07unpack-trees.c: look ahead in the indexJunio C Hamano
This makes the traversal of index be in sync with the tree traversal. When unpack_callback() is fed a set of tree entries from trees, it inspects the name of the entry and checks if the an index entry with the same name could be hiding behind the current index entry, and (1) if the name appears in the index as a leaf node, it is also fed to the n_way_merge() callback function; (2) if the name is a directory in the index, i.e. there are entries in that are underneath it, then nothing is fed to the n_way_merge() callback function; (3) otherwise, if the name comes before the first eligible entry in the index, the index entry is first unpacked alone. When traverse_trees_recursive() descends into a subdirectory, the cache_bottom pointer is moved to walk index entries within that directory. All of these are omitted for diff-index, which does not even want to be fed an index entry and a tree entry with D/F conflicts. This fixes 3-way read-tree and exposes a bug in other parts of the system in t6035, test #5. The test prepares these three trees: O = HEAD^ 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b-2/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/x A = HEAD 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b-2/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b/c/d 100644 blob 587be6b4c3f93f93c489c0111bba5596147a26cb a/x B = master 120000 blob a36b77384451ea1de7bd340ffca868249626bc52 a/b 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b-2/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/x With a clean index that matches HEAD, running git read-tree -m -u --aggressive $O $A $B now yields 120000 a36b77384451ea1de7bd340ffca868249626bc52 3 a/b 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 a/b-2/c/d 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 1 a/b/c/d 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 2 a/b/c/d 100644 587be6b4c3f93f93c489c0111bba5596147a26cb 0 a/x which is correct. "master" created "a/b" symlink that did not exist, and removed "a/b/c/d" while HEAD did not do touch either path. Before this series, read-tree did not notice the situation and resolved addition of "a/b" and removal of "a/b/c/d" independently. If A = HEAD had another path "a/b/c/e" added, this merge should conflict but instead it silently resolved "a/b" and then immediately overwrote it to add "a/b/c/e", which was quite bogus. Tests in t1012 start to work with this. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07unpack-trees.c: prepare for looking ahead in the indexJunio C Hamano
This prepares but does not yet implement a look-ahead in the index entries when traverse-trees.c decides to give us tree entries in an order that does not match what is in the index. A case where a look-ahead in the index is necessary happens when merging branch B into branch A while the index matches the current branch A, using a tree O as their common ancestor, and these three trees looks like this: O A B t t t-i t-i t-i t-j t-j t/1 t/2 The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and B first, and notices that A may have a matching "t" behind "t-i" and "t-j" (indeed it does), and tells A to give that entry instead. After unpacking blob "t" from tree B (as it hasn't changed since O in B and A removed it, it will result in its removal), it descends into directory "t/". The side that walked index in parallel to the tree traversal used to be implemented with one pointer, o->pos, that points at the next index entry to be processed. When this happens, the pointer o->pos still points at "t-i" that is the first entry. We should be able to skip "t-i" and "t-j" and locate "t/1" from the index while the recursive invocation of traverse_trees() walks and match entries found there, and later come back to process "t-i". While that look-ahead is not implemented yet, this adds a flag bit, CE_UNPACKED, to mark the entries in the index that has already been processed. o->pos pointer has been renamed to o->cache_bottom and it points at the first entry that may still need to be processed. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-04Aggressive three-way merge: fix D/F caseJunio C Hamano
When the ancestor used to have a blob "P", your tree removed it, and the tree you are merging with also removed it, the agressive three-way cleanly merges to remove that blob. If the other tree added a new blob "P/Q" while removing "P", it should also merge cleanly to remove "P" and create "P/Q" (since neither the ancestor nor your tree could have had it, so it is a typical "created in one"). The "aggressive" rule is not new anymore. Reword the stale comment. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-04"reset --merge": fix unmerged caseJunio C Hamano
Commit 9e8ecea (Add 'merge' mode to 'git reset', 2008-12-01) disallowed "git reset --merge" when there was unmerged entries. But it wished if unmerged entries were reset as if --hard (instead of --merge) has been used. This makes sense because all "mergy" operations makes sure that any path involved in the merge does not have local modifications before starting, so resetting such a path away won't lose any information. The previous commit changed the behavior of --merge to accept resetting unmerged entries if they are reset to a different state than HEAD, but it did not reset the changes in the work tree, leaving the conflict markers in the resulting file in the work tree. Fix it by doing three things: - Update the documentation to match the wish of original "reset --merge" better, namely, "An unmerged entry is a sign that the path didn't have any local modification and can be safely resetted to whatever the new HEAD records"; - Update read_index_unmerged(), which reads the index file into the cache while dropping any higher-stage entries down to stage #0, not to copy the object name from the higher stage entry. The code used to take the object name from the a stage entry ("base" if you happened to have stage #1, or "ours" if both sides added, etc.), which essentially meant that you are getting random results depending on what the merge did. The _only_ reason we want to keep a previously unmerged entry in the index at stage #0 is so that we don't forget the fact that we have corresponding file in the work tree in order to be able to remove it when the tree we are resetting to does not have the path. In order to differentiate such an entry from ordinary cache entry, the cache entry added by read_index_unmerged() is marked as CE_CONFLICTED. - Update merged_entry() and deleted_entry() so that they pay attention to cache entries marked as CE_CONFLICTED. They are previously unmerged entries, and the files in the work tree that correspond to them are resetted away by oneway_merge() to the version from the tree we are resetting to. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-14ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALIDNguyễn Thái Ngọc Duy
Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-21Merge branch 'jc/fix-tree-walk' (early part)Junio C Hamano
* 'jc/fix-tree-walk' (early part): unpack_callback(): use unpack_failed() consistently unpack-trees: typofix diff-lib.c: fix misleading comments on oneway_diff()
2009-10-25Use 'fast-forward' all over the placeFelipe Contreras
It's a compound word. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-11unpack_callback(): use unpack_failed() consistentlyJunio C Hamano
When unpack_index_entry() failed, consistently call unpack_failed(), instead of silently returning -1. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-11unpack-trees: typofixJunio C Hamano
I am not good at subject-verb concordance. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24sparse checkout: inhibit empty worktreeNguyễn Thái Ngọc Duy
The way sparse checkout works, users may empty their worktree completely, because of non-matching sparse-checkout spec, or empty spec. I believe this is not desired. This patch makes Git refuse to produce such worktree. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24unpack-trees(): ignore worktree check outside checkout areaNguyễn Thái Ngọc Duy
verify_absent() and verify_uptodate() are used to ensure worktree is safe to be updated, then CE_REMOVE or CE_UPDATE will be set. Finally check_updates() bases on CE_REMOVE, CE_UPDATE and the recently added CE_WT_REMOVE to update working directory accordingly. The entries that are checked may eventually be left out of checkout area (done later in apply_sparse_checkout()). We don't want to update outside checkout area. This patch teaches Git to assume "good", skip these checks when it's sure those entries will be outside checkout area, and clear CE_REMOVE|CE_UPDATE that could be set due to this assumption. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final indexNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkoutNguyễn Thái Ngọc Duy
This patch introduces core.sparseCheckout, which will control whether sparse checkout support is enabled in unpack_trees() It also loads sparse-checkout file that will be used in the next patch. I split it out so the next patch will be shorter, easier to read. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24unpack-trees.c: generalize verify_* functionsNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24unpack-trees(): add CE_WT_REMOVE to remove on worktree aloneNguyễn Thái Ngọc Duy
CE_REMOVE now removes both worktree and index versions. Sparse checkout must be able to remove worktree version while keep the index intact when checkout area is narrowed. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24unpack-trees(): carry skip-worktree bit over in merged_entry()Nguyễn Thái Ngọc Duy
In this code path, we would remove "old" and replace it with "merge". "old" may have skip-worktree bit, so re-add it to "merge". Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-24Teach Git to respect skip-worktree bit (writing part)Nguyễn Thái Ngọc Duy
This part is mainly to remove CE_VALID shortcuts (and as a consequence, ce_uptodate() shortcuts as it may be turned on by CE_VALID) in writing code path if skip-worktree is used. Various tests are added to avoid future breakages. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-18Merge branch 'maint'Junio C Hamano
* maint: checkout -f: deal with a D/F conflict entry correctly sha1_name.c: avoid unnecessary strbuf_release refs.c: release file descriptor on error return
2009-07-18checkout -f: deal with a D/F conflict entry correctlyJunio C Hamano
When we switch branches with "checkout -f", unpack_trees() feeds two cache_entries to oneway_merge() function in its src[] array argument. The zeroth entry comes from the current index, and the first entry represents what the merge result should be, taken from the tree recorded in the commit we are switching to. When we have a blob (either regular file or a symlink) in the index and in the work tree at path "foo", and the switched-to tree has "foo/bar", i.e. "foo" becomes a directory, src[0] is obviously that blob currently registered at "foo". Even though we do not have anything at "foo" in the switched-to tree, src[1] is _not_ NULL in this case. The unpack_trees() machinery places a special marker df_conflict_entry to signal that no blob exists at "foo", but it will become a directory that may have somthing underneath it (namely "foo/bar"), so a usual 3-way merge can notice the situation. But oneway_merge() codepath failed to notice this and passed the special marker directly to merged_entry(). This happens to remove the "foo" in the end because the df_conflict_entry does not have any name (hence the "error" message) and its addition in add_index_entry() is rejected, but it is wrong. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-14Fix extraneous lstat's in 'git checkout -f'Linus Torvalds
In our 'oneway_merge()' we always do an 'lstat()' to see if we might need to mark the entry for updating. But we really shouldn't need to do that when the cache entry is already marked as being ce_uptodate(), and this makes us do unnecessary lstat() calls if we have index preloading enabled. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-11unpack-trees.c: work around run-time array initialization flaw on IRIX 6.5Brandon Casey
The c99 MIPSpro Compiler version 7.4.4m on IRIX 6.5 does not properly initialize run-time initialized arrays. An array which is initialized with fewer elements than the length of the array should have the unitialized elements initialized to zero. This compiler only initializes the remaining elements when the last element is a static parameter. So work around it by adding a "NULL" initialization parameter. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-09Simplify read_directory[_recursive]() argumentsLinus Torvalds
Stop the insanity with separate 'path' and 'base' arguments that must match. We don't need that crazy interface any more, since we cleaned up handling of 'path' in commit da4b3e8c28b1dc2b856d2555ac7bb47ab712598c. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-21Fix various sparse warnings in the git source codeLinus Torvalds
There are a few remaining ones, but this fixes the trivial ones. It boils down to two main issues that sparse complains about: - warning: Using plain integer as NULL pointer Sparse doesn't like you using '0' instead of 'NULL'. For various good reasons, not the least of which is just the visual confusion. A NULL pointer is not an integer, and that whole "0 works as NULL" is a historical accident and not very pretty. A few of these remain: zlib is a total mess, and Z_NULL is just a 0. I didn't touch those. - warning: symbol 'xyz' was not declared. Should it be static? Sparse wants to see declarations for any functions you export. A lack of a declaration tends to mean that you should either add one, or you should mark the function 'static' to show that it's in file scope. A few of these remain: I only did the ones that should obviously just be made static. That 'wt_status_submodule_summary' one is debatable. It has a few related flags (like 'wt_status_use_color') which _are_ declared, and are used by builtin-commit.c. So maybe we'd like to export it at some point, but it's not declared now, and not used outside of that file, so 'static' it is in this patch. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25Optimize "diff-index --cached" using cache-treeJunio C Hamano
When running "diff-index --cached" after making a change to only a small portion of the index, there is no point unpacking unchanged subtrees into the index recursively, only to find that all entries match anyway. Tweak unpack_trees() logic that is used to read in the tree object to catch the case where the tree entry we are looking at matches the index as a whole by looking at the cache-tree. As an exercise, after modifying a few paths in the kernel tree, here are a few numbers on my Athlon 64X2 3800+: (without patch, hot cache) $ /usr/bin/time git diff --cached --raw :100644 100644 b57e1f5... e69de29... M Makefile :100644 000000 8c86b72... 0000000... D arch/x86/Makefile :000000 100644 0000000... e69de29... A arche 0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+9407minor)pagefaults 0swaps (with patch, hot cache) $ /usr/bin/time ../git.git/git-diff --cached --raw :100644 100644 b57e1f5... e69de29... M Makefile :100644 000000 8c86b72... 0000000... D arch/x86/Makefile :000000 100644 0000000... e69de29... A arche 0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2446minor)pagefaults 0swaps Cold cache numbers are very impressive, but it does not matter very much in practice: (without patch, cold cache) $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches' $ /usr/bin/time git diff --cached --raw :100644 100644 b57e1f5... e69de29... M Makefile :100644 000000 8c86b72... 0000000... D arch/x86/Makefile :000000 100644 0000000... e69de29... A arche 0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k 247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps (with patch, cold cache) $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches' $ /usr/bin/time ../git.git/git-diff --cached --raw :100644 100644 b57e1f5... e69de29... M Makefile :100644 000000 8c86b72... 0000000... D arch/x86/Makefile :000000 100644 0000000... e69de29... A arche 0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k 18440inputs+0outputs (79major+2369minor)pagefaults 0swaps This of course helps "git status" as well. (without patch, hot cache) $ /usr/bin/time ../git.git/git-status >/dev/null 0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+5336outputs (0major+10970minor)pagefaults 0swaps (with patch, hot cache) $ /usr/bin/time ../git.git/git-status >/dev/null 0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+5336outputs (0major+3921minor)pagefaults 0swaps Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-30replace direct calls to unlink(2) with unlink_or_warnAlex Riesen
This helps to notice when something's going wrong, especially on systems which lock open files. I used the following criteria when selecting the code for replacement: - it was already printing a warning for the unlink failures - it is in a function which already printing something or is called from such a function - it is in a static function, returning void and the function is only called from a builtin main function (cmd_) - it is in a function which handles emergency exit (signal handlers) - it is in a function which is obvously cleaning up the lockfiles Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-18unpack-trees: do not muck with attributes when we are not checking outJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-26Merge branch 'jc/attributes-checkout'Junio C Hamano
* jc/attributes-checkout: Add a test for checking whether gitattributes is honored by checkout. Read attributes from the index that is being checked out
2009-03-18Merge branch 'jc/maint-1.6.0-read-tree-overlay'Junio C Hamano
* jc/maint-1.6.0-read-tree-overlay: read-tree A B C: do not create a bogus index and do not segfault
2009-03-14Read attributes from the index that is being checked outJunio C Hamano
Traditionally we used .gitattributes file from the work tree if exists, and otherwise read from the index as a fallback. When switching to a branch that has an updated .gitattributes file, and entries in it give different attributes to other paths being checked out, we should instead read from the .gitattributes in the index. This breaks a use case of fixing incorrect entries in the .gitattributes in the work tree (without adding it to the index) and checking other paths out, though. $ edit .gitattributes ;# mark foo.dat as binary $ rm foo.dat $ git checkout foo.dat Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-13read-tree A B C: do not create a bogus index and do not segfaultJunio C Hamano
"git read-tree A B C..." without the "-m" (merge) option is a way to read these trees on top of each other to get an overlay of them. An ancient commit ee6566e (Rewrite read-tree, 2005-09-05) passed the ADD_CACHE_SKIP_DFCHECK flag when calling add_index_entry() to add the paths obtained from these trees to the index, but it is an incorrect use of the flag. The flag is meant to be used by callers who know the addition of the entry does not introduce a D/F conflict to the index in order to avoid the overhead of checking. This bug resulted in a bogus index that records both "x" and "x/z" as a blob after reading three trees that have paths ("x"), ("x", "y"), and ("x/z", "y") respectively. 34110cd (Make 'unpack_trees()' have a separate source and destination index, 2008-03-06) refactored the callsites of add_index_entry() incorrectly and added more codepaths that use this flag when it shouldn't be used. Also, 0190457 (Move 'unpack_trees()' over to 'traverse_trees()' interface, 2008-03-05) introduced a bug to call add_index_entry() for the tree that does not have the path in it, passing NULL as a cache entry. This caused reading multiple trees, one of which has path "x" but another doesn't, to segfault. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-08Record ns-timestamps if possible, but do not use it without USE_NSECKjetil Barvik
Traditionally, the lack of USE_NSEC meant "do not record nor use the nanosecond resolution part of the file timestamps". To avoid problems on filesystems that lose the ns part when the metadata is flushed to the disk and then later read back in, disabling USE_NSEC has been a good idea in general. If you are on a filesystem without such an issue, it does not hurt to read and store them in the cached stat data in the index entries even if your git is compiled without USE_NSEC. The index left with such a version of git can be read by git compiled with USE_NSEC and it can make use of the nanosecond part to optimize the check to see if the path on the filesystem hsa been modified since we last looked at. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-20verify_uptodate(): add ce_uptodate(ce) testKjetil Barvik
If we inside verify_uptodate() can already tell from the ce entry that it is already uptodate by testing it with ce_uptodate(ce), there is no need to call lstat(2) and ie_match_stat() afterwards. And, reading from the commit log message from: commit eadb5831342bb2e756fa05c03642c4aa1929d4f5 Author: Junio C Hamano <gitster@pobox.com> Date: Fri Jan 18 23:45:24 2008 -0800 Avoid running lstat(2) on the same cache entry. this also seems to be correct usage of the ce_uptodate() macro introduced by that patch. This will avoid lots of lstat(2) calls in some cases, for example by running the 'git checkout' command. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-20make USE_NSEC work as expectedKjetil Barvik
Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-19check_updates(): effective removal of cache entries marked CE_REMOVEKjetil Barvik
Below is oprofile output from GIT command 'git chekcout -q my-v2.6.25' (move from tag v2.6.27 to tag v2.6.25 of the Linux kernel): CPU: Core 2, speed 1999.95 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 20000 Counted INST_RETIRED_ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 20000 CPU_CLK_UNHALT...|INST_RETIRED:2...| samples| %| samples| %| ------------------------------------ 409247 100.000 342878 100.000 git CPU_CLK_UNHALT...|INST_RETIRED:2...| samples| %| samples| %| ------------------------------------ 260476 63.6476 257843 75.1996 libz.so.1.2.3 100876 24.6492 64378 18.7758 kernel-2.6.28.4_2.vmlinux 30850 7.5382 7874 2.2964 libc-2.9.so 14775 3.6103 8390 2.4469 git 2020 0.4936 4325 1.2614 libcrypto.so.0.9.8 191 0.0467 32 0.0093 libpthread-2.9.so 58 0.0142 36 0.0105 ld-2.9.so 1 2.4e-04 0 0 libldap-2.3.so.0.2.31 Detail list of the top 20 function entries (libz counted in one blob): CPU_CLK_UNHALTED INST_RETIRED_ANY_P samples % samples % image name symbol name 260476 63.6862 257843 75.2725 libz.so.1.2.3 /lib/libz.so.1.2.3 16587 4.0555 3636 1.0615 libc-2.9.so memcpy 7710 1.8851 277 0.0809 libc-2.9.so memmove 3679 0.8995 1108 0.3235 kernel-2.6.28.4_2.vmlinux d_validate 3546 0.8670 2607 0.7611 kernel-2.6.28.4_2.vmlinux __getblk 3174 0.7760 1813 0.5293 libc-2.9.so _int_malloc 2396 0.5858 3681 1.0746 kernel-2.6.28.4_2.vmlinux copy_to_user 2270 0.5550 2528 0.7380 kernel-2.6.28.4_2.vmlinux __link_path_walk 2205 0.5391 1797 0.5246 kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty 2103 0.5142 1203 0.3512 kernel-2.6.28.4_2.vmlinux find_first_zero_bit 2077 0.5078 997 0.2911 kernel-2.6.28.4_2.vmlinux do_get_write_access 2070 0.5061 514 0.1501 git cache_name_compare 2043 0.4995 1501 0.4382 kernel-2.6.28.4_2.vmlinux rcu_irq_exit 2022 0.4944 1732 0.5056 kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc 2020 0.4939 4325 1.2626 libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8 1965 0.4804 1384 0.4040 git patch_delta 1708 0.4176 984 0.2873 kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period 1682 0.4112 727 0.2122 kernel-2.6.28.4_2.vmlinux sysfs_slab_alias 1659 0.4056 290 0.0847 git find_pack_entry_one 1480 0.3619 1307 0.3816 kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles per instruction, and compared to the total cycles spent inside the source code of GIT for this command, all the memmove() calls translates to (7710 * 100) / 14775 = 52.2% of this. Retesting with a GIT program compiled for gcov usage, I found out that the memmove() calls came from remove_index_entry_at() in read-cache.c, where we have: memmove(istate->cache + pos, istate->cache + pos + 1, (istate->cache_nr - pos) * sizeof(struct cache_entry *)); remove_index_entry_at() is called 4902 times from check_updates() in unpack-trees.c, and each time called we move each cache_entry pointers (from the removed one) one step to the left. Since we have 28828 entries in the cache this time, and if we on average move half of them each time, we in total move approximately 4902 * 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if each pointer is 8 bytes (64 bit). OK, is seems that the function check_updates() is called 28 times, so the estimated guess above had been more correct if check_updates() had been called only once, but the point is: we get lots of bytes moved. To fix this, and use an O(N) algorithm instead, where N is the number of cache_entries, we delete/remove all entries in one loop through all entries. From a retest, the new remove_marked_cache_entries() from the patch below, ended up with the following output line from oprofile: 46 0.0105 15 0.0041 git remove_marked_cache_entries If we can trust the numbers from oprofile in this case, we saved approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077 seconds CPU time with this fix for this particular test. And notice that now the CPU did only 46 / 15 = 3.1 cycles/instruction. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10unlink_entry(): introduce schedule_dir_for_removal()Kjetil Barvik
Currently inside unlink_entry() if we get a successful removal of one file with unlink(), we try to remove the leading directories each and every time. So if one directory containing 200 files is moved to an other location we get 199 failed calls to rmdir() and 1 successful call. To fix this and avoid some unnecessary calls to rmdir(), we schedule each directory for removal and wait much longer before we do the real call to rmdir(). Since the unlink_entry() function is called with alphabetically sorted names, this new function end up being very effective to avoid unnecessary calls to rmdir(). In some cases over 95% of all calls to rmdir() is removed with this patch. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10lstat_cache(): swap func(length, string) into func(string, length)Kjetil Barvik
Swap function argument pair (length, string) into (string, length) to conform with the commonly used order inside the GIT source code. Also, add a note about this fact into the coding guidelines. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-01Merge branch 'maint'Junio C Hamano
* maint: merge: fix out-of-bounds memory access
2009-02-01Merge branch 'maint-1.6.0' into maintJunio C Hamano
* maint-1.6.0: merge: fix out-of-bounds memory access
2009-01-31merge: fix out-of-bounds memory accessRené Scharfe
The parameter n of unpack_callback() can have a value of up to MAX_UNPACK_TREES. The check at the top of unpack_trees() (its only (indirect) caller) makes sure it cannot exceed this limit. unpack_callback() passes it and the array src to unpack_nondirectories(), which has this loop: for (i = 0; i < n; i++) { /* ... */ src[i + o->merge] = o->df_conflict_entry; o->merge can be 0 or 1, so unpack_nondirectories() potentially accesses the array src at index MAX_UNPACK_TREES. This patch makes it big enough. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-26Merge branch 'kb/lstat-cache'Junio C Hamano
* kb/lstat-cache: lstat_cache(): introduce clear_lstat_cache() function lstat_cache(): introduce invalidate_lstat_cache() function lstat_cache(): introduce has_dirs_only_path() function lstat_cache(): introduce has_symlink_or_noent_leading_path() function lstat_cache(): more cache effective symlink/directory detection
2009-01-24Merge branch 'cb/maint-unpack-trees-absense' into maintJunio C Hamano
* cb/maint-unpack-trees-absense: unpack-trees: remove redundant path search in verify_absent unpack-trees: fix path search bug in verify_absent unpack-trees: handle failure in verify_absent
2009-01-18lstat_cache(): introduce has_symlink_or_noent_leading_path() functionKjetil Barvik
In some cases, especially inside the unpack-trees.c file, and inside the verify_absent() function, we can avoid some unnecessary calls to lstat(), if the lstat_cache() function can also be told to keep track of non-existing directories. So we update the lstat_cache() function to handle this new fact, introduce a new wrapper function, and the result is that we save lots of lstat() calls for a removed directory which previously contained lots of files, when we call this new wrapper of lstat_cache() instead of the old one. We do similar changes inside the unlink_entry() function, since if we can already say that the leading directory component of a pathname does not exist, it is not necessary to try to remove a pathname below it! Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-14Merge branch 'cb/maint-unpack-trees-absense'Junio C Hamano
* cb/maint-unpack-trees-absense: unpack-trees: remove redundant path search in verify_absent unpack-trees: fix path search bug in verify_absent unpack-trees: handle failure in verify_absent
2009-01-05unpack-trees: remove redundant path search in verify_absentClemens Buchacher
Since the only caller, verify_absent, relies on the fact that o->pos points to the next index entry anyways, there is no need to recompute its position. Furthermore, if a nondirectory entry were found, this would return too early, because there could still be an untracked directory in the way. This is currently not a problem, because verify_absent is only called if the index does not have this entry. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-05unpack-trees: fix path search bug in verify_absentClemens Buchacher
Commit 0cf73755 (unpack-trees.c: assume submodules are clean during check-out) changed an argument to verify_absent from 'path' to 'ce', which is however shadowed by a local variable of the same name. The bug triggers if verify_absent is used on a tree entry, for which the index contains one or more subsequent directories of the same length. The affected subdirectories are removed from the index. The testcase included in this commit bisects to 55218834 (checkout: do not lose staged removal), which reveals the bug in this case, but is otherwise unrelated. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-05unpack-trees: handle failure in verify_absentClemens Buchacher
Commit 203a2fe1 (Allow callers of unpack_trees() to handle failure) changed the "die on error" behavior to "return failure code". verify_absent did not handle errors returned by verify_clean_subdirectory, however. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-11Fix non-literal format in printf-style callsDaniel Lowe
These were found using gcc 4.3.2-1ubuntu11 with the warning: warning: format not a string literal and no format arguments Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02correct cache_entry allocationJeff King
Most cache_entry structs are allocated by using the cache_entry_size macro, which rounds the size of the struct up to the nearest multiple of 8 bytes (presumably to avoid memory fragmentation). There is one exception: the special "conflict entry" is allocated with an empty name, and so is explicitly given just one extra byte to hold the NUL. However, later code doesn't realize that this particular struct has been allocated differently, and happily tries reading and copying it based on the ce_size macro, which assumes the 8-byte alignment. This can lead to reading uninitalized data, though since that data is simply padding, there shouldn't be any problem as a result. Still, it makes sense to hold the padding assumption so as not to surprise later maintainers. This fixes valgrind errors in t1005, t3030, t4002, and t4114. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>