path: root/read-cache.c
AgeCommit message (Collapse)Author
2011-06-30Merge branch 'ef/maint-win-verify-path'Junio C Hamano
* ef/maint-win-verify-path: verify_dotfile(): do not assume '/' is the path seperator verify_path(): simplify check at the directory boundary verify_path: consider dos drive prefix real_path: do not assume '/' is the path seperator A Windows path starting with a backslash is absolute
2011-06-08verify_dotfile(): do not assume '/' is the path seperatorTheo Niessink
verify_dotfile() currently assumes that the path seperator is '/', but on Windows it can also be '\\', so use is_dir_sep() instead. Signed-off-by: Theo Niessink <> Signed-off-by: Junio C Hamano <>
2011-06-07verify_path(): simplify check at the directory boundaryJunio C Hamano
We simply want to say "At a directory boundary, be careful with a name that begins with a dot, forbid a name that ends with the boundary character or has duplicated bounadry characters". Signed-off-by: Junio C Hamano <>
2011-05-27verify_path: consider dos drive prefixErik Faye-Lund
If someone manage to create a repo with a 'C:' entry in the root-tree, files can be written outside of the working-dir. This opens up a can-of-worms of exploits. Fix it by explicitly checking for a dos drive prefix when verifying a paht. While we're at it, make sure that paths beginning with '\' is considered absolute as well. Noticed-by: Theo Niessink <> Signed-off-by: Erik Faye-Lund <> Signed-off-by: Junio C Hamano <>
2011-05-09index_fd(): turn write_object and format_check arguments into one flagJunio C Hamano
The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <>
2011-04-03Merge branch 'jc/index-update-if-able' into maintJunio C Hamano
* jc/index-update-if-able: update $GIT_INDEX_FILE when there are racily clean entries diff/status: refactor opportunistic index update
2011-03-27Merge branch 'jc/index-update-if-able'Junio C Hamano
* jc/index-update-if-able: update $GIT_INDEX_FILE when there are racily clean entries diff/status: refactor opportunistic index update
2011-03-21update $GIT_INDEX_FILE when there are racily clean entriesJunio C Hamano
Traditional "opportunistic index update" done by read-only "diff" and "status" was about updating cached lstat(2) information in the index for the next round. We missed another obvious optimization opportunity: when there are racily clean entries that will cease to be racily clean by updating $GIT_INDEX_FILE. Detect that case and write $GIT_INDEX_FILE out to give it a newer timestamp. Noticed by Lasse Makholm by stracing "git status" in a fresh checkout and counting the number of open(2) calls. Signed-off-by: Junio C Hamano <>
2011-03-21diff/status: refactor opportunistic index updateJunio C Hamano
When we had to refresh the index internally before running diff or status, we opportunistically updated the $GIT_INDEX_FILE so that later invocation of git can use the lstat(2) we already did in this invocation. Make them share a helper function to do so. Signed-off-by: Junio C Hamano <>
2011-02-28Merge branch 'nd/hash-object-sanity'Junio C Hamano
* nd/hash-object-sanity: Make hash-object more robust against malformed objects Conflicts: cache.h
2011-02-28Merge branch 'nd/struct-pathspec'Junio C Hamano
* nd/struct-pathspec: (22 commits) t6004: add pathspec globbing test for log family t7810: overlapping pathspecs and depth limit grep: drop pathspec_matches() in favor of tree_entry_interesting() grep: use writable strbuf from caller for grep_tree() grep: use match_pathspec_depth() for cache/worktree grepping grep: convert to use struct pathspec Convert ce_path_match() to use match_pathspec_depth() Convert ce_path_match() to use struct pathspec struct rev_info: convert prune_data to struct pathspec pathspec: add match_pathspec_depth() tree_entry_interesting(): optimize wildcard matching when base is matched tree_entry_interesting(): support wildcard matching tree_entry_interesting(): fix depth limit with overlapping pathspecs tree_entry_interesting(): support depth limit tree_entry_interesting(): refactor into separate smaller functions diff-tree: convert base+baselen to writable strbuf glossary: define pathspec Move tree_entry_interesting() to tree-walk.c and export it tree_entry_interesting(): remove dependency on struct diff_options Convert struct diff_options to use struct pathspec ...
2011-02-23update-index --refresh --porcelain: add missing constJonathan Nieder
Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2011-02-07Make hash-object more robust against malformed objectsNguyễn Thái Ngọc Duy
Commits, trees and tags have structure. Don't let users feed git with malformed ones. Sooner or later git will die() when encountering them. Note that this patch does not check semantics. A tree that points to non-existent objects is perfectly OK (and should be so, users may choose to add commit first, then its associated tree for example). Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2011-02-03Convert ce_path_match() to use match_pathspec_depth()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2011-02-03Convert ce_path_match() to use struct pathspecNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2010-12-04Merge branch 'jj/icase-directory'Junio C Hamano
* jj/icase-directory: Support case folding in git fast-import when core.ignorecase=true Support case folding for git add when core.ignorecase=true Add case insensitivity support when using git ls-files Add case insensitivity support for directories when using git status Case insensitivity support for .gitignore via core.ignorecase Add string comparison functions that respect the ignore_case variable. Makefile & configure: add a NO_FNMATCH_CASEFOLD flag Makefile & configure: add a NO_FNMATCH flag Conflicts: Makefile fast-import.c
2010-10-06Support case folding for git add when core.ignorecase=trueJoshua Jensen
When MyDir/ABC/filea.txt is added to Git, the disk directory MyDir/ABC/ is renamed to mydir/aBc/, and then mydir/aBc/fileb.txt is added, the index will contain MyDir/ABC/filea.txt and mydir/aBc/fileb.txt. Although the earlier portions of this patch series account for those differences in case, this patch makes the pathing consistent by folding the case of newly added files against the first file added with that path. In read-cache.c's add_to_index(), the index_name_exists() support used for git status's case insensitive directory lookups is used to find the proper directory case according to what the user already checked in. That is, MyDir/ABC/'s case is used to alter the stored path for fileb.txt to MyDir/ABC/fileb.txt (instead of mydir/aBc/fileb.txt). This is especially important when cloning a repository to a case sensitive file system. MyDir/ABC/ and mydir/aBc/ exist in the same directory on a Windows machine, but on Linux, the files exist in two separate directories. The update to add_to_index(), in effect, treats a Windows file system as case sensitive by making path case consistent. Signed-off-by: Joshua Jensen <> Signed-off-by: Johannes Sixt <> Signed-off-by: Junio C Hamano <>
2010-08-11core: Stop leaking ondisk_cache_entrysJonathan Nieder
Noticed with valgrind. Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2010-02-02Correct spelling of 'REUC' extensionShawn O. Pearce
The new dircache extension CACHE_EXT_RESOLVE_UNDO, whose value is 0x52455543, is actually the ASCII sequence 'REUC', not the ASCII sequence 'REUN'. Signed-off-by: Shawn O. Pearce <> Signed-off-by: Junio C Hamano <>
2010-01-24Make ce_uptodate() trustworthy againJunio C Hamano
The rule has always been that a cache entry that is ce_uptodate(ce) means that we already have checked the work tree entity and we know there is no change in the work tree compared to the index, and nobody should have to double check. Note that false ce_uptodate(ce) does not mean it is known to be dirty---it only means we don't know if it is clean. There are a few codepaths (refresh-index and preload-index are among them) that mark a cache entry as up-to-date based solely on the return value from ie_match_stat(); this function uses lstat() to see if the work tree entity has been touched, and for a submodule entry, if its HEAD points at the same commit as the commit recorded in the index of the superproject (a submodule that is not even cloned is considered clean). A submodule is no longer considered unmodified merely because its HEAD matches the index of the superproject these days, in order to prevent people from forgetting to commit in the submodule and updating the superproject index with the new submodule commit, before commiting the state in the superproject. However, the patch to do so didn't update the codepath that marks cache entries up-to-date based on the updated definition and instead worked it around by saying "we don't trust the return value of ce_uptodate() for submodules." This makes ce_uptodate() trustworthy again by not marking submodule entries up-to-date. The next step _could_ be to introduce a few "in-core" flag bits to cache_entry structure to record "this entry is _known_ to be dirty", call is_submodule_modified() from ie_match_stat(), and use these new bits to avoid running this rather expensive check more than once, but that can be a separate patch. Signed-off-by: Junio C Hamano <>
2010-01-22Remove diff machinery dependency from read-cacheLinus Torvalds
Exal Sibeaz pointed out that some git files are way too big, and that add_files_to_cache() brings in all the diff machinery to any git binary that needs the basic git SHA1 object operations from read-cache.c. Which is pretty much all of them. It's doubly silly, since add_files_to_cache() is only used by builtin programs (add, checkout and commit), so it's fairly easily fixed by just moving the thing to builtin-add.c, and avoiding the dependency entirely. I initially argued to Exal that it would probably be best to try to depend on smart compilers and linkers, but after spending some time trying to make -ffunction-sections work and giving up, I think Exal was right, and the fix is to just do some trivial cleanups like this. This trivial cleanup results in pretty stunning file size differences. The diff machinery really is mostly used by just the builtin programs, and you have things like these trivial before-and-after numbers: -rwxr-xr-x 1 torvalds torvalds 1727420 2010-01-21 10:53 git-hash-object -rwxrwxr-x 1 torvalds torvalds 940265 2010-01-21 11:16 git-hash-object Now, I'm not saying that 940kB is good either, but that's mostly all the debug information - you can see the real code with 'size': text data bss dec hex filename 418675 3920 127408 550003 86473 git-hash-object (before) 230650 2288 111728 344666 5425a git-hash-object (after) ie we have a nice 24% size reduction from this trivial cleanup. It's not just that one file either. I get: [torvalds@nehalem git]$ du -s /home/torvalds/libexec/git-core 45640 /home/torvalds/libexec/git-core (before) 33508 /home/torvalds/libexec/git-core (after) so we're talking 12MB of diskspace here. (Of course, stripping all the binaries brings the 33MB down to 9MB, so the whole debug information thing is still the bulk of it all, but that's a separate issue entirely) Now, I'm sure there are other things we should do, and changing our compiler flags from -O2 to -Os would bring the text size down by an additional almost 20%, but this thing Exal pointed out seems to be some good low-hanging fruit. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2010-01-20Merge branch 'jc/cache-unmerge'Junio C Hamano
* jc/cache-unmerge: rerere forget path: forget recorded resolution rerere: refactor rerere logic to make it independent from I/O rerere: remove silly 1024-byte line limit resolve-undo: teach "update-index --unresolve" to use resolve-undo info resolve-undo: "checkout -m path" uses resolve-undo information resolve-undo: allow plumbing to clear the information resolve-undo: basic tests resolve-undo: record resolved conflicts in a new index extension section builtin-merge.c: use standard active_cache macros Conflicts: builtin-ls-files.c builtin-merge.c builtin-rerere.c
2010-01-20Merge branch 'jc/symbol-static'Junio C Hamano
* jc/symbol-static: date.c: mark file-local function static Replace parse_blob() with an explanatory comment symlinks.c: remove unused functions object.c: remove unused functions strbuf.c: remove unused function sha1_file.c: remove unused function mailmap.c: remove unused function utf8.c: mark file-local function static submodule.c: mark file-local function static quote.c: mark file-local function static remote-curl.c: mark file-local function static read-cache.c: mark file-local functions static parse-options.c: mark file-local function static entry.c: mark file-local function static http.c: mark file-local functions static pretty.c: mark file-local function static builtin-rev-list.c: mark file-local function static bisect.c: mark file-local function static
2010-01-13Merge branch 'cc/reset-more'Junio C Hamano
* cc/reset-more: t7111: check that reset options work as described in the tables Documentation: reset: add some missing tables Fix bit assignment for CE_CONFLICTED "reset --merge": fix unmerged case reset: use "unpack_trees()" directly instead of "git read-tree" reset: add a few tests for "git reset --merge" Documentation: reset: add some tables to describe the different options reset: improve mixed reset error message when in a bare repo
2010-01-13Merge branch 'nd/sparse'Junio C Hamano
* nd/sparse: (25 commits) t7002: test for not using external grep on skip-worktree paths t7002: set test prerequisite "external-grep" if supported grep: do not do external grep on skip-worktree entries commit: correctly respect skip-worktree bit ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID tests: rename duplicate t1009 sparse checkout: inhibit empty worktree Add tests for sparse checkout read-tree: add --no-sparse-checkout to disable sparse checkout support unpack-trees(): ignore worktree check outside checkout area unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout unpack-trees.c: generalize verify_* functions unpack-trees(): add CE_WT_REMOVE to remove on worktree alone Introduce "sparse checkout" dir.c: export excluded_1() and add_excludes_from_file_1() excluded_1(): support exclude files in index unpack-trees(): carry skip-worktree bit over in merged_entry() Read .gitignore from index if it is skip-worktree Avoid writing to buffer in add_excludes_from_file_1() ... Conflicts: .gitignore Documentation/config.txt Documentation/git-update-index.txt Makefile entry.c t/
2010-01-12read-cache.c: mark file-local functions staticJunio C Hamano
Signed-off-by: Junio C Hamano <>
2010-01-04"reset --merge": fix unmerged caseJunio C Hamano
Commit 9e8ecea (Add 'merge' mode to 'git reset', 2008-12-01) disallowed "git reset --merge" when there was unmerged entries. But it wished if unmerged entries were reset as if --hard (instead of --merge) has been used. This makes sense because all "mergy" operations makes sure that any path involved in the merge does not have local modifications before starting, so resetting such a path away won't lose any information. The previous commit changed the behavior of --merge to accept resetting unmerged entries if they are reset to a different state than HEAD, but it did not reset the changes in the work tree, leaving the conflict markers in the resulting file in the work tree. Fix it by doing three things: - Update the documentation to match the wish of original "reset --merge" better, namely, "An unmerged entry is a sign that the path didn't have any local modification and can be safely resetted to whatever the new HEAD records"; - Update read_index_unmerged(), which reads the index file into the cache while dropping any higher-stage entries down to stage #0, not to copy the object name from the higher stage entry. The code used to take the object name from the a stage entry ("base" if you happened to have stage #1, or "ours" if both sides added, etc.), which essentially meant that you are getting random results depending on what the merge did. The _only_ reason we want to keep a previously unmerged entry in the index at stage #0 is so that we don't forget the fact that we have corresponding file in the work tree in order to be able to remove it when the tree we are resetting to does not have the path. In order to differentiate such an entry from ordinary cache entry, the cache entry added by read_index_unmerged() is marked as CE_CONFLICTED. - Update merged_entry() and deleted_entry() so that they pay attention to cache entries marked as CE_CONFLICTED. They are previously unmerged entries, and the files in the work tree that correspond to them are resetted away by oneway_merge() to the version from the tree we are resetting to. Signed-off-by: Junio C Hamano <>
2009-12-27Merge branch 'nf/maint-fix-index-ext-len-on-be64' into maintJunio C Hamano
* nf/maint-fix-index-ext-len-on-be64: read_index(): fix reading extension size on BE 64-bit archs
2009-12-27read_index(): fix reading extension size on BE 64-bit archsNathaniel W Filardo
On big endian platforms with 8-byte unsigned long, the code reads the size of the index extension section (which is a 4-byte network byte order integer) incorrectly. Signed-off-by: Junio C Hamano <>
2009-12-26resolve-undo: record resolved conflicts in a new index extension sectionJunio C Hamano
When resolving a conflict using "git add" to create a stage #0 entry, or "git rm" to remove entries at higher stages, remove_index_entry_at() function is eventually called to remove unmerged (i.e. higher stage) entries from the index. Introduce a "resolve_undo_info" structure and keep track of the removed cache entries, and save it in a new index extension section in the index_state. Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and "reset" are signs that recorded information in the index is no longer necessary. The data is removed from the index extension when operations start; they may leave conflicted entries in the index, and later user actions like "git add" will record their conflicted states afresh. Signed-off-by: Junio C Hamano <>
2009-12-14ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALIDNguyễn Thái Ngọc Duy
Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2009-08-24Teach Git to respect skip-worktree bit (reading part)Nguyễn Thái Ngọc Duy
grep: turn on --cached for files that is marked skip-worktree ls-files: do not check for deleted file that is marked skip-worktree update-index: ignore update request if it's skip-worktree, while still allows removing diff*: skip worktree version Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2009-08-22reset: make the reminder output consistent with "checkout"Matthieu Moy
git reset without argument displays a summary of the local modification, like this: $ git reset Makefile: locally modified Some people have problems with this; they look like an error message. This patch makes its output mimic how "git checkout $another_branch" reports the paths with local modifications. "git add --refresh --verbose" is changed in the same way. It also adds a header to make it clear that the output is informative, and not an error. Signed-off-by: Matthieu Moy <>
The change in the output is going to become more general than just saying "changed", so let's make the variable name more general too. Signed-off-by: Matthieu Moy <> Signed-off-by: Junio C Hamano <>
2009-06-27Use die_errno() instead of die() when checking syscallsThomas Rast
Lots of die() calls did not actually report the kind of error, which can leave the user confused as to the real problem. Use die_errno() where we check a system/library call that sets errno on failure, or one of the following that wrap such calls: Function Passes on error from -------- -------------------- odb_pack_keep open read_ancestry fopen read_in_full xread strbuf_read xread strbuf_read_file open or strbuf_read_file strbuf_readlink readlink write_in_full xwrite Signed-off-by: Thomas Rast <> Signed-off-by: Junio C Hamano <>
2009-06-27Convert existing die(..., strerror(errno)) to die_errno()Thomas Rast
Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <> Signed-off-by: Junio C Hamano <>
2009-03-15checkout bugfix: use stat.mtime instead of stat.ctime in two placesKjetil Barvik
Commit e1afca4fd "write_index(): update index_state->timestamp after flushing to disk" on 2009-02-23 used stat.ctime to record the timestamp of the index-file. This is wrong, so fix this and use the correct stat.mtime timestamp instead. Commit 110c46a909 "Not all systems use st_[cm]tim field for ns resolution file timestamp" on 2009-03-08, has a similar bug for the builtin-fetch-pack.c file. Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-03-08Not all systems use st_[cm]tim field for ns resolution file timestampJunio C Hamano
Some codepaths do not still use the ST_[CM]TIME_NSEC() pair of macros introduced by the previous commit but assumes all systems use st_mtim and st_ctim fields in "struct stat" to record nanosecond resolution part of the file timestamps. Signed-off-by: Junio C Hamano <>
2009-03-08Record ns-timestamps if possible, but do not use it without USE_NSECKjetil Barvik
Traditionally, the lack of USE_NSEC meant "do not record nor use the nanosecond resolution part of the file timestamps". To avoid problems on filesystems that lose the ns part when the metadata is flushed to the disk and then later read back in, disabling USE_NSEC has been a good idea in general. If you are on a filesystem without such an issue, it does not hurt to read and store them in the cached stat data in the index entries even if your git is compiled without USE_NSEC. The index left with such a version of git can be read by git compiled with USE_NSEC and it can make use of the nanosecond part to optimize the check to see if the path on the filesystem hsa been modified since we last looked at. Signed-off-by: Junio C Hamano <>
2009-02-24write_index(): update index_state->timestamp after flushing to diskKjetil Barvik
Since this timestamp is used to check for racy-clean files, it is important to keep it uptodate. For the 'git checkout' command without the '-q' option, this make a huge difference. Before, each and every file which was updated, was racy-clean after the call to unpack_trees() and write_index() but before the GIT process ended. And because of the call to show_local_changes() in builtin-checkout.c, we ended up reading those files back into memory, doing a SHA1 to check if the files was really different from the index. And, of course, no file was different. With this fix, 'git checkout' without the '-q' option should now be almost as fast as with the '-q' option, but not quite, as we still do some few lstat(2) calls more without the '-q' option. Below is some average numbers for 10 checkout's to v2.6.27 and 10 to v2.6.25 of the Linux kernel, to show the difference: before (git version 1.6.2.rc1.256.g58a87): 7.860 user 2.427 sys 19.465 real 52.8% CPU faults: 0 major 95331 minor after: 6.184 user 2.160 sys 17.619 real 47.4% CPU faults: 0 major 38994 minor Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-20make USE_NSEC work as expectedKjetil Barvik
Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see Signed-off-by: Kjetil Barvik <> Signed-off-by: Junio C Hamano <>
2009-02-19check_updates(): effective removal of cache entries marked CE_REMOVEKjetil Barvik
Below is oprofile output from GIT command 'git chekcout -q my-v2.6.25' (move from tag v2.6.27 to tag v2.6.25 of the Linux kernel): CPU: Core 2, speed 1999.95 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 20000 Counted INST_RETIRED_ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 20000 CPU_CLK_UNHALT...|INST_RETIRED:2...| samples| %| samples| %| ------------------------------------ 409247 100.000 342878 100.000 git CPU_CLK_UNHALT...|INST_RETIRED:2...| samples| %| samples| %| ------------------------------------ 260476 63.6476 257843 75.1996 100876 24.6492 64378 18.7758 kernel- 30850 7.5382 7874 2.2964 14775 3.6103 8390 2.4469 git 2020 0.4936 4325 1.2614 191 0.0467 32 0.0093 58 0.0142 36 0.0105 1 2.4e-04 0 0 Detail list of the top 20 function entries (libz counted in one blob): CPU_CLK_UNHALTED INST_RETIRED_ANY_P samples % samples % image name symbol name 260476 63.6862 257843 75.2725 /lib/ 16587 4.0555 3636 1.0615 memcpy 7710 1.8851 277 0.0809 memmove 3679 0.8995 1108 0.3235 kernel- d_validate 3546 0.8670 2607 0.7611 kernel- __getblk 3174 0.7760 1813 0.5293 _int_malloc 2396 0.5858 3681 1.0746 kernel- copy_to_user 2270 0.5550 2528 0.7380 kernel- __link_path_walk 2205 0.5391 1797 0.5246 kernel- ext4_mark_iloc_dirty 2103 0.5142 1203 0.3512 kernel- find_first_zero_bit 2077 0.5078 997 0.2911 kernel- do_get_write_access 2070 0.5061 514 0.1501 git cache_name_compare 2043 0.4995 1501 0.4382 kernel- rcu_irq_exit 2022 0.4944 1732 0.5056 kernel- __ext4_get_inode_loc 2020 0.4939 4325 1.2626 /usr/lib/ 1965 0.4804 1384 0.4040 git patch_delta 1708 0.4176 984 0.2873 kernel- rcu_sched_grace_period 1682 0.4112 727 0.2122 kernel- sysfs_slab_alias 1659 0.4056 290 0.0847 git find_pack_entry_one 1480 0.3619 1307 0.3816 kernel- ext4_writepage_trans_blocks Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles per instruction, and compared to the total cycles spent inside the source code of GIT for this command, all the memmove() calls translates to (7710 * 100) / 14775 = 52.2% of this. Retesting with a GIT program compiled for gcov usage, I found out that the memmove() calls came from remove_index_entry_at() in read-cache.c, where we have: memmove(istate->cache + pos, istate->cache + pos + 1, (istate->cache_nr - pos) * sizeof(struct cache_entry *)); remove_index_entry_at() is called 4902 times from check_updates() in unpack-trees.c, and each time called we move each cache_entry pointers (from the removed one) one step to the left. Since we have 28828 entries in the cache this time, and if we on average move half of them each time, we in total move approximately 4902 * 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if each pointer is 8 bytes (64 bit). OK, is seems that the function check_updates() is called 28 times, so the estimated guess above had been more correct if check_updates() had been called only once, but the point is: we get lots of bytes moved. To fix this, and use an O(N) algorithm instead, where N is the number of cache_entries, we delete/remove all entries in one loop through all entries. From a retest, the new remove_marked_cache_entries() from the patch below, ended up with the following output line from oprofile: 46 0.0105 15 0.0041 git remove_marked_cache_entries If we can trust the numbers from oprofile in this case, we saved approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077 seconds CPU time with this fix for this particular test. And notice that now the CPU did only 46 / 15 = 3.1 cycles/instruction. Signed-off-by: Kjetil Barvik <> Acked-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2009-01-29add -u: do not fail to resolve a path as deletedJunio C Hamano
After you resolve a conflicted merge to remove the path, "git add -u" failed to record the removal. Instead it errored out by saying that the removed path is not found in the work tree, but that is what the user already knows, and the wanted to record the removal as the resolution, so the error does not make sense. Signed-off-by: Junio C Hamano <>
2008-12-17Make 'ce_compare_link()' use the new 'strbuf_readlink()'Linus Torvalds
This simplifies the code, and also makes ce_compare_link now able to handle filesystems with odd 'st_size' return values for symlinks. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2008-12-08read-cache.c: typofix in commentJunio C Hamano
Signed-off-by: Junio C Hamano <>
2008-12-01git add --intent-to-add: do not let an empty blob be committed by accidentJunio C Hamano
Writing a tree out of an index with an "intent to add" entry is blocked. This implies that you cannot "git commit" from such a state; however you can still do "git commit -a" or "git commit $that_path". Signed-off-by: Junio C Hamano <>
2008-11-29git add --intent-to-add: fix removal of cached emptinessJunio C Hamano
This uses the extended index flag mechanism introduced earlier to mark the entries added to the index via "git add -N" with CE_INTENT_TO_ADD. The logic to detect an "intent to add" entry for the purpose of allowing "git rm --cached $path" is tightened to check not just for a staged empty blob, but with the CE_INTENT_TO_ADD bit. This protects an empty blob that was explicitly added and then modified in the work tree from being dropped with this sequence: $ >empty $ git add empty $ echo "non empty" >empty $ git rm --cached empty Signed-off-by: Junio C Hamano <>
2008-11-29Merge branch 'nd/narrow' (early part) into jc/add-i-t-aJunio C Hamano
* 'nd/narrow' (early part): Extend index to save more flags
2008-11-12Merge branch 'maint'Junio C Hamano
* maint: Start cycle Fix pack.packSizeLimit and --max-pack-size handling checkout: Fix "initial checkout" detection Remove the period after the git-check-attr summary Conflicts: RelNotes
2008-11-12checkout: Fix "initial checkout" detectionJunio C Hamano
Earlier commit 5521883 (checkout: do not lose staged removal, 2008-09-07) tightened the rule to prevent switching branches from losing local changes, so that staged removal of paths can be protected, while attempting to keep a loophole to still allow a special case of switching out of an un-checked-out state. However, the loophole was made a bit too tight, and did not allow switching from one branch (in an un-checked-out state) to check out another branch. The change to builtin-checkout.c in this commit loosens it to allow this, by not insisting the original commit and the new commit to be the same. It also introduces a new function, is_index_unborn (and an associated macro, is_cache_unborn), to check if the repository is truly in an un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE did not exist when populating the in-core index structure. A few places the earlier commit 5521883 added the check for the initial checkout condition are updated to use this function. Signed-off-by: Junio C Hamano <>