summaryrefslogtreecommitdiff
path: root/tree-walk.c
AgeCommit message (Collapse)Author
2012-11-26pathspec: apply "*.c" optimization from excludeNguyễn Thái Ngọc Duy
When a pattern contains only a single asterisk as wildcard, e.g. "foo*bar", after literally comparing the leading part "foo" with the string, we can compare the tail of the string and make sure it matches "bar", instead of running fnmatch() on "*bar" against the remainder of the string. -O2 build on linux-2.6, without the patch: $ time git rev-list --quiet HEAD -- '*.c' real 0m40.770s user 0m40.290s sys 0m0.256s With the patch $ time ~/w/git/git rev-list --quiet HEAD -- '*.c' real 0m34.288s user 0m33.997s sys 0m0.205s The above command is not supposed to be widely popular. It's chosen because it exercises pathspec matching a lot. The point is it cuts down matching time for popular patterns like *.c, which could be used as pathspec in other places. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-26pathspec: do exact comparison on the leading non-wildcard partNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-19pathspec: save the non-wildcard length partNguyễn Thái Ngọc Duy
We mark pathspec with wildcards with the field use_wildcard. We could do better by saving the length of the non-wildcard part, which can be used for optimizations such as f9f6e2c (exclude: do strcmp as much as possible before fnmatch - 2012-06-07). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-09Merge branch 'nd/tree-walk-enum-cleanup'Jeff King
* nd/tree-walk-enum-cleanup: tree-walk: use enum interesting instead of integer
2012-10-19tree-walk: use enum interesting instead of integerNguyễn Thái Ngọc Duy
Commit d688cf0 (tree_entry_interesting(): give meaningful names to return values - 2011-10-24) converts most of the tree_entry_interesting values to the new enum, except "never_interesting". This completes the conversion. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-15Document limited recursion pathspec matching with wildcardsNguyễn Thái Ngọc Duy
It's actually unlimited recursion if wildcards are active regardless --max-depth Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27tree_entry_interesting(): give meaningful names to return valuesNguyễn Thái Ngọc Duy
It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27tree_entry_interesting: make use of local pointer "item"Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27get_tree_entry(): do not call find_tree_entry() on an empty treeJunio C Hamano
We know we will find nothing. This incidentally squelches false warning from gcc about potentially uninitialized usage of t.entry fields. For an empty tree, it is true that init_tree_desc() does not call decode_tree_entry() and the tree_desc is left uninitialized, but find_tree_entry() only calls tree_entry_extract() that uses the tree_desc while it has more things to read from the tree, so the uninitialized t.entry fields are never used in such a case anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27tree-walk.c: do not leak internal structure in tree_entry_len()Nguyễn Thái Ngọc Duy
tree_entry_len() does not simply take two random arguments and return a tree length. The two pointers must point to a tree item structure, or struct name_entry. Passing random pointers will return incorrect value. Force callers to pass struct name_entry instead of two pointers (with hope that they don't manually construct struct name_entry themselves) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-10Merge branch 'dm/tree-walk'Junio C Hamano
* dm/tree-walk: tree-walk: micro-optimization in tree_entry_interesting tree-walk: drop unused parameter from match_dir_prefix
2011-10-09Fix some "variable might be used uninitialized" warningsRamsay Jones
In particular, gcc complains as follows: CC tree-walk.o tree-walk.c: In function `traverse_trees': tree-walk.c:347: warning: 'e' might be used uninitialized in this \ function CC builtin/revert.o builtin/revert.c: In function `verify_opt_mutually_compatible': builtin/revert.c:113: warning: 'opt2' might be used uninitialized in \ this function Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-29tree-walk: micro-optimization in tree_entry_interestingDan McGee
In the case of a wide breadth top-level tree (~2400 entries, all trees in this case), we can see a noticeable cost in the profiler calling strncmp() here. Most of the time we are at the base level of the repository, so base is "" and baselen == 0, which means we will always test true. Break out this one tiny case so we can short circuit the strncmp() call. Test cases are as follows. packages.git is the Arch Linux git-svn clone of the packages repository which has the characteristics above. Commands: [1] packages.git, /usr/bin/time git log >/dev/null [2] packages.git, /usr/bin/time git log -- autogen/trunk pacman/trunk wget/trunk >/dev/null [3] linux.git, /usr/bin/time git log >/dev/null [4] linux.git, /usr/bin/time git log -- drivers/ata drivers/uio tools >/dev/null Results: before after %faster [1] 2.56 2.55 0.4% [2] 51.82 48.66 6.5% [3] 5.58 5.61 -0.5% [4] 1.55 1.51 0.2% The takeaway here is this doesn't matter in many operations, but it does for a certain style of repository and operation where it nets a 6.5% measured improvement. The other changes are likely not significant by reasonable statistics methods. Note: the measured improvement when originally submitted was ~11% (43 to 38 secs) for operation [2]. At the time, the repository had 117220 commits; it now has 137537 commits. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-29tree-walk: drop unused parameter from match_dir_prefixDan McGee
Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-29traverse_trees(): allow pruning with pathspecJunio C Hamano
The traverse_trees() machinery is primarily meant for merging two (or more) trees, and because a merge is a full tree operation, it doesn't support any pruning with pathspec. Since d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(), 2008-01-19), however, we use unpack_trees() to traverse_trees() callchain to perform "diff-index", which could waste a lot of work traversing trees outside the user-supplied pathspec, only to discard at the blob comparison level in diff-lib.c::oneway_diff() which is way too late. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-05pathspec: rename per-item field has_wildcard to use_wildcardJunio C Hamano
As the point of the last change is to allow use of strings as literals no matter what characters are in them, "has_wildcard" does not match what we use this field for anymore. It is used to decide if the wildcard matching should be used, so rename it to match the usage better. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03grep: drop pathspec_matches() in favor of tree_entry_interesting()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03tree_entry_interesting(): optimize wildcard matching when base is matchedNguyễn Thái Ngọc Duy
If base is already matched, skip that part when calling fnmatch(). This happens quite often if users start a command from worktree's subdirectory and prefix is usually prepended to all pathspecs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03tree_entry_interesting(): support wildcard matchingNguyễn Thái Ngọc Duy
never_interesting optimization is disabled if there is any wildcard pathspec, even if it only matches exactly on trees. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03tree_entry_interesting(): fix depth limit with overlapping pathspecsNguyễn Thái Ngọc Duy
Suppose we have two pathspecs 'a' and 'a/b' (both are dirs) and depth limit 1. In current code, pathspecs are checked in input order. When 'a/b' is checked against pathspec 'a', it fails depth limit and therefore is excluded, although it should match 'a/b' pathspec. This patch reorders all pathspecs alphabetically, then teaches tree_entry_interesting() to check against the deepest pathspec first, so depth limit of a shallower pathspec won't affect a deeper one. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03tree_entry_interesting(): support depth limitNguyễn Thái Ngọc Duy
This is needed to replace pathspec_matches() in builtin/grep.c. max_depth == -1 means infinite depth. Depth limit is only effective when pathspec.recursive == 1. When pathspec.recursive == 0, the behavior depends on match functions: non-recursive for tree_entry_interesting() and recursive for match_pathspec{,_depth} Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03tree_entry_interesting(): refactor into separate smaller functionsNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03diff-tree: convert base+baselen to writable strbufNguyễn Thái Ngọc Duy
In traversing trees, a full path is splitted into two parts: base directory and entry. They are however quite often concatenated whenever a full path is needed. Current code allocates a new buffer, do two memcpy(), use it, then release. Instead this patch turns "base" to a writable, extendable buffer. When a concatenation is needed, the callee only needs to append "entry" to base, use it, then truncate the entry out again. "base" must remain unchanged before and after entering a function. This avoids quite a bit of malloc() and memcpy(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03Move tree_entry_interesting() to tree-walk.c and export itNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-11unpack_trees: group error messages by typeMatthieu Moy
When an error is encountered, it calls add_rejected_file() which either - directly displays the error message and stops if in plumbing mode (i.e. if show_all_errors is not initialized at 1) - or stores it so that it will be displayed at the end with display_error_msgs(), Storing the files by error type permits to have a list of files for which there is the same error instead of having a serie of almost identical errors. As each bind_overlap error combines a file and an old file, a list cannot be done, therefore, theses errors are not stored but directly displayed. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-15Merge branch 'maint-1.6.6' into maintJunio C Hamano
* maint-1.6.6: fix minor memory leak in get_tree_entry()
2010-02-14fix minor memory leak in get_tree_entry()René Scharfe
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-04traverse_trees(): handle D/F conflict case sanelyJunio C Hamano
traverse_trees() is supposed to call its callback with all the matching entries from the given trees. The current algorithm keeps a pointer to each of the tree being traversed, and feeds the entry with the earliest name to the callback. This breaks down if the trees being traversed looks like this: A B t-1 t t-2 u t/a v When we are currently looking at an entry "t-1" in tree A, and tree B has returned "t", feeding "t" from the B and not feeding anything from A, only because "t-1" sorts later than "t", will miss an entry for a subtree "t" behind the current entry in tree A. This introduces extended_entry_extract() helper function that gives what name is expected from the tree, and implements a mechanism to look-ahead in the tree object using it, to make sure such a case is handled sanely. Traversal in tree A in the above example will first return "t" to match that of B, and then the next request for an entry to A then returns "t-1". This roughly corresponds to what Linus's "prepare for one-entry lookahead" wanted to do, but because this does implement look ahead, t6035 and one more test in t1012 reveal that the approach would not work without adjusting the side that walks the index in unpack_trees() as well. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09Fix tree-walking compare_entry() in the presense of --prefixLinus Torvalds
When we make the "root" tree-walk info entry have a pathname in it, we need to have a ->prev pointer so that compare_entry will actually notice and traverse into the root. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09Make 'traverse_trees()' traverse conflicting DF entries in parallelLinus Torvalds
This makes the traverse_trees() entry comparator routine use the more relaxed form of name comparison that considers files and directories with the same name identical. We pass in a separate mask for just the directory entries, so that the callback routine can decide (if it wants to) to only handle one or the other type, but generally most (all?) users are expected to really want to see the case of a name 'foo' showing up in one tree as a file and in another as a directory at the same time. In particular, moving 'unpack_trees()' over to use this tree traversal mechanism requires this. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09Add return value to 'traverse_tree()' callbackLinus Torvalds
This allows the callback to return an error value, but it can also specify which of the tree entries that it actually used up by returning a positive mask value. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09Make 'traverse_tree()' use linked structure rather than 'const char *base'Linus Torvalds
This makes the calling convention a bit less obvious, but a lot more flexible. Instead of allocating and extending a new 'base' string, we just link the top-most name into a linked list of the 'info' structure when traversing a subdirectory, and we can generate the basename by following the list. Perhaps even more importantly, the linked list of info structures also gives us a place to naturally save off other information than just the directory name. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-07tree-walk: don't parse incorrect entriesMartin Koegler
The current code can access memory outside of the tree buffer in the case of malformed tree entries. This patch prevents this by: * The rest of the buffer must be at least 24 bytes (at least 1 byte mode, 1 blank, at least one byte path name, 1 NUL, 20 bytes sha1). * Check that the last NUL (21 bytes before the end) is present. This ensures that strlen() and get_mode() calls stay within the buffer. * The mode may not be empty. We have only to reject a blank at the begin, as the rest is handled by if (c < '0' || c > '7'). * The blank is ensured by get_mode(). * The path must contain at least one character. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07War on whitespaceJunio C Hamano
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-03-21Switch over tree descriptors to contain a pre-parsed entryLinus Torvalds
This makes the tree descriptor contain a "struct name_entry" as part of it, and it gets filled in so that it always contains a valid entry. On some benchmarks, it improves performance by up to 15%. That makes tree entry "extract" trivial, and means that we only actually need to decode each tree entry just once: we decode the first one when we initialize the tree descriptor, and each subsequent one when doing "update_tree_entry()". In particular, this means that we don't need to do strlen() both at extract time _and_ at update time. Finally, it also allows more sharing of code (entry_extract(), that wanted a "struct name_entry", just got totally trivial, along with the "tree_entry()" function). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21Initialize tree descriptors with a helper function rather than by hand.Linus Torvalds
This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21Remove "pathlen" from "struct name_entry"Linus Torvalds
Since we have the "tree_entry_len()" helper function these days, and don't need to do a full strlen(), there's no point in saving the path length - it's just redundant information. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-18Avoid unnecessary strlen() callsLinus Torvalds
This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-09get_tree_entry: map blank requested entry to tree rootJeff King
This means that git show HEAD: will now return HEAD^{tree}, which is logically consistent with git show HEAD:Documentation Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-05Remove shadowing variable from traverse_trees()René Scharfe
The variable named entry is allocated using malloc() and then forgotten, it being shadowed by an automatic variable of the same name. Fixing the array size at 3 worked so far because the only caller of traverse_trees() needed only as much entries. Simply remove the shadowing varaible and we're able to traverse more than three trees and save stack space at the same time! Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23Convert memcpy(a,b,20) to hashcpy(a,b).Shawn Pearce
This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void*. This is a reasonable tradeoff as most call sites already use unsigned char* and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20Remove all void-pointer arithmetic.Florian Forster
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31tree_entry(): new tree-walking helper functionLinus Torvalds
This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29Don't use "sscanf()" for tree mode scanningLinus Torvalds
Doing an oprofile run on the result of my git rev-list memory leak fixes and tree parsing cleanups, I was surprised by the third-highest entry being samples % image name app name symbol name 179751 2.7163 libc-2.4.so libc-2.4.so _IO_vfscanf@@GLIBC_2.4 where that 2.7% is actually more than 5% of one CPU, because this was run on a dual CPU setup with the other CPU just being idle. That seems to all be from the use of 'sscanf(tree, "%o", &mode)' for the tree buffer parsing. So do the trivial octal parsing by hand, which also gives us where the first space in the string is (and thus where the pathname starts) so we can get rid of the "strchr(tree, ' ')" call too. This brings the "git rev-list --all --objects" time down from 63 seconds to 55 seconds on the historical kernel archive for me, so it's quite noticeable - tree parsing is a lot of what we end up doing when following all the objects. [ I also see a 5% speedup on a full "git fsck-objects" on the current kernel archive, so that sscanf() really does seem to have hurt our performance by a surprising amount ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-19get_tree_entry(): make it available from tree-walkJunio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04Use blob_, commit_, tag_, and tree_type throughout.Peter Eriksen
This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-30tree/diff header cleanup.Junio C Hamano
Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net>