path: root/diffcore-pickaxe.c
Age | Commit message | Author
2010-10-06 | diffcore-pickaxe.c: a void function shouldn't try to return something | Brandon Casey
Signed-off-by: Brandon Casey <> Signed-off-by: Junio C Hamano <>
2010-10-06 | Merge branch 'maint' | Junio C Hamano
* maint:
  Documentation/git-clone: describe --mirror more verbosely
  do not depend on signed integer overflow
  work around buggy S_ISxxx(m) implementations
  xdiff: cast arguments for ctype functions to unsigned char
  init: plug tiny one-time memory leak
  diffcore-pickaxe.c: remove unnecessary curly braces
  t3020 (ls-files-error-unmatch): remove stray '1' from end of file
  setup: make sure git dir path is in a permanent buffer
  environment.c: remove unused variable
  git-svn: fix processing of decorated commit hashes
  git-svn: check_cherry_pick should exclude commits already in our history
  Documentation/git-svn: discourage "noMetadata"
2010-10-05 | diffcore-pickaxe.c: remove unnecessary curly braces | Brandon Casey
Signed-off-by: Brandon Casey <> Signed-off-by: Junio C Hamano <>
2010-08-31 | git log/diff: add -G<regexp> that greps in the patch text | Junio C Hamano
Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp.

Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text.

Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature.

The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it.

Signed-off-by: Junio C Hamano <>
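The difference from -S described above can be sketched in C. This is only an illustration under stated assumptions, not the code in diffcore-pickaxe.c: the helper name hunk_has_hit() is hypothetical, and it assumes the input is the body of a unified-diff hunk (context lines start with a space, added lines with '+', deleted lines with '-', and no "+++"/"---" file headers are present).

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): return 1 if any added ('+') or
 * deleted ('-') line in `hunk` matches the POSIX extended regex `pattern`,
 * 0 if none does, and -1 on a regex or allocation error.
 * Assumes `hunk` holds only hunk body lines, without file headers.
 */
static int hunk_has_hit(const char *hunk, const char *pattern)
{
	regex_t re;
	int hit = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	while (*hunk && !hit) {
		size_t len = strcspn(hunk, "\n"); /* length of the current line */

		if (*hunk == '+' || *hunk == '-') {
			/* copy the line without its marker so regexec() sees a NUL-terminated string */
			char *line = malloc(len);

			if (!line) {
				regfree(&re);
				return -1;
			}
			memcpy(line, hunk + 1, len - 1);
			line[len - 1] = '\0';
			hit = !regexec(&re, line, 0, NULL, 0);
			free(line);
		}
		hunk += len;
		if (*hunk == '\n')
			hunk++;
	}
	regfree(&re);
	return hit;
}
```

With the hunk " int a;\n-int foo(void);\n+int foo(int x);\n" and the pattern "foo", this sketch reports a hit, even though the number of occurrences of "foo" is the same on both sides, which is the case a counting search would not flag.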
2010-08-31 | diff: pass the entire diff-options to diffcore_pickaxe() | Junio C Hamano
That would make it easier to add enhanced features to the pickaxe transformation. Signed-off-by: Junio C Hamano <>
2010-05-07 | Add a macro DIFF_QUEUE_CLEAR. | Bo Yang
Refactor the diff_queue_struct code; this macro helps to reset the structure. Signed-off-by: Bo Yang <> Signed-off-by: Junio C Hamano <>
2009-03-17 | pickaxe: count regex matches only once | René Scharfe
When --pickaxe-regex is used, forward past the end of matches instead of advancing to the byte after their start. This way matches count only once, even if the regular expression matches their tail -- like in the fixed-string fork of the code. E.g.: /.*/ used to count the number of bytes instead of the number of lines. /aa/ resulted in a count of two in "aaa" instead of one. Also document the fact that regexec() needs a NUL-terminated string as its second argument by adding an assert(). Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
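The counting rule described above can be sketched as follows. This is a minimal illustration, not the code from the commit: count_regex_matches() is a hypothetical name, the pattern is compiled inside the helper for brevity, and the extra one-byte step after an empty match is an assumption added here so the loop always makes progress.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count non-overlapping matches of the POSIX extended
 * regex `pattern` in `data`, which must be NUL-terminated at data[sz].
 * Returns 0 if the pattern does not compile.
 */
static unsigned int count_regex_matches(const char *data, size_t sz,
					const char *pattern)
{
	regex_t re;
	regmatch_t m;
	unsigned int cnt = 0;
	int flags = 0;

	/* regexec() needs a NUL-terminated string as its second argument */
	assert(data[sz] == '\0');

	if (regcomp(&re, pattern, REG_EXTENDED))
		return 0;
	while (*data && !regexec(&re, data, 1, &m, flags)) {
		flags |= REG_NOTBOL;	/* later searches do not start at the beginning */
		data += m.rm_eo;	/* skip past the END of the match, not start + 1 */
		if (*data && m.rm_so == m.rm_eo)
			data++;		/* assumption: step over an empty match */
		cnt++;
	}
	regfree(&re);
	return cnt;
}
```

Calling count_regex_matches("aaa", 3, "aa") gives 1 here, because the search resumes after the first two bytes and the remaining "a" cannot match.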
2009-03-03 | diffcore-pickaxe: use memmem() | René Scharfe
Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster.

The following commands were run in a Linux kernel repository and timed, the best of five results is shown:

    $ STRING='Ensure that the real time constraints are schedulable.'
    $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null

On Ubuntu 8.10 x64, before (v1.6.2-rc2):

    8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+30952minor)pagefaults 0swaps

And with the patch:

    1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+30645minor)pagefaults 0swaps

On Fedora 10 x64, before:

    8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+29268minor)pagefaults 0swaps

And with the patch:

    1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+32253minor)pagefaults 0swaps

On Windows Vista x64, before:

    real 0m9.204s
    user 0m0.000s
    sys 0m0.000s

And with the patch:

    real 0m8.470s
    user 0m0.000s
    sys 0m0.000s

Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2007-06-07 | War on whitespace | Junio C Hamano
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept into our source files over time. There are a few files that need to have trailing whitespace (most notably, test vectors). The result still passes the tests, and the build result in the Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <>
2007-05-07 | diff -S: release the image after looking for needle in it | Junio C Hamano
Signed-off-by: Junio C Hamano <>
2007-01-26 | diffcore-pickaxe: fix infinite loop on zero-length needle | Jeff King
The "contains" algorithm runs into an infinite loop if the needle string has zero length. The loop could be modified to handle this, but it makes more sense to simply have an empty needle return no matches. Thus, a command like git log -S produces no output. We place the check at the top of the function so that we get the same results with or without --pickaxe-regex. Note that until now, git log -S --pickaxe-regex would match everything, not nothing. Arguably, an empty pickaxe string should simply produce an error message; however, this is still a useful assertion to add to the algorithm at this layer of the code. Noticed by Bill Lear. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
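The guard described above can be illustrated with a small C sketch. It is an assumption-laden stand-in rather than the function from the commit: count_needle() is a hypothetical name, and it uses strstr() on NUL-terminated strings instead of the length-based search in diffcore-pickaxe.c.

```c
#include <string.h>

/*
 * Hypothetical helper: count non-overlapping occurrences of `needle` in
 * `haystack` (both NUL-terminated). An empty needle yields no matches.
 */
static unsigned int count_needle(const char *haystack, const char *needle)
{
	size_t len = strlen(needle);
	unsigned int cnt = 0;

	/*
	 * Check first: with len == 0, strstr() would keep returning the
	 * current position and `haystack += len` would never advance.
	 */
	if (!len)
		return 0;

	while ((haystack = strstr(haystack, needle)) != NULL) {
		haystack += len;
		cnt++;
	}
	return cnt;
}
```

Putting the check before any search logic is what lets a fixed-string path and a regex path agree on the result for an empty needle.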
2006-12-20 | simplify inclusion of system header files. | Junio C Hamano
This is a mechanical clean-up of the way *.c files include system header files.

(1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules;

(2) the first #include must be "git-compat-util.h" or one of our own header files that includes it first (e.g. config.h, builtin.h, pkt-line.h);

(3) system headers that are included in "git-compat-util.h" need not be included in individual C source files.

(4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <>
2006-04-05 | On some platforms, certain headers need to be included before regex.h | Johannes Schindelin
Happily, these are already included in cache.h, which is included anyway... so: change the order of includes. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2006-04-04 | Support for pickaxe matching regular expressions | Petr Baudis
git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <>
2005-07-24 | [PATCH] diffcore-pickaxe: switch to "counting" behaviour. | Junio C Hamano
Instead of finding an old/new pair in which one side has the specified string and the other side does not, find an old/new pair that contains the specified string as a substring a different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
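A minimal C sketch of the "counting" behaviour follows. The names count_occurrences() and pickaxe_hit() are hypothetical, and the plain memcmp() loop is only a simple stand-in chosen for clarity, not a claim about how the file implements the search.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of needle[0..len) in data[0..sz). */
static unsigned int count_occurrences(const char *data, size_t sz,
				      const char *needle, size_t len)
{
	unsigned int cnt = 0;

	if (!len)
		return 0;	/* an empty needle never matches */
	while (sz >= len) {
		if (!memcmp(data, needle, len)) {
			data += len;	/* skip the whole match */
			sz -= len;
			cnt++;
		} else {
			data++;
			sz--;
		}
	}
	return cnt;
}

/*
 * Hypothetical helper: a filepair is interesting when the needle occurs a
 * different number of times in the old and new contents.
 */
static int pickaxe_hit(const char *old_text, const char *new_text,
		       const char *needle)
{
	size_t len = strlen(needle);

	return count_occurrences(old_text, strlen(old_text), needle, len) !=
	       count_occurrences(new_text, strlen(new_text), needle, len);
}
```

Removing two lines that contain "static" and adding two others that also contain it leaves both counts at 2, so pickaxe_hit() returns 0 for that pair; this is the limitation the message above points out.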
2005-05-29 | [PATCH] Do not include unused header files. | Junio C Hamano
Some source files were including "delta.h" without actually needing it. Remove them. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-29 | [PATCH] Optimize diff-tree -[CM] --stdin | Junio C Hamano
This attempts to optimize "diff-tree -[CM] --stdin", which compares successive tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. The similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-29 | [PATCH] Add --pickaxe-all to diff-* brothers. | Junio C Hamano
When --pickaxe-all is given in addition to -S, pickaxe shows the entire diffs contained in the changeset, not just the diffs for the filepair that touched the sought-after string. This is useful to see the changes in context. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-29 | [PATCH] Introduce diff_free_filepair() function. | Junio C Hamano
This introduces a new function to free a common data structure, and plugs some leaks. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-23 | [PATCH] Performance fix for pickaxe. | Junio C Hamano
The pickaxe was expanding the blobs and searching in them even when it should have already known that both sides are the same. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-23 | [PATCH] Rename/copy detection fix. | Junio C Hamano
The rename/copy detection logic in the earlier round was only good enough to show patch output, and discussion on the mailing list about the diff-raw format updates revealed many problems with it. This patch fixes all the ones known to me, without making things I want to do later impossible, mostly related to patch reordering.

(1) Earlier rename/copy detector determined which one is rename and which one is copy too early, which made it impossible to later introduce diffcore transformers to reorder patches. This patch fixes it by moving that logic to the very end of the processing.

(2) Earlier output routine diff_flush() was pruning all the "no-change" entries indiscriminately. This was done due to my false assumption that one of the requirements in the diff-raw output was not to show such an entry (which resulted in my incorrect comment about "diff-helper never being able to be equivalent to built-in diff driver"). My special thanks go to Linus for correcting me about this. When we produce diff-raw output, for the downstream to be able to tell renames from copies, sometimes it _is_ necessary to output "no-change" entries, and this patch adds diffcore_prune() function for doing it.

(3) Earlier diff_filepair structure was trying to be not too specific about rename/copy operations, but the purpose of the structure was to record one or two paths, which _was_ indeed about rename/copy. This patch discards xfrm_msg field which was trying to be generic for this wrong reason, and introduces a couple of fields (rename_score and rename_rank) that are explicitly specific to rename/copy logic. One thing to note is that the information in a single diff_filepair structure _still_ does not distinguish renames from copies, and it is deliberately so. This is to allow patches to be reordered in later stages.

(4) This patch also adds some tests about diff-raw format output and makes sure that necessary "no-change" entries appear on the output.
Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-22 | [PATCH] Diffcore updates. | Junio C Hamano
This moves the path selection logic from individual programs to a new diffcore transformer (diff-tree still needs to have its own for performance reasons). Also the header printing code in diff-tree was tweaked not to produce anything when pickaxe is in effect and there is nothing interesting to report. An interesting example is the following in the GIT archive itself: $ git-whatchanged -p -C -S'or something in a real script' Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-22 | [PATCH] The diff-raw format updates. | Junio C Hamano
Update the diff-raw format as Linus and I discussed, except that it does not use a sequence of underscore '_' letters to express nonexistence. All '0' mode is used for that purpose instead. The new diff-raw format can express rename/copy, and the earlier restriction that -M and -C _must_ be used with the patch format output is no longer necessary. The patch makes -M and -C flags independent of -p flag, so you need to say git-whatchanged -M -p to get the diff/patch format. Both the documentation and the tests are updated. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-22 | [PATCH] Prepare diffcore interface for diff-tree header suppression. | Junio C Hamano
This does not actually suppress the extra headers when pickaxe is used, but prepares enough support for diff-tree to implement it. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-05-21 | [PATCH] Introducing software archaeologist's tool "pickaxe". | Junio C Hamano
This steals the "pickaxe" feature from JIT and makes it available to the bare Plumbing layer. From the command line, the user gives a string he is interested in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In a real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>