path: root/userdiff.c
Age | Commit message | Author
2011-02-10 | Merge branch 'tr/diff-words-test' | Junio C Hamano
* tr/diff-words-test:
  t4034 (diff --word-diff): add a minimum Perl drier test vector
  t4034 (diff --word-diff): style suggestions
  userdiff: simplify word-diff safeguard
  t4034: bulk verify builtin word regex sanity
2011-01-24 | Merge branch 'as/userdiff-pascal' | Junio C Hamano
* as/userdiff-pascal:
  userdiff: match Pascal class methods
2011-01-18 | userdiff: simplify word-diff safeguard | Jonathan Nieder
git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]].

To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale.

Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros.

While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them.

Suggested-by: Thomas Rast <>
Signed-off-by: Jonathan Nieder <>
Signed-off-by: Junio C Hamano <>
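The failure mode described in this message can be sketched in Python. This is an illustration only: the `tokenize` helper and both patterns are hypothetical, Python's `re` stands in for the POSIX regexes git uses, and `\S` plays the role of `[^[:space:]]`. The multibyte token type is not modeled, because Python strings are already sequences of characters rather than bytes.

```python
import re

# Hypothetical word regex that only knows identifiers (no catch-all token).
UNSAFE = r"[A-Za-z_][A-Za-z0-9_]*"
# Same regex plus the safeguard: any single non-whitespace character is a token.
SAFE = UNSAFE + r"|\S"

def tokenize(pattern, text):
    """Return the tokens matched by pattern; unmatched text is skipped like whitespace."""
    return re.findall(pattern, text)

old, new = "total = a + b", "total = a - b"
print(tokenize(UNSAFE, old) == tokenize(UNSAFE, new))  # True: the +/- change is invisible
print(tokenize(SAFE, old) == tokenize(SAFE, new))      # False: the change is detected
```

With the unsafe pattern both lines tokenize to the same three identifiers, so a word diff built on it would report no change at all.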
2011-01-11 | userdiff: match Pascal class methods | Alexey Shumkin
Class declarations were already covered by the second pattern, but class methods have the 'class' keyword in front too. Account for it.

Signed-off-by: Alexey Shumkin <>
Acked-by: Thomas Rast <>
Signed-off-by: Junio C Hamano <>
2010-12-27 | userdiff/perl: catch BEGIN/END/... and POD as headers | Junio C Hamano
Signed-off-by: Junio C Hamano <>
2010-12-27 | diff: funcname and word patterns for perl | Jonathan Nieder
The default function name discovery already works quite well for Perl code... with the exception of here-documents (or rather their ending).

    sub foo {
        print <<END
    here-document
    END
        return 1;
    }

The default funcname pattern treats the unindented END line as a function declaration and puts it in the @@ line of diff and "grep --show-function" output.

With a little knowledge of perl syntax, we can do better. You can try it out by adding "*.perl diff=perl" to the gitattributes file.

Signed-off-by: Jonathan Nieder <>
Signed-off-by: Junio C Hamano <>
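To see why the END line is picked up, here is a minimal Python sketch. It assumes a simplified version of the default heuristic (a line counts as a header candidate if it starts with a letter, `_` or `$`); the `default_funcname` helper is hypothetical and is not git's code.

```python
def default_funcname(lines, index):
    """Simplified stand-in for the default heuristic: walk upward from
    `index` and return the first line that starts with a letter, '_' or '$'."""
    for line in reversed(lines[:index]):
        if line[:1].isalpha() or line[:1] in ("_", "$"):
            return line
    return None

source = [
    "sub foo {",
    "\tprint <<END",
    "here-document",
    "END",
    "\treturn 1;",
    "}",
]
# A change on the "return 1;" line is attributed to the nearest candidate above it.
print(default_funcname(source, 4))  # END
```

A Perl-aware pattern would need to reject the here-document terminator so that the search continues up to `sub foo {`.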
2010-12-19 | userdiff: fix typo in ruby and python word regexes | Thomas Rast
Both had an unclosed ] that ruined the safeguard against not matching a non-space char.

Signed-off-by: Thomas Rast <>
Signed-off-by: Junio C Hamano <>
2010-09-10 | userdiff.c: add builtin fortran regex patterns | Brandon Casey
This adds fortran xfuncname and wordRegex patterns to the list of builtin patterns. The intention is for the patterns to be appropriate for all versions of fortran including 77, 90, 95. The patterns can be enabled by adding the diff=fortran attribute to the .gitattributes file for the desired file glob.

This also adds a new macro named IPATTERN which is just like the PATTERNS macro except it sets the REG_ICASE flag so that case will be ignored.

The test code in t4018 and the docs were updated as appropriate.

Signed-off-by: Brandon Casey <>
Signed-off-by: Junio C Hamano <>
2010-08-17 | Userdiff patterns for C# | Petr Onderka
Add userdiff patterns for C#. This code is an improved version of code by Adam Petaccia from 21 June 2009 mail to the list.

Signed-off-by: Petr Onderka <>
Acked-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2010-06-13 | Merge branch 'bs/userdiff-php' | Junio C Hamano
* bs/userdiff-php:
  diff: Support visibility modifiers in the PHP hunk header regexp
2010-05-27 | diff: Support visibility modifiers in the PHP hunk header regexp | Björn Steinbrink
Starting with PHP5, class methods can have a visibility modifier, which caused the methods not to be matched by the existing regexp, so extend the regexp to match those modifiers. And while we're at it, allow the "static" modifier as well.

Since the "static" modifier can appear either before or after the visibility modifier, let's just allow any number of modifiers to appear in any order, as that simplifies the regexp and shouldn't cause any false positives.

Signed-off-by: Björn Steinbrink <>
Signed-off-by: Junio C Hamano <>
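The "any number of modifiers in any order" idea can be sketched as follows. The pattern below is a hypothetical illustration written for Python's `re`; it is not quoted from the commit, and the exact regexp in userdiff.c may differ.

```python
import re

# Hypothetical pattern in the spirit of the message: zero or more modifiers,
# in any order, followed by the "function" keyword.
PHP_FUNC = re.compile(
    r"^[\t ]*(((public|protected|private|static)[\t ]+)*function.*)$"
)

for line in ("function plain()",
             "    public static function a()",
             "    static private function b()",
             "    $x = 1;"):
    print(bool(PHP_FUNC.match(line)), line.strip())
```

Repeating one alternation group avoids having to enumerate every ordering of visibility and "static", which is the simplification the message refers to.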
2010-04-02 | diff: cache textconv output | Jeff King
Running a textconv filter can take a long time. It's particularly bad for a large file which needs to be spooled to disk, but even for small files, the fork+exec overhead can add up for something like "git log -p".

This patch uses the notes-cache mechanism to keep a fast cache of textconv output. Caches are stored in refs/notes/textconv/$x, where $x is the userdiff driver defined in gitattributes. Caching is enabled only if diff.$x.cachetextconv is true.

In my test repo, on a commit with 45 jpg and avi files changed and a textconv to show their exif tags:

    [before]
    $ time git show >/dev/null
    real 0m13.724s
    user 0m12.057s
    sys 0m1.624s

    [after, first run]
    $ git config diff.mfo.cachetextconv true
    $ time git show >/dev/null
    real 0m14.252s
    user 0m12.197s
    sys 0m1.800s

    [after, subsequent runs]
    $ time git show >/dev/null
    real 0m0.352s
    user 0m0.148s
    sys 0m0.200s

So for a slight (3.8%) cost on the first run, we achieve an almost 40x speed up on subsequent runs.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
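The two figures in the closing sentence follow directly from the "real" timings quoted above. A small Python check (the variable names are just labels chosen here):

```python
# Wall-clock ("real") timings quoted in the commit message, in seconds.
before = 13.724
first_run = 14.252
cached_run = 0.352

overhead = (first_run - before) / before   # extra cost of populating the cache
speedup = before / cached_run              # gain once the cache is warm

print(f"first-run overhead: {overhead:.1%}")  # 3.8%
print(f"cached speedup: {speedup:.1f}x")      # 39.0x
```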
2010-01-17 | git_attr(): fix function signature | Junio C Hamano
The function took (name, namelen) as its arguments, but all the public callers wanted to pass a full string.

Demote the counted-string interface to an internal API status, and allow public callers to just pass the string to the function.

Signed-off-by: Junio C Hamano <>
2009-06-18 | avoid exponential regex match for java and objc function names | Paolo Bonzini
In the old regex

    ^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_], thus causing an exponential number of backtracks. Ironically it also causes the regex not to work as intended; for example "catch" can match the underlined part of the regex, the first repetition matching "c" and the second matching "atch".

The replacement regex avoids this problem, because it makes sure that at least a space/tab is eaten on each repetition. In other words, a suffix of a repetition can never be a prefix of the next repetition.

Signed-off-by: Paolo Bonzini <>
Signed-off-by: Junio C Hamano <>
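The mismatch on "catch" can be reproduced with Python's `re`, which accepts the old pattern unchanged. The `NEW` pattern below is only an illustration of the stated idea (each repetition must consume at least one space or tab); it is an assumption and is not quoted from the commit.

```python
import re

# The old regex quoted in the message above.
OLD = re.compile(r"^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$")
# Illustrative replacement (an assumption, not quoted from the commit):
# every repetition must end with at least one space or tab.
NEW = re.compile(
    r"^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\([^;]*)$"
)

for line in ("catch (e) {", "public void run(int n) {"):
    print(bool(OLD.match(line)), bool(NEW.match(line)), line)
```

The old pattern accepts `catch (e) {` because a single word can be split across two repetitions; once whitespace is required between repetitions, only lines with at least two separate words before the parenthesis match.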
2009-01-22 | Change the spelling of "wordregex". | Boyd Stephen Smith Jr
Use "wordRegex" for configuration variable names. Use "word_regex" for C language tokens.

Signed-off-by: Boyd Stephen Smith Jr. <>
Signed-off-by: Junio C Hamano <>
2009-01-17 | color-words: make regex configurable via attributes | Thomas Rast
Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting.

We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.)

Signed-off-by: Thomas Rast <>
Signed-off-by: Junio C Hamano <>
2008-10-26 | userdiff: require explicitly allowing textconv | Jeff King
Diffs that have been produced with textconv almost certainly cannot be applied, so we want to be careful not to generate them in things like format-patch.

This introduces a new diff option, ALLOW_TEXTCONV, which controls this behavior. It is off by default, but is explicitly turned on for the "log" family of commands, as well as the "diff" porcelain (but not diff-* plumbing).

Because both text conversion and external diffing are controlled by these diff options, we can get rid of the "plumbing versus porcelain" distinction when reading the config. This was an attempt to control the same thing, but suffered from being too coarse-grained.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2008-10-18 | diff: add filter for converting binary to text | Jeff King
When diffing binary files, it is sometimes nice to see the differences of a canonical text form rather than either a binary patch or simply "binary files differ."

Until now, the only option for doing this was to define an external diff command to perform the diff. This was a lot of work, since the external command needed to take care of doing the diff itself (including mode changes), and lost the benefit of git's colorization and other options.

This patch adds a text conversion option, which converts a file to its canonical format before performing the diff. This is less flexible than an arbitrary external diff, but is much less work to set up. For example:

    $ echo '*.jpg diff=exif' >>.gitattributes
    $ git config diff.exif.textconv exiftool
    $ git config diff.exif.binary false

allows one to see jpg diffs represented by the text output of exiftool.

Signed-off-by: Jeff King <>
Signed-off-by: Shawn O. Pearce <>
2008-10-18 | diff: introduce diff.<driver>.binary | Jeff King
The "diff" gitattribute is somewhat overloaded right now. It can say one of three things:

  1. this file is definitely binary, or definitely not (i.e., diff or !diff)
  2. this file should use an external diff engine (i.e., diff=foo, = custom-script)
  3. this file should use particular funcname patterns (i.e., diff=foo, = some-regex)

Most of the time, there is no conflict between these uses, since using one implies that the other is irrelevant (e.g., an external diff engine will decide for itself whether the file is binary).

However, there is at least one conflicting situation: there is no way to say "use the regular rules to determine whether this file is binary, but if we do diff it textually, use this funcname pattern." That is, currently setting diff=foo indicates that the file is definitely text.

This patch introduces a "binary" config option for a diff driver, so that one can explicitly set We default this value to "don't know". That is, setting a diff attribute to "foo" and using "" will have no effect on the binaryness of a file. To get the current behavior, one can set to true.

This patch also has one additional advantage: it cleans up the interface to the userdiff code a bit. Before, calling code had to know more about whether attributes were false, true, or unset to determine binaryness. Now that binaryness is a property of a driver, we can represent these situations just by passing back a driver struct.

Signed-off-by: Jeff King <>
Signed-off-by: Shawn O. Pearce <>
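The three-valued "binary" setting can be sketched as a tri-state in Python. Everything here is hypothetical: `is_binary` is not a git function, `None` stands for "don't know", and the NUL-byte check is an assumed stand-in for "the regular rules".

```python
def is_binary(driver_binary, data):
    """driver_binary is True, False, or None ("don't know")."""
    if driver_binary is not None:
        return driver_binary          # the driver states it explicitly
    # Hypothetical content check: treat a NUL byte near the start as binary.
    return b"\0" in data[:8000]

print(is_binary(None, b"plain text\n"))   # False: decided from content
print(is_binary(None, b"\x89PNG\0\0"))    # True: decided from content
print(is_binary(True, b"plain text\n"))   # True: the driver's setting wins
```

Defaulting to "don't know" is what lets a driver supply a funcname pattern without also forcing the file to be treated as text.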
2008-10-18 | diff: unify external diff and funcname parsing code | Jeff King
Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as*.

The code for each is currently completely separate from the other, which has several disadvantages:

  - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns
  - it is difficult to add new profile options, since it is unclear where they should go
  - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code.

This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct.

Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation.

Signed-off-by: Jeff King <>
Signed-off-by: Shawn O. Pearce <>
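The "one struct per driver" idea can be sketched in Python as grouping flat config entries into a single record per driver name. The `Driver` class and `load_drivers` helper are hypothetical, and the key names `command` and `funcname` are assumptions made for this sketch, since the message above does not spell them out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Driver:
    """Hypothetical stand-in for a single userdiff driver record."""
    name: str
    command: Optional[str] = None    # external diff program (assumed key name)
    funcname: Optional[str] = None   # hunk-header pattern (assumed key name)

def load_drivers(config):
    """Group flat diff.<name>.<key> entries into one Driver per name."""
    drivers = {}
    for key, value in config.items():
        parts = key.split(".")
        if len(parts) < 3 or parts[0] != "diff":
            continue                 # not a diff driver setting
        name, field = ".".join(parts[1:-1]), parts[-1]
        if field in ("command", "funcname"):
            setattr(drivers.setdefault(name, Driver(name)), field, value)
    return drivers

config = {"diff.foo.command": "custom-script", "diff.foo.funcname": "some-regex"}
print(load_drivers(config)["foo"])
```

With a single record per driver, a new option becomes one more field rather than another separately maintained list.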