path: root/xdiff/xdiff.h
AgeCommit message (Collapse)Author
2022-07-08xdiff: introduce xdl_callocPhillip Wood
In preparation for introducing XDL_CALLOC_ARRAY() use calloc() to obtain zeroed out memory rather than malloc() followed by memset(). To try and keep the lines a reasonable length this commit also stops casting the pointer returned by calloc() as this is unnecessary. Signed-off-by: Phillip Wood <> Signed-off-by: Junio C Hamano <>
2021-12-01xdiff: implement a zealous diff3, or "zdiff3"Phillip Wood
"zdiff3" is identical to ordinary diff3 except that it allows compaction of common lines on the two sides of history at the beginning or end of the conflict hunk. For example, the following diff3 conflict: 1 2 3 4 <<<<<< A B C D E |||||| 5 6 ====== A X C Y E >>>>>> 7 8 9 has common lines 'A', 'C', and 'E' on the two sides. With zdiff3, one would instead get the following conflict: 1 2 3 4 A <<<<<< B C D |||||| 5 6 ====== X C Y >>>>>> E 7 8 9 Note that the common lines, 'A', and 'E' were moved outside the conflict. Unlike with the two-way conflicts from the 'merge' conflictStyle, the zdiff3 conflict is NOT split into multiple conflict regions to allow the common 'C' lines to be shown outside a conflict, because zdiff3 shows the base version too and the base version cannot be reasonably split. Note also that the removing of lines common to the two sides might make the remaining text inside the conflict region match the base text inside the conflict region (for example, if the diff3 conflict had '5 6 E' on the right side of the conflict, then the common line 'E' would be moved outside and both the base and right side's remaining conflict text would be the lines '5' and '6'). This has the potential to surprise users and make them think there should not have been a conflict, but there definitely was a conflict and it should remain. Based-on-patch-by: Uwe Kleine-König <> Co-authored-by: Elijah Newren <> Signed-off-by: Phillip Wood <> Signed-off-by: Elijah Newren <> Signed-off-by: Junio C Hamano <>
2021-05-11xdiff-interface: replace discard_hunk_line() with a flagÆvar Arnfjörð Bjarmason
Remove the dummy discard_hunk_line() function added in 3b40a090fd4 (diff: avoid generating unused hunk header lines, 2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for use along with the two existing and similar XDL_EMIT_* flags. Unlike the recently amended xdiff_emit_line_fn interface which'll be called in a loop in xdl_emit_diff(), the hunk header is only emitted once. It makes more sense to pass this as a flag than provide a dummy callback because that function may be able to skip doing certain work if it knows the caller is doing nothing with the hunk header. It would be possible to do so in the case of -U0 now, but the benefit of doing so is so small that I haven't bothered. But this leaves the door open to that, and more importantly makes the API use more intuitive. The reason we're putting a flag in the gap between 1<<0 and 1<<2 is that the old 1<<1 flag was removed in 907681e940d (xdiff: drop XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags. Signed-off-by: Ævar Arnfjörð Bjarmason <> Signed-off-by: Junio C Hamano <>
2020-10-20diff: add -I<regex> that ignores matching changesMichał Kępień
Add a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I/--ignore-matching-lines option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Use the 'ignore' field of xdchange_t for marking a change as ignored or not. Since the same field is used by --ignore-blank-lines, identical hunk emitting rules apply for --ignore-blank-lines and -I. These two options can also be used together in the same git invocation (they are complementary to each other). Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent". Signed-off-by: Michał Kępień <> Signed-off-by: Junio C Hamano <>
2019-04-12xdiff: use xmalloc/xreallocJeff King
Most of xdiff uses a bare malloc() to allocate memory, and returns an error when we get NULL. However, there are a few spots which don't check the return value and may segfault, including at least xdl_merge() and xpatience.c's find_longest_common_sequence(). Let's use xmalloc() everywhere instead, so that we get a graceful die() for these cases, without having to do further auditing. This does mean the existing cases which check errors will now die() instead of returning an error up the stack. But: - that's how the rest of Git behaves already for malloc errors - all of the callers of xdi_diff(), etc, die upon seeing an error So while we might one day want to fully lib-ify the diff code and make it possible to use as part of a long-running process, we're not close to that now. And because we're just tweaking the xdl_malloc() macro here, we're not really moving ourselves any further away from that. We could, for example, simplify some of the functions which handle malloc() errors which can no longer occur. But that would probably be taking us in the wrong direction. This also makes our malloc handling more consistent with the rest of Git, including enforcing GIT_ALLOC_LIMIT and trying to reclaim pack memory when needed. Reported-by: 王健强 <> Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2018-11-02xdiff: provide a separate emit callback for hunksJeff King
The xdiff library always emits hunk header lines to our callbacks as formatted strings like "@@ -a,b +c,d @@\n". This is convenient if we're going to output a diff, but less so if we actually need to compute using those numbers, which requires re-parsing the line. In preparation for moving away from this, let's teach xdiff a new callback function which gets the broken-out hunk information. To help callers that don't want to use this new callback, if it's NULL we'll continue to format the hunk header into a string. Note that this function renames the "outf" callback to "out_line", as well. This isn't strictly necessary, but helps in two ways: 1. Now that there are two callbacks, it's nice to use more descriptive names. 2. Many callers did not zero the emit_callback_data struct, and needed to be modified to set ecb.out_hunk to NULL. By changing the name of the existing struct member, that guarantees that any new callers from in-flight topics will break the build and be examined manually. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2018-07-17xdiff/xdiff.h: remove unused flagsStefan Beller
These flags were there since the beginning (3443546f6e (Use a *real* built-in diff generator, 2006-03-24), but were never used. Remove them. Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2017-11-28diff: support anchoring line(s)Jonathan Tan
Teach diff a new algorithm, one that attempts to prevent user-specified lines from appearing as a deletion or addition in the end result. The end user can use this by specifying "--anchored=<text>" one or more times when using Git commands like "diff" and "show". Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2017-11-27Merge branch 'jc/ignore-cr-at-eol'Junio C Hamano
The "diff" family of commands learned to ignore differences in carriage return at the end of line. * jc/ignore-cr-at-eol: diff: --ignore-cr-at-eol xdiff: reassign xpparm_t.flags bits
2017-11-09Replace Free Software Foundation address in license noticesTodd Zullinger
The mailing address for the FSF has changed over the years. Rather than updating the address across all files, refer readers to, as the GNU GPL documentation now suggests for license notices. The mailing address is retained in the full license files (COPYING and LGPL-2.1). The old address is still present in t/diff-lib/COPYING. This is intentional, as the file is used in tests and the contents are not expected to change. Signed-off-by: Todd Zullinger <> Signed-off-by: Junio C Hamano <>
2017-11-08diff: --ignore-cr-at-eolJunio C Hamano
A new option --ignore-cr-at-eol tells the diff machinery to treat a carriage-return at the end of a (complete) line as if it does not exist. Just like other "--ignore-*" options to ignore various kinds of whitespace differences, this will help reviewing the real changes you made without getting distracted by spurious CRLF<->LF conversion made by your editor program. Helped-by: Johannes Schindelin <> [jch: squashed in command line completion by Dscho] Signed-off-by: Junio C Hamano <>
2017-10-27xdiff: reassign xpparm_t.flags bitsJunio C Hamano
We have packed the bits too tightly in such a way that it is not easy to add a new type of whitespace ignoring option, a new type of LCS algorithm, or a new type of post-cleanup heuristics. Reorder bits a bit to give room for these three classes of options to grow. Also make use of XDF_WHITESPACE_FLAGS macro where we check any of these bits are on, instead of using DIFF_XDL_TST() macro on individual possibilities. That way, the "is any of the bits on?" code does not have to change when we add more ways to ignore whitespaces. While at it, add a comment in front of the bit definitions to clarify in which structure these defined bits may appear. Signed-off-by: Junio C Hamano <>
2016-12-23diff: retire "compaction" heuristicsJunio C Hamano
When a patch inserts a block of lines, whose last lines are the same as the existing lines that appear before the inserted block, "git diff" can choose any place between these existing lines as the boundary between the pre-context and the added lines (adjusting the end of the inserted block as appropriate) to come up with variants of the same patch, and some variants are easier to read than others. We have been trying to improve the choice of this boundary, and Git 2.11 shipped with an experimental "compaction-heuristic". Since then another attempt to improve the logic further resulted in a new "indent-heuristic" logic. It is agreed that the latter gives better result overall, and the former outlived its usefulness. Retire "compaction", and keep "indent" as an experimental feature. The latter hopefully will be turned on by default in a future release, but that should be done as a separate step. Suggested-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2016-09-19diff: improve positioning of add/delete blocks in diffsMichael Haggerty
Some groups of added/deleted lines in diffs can be slid up or down, because lines at the edges of the group are not unique. Picking good shifts for such groups is not a matter of correctness but definitely has a big effect on aesthetics. For example, consider the following two diffs. The first is what standard Git emits: --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) { } if (!$smtp_server) { + $smtp_server = $repo->config('sendemail.smtpserver'); +} +if (!$smtp_server) { foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) { if (-x $_) { $smtp_server = $_; The following diff is equivalent, but is obviously preferable from an aesthetic point of view: --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) { $initial_reply_to =~ s/(^\s+|\s+$)//g; } +if (!$smtp_server) { + $smtp_server = $repo->config('sendemail.smtpserver'); +} if (!$smtp_server) { foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) { if (-x $_) { This patch teaches Git to pick better positions for such "diff sliders" using heuristics that take the positions of nearby blank lines and the indentation of nearby lines into account. The existing Git code basically always shifts such "sliders" as far down in the file as possible. The only exception is when the slider can be aligned with a group of changed lines in the other file, in which case Git favors depicting the change as one add+delete block rather than one add and a slightly offset delete block. This naive algorithm often yields ugly diffs. Commit d634d61ed6 improved the situation somewhat by preferring to position add/delete groups to make their last line a blank line, when that is possible. This heuristic does more good than harm, but (1) it can only help if there are blank lines in the right places, and (2) always picks the last blank line, even if there are others that might be better. The end result is that it makes perhaps 1/3 as many errors as the default Git algorithm, but that still leaves a lot of ugly diffs. This commit implements a new and much better heuristic for picking optimal "slider" positions using the following approach: First observe that each hypothetical positioning of a diff slider introduces two splits: one between the context lines preceding the group and the first added/deleted line, and the other between the last added/deleted line and the first line of context following it. It tries to find the positioning that creates the least bad splits. Splits are evaluated based only on the presence and locations of nearby blank lines, and the indentation of lines near the split. Basically, it prefers to introduce splits adjacent to blank lines, between lines that are indented less, and between lines with the same level of indentation. In more detail: 1. It measures the following characteristics of a proposed splitting position in a `struct split_measurement`: * the number of blank lines above the proposed split * whether the line directly after the split is blank * the number of blank lines following that line * the indentation of the nearest non-blank line above the split * the indentation of the line directly below the split * the indentation of the nearest non-blank line after that line 2. It combines the measured attributes using a bunch of empirically-optimized weighting factors to derive a `struct split_score` that measures the "badness" of splitting the text at that position. 3. It combines the `split_score` for the top and the bottom of the slider at each of its possible positions, and selects the position that has the best `split_score`. I determined the initial set of weighting factors by collecting a corpus of Git histories from 29 open-source software projects in various programming languages. I generated many diffs from this corpus, and determined the best positioning "by eye" for about 6600 diff sliders. I used about half of the repositories in the corpus (corresponding to about 2/3 of the sliders) as a training set, and optimized the weights against this corpus using a crude automated search of the parameter space to get the best agreement with the manually-determined values. Then I tested the resulting heuristic against the full corpus. The results are summarized in the following table, in column `indent-1`: | repository | count | Git 2.9.0 | compaction | compaction-fixed | indent-1 | indent-2 | | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- | | afnetworking | 109 | 89 (81.7%) | 37 (33.9%) | 37 (33.9%) | 2 (1.8%) | 2 (1.8%) | | alamofire | 30 | 18 (60.0%) | 14 (46.7%) | 15 (50.0%) | 0 (0.0%) | 0 (0.0%) | | angular | 184 | 127 (69.0%) | 39 (21.2%) | 23 (12.5%) | 5 (2.7%) | 5 (2.7%) | | animate | 313 | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | | ant | 380 | 356 (93.7%) | 152 (40.0%) | 148 (38.9%) | 15 (3.9%) | 15 (3.9%) | * | bugzilla | 306 | 263 (85.9%) | 109 (35.6%) | 99 (32.4%) | 14 (4.6%) | 15 (4.9%) | * | corefx | 126 | 91 (72.2%) | 22 (17.5%) | 21 (16.7%) | 6 (4.8%) | 6 (4.8%) | | couchdb | 78 | 44 (56.4%) | 26 (33.3%) | 28 (35.9%) | 6 (7.7%) | 6 (7.7%) | * | cpython | 937 | 158 (16.9%) | 50 (5.3%) | 49 (5.2%) | 5 (0.5%) | 5 (0.5%) | * | discourse | 160 | 95 (59.4%) | 42 (26.2%) | 36 (22.5%) | 18 (11.2%) | 13 (8.1%) | | docker | 307 | 194 (63.2%) | 198 (64.5%) | 253 (82.4%) | 8 (2.6%) | 8 (2.6%) | * | electron | 163 | 132 (81.0%) | 38 (23.3%) | 39 (23.9%) | 6 (3.7%) | 6 (3.7%) | | git | 536 | 470 (87.7%) | 73 (13.6%) | 78 (14.6%) | 16 (3.0%) | 16 (3.0%) | * | gitflow | 127 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | | ionic | 133 | 89 (66.9%) | 29 (21.8%) | 38 (28.6%) | 1 (0.8%) | 1 (0.8%) | | ipython | 482 | 362 (75.1%) | 167 (34.6%) | 169 (35.1%) | 11 (2.3%) | 11 (2.3%) | * | junit | 161 | 147 (91.3%) | 67 (41.6%) | 66 (41.0%) | 1 (0.6%) | 1 (0.6%) | * | lighttable | 15 | 5 (33.3%) | 0 (0.0%) | 2 (13.3%) | 0 (0.0%) | 0 (0.0%) | | magit | 88 | 75 (85.2%) | 11 (12.5%) | 9 (10.2%) | 1 (1.1%) | 0 (0.0%) | | neural-style | 28 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | | nodejs | 781 | 649 (83.1%) | 118 (15.1%) | 111 (14.2%) | 4 (0.5%) | 5 (0.6%) | * | phpmyadmin | 491 | 481 (98.0%) | 75 (15.3%) | 48 (9.8%) | 2 (0.4%) | 2 (0.4%) | * | react-native | 168 | 130 (77.4%) | 79 (47.0%) | 81 (48.2%) | 0 (0.0%) | 0 (0.0%) | | rust | 171 | 128 (74.9%) | 30 (17.5%) | 27 (15.8%) | 16 (9.4%) | 14 (8.2%) | | spark | 186 | 149 (80.1%) | 52 (28.0%) | 52 (28.0%) | 2 (1.1%) | 2 (1.1%) | | tensorflow | 115 | 66 (57.4%) | 48 (41.7%) | 48 (41.7%) | 5 (4.3%) | 5 (4.3%) | | test-more | 19 | 15 (78.9%) | 2 (10.5%) | 2 (10.5%) | 1 (5.3%) | 1 (5.3%) | * | test-unit | 51 | 34 (66.7%) | 14 (27.5%) | 8 (15.7%) | 2 (3.9%) | 2 (3.9%) | * | xmonad | 23 | 22 (95.7%) | 2 (8.7%) | 2 (8.7%) | 1 (4.3%) | 1 (4.3%) | * | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- | | totals | 6668 | 4391 (65.9%) | 1496 (22.4%) | 1491 (22.4%) | 150 (2.2%) | 144 (2.2%) | | totals (training set) | 4552 | 3195 (70.2%) | 1053 (23.1%) | 1061 (23.3%) | 86 (1.9%) | 88 (1.9%) | | totals (test set) | 2116 | 1196 (56.5%) | 443 (20.9%) | 430 (20.3%) | 64 (3.0%) | 56 (2.6%) | In this table, the numbers are the count and percentage of human-rated sliders that the corresponding algorithm got *wrong*. The columns are * "repository" - the name of the repository used. I used the diffs between successive non-merge commits on the HEAD branch of the corresponding repository. * "count" - the number of sliders that were human-rated. I chose most, but not all, sliders to rate from those among which the various algorithms gave different answers. * "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0. * "compaction" - the heuristic used by `git diff --compaction-heuristic` in Git 2.9.0. * "compaction-fixed" - the heuristic used by `git diff --compaction-heuristic` after the fixes from earlier in this patch series. Note that the results are not dramatically different than those for "compaction". Both produce non-ideal diffs only about 1/3 as often as the default `git diff`. * "indent-1" - the new `--indent-heuristic` algorithm, using the first set of weighting factors, determined as described above. * "indent-2" - the new `--indent-heuristic` algorithm, using the final set of weighting factors, determined as described below. * `*` - indicates that repo was part of training set used to determine the first set of weighting factors. The fact that the heuristic performed nearly as well on the test set as on the training set in column "indent-1" is a good indication that the heuristic was not over-trained. Given that fact, I ran a second round of optimization, using the entire corpus as the training set. The resulting set of weights gave the results in column "indent-2". These are the weights included in this patch. The final result gives consistently and significantly better results across the whole corpus than either `git diff` or `git diff --compaction-heuristic`. It makes only about 1/30 as many errors as the former and about 1/10 as many errors as the latter. (And a good fraction of the remaining errors are for diffs that involve weirdly-formatted code, sometimes apparently machine-generated.) The tools that were used to do this optimization and analysis, along with the human-generated data values, are recorded in a separate project [1]. This patch adds a new command-line option `--indent-heuristic`, and a new configuration setting `diff.indentHeuristic`, that activate this heuristic. This interface is only meant for testing purposes, and should be finalized before including this change in any release. [1] Signed-off-by: Michael Haggerty <> Signed-off-by: Junio C Hamano <>
2016-05-06Merge branch 'jk/diff-compact-heuristic'Junio C Hamano
Patch output from "git diff" and friends has been tweaked to be more readable by using a blank line as a strong hint that the contents before and after it belong to a logically separate unit. * jk/diff-compact-heuristic: diff: undocument the compaction heuristic knobs for experimentation xdiff: implement empty line chunk heuristic xdiff: add recs_match helper function
2016-04-19xdiff: implement empty line chunk heuristicStefan Beller
In order to produce the smallest possible diff and combine several diff hunks together, we implement a heuristic from GNU Diff which moves diff hunks forward as far as possible when we find common context above and below a diff hunk. This sometimes produces less readable diffs when writing C, Shell, or other programming languages, ie: ... /* + * + * + */ + +/* ... instead of the more readable equivalent of ... +/* + * + * + */ + /* ... Implement the following heuristic to (optionally) produce the desired output. If there are diff chunks which can be shifted around, shift each hunk such that the last common empty line is below the chunk with the rest of the context above. This heuristic appears to resolve the above example and several other common issues without producing significantly weird results. However, as with any heuristic it is not really known whether this will always be more optimal. Thus, it can be disabled via diff.compactionHeuristic. Signed-off-by: Stefan Beller <> Signed-off-by: Jacob Keller <> Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2016-02-23xdiff: drop XDL_EMIT_COMMONJeff King
There are no more callers that use this mode, and none likely to be added (as our xdl_merge() eliminates the common use of it for generating 3-way merge bases). This is effectively a revert of a9ed376 (xdiff: generate "anti-diffs" aka what is common to two files, 2006-06-28), though of course trying to revert that ancient commit directly produces many textual conflicts. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-06-19diff: add --ignore-blank-lines optionAntoine Pelisse
The goal of the patch is to introduce the GNU diff -B/--ignore-blank-lines as closely as possible. The short option is not available because it's already used for "break-rewrites". When this option is used, git-diff will not create hunks that simply add or remove empty lines, but will still show empty lines addition/suppression if they are close enough to "valuable" changes. There are two differences between this option and GNU diff -B option: - GNU diff doesn't have "--inter-hunk-context", so this must be handled - The following sequence looks like a bug (context is displayed twice): $ seq 5 >file1 $ cat <<EOF >file2 change 1 2 3 4 5 change EOF $ diff -u -B file1 file2 --- file1 2013-06-08 22:13:04.471517834 +0200 +++ file2 2013-06-08 22:13:23.275517855 +0200 @@ -1,5 +1,7 @@ +change 1 2 + 3 4 5 @@ -3,3 +5,4 @@ 3 4 5 +change So here is a more thorough description of the option: - real changes are interesting - blank lines that are close enough (less than context size) to interesting changes are considered interesting (recursive definition) - "context" lines are used around each hunk of interesting changes - If two hunks are separated by less than "inter-hunk-context", they will be merged into one. The implementation does the "interesting changes selection" in a single pass. Signed-off-by: Antoine Pelisse <> Signed-off-by: Junio C Hamano <>
2012-05-09xdiff: remove emit_func() and xdi_diff_hunks()René Scharfe
The functions are unused now, remove them. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2012-05-09xdiff: add hunk_func()René Scharfe
Add a way to register a callback function that is gets passed the start line and line count of each hunk of a diff. Only standard types are used. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2012-02-19xdiff: PATIENCE/HISTOGRAM are not independent option bitsJunio C Hamano
Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <>
2012-02-19xdiff: remove XDL_PATCH_* macrosJunio C Hamano
These are not used anywhere in our codebase, and the bit assignment definition is merely confusing. Signed-off-by: Junio C Hamano <>
2011-10-10diff: add option to show whole functions as contextRené Scharfe
Add the option -W/--function-context to git diff. It is similar to the same option of git grep and expands the context of change hunks so that the whole surrounding function is shown. This "natural" context can allow changes to be understood better. Note: GNU patch doesn't like diffs generated with the new option; it seems to expect context lines to be the same before and after changes. git apply doesn't complain. This implementation has the same shortcoming as the one in grep, namely that there is no way to explicitly find the end of a function. That means that a few lines of extra context are shown, right up to the next recognized function begins. It's already useful in its current form, though. The function get_func_line() in xdiff/xemit.c is extended to work forward as well as backward to find post-context as well as pre-context. It returns the position of the first found matching line. The func_line parameter is made optional, as we don't need it for -W. The enhanced function is then used in xdl_emit_diff() to extend the context as needed. If the added context overlaps with the next change, it is merged into the current hunk. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2011-08-03xdiff: do away with xdl_mmfile_next()Tay Ray Chuan
Given our simple mmfile structure, xdl_mmfile_next() calls are redundant. Do away with calls to them. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-12teach --histogram to diffTay Ray Chuan
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show that it is faster than its --patience cousin, as well as the default Meyers algorithm. The implementation has been reworked to use structs and pointers, instead of bitmasks, thus doing away with JGit's 2^28 line limit. We also use xdiff's default hash table implementation (xdl_hash_bits() with XDL_HASHLONG()) for convenience. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2010-03-21xdl_merge(): move file1 and file2 labels to xmparam structureJonathan Nieder
The labels for the three participants in a potential conflict are all optional arguments for the xdiff merge routine; if they are NULL, then xdl_merge() can cope by omitting the labels from its output. Move them to the xmparam structure to allow new callers to save some keystrokes where they are not needed. This also has the virtue of making the xdiff merge interface more similar to merge_trees, which might make it easier to learn. Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2010-03-21xdl_merge(): add optional ancestor label to diff3-style outputJonathan Nieder
The ‘git checkout --conflict=diff3’ command can be used to present conflicts hunks including text from the common ancestor: <<<<<<< ours ourside ||||||| original ======= theirside >>>>>>> theirs The added information is helpful for resolving merges by hand, and merge tools can usually grok it because it is very similar to the output from diff3 -m. A subtle change can help more tools to understand the output. ‘diff3’ includes the name of the merge base on the ||||||| line of the output, and some tools misparse the conflict hunks without it. Add a new xmp->ancestor parameter to xdl_merge() for use with conflict style XDL_MERGE_DIFF3 as a label on the ||||||| line for any conflict hunks. If xmp->ancestor is NULL, the output format is unchanged. Thus, this change only provides unexposed plumbing for the new feature; it does not affect the outward behavior of git. Requested-by: Stefan Monnier <> Signed-off-by: Jonathan Nieder <> Acked-by: Bert Wesarg <> Signed-off-by: Junio C Hamano <>
2010-03-02refactor merge flags into xmparam_tBert Wesarg
Include the merge level, favor, and style flags into the xmparam_t struct. This removes the bit twiddling with these three values into the one flags parameter. Signed-off-by: Bert Wesarg <> Signed-off-by: Junio C Hamano <>
2010-03-02make union merge an xdl merge favorBert Wesarg
The current union merge driver is implemented as an post process. But the xdl_merge code is quite capable to produce the result by itself. Therefore move it there. Signed-off-by: Bert Wesarg <> Signed-off-by: Junio C Hamano <>
2010-01-21Merge branch 'jc/conflict-marker-size'Junio C Hamano
* jc/conflict-marker-size: rerere: honor conflict-marker-size attribute rerere: prepare for customizable conflict marker length conflict-marker-size: new attribute rerere: use ll_merge() instead of using xdl_merge() merge-tree: use ll_merge() not xdl_merge() xdl_merge(): allow passing down marker_size in xmparam_t xdl_merge(): introduce xmparam_t for merge specific parameters git_attr(): fix function signature Conflicts: builtin-merge-file.c ll-merge.c xdiff/xdiff.h xdiff/xmerge.c
2010-01-17xdl_merge(): allow passing down marker_size in xmparam_tJunio C Hamano
This allows the callers of xdl_merge() to pass marker_size (defaults to 7) in xmparam_t argument, to use conflict markers of non-default length. Signed-off-by: Junio C Hamano <>
2010-01-17xdl_merge(): introduce xmparam_t for merge specific parametersJunio C Hamano
So far we have only needed to be able to pass an option that is generic to xdiff family of functions to this function. Extend the interface so that we can give it merge specific parameters. Signed-off-by: Junio C Hamano <>
2009-11-30git-merge-file --ours, --theirsJunio C Hamano
Sometimes people want their conflicting merges autoresolved by favouring upstream changes. The standard answer they are given is to run "git diff --name-only | xargs git checkout MERGE_HEAD --" in such a case. This is to accept automerge results for the paths that are fully resolved automatically, while taking their version of the file in full for paths that have conflicts. This is problematic on two counts. One is that this is not exactly what these people want. It discards all changes they did on their branch for any paths that conflicted. They usually want to salvage as much automerge result as possible in a conflicted file, and want to take the upstream change only in the conflicted part. This patch teaches two new modes of operation to the lowest-lever merge machinery, xdl_merge(). Instead of leaving the conflicted lines from both sides enclosed in <<<, ===, and >>> markers, the conflicts are resolved favouring our side or their side of changes. A larger problem is that this tends to encourage a bad workflow by allowing people to record such a mixed up half-merged result as a full commit without auditing. This commit does not tackle this issue at all. In git, we usually give long enough rope to users with strange wishes as long as the risky features are not enabled by default, and this is such a risky feature. Signed-off-by: Avery Pennarun <> Signed-off-by: Junio C Hamano <>
2009-01-24Merge branch 'js/patience-diff'Junio C Hamano
* js/patience-diff: bash completions: Add the --patience option Introduce the diff option '--patience' Implement the patience diff algorithm Conflicts: contrib/completion/git-completion.bash
2009-01-07Implement the patience diff algorithmJohannes Schindelin
The patience diff algorithm produces slightly more intuitive output than the classic Myers algorithm, as it does not try to minimize the number of +/- lines first, but tries to preserve the lines that are unique. To this end, it first determines lines that are unique in both files, then the maximal sequence which preserves the order (relative to both files) is extracted. Starting from this initial set of common lines, the rest of the lines is handled recursively, with Myers' algorithm as a fallback when the patience algorithm fails (due to no common unique lines). This patch includes memory leak fixes by Pierre Habouzit. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2008-12-29diff: add option to show context between close hunksRené Scharfe
Merge two hunks if there is only the specified number of otherwise unshown context between them. For --inter-hunk-context=1, the resulting patch has the same number of lines but shows uninterrupted context instead of a context header line in between. Patches generated with this option are easier to read but are also more likely to conflict if the file to be patched contains other changes. This patch keeps the default for this option at 0. It is intended to just make the feature available in order to see its advantages and downsides. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2008-10-25Allow alternate "low-level" emit function from xdl_diffBrian Downing
For some users (e.g. git blame), getting textual patch output is just extra work, as they can get all the information they need from the low- level diff structures. Allow for an alternate low-level emit function to be defined to allow bypassing the textual patch generation; set xemitconf_t's emit_func member to enable this. The (void (*)()) type is pretty ugly, but the alternative would be to include most of the private xdiff headers in xdiff.h to get the types required for the "proper" function prototype. Also, a (void *) won't work, as ANSI C doesn't allow a function pointer to be cast to an object pointer. Signed-off-by: Brian Downing <> Signed-off-by: Junio C Hamano <>
2008-08-31xdiff-merge: optionally show conflicts in "diff3 -m" styleJunio C Hamano
When showing conflicting merges, we traditionally followed RCS's merge output format. The output shows: <<<<<<< postimage from one side; ======= postimage of the other side; and >>>>>>> Some poeple find it easier to be able to understand what is going on when they can view the common ancestor's version, which is used by "diff3 -m", which shows: <<<<<<< postimage from one side; ||||||| shared preimage; ======= postimage of the other side; and >>>>>>> This is an initial step to bring that as an optional feature to git. Only "git merge-file" has been converted, with "--diff3" option. Signed-off-by: Junio C Hamano <>
2008-02-18xdl_merge(): introduce XDL_MERGE_ZEALOUS_ALNUMJohannes Schindelin
When a merge conflicts, there are often common lines that are not really common, such as empty lines or lines containing a single curly bracket. With XDL_MERGE_ZEALOUS_ALNUM, we use the following heuristics: when a hunk does not contain any letters or digits, it is treated as conflicting. In other words, a conflict which used to look like this: <<<<<<< a = 1; ======= output(); >>>>>>> } } } <<<<<<< output(); ======= b = 1; >>>>>>> will look like this with ZEALOUS_ALNUM: <<<<<<< a = 1; } } } output(); ======= output(); } } } b = 1; >>>>>>> To demonstrate this, git-merge-file has been switched from XDL_MERGE_ZEALOUS to XDL_MERGE_ZEALOUS_ALNUM. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2007-07-06Per-path attribute based hunk header selection.Junio C Hamano
This makes"diff -p" hunk headers customizable via gitattributes mechanism. It is based on Johannes's earlier patch that allowed to define a single regexp to be used for everything. The mechanism to arrive at the regexp that is used to define hunk header is the same as other use of gitattributes. You assign an attribute, funcname (because "diff -p" typically uses the name of the function the patch is about as the hunk header), a simple string value. This can be one of the names of built-in pattern (currently, "java" is defined) or a custom pattern name, to be looked up from the configuration file. (in .gitattributes) *.java funcname=java *.perl funcname=perl (in .git/config) [funcname] java = ... # ugly and complicated regexp to override the built-in one. perl = ... # another ugly and complicated regexp to define a new one. Signed-off-by: Junio C Hamano <>
2007-06-07War on whitespaceJunio C Hamano
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <>
2007-02-14teach diff machinery about --ignore-space-at-eolJohannes Schindelin
`git diff --ignore-space-at-eol` will ignore whitespace at the line ends. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2006-12-03xdiff: add xdl_merge()Johannes Schindelin
This new function implements the functionality of RCS merge, but in-memory. It returns < 0 on error, otherwise the number of conflicts. Finding the conflicting lines can be a very expensive task. You can control the eagerness of this algorithm: - a level value of 0 means that all overlapping changes are treated as conflicts, - a value of 1 means that if the overlapping changes are identical, it is not treated as a conflict. - If you set level to 2, overlapping changes will be analyzed, so that almost identical changes will not result in huge conflicts. Rather, only the conflicting lines will be shown inside conflict markers. With each increasing level, the algorithm gets slower, but more accurate. Note that the code for level 2 depends on the simple definition of mmfile_t specific to git, and therefore it will be harder to port that to LibXDiff. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2006-06-29xdiff: generate "anti-diffs" aka what is common to two filesLinus Torvalds
This fairly trivial patch adds a new XDL_EMIT_xxx flag to tell libxdiff that we don't want to generate the _diff_ between two files, we want to see the lines that are _common_ to two files. So when you set XDL_EMIT_COMMON, xdl_diff() will do everything exactly like it used to do, but the output records it generates just contain the lines that aren't part of the diff. This is for doing things like generating the common base case for a file that was added in both branches. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-06-24Teach diff about -b and -w flagsJohannes Schindelin
This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to diff. The main part of the patch is teaching libxdiff about it. [jc: renamed xdl_line_match() to xdl_recmatch() since the former is used for different purposes in xpatchi.c which is in the parts of the upstream source we do not use.] Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2006-03-28xdiff: Show function names in hunk headers.Mark Wooding
The speed of the built-in diff generator is nice; but the function names shown by `diff -p' are /really/ nice. And I hate having to choose. So, we hack xdiff to find the function names and print them. xdiff has grown a flag to say whether to dig up the function names. The builtin_diff function passes this flag unconditionally. I suppose it could parse GIT_DIFF_OPTS, but it doesn't at the moment. I've also reintroduced the `function name' into the test suite, from which it was removed in commit 3ce8f089. The function names are parsed by a particularly stupid algorithm at the moment: it just tries to find a line in the `old' file, from before the start of the hunk, whose first character looks plausible. Still, it's most definitely a start. Signed-off-by: Mark Wooding <> Signed-off-by: Junio C Hamano <>
2006-03-26Use a *real* built-in diff generatorLinus Torvalds
This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>