path: root/xdiff
AgeCommit message (Collapse)Author
2012-05-02Merge branch 'tr/xdiff-fast-hash'Junio C Hamano
Use word-at-a-time comparison to find end of line or NUL (end of buffer), borrowed from the linux-kernel discussion. By Thomas Rast * tr/xdiff-fast-hash: xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE xdiff: load full words in the inner loop of xdl_hash_record
2012-05-01xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZEThomas Rast
Darwin does not define __WORDSIZE, and compiles the 32-bit code path on 64-bit systems, resulting in a totally broken git. I could not find an alternative -- other than the platform symbols (__x86_64__ etc.) -- that does the test in the preprocessor. However, we can also just test for the size of a 'long', which is what really matters here. Any compiler worth its salt will leave only the branch relevant for its platform, and indeed on Linux/GCC the numbers don't change: Test tr/darwin-xdl-fast-hash origin/next origin/master ------------------------------------------------------------------------------------------------------------------ 4000.1: log -3000 (baseline) 0.09(0.07+0.01) 0.09(0.07+0.01) -5.5%* 0.09(0.07+0.01) -4.1% 4000.2: log --raw -3000 (tree-only) 0.47(0.41+0.05) 0.47(0.40+0.05) -0.5% 0.45(0.38+0.06) -3.5%. 4000.3: log -p -3000 (Myers) 1.81(1.67+0.12) 1.81(1.67+0.13) +0.3% 1.99(1.84+0.12) +10.2%*** 4000.4: log -p -3000 --histogram 1.79(1.66+0.11) 1.80(1.67+0.11) +0.4% 1.96(1.82+0.10) +9.2%*** 4000.5: log -p -3000 --patience 2.17(2.02+0.13) 2.20(2.04+0.13) +1.3%. 2.33(2.18+0.13) +7.4%*** ------------------------------------------------------------------------------------------------------------------ Significance hints: '.' 0.1 '*' 0.05 '**' 0.01 '***' 0.001 Noticed-by: Brian Gernhardt <> Signed-off-by: Thomas Rast <> Signed-off-by: Junio C Hamano <>
2012-04-16Merge branch 'jc/diff-algo-cleanup'Junio C Hamano
Resurrects the preparatory clean-up patches from another topic that was discarded, as this would give a saner foundation to build on diff.algo configuration option series. * jc/diff-algo-cleanup: xdiff: PATIENCE/HISTOGRAM are not independent option bits xdiff: remove XDL_PATCH_* macros
2012-04-10xdiff: load full words in the inner loop of xdl_hash_recordThomas Rast
Redo the hashing loop in xdl_hash_record in a way that loads an entire 'long' at a time, using masking tricks to see when and where we found the terminating '\n'. I stole inspiration and code from the posts by Linus Torvalds around His method reads the buffers in sizeof(long) increments, and may thus overrun it by at most sizeof(long)-1 bytes before it sees the final newline (or hits the buffer length check). I considered padding out all buffers by a suitable amount to "catch" the overrun, but * this does not work for mmap()'d buffers: if you map 4096+8 bytes from a 4096 byte file, accessing the last 8 bytes results in a SIGBUS on my machine; and * it would also be extremely ugly because it intrudes deep into the unpacking machinery. So I adapted it to not read beyond the buffer at all. Instead, it reads the final partial word byte-by-byte and strings it together. Then it can use the same logic as before to finish the hashing. So far we enable this only on x86_64, where it provides nice speedup for diff-related work: Test origin/next tr/xdiff-fast-hash ----------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.07(0.05+0.02) 0.08(0.06+0.02) +14.3% 4000.2: log --raw -3000 (tree-only) 0.37(0.33+0.04) 0.37(0.32+0.04) +0.0% 4000.3: log -p -3000 (Myers) 1.75(1.65+0.09) 1.60(1.49+0.10) -8.6% 4000.4: log -p -3000 --histogram 1.73(1.62+0.09) 1.58(1.49+0.08) -8.7% 4000.5: log -p -3000 --patience 2.11(2.00+0.10) 1.94(1.80+0.11) -8.1% Perhaps other platforms could also benefit. However it does NOT work on big-endian systems! [jc: minimum style and compilation fixes] Signed-off-by: Thomas Rast <> Signed-off-by: Junio C Hamano <>
2012-02-19xdiff: PATIENCE/HISTOGRAM are not independent option bitsJunio C Hamano
Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <>
2012-02-19xdiff: remove XDL_PATCH_* macrosJunio C Hamano
These are not used anywhere in our codebase, and the bit assignment definition is merely confusing. Signed-off-by: Junio C Hamano <>
2012-01-29Merge branch 'rs/diff-postimage-in-context'Junio C Hamano
* rs/diff-postimage-in-context: xdiff: print post-image for common records instead of pre-image
2012-01-06xdiff: print post-image for common records instead of pre-imageRené Scharfe
Normally it doesn't matter if we show the pre-image or th post-image for the common parts of a diff because they are the same. If white-space changes are ignored they can differ, though. The new text after applying the diff is more interesting in that case, so show that instead of the old contents. Note: GNU diff shows the pre-image. Suggested-by: Junio C Hamano <> Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2011-10-19Merge branch 'rs/diff-whole-function'Junio C Hamano
* rs/diff-whole-function: diff: add option to show whole functions as context xdiff: factor out get_func_line()
2011-10-14Merge branch 'rs/diff-cleanup-records-fix'Junio C Hamano
* rs/diff-cleanup-records-fix: diff: resurrect XDF_NEED_MINIMAL with --minimal Revert removal of multi-match discard heuristic in 27af01
2011-10-10diff: add option to show whole functions as contextRené Scharfe
Add the option -W/--function-context to git diff. It is similar to the same option of git grep and expands the context of change hunks so that the whole surrounding function is shown. This "natural" context can allow changes to be understood better. Note: GNU patch doesn't like diffs generated with the new option; it seems to expect context lines to be the same before and after changes. git apply doesn't complain. This implementation has the same shortcoming as the one in grep, namely that there is no way to explicitly find the end of a function. That means that a few lines of extra context are shown, right up to the next recognized function begins. It's already useful in its current form, though. The function get_func_line() in xdiff/xemit.c is extended to work forward as well as backward to find post-context as well as pre-context. It returns the position of the first found matching line. The func_line parameter is made optional, as we don't need it for -W. The enhanced function is then used in xdl_emit_diff() to extend the context as needed. If the added context overlaps with the next change, it is merged into the current hunk. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2011-10-10xdiff: factor out get_func_line()René Scharfe
Move the code to search for a function line to be shown in the hunk header into its own function and to make returning the length-limited result string easier, introduce struct func_line. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2011-09-26Revert removal of multi-match discard heuristic in 27af01René Scharfe
27af01d (xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records(), 2011-08-17) was supposed to be a performance boost only. However, it unexpectedly changed the behaviour of diff. Revert a part of 27af01d that removes logic that mark lines as "multi-match" (ie. dis[i] == 2). This was preventing the multi-match discard heuristic (performed in xdl_cleanup_records() and xdl_clean_mmatch()) from executing. Reported-by: Alexander Pepper <> Signed-off-by: René Scharfe <> Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-09-06Merge branch 'rc/histogram-diff'Junio C Hamano
* rc/histogram-diff: xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()
2011-08-31xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()Tay Ray Chuan
Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe even if xdl_init_classifier() isn't called. This may occur in the case where diff is run with --histogram and a call to, say, xdl_prepare_ctx() fails. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-08-18Merge branch 'rc/histogram-diff' into HEADJunio C Hamano
* rc/histogram-diff: xdiff/xhistogram: drop need for additional variable xdiff/xhistogram: rely on xdl_trim_ends() xdiff/xhistogram: rework handling of recursed results xdiff: do away with xdl_mmfile_next() Make test number unique xdiff/xprepare: use a smaller sample size for histogram diff xdiff/xprepare: skip classification teach --histogram to diff t4033-diff-patience: factor out tests xdiff/xpatience: factor out fall-back-diff function xdiff/xprepare: refactor abort cleanups xdiff/xprepare: use memset() Conflicts: xdiff/xprepare.c
2011-08-18xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records()Tay Ray Chuan
In xdl_cleanup_records(), we see O(n*m) performance, where n is the number of records from xdf->dstart to xdf->dend, and m is the size of a bucket in xdf->rhash (<= by mlim). Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1|2)) in xdl_classify_record(). Reported-by: Marat Radchenko <> Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-08-08xdiff/xhistogram: drop need for additional variableTay Ray Chuan
Having an additional variable (ptr) instead of changing line(1|2) and count(1|2) was for debugging purposes. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-08-08xdiff/xhistogram: rely on xdl_trim_ends()Tay Ray Chuan
Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend set by xdl_trim_ends() that similarly tells us where the first unmatched line from the start and end occurs. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-08-08xdiff/xhistogram: rework handling of recursed resultsTay Ray Chuan
Previously we were over-complicating matters by trying to combine the recursed results. Now, terminate immediately if a recursive call failed and return its result. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-08-03xdiff: do away with xdl_mmfile_next()Tay Ray Chuan
Given our simple mmfile structure, xdl_mmfile_next() calls are redundant. Do away with calls to them. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-12xdiff/xprepare: use a smaller sample size for histogram diffTay Ray Chuan
For histogram diff, we can afford a smaller sample size and thus a poorer estimate of the number of lines, as the hash table (rhash) won't be filled up/grown. This is safe as the final count of lines (xdf.nrecs) will be updated correctly anyway by xdl_prepare_ctx(). This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-12xdiff/xprepare: skip classificationTay Ray Chuan
xdiff performs "classification" of records (xdl_classify_record()), replacing hashes (xrecord_t.ha) with a unique identifier of the record/line and building a hash table (xrecord_t.rhash) of records. This is then used to "cleanup" records (xdl_cleanup_records()). We don't need any of that in histogram diff, so we omit calls to these functions. We also skip allocating memory to the hash table, rhash, as it is no longer used. This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-12teach --histogram to diffTay Ray Chuan
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show that it is faster than its --patience cousin, as well as the default Meyers algorithm. The implementation has been reworked to use structs and pointers, instead of bitmasks, thus doing away with JGit's 2^28 line limit. We also use xdiff's default hash table implementation (xdl_hash_bits() with XDL_HASHLONG()) for convenience. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-07xdiff/xpatience: factor out fall-back-diff functionTay Ray Chuan
This is in preparation for the histogram diff algorithm, which will also re-use much of the code to call the default Meyers diff algorithm. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-07xdiff/xprepare: refactor abort cleanupsTay Ray Chuan
Group free()'s that are called when a malloc() fails in xdl_prepare_ctx(), making for more readable code. Also add a free() on ha, in case future git hackers add allocs after the ha malloc. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-07xdiff/xprepare: use memset()Tay Ray Chuan
Use memset() instead of a for loop to initialize. This could give a performance advantage. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2010-11-17Merge branch 'cb/diff-fname-optim'Junio C Hamano
* cb/diff-fname-optim: diff: avoid repeated scanning while looking for funcname do not search functions for patch ID add rebase patch id tests
2010-10-06xdiff: cast arguments for ctype functions to unsigned charJonathan Nieder
The ctype functions isspace(), isalnum(), et al take an integer argument representing an unsigned character, or -1 for EOF. On platforms with a signed char, it is unsafe to pass a char to them without casting it to unsigned char first. Most of git is already shielded against this by the ctype implementation in git-compat-util.h, but xdiff, which uses libc ctype.h, ought to be fixed. Noticed-by: der Mouse <mouse@Rodents-Montreal.ORG> Reported-by: Ævar Arnfjörð Bjarmason <> Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2010-09-30diff: avoid repeated scanning while looking for funcnameRené Scharfe
For each hunk, xdl_find_func searches the preimage for a function name until the beginning of the file. If the file does not contain any function names, this search has complexity O(n^2) in the number of hunks n. Instead, inline xdl_find_func() and keep track of up to which line we have scanned already and the contents of the last funcname line that we have found. Noticed and a different approach proposed by Clemens Buchacher. This alternative solution was done by René Scharfe. Signed-off-by: Junio C Hamano <>
2010-07-06xdiff: optimise for no whitespace difference when ignoring whitespace.Dylan Reid
In xdl_recmatch, do the memcmp to check if the two lines are equal before checking if whitespace flags are set. If the lines are identical, then there is no need to check if they differ only in whitespace. This makes the common case (there is no whitespace difference) faster. It costs the case where lines are the same length and contain whitespace differences, but the common case is more than 20% faster. Signed-off-by: Dylan Reid <> Signed-off-by: Junio C Hamano <>
2010-05-01xdiff/xmerge.c: use memset() instead of explicit for-loopAlexey Mahotkin
memset() is heavily optimized, and resulting assembler code is about 150 lines less for that file. Signed-off-by: Alexey Mahotkin <> Signed-off-by: Junio C Hamano <>
2010-03-21xdl_merge(): move file1 and file2 labels to xmparam structureJonathan Nieder
The labels for the three participants in a potential conflict are all optional arguments for the xdiff merge routine; if they are NULL, then xdl_merge() can cope by omitting the labels from its output. Move them to the xmparam structure to allow new callers to save some keystrokes where they are not needed. This also has the virtue of making the xdiff merge interface more similar to merge_trees, which might make it easier to learn. Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2010-03-21xdl_merge(): add optional ancestor label to diff3-style outputJonathan Nieder
The ‘git checkout --conflict=diff3’ command can be used to present conflicts hunks including text from the common ancestor: <<<<<<< ours ourside ||||||| original ======= theirside >>>>>>> theirs The added information is helpful for resolving merges by hand, and merge tools can usually grok it because it is very similar to the output from diff3 -m. A subtle change can help more tools to understand the output. ‘diff3’ includes the name of the merge base on the ||||||| line of the output, and some tools misparse the conflict hunks without it. Add a new xmp->ancestor parameter to xdl_merge() for use with conflict style XDL_MERGE_DIFF3 as a label on the ||||||| line for any conflict hunks. If xmp->ancestor is NULL, the output format is unchanged. Thus, this change only provides unexposed plumbing for the new feature; it does not affect the outward behavior of git. Requested-by: Stefan Monnier <> Signed-off-by: Jonathan Nieder <> Acked-by: Bert Wesarg <> Signed-off-by: Junio C Hamano <>
2010-03-02refactor merge flags into xmparam_tBert Wesarg
Include the merge level, favor, and style flags into the xmparam_t struct. This removes the bit twiddling with these three values into the one flags parameter. Signed-off-by: Bert Wesarg <> Signed-off-by: Junio C Hamano <>
2010-03-02make union merge an xdl merge favorBert Wesarg
The current union merge driver is implemented as an post process. But the xdl_merge code is quite capable to produce the result by itself. Therefore move it there. Signed-off-by: Bert Wesarg <> Signed-off-by: Junio C Hamano <>
2010-01-21Merge branch 'jc/conflict-marker-size'Junio C Hamano
* jc/conflict-marker-size: rerere: honor conflict-marker-size attribute rerere: prepare for customizable conflict marker length conflict-marker-size: new attribute rerere: use ll_merge() instead of using xdl_merge() merge-tree: use ll_merge() not xdl_merge() xdl_merge(): allow passing down marker_size in xmparam_t xdl_merge(): introduce xmparam_t for merge specific parameters git_attr(): fix function signature Conflicts: builtin-merge-file.c ll-merge.c xdiff/xdiff.h xdiff/xmerge.c
2010-01-17xdl_merge(): allow passing down marker_size in xmparam_tJunio C Hamano
This allows the callers of xdl_merge() to pass marker_size (defaults to 7) in xmparam_t argument, to use conflict markers of non-default length. Signed-off-by: Junio C Hamano <>
2010-01-17xdl_merge(): introduce xmparam_t for merge specific parametersJunio C Hamano
So far we have only needed to be able to pass an option that is generic to xdiff family of functions to this function. Extend the interface so that we can give it merge specific parameters. Signed-off-by: Junio C Hamano <>
2009-11-30git-merge-file --ours, --theirsJunio C Hamano
Sometimes people want their conflicting merges autoresolved by favouring upstream changes. The standard answer they are given is to run "git diff --name-only | xargs git checkout MERGE_HEAD --" in such a case. This is to accept automerge results for the paths that are fully resolved automatically, while taking their version of the file in full for paths that have conflicts. This is problematic on two counts. One is that this is not exactly what these people want. It discards all changes they did on their branch for any paths that conflicted. They usually want to salvage as much automerge result as possible in a conflicted file, and want to take the upstream change only in the conflicted part. This patch teaches two new modes of operation to the lowest-lever merge machinery, xdl_merge(). Instead of leaving the conflicted lines from both sides enclosed in <<<, ===, and >>> markers, the conflicts are resolved favouring our side or their side of changes. A larger problem is that this tends to encourage a bad workflow by allowing people to record such a mixed up half-merged result as a full commit without auditing. This commit does not tackle this issue at all. In git, we usually give long enough rope to users with strange wishes as long as the risky features are not enabled by default, and this is such a risky feature. Signed-off-by: Avery Pennarun <> Signed-off-by: Junio C Hamano <>
2009-09-01Merge branch 'tf/diff-whitespace-incomplete-line'Junio C Hamano
* tf/diff-whitespace-incomplete-line: xutils: Fix xdl_recmatch() on incomplete lines xutils: Fix hashing an incomplete line with whitespaces at the end
2009-08-23xutils: Fix xdl_recmatch() on incomplete linesJunio C Hamano
Thell Fowler noticed that various "ignore whitespace" options to git diff do not work well on an incomplete line. The loop control of the function responsible for these bugs was extremely difficult to follow. This patch restructures the loops for three variants of "ignore whitespace" logic. The basic idea of the re-written logic is: - A loop runs while the characters from both strings we are looking at match. We declare unmatch immediately when we find something that does not match and return false from the function. We break out of the loop if we ran out of either side of the string. The way we skip spaces inside this loop varies depending on the style of ignoring whitespaces. - After the above loop breaks, we know that the parts of the strings we inspected so far match, ignoring the whitespaces. The lines can match only if the remainder consists of nothing but whitespaces. This part of the logic is shared across all three styles. The new code is more obvious and should be much easier to follow. Tested-by: Thell Fowler <> Signed-off-by: Junio C Hamano <>
2009-08-23xutils: Fix hashing an incomplete line with whitespaces at the endJunio C Hamano
Upon seeing a whitespace, xdl_hash_record_with_whitespace() first skipped the run of whitespaces (excluding LF) that begins there, ensuring that the pointer points at the last whitespace character in the run, and assumed that the next character must be LF at the end of the line. This does not work when hashing an incomplete line, which lacks the LF at the end. Introduce "at_eol" variable that is true when either we are at the end of line (looking at LF) or at the end of an incomplete line, and use that instead throughout the code. Noticed by Thell Fowler. Signed-off-by: Junio C Hamano <>
2009-07-23refactor: use bitsizeof() instead of 8 * sizeof()Pierre Habouzit
Signed-off-by: Pierre Habouzit <> Signed-off-by: Junio C Hamano <>
2009-06-02Merge branch 'cb/maint-1.6.0-xdl-merge-fix' into maintJunio C Hamano
* cb/maint-1.6.0-xdl-merge-fix: Change xdl_merge to generate output even for null merges t6023: merge-file fails to output anything for a degenerate merge Conflicts: xdiff/xmerge.c
2009-05-25Change xdl_merge to generate output even for null mergesCharles Bailey
xdl_merge used to have a check to ensure that there was at least some change in one or other side being merged but this suppressed output for the degenerate case when base, local and remote contents were all identical. Removing this check enables correct output in the degenerate case and xdl_free_script handles freeing NULL scripts so there is no need to have the check for these calls. Signed-off-by: Charles Bailey <> Signed-off-by: Junio C Hamano <>
2009-04-23Fix typos / spelling in commentsMike Ralphson
Signed-off-by: Mike Ralphson <> Signed-off-by: Junio C Hamano <>
2009-03-16Fix various dead stores found by the clang static analyzerBenjamin Kramer
http-push.c::finish_request(): request is initialized by the for loop index-pack.c::free_base_data(): b is initialized by the for loop merge-recursive.c::process_renames(): move compare to narrower scope, and remove unused assignments to it remove unused variable renames2 xdiff/xdiffi.c::xdl_recs_cmp(): remove unused variable ec xdiff/xemit.c::xdl_emit_diff(): xche is always overwritten Signed-off-by: Benjamin Kramer <> Signed-off-by: Junio C Hamano <>
2009-02-06Merge branch 'kc/maint-diff-bwi-fix' into maintJunio C Hamano
* kc/maint-diff-bwi-fix: Fix combined use of whitespace ignore options to diff test more combinations of ignore-whitespace options to diff
2009-01-24Merge branch 'js/patience-diff'Junio C Hamano
* js/patience-diff: bash completions: Add the --patience option Introduce the diff option '--patience' Implement the patience diff algorithm Conflicts: contrib/completion/git-completion.bash