summaryrefslogtreecommitdiff
path: root/diff.c
AgeCommit message (Collapse)Author
2022-07-18Merge branch 'ab/cocci-unused'Junio C Hamano
Add Coccinelle rules to detect the pattern of initializing and then finalizing a structure without using it in between at all, which happens after code restructuring and the compilers fail to recognize as an unused variable. * ab/cocci-unused: cocci: generalize "unused" rule to cover more than "strbuf" cocci: add and apply a rule to find "unused" strbufs cocci: have "coccicheck{,-pending}" depend on "coccicheck-test" cocci: add a "coccicheck-test" target and test *.cocci rules Makefile & .gitignore: ignore & clean "git.res", not "*.res" Makefile: remove mandatory "spatch" arguments from SPATCH_FLAGS
2022-07-06cocci: add and apply a rule to find "unused" strbufsÆvar Arnfjörð Bjarmason
Add a coccinelle rule to remove "struct strbuf" initialization followed by calling "strbuf_release()" function, without any uses of the strbuf in the same function. See the tests in contrib/coccinelle/tests/unused.{c,res} for what it's intended to find and replace. The inclusion of "contrib/scalar/scalar.c" is because "spatch" was manually run on it (we don't usually run spatch on contrib). Per the "buggy code" comment we also match a strbuf_init() before the xmalloc(), but we're not seeking to be so strict as to make checks that the compiler will catch for us redundant. Saying we'll match either "init" or "xmalloc" lines makes the rule simpler. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: make `path_messages` a strmap to a string_listJohannes Schindelin
This allows us once again to get away with less data copying. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-20diff: use mks_tempfile_dt()René Scharfe
Git uses temporary files to pass the contents of blobs to external diff programs and textconv filters. It calls mks_tempfile_ts() to create them, which puts them all in the same directory. This requires adding a random name prefix. Use mks_tempfile_dt() instead, which allows the files to have arbitrary names, each in their own separate temporary directory. This way they can have the same basename as the original blob, which looks nicer in graphical diff programs. The test in t4020 to check the prettiness of the temporary paths was neutered by 5476bdf0e8 (diff tests: don't ignore "git diff" exit code in "read" loop, 2022-03-07), which removed its grep check without replacing it with an equivalent test_cmp check. Add one that only checks the basename of the temporary file and nothing else. And make the test more robust while at it, by using test_when_finished to get rid of the added file even if the test fails. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-23Merge branch 'ab/plug-random-leaks'Junio C Hamano
Double-free fix for a recently merged topic. * ab/plug-random-leaks: diff.c: fix a double-free regression in a18d66cefb tests: demonstrate "show --word-diff --color-moved" regression
2022-03-17diff.c: fix a double-free regression in a18d66cefbÆvar Arnfjörð Bjarmason
My a18d66cefb9 (diff.c: free "buf" in diff_words_flush(), 2022-03-04) has what it retrospect is a rather obvious bug (I don't know what I was thinking, if it all): We use the "emitted_symbols" allocation in append_emitted_diff_symbol() N times, but starting with a18d66cefb9 we'd free it after its first use! The correct way to free this data would have been to add the free() to the existing free_diff_words_data() function, so let's do that. The "ecbdata->diff_words->opt->emitted_symbols" might be NULL, so let's add a trivial free_emitted_diff_symbols() helper next to the function that appends to it. This fixes the "no effect on show from" leak tested for in the preceding commit. Perhaps confusingly this change will skip that test under SANITIZE=leak, but otherwise opt-in the "t4015-diff-whitespace.sh" test. The reason is that a18d66cefb9 "fixed" the leak in the preceding "no effect on diff" test, but for the first call to diff_words_flush() the "wol->buf" would be NULL, so we wouldn't double-free (and SANITIZE=address would see nothing amiss). With this change we'll still pass that test, showing that we've also fixed leaks on this codepath. We then have to skip the new "no effect on show" test because it happens to trip over an unrelated memory leak (in revision.c). The same goes for "move detection with submodules". Both of them pass with SANITIZE=address though, which would error on the "no effect on show" test before this change. Reported-by: Michael J Gruber <git@grubix.eu> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-13Merge branch 'ab/plug-random-leaks'Junio C Hamano
Plug random memory leaks. * ab/plug-random-leaks: repository.c: free the "path cache" in repo_clear() range-diff: plug memory leak in read_patches() range-diff: plug memory leak in common invocation lockfile API users: simplify and don't leak "path" commit-graph: stop fill_oids_from_packs() progress on error and free() commit-graph: fix memory leak in misused string_list API submodule--helper: fix trivial leak in module_add() transport: stop needlessly copying bundle header references bundle: call strvec_clear() on allocated strvec remote-curl.c: free memory in cmd_main() urlmatch.c: add and use a *_release() function diff.c: free "buf" in diff_words_flush() merge-base: free() allocated "struct commit **" list index-pack: fix memory leaks
2022-03-07Merge branch 'ac/usage-string-fixups'Junio C Hamano
Usage-string normalization. * ac/usage-string-fixups: amend remaining usage strings according to style guide
2022-03-04diff.c: free "buf" in diff_words_flush()Ævar Arnfjörð Bjarmason
Amend the freeing logic added in e6e045f8031 (diff.c: buffer all output if asked to, 2017-06-29) to free the containing "buf" in addition to its members. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25Merge branch 'ab/diff-free-more'Junio C Hamano
Leakfixes. * ab/diff-free-more: diff.[ch]: have diff_free() free options->parseopts diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec)
2022-02-23amend remaining usage strings according to style guideAbhradeep Chakraborty
Usage strings for git (sub)command flags has a style guide that suggests - first letter should not capitalized (unless required) and it should skip full-stop at the end of line. But there are some files where usage-strings do not follow the above mentioned guide. Amend the usage strings that don't follow the style convention/guide. Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16Merge branch 'js/diff-filter-negation-fix'Junio C Hamano
"git diff --diff-filter=aR" is now parsed correctly. * js/diff-filter-negation-fix: diff-filter: be more careful when looking for negative bits diff.c: move the diff filter bits definitions up a bit docs(diff): lose incorrect claim about `diff-files --diff-filter=A`
2022-02-16Merge branch 'en/remerge-diff'Junio C Hamano
"git log --remerge-diff" shows the difference from mechanical merge result and the result that is actually recorded in a merge commit. * en/remerge-diff: diff-merges: avoid history simplifications when diffing merges merge-ort: mark conflict/warning messages from inner merges as omittable show, log: include conflict/warning messages in --remerge-diff headers diff: add ability to insert additional headers for paths merge-ort: format messages slightly different for use in headers merge-ort: mark a few more conflict messages as omittable merge-ort: capture and print ll-merge warnings in our preferred fashion ll-merge: make callers responsible for showing warnings log: clean unneeded objects during `log --remerge-diff` show, log: provide a --remerge-diff capability
2022-02-16diff.[ch]: have diff_free() free options->parseoptsÆvar Arnfjörð Bjarmason
The "struct option" added in 4a288478394 (diff.c: prepare to use parse_options() for parsing, 2019-01-27) would be free'd in the case of diff_setup_done() being called. But not all codepaths that allocate it reach that, e.g. "t6427-diff3-conflict-markers.sh" will now free memory that it didn't free before. By using FREE_AND_NULL() here (which diff_setup_done() also does) we ensure that we free the memory, and that we won't have double-free's. Before this running: ./t6427-diff3-conflict-markers.sh -vixd --run=7 Would report: SUMMARY: LeakSanitizer: 7823 byte(s) leaked in 6 allocation(s). But now we'll report: SUMMARY: LeakSanitizer: 703 byte(s) leaked in 5 allocation(s). I.e. the largest leak in that particular test has now been addressed. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec)Ævar Arnfjörð Bjarmason
Have the diff_free() function call clear_pathspec(). Since the diff_flush() function calls this all its callers can be simplified to rely on it instead. When I added the diff_free() function in e900d494dcf (diff: add an API for deferred freeing, 2021-02-11) I simply missed this, or wasn't interested in it. Let's consolidate this now. This means that any future callers (and I've got revision.c in mind) that embed a "struct diff_options" can simply call diff_free() instead of needing know that it has an embedded pathspec. This does fix a bunch of leaks, but I can't mark any test here as passing under the SANITIZE=leak testing mode because in 886e1084d78 (builtin/: add UNLEAKs, 2017-10-01) an UNLEAK(rev) was added, which plasters over the memory leak. E.g. "t4011-diff-symlink.sh" would report fewer leaks with this fix, but because of the UNLEAK() reports none. I'll eventually loop around to removing that UNLEAK(rev) annotation as I'll fix deeper issues with the revisions API leaking. This is one small step on the way there, a new freeing function in revisions.c will want to call this diff_free(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02diff: add ability to insert additional headers for pathsElijah Newren
When additional headers are provided, we need to * add diff_filepairs to diff_queued_diff for each paths in the additional headers map which, unless that path is part of another diff_filepair already found in diff_queued_diff * format the headers (colorization, line_prefix for --graph) * make sure the various codepaths that attempt to return early if there are "no changes" take into account the headers that need to be shown. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-28diff-filter: be more careful when looking for negative bitsJohannes Schindelin
The `--diff-filter=<bits>` option allows to filter the diff by certain criteria, for example `R` to only show renamed files. It also supports negating a filter via a down-cased letter, i.e. `r` to show _everything but_ renamed files. However, the code is a bit overzealous when trying to figure out whether `git diff` should start with all diff-filters turned on because the user provided a lower-case letter: if the `--diff-filter` argument starts with an upper-case letter, we must not start with all bits turned on. Even worse, it is possible to specify the diff filters in multiple, separate options, e.g. `--diff-filter=AM [...] --diff-filter=m`. Let's accumulate the include/exclude filters independently, and only special-case the "only exclude filters were specified" case after parsing the options altogether. Note: The code replaced by this commit took pains to avoid setting any unused bits of `options->filter`. That was unnecessary, though, as all accesses happen via the `filter_bit_tst()` function using specific bits, and setting the unused bits has no effect. Therefore, we can simplify the code by using `~0` (or in this instance, `~<unwanted-bit>`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-28diff.c: move the diff filter bits definitions up a bitJohannes Schindelin
This prepares for a more careful handling of the `--diff-filter` options over the next few commits. This commit is best viewed with `--color-moved`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-10Merge branch 'ja/i18n-similar-messages'Junio C Hamano
Similar message templates have been consolidated so that translators need to work on fewer number of messages. * ja/i18n-similar-messages: i18n: turn even more messages into "cannot be used together" ones i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom" i18n: factorize "--foo outside a repository" i18n: refactor "unrecognized %(foo) argument" strings i18n: factorize "no directory given for --foo" i18n: factorize "--foo requires --bar" and the like i18n: tag.c factorize i18n strings i18n: standardize "cannot open" and "cannot read" i18n: turn "options are incompatible" into "cannot be used together" i18n: refactor "%s, %s and %s are mutually exclusive" i18n: refactor "foo and bar are mutually exclusive"
2022-01-05Merge branch 'pw/diff-color-moved-fix'Junio C Hamano
Correctness and performance update to "diff --color-moved" feature. * pw/diff-color-moved-fix: diff --color-moved: intern strings diff: use designated initializers for emitted_diff_symbol diff --color-moved-ws=allow-indentation-change: improve hash lookups diff --color-moved: stop clearing potential moved blocks diff --color-moved: shrink potential moved blocks as we go diff --color-moved: unify moved block growth functions diff --color-moved: call comparison function directly diff --color-moved-ws=allow-indentation-change: simplify and optimize diff: simplify allow-indentation-change delta calculation diff --color-moved: avoid false short line matches and bad zebra coloring diff --color-moved=zebra: fix alternate coloring diff --color-moved: rewind when discarding pmb diff --color-moved: factor out function diff --color-moved: clear all flags on blocks that are too short diff --color-moved: add perf tests
2022-01-05i18n: turn even more messages into "cannot be used together" onesJean-Noël Avila
Even if some of these messages are not subject to gettext i18n, this helps bring a single style of message for a given error type. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05i18n: refactor "%s, %s and %s are mutually exclusive"Jean-Noël Avila
Use placeholders for constant tokens. The strings are turned into "cannot be used together" Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: intern stringsPhillip Wood
Taking inspiration from xdl_classify_record() assign an id to each addition and deletion such that lines that match for the current --color-moved-ws mode share the same unique id. This reduces the number of hash lookups a little (calculating the ids still involves one hash lookup per line) but the main benefit is that when growing blocks of potentially moved lines we can replace string comparisons which involve chasing a pointer with a simple integer comparison. On a large diff this commit reduces the time to run 'diff --color-moved' by 37% compared to the previous commit and 31% compared to master, for 'diff --color-moved-ws=allow-indentation-change' the reduction is 28% compared to the previous commit and 96% compared to master. There is little change in the performance of 'git log --patch' as the diffs are smaller. Test HEAD^ HEAD --------------------------------------------------------------------------------------------------------------- 4002.1: diff --no-color-moved --no-color-moved-ws large change 0.38(0.33+0.05) 0.38(0.33+0.05) +0.0% 4002.2: diff --color-moved --no-color-moved-ws large change 0.88(0.81+0.06) 0.55(0.50+0.04) -37.5% 4002.3: diff --color-moved-ws=allow-indentation-change large change 0.85(0.79+0.06) 0.61(0.54+0.06) -28.2% 4002.4: log --no-color-moved --no-color-moved-ws 1.16(1.07+0.08) 1.15(1.09+0.05) -0.9% 4002.5: log --color-moved --no-color-moved-ws 1.31(1.22+0.08) 1.29(1.19+0.09) -1.5% 4002.6: log --color-moved-ws=allow-indentation-change 1.32(1.24+0.08) 1.31(1.18+0.13) -0.8% Test master HEAD --------------------------------------------------------------------------------------------------------------- 4002.1: diff --no-color-moved --no-color-moved-ws large change 0.38 (0.33+0.05) 0.38(0.33+0.05) +0.0% 4002.2: diff --color-moved --no-color-moved-ws large change 0.80 (0.75+0.04) 0.55(0.50+0.04) -31.2% 4002.3: diff --color-moved-ws=allow-indentation-change large change 14.20(14.15+0.05) 0.61(0.54+0.06) -95.7% 4002.4: log --no-color-moved --no-color-moved-ws 1.15 (1.05+0.09) 1.15(1.09+0.05) +0.0% 4002.5: log --color-moved --no-color-moved-ws 1.30 (1.19+0.11) 1.29(1.19+0.09) -0.8% 4002.6: log --color-moved-ws=allow-indentation-change 1.70 (1.63+0.06) 1.31(1.18+0.13) -22.9% Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff: use designated initializers for emitted_diff_symbolPhillip Wood
This makes it clearer which fields are being explicitly initialized and will simplify the next commit where we add a new field to the struct. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved-ws=allow-indentation-change: improve hash lookupsPhillip Wood
As libxdiff does not have a whitespace flag to ignore the indentation the code for --color-moved-ws=allow-indentation-change uses XDF_IGNORE_WHITESPACE and then filters out any hash lookups where there are non-indentation changes. This filtering is inefficient as we have to perform another string comparison. By using the offset data that we have already computed to skip the indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove the extra checks which improves the performance by 11% and paves the way for the elimination of string comparisons in the next commit. This change slightly increases the run time of other --color-moved modes. This could be avoided by using different comparison functions for the different modes but after the next two commits there is no measurable benefit in doing so. There is a change in behavior for lines that begin with a form-feed or vertical-tab character. Since b46054b374 ("xdiff: use git-compat-util", 2019-04-11) xdiff does not treat '\f' or '\v' as whitespace characters. This means that lines starting with those characters are never considered to be blank and never match a line that does not start with the same character. After this patch a line matching "^[\f\v\r]*[ \t]*$" is considered to be blank by --color-moved-ws=allow-indentation-change and lines beginning "^[\f\v\r]*[ \t]*" can match another line if the suffixes match. This changes the output of git show for d18f76dccf ("compat/regex: use the regex engine from gawk for compat", 2010-08-17) as some lines in the pre-image before a moved block that contain '\f' are now considered moved as well as they match a blank line before the moved lines in the post-image. This commit updates one of the tests to reflect this change. Test HEAD^ HEAD -------------------------------------------------------------------------------------------------------------- 4002.1: diff --no-color-moved --no-color-moved-ws large change 0.38(0.33+0.05) 0.38(0.33+0.05) +0.0% 4002.2: diff --color-moved --no-color-moved-ws large change 0.86(0.82+0.04) 0.88(0.84+0.04) +2.3% 4002.3: diff --color-moved-ws=allow-indentation-change large change 0.97(0.94+0.03) 0.86(0.81+0.05) -11.3% 4002.4: log --no-color-moved --no-color-moved-ws 1.16(1.07+0.09) 1.16(1.06+0.09) +0.0% 4002.5: log --color-moved --no-color-moved-ws 1.32(1.26+0.06) 1.33(1.27+0.05) +0.8% 4002.6: log --color-moved-ws=allow-indentation-change 1.35(1.29+0.06) 1.33(1.24+0.08) -1.5% Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: stop clearing potential moved blocksPhillip Wood
moved_block_clear() was introduced in 74d156f4a1 ("diff --color-moved-ws: fix double free crash", 2018-10-04) to free the memory that was allocated when initializing a potential moved block. However since 21536d077f ("diff --color-moved-ws: modify allow-indentation-change", 2018-11-23) initializing a potential moved block no longer allocates any memory. Up until the last commit we were relying on moved_block_clear() to set the `match` pointer to NULL when a block stopped matching, but since that commit we do not clear a moved block that does not match so it does not make sense to clear them elsewhere. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: shrink potential moved blocks as we goPhillip Wood
Rather than setting `match` to NULL and then looping over the list of potential matched blocks for a second time to remove blocks with no matches just filter out the blocks with no matches as we go. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: unify moved block growth functionsPhillip Wood
After the last two commits pmb_advance_or_null() and pmb_advance_or_null_multi_match() differ only in the comparison they perform. Lets simplify the code by combining them into a single function. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: call comparison function directlyPhillip Wood
This change will allow us to easily combine pmb_advance_or_null() and pmb_advance_or_null_multi_match() in the next commit. Calling xdiff_compare_lines() directly rather than using a function pointer from the hash map has little effect on the run time. Test HEAD^ HEAD ------------------------------------------------------------------------------------------------------------- 4002.1: diff --no-color-moved --no-color-moved-ws large change 0.38(0.35+0.03) 0.38(0.32+0.06) +0.0% 4002.2: diff --color-moved --no-color-moved-ws large change 0.87(0.83+0.04) 0.87(0.80+0.06) +0.0% 4002.3: diff --color-moved-ws=allow-indentation-change large change 0.97(0.92+0.04) 0.97(0.93+0.04) +0.0% 4002.4: log --no-color-moved --no-color-moved-ws 1.17(1.06+0.10) 1.16(1.10+0.05) -0.9% 4002.5: log --color-moved --no-color-moved-ws 1.32(1.24+0.08) 1.31(1.22+0.09) -0.8% 4002.6: log --color-moved-ws=allow-indentation-change 1.36(1.25+0.10) 1.35(1.25+0.10) -0.7% Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved-ws=allow-indentation-change: simplify and optimizePhillip Wood
If we already have a block of potentially moved lines then as we move down the diff we need to check if the next line of each potentially moved line matches the current line of the diff. The implementation of --color-moved-ws=allow-indentation-change was needlessly performing this check on all the lines in the diff that matched the current line rather than just the current line. To exacerbate the problem finding all the other lines in the diff that match the current line involves a fuzzy lookup so we were wasting even more time performing a second comparison to filter out the non-matching lines. Fixing this reduces time to run git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0 by 93% compared to master and simplifies the code. Test HEAD^ HEAD --------------------------------------------------------------------------------------------------------------- 4002.1: diff --no-color-moved --no-color-moved-ws large change 0.38 (0.35+0.03) 0.38(0.35+0.03) +0.0% 4002.2: diff --color-moved --no-color-moved-ws large change 0.86 (0.80+0.06) 0.87(0.83+0.04) +1.2% 4002.3: diff --color-moved-ws=allow-indentation-change large change 19.01(18.93+0.06) 0.97(0.92+0.04) -94.9% 4002.4: log --no-color-moved --no-color-moved-ws 1.16 (1.06+0.09) 1.17(1.06+0.10) +0.9% 4002.5: log --color-moved --no-color-moved-ws 1.32 (1.25+0.07) 1.32(1.24+0.08) +0.0% 4002.6: log --color-moved-ws=allow-indentation-change 1.71 (1.64+0.06) 1.36(1.25+0.10) -20.5% Test master HEAD --------------------------------------------------------------------------------------------------------------- 4002.1: diff --no-color-moved --no-color-moved-ws large change 0.38 (0.33+0.05) 0.38(0.35+0.03) +0.0% 4002.2: diff --color-moved --no-color-moved-ws large change 0.80 (0.75+0.04) 0.87(0.83+0.04) +8.7% 4002.3: diff --color-moved-ws=allow-indentation-change large change 14.20(14.15+0.05) 0.97(0.92+0.04) -93.2% 4002.4: log --no-color-moved --no-color-moved-ws 1.15 (1.05+0.09) 1.17(1.06+0.10) +1.7% 4002.5: log --color-moved --no-color-moved-ws 1.30 (1.19+0.11) 1.32(1.24+0.08) +1.5% 4002.6: log --color-moved-ws=allow-indentation-change 1.70 (1.63+0.06) 1.36(1.25+0.10) -20.0% Helped-by: Jeff King <peff@peff.net> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff: simplify allow-indentation-change delta calculationPhillip Wood
Now that we reliably end a block when the sign changes we don't need the whitespace delta calculation to rely on the sign. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: avoid false short line matches and bad zebra coloringPhillip Wood
When marking moved lines it is possible for a block of potential matched lines to extend past a change in sign when there is a sequence of added lines whose text matches the text of a sequence of deleted and added lines. Most of the time either `match` will be NULL or `pmb_advance_or_null()` will fail when the loop encounters a change of sign but there are corner cases where `match` is non-NULL and `pmb_advance_or_null()` successfully advances the moved block despite the change in sign. One consequence of this is highlighting a short line as moved when it should not be. For example -moved line # Correctly highlighted as moved +short line # Wrongly highlighted as moved context +moved line # Correctly highlighted as moved +short line context -short line The other consequence is coloring a moved addition following a moved deletion in the wrong color. In the example below the first "+moved line 3" should be highlighted as newMoved not newMovedAlternate. -moved line 1 # Correctly highlighted as oldMoved -moved line 2 # Correctly highlighted as oldMovedAlternate +moved line 3 # Wrongly highlighted as newMovedAlternate context # Everything else is highlighted correctly +moved line 2 +moved line 3 context +moved line 1 -moved line 3 These false matches are more likely when using --color-moved-ws with the exception of --color-moved-ws=allow-indentation-change which ties the sign of the current whitespace delta to the sign of the line to avoid this problem. The fix is to check that the sign of the new line being matched is the same as the sign of the line that started the block of potential matches. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved=zebra: fix alternate coloringPhillip Wood
b0a2ba4776 ("diff --color-moved=zebra: be stricter with color alternation", 2018-11-23) sought to avoid using the alternate colors unless there are two adjacent moved blocks of the same sign. Unfortunately it contains two bugs that prevented it from fixing the problem properly. Firstly `last_symbol` is reset at the start of each iteration of the loop losing the symbol of the last line and secondly when deciding whether to use the alternate color it should be checking if the current line is the same sign of the last line, not a different sign. The combination of the two errors means that we still use the alternate color when we should do but we also use it when we shouldn't. This is most noticable when using --color-moved-ws=allow-indentation-change with hunks like -this line gets indented + this line gets indented where the post image is colored with newMovedAlternate rather than newMoved. While this does not matter much, the next commit will change the coloring to be correct in this case, so lets fix the bug here to make it clear why the output is changing and add a regression test. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: rewind when discarding pmbPhillip Wood
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: factor out functionPhillip Wood
This code is quite heavily indented and having it in its own function simplifies an upcoming change. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09diff --color-moved: clear all flags on blocks that are too shortPhillip Wood
If a block of potentially moved lines is not long enough then the DIFF_SYMBOL_MOVED_LINE flag is cleared on the matching lines so they are not marked as moved. To avoid problems when we start rewinding after an unsuccessful match in a couple of commits time make sure all the move related flags are cleared, not just DIFF_SYMBOL_MOVED_LINE. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-26run-command API users: use strvec_push(), not argv constructionÆvar Arnfjörð Bjarmason
Change a pattern of hardcoding an "argv" array size, populating it and assigning to the "argv" member of "struct child_process" to instead use "strvec_push()" to add data to the "args" member. As noted in the preceding commit this moves us further towards being able to remove the "argv" member in a subsequent commit These callers could have used strvec_pushl(), but moving to strvec_push() makes the diff easier to read, and keeps the arguments aligned as before. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27*.[ch] *_INIT macros: use { 0 } for a "zero out" idiomÆvar Arnfjörð Bjarmason
In C it isn't required to specify that all members of a struct are zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will accomplish that. Let's also change code that provided N zero'd fields to just provide one, and change e.g. "{ NULL }" to "{ 0 }" for consistency. I.e. even if the first member is a pointer let's use "0" instead of "NULL". The point of using "0" consistently is to pick one, and to not have the reader wonder why we're not using the same pattern everywhere. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09diff: ignore sparse paths in diffstatDerrick Stolee
The diff_populate_filespec() method is used to describe the diff after a merge operation is complete. In order to avoid expanding a sparse index, the reuse_worktree_file() needs to be adapted to ignore files that are outside of the sparse-checkout cone. The file names and OIDs used for this check come from the merged tree in the case of the ORT strategy, not the index, hence the ability to look into these paths without having already expanded the index. The work done by reuse_worktree_file() is only an optimization, and requires the file being on disk for it to be of any value. Thus, it is safe to exit the method early if we do not expect the file on disk. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-06Merge branch 'ab/pickaxe-pcre2'Junio C Hamano
* ab/pickaxe-pcre2: diff: --pickaxe-all typofix
2021-08-04diff: --pickaxe-all typofixBagas Sanjaya
When I was fixing fuzzies as I updating po/id.po for 2.33.0 l10n round, I noticed a triple-dash typo (--pickaxe-all) at diff.c, which according to git-diff(1) manpage, the correct option name should be --pickaxe-all. Fix the typo. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-28Merge branch 'en/rename-limits-doc'Junio C Hamano
Documentation on "git diff -l<n>" and diff.renameLimit have been updated, and the defaults for these limits have been raised. * en/rename-limits-doc: rename: bump limit defaults yet again diffcore-rename: treat a rename_limit of 0 as unlimited doc: clarify documentation for rename/copy limits diff: correct warning message when renameLimit exceeded
2021-07-15rename: bump limit defaults yet againElijah Newren
These were last bumped in commit 92c57e5c1d29 (bump rename limit defaults (again), 2011-02-19), and were bumped both because processors had gotten faster, and because people were getting ugly merges that caused problems and reporting it to the mailing list (suggesting that folks were willing to spend more time waiting). Since that time: * Linus has continued recommending kernel folks to set diff.renameLimit=0 (maps to 32767, currently) * Folks with repositories with lots of renames were happy to set merge.renameLimit above 32767, once the code supported that, to get correct cherry-picks * Processors have gotten faster * It has been discovered that the timing methodology used last time probably used too large example files. The last point is probably worth explaining a bit more: * The "average" file size used appears to have been average blob size in the linux kernel history at the time (probably v2.6.25 or something close to it). * Since bigger files are modified more frequently, such a computation weights towards larger files. * Larger files may be more likely to be modified over time, but are not more likely to be renamed -- the mean and median blob size within a tree are a bit higher than the mean and median of blob sizes in the history leading up to that version for the linux kernel. * The mean blob size in v2.6.25 was half the average blob size in history leading to that point * The median blob size in v2.6.25 was about 40% of the mean blob size in v2.6.25. * Since the mean blob size is more than double the median blob size, any file as big as the mean will not be compared to any files of median size or less (because they'd be more than 50% dissimilar). * Since it is the number of files compared that provides the O(n^2) behavior, median-sized files should matter more than mean-sized ones. The combined effect of the above is that the file size used in past calculations was likely about 5x too large. Combine that with a CPU performance improvement of ~30%, and we can increase the limits by a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated time limits. Keeping the same approximate time limit probably makes sense for diff.renameLimit (there is no progress feedback in e.g. git log -p), but the experience above suggests merge.renameLimit could be extended significantly. In fact, it probably would make sense to have an unlimited default setting for merge.renameLimit, but that would likely need to be coupled with changes to how progress is displayed. (See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/ for details in that area.) For now, let's just bump the approximate time limit from 10s to 1m. (Note: We do not want to use actual time limits, because getting results that depend on how loaded your system is that day feels bad, and because we don't discover that we won't get all the renames until after we've put in a lot of work rather than just upfront telling the user there are too many files involved.) Using the original time limit of 2s for diff.renameLimit, and bumping merge.renameLimit from 10s to 60s, I found the following timings using the simple script at the end of this commit message (on an AWS c5.xlarge which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"): N Timing 1300 1.995s 7100 59.973s So let's round down to nice even numbers and bump the limits from 400->1000, and from 1000->7000. Here is the measure_rename_perf script (adapted from https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/ in particular to avoid triggering the linear handling from basename-guided rename detection): #!/bin/bash n=$1; shift rm -rf repo mkdir repo && cd repo git init -q -b main mkdata() { mkdir $1 for i in `seq 1 $2`; do (sed "s/^/$i /" <../sample echo tag: $1 ) >$1/$i done } mkdata initial $n git add . git commit -q -m initial mkdata new $n git add . cd new for i in *; do git mv $i $i.renamed; done cd .. git rm -q -rf initial git commit -q -m new time git diff-tree -M -l0 --summary HEAD^ HEAD Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-15diff: correct warning message when renameLimit exceededElijah Newren
The warning when quadratic rename detection was skipped referred to "inexact rename detection". For years, the only linear portion of rename detection was looking for exact renames, so "inexact rename detection" was an accurate way to refer to the quadratic portion of rename detection. However, that changed with commit bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14). Let's instead use the term "exhaustive rename detection" to refer to the quadratic portion. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-13Merge branch 'ab/pickaxe-pcre2'Junio C Hamano
Rewrite the backend for "diff -G/-S" to use pcre2 engine when available. * ab/pickaxe-pcre2: (22 commits) xdiff-interface: replace discard_hunk_line() with a flag xdiff users: use designated initializers for out_line pickaxe -G: don't special-case create/delete pickaxe -G: terminate early on matching lines xdiff-interface: allow early return from xdiff_emit_line_fn xdiff-interface: prepare for allowing early return pickaxe -S: slightly optimize contains() pickaxe: rename variables in has_changes() for brevity pickaxe -S: support content with NULs under --pickaxe-regex pickaxe: assert that we must have a needle under -G or -S pickaxe: refactor function selection in diffcore-pickaxe() perf: add performance test for pickaxe pickaxe/style: consolidate declarations and assignments diff.h: move pickaxe fields together again pickaxe: die when --find-object and --pickaxe-all are combined pickaxe: die when -G and --pickaxe-regex are combined pickaxe tests: add missing test for --no-pickaxe-regex being an error pickaxe tests: test for -G, -S and --find-object incompatibility pickaxe tests: add test for "log -S" not being a regex pickaxe tests: add test for diffgrep_consume() internals ...
2021-05-13Merge branch 'pw/word-diff-zero-width-matches'Junio C Hamano
The word-diff mode has been taught to work better with a word regexp that can match an empty string. * pw/word-diff-zero-width-matches: word diff: handle zero length matches
2021-05-11xdiff-interface: replace discard_hunk_line() with a flagÆvar Arnfjörð Bjarmason
Remove the dummy discard_hunk_line() function added in 3b40a090fd4 (diff: avoid generating unused hunk header lines, 2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for use along with the two existing and similar XDL_EMIT_* flags. Unlike the recently amended xdiff_emit_line_fn interface which'll be called in a loop in xdl_emit_diff(), the hunk header is only emitted once. It makes more sense to pass this as a flag than provide a dummy callback because that function may be able to skip doing certain work if it knows the caller is doing nothing with the hunk header. It would be possible to do so in the case of -U0 now, but the benefit of doing so is so small that I haven't bothered. But this leaves the door open to that, and more importantly makes the API use more intuitive. The reason we're putting a flag in the gap between 1<<0 and 1<<2 is that the old 1<<1 flag was removed in 907681e940d (xdiff: drop XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11xdiff-interface: prepare for allowing early returnÆvar Arnfjörð Bjarmason
Change the function prototype of xdiff_emit_line_fn to return an "int" instead of "void". Change all of those functions to "return 0", nothing checks those return values yet, and no behavior is being changed. In subsequent commits the interface will be changed to allow early return via this new return value. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe: die when --find-object and --pickaxe-all are combinedÆvar Arnfjörð Bjarmason
Neither the --pickaxe-all documentation nor --find-object's has ever suggested that you can combine the two. See f506b8e8b5 (git log/diff: add -G<regexp> that greps in the patch text, 2010-08-23) and 15af58c1ad (diffcore: add a pickaxe option to find a specific blob, 2018-01-04). But we've silently tolerated it, which makes the logic in diffcore_pickaxe() harder to reason about. Let's assert that we won't have the two combined. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe: die when -G and --pickaxe-regex are combinedÆvar Arnfjörð Bjarmason
When the -G and --pickaxe-regex options are combined we simply ignore the --pickaxe-regex option. Let's die instead as suggested by our documentation, since -G is always a regex. When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe matching regular expressions, 2006-03-29) only the -S option existed. Then when -G was added in f506b8e8b5 (git log/diff: add -G<regexp> that greps in the patch text, 2010-08-23) neither the documentation for --pickaxe-regex was updated accordingly, nor was something like this assertion added. Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly, 2013-05-31) we've claimed that --pickaxe-regex should only be used with -S, but have silently tolerated combining it with -G, let's die instead. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>