summaryrefslogtreecommitdiff
path: root/diff.c
AgeCommit message (Collapse)Author
2017-11-01diff: remove DIFF_OPT_CLR macroBrandon Williams
Remove the `DIFF_OPT_CLR` macro and instead set the flags directly. This conversion is done using the following semantic patch: @@ expression E; identifier fld; @@ - DIFF_OPT_CLR(&E, fld) + E.flags.fld = 0 @@ type T; T *ptr; identifier fld; @@ - DIFF_OPT_CLR(ptr, fld) + ptr->flags.fld = 0 Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: remove DIFF_OPT_SET macroBrandon Williams
Remove the `DIFF_OPT_SET` macro and instead set the flags directly. This conversion is done using the following semantic patch: @@ expression E; identifier fld; @@ - DIFF_OPT_SET(&E, fld) + E.flags.fld = 1 @@ type T; T *ptr; identifier fld; @@ - DIFF_OPT_SET(ptr, fld) + ptr->flags.fld = 1 Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: remove DIFF_OPT_TST macroBrandon Williams
Remove the `DIFF_OPT_TST` macro and instead access the flags directly. This conversion is done using the following semantic patch: @@ expression E; identifier fld; @@ - DIFF_OPT_TST(&E, fld) + E.flags.fld @@ type T; T *ptr; identifier fld; @@ - DIFF_OPT_TST(ptr, fld) + ptr->flags.fld Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: add flag to indicate textconv was set via cmdlineBrandon Williams
git-show is unique in that it wants to use textconv by default except for when it is showing blobs. When asked to show a blob, show doesn't want to use textconv unless the user explicitly requested that it be used by providing the command line flag '--textconv'. Currently this is done by using a parallel set of 'touched' flags which get set every time a particular flag is set or cleared. In a future patch we want to eliminate this parallel set of flags so instead of relying on if the textconv flag has been touched, add a new flag 'TEXTCONV_SET_VIA_CMDLINE' which is only set if textconv is set to true via the command line. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: convert flags to be stored in bitfieldsBrandon Williams
We cannot add many more flags to the diff machinery due to the limitations of the number of flags that can be stored in a single unsigned int. In order to allow for more flags to be added to the diff machinery in the future this patch converts the flags to be stored in bitfields in 'struct diff_flags'. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-29diff: fix lstat() error handling in diff_populate_filespec()Andrey Okoshkin
Add lstat() error handling not only for ENOENT case. Otherwise uninitialised 'struct stat st' variable is used later in case of lstat() non-ENOENT failure which leads to processing of rubbish values of file mode ('S_ISLNK(st.st_mode)' check) or size ('xsize_t(st.st_size)'). Signed-off-by: Andrey Okoshkin <a.okoshkin@samsung.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-27xdiff: reassign xpparm_t.flags bitsJunio C Hamano
We have packed the bits too tightly in such a way that it is not easy to add a new type of whitespace ignoring option, a new type of LCS algorithm, or a new type of post-cleanup heuristics. Reorder bits a bit to give room for these three classes of options to grow. Also make use of XDF_WHITESPACE_FLAGS macro where we check any of these bits are on, instead of using DIFF_XDL_TST() macro on individual possibilities. That way, the "is any of the bits on?" code does not have to change when we add more ways to ignore whitespaces. While at it, add a comment in front of the bit definitions to clarify in which structure these defined bits may appear. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-26diff.c: get rid of duplicate implementationStefan Beller
The implementations in diff.c to detect moved lines needs to compare strings and hash strings, which is implemented in that file, as well as in the xdiff library. Remove the rather recent implementation in diff.c and rely on the well exercised code in the xdiff lib. With this change the hash used for bucketing the strings for the moved line detection changes from FNV32 (that is provided via the hashmaps memhash) to DJB2 (which is used internally in xdiff). Benchmarks found on the web[1] do not indicate that these hashes are different in performance for readable strings. [1] https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-21diff: handle NULs in get_string_hash()Jeff King
For computing moved lines, we feed the characters of each line into a hash. When we've been asked to ignore whitespace, then we pick each character using next_byte(), which returns -1 on end-of-string, which it determines using the start/end pointers we feed it. However our check of its return value treats "0" the same as "-1", meaning we'd quit if the string has an embedded NUL. This is unlikely to ever come up in practice since our line boundaries generally come from calling strlen() in the first place. But it was a bit surprising to me as a reader of the next_byte() code. And it's possible that we may one day feed this function with more exotic input, which otherwise works with arbitrary ptr/len pairs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-21diff: fix whitespace-skipping with --color-movedJeff King
The code for handling whitespace with --color-moved represents partial strings as a pair of pointers. There are two possible conventions for the end pointer: 1. It points to the byte right after the end of the string. 2. It points to the final byte of the string. But we seem to use both conventions in the code: a. we assign the initial pointers from the NUL-terminated string using (1) b. we eat trailing whitespace by checking the second pointer for isspace(), which needs (2) c. the next_byte() function checks for end-of-string with "if (cp > endp)", which is (2) d. in next_byte() we skip past internal whitespace with "while (cp < end)", which is (1) This creates fewer bugs than you might think, because there are some subtle interactions. Because of (a) and (c), we always return the NUL-terminator from next_byte(). But all of the callers of next_byte() happen to handle that gracefully. Because of the mismatch between (d) and (c), next_byte() could accidentally return a whitespace character right at endp. But because of the interaction of (a) and (b), we fail to actually chomp trailing whitespace, meaning our endp _always_ points to a NUL, canceling out the problem. But that does leave (b) as a real bug: when ignoring whitespace only at the end-of-line, we don't correctly trim it, and fail to match up lines. We can fix the whole thing by moving consistently to one convention. Since convention (1) is idiomatic in our code base, we'll pick that one. The existing "-w" and "-b" tests continue to pass, and a new "--ignore-space-at-eol" shows off the breakage we're fixing. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-18Merge branch 'jk/ref-filter-colors-fix'Junio C Hamano
This is the "theoretically more correct" approach of simply stepping back to the state before plumbing commands started paying attention to "color.ui" configuration variable. Let's run with this one. * jk/ref-filter-colors-fix: tag: respect color.ui config Revert "color: check color.ui in git_default_config()" Revert "t6006: drop "always" color config tests" Revert "color: make "always" the same as "auto" in config"
2017-10-17Revert "color: check color.ui in git_default_config()"Jeff King
This reverts commit 136c8c8b8fa39f1315713248473dececf20f8fe7. That commit was trying to address a bug caused by 4c7f1819b3 (make color.ui default to 'auto', 2013-06-10), in which plumbing like diff-tree defaulted to "auto" color, but did not respect a "color.ui" directive to disable it. But it also meant that we started respecting "color.ui" set to "always". This was a known problem, but 4c7f1819b3 argued that nobody ought to be doing that. However, that turned out to be wrong, and we got a number of bug reports related to "add -p" regressing in v2.14.2. Let's revert 136c8c8b8, fixing the regression to "add -p". This leaves the problem from 4c7f1819b3 unfixed, but: 1. It's a pretty obscure problem in the first place. I only noticed it while working on the color code, and we haven't got a single bug report or complaint about it. 2. We can make a more moderate fix on top by respecting "never" but not "always" for plumbing commands. This is just the minimal fix to go back to the working state we had before v2.14.2. Note that this isn't a pure revert. We now have a test in t3701 which shows off the "add -p" regression. This can be flipped to success. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-17Merge branch 'sb/diff-color-move'Junio C Hamano
A recently added "--color-moved" feature of "diff" fell into infinite loop when ignoring whitespace changes, which has been fixed. * sb/diff-color-move: diff: fix infinite loop with --color-moved --ignore-space-change
2017-10-16diff: fix infinite loop with --color-moved --ignore-space-changeJeff King
The --color-moved code uses next_byte() to advance through the blob contents. When the user has asked to ignore whitespace changes, we try to collapse any whitespace change down to a single space. However, we enter the conditional block whenever we see the IGNORE_WHITESPACE_CHANGE flag, even if the next byte isn't whitespace. This means that the combination of "--color-moved and --ignore-space-change" was completely broken. Worse, because we return from next_byte() without having advanced our pointer, the function makes no forward progress in the buffer and loops infinitely. Fix this by entering the conditional only when we actually see whitespace. We can apply this also to the IGNORE_WHITESPACE change. That code path isn't buggy (because it falls through to returning the next non-whitespace byte), but it makes the logic more clear if we only bother to look at whitespace flags after seeing that the next byte is whitespace. Reported-by: Orgad Shaneh <orgads@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-03Merge branch 'sb/diff-color-move'Junio C Hamano
The output from "git diff --summary" was broken in a recent topic that has been merged to 'master' and lost a LF after reporting of mode change. This has been fixed. * sb/diff-color-move: diff: correct newline in summary for renamed files
2017-09-29Merge branch 'rj/no-sign-compare'Junio C Hamano
Many codepaths have been updated to squelch -Wsign-compare warnings. * rj/no-sign-compare: ALLOC_GROW: avoid -Wsign-compare warnings cache.h: hex2chr() - avoid -Wsign-compare warnings commit-slab.h: avoid -Wsign-compare warnings git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
2017-09-28diff: correct newline in summary for renamed filesStefan Beller
In 146fdb0dfe (diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY, 2017-06-29), the conversion from direct printing to the symbol emission dropped the new line character for renamed, copied and rewritten files. Add the emission of a newline, add a test for this case. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-25Merge branch 'jk/write-in-full-fix'Junio C Hamano
Many codepaths did not diagnose write failures correctly when disks go full, due to their misuse of write_in_full() helper function, which have been corrected. * jk/write-in-full-fix: read_pack_header: handle signed/unsigned comparison in read result config: flip return value of store_write_*() notes-merge: use ssize_t for write_in_full() return value pkt-line: check write_in_full() errors against "< 0" convert less-trivial versions of "write_in_full() != len" avoid "write_in_full(fd, buf, len) != len" pattern get-tar-commit-id: check write_in_full() return against 0 config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-22ALLOC_GROW: avoid -Wsign-compare warningsRamsay Jones
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-19Merge branch 'rs/strbuf-leakfix'Junio C Hamano
Many leaks of strbuf have been fixed. * rs/strbuf-leakfix: (34 commits) wt-status: release strbuf after use in wt_longstatus_print_tracking() wt-status: release strbuf after use in read_rebase_todolist() vcs-svn: release strbuf after use in end_revision() utf8: release strbuf on error return in strbuf_utf8_replace() userdiff: release strbuf after use in userdiff_get_textconv() transport-helper: release strbuf after use in process_connect_service() sequencer: release strbuf after use in save_head() shortlog: release strbuf after use in insert_one_record() sha1_file: release strbuf on error return in index_path() send-pack: release strbuf on error return in send_pack() remote: release strbuf after use in set_url() remote: release strbuf after use in migrate_file() remote: release strbuf after use in read_remote_branches() refs: release strbuf on error return in write_pseudoref() notes: release strbuf after use in notes_copy_from_stdin() merge: release strbuf after use in write_merge_heads() merge: release strbuf after use in save_state() mailinfo: release strbuf on error return in handle_boundary() mailinfo: release strbuf after use in handle_from() help: release strbuf on error return in exec_woman_emacs() ...
2017-09-14avoid "write_in_full(fd, buf, len) != len" patternJeff King
The return value of write_in_full() is either "-1", or the requested number of bytes[1]. If we make a partial write before seeing an error, we still return -1, not a partial value. This goes back to f6aa66cb95 (write_in_full: really write in full or return error on disk full., 2007-01-11). So checking anything except "was the return value negative" is pointless. And there are a couple of reasons not to do so: 1. It can do a funny signed/unsigned comparison. If your "len" is signed (e.g., a size_t) then the compiler will promote the "-1" to its unsigned variant. This works out for "!= len" (unless you really were trying to write the maximum size_t bytes), but is a bug if you check "< len" (an example of which was fixed recently in config.c). We should avoid promoting the mental model that you need to check the length at all, so that new sites are not tempted to copy us. 2. Checking for a negative value is shorter to type, especially when the length is an expression. 3. Linus says so. In d34cf19b89 (Clean up write_in_full() users, 2007-01-11), right after the write_in_full() semantics were changed, he wrote: I really wish every "write_in_full()" user would just check against "<0" now, but this fixes the nasty and stupid ones. Appeals to authority aside, this makes it clear that writing it this way does not have an intentional benefit. It's a historical curiosity that we never bothered to clean up (and which was undoubtedly cargo-culted into new sites). So let's convert these obviously-correct cases (this includes write_str_in_full(), which is just a wrapper for write_in_full()). [1] A careful reader may notice there is one way that write_in_full() can return a different value. If we ask write() to write N bytes and get a return value that is _larger_ than N, we could return a larger total. But besides the fact that this would imply a totally broken version of write(), it would already invoke undefined behavior. Our internal remaining counter is an unsigned size_t, which means that subtracting too many byte will wrap it around to a very large number. So we'll instantly begin reading off the end of the buffer, trying to write gigabytes (or petabytes) of data. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06diff: release strbuf after use in show_stats()Rene Scharfe
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06diff: release strbuf after use in show_rename_copy()Rene Scharfe
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06diff: release strbuf after use in diff_summary()Rene Scharfe
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06tempfile: auto-allocate tempfiles on heapJeff King
The previous commit taught the tempfile code to give up ownership over tempfiles that have been renamed or deleted. That makes it possible to use a stack variable like this: struct tempfile t; create_tempfile(&t, ...); ... if (!err) rename_tempfile(&t, ...); else delete_tempfile(&t); But doing it this way has a high potential for creating memory errors. The tempfile we pass to create_tempfile() ends up on a global linked list, and it's not safe for it to go out of scope until we've called one of those two deactivation functions. Imagine that we add an early return from the function that forgets to call delete_tempfile(). With a static or heap tempfile variable, the worst case is that the tempfile hangs around until the program exits (and some functions like setup_shallow_temporary rely on this intentionally, creating a tempfile and then leaving it for later cleanup). But with a stack variable as above, this is a serious memory error: the variable goes out of scope and may be filled with garbage by the time the tempfile code looks at it. Let's see if we can make it harder to get this wrong. Since many callers need to allocate arbitrary numbers of tempfiles, we can't rely on static storage as a general solution. So we need to turn to the heap. We could just ask all callers to pass us a heap variable, but that puts the burden on them to call free() at the right time. Instead, let's have the tempfile code handle the heap allocation _and_ the deallocation (when the tempfile is deactivated and removed from the list). This changes the return value of all of the creation functions. For the cleanup functions (delete and rename), we'll add one extra bit of safety: instead of taking a tempfile pointer, we'll take a pointer-to-pointer and set it to NULL after freeing the object. This makes it safe to double-call functions like delete_tempfile(), as the second call treats the NULL input as a noop. Several callsites follow this pattern. The resulting patch does have a fair bit of noise, as each caller needs to be converted to handle: 1. Storing a pointer instead of the struct itself. 2. Passing the pointer instead of taking the struct address. 3. Handling a "struct tempfile *" return instead of a file descriptor. We could play games to make this less noisy. For example, by defining the tempfile like this: struct tempfile { struct heap_allocated_part_of_tempfile { int fd; ...etc } *actual_data; } Callers would continue to have a "struct tempfile", and it would be "active" only when the inner pointer was non-NULL. But that just makes things more awkward in the long run. There aren't that many callers, so we can simply bite the bullet and adjust all of them. And the compiler makes it easy for us to find them all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06tempfile: do not delete tempfile on failed closeJeff King
When close_tempfile() fails, we delete the tempfile and reset the fields of the tempfile struct. This makes it easier for callers to return without cleaning up, but it also makes this common pattern: if (close_tempfile(tempfile)) return error_errno("error closing %s", tempfile->filename.buf); wrong, because the "filename" field has been reset after the failed close. And it's not easy to fix, as in many cases we don't have another copy of the filename (e.g., if it was created via one of the mks_tempfile functions, and we just have the original template string). Let's drop the feature that a failed close automatically deletes the file. This puts the burden on the caller to do the deletion themselves, but this isn't that big a deal. Callers which do: if (write(...) || close_tempfile(...)) { delete_tempfile(...); return -1; } already had to call delete when the write() failed, and so aren't affected. Likewise, any caller which just calls die() in the error path is OK; we'll delete the tempfile during the atexit handler. Because this patch changes the semantics of close_tempfile() without changing its signature, all callers need to be manually checked and converted to the new scheme. This patch covers all in-tree callers, but there may be others for not-yet-merged topics. To catch these, we rename the function to close_tempfile_gently(), which will attract compile-time attention to new callers. (Technically the original could be considered "gentle" already in that it didn't die() on errors, but this one is even more so). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06always check return value of close_tempfileJeff King
If close_tempfile() encounters an error, then it deletes the tempfile and resets the "struct tempfile". But many code paths ignore the return value and continue to use the tempfile. Instead, we should generally treat this the same as a write() error. Note that in the postimage of some of these cases our error message will be bogus after a failed close because we look at tempfile->filename (either directly or via get_tempfile_path). But after the failed close resets the tempfile object, this is guaranteed to be the empty string. That will be addressed in a future patch (because there are many more cases of the same problem than just these instances). Note also in the hunk in gpg-interface.c that it's fine to call delete_tempfile() in the error path, even if close_tempfile() failed and already deleted the file. The tempfile code is smart enough to know the second deletion is a noop. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-27Merge branch 'jt/packmigrate'Junio C Hamano
Code movement to make it easier to hack later. * jt/packmigrate: (23 commits) pack: move for_each_packed_object() pack: move has_pack_index() pack: move has_sha1_pack() pack: move find_pack_entry() and make it global pack: move find_sha1_pack() pack: move find_pack_entry_one(), is_pack_valid() pack: move check_pack_index_ptr(), nth_packed_object_offset() pack: move nth_packed_object_{sha1,oid} pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry() pack: move unpack_object_header() pack: move get_size_from_delta() pack: move unpack_object_header_buffer() pack: move {,re}prepare_packed_git and approximate_object_count pack: move install_packed_git() pack: move add_packed_git() pack: move unuse_pack() pack: move use_pack() pack: move pack-closing functions pack: move release_pack_memory() pack: move open_pack_index(), parse_pack_index() ...
2017-08-27Merge branch 'bw/submodule-config-cleanup'Junio C Hamano
Code clean-up to avoid mixing values read from the .gitmodules file and values read from the .git/config file. * bw/submodule-config-cleanup: submodule: remove gitmodules_config unpack-trees: improve loading of .gitmodules submodule-config: lazy-load a repository's .gitmodules file submodule-config: move submodule-config functions to submodule-config.c submodule-config: remove support for overlaying repository config diff: stop allowing diff to have submodules configured in .git/config submodule: remove submodule_config callback routine unpack-trees: don't respect submodule.update submodule: don't rely on overlayed config when setting diffopts fetch: don't overlay config with submodule-config submodule--helper: don't overlay config in update-clone submodule--helper: don't overlay config in remote_submodule_branch add, reset: ensure submodules can be added or reset submodule: don't use submodule_from_name t7411: check configuration parsing errors
2017-08-27Merge branch 'po/object-id'Junio C Hamano
* po/object-id: sha1_file: convert index_stream to struct object_id sha1_file: convert hash_sha1_file_literally to struct object_id sha1_file: convert index_fd to struct object_id sha1_file: convert index_path to struct object_id read-cache: convert to struct object_id builtin/hash-object: convert to struct object_id
2017-08-27Merge branch 'jt/diff-color-move-fix'Junio C Hamano
A handful of bugfixes and an improvement to "diff --color-moved". * jt/diff-color-move-fix: diff: define block by number of alphanumeric chars diff: respect MIN_BLOCK_LENGTH for last block diff: avoid redundantly clearing a flag
2017-08-27Merge branch 'sb/diff-color-move'Junio C Hamano
"git diff" has been taught to optionally paint new lines that are the same as deleted lines elsewhere differently from genuinely new lines. * sb/diff-color-move: (25 commits) diff: document the new --color-moved setting diff.c: add dimming to moved line detection diff.c: color moved lines differently, plain mode diff.c: color moved lines differently diff.c: buffer all output if asked to diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP diff.c: convert word diffing to use emit_diff_symbol diff.c: convert show_stats to use emit_diff_symbol diff.c: convert emit_binary_diff_body to use emit_diff_symbol submodule.c: migrate diff output to use emit_diff_symbol diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS} diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN] diff.c: migrate emit_line_checked to use emit_diff_symbol diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO ...
2017-08-23pack: move has_sha1_pack()Jonathan Tan
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21sha1_file: convert index_path to struct object_idPatryk Obara
Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-17diff: retire sane_truncate_fnJunio C Hamano
Long time ago, 23707811 ("diff: do not chomp hunk-header in the middle of a character", 2008-01-02) introduced sane_truncate_line() helper function to trim the "function header" line that is shown at the end of the hunk header line, in order to avoid chomping it in the middle of a single UTF-8 character. It also added a facility to define a custom callback function to make it possible to extend it to non UTF-8 encodings. During the following 8 1/2 years, nobody found need for this custom callback facility. A custom callback function is a wrong design to use here anyway---if your contents need support for non UTF-8 encoding, you shouldn't have to write a custom function and recompile Git to plumb it in. A better approach would be to extend sane_truncate_line() function and have a new member in emit_callback to conditionally trigger it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-16diff: define block by number of alphanumeric charsJonathan Tan
The existing behavior of diff --color-moved=zebra does not define the minimum size of a block at all, instead relying on a heuristic applied later to filter out sets of adjacent moved lines that are shorter than 3 lines long. This can be confusing, because a block could thus be colored as moved at the source but not at the destination (or vice versa), depending on its neighbors. Instead, teach diff that the minimum size of a block is 20 alphanumeric characters, the same heuristic used by "git blame". This allows diff to still exclude uninteresting lines appearing on their own (such as those solely consisting of one or a few closing braces), as was the intention of the adjacent-moved-line heuristic. This requires a change in some tests in that some of their lines are no longer considered to be part of a block, because they are too short. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-16diff: respect MIN_BLOCK_LENGTH for last blockJonathan Tan
Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line that does not belong to the current block. In particular, this means that MIN_BLOCK_LENGTH is not checked after all lines are encountered. Perform that check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-14diff: avoid redundantly clearing a flagJonathan Tan
No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in mark_color_as_moved(), so it is redundant to clear it for the current line. Therefore, clear it only for previous lines. This makes a refactoring in a subsequent patch easier. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-03diff: stop allowing diff to have submodules configured in .git/configBrandon Williams
Traditionally a submodule is comprised of a gitlink as well as a corresponding entry in the .gitmodules file. Diff doesn't follow this paradigm as its config callback routine falls back to populating the submodule-config if a config entry starts with 'submodule.'. Remove this behavior in order to be consistent with how the submodule-config is populated, via calling 'gitmodules_config()' or 'repo_read_gitmodules()'. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-13color: check color.ui in git_default_config()Jeff King
Back in prehistoric times, our decision on whether or not to show color by default relied on using a config callback that either did or didn't load color config like color.diff. When we introduced color.ui, we put it in the same boat: commands had to manually respect it by using git_color_config() or its git_color_default_config() convenience wrapper. But in 4c7f1819b (make color.ui default to 'auto', 2013-06-10), that changed. Since then, we default color.ui to auto in all programs, meaning that even plumbing commands like "git diff-tree --pretty" might colorize the output. Nobody seems to have complained in the intervening years, presumably because the "is stdout a tty" check does a good job of catching the right cases. But that leaves an interesting curiosity: color.ui defaults to auto even in plumbing, but you can't actually _disable_ the color via config. So if you really hate color and set "color.ui" to false, diff-tree will still show color (but porcelain like git-diff won't). Nobody noticed that either, probably because very few people disable color. One could argue that the plumbing should _always_ disable color unless an explicit --color option is given on the command line. But in practice, this creates a lot of complications for scripts which do want plumbing to show user-visible output. They can't just pass "--color" blindly; they need to check the user's config and decide what to send. Given that nobody has complained about the current behavior, let's assume it's a good path, and follow it to its conclusion: supporting color.ui everywhere. Note that you can create havoc by setting color.ui=always in your config, but that's more or less already the case. We could disallow it entirely, but it is handy for one-offs like: git -c color.ui=always foo >not-a-tty when "foo" does not take a --color option itself. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-12Merge branch 'rs/use-div-round-up'Junio C Hamano
Code cleanup. * rs/use-div-round-up: use DIV_ROUND_UP
2017-07-10use DIV_ROUND_UPRené Scharfe
Convert code that divides and rounds up to use DIV_ROUND_UP to make the intent clearer and reduce the number of magic constants. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: add dimming to moved line detectionStefan Beller
Any lines inside a moved block of code are not interesting. Boundaries of blocks are only interesting if they are next to another block of moved code. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: color moved lines differently, plain modeStefan Beller
Add the 'plain' mode for move detection of code. This omits the checking for adjacent blocks, so it is not as useful. If you have a lot of the same blocks moved in the same patch, the 'Zebra' would end up slow as it is O(n^2) (n is number of same blocks). So this may be useful there and is generally easy to add. Instead be very literal at the move detection, do not skip over short blocks here. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: color moved lines differentlyStefan Beller
When a patch consists mostly of moving blocks of code around, it can be quite tedious to ensure that the blocks are moved verbatim, and not undesirably modified in the move. To that end, color blocks that are moved within the same patch differently. For example (OM, del, add, and NM are different colors): [OM] -void sensitive_stuff(void) [OM] -{ [OM] - if (!is_authorized_user()) [OM] - die("unauthorized"); [OM] - sensitive_stuff(spanning, [OM] - multiple, [OM] - lines); [OM] -} void another_function() { [del] - printf("foo"); [add] + printf("bar"); } [NM] +void sensitive_stuff(void) [NM] +{ [NM] + if (!is_authorized_user()) [NM] + die("unauthorized"); [NM] + sensitive_stuff(spanning, [NM] + multiple, [NM] + lines); [NM] +} However adjacent blocks may be problematic. For example, in this potentially malicious patch, the swapping of blocks can be spotted: [OM] -void sensitive_stuff(void) [OM] -{ [OMA] - if (!is_authorized_user()) [OMA] - die("unauthorized"); [OM] - sensitive_stuff(spanning, [OM] - multiple, [OM] - lines); [OMA] -} void another_function() { [del] - printf("foo"); [add] + printf("bar"); } [NM] +void sensitive_stuff(void) [NM] +{ [NMA] + sensitive_stuff(spanning, [NMA] + multiple, [NMA] + lines); [NM] + if (!is_authorized_user()) [NM] + die("unauthorized"); [NMA] +} If the moved code is larger, it is easier to hide some permutation in the code, which is why some alternative coloring is needed. This patch implements the first mode: * basic alternating 'Zebra' mode This conveys all information needed to the user. Defer customization to later patches. First I implemented an alternative design, which would try to fingerprint a line by its neighbors to detect if we are in a block or at the boundary. This idea iss error prone as it inspected each line and its neighboring lines to determine if the line was (a) moved and (b) if was deep inside a hunk by having matching neighboring lines. This is unreliable as the we can construct hunks which have equal neighbors that just exceed the number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter as a line, that is permutated to AXYZCXYZBXYZD..'). Instead this provides a dynamic programming greedy algorithm that finds the largest moved hunk and then has several modes on highlighting bounds. A note on the options '--submodule=diff' and '--color-words/--word-diff': In the conversion to use emit_line in the prior patches both submodules as well as word diff output carefully chose to call emit_line with sign=0. All output with sign=0 is ignored for move detection purposes in this patch, such that no weird looking output will be generated for these cases. This leads to another thought: We could pass on '--color-moved' to submodules such that they color up moved lines for themselves. If we'd do so only line moves within a repository boundary are marked up. It is useful to have moved lines colored, but there are annoying corner cases, such as a single line moved, that is very common. For example in a typical patch of C code, we have closing braces that end statement blocks or functions. While it is technically true that these lines are moved as they show up elsewhere, it is harmful for the review as the reviewers attention is drawn to such a minor side annoyance. For now let's have a simple solution of hardcoding the number of moved lines to be at least 3 before coloring them. Note, that the length is applied across all blocks to find the 'lonely' blocks that pollute new code, but do not interfere with a permutated block where each permutation has less lines than 3. Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: buffer all output if asked toStefan Beller
Introduce a new option 'emitted_symbols' in the struct diff_options which controls whether all output is buffered up until all output is available. It is set internally in diff.c when necessary. We'll have a new struct 'emitted_string' in diff.c which will be used to buffer each line. The emitted_string will duplicate the memory of the line to buffer as that is easiest to reason about for now. In a future patch we may want to decrease the memory usage by not duplicating all output for buffering but rather we may want to store offsets into the file or in case of hunk descriptions such as the similarity score, we could just store the relevant number and reproduce the text later on. This approach was chosen as a first step because it is quite simple compared to the alternative with less memory footprint. emit_diff_symbol factors out the emission part and depending on the diff_options->emitted_symbols the emission will be performed directly when calling emit_diff_symbol or after the whole process is done, i.e. by buffering we have add the possibility for a second pass over the whole output before doing the actual output. In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with word-diff) we introduced a duplicate diff options struct for word emissions as we may have different regex settings in there. When buffering the output, we need to operate on just one buffer, so we have to copy back the emissions of the word buffer into the main buffer. Unconditionally enable output via buffer in this patch as it yields a great opportunity for testing, i.e. all the diff tests from the test suite pass without having reordering issues (i.e. only parts of the output got buffered, and we forgot to buffer other parts). The test suite passes, which gives confidence that we converted all functions to use emit_string for output. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARYStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEPStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: convert word diffing to use emit_diff_symbolStefan Beller
The word diffing is not line oriented and would need some serious effort to be transformed into a line oriented approach, so just go with a symbol DIFF_SYMBOL_WORD_DIFF that is a partial line. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30diff.c: convert show_stats to use emit_diff_symbolStefan Beller
We call print_stat_summary from builtin/apply, so we still need the version with a file pointer, so introduce print_stat_summary_0 that uses emit_string machinery and keep print_stat_summary with the same arguments around. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>