summaryrefslogtreecommitdiff
path: root/diff.c
AgeCommit message (Collapse)Author
2006-04-04Support for pickaxe matching regular expressionsPetr Baudis
git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz>
2006-03-30tree/diff header cleanup.Junio C Hamano
Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-28xdiff: Show function names in hunk headers.Mark Wooding
The speed of the built-in diff generator is nice; but the function names shown by `diff -p' are /really/ nice. And I hate having to choose. So, we hack xdiff to find the function names and print them. xdiff has grown a flag to say whether to dig up the function names. The builtin_diff function passes this flag unconditionally. I suppose it could parse GIT_DIFF_OPTS, but it doesn't at the moment. I've also reintroduced the `function name' into the test suite, from which it was removed in commit 3ce8f089. The function names are parsed by a particularly stupid algorithm at the moment: it just tries to find a line in the `old' file, from before the start of the hunk, whose first character looks plausible. Still, it's most definitely a start. Signed-off-by: Mark Wooding <mdw@distorted.org.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-26Merge branch 'lt/diffgen' into nextJunio C Hamano
* lt/diffgen: true built-in diff: run everything in-core.
2006-03-26true built-in diff: run everything in-core.Junio C Hamano
This stops using temporary files when we are using the built-in diff (including the complete rewrite). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-26Merge branch 'lt/diffgen' into nextJunio C Hamano
* lt/diffgen: built-in diff: minimum tweaks builtin-diff: \No newline at end of file. Use a *real* built-in diff generator
2006-03-26built-in diff: minimum tweaksJunio C Hamano
This fixes up a couple of minor issues with the real built-in diff to be more usable: - Omit ---/+++ header unless we emit diff output; - Detect and punt binary diff like GNU does; - Honor GIT_DIFF_OPTS minimally (only -u<number> and --unified=<number> are currently supported); - Omit line count of 1 from "@@ -l,k +m,n @@" hunk header (i.e. when k == 1 or n == 1) - Adjust testsuite for the lack of -p support. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-26Use a *real* built-in diff generatorLinus Torvalds
This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-12diffcore-rename: somewhat optimized.Junio C Hamano
This changes diffcore-rename to reuse statistics information gathered during similarity estimation, and updates the hashtable implementation used to keep track of the statistics to be denser. This seems to give better performance. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-27Make git diff-generation use a simpler spawn-like interfaceLinus Torvalds
Instead of depending of fork() and execve() and doing things in between the two, make the git diff functions do everything up front, and then do a single "spawn_prog()" invocation to run the actual external diff program (if any is even needed). This actually ends up simplifying the code, and should make it much easier to make it efficient under broken operating systems (read: Windows). Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22Merge branch 'jc/nostat'Junio C Hamano
* jc/nostat: cache_name_compare() compares name and stage, nothing else. "assume unchanged" git: documentation. ls-files: split "show-valid-bit" into a different option. "Assume unchanged" git: --really-refresh fix. ls-files: debugging aid for CE_VALID changes. "Assume unchanged" git: do not set CE_VALID with --refresh "Assume unchanged" git
2006-02-10find_unique_abbrev() simplification.Junio C Hamano
Earlier it did not grok the 0{40} SHA1 very well, but what it needed to do was to find the shortest 0{N} that is not used as a valid object name to be consistent with the way names of valid objects are abbreviated. This makes some users simpler. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-09"Assume unchanged" gitJunio C Hamano
This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-02Allow diff and index commands to be interruptedPetr Baudis
So far, e.g. git-update-index --refresh was basically uninterruptable by ctrl-c, since it hooked the SIGINT handler, but that handler would only unlink the lockfile but not actually quit. This makes it propagate the signal to the default handler. Note that I expected it to work without resetting the signal handler to SIG_DFL, but without that it ended in an infinite loop of tgkill()s - is my glibc violating SUS or what? Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28diff --abbrev=<n> option fix.Junio C Hamano
Earier specifying an abbreviation shorter than minimum fell back to full 40 letters, which was nonsense. Make it to fall back to the minimum number (currently 4). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28abbrev cleanup: use symbolic constantsJunio C Hamano
The minimum length of abbreviated object name was hardcoded in different places to be 4, risking inconsistencies in the future. Also there were three different "default abbreviation precision". Use two C preprocessor symbols to clean up this mess. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-29code comments: spellJunio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-27Handle symlinks graciouslyJohannes Schindelin
This patch converts a stat() to an lstat() call, thereby fixing the case when the date of a symlink was not the same as the one recorded in the index. The included test case demonstrates this. This is for the case that the symlink points to a non-existing file. If the file exists, worse things than just an error message happen. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-27avoid asking ?alloc() for zero bytes.Junio C Hamano
Avoid asking for zero bytes when that change simplifies overall logic. Later we would change the wrapper to ask for 1 byte on platforms that return NULL for zero byte request. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-20diff: --abbrev optionJunio C Hamano
When I show transcripts to explain how something works, I often find myself hand-editing the diff-raw output to shorten various object names in the output. This adds --abbrev option to the diff family, which shortens diff-raw output and diff-tree commit id headers. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-22Move diff.renamelimit out of default configuration.Junio C Hamano
Otherwise we would end up linking all the unneeded stuff into git-daemon only to link with git_default_config. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-21rename/copy score parsing updates.H. Peter Anvin
Better variant, which handles stuff like "4.5%" and rejects "192.168.0.1". Additionally, make sure numbers are unsigned (I'm making them unsigned long just for the hell of it), to make sure that artificial wraparound scenarios don't cause harm. -hpa [jc: with this, -M100 changes its meaning back to 10%. People wanting to say "pure renames only" should now say -M100% or -M1.0; sounds a bit like an earthquake, but arguably things are more consistent this way ;-)] Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-21rename detection with -M100 means "exact renames only".Junio C Hamano
When the user is interested in pure renames, there is no point doing the similarity scores. This changes the score argument parsing to special case -M100 (otherwise, it is a precision scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you do mean 0.1, you can say -M1), and optimizes the diffcore_rename transformation to only look at pure renames in that case. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-17diff: --full-indexJunio C Hamano
A new option, --full-index, is introduced to diff family. This causes the full object name of pre- and post-images to appear on the index line of patch formatted output, to be used in conjunction with --allow-binary-replacement option of git-apply. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-15diff: make default rename detection limit configurable.Junio C Hamano
A while ago, a rename-detection limit logic was implemented as a response to this thread: http://marc.theaimsgroup.com/?l=git&m=112413080630175 where gitweb was found to be using a lot of time and memory to detect renames on huge commits. git-diff family takes -l<num> flag, and if the number of paths that are rename destination candidates (i.e. new paths with -M, or modified paths with -C) are larger than that number, skips rename/copy detection even when -M or -C is specified on the command line. This commit makes the rename detection limit easier to use. You can have: [diff] renamelimit = 30 in your .git/config file to specify the default rename detection limit. You can override this from the command line; giving 0 means 'unlimited': git diff -M -l0 We might want to change the default behaviour, when you do not have the configuration, to limit it to say 20 paths or so. This would also help the diffstat generation after a big 'git pull'. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-23Split up tree diff functions into tree-diff.c libraryLinus Torvalds
This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-18Handle "-" at beginning of filenames, part 3Linus Torvalds
This fixes the default built-in exec() of "diff" to add a "--" before the filenames, so that if a filename starts with a "-", the diff program won't think it's an option. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-18Update git-diff-* to use C-style quoting for funny pathnames.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-07Show original and resulting blob object info in diff output.Junio C Hamano
This adds more cruft to diff --git header to record the blob SHA1 and the mode the patch/diff is intended to be applied against, to help the receiving end fall back on a three-way merge. The new header looks like this: diff --git a/apply.c b/apply.c index 7be5041..8366082 100644 --- a/apply.c +++ b/apply.c @@ -14,6 +14,7 @@ // files that are being modified, but doesn't apply the patch // --stat does just a diffstat, and doesn't actually apply +// --show-index-info shows the old and new index info for... ... Upon receiving such a patch, if the patch did not apply cleanly to the target tree, the recipient can try to find the matching old objects in her object database and create a temporary tree, apply the patch to that temporary tree, and attempt a 3-way merge between the patched temporary tree and the target tree using the original temporary tree as the common ancestor. The patch lifts the code to compute the hash for an on-filesystem object from update-index.c and makes it available to the diff output routine. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-01Consolidate null_sha1[].Junio C Hamano
Signed-off-by: Junio C Hamano <junio@twinsun.com>
2005-09-25Diff: --name-status output format.Junio C Hamano
The new output format shows only the status letter and paths. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-25Diff: -l<num> to limit rename/copy detection.Junio C Hamano
When many paths are modified, rename detection takes a lot of time. The new option -l<num> can be used to disable rename detection when more than <num> paths are possibly created as renames. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-25Diff clean-up.Junio C Hamano
This is a long overdue clean-up to the code for parsing and passing diff options. It also tightens some constness issues. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22Retire diff-helper.Junio C Hamano
The textual diff generation with built-in '-p' in diff-* brothers has proven to be useful enough that git-diff-helper outlived its usefulness. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-15Plug diff leaks.Junio C Hamano
It is a bit embarrassing that it took this long for a fix since the problem was first reported on Aug 13th. Message-ID: <87y876gl1r.wl@mail2.atmark-techno.com> From: Yasushi SHOJI <yashi@atmark-techno.com> Newsgroups: gmane.comp.version-control.git Subject: [patch] possible memory leak in diff.c::diff_free_filepair() Date: Sat, 13 Aug 2005 19:58:56 +0900 This time I used valgrind to make sure that it does not overeagerly discard memory that is still being used. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-14Revert "[PATCH] plug memory leak in diff.c::diff_free_filepair()"Junio C Hamano
This reverts 068eac91ce04b9aca163acb1927c3878c45d1a07 commit.
2005-09-14[PATCH] Fix alloc_filespec() initializationLinus Torvalds
This simplifies and fixes the initialization of a "diff_filespec" when allocated. The old code would not initialize "sha1_valid". Noticed by valgrind. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-10Fix copy marking from diffcore-rename.Junio C Hamano
When (A,B) ==> (B,C) rename-copy was detected, we incorrectly said that C was created by copying B. This is because we only check if the path of rename/copy source still exists in the resulting tree to see if the file is renamed out of existence. In this case, the new B is created by copying or renaming A, so the original B is lost and we should say C is a rename of B not a copy of B. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-09Retire support for old environment variables.Junio C Hamano
We have deprecated the old environment variable names for quite a while and now it's time to remove them. Gone are: SHA1_FILE_DIRECTORIES AUTHOR_DATE AUTHOR_EMAIL AUTHOR_NAME COMMIT_AUTHOR_EMAIL COMMIT_AUTHOR_NAME SHA1_FILE_DIRECTORY Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-30Fix compilation warnings.Junio C Hamano
... found by compiling them with gcc 2.95. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-24Fix ?: statements.Jason Riedy
Omitting the first branch in ?: is a GNU extension. Cute, but not supported by other compilers. Replaced mostly by explicit tests. Calls to getenv() simply are repeated on non-GNU compilers. Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
2005-08-21[PATCH] possible memory leak in diff.c::diff_free_filepair()Yasushi SHOJI
Here is a patch to fix the problem in the simplest way.
2005-08-14[PATCH] plug memory leak in diff.c::diff_free_filepair()Yasushi SHOJI
When I run git-diff-tree on big change, it seems the command eats so much memory. so I just put git under valgrind to see what's going on. diff_free_filespec_data() doesn't free diff_filespec itself. [jc: I ended up doing things slightly differently from Yasushi's patch. The original idea was to use free_filespec_data() only to free the data portion and keep useing the filespec itself, but no existing code seems to do things that way, so I just yanked that part out.] Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-10A bit more format warning squelching.Junio C Hamano
Inspired by patch from Timo Sirainen. Most of them are not strictly necessary but making warnings less chatty would help spot real bugs later. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-06[PATCH] git: use git_mkstemp() instead of mkstemp() for diff generation.Holger Eitzenberger
This lets you run git diff in a repository otherwise read-only to you. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-07-30[PATCH] mmap error handlingPavel Roskin
I have reviewed all occurrences of mmap() in git and fixed three types of errors/defects: 1) The result is not checked. 2) The file descriptor is closed if mmap() succeeds, but not when it fails. 3) Various casts applied to -1 are used instead of MAP_FAILED, which is specifically defined to check mmap() return value. [jc: This is a second round of Pavel's patch. He fixed up the problem that close() potentially clobbering the errno from mmap, which the first round had.] Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-07-26Use symbolic constants for diff-raw status indicators.Junio C Hamano
Both Cogito and StGIT prefer to see 'A' for new files. The current 'N' is visually harder to distinguish from 'M', which is used for modified files. Prepare the internals to use symbolic constants to make the change easier. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-07-15Split up "diff_format" into "format" and "line_termination".Linus Torvalds
This removes the separate "formats" for name and name-with-zero- termination. It also removes the difference between HUMAN and MACHINE formats, and they both become DIFF_FORMAT_RAW, with the difference being just in the line and inter-filename termination. It also makes the code easier to understand.
2005-07-13[PATCH] git-diff-*: --name-only and --name-only-z.Junio C Hamano
Porcelain layers often want to find only names of changed files, and even with diff-raw output format they end up having to pick out only the filename. Support --name-only (and --name-only-z for xargs -0 and cpio -0 users that want to treat filenames with embedded newlines sanely) flag to help them. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-08[PATCH] Make sq_expand() available as sq_quote().Junio C Hamano
A useful shell safety helper sq_expand() was hidden as a static function in diff.c. Extract it out and make it available as sq_quote(). Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>