author: Junio C Hamano <> 2007-11-25 07:48:04 (GMT)
committer: Junio C Hamano <> 2007-12-15 06:29:38 (GMT)
commit: 530e741c726a612d78de21957d531dd2215483b4
tree: 52cc876e8da8eb90edfabb1f85159d065a78bc91 /Documentation/technical/api-diff.txt
parent: fa4701601a62664a9246a211c5d26f238820737e
Start preparing the API documents.
Most of them are still stubs, but the procedure to build the HTML documentation, maintaining the index and installing the end product are there. I placed names of people who are likely to know the most about the topic in the stub files, so that volunteers will know whom to ask questions as needed.

Signed-off-by: Junio C Hamano <>
Diffstat (limited to 'Documentation/technical/api-diff.txt')
1 files changed, 166 insertions, 0 deletions
diff --git a/Documentation/technical/api-diff.txt b/Documentation/technical/api-diff.txt
new file mode 100644
index 0000000..822609b
--- /dev/null
+++ b/Documentation/technical/api-diff.txt
@@ -0,0 +1,166 @@
+diff API
+The diff API is for programs that compare two sets of files (e.g. two
+trees, one tree and the index) and present the differences found in
+various ways. The calling program is responsible for feeding the API
+pairs of files that are different, one from the "old" set and the
+corresponding one from the "new" set. The library called through this
+API is called diffcore, and is responsible for two things:
+* finding total rewrites (`-B`), renames (`-M`) and copies (`-C`), and
+ changes that touch a string (`-S`), as specified by the caller.
+* outputting the differences in various formats, as specified by the
+ caller.
+Calling sequence
+* Prepare `struct diff_options` to record the set of diff options, and
+ then call `diff_setup()` to initialize this structure. This sets up
+ the vanilla default.
+* Fill in the options structure to specify the desired output format,
+ rename detection, etc. `diff_opt_parse()` can be used to parse options
+ given on the command line in a way consistent with the existing
+ git-diff family of programs.
+* Call `diff_setup_done()`; this inspects the options set up so far for
+ internal consistency and makes any necessary tweaks to them (e.g. if
+ textual patch output was asked for, recursive behaviour is turned on).
+* As you find different pairs of files, call `diff_change()` to feed
+ modified files, `diff_addremove()` to feed created or deleted files,
+ or `diff_unmerged()` to feed a file whose state is 'unmerged' to the
+ API. These are thin wrappers to a lower-level `diff_queue()` function
+ that is flexible enough to record any of these kinds of changes.
+* Once you finish feeding the pairs of files, call `diffcore_std()`.
+ This will tell the diffcore library to go ahead and do its work.
+* Calling `diff_flush()` will produce the output.
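The ordering constraints in the calling sequence above can be illustrated with a small state machine. This is only a sketch: every `toy_` name is hypothetical, nothing here is diffcore code, and the real functions do far more than advance a phase counter. It shows that feeding pairs is only valid between the "setup done" and "std" steps, and that output comes last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phases mirroring the steps of the calling sequence. */
enum toy_phase {
	TOY_SETUP,     /* after the diff_setup() step */
	TOY_READY,     /* after the diff_setup_done() step */
	TOY_PROCESSED, /* after the diffcore_std() step */
	TOY_FLUSHED    /* after the output has been produced */
};

struct toy_diff {
	enum toy_phase phase;
	int want_patch; /* caller asked for textual patch output */
	int recursive;  /* turned on automatically for patch output */
	int queued;     /* number of file pairs fed so far */
};

/* Step 1: reset everything to the vanilla default. */
void toy_setup(struct toy_diff *d)
{
	assert(d != NULL);
	d->phase = TOY_SETUP;
	d->want_patch = 0;
	d->recursive = 0;
	d->queued = 0;
}

/* Step 3: check the options and apply tweaks; returns -1 if out of order. */
int toy_setup_done(struct toy_diff *d)
{
	if (d->phase != TOY_SETUP)
		return -1;
	if (d->want_patch)
		d->recursive = 1;
	d->phase = TOY_READY;
	return 0;
}

/* Step 4: feed one pair of differing files. */
int toy_feed(struct toy_diff *d)
{
	if (d->phase != TOY_READY)
		return -1;
	d->queued++;
	return 0;
}

/* Step 5: let the library do its work on everything queued. */
int toy_std(struct toy_diff *d)
{
	if (d->phase != TOY_READY)
		return -1;
	d->phase = TOY_PROCESSED;
	return 0;
}

/* Step 6: produce output; returns the number of pairs emitted, or -1. */
int toy_flush(struct toy_diff *d)
{
	int emitted;

	if (d->phase != TOY_PROCESSED)
		return -1;
	emitted = d->queued;
	d->queued = 0;
	d->phase = TOY_FLUSHED;
	return emitted;
}
```

A caller would invoke these in the order listed; any call made out of order returns -1 instead of silently proceeding.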
+Data structures
+* `struct diff_filespec`
+This is the internal representation for a single file (blob). It
+records the blob object name (if known -- for a work tree file it
+typically is a null SHA-1), filemode and pathname. This is what
+`diff_addremove()`, `diff_change()` and `diff_unmerged()` synthesize and
+feed to the `diff_queue()` function.
+* `struct diff_filepair`
+This records a pair of `struct diff_filespec`; the filespec for a file
+in the "old" set (i.e. preimage) is called `one`, and the filespec for a
+file in the "new" set (i.e. postimage) is called `two`. A change that
+represents file creation has NULL in `one`, and file deletion has NULL
+in `two`.
+A `filepair` starts out pointing at a `one` and a `two` that come from
+the same filename, but `diffcore_std()` can break pairs and match
+component filespecs with filespecs from a different filepair to form
+new filepairs. This is called 'rename detection'.
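The `one`/`two` convention and the idea of re-pairing filespecs can be sketched in a few lines of standalone C. The `toy_` types and functions below are hypothetical simplifications, not the diffcore structures. The status letters are borrowed from git's name-status output, and the matching rule (byte-identical content) is an assumption that stands in for the similarity scoring real rename detection uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a filespec and a filepair. */
struct toy_filespec {
	const char *path;
	const char *content;
};

struct toy_filepair {
	const struct toy_filespec *one; /* preimage, NULL for a creation */
	const struct toy_filespec *two; /* postimage, NULL for a deletion */
};

/* Classify a pair: 'A'dded, 'D'eleted, 'R'enamed, 'M'odified, '?' if empty. */
char toy_status(const struct toy_filepair *p)
{
	assert(p != NULL);
	if (!p->one && !p->two)
		return '?';
	if (!p->one)
		return 'A';
	if (!p->two)
		return 'D';
	return strcmp(p->one->path, p->two->path) ? 'R' : 'M';
}

/*
 * Try to merge a deletion and a creation into one pair.  Only exact
 * content matches are accepted here (an assumption of this sketch).
 * Returns 1 and fills *out on success, 0 otherwise.
 */
int toy_pair_exact_rename(const struct toy_filepair *deleted,
			  const struct toy_filepair *created,
			  struct toy_filepair *out)
{
	if (toy_status(deleted) != 'D' || toy_status(created) != 'A')
		return 0;
	if (strcmp(deleted->one->content, created->two->content) != 0)
		return 0;
	out->one = deleted->one;
	out->two = created->two;
	return 1;
}
```

The merged pair takes its `one` from the deletion and its `two` from the creation, which is the sense in which pairs are broken and their component filespecs matched up again.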
+* `struct diff_queue`
+This is a collection of filepairs. Notable members are:
+`queue`::
+ An array of pointers to `struct diff_filepair`. This
+ dynamically grows as you add filepairs;
+`alloc`::
+ The allocated size of the `queue` array;
+`nr`::
+ The number of elements in the `queue` array.
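The three members described above (the pointer array, its allocated size, and its element count) form a conventional growable array. The sketch below uses the names `queue`, `alloc` and `nr` for them. The `toy_` types, the doubling growth policy and the initial capacity of 4 are assumptions made for illustration, not taken from diffcore.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical placeholder for a file pair; only its address matters here. */
struct toy_filepair {
	const char *old_path;
	const char *new_path;
};

struct toy_queue {
	struct toy_filepair **queue; /* array of pointers to pairs */
	int alloc;                   /* slots currently allocated */
	int nr;                      /* slots currently in use */
};

/* Append a pair, growing the array when it is full.  Returns 0 or -1. */
int toy_queue_push(struct toy_queue *q, struct toy_filepair *p)
{
	assert(q != NULL);
	if (q->nr == q->alloc) {
		/* Assumed policy: start at 4 slots, then double. */
		int new_alloc = q->alloc ? q->alloc * 2 : 4;
		struct toy_filepair **grown =
			realloc(q->queue, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		q->queue = grown;
		q->alloc = new_alloc;
	}
	q->queue[q->nr++] = p;
	return 0;
}

/* Release the array itself; the pairs are owned by the caller. */
void toy_queue_clear(struct toy_queue *q)
{
	assert(q != NULL);
	free(q->queue);
	q->queue = NULL;
	q->alloc = 0;
	q->nr = 0;
}
```

Keeping `alloc` separate from `nr` means most appends cost a single pointer store, with reallocation happening only when the array is full.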
+* `struct diff_options`
+This describes the set of options with which the calling program wants
+to affect the operation of the diffcore library.
+Notable members are:
+`output_format`::
+ The output format used when `diff_flush()` is run.
+`context`::
+ Number of context lines to generate in patch output.
+`break_opt`, `detect_rename`, `rename_score`, `rename_limit`::
+ Affect the detection logic for complete rewrites, renames
+ and copies.
+`abbrev`::
+ Number of hexdigits to abbreviate raw format output to.
+`pickaxe`::
+ A constant string (can and typically does contain newlines to
+ look for a block of text, not just a single line) used to filter
+ out of the diff_queue the filepairs whose preimage and postimage
+ contain the string the same number of times.
+`flags`::
+ This is mostly a collection of boolean options that affect the
+ operation, but some do not have anything to do with the diffcore
+ library.
+BINARY, TEXT;;
+ Affects how a file that is seemingly binary is treated.
+FULL_INDEX;;
+ Tells the patch output format not to use abbreviated object
+ names on the "index" lines.
+FIND_COPIES_HARDER;;
+ Tells the diffcore library that the caller is feeding unchanged
+ filepairs to allow copies from unmodified files to be detected.
+COLOR_DIFF;;
+ Output should be colored.
+COLOR_DIFF_WORDS;;
+ Output is a colored word-diff.
+NO_INDEX;;
+ Tells diff-files that the input is not tracked files but files
+ in random locations on the filesystem.
+ALLOW_EXTERNAL;;
+ Tells the output routine that it is OK to call a user-specified
+ patch output routine. Plumbing disables this to ensure stable output.
+QUIET;;
+ Do not show any output.
+REVERSE_DIFF;;
+ Tells the library that the calling program is feeding the
+ filepairs reversed; `one` is two, and `two` is one.
+EXIT_WITH_STATUS;;
+ For communication between the calling program and the options
+ parser; tells the calling program to signal the presence of
+ a difference using the program exit code.
+HAS_CHANGES;;
+ Internal; used for optimization to see if there is any change.
+SILENT_ON_REMOVE;;
+ Affects whether diff-files shows removed files.
+RECURSIVE, TREE_IN_RECURSIVE;;
+ Tells whether the tree traversal done by tree-diff should
+ recursively descend into a pair of tree objects that differ
+ between the preimage and postimage sets.
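Two of the options described above lend themselves to a short standalone sketch: the collection of boolean options, modelled here as a bitmask, and the constant-string filter, modelled as an occurrence count compared between preimage and postimage. All `toy_` and `TOY_OPT_` names and the bit positions are hypothetical; the actual definitions live in git's own headers and may differ. Counting non-overlapping occurrences is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit assignments, chosen only for this example. */
#define TOY_OPT_RECURSIVE    (1u << 0)
#define TOY_OPT_QUIET        (1u << 1)
#define TOY_OPT_REVERSE_DIFF (1u << 2)

struct toy_options {
	unsigned flags;      /* bitwise OR of TOY_OPT_* values */
	const char *pickaxe; /* string to look for, or NULL */
};

void toy_opt_set(struct toy_options *o, unsigned flag)
{
	o->flags |= flag;
}

void toy_opt_clr(struct toy_options *o, unsigned flag)
{
	o->flags &= ~flag;
}

int toy_opt_tst(const struct toy_options *o, unsigned flag)
{
	return (o->flags & flag) != 0;
}

/* Count non-overlapping occurrences of needle in text. */
unsigned toy_count(const char *text, const char *needle)
{
	size_t len = strlen(needle);
	unsigned n = 0;

	if (!len)
		return 0;
	while ((text = strstr(text, needle)) != NULL) {
		n++;
		text += len;
	}
	return n;
}

/*
 * A pair survives the filter only when the string occurs a different
 * number of times in the preimage and the postimage.
 */
int toy_pickaxe_keeps(const struct toy_options *o,
		      const char *preimage, const char *postimage)
{
	assert(o != NULL && o->pickaxe != NULL);
	return toy_count(preimage, o->pickaxe) !=
	       toy_count(postimage, o->pickaxe);
}
```

With this model, a change that adds or removes an occurrence of the string is kept, while one that only moves the string around within a file is filtered out.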