Diffstat (limited to 'Documentation/technical/index-format.txt')
-rw-r--r-- Documentation/technical/index-format.txt 357
1 files changed, 0 insertions, 357 deletions
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
deleted file mode 100644
index 7c4d67a..0000000
--- a/Documentation/technical/index-format.txt
+++ /dev/null
@@ -1,357 +0,0 @@
-Git index format
-================
-
-== The Git index file has the following format
-
- All binary numbers are in network byte order. Version 2 is described
- here unless stated otherwise.
-
- - A 12-byte header consisting of
-
- 4-byte signature:
- The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
-
- 4-byte version number:
- The current supported versions are 2, 3 and 4.
-
- 32-bit number of index entries.
-
- - A number of sorted index entries (see below).
-
- - Extensions
-
- Extensions are identified by signature. Optional extensions can
- be ignored if Git does not understand them.
-
- Git currently supports cached tree and resolve undo extensions.
-
- 4-byte extension signature. If the first byte is 'A'..'Z' the
- extension is optional and can be ignored.
-
- 32-bit size of the extension
-
- Extension data
-
- - 160-bit SHA-1 over the content of the index file before this
- checksum.
-
-== Index entry
-
- Index entries are sorted in ascending order on the name field,
- interpreted as a string of unsigned bytes (i.e. memcmp() order, no
- localization, no special casing of directory separator '/'). Entries
- with the same name are sorted by their stage field.
-
- 32-bit ctime seconds, the last time a file's metadata changed
- this is stat(2) data
-
- 32-bit ctime nanosecond fractions
- this is stat(2) data
-
- 32-bit mtime seconds, the last time a file's data changed
- this is stat(2) data
-
- 32-bit mtime nanosecond fractions
- this is stat(2) data
-
- 32-bit dev
- this is stat(2) data
-
- 32-bit ino
- this is stat(2) data
-
- 32-bit mode, split into (high to low bits)
-
- 4-bit object type
- valid values in binary are 1000 (regular file), 1010 (symbolic link)
- and 1110 (gitlink)
-
- 3-bit unused
-
- 9-bit unix permission. Only 0755 and 0644 are valid for regular files.
- Symbolic links and gitlinks have value 0 in this field.
-
- 32-bit uid
- this is stat(2) data
-
- 32-bit gid
- this is stat(2) data
-
- 32-bit file size
- This is the on-disk size from stat(2), truncated to 32-bit.
-
- 160-bit SHA-1 for the represented object
-
- A 16-bit 'flags' field split into (high to low bits)
-
- 1-bit assume-valid flag
-
- 1-bit extended flag (must be zero in version 2)
-
- 2-bit stage (during merge)
-
- 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
- is stored in this field.
-
- (Version 3 or later) A 16-bit field, only applicable if the
- "extended flag" above is 1, split into (high to low bits).
-
- 1-bit reserved for future
-
- 1-bit skip-worktree flag (used by sparse checkout)
-
- 1-bit intent-to-add flag (used by "git add -N")
-
- 13-bit unused, must be zero
-
- Entry path name (variable length) relative to top level directory
- (without leading slash). '/' is used as path separator. The special
- path components ".", ".." and ".git" (without quotes) are disallowed.
- Trailing slash is also disallowed.
-
- The exact encoding is undefined, but the '.' and '/' characters
- are encoded in 7-bit ASCII and the encoding cannot contain a NUL
- byte (iow, this is a UNIX pathname).
-
- (Version 4) In version 4, the entry path name is prefix-compressed
- relative to the path name for the previous entry (the very first
- entry is encoded as if the path name for the previous entry is an
- empty string). At the beginning of an entry, an integer N in the
- variable width encoding (the same encoding as the offset is encoded
- for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
- by a NUL-terminated string S. Removing N bytes from the end of the
- path name for the previous entry, and replacing it with the string S
- yields the path name for this entry.
-
- 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
- while keeping the name NUL-terminated.
-
- (Version 4) In version 4, the padding after the pathname does not
- exist.
-
- Interpretation of index entries in split index mode is completely
- different. See below for details.
-
-== Extensions
-
-=== Cached tree
-
- Cached tree extension contains pre-computed hashes for trees that can
- be derived from the index. It helps speed up tree object generation
- from index for a new commit.
-
- When a path is updated in index, the path must be invalidated and
- removed from tree cache.
-
- The signature for this extension is { 'T', 'R', 'E', 'E' }.
-
- A series of entries fill the entire extension; each of which
- consists of:
-
- - NUL-terminated path component (relative to its parent directory);
-
- - ASCII decimal number of entries in the index that is covered by the
- tree this entry represents (entry_count);
-
- - A space (ASCII 32);
-
- - ASCII decimal number that represents the number of subtrees this
- tree has;
-
- - A newline (ASCII 10); and
-
- - 160-bit object name for the object that would result from writing
- this span of index as a tree.
-
- An entry can be in an invalidated state and is represented by having
- a negative number in the entry_count field. In this case, there is no
- object name and the next entry starts immediately after the newline.
- When writing an invalid entry, -1 should always be used as entry_count.
-
- The entries are written out in the top-down, depth-first order. The
- first entry represents the root level of the repository, followed by the
- first subtree--let's call this A--of the root level (with its name
- relative to the root level), followed by the first subtree of A (with
- its name relative to A), ...
-
-=== Resolve undo
-
- A conflict is represented in the index as a set of higher stage entries.
- When a conflict is resolved (e.g. with "git add path"), these higher
- stage entries will be removed and a stage-0 entry with proper resolution
- is added.
-
- When these higher stage entries are removed, they are saved in the
- resolve undo extension, so that conflicts can be recreated (e.g. with
- "git checkout -m"), in case users want to redo a conflict resolution
- from scratch.
-
- The signature for this extension is { 'R', 'E', 'U', 'C' }.
-
- A series of entries fill the entire extension; each of which
- consists of:
-
- - NUL-terminated pathname the entry describes (relative to the root of
- the repository, i.e. full pathname);
-
- - Three NUL-terminated ASCII octal numbers, entry mode of entries in
- stage 1 to 3 (a missing stage is represented by "0" in this field);
- and
-
- - At most three 160-bit object names of the entry in stages from 1 to 3
- (nothing is written for a missing stage).
-
-=== Split index
-
- In split index mode, the majority of index entries could be stored
- in a separate file. This extension records the changes to be made on
- top of that to produce the final index.
-
- The signature for this extension is { 'l', 'i', 'n', 'k' }.
-
- The extension consists of:
-
- - 160-bit SHA-1 of the shared index file. The shared index file path
- is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
- index does not require a shared index file.
-
- - An ewah-encoded delete bitmap, each bit represents an entry in the
- shared index. If a bit is set, its corresponding entry in the
- shared index will be removed from the final index. Note, because
- a delete operation changes index entry positions, but we do need
- original positions in replace phase, it's best to just mark
- entries for removal, then do a mass deletion after replacement.
-
- - An ewah-encoded replace bitmap, each bit represents an entry in
- the shared index. If a bit is set, its corresponding entry in the
- shared index will be replaced with an entry in this index
- file. All replaced entries are stored in sorted order in this
- index. The first "1" bit in the replace bitmap corresponds to the
- first index entry, the second "1" bit to the second entry and so
- on. Replaced entries may have empty path names to save space.
-
- The remaining index entries after replaced ones will be added to the
- final index. These added entries are also sorted by entry name then
- stage.
-
-== Untracked cache
-
- Untracked cache saves the untracked file list and necessary data to
- verify the cache. The signature for this extension is { 'U', 'N',
- 'T', 'R' }.
-
- The extension starts with
-
- - A sequence of NUL-terminated strings, preceded by the size of the
- sequence in variable width encoding. Each string describes the
- environment where the cache can be used.
-
- - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
- ctime field until "file size".
-
- - Stat data of core.excludesfile
-
- - 32-bit dir_flags (see struct dir_struct)
-
- - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
- does not exist.
-
- - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
- not exist.
-
- - NUL-terminated string of per-dir exclude file name. This usually
- is ".gitignore".
-
- - The number of following directory blocks, variable width
- encoding. If this number is zero, the extension ends here with a
- following NUL.
-
- - A number of directory blocks in depth-first-search order, each
- consists of
-
- - The number of untracked entries, variable width encoding.
-
- - The number of sub-directory blocks, variable width encoding.
-
- - The directory name terminated by NUL.
-
- - A number of untracked file/dir names terminated by NUL.
-
-The remaining data of each directory block is grouped by type:
-
- - An ewah bitmap, the n-th bit marks whether the n-th directory has
- valid untracked cache entries.
-
- - An ewah bitmap, the n-th bit records "check-only" bit of
- read_directory_recursive() for the n-th directory.
-
- - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
- is valid for the n-th directory and exists in the next data.
-
- - An array of stat data. The n-th data corresponds with the n-th
- "one" bit in the previous ewah bitmap.
-
- - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
- in the previous ewah bitmap.
-
- - One NUL.
-
-== File System Monitor cache
-
- The file system monitor cache tracks files for which the core.fsmonitor
- hook has told us about changes. The signature for this extension is
- { 'F', 'S', 'M', 'N' }.
-
- The extension starts with
-
- - 32-bit version number: the current supported version is 1.
-
- - 64-bit time: the extension data reflects all changes through the given
- time which is stored as the nanoseconds elapsed since midnight,
- January 1, 1970.
-
- - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
-
- - An ewah bitmap, the n-th bit indicates whether the n-th index entry
- is not CE_FSMONITOR_VALID.
-
-== End of Index Entry
-
- The End of Index Entry (EOIE) is used to locate the end of the variable
- length index entries and the beginning of the extensions. Code can take
- advantage of this to quickly locate the index extensions without having
- to parse through all of the index entries.
-
- Because it must be able to be loaded before the variable length cache
- entries and other index extensions, this extension must be written last.
- The signature for this extension is { 'E', 'O', 'I', 'E' }.
-
- The extension consists of:
-
- - 32-bit offset to the end of the index entries
-
- - 160-bit SHA-1 over the extension types and their sizes (but not
- their contents). E.g. if we have "TREE" extension that is N-bytes
- long, "REUC" extension that is M-bytes long, followed by "EOIE",
- then the hash would be:
-
- SHA-1("TREE" + <binary representation of N> +
- "REUC" + <binary representation of M>)
-
-== Index Entry Offset Table
-
- The Index Entry Offset Table (IEOT) is used to help address the CPU
- cost of loading the index by enabling multi-threading the process of
- converting cache entries from the on-disk format to the in-memory format.
- The signature for this extension is { 'I', 'E', 'O', 'T' }.
-
- The extension consists of:
-
- - 32-bit version (currently 1)
-
- - A number of index offset entries each consisting of:
-
- - 32-bit offset from the beginning of the file to the first cache entry
- in this block of entries.
-
- - 32-bit count of cache entries in this block
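
As a reading aid for the deleted text above, here is a minimal Python sketch of three pieces it describes: the 12-byte header, the 16-bit 'flags' field, and the version 4 path reconstruction. It is not taken from Git's source. The function names (`parse_header`, `decode_flags`, `apply_prefix`) are hypothetical, and the variable width integer N is assumed to have been decoded already.

```python
import struct


def parse_header(data: bytes) -> tuple:
    """Return (version, entry_count) from the 12-byte index header."""
    if len(data) < 12:
        raise ValueError("the index header is 12 bytes long")
    # ">4sII": network byte order, 4-byte signature, two 32-bit unsigned ints.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("signature is not 'DIRC'")
    if version not in (2, 3, 4):
        raise ValueError("version is not 2, 3 or 4")
    return version, entry_count


def decode_flags(flags: int) -> dict:
    """Split the 16-bit 'flags' field, high to low bits."""
    return {
        "assume_valid": bool(flags & 0x8000),  # 1-bit assume-valid flag
        "extended": bool(flags & 0x4000),      # 1-bit extended flag
        "stage": (flags >> 12) & 0x3,          # 2-bit stage
        "name_length": flags & 0x0FFF,         # 12-bit length, capped at 0xFFF
    }


def apply_prefix(previous: bytes, n: int, suffix: bytes) -> bytes:
    """Version 4: remove n bytes from the end of the previous path, append S."""
    if n < 0 or n > len(previous):
        raise ValueError("N exceeds the length of the previous path name")
    # Slice by explicit length so that n == 0 keeps the whole previous path.
    return previous[:len(previous) - n] + suffix
```

For the first entry, `apply_prefix(b"", 0, name)` returns `name` unchanged, which matches the rule that the previous path name is treated as an empty string.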