path: root/Documentation
diff options
authorJ. Bruce Fields <>2007-06-10 19:15:08 (GMT)
committerJ. Bruce Fields <>2007-09-16 02:17:23 (GMT)
commit1bbf1c7900337a646b6ca65c8b3505e784012f39 (patch)
treee35dfd1334f98869f0de6dd37f93e5470e1e3184 /Documentation
parent513d419c5989b8aaeec1cb1ed356519370079dc1 (diff)
user-manual: rewrite object database discussion
Rewrite the introduction. Rewrite each section completely to make them work in the new order, to add some examples, and to move plumbing commands (like git-commit-tree) to the following chapter. Signed-off-by: J. Bruce Fields <>
Diffstat (limited to 'Documentation')
1 files changed, 196 insertions, 139 deletions
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 223ec75..4fb2f30 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -2723,46 +2723,44 @@ database>> and the <<def_index,index>>.
The Object Database
-The object database is literally just a content-addressable collection
-of objects. All objects are named by their content, which is
-approximated by the SHA1 hash of the object itself. Objects may refer
-to other objects (by referencing their SHA1 hash), and so you can
-build up a hierarchy of objects.
-All objects have a statically determined "type" which is
-determined at object creation time, and which identifies the format of
-the object (i.e. how it is used, and how it can refer to other
-objects). There are currently four different object types: "blob",
-"tree", "commit", and "tag".
-A <<def_blob_object,"blob" object>> cannot refer to any other object,
-and is, as the name implies, a pure storage object containing some
-user data. It is used to actually store the file data, i.e. a blob
-object is associated with some particular version of some file.
-A <<def_tree_object,"tree" object>> is an object that ties one or more
-"blob" objects into a directory structure. In addition, a tree object
-can refer to other tree objects, thus creating a directory hierarchy.
-A <<def_commit_object,"commit" object>> ties such directory hierarchies
-together into a <<def_DAG,directed acyclic graph>> of revisions - each
-"commit" is associated with exactly one tree (the directory hierarchy at
-the time of the commit). In addition, a "commit" refers to one or more
-"parent" commit objects that describe the history of how we arrived at
-that directory hierarchy.
-As a special case, a commit object with no parents is called the "root"
-commit, and is the point of an initial project commit. Each project
-must have at least one root, and while you can tie several different
-root objects together into one project by creating a commit object which
-has two or more separate roots as its ultimate parents, that's probably
-just going to confuse people. So aim for the notion of "one root object
-per project", even if git itself does not enforce that.
-A <<def_tag_object,"tag" object>> symbolically identifies and can be
-used to sign other objects. It contains the identifier and type of
-another object, a symbolic name (of course!) and, optionally, a
+We already saw in <<understanding-commits>> that all commits are stored
+under a 40-digit "object name". In fact, all the information needed to
+represent the history of a project is stored in objects with such names.
+In each case the name is calculated by taking the SHA1 hash of the
+contents of the object. The SHA1 hash is a cryptographic hash function.
+What that means to us is that it is impossible to find two different
+objects with the same name. This has a number of advantages; among
+- Git can quickly determine whether two objects are identical or not,
+ just by comparing names.
+- Since object names are computed the same way in ever repository, the
+ same content stored in two repositories will always be stored under
+ the same name.
+- Git can detect errors when it reads an object, by checking that the
+ object's name is still the SHA1 hash of its contents.
+(See <<object-details>> for the details of the object formatting and
+SHA1 calculation.)
+There are four different types of objects: "blob", "tree", "commit", and
+- A <<def_blob_object,"blob" object>> is used to store file data.
+- A <<def_tree_object,"tree" object>> is an object that ties one or more
+ "blob" objects into a directory structure. In addition, a tree object
+ can refer to other tree objects, thus creating a directory hierarchy.
+- A <<def_commit_object,"commit" object>> ties such directory hierarchies
+ together into a <<def_DAG,directed acyclic graph>> of revisions - each
+ commit contains the object name of exactly one tree designating the
+ directory hierarchy at the time of the commit. In addition, a commit
+ refers to "parent" commit objects that describe the history of how we
+ arrived at that directory hierarchy.
+- A <<def_tag_object,"tag" object>> symbolically identifies and can be
+ used to sign other objects. It contains the object name and type of
+ another object, a symbolic name (of course!) and, optionally, a
+ signature.
The object types in some more detail:
@@ -2770,109 +2768,142 @@ The object types in some more detail:
Commit Object
-The "commit" object is an object that introduces the notion of
-history into the picture. In contrast to the other objects, it
-doesn't just describe the physical state of a tree, it describes how
-we got there, and why.
-A "commit" is defined by the tree-object that it results in, the
-parent commits (zero, one or more) that led up to that point, and a
-comment on what happened. Again, a commit is not trusted per se:
-the contents are well-defined and "safe" due to the cryptographically
-strong signatures at all levels, but there is no reason to believe
-that the tree is "good" or that the merge information makes sense.
-The parents do not have to actually have any relationship with the
-result, for example.
-Note on commits: unlike some SCM's, commits do not contain
-rename information or file mode change information. All of that is
-implicit in the trees involved (the result tree, and the result trees
-of the parents), and describing that makes no sense in this idiotic
-file manager.
-A commit is created with gitlink:git-commit-tree[1] and
-its data can be accessed by gitlink:git-cat-file[1].
+The "commit" object links a physical state of a tree with a description
+of how we got there and why. Use the --pretty=raw option to
+gitlink:git-show[1] or gitlink:git-log[1] to examine your favorite
+$ git show -s --pretty=raw 2be7fcb476
+commit 2be7fcb4764f2dbcee52635b91fedb1b3dcf7ab4
+tree fb3a8bdd0ceddd019615af4d57a53f43d8cee2bf
+parent 257a84d9d02e90447b149af58b271c19405edb6a
+author Dave Watson <> 1187576872 -0400
+committer Junio C Hamano <> 1187591163 -0700
+ Fix misspelling of 'suppress' in docs
+ Signed-off-by: Junio C Hamano <>
+As you can see, a commit is defined by:
+- a tree: The SHA1 name of a tree object (as defined below), representing
+ the contents of a directory at a certain point in time.
+- parent(s): The SHA1 name of some number of commits which represent the
+ immediately prevoius step(s) in the history of the project. The
+ example above has one parent; merge commits may have more than
+ one. A commit with no parents is called a "root" commit, and
+ represents the initial revision of a project. Each project must have
+ at least one root. A project can also have multiple roots, though
+ that isn't common (or necessarily a good idea).
+- an author: The name of the person responsible for this change, together
+ with its date.
+- a committer: The name of the person who actually created the commit,
+ with the date it was done. This may be different from the author, for
+ example, if the author was someone who wrote a patch and emailed it
+ to the person who used it to create the commit.
+- a comment describing this commit.
+Note that a commit does not itself contain any information about what
+actually changed; all changes are calculated by comparing the contents
+of the tree referred to by this commit with the trees associated with
+its parents. In particular, git does not attempt to record file renames
+explicitly, though it can identify cases where the existence of the same
+file data at changing paths suggests a rename. (See, for example, the
+-M option to gitlink:git-diff[1]).
+A commit is usually created by gitlink:git-commit[1], which creates a
+commit whose parent is normally the current HEAD, and whose tree is
+taken from the content currently stored in the index.
Tree Object
-The next hierarchical object type is the "tree" object. A tree object
-is a list of mode/name/blob data, sorted by name. Alternatively, the
-mode data may specify a directory mode, in which case instead of
-naming a blob, that name is associated with another TREE object.
-Like the "blob" object, a tree object is uniquely determined by the
-set contents, and so two separate but identical trees will always
-share the exact same object. This is true at all levels, i.e. it's
-true for a "leaf" tree (which does not refer to any other trees, only
-blobs) as well as for a whole subdirectory.
-For that reason a "tree" object is just a pure data abstraction: it
-has no history, no signatures, no verification of validity, except
-that since the contents are again protected by the hash itself, we can
-trust that the tree is immutable and its contents never change.
-So you can trust the contents of a tree to be valid, the same way you
-can trust the contents of a blob, but you don't know where those
-contents 'came' from.
-Side note on trees: since a "tree" object is a sorted list of
-"filename+content", you can create a diff between two trees without
-actually having to unpack two trees. Just ignore all common parts,
-and your diff will look right. In other words, you can effectively
-(and efficiently) tell the difference between any two random trees by
-O(n) where "n" is the size of the difference, rather than the size of
-the tree.
-Side note 2 on trees: since the name of a "blob" depends entirely and
-exclusively on its contents (i.e. there are no names or permissions
-involved), you can see trivial renames or permission changes by
-noticing that the blob stayed the same. However, renames with data
-changes need a smarter "diff" implementation.
-A tree is created with gitlink:git-write-tree[1] and
-its data can be accessed by gitlink:git-ls-tree[1].
-Two trees can be compared with gitlink:git-diff-tree[1].
+The ever-versatile gitlink:git-show[1] command can also be used to
+examine tree objects, but gitlink:git-ls-tree[1] will give you more
+$ git ls-tree fb3a8bdd0ce
+100644 blob 63c918c667fa005ff12ad89437f2fdc80926e21c .gitignore
+100644 blob 5529b198e8d14decbe4ad99db3f7fb632de0439d .mailmap
+100644 blob 6ff87c4664981e4397625791c8ea3bbb5f2279a3 COPYING
+040000 tree 2fb783e477100ce076f6bf57e4a6f026013dc745 Documentation
+100755 blob 3c0032cec592a765692234f1cba47dfdcc3a9200 GIT-VERSION-GEN
+100644 blob 289b046a443c0647624607d471289b2c7dcd470b INSTALL
+100644 blob 4eb463797adc693dc168b926b6932ff53f17d0b1 Makefile
+100644 blob 548142c327a6790ff8821d67c2ee1eff7a656b52 README
+As you can see, a tree object contains a list of entries, each with a
+mode, object type, SHA1 name, and name, sorted by name. It represents
+the contents of a single directory tree.
+The object type may be a blob, representing the contents of a file, or
+another tree, representing the contents of a subdirectory. Since trees
+and blobs, like all other objects, are named by the SHA1 hash of their
+contents, two trees have the same SHA1 name if and only if their
+contents (including, recursively, the contents of all subdirectories)
+are identical. This allows git to quickly determine the differences
+between two related tree objects, since it can ignore any entries with
+identical object names.
+(Note: in the presence of submodules, trees may also have commits as
+entries. See gitlink:git-submodule[1] and gitlink:gitmodules.txt[1]
+for partial documentation.)
+Note that the files all have mode 644 or 755: git actually only pays
+attention to the executable bit.
Blob Object
-A "blob" object is nothing but a binary blob of data, and doesn't
-refer to anything else. There is no signature or any other
-verification of the data, so while the object is consistent (it 'is'
-indexed by its sha1 hash, so the data itself is certainly correct), it
-has absolutely no other attributes. No name associations, no
-permissions. It is purely a blob of data (i.e. normally "file
+You can use gitlink:git-show[1] to examine the contents of a blob; take,
+for example, the blob in the entry for "COPYING" from the tree above:
+$ git show 6ff87c4664
+ Note that the only valid version of the GPL as far as this project
+ is concerned is _this_ particular version of the license (ie v2, not
+ v2.2 or v3.x or whatever), unless explicitly otherwise stated.
-In particular, since the blob is entirely defined by its data, if two
-files in a directory tree (or in multiple different versions of the
-repository) have the same contents, they will share the same blob
-object. The object is totally independent of its location in the
-directory tree, and renaming a file does not change the object that
-file is associated with in any way.
+A "blob" object is nothing but a binary blob of data. It doesn't refer
+to anything else or have attributes of any kind.
-A blob is typically created when gitlink:git-update-index[1]
-is run, and its data can be accessed by gitlink:git-cat-file[1].
+Since the blob is entirely defined by its data, if two files in a
+directory tree (or in multiple different versions of the repository)
+have the same contents, they will share the same blob object. The object
+is totally independent of its location in the directory tree, and
+renaming a file does not change the object that file is associated with.
+Note that any tree or blob object can be examined using
+gitlink:git-show[1] with the <revision>:<path> syntax. This can
+sometimes be useful for browsing the contents of a tree that is not
+currently checked out.
-An aside on the notion of "trust". Trust is really outside the scope
-of "git", but it's worth noting a few things. First off, since
-everything is hashed with SHA1, you 'can' trust that an object is
-intact and has not been messed with by external sources. So the name
-of an object uniquely identifies a known state - just not a state that
-you may want to trust.
+If you receive the SHA1 name of a blob from one source, and its contents
+from another (possibly untrusted) source, you can still trust that those
+contents are correct as long as the SHA1 name agrees. This is because
+the SHA1 is designed so that it is infeasible to find different contents
+that produce the same hash.
-Furthermore, since the SHA1 signature of a commit refers to the
-SHA1 signatures of the tree it is associated with and the signatures
-of the parent, a single named commit specifies uniquely a whole set
-of history, with full contents. You can't later fake any step of the
-way once you have the name of a commit.
+Similarly, you need only trust the SHA1 name of a top-level tree object
+to trust the contents of the entire directory that it refers to, and if
+you receive the SHA1 name of a commit from a trusted source, then you
+can easily verify the entire history of commits reachable through
+parents of that commit, and all of those contents of the trees referred
+to by those commits.
So to introduce some real trust in the system, the only thing you need
to do is to digitally sign just 'one' special note, which includes the
@@ -2891,23 +2922,31 @@ To assist in this, git also provides the tag object...
Tag Object
-Git provides the "tag" object to simplify creating, managing and
-exchanging symbolic and signed tokens. The "tag" object at its
-simplest simply symbolically identifies another object by containing
-the sha1, type and symbolic name.
-However it can optionally contain additional signature information
-(which git doesn't care about as long as there's less than 8k of
-it). This can then be verified externally to git.
+A tag object contains an object, object type, tag name, the name of the
+person ("tagger") who created the tag, and a message, which may contain
+a signature, as can be seen using the gitlink:git-cat-file[1]:
-Note that despite the tag features, "git" itself only handles content
-integrity; the trust framework (and signature provision and
-verification) has to come from outside.
+$ git cat-file tag v1.5.0
+object 437b1b20df4b356c9342dac8d38849f24ef44f27
+type commit
+tag v1.5.0
+tagger Junio C Hamano <> 1171411200 +0000
+GIT 1.5.0
+Version: GnuPG v1.4.6 (GNU/Linux)
-A tag is created with gitlink:git-mktag[1],
-its data can be accessed by gitlink:git-cat-file[1],
-and the signature can be verified by
+See the gitlink:git-tag[1] command to learn how to create and verify tag
+objects. (Note that gitlink:git-tag[1] can also be used to create
+"lightweight tags", which are not tag objects at all, but just simple
+references in .git/refs/tags/).
@@ -2978,6 +3017,24 @@ scripts using a smaller core of low-level git commands. These can still
be useful when doing unusual things with git, or just as a way to
understand its inner workings.
+Object access and manipulation
+The gitlink:git-cat-file[1] command can show the contents of any object,
+though the higher-level gitlink:git-show[1] is usually more useful.
+The gitlink:git-commit-tree[1] command allows constructing commits with
+arbitrary parents and trees.
+A tree can be created with gitlink:git-write-tree[1] and its data can be
+accessed by gitlink:git-ls-tree[1]. Two trees can be compared with
+A tag is created with gitlink:git-mktag[1], and the signature can be
+verified by gitlink:git-verify-tag[1], though it is normally simpler to
+use gitlink:git-tag[1] for both.
The Workflow