path: root/Documentation/technical/pack-format.txt
blob: ed2decc107d374b07e9ce91fa755617790672743
GIT pack format
===============
 
= pack-*.pack files have the following format:
 
   - The header appears at the beginning and consists of the following:
 
     4-byte signature: {'P', 'A', 'C', 'K'}
     4-byte version number (network byte order)
     4-byte number of objects contained in the pack (network byte order)
 
     Observation: we cannot have more than 4G versions ;-) or
     more than 4G objects in a pack.
 
   - The header is followed by a number of object entries, each
     of which looks like this:
 
     (undeltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     compressed data

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     20-byte base object name
     compressed delta data
 
     Observation: the length of each object is encoded in a
     variable length format and is not constrained to 32-bit or
     anything.
 
   - The trailer records a 20-byte SHA1 checksum of all of the above.
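The header and trailer can be checked with a minimal Python sketch, assuming the whole pack is already in memory as `bytes`. The names `parse_pack_header` and `verify_pack_trailer` are hypothetical helpers, not Git functions, and the sketch does not decide which version numbers are acceptable; it only reports the one it finds.

```python
from __future__ import annotations

import hashlib
import struct

PACK_SIGNATURE = b"PACK"         # the bytes 'P', 'A', 'C', 'K'
HEADER = struct.Struct(">4sII")  # signature, version, object count (network byte order)
SHA1_LEN = 20


def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (version, number of objects) from the start of a pack."""
    if len(data) < HEADER.size:
        raise ValueError("too short for a pack header")
    signature, version, count = HEADER.unpack_from(data, 0)
    if signature != PACK_SIGNATURE:
        raise ValueError("bad pack signature")
    return version, count


def verify_pack_trailer(data: bytes) -> bool:
    """True if the last 20 bytes are the SHA-1 of everything before them."""
    if len(data) < HEADER.size + SHA1_LEN:
        return False
    return hashlib.sha1(data[:-SHA1_LEN]).digest() == data[-SHA1_LEN:]
```

For a quick check, a synthetic pack can be built from a header announcing zero objects followed by the SHA-1 of that header; flipping any byte of it makes `verify_pack_trailer` fail.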
 
= pack-*.idx files have the following format:
 
  - The header consists of 256 4-byte network byte order
    integers.  N-th entry of this table records the number of
    objects in the corresponding pack, the first byte of whose
    object name is less than or equal to N.  This is called the
    'first-level fan-out' table.
 
    Observation: we would need to extend this to an array of
    8-byte integers to go beyond 4G objects per pack, but it is
    not strictly necessary.
 
  - The header is followed by 24-byte entries sorted by object
    name, one entry per object in the pack.  Each entry is:
 
    4-byte network byte order integer, recording where the
    object is stored in the packfile as the offset from the
    beginning.
 
    20-byte object name.
 
    Observation: we would definitely need to extend this to an
    8-byte integer plus 20-byte object name to handle a packfile
    that is larger than 4GB.
 
  - The file is concluded with a trailer:
 
    A copy of the 20-byte SHA1 checksum at the end of the
    corresponding packfile.

    20-byte SHA1 checksum of all of the above.
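The fan-out table, the sorted 24-byte entries and the two trailing checksums can be exercised with a minimal Python sketch, assuming the whole idx file is in memory. `build_idx`, `parse_idx` and `find_offset` are hypothetical names, not Git functions. `find_offset` uses the two fan-out entries around the first byte of a name to narrow the range of candidate entries before bisecting, which is the kind of lookup the cumulative counts make cheap.

```python
from __future__ import annotations

import bisect
import hashlib
import struct

FANOUT_ENTRIES = 256
ENTRY_SIZE = 24  # 4-byte offset + 20-byte object name
SHA1_LEN = 20


def build_idx(objects: dict[bytes, int], pack_checksum: bytes) -> bytes:
    """Serialize {20-byte object name: pack offset} in the layout above."""
    names = sorted(objects)
    fanout = [0] * FANOUT_ENTRIES
    for name in names:
        fanout[name[0]] += 1
    for i in range(1, FANOUT_ENTRIES):
        fanout[i] += fanout[i - 1]  # running total: first byte <= i
    out = struct.pack(">256I", *fanout)
    for name in names:
        # ">I" rejects offsets >= 4G, matching the observation above.
        out += struct.pack(">I20s", objects[name], name)
    out += pack_checksum
    return out + hashlib.sha1(out).digest()


def parse_idx(data: bytes) -> tuple[list[int], list[bytes], list[int], bytes]:
    """Return (fanout, sorted names, offsets, copy of the pack checksum)."""
    if len(data) < FANOUT_ENTRIES * 4:
        raise ValueError("too short for a fan-out table")
    fanout = list(struct.unpack(">256I", data[:FANOUT_ENTRIES * 4]))
    count = fanout[-1]  # fanout[255] counts every object in the pack
    table_end = FANOUT_ENTRIES * 4 + count * ENTRY_SIZE
    if len(data) != table_end + 2 * SHA1_LEN:
        raise ValueError("unexpected idx size")
    if hashlib.sha1(data[:-SHA1_LEN]).digest() != data[-SHA1_LEN:]:
        raise ValueError("idx checksum mismatch")
    names, offsets = [], []
    for pos in range(FANOUT_ENTRIES * 4, table_end, ENTRY_SIZE):
        offset, name = struct.unpack(">I20s", data[pos:pos + ENTRY_SIZE])
        offsets.append(offset)
        names.append(name)
    return fanout, names, offsets, data[table_end:table_end + SHA1_LEN]


def find_offset(fanout: list[int], names: list[bytes], offsets: list[int],
                name: bytes) -> int | None:
    """Narrow the search with the fan-out table, then bisect."""
    first = name[0]
    lo = fanout[first - 1] if first > 0 else 0  # entries with a smaller first byte
    hi = fanout[first]                          # entries with first byte <= first
    i = bisect.bisect_left(names, name, lo, hi)
    if i < hi and names[i] == name:
        return offsets[i]
    return None
```

With two names starting with byte 00 in the input, `fanout[0]` comes out as 2, the same situation the diagram below illustrates.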
 
Pack Idx file:
 
	idx
	    +--------------------------------+
	    | fanout[0] = 2                  |-.
	    +--------------------------------+ |
	    | fanout[1]                      | |
	    +--------------------------------+ |
	    | fanout[2]                      | |
	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
	    | fanout[255]                    | |
	    +--------------------------------+ |
main	    | offset                         | |
index	    | object name 00XXXXXXXXXXXXXXXX | |
table	    +--------------------------------+ | 
	    | offset                         | |
	    | object name 00XXXXXXXXXXXXXXXX | |
	    +--------------------------------+ |
	  .-| offset                         |<+
	  | | object name 01XXXXXXXXXXXXXXXX |
	  | +--------------------------------+
	  | | offset                         |
	  | | object name 01XXXXXXXXXXXXXXXX |
	  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	  | | offset                         |
	  | | object name FFXXXXXXXXXXXXXXXX |
	  | +--------------------------------+
trailer	  | | packfile checksum              |
	  | +--------------------------------+
	  | | idxfile checksum               |
	  | +--------------------------------+
          .-------.      
                  |
Pack file entry: <+
 
     packed object header:
	1-byte size extension bit (MSB)
	       type (next 3-bit)
	       size0 (lower 4-bit)
	n-byte sizeN (as long as MSB is set, each 7-bit)
		size0..sizeN form 4+7+7+..+7 bit integer, size0
		is the least significant part, and sizeN is the
		most significant part.
     packed object data:
	If it is not DELTA, then deflated bytes (the size above
		is the size before compression).
	If it is DELTA, then
	  20-byte base object name SHA1 (the size above is the
		size of the delta data that follows, before
		compression).
	  delta data, deflated.
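The variable-length header and the entry that follows it can be read with a minimal Python sketch, assuming the whole pack is in memory. The function names are hypothetical, and the numeric type codes are not given in this document: the sketch only assumes that type 7 marks a deltified entry, which is the value Git's implementation uses for a delta against a named base. Sizes are decoded with size0 as the least significant 4 bits.

```python
from __future__ import annotations

import zlib

DELTA_TYPE = 7  # assumption: Git's type code for a delta against a named base
SHA1_LEN = 20


def encode_object_header(obj_type: int, size: int) -> bytes:
    """Pack a 3-bit type and a size into the variable-length header."""
    c = (obj_type << 4) | (size & 0x0F)  # type in bits 4-6, size0 in bits 0-3
    size >>= 4
    out = bytearray()
    while size:
        out.append(c | 0x80)  # MSB set: another size byte follows
        c = size & 0x7F
        size >>= 7
    out.append(c)
    return bytes(out)


def decode_object_header(data: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Return (type, size, position of the first byte after the header)."""
    c = data[pos]
    pos += 1
    obj_type = (c >> 4) & 0x07
    size = c & 0x0F  # size0: the least significant 4 bits
    shift = 4
    while c & 0x80:  # MSB set: the next byte carries 7 more size bits
        c = data[pos]
        pos += 1
        size |= (c & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def read_entry(pack: bytes, pos: int) -> tuple[int, bytes | None, bytes, int]:
    """Return (type, base name or None, inflated payload, next entry offset)."""
    obj_type, size, pos = decode_object_header(pack, pos)
    base = None
    if obj_type == DELTA_TYPE:
        base = pack[pos:pos + SHA1_LEN]  # 20-byte base object name
        pos += SHA1_LEN
    inflater = zlib.decompressobj()
    payload = inflater.decompress(pack[pos:])
    if not inflater.eof or len(payload) != size:
        raise ValueError("truncated entry or size mismatch")
    # Bytes after the end of the zlib stream belong to whatever follows.
    return obj_type, base, payload, len(pack) - len(inflater.unused_data)
```

Because the entry length is not stored anywhere, the end of an entry is only known once its zlib stream ends; the sketch relies on `unused_data` to find where the next entry (or the trailer) starts.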