Diffstat (limited to 'Documentation/technical')
-rw-r--r--  Documentation/technical/api-error-handling.txt | 26
-rw-r--r--  Documentation/technical/api-index-skel.txt | 2
-rw-r--r--  Documentation/technical/api-merge.txt | 4
-rw-r--r--  Documentation/technical/api-parse-options.txt | 49
-rw-r--r--  Documentation/technical/api-simple-ipc.txt | 12
-rw-r--r--  Documentation/technical/api-trace2.txt | 316
-rw-r--r--  Documentation/technical/bitmap-format.txt | 242
-rw-r--r--  Documentation/technical/bundle-format.txt | 76
-rw-r--r--  Documentation/technical/bundle-uri.txt | 572
-rw-r--r--  Documentation/technical/chunk-format.txt | 116
-rw-r--r--  Documentation/technical/commit-graph-format.txt | 158
-rw-r--r--  Documentation/technical/commit-graph.txt | 14
-rw-r--r--  Documentation/technical/hash-function-transition.txt | 4
-rw-r--r--  Documentation/technical/http-protocol.txt | 522
-rw-r--r--  Documentation/technical/index-format.txt | 406
-rw-r--r--  Documentation/technical/long-running-process-protocol.txt | 2
-rw-r--r--  Documentation/technical/multi-pack-index.txt | 22
-rw-r--r--  Documentation/technical/pack-format.txt | 464
-rw-r--r--  Documentation/technical/pack-protocol.txt | 709
-rw-r--r--  Documentation/technical/packfile-uri.txt | 2
-rw-r--r--  Documentation/technical/parallel-checkout.txt | 12
-rw-r--r--  Documentation/technical/partial-clone.txt | 13
-rw-r--r--  Documentation/technical/protocol-capabilities.txt | 380
-rw-r--r--  Documentation/technical/protocol-common.txt | 99
-rw-r--r--  Documentation/technical/protocol-v2.txt | 564
-rw-r--r--  Documentation/technical/racy-git.txt | 10
-rw-r--r--  Documentation/technical/reftable.txt | 12
-rw-r--r--  Documentation/technical/remembering-renames.txt | 6
-rw-r--r--  Documentation/technical/repository-version.txt | 12
-rw-r--r--  Documentation/technical/rerere.txt | 18
-rw-r--r--  Documentation/technical/scalar.txt | 66
-rw-r--r--  Documentation/technical/signature-format.txt | 186
-rw-r--r--  Documentation/technical/sparse-checkout.txt | 1103
-rw-r--r--  Documentation/technical/unit-tests.txt | 240
34 files changed, 2507 insertions, 3932 deletions
diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 8be4f4d..665c496 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,12 +1,34 @@
Error reporting in git
======================
-`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+`BUG`, `bug`, `die`, `usage`, `error`, and `warning` report errors of
various kinds.
- `BUG` is for failed internal assertions that should never happen,
i.e. a bug in git itself.
+- `bug` (lower-case, not `BUG`) is supposed to be used like `BUG` but
+ prints a "BUG" message instead of calling `abort()`.
++
+A call to `bug()` will then result in a "real" call to the `BUG()`
+function, either explicitly by invoking `BUG_if_bug()` after call(s)
+to `bug()`, or implicitly at `exit()` time where we'll check if we
+encountered any outstanding `bug()` invocations.
++
+If there were no prior calls to `bug()` before invoking `BUG_if_bug()`
+the latter is a NOOP. The `BUG_if_bug()` function takes the same
+arguments as `BUG()` itself. Calling `BUG_if_bug()` explicitly isn't
+necessary, but ensures that we die as soon as possible.
++
+If you know you had prior calls to `bug()` then calling `BUG()` itself
+is equivalent to calling `BUG_if_bug()`, the latter being a wrapper
+calling `BUG()` if we've set a flag indicating that we've called
+`bug()`.
++
+This is for the convenience of APIs that would like to report more
+than one "bug", such as the `optbug()` validation in
+`parse-options.c`.
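++
+For example, a validation pass can queue up every problem it finds
+before dying (a minimal sketch; the loop and the option fields here
+are hypothetical):
++
+------------
+for (i = 0; i < opts_nr; i++)
+	if (opts[i].flags & ILLEGAL_FLAG_COMBINATION)
+		bug("option '%s' sets an illegal flag combination",
+		    opts[i].long_name);
+BUG_if_bug("invalid option flags");
+------------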
+
- `die` is for fatal application errors. It prints a message to
the user and exits with status 128.
@@ -24,7 +46,7 @@ various kinds.
returns -1 after reporting the situation to the caller.
These reports will be logged via the trace2 facility. See the "error"
-event in link:api-trace2.txt[trace2 API].
+event in link:api-trace2.html[trace2 API].
Customizable error handlers
---------------------------
diff --git a/Documentation/technical/api-index-skel.txt b/Documentation/technical/api-index-skel.txt
index eda8c19..7780a76 100644
--- a/Documentation/technical/api-index-skel.txt
+++ b/Documentation/technical/api-index-skel.txt
@@ -1,7 +1,7 @@
Git API Documents
=================
-Git has grown a set of internal API over time. This collection
+Git has grown a set of internal APIs over time. This collection
documents them.
////////////////////////////////////////////////////////////////
diff --git a/Documentation/technical/api-merge.txt b/Documentation/technical/api-merge.txt
index 487d4d8..c2ba018 100644
--- a/Documentation/technical/api-merge.txt
+++ b/Documentation/technical/api-merge.txt
@@ -28,9 +28,9 @@ and `diff.c` for examples.
* `struct ll_merge_options`
-Check ll-merge.h for details.
+Check merge-ll.h for details.
Low-level (single file) merge
-----------------------------
-Check ll-merge.h for details.
+Check merge-ll.h for details.
diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
index acfd5dc..61fa6ee 100644
--- a/Documentation/technical/api-parse-options.txt
+++ b/Documentation/technical/api-parse-options.txt
@@ -8,7 +8,8 @@ Basics
------
The argument vector `argv[]` may usually contain mandatory or optional
-'non-option arguments', e.g. a filename or a branch, and 'options'.
+'non-option arguments', e.g. a filename or a branch, 'options', and
+'subcommands'.
Options are optional arguments that start with a dash and
that allow one to change the behavior of a command.
@@ -48,6 +49,33 @@ The parse-options API allows:
option, e.g. `-a -b --option -- --this-is-a-file` indicates that
`--this-is-a-file` must not be processed as an option.
+Subcommands are special in a couple of ways:
+
+* Subcommands have only a long form; they have no double-dash prefix, no
+  negated form, and no description, they take no arguments, and they
+  cannot be abbreviated.
+
+* There must be exactly one subcommand among the arguments, or zero if the
+ command has a default operation mode.
+
+* All arguments following the subcommand are considered to be arguments of
+ the subcommand, and, conversely, arguments meant for the subcommand may
+ not precede the subcommand.
+
+Therefore, if the options array contains at least one subcommand and
+`parse_options()` encounters the first dashless argument, it will either:
+
+* stop and return, if that dashless argument is a known subcommand, setting
+ `value` to the function pointer associated with that subcommand, storing
+ the name of the subcommand in argv[0], and leaving the rest of the
+ arguments unprocessed, or
+
+* stop and return, if it was invoked with the `PARSE_OPT_SUBCOMMAND_OPTIONAL`
+ flag and that dashless argument doesn't match any subcommands, leaving
+ `value` unchanged and the rest of the arguments unprocessed, or
+
+* show error and usage, and abort.
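+
+A command implementing subcommands then dispatches through the stored
+function pointer, roughly as follows (a sketch; the command and
+subcommand names here are hypothetical):
+
+------------
+static int frobnicate_list(int argc, const char **argv, const char *prefix)
+{
+	...
+}
+
+int cmd_frobnicate(int argc, const char **argv, const char *prefix)
+{
+	static const char * const frobnicate_usage[] = {
+		N_("git frobnicate list"),
+		NULL
+	};
+	parse_opt_subcommand_fn *fn = NULL;
+	struct option options[] = {
+		OPT_SUBCOMMAND("list", &fn, frobnicate_list),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, frobnicate_usage, 0);
+	return fn(argc, argv, prefix);
+}
+------------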
+
Steps to parse options
----------------------
@@ -90,8 +118,8 @@ Flags are the bitwise-or of:
Keep the first argument, which contains the program name. It's
removed from argv[] by default.
-`PARSE_OPT_KEEP_UNKNOWN`::
- Keep unknown arguments instead of erroring out. This doesn't
+`PARSE_OPT_KEEP_UNKNOWN_OPT`::
+ Keep unknown options instead of erroring out. This doesn't
work for all combinations of arguments as users might expect
it to do. E.g. if the first argument in `--unknown --known`
takes a value (which we can't know), the second one is
@@ -101,6 +129,8 @@ Flags are the bitwise-or of:
non-option, not as a value belonging to the unknown option,
stopping the parser early. That's why parse_options() errors out if
both options are set.
+ Note that non-option arguments are always kept, even without
+ this flag.
`PARSE_OPT_NO_INTERNAL_HELP`::
By default, parse_options() handles `-h`, `--help` and
@@ -108,6 +138,13 @@ Flags are the bitwise-or of:
turns it off and allows one to add custom handlers for these
options, or to just leave them unknown.
+`PARSE_OPT_SUBCOMMAND_OPTIONAL`::
+ Don't error out when no subcommand is specified.
+
+Note that `PARSE_OPT_STOP_AT_NON_OPTION` is incompatible with subcommands,
+while `PARSE_OPT_KEEP_DASHDASH` and `PARSE_OPT_KEEP_UNKNOWN_OPT` can only be
+used with subcommands when combined with `PARSE_OPT_SUBCOMMAND_OPTIONAL`.
+
Data Structure
--------------
@@ -236,10 +273,14 @@ There are some macros to easily define options:
`OPT_CMDMODE(short, long, &int_var, description, enum_val)`::
Define an "operation mode" option, only one of which in the same
group of "operating mode" options that share the same `int_var`
- can be given by the user. `enum_val` is set to `int_var` when the
+ can be given by the user. `int_var` is set to `enum_val` when the
	option is used, but an error is reported if another "operating mode"
option has already set its value to the same `int_var`.
+ In new commands consider using subcommands instead.
+`OPT_SUBCOMMAND(long, &fn_ptr, subcommand_fn)`::
+ Define a subcommand. `subcommand_fn` is put into `fn_ptr` when
+ this subcommand is used.
The last element of the array must be `OPT_END()`.
diff --git a/Documentation/technical/api-simple-ipc.txt b/Documentation/technical/api-simple-ipc.txt
index d79ad32..c4fb152 100644
--- a/Documentation/technical/api-simple-ipc.txt
+++ b/Documentation/technical/api-simple-ipc.txt
@@ -2,7 +2,7 @@ Simple-IPC API
==============
The Simple-IPC API is a collection of `ipc_` prefixed library routines
-and a basic communication protocol that allow an IPC-client process to
+and a basic communication protocol that allows an IPC-client process to
send an application-specific IPC-request message to an IPC-server
process and receive an application-specific IPC-response message.
@@ -20,12 +20,12 @@ IPC-client.
The IPC-client routines within a client application process connect
to the IPC-server and send a request message and wait for a response.
-When received, the response is returned back the caller.
+When received, the response is returned to the caller.
For example, the `fsmonitor--daemon` feature will be built as a server
application on top of the IPC-server library routines. It will have
threads watching for file system events and a thread pool waiting for
-client connections. Clients, such as `git status` will request a list
+client connections. Clients, such as `git status`, will request a list
of file system events since a point in time and the server will
respond with a list of changed files and directories. The formats of
the request and response are application-specific; the IPC-client and
@@ -37,7 +37,7 @@ Comparison with sub-process model
The Simple-IPC mechanism differs from the existing `sub-process.c`
model (Documentation/technical/long-running-process-protocol.txt) that is
-used by applications like Git-LFS. In the LFS-style sub-process model
+used by applications like Git-LFS. In the LFS-style sub-process model,
the helper is started by the foreground process, communication happens
via a pair of file descriptors bound to the stdin/stdout of the
sub-process, the sub-process only serves the current foreground
@@ -78,7 +78,7 @@ client and an optional response message from the server. Both the
client and server messages are unlimited in length and are terminated
with a flush packet.
-The pkt-line routines (Documentation/technical/protocol-common.txt)
+The pkt-line routines (linkgit:gitprotocol-common[5])
are used to simplify buffer management during message generation,
transmission, and reception. A flush packet is used to mark the end
of the message. This allows the sender to incrementally generate and
@@ -102,4 +102,4 @@ stateless request, receive an application-specific
response, and disconnect. It is a one round trip facility for
querying the server. The Simple-IPC routines hide the socket,
named pipe, and thread pool details and allow the application
-layer to focus on the application at hand.
+layer to focus on the task at hand.
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index b9f3198..de5fc25 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -5,7 +5,7 @@ information to stderr or a file. The Trace2 feature is inactive unless
explicitly enabled by enabling one or more Trace2 Targets.
The Trace2 API is intended to replace the existing (Trace1)
-printf-style tracing provided by the existing `GIT_TRACE` and
+`printf()`-style tracing provided by the existing `GIT_TRACE` and
`GIT_TRACE_PERFORMANCE` facilities. During initial implementation,
Trace2 and Trace1 may operate in parallel.
@@ -24,8 +24,8 @@ for example.
Trace2 is controlled using `trace2.*` config values in the system and
global config files and `GIT_TRACE2*` environment variables. Trace2 does
-not read from repo local or worktree config files or respect `-c`
-command line config settings.
+not read from repo local or worktree config files, nor does it respect
+`-c` command line config settings.
== Trace2 Targets
@@ -34,8 +34,8 @@ Format details are given in a later section.
=== The Normal Format Target
-The normal format target is a tradition printf format and similar
-to GIT_TRACE format. This format is enabled with the `GIT_TRACE2`
+The normal format target is a traditional `printf()` format and similar
+to the `GIT_TRACE` format. This format is enabled with the `GIT_TRACE2`
environment variable or the `trace2.normalTarget` system or global
config setting.
@@ -69,8 +69,8 @@ $ cat ~/log.normal
=== The Performance Format Target
The performance format target (PERF) is a column-based format to
-replace GIT_TRACE_PERFORMANCE and is suitable for development and
-testing, possibly to complement tools like gprof. This format is
+replace `GIT_TRACE_PERFORMANCE` and is suitable for development and
+testing, possibly to complement tools like `gprof`. This format is
enabled with the `GIT_TRACE2_PERF` environment variable or the
`trace2.perfTarget` system or global config setting.
@@ -128,7 +128,7 @@ yields
------------
$ cat ~/log.event
-{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"2","exe":"2.20.1.155.g426c96fcdb"}
+{"event":"version","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"3","exe":"2.20.1.155.g426c96fcdb"}
{"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]}
{"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"}
{"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0}
@@ -148,20 +148,18 @@ filename collisions).
== Trace2 API
-All public Trace2 functions and macros are defined in `trace2.h` and
-`trace2.c`. All public symbols are prefixed with `trace2_`.
+The Trace2 public API is defined and documented in `trace2.h`; refer to it for
+more information. All public functions and macros are prefixed
+with `trace2_` and are implemented in `trace2.c`.
There are no public Trace2 data structures.
The Trace2 code also defines a set of private functions and data types
in the `trace2/` directory. These symbols are prefixed with `tr2_`
-and should only be used by functions in `trace2.c`.
+and should only be used by functions in `trace2.c` (or other private
+source files in `trace2/`).
-== Conventions for Public Functions and Macros
-
-The functions defined by the Trace2 API are declared and documented
-in `trace2.h`. It defines the API functions and wrapper macros for
-Trace2.
+=== Conventions for Public Functions and Macros
Some functions have a `_fl()` suffix to indicate that they take `file`
and `line-number` arguments.
@@ -170,54 +168,9 @@ Some functions have a `_va_fl()` suffix to indicate that they also
take a `va_list` argument.
Some functions have a `_printf_fl()` suffix to indicate that they also
-take a varargs argument.
-
-There are CPP wrapper macros and ifdefs to hide most of these details.
-See `trace2.h` for more details. The following discussion will only
-describe the simplified forms.
-
-== Public API
-
-All Trace2 API functions send a message to all of the active
-Trace2 Targets. This section describes the set of available
-messages.
-
-It helps to divide these functions into groups for discussion
-purposes.
-
-=== Basic Command Messages
-
-These are concerned with the lifetime of the overall git process.
-e.g: `void trace2_initialize_clock()`, `void trace2_initialize()`,
-`int trace2_is_enabled()`, `void trace2_cmd_start(int argc, const char **argv)`.
-
-=== Command Detail Messages
-
-These are concerned with describing the specific Git command
-after the command line, config, and environment are inspected.
-e.g: `void trace2_cmd_name(const char *name)`,
-`void trace2_cmd_mode(const char *mode)`.
-
-=== Child Process Messages
-
-These are concerned with the various spawned child processes,
-including shell scripts, git commands, editors, pagers, and hooks.
-
-e.g: `void trace2_child_start(struct child_process *cmd)`.
-
-=== Git Thread Messages
-
-These messages are concerned with Git thread usage.
+take a `printf()` style format with a variable number of arguments.
-e.g: `void trace2_thread_start(const char *thread_name)`.
-
-=== Region and Data Messages
-
-These are concerned with recording performance data
-over regions or spans of code. e.g:
-`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`.
-
-Refer to trace2.h for details about all trace2 functions.
+CPP wrapper macros are defined to hide most of these details.
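+
+For example, the `printf()`-style message function follows this shape
+(simplified; see `trace2.h` for the exact declarations):
+
+------------
+void trace2_printf_fl(const char *file, int line, const char *fmt, ...);
+
+#define trace2_printf(...) \
+	trace2_printf_fl(__FILE__, __LINE__, __VA_ARGS__)
+------------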
== Trace2 Target Formats
@@ -234,7 +187,7 @@ Events are written as lines of the form:
is the event name.
`<event-message>`::
- is a free-form printf message intended for human consumption.
+ is a free-form `printf()` message intended for human consumption.
+
Note that this may contain embedded LF or CRLF characters that are
not escaped, so the event may spill across multiple lines.
@@ -300,7 +253,7 @@ This field is in anticipation of in-proc submodules in the future.
indicate a broad category, such as "index" or "status".
`<perf-event-message>`::
- is a free-form printf message intended for human consumption.
+ is a free-form `printf()` message intended for human consumption.
------------
15:33:33.532712 wt-status.c:2310 | d0 | main | region_enter | r1 | 0.126064 | | status | label:print
@@ -391,7 +344,7 @@ only present on the "start" and "atexit" events.
{
"event":"version",
...
- "evt":"2", # EVENT format version
+ "evt":"3", # EVENT format version
"exe":"2.20.1.155.g426c96fcdb" # git version
}
------------
@@ -465,8 +418,8 @@ completed.)
------------
`"error"`::
- This event is emitted when one of the `BUG()`, `error()`, `die()`,
- `warning()`, or `usage()` functions are called.
+ This event is emitted when one of the `BUG()`, `bug()`, `error()`,
+	`die()`, `warning()`, or `usage()` functions is called.
+
------------
{
@@ -533,7 +486,7 @@ these special values are used:
------------
`"cmd_mode"`::
- This event, when present, describes the command variant This
+ This event, when present, describes the command variant. This
event may be emitted more than once.
+
------------
@@ -588,7 +541,7 @@ with "?".
`"child_exit"`::
This event is generated after the current process has returned
- from the waitpid() and collected the exit information from the
+ from the `waitpid()` and collected the exit information from the
child.
+
------------
@@ -609,10 +562,50 @@ process may be a shell script which doesn't have a session-id.)
+
Note that the `t_rel` field contains the observed run time in seconds
for the child process (starting before the fork/exec/spawn and
-stopping after the waitpid() and includes OS process creation overhead).
+stopping after the `waitpid()`), which includes OS process creation overhead.
So this time will be slightly larger than the atexit time reported by
the child process itself.
+`"child_ready"`::
+ This event is generated after the current process has started
+ a background process and released all handles to it.
++
+------------
+{
+ "event":"child_ready",
+ ...
+ "child_id":2,
+ "pid":14708, # child PID
+ "ready":"ready", # child ready state
+ "t_rel":0.110605 # observed run-time of child process
+}
+------------
++
+Note that the session-id of the child process is not available to
+the current/spawning process, so the child's PID is reported here as
+a hint for post-processing. (But it is only a hint because the child
+process may be a shell script which doesn't have a session-id.)
++
+This event is generated after the child is started in the background
+and given a little time to boot up and start working. If the child
+starts up normally while the parent is still waiting, the "ready"
+field will have the value "ready".
+If the child is too slow to start and the parent times out, the field
+will have the value "timeout".
+If the child starts but the parent is unable to probe it, the field
+will have the value "error".
++
+After the parent process emits this event, it will release all of its
+handles to the child process and treat the child as a background
+daemon. So even if the child does eventually finish booting up,
+the parent will not emit an updated event.
++
+Note that the `t_rel` field contains the observed run time in seconds
+when the parent released the child process into the background.
+The child is assumed to be a long-running daemon process and may
+outlive the parent process. So the parent's child event times should
+not be compared to the child's atexit times.
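++
+For example, a daemon launcher might emit this event roughly as
+follows (a sketch; `cp` is the `struct child_process` and
+`daemon_is_responding()` is a hypothetical probe):
++
+------------
+if (daemon_is_responding(&cp))
+	trace2_child_ready(&cp, "ready");
+else
+	trace2_child_ready(&cp, "timeout");
+------------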
+
`"exec"`::
This event is generated before git attempts to `exec()`
another command rather than starting a child process.
@@ -645,8 +638,8 @@ The "exec_id" field is a command-unique id and is only useful if the
`"thread_start"`::
This event is generated when a thread is started. It is
- generated from *within* the new thread's thread-proc (for TLS
- reasons).
+ generated from *within* the new thread's thread-proc (because
+ it needs to access data in the thread's thread-local storage).
+
------------
{
@@ -658,7 +651,7 @@ The "exec_id" field is a command-unique id and is only useful if the
`"thread_exit"`::
This event is generated when a thread exits. It is generated
- from *within* the thread's thread-proc (for TLS reasons).
+ from *within* the thread's thread-proc.
+
------------
{
@@ -677,6 +670,7 @@ The "exec_id" field is a command-unique id and is only useful if the
{
"event":"def_param",
...
+ "scope":"global",
"param":"core.abbrev",
"value":"7"
}
@@ -775,6 +769,73 @@ The "value" field may be an integer or a string.
}
------------
+`"th_timer"`::
+ This event logs the amount of time that a stopwatch timer was
+ running in the thread. This event is generated when a thread
+ exits for timers that requested per-thread events.
++
+------------
+{
+ "event":"th_timer",
+ ...
+ "category":"my_category",
+ "name":"my_timer",
+ "intervals":5, # number of times it was started/stopped
+ "t_total":0.052741, # total time in seconds it was running
+ "t_min":0.010061, # shortest interval
+ "t_max":0.011648 # longest interval
+}
+------------
+
+`"timer"`::
+ This event logs the amount of time that a stopwatch timer was
+ running aggregated across all threads. This event is generated
+ when the process exits.
++
+------------
+{
+ "event":"timer",
+ ...
+ "category":"my_category",
+ "name":"my_timer",
+ "intervals":5, # number of times it was started/stopped
+ "t_total":0.052741, # total time in seconds it was running
+ "t_min":0.010061, # shortest interval
+ "t_max":0.011648 # longest interval
+}
+------------
+
+`"th_counter"`::
+ This event logs the value of a counter variable in a thread.
+ This event is generated when a thread exits for counters that
+ requested per-thread events.
++
+------------
+{
+ "event":"th_counter",
+ ...
+ "category":"my_category",
+ "name":"my_counter",
+ "count":23
+}
+------------
+
+`"counter"`::
+ This event logs the value of a counter variable across all threads.
+ This event is generated when the process exits. The total value
+ reported here is the sum across all threads.
++
+------------
+{
+ "event":"counter",
+ ...
+ "category":"my_category",
+ "name":"my_counter",
+ "count":23
+}
+------------
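++
+At each use site, incrementing a counter is a single call, e.g. (a
+sketch assuming one of the predefined counter ids declared in
+`trace2.h`):
++
+------------
+trace2_counter_add(TRACE2_COUNTER_ID_TEST1, 1);
+------------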
+
+
== Example Trace2 API Usage
Here is a hypothetical usage of the Trace2 API showing the intended
@@ -909,7 +970,7 @@ atexit elapsed:3.868970 code:0
Regions::
- Regions can be use to time an interesting section of code.
+ Regions can be used to time an interesting section of code.
+
----------------
void wt_status_collect(struct wt_status *s)
@@ -1063,9 +1124,9 @@ Thread Events::
Thread messages added to a thread-proc.
+
-For example, the multithreaded preload-index code can be
+For example, the multi-threaded preload-index code can be
instrumented with a region around the thread pool and then
-per-thread start and exit events within the threadproc.
+per-thread start and exit events within the thread-proc.
+
----------------
static void *preload_thread(void *_data)
@@ -1165,7 +1226,100 @@ worked on 508 items at offset 2032. Thread "th04" worked on 508 items
at offset 508.
+
This example also shows that thread names are assigned in a racy manner
-as each thread starts and allocates TLS storage.
+as each thread starts.
+
+Config (def param) Events::
+
+ Dump "interesting" config values to trace2 log.
++
+We can optionally emit configuration events; see
+`trace2.configparams` in linkgit:git-config[1] for how to enable
+it.
++
+----------------
+$ git config --system color.ui never
+$ git config --global color.ui always
+$ git config --local color.ui auto
+$ git config --list --show-scope | grep 'color.ui'
+system color.ui=never
+global color.ui=always
+local color.ui=auto
+----------------
++
+Then, mark `color.ui` as an "interesting" config variable with
+`GIT_TRACE2_CONFIG_PARAMS`:
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ export GIT_TRACE2_CONFIG_PARAMS=color.ui
+$ git version
+...
+$ cat ~/log.perf
+d0 | main | version | | | | | ...
+d0 | main | start | | 0.001642 | | | /usr/local/bin/git version
+d0 | main | cmd_name | | | | | version (version)
+d0 | main | def_param | | | | scope:system | color.ui:never
+d0 | main | def_param | | | | scope:global | color.ui:always
+d0 | main | def_param | | | | scope:local | color.ui:auto
+d0 | main | data | r0 | 0.002100 | 0.002100 | fsync | fsync/writeout-only:0
+d0 | main | data | r0 | 0.002126 | 0.002126 | fsync | fsync/hardware-flush:0
+d0 | main | exit | | 0.000470 | | | code:0
+d0 | main | atexit | | 0.000477 | | | code:0
+----------------
+
+Stopwatch Timer Events::
+
+ Measure the time spent in a function call or span of code
+ that might be called from many places within the code
+ throughout the life of the process.
++
+----------------
+static void expensive_function(void)
+{
+ trace2_timer_start(TRACE2_TIMER_ID_TEST1);
+ ...
+ sleep_millisec(1000); // Do something expensive
+ ...
+ trace2_timer_stop(TRACE2_TIMER_ID_TEST1);
+}
+
+static int ut_100timer(int argc, const char **argv)
+{
+ ...
+
+ expensive_function();
+
+ // Do something else 1...
+
+ expensive_function();
+
+ // Do something else 2...
+
+ expensive_function();
+
+ return 0;
+}
+----------------
++
+In this example, we measure the total time spent in
+`expensive_function()` regardless of when it is called
+in the overall flow of the program.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ t/helper/test-tool trace2 100timer 3 1000
+...
+$ cat ~/log.perf
+d0 | main | version | | | | | ...
+d0 | main | start | | 0.001453 | | | t/helper/test-tool trace2 100timer 3 1000
+d0 | main | cmd_name | | | | | trace2 (trace2)
+d0 | main | exit | | 3.003667 | | | code:0
+d0 | main | timer | | | | test | name:test1 intervals:3 total:3.001686 min:1.000254 max:1.000929
+d0 | main | atexit | | 3.003796 | | | code:0
+----------------
+
== Future Work
@@ -1174,11 +1328,11 @@ as each thread starts and allocates TLS storage.
There are a few issues to resolve before we can completely
switch to Trace2.
-* Updating existing tests that assume GIT_TRACE format messages.
+* Updating existing tests that assume `GIT_TRACE` format messages.
-* How to best handle custom GIT_TRACE_<key> messages?
+* How to best handle custom `GIT_TRACE_<key>` messages?
-** The GIT_TRACE_<key> mechanism allows each <key> to write to a
+** The `GIT_TRACE_<key>` mechanism allows each <key> to write to a
different file (in addition to just stderr).
** Do we want to maintain that ability or simply write to the existing
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec2..f5d2009 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -25,9 +25,9 @@ An object is uniquely described by its bit position within a bitmap:
is defined as follows:
o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
-
- The ordering between packs is done according to the MIDX's .rev file.
- Notably, the preferred pack sorts ahead of all other packs.
++
+The ordering between packs is done according to the MIDX's .rev file.
+Notably, the preferred pack sorts ahead of all other packs.
The on-disk representation (described below) of a bitmap is the same regardless
of whether or not that bitmap belongs to a packfile or a MIDX. The only
@@ -39,97 +39,119 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
== On-disk format
- - A header appears at the beginning:
-
- 4-byte signature: {'B', 'I', 'T', 'M'}
-
- 2-byte version number (network byte order)
- The current implementation only supports version 1
- of the bitmap index (the same one as JGit).
-
- 2-byte flags (network byte order)
-
- The following flags are supported:
-
- - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
- This flag must always be present. It implies that the
- bitmap index has been generated for a packfile or
- multi-pack index (MIDX) with full closure (i.e. where
- every single object in the packfile/MIDX can find its
- parent links inside the same packfile/MIDX). This is a
- requirement for the bitmap index format, also present in
- JGit, that greatly reduces the complexity of the
- implementation.
-
- - BITMAP_OPT_HASH_CACHE (0x4)
- If present, the end of the bitmap file contains
- `N` 32-bit name-hash values, one per object in the
- pack/MIDX. The format and meaning of the name-hash is
- described below.
-
- 4-byte entry count (network byte order)
-
- The total count of entries (bitmapped commits) in this bitmap index.
-
- 20-byte checksum
-
- The SHA1 checksum of the pack/MIDX this bitmap index
- belongs to.
-
- - 4 EWAH bitmaps that act as type indexes
-
- Type indexes are serialized after the hash cache in the shape
- of four EWAH bitmaps stored consecutively (see Appendix A for
- the serialization format of an EWAH bitmap).
-
- There is a bitmap for each Git object type, stored in the following
- order:
-
- - Commits
- - Trees
- - Blobs
- - Tags
-
- In each bitmap, the `n`th bit is set to true if the `n`th object
- in the packfile or multi-pack index is of that type.
-
- The obvious consequence is that the OR of all 4 bitmaps will result
- in a full set (all bits set), and the AND of all 4 bitmaps will
- result in an empty bitmap (no bits set).
-
- - N entries with compressed bitmaps, one for each indexed commit
-
- Where `N` is the total amount of entries in this bitmap index.
- Each entry contains the following:
-
- - 4-byte object position (network byte order)
- The position **in the index for the packfile or
- multi-pack index** where the bitmap for this commit is
- found.
-
- - 1-byte XOR-offset
- The xor offset used to compress this bitmap. For an entry
- in position `x`, a XOR offset of `y` means that the actual
- bitmap representing this commit is composed by XORing the
- bitmap for this entry with the bitmap in entry `x-y` (i.e.
- the bitmap `y` entries before this one).
-
- Note that this compression can be recursive. In order to
- XOR this entry with a previous one, the previous entry needs
- to be decompressed first, and so on.
-
- The hard-limit for this offset is 160 (an entry can only be
- xor'ed against one of the 160 entries preceding it). This
- number is always positive, and hence entries are always xor'ed
- with **previous** bitmaps, not bitmaps that will come afterwards
- in the index.
-
- - 1-byte flags for this bitmap
- At the moment the only available flag is `0x1`, which hints
- that this bitmap can be re-used when rebuilding bitmap indexes
- for the repository.
-
- - The compressed bitmap itself, see Appendix A.
+ * A header appears at the beginning:
+
+ 4-byte signature: :: {'B', 'I', 'T', 'M'}
+
+ 2-byte version number (network byte order): ::
+
+ The current implementation only supports version 1
+ of the bitmap index (the same one as JGit).
+
+ 2-byte flags (network byte order): ::
+
+ The following flags are supported:
+
+ ** {empty}
+ BITMAP_OPT_FULL_DAG (0x1) REQUIRED: :::
+
+ This flag must always be present. It implies that the
+ bitmap index has been generated for a packfile or
+ multi-pack index (MIDX) with full closure (i.e. where
+ every single object in the packfile/MIDX can find its
+ parent links inside the same packfile/MIDX). This is a
+ requirement for the bitmap index format, also present in
+ JGit, that greatly reduces the complexity of the
+ implementation.
+
+ ** {empty}
+ BITMAP_OPT_HASH_CACHE (0x4): :::
+
+ If present, the end of the bitmap file contains
+ `N` 32-bit name-hash values, one per object in the
+ pack/MIDX. The format and meaning of the name-hash is
+ described below.
+
+ ** {empty}
+ BITMAP_OPT_LOOKUP_TABLE (0x10): :::
+ If present, the end of the bitmap file contains a table
+ containing a list of `N` <commit_pos, offset, xor_row>
+ triplets. The format and meaning of the table is described
+ below.
++
+NOTE: Unlike the xor_offset used to compress an individual bitmap,
+`xor_row` stores an *absolute* index into the lookup table, not a location
+relative to the current entry.
+
+ 4-byte entry count (network byte order): ::
+ The total count of entries (bitmapped commits) in this bitmap index.
+
+ 20-byte checksum: ::
+ The SHA1 checksum of the pack/MIDX this bitmap index
+ belongs to.
+
+ * 4 EWAH bitmaps that act as type indexes
++
+Type indexes are serialized after the hash cache in the shape
+of four EWAH bitmaps stored consecutively (see Appendix A for
+the serialization format of an EWAH bitmap).
++
+There is a bitmap for each Git object type, stored in the following
+order:
++
+ - Commits
+ - Trees
+ - Blobs
+ - Tags
+
++
+In each bitmap, the `n`th bit is set to true if the `n`th object
+in the packfile or multi-pack index is of that type.
++
+The obvious consequence is that the OR of all 4 bitmaps will result
+in a full set (all bits set), and the AND of all 4 bitmaps will
+result in an empty bitmap (no bits set).
+
+ * N entries with compressed bitmaps, one for each indexed commit
++
+Where `N` is the total number of entries in this bitmap index.
+Each entry contains the following:
+
+ ** {empty}
+ 4-byte object position (network byte order): ::
+ The position **in the index for the packfile or
+ multi-pack index** where the bitmap for this commit is
+ found.
+
+ ** {empty}
+ 1-byte XOR-offset: ::
+ The xor offset used to compress this bitmap. For an entry
+ in position `x`, an XOR offset of `y` means that the actual
+ bitmap representing this commit is composed by XORing the
+ bitmap for this entry with the bitmap in entry `x-y` (i.e.
+ the bitmap `y` entries before this one).
++
+NOTE: This compression can be recursive. In order to
+XOR this entry with a previous one, the previous entry needs
+to be decompressed first, and so on (see the sketch after
+this list).
++
+The hard-limit for this offset is 160 (an entry can only be
+xor'ed against one of the 160 entries preceding it). This
+number is always positive, and hence entries are always xor'ed
+with **previous** bitmaps, not bitmaps that will come afterwards
+in the index.
+
+ ** {empty}
+ 1-byte flags for this bitmap: ::
+ At the moment the only available flag is `0x1`, which hints
+ that this bitmap can be re-used when rebuilding bitmap indexes
+ for the repository.
+
+ ** The compressed bitmap itself, see Appendix A.
+
+ * {empty}
+ TRAILER: ::
+ Trailing checksum of the preceding contents.
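+
+As an illustration, resolving the XOR-offset compression described
+above is a recursive operation. Here is a minimal sketch, where
+`struct bitmap_entry`, `decompress_entry()`, and `ewah_xor()` are
+hypothetical helpers (the real implementation lives in pack-bitmap.c):
+
+------------
+static struct ewah_bitmap *resolve_bitmap(struct bitmap_entry *entries,
+					  uint32_t x)
+{
+	struct ewah_bitmap *b = decompress_entry(&entries[x]);
+
+	if (entries[x].xor_offset)
+		b = ewah_xor(b, resolve_bitmap(entries,
+					       x - entries[x].xor_offset));
+	return b;
+}
+------------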
== Appendix A: Serialization format for an EWAH bitmap
@@ -142,8 +164,8 @@ implementation:
- 4-byte number of words of the COMPRESSED bitmap, when stored
- N x 8-byte words, as specified by the previous field
-
- This is the actual content of the compressed bitmap.
++
+This is the actual content of the compressed bitmap.
- 4-byte position of the current RLW for the compressed
bitmap
@@ -205,3 +227,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
If implementations want to choose a different hashing scheme, they are
free to do so, but MUST allocate a new header flag (because comparing
hashes made under two different schemes would be pointless).
+
+Commit lookup table
+-------------------
+
+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
+bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
+file contains a lookup table specifying the information needed to get
+the desired bitmap from the entries without parsing previous unnecessary
+bitmaps.
+
+For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
+contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets
+(sorted in ascending order of `commit_pos`). The content of the i-th
+triplet is:
+
+ * {empty}
+ commit_pos (4 byte integer, network byte order): ::
+ It stores the object position of a commit (in the midx or pack
+ index).
+
+ * {empty}
+ offset (8 byte integer, network byte order): ::
+ The offset from which that commit's bitmap can be read.
+
+ * {empty}
+ xor_row (4 byte integer, network byte order): ::
+ The position of the triplet whose bitmap is used to compress
+ this one, or `0xffffffff` if no such bitmap exists.
diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt
deleted file mode 100644
index bac558d..0000000
--- a/Documentation/technical/bundle-format.txt
+++ /dev/null
@@ -1,76 +0,0 @@
-= Git bundle v2 format
-
-The Git bundle format is a format that represents both refs and Git objects.
-
-== Format
-
-We will use ABNF notation to define the Git bundle format. See
-protocol-common.txt for the details.
-
-A v2 bundle looks like this:
-
-----
-bundle = signature *prerequisite *reference LF pack
-signature = "# v2 git bundle" LF
-
-prerequisite = "-" obj-id SP comment LF
-comment = *CHAR
-reference = obj-id SP refname LF
-
-pack = ... ; packfile
-----
-
-A v3 bundle looks like this:
-
-----
-bundle = signature *capability *prerequisite *reference LF pack
-signature = "# v3 git bundle" LF
-
-capability = "@" key ["=" value] LF
-prerequisite = "-" obj-id SP comment LF
-comment = *CHAR
-reference = obj-id SP refname LF
-key = 1*(ALPHA / DIGIT / "-")
-value = *(%01-09 / %0b-FF)
-
-pack = ... ; packfile
-----
-
-== Semantics
-
-A Git bundle consists of several parts.
-
-* "Capabilities", which are only in the v3 format, indicate functionality that
- the bundle requires to be read properly.
-
-* "Prerequisites" lists the objects that are NOT included in the bundle and the
- reader of the bundle MUST already have, in order to use the data in the
- bundle. The objects stored in the bundle may refer to prerequisite objects and
- anything reachable from them (e.g. a tree object in the bundle can reference
- a blob that is reachable from a prerequisite) and/or expressed as a delta
- against prerequisite objects.
-
-* "References" record the tips of the history graph, iow, what the reader of the
- bundle CAN "git fetch" from it.
-
-* "Pack" is the pack data stream "git fetch" would send, if you fetch from a
- repository that has the references recorded in the "References" above into a
- repository that has references pointing at the objects listed in
- "Prerequisites" above.
-
-In the bundle format, there can be a comment following a prerequisite obj-id.
-This is a comment and it has no specific meaning. The writer of the bundle MAY
-put any string here. The reader of the bundle MUST ignore the comment.
-
-=== Note on the shallow clone and a Git bundle
-
-Note that the prerequisites does not represent a shallow-clone boundary. The
-semantics of the prerequisites and the shallow-clone boundaries are different,
-and the Git bundle v2 format cannot represent a shallow clone repository.
-
-== Capabilities
-
-Because there is no opportunity for negotiation, unknown capabilities cause 'git
-bundle' to abort. The only known capability is `object-format`, which specifies
-the hash algorithm in use, and can take the same values as the
-`extensions.objectFormat` configuration value.
diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
new file mode 100644
index 0000000..91d3a13
--- /dev/null
+++ b/Documentation/technical/bundle-uri.txt
@@ -0,0 +1,572 @@
+Bundle URIs
+===========
+
+Git bundles are files that store a pack-file along with some extra metadata,
+including a set of refs and a (possibly empty) set of necessary commits. See
+linkgit:git-bundle[1] and linkgit:gitformat-bundle[5] for more information.
+
+Bundle URIs are locations where Git can download one or more bundles in
+order to bootstrap the object database in advance of fetching the remaining
+objects from a remote.
+
+One goal is to speed up clones and fetches for users with poor network
+connectivity to the origin server. Another benefit is to allow heavy users,
+such as CI build farms, to use local resources for the majority of Git data
+and thereby reduce the load on the origin server.
+
+To enable the bundle URI feature, users can specify a bundle URI using
+command-line options or the origin server can advertise one or more URIs
+via a protocol v2 capability.
+
+Design Goals
+------------
+
+The bundle URI standard aims to be flexible enough to satisfy multiple
+workloads. The bundle provider and the Git client have several choices in
+how they create and consume bundle URIs.
+
+* Bundles can have whatever name the server desires. This name could refer
+ to immutable data by using a hash of the bundle contents. However, this
+ means that a new URI will be needed after every update of the content.
+ This might be acceptable if the server is advertising the URI (and the
+ server is aware of new bundles being generated) but would not be
+ ergonomic for users using the command line option.
+
+* The bundles could be organized specifically for bootstrapping full
+ clones, but could also be organized with the intention of bootstrapping
+ incremental fetches. The bundle provider must decide on one of several
+ organization schemes to minimize client downloads during incremental
+ fetches, but the Git client can also choose whether to use bundles for
+ either of these operations.
+
+* The bundle provider can choose to support full clones, partial clones,
+ or both. The client can detect which bundles are appropriate for the
+ repository's partial clone filter, if any.
+
+* The bundle provider can use a single bundle (for clones only), or a
+ list of bundles. When using a list of bundles, the provider can specify
+ whether or not the client needs _all_ of the bundle URIs for a full
+ clone, or if _any_ one of the bundle URIs is sufficient. This allows the
+ bundle provider to use different URIs for different geographies.
+
+* The bundle provider can organize the bundles using heuristics, such as
+ creation tokens, to help the client prevent downloading bundles it does
+ not need. When the bundle provider does not provide these heuristics,
+ the client can use optimizations to minimize how much of the data is
+ downloaded.
+
+* The bundle provider does not need to be associated with the Git server.
+ The client can choose to use the bundle provider without it being
+ advertised by the Git server.
+
+* The client can choose to discover bundle providers that are advertised
+ by the Git server. This could happen during `git clone`, during
+ `git fetch`, both, or neither. The user can choose which combination
+ works best for them.
+
+* The client can choose to configure a bundle provider manually at any
+ time. The client can also choose to specify a bundle provider manually
+ as a command-line option to `git clone`.
+
+Each repository is different and every Git server has different needs.
+Hopefully the bundle URI feature is flexible enough to satisfy all needs.
+If not, then the feature can be extended through its versioning mechanism.
+
+Server requirements
+-------------------
+
+To implement a server-side bundle provider, no other parts
+of the Git protocol are required. This allows server maintainers to use
+static content solutions such as CDNs in order to serve the bundle files.
+
+At the current scope of the bundle URI feature, all URIs are expected to
+be HTTP(S) URLs where content is downloaded to a local file using a `GET`
+request to that URL. The server could include authentication requirements
+to those requests with the aim of triggering the configured credential
+helper for secure access. (Future extensions could use "file://" URIs or
+SSH URIs.)
+
+Assuming a `200 OK` response from the server, the content at the URL is
+inspected. First, Git attempts to parse the file as a bundle file of
+version 2 or higher. If the file is not a bundle, then the file is parsed
+as a plain-text file using Git's config parser. The key-value pairs in
+that config file are expected to describe a list of bundle URIs. If
+neither of these parse attempts succeeds, then Git will report an error to
+the user that the bundle URI provided erroneous data.
+
+Any other data provided by the server is considered erroneous.
+
+Bundle Lists
+------------
+
+The Git server can advertise bundle URIs using a set of `key=value` pairs.
+A bundle URI can also serve a plain-text file in the Git config format
+containing these same `key=value` pairs. In both cases, we consider this
+to be a _bundle list_. The pairs specify information about the bundles
+that the client can use to make decisions for which bundles to download
+and which to ignore.
+
+A few keys focus on properties of the list itself.
+
+bundle.version::
+ (Required) This value provides a version number for the bundle
+ list. If a future Git change enables a feature that needs the Git
+ client to react to a new key in the bundle list file, then this version
+ will increment. The only current version number is 1, and if any other
+ value is specified then Git will fail to use this file.
+
+bundle.mode::
+	(Required) This value takes one of two values: `all` or `any`. When `all`
+ is specified, then the client should expect to need all of the listed
+ bundle URIs that match their repository's requirements. When `any` is
+ specified, then the client should expect that any one of the bundle URIs
+ that match their repository's requirements will suffice. Typically, the
+ `any` option is used to list a number of different bundle servers
+ located in different geographies.
+
+bundle.heuristic::
+ If this string-valued key exists, then the bundle list is designed to
+ work well with incremental `git fetch` commands. The heuristic signals
+ that there are additional keys available for each bundle that help
+ determine which subset of bundles the client should download. The only
+ heuristic currently planned is `creationToken`.
+
+The remaining keys include an `<id>` segment which is a server-designated
+name for each available bundle. The `<id>` must contain only alphanumeric
+and `-` characters.
+
+bundle.<id>.uri::
+ (Required) This string value is the URI for downloading bundle `<id>`.
+ If the URI begins with a protocol (`http://` or `https://`) then the URI
+ is absolute. Otherwise, the URI is interpreted as relative to the URI
+ used for the bundle list. If the URI begins with `/`, then that relative
+ path is relative to the domain name used for the bundle list. (This use
+ of relative paths is intended to make it easier to distribute a set of
+ bundles across a large number of servers or CDNs with different domain
+ names.)
+
+bundle.<id>.filter::
+ This string value represents an object filter that should also appear in
+ the header of this bundle. The server uses this value to differentiate
+ different kinds of bundles from which the client can choose those that
+ match their object filters.
+
+bundle.<id>.creationToken::
+ This value is a nonnegative 64-bit integer used for sorting the bundles
+ list. This is used to download a subset of bundles during a fetch when
+ `bundle.heuristic=creationToken`.
+
+bundle.<id>.location::
+ This string value advertises a real-world location from where the bundle
+ URI is served. This can be used to present the user with an option for
+ which bundle URI to use or simply as an informative indicator of which
+ bundle URI was selected by Git. This is only valuable when
+ `bundle.mode` is `any`.
+
+Here is an example bundle list using the Git config format:
+
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "2022-02-09-1644442601-daily"]
+ uri = https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle
+ creationToken = 1644442601
+
+ [bundle "2022-02-02-1643842562"]
+ uri = https://bundles.example.com/git/git/2022-02-02-1643842562.bundle
+ creationToken = 1643842562
+
+ [bundle "2022-02-09-1644442631-daily-blobless"]
+ uri = 2022-02-09-1644442631-daily-blobless.bundle
+ creationToken = 1644442631
+ filter = blob:none
+
+ [bundle "2022-02-02-1643842568-blobless"]
+ uri = /git/git/2022-02-02-1643842568-blobless.bundle
+ creationToken = 1643842568
+ filter = blob:none
+
+This example uses `bundle.mode=all` as well as the
+`bundle.<id>.creationToken` heuristic. It also uses the `bundle.<id>.filter`
+options to present two parallel sets of bundles: one for full clones and
+another for blobless partial clones.
+
+Suppose that this bundle list was found at the URI
+`https://bundles.example.com/git/git/` and so the two blobless bundles have
+the following fully-expanded URIs:
+
+* `https://bundles.example.com/git/git/2022-02-09-1644442631-daily-blobless.bundle`
+* `https://bundles.example.com/git/git/2022-02-02-1643842568-blobless.bundle`
+
+Advertising Bundle URIs
+-----------------------
+
+If a user knows a bundle URI for the repository they are cloning, then
+they can specify that URI manually through a command-line option. However,
+a Git host may want to advertise bundle URIs during the clone operation,
+helping users unaware of the feature.
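+
+For example, a user might bootstrap a clone like this (the URIs here
+are hypothetical):
+
+	$ git clone --bundle-uri="https://bundles.example.com/git/git/" \
+		https://git.example.com/git/git.git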
+
+The only thing required for this feature is that the server can advertise
+one or more bundle URIs. This advertisement takes the form of a new
+protocol v2 capability specifically for discovering bundle URIs.
+
+The client could choose an arbitrary bundle URI as an option _or_ select
+the URI with the best performance via some exploratory checks.
+bundle provider to decide if having multiple URIs is preferable to a
+single URI that is geodistributed through server-side infrastructure.
+
+Cloning with Bundle URIs
+------------------------
+
+The primary need for bundle URIs is to speed up clones. The Git client
+will interact with bundle URIs according to the following flow:
+
+1. The user specifies a bundle URI with the `--bundle-uri` command-line
+ option _or_ the client discovers a bundle list advertised by the
+ Git server.
+
+2. If the downloaded data from a bundle URI is a bundle, then the client
+ inspects the bundle headers to check that the prerequisite commit OIDs
+ are present in the client repository. If some are missing, then the
+ client delays unbundling until other bundles have been unbundled,
+ making those OIDs present. When all required OIDs are present, the
+ client unbundles that data using a refspec. The default refspec is
+ `+refs/heads/*:refs/bundles/*`, but this can be configured. These refs
+ are stored so that later `git fetch` negotiations can communicate each
+ bundled ref as a `have`, reducing the size of the fetch over the Git
+ protocol. To allow pruning refs from this ref namespace, Git may
+ introduce a numbered namespace (such as `refs/bundles/<i>/*`) such that
+ stale bundle refs can be deleted.
+
+3. If the file is instead a bundle list, then the client inspects the
+ `bundle.mode` to see if the list is of the `all` or `any` form.
+
+ a. If `bundle.mode=all`, then the client considers all bundle
+ URIs. The list is reduced based on the `bundle.<id>.filter` options
+ matching the client repository's partial clone filter. Then, all
+ bundle URIs are requested. If the `bundle.<id>.creationToken`
+ heuristic is provided, then the bundles are downloaded in decreasing
+ order by the creation token, stopping when a bundle has all required
+ OIDs. The bundles can then be unbundled in increasing creation token
+ order. The client stores the latest creation token as a heuristic
+ for avoiding future downloads if the bundle list does not advertise
+ bundles with larger creation tokens.
+
+ b. If `bundle.mode=any`, then the client can choose any one of the
+ bundle URIs to inspect. The client can use a variety of ways to
+  choose among these URIs. The client can also fall back to another URI
+ if the initial choice fails to return a result.
+
+Note that during a clone we expect that all bundles will be required, and
+heuristics such as `bundle.<id>.creationToken` can be used to download
+bundles in chronological order or in parallel, as sketched below.
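+
+A sketch of that download strategy under the `creationToken` heuristic
+(the helper names here are hypothetical; the real logic lives in
+bundle-uri.c):
+
+	int i, j;
+
+	/* Download in decreasing creationToken order. */
+	sort_by_creation_token_desc(list);
+	for (i = 0; i < list->nr; i++) {
+		download_bundle(list->items[i]);
+		if (have_all_prerequisites(list->items[i]))
+			break;	/* older bundles are not needed */
+	}
+	if (i == list->nr)
+		i--;	/* every bundle was downloaded */
+
+	/* Unbundle in increasing creationToken order. */
+	for (j = i; j >= 0; j--)
+		unbundle(list->items[j]);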
+
+If a given bundle URI is a bundle list with a `bundle.heuristic`
+value, then the client can choose to store that URI as its chosen bundle
+URI. The client can then navigate directly to that URI during later `git
+fetch` calls.
+
+When downloading bundle URIs, the client can choose to inspect the initial
+content before committing to downloading the entire content. This may
+provide enough information to determine if the URI is a bundle list or
+a bundle. In the case of a bundle, the client may inspect the bundle
+header to determine that all advertised tips are already in the client
+repository and cancel the remaining download.
+
+Fetching with Bundle URIs
+-------------------------
+
+When the client fetches new data, it can decide to fetch from bundle
+servers before fetching from the origin remote. This could be done via a
+command-line option, but it is more likely useful to use a config value
+such as the one specified during the clone.
+
+The fetch operation follows the same procedure to download bundles from a
+bundle list (although we do _not_ want to use parallel downloads here). We
+expect that the process will end when all prerequisite commit OIDs in a
+thin bundle are already in the object database.
+
+When using the `creationToken` heuristic, the client can avoid downloading
+any bundles if their creation tokens are not larger than the stored
+creation token. After fetching new bundles, Git updates this local
+creation token.
+
+If the bundle provider does not provide a heuristic, then the client
+should attempt to inspect the bundle headers before downloading the full
+bundle data in case the bundle tips already exist in the client
+repository.
+
+Error Conditions
+----------------
+
+If the Git client discovers something unexpected while downloading
+information according to a bundle URI or the bundle list found at that
+location, then Git can ignore that data and continue as if it was not
+given a bundle URI. The remote Git server is the ultimate source of truth,
+not the bundle URI.
+
+Here are a few example error conditions:
+
+* The client fails to connect with a server at the given URI or a connection
+ is lost without any chance to recover.
+
+* The client receives a 400-level response (such as `404 Not Found` or
+  `401 Unauthorized`). The client should use the credential helper to
+ find and provide a credential for the URI, but match the semantics of
+ Git's other HTTP protocols in terms of handling specific 400-level
+ errors.
+
+* The server reports any other failure response.
+
+* The client receives data that is not parsable as a bundle or bundle list.
+
+* A bundle includes a filter that does not match expectations.
+
+* The client cannot unbundle the bundles because the prerequisite commit OIDs
+ are not in the object database and there are no more bundles to download.
+
+There are also situations that could be seen as wasteful, but are not
+error conditions:
+
+* The downloaded bundles contain more information than is requested by
+ the clone or fetch request. A primary example is if the user requests
+ a clone with `--single-branch` but downloads bundles that store every
+ reachable commit from all `refs/heads/*` references. This might be
+ initially wasteful, but perhaps these objects will become reachable by
+ a later ref update that the client cares about.
+
+* A bundle download during a `git fetch` contains objects already in the
+ object database. This is probably unavoidable if we are using bundles
+ for fetches, since the client will almost always be slightly ahead of
+ the bundle servers after performing its "catch-up" fetch to the remote
+ server. This extra work is most wasteful when the client is fetching
+ much more frequently than the server is computing bundles, such as if
+ the client is using hourly prefetches with background maintenance, but
+ the server is computing bundles weekly. For this reason, the client
+ should not use bundle URIs for fetch unless the server has explicitly
+ recommended it through a `bundle.heuristic` value.
+
+Example Bundle Provider Organization
+------------------------------------
+
+The bundle URI feature is intentionally designed to be flexible to
+different ways a bundle provider wants to organize the object data.
+However, it can be helpful to have a complete organization model described
+here so providers can start from that base.
+
+This example organization is a simplified model of what is used by the
+GVFS Cache Servers (see the section near the end of this document), which
+have been beneficial in speeding up clones and fetches for very large
+repositories, although they require extra software outside of Git.
+
+The bundle provider deploys servers across multiple geographies. Each
+server manages its own bundle set. The server can track a number of Git
+repositories, but provides a bundle list for each based on a pattern. For
+example, when mirroring a repository at `https://<domain>/<org>/<repo>`
+the bundle server could have its bundle list available at
+`https://<server-url>/<domain>/<org>/<repo>`. The origin Git server can
+list all of these servers under the "any" mode:
+
+ [bundle]
+ version = 1
+ mode = any
+
+ [bundle "eastus"]
+ uri = https://eastus.example.com/<domain>/<org>/<repo>
+
+ [bundle "europe"]
+ uri = https://europe.example.com/<domain>/<org>/<repo>
+
+ [bundle "apac"]
+ uri = https://apac.example.com/<domain>/<org>/<repo>
+
+This "list of lists" is static and only changes if a bundle server is
+added or removed.
+
+Each bundle server manages its own set of bundles. The initial bundle list
+contains only a single bundle, which holds all of the objects received from
+cloning the repository from the origin server. The list uses the
+`creationToken` heuristic, and a `creationToken` is generated for the bundle
+based on the server's timestamp.
+
+The bundle server runs regularly-scheduled updates for the bundle list,
+such as once a day. During this task, the server fetches the latest
+contents from the origin server and generates a bundle containing the
+objects reachable from the latest origin refs, but not contained in a
+previously-computed bundle. This bundle is added to the list, with care
+that the `creationToken` is strictly greater than the previous maximum
+`creationToken`.
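+
+A provider could compute such an incremental bundle with stock Git
+commands. This sketch assumes a mirror of the origin and a hypothetical
+bookkeeping file, `prev-tips.txt`, holding the tips already covered by
+the existing bundles:
+
+    git fetch origin
+    git bundle create $(date +%Y-%m-%d-%s)-daily.bundle \
+        --branches --not $(cat prev-tips.txt)
+    git for-each-ref --format='%(objectname)' refs/heads >prev-tips.txt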
+
+When the bundle list grows too large, say more than 30 bundles, then the
+oldest "_N_ minus 30" bundles are combined into a single bundle. This
+bundle's `creationToken` is equal to the maximum `creationToken` among the
+merged bundles.
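+
+The merge itself can be performed with stock commands, for example by
+replaying the old bundles into a scratch repository and bundling the
+result (paths illustrative; the date-prefixed filenames sort oldest
+first, so each bundle's prerequisites resolve):
+
+    git init --bare scratch.git
+    for b in old-bundles/*.bundle; do
+        git -C scratch.git fetch "$b" '+refs/*:refs/*'
+    done
+    git -C scratch.git bundle create merged.bundle --all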
+
+An example bundle list is provided here, although it only has two daily
+bundles and not a full list of 30:
+
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "2022-02-13-1644770820-daily"]
+ uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-13-1644770820-daily.bundle
+ creationToken = 1644770820
+
+ [bundle "2022-02-09-1644442601-daily"]
+ uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-09-1644442601-daily.bundle
+ creationToken = 1644442601
+
+ [bundle "2022-02-02-1643842562"]
+ uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-02-1643842562.bundle
+ creationToken = 1643842562
+
+To avoid storing and serving object data in perpetuity despite becoming
+unreachable in the origin server, this bundle merge can be more careful.
+Instead of taking an absolute union of the old bundles, the merged bundle
+can be created by looking at the newer bundles and ensuring that their
+necessary commits are all available in this merged bundle (or in another
+one of the newer bundles). This allows "expiring" object data that is not
+being used by new commits in this window of time. That data could be
+reintroduced by a later push.
+
+This data organization has two main goals. First, initial
+clones of the repository become faster by downloading precomputed object
+data from a closer source. Second, `git fetch` commands can be faster,
+especially if the client has not fetched for a few days. However, if a
+client does not fetch for 30 days, then the bundle list organization would
+cause it to redownload a large amount of object data.
+
+One way to make this organization more useful to users who fetch frequently
+is to have more frequent bundle creation. For example, bundles could be
+created every hour, and then once a day those "hourly" bundles could be
+merged into a "daily" bundle. The daily bundles are merged into the
+oldest bundle after 30 days.
+
+It is recommended that this bundle strategy be repeated with the `blob:none`
+filter if clients of this repository are expecting to use blobless partial
+clones. These blobless bundles stay in the same list as the full bundles,
+but use the `bundle.<id>.filter` key to separate the two groups.
+For very large repositories, the bundle provider may want to _only_ provide
+blobless bundles.
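+
+For example, a blobless group can sit alongside the full bundles in
+the same list (values illustrative):
+
+    [bundle "2022-02-13-1644770820-daily-blobless"]
+        uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-13-1644770820-daily-blobless.bundle
+        creationToken = 1644770820
+        filter = blob:none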
+
+Implementation Plan
+-------------------
+
+This design document is being submitted on its own as an aspirational
+document, with the goal of implementing all of the mentioned client
+features over the course of several patch series. Here is a potential
+outline for submitting these features:
+
+1. Integrate bundle URIs into `git clone` with a `--bundle-uri` option.
+ This will include a new `git fetch --bundle-uri` mode for use as the
+ implementation underneath `git clone`. The initial version here will
+ expect a single bundle at the given URI.
+
+2. Implement the ability to parse a bundle list from a bundle URI and
+ update the `git fetch --bundle-uri` logic to properly distinguish
+ between `bundle.mode` options. Specifically design the feature so
+ that the config format parsing feeds a list of key-value pairs into the
+ bundle list logic.
+
+3. Create the `bundle-uri` protocol v2 command so Git servers can advertise
+ bundle URIs using the key-value pairs. Plug into the existing key-value
+ input to the bundle list logic. Allow `git clone` to discover these
+ bundle URIs and bootstrap the client repository from the bundle data.
+ (This choice is an opt-in via a config option and a command-line
+ option.)
+
+4. Allow the client to understand the `bundle.heuristic` configuration key
+ and the `bundle.<id>.creationToken` heuristic. When `git clone`
+ discovers a bundle URI with `bundle.heuristic`, it configures the client
+ repository to check that bundle URI during later `git fetch <remote>`
+ commands.
+
+5. Allow clients to discover bundle URIs during `git fetch` and configure
+ a bundle URI for later fetches if `bundle.heuristic` is set.
+
+6. Implement the "inspect headers" heuristic to reduce data downloads when
+ the `bundle.<id>.creationToken` heuristic is not available.
+
+As these features are reviewed, this plan might be updated. We also expect
+that new designs will be discovered and implemented as this feature
+matures and becomes used in real-world scenarios.
+
+Related Work: Packfile URIs
+---------------------------
+
+The Git protocol already has a capability where the Git server can list
+a set of URLs along with the packfile response when serving a client
+request. The client is then expected to download the packfiles at those
+locations in order to have a complete understanding of the response.
+
+This mechanism is used by the Gerrit server (implemented with JGit) and
+has been effective at reducing CPU load and improving user performance for
+clones.
+
+A major downside to this mechanism is that the origin server needs to know
+_exactly_ what is in those packfiles, and the packfiles need to be available
+to the user for some time after the server has responded. This coupling
+between the origin and the packfile data is difficult to manage.
+
+Further, this implementation is extremely hard to make work with fetches.
+
+Related Work: GVFS Cache Servers
+--------------------------------
+
+The GVFS Protocol [2] is a set of HTTP endpoints designed independently of
+the Git project before Git's partial clone was created. One feature of this
+protocol is the idea of a "cache server" which can be colocated with build
+machines or developer offices to transfer Git data without overloading the
+central server.
+
+The endpoint that VFS for Git is famous for is the `GET /gvfs/objects/{oid}`
+endpoint, which allows downloading an object on-demand. This is a critical
+piece of the filesystem virtualization of that product.
+
+However, a more subtle need is the `GET /gvfs/prefetch?lastPackTimestamp=<t>`
+endpoint. Given an optional timestamp, the cache server responds with a list
+of precomputed packfiles containing the commits and trees that were introduced
+in the time intervals since that timestamp.
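+
+An illustrative request, using a made-up timestamp:
+
+    GET /gvfs/prefetch?lastPackTimestamp=1644442601
+
+The response lists only the prefetch packfiles computed after that
+time.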
+
+The cache server computes these "prefetch" packfiles using the following
+strategy:
+
+1. Every hour, an "hourly" pack is generated with a given timestamp.
+2. Nightly, the previous 24 hourly packs are rolled up into a "daily" pack.
+3. Nightly, all prefetch packs more than 30 days old are rolled up into
+ one pack.
+
+When a user runs `gvfs clone` or `scalar clone` against a repo with cache
+servers, the client requests all prefetch packfiles, which amounts to at
+most `24 + 30 + 1` packfiles containing only commits and trees. The client
+then follows with a request to the origin server for the references, and
+attempts to checkout that tip reference. (There is an extra endpoint that
+helps get all reachable trees from a given commit, in case that commit
+was not already in a prefetch packfile.)
+
+During a `git fetch`, a hook requests the prefetch endpoint using the
+most-recent timestamp from a previously-downloaded prefetch packfile.
+Only the packfiles with later timestamps are downloaded. Most
+users fetch hourly, so they get at most one hourly prefetch pack. Users
+whose machines have been off or otherwise have not fetched in over 30 days
+might redownload all prefetch packfiles. This is rare.
+
+It is important to note that the clients always contact the origin server
+for the refs advertisement, so the refs are frequently "ahead" of the
+prefetched pack data. The missing objects are downloaded on-demand using
+the `GET /gvfs/objects/{oid}` requests, when needed by a command such as
+`git checkout` or `git log`. Some Git optimizations disable checks that
+would cause these on-demand downloads to be too aggressive.
+
+See Also
+--------
+
+[1] https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
+ An earlier RFC for a bundle URI feature.
+
+[2] https://github.com/microsoft/VFSForGit/blob/master/Protocol.md
+ The GVFS Protocol
diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt
deleted file mode 100644
index 593614f..0000000
--- a/Documentation/technical/chunk-format.txt
+++ /dev/null
@@ -1,116 +0,0 @@
-Chunk-based file formats
-========================
-
-Some file formats in Git use a common concept of "chunks" to describe
-sections of the file. This allows structured access to a large file by
-scanning a small "table of contents" for the remaining data. This common
-format is used by the `commit-graph` and `multi-pack-index` files. See
-link:technical/pack-format.html[the `multi-pack-index` format] and
-link:technical/commit-graph-format.html[the `commit-graph` format] for
-how they use the chunks to describe structured data.
-
-A chunk-based file format begins with some header information custom to
-that format. That header should include enough information to identify
-the file type, format version, and number of chunks in the file. From this
-information, that file can determine the start of the chunk-based region.
-
-The chunk-based region starts with a table of contents describing where
-each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
-where C is the number of chunks. Consider the following table:
-
- | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
- |--------------------|------------------------|
- | ID[0] | OFFSET[0] |
- | ... | ... |
- | ID[C] | OFFSET[C] |
- | 0x0000 | OFFSET[C+1] |
-
-Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
-Each integer is stored in network-byte order.
-
-The chunk identifier `ID[i]` is a label for the data stored within this
-file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
-size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
-and `OFFSET[i]`. This requires that the chunk data appears contiguously
-in the same order as the table of contents.
-
-The final entry in the table of contents must be four zero bytes. This
-confirms that the table of contents is ending and provides the offset for
-the end of the chunk-based data.
-
-Note: The chunk-based format expects that the file contains _at least_ a
-trailing hash after `OFFSET[C+1]`.
-
-Functions for working with chunk-based file formats are declared in
-`chunk-format.h`. Using these methods provides extra checks that assist
-developers when creating new file formats.
-
-Writing chunk-based file formats
---------------------------------
-
-To write a chunk-based file format, create a `struct chunkfile` by
-calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
-caller is responsible for opening the `hashfile` and writing header
-information so the file format is identifiable before the chunk-based
-format begins.
-
-Then, call `add_chunk()` for each chunk that is intended for write. This
-populates the `chunkfile` with information about the order and size of
-each chunk to write. Provide a `chunk_write_fn` function pointer to
-perform the write of the chunk data upon request.
-
-Call `write_chunkfile()` to write the table of contents to the `hashfile`
-followed by each of the chunks. This will verify that each chunk wrote
-the expected amount of data so the table of contents is correct.
-
-Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
-caller is responsible for finalizing the `hashfile` by writing the trailing
-hash and closing the file.
-
-Reading chunk-based file formats
---------------------------------
-
-To read a chunk-based file format, the file must be opened as a
-memory-mapped region. The chunk-format API expects that the entire file
-is mapped as a contiguous memory region.
-
-Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
-
-After reading the header information from the beginning of the file,
-including the chunk count, call `read_table_of_contents()` to populate
-the `struct chunkfile` with the list of chunks, their offsets, and their
-sizes.
-
-Extract the data information for each chunk using `pair_chunk()` or
-`read_chunk()`:
-
-* `pair_chunk()` assigns a given pointer with the location inside the
- memory-mapped file corresponding to that chunk's offset. If the chunk
- does not exist, then the pointer is not modified.
-
-* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
- with the appropriate initial pointer and size information. The function
- is not called if the chunk does not exist. Use this method to read chunks
- if you need to perform immediate parsing or if you need to execute logic
- based on the size of the chunk.
-
-After calling these methods, call `free_chunkfile()` to clear the
-`struct chunkfile` data. This will not close the memory-mapped region.
-Callers are expected to own that data for the timeframe the pointers into
-the region are needed.
-
-Examples
---------
-
-These file formats use the chunk-format API, and can be used as examples
-for future formats:
-
-* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
- in `commit-graph.c` for how the chunk-format API is used to write and
- parse the commit-graph file format documented in
- link:technical/commit-graph-format.html[the commit-graph file format].
-
-* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
- in `midx.c` for how the chunk-format API is used to write and
- parse the multi-pack-index file format documented in
- link:technical/pack-format.html[the multi-pack-index file format].
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
deleted file mode 100644
index 87971c2..0000000
--- a/Documentation/technical/commit-graph-format.txt
+++ /dev/null
@@ -1,158 +0,0 @@
-Git commit graph format
-=======================
-
-The Git commit graph stores a list of commit OIDs and some associated
-metadata, including:
-
-- The generation number of the commit.
-
-- The root tree OID.
-
-- The commit date.
-
-- The parents of the commit, stored using positional references within
- the graph file.
-
-- The Bloom filter of the commit carrying the paths that were changed between
- the commit and its first parent, if requested.
-
-These positional references are stored as unsigned 32-bit integers
-corresponding to the array position within the list of commit OIDs. Due
-to some special constants we use to track parents, we can store at most
-(1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits.
-
-== Commit graph files have the following format:
-
-In order to allow extensions that add extra data to the graph, we organize
-the body into "chunks" and provide a binary lookup table at the beginning
-of the body. The header includes certain values, such as number of chunks
-and hash type.
-
-All multi-byte numbers are in network byte order.
-
-HEADER:
-
- 4-byte signature:
- The signature is: {'C', 'G', 'P', 'H'}
-
- 1-byte version number:
- Currently, the only valid version is 1.
-
- 1-byte Hash Version
- We infer the hash length (H) from this value:
- 1 => SHA-1
- 2 => SHA-256
- If the hash type does not match the repository's hash algorithm, the
- commit-graph file should be ignored with a warning presented to the
- user.
-
- 1-byte number (C) of "chunks"
-
- 1-byte number (B) of base commit-graphs
- We infer the length (H*B) of the Base Graphs chunk
- from this value.
-
-CHUNK LOOKUP:
-
- (C + 1) * 12 bytes listing the table of contents for the chunks:
- First 4 bytes describe the chunk id. Value 0 is a terminating label.
- Other 8 bytes provide the byte-offset in current file for chunk to
- start. (Chunks are ordered contiguously in the file, so you can infer
- the length using the next chunk position if necessary.) Each chunk
- ID appears at most once.
-
- The CHUNK LOOKUP matches the table of contents from
- link:technical/chunk-format.html[the chunk-based file format].
-
- The remaining data in the body is described one chunk at a time, and
- these chunks may be given in any order. Chunks are required unless
- otherwise specified.
-
-CHUNK DATA:
-
- OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
- The ith entry, F[i], stores the number of OIDs with first
- byte at most i. Thus F[255] stores the total
- number of commits (N).
-
- OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
- The OIDs for all commits in the graph, sorted in ascending order.
-
- Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
- * The first H bytes are for the OID of the root tree.
- * The next 8 bytes are for the positions of the first two parents
- of the ith commit. Stores value 0x70000000 if no parent in that
- position. If there are more than two parents, the second value
- has its most-significant bit on and the other bits store an array
- position into the Extra Edge List chunk.
- * The next 8 bytes store the topological level (generation number v1)
- of the commit and
- the commit time in seconds since EPOCH. The generation number
- uses the higher 30 bits of the first 4 bytes, while the commit
- time uses the 32 bits of the second 4 bytes, along with the lowest
- 2 bits of the lowest byte, storing the 33rd and 34th bit of the
- commit time.
-
- Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
- * This list of 4-byte values stores corrected commit date offsets for the
- commits, arranged in the same order as commit data chunk.
- * If the corrected commit date offset cannot be stored within 31 bits,
- the value has its most-significant bit on and the other bits store
- the position of corrected commit date into the Generation Data Overflow
- chunk.
- * Generation Data chunk is present only when commit-graph file is written
- by compatible versions of Git and in case of split commit-graph chains,
- the topmost layer also has Generation Data chunk.
-
- Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
- * This list of 8-byte values stores the corrected commit date offsets
- for commits with corrected commit date offsets that cannot be
- stored within 31 bits.
- * Generation Data Overflow chunk is present only when Generation Data
- chunk is present and at least one corrected commit date offset cannot
- be stored within 31 bits.
-
- Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
- This list of 4-byte values stores the second through nth parents for
- all octopus merges. The second parent value in the commit data stores
- an array position within this list along with the most-significant bit
- on. Starting at that array position, iterate through this list of commit
- positions for the parents until reaching a value with the most-significant
- bit on. The other bits correspond to the position of the last parent.
-
- Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
- * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters
- from commit 0 to commit i (inclusive) in lexicographic order. The Bloom
- filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header
- length), where BIDX[-1] is 0.
- * The BIDX chunk is ignored if the BDAT chunk is not present.
-
- Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
- * It starts with header consisting of three unsigned 32-bit integers:
- - Version of the hash algorithm being used. We currently only support
- value 1 which corresponds to the 32-bit version of the murmur3 hash
- implemented exactly as described in
- https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
- hashing technique using seed values 0x293ae76f and 0x7e646e2 as
- described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
- in Probabilistic Verification"
- - The number of times a path is hashed and hence the number of bit positions
- that cumulatively determine whether a file is present in the commit.
- - The minimum number of bits 'b' per entry in the Bloom filter. If the filter
- contains 'n' entries, then the filter size is the minimum number of 64-bit
- words that contain n*b bits.
- * The rest of the chunk is the concatenation of all the computed Bloom
- filters for the commits in lexicographic order.
- * Note: Commits with no changes or more than 512 changes have Bloom filters
- of length one, with either all bits set to zero or one respectively.
- * The BDAT chunk is present if and only if BIDX is present.
-
- Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
- This list of H-byte hashes describe a set of B commit-graph files that
- form a commit-graph chain. The graph position for the ith commit in this
- file's OID Lookup chunk is equal to i plus the number of commits in all
- base graphs. If B is non-zero, this chunk must exist.
-
-TRAILER:
-
- H-byte HASH-checksum of all of the above.
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index f05e7bd..2c26e95 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -1,4 +1,4 @@
-Git Commit Graph Design Notes
+Git Commit-Graph Design Notes
=============================
Git walks the commit graph for many reasons, including:
@@ -17,7 +17,7 @@ There are two main costs here:
The commit-graph file is a supplemental data structure that accelerates
commit graph walks. If a user downgrades or disables the 'core.commitGraph'
-config setting, then the existing ODB is sufficient. The file is stored
+config setting, then the existing object database is sufficient. The file is stored
as "commit-graph" either in the .git/objects/info directory or in the info
directory of an alternate.
@@ -40,7 +40,7 @@ Values 1-4 satisfy the requirements of parse_commit_gently().
There are two definitions of generation number:
1. Corrected committer dates (generation number v2)
-2. Topological levels (generation nummber v1)
+2. Topological levels (generation number v1)
Define "corrected committer date" of a commit recursively as follows:
@@ -48,7 +48,7 @@ Define "corrected committer date" of a commit recursively as follows:
equal to its committer date.
* A commit with at least one parent has corrected committer date equal to
- the maximum of its commiter date and one more than the largest corrected
+ the maximum of its committer date and one more than the largest corrected
committer date among its parents.
* As a special case, a root commit with timestamp zero has corrected commit
@@ -95,7 +95,7 @@ with default order), but is not used when the topological order is
required (such as merge base calculations, "git log --graph").
In practice, we expect some commits to be created recently and not stored
-in the commit graph. We can treat these commits as having "infinite"
+in the commit-graph. We can treat these commits as having "infinite"
generation number and walk until reaching commits with known generation
number.
@@ -136,7 +136,7 @@ Design Details
- Commit grafts and replace objects can change the shape of the commit
history. The latter can also be enabled/disabled on the fly using
- `--no-replace-objects`. This leads to difficultly storing both possible
+ `--no-replace-objects`. This leads to difficulty storing both possible
interpretations of a commit id, especially when computing generation
numbers. The commit-graph will not be read or written when
replace-objects or grafts are present.
@@ -149,7 +149,7 @@ Design Details
helpful for these clones, anyway. The commit-graph will not be read or
written when shallow commits are present.
-Commit Graphs Chains
+Commit-Graphs Chains
--------------------
Typically, repos grow with near-constant velocity (commits per day). Over time,
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 260224b..ed57481 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -205,7 +205,7 @@ SHA-1 content.
Object storage
~~~~~~~~~~~~~~
Loose objects use zlib compression and packed objects use the packed
-format described in Documentation/technical/pack-format.txt, just like
+format described in linkgit:gitformat-pack[5], just like
today. The content that is compressed and stored uses SHA-256 content
instead of SHA-1 content.
@@ -562,7 +562,7 @@ hash re-encode during clone and to encourage peers to modernize.
The design described here allows fetches by SHA-1 clients of a
personal SHA-256 repository because it's not much more difficult than
allowing pushes from that repository. This support needs to be guarded
-by a configuration option --- servers like git.kernel.org that serve a
+by a configuration option -- servers like git.kernel.org that serve a
large number of clients would not be expected to bear that cost.
Meaning of signatures
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
deleted file mode 100644
index cc5126c..0000000
--- a/Documentation/technical/http-protocol.txt
+++ /dev/null
@@ -1,522 +0,0 @@
-HTTP transfer protocols
-=======================
-
-Git supports two HTTP based transfer protocols. A "dumb" protocol
-which requires only a standard HTTP server on the server end of the
-connection, and a "smart" protocol which requires a Git aware CGI
-(or server module). This document describes both protocols.
-
-As a design feature smart clients can automatically upgrade "dumb"
-protocol URLs to smart URLs. This permits all users to have the
-same published URL, and the peers automatically select the most
-efficient transport available to them.
-
-
-URL Format
-----------
-
-URLs for Git repositories accessed by HTTP use the standard HTTP
-URL syntax documented by RFC 1738, so they are of the form:
-
- http://<host>:<port>/<path>?<searchpart>
-
-Within this documentation the placeholder `$GIT_URL` will stand for
-the http:// repository URL entered by the end-user.
-
-Servers SHOULD handle all requests to locations matching `$GIT_URL`, as
-both the "smart" and "dumb" HTTP protocols used by Git operate
-by appending additional path components onto the end of the user
-supplied `$GIT_URL` string.
-
-An example of a dumb client requesting a loose object:
-
- $GIT_URL: http://example.com:8080/git/repo.git
- URL request: http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355
-
-An example of a smart request to a catch-all gateway:
-
- $GIT_URL: http://example.com/daemon.cgi?svc=git&q=
- URL request: http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack
-
-An example of a request to a submodule:
-
- $GIT_URL: http://example.com/git/repo.git/path/submodule.git
- URL request: http://example.com/git/repo.git/path/submodule.git/info/refs
-
-Clients MUST strip a trailing `/`, if present, from the user supplied
-`$GIT_URL` string to prevent empty path tokens (`//`) from appearing
-in any URL sent to a server. Compatible clients MUST expand
-`$GIT_URL/info/refs` as `foo/info/refs` and not `foo//info/refs`.
-
-
-Authentication
---------------
-
-Standard HTTP authentication is used if authentication is required
-to access a repository, and MAY be configured and enforced by the
-HTTP server software.
-
-Because Git repositories are accessed by standard path components
-server administrators MAY use directory based permissions within
-their HTTP server to control repository access.
-
-Clients SHOULD support Basic authentication as described by RFC 2617.
-Servers SHOULD support Basic authentication by relying upon the
-HTTP server placed in front of the Git server software.
-
-Servers SHOULD NOT require HTTP cookies for the purposes of
-authentication or access control.
-
-Clients and servers MAY support other common forms of HTTP based
-authentication, such as Digest authentication.
-
-
-SSL
----
-
-Clients and servers SHOULD support SSL, particularly to protect
-passwords when relying on Basic HTTP authentication.
-
-
-Session State
--------------
-
-The Git over HTTP protocol (much like HTTP itself) is stateless
-from the perspective of the HTTP server side. All state MUST be
-retained and managed by the client process. This permits simple
-round-robin load-balancing on the server side, without needing to
-worry about state management.
-
-Clients MUST NOT require state management on the server side in
-order to function correctly.
-
-Servers MUST NOT require HTTP cookies in order to function correctly.
-Clients MAY store and forward HTTP cookies during request processing
-as described by RFC 2616 (HTTP/1.1). Servers SHOULD ignore any
-cookies sent by a client.
-
-
-General Request Processing
---------------------------
-
-Except where noted, all standard HTTP behavior SHOULD be assumed
-by both client and server. This includes (but is not necessarily
-limited to):
-
-If there is no repository at `$GIT_URL`, or the resource pointed to by a
-location matching `$GIT_URL` does not exist, the server MUST NOT respond
-with `200 OK` response. A server SHOULD respond with
-`404 Not Found`, `410 Gone`, or any other suitable HTTP status code
-which does not imply the resource exists as requested.
-
-If there is a repository at `$GIT_URL`, but access is not currently
-permitted, the server MUST respond with the `403 Forbidden` HTTP
-status code.
-
-Servers SHOULD support both HTTP 1.0 and HTTP 1.1.
-Servers SHOULD support chunked encoding for both request and response
-bodies.
-
-Clients SHOULD support both HTTP 1.0 and HTTP 1.1.
-Clients SHOULD support chunked encoding for both request and response
-bodies.
-
-Servers MAY return ETag and/or Last-Modified headers.
-
-Clients MAY revalidate cached entities by including If-Modified-Since
-and/or If-None-Match request headers.
-
-Servers MAY return `304 Not Modified` if the relevant headers appear
-in the request and the entity has not changed. Clients MUST treat
-`304 Not Modified` identically to `200 OK` by reusing the cached entity.
-
-Clients MAY reuse a cached entity without revalidation if the
-Cache-Control and/or Expires header permits caching. Clients and
-servers MUST follow RFC 2616 for cache controls.
-
-
-Discovering References
-----------------------
-
-All HTTP clients MUST begin either a fetch or a push exchange by
-discovering the references available on the remote repository.
-
-Dumb Clients
-~~~~~~~~~~~~
-
-HTTP clients that only support the "dumb" protocol MUST discover
-references by making a request for the special info/refs file of
-the repository.
-
-Dumb HTTP clients MUST make a `GET` request to `$GIT_URL/info/refs`,
-without any search/query parameters.
-
- C: GET $GIT_URL/info/refs HTTP/1.0
-
- S: 200 OK
- S:
- S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
- S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
- S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
- S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
-
-The Content-Type of the returned info/refs entity SHOULD be
-`text/plain; charset=utf-8`, but MAY be any content type.
-Clients MUST NOT attempt to validate the returned Content-Type.
-Dumb servers MUST NOT return a Content-Type starting with
-`application/x-git-`.
-
-Cache-Control headers MAY be returned to disable caching of the
-returned entity.
-
-When examining the response clients SHOULD only examine the HTTP
-status code. Valid responses are `200 OK`, or `304 Not Modified`.
-
-The returned content is a UNIX formatted text file describing
-each ref and its known value. The file SHOULD be sorted by name
-according to the C locale ordering. The file SHOULD NOT include
-the default ref named `HEAD`.
-
- info_refs = *( ref_record )
- ref_record = any_ref / peeled_ref
-
- any_ref = obj-id HTAB refname LF
- peeled_ref = obj-id HTAB refname LF
- obj-id HTAB refname "^{}" LF
-
-Smart Clients
-~~~~~~~~~~~~~
-
-HTTP clients that support the "smart" protocol (or both the
-"smart" and "dumb" protocols) MUST discover references by making
-a parameterized request for the info/refs file of the repository.
-
-The request MUST contain exactly one query parameter,
-`service=$servicename`, where `$servicename` MUST be the service
-name the client wishes to contact to complete the operation.
-The request MUST NOT contain additional query parameters.
-
- C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
-
-dumb server reply:
-
- S: 200 OK
- S:
- S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
- S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
- S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
- S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
-
-smart server reply:
-
- S: 200 OK
- S: Content-Type: application/x-git-upload-pack-advertisement
- S: Cache-Control: no-cache
- S:
- S: 001e# service=git-upload-pack\n
- S: 0000
- S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n
- S: 003fd049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
- S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
- S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
- S: 0000
-
-The client may send Extra Parameters (see
-Documentation/technical/pack-protocol.txt) as a colon-separated string
-in the Git-Protocol HTTP header.
-
-Uses the `--http-backend-info-refs` option to
-linkgit:git-upload-pack[1].
-
-Dumb Server Response
-^^^^^^^^^^^^^^^^^^^^
-Dumb servers MUST respond with the dumb server reply format.
-
-See the prior section under dumb clients for a more detailed
-description of the dumb server response.
-
-Smart Server Response
-^^^^^^^^^^^^^^^^^^^^^
-If the server does not recognize the requested service name, or the
-requested service name has been disabled by the server administrator,
-the server MUST respond with the `403 Forbidden` HTTP status code.
-
-Otherwise, smart servers MUST respond with the smart server reply
-format for the requested service name.
-
-Cache-Control headers SHOULD be used to disable caching of the
-returned entity.
-
-The Content-Type MUST be `application/x-$servicename-advertisement`.
-Clients SHOULD fall back to the dumb protocol if another content
-type is returned. When falling back to the dumb protocol clients
-SHOULD NOT make an additional request to `$GIT_URL/info/refs`, but
-instead SHOULD use the response already in hand. Clients MUST NOT
-continue if they do not support the dumb protocol.
-
-Clients MUST validate the status code is either `200 OK` or
-`304 Not Modified`.
-
-Clients MUST validate the first five bytes of the response entity
-matches the regex `^[0-9a-f]{4}#`. If this test fails, clients
-MUST NOT continue.
-
-Clients MUST parse the entire response as a sequence of pkt-line
-records.
-
-Clients MUST verify the first pkt-line is `# service=$servicename`.
-Servers MUST set $servicename to be the request parameter value.
-Servers SHOULD include an LF at the end of this line.
-Clients MUST ignore an LF at the end of the line.
-
-Servers MUST terminate the response with the magic `0000` end
-pkt-line marker.
-
-The returned response is a pkt-line stream describing each ref and
-its known value. The stream SHOULD be sorted by name according to
-the C locale ordering. The stream SHOULD include the default ref
-named `HEAD` as the first ref. The stream MUST include capability
-declarations behind a NUL on the first ref.
-
-The returned response contains "version 1" if "version=1" was sent as an
-Extra Parameter.
-
- smart_reply = PKT-LINE("# service=$servicename" LF)
- "0000"
- *1("version 1")
- ref_list
- "0000"
- ref_list = empty_list / non_empty_list
-
- empty_list = PKT-LINE(zero-id SP "capabilities^{}" NUL cap-list LF)
-
- non_empty_list = PKT-LINE(obj-id SP name NUL cap_list LF)
- *ref_record
-
- cap-list = capability *(SP capability)
- capability = 1*(LC_ALPHA / DIGIT / "-" / "_")
- LC_ALPHA = %x61-7A
-
- ref_record = any_ref / peeled_ref
- any_ref = PKT-LINE(obj-id SP name LF)
- peeled_ref = PKT-LINE(obj-id SP name LF)
- PKT-LINE(obj-id SP name "^{}" LF)
-
-
-Smart Service git-upload-pack
-------------------------------
-This service reads from the repository pointed to by `$GIT_URL`.
-
-Clients MUST first perform ref discovery with
-`$GIT_URL/info/refs?service=git-upload-pack`.
-
- C: POST $GIT_URL/git-upload-pack HTTP/1.0
- C: Content-Type: application/x-git-upload-pack-request
- C:
- C: 0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7\n
- C: 0032have 441b40d833fdfa93eb2908e52742248faf0ee993\n
- C: 0000
-
- S: 200 OK
- S: Content-Type: application/x-git-upload-pack-result
- S: Cache-Control: no-cache
- S:
- S: ....ACK %s, continue
- S: ....NAK
-
-Clients MUST NOT reuse or revalidate a cached response.
-Servers MUST include sufficient Cache-Control headers
-to prevent caching of the response.
-
-Servers SHOULD support all capabilities defined here.
-
-Clients MUST send at least one "want" command in the request body.
-Clients MUST NOT reference an id in a "want" command which did not
-appear in the response obtained through ref discovery unless the
-server advertises capability `allow-tip-sha1-in-want` or
-`allow-reachable-sha1-in-want`.
-
- compute_request = want_list
- have_list
- request_end
- request_end = "0000" / "done"
-
- want_list = PKT-LINE(want SP cap_list LF)
- *(want_pkt)
- want_pkt = PKT-LINE(want LF)
- want = "want" SP id
- cap_list = capability *(SP capability)
-
- have_list = *PKT-LINE("have" SP id LF)
-
-TODO: Document this further.
-
-The Negotiation Algorithm
-~~~~~~~~~~~~~~~~~~~~~~~~~
-The computation to select the minimal pack proceeds as follows
-(C = client, S = server):
-
-'init step:'
-
-C: Use ref discovery to obtain the advertised refs.
-
-C: Place any object seen into set `advertised`.
-
-C: Build an empty set, `common`, to hold the objects that are later
- determined to be on both ends.
-
-C: Build a set, `want`, of the objects from `advertised` the client
- wants to fetch, based on what it saw during ref discovery.
-
-C: Start a queue, `c_pending`, ordered by commit time (popping newest
- first). Add all client refs. When a commit is popped from
- the queue its parents SHOULD be automatically inserted back.
- Commits MUST only enter the queue once.
-
-'one compute step:'
-
-C: Send one `$GIT_URL/git-upload-pack` request:
-
- C: 0032want <want #1>...............................
- C: 0032want <want #2>...............................
- ....
- C: 0032have <common #1>.............................
- C: 0032have <common #2>.............................
- ....
- C: 0032have <have #1>...............................
- C: 0032have <have #2>...............................
- ....
- C: 0000
-
-The stream is organized into "commands", with each command
-appearing by itself in a pkt-line. Within a command line,
-the text leading up to the first space is the command name,
-and the remainder of the line to the first LF is the value.
-Command lines are terminated with an LF as the last byte of
-the pkt-line value.
-
-Commands MUST appear in the following order, if they appear
-at all in the request stream:
-
-* "want"
-* "have"
-
-The stream is terminated by a pkt-line flush (`0000`).
-
-A single "want" or "have" command MUST have one hex formatted
-object name as its value. Multiple object names MUST be sent by sending
-multiple commands. Object names MUST be given using the object format
-negotiated through the `object-format` capability (default SHA-1).
-
-The `have` list is created by popping the first 32 commits
-from `c_pending`. Fewer can be supplied if `c_pending` empties.
-
-If the client has sent 256 "have" commits and has not yet
-received one of those back from `s_common`, or the client has
-emptied `c_pending` it SHOULD include a "done" command to let
-the server know it won't proceed:
-
- C: 0009done
-
-S: Parse the git-upload-pack request:
-
-Verify all objects in `want` are directly reachable from refs.
-
-The server MAY walk backwards through history or through
-the reflog to permit slightly stale requests.
-
-If no "want" objects are received, send an error:
-TODO: Define error if no "want" lines are requested.
-
-If any "want" object is not reachable, send an error:
-TODO: Define error if an invalid "want" is requested.
-
-Create an empty list, `s_common`.
-
-If "have" was sent:
-
-Loop through the objects in the order supplied by the client.
-
-For each object, if the server has the object reachable from
-a ref, add it to `s_common`. If a commit is added to `s_common`,
-do not add any ancestors, even if they also appear in `have`.
-
-S: Send the git-upload-pack response:
-
-If the server has found a closed set of objects to pack or the
-request ends with "done", it replies with the pack.
-TODO: Document the pack based response
-
- S: PACK...
-
-The returned stream is the side-band-64k protocol supported
-by the git-upload-pack service, and the pack is embedded into
-stream 1. Progress messages from the server side MAY appear
-in stream 2.
-
-Here a "closed set of objects" is defined to have at least
-one path from every "want" to at least one "common" object.
-
-If the server needs more information, it replies with a
-status continue response:
-TODO: Document the non-pack response
-
-C: Parse the upload-pack response:
- TODO: Document parsing response
-
-'Do another compute step.'
-
-
-Smart Service git-receive-pack
-------------------------------
-This service reads from the repository pointed to by `$GIT_URL`.
-
-Clients MUST first perform ref discovery with
-`$GIT_URL/info/refs?service=git-receive-pack`.
-
- C: POST $GIT_URL/git-receive-pack HTTP/1.0
- C: Content-Type: application/x-git-receive-pack-request
- C:
- C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
- C: 0000
- C: PACK....
-
- S: 200 OK
- S: Content-Type: application/x-git-receive-pack-result
- S: Cache-Control: no-cache
- S:
- S: ....
-
-Clients MUST NOT reuse or revalidate a cached response.
-Servers MUST include sufficient Cache-Control headers
-to prevent caching of the response.
-
-Servers SHOULD support all capabilities defined here.
-
-Clients MUST send at least one command in the request body.
-Within the command portion of the request body clients SHOULD send
-the id obtained through ref discovery as old_id.
-
- update_request = command_list
- "PACK" <binary data>
-
- command_list = PKT-LINE(command NUL cap_list LF)
- *(command_pkt)
- command_pkt = PKT-LINE(command LF)
- cap_list = *(SP capability) SP
-
- command = create / delete / update
- create = zero-id SP new_id SP name
- delete = old_id SP zero-id SP name
- update = old_id SP new_id SP name
-
-TODO: Document this further.
-
-
-References
-----------
-
-http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
-http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
-link:technical/pack-protocol.html
-link:technical/protocol-capabilities.html
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
deleted file mode 100644
index 65da0da..0000000
--- a/Documentation/technical/index-format.txt
+++ /dev/null
@@ -1,406 +0,0 @@
-Git index format
-================
-
-== The Git index file has the following format
-
- All binary numbers are in network byte order.
- In a repository using the traditional SHA-1, checksums and object IDs
- (object names) mentioned below are all computed using SHA-1. Similarly,
- in SHA-256 repositories, these values are computed using SHA-256.
- Version 2 is described here unless stated otherwise.
-
- - A 12-byte header consisting of
-
- 4-byte signature:
- The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
-
- 4-byte version number:
- The current supported versions are 2, 3 and 4.
-
- 32-bit number of index entries.
-
- - A number of sorted index entries (see below).
-
- - Extensions
-
- Extensions are identified by signature. Optional extensions can
- be ignored if Git does not understand them.
-
- Git currently supports cache tree and resolve undo extensions.
-
- 4-byte extension signature. If the first byte is 'A'..'Z' the
- extension is optional and can be ignored.
-
- 32-bit size of the extension
-
- Extension data
-
- - Hash checksum over the content of the index file before this checksum.
-
-== Index entry
-
- Index entries are sorted in ascending order on the name field,
- interpreted as a string of unsigned bytes (i.e. memcmp() order, no
- localization, no special casing of directory separator '/'). Entries
- with the same name are sorted by their stage field.
-
- An index entry typically represents a file. However, if sparse-checkout
- is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
- `extensions.sparseIndex` extension is enabled, then the index may
- contain entries for directories outside of the sparse-checkout definition.
- These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and
- the path ends in a directory separator.
-
- 32-bit ctime seconds, the last time a file's metadata changed
- this is stat(2) data
-
- 32-bit ctime nanosecond fractions
- this is stat(2) data
-
- 32-bit mtime seconds, the last time a file's data changed
- this is stat(2) data
-
- 32-bit mtime nanosecond fractions
- this is stat(2) data
-
- 32-bit dev
- this is stat(2) data
-
- 32-bit ino
- this is stat(2) data
-
- 32-bit mode, split into (high to low bits)
-
- 4-bit object type
- valid values in binary are 1000 (regular file), 1010 (symbolic link)
- and 1110 (gitlink)
-
- 3-bit unused
-
- 9-bit unix permission. Only 0755 and 0644 are valid for regular files.
- Symbolic links and gitlinks have value 0 in this field.
-
- 32-bit uid
- this is stat(2) data
-
- 32-bit gid
- this is stat(2) data
-
- 32-bit file size
- This is the on-disk size from stat(2), truncated to 32-bit.
-
- Object name for the represented object
-
- A 16-bit 'flags' field split into (high to low bits)
-
- 1-bit assume-valid flag
-
- 1-bit extended flag (must be zero in version 2)
-
- 2-bit stage (during merge)
-
- 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
- is stored in this field.
-
- (Version 3 or later) A 16-bit field, only applicable if the
- "extended flag" above is 1, split into (high to low bits).
-
- 1-bit reserved for future
-
- 1-bit skip-worktree flag (used by sparse checkout)
-
- 1-bit intent-to-add flag (used by "git add -N")
-
- 13-bit unused, must be zero
-
- Entry path name (variable length) relative to top level directory
- (without leading slash). '/' is used as path separator. The special
- path components ".", ".." and ".git" (without quotes) are disallowed.
- Trailing slash is also disallowed.
-
- The exact encoding is undefined, but the '.' and '/' characters
- are encoded in 7-bit ASCII and the encoding cannot contain a NUL
- byte (iow, this is a UNIX pathname).
-
- (Version 4) In version 4, the entry path name is prefix-compressed
- relative to the path name for the previous entry (the very first
- entry is encoded as if the path name for the previous entry is an
- empty string). At the beginning of an entry, an integer N in the
- variable width encoding (the same encoding as the offset is encoded
- for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
- by a NUL-terminated string S. Removing N bytes from the end of the
- path name for the previous entry, and replacing it with the string S
- yields the path name for this entry.
-
- 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
- while keeping the name NUL-terminated.
-
- (Version 4) In version 4, the padding after the pathname does not
- exist.
-
- Interpretation of index entries in split index mode is completely
- different. See below for details.
-
-== Extensions
-
-=== Cache tree
-
- Since the index does not record entries for directories, the cache
- entries cannot describe tree objects that already exist in the object
- database for regions of the index that are unchanged from an existing
- commit. The cache tree extension stores a recursive tree structure that
- describes the trees that already exist and completely match sections of
- the cache entries. This speeds up tree object generation from the index
- for a new commit by only computing the trees that are "new" to that
- commit. It also assists when comparing the index to another tree, such
- as `HEAD^{tree}`, since sections of the index can be skipped when a tree
- comparison demonstrates equality.
-
- The recursive tree structure uses nodes that store a number of cache
- entries, a list of subnodes, and an object ID (OID). The OID references
- the existing tree for that node, if it is known to exist. The subnodes
- correspond to subdirectories that themselves have cache tree nodes. The
- number of cache entries corresponds to the number of cache entries in
- the index that describe paths within that tree's directory.
-
- The extension tracks the full directory structure in the cache tree
- extension, but this is generally smaller than the full cache entry list.
-
- When a path is updated in index, Git invalidates all nodes of the
- recursive cache tree corresponding to the parent directories of that
- path. We store these tree nodes as being "invalid" by using "-1" as the
- number of cache entries. Invalid nodes still store a span of index
- entries, allowing Git to focus its efforts when reconstructing a full
- cache tree.
-
- The signature for this extension is { 'T', 'R', 'E', 'E' }.
-
- A series of entries fill the entire extension; each of which
- consists of:
-
- - NUL-terminated path component (relative to its parent directory);
-
- - ASCII decimal number of entries in the index that is covered by the
- tree this entry represents (entry_count);
-
- - A space (ASCII 32);
-
- - ASCII decimal number that represents the number of subtrees this
- tree has;
-
- - A newline (ASCII 10); and
-
- - Object name for the object that would result from writing this span
- of index as a tree.
-
- An entry can be in an invalidated state and is represented by having
- a negative number in the entry_count field. In this case, there is no
- object name and the next entry starts immediately after the newline.
- When writing an invalid entry, -1 should always be used as entry_count.
-
- The entries are written out in the top-down, depth-first order. The
- first entry represents the root level of the repository, followed by the
- first subtree--let's call this A--of the root level (with its name
- relative to the root level), followed by the first subtree of A (with
- its name relative to A), and so on. The specified number of subtrees
- indicates when the current level of the recursive stack is complete.
-
-=== Resolve undo
-
- A conflict is represented in the index as a set of higher stage entries.
- When a conflict is resolved (e.g. with "git add path"), these higher
- stage entries will be removed and a stage-0 entry with proper resolution
- is added.
-
- When these higher stage entries are removed, they are saved in the
- resolve undo extension, so that conflicts can be recreated (e.g. with
- "git checkout -m"), in case users want to redo a conflict resolution
- from scratch.
-
- The signature for this extension is { 'R', 'E', 'U', 'C' }.
-
- A series of entries fill the entire extension; each of which
- consists of:
-
- - NUL-terminated pathname the entry describes (relative to the root of
- the repository, i.e. full pathname);
-
- - Three NUL-terminated ASCII octal numbers, entry mode of entries in
- stage 1 to 3 (a missing stage is represented by "0" in this field);
- and
-
- - At most three object names of the entry in stages from 1 to 3
- (nothing is written for a missing stage).
-
-=== Split index
-
- In split index mode, the majority of index entries could be stored
- in a separate file. This extension records the changes to be made on
- top of that to produce the final index.
-
- The signature for this extension is { 'l', 'i', 'n', 'k' }.
-
- The extension consists of:
-
- - Hash of the shared index file. The shared index file path
- is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
- index does not require a shared index file.
-
- - An ewah-encoded delete bitmap, each bit represents an entry in the
- shared index. If a bit is set, its corresponding entry in the
- shared index will be removed from the final index. Note: because
- a delete operation changes index entry positions but we still need
- the original positions in the replace phase, it is best to just mark
- entries for removal, then do a mass deletion after replacement.
-
- - An ewah-encoded replace bitmap, each bit represents an entry in
- the shared index. If a bit is set, its corresponding entry in the
- shared index will be replaced with an entry in this index
- file. All replaced entries are stored in sorted order in this
- index. The first "1" bit in the replace bitmap corresponds to the
- first index entry, the second "1" bit to the second entry and so
- on. Replaced entries may have empty path names to save space.
-
- The remaining index entries after replaced ones will be added to the
- final index. These added entries are also sorted by entry name then
- stage.
-
-== Untracked cache
-
- The untracked cache saves the untracked file list and necessary data to
- verify the cache. The signature for this extension is { 'U', 'N',
- 'T', 'R' }.
-
- The extension starts with
-
- - A sequence of NUL-terminated strings, preceded by the size of the
- sequence in variable width encoding. Each string describes the
- environment where the cache can be used.
-
- - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
- ctime field until "file size".
-
- - Stat data of core.excludesFile
-
- - 32-bit dir_flags (see struct dir_struct)
-
- - Hash of $GIT_DIR/info/exclude. A null hash means the file
- does not exist.
-
- - Hash of core.excludesFile. A null hash means the file does
- not exist.
-
- - NUL-terminated string of per-dir exclude file name. This usually
- is ".gitignore".
-
- - The number of following directory blocks, variable width
- encoding. If this number is zero, the extension ends here with a
- following NUL.
-
- - A number of directory blocks in depth-first-search order, each
- consists of
-
- - The number of untracked entries, variable width encoding.
-
- - The number of sub-directory blocks, variable width encoding.
-
- - The directory name terminated by NUL.
-
- - A number of untracked file/dir names terminated by NUL.
-
-The remaining data of each directory block is grouped by type:
-
- - An ewah bitmap, the n-th bit marks whether the n-th directory has
- valid untracked cache entries.
-
- - An ewah bitmap, the n-th bit records "check-only" bit of
- read_directory_recursive() for the n-th directory.
-
- - An ewah bitmap, the n-th bit indicates whether hash and stat data
- is valid for the n-th directory and exists in the next data.
-
- - An array of stat data. The n-th data corresponds with the n-th
- "one" bit in the previous ewah bitmap.
-
- - An array of hashes. The n-th hash corresponds with the n-th "one" bit
- in the previous ewah bitmap.
-
- - One NUL.
-
-== File System Monitor cache
-
- The file system monitor cache tracks files for which the core.fsmonitor
- hook has told us about changes. The signature for this extension is
- { 'F', 'S', 'M', 'N' }.
-
- The extension starts with
-
- - 32-bit version number: the current supported versions are 1 and 2.
-
- - (Version 1)
- 64-bit time: the extension data reflects all changes through the given
- time which is stored as the nanoseconds elapsed since midnight,
- January 1, 1970.
-
- - (Version 2)
- A NUL-terminated string: an opaque token defined by the file system
- monitor application. The extension data reflects all changes relative
- to that token.
-
- - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
-
- - An ewah bitmap, the n-th bit indicates whether the n-th index entry
- is not CE_FSMONITOR_VALID.
-
-== End of Index Entry
-
- The End of Index Entry (EOIE) is used to locate the end of the variable
- length index entries and the beginning of the extensions. Code can take
- advantage of this to quickly locate the index extensions without having
- to parse through all of the index entries.
-
- Because it must be able to be loaded before the variable length cache
- entries and other index extensions, this extension must be written last.
- The signature for this extension is { 'E', 'O', 'I', 'E' }.
-
- The extension consists of:
-
- - 32-bit offset to the end of the index entries
-
- - Hash over the extension types and their sizes (but not
- their contents). E.g. if we have "TREE" extension that is N-bytes
- long, "REUC" extension that is M-bytes long, followed by "EOIE",
- then the hash would be:
-
- Hash("TREE" + <binary representation of N> +
- "REUC" + <binary representation of M>)
-
-== Index Entry Offset Table
-
- The Index Entry Offset Table (IEOT) is used to help address the CPU
- cost of loading the index by enabling multi-threading the process of
- converting cache entries from the on-disk format to the in-memory format.
- The signature for this extension is { 'I', 'E', 'O', 'T' }.
-
- The extension consists of:
-
- - 32-bit version (currently 1)
-
- - A number of index offset entries each consisting of:
-
- - 32-bit offset from the beginning of the file to the first cache entry
- in this block of entries.
-
- - 32-bit count of cache entries in this block
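-
-  For illustration, a rough Python parser for the extension payload
-  (the bytes after the 'IEOT' signature and size header; names are
-  ours):
-
-----
-import struct
-
-def parse_ieot(payload):
-    version, = struct.unpack_from('>I', payload, 0)
-    if version != 1:
-        raise ValueError('unsupported IEOT version %d' % version)
-    blocks = []
-    for at in range(4, len(payload), 8):
-        # Each block: offset of its first cache entry, and entry count.
-        blocks.append(struct.unpack_from('>II', payload, at))
-    return blocks    # e.g. one block per worker thread
-----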
-
-== Sparse Directory Entries
-
- When using sparse-checkout in cone mode, some entire directories within
- the index can be summarized by pointing to a tree object instead of the
- entire expanded list of paths within that tree. An index containing such
- entries is a "sparse index". Index format versions 4 and less were not
- implemented with such entries in mind. Thus, for these versions, an
- index containing sparse directory entries will include this extension
- with signature { 's', 'd', 'i', 'r' }. Like the split-index extension,
- tools should avoid interacting with a sparse index unless they understand
- this extension.
diff --git a/Documentation/technical/long-running-process-protocol.txt b/Documentation/technical/long-running-process-protocol.txt
index aa0aa9a..6f33654 100644
--- a/Documentation/technical/long-running-process-protocol.txt
+++ b/Documentation/technical/long-running-process-protocol.txt
@@ -3,7 +3,7 @@ Long-running process protocol
This protocol is used when Git needs to communicate with an external
process throughout the entire life of a single Git command. All
-communication is in pkt-line format (see technical/protocol-common.txt)
+communication is in pkt-line format (see linkgit:gitprotocol-common[5])
over standard input and standard output.
Handshake
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index 1a73c3e..f2221d2 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -17,13 +17,14 @@ is not feasible due to storage space or excessive repack times.
The multi-pack-index (MIDX for short) stores a list of objects
and their offsets into multiple packfiles. It contains:
-- A list of packfile names.
-- A sorted list of object IDs.
-- A list of metadata for the ith object ID including:
- - A value j referring to the jth packfile.
- - An offset within the jth packfile for the object.
-- If large offsets are required, we use another list of large
+* A list of packfile names.
+* A sorted list of object IDs.
+* A list of metadata for the ith object ID including:
+** A value j referring to the jth packfile.
+** An offset within the jth packfile for the object.
+* If large offsets are required, we use another list of large
offsets similar to version 2 pack-indexes.
+* An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
Thus, we can provide O(log N) lookup time for any number
of packfiles.
@@ -36,7 +37,9 @@ Design Details
directory of an alternate. It refers only to packfiles in that
same directory.
-- The core.multiPackIndex config setting must be on to consume MIDX files.
+- The core.multiPackIndex config setting must be on (which is the
+ default) to consume MIDX files. Setting it to `false` prevents
+ Git from reading a MIDX file, even if one exists.
- The file format includes parameters for the object ID hash
function, so a future change of hash algorithm does not require
@@ -85,11 +88,6 @@ Future Work
helpful to organize packfiles by object type (commit, tree, blob,
etc.) and use this metadata to help that maintenance.
-- The partial clone feature records special "promisor" packs that
- may point to objects that are not stored locally, but available
- on request to a server. The multi-pack-index does not currently
- track these promisor packs.
-
Related Links
-------------
[0] https://bugs.chromium.org/p/git/issues/detail?id=6
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
deleted file mode 100644
index 8d2f42f..0000000
--- a/Documentation/technical/pack-format.txt
+++ /dev/null
@@ -1,464 +0,0 @@
-Git pack format
-===============
-
-== Checksums and object IDs
-
-In a repository using the traditional SHA-1, pack checksums, index checksums,
-and object IDs (object names) mentioned below are all computed using SHA-1.
-Similarly, in SHA-256 repositories, these values are computed using SHA-256.
-
-== pack-*.pack files have the following format:
-
- - A header appears at the beginning and consists of the following:
-
- 4-byte signature:
- The signature is: {'P', 'A', 'C', 'K'}
-
- 4-byte version number (network byte order):
- Git currently accepts version number 2 or 3 but
- generates version 2 only.
-
- 4-byte number of objects contained in the pack (network byte order)
-
- Observation: we cannot have more than 4G versions ;-) and
- more than 4G objects in a pack.
-
- - The header is followed by number of object entries, each of
- which looks like this:
-
- (undeltified representation)
- n-byte type and length (3-bit type, (n-1)*7+4-bit length)
- compressed data
-
- (deltified representation)
- n-byte type and length (3-bit type, (n-1)*7+4-bit length)
- base object name if OBJ_REF_DELTA or a negative relative
- offset from the delta object's position in the pack if this
- is an OBJ_OFS_DELTA object
- compressed delta data
-
- Observation: length of each object is encoded in a variable
- length format and is not constrained to 32-bit or anything.
-
- - The trailer records a pack checksum of all of the above.
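-
-For illustration, a minimal Python sketch that checks the header and
-trailer of an in-memory pack (assuming a SHA-1 repository, so the
-trailing checksum is 20 bytes; names are ours):
-
-----
-import hashlib
-import struct
-
-def check_pack(pack):
-    signature, version, nr_objects = struct.unpack_from('>4sII', pack, 0)
-    if signature != b'PACK':
-        raise ValueError('not a packfile')
-    if version not in (2, 3):
-        raise ValueError('unsupported pack version %d' % version)
-    # The trailer is a checksum of everything that precedes it.
-    if hashlib.sha1(pack[:-20]).digest() != pack[-20:]:
-        raise ValueError('pack checksum mismatch')
-    return version, nr_objects
-----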
-
-=== Object types
-
-Valid object types are:
-
-- OBJ_COMMIT (1)
-- OBJ_TREE (2)
-- OBJ_BLOB (3)
-- OBJ_TAG (4)
-- OBJ_OFS_DELTA (6)
-- OBJ_REF_DELTA (7)
-
-Type 5 is reserved for future expansion. Type 0 is invalid.
-
-=== Size encoding
-
-This document uses the following "size encoding" of non-negative
-integers: From each byte, the seven least significant bits are
-used to form the resulting integer. As long as the most significant
-bit is 1, this process continues; the byte with MSB 0 provides the
-last seven bits. The seven-bit chunks are concatenated. Later
-values are more significant.
-
-This size encoding should not be confused with the "offset encoding",
-which is also used in this document.
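-
-A small decoder for this size encoding, as a Python sketch (the
-function name is ours):
-
-----
-def decode_size(buf, pos):
-    value = 0
-    shift = 0
-    while True:
-        byte = buf[pos]
-        pos += 1
-        value |= (byte & 0x7f) << shift   # later groups are more significant
-        shift += 7
-        if not byte & 0x80:               # MSB 0 marks the final byte
-            return value, pos
-----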
-
-=== Deltified representation
-
-Conceptually there are only four object types: commit, tree, tag and
-blob. However to save space, an object could be stored as a "delta" of
-another "base" object. These representations are assigned new types
-ofs-delta and ref-delta, which are only valid in a pack file.
-
-Both ofs-delta and ref-delta store the "delta" to be applied to
-another object (called 'base object') to reconstruct the object. The
-difference between them is that ref-delta directly encodes the base
-object name, whereas ofs-delta encodes the offset of the base object
-within the same pack.
-
-The base object could also be deltified if it's in the same pack.
-Ref-delta can also refer to an object outside the pack (i.e. the
-so-called "thin pack"). When stored on disk however, the pack should
-be self-contained to avoid cyclic dependencies.
-
-The delta data starts with the size of the base object and the
-size of the object to be reconstructed. These sizes are
-encoded using the size encoding from above. The remainder of
-the delta data is a sequence of instructions to reconstruct the object
-from the base object. If the base object is deltified, it must be
-converted to canonical form first. Each instruction appends more and
-more data to the target object until it's complete. There are two
-supported instructions so far: one for copying a byte range from the
-source object and one for inserting new data embedded in the
-instruction itself.
-
-Each instruction has variable length. Instruction type is determined
-by the seventh bit of the first octet. The following diagrams follow
-the convention in RFC 1951 (Deflate compressed data format).
-
-==== Instruction to copy from base object
-
- +----------+---------+---------+---------+---------+-------+-------+-------+
- | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
- +----------+---------+---------+---------+---------+-------+-------+-------+
-
-This is the instruction format to copy a byte range from the source
-object. It encodes the offset to copy from and the number of bytes to
-copy. Offset and size are in little-endian order.
-
-All offset and size bytes are optional. This is to reduce the
-instruction size when encoding small offsets or sizes. The first seven
-bits in the first octet determine which of the next seven octets are
-present. If bit zero is set, offset1 is present; if bit one is set,
-offset2 is present; and so on.
-
-Note that a more compact instruction does not change offset and size
-encoding. For example, if only offset2 is omitted like below, offset3
-still contains bits 16-23. It does not become offset2 and contains
-bits 8-15 even if it's right next to offset1.
-
- +----------+---------+---------+
- | 10000101 | offset1 | offset3 |
- +----------+---------+---------+
-
-In its most compact form, this instruction only takes up one byte
-(0x80) with both offset and size omitted, which will have default
-values zero. There is another exception: size zero is automatically
-converted to 0x10000.
-
-==== Instruction to add new data
-
- +----------+============+
- | 0xxxxxxx | data |
- +----------+============+
-
-This is the instruction to construct the target object without the base
-object. The following data is appended to the target object. The first
-seven bits of the first octet determine the size of the data in
-bytes. The size must be non-zero.
-
-==== Reserved instruction
-
- +----------+============
- | 00000000 |
- +----------+============
-
-This is the instruction reserved for future expansion.
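-
-Putting the pieces together, a rough Python sketch of delta
-application (using decode_size from the sketch above; illustrative
-only, not Git's implementation):
-
-----
-def apply_delta(base, delta):
-    base_size, pos = decode_size(delta, 0)
-    result_size, pos = decode_size(delta, pos)
-    assert base_size == len(base)
-    out = bytearray()
-    while pos < len(delta):
-        op = delta[pos]; pos += 1
-        if op & 0x80:                     # copy from the base object
-            offset = size = 0
-            for i in range(4):            # bits 0-3 select offset bytes
-                if op & (1 << i):
-                    offset |= delta[pos] << (8 * i); pos += 1
-            for i in range(3):            # bits 4-6 select size bytes
-                if op & (1 << (4 + i)):
-                    size |= delta[pos] << (8 * i); pos += 1
-            if size == 0:                 # size zero means 0x10000
-                size = 0x10000
-            out += base[offset:offset + size]
-        elif op:                          # insert op literal bytes
-            out += delta[pos:pos + op]; pos += op
-        else:
-            raise ValueError('reserved instruction')
-    assert len(out) == result_size
-    return bytes(out)
-----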
-
-== Original (version 1) pack-*.idx files have the following format:
-
- - The header consists of 256 4-byte network byte order
- integers. N-th entry of this table records the number of
- objects in the corresponding pack, the first byte of whose
- object name is less than or equal to N. This is called the
- 'first-level fan-out' table.
-
- - The header is followed by sorted 24-byte entries, one entry
- per object in the pack. Each entry is:
-
- 4-byte network byte order integer, recording where the
- object is stored in the packfile as the offset from the
- beginning.
-
- one object name of the appropriate size.
-
- - The file is concluded with a trailer:
-
- A copy of the pack checksum at the end of the corresponding
- packfile.
-
- Index checksum of all of the above.
-
-Pack Idx file:
-
- -- +--------------------------------+
-fanout | fanout[0] = 2 (for example) |-.
-table +--------------------------------+ |
- | fanout[1] | |
- +--------------------------------+ |
- | fanout[2] | |
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
- | fanout[255] = total objects |---.
- -- +--------------------------------+ | |
-main | offset | | |
-index | object name 00XXXXXXXXXXXXXXXX | | |
-table +--------------------------------+ | |
- | offset | | |
- | object name 00XXXXXXXXXXXXXXXX | | |
- +--------------------------------+<+ |
- .-| offset | |
- | | object name 01XXXXXXXXXXXXXXXX | |
- | +--------------------------------+ |
- | | offset | |
- | | object name 01XXXXXXXXXXXXXXXX | |
- | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
- | | offset | |
- | | object name FFXXXXXXXXXXXXXXXX | |
- --| +--------------------------------+<--+
-trailer | | packfile checksum |
- | +--------------------------------+
- | | idxfile checksum |
- | +--------------------------------+
- .-------.
- |
-Pack file entry: <+
-
- packed object header:
- 1-byte size extension bit (MSB)
- type (next 3 bit)
- size0 (lower 4-bit)
- n-byte sizeN (as long as MSB is set, each 7-bit)
- size0..sizeN form 4+7+7+..+7 bit integer, size0
- is the least significant part, and sizeN is the
- most significant part.
- packed object data:
- If it is not DELTA, then deflated bytes (the size above
- is the size before compression).
- If it is REF_DELTA, then
- base object name (the size above is the
- size of the delta data that follows).
- delta data, deflated.
- If it is OFS_DELTA, then
- n-byte offset (see below) interpreted as a negative
- offset from the type-byte of the header of the
- ofs-delta entry (the size above is the size of
- the delta data that follows).
- delta data, deflated.
-
- offset encoding:
- n bytes with MSB set in all but the last one.
- The offset is then the number constructed by
- concatenating the lower 7 bit of each byte, and
- for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
- to the result.
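-
-A sketch of this offset decoding in Python (the function name is
-ours):
-
-----
-def decode_offset(buf, pos):
-    byte = buf[pos]; pos += 1
-    offset = byte & 0x7f
-    while byte & 0x80:                    # MSB set: more bytes follow
-        byte = buf[pos]; pos += 1
-        # Shifting in 7 more bits and adding 1 first accounts for the
-        # 2^7 + 2^14 + ... correction described above.
-        offset = ((offset + 1) << 7) | (byte & 0x7f)
-    return offset, pos
-----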
-
-
-
-== Version 2 pack-*.idx files support packs larger than 4 GiB, and
- have some other reorganizations. They have the format:
-
- - A 4-byte magic number '\377tOc' which is an unreasonable
- fanout[0] value.
-
- - A 4-byte version number (= 2)
-
- - A 256-entry fan-out table just like v1.
-
- - A table of sorted object names. These are packed together
- without offset values to reduce the cache footprint of the
- binary search for a specific object name.
-
- - A table of 4-byte CRC32 values of the packed object data.
- This is new in v2 so compressed data can be copied directly
- from pack to pack during repacking without undetected
- data corruption.
-
- - A table of 4-byte offset values (in network byte order).
- These are usually 31-bit pack file offsets, but large
- offsets are encoded as an index into the next table with
- the msbit set.
-
- - A table of 8-byte offset entries (empty for pack files less
- than 2 GiB). Pack files are organized with heavily used
- objects toward the front, so most object references should
- not need to refer to this table.
-
- - The same trailer as a v1 pack file:
-
- A copy of the pack checksum at the end of
- corresponding packfile.
-
- Index checksum of all of the above.
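-
-For illustration, a Python sketch of an object lookup in a v2 index
-using the fan-out table (assuming 20-byte SHA-1 names; names are
-ours):
-
-----
-import struct
-
-def lookup_v2(idx, oid, hash_len=20):
-    assert idx[0:4] == b'\xfftOc'              # v2 magic number
-    fanout = struct.unpack_from('>256I', idx, 8)
-    # Objects whose names start with oid[0] occupy this range.
-    lo = fanout[oid[0] - 1] if oid[0] else 0
-    hi = fanout[oid[0]]
-    names_at = 8 + 256 * 4                     # start of the name table
-    while lo < hi:                             # binary search the range
-        mid = (lo + hi) // 2
-        name = idx[names_at + mid * hash_len:
-                   names_at + (mid + 1) * hash_len]
-        if name == oid:
-            return mid                         # position in sorted table
-        if name < oid:
-            lo = mid + 1
-        else:
-            hi = mid
-    return None
-----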
-
-== pack-*.rev files have the format:
-
- - A 4-byte magic number '0x52494458' ('RIDX').
-
- - A 4-byte version identifier (= 1).
-
- - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
-
- - A table of index positions (one per packed object, num_objects in
- total, each a 4-byte unsigned integer in network order), sorted by
- their corresponding offsets in the packfile.
-
- - A trailer, containing a:
-
- checksum of the corresponding packfile, and
-
- a checksum of all of the above.
-
-All 4-byte numbers are in network order.
-
-== multi-pack-index (MIDX) files have the following format:
-
-The multi-pack-index files refer to multiple pack-files and loose objects.
-
-In order to allow extensions that add extra data to the MIDX, we organize
-the body into "chunks" and provide a lookup table at the beginning of the
-body. The header includes certain length values, such as the number of packs,
-the number of base MIDX files, hash lengths and types.
-
-All 4-byte numbers are in network order.
-
-HEADER:
-
- 4-byte signature:
- The signature is: {'M', 'I', 'D', 'X'}
-
- 1-byte version number:
- Git only writes or recognizes version 1.
-
- 1-byte Object Id Version
- We infer the length of object IDs (OIDs) from this value:
- 1 => SHA-1
- 2 => SHA-256
- If the hash type does not match the repository's hash algorithm,
- the multi-pack-index file should be ignored with a warning
- presented to the user.
-
- 1-byte number of "chunks"
-
- 1-byte number of base multi-pack-index files:
- This value is currently always zero.
-
- 4-byte number of pack files
-
-CHUNK LOOKUP:
-
- (C + 1) * 12 bytes providing the chunk offsets:
- First 4 bytes describe chunk id. Value 0 is a terminating label.
- Other 8 bytes provide offset in current file for chunk to start.
- (Chunks are provided in file-order, so you can infer the length
- using the next chunk position if necessary.)
-
- The CHUNK LOOKUP matches the table of contents from
- link:technical/chunk-format.html[the chunk-based file format].
-
- The remaining data in the body is described one chunk at a time, and
- these chunks may be given in any order. Chunks are required unless
- otherwise specified.
-
-CHUNK DATA:
-
- Packfile Names (ID: {'P', 'N', 'A', 'M'})
- Stores the packfile names as concatenated, null-terminated strings.
- Packfiles must be listed in lexicographic order for fast lookups by
- name. This is the only chunk not guaranteed to be a multiple of four
- bytes in length, so should be the last chunk for alignment reasons.
-
- OID Fanout (ID: {'O', 'I', 'D', 'F'})
- The ith entry, F[i], stores the number of OIDs with first
- byte at most i. Thus F[255] stores the total
- number of objects.
-
- OID Lookup (ID: {'O', 'I', 'D', 'L'})
- The OIDs for all objects in the MIDX are stored in lexicographic
- order in this chunk.
-
- Object Offsets (ID: {'O', 'O', 'F', 'F'})
- Stores two 4-byte values for every object.
- 1: The pack-int-id for the pack storing this object.
- 2: The offset within the pack.
- If all offsets are less than 2^32, then the large offset chunk
- will not exist and offsets are stored as in IDX v1.
- If there is at least one offset value larger than 2^32-1, then
- the large offset chunk must exist, and offsets larger than
- 2^31-1 must be stored in it instead. If the large offset chunk
- exists and the 31st bit is on, then removing that bit reveals
- the row in the large offsets containing the 8-byte offset of
- this object (see the sketch after the chunk descriptions).
-
- [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
- 8-byte offsets into large packfiles.
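-
-    For illustration, a Python sketch of resolving the ith object's
-    pack and offset from the OOFF chunk, falling back to LOFF for
-    large offsets (chunk payloads passed in as bytes; names are ours):
-
-----
-import struct
-
-def object_offset(ooff, loff, i):
-    pack_id, offset = struct.unpack_from('>II', ooff, i * 8)
-    if offset & 0x80000000:          # 31st bit on: indirect offset
-        row = offset & 0x7fffffff    # remaining bits index the LOFF rows
-        offset, = struct.unpack_from('>Q', loff, row * 8)
-    return pack_id, offset
-----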
-
-TRAILER:
-
- Index checksum of the above contents.
-
-== multi-pack-index reverse indexes
-
-Similar to the pack-based reverse index, the multi-pack index can also
-be used to generate a reverse index.
-
-Instead of mapping between offset, pack-, and index position, this
-reverse index maps between an object's position within the MIDX, and
-that object's position within a pseudo-pack that the MIDX describes
-(i.e., the ith entry of the multi-pack reverse index holds the MIDX
-position of the ith object in pseudo-pack order).
-
-To clarify the difference between these orderings, consider a multi-pack
-reachability bitmap (which does not yet exist, but is what we are
-building towards here). Each bit needs to correspond to an object in the
-MIDX, and so we need an efficient mapping from bit position to MIDX
-position.
-
-One solution is to let bits occupy the same position in the oid-sorted
-index stored by the MIDX. But because oids are effectively random, their
-resulting reachability bitmaps would have no locality, and thus compress
-poorly. (This is the reason that single-pack bitmaps use the pack
-ordering, and not the .idx ordering, for the same purpose.)
-
-So we'd like to define an ordering for the whole MIDX based around
-pack ordering, which has far better locality (and thus compresses more
-efficiently). We can think of a pseudo-pack created by the concatenation
-of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
-(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
-ordering of the objects like:
-
- |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
-
-where the ordering of the packs is defined by the MIDX's pack list,
-and then the ordering of objects within each pack is the same as the
-order in the actual packfile.
-
-Given the list of packs and their counts of objects, you can
-naïvely reconstruct that pseudo-pack ordering (e.g., the object at
-position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
-slots). But there's a catch. Objects may be duplicated between packs, in
-which case the MIDX only stores one pointer to the object (and thus we'd
-want only one slot in the bitmap).
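-
-Ignoring that catch for a moment, the naïve reconstruction could look
-like this Python sketch (names are ours):
-
-----
-import bisect
-import itertools
-
-def locate(position, object_counts):
-    # Running starts, e.g. counts (10, 15, 20) give [0, 10, 25, 45]:
-    # positions 25..44 then fall into the third pack.
-    starts = list(itertools.accumulate(object_counts, initial=0))
-    pack = bisect.bisect_right(starts, position) - 1
-    return pack, position - starts[pack]
-----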
-
-Callers could handle duplicates themselves by reading objects in order
-of their bit-position, but that's linear in the number of objects, and
-much too expensive for ordinary bitmap lookups. Building a reverse index
-solves this, since it is the logical inverse of the index, and that
-index has already removed duplicates. But, building a reverse index on
-the fly can be expensive. Since we already have an on-disk format for
-pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
-too.
-
-Objects from the MIDX are ordered as follows to string together the
-pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
-by the MIDX, and define an ordering of packs based on their numeric ID
-(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
-within `pack(o)`. Then, compare `o1` and `o2` as follows:
-
- - If one of `pack(o1)` and `pack(o2)` is preferred and the other
- is not, then the preferred one sorts first.
-+
-(This is a detail that allows the MIDX bitmap to determine which
-pack should be used by the pack-reuse mechanism, since it can ask
-the MIDX for the pack containing the object at bit position 0).
-
- - If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending
- order based on the pack ID.
-
- - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in
- pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1)
- < offset(o2)`).
-
-In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
-objects in packs stored by the MIDX, laid out in pack order, and the
-packs arranged in MIDX order (with the preferred pack coming first).
-
-Finally, note that the MIDX's reverse index is not stored as a chunk in
-the multi-pack-index itself. This is done because the reverse index
-includes the checksum of the pack or MIDX to which it belongs, which
-makes it impossible to write in the MIDX. To avoid races when rewriting
-the MIDX, a MIDX reverse index includes the MIDX's checksum in its
-filename (e.g., `multi-pack-index-xyz.rev`).
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
deleted file mode 100644
index e13a2c0..0000000
--- a/Documentation/technical/pack-protocol.txt
+++ /dev/null
@@ -1,709 +0,0 @@
-Packfile transfer protocols
-===========================
-
-Git supports transferring data in packfiles over the ssh://, git://, http:// and
-file:// transports. There exist two sets of protocols, one for pushing
-data from a client to a server and another for fetching data from a
-server to a client. The three transports (ssh, git, file) use the same
-protocol to transfer data. http is documented in http-protocol.txt.
-
-The processes invoked in the canonical Git implementation are 'upload-pack'
-on the server side and 'fetch-pack' on the client side for fetching data;
-then 'receive-pack' on the server and 'send-pack' on the client for pushing
-data. The protocol functions to have a server tell a client what is
-currently on the server, then for the two to negotiate the smallest amount
-of data to send in order to fully update one or the other.
-
-pkt-line Format
----------------
-
-The descriptions below build on the pkt-line format described in
-protocol-common.txt. When the grammar indicates `PKT-LINE(...)`, unless
-otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
-include a LF, but the receiver MUST NOT complain if it is not present.
-
-An error packet is a special pkt-line that contains an error string.
-
-----
- error-line = PKT-LINE("ERR" SP explanation-text)
-----
-
-Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
-be sent. Once this packet is sent by a client or a server, the data transfer
-process defined in this protocol is terminated.
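-
-For illustration, a minimal pkt-line encoder in Python (names are
-ours; see protocol-common.txt for the authoritative rules):
-
-----
-def pkt_line(payload):
-    # Four hex digits of length (counting themselves), then the payload.
-    assert len(payload) + 4 <= 65520      # largest pkt-line
-    return b'%04x' % (len(payload) + 4) + payload
-
-def flush_pkt():
-    return b'0000'                        # a flush-pkt carries no payload
-
-# pkt_line(b'ERR access denied\n') == b'0016ERR access denied\n'
-----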
-
-Transports
-----------
-There are three transports over which the packfile protocol is
-initiated. The Git transport is a simple, unauthenticated server that
-takes the command (almost always 'upload-pack', though Git
-servers can be configured to be globally writable, in which 'receive-
-pack' initiation is also allowed) with which the client wishes to
-communicate and executes it and connects it to the requesting
-process.
-
-In the SSH transport, the client just runs the 'upload-pack'
-or 'receive-pack' process on the server over the SSH protocol and then
-communicates with that invoked process over the SSH connection.
-
-The file:// transport runs the 'upload-pack' or 'receive-pack'
-process locally and communicates with it over a pipe.
-
-Extra Parameters
-----------------
-
-The protocol provides a mechanism in which clients can send additional
-information in their first message to the server. These are called "Extra
-Parameters", and are supported by the Git, SSH, and HTTP protocols.
-
-Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
-
-Servers that receive any such Extra Parameters MUST ignore all
-unrecognized keys. Currently, the only Extra Parameter recognized is
-"version" with a value of '1' or '2'. See protocol-v2.txt for more
-information on protocol version 2.
-
-Git Transport
--------------
-
-The Git transport starts off by sending the command and repository
-on the wire using the pkt-line format, followed by a NUL byte and a
-hostname parameter, terminated by a NUL byte.
-
- 0033git-upload-pack /project.git\0host=myserver.com\0
-
-The transport may send Extra Parameters by adding an additional NUL
-byte, and then adding one or more NUL-terminated strings:
-
- 003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
-
---
- git-proto-request = request-command SP pathname NUL
- [ host-parameter NUL ] [ NUL extra-parameters ]
- request-command = "git-upload-pack" / "git-receive-pack" /
- "git-upload-archive" ; case sensitive
- pathname = *( %x01-ff ) ; exclude NUL
- host-parameter = "host=" hostname [ ":" port ]
- extra-parameters = 1*extra-parameter
- extra-parameter = 1*( %x01-ff ) NUL
---
-
-host-parameter is used for git-daemon name-based virtual
-hosting. See the --interpolated-path option to git daemon, with
-the %H/%CH format characters.
-
-Basically what the Git client is doing to connect to an 'upload-pack'
-process on the server side over the Git protocol is this:
-
- $ echo -e -n \
- "003agit-upload-pack /schacon/gitbook.git\0host=example.com\0" |
- nc -v example.com 9418
-
-
-SSH Transport
--------------
-
-Initiating the upload-pack or receive-pack process over SSH amounts
-to executing the binary on the server via SSH remote execution.
-It is basically equivalent to running this:
-
- $ ssh git.example.com "git-upload-pack '/project.git'"
-
-For a server to support Git pushing and pulling for a given user over
-SSH, that user needs to be able to execute one or both of those
-commands via the SSH shell that they are provided on login. On some
-systems, that shell access is limited to only being able to run those
-two commands, or even just one of them.
-
-In an ssh:// format URI, the path is absolute in the URI, so the '/'
-after the host name (or port number) is sent as an argument, which is
-then read by the remote git-upload-pack exactly as is, so it's
-effectively an absolute path in the remote filesystem.
-
- git clone ssh://user@example.com/project.git
- |
- v
- ssh user@example.com "git-upload-pack '/project.git'"
-
-In a "user@host:path" format URI, its relative to the user's home
-directory, because the Git client will run:
-
- git clone user@example.com:project.git
- |
- v
- ssh user@example.com "git-upload-pack 'project.git'"
-
-The exception is if a '~' is used, in which case
-we execute it without the leading '/'.
-
- ssh://user@example.com/~alice/project.git
- |
- v
- ssh user@example.com "git-upload-pack '~alice/project.git'"
-
-Depending on the value of the `protocol.version` configuration variable,
-Git may attempt to send Extra Parameters as a colon-separated string in
-the GIT_PROTOCOL environment variable. This is done only if
-the `ssh.variant` configuration variable indicates that the ssh command
-supports passing environment variables as an argument.
-
-A few things to remember here:
-
-- The "command name" is spelled with dash (e.g. git-upload-pack), but
- this can be overridden by the client;
-
-- The repository path is always quoted with single quotes.
-
-Fetching Data From a Server
----------------------------
-
-When one Git repository wants to get data that a second repository
-has, the first can 'fetch' from the second. This operation determines
-what data the server has that the client does not, then streams that
-data down to the client in packfile format.
-
-
-Reference Discovery
--------------------
-
-When the client initially connects the server will immediately respond
-with a version number (if "version=1" is sent as an Extra Parameter),
-and a listing of each reference it has (all branches and tags) along
-with the object name that each reference currently points to.
-
- $ echo -e -n "0045git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" |
- nc -v example.com 9418
- 000eversion 1
- 00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack
- side-band side-band-64k ofs-delta shallow no-progress include-tag
- 00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
- 003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master
- 003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9
- 003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0
- 003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{}
- 0000
-
-The returned response is a pkt-line stream describing each ref and
-its current value. The stream MUST be sorted by name according to
-the C locale ordering.
-
-If HEAD is a valid ref, HEAD MUST appear as the first advertised
-ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the
-advertisement list at all, but other refs may still appear.
-
-The stream MUST include capability declarations behind a NUL on the
-first ref. The peeled value of a ref (that is "ref^{}") MUST be
-immediately after the ref itself, if presented. A conforming server
-MUST peel the ref if it's an annotated tag.
-
-----
- advertised-refs = *1("version 1")
- (no-refs / list-of-refs)
- *shallow
- flush-pkt
-
- no-refs = PKT-LINE(zero-id SP "capabilities^{}"
- NUL capability-list)
-
- list-of-refs = first-ref *other-ref
- first-ref = PKT-LINE(obj-id SP refname
- NUL capability-list)
-
- other-ref = PKT-LINE(other-tip / other-peeled)
- other-tip = obj-id SP refname
- other-peeled = obj-id SP refname "^{}"
-
- shallow = PKT-LINE("shallow" SP obj-id)
-
- capability-list = capability *(SP capability)
- capability = 1*(LC_ALPHA / DIGIT / "-" / "_")
- LC_ALPHA = %x61-7A
-----
-
-Server and client MUST use lowercase for obj-id; both MUST treat obj-id
-as case-insensitive.
-
-See protocol-capabilities.txt for a list of allowed server capabilities
-and descriptions.
-
-Packfile Negotiation
---------------------
-After reference and capabilities discovery, the client can decide to
-terminate the connection by sending a flush-pkt, telling the server it can
-now gracefully terminate, and disconnect, when it does not need any pack
-data. This can happen with the ls-remote command, and also can happen when
-the client is already up to date.
-
-Otherwise, it enters the negotiation phase, where the client and
-server determine what the minimal packfile necessary for transport is,
-by telling the server what objects it wants, its shallow objects
-(if any), and the maximum commit depth it wants (if any). The client
-will also send a list of the capabilities it wants to be in effect,
-out of what the server said it could do with the first 'want' line.
-
-----
- upload-request = want-list
- *shallow-line
- *1depth-request
- [filter-request]
- flush-pkt
-
- want-list = first-want
- *additional-want
-
- shallow-line = PKT-LINE("shallow" SP obj-id)
-
- depth-request = PKT-LINE("deepen" SP depth) /
- PKT-LINE("deepen-since" SP timestamp) /
- PKT-LINE("deepen-not" SP ref)
-
- first-want = PKT-LINE("want" SP obj-id SP capability-list)
- additional-want = PKT-LINE("want" SP obj-id)
-
- depth = 1*DIGIT
-
- filter-request = PKT-LINE("filter" SP filter-spec)
-----
-
-Clients MUST send all the obj-ids they want from the reference
-discovery phase as 'want' lines. Clients MUST send at least one
-'want' command in the request body. Clients MUST NOT mention an
-obj-id in a 'want' command which did not appear in the response
-obtained through ref discovery.
-
-The client MUST write all obj-ids which it only has shallow copies
-of (meaning that it does not have the parents of a commit) as
-'shallow' lines so that the server is aware of the limitations of
-the client's history.
-
-The client now sends the maximum commit history depth it wants for
-this transaction, which is the number of commits it wants from the
-tip of the history, if any, as a 'deepen' line. A depth of 0 is the
-same as not making a depth request. The client does not want to receive
-any commits beyond this depth, nor does it want objects needed only to
-complete those commits. Commits whose parents are not received as a
-result are defined as shallow and marked as such in the server. This
-information is sent back to the client in the next step.
-
-The client can optionally request that pack-objects omit various
-objects from the packfile using one of several filtering techniques.
-These are intended for use with partial clone and partial fetch
-operations. An object that does not meet a filter-spec value is
-omitted unless explicitly requested in a 'want' line. See `rev-list`
-for possible filter-spec values.
-
-Once all the 'want's and 'shallow's (and optional 'deepen') are
-transferred, clients MUST send a flush-pkt, to tell the server side
-that it is done sending the list.
-
-Otherwise, if the client sent a positive depth request, the server
-will determine which commits will and will not be shallow and
-send this information to the client. If the client did not request
-a positive depth, this step is skipped.
-
-----
- shallow-update = *shallow-line
- *unshallow-line
- flush-pkt
-
- shallow-line = PKT-LINE("shallow" SP obj-id)
-
- unshallow-line = PKT-LINE("unshallow" SP obj-id)
-----
-
-If the client has requested a positive depth, the server will compute
-the set of commits which are no deeper than the desired depth. The set
-of commits starts at the client's wants.
-
-The server writes 'shallow' lines for each
-commit whose parents will not be sent as a result. The server writes
-an 'unshallow' line for each commit which the client has indicated is
-shallow, but is no longer shallow at the currently requested depth
-(that is, its parents will now be sent). The server MUST NOT mark
-as unshallow anything which the client has not indicated was shallow.
-
-Now the client will send a list of the obj-ids it has using 'have'
-lines, so the server can make a packfile that only contains the objects
-that the client needs. In multi_ack mode, the canonical implementation
-will send up to 32 of these at a time, then will send a flush-pkt. The
-canonical implementation will skip ahead and send the next 32 immediately,
-so that there is always a block of 32 "in-flight on the wire" at a time.
-
-----
- upload-haves = have-list
- compute-end
-
- have-list = *have-line
- have-line = PKT-LINE("have" SP obj-id)
- compute-end = flush-pkt / PKT-LINE("done")
-----
-
-If the server reads 'have' lines, it then will respond by ACKing any
-of the obj-ids the client said it had that the server also has. The
-server will ACK obj-ids differently depending on which ack mode is
-chosen by the client.
-
-In multi_ack mode:
-
- * the server will respond with 'ACK obj-id continue' for any common
- commits.
-
- * once the server has found an acceptable common base commit and is
- ready to make a packfile, it will blindly ACK all 'have' obj-ids
- back to the client.
-
- * the server will then send a 'NAK' and then wait for another response
- from the client - either a 'done' or another list of 'have' lines.
-
-In multi_ack_detailed mode:
-
- * the server will differentiate the ACKs where it is signaling
- that it is ready to send data with 'ACK obj-id ready' lines, and
- signals the identified common commits with 'ACK obj-id common' lines.
-
-Without either multi_ack or multi_ack_detailed:
-
- * upload-pack sends "ACK obj-id" on the first common object it finds.
- After that it says nothing until the client gives it a "done".
-
- * upload-pack sends "NAK" on a flush-pkt if no common object
- has been found yet. If one has been found, and thus an ACK
- was already sent, it's silent on the flush-pkt.
-
-After the client has gotten enough ACK responses that it can determine
-that the server has enough information to send an efficient packfile
-(in the canonical implementation, this is determined when it has received
-enough ACKs that it can color everything left in the --date-order queue
-as common with the server, or the --date-order queue is empty), or the
-client determines that it wants to give up (in the canonical implementation,
-this is determined when the client sends 256 'have' lines without getting
-any of them ACKed by the server - meaning there is nothing in common and
-the server should just send all of its objects), then the client will send
-a 'done' command. The 'done' command signals to the server that the client
-is ready to receive its packfile data.
-
-However, the 256 limit *only* turns on in the canonical client
-implementation if we have received at least one "ACK %s continue"
-during a prior round. This helps to ensure that at least one common
-ancestor is found before we give up entirely.
-
-Once the 'done' line is read from the client, the server will either
-send a final 'ACK obj-id' or it will send a 'NAK'. 'obj-id' is the object
-name of the last commit determined to be common. The server only sends
-ACK after 'done' if there is at least one common base and multi_ack or
-multi_ack_detailed is enabled. The server always sends NAK after 'done'
-if there is no common base found.
-
-Instead of 'ACK' or 'NAK', the server may send an error message (for
-example, if it does not recognize an object in a 'want' line received
-from the client).
-
-Then the server will start sending its packfile data.
-
-----
- server-response = *ack_multi ack / nak
- ack_multi = PKT-LINE("ACK" SP obj-id ack_status)
- ack_status = "continue" / "common" / "ready"
- ack = PKT-LINE("ACK" SP obj-id)
- nak = PKT-LINE("NAK")
-----
-
-A simple clone may look like this (with no 'have' lines):
-
-----
- C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
- side-band-64k ofs-delta\n
- C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
- C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
- C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
- C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
- C: 0000
- C: 0009done\n
-
- S: 0008NAK\n
- S: [PACKFILE]
-----
-
-An incremental update (fetch) response might look like this:
-
-----
- C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
- side-band-64k ofs-delta\n
- C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
- C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
- C: 0000
- C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
- C: [30 more have lines]
- C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n
- C: 0000
-
- S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n
- S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n
- S: 0008NAK\n
-
- C: 0009done\n
-
- S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n
- S: [PACKFILE]
-----
-
-
-Packfile Data
--------------
-
-Now that the client and server have negotiated the minimal amount of
-data that needs to be sent to the client, the server will construct
-and send the required data in packfile format.
-
-See pack-format.txt for what the packfile itself actually looks like.
-
-If 'side-band' or 'side-band-64k' capabilities have been specified by
-the client, the server will send the packfile data multiplexed.
-
-Each packet starts with the packet-line length of the amount of data
-that follows, followed by a single byte specifying the sideband the
-following data is coming in on.
-
-In 'side-band' mode, it will send up to 999 data bytes plus 1 control
-code, for a total of up to 1000 bytes in a pkt-line. In 'side-band-64k'
-mode it will send up to 65519 data bytes plus 1 control code, for a
-total of up to 65520 bytes in a pkt-line.
-
-The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain
-packfile data, sideband '2' will be used for progress information that the
-client will generally print to stderr and sideband '3' is used for error
-information.
-
-If no 'side-band' capability was specified, the server will stream the
-entire packfile without multiplexing.
-
-
-Pushing Data To a Server
-------------------------
-
-Pushing data to a server will invoke the 'receive-pack' process on the
-server, which will allow the client to tell it which references it should
-update and then send all the data the server will need for those new
-references to be complete. Once all the data is received and validated,
-the server will then update its references to what the client specified.
-
-Authentication
---------------
-
-The protocol itself contains no authentication mechanisms. That is to be
-handled by the transport, such as SSH, before the 'receive-pack' process is
-invoked. If 'receive-pack' is configured over the Git transport, those
-repositories will be writable by anyone who can access that port (9418) as
-that transport is unauthenticated.
-
-Reference Discovery
--------------------
-
-The reference discovery phase is done nearly the same way as it is in the
-fetching protocol. Each reference obj-id and name on the server is sent
-in packet-line format to the client, followed by a flush-pkt. The only
-real difference is that the capability listing is different - the only
-possible values are 'report-status', 'report-status-v2', 'delete-refs',
-'ofs-delta', 'atomic' and 'push-options'.
-
-Reference Update Request and Packfile Transfer
-----------------------------------------------
-
-Once the client knows what references the server is at, it can send a
-list of reference update requests. For each reference on the server
-that it wants to update, it sends a line listing the obj-id currently on
-the server, the obj-id the client would like to update it to and the name
-of the reference.
-
-This list is followed by a flush-pkt.
-
-----
- update-requests = *shallow ( command-list | push-cert )
-
- shallow = PKT-LINE("shallow" SP obj-id)
-
- command-list = PKT-LINE(command NUL capability-list)
- *PKT-LINE(command)
- flush-pkt
-
- command = create / delete / update
- create = zero-id SP new-id SP name
- delete = old-id SP zero-id SP name
- update = old-id SP new-id SP name
-
- old-id = obj-id
- new-id = obj-id
-
- push-cert = PKT-LINE("push-cert" NUL capability-list LF)
- PKT-LINE("certificate version 0.1" LF)
- PKT-LINE("pusher" SP ident LF)
- PKT-LINE("pushee" SP url LF)
- PKT-LINE("nonce" SP nonce LF)
- *PKT-LINE("push-option" SP push-option LF)
- PKT-LINE(LF)
- *PKT-LINE(command LF)
- *PKT-LINE(gpg-signature-lines LF)
- PKT-LINE("push-cert-end" LF)
-
- push-option = 1*( VCHAR | SP )
-----
-
-If the server has advertised the 'push-options' capability and the client has
-specified 'push-options' as part of the capability list above, the client then
-sends its push options followed by a flush-pkt.
-
-----
- push-options = *PKT-LINE(push-option) flush-pkt
-----
-
-For backwards compatibility with older Git servers, if the client sends a push
-cert and push options, it MUST send its push options both embedded within the
-push cert and after the push cert. (Note that the push options within the cert
-are prefixed, but the push options after the cert are not.) Both these lists
-MUST be the same, modulo the prefix.
-
-After that, the packfile that should contain all the objects that the
-server will need to complete the new references will be sent.
-
-----
- packfile = "PACK" 28*(OCTET)
-----
-
-If the receiving end does not support delete-refs, the sending end MUST
-NOT ask for the delete command.
-
-If the receiving end does not support push-cert, the sending end
-MUST NOT send a push-cert command. When a push-cert command is
-sent, command-list MUST NOT be sent; the commands recorded in the
-push certificate are used instead.
-
-The packfile MUST NOT be sent if the only command used is 'delete'.
-
-A packfile MUST be sent if either create or update command is used,
-even if the server already has all the necessary objects. In this
-case the client MUST send an empty packfile. The only time this
-is likely to happen is if the client is creating
-a new branch or a tag that points to an existing obj-id.
-
-The server will receive the packfile, unpack it, then validate each
-reference that is being updated that it hasn't changed while the request
-was being processed (the obj-id is still the same as the old-id), and
-it will run any update hooks to make sure that the update is acceptable.
-If all of that is fine, the server will then update the references.
-
-Push Certificate
-----------------
-
-A push certificate begins with a set of header lines. After the
-header and an empty line, the protocol commands follow, one per
-line. Note that the trailing LF in push-cert PKT-LINEs is _not_
-optional; it must be present.
-
-Currently, the following header fields are defined:
-
-`pusher` ident::
- Identify the GPG key in "Human Readable Name <email@address>"
- format.
-
-`pushee` url::
- The repository URL (anonymized, if the URL contains
- authentication material) the user who ran `git push`
- intended to push into.
-
-`nonce` nonce::
- The 'nonce' string the receiving repository asked the
- pushing user to include in the certificate, to prevent
- replay attacks.
-
-The GPG signature lines are a detached signature for the contents
-recorded in the push certificate before the signature block begins.
-The detached signature is used to certify that the commands were
-given by the pusher, who must be the signer.
-
-Report Status
--------------
-
-After receiving the pack data from the sender, the receiver sends a
-report if 'report-status' or 'report-status-v2' capability is in effect.
-It is a short listing of what happened in that update. It will first
-list the status of the packfile unpacking as either 'unpack ok' or
-'unpack [error]'. Then it will list the status for each of the references
-that it tried to update. Each line is either 'ok [refname]' if the
-update was successful, or 'ng [refname] [error]' if the update was not.
-
-----
- report-status = unpack-status
- 1*(command-status)
- flush-pkt
-
- unpack-status = PKT-LINE("unpack" SP unpack-result)
- unpack-result = "ok" / error-msg
-
- command-status = command-ok / command-fail
- command-ok = PKT-LINE("ok" SP refname)
- command-fail = PKT-LINE("ng" SP refname SP error-msg)
-
- error-msg = 1*(OCTET) ; where not "ok"
-----
-
-The 'report-status-v2' capability extends the protocol by adding new option
-lines in order to support reporting of references rewritten by the
-'proc-receive' hook. The 'proc-receive' hook may handle a command for a
-pseudo-reference which may create or update one or more references, and each
-reference may have a different name, a different new-oid, and a different old-oid.
-
-----
- report-status-v2 = unpack-status
- 1*(command-status-v2)
- flush-pkt
-
- unpack-status = PKT-LINE("unpack" SP unpack-result)
- unpack-result = "ok" / error-msg
-
- command-status-v2 = command-ok-v2 / command-fail
- command-ok-v2 = command-ok
- *option-line
-
- command-ok = PKT-LINE("ok" SP refname)
- command-fail = PKT-LINE("ng" SP refname SP error-msg)
-
- error-msg = 1*(OCTET) ; where not "ok"
-
- option-line = *1(option-refname)
- *1(option-old-oid)
- *1(option-new-oid)
- *1(option-forced-update)
-
- option-refname = PKT-LINE("option" SP "refname" SP refname)
- option-old-oid = PKT-LINE("option" SP "old-oid" SP obj-id)
- option-new-oid = PKT-LINE("option" SP "new-oid" SP obj-id)
- option-forced-update = PKT-LINE("option" SP "forced-update")
-
-----
-
-Updates can be unsuccessful for a number of reasons. The reference can have
-changed since the reference discovery phase was originally sent, meaning
-someone pushed in the meantime. The reference being pushed could be a
-non-fast-forward reference and the update hooks or configuration could be
-set to not allow that, etc. Also, some references can be updated while others
-can be rejected.
-
-An example client/server communication might look like this:
-
-----
- S: 006274730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n
- S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n
- S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n
- S: 003d74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
- S: 0000
-
- C: 00677d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
- C: 006874730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
- C: 0000
- C: [PACKDATA]
-
- S: 000eunpack ok\n
- S: 0018ok refs/heads/debug\n
- S: 002ang refs/heads/master non-fast-forward\n
-----
diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
index 1eb525f..9d453d4 100644
--- a/Documentation/technical/packfile-uri.txt
+++ b/Documentation/technical/packfile-uri.txt
@@ -18,7 +18,7 @@ a `packfile-uris` argument, the server MAY send a `packfile-uris` section
directly before the `packfile` section (right after `wanted-refs` if it is
sent) containing URIs of any of the given protocols. The URIs point to
packfiles that use only features that the client has declared that it supports
-(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+(e.g. ofs-delta and thin-pack). See linkgit:gitprotocol-v2[5] for the documentation of
this section.
Clients should then download and index all the given URIs (in addition to
diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt
index e790258..b4a144e 100644
--- a/Documentation/technical/parallel-checkout.txt
+++ b/Documentation/technical/parallel-checkout.txt
@@ -56,14 +56,14 @@ Rejected Multi-Threaded Solution
The most "straightforward" implementation would be to spread the set of
to-be-updated cache entries across multiple threads. But due to the
-thread-unsafe functions in the ODB code, we would have to use locks to
+thread-unsafe functions in the object database code, we would have to use locks to
coordinate the parallel operation. An early prototype of this solution
showed that the multi-threaded checkout would bring performance
improvements over the sequential code, but there was still too much lock
contention. A `perf` profiling indicated that around 20% of the runtime
during a local Linux clone (on an SSD) was spent in locking functions.
For this reason this approach was rejected in favor of using multiple
-child processes, which led to a better performance.
+child processes, which led to better performance.
Multi-Process Solution
----------------------
@@ -126,7 +126,7 @@ Then, for each assigned item, each worker:
* W5: Writes the result to the file descriptor opened at W2.
-* W6: Calls `fstat()` or lstat()` on the just-written path, and sends
+* W6: Calls `fstat()` or `lstat()` on the just-written path, and sends
the result back to the main process, together with the end status of
the operation and the item's identification number.
@@ -148,7 +148,7 @@ information, the main process handles the results in two steps:
- First, it updates the in-memory index with the `lstat()` information
sent by the workers. (This must be done first as this information
- might me required in the following step.)
+ might be required in the following step.)
- Then it writes the items which collided on disk (i.e. items marked
with `PC_ITEM_COLLIDED`). More on this below.
@@ -185,7 +185,7 @@ quite straightforward: for each parallel-eligible entry, the main
process must remove all files that prevent this entry from being written
(before enqueueing it). This includes any non-directory file in the
leading path of the entry. Later, when a worker gets assigned the entry,
-it looks again for the non-directories files and for an already existing
+it looks again for the non-directory files and for an already existing
file at the entry's path. If any of these checks finds something, the
worker knows that there was a path collision.
@@ -232,7 +232,7 @@ conversion and re-encoding, are eligible for parallel checkout.
Ineligible entries are checked out by the classic sequential codepath
*before* spawning workers.
-Note: submodules's files are also eligible for parallel checkout (as
+Note: submodules' files are also eligible for parallel checkout (as
long as they don't fall into any of the excluding categories mentioned
above). But since each submodule is checked out in its own child
process, we don't mix the superproject's and the submodules' files in
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index a0dd7c6..cd948b0 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -3,7 +3,7 @@ Partial Clone Design Notes
The "Partial Clone" feature is a performance optimization for Git that
allows Git to function without having a complete copy of the repository.
-The goal of this work is to allow Git better handle extremely large
+The goal of this work is to allow Git to better handle extremely large
repositories.
During clone and fetch operations, Git downloads the complete contents
@@ -79,7 +79,7 @@ Design Details
upload-pack negotiation.
+
This uses the existing capability discovery mechanism.
-See "filter" in Documentation/technical/pack-protocol.txt.
+See "filter" in linkgit:gitprotocol-pack[5].
- Clients pass a "filter-spec" to clone and fetch which is passed to the
server to request filtering during packfile construction.
@@ -181,6 +181,9 @@ Fetching Missing Objects
currently fetches all objects referred to by the requested objects, even
though they are not necessary.
+- Fetching with `--refetch` will request a complete new filtered packfile from
+ the remote, which can be used to change a filter without needing to
+ dynamically fetch missing objects.
Using many promisor remotes
---------------------------
@@ -253,7 +256,7 @@ remote in a specific order.
- Dynamic object fetching currently uses the existing pack protocol V0
which means that each object is requested via fetch-pack. The server
will send a full set of info/refs when the connection is established.
- If there are large number of refs, this may incur significant overhead.
+ If there are a large number of refs, this may incur significant overhead.
Future Work
@@ -262,7 +265,7 @@ Future Work
- Improve the way to specify the order in which promisor remotes are
tried.
+
-For example this could allow to specify explicitly something like:
+For example, this could allow specifying explicitly something like:
"When fetching from this remote, I want to use these promisor remotes
in this order, though, when pushing or fetching to that remote, I want
to use those promisor remotes in that order."
@@ -319,7 +322,7 @@ Footnotes
[a] expensive-to-modify list of missing objects: Earlier in the design of
partial clone we discussed the need for a single list of missing objects.
- This would essentially be a sorted linear list of OIDs that the were
+ This would essentially be a sorted linear list of OIDs that were
omitted by the server during a clone or subsequent fetches.
This file would need to be loaded into memory on every object lookup.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
deleted file mode 100644
index 9dfade9..0000000
--- a/Documentation/technical/protocol-capabilities.txt
+++ /dev/null
@@ -1,380 +0,0 @@
-Git Protocol Capabilities
-=========================
-
-NOTE: this document describes capabilities for versions 0 and 1 of the pack
-protocol. For version 2, please refer to the link:protocol-v2.html[protocol-v2]
-doc.
-
-Servers SHOULD support all capabilities defined in this document.
-
-On the very first line of the initial server response of either
-receive-pack or upload-pack, the first reference is followed by
-a NUL byte and then a list of space-delimited server capabilities.
-These allow the server to declare what it can and cannot support
-to the client.
-
-The client will then send a space-separated list of capabilities it wants
-to be in effect. The client MUST NOT ask for capabilities the server
-did not say it supports.
-
-The server MUST diagnose and abort if capabilities it does not understand
-were sent. The server MUST NOT ignore capabilities that the client requested
-and the server advertised. As a consequence of these rules, the server MUST
-NOT advertise capabilities it does not understand.
-
-The 'atomic', 'report-status', 'report-status-v2', 'delete-refs', 'quiet',
-and 'push-cert' capabilities are sent and recognized by the receive-pack
-(push to server) process.
-
-The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
-by both upload-pack and receive-pack protocols. The 'agent' and 'session-id'
-capabilities may optionally be sent in both protocols.
-
-All other capabilities are only recognized by the upload-pack (fetch
-from server) process.
-
-multi_ack
----------
-
-The 'multi_ack' capability allows the server to return "ACK obj-id
-continue" as soon as it finds a commit that it can use as a common
-base between the client's wants and the client's have set.
-
-By sending this early, the server can potentially head off the client
-from walking any further down that particular branch of the client's
-repository history. The client may still need to walk down other
-branches, sending have lines for those, until the server has a
-complete cut across the DAG, or the client has said "done".
-
-Without multi_ack, a client sends have lines in --date-order until
-the server has found a common base. That means the client will send
-have lines that are already known by the server to be common, because
-they overlap in time with another branch that the server hasn't found
-a common base on yet.
-
-For example, suppose the client has commits in caps that the server
-doesn't and the server has commits in lower case that the client
-doesn't, as in the following diagram:
-
- +---- u ---------------------- x
- / +----- y
- / /
- a -- b -- c -- d -- E -- F
- \
- +--- Q -- R -- S
-
-If the client wants x,y and starts out by saying have F,S, the server
-doesn't know what F,S is. Eventually the client says "have d" and
-the server sends "ACK d continue" to let the client know to stop
-walking down that line (so don't send c-b-a), but it's not done yet,
-it needs a base for x. The client keeps going with S-R-Q, until 'a'
-is reached, at which point the server has a clear base and it all
-ends.
-
-Without multi_ack the client would have sent that c-b-a chain anyway,
-interleaved with S-R-Q.
-
-multi_ack_detailed
-------------------
-This is an extension of multi_ack that permits the client to better
-understand the server's in-memory state. See pack-protocol.txt,
-section "Packfile Negotiation" for more information.
-
-no-done
--------
-This capability should only be used with the smart HTTP protocol. If
-multi_ack_detailed and no-done are both present, then the sender is
-free to immediately send a pack following its first "ACK obj-id ready"
-message.
-
-Without no-done in the smart HTTP protocol, the server session would
-end and the client has to make another round trip to send "done" before
-the server can send the pack. no-done removes the last round and
-thus slightly reduces latency.
-
-thin-pack
----------
-
-A thin pack is one with deltas which reference base objects not
-contained within the pack (but are known to exist at the receiving
-end). This can reduce the network traffic significantly, but it
-requires the receiving end to know how to "thicken" these packs by
-adding the missing bases to the pack.
-
-The upload-pack server advertises 'thin-pack' when it can generate
-and send a thin pack. A client requests the 'thin-pack' capability
-when it understands how to "thicken" it, notifying the server that
-it can receive such a pack. A client MUST NOT request the
-'thin-pack' capability if it cannot turn a thin pack into a
-self-contained pack.
-
-Receive-pack, on the other hand, is assumed by default to be able to
-handle thin packs, but can ask the client not to use the feature by
-advertising the 'no-thin' capability. A client MUST NOT send a thin
-pack if the server advertises the 'no-thin' capability.
-
-The reasons for this asymmetry are historical. The receive-pack
-program did not exist until after the invention of thin packs, so
-historically the reference implementation of receive-pack always
-understood thin packs. Adding 'no-thin' later allowed receive-pack
-to disable the feature in a backwards-compatible manner.
-
-
-side-band, side-band-64k
-------------------------
-
-This capability means that the server can send, and the client can
-understand, multiplexed progress reports and error info interleaved
-with the packfile itself.
-
-These two options are mutually exclusive. A modern client always
-favors 'side-band-64k'.
-
-Either mode indicates that the packfile data will be streamed, broken
-up into packets of up to either 1000 bytes in the case of 'side-band',
-or 65520 bytes in the case of 'side-band-64k'. Each packet is made up
-of a leading 4-byte pkt-line length of how much data is in the packet,
-followed by a 1-byte stream code, followed by the actual data.
-
-The stream code can be one of:
-
- 1 - pack data
- 2 - progress messages
- 3 - fatal error message just before stream aborts
-
-The "side-band-64k" capability came about as a way for newer clients
-that can handle much larger packets to request packets that are
-actually crammed nearly full, while maintaining backward compatibility
-for the older clients.
-
-Further, with side-band and its up-to-1000-byte packets, a packet
-actually carries 999 bytes of payload and 1 byte for the stream code.
-With side-band-64k the same applies: up to 65519 bytes of data and
-1 byte for the stream code.
-
-The client MUST send at most one of "side-band" and "side-band-64k".
-The server MUST diagnose it as an error if the client requests both.
-
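For illustration, a receiver could demultiplex the stream roughly as follows. This is a minimal sketch: `read_pkt_line()` is an assumed helper (not Git's actual API) that strips the 4-byte length header and returns the payload length, 0 on flush-pkt, or -1 on error.

----
#include <stdio.h>

#define MAX_PAYLOAD 65520  /* generous upper bound from the text above */

extern int read_pkt_line(char *buf, int size);  /* assumed helper */

static int demux_sideband(void)
{
	char buf[MAX_PAYLOAD];
	int len;

	while ((len = read_pkt_line(buf, sizeof(buf))) > 0) {
		switch (buf[0]) {  /* the 1-byte stream code */
		case 1:  /* pack data: hand buf[1..len-1] to the unpacker */
			break;
		case 2:  /* progress: display on stderr */
			fprintf(stderr, "%.*s", len - 1, buf + 1);
			break;
		case 3:  /* fatal error just before the stream aborts */
			fprintf(stderr, "remote error: %.*s", len - 1, buf + 1);
			return -1;
		}
	}
	return len;  /* 0 on flush-pkt (normal end of stream) */
}
----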
-ofs-delta
----------
-
-The server can send, and the client can understand, PACKv2 with deltas
-referring to their bases by position in the pack rather than by an
-obj-id. That is, they can
-send/read OBJ_OFS_DELTA (aka type 6) in a packfile.
-
-agent
------
-
-The server may optionally send a capability of the form `agent=X` to
-notify the client that the server is running version `X`. The client may
-optionally return its own agent string by responding with an `agent=Y`
-capability (but it MUST NOT do so if the server did not mention the
-agent capability). The `X` and `Y` strings may contain any printable
-ASCII characters except space (i.e., the byte range 32 < x < 127), and
-are typically of the form "package/version" (e.g., "git/1.8.3.1"). The
-agent strings are purely informative for statistics and debugging
-purposes, and MUST NOT be used to programmatically assume the presence
-or absence of particular features.
-
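The byte-range rule translates directly into code; a minimal sketch (the function name is ours):

----
/*
 * Returns 1 if `s` is an acceptable agent string: non-empty, and
 * every byte x satisfies 32 < x < 127 (printable ASCII, no space).
 */
static int valid_agent_string(const char *s)
{
	if (!*s)
		return 0;
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;
		if (c <= 32 || c >= 127)
			return 0;
	}
	return 1;
}
----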
-object-format
--------------
-
-This capability, which takes a hash algorithm as an argument, indicates
-that the server supports the given hash algorithms. It may be sent
-multiple times; if so, the first one given is the one used in the ref
-advertisement.
-
-When provided by the client, this indicates that it intends to use the
-given hash algorithm to communicate. The algorithm provided must be one
-that the server supports.
-
-If this capability is not provided, it is assumed that the only
-supported algorithm is SHA-1.
-
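For illustration, a client might map the advertised value to hash parameters along these lines. This is a sketch with names of our own choosing; real Git keeps this information in its `hash_algo` tables:

----
#include <string.h>

struct hash_info {
	const char *name;
	size_t raw_len;  /* bytes in a binary object name */
	size_t hex_len;  /* characters in a hex object name */
};

static const struct hash_info hash_infos[] = {
	{ "sha1",   20, 40 },
	{ "sha256", 32, 64 },
};

/* Returns the parameters for an advertised algorithm, or NULL if it
 * is unknown (in which case the client cannot use this server). */
static const struct hash_info *lookup_hash(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(hash_infos) / sizeof(hash_infos[0]); i++)
		if (!strcmp(hash_infos[i].name, name))
			return &hash_infos[i];
	return NULL;
}
----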
-symref
-------
-
-This parameterized capability is used to inform the receiver which symbolic ref
-points to which ref; for example, "symref=HEAD:refs/heads/master" tells the
-receiver that HEAD points to master. This capability can be repeated to
-represent multiple symrefs.
-
-Servers SHOULD include this capability for the HEAD symref if it is one of the
-refs being sent.
-
-Clients MAY use the parameters from this capability to select the proper initial
-branch when cloning a repository.
-
-shallow
--------
-
-This capability adds "deepen", "shallow" and "unshallow" commands to
-the fetch-pack/upload-pack protocol so clients can request shallow
-clones.
-
-deepen-since
-------------
-
-This capability adds "deepen-since" command to fetch-pack/upload-pack
-protocol so the client can request shallow clones that are cut at a
-specific time, instead of depth. Internally it's equivalent of doing
-"rev-list --max-age=<timestamp>" on the server side. "deepen-since"
-cannot be used with "deepen".
-
-deepen-not
-----------
-
-This capability adds "deepen-not" command to fetch-pack/upload-pack
-protocol so the client can request shallow clones that are cut at a
-specific revision, instead of depth. Internally it's equivalent of
-doing "rev-list --not <rev>" on the server side. "deepen-not"
-cannot be used with "deepen", but can be used with "deepen-since".
-
-deepen-relative
----------------
-
-If this capability is requested by the client, the semantics of
-"deepen" command is changed. The "depth" argument is the depth from
-the current shallow boundary, instead of the depth from remote refs.
-
-no-progress
------------
-
-The client was started with "git clone -q" or similar, and does not
-want progress output on sideband channel 2. In effect the client says
-"I do not wish to receive stream 2 on sideband, so do not send it to
-me; if you do, I will drop it on the floor anyway". However, sideband
-channel 3 is still used for error responses.
-
-include-tag
------------
-
-The 'include-tag' capability is about sending annotated tags if we are
-sending objects they point to. If we pack an object to the client, and
-a tag object points exactly at that object, we pack the tag object too.
-In general this allows a client to get all new annotated tags when it
-fetches a branch, in a single network connection.
-
-Clients MAY always send include-tag, hardcoding it into a request when
-the server advertises this capability. The decision for a client to
-request include-tag only has to do with the client's desires for tag
-data, whether or not a server had advertised objects in the
-refs/tags/* namespace.
-
-Servers MUST pack the tags if their referent is packed and the client
-has requested include-tag.
-
-Clients MUST be prepared for the case where a server has ignored
-include-tag and has not actually sent tags in the pack. In such
-cases the client SHOULD issue a subsequent fetch to acquire the tags
-that include-tag would have otherwise given the client.
-
-The server SHOULD send include-tag, if it supports it, regardless
-of whether or not there are tags available.
-
-report-status
--------------
-
-The receive-pack process can receive a 'report-status' capability,
-which tells it that the client wants a report of what happened after
-a packfile upload and reference update. If the pushing client requests
-this capability, after unpacking and updating references the server
-will respond with whether the packfile unpacked successfully and if
-each reference was updated successfully. If any of those were not
-successful, it will send back an error message. See pack-protocol.txt
-for example messages.
-
-report-status-v2
-----------------
-
-Capability 'report-status-v2' extends capability 'report-status' by
-adding new "option" directives in order to support reference rewritten by
-the "proc-receive" hook. The "proc-receive" hook may handle a command
-for a pseudo-reference which may create or update a reference with
-different name, new-oid, and old-oid. While the capability
-'report-status' cannot report for such case. See pack-protocol.txt
-for details.
-
-delete-refs
------------
-
-If the server sends back the 'delete-refs' capability, it means that
-it is capable of accepting a zero-id value as the target
-value of a reference update. It is not sent back by the client; it
-simply informs the client that it may send zero-id values
-to delete references.
-
-quiet
------
-
-If the receive-pack server advertises the 'quiet' capability, it is
-capable of silencing human-readable progress output which otherwise may
-be shown when processing the received pack. A send-pack client should
-respond with the 'quiet' capability to suppress server-side progress
-reporting if the local progress reporting is also being suppressed
-(e.g., via `push -q`, or if stderr does not go to a tty).
-
-atomic
-------
-
-If the server sends the 'atomic' capability it is capable of accepting
-atomic pushes. If the pushing client requests this capability, the server
-will update the refs in one atomic transaction. Either all refs are
-updated or none.
-
-push-options
-------------
-
-If the server sends the 'push-options' capability it is able to accept
-push options after the update commands have been sent, but before the
-packfile is streamed. If the pushing client requests this capability,
-the server will pass the options to the pre- and post-receive hooks
-that process this push request.
-
-allow-tip-sha1-in-want
-----------------------
-
-If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with object names that exist at the server but are not
-advertised by upload-pack. For historical reasons, the name of this
-capability contains "sha1". Object names are always given using the
-object format negotiated through the 'object-format' capability.
-
-allow-reachable-sha1-in-want
-----------------------------
-
-If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with object names that exist at the server but are not
-advertised by upload-pack. For historical reasons, the name of this
-capability contains "sha1". Object names are always given using the
-object format negotiated through the 'object-format' capability.
-
-push-cert=<nonce>
------------------
-
-The receive-pack server that advertises this capability is willing
-to accept a signed push certificate, and asks that the <nonce> be
-included in the push certificate. A send-pack client MUST NOT
-send a push-cert packet unless the receive-pack server advertises
-this capability.
-
-filter
-------
-
-If the upload-pack server advertises the 'filter' capability,
-fetch-pack may send "filter" commands to request a partial clone
-or partial fetch and request that the server omit various objects
-from the packfile.
-
-session-id=<session id>
------------------------
-
-The server may advertise a session ID that can be used to identify this process
-across multiple requests. The client may advertise its own session ID back to
-the server as well.
-
-Session IDs should be unique to a given process. They must fit within a
-packet-line, and must not contain non-printable or whitespace characters. The
-current implementation uses trace2 session IDs (see
-link:api-trace2.html[api-trace2] for details), but this may change and users of
-the session ID should not rely on this fact.
diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt
deleted file mode 100644
index ecedb34..0000000
--- a/Documentation/technical/protocol-common.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-Documentation Common to Pack and HTTP Protocols
-===============================================
-
-ABNF Notation
--------------
-
-ABNF notation as described by RFC 5234 is used within the protocol documents,
-except the following replacement core rules are used:
-----
- HEXDIG = DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
-----
-
-We also define the following common rules:
-----
- NUL = %x00
- zero-id = 40*"0"
- obj-id = 40*(HEXDIG)
-
- refname = "HEAD"
- refname /= "refs/" <see discussion below>
-----
-
-A refname is a hierarchical octet string beginning with "refs/" and
-not violating the 'git-check-ref-format' command's validation rules.
-More specifically:
-
-. They can include slash `/` for hierarchical (directory)
- grouping, but no slash-separated component can begin with a
- dot `.`.
-
-. They must contain at least one `/`. This enforces the presence of a
- category like `heads/`, `tags/` etc. but the actual names are not
- restricted.
-
-. They cannot have two consecutive dots `..` anywhere.
-
-. They cannot have ASCII control characters (i.e. bytes whose
- values are lower than \040, or \177 `DEL`), space, tilde `~`,
- caret `^`, colon `:`, question-mark `?`, asterisk `*`,
- or open bracket `[` anywhere.
-
-. They cannot end with a slash `/` or a dot `.`.
-
-. They cannot end with the sequence `.lock`.
-
-. They cannot contain a sequence `@{`.
-
-. They cannot contain a `\\`.
-
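Implementations linked against Git itself do not need to re-implement these rules: they are encoded in `check_refname_format()` from `refs.h`. A minimal sketch of its use, assuming the program is linked against Git's `libgit.a`:

----
#include <stdio.h>

/* From Git's refs.h: returns 0 iff `refname` is well-formed. */
extern int check_refname_format(const char *refname, int flags);

int main(void)
{
	const char *names[] = {
		"refs/heads/master",  /* valid */
		"refs/heads/a..b",    /* consecutive dots: rejected */
		"refs/heads/x.lock",  /* ends with .lock: rejected */
	};
	int i;

	for (i = 0; i < 3; i++)
		printf("%s: %s\n", names[i],
		       check_refname_format(names[i], 0) ? "invalid" : "valid");
	return 0;
}
----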
-
-pkt-line Format
----------------
-
-Much (but not all) of the payload is described around pkt-lines.
-
-A pkt-line is a variable-length binary string. The first four bytes
-of the line, the pkt-len, indicate the total length of the line,
-in hexadecimal. The pkt-len includes the 4 bytes used to contain
-the length's hexadecimal representation.
-
-A pkt-line MAY contain binary data, so implementors MUST ensure
-pkt-line parsing/formatting routines are 8-bit clean.
-
-A non-binary line SHOULD be terminated by an LF, which if present
-MUST be included in the total length. Receivers MUST treat pkt-lines
-with non-binary data the same whether or not they contain the trailing
-LF (stripping the LF if present, and not complaining when it is
-missing).
-
-The maximum length of a pkt-line's data component is 65516 bytes.
-Implementations MUST NOT send a pkt-line whose length exceeds 65520
-(65516 bytes of payload + 4 bytes of length data).
-
-Implementations SHOULD NOT send an empty pkt-line ("0004").
-
-A pkt-line with a length field of 0 ("0000"), called a flush-pkt,
-is a special case and MUST be handled differently than an empty
-pkt-line ("0004").
-
-----
- pkt-line = data-pkt / flush-pkt
-
- data-pkt = pkt-len pkt-payload
- pkt-len = 4*(HEXDIG)
- pkt-payload = (pkt-len - 4)*(OCTET)
-
- flush-pkt = "0000"
-----
-
-Examples (as C-style strings):
-
-----
- pkt-line actual value
- ---------------------------------
- "0006a\n" "a\n"
- "0005a" "a"
- "000bfoobar\n" "foobar\n"
- "0004" ""
-----
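A writer only has to prepend the 4-hexdigit total length to the payload. A minimal sketch (the function name is ours, not Git's pkt-line API):

----
#include <string.h>

/*
 * Formats `data` (len bytes) as a pkt-line into `out`, which must have
 * room for len + 4 bytes. Returns the total pkt-line length, or -1 if
 * the payload exceeds the 65516-byte maximum.
 */
static int format_pkt_line(char *out, const char *data, size_t len)
{
	static const char hex[] = "0123456789abcdef";
	size_t total = len + 4;

	if (len > 65516)
		return -1;
	out[0] = hex[(total >> 12) & 0xf];
	out[1] = hex[(total >> 8) & 0xf];
	out[2] = hex[(total >> 4) & 0xf];
	out[3] = hex[total & 0xf];
	memcpy(out + 4, data, len);
	return (int)total;
}

/* format_pkt_line(buf, "a\n", 2) produces "0006a\n", matching the
 * examples above; a flush-pkt is simply the literal string "0000". */
----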
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
deleted file mode 100644
index 59b86fc..0000000
--- a/Documentation/technical/protocol-v2.txt
+++ /dev/null
@@ -1,564 +0,0 @@
-Git Wire Protocol, Version 2
-============================
-
-This document presents a specification for version 2 of Git's wire
-protocol. Protocol v2 will improve upon v1 in the following ways:
-
- * Instead of multiple service names, multiple commands will be
- supported by a single service
- * Easily extendable as capabilities are moved into their own section
- of the protocol, no longer being hidden behind a NUL byte and
- limited by the size of a pkt-line
- * Separate out other information hidden behind NUL bytes (e.g. agent
- string as a capability and symrefs can be requested using 'ls-refs')
- * Reference advertisement will be omitted unless explicitly requested
- * ls-refs command to explicitly request some refs
- * Designed with http and stateless-rpc in mind. With clear flush
- semantics the http remote helper can simply act as a proxy
-
-In protocol v2, communication is command oriented. When first contacting a
-server, a list of capabilities will be advertised. Some of these capabilities
-will be commands which a client can request be executed. Once a command
-has completed, a client can reuse the connection and request that other
-commands be executed.
-
-Packet-Line Framing
--------------------
-
-All communication is done using packet-line framing, just as in v1. See
-`Documentation/technical/pack-protocol.txt` and
-`Documentation/technical/protocol-common.txt` for more information.
-
-In protocol v2 these special packets will have the following semantics:
-
- * '0000' Flush Packet (flush-pkt) - indicates the end of a message
- * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
- * '0002' Response End Packet (response-end-pkt) - indicates the end of a
- response for stateless connections
-
-Initial Client Request
-----------------------
-
-In general a client can request to speak protocol v2 by sending
-`version=2` through the respective side-channel for the transport being
-used, which inevitably sets `GIT_PROTOCOL`. More information can be
-found in `pack-protocol.txt` and `http-protocol.txt`, as well as the
-`GIT_PROTOCOL` definition in `git.txt`. In all cases the
-response from the server is the capability advertisement.
-
-Git Transport
-~~~~~~~~~~~~~
-
-When using the git:// transport, you can request to use protocol v2 by
-sending "version=2" as an extra parameter:
-
- 003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
-
-SSH and File Transport
-~~~~~~~~~~~~~~~~~~~~~~
-
-When using either the ssh:// or file:// transport, the GIT_PROTOCOL
-environment variable must be set explicitly to include "version=2".
-The server may need to be configured to allow this environment variable
-to pass.
-
-HTTP Transport
-~~~~~~~~~~~~~~
-
-When using the http:// or https:// transport a client makes a "smart"
-info/refs request as described in `http-protocol.txt` and requests that
-v2 be used by supplying "version=2" in the `Git-Protocol` header.
-
- C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
- C: Git-Protocol: version=2
-
-A v2 server would reply:
-
- S: 200 OK
- S: <Some headers>
- S: ...
- S:
- S: 000eversion 2\n
- S: <capability-advertisement>
-
-Subsequent requests are then made directly to the service
-`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
-
-This uses the `--http-backend-info-refs` option to
-linkgit:git-upload-pack[1].
-
-The server may need to be configured to pass this header's contents via
-the `GIT_PROTOCOL` variable. See the discussion in `git-http-backend.txt`.
-
-Capability Advertisement
-------------------------
-
-A server which decides to communicate (based on a request from a client)
-using protocol version 2 notifies the client by sending a version string
-in its initial response followed by an advertisement of its capabilities.
-Each capability is a key with an optional value. Clients must ignore all
-unknown keys. Semantics of unknown values are left to the definition of
-each key. Some capabilities will describe commands which can be requested
-to be executed by the client.
-
- capability-advertisement = protocol-version
- capability-list
- flush-pkt
-
- protocol-version = PKT-LINE("version 2" LF)
- capability-list = *capability
- capability = PKT-LINE(key[=value] LF)
-
- key = 1*(ALPHA | DIGIT | "-_")
- value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;")
-
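Splitting a received capability pkt-line into its key and optional value is straightforward; a minimal sketch (names are ours):

----
#include <string.h>

/*
 * Splits a capability line of the form "key" or "key=value" in place.
 * On return, *value points at the value, or is NULL if none was given.
 */
static void parse_capability(char *line, char **key, char **value)
{
	char *eq = strchr(line, '=');

	*key = line;
	if (eq) {
		*eq = '\0';
		*value = eq + 1;
	} else {
		*value = NULL;
	}
}
----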
-Command Request
----------------
-
-After receiving the capability advertisement, a client can then issue a
-request to select the command it wants with any particular capabilities
-or arguments. There is then an optional section where the client can
-provide any command specific parameters or queries. Only a single
-command can be requested at a time.
-
- request = empty-request | command-request
- empty-request = flush-pkt
- command-request = command
- capability-list
- [command-args]
- flush-pkt
- command = PKT-LINE("command=" key LF)
- command-args = delim-pkt
- *command-specific-arg
-
- command-specific-args are packet line framed arguments defined by
- each individual command.
-
-The server will then check to ensure that the client's request
-consists of a valid command as well as valid capabilities which were
-advertised. If the request is valid, the server will then execute the
-command. A server MUST wait until it has received the client's entire
-request before issuing a response. The format of the response is
-determined by the command being executed, but in all cases a flush-pkt
-indicates the end of the response.
-
-When a command has finished, and the client has received the entire
-response from the server, a client can either request that another
-command be executed or can terminate the connection. A client may
-optionally send an empty request consisting of just a flush-pkt to
-indicate that no more requests will be made.
-
-Capabilities
-------------
-
-There are two different types of capabilities: normal capabilities,
-which can be used to convey information or alter the behavior of a
-request, and commands, which are the core actions that a client wants to
-perform (fetch, push, etc).
-
-Protocol version 2 is stateless by default. This means that all commands
-must only last a single round and be stateless from the perspective of the
-server side, unless the client has requested a capability indicating that
-state should be maintained by the server. Clients MUST NOT require state
-management on the server side in order to function correctly. This
-permits simple round-robin load-balancing on the server side, without
-needing to worry about state management.
-
-agent
-~~~~~
-
-The server can advertise the `agent` capability with a value `X` (in the
-form `agent=X`) to notify the client that the server is running version
-`X`. The client may optionally send its own agent string by including
-the `agent` capability with a value `Y` (in the form `agent=Y`) in its
-request to the server (but it MUST NOT do so if the server did not
-advertise the agent capability). The `X` and `Y` strings may contain any
-printable ASCII characters except space (i.e., the byte range 32 < x <
-127), and are typically of the form "package/version" (e.g.,
-"git/1.8.3.1"). The agent strings are purely informative for statistics
-and debugging purposes, and MUST NOT be used to programmatically assume
-the presence or absence of particular features.
-
-ls-refs
-~~~~~~~
-
-`ls-refs` is the command used to request a reference advertisement in v2.
-Unlike the current reference advertisement, ls-refs takes in arguments
-which can be used to limit the refs sent from the server.
-
-Additional features not supported in the base command will be advertised
-as the value of the command in the capability advertisement in the form
-of a space separated list of features: "<command>=<feature 1> <feature 2>"
-
-ls-refs takes in the following arguments:
-
- symrefs
- In addition to the object pointed to by it, show the underlying ref
- pointed to by it when showing a symbolic ref.
- peel
- Show peeled tags.
- ref-prefix <prefix>
- When specified, only references having a prefix matching one of
- the provided prefixes are displayed.
-
-If the 'unborn' feature is advertised, the following argument can be
-included in the client's request.
-
- unborn
- The server will send information about HEAD even if it is a symref
- pointing to an unborn branch in the form "unborn HEAD
- symref-target:<target>".
-
-The output of ls-refs is as follows:
-
- output = *ref
- flush-pkt
- obj-id-or-unborn = (obj-id | "unborn")
- ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
- ref-attribute = (symref | peeled)
- symref = "symref-target:" symref-target
- peeled = "peeled:" obj-id
-
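A server-side sketch of emitting one such ref line, using declarations modeled on Git's internal pkt-line API (`pkt-line.h`); treat the exact signatures as an assumption:

----
extern void packet_write_fmt(int fd, const char *fmt, ...);  /* pkt-line.h */
extern void packet_flush(int fd);

/* Sketch: advertise one ref, optionally with its symref target. */
static void show_ref(int fd, const char *oid_hex, const char *refname,
		     const char *symref_target)
{
	if (symref_target)
		packet_write_fmt(fd, "%s %s symref-target:%s\n",
				 oid_hex, refname, symref_target);
	else
		packet_write_fmt(fd, "%s %s\n", oid_hex, refname);
}
----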
-fetch
-~~~~~
-
-`fetch` is the command used to fetch a packfile in v2. It can be looked
-at as a modified version of the v1 fetch where the ref-advertisement is
-stripped out (since the `ls-refs` command fills that role) and the
-message format is tweaked to eliminate redundancies and permit easy
-addition of future extensions.
-
-Additional features not supported in the base command will be advertised
-as the value of the command in the capability advertisement in the form
-of a space separated list of features: "<command>=<feature 1> <feature 2>"
-
-A `fetch` request can take the following arguments:
-
- want <oid>
- Indicates to the server an object which the client wants to
- retrieve. Wants can be anything and are not limited to
- advertised objects.
-
- have <oid>
- Indicates to the server an object which the client has locally.
- This allows the server to make a packfile which only contains
- the objects that the client needs. Multiple 'have' lines can be
- supplied.
-
- done
- Indicates to the server that negotiation should terminate (or
- not even begin if performing a clone) and that the server should
- use the information supplied in the request to construct the
- packfile.
-
- thin-pack
- Request that a thin pack be sent, which is a pack with deltas
- which reference base objects not contained within the pack (but
- are known to exist at the receiving end). This can reduce the
- network traffic significantly, but it requires the receiving end
- to know how to "thicken" these packs by adding the missing bases
- to the pack.
-
- no-progress
- Request that progress information that would normally be sent on
- side-band channel 2, during the packfile transfer, should not be
- sent. However, the side-band channel 3 is still used for error
- responses.
-
- include-tag
- Request that annotated tags should be sent if the objects they
- point to are being sent.
-
- ofs-delta
- Indicate that the client understands PACKv2 with delta referring
- to its base by position in pack rather than by an oid. That is,
- they can read OBJ_OFS_DELTA (aka type 6) in a packfile.
-
-If the 'shallow' feature is advertised, the following arguments can be
-included in the client's request, as well as the potential addition of the
-'shallow-info' section in the server's response as explained below.
-
- shallow <oid>
- A client must notify the server of all commits for which it only
- has shallow copies (meaning that it doesn't have the parents of
- a commit) by supplying a 'shallow <oid>' line for each such
- object so that the server is aware of the limitations of the
- client's history. This is so that the server is aware that the
- client may not have all objects reachable from such commits.
-
- deepen <depth>
- Requests that the fetch/clone should be shallow having a commit
- depth of <depth> relative to the remote side.
-
- deepen-relative
- Requests that the semantics of the "deepen" command be changed
- to indicate that the depth requested is relative to the client's
- current shallow boundary, instead of relative to the requested
- commits.
-
- deepen-since <timestamp>
- Requests that the shallow clone/fetch should be cut at a
- specific time, instead of depth. Internally it's equivalent to
- doing "git rev-list --max-age=<timestamp>". Cannot be used with
- "deepen".
-
- deepen-not <rev>
- Requests that the shallow clone/fetch should be cut at a
- specific revision specified by '<rev>', instead of a depth.
- Internally it's equivalent of doing "git rev-list --not <rev>".
- Cannot be used with "deepen", but can be used with
- "deepen-since".
-
-If the 'filter' feature is advertised, the following argument can be
-included in the client's request:
-
- filter <filter-spec>
- Request that various objects from the packfile be omitted
- using one of several filtering techniques. These are intended
- for use with partial clone and partial fetch operations. See
- `rev-list` for possible "filter-spec" values. When communicating
- with other processes, senders SHOULD translate scaled integers
- (e.g. "1k") into a fully-expanded form (e.g. "1024") to aid
- interoperability with older receivers that may not understand
- newly-invented scaling suffixes. However, receivers SHOULD
- accept the following suffixes: 'k', 'm', and 'g' for 1024,
- 1048576, and 1073741824, respectively.
-
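As an illustration, a sender could expand the scaled form before transmission along these lines (a sketch; overflow checking is omitted and `expand_scaled` is our name):

----
#include <errno.h>
#include <stdlib.h>

/* Expands "1k" -> 1024, "5m" -> 5242880, "2g" -> 2147483648, etc.
 * Returns -1 on a malformed number or an unknown suffix. */
static long long expand_scaled(const char *s)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 10);
	if (errno || end == s || v < 0)
		return -1;
	switch (*end) {
	case '\0': return v;
	case 'k': v *= 1024; break;
	case 'm': v *= 1024 * 1024; break;
	case 'g': v *= 1024LL * 1024 * 1024; break;
	default: return -1;
	}
	return end[1] ? -1 : v;
}
----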
-If the 'ref-in-want' feature is advertised, the following argument can
-be included in the client's request as well as the potential addition of
-the 'wanted-refs' section in the server's response as explained below.
-
- want-ref <ref>
- Indicates to the server that the client wants to retrieve a
- particular ref, where <ref> is the full name of a ref on the
- server.
-
-If the 'sideband-all' feature is advertised, the following argument can be
-included in the client's request:
-
- sideband-all
- Instruct the server to send the whole response multiplexed, not just
- the packfile section. All non-flush and non-delim PKT-LINE in the
- response (not only in the packfile section) will then start with a byte
- indicating its sideband (1, 2, or 3), and the server may send "0005\2"
- (a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
-
-If the 'packfile-uris' feature is advertised, the following argument
-can be included in the client's request as well as the potential
-addition of the 'packfile-uris' section in the server's response as
-explained below.
-
- packfile-uris <comma-separated list of protocols>
- Indicates to the server that the client is willing to receive
- URIs of any of the given protocols in place of objects in the
- sent packfile. Before performing the connectivity check, the
- client should download from all given URIs. Currently, the
- protocols supported are "http" and "https".
-
-If the 'wait-for-done' feature is advertised, the following argument
-can be included in the client's request.
-
- wait-for-done
- Indicates to the server that it should never send "ready", but
- should wait for the client to say "done" before sending the
- packfile.
-
-The response of `fetch` is broken into a number of sections separated by
-delimiter packets (0001), with each section beginning with its section
-header. Most sections are sent only when the packfile is sent.
-
- output = acknowledgments flush-pkt |
- [acknowledgments delim-pkt] [shallow-info delim-pkt]
- [wanted-refs delim-pkt] [packfile-uris delim-pkt]
- packfile flush-pkt
-
- acknowledgments = PKT-LINE("acknowledgments" LF)
- (nak | *ack)
- (ready)
- ready = PKT-LINE("ready" LF)
- nak = PKT-LINE("NAK" LF)
- ack = PKT-LINE("ACK" SP obj-id LF)
-
- shallow-info = PKT-LINE("shallow-info" LF)
- *PKT-LINE((shallow | unshallow) LF)
- shallow = "shallow" SP obj-id
- unshallow = "unshallow" SP obj-id
-
- wanted-refs = PKT-LINE("wanted-refs" LF)
- *PKT-LINE(wanted-ref LF)
- wanted-ref = obj-id SP refname
-
- packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
- packfile-uri = PKT-LINE(40*(HEXDIG) SP *%x20-ff LF)
-
- packfile = PKT-LINE("packfile" LF)
- *PKT-LINE(%x01-03 *%x00-ff)
-
- acknowledgments section
- * If the client determines that it is finished with negotiations by
- sending a "done" line (thus requiring the server to send a packfile),
- the acknowledgments section MUST be omitted from the server's
- response.
-
- * Always begins with the section header "acknowledgments"
-
- * The server will respond with "NAK" if none of the object ids sent
- as have lines were common.
-
- * The server will respond with "ACK obj-id" for all of the
- object ids sent as have lines which are common.
-
- * A response cannot have both "ACK" lines as well as a "NAK"
- line.
-
- * The server will respond with a "ready" line indicating that
- the server has found an acceptable common base and is ready to
- make and send a packfile (which will be found in the packfile
- section of the same response)
-
- * If the server has found a suitable cut point and has decided
- to send a "ready" line, then the server can decide to (as an
- optimization) omit any "ACK" lines it would have sent during
- its response. This is because the server will have already
- determined the objects it plans to send to the client and no
- further negotiation is needed.
-
- shallow-info section
- * If the client has requested a shallow fetch/clone, a shallow
- client requests a fetch, or the server is shallow, then the
- server's response may include a shallow-info section. The
- shallow-info section will be included if (due to one of the
- above conditions) the server needs to inform the client of any
- shallow boundaries or adjustments to the client's already
- existing shallow boundaries.
-
- * Always begins with the section header "shallow-info"
-
- * If a positive depth is requested, the server will compute the
- set of commits which are no deeper than the desired depth.
-
- * The server sends a "shallow obj-id" line for each commit whose
- parents will not be sent in the following packfile.
-
- * The server sends an "unshallow obj-id" line for each commit
- which the client has indicated is shallow, but is no longer
- shallow as a result of the fetch (due to its parents being
- sent in the following packfile).
-
- * The server MUST NOT send any "unshallow" lines for anything
- which the client has not indicated was shallow as a part of
- its request.
-
- wanted-refs section
- * This section is only included if the client has requested a
- ref using a 'want-ref' line and if a packfile section is also
- included in the response.
-
- * Always begins with the section header "wanted-refs".
-
- * The server will send a ref listing ("<oid> <refname>") for
- each reference requested using 'want-ref' lines.
-
- * The server MUST NOT send any refs which were not requested
- using 'want-ref' lines.
-
- packfile-uris section
- * This section is only included if the client sent
- 'packfile-uris' and the server has at least one such URI to
- send.
-
- * Always begins with the section header "packfile-uris".
-
- * For each URI the server sends, it sends a hash of the pack's
- contents (as output by git index-pack) followed by the URI.
-
- * The hashes are 40 hex characters long. When Git upgrades to a new
- hash algorithm, this might need to be updated. (It should match
- whatever index-pack outputs after "pack\t" or "keep\t".)
-
- packfile section
- * This section is only included if the client has sent 'want'
- lines in its request and either requested that no more
- negotiation be done by sending 'done' or if the server has
- decided it has found a sufficient cut point to produce a
- packfile.
-
- * Always begins with the section header "packfile"
-
- * The transmission of the packfile begins immediately after the
- section header
-
- * The data transfer of the packfile is always multiplexed, using
- the same semantics of the 'side-band-64k' capability from
- protocol version 1. This means that each packet, during the
- packfile data stream, is made up of a leading 4-byte pkt-line
- length (typical of the pkt-line format), followed by a 1-byte
- stream code, followed by the actual data.
-
- The stream code can be one of:
- 1 - pack data
- 2 - progress messages
- 3 - fatal error message just before stream aborts
-
-server-option
-~~~~~~~~~~~~~
-
-If advertised, indicates that any number of server-specific options can be
-included in a request. This is done by sending each option as a
-"server-option=<option>" capability line in the capability-list section of
-a request.
-
-The provided options must not contain a NUL or LF character.
-
-object-format
-~~~~~~~~~~~~~
-
-The server can advertise the `object-format` capability with a value `X` (in the
-form `object-format=X`) to notify the client that the server is able to deal
-with objects using hash algorithm X. If not specified, the server is assumed to
-only handle SHA-1. If the client would like to use a hash algorithm other than
-SHA-1, it should specify its object-format string.
-
-session-id=<session id>
-~~~~~~~~~~~~~~~~~~~~~~~
-
-The server may advertise a session ID that can be used to identify this process
-across multiple requests. The client may advertise its own session ID back to
-the server as well.
-
-Session IDs should be unique to a given process. They must fit within a
-packet-line, and must not contain non-printable or whitespace characters. The
-current implementation uses trace2 session IDs (see
-link:api-trace2.html[api-trace2] for details), but this may change and users of
-the session ID should not rely on this fact.
-
-object-info
-~~~~~~~~~~~
-
-`object-info` is the command to retrieve information about one or more objects.
-Its main purpose is to allow a client to make decisions based on this
-information without having to fully fetch objects. Object size is the only
-information that is currently supported.
-
-An `object-info` request takes the following arguments:
-
- size
- Requests size information to be returned for each listed object id.
-
- oid <oid>
- Indicates to the server an object which the client wants to obtain
- information for.
-
-The response of `object-info` is a list of the requested object ids
-and associated requested information, each separated by a single space.
-
- output = info flush-pkt
-
- info = PKT-LINE(attrs LF)
- *PKT-LINE(obj-info LF)
-
- attrs = attr | attrs SP attr
-
- attr = "size"
-
- obj-info = obj-id SP obj-size
diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt
index ceda4bb..59bea66 100644
--- a/Documentation/technical/racy-git.txt
+++ b/Documentation/technical/racy-git.txt
@@ -11,7 +11,7 @@ write out the next tree object to be committed. The state is
"virtual" in the sense that it does not necessarily have to, and
often does not, match the files in the working tree.
-There are cases Git needs to examine the differences between the
+There are cases where Git needs to examine the differences between the
virtual working tree state in the index and the files in the
working tree. The most obvious case is when the user asks `git
diff` (or its low level implementation, `git diff-files`) or
@@ -165,9 +165,9 @@ Avoiding runtime penalty
In order to avoid the above runtime penalty, post 1.4.2 Git used
to have a code that made sure the index file
-got timestamp newer than the youngest files in the index when
-there are many young files with the same timestamp as the
-resulting index file would otherwise would have by waiting
+got a timestamp newer than the youngest files in the index when
+there were many young files with the same timestamp as the
+resulting index file otherwise would have by waiting
before finishing writing the index file out.
I suspected that in practice the situation where many paths in the
@@ -190,7 +190,7 @@ In a large project where raciness avoidance cost really matters,
however, the initial computation of all object names in the
index takes more than one second, and the index file is written
out after all that happens. Therefore the timestamp of the
-index file will be more than one seconds later than the
+index file will be more than one second later than the
youngest file in the working tree. This means that in these
cases there actually will not be any racily clean entry in
the resulting index.
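In code, the racy-clean test is just a timestamp comparison. A minimal sketch (Git's real check also compares sub-second timestamps where available):

----
#include <time.h>

/*
 * An entry is "racily clean" when the file's cached mtime is not
 * strictly older than the index file itself; such entries must be
 * re-hashed rather than trusted from the cached stat data.
 */
static int is_racy_timestamp(time_t index_mtime, time_t entry_mtime)
{
	return entry_mtime >= index_mtime;
}
----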
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index d7c3b64..dd0b37c 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -46,7 +46,7 @@ search lookup, and range scans.
Storage in the file is organized into variable sized blocks. Prefix
compression is used within a single block to reduce disk space. Block
-size and alignment is tunable by the writer.
+size and alignment are tunable by the writer.
Performance
^^^^^^^^^^^
@@ -115,7 +115,7 @@ Varint encoding
Varint encoding is identical to the ofs-delta encoding method used
within pack files.
-Decoder works such as:
+Decoder works as follows:
....
val = buf[ptr] & 0x7f
@@ -175,7 +175,7 @@ log_index*
footer
....
-in a log-only file the first log block immediately follows the file
+In a log-only file, the first log block immediately follows the file
header, without padding to block alignment.
Block size
@@ -247,7 +247,7 @@ uint32( hash_id )
....
The header is identical to `version_number=1`, with the 4-byte hash ID
-("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+("sha1" for SHA1 and "s256" for SHA-256) appended to the header.
For maximum backward compatibility, it is recommended to use version 1 when
writing SHA1 reftables.
@@ -288,7 +288,7 @@ The 2-byte `restart_count` stores the number of entries in the
`restart_count` to binary search between restarts before starting a
linear scan.
-Exactly `restart_count` 3-byte `restart_offset` values precedes the
+Exactly `restart_count` 3-byte `restart_offset` values precede the
`restart_count`. Offsets are relative to the start of the block and
refer to the first byte of any `ref_record` whose name has not been
prefix compressed. Entries in the `restart_offset` list must be sorted,
@@ -443,7 +443,7 @@ Obj block format
Object blocks are optional. Writers may choose to omit object blocks,
especially if readers will not use the object name to ref mapping.
-Object blocks use unique, abbreviated 2-32 object name keys, mapping to
+Object blocks use unique, abbreviated 2-31 byte object name keys, mapping to
ref blocks containing references pointing to that object directly, or as
the peeled value of an annotated tag. Like ref blocks, object blocks use
the file's standard block size. The abbreviation length is available in
diff --git a/Documentation/technical/remembering-renames.txt b/Documentation/technical/remembering-renames.txt
index 2fd5cc8..73f4176 100644
--- a/Documentation/technical/remembering-renames.txt
+++ b/Documentation/technical/remembering-renames.txt
@@ -20,7 +20,7 @@ Outline:
3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
a rename on MERGE_SIDE1 for the next pick
- 4. A detailed description of the the counter-examples to #3.
+ 4. A detailed description of the counter-examples to #3.
5. Why the special cases in #4 are still fully reasonable to use to pair
up files for three-way content merging in the merge machinery, and why
@@ -407,7 +407,7 @@ considered to be "irrelevant". See for example the following commits:
no longer relevant", 2021-03-13)
Relevance is always determined by what the _other_ side of history has
-done, in terms of modifing a file that our side renamed, or adding a
+done, in terms of modifying a file that our side renamed, or adding a
file to a directory which our side renamed. This means that a path
that is "irrelevant" when picking the first commit of a series in a
rebase or cherry-pick, may suddenly become "relevant" when picking the
@@ -664,7 +664,7 @@ skip-irrelevant-renames optimization means we sometimes don't detect
renames for any files within a directory that was renamed, in which
case we will not have been able to detect any rename for the directory
itself. In such a case, we do not know whether the directory was
-renamed; we want to be careful to avoid cacheing some kind of "this
+renamed; we want to be careful to avoid caching some kind of "this
directory was not renamed" statement. If we did, then a subsequent
commit being rebased could add a file to the old directory, and the
user would expect it to end up in the correct directory -- something
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef3..4728142 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -82,9 +82,9 @@ When the config key `extensions.preciousObjects` is set to `true`,
objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
`git repack -d`).
-==== `partialclone`
+==== `partialClone`
-When the config key `extensions.partialclone` is set, it indicates
+When the config key `extensions.partialClone` is set, it indicates
that the repo was created with a partial clone (or later performed
a partial fetch) and that the remote may have omitted sending
certain unwanted objects. Such a remote is called a "promisor remote"
@@ -96,7 +96,13 @@ The value of this key is the name of the promisor remote.
==== `worktreeConfig`
If set, by default "git config" reads from both "config" and
-"config.worktree" file from GIT_DIR in that order. In
+"config.worktree" files from GIT_DIR in that order. In
multiple working directory mode, "config" file is shared while
"config.worktree" is per-working directory (i.e., it's in
GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. The valid values are
+`files` (loose references with a packed-refs file) and `reftable` (see
+Documentation/technical/reftable.txt).
diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index af5f9fc..580f233 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -14,9 +14,9 @@ conflicts before writing them to the rerere database.
Different conflict styles and branch names are normalized by stripping
the labels from the conflict markers, and removing the common ancestor
-version from the `diff3` conflict style. Branches that are merged
-in different order are normalized by sorting the conflict hunks. More
-on each of those steps in the following sections.
+version from the `diff3` or `zdiff3` conflict styles. Branches that
+are merged in different order are normalized by sorting the conflict
+hunks. More on each of those steps in the following sections.
Once these two normalization operations are applied, a conflict ID is
calculated based on the normalized conflict, which is later used by
@@ -42,8 +42,8 @@ get a conflict like the following:
>>>>>>> AC
Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
-and then merging branch AC2 into it), using the diff3 conflict style,
-we get a conflict like the following:
+and then merging branch AC2 into it), using the diff3 or zdiff3
+conflict style, we get a conflict like the following:
<<<<<<< HEAD
B
@@ -60,7 +60,7 @@ By resolving this conflict, to leave line D, the user declares:
what AB and AC wanted to do.
As branch AC2 refers to the same commit as AC, the above implies that
-this is also compatible what AB and AC2 wanted to do.
+this is also compatible with what AB and AC2 wanted to do.
By extension, this means that rerere should recognize that the above
conflicts are the same. To do this, the labels on the conflict
@@ -76,7 +76,7 @@ examples would both result in the following normalized conflict:
Sorting hunks
~~~~~~~~~~~~~
-As before, lets imagine that a common ancestor had a file with line A
+As before, let's imagine that a common ancestor had a file with line A
in its early part, and line X in its late part. And then four branches
are forked that do these things:
@@ -99,7 +99,7 @@ conflict to leave line D means that the user declares:
compatible with what AB and AC wanted to do.
So the conflict we would see when merging AB into ACAB should be
-resolved the same way---it is the resolution that is in line with that
+resolved the same way--it is the resolution that is in line with that
declaration.
Imagine that similarly previously a branch XYXZ was forked from XY,
@@ -145,7 +145,7 @@ Nested conflicts
Nested conflicts are handled very similarly to "simple" conflicts.
Similar to simple conflicts, the conflict is first normalized by
stripping the labels from conflict markers, stripping the common ancestor
-version, and the sorting the conflict hunks, both for the outer and the
+version, and sorting the conflict hunks, both for the outer and the
inner conflict. This is done recursively, so any number of nested
conflicts can be handled.
diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
new file mode 100644
index 0000000..921cb10
--- /dev/null
+++ b/Documentation/technical/scalar.txt
@@ -0,0 +1,66 @@
+Scalar
+======
+
+Scalar is a repository management tool that optimizes Git for use in large
+repositories. It accomplishes this by helping users to take advantage of
+advanced performance features in Git. Unlike most other Git built-in commands,
+Scalar is not executed as a subcommand of 'git'; rather, it is built as a
+separate executable containing its own series of subcommands.
+
+Background
+----------
+
+Scalar was originally designed as an add-on to Git and implemented as a .NET
+Core application. It was created based on lessons learned from the VFS for Git
+project (another application aimed at improving the experience of working with
+large repositories). As part of its initial implementation, Scalar relied on
+custom features in the Microsoft fork of Git that have since been integrated
+into core Git:
+
+* partial clone,
+* commit graphs,
+* multi-pack index,
+* sparse checkout (cone mode),
+* scheduled background maintenance,
+* etc
+
+With the requisite Git functionality in place and a desire to bring the benefits
+of Scalar to the larger Git community, the Scalar application itself was ported
+from C# to C and integrated upstream.
+
+Features
+--------
+
+Scalar comprises two major pieces of functionality: automatically
+configuring built-in Git performance features and managing repository
+enlistments.
+
+The Git performance features configured by Scalar (see "Background" for
+examples) confer substantial performance benefits to large repositories, but are
+either too experimental to enable for all of Git yet, or only benefit large
+repositories. As new features are introduced, Scalar should be updated
+accordingly to incorporate them. This will prevent the tool from becoming stale
+while also providing a path for more easily bringing features to the appropriate
+users.
+
+Enlistments are how Scalar knows which repositories on a user's system should
+utilize Scalar-configured features. This allows it to update performance
+settings when new ones are added to the tool, as well as centrally manage
+repository maintenance. The enlistment structure - a root directory with a
+`src/` subdirectory containing the cloned repository itself - is designed to
+encourage users to route build outputs outside of the repository to avoid the
+performance-limiting overhead of ignoring those files in Git.
+
+Design
+------
+
+Scalar is implemented in C and interacts with Git via a mix of child process
+invocations of Git and direct usage of `libgit.a`. Internally, it is structured
+much like other built-ins with subcommands (e.g., `git stash`), containing a
+`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
+function. Most options are unique to each subcommand, with `scalar` respecting
+some "global" `git` options (e.g., `-c` and `-C`).
+
+Because `scalar` is not invoked as a Git subcommand (like `git scalar`), it is
+built and installed as its own executable in the `bin/` directory, alongside
+`git`, `git-gui`, etc.
diff --git a/Documentation/technical/signature-format.txt b/Documentation/technical/signature-format.txt
deleted file mode 100644
index 2c9406a..0000000
--- a/Documentation/technical/signature-format.txt
+++ /dev/null
@@ -1,186 +0,0 @@
-Git signature format
-====================
-
-== Overview
-
-Git uses cryptographic signatures in various places, currently objects (tags,
-commits, mergetags) and transactions (pushes). In every case, the command which
-is about to create an object or transaction determines a payload from that,
-calls gpg to obtain a detached signature for the payload (`gpg -bsa`) and
-embeds the signature into the object or transaction.
-
-Signatures always begin with `-----BEGIN PGP SIGNATURE-----`
-and end with `-----END PGP SIGNATURE-----`, unless gpg is told to
-produce RFC1991 signatures which use `MESSAGE` instead of `SIGNATURE`.
-
-The signed payload and the way the signature is embedded depends
-on the type of the object resp. transaction.
-
-== Tag signatures
-
-- created by: `git tag -s`
-- payload: annotated tag object
-- embedding: append the signature to the unsigned tag object
-- example: tag `signedtag` with subject `signed tag`
-
-----
-object 04b871796dc0420f8e7561a895b52484b701d51a
-type commit
-tag signedtag
-tagger C O Mitter <committer@example.com> 1465981006 +0000
-
-signed tag
-
-signed tag message body
------BEGIN PGP SIGNATURE-----
-Version: GnuPG v1
-
-iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
-rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
-8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
-q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
-rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
-lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
-=jpXa
------END PGP SIGNATURE-----
-----
-
-- verify with: `git verify-tag [-v]` or `git tag -v`
-
-----
-gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
-gpg: Good signature from "Eris Discordia <discord@example.net>"
-gpg: WARNING: This key is not certified with a trusted signature!
-gpg: There is no indication that the signature belongs to the owner.
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-object 04b871796dc0420f8e7561a895b52484b701d51a
-type commit
-tag signedtag
-tagger C O Mitter <committer@example.com> 1465981006 +0000
-
-signed tag
-
-signed tag message body
-----
-
-== Commit signatures
-
-- created by: `git commit -S`
-- payload: commit object
-- embedding: header entry `gpgsig`
- (content is preceded by a space)
-- example: commit with subject `signed commit`
-
-----
-tree eebfed94e75e7760540d1485c740902590a00332
-parent 04b871796dc0420f8e7561a895b52484b701d51a
-author A U Thor <author@example.com> 1465981137 +0000
-committer C O Mitter <committer@example.com> 1465981137 +0000
-gpgsig -----BEGIN PGP SIGNATURE-----
- Version: GnuPG v1
-
- iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
- HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
- DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
- zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
- HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
- EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
- =jKHM
- -----END PGP SIGNATURE-----
-
-signed commit
-
-signed commit message body
-----
-
-- verify with: `git verify-commit [-v]` (or `git show --show-signature`)
-
-----
-gpg: Signature made Wed Jun 15 10:58:57 2016 CEST using RSA key ID B7227189
-gpg: Good signature from "Eris Discordia <discord@example.net>"
-gpg: WARNING: This key is not certified with a trusted signature!
-gpg: There is no indication that the signature belongs to the owner.
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-tree eebfed94e75e7760540d1485c740902590a00332
-parent 04b871796dc0420f8e7561a895b52484b701d51a
-author A U Thor <author@example.com> 1465981137 +0000
-committer C O Mitter <committer@example.com> 1465981137 +0000
-
-signed commit
-
-signed commit message body
-----
-
-== Mergetag signatures
-
-- created by: `git merge` on signed tag
-- payload/embedding: the whole signed tag object is embedded into
- the (merge) commit object as header entry `mergetag`
-- example: merge of the signed tag `signedtag` as above
-
-----
-tree c7b1cff039a93f3600a1d18b82d26688668c7dea
-parent c33429be94b5f2d3ee9b0adad223f877f174b05d
-parent 04b871796dc0420f8e7561a895b52484b701d51a
-author A U Thor <author@example.com> 1465982009 +0000
-committer C O Mitter <committer@example.com> 1465982009 +0000
-mergetag object 04b871796dc0420f8e7561a895b52484b701d51a
- type commit
- tag signedtag
- tagger C O Mitter <committer@example.com> 1465981006 +0000
-
- signed tag
-
- signed tag message body
- -----BEGIN PGP SIGNATURE-----
- Version: GnuPG v1
-
- iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
- rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
- 8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
- q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
- rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
- lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
- =jpXa
- -----END PGP SIGNATURE-----
-
-Merge tag 'signedtag' into downstream
-
-signed tag
-
-signed tag message body
-
-# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
-# gpg: Good signature from "Eris Discordia <discord@example.net>"
-# gpg: WARNING: This key is not certified with a trusted signature!
-# gpg: There is no indication that the signature belongs to the owner.
-# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-----
-
-- verify with: verification is embedded in merge commit message by default,
- alternatively with `git show --show-signature`:
-
-----
-commit 9863f0c76ff78712b6800e199a46aa56afbcbd49
-merged tag 'signedtag'
-gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
-gpg: Good signature from "Eris Discordia <discord@example.net>"
-gpg: WARNING: This key is not certified with a trusted signature!
-gpg: There is no indication that the signature belongs to the owner.
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-Merge: c33429b 04b8717
-Author: A U Thor <author@example.com>
-Date: Wed Jun 15 09:13:29 2016 +0000
-
- Merge tag 'signedtag' into downstream
-
- signed tag
-
- signed tag message body
-
- # gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
- # gpg: Good signature from "Eris Discordia <discord@example.net>"
- # gpg: WARNING: This key is not certified with a trusted signature!
- # gpg: There is no indication that the signature belongs to the owner.
- # Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-----
diff --git a/Documentation/technical/sparse-checkout.txt b/Documentation/technical/sparse-checkout.txt
new file mode 100644
index 0000000..fa0d01c
--- /dev/null
+++ b/Documentation/technical/sparse-checkout.txt
@@ -0,0 +1,1103 @@
+Table of contents:
+
+ * Terminology
+ * Purpose of sparse-checkouts
+ * Usecases of primary concern
+ * Oversimplified mental models ("Cliff Notes" for this document!)
+ * Desired behavior
+ * Behavior classes
+ * Subcommand-dependent defaults
+ * Sparse specification vs. sparsity patterns
+ * Implementation Questions
+ * Implementation Goals/Plans
+ * Known bugs
+ * Reference Emails
+
+
+=== Terminology ===
+
+cone mode: one of two modes for specifying the desired subset of files
+        in a sparse-checkout. In cone mode, the user specifies
+ directories (getting both everything under that directory as
+ well as everything in leading directories), while in non-cone
+ mode, the user specifies gitignore-style patterns. Controlled
+ by the --[no-]cone option to sparse-checkout init|set.
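+
+        For example (directory names are hypothetical),
+
+            $ git sparse-checkout set --cone A/B
+
+        selects everything under A/B/ recursively, plus the files
+        directly inside A/ and at the top level, but nothing else.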
+
+SKIP_WORKTREE: When tracked files do not match the sparse specification and
+ are removed from the working tree, the file in the index is marked
+ with a SKIP_WORKTREE bit. Note that if a tracked file has the
+ SKIP_WORKTREE bit set but the file is later written by the user to
+ the working tree anyway, the SKIP_WORKTREE bit will be cleared at
+ the beginning of any subsequent Git operation.
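+
+        The bit can be observed with `git ls-files -t`, which marks
+        skipped entries with "S" (file names here are hypothetical):
+
+            $ git ls-files -t
+            H in-cone-file
+            S out-of-cone-file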
+
+ Most sparse checkout users are unaware of this implementation
+ detail, and the term should generally be avoided in user-facing
+ descriptions and command flags. Unfortunately, prior to the
+ `sparse-checkout` subcommand this low-level detail was exposed,
+        and, as of the time of writing, is still exposed in various places.
+
+sparse-checkout: a subcommand in git used to reduce the files present in
+ the working tree to a subset of all tracked files. Also, the
+ name of the file in the $GIT_DIR/info directory used to track
+ the sparsity patterns corresponding to the user's desired
+ subset.
+
+sparse cone: see cone mode
+
+sparse directory: An entry in the index corresponding to a directory, which
+        is recorded in place of all the files under that directory that
+        would normally appear. See also sparse-index. Something that
+ can cause confusion is that the "sparse directory" does NOT match
+ the sparse specification, i.e. the directory is NOT present in the
+ working tree. May be renamed in the future (e.g. to "skipped
+ directory").
+
+sparse index: A special mode for sparse-checkout that also makes the
+ index sparse by recording a directory entry in lieu of all the
+ files underneath that directory (thus making that a "skipped
+ directory" which unfortunately has also been called a "sparse
+ directory"), and does this for potentially multiple
+ directories. Controlled by the --[no-]sparse-index option to
+ init|set|reapply.
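+
+        For example (directory names are hypothetical):
+
+            $ git sparse-checkout set --cone --sparse-index A/B
+            $ git ls-files --sparse
+
+        where `ls-files --sparse` lists the sparse directory entries
+        themselves rather than expanding them to the files underneath.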
+
+sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to
+ define the set of files of interest. A warning: It is easy to
+ over-use this term (or the shortened "patterns" term), for two
+ reasons: (1) users in cone mode specify directories rather than
+ patterns (their directories are transformed into patterns, but
+ users may think you are talking about non-cone mode if you use the
+        word "patterns"), and (2) the sparse specification might
+ transiently differ in the working tree or index from the sparsity
+ patterns (see "Sparse specification vs. sparsity patterns").
+
+sparse specification: The set of paths in the user's area of focus. This
+ is typically just the tracked files that match the sparsity
+ patterns, but the sparse specification can temporarily differ and
+ include additional files. (See also "Sparse specification
+ vs. sparsity patterns")
+
+ * When working with history, the sparse specification is exactly
+ the set of files matching the sparsity patterns.
+ * When interacting with the working tree, the sparse specification
+ is the set of tracked files with a clear SKIP_WORKTREE bit or
+ tracked files present in the working copy.
+ * When modifying or showing results from the index, the sparse
+ specification is the set of files with a clear SKIP_WORKTREE bit
+ or that differ in the index from HEAD.
+ * If working with the index and the working copy, the sparse
+ specification is the union of the paths from above.
+
+vivifying: When a command restores a tracked file to the working tree (and
+ hopefully also clears the SKIP_WORKTREE bit in the index for that
+ file), this is referred to as "vivifying" the file.
+
+
+=== Purpose of sparse-checkouts ===
+
+sparse-checkouts exist to allow users to work with a subset of their
+files.
+
+You can think of sparse-checkouts as subdividing "tracked" files into two
+categories -- a sparse subset, and all the rest. Internally, we
+mark "all the rest" in the index with a SKIP_WORKTREE bit and leave them
+out of the working tree. The SKIP_WORKTREE files are still tracked, just
+not present in the working tree.
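+
+A minimal workflow sketch (the URL and directory names are hypothetical):
+
+    $ git clone --sparse https://example.com/repo.git
+    $ cd repo
+    $ git sparse-checkout set --cone projects/frontend
+
+After this, the working tree contains the toplevel files, the files
+directly inside projects/, and everything under projects/frontend/; all
+other tracked files are marked SKIP_WORKTREE and omitted.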
+
+In the past, sparse-checkouts were defined by "SKIP_WORKTREE means the file
+is missing from the working tree but pretend the file contents match HEAD".
+That was not only bogus (it actually meant the file missing from the
+working tree matched the index rather than HEAD), but it was also a
+low-level detail which only provided decent behavior for a few commands.
+There were a surprising number of ways in which that guiding principle gave
+command results that violated user expectations, and as such was a bad
+mental model. However, it persisted for many years and may still be found
+in some corners of the code base.
+
+Anyway, the idea of "working with a subset of files" is simple enough, but
+there are multiple different high-level usecases which affect how some Git
+subcommands should behave. Further, even if we only considered one of
+those usecases, sparse-checkouts can modify different subcommands in over a
+half dozen different ways. Let's start by considering the high level
+usecases:
+
+ A) Users are _only_ interested in the sparse portion of the repo
+
+ A*) Users are _only_ interested in the sparse portion of the repo
+ that they have downloaded so far
+
+ B) Users want a sparse working tree, but are working in a larger whole
+
+ C) sparse-checkout is a behind-the-scenes implementation detail allowing
+ Git to work with a specially crafted in-house virtual file system;
+ users are actually working with a "full" working tree that is
+ lazily populated, and sparse-checkout helps with the lazy population
+ piece.
+
+It may be worth explaining each of these in a bit more detail:
+
+
+ (Behavior A) Users are _only_ interested in the sparse portion of the repo
+
+These folks might know there are other things in the repository, but
+don't care. They are uninterested in other parts of the repository, and
+only want to know about changes within their area of interest. Showing
+them other files from history (e.g. from diff/log/grep/etc.) is a
+usability annoyance, potentially a huge one since other changes in
+history may dwarf the changes they are interested in.
+
+Some of these users also arrive at this usecase from wanting to use partial
+clones together with sparse checkouts (in a way where they have downloaded
+blobs within the sparse specification) and do disconnected development.
+Not only do these users generally not care about other parts of the
+repository, but they consider it a blocker for Git commands to try to operate
+on those. If commands attempt to access paths in history outside the sparsity
+specification, then the partial clone will attempt to download additional
+blobs on demand, fail, and then fail the user's command. (This may be
+unavoidable in some cases, e.g. when `git merge` has non-trivial changes to
+reconcile outside the sparse specification, but we should limit how often
+users are forced to connect to the network.)
+
+Also, even for users using partial clones that do not mind being
+always connected to the network, the need to download blobs as
+side-effects of various other commands (such as the printed diffstat
+after a merge or pull) can lead to worries about local repository size
+growing unnecessarily[10].
+
+ (Behavior A*) Users are _only_ interested in the sparse portion of the repo
+ that they have downloaded so far (a variant on the first usecase)
+
+This variant is driven by folks who use partial clones together with sparse
+checkouts and do disconnected development (so far sounding like a subset of
+behavior A users), and who do so on very large repositories. The
+reason for yet another variant is that downloading even just the blobs
+through history within their sparse specification may be too much, so they
+only download some. They would still like operations to succeed without
+network connectivity, though, so things like `git log -S${SEARCH_TERM} -p`
+or `git grep ${SEARCH_TERM} OLDREV` would need to be prepared to provide
+partial results that depend on what happens to have been downloaded.
+
+This variant could be viewed as Behavior A with the sparse specification
+for history querying operations modified from "sparsity patterns" to
+"sparsity patterns limited to the blobs we have already downloaded".
+
+ (Behavior B) Users want a sparse working tree, but are working in a
+ larger whole
+
+Stolee described this usecase this way[11]:
+
+"I'm also focused on users that know that they are a part of a larger
+whole. They know they are operating on a large repository but focus on
+what they need to contribute their part. I expect multiple "roles" to
+use very different, almost disjoint parts of the codebase. Some other
+"architect" users operate across the entire tree or hop between different
+sections of the codebase as necessary. In this situation, I'm wary of
+scoping too many features to the sparse-checkout definition, especially
+"git log," as it can be too confusing to have their view of the codebase
+depend on your "point of view."
+
+People might also end up wanting behavior B due to complex inter-project
+dependencies. The initial attempts to use sparse-checkouts usually involve
+the directories you are directly interested in plus what those directories
+depend upon within your repository. But there's a monkey wrench here: if
+you have integration tests, they invert the hierarchy: to run integration
+tests, you need not only what you are interested in and its in-tree
+dependencies, you also need everything that depends upon what you are
+interested in or that depends upon one of your dependencies...AND you need
+all the in-tree dependencies of that expanded group. That can easily
+change your sparse-checkout into a nearly dense one.
+
+Naturally, that tends to kill the benefits of sparse-checkouts. There are
+a couple solutions to this conundrum: either avoid grabbing in-repo
+dependencies (maybe have built versions of your in-repo dependencies pulled
+from a CI cache somewhere), or say that users shouldn't run integration
+tests directly and instead do it on the CI server when they submit a code
+review. Or do both. Regardless of whether you stub out your in-repo
+dependencies or stub out the things that depend upon you, there is
+certainly a reason to want to query and be aware of those other stubbed-out
+parts of the repository, particularly when the dependencies are complex or
+change relatively frequently. Thus, for such uses, sparse-checkouts can be
+used to limit what you directly build and modify, but these users do not
+necessarily want their sparse checkout paths to limit their queries of
+versions in history.
+
+Some people may also be interested in behavior B over behavior A simply as
+a performance workaround: if they are using non-cone mode, then they have
+to deal with its inherent quadratic performance problems. In that mode,
+every operation that checks whether paths match the sparsity specification
+can be expensive. As such, these users may only be willing to pay for
+those expensive checks when interacting with the working copy, and may
+prefer getting "unrelated" results from their history queries over having
+slow commands.
+
+  (Behavior C) sparse-checkout is an implementation detail supporting a
+ special VFS.
+
+This usecase goes slightly against the traditional definition of
+sparse-checkout in that it actually tries to present a full or dense
+checkout to the user. However, this usecase utilizes the same technical
+underpinnings in a new way which does provide some performance
+advantages to users. The basic idea is that a company can have an in-house
+Git-aware Virtual File System which pretends all files are present in the
+working tree, by intercepting all file system accesses and using those to
+fetch and write accessed files on demand via partial clones. The VFS uses
+sparse-checkout to prevent Git from writing or paying attention to many
+files, and manually updates the sparse checkout patterns itself based on
+user access and modification of files in the working tree. See commit
+ecc7c8841d ("repo_read_index: add config to expect files outside sparse
+patterns", 2022-02-25) and the link at [17] for a more detailed description
+of such a VFS.
+
+The biggest difference here is that users are completely unaware that the
+sparse-checkout machinery is even in use. The sparse patterns are not
+specified by the user but rather are under the complete control of the VFS
+(and the patterns are updated frequently and dynamically by it). The user
+will perceive the checkout as dense, and commands should thus behave as if
+all files are present.
+
+
+=== Usecases of primary concern ===
+
+Most of the rest of this document will focus on Behavior A and Behavior
+B. Some notes about the other two cases and why we are not focusing on
+them:
+
+ (Behavior A*)
+
+Supporting this usecase is estimated to be difficult and a lot of work.
+There are no plans to implement it currently, but it may be a potential
+future alternative. Knowing about the existence of additional alternatives
+may affect our choice of command line flags (e.g. if we need tri-state or
+quad-state flags rather than just binary flags), so it was still important
+to at least note it here.
+
+Further, I believe the descriptions below for Behavior A are probably still
+valid for this usecase, with the only exception being that it redefines the
+sparse specification to restrict it to already-downloaded blobs. The hard
+part is in making commands capable of respecting that modified definition.
+
+ (Behavior C)
+
+This usecase violates some of the early sparse-checkout documented
+assumptions (since files marked as SKIP_WORKTREE will be displayed to users
+as present in the working tree). That violation may mean various
+sparse-checkout related behaviors are not well suited to this usecase and
+we may need tweaks -- to both documentation and code -- to handle it.
+However, this usecase is also perhaps the simplest model to support in that
+everything behaves like a dense checkout with a few exceptions (e.g. branch
+checkouts and switches write fewer things, knowing the VFS will lazily
+write the rest on an as-needed basis).
+
+Since there is no publicly available VFS-related code for folks to try,
+the number of folks who can test such a usecase is limited.
+
+The primary reason to note the Behavior C usecase is that as we fix things
+to better support Behaviors A and B, there may be additional places where
+we need to make tweaks allowing folks in this usecase to get the original
+non-sparse treatment. For an example, see ecc7c8841d ("repo_read_index:
+add config to expect files outside sparse patterns", 2022-02-25). The
+secondary reason to note Behavior C is so that folks taking advantage of
+Behavior C do not assume they are part of the Behavior B camp and propose
+patches that break things for the real Behavior B folks.
+
+
+=== Oversimplified mental models ===
+
+An oversimplification of the differences in the above behaviors is:
+
+ Behavior A: Restrict worktree and history operations to sparse specification
+ Behavior B: Restrict worktree operations to sparse specification; have any
+ history operations work across all files
+ Behavior C: Do not restrict either worktree or history operations to the
+ sparse specification...with the exception of branch checkouts or
+ switches which avoid writing files that will match the index so
+ they can later lazily be populated instead.
+
+
+=== Desired behavior ===
+
+As noted previously, despite the simple idea of just working with a subset
+of files, there are a range of different behavioral changes that need to be
+made to different subcommands to work well with such a feature. See
+[1,2,3,4,5,6,7,8,9,10] for various examples. In particular, at [2], we saw
+that mere composition of other commands that individually worked correctly
+in a sparse-checkout context did not imply that the higher level command
+would work correctly; it sometimes required further tweaks. So,
+understanding these differences can be beneficial.
+
+* Commands behaving the same regardless of high-level use-case
+
+ * commands that only look at files within the sparsity specification
+
+ * diff (without --cached or REVISION arguments)
+ * grep (without --cached or REVISION arguments)
+ * diff-files
+
+ * commands that restore files to the working tree that match sparsity
+ patterns, and remove unmodified files that don't match those
+ patterns:
+
+ * switch
+ * checkout (the switch-like half)
+ * read-tree
+ * reset --hard
+
+ * commands that write conflicted files to the working tree, but otherwise
+ will omit writing files to the working tree that do not match the
+ sparsity patterns:
+
+ * merge
+ * rebase
+ * cherry-pick
+ * revert
+
+ * `am` and `apply --cached` should probably be in this section but
+ are buggy (see the "Known bugs" section below)
+
+ The behavior for these commands somewhat depends upon the merge
+ strategy being used:
+ * `ort` behaves as described above
+ * `recursive` tries to not vivify files unnecessarily, but does sometimes
+ vivify files without conflicts.
+ * `octopus` and `resolve` will always vivify any file changed in the merge
+ relative to the first parent, which is rather suboptimal.
+
+ It is also important to note that these commands WILL update the index
+ outside the sparse specification relative to when the operation began,
+ BUT these commands often make a commit just before or after such that
+ by the end of the operation there is no change to the index outside the
+ sparse specification. Of course, if the operation hits conflicts or
+ does not make a commit, then these operations clearly can modify the
+ index outside the sparse specification.
+
+ Finally, it is important to note that at least the first four of these
+ commands also try to remove differences between the sparse
+ specification and the sparsity patterns (much like the commands in the
+ previous section).
+
+ * commands that always ignore sparsity since commits must be full-tree
+
+ * archive
+ * bundle
+ * commit
+ * format-patch
+ * fast-export
+ * fast-import
+ * commit-tree
+
+ * commands that write any modified file to the working tree (conflicted
+ or not, and whether those paths match sparsity patterns or not):
+
+ * stash
+ * apply (without `--index` or `--cached`)
+
+* Commands that may slightly differ for behavior A vs. behavior B:
+
+ Commands in this category behave mostly the same between the two
+ behaviors, but may differ in verbosity and types of warning and error
+ messages.
+
+ * commands that make modifications to which files are tracked:
+ * add
+ * rm
+ * mv
+ * update-index
+
+ The fact that files can move between the 'tracked' and 'untracked'
+ categories means some commands will have to treat untracked files
+ differently. But if we have to treat untracked files differently,
+ then additional commands may also need changes:
+
+ * status
+ * clean
+
+ In particular, `status` may need to report any untracked files outside
+ the sparsity specification as an erroneous condition (especially to
+ avoid the user trying to `git add` them, forcing `git add` to display
+ an error).
+
+ It's not clear to me exactly how (or even if) `clean` would change,
+ but it's the other command that also affects untracked files.
+
+ `update-index` may be slightly special. Its --[no-]skip-worktree flag
+ may need to ignore the sparse specification by its nature. Also, its
+ current --[no-]ignore-skip-worktree-entries default is totally bogus.
+
+ * commands for manually tweaking paths in both the index and the working tree
+ * `restore`
+ * the restore-like half of `checkout`
+
+ These commands should be similar to add/rm/mv in that they should
+ only operate on the sparse specification by default, and require a
+ special flag to operate on all files.
+
+ Also, note that these commands currently have a number of issues (see
+   the "Known bugs" section below).
+
+* Commands that significantly differ for behavior A vs. behavior B:
+
+ * commands that query history
+ * diff (with --cached or REVISION arguments)
+ * grep (with --cached or REVISION arguments)
+ * show (when given commit arguments)
+ * blame (only matters when one or more -C flags are passed)
+ * and annotate
+ * log
+ * whatchanged
+ * ls-files
+ * diff-index
+ * diff-tree
+ * ls-tree
+
+ Note: for log and whatchanged, revision walking logic is unaffected
+ but displaying of patches is affected by scoping the command to the
+ sparse-checkout. (The fact that revision walking is unaffected is
+ why rev-list, shortlog, show-branch, and bisect are not in this
+ list.)
+
+ ls-files may be slightly special in that e.g. `git ls-files -t` is
+ often used to see what is sparse and what is not. Perhaps -t should
+ always work on the full tree?
+
+* Commands I don't know how to classify
+
+ * range-diff
+
+ Is this like `log` or `format-patch`?
+
+ * cherry
+
+ See range-diff
+
+* Commands unaffected by sparse-checkouts
+
+ * shortlog
+ * show-branch
+ * rev-list
+ * bisect
+
+ * branch
+ * describe
+ * fetch
+ * gc
+ * init
+ * maintenance
+ * notes
+ * pull (merge & rebase have the necessary changes)
+ * push
+ * submodule
+ * tag
+
+ * config
+ * filter-branch (works in separate checkout without sparse-checkout setup)
+ * pack-refs
+ * prune
+ * remote
+ * repack
+ * replace
+
+ * bugreport
+ * count-objects
+ * fsck
+ * gitweb
+ * help
+ * instaweb
+ * merge-tree (doesn't touch worktree or index, and merges always compute full-tree)
+ * rerere
+ * verify-commit
+ * verify-tag
+
+ * commit-graph
+ * hash-object
+ * index-pack
+ * mktag
+ * mktree
+ * multi-pack-index
+ * pack-objects
+ * prune-packed
+ * symbolic-ref
+ * unpack-objects
+ * update-ref
+ * write-tree (operates on index, possibly optimized to use sparse dir entries)
+
+ * for-each-ref
+ * get-tar-commit-id
+ * ls-remote
+ * merge-base (merges are computed full tree, so merge base should be too)
+ * name-rev
+ * pack-redundant
+ * rev-parse
+ * show-index
+ * show-ref
+ * unpack-file
+ * var
+ * verify-pack
+
+ * <Everything under 'Interacting with Others' in 'git help --all'>
+ * <Everything under 'Low-level...Syncing' in 'git help --all'>
+ * <Everything under 'Low-level...Internal Helpers' in 'git help --all'>
+ * <Everything under 'External commands' in 'git help --all'>
+
+* Commands that might be affected, but who cares?
+
+ * merge-file
+ * merge-index
+ * gitk?
+
+
+=== Behavior classes ===
+
+From the above there are a few classes of behavior:
+
+ * "restrict"
+
+ Commands in this class only read or write files in the working tree
+ within the sparse specification.
+
+ When moving to a new commit (e.g. switch, reset --hard), these commands
+ may update index files outside the sparse specification as of the start
+ of the operation, but by the end of the operation those index files
+ will match HEAD again and thus those files will again be outside the
+ sparse specification.
+
+    When paths are explicitly specified, those paths are intersected with
+    the sparse specification, and the command operates only on paths in
+    that intersection (e.g. `git restore [--staged] -- '*.png'`,
+    `git reset -p -- '*.md'`).
+
+ Some of these commands may also attempt, at the end of their operation,
+ to cull transient differences between the sparse specification and the
+ sparsity patterns (see "Sparse specification vs. sparsity patterns" for
+ details, but this basically means either removing unmodified files not
+ matching the sparsity patterns and marking those files as
+ SKIP_WORKTREE, or vivifying files that match the sparsity patterns and
+ marking those files as !SKIP_WORKTREE).
+
+ * "restrict modulo conflicts"
+
+ Commands in this class generally behave like the "restrict" class,
+ except that:
+ (1) they will ignore the sparse specification and write files with
+ conflicts to the working tree (thus temporarily expanding the
+ sparse specification to include such files.)
+ (2) they are grouped with commands which move to a new commit, since
+ they often create a commit and then move to it, even though we
+ know there are many exceptions to moving to the new commit. (For
+ example, the user may rebase a commit that becomes empty, or have
+ a cherry-pick which conflicts, or a user could run `merge
+ --no-commit`, and we also view `apply --index` kind of like `am
+ --no-commit`.) As such, these commands can make changes to index
+ files outside the sparse specification, though they'll mark such
+ files with SKIP_WORKTREE.
+
+ * "restrict also specially applied to untracked files"
+
+ Commands in this class generally behave like the "restrict" class,
+ except that they have to handle untracked files differently too, often
+ because these commands are dealing with files changing state between
+ 'tracked' and 'untracked'. Often, this may mean printing an error
+ message if the command had nothing to do, but the arguments may have
+ referred to files whose tracked-ness state could have changed were it
+ not for the sparsity patterns excluding them.
+
+ * "no restrict"
+
+ Commands in this class ignore the sparse specification entirely.
+
+ * "restrict or no restrict dependent upon behavior A vs. behavior B"
+
+ Commands in this class behave like "no restrict" for folks in the
+ behavior B camp, and like "restrict" for folks in the behavior A camp.
+ However, when behaving like "restrict" a warning of some sort might be
+ provided that history queries have been limited by the sparse-checkout
+ specification.
+
+
+=== Subcommand-dependent defaults ===
+
+Note that we have different defaults, depending on the command, for the
+desired behavior:
+
+ * Commands defaulting to "restrict":
+ * diff-files
+ * diff (without --cached or REVISION arguments)
+ * grep (without --cached or REVISION arguments)
+ * switch
+ * checkout (the switch-like half)
+ * reset (<commit>)
+
+ * restore
+ * checkout (the restore-like half)
+ * checkout-index
+ * reset (with pathspec)
+
+ This behavior makes sense; these interact with the working tree.
+
+ * Commands defaulting to "restrict modulo conflicts":
+ * merge
+ * rebase
+ * cherry-pick
+ * revert
+
+ * am
+ * apply --index (which is kind of like an `am --no-commit`)
+
+ * read-tree (especially with -m or -u; is kind of like a --no-commit merge)
+ * reset (<tree-ish>, due to similarity to read-tree)
+
+    These also interact with the working tree, but require slightly
+    different behavior, either because (a) conflicts need to be resolvable,
+    or (b) they are kind of like a merge-without-commit operation.
+
+ (See also the "Known bugs" section below regarding `am` and `apply`)
+
+ * Commands defaulting to "no restrict":
+ * archive
+ * bundle
+ * commit
+ * format-patch
+ * fast-export
+ * fast-import
+ * commit-tree
+
+ * stash
+ * apply (without `--index`)
+
+ These have completely different defaults and perhaps deserve the most
+ detailed explanation:
+
+ In the case of commands in the first group (format-patch,
+ fast-export, bundle, archive, etc.), these are commands for
+ communicating history, which will be broken if they restrict to a
+ subset of the repository. As such, they operate on full paths and
+ have no `--restrict` option for overriding. Some of these commands may
+ take paths for manually restricting what is exported, but it needs to
+ be very explicit.
+
+ In the case of stash, it needs to vivify files to avoid losing the
+ user's changes.
+
+ In the case of apply without `--index`, that command needs to update
+ the working tree without the index (or the index without the working
+ tree if `--cached` is passed), and if we restrict those updates to the
+ sparse specification then we'll lose changes from the user.
+
+ * Commands defaulting to "restrict also specially applied to untracked files":
+ * add
+ * rm
+ * mv
+ * update-index
+ * status
+ * clean (?)
+
+ Our original implementation for the first three of these commands was
+ "no restrict", but it had some severe usability issues:
+   * `git add <somefile>`, if honored and outside the sparse
+ specification, can result in the file randomly disappearing later
+ when some subsequent command is run (since various commands
+ automatically clean up unmodified files outside the sparse
+ specification).
+ * `git rm '*.jpg'` could very negatively surprise users if it deletes
+ files outside the range of the user's interest.
+ * `git mv` has similar surprises when moving into or out of the cone,
+     so best to restrict by default.
+
+ So, we switched `add` and `rm` to default to "restrict", which made
+ usability problems much less severe and less frequent, but we still got
+ complaints because commands like:
+ git add <file-outside-sparse-specification>
+ git rm <file-outside-sparse-specification>
+ would silently do nothing. We should instead print an error in those
+ cases to get usability right.
+
+ update-index needs to be updated to match, and status and maybe clean
+ also need to be updated to specially handle untracked paths.
+
+ There may be a difference in here between behavior A and behavior B in
+ terms of verboseness of errors or additional warnings.
+
+ * Commands falling under "restrict or no restrict dependent upon behavior
+ A vs. behavior B"
+
+ * diff (with --cached or REVISION arguments)
+ * grep (with --cached or REVISION arguments)
+ * show (when given commit arguments)
+ * blame (only matters when one or more -C flags passed)
+ * and annotate
+ * log
+ * and variants: shortlog, gitk, show-branch, whatchanged, rev-list
+ * ls-files
+ * diff-index
+ * diff-tree
+ * ls-tree
+
+  For now, we default to behavior B for these, which implies a default of
+ "no restrict".
+
+ Note that two of these commands -- diff and grep -- also appeared in a
+ different list with a default of "restrict", but only when limited to
+ searching the working tree. The working tree vs. history distinction
+ is fundamental in how behavior B operates, so this is expected. Note,
+ though, that for diff and grep with --cached, when doing "restrict"
+ behavior, the difference between sparse specification and sparsity
+ patterns is important to handle.
+
+ "restrict" may make more sense as the long term default for these[12].
+ Also, supporting "restrict" for these commands might be a fair amount
+ of work to implement, meaning it might be implemented over multiple
+ releases. If that behavior were the default in the commands that
+ supported it, that would force behavior B users to need to learn to
+ slowly add additional flags to their commands, depending on git
+ version, to get the behavior they want. That gradual switchover would
+ be painful, so we should avoid it at least until it's fully
+ implemented.
+
+
+=== Sparse specification vs. sparsity patterns ===
+
+In a well-behaved situation, the sparse specification is given directly
+by the $GIT_DIR/info/sparse-checkout file. However, it can transiently
+diverge for a few reasons:
+
+ * needing to resolve conflicts (merging will vivify conflicted files)
+ * running Git commands that implicitly vivify files (e.g. "git stash apply")
+ * running Git commands that explicitly vivify files (e.g. "git checkout
+ --ignore-skip-worktree-bits FILENAME")
+  * other commands that write to these files (perhaps a user copies them
+    from elsewhere)
+
+For the last item, note that we do automatically clear the SKIP_WORKTREE
+bit for files that are present in the working tree. This has been true
+since 82386b4496 ("Merge branch 'en/present-despite-skipped'",
+2022-03-09).
+
+However, such a situation is transient because:
+
+ * Such transient differences can and will be automatically removed as
+ a side-effect of commands which call unpack_trees() (checkout,
+ merge, reset, etc.).
+ * Users can also request such transient differences be corrected via
+ running `git sparse-checkout reapply`. Various places recommend
+ running that command.
+ * Additional commands are also welcome to implicitly fix these
+ differences; we may add more in the future.
+
+While we avoid dropping unstaged changes or files which have conflicts,
+we otherwise aggressively try to fix these transient differences. If
+users want these differences to persist, they should run the `set` or
+`add` subcommands of `git sparse-checkout` to reflect their intended
+sparse specification.
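+
+As a concrete sketch (the path name is hypothetical):
+
+    $ git checkout --ignore-skip-worktree-bits -- outside/file  # vivify
+    $ git sparse-checkout reapply     # re-sparsify it if unmodified, or...
+    $ git sparse-checkout add outside # ...persist it in the patterns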
+
+However, when we need to do a query on history restricted to the
+"relevant subset of files" such a transiently expanded sparse
+specification is ignored. There are a couple reasons for this:
+
+ * The behavior wanted when doing something like
+ git grep expression REVISION
+ is roughly what the users would expect from
+ git checkout REVISION && git grep expression
+ (modulo a "REVISION:" prefix), which has a couple ramifications:
+
+ * REVISION may have paths not in the current index, so there is no
+ path we can consult for a SKIP_WORKTREE setting for those paths.
+
+ * Since `checkout` is one of those commands that tries to remove
+ transient differences in the sparse specification, it makes sense
+ to use the corrected sparse specification
+ (i.e. $GIT_DIR/info/sparse-checkout) rather than attempting to
+ consult SKIP_WORKTREE anyway.
+
+So, a transiently expanded (or restricted) sparse specification applies to
+the working tree, but not to history queries where we always use the
+sparsity patterns. (See [16] for an early discussion of this.)
+
+Similar to a transiently expanded sparse specification of the working tree
+based on additional files being present in the working tree, we also need
+to consider additional files being modified in the index. In particular,
+if the user has staged changes to files (relative to HEAD) that do not
+match the sparsity patterns, and the file is not present in the working
+tree, we still want to consider the file part of the sparse specification
+if we are specifically performing a query related to the index (e.g. git
+diff --cached [REVISION], git diff-index [REVISION], git restore --staged
+--source=REVISION -- PATHS, etc.) Note that a transiently expanded sparse
+specification for the index usually only matters under behavior A, since
+under behavior B index operations are lumped with history and tend to
+operate full-tree.
+
+
+=== Implementation Questions ===
+
+ * Do the options --scope={sparse,all} sound good to others? Are there better
+ options?
+ * Names in use, or appearing in patches, or previously suggested:
+ * --sparse/--dense
+ * --ignore-skip-worktree-bits
+ * --ignore-skip-worktree-entries
+ * --ignore-sparsity
+ * --[no-]restrict-to-sparse-paths
+ * --full-tree/--sparse-tree
+ * --[no-]restrict
+ * --scope={sparse,all}
+ * --focus/--unfocus
+ * --limit/--unlimited
+ * Rationale making me lean slightly towards --scope={sparse,all}:
+ * We want a name that works for many commands, so we need a name that
+ does not conflict
+ * We know that we have more than two possible usecases, so it is best
+ to avoid a flag that appears to be binary.
+ * --scope={sparse,all} isn't overly long and seems relatively
+ explanatory
+ * `--sparse`, as used in add/rm/mv, is totally backwards for
+ grep/log/etc. Changing the meaning of `--sparse` for these
+ commands would fix the backwardness, but possibly break existing
+ scripts. Using a new name pairing would allow us to treat
+ `--sparse` in these commands as a deprecated alias.
+ * There is a different `--sparse`/`--dense` pair for commands using
+ revision machinery, so using that naming might cause confusion
+ * There is also a `--sparse` in both pack-objects and show-branch, which
+ don't conflict but do suggest that `--sparse` is overloaded
+ * The name --ignore-skip-worktree-bits is a double negative, is
+ quite a mouthful, refers to an implementation detail that many
+ users may not be familiar with, and we'd need a negation for it
+ which would probably be even more ridiculously long. (But we
+ can make --ignore-skip-worktree-bits a deprecated alias for
+ --no-restrict.)
+
+ * If a config option is added (sparse.scope?) what should the values and
+ description be? "sparse" (behavior A), "worktree-sparse-history-dense"
+ (behavior B), "dense" (behavior C)? There's a risk of confusion,
+ because even for Behaviors A and B we want some commands to be
+ full-tree and others to operate sparsely, so the wording may need to be
+ more tied to the usecases and somehow explain that. Also, right now,
+   the primary difference we are focusing on is just the history-querying
+ commands (log/diff/grep). Previous config suggestion here: [13]
+
+ * Is `--no-expand` a good alias for ls-files's `--sparse` option?
+ (`--sparse` does not map to either `--scope=sparse` or `--scope=all`,
+ because in non-cone mode it does nothing and in cone-mode it shows the
+ sparse directory entries which are technically outside the sparse
+ specification)
+
+ * Under Behavior A:
+ * Does ls-files' `--no-expand` override the default `--scope=all`, or
+ does it need an extra flag?
+ * Does ls-files' `-t` option imply `--scope=all`?
+ * Does update-index's `--[no-]skip-worktree` option imply `--scope=all`?
+
+ * sparse-checkout: once behavior A is fully implemented, should we take
+ an interim measure to ease people into switching the default? Namely,
+ if folks are not already in a sparse checkout, then require
+ `sparse-checkout init/set` to take a
+ `--set-scope=(sparse|worktree-sparse-history-dense|dense)` flag (which
+ would set sparse.scope according to the setting given), and throw an
+ error if the flag is not provided? That error would be a great place
+ to warn folks that the default may change in the future, and get them
+ used to specifying what they want so that the eventual default switch
+ is seamless for them.
+
+
+=== Implementation Goals/Plans ===
+
+ * Get buy-in on this document in general.
+
+ * Figure out answers to the 'Implementation Questions' sections (above)
+
+ * Fix bugs in the 'Known bugs' section (below)
+
+ * Provide some kind of method for backfilling the blobs within the sparse
+ specification in a partial clone
+
+ [Below here is kind of spitballing since the first two haven't been resolved]
+
+ * update-index: flip the default to --no-ignore-skip-worktree-entries,
+ nuke this stupid "Oh, there's a bug? Let me add a flag to let users
+ request that they not trigger this bug." flag
+
+ * Flags & Config
+ * Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all`
+ * Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore
+     a deprecated alias for `--scope=all`
+ * Create config option (sparse.scope?), tie it to the "Cliff notes"
+ overview
+
+ * Add --scope=sparse (and --scope=all) flag to each of the history querying
+ commands. IMPORTANT: make sure diff machinery changes don't mess with
+ format-patch, fast-export, etc.
+
+=== Known bugs ===
+
+This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've
+been working on it.
+
+0. Behavior A is not well supported in Git. (Behavior B didn't use to
+ be either, but was the easier of the two to implement.)
+
+1. am and apply:
+
+ apply, without `--index` or `--cached`, relies on files being present
+ in the working copy, and also writes to them unconditionally. As
+ such, it should first check for the files' presence, and if found to
+ be SKIP_WORKTREE, then clear the bit and vivify the paths, then do
+ its work. Currently, it just throws an error.
+
+ apply, with either `--cached` or `--index`, will not preserve the
+ SKIP_WORKTREE bit. This is fine if the file has conflicts, but
+ otherwise SKIP_WORKTREE bits should be preserved for --cached and
+ probably also for --index.
+
+ am, if there are no conflicts, will vivify files and fail to preserve
+ the SKIP_WORKTREE bit. If there are conflicts and `-3` is not
+ specified, it will vivify files and then complain the patch doesn't
+ apply. If there are conflicts and `-3` is specified, it will vivify
+ files and then complain that those vivified files would be
+ overwritten by merge.
+
+2. reset --hard:
+
+   reset --hard provides a confusing error message (works correctly, but
+ misleads the user into believing it didn't):
+
+ $ touch addme
+ $ git add addme
+ $ git ls-files -t
+ H addme
+ H tracked
+ S tracked-but-maybe-skipped
+ $ git reset --hard # usually works great
+ error: Path 'addme' not uptodate; will not remove from working tree.
+ HEAD is now at bdbbb6f third
+ $ git ls-files -t
+ H tracked
+ S tracked-but-maybe-skipped
+ $ ls -1
+ tracked
+
+ `git reset --hard` DID remove addme from the index and the working tree, contrary
+ to the error message, but in line with how reset --hard should behave.
+
+3. read-tree
+
+ `read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the
+ entries it reads into the index, resulting in all your files suddenly
+ appearing to be "deleted".
+
+4. Checkout, restore:
+
+   These commands do not handle path & revision arguments appropriately:
+
+ $ ls
+ tracked
+ $ git ls-files -t
+ H tracked
+ S tracked-but-maybe-skipped
+ $ git status --porcelain
+ $ git checkout -- '*skipped'
+ error: pathspec '*skipped' did not match any file(s) known to git
+ $ git ls-files -- '*skipped'
+ tracked-but-maybe-skipped
+ $ git checkout HEAD -- '*skipped'
+ error: pathspec '*skipped' did not match any file(s) known to git
+ $ git ls-tree HEAD | grep skipped
+ 100644 blob 276f5a64354b791b13840f02047738c77ad0584f tracked-but-maybe-skipped
+ $ git status --porcelain
+ $ git checkout HEAD~1 -- '*skipped'
+ $ git ls-files -t
+ H tracked
+ H tracked-but-maybe-skipped
+ $ git status --porcelain
+ M tracked-but-maybe-skipped
+ $ git checkout HEAD -- '*skipped'
+ $ git status --porcelain
+ $
+
+ Note that checkout without a revision (or restore --staged) fails to
+ find a file to restore from the index, even though ls-files shows
+ such a file certainly exists.
+
+ Similar issues occur with HEAD (--source=HEAD in restore's case),
+ but suddenly works when HEAD~1 is specified. And then after that it
+ will work with HEAD specified, even though it didn't before.
+
+ Directories are also an issue:
+
+ $ git sparse-checkout set nomatches
+ $ git status
+ On branch main
+ You are in a sparse checkout with 0% of tracked files present.
+
+ nothing to commit, working tree clean
+ $ git checkout .
+ error: pathspec '.' did not match any file(s) known to git
+ $ git checkout HEAD~1 .
+ Updated 1 path from 58916d9
+ $ git ls-files -t
+ S tracked
+ H tracked-but-maybe-skipped
+
+5. checkout and restore --staged, continued:
+
+ These commands do not correctly scope operations to the sparse
+ specification, and make it worse by not setting important SKIP_WORKTREE
+ bits:
+
+ $ git restore --source OLDREV --staged outside-sparse-cone/
+ $ git status --porcelain
+ MD outside-sparse-cone/file1
+ MD outside-sparse-cone/file2
+ MD outside-sparse-cone/file3
+
+ We can add a --scope=all mode to `git restore` to let it operate outside
+ the sparse specification, but then it will be important to set the
+ SKIP_WORKTREE bits appropriately.
+
+6. Performance issues; see:
+ https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/
+
+
+=== Reference Emails ===
+
+Emails that detail various bugs we've had in sparse-checkout:
+
+[1] (Original descriptions of behavior A & behavior B)
+ https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
+[2] (Fix stash applications in sparse checkouts; bugs from behavioral differences)
+ https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/
+[3] (Present-despite-skipped entries)
+ https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
+[4] (Clone --no-checkout interaction)
+   https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/
+[5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`)
+ https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/
+[6] (SKIP_WORKTREE is advisory, not mandatory)
+ https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/
+[7] (`worktree add` should copy sparsity settings from current worktree)
+ https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/
+[8] (Avoid negative surprises in add, rm, and mv)
+ https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/
+ https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/
+[9] (Move from out-of-cone to in-cone)
+ https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/
+ https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/
+[10] (Unnecessarily downloading objects outside sparse specification)
+ https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/
+
+[11] (Stolee's comments on high-level usecases)
+ https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/
+
+[12] Others commenting on eventually switching default to behavior A:
+ * https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/
+ * https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/
+ * https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/
+
+[13] Previous config name suggestion and description
+ * https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/
+
+[14] Tangential issue: switch to cone mode as default sparse specification mechanism:
+ https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/
+
+[15] Lengthy email on grep behavior, covering what should be searched:
+ * https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/
+
+[16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations,
+ search for the parenthetical comment starting "We do not check".
+ https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/
+
+[17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
diff --git a/Documentation/technical/unit-tests.txt b/Documentation/technical/unit-tests.txt
new file mode 100644
index 0000000..206037f
--- /dev/null
+++ b/Documentation/technical/unit-tests.txt
@@ -0,0 +1,240 @@
+= Unit Testing
+
+In our current testing environment, we spend a significant amount of effort
+crafting end-to-end tests for error conditions that could easily be captured by
+unit tests (or we simply forgo some hard-to-setup and rare error conditions).
+Unit tests additionally provide stability to the codebase and can simplify
+debugging through isolation. Writing unit tests in pure C, rather than with our
+current shell/test-tool helper setup, simplifies test setup, simplifies passing
+data around (no shell-isms required), and reduces testing runtime by not
+spawning a separate process for every test invocation.
+
+We believe that a large body of unit tests, living alongside the existing test
+suite, will improve code quality for the Git project.
+
+== Definitions
+
+For the purposes of this document, we'll use *test framework* to refer to
+projects that support writing test cases and running tests within the context
+of a single executable. *Test harness* will refer to projects that manage
+running multiple executables (each of which may contain multiple test cases) and
+aggregating their results.
+
+In reality, these terms are not strictly defined, and many of the projects
+discussed below contain features from both categories.
+
+For now, we will evaluate projects solely on their framework features. Since we
+are relying on having TAP output (see below), we can assume that any framework
+can be made to work with a harness that we can choose later.
+
+
+== Summary
+
+We believe the best way forward is to implement a custom TAP framework for the
+Git project. We use a version of the framework originally proposed in
+https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[1].
+
+See the <<framework-selection,Framework Selection>> section below for the
+rationale behind this decision.
+
+
+== Choosing a test harness
+
+During upstream discussion, it was occasionally noted that `prove` provides many
+convenient features, such as scheduling slower tests first, or re-running
+previously failed tests.
+
+While we already support the use of `prove` as a test harness for the shell
+tests, it is not strictly required. The t/Makefile allows running shell tests
+directly (though with interleaved output if parallelism is enabled). Git
+developers who wish to use `prove` as a more advanced harness can do so by
+setting DEFAULT_TEST_TARGET=prove in their config.mak.
+
+We will follow a similar approach for unit tests: by default the test
+executables will be run directly from the t/Makefile, but `prove` can be
+configured with DEFAULT_UNIT_TEST_TARGET=prove.
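+
+For example, a developer who wants `prove` for both the shell and unit
+tests, with parallelism and slowest-first scheduling, might put something
+like this in their config.mak (the `GIT_PROVE_OPTS` value here is only a
+plausible example):
+
+----
+DEFAULT_TEST_TARGET = prove
+DEFAULT_UNIT_TEST_TARGET = prove
+GIT_PROVE_OPTS = --jobs 8 --state=slow,save
+----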
+
+
+[[framework-selection]]
+== Framework selection
+
+There are a variety of features we can use to rank the candidate frameworks, and
+those features have different priorities:
+
+* Critical features: we probably won't consider a framework without these
+** Can we legally / easily use the project?
+*** <<license,License>>
+*** <<vendorable-or-ubiquitous,Vendorable or ubiquitous>>
+*** <<maintainable-extensible,Maintainable / extensible>>
+*** <<major-platform-support,Major platform support>>
+** Does the project support our bare-minimum needs?
+*** <<tap-support,TAP support>>
+*** <<diagnostic-output,Diagnostic output>>
+*** <<runtime-skippable-tests,Runtime-skippable tests>>
+* Nice-to-have features:
+** <<parallel-execution,Parallel execution>>
+** <<mock-support,Mock support>>
+** <<signal-error-handling,Signal & error handling>>
+* Tie-breaker stats
+** <<project-kloc,Project KLOC>>
+** <<adoption,Adoption>>
+
+[[license]]
+=== License
+
+We must be able to legally use the framework in connection with Git. As Git is
+licensed only under GPLv2, we must eliminate any LGPLv3, GPLv3, or Apache 2.0
+projects.
+
+[[vendorable-or-ubiquitous]]
+=== Vendorable or ubiquitous
+
+We want to avoid forcing Git developers to install new tools just to run unit
+tests. Any prospective frameworks and harnesses must either be vendorable
+(meaning, we can copy their source directly into Git's repository), or so
+ubiquitous that it is reasonable to expect that most developers will have the
+tools installed already.
+
+[[maintainable-extensible]]
+=== Maintainable / extensible
+
+It is unlikely that any pre-existing project perfectly fits our needs, so any
+project we select will need to be actively maintained and open to accepting
+changes. Alternatively, assuming we are vendoring the source into our repo, it
+must be simple enough that Git developers can feel comfortable making changes as
+needed to our version.
+
+In the comparison table below, "True" means that the framework seems to have
+active developers, that it is simple enough that Git developers can make changes
+to it, and that the project seems open to accepting external contributions (or
+that it is vendorable). "Partial" means that at least one of the above
+conditions holds.
+
+[[major-platform-support]]
+=== Major platform support
+
+At a bare minimum, unit-testing must work on Linux, macOS, and Windows.
+
+In the comparison table below, "True" means that it works on all three major
+platforms with no issues. "Partial" means that there may be annoyances on one or
+more platforms, but it is still usable in principle.
+
+[[tap-support]]
+=== TAP support
+
+The https://testanything.org/[Test Anything Protocol] is a text-based interface
+that allows tests to communicate with a test harness. It is already used by
+Git's integration test suite. Supporting TAP output is a mandatory feature for
+any prospective test framework.
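+
+For illustration, here is a small TAP stream from a hypothetical three-case
+test executable (the file name and line number are made up). The plan line
+(`1..3`) and the `ok`/`not ok` lines are the core of the protocol; lines
+starting with `#` carry diagnostics, and the `# SKIP` directive marks a
+case skipped at runtime (both are discussed in the sections below):
+
+----
+ok 1 - strbuf initialization
+not ok 2 - strbuf growth
+# check "sb.alloc >= 16" failed at t/unit-tests/t-strbuf.c:42
+ok 3 - unicode width # SKIP no UTF-8 locale available
+1..3
+----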
+
+In the comparison table below, "True" means this is natively supported.
+"Partial" means TAP output must be generated by post-processing the native
+output.
+
+Frameworks that do not have at least Partial support will not be evaluated
+further.
+
+[[diagnostic-output]]
+=== Diagnostic output
+
+When a test case fails, the framework must generate enough diagnostic output to
+help developers find the appropriate test case in source code in order to debug
+the failure.
+
+[[runtime-skippable-tests]]
+=== Runtime-skippable tests
+
+Test authors may wish to skip certain test cases based on runtime circumstances,
+so the framework should support this.
+
+[[parallel-execution]]
+=== Parallel execution
+
+Ideally, we will build up a significant collection of unit test cases, most
+likely split across multiple executables. It will be necessary to run these
+tests in parallel to enable fast develop-test-debug cycles.
+
+In the comparison table below, "True" means that individual test cases within a
+single test executable can be run in parallel. We assume that executable-level
+parallelism can be handled by the test harness.
+
+[[mock-support]]
+=== Mock support
+
+Unit test authors may wish to test code that interacts with objects that are
+inconvenient to handle directly in a test (e.g. a network service). Mocking
+allows test authors to substitute a fake implementation for such objects,
+making the code easier to test; a sketch of the idea follows below.
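+
+A common way to achieve this in C is to pass the collaborator in as a
+function pointer (or a struct of function pointers) so that a test can
+inject a fake. The following self-contained sketch is illustrative only;
+none of these names exist in Git:
+
+[source,c]
+----
+#include <stdio.h>
+#include <string.h>
+
+/* The code under test asks its collaborator for a ref, without knowing
+ * whether the answer comes from the network or from a fake. */
+typedef int (*fetch_ref_fn)(const char *refname, char *out, size_t len);
+
+static int resolve_remote_head(fetch_ref_fn fetch, char *out, size_t len)
+{
+	return fetch("HEAD", out, len);
+}
+
+/* A fake that answers from memory instead of contacting a server. */
+static int fake_fetch_ref(const char *refname, char *out, size_t len)
+{
+	if (strcmp(refname, "HEAD"))
+		return -1;
+	snprintf(out, len, "refs/heads/main");
+	return 0;
+}
+
+int main(void)
+{
+	char buf[64];
+	int ok = !resolve_remote_head(fake_fetch_ref, buf, sizeof(buf)) &&
+		 !strcmp(buf, "refs/heads/main");
+
+	printf("%s 1 - resolves HEAD via injected fake\n", ok ? "ok" : "not ok");
+	printf("1..1\n");
+	return !ok;
+}
+----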
+
+[[signal-error-handling]]
+=== Signal & error handling
+
+The test framework should fail gracefully when test cases are themselves buggy
+or when they are interrupted by signals during runtime.
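+
+As a rough illustration of what "failing gracefully" can mean, a runner can
+trap a fatal signal raised by a buggy test case and report it as a TAP
+failure instead of letting the whole executable die. This sketch glosses
+over the real-world caveats (async-signal safety, restoring handlers,
+cleaning up state); frameworks such as Check instead isolate cases in
+forked child processes:
+
+[source,c]
+----
+#include <setjmp.h>
+#include <signal.h>
+#include <stdio.h>
+
+static sigjmp_buf fatal_jmp;
+
+static void on_fatal_signal(int sig)
+{
+	siglongjmp(fatal_jmp, sig);
+}
+
+/* Run one case; a crash becomes a "not ok" line, not a dead runner. */
+static void run_case(void (*fn)(void), const char *desc, int num)
+{
+	signal(SIGSEGV, on_fatal_signal);
+	if (sigsetjmp(fatal_jmp, 1))
+		printf("not ok %d - %s # caught fatal signal\n", num, desc);
+	else {
+		fn();
+		printf("ok %d - %s\n", num, desc);
+	}
+	signal(SIGSEGV, SIG_DFL);
+}
+
+static void buggy_case(void)
+{
+	*(volatile int *)0 = 1; /* deliberate NULL-pointer write */
+}
+
+int main(void)
+{
+	run_case(buggy_case, "survives a crashing case", 1);
+	printf("1..1\n");
+	return 0;
+}
+----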
+
+[[project-kloc]]
+=== Project KLOC
+
+The size of the project, in thousands of lines of code as measured by
+https://dwheeler.com/sloccount/[sloccount] (the raw line count rounded up to
+the next multiple of 1,000). As a tie-breaker, we prefer a project with
+fewer LOC.
+
+[[adoption]]
+=== Adoption
+
+As a tie-breaker, we prefer a more widely-used project. We use the number of
+GitHub / GitLab stars to estimate this.
+
+
+=== Comparison
+
+:true: [lime-background]#True#
+:false: [red-background]#False#
+:partial: [yellow-background]#Partial#
+
+:gpl: [lime-background]#GPL v2#
+:isc: [lime-background]#ISC#
+:mit: [lime-background]#MIT#
+:expat: [lime-background]#Expat#
+:lgpl: [lime-background]#LGPL v2.1#
+
+:custom-impl: https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[Custom Git impl.]
+:greatest: https://github.com/silentbicycle/greatest[Greatest]
+:criterion: https://github.com/Snaipe/Criterion[Criterion]
+:c-tap: https://github.com/rra/c-tap-harness/[C TAP]
+:check: https://libcheck.github.io/check/[Check]
+
+[format="csv",options="header",width="33%",subs="specialcharacters,attributes,quotes,macros"]
+|=====
+Framework,"<<license,License>>","<<vendorable-or-ubiquitous,Vendorable or ubiquitous>>","<<maintainable-extensible,Maintainable / extensible>>","<<major-platform-support,Major platform support>>","<<tap-support,TAP support>>","<<diagnostic-output,Diagnostic output>>","<<runtime--skippable-tests,Runtime- skippable tests>>","<<parallel-execution,Parallel execution>>","<<mock-support,Mock support>>","<<signal-error-handling,Signal & error handling>>","<<project-kloc,Project KLOC>>","<<adoption,Adoption>>"
+{custom-impl},{gpl},{true},{true},{true},{true},{true},{true},{false},{false},{false},1,0
+{greatest},{isc},{true},{partial},{true},{partial},{true},{true},{false},{false},{false},3,1400
+{criterion},{mit},{false},{partial},{true},{true},{true},{true},{true},{false},{true},19,1800
+{c-tap},{expat},{true},{partial},{partial},{true},{false},{true},{false},{false},{false},4,33
+{check},{lgpl},{false},{partial},{true},{true},{true},{false},{false},{false},{true},17,973
+|=====
+
+=== Additional framework candidates
+
+Several suggested frameworks have been eliminated from consideration:
+
+* Incompatible licenses:
+** https://github.com/zorgnax/libtap[libtap] (LGPL v3)
+** https://cmocka.org/[cmocka] (Apache 2.0)
+* Missing source: https://www.kindahl.net/mytap/doc/index.html[MyTap]
+* No TAP support:
+** https://nemequ.github.io/munit/[µnit]
+** https://github.com/google/cmockery[cmockery]
+** https://github.com/lpabon/cmockery2[cmockery2]
+** https://github.com/ThrowTheSwitch/Unity[Unity]
+** https://github.com/siu/minunit[minunit]
+** https://cunit.sourceforge.net/[CUnit]
+
+
+== Milestones
+
+* Add useful tests of library-like code
+* Integrate with
+ https://lore.kernel.org/git/20230502211454.1673000-1-calvinwan@google.com/[stdlib
+ work]
+* Run alongside regular `make test` target