summaryrefslogtreecommitdiff
path: root/Documentation/technical
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical')
-rw-r--r--Documentation/technical/api-error-handling.txt24
-rw-r--r--Documentation/technical/api-trace2.txt48
-rw-r--r--Documentation/technical/bitmap-format.txt203
-rw-r--r--Documentation/technical/cruft-packs.txt123
-rw-r--r--Documentation/technical/index-format.txt2
-rw-r--r--Documentation/technical/pack-format.txt19
-rw-r--r--Documentation/technical/partial-clone.txt3
-rw-r--r--Documentation/technical/scalar.txt127
8 files changed, 426 insertions, 123 deletions
diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 8be4f4d..70bf1d3 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,12 +1,34 @@
Error reporting in git
======================
-`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+`BUG`, `bug`, `die`, `usage`, `error`, and `warning` report errors of
various kinds.
- `BUG` is for failed internal assertions that should never happen,
i.e. a bug in git itself.
+- `bug` (lower-case, not `BUG`) is supposed to be used like `BUG` but
+ prints a "BUG" message instead of calling `abort()`.
++
+A call to `bug()` will then result in a "real" call to the `BUG()`
+function, either explicitly by invoking `BUG_if_bug()` after call(s)
+to `bug()`, or implicitly at `exit()` time where we'll check if we
+encountered any outstanding `bug()` invocations.
++
+If there were no prior calls to `bug()` before invoking `BUG_if_bug()`
+the latter is a NOOP. The `BUG_if_bug()` function takes the same
+arguments as `BUG()` itself. Calling `BUG_if_bug()` explicitly isn't
+necessary, but ensures that we die as soon as possible.
++
+If you know you had prior calls to `bug()` then calling `BUG()` itself
+is equivalent to calling `BUG_if_bug()`, the latter being a wrapper
+calling `BUG()` if we've set a flag indicating that we've called
+`bug()`.
++
+This is for the convenience of APIs who'd like to potentially report
+more than one "bug", such as the optbug() validation in
+parse-options.c.
+
- `die` is for fatal application errors. It prints a message to
the user and exits with status 128.
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index bb13ca3..77a150b 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -5,7 +5,7 @@ information to stderr or a file. The Trace2 feature is inactive unless
explicitly enabled by enabling one or more Trace2 Targets.
The Trace2 API is intended to replace the existing (Trace1)
-printf-style tracing provided by the existing `GIT_TRACE` and
+`printf()`-style tracing provided by the existing `GIT_TRACE` and
`GIT_TRACE_PERFORMANCE` facilities. During initial implementation,
Trace2 and Trace1 may operate in parallel.
@@ -24,8 +24,8 @@ for example.
Trace2 is controlled using `trace2.*` config values in the system and
global config files and `GIT_TRACE2*` environment variables. Trace2 does
-not read from repo local or worktree config files or respect `-c`
-command line config settings.
+not read from repo local or worktree config files, nor does it respect
+`-c` command line config settings.
== Trace2 Targets
@@ -34,8 +34,8 @@ Format details are given in a later section.
=== The Normal Format Target
-The normal format target is a tradition printf format and similar
-to GIT_TRACE format. This format is enabled with the `GIT_TRACE2`
+The normal format target is a traditional `printf()` format and similar
+to the `GIT_TRACE` format. This format is enabled with the `GIT_TRACE2`
environment variable or the `trace2.normalTarget` system or global
config setting.
@@ -69,8 +69,8 @@ $ cat ~/log.normal
=== The Performance Format Target
The performance format target (PERF) is a column-based format to
-replace GIT_TRACE_PERFORMANCE and is suitable for development and
-testing, possibly to complement tools like gprof. This format is
+replace `GIT_TRACE_PERFORMANCE` and is suitable for development and
+testing, possibly to complement tools like `gprof`. This format is
enabled with the `GIT_TRACE2_PERF` environment variable or the
`trace2.perfTarget` system or global config setting.
@@ -128,7 +128,7 @@ yields
------------
$ cat ~/log.event
-{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"3","exe":"2.20.1.155.g426c96fcdb"}
+{"event":"version","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"3","exe":"2.20.1.155.g426c96fcdb"}
{"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]}
{"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"}
{"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0}
@@ -170,9 +170,9 @@ Some functions have a `_va_fl()` suffix to indicate that they also
take a `va_list` argument.
Some functions have a `_printf_fl()` suffix to indicate that they also
-take a varargs argument.
+take a `printf()` style format with a variable number of arguments.
-There are CPP wrapper macros and ifdefs to hide most of these details.
+There are CPP wrapper macros and `#ifdef`s to hide most of these details.
See `trace2.h` for more details. The following discussion will only
describe the simplified forms.
@@ -234,7 +234,7 @@ Events are written as lines of the form:
is the event name.
`<event-message>`::
- is a free-form printf message intended for human consumption.
+ is a free-form `printf()` message intended for human consumption.
+
Note that this may contain embedded LF or CRLF characters that are
not escaped, so the event may spill across multiple lines.
@@ -300,7 +300,7 @@ This field is in anticipation of in-proc submodules in the future.
indicate a broad category, such as "index" or "status".
`<perf-event-message>`::
- is a free-form printf message intended for human consumption.
+ is a free-form `printf()` message intended for human consumption.
------------
15:33:33.532712 wt-status.c:2310 | d0 | main | region_enter | r1 | 0.126064 | | status | label:print
@@ -465,8 +465,8 @@ completed.)
------------
`"error"`::
- This event is emitted when one of the `BUG()`, `error()`, `die()`,
- `warning()`, or `usage()` functions are called.
+ This event is emitted when one of the `BUG()`, `bug()`, `error()`,
+ `die()`, `warning()`, or `usage()` functions are called.
+
------------
{
@@ -533,7 +533,7 @@ these special values are used:
------------
`"cmd_mode"`::
- This event, when present, describes the command variant This
+ This event, when present, describes the command variant. This
event may be emitted more than once.
+
------------
@@ -588,7 +588,7 @@ with "?".
`"child_exit"`::
This event is generated after the current process has returned
- from the waitpid() and collected the exit information from the
+ from the `waitpid()` and collected the exit information from the
child.
+
------------
@@ -609,7 +609,7 @@ process may be a shell script which doesn't have a session-id.)
+
Note that the `t_rel` field contains the observed run time in seconds
for the child process (starting before the fork/exec/spawn and
-stopping after the waitpid() and includes OS process creation overhead).
+stopping after the `waitpid()` and includes OS process creation overhead).
So this time will be slightly larger than the atexit time reported by
the child process itself.
@@ -635,7 +635,7 @@ process may be a shell script which doesn't have a session-id.)
+
This event is generated after the child is started in the background
and given a little time to boot up and start working. If the child
-startups normally and while the parent is still waiting, the "ready"
+starts up normally while the parent is still waiting, the "ready"
field will have the value "ready".
If the child is too slow to start and the parent times out, the field
will have the value "timeout".
@@ -949,7 +949,7 @@ atexit elapsed:3.868970 code:0
Regions::
- Regions can be use to time an interesting section of code.
+ Regions can be used to time an interesting section of code.
+
----------------
void wt_status_collect(struct wt_status *s)
@@ -1103,9 +1103,9 @@ Thread Events::
Thread messages added to a thread-proc.
+
-For example, the multithreaded preload-index code can be
+For example, the multi-threaded preload-index code can be
instrumented with a region around the thread pool and then
-per-thread start and exit events within the threadproc.
+per-thread start and exit events within the thread-proc.
+
----------------
static void *preload_thread(void *_data)
@@ -1214,11 +1214,11 @@ as each thread starts and allocates TLS storage.
There are a few issues to resolve before we can completely
switch to Trace2.
-* Updating existing tests that assume GIT_TRACE format messages.
+* Updating existing tests that assume `GIT_TRACE` format messages.
-* How to best handle custom GIT_TRACE_<key> messages?
+* How to best handle custom `GIT_TRACE_<key>` messages?
-** The GIT_TRACE_<key> mechanism allows each <key> to write to a
+** The `GIT_TRACE_<key>` mechanism allows each <key> to write to a
different file (in addition to just stderr).
** Do we want to maintain that ability or simply write to the existing
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec2..a85f58f 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -25,9 +25,9 @@ An object is uniquely described by its bit position within a bitmap:
is defined as follows:
o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
-
- The ordering between packs is done according to the MIDX's .rev file.
- Notably, the preferred pack sorts ahead of all other packs.
++
+The ordering between packs is done according to the MIDX's .rev file.
+Notably, the preferred pack sorts ahead of all other packs.
The on-disk representation (described below) of a bitmap is the same regardless
of whether or not that bitmap belongs to a packfile or a MIDX. The only
@@ -39,97 +39,108 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
== On-disk format
- - A header appears at the beginning:
-
- 4-byte signature: {'B', 'I', 'T', 'M'}
-
- 2-byte version number (network byte order)
- The current implementation only supports version 1
- of the bitmap index (the same one as JGit).
-
- 2-byte flags (network byte order)
-
- The following flags are supported:
-
- - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
- This flag must always be present. It implies that the
- bitmap index has been generated for a packfile or
- multi-pack index (MIDX) with full closure (i.e. where
- every single object in the packfile/MIDX can find its
- parent links inside the same packfile/MIDX). This is a
- requirement for the bitmap index format, also present in
- JGit, that greatly reduces the complexity of the
- implementation.
-
- - BITMAP_OPT_HASH_CACHE (0x4)
- If present, the end of the bitmap file contains
- `N` 32-bit name-hash values, one per object in the
- pack/MIDX. The format and meaning of the name-hash is
- described below.
-
- 4-byte entry count (network byte order)
-
- The total count of entries (bitmapped commits) in this bitmap index.
-
- 20-byte checksum
-
- The SHA1 checksum of the pack/MIDX this bitmap index
- belongs to.
-
- - 4 EWAH bitmaps that act as type indexes
-
- Type indexes are serialized after the hash cache in the shape
- of four EWAH bitmaps stored consecutively (see Appendix A for
- the serialization format of an EWAH bitmap).
-
- There is a bitmap for each Git object type, stored in the following
- order:
-
- - Commits
- - Trees
- - Blobs
- - Tags
-
- In each bitmap, the `n`th bit is set to true if the `n`th object
- in the packfile or multi-pack index is of that type.
-
- The obvious consequence is that the OR of all 4 bitmaps will result
- in a full set (all bits set), and the AND of all 4 bitmaps will
- result in an empty bitmap (no bits set).
-
- - N entries with compressed bitmaps, one for each indexed commit
-
- Where `N` is the total amount of entries in this bitmap index.
- Each entry contains the following:
-
- - 4-byte object position (network byte order)
- The position **in the index for the packfile or
- multi-pack index** where the bitmap for this commit is
- found.
-
- - 1-byte XOR-offset
- The xor offset used to compress this bitmap. For an entry
- in position `x`, a XOR offset of `y` means that the actual
- bitmap representing this commit is composed by XORing the
- bitmap for this entry with the bitmap in entry `x-y` (i.e.
- the bitmap `y` entries before this one).
-
- Note that this compression can be recursive. In order to
- XOR this entry with a previous one, the previous entry needs
- to be decompressed first, and so on.
-
- The hard-limit for this offset is 160 (an entry can only be
- xor'ed against one of the 160 entries preceding it). This
- number is always positive, and hence entries are always xor'ed
- with **previous** bitmaps, not bitmaps that will come afterwards
- in the index.
-
- - 1-byte flags for this bitmap
- At the moment the only available flag is `0x1`, which hints
- that this bitmap can be re-used when rebuilding bitmap indexes
- for the repository.
-
- - The compressed bitmap itself, see Appendix A.
+ * A header appears at the beginning:
+
+ 4-byte signature: :: {'B', 'I', 'T', 'M'}
+
+ 2-byte version number (network byte order): ::
+
+ The current implementation only supports version 1
+ of the bitmap index (the same one as JGit).
+
+ 2-byte flags (network byte order): ::
+
+ The following flags are supported:
+
+ ** {empty}
+ BITMAP_OPT_FULL_DAG (0x1) REQUIRED: :::
+
+ This flag must always be present. It implies that the
+ bitmap index has been generated for a packfile or
+ multi-pack index (MIDX) with full closure (i.e. where
+ every single object in the packfile/MIDX can find its
+ parent links inside the same packfile/MIDX). This is a
+ requirement for the bitmap index format, also present in
+ JGit, that greatly reduces the complexity of the
+ implementation.
+
+ ** {empty}
+ BITMAP_OPT_HASH_CACHE (0x4): :::
+
+ If present, the end of the bitmap file contains
+ `N` 32-bit name-hash values, one per object in the
+ pack/MIDX. The format and meaning of the name-hash is
+ described below.
+
+ 4-byte entry count (network byte order): ::
+ The total count of entries (bitmapped commits) in this bitmap index.
+
+ 20-byte checksum: ::
+ The SHA1 checksum of the pack/MIDX this bitmap index
+ belongs to.
+
+ * 4 EWAH bitmaps that act as type indexes
++
+Type indexes are serialized after the hash cache in the shape
+of four EWAH bitmaps stored consecutively (see Appendix A for
+the serialization format of an EWAH bitmap).
++
+There is a bitmap for each Git object type, stored in the following
+order:
++
+ - Commits
+ - Trees
+ - Blobs
+ - Tags
+
++
+In each bitmap, the `n`th bit is set to true if the `n`th object
+in the packfile or multi-pack index is of that type.
++
+The obvious consequence is that the OR of all 4 bitmaps will result
+in a full set (all bits set), and the AND of all 4 bitmaps will
+result in an empty bitmap (no bits set).
+
+ * N entries with compressed bitmaps, one for each indexed commit
++
+Where `N` is the total amount of entries in this bitmap index.
+Each entry contains the following:
+
+ ** {empty}
+ 4-byte object position (network byte order): ::
+ The position **in the index for the packfile or
+ multi-pack index** where the bitmap for this commit is
+ found.
+
+ ** {empty}
+ 1-byte XOR-offset: ::
+ The xor offset used to compress this bitmap. For an entry
+ in position `x`, a XOR offset of `y` means that the actual
+ bitmap representing this commit is composed by XORing the
+ bitmap for this entry with the bitmap in entry `x-y` (i.e.
+ the bitmap `y` entries before this one).
++
+NOTE: This compression can be recursive. In order to
+XOR this entry with a previous one, the previous entry needs
+to be decompressed first, and so on.
++
+The hard-limit for this offset is 160 (an entry can only be
+xor'ed against one of the 160 entries preceding it). This
+number is always positive, and hence entries are always xor'ed
+with **previous** bitmaps, not bitmaps that will come afterwards
+in the index.
+
+ ** {empty}
+ 1-byte flags for this bitmap: ::
+ At the moment the only available flag is `0x1`, which hints
+ that this bitmap can be re-used when rebuilding bitmap indexes
+ for the repository.
+
+ ** The compressed bitmap itself, see Appendix A.
+
+ * {empty}
+ TRAILER: ::
+ Trailing checksum of the preceding contents.
== Appendix A: Serialization format for an EWAH bitmap
@@ -142,8 +153,8 @@ implementation:
- 4-byte number of words of the COMPRESSED bitmap, when stored
- N x 8-byte words, as specified by the previous field
-
- This is the actual content of the compressed bitmap.
++
+This is the actual content of the compressed bitmap.
- 4-byte position of the current RLW for the compressed
bitmap
diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt
new file mode 100644
index 0000000..d81f3a8
--- /dev/null
+++ b/Documentation/technical/cruft-packs.txt
@@ -0,0 +1,123 @@
+= Cruft packs
+
+The cruft packs feature offer an alternative to Git's traditional mechanism of
+removing unreachable objects. This document provides an overview of Git's
+pruning mechanism, and how a cruft pack can be used instead to accomplish the
+same.
+
+== Background
+
+To remove unreachable objects from your repository, Git offers `git repack -Ad`
+(see linkgit:git-repack[1]). Quoting from the documentation:
+
+[quote]
+[...] unreachable objects in a previous pack become loose, unpacked objects,
+instead of being left in the old pack. [...] loose unreachable objects will be
+pruned according to normal expiry rules with the next 'git gc' invocation.
+
+Unreachable objects aren't removed immediately, since doing so could race with
+an incoming push which may reference an object which is about to be deleted.
+Instead, those unreachable objects are stored as loose objects and stay that way
+until they are older than the expiration window, at which point they are removed
+by linkgit:git-prune[1].
+
+Git must store these unreachable objects loose in order to keep track of their
+per-object mtimes. If these unreachable objects were written into one big pack,
+then either freshening that pack (because an object contained within it was
+re-written) or creating a new pack of unreachable objects would cause the pack's
+mtime to get updated, and the objects within it would never leave the expiration
+window. Instead, objects are stored loose in order to keep track of the
+individual object mtimes and avoid a situation where all cruft objects are
+freshened at once.
+
+This can lead to undesirable situations when a repository contains many
+unreachable objects which have not yet left the grace period. Having large
+directories in the shards of `.git/objects` can lead to decreased performance in
+the repository. But given enough unreachable objects, this can lead to inode
+starvation and degrade the performance of the whole system. Since we
+can never pack those objects, these repositories often take up a large amount of
+disk space, since we can only zlib compress them, but not store them in delta
+chains.
+
+== Cruft packs
+
+A cruft pack eliminates the need for storing unreachable objects in a loose
+state by including the per-object mtimes in a separate file alongside a single
+pack containing all loose objects.
+
+A cruft pack is written by `git repack --cruft` when generating a new pack.
+linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
+is a classic all-into-one repack, meaning that everything in the resulting pack is
+reachable, and everything else is unreachable. Once written, the `--cruft`
+option instructs `git repack` to generate another pack containing only objects
+not packed in the previous step (which equates to packing all unreachable
+objects together). This progresses as follows:
+
+ 1. Enumerate every object, marking any object which is (a) not contained in a
+ kept-pack, and (b) whose mtime is within the grace period as a traversal
+ tip.
+
+ 2. Perform a reachability traversal based on the tips gathered in the previous
+ step, adding every object along the way to the pack.
+
+ 3. Write the pack out, along with a `.mtimes` file that records the per-object
+ timestamps.
+
+This mode is invoked internally by linkgit:git-repack[1] when instructed to
+write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
+of packs which will not be deleted by the repack; in other words, they contain
+all of the repository's reachable objects.
+
+When a repository already has a cruft pack, `git repack --cruft` typically only
+adds objects to it. An exception to this is when `git repack` is given the
+`--cruft-expiration` option, which allows the generated cruft pack to omit
+expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
+later on.
+
+It is linkgit:git-gc[1] that is typically responsible for removing expired
+unreachable objects.
+
+== Caution for mixed-version environments
+
+Repositories that have cruft packs in them will continue to work with any older
+version of Git. Note, however, that previous versions of Git which do not
+understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
+all of the objects in it. In other words, do not expect older (pre-cruft pack)
+versions of Git to interpret or even read the contents of the `.mtimes` file.
+
+Note that having mixed versions of Git GC-ing the same repository can lead to
+unreachable objects never being completely pruned. This can happen under the
+following circumstances:
+
+ - An older version of Git running GC explodes the contents of an existing
+ cruft pack loose, using the cruft pack's mtime.
+ - A newer version running GC collects those loose objects into a cruft pack,
+ where the .mtime file reflects the loose object's actual mtimes, but the
+ cruft pack mtime is "now".
+
+Repeating this process will lead to unreachable objects not getting pruned as a
+result of repeatedly resetting the objects' mtimes to the present time.
+
+If you are GC-ing repositories in a mixed version environment, consider omitting
+the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
+leaving the `gc.cruftPacks` configuration unset until all writers understand
+cruft packs.
+
+== Alternatives
+
+Notable alternatives to this design include:
+
+ - The location of the per-object mtime data, and
+ - Storing unreachable objects in multiple cruft packs.
+
+On the location of mtime data, a new auxiliary file tied to the pack was chosen
+to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
+support for optional chunks of data, it may make sense to consolidate the
+`.mtimes` format into the `.idx` itself.
+
+Storing unreachable objects among multiple cruft packs (e.g., creating a new
+cruft pack during each repacking operation including only unreachable objects
+which aren't already stored in an earlier cruft pack) is significantly more
+complicated to construct, and so aren't pursued here. The obvious drawback to
+the current implementation is that the entire cruft pack must be re-written from
+scratch.
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index 65da0da..f691c20 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -26,8 +26,6 @@ Git index format
Extensions are identified by signature. Optional extensions can
be ignored if Git does not understand them.
- Git currently supports cache tree and resolve undo extensions.
-
4-byte extension signature. If the first byte is 'A'..'Z' the
extension is optional and can be ignored.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 6d3efb7..b520aa9 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -294,6 +294,25 @@ Pack file entry: <+
All 4-byte numbers are in network order.
+== pack-*.mtimes files have the format:
+
+All 4-byte numbers are in network byte order.
+
+ - A 4-byte magic number '0x4d544d45' ('MTME').
+
+ - A 4-byte version identifier (= 1).
+
+ - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
+
+ - A table of 4-byte unsigned integers. The ith value is the
+ modification time (mtime) of the ith object in the corresponding
+ pack by lexicographic (index) order. The mtimes count standard
+ epoch seconds.
+
+ - A trailer, containing a checksum of the corresponding packfile,
+ and a checksum of all of the above (each having length according
+ to the specified hash function).
+
== multi-pack-index (MIDX) files have the following format:
The multi-pack-index files refer to multiple pack-files and loose objects.
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index a0dd7c6..99f0eb3 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -181,6 +181,9 @@ Fetching Missing Objects
currently fetches all objects referred to by the requested objects, even
though they are not necessary.
+- Fetching with `--refetch` will request a complete new filtered packfile from
+ the remote, which can be used to change a filter without needing to
+ dynamically fetch missing objects.
Using many promisor remotes
---------------------------
diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
new file mode 100644
index 0000000..08bc09c
--- /dev/null
+++ b/Documentation/technical/scalar.txt
@@ -0,0 +1,127 @@
+Scalar
+======
+
+Scalar is a repository management tool that optimizes Git for use in large
+repositories. It accomplishes this by helping users to take advantage of
+advanced performance features in Git. Unlike most other Git built-in commands,
+Scalar is not executed as a subcommand of 'git'; rather, it is built as a
+separate executable containing its own series of subcommands.
+
+Background
+----------
+
+Scalar was originally designed as an add-on to Git and implemented as a .NET
+Core application. It was created based on the learnings from the VFS for Git
+project (another application aimed at improving the experience of working with
+large repositories). As part of its initial implementation, Scalar relied on
+custom features in the Microsoft fork of Git that have since been integrated
+into core Git:
+
+* partial clone,
+* commit graphs,
+* multi-pack index,
+* sparse checkout (cone mode),
+* scheduled background maintenance,
+* etc
+
+With the requisite Git functionality in place and a desire to bring the benefits
+of Scalar to the larger Git community, the Scalar application itself was ported
+from C# to C and integrated upstream.
+
+Features
+--------
+
+Scalar is comprised of two major pieces of functionality: automatically
+configuring built-in Git performance features and managing repository
+enlistments.
+
+The Git performance features configured by Scalar (see "Background" for
+examples) confer substantial performance benefits to large repositories, but are
+either too experimental to enable for all of Git yet, or only benefit large
+repositories. As new features are introduced, Scalar should be updated
+accordingly to incorporate them. This will prevent the tool from becoming stale
+while also providing a path for more easily bringing features to the appropriate
+users.
+
+Enlistments are how Scalar knows which repositories on a user's system should
+utilize Scalar-configured features. This allows it to update performance
+settings when new ones are added to the tool, as well as centrally manage
+repository maintenance. The enlistment structure - a root directory with a
+`src/` subdirectory containing the cloned repository itself - is designed to
+encourage users to route build outputs outside of the repository to avoid the
+performance-limiting overhead of ignoring those files in Git.
+
+Design
+------
+
+Scalar is implemented in C and interacts with Git via a mix of child process
+invocations of Git and direct usage of `libgit.a`. Internally, it is structured
+much like other built-ins with subcommands (e.g., `git stash`), containing a
+`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
+function. Most options are unique to each subcommand, with `scalar` respecting
+some "global" `git` options (e.g., `-c` and `-C`).
+
+Because `scalar` is not invoked as a Git subcommand (like `git scalar`), it is
+built and installed as its own executable in the `bin/` directory, alongside
+`git`, `git-gui`, etc.
+
+Roadmap
+-------
+
+NOTE: this section will be removed once the remaining tasks outlined in this
+roadmap are complete.
+
+Scalar is a large enough project that it is being upstreamed incrementally,
+living in `contrib/` until it is feature-complete. So far, the following patch
+series have been accepted:
+
+- `scalar-the-beginning`: The initial patch series which sets up
+ `contrib/scalar/` and populates it with a minimal `scalar` command that
+ demonstrates the fundamental ideas.
+
+- `scalar-c-and-C`: The `scalar` command learns about two options that can be
+ specified before the command, `-c <key>=<value>` and `-C <directory>`.
+
+- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
+
+Roughly speaking (and subject to change), the following series are needed to
+"finish" this initial version of Scalar:
+
+- Finish Scalar features: Enable the built-in FSMonitor in Scalar enlistments
+ and implement `scalar help`. At the end of this series, Scalar should be
+ feature-complete from the perspective of a user.
+
+- Generalize features not specific to Scalar: In the spirit of making Scalar
+ configure only what is needed for large repo performance, move common
+ utilities into other parts of Git. Some of this will be internal-only, but one
+ major change will be generalizing `scalar diagnose` for use with any Git
+ repository.
+
+- Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
+ `git`, including updates to build and install it with the rest of Git. This
+ change will incorporate Scalar into the Git CI and test framework, as well as
+ expand regression and performance testing to ensure the tool is stable.
+
+Finally, there are two additional patch series that exist in Microsoft's fork of
+Git, but there is no current plan to upstream them. There are some interesting
+ideas there, but the implementation is too specific to Azure Repos and/or VFS
+for Git to be of much help in general.
+
+These still exist mainly because the GVFS protocol is what Azure Repos has
+instead of partial clone, while Git is focused on improving partial clone:
+
+- `scalar-with-gvfs`: The primary purpose of this patch series is to support
+ existing Scalar users whose repositories are hosted in Azure Repos (which does
+ not support Git's partial clones, but supports its predecessor, the GVFS
+ protocol, which is used by Scalar to emulate the partial clone).
+
+ Since the GVFS protocol will never be supported by core Git, this patch series
+ will remain in Microsoft's fork of Git.
+
+- `run-scalar-functional-tests`: The Scalar project developed a quite
+ comprehensive set of integration tests (or, "Functional Tests"). They are the
+ sole remaining part of the original C#-based Scalar project, and this patch
+ adds a GitHub workflow that runs them all.
+
+ Since the tests partially depend on features that are only provided in the
+ `scalar-with-gvfs` patch series, this patch cannot be upstreamed.