path: root/Documentation/technical
diff options
Diffstat (limited to 'Documentation/technical')
3 files changed, 182 insertions, 4 deletions
diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
index b977ae8..ad7a5bd 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -8,11 +8,19 @@ Data Structures
`struct hashmap`::
- The hash table structure.
+ The hash table structure. Members can be used as follows, but should
+ not be modified directly:
-The `size` member keeps track of the total number of entries. The `cmpfn`
-member is a function used to compare two entries for equality. The `table` and
-`tablesize` members store the hash table and its size, respectively.
+The `size` member keeps track of the total number of entries (0 means the
+hashmap is empty).
+`tablesize` is the allocated size of the hash table. A non-0 value indicates
+that the hashmap is initialized. It may also be useful for statistical purposes
+(i.e. `size / tablesize` is the current load factor).
+`cmpfn` stores the comparison function specified in `hashmap_init()`. In
+advanced scenarios, it may be useful to change this, e.g. to switch between
+case-sensitive and case-insensitive lookup.
`struct hashmap_entry`::
@@ -58,6 +66,15 @@ Functions
`strihash` and `memihash` are case insensitive versions.
+`unsigned int sha1hash(const unsigned char *sha1)`::
+ Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
+ for use in hash tables. Cryptographic hashes are supposed to have
+ uniform distribution, so in contrast to `memhash()`, this just copies
+ the first `sizeof(int)` bytes without shuffling any bits. Note that
+ the results will be different on big-endian and little-endian
+ platforms, so they should not be stored or transferred over the net.
`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
Initializes a hashmap structure.
@@ -101,6 +118,20 @@ hashmap_entry) that has at least been initialized with the proper hash code
If an entry with matching hash code is found, `key` and `keydata` are passed
to `hashmap_cmp_fn` to decide whether the entry matches the key.
+`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
+ Returns the hashmap entry for the specified hash code and key data,
+ or NULL if not found.
+`map` is the hashmap structure.
+`hash` is the hash code of the entry to look up.
+If an entry with matching hash code is found, `keydata` is passed to
+`hashmap_cmp_fn` to decide whether the entry matches the key. The
+`entry_or_key` parameter points to a bogus hashmap_entry structure that
+should not be used in the comparison.
`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
Returns the next equal hashmap entry, or NULL if not found. This can be
@@ -162,6 +193,21 @@ more entries.
`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
and returns the first entry, if any).
+`const char *strintern(const char *string)`::
+`const void *memintern(const void *data, size_t len)`::
+ Returns the unique, interned version of the specified string or data,
+ similar to the `String.intern` API in Java and .NET, respectively.
+ Interned strings remain valid for the entire lifetime of the process.
+Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
+strings / data must not be modified or freed.
+Interned strings are best used for short strings with high probability of
+Uses a hashmap to store the pool of interned strings.
Usage example
diff --git a/Documentation/technical/api-trace.txt b/Documentation/technical/api-trace.txt
new file mode 100644
index 0000000..097a651
--- /dev/null
+++ b/Documentation/technical/api-trace.txt
@@ -0,0 +1,97 @@
+trace API
+The trace API can be used to print debug messages to stderr or a file. Trace
+code is inactive unless explicitly enabled by setting `GIT_TRACE*` environment
+The trace implementation automatically adds `timestamp file:line ... \n` to
+all trace messages. E.g.:
+23:59:59.123456 git.c:312 trace: built-in: git 'foo'
+00:00:00.000001 builtin/foo.c:99 foo: some message
+Data Structures
+`struct trace_key`::
+ Defines a trace key (or category). The default (for API functions that
+ don't take a key) is `GIT_TRACE`.
+E.g. to define a trace key controlled by environment variable `GIT_TRACE_FOO`:
+static struct trace_key trace_foo = TRACE_KEY_INIT(FOO);
+static void trace_print_foo(const char *message)
+ trace_print_key(&trace_foo, message);
+Note: don't use `const` as the trace implementation stores internal state in
+the `trace_key` structure.
+`int trace_want(struct trace_key *key)`::
+ Checks whether the trace key is enabled. Used to prevent expensive
+ string formatting before calling one of the printing APIs.
+`void trace_disable(struct trace_key *key)`::
+ Disables tracing for the specified key, even if the environment
+ variable was set.
+`void trace_printf(const char *format, ...)`::
+`void trace_printf_key(struct trace_key *key, const char *format, ...)`::
+ Prints a formatted message, similar to printf.
+`void trace_argv_printf(const char **argv, const char *format, ...)``::
+ Prints a formatted message, followed by a quoted list of arguments.
+`void trace_strbuf(struct trace_key *key, const struct strbuf *data)`::
+ Prints the strbuf, without additional formatting (i.e. doesn't
+ choke on `%` or even `\0`).
+`uint64_t getnanotime(void)`::
+ Returns nanoseconds since the epoch (01/01/1970), typically used
+ for performance measurements.
+Currently there are high precision timer implementations for Linux (using
+`clock_gettime(CLOCK_MONOTONIC)`) and Windows (`QueryPerformanceCounter`).
+Other platforms use `gettimeofday` as time source.
+`void trace_performance(uint64_t nanos, const char *format, ...)`::
+`void trace_performance_since(uint64_t start, const char *format, ...)`::
+ Prints the elapsed time (in nanoseconds), or elapsed time since
+ `start`, followed by a formatted message. Enabled via environment
+ variable `GIT_TRACE_PERFORMANCE`. Used for manual profiling, e.g.:
+uint64_t start = getnanotime();
+/* code section to measure */
+trace_performance_since(start, "foobar");
+uint64_t t = 0;
+for (;;) {
+ /* ignore */
+ t -= getnanotime();
+ /* code section to measure */
+ t += getnanotime();
+ /* ignore */
+trace_performance(t, "frotz");
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index f352a9b..fe6f316 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -129,6 +129,9 @@ Git index format
(Version 4) In version 4, the padding after the pathname does not
+ Interpretation of index entries in split index mode is completely
+ different. See below for details.
== Extensions
=== Cached tree
@@ -198,3 +201,35 @@ Git index format
- At most three 160-bit object names of the entry in stages from 1 to 3
(nothing is written for a missing stage).
+=== Split index
+ In split index mode, the majority of index entries could be stored
+ in a separate file. This extension records the changes to be made on
+ top of that to produce the final index.
+ The signature for this extension is { 'l', 'i, 'n', 'k' }.
+ The extension consists of:
+ - 160-bit SHA-1 of the shared index file. The shared index file path
+ is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+ index does not require a shared index file.
+ - An ewah-encoded delete bitmap, each bit represents an entry in the
+ shared index. If a bit is set, its corresponding entry in the
+ shared index will be removed from the final index. Note, because
+ a delete operation changes index entry positions, but we do need
+ original positions in replace phase, it's best to just mark
+ entries for removal, then do a mass deletion after replacement.
+ - An ewah-encoded replace bitmap, each bit represents an entry in
+ the shared index. If a bit is set, its corresponding entry in the
+ shared index will be replaced with an entry in this index
+ file. All replaced entries are stored in sorted order in this
+ index. The first "1" bit in the replace bitmap corresponds to the
+ first index entry, the second "1" bit to the second entry and so
+ on. Replaced entries may have empty path names to save space.
+ The remaining index entries after replaced ones will be added to the
+ final index. These added entries are also sorted by entry namme then
+ stage.