summaryrefslogtreecommitdiff
path: root/builtin/multi-pack-index.c
AgeCommit message (Collapse)Author
2021-09-29midx: preliminary support for `--refs-snapshot`Taylor Blau
To figure out which commits we can write a bitmap for, the multi-pack index/bitmap code does a reachability traversal, marking any commit which can be found in the MIDX as eligible to receive a bitmap. This approach will cause a problem when multi-pack bitmaps are able to be generated from `git repack`, since the reference tips can change during the repack. Even though we ignore commits that don't exist in the MIDX (when doing a scan of the ref tips), it's possible that a commit in the MIDX reaches something that isn't. This can happen when a multi-pack index contains some pack which refers to loose objects (e.g., if a pack was pushed after starting the repack but before generating the MIDX which depends on an object which is stored as loose in the repository, and by definition isn't included in the multi-pack index). By taking a snapshot of the references before we start repacking, we can close that race window. In the above scenario (where we have a packed object pointing at a loose one), we'll either (a) take a snapshot of the references before seeing the packed one, or (b) take it after, at which point we can guarantee that the loose object will be packed and included in the MIDX. This patch does just that. It writes a temporary "reference snapshot", which is a list of OIDs that are at the ref tips before writing a multi-pack bitmap. References that are "preferred" (i.e,. are a suffix of at least one value of the 'pack.preferBitmapTips' configuration) are marked with a special '+'. The format is simple: one line per commit at each tip, with an optional '+' at the beginning (for preferred references, as described above). When provided, the reference snapshot is used to drive bitmap selection instead of the MIDX code doing its own traversal. When it isn't provided, the usual traversal takes place instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29builtin/multi-pack-index.c: support `--stdin-packs` modeTaylor Blau
To power a new `--write-midx` mode, `git repack` will want to write a multi-pack index containing a certain set of packs in the repository. This new option will be used by `git repack` to write a MIDX which contains only the packs which will survive after the repack (that is, it will exclude any packs which are about to be deleted). This patch effectively exposes the function implemented in the previous commit via the `git multi-pack-index` builtin. An alternative approach would have been to call that function from the `git repack` builtin directly, but this introduces awkward problems around closing and reopening the object store, so the MIDX will be written out-of-process. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01pack-bitmap: write multi-pack bitmapsTaylor Blau
Write multi-pack bitmaps in the format described by Documentation/technical/bitmap-format.txt, inferring their presence with the absence of '--bitmap'. To write a multi-pack bitmap, this patch attempts to reuse as much of the existing machinery from pack-objects as possible. Specifically, the MIDX code prepares a packing_data struct that pretends as if a single packfile has been generated containing all of the objects contained within the MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-19multi-pack-index: fix potential segfault without sub-commandTaylor Blau
Since cd57bc41bb (builtin/multi-pack-index.c: display usage on unrecognized command, 2021-03-30) we have used a "usage" label to avoid having two separate callers of usage_with_options (one when no arguments are given, and another for unrecognized sub-commands). But the first caller has been broken since cd57bc41bb, since it will happily jump to usage without arguments, and then pass argv[0] to the "unrecognized subcommand" error. Many compilers will save us from a segfault here, but the end result is ugly, since it mentions an unrecognized subcommand when we didn't even pass one, and (on GCC) includes "(null)" in its output. Move the "usage" label down past the error about unrecognized subcommands so that it is only triggered when it should be. While we're at it, bulk up our test coverage in this area, too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-01midx: allow marking a pack as preferredTaylor Blau
When multiple packs in the multi-pack index contain the same object, the MIDX machinery must make a choice about which pack it associates with that object. Prior to this patch, the lowest-ordered[1] pack was always selected. Pack selection for duplicate objects is relatively unimportant today, but it will become important for multi-pack bitmaps. This is because we can only invoke the pack-reuse mechanism when all of the bits for reused objects come from the reuse pack (in order to ensure that all reused deltas can find their base objects in the same pack). To encourage the pack selection process to prefer one pack over another (the pack to be preferred is the one a caller would like to later use as a reuse pack), introduce the concept of a "preferred pack". When provided, the MIDX code will always prefer an object found in a preferred pack over any other. No format changes are required to store the preferred pack, since it will be able to be inferred with a corresponding MIDX bitmap, by looking up the pack associated with the object in the first bit position (this ordering is described in detail in a subsequent commit). [1]: the ordering is specified by MIDX internals; for our purposes we can consider the "lowest ordered" pack to be "the one with the most-recent mtime. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30builtin/multi-pack-index.c: display usage on unrecognized commandTaylor Blau
When given a sub-command that it doesn't understand, 'git multi-pack-index' dies with the following message: $ git multi-pack-index bogus fatal: unrecognized subcommand: bogus Instead of 'die()'-ing, we can display the usage text, which is much more helpful: $ git.compile multi-pack-index bogus error: unrecognized subcommand: bogus usage: git multi-pack-index [<options>] write or: git multi-pack-index [<options>] verify or: git multi-pack-index [<options>] expire or: git multi-pack-index [<options>] repack [--batch-size=<size>] --object-dir <file> object directory containing set of packfile and pack-index pairs --progress force progress reporting While we're at it, clean up some duplication between the "no sub-command" and "unrecognized sub-command" conditionals. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30builtin/multi-pack-index.c: don't enter bogus cmd_modeTaylor Blau
Even before the recent refactoring, 'git multi-pack-index' calls 'trace2_cmd_mode()' before verifying that the sub-command is recognized. Push this call down into the individual sub-commands so that we don't enter a bogus command mode. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30builtin/multi-pack-index.c: split sub-commandsTaylor Blau
Handle sub-commands of the 'git multi-pack-index' builtin (e.g., "write", "repack", etc.) separately from one another. This allows sub-commands with unique options, without forcing cmd_multi_pack_index() to reject invalid combinations itself. This comes at the cost of some duplication and boilerplate. Luckily, the duplication is reduced to a minimum, since common options are shared among sub-commands due to a suggestion by Ævar. (Sub-commands do have to retain the common options, too, since this builtin accepts common options on either side of the sub-command). Roughly speaking, cmd_multi_pack_index() parses options (including common ones), and stops at the first non-option, which is the sub-command. It then dispatches to the appropriate sub-command, which parses the remaining options (also including common options). Unknown options are kept by the sub-commands in order to detect their presence (and complain that too many arguments were given). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30builtin/multi-pack-index.c: define common usage with a macroTaylor Blau
Factor out the usage message into pieces corresponding to each mode. This avoids options specific to one sub-command from being shared with another in the usage. A subsequent commit will use these #define macros to have usage variables for each sub-command without duplicating their contents. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30builtin/multi-pack-index.c: don't handle 'progress' separatelyTaylor Blau
Now that there is a shared 'flags' member in the options structure, there is no need to keep track of whether to force progress or not, since ultimately the decision of whether or not to show a progress meter is controlled by a bit in the flags member. Manipulate that bit directly, and drop the now-unnecessary 'progress' field while we're at it. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30builtin/multi-pack-index.c: inline 'flags' with optionsTaylor Blau
Subcommands of the 'git multi-pack-index' command (e.g., 'write', 'verify', etc.) will want to optionally change a set of shared flags that are eventually passed to the MIDX libraries. Right now, options and flags are handled separately. That's fine, since the options structure is never passed around. But a future patch will make it so that common options shared by all sub-commands are defined in a common location. That means that "flags" would have to become a global variable. Group it with the options structure so that we reduce the number of global variables we have overall. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23multi-pack-index: add [--[no-]progress] option.William Baker
Add the --[no-]progress option to git multi-pack-index. Pass the MIDX_PROGRESS flag to the subcommand functions when progress should be displayed by multi-pack-index. The progress feature was added to 'verify' in 144d703 ("multi-pack-index: report progress during 'verify'", 2018-09-13) but some subcommands were not updated to display progress, and the ability to opt-out was overlooked. Signed-off-by: William Baker <William.Baker@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23midx: add MIDX_PROGRESS flagWilliam Baker
Add the MIDX_PROGRESS flag and update the write|verify|expire|repack functions in midx.h to accept a flags parameter. The MIDX_PROGRESS flag indicates whether the caller of the function would like progress information to be displayed. This patch only changes the method prototypes and does not change the functionality. The functionality change will be handled by a later patch. Signed-off-by: William Baker <William.Baker@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11multi-pack-index: prepare 'repack' subcommandDerrick Stolee
In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands. Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The subcommand will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack- files that have duplicate copies in other pack-files. The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files. In addition, we hard-code the modified times of the packs in the pack directory to ensure the list of packs sorted by modified time matches the order if sorted by size (ascending). This will be important in a future test. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11multi-pack-index: prepare for 'expire' subcommandDerrick Stolee
The multi-pack-index tracks objects in a collection of pack-files. Only one copy of each object is indexed, using the modified time of the pack-files to determine tie-breakers. It is possible to have a pack-file with no referenced objects because all objects have a duplicate in a newer pack-file. Introduce a new 'expire' subcommand to the multi-pack-index builtin. This subcommand will delete these unused pack-files and rewrite the multi-pack-index to no longer refer to those files. More details about the specifics will follow as the method is implemented. Add a test that verifies the 'expire' subcommand is correctly wired, but will still be valid when the verb is implemented. Specifically, create a set of packs that should all have referenced objects and should not be removed during an 'expire' operation. The packs are created carefully to ensure they have a specific order when sorted by size. This will be important in a later test. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-07midx: pass a repository pointerDerrick Stolee
Much of the multi-pack-index code focuses on the multi_pack_index struct, and so we only pass a pointer to the current one. However, we will insert a dependency on the packed_git linked list in a future change, so we will need a repository reference. Inserting these parameters is a significant enough change to split out. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-22trace2:data: add trace2 data to midxJeff Hostetler
Log multi-pack-index command mode. Log number of objects and packfiles in the midx. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: add 'verify' verbDerrick Stolee
The multi-pack-index builtin writes multi-pack-index files, and uses a 'write' verb to do so. Add a 'verify' verb that checks this file matches the contents of the pack-indexes it replaces. The current implementation is a no-op, but will be extended in small increments in later commits. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20multi-pack-index: provide more helpful usage infoDerrick Stolee
The multi-pack-index builtin has a very simple command-line interface. Instead of simply reporting usage, give the user a hint to why the arguments failed. Reported-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: add 'write' verbDerrick Stolee
In anticipation of writing multi-pack-indexes, add a skeleton 'git multi-pack-index write' subcommand and send the options to a write_midx_file() method. Also create a skeleton test script that tests the 'write' subcommand. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: add builtinDerrick Stolee
This new 'git multi-pack-index' builtin will be the plumbing access for writing, reading, and checking multi-pack-index files. The initial implementation is a no-op. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>