path: root/builtin/repack.c
Age, Commit message, Author
2015-10-26Merge branch 'jk/repository-extension'Junio C Hamano
Prepare for Git on-disk repository representation to undergo backward incompatible changes by introducing a new repository format version "1", with an extension mechanism.

* jk/repository-extension:
  introduce "preciousObjects" repository extension
  introduce "extensions" form of core.repositoryformatversion
2015-08-10prefer mkpathdup to mkpath in assignmentsJeff King
As with the previous commit to git_path, assigning the result of mkpath is suspicious, since it is not clear whether we will still depend on the value after it may have been overwritten by subsequent calls. This patch converts low-hanging fruit to use mkpathdup instead of mkpath (with the downside that we must remember to free the result). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
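The hazard described above can be sketched in a few lines. This is only an illustration under stated assumptions: `demo_mkpath` and `demo_mkpathdup` are hypothetical stand-ins, not Git's functions, and the single static buffer is a simplification of whatever buffering the real mkpath uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mkpath(): every call reuses one static buffer. */
const char *demo_mkpath(const char *dir, const char *name)
{
    static char buf[256];
    int n = snprintf(buf, sizeof(buf), "%s/%s", dir, name);

    assert(n >= 0 && (size_t)n < sizeof(buf));
    return buf;
}

/* Hypothetical stand-in for mkpathdup(): the caller owns the copy and must free() it. */
char *demo_mkpathdup(const char *dir, const char *name)
{
    const char *tmp = demo_mkpath(dir, name);
    char *copy = malloc(strlen(tmp) + 1);

    assert(copy != NULL);
    strcpy(copy, tmp);
    return copy;
}
```

A pointer saved from `demo_mkpath("objects", "pack")` reads "refs/heads" after a later `demo_mkpath("refs", "heads")` call, while a pointer obtained from `demo_mkpathdup` keeps its value until it is freed.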
2015-06-25introduce "preciousObjects" repository extensionJeff King
If this extension is used in a repository, then no operations should run which may drop objects from the object storage. This can be useful if you are sharing that storage with other repositories whose refs you cannot see.

For instance, if you do:

    $ git clone -s parent child
    $ git -C parent config extensions.preciousObjects true
    $ git -C parent config core.repositoryformatversion 1

you now have additional safety when running git in the parent repository. Prunes and repacks will bail with an error, and `git gc` will skip those operations (it will continue to pack refs and do other non-object operations). Older versions of git, when run in the repository, will fail on every operation.

Note that we do not set the preciousObjects extension by default when doing a "clone -s", as doing so breaks backwards compatibility. It is a decision the user should make explicitly.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2015-05-11Merge branch 'nd/multiple-work-trees'Junio C Hamano
A replacement for contrib/workdir/git-new-workdir that does not rely on symbolic links and makes sharing of objects and refs safer by making the borrowee and borrowers aware of each other.

* nd/multiple-work-trees: (41 commits)
  prune --worktrees: fix expire vs worktree existence condition
  t1501: fix test with split index
  t2026: fix broken &&-chain
  t2026 needs procondition SANITY
  git-checkout.txt: a note about multiple checkout support for submodules
  checkout: add --ignore-other-wortrees
  checkout: pass whole struct to parse_branchname_arg instead of individual flags
  git-common-dir: make "modules/" per-working-directory directory
  checkout: do not fail if target is an empty directory
  t2025: add a test to make sure grafts is working from a linked checkout
  checkout: don't require a work tree when checking out into a new one
  git_path(): keep "info/sparse-checkout" per work-tree
  count-objects: report unused files in $GIT_DIR/worktrees/...
  gc: support prune --worktrees
  gc: factor out gc.pruneexpire parsing code
  gc: style change -- no SP before closing parenthesis
  checkout: clean up half-prepared directories in --to mode
  checkout: reject if the branch is already checked out elsewhere
  prune: strategies for linked checkouts
  checkout: support checking out into a new working directory
  ...
2015-03-25Merge branch 'jk/prune-with-corrupt-refs'Junio C Hamano
"git prune" used to largely ignore broken refs when deciding which objects are still being used, which could spread an existing small damage and make it a larger one.

* jk/prune-with-corrupt-refs:
  refs.c: drop curate_packed_refs
  repack: turn on "ref paranoia" when doing a destructive repack
  prune: turn on ref_paranoia flag
  refs: introduce a "ref paranoia" flag
  t5312: test object deletion code paths in a corrupted repository
2015-03-20repack: turn on "ref paranoia" when doing a destructive repackJeff King
If we are repacking with "-ad", we will drop any unreachable objects. Likewise, using "-Ad --unpack-unreachable=<time>" will drop any old, unreachable objects. In these cases, we want to make sure the reachability we compute with "--all" is complete. We can do this by passing GIT_REF_PARANOIA=1 in the environment to pack-objects. Note that "-Ad" is safe already, because it only loosens unreachable objects. It is up to "git prune" to avoid deleting them. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2015-01-14standardize usage info string formatAlex Henrie
This patch puts the usage info strings that were not already in docopt-like format into docopt-like format, which will be a little easier for end users and a lot easier for translators. Changes include:

- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]

Signed-off-by: Alex Henrie <>
Reviewed-by: Matthieu Moy <>
Signed-off-by: Junio C Hamano <>
2014-12-22Merge branch 'mh/simplify-repack-without-refs'Junio C Hamano
"git remote update --prune" to drop many refs has been optimized.

* mh/simplify-repack-without-refs:
  sort_string_list(): rename to string_list_sort()
  prune_remote(): iterate using for_each_string_list_item()
  prune_remote(): rename local variable
  repack_without_refs(): make the refnames argument a string_list
  prune_remote(): sort delete_refs_list references en masse
  prune_remote(): initialize both delete_refs lists in a single loop
  prune_remote(): exit early if there are no stale references
2014-12-01path.c: make get_pathname() call sites return const char *Nguyễn Thái Ngọc Duy
Before the previous commit, get_pathname() returned an array of PATH_MAX length. Even if git_path() and similar functions do not use the whole array, a git_path() caller can, in theory. After the commit, get_pathname() may return a buffer that has just enough room for the returned string, and a git_path() caller should never write beyond that. Make git_path(), mkpath() and git_path_submodule() return a const buffer to make sure callers do not write in it at all. This could have been part of the previous commit, but the "const" conversion is too much distraction from the core changes in path.c. Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2014-11-25sort_string_list(): rename to string_list_sort()Michael Haggerty
The new name is more consistent with the names of other string_list-related functions. Suggested-by: Junio C Hamano <> Signed-off-by: Michael Haggerty <> Signed-off-by: Junio C Hamano <>
2014-11-10use args member of struct child_processRené Scharfe
Convert users of struct child_process to using the managed argv_array args instead of providing their own. This shortens the code a bit and ensures that the allocated memory is released automatically after use. Suggested-by: Jeff King <> Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2014-10-19repack: pack objects mentioned by the indexJeff King
When we pack all objects, we use only the objects reachable from references and reflogs. This misses any objects which are reachable from the index, but not yet referenced.

By itself this isn't a big deal; the objects can remain loose until they are actually used in a commit. However, it does create a problem when we drop packed but unreachable objects. We try to optimize out the writing of objects that we will immediately prune, which means we must follow the same rules as prune in determining what is reachable. And prune uses the index for this purpose.

This is rather uncommon in practice, as objects in the index would not usually have been packed in the first place. But it could happen in a sequence like:

  1. You make a commit on a branch that references blob X.
  2. You repack, moving X into the pack.
  3. You delete the branch (and its reflog), so that X is unreferenced.
  4. You "git add" blob X so that it is now referenced only by the index.
  5. You repack again with git-gc.

The pack-objects we invoke will see that X is neither referenced nor recent and not bother loosening it.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2014-09-26Merge branch 'jk/prune-packed-server-info'Junio C Hamano
Code cleanup.

* jk/prune-packed-server-info:
  repack: call prune_packed_objects() and update_server_info() directly
  server-info: clean up after writing info/packs
  make update-server-info more robust
  prune-packed: fix minor memory leak
2014-09-15repack: call prune_packed_objects() and update_server_info() directlyRené Scharfe
Call the functions behind git prune-packed and git update-server-info directly instead of using run_command(). This is shorter, easier and quicker. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2014-08-20run-command: introduce CHILD_PROCESS_INITRené Scharfe
Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt <> Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
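The two initialization styles compared above can be sketched as follows. The struct and macro here (`struct demo_process`, `DEMO_PROCESS_INIT`) are hypothetical and much smaller than the real `struct child_process`; they only illustrate a static initializer macro replacing a memset call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct child_process. */
struct demo_process {
    const char **argv;
    int in;
    int out;
    unsigned no_stdin : 1;
};

/* Static initializer in the spirit of CHILD_PROCESS_INIT (assumed layout). */
#define DEMO_PROCESS_INIT { NULL, 0, 0, 0 }

/* The older style: clear the variable with memset() after declaring it. */
void demo_process_clear(struct demo_process *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
}

/* Returns 1 when every field still has its cleared value. */
int demo_process_is_blank(const struct demo_process *p)
{
    assert(p != NULL);
    return !p->argv && !p->in && !p->out && !p->no_stdin;
}
```

With the macro, `struct demo_process cp = DEMO_PROCESS_INIT;` declares and clears the variable in one statement, so there is no window in which it holds indeterminate values.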
2014-07-16Merge branch 'jk/strip-suffix'Junio C Hamano
* jk/strip-suffix:
  prepare_packed_git_one: refactor duplicate-pack check
  verify-pack: use strbuf_strip_suffix
  strbuf: implement strbuf_strip_suffix
  index-pack: use strip_suffix to avoid magic numbers
  use strip_suffix instead of ends_with in simple cases
  replace has_extension with ends_with
  implement ends_with via strip_suffix
  add strip_suffix function
  sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
2014-06-30use strip_suffix instead of ends_with in simple casesJeff King
When stripping a suffix like:

    if (ends_with(str, "foo"))
        buf = xmemdupz(str, strlen(str) - 3);

we can instead use strip_suffix to avoid the constant 3, which must match the literal "foo" (we sometimes use strlen("foo") instead, but that means we are repeating ourselves). The example above becomes:

    if (strip_suffix(str, "foo", &len))
        buf = xmemdupz(str, len);

This also saves a strlen(), since we calculate the string length when detecting the suffix.

Note that in some cases we also switch from xstrndup to xmemdupz, which saves a further strlen call.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
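For readers without the Git tree at hand, a function with the behavior the message relies on might look like the sketch below. `demo_strip_suffix` is a hypothetical name and this is not Git's implementation; it only assumes the contract implied above: report whether the suffix matches and, if so, store the length of the string without it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: if str ends with suffix, store the length of str
 * minus the suffix in *out_len and return 1. Otherwise return 0 and
 * leave *out_len untouched.
 */
int demo_strip_suffix(const char *str, const char *suffix, size_t *out_len)
{
    size_t len, suflen;

    assert(str != NULL && suffix != NULL && out_len != NULL);
    len = strlen(str);
    suflen = strlen(suffix);
    if (len < suflen || memcmp(str + len - suflen, suffix, suflen) != 0)
        return 0;
    *out_len = len - suflen;
    return 1;
}
```

Because the suffix length is computed inside the helper, the caller never has to write a magic number that must stay in sync with the literal.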
2014-06-25Merge branch 'jk/repack-pack-writebitmaps-config'Junio C Hamano
* jk/repack-pack-writebitmaps-config:
  t7700: drop explicit --no-pack-kept-objects from .keep test
  repack: introduce repack.writeBitmaps config option
  repack: simplify handling of --write-bitmap-index
  pack-objects: stop respecting pack.writebitmaps
2014-06-25Merge branch 'jk/repack-pack-keep-objects'Junio C Hamano
Recent updates to "git repack" started to duplicate objects that are in packfiles marked with .keep flag into the new packfile by mistake.

* jk/repack-pack-keep-objects:
  repack: s/write_bitmap/&s/ in code
  repack: respect pack.writebitmaps
  repack: do not accidentally pack kept objects by default
2014-06-10repack: introduce repack.writeBitmaps config optionJeff King
We currently have pack.writeBitmaps, which originally operated at the pack-objects level. This should really have been a repack.* option from day one. Let's give it the more sensible name, but keep the old version as a deprecated synonym. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2014-06-10repack: simplify handling of --write-bitmap-indexJeff King
We previously needed to pass --no-write-bitmap-index explicitly to pack-objects to override its reading of pack.writebitmaps from the config. Now that it no longer does so, we can assume that bitmaps are off by default, and only turn them on when necessary. This also lets us avoid a confusing tri-state flag for write_bitmaps. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2014-06-10repack: s/write_bitmap/&s/ in codeJeff King
The config name is "writeBitmaps", so the internal variable missing the plural is unnecessarily confusing to write. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2014-06-10repack: respect pack.writebitmapsJeff King
The config option to turn on bitmaps is read all the way down in the plumbing of pack-objects. This makes it hard for other options in the porcelain of repack to make decisions based on the bitmap setting. For example, repack.packKeptObjects tries to kick in by default only when bitmaps are turned on. But it can't do so reliably because it doesn't yet know whether we are using bitmaps. This patch teaches repack to respect pack.writebitmaps. It means we pass a redundant command-line flag to pack-objects, but that's OK; it shouldn't affect the outcome. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2014-06-10repack: do not accidentally pack kept objects by defaultJeff King
Commit ee34a2b (repack: add `repack.packKeptObjects` config var, 2014-03-03) added a flag which could duplicate kept objects, but did not mean to turn it on by default. Instead, the option is tied by default to the decision to write bitmaps, like:

    if (pack_kept_objects < 0)
        pack_kept_objects = write_bitmap;

after which we expect pack_kept_objects to be a boolean 0 or 1. However, that assignment neglects that write_bitmap is _also_ a tri-state with "-1" as the default, and with neither option given, we accidentally turn the option on.

This patch is the minimal fix to restore the desired behavior for the default state. Further patches will fix the more complicated cases.

Note the update to t7700. It failed to turn on bitmaps, meaning we were actually confirming the wrong behavior!

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
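The tri-state mix-up described above is easy to reproduce in isolation. The sketch below uses hypothetical helper names and assumes the convention from the message (-1 means "not specified", 0 off, 1 on). The "fixed" variant shows one way to restore the default and is not claimed to be the exact change in the patch.

```c
#include <assert.h>

/* Buggy resolution: copies the tri-state, so -1 ("unset") ends up truthy. */
int demo_resolve_buggy(int pack_kept_objects, int write_bitmap)
{
    if (pack_kept_objects < 0)
        pack_kept_objects = write_bitmap;
    return pack_kept_objects != 0; /* how a boolean test would read it */
}

/* One possible fix: collapse write_bitmap to a real boolean first. */
int demo_resolve_fixed(int pack_kept_objects, int write_bitmap)
{
    assert(pack_kept_objects >= -1 && pack_kept_objects <= 1);
    if (pack_kept_objects < 0)
        pack_kept_objects = write_bitmap > 0;
    return pack_kept_objects;
}
```

With both inputs left at -1, the buggy helper reports the option as on and the fixed helper reports it as off. An explicit 0 for pack_kept_objects stays off even when bitmaps are requested.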
2014-03-18Merge branch 'jk/repack-pack-keep-objects'Junio C Hamano
* jk/repack-pack-keep-objects: repack: add `repack.packKeptObjects` config var
2014-03-03repack: add `repack.packKeptObjects` config varJeff King
The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed.

Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit).
  2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs.

In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them.

This patch introduces both a command-line and config option to disable the `--honor-pack-keep` option. By default, it is triggered when pack.writeBitmaps (or `--write-bitmap-index`) is turned on, but specifying it explicitly can override the behavior (e.g., in cases where you prefer .keep files to bitmaps, but only when they are present).

Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2014-02-27Merge branch 'jk/pack-bitmap'Junio C Hamano
Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-05Merge branch 'tb/repack-fix-renames' (early part)Junio C Hamano
Finishing touches to the "rewrite repack in C" series. * 'tb/repack-fix-renames' (early part): repack.c: rename and unlink pack file if it exists
2014-02-05repack.c: rename and unlink pack file if it existsTorsten Bögershausen
When a repo was fully repacked, and is repacked again, we may run into the situation that "new" packfiles have the same name as already existing ones (traditionally packfiles have been named after the list of names of objects in them, so repacking all the objects in a single pack would have produced a packfile with the same name). The logic is to rename the existing ones to filenames like "old-XXX", create the new ones and then remove the "old-" ones. When something goes wrong in the middle, this sequence is rolled back by renaming the "old-" files back. The renaming into "old-" did not work as intended, because file_exists() was done on "XXX", not "pack-XXX". Also when rolling back the change, the code tried to rename "old-pack-XXX" but the saved ones are named "old-XXX", so this couldn't have worked. Signed-off-by: Torsten Bögershausen <> Signed-off-by: Junio C Hamano <>
2014-01-27Merge branch 'sb/repack-in-c'Junio C Hamano
"git repack --max-pack-size=8g" stopped being parsed correctly when the command was reimplemented in C.

* sb/repack-in-c:
  repack: propagate pack-objects options as strings
  repack: make parsed string options const-correct
  repack: fix typo in max-pack-size option
2014-01-23repack: propagate pack-objects options as stringsJeff King
In the original shell version of git-repack, any options destined for pack-objects were left as strings, and passed as a whole. Since the C rewrite in commit a1bbc6c (repack: rewrite the shell script in C, 2013-09-15), we now parse these values to integers internally, then reformat the integers when passing the option to pack-objects.

This has the advantage that we catch format errors earlier (i.e., when repack is invoked, rather than when pack-objects is invoked).

It has three disadvantages, though:

  1. Our internal data types may not be the right size. In the case of "--window-memory" and "--max-pack-size", these are "unsigned long" in pack-objects, but we can only represent a regular "int".
  2. Our parsing routines might not be the same as those of pack-objects. For the two options above, pack-objects understands "100m" to mean "100 megabytes", but repack does not.
  3. We have to keep a sentinel value to know whether it is worth passing the option along. In the case of "--window-memory", we currently do not pass it if the value is "0". But that is a meaningful value to pack-objects, where it overrides any configured value.

We can fix all of these by simply passing the strings from the user along to pack-objects verbatim. This does not actually fix anything for "--depth" or "--window", but these are converted, too, for consistency.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
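Passing the user's string through untouched can be sketched like this. `demo_forward_option` is a hypothetical helper, not code from repack.c. It assumes that a NULL pointer means "option not given", so no numeric sentinel is needed and a literal "0" or "100m" reaches the child unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "--<name>=<value>" into buf and returns 1
 * when the user supplied a value. Returns 0 when value is NULL (option
 * not given) or when the result would not fit in buf.
 */
int demo_forward_option(char *buf, size_t size, const char *name, const char *value)
{
    int n;

    assert(buf != NULL && name != NULL);
    if (!value)
        return 0;
    n = snprintf(buf, size, "--%s=%s", name, value);
    return n >= 0 && (size_t)n < size;
}
```

Since the value is never converted to an integer, the size of the internal type and the unit suffixes understood by the parser stop mattering on the forwarding side; the receiving program alone decides what "100m" means.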
2014-01-23repack: make parsed string options const-correctJeff King
When we use OPT_STRING to parse an option, we get back a pointer into the argv array, which should be "const char *". The compiler doesn't notice because it gets passed through a "void *" in the option struct. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2014-01-23repack: fix typo in max-pack-size optionJeff King
When we see "--max-pack-size", we accidentally propagated this to pack-objects as "--max_pack_size", which does not work at all. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-12-30repack: consider bitmaps when performing repacksVicent Marti
Since `pack-objects` will write a `.bitmap` file next to the `.pack` and `.idx` files, this commit teaches `git-repack` to consider the new bitmap indexes (if they exist) when performing repack operations. This implies moving old bitmap indexes out of the way if we are repacking a repository that already has them, and moving the newly generated bitmap indexes into the `objects/pack` directory, next to their corresponding packfiles. Since `git repack` is now capable of handling these `.bitmap` files, a normal `git gc` run on a repository that has `pack.writebitmaps` set to true in its config file will generate bitmap indexes as part of the garbage collection process. Alternatively, `git repack` can be called with the `-b` switch to explicitly generate bitmap indexes if you are experimenting and don't want them on all the time. Signed-off-by: Vicent Marti <> Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-12-30repack: handle optional files created by pack-objectsJeff King
We ask pack-objects to pack to a set of temporary files, and then rename them into place. Some files that pack-objects creates may be optional (like a .bitmap file), in which case we would not want to call rename(). We already call stat() and make the chmod optional if the file cannot be accessed. We could simply skip the rename step in this case, but that would be a minor regression in noticing problems with non-optional files (like the .pack and .idx files). Instead, we can now annotate extensions as optional, and skip them if they don't exist (and otherwise rely on rename() to barf). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
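The idea of annotating extensions as optional can be sketched with an array of structs. All names here (`struct demo_ext`, `demo_exts`, `demo_plan_renames`) are hypothetical, and the sketch reports a missing required file itself instead of relying on rename() to fail as the message describes. The caller supplies the "file exists" flags so the example does not touch the filesystem.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical annotated extension table. */
struct demo_ext {
    const char *name;
    int optional;
};

static const struct demo_ext demo_exts[] = {
    { ".pack", 0 },
    { ".idx", 0 },
    { ".bitmap", 1 },
};

/*
 * exists[i] says whether the temporary file for demo_exts[i] is present.
 * Returns the number of files that would be renamed, or -1 when a
 * non-optional file is missing.
 */
int demo_plan_renames(const int *exists, size_t n)
{
    size_t i;
    int renames = 0;

    assert(exists != NULL && n == DEMO_ARRAY_SIZE(demo_exts));
    for (i = 0; i < DEMO_ARRAY_SIZE(demo_exts); i++) {
        if (exists[i])
            renames++;
        else if (!demo_exts[i].optional)
            return -1;
    }
    return renames;
}
```

Deriving the loop bound from the array with a size macro means a new extension is added in exactly one place, which is the same motivation given for dropping the hardcoded array size.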
2013-12-30repack: turn exts array into array-of-structJeff King
This is slightly more verbose, but will let us annotate the extensions with further options in future commits. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-12-30repack: stop using magic number for ARRAY_SIZE(exts)Jeff King
We have a static array of extensions, but hardcode the size of the array in our loops. Let's pull out this magic number, which will make it easier to change. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-12-17Sync with C Hamano
* maint: Git cmd_repack(): remove redundant local variable "nr_packs"
2013-12-17cmd_repack(): remove redundant local variable "nr_packs"Michael Haggerty
Its value is the same as the number of entries in the "names" string_list, so just use "" in its place. Signed-off-by: Michael Haggerty <> Acked-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Christian Couder
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <>
Signed-off-by: Junio C Hamano <>
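The negations in the substitution follow from the two return conventions: a cmp-style function returns 0 on a match, while a predicate returns nonzero on a match. The sketch below uses hypothetical `demo_` functions built on the standard library, not Git's own definitions, to show why `!prefixcmp(...)` maps to `starts_with(...)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* cmp-style stand-in: 0 means "str begins with prefix", like strcmp(). */
int demo_prefixcmp(const char *str, const char *prefix)
{
    assert(str != NULL && prefix != NULL);
    return strncmp(str, prefix, strlen(prefix));
}

/* Predicate-style stand-in: 1 means "str begins with prefix". */
int demo_starts_with(const char *str, const char *prefix)
{
    assert(str != NULL && prefix != NULL);
    return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Predicate-style stand-in: 1 means "str ends with suffix". */
int demo_ends_with(const char *str, const char *suffix)
{
    size_t len, suflen;

    assert(str != NULL && suffix != NULL);
    len = strlen(str);
    suflen = strlen(suffix);
    return len >= suflen && strcmp(str + len - suflen, suffix) == 0;
}
```

For any inputs, `!demo_prefixcmp(s, p)` and `demo_starts_with(s, p)` agree, which is the equivalence the mechanical rewrite depends on.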
2013-09-17repack: improve warnings about failure of renaming and removing filesStefan Beller
Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2013-09-17repack: retain the return value of pack-objectsStefan Beller
During the review process of the previous commit (repack: rewrite the shell script in C), Johannes Sixt proposed retaining any exit codes from the sub-process, which probably makes failures more obvious. As the previous commit should behave as closely as possible to the original shell script, the proposed change is put in this extra commit. The infrastructure, however, was already set up in the previous commit (a local 'ret' variable). Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2013-09-17repack: rewrite the shell script in CStefan Beller
The motivation of this patch is to get closer to a goal of being able to have a core subset of git functionality built in to git. That would mean

 * people on Windows could get a copy of at least the core parts of Git without having to install a Unix-style shell

 * people using git on servers with chrooted environments do not need to worry about standard tools lacking for shell scripts.

This patch is meant to be mostly a literal translation of the git-repack script; the intent is that later patches would start using more library facilities, but this patch is meant to be as close to a no-op as possible so it doesn't do that kind of thing.

Signed-off-by: Stefan Beller <>
Signed-off-by: Junio C Hamano <>