summaryrefslogtreecommitdiff
path: root/builtin/pack-redundant.c
AgeCommit message (Collapse)Author
2021-04-27builtin/pack-redundant: avoid casting buffers to struct object_idbrian m. carlson
Now that we need our instances of struct object_id to be zero padded, we can no longer cast unsigned char buffers to be pointers to struct object_id. This file reads data out of the pack objects and then inserts it directly into a linked list item which is a pointer to struct object_id. Instead, let's have the linked list item hold its own struct object_id and copy the data into it. In addition, since these are not really pointers to struct object_id, stop passing them around as such, and call them what they really are: pointers to unsigned char. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-14use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25Merge branch 'jc/deprecate-pack-redundant'Junio C Hamano
Warn loudly when the "pack-redundant" command, which has been left stale with almost unusable performance issues, gets used, as we no longer want to recommend its use (instead just "repack -d" instead). * jc/deprecate-pack-redundant: pack-redundant: gauge the usage before proposing its removal
2020-12-23Merge branch 'jx/pack-redundant-on-single-pack'Junio C Hamano
"git pack-redandant" when there is only one packfile used to crash, which has been corrected. * jx/pack-redundant-on-single-pack: pack-redundant: fix crash when one packfile in repo
2020-12-17pack-redundant: fix crash when one packfile in repoJiang Xin
Command `git pack-redundant --all` will crash if there is only one packfile in the repository. This is because, if there is only one packfile in local_packs, `cmp_local_packs` will do nothing and will leave `pl->unique_objects` as uninitialized. Also add testcases for repository with no packfile and one packfile in t5323. Reported-by: Daniel C. Klauer <daniel.c.klauer@web.de> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-15pack-redundant: gauge the usage before proposing its removalJunio C Hamano
The subcommand is unusably slow and the reason why nobody reports it as a performance bug is suspected to be the absense of users. Let's show a big message that asks the user to tell us that they still care about the command when an attempt is made to run the command, with an escape hatch to override it with a command line option. In a few releases, we may turn it into an error and keep it for a few more releases before finally removing it (during the whole time, the plan to remove it would be interrupted by end user raising hand). Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16use size_t to store pack .idx byte offsetsJeff King
We sometimes store the offset into a pack .idx file as an "unsigned long", but the mmap'd size of a pack .idx file can exceed 4GB. This is sufficient on LP64 systems like Linux, but will be too small on LLP64 systems like Windows, where "unsigned long" is still only 32 bits. Let's use size_t, which is a better type for an offset into a memory buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01object-store: rename and expand packed_git's sha1 memberbrian m. carlson
This member is used to represent the pack checksum of the pack in question. Expand this member to be GIT_MAX_RAWSZ bytes in length so it works with longer hashes and rename it to be "hash" instead of "sha1". This transformation was made with a change to the definition and the following semantic patch: @@ struct packed_git *E1; @@ - E1->sha1 + E1->hash @@ struct packed_git E1; @@ - E1.sha1 + E1.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07Merge branch 'sc/pack-redundant'Junio C Hamano
Update the implementation of pack-redundant for performance in a repository with many packfiles. * sc/pack-redundant: pack-redundant: consistent sort method pack-redundant: rename pack_list.all_objects pack-redundant: new algorithm to find min packs pack-redundant: delete redundant code pack-redundant: delay creation of unique_objects t5323: test cases for git-pack-redundant
2019-02-04pack-redundant: consistent sort methodJiang Xin
SZEDER reported that test case t5323 has different test result on MacOS. This is because `cmp_pack_list_reverse` cannot give identical result when two pack being sorted has the same size of remaining_objects. Changes to the sorting function will make consistent test result for t5323. The new algorithm to find redundant packs is a trade-off to save memory resources, and the result of it may be different with old one, and may be not the best result sometimes. Update t5323 for the new algorithm. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04pack-redundant: rename pack_list.all_objectsJiang Xin
New algorithm uses `pack_list.all_objects` to track remaining objects, so rename it to `pack_list.remaining_objects`. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04pack-redundant: new algorithm to find min packsSun Chao
When calling `git pack-redundant --all`, if there are too many local packs and too many redundant objects within them, the too deep iteration of `get_permutations` will exhaust all the resources, and the process of `git pack-redundant` will be killed. The following script could create a repository with too many redundant packs, and running `git pack-redundant --all` in the `test.git` repo will die soon. #!/bin/sh repo="$(pwd)/test.git" work="$(pwd)/test" i=1 max=199 if test -d "$repo" || test -d "$work"; then echo >&2 "ERROR: '$repo' or '$work' already exist" exit 1 fi git init -q --bare "$repo" git --git-dir="$repo" config gc.auto 0 git --git-dir="$repo" config transfer.unpackLimit 0 git clone -q "$repo" "$work" 2>/dev/null while :; do cd "$work" echo "loop $i: $(date +%s)" >$i git add $i git commit -q -sm "loop $i" git push -q origin HEAD:master printf "\rCreate pack %4d/%d\t" $i $max if test $i -ge $max; then break; fi cd "$repo" git repack -q if test $(($i % 2)) -eq 0; then git repack -aq pack=$(ls -t $repo/objects/pack/*.pack | head -1) touch "${pack%.pack}.keep" fi i=$((i+1)) done printf "\ndone\n" To get the `min` unique pack list, we can replace the iteration in `minimize` function with a new algorithm, and this could solve this issue: 1. Get the unique and non_uniqe packs, add the unique packs to the `min` list. 2. Remove the objects of unique packs from non_unique packs, then each object left in the non_unique packs will have at least two copies. 3. Sort the non_unique packs by the objects' size, more objects first, and add the first non_unique pack to `min` list. 4. Drop the duplicated objects from other packs in the ordered non_unique pack list, and repeat step 3. Some test cases will fail on Mac OS X. Mark them and will resolve in later commit. Original PR and discussions: https://github.com/jiangxin/git/pull/25 Signed-off-by: Sun Chao <sunchao9@huawei.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04pack-redundant: delete redundant codeSun Chao
The objects in alt-odb are removed from `all_objects` twice in `load_all_objects` and `scan_alt_odb_packs`, remove it from the later function. Signed-off-by: Sun Chao <sunchao9@huawei.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04pack-redundant: delay creation of unique_objectsJiang Xin
Instead of initializing unique_objects in `add_pack()`, copy from all_objects in `cmp_two_packs()`, when unwanted objects are removed from all_objects. This will save memory (no allocate memory for alt-odb packs), and run `llist_sorted_difference_inplace()` only once when removing ignored objects and removing objects in alt-odb in `scan_alt_odb_packs()`. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04various: tighten constness of some local variablesShahzad Lone
Signed-off-by: Shahzad Lone <shahzadlone@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20treewide: use get_all_packsDerrick Stolee
There are many places in the codebase that want to iterate over all packfiles known to Git. The purposes are wide-ranging, and those that can take advantage of the multi-pack-index already do. So, use get_all_packs() instead of get_packed_git() to be sure we are iterating over all packfiles. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02pack-redundant: convert linked lists to use struct object_idbrian m. carlson
Convert struct llist_item and the rest of the linked list code to use struct object_id. Add a use of GIT_MAX_HEXSZ to avoid a dependency on a hard-coded constant. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02pack-redundant: abstract away hash algorithmbrian m. carlson
Instead of using hard-coded instances of the constant 20, use the_hash_algo to look up the correct constant. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11Merge branch 'sb/packfiles-in-repository'Junio C Hamano
Refactoring of the internal global data structure continues. * sb/packfiles-in-repository: packfile: keep prepare_packed_git() private packfile: allow find_pack_entry to handle arbitrary repositories packfile: add repository argument to find_pack_entry packfile: allow reprepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_one to handle arbitrary repositories packfile: add repository argument to reprepare_packed_git packfile: add repository argument to prepare_packed_git packfile: add repository argument to prepare_packed_git_one packfile: allow install_packed_git to handle arbitrary repositories packfile: allow rearrange_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_mru to handle arbitrary repositories
2018-04-11Merge branch 'sb/object-store'Junio C Hamano
Refactoring the internal global data structure to make it possible to open multiple repositories, work with and then close them. Rerolled by Duy on top of a separate preliminary clean-up topic. The resulting structure of the topics looked very sensible. * sb/object-store: (27 commits) sha1_file: allow sha1_loose_object_info to handle arbitrary repositories sha1_file: allow map_sha1_file to handle arbitrary repositories sha1_file: allow map_sha1_file_1 to handle arbitrary repositories sha1_file: allow open_sha1_file to handle arbitrary repositories sha1_file: allow stat_sha1_file to handle arbitrary repositories sha1_file: allow sha1_file_name to handle arbitrary repositories sha1_file: add repository argument to sha1_loose_object_info sha1_file: add repository argument to map_sha1_file sha1_file: add repository argument to map_sha1_file_1 sha1_file: add repository argument to open_sha1_file sha1_file: add repository argument to stat_sha1_file sha1_file: add repository argument to sha1_file_name sha1_file: allow prepare_alt_odb to handle arbitrary repositories sha1_file: allow link_alt_odb_entries to handle arbitrary repositories sha1_file: add repository argument to prepare_alt_odb sha1_file: add repository argument to link_alt_odb_entries sha1_file: add repository argument to read_info_alternates sha1_file: add repository argument to link_alt_odb_entry sha1_file: add raw_object_store argument to alt_odb_usable pack: move approximate object count to object store ...
2018-03-26packfile: keep prepare_packed_git() privateNguyễn Thái Ngọc Duy
The reason callers have to call this is to make sure either packed_git or packed_git_mru pointers are initialized since we don't do that by default. Sometimes it's hard to see this connection between where the function is called and where packed_git pointer is used (sometimes in separate functions). Keep this dependency internal because now all access to packed_git and packed_git_mru must go through get_xxx() wrappers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26packfile: add repository argument to prepare_packed_gitStefan Beller
Add a repository argument to allow prepare_packed_git callers to be more specific about which repository to handle. See commit "sha1_file: add repository argument to link_alt_odb_entry" for an explanation of the #define trick. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26object-store: move packed_git and packed_git_mru to object storeStefan Beller
In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-22pack-redundant: rename 'new' variablesBrandon Williams
Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23pack: move pack name-related functionsJonathan Tan
Currently, sha1_file.c and cache.h contain many functions, both related to and unrelated to packfiles. This makes both files very large and causes an unclear separation of concerns. Create a new file, packfile.c, to hold all packfile-related functions currently in sha1_file.c. It has a corresponding header packfile.h. In this commit, the pack name-related functions are moved. Subsequent commits will move the other functions. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08pack-redundant: plug memory leakJohannes Schindelin
Identified via Coverity. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22convert trivial cases to ALLOC_ARRAYJeff King
Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-14standardize usage info string formatAlex Henrie
This patch puts the usage info strings that were not already in docopt- like format into docopt-like format, which will be a litle easier for end users and a lot easier for translators. Changes include: - Placing angle brackets around fill-in-the-blank parameters - Putting dashes in multiword parameter names - Adding spaces to [-f|--foobar] to make [-f | --foobar] - Replacing <foobar>* with [<foobar>...] Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-13Fix sizeof usage in get_permutationsMatthew Daley
Currently it gets the size of an otherwise unrelated, unused variable instead of the expected struct size. Signed-off-by: Matthew Daley <mattjd@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22Fix sparse warningsStephen Boyd
Fix warnings from 'make check'. - These files don't include 'builtin.h' causing sparse to complain that cmd_* isn't declared: builtin/clone.c:364, builtin/fetch-pack.c:797, builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78, builtin/merge-index.c:69, builtin/merge-recursive.c:22 builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426 builtin/notes.c:822, builtin/pack-redundant.c:596, builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149, builtin/remote.c:1512, builtin/remote-ext.c:240, builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384, builtin/unpack-file.c:25, builtin/var.c:75 - These files have symbols which should be marked static since they're only file scope: submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13, submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79, unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123, url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48 - These files redeclare symbols to be different types: builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571, usage.c:49, usage.c:58, usage.c:63, usage.c:72 - These files use a literal integer 0 when they really should use a NULL pointer: daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362 While we're in the area, clean up some unused #includes in builtin files (mostly exec_cmd.h). Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22Move 'builtin-*' into a 'builtin/' subdirectoryLinus Torvalds
This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '*.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editory, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>