summaryrefslogtreecommitdiff
path: root/builtin/gc.c
AgeCommit message (Collapse)Author
2021-09-23Merge branch 'ab/gc-remove-unused-call'Junio C Hamano
Code clean-up. * ab/gc-remove-unused-call: gc: remove unused launchctl_get_uid() call
2021-09-20Merge branch 'js/run-command-close-packs'Junio C Hamano
The run-command API has been updated so that the callers can easily ask the file descriptors open for packfiles to be closed immediately before spawning commands that may trigger auto-gc. * js/run-command-close-packs: Close object store closer to spawning child processes run_auto_maintenance(): implicitly close the object store run-command: offer to close the object store before running run-command: prettify the `RUN_COMMAND_*` flags pull: release packs before fetching commit-graph: when closing the graph, also release the slab
2021-09-20Merge branch 'lh/systemd-timers'Junio C Hamano
"git maintenance" scheduler learned to use systemd timers as a possible backend. * lh/systemd-timers: maintenance: add support for systemd timers on Linux maintenance: `git maintenance run` learned `--scheduler=<scheduler>` cache.h: Introduce a generic "xdg_config_home_for(…)" function
2021-09-12gc: remove unused launchctl_get_uid() callÆvar Arnfjörð Bjarmason
When the launchctl_boot_plist() function was added in a16eb6b1ff3 (maintenance: skip bootout/bootstrap when plist is registered, 2021-08-24), an unused call to launchctl_get_uid() was added along with it. That call appears to have been copy/pasted from launchctl_boot_plist(). Since we can remove that, we can also get rid of the "result" variable, whose only purpose was allow for the free() between its assignment and the return. That pattern also appears to have been copy/pasted from launchctl_boot_plist(). As the patch shows the returned value from launchctl_get_uid() wasn't used at all in this function. The launchctl_get_uid() function itself just calls xstrfmt() and getuid(), neither of which have any subtle global side-effects, so this removal is safe. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-10Merge branch 'ab/gc-log-rephrase'Junio C Hamano
A pathname in an advice message has been made cut-and-paste ready. * ab/gc-log-rephrase: gc: remove trailing dot from "gc.log" line
2021-09-09Close object store closer to spawning child processesJohannes Schindelin
In many cases where we spawned child processes that _may_ trigger a repack, we explicitly closed the object store first (so that the `repack` process can delete the `.pack` files, which would otherwise not be possible on Windows since files cannot be deleted as long as they as still in use). Wherever possible, we now use the new `close_object_store` bit of the `run_command()` API, to delay closing the object store even further. This makes the code easier to maintain because it is now more obvious that we only release those file handles because of those child processes. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08Merge branch 'js/maintenance-launchctl-fix'Junio C Hamano
"git maintenance" scheduler fix for macOS. * js/maintenance-launchctl-fix: maintenance: skip bootout/bootstrap when plist is registered maintenance: create `launchctl` configuration using a lock file
2021-09-07maintenance: add support for systemd timers on LinuxLénaïc Huard
The existing mechanism for scheduling background maintenance is done through cron. On Linux systems managed by systemd, systemd provides an alternative to schedule recurring tasks: systemd timers. The main motivations to implement systemd timers in addition to cron are: * cron is optional and Linux systems running systemd might not have it installed. * The execution of `crontab -l` can tell us if cron is installed but not if the daemon is actually running. * With systemd, each service is run in its own cgroup and its logs are tagged by the service inside journald. With cron, all scheduled tasks are running in the cron daemon cgroup and all the logs of the user-scheduled tasks are pretended to belong to the system cron service. Concretely, a user that doesn’t have access to the system logs won’t have access to the log of their own tasks scheduled by cron whereas they will have access to the log of their own tasks scheduled by systemd timer. Although `cron` attempts to send email, that email may go unseen by the user because these days, local mailboxes are not heavily used anymore. In order to schedule git maintenance, we need two unit template files: * ~/.config/systemd/user/git-maintenance@.service to define the command to be started by systemd and * ~/.config/systemd/user/git-maintenance@.timer to define the schedule at which the command should be run. Those units are templates that are parameterized by the frequency. Based on those templates, 3 timers are started: * git-maintenance@hourly.timer * git-maintenance@daily.timer * git-maintenance@weekly.timer The command launched by those three timers are the same as with the other scheduling methods: /path/to/git for-each-repo --exec-path=/path/to --config=maintenance.repo maintenance run --schedule=%i with the full path for git to ensure that the version of git launched for the scheduled maintenance is the same as the one used to run `maintenance start`. The timer unit contains `Persistent=true` so that, if the computer is powered down when a maintenance task should run, the task will be run when the computer is back powered on. Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07maintenance: `git maintenance run` learned `--scheduler=<scheduler>`Lénaïc Huard
Depending on the system, different schedulers can be used to schedule the hourly, daily and weekly executions of `git maintenance run`: * `launchctl` for MacOS, * `schtasks` for Windows and * `crontab` for everything else. `git maintenance run` now has an option to let the end-user explicitly choose which scheduler he wants to use: `--scheduler=auto|crontab|launchctl|schtasks`. When `git maintenance start --scheduler=XXX` is run, it not only registers `git maintenance run` tasks in the scheduler XXX, it also removes the `git maintenance run` tasks from all the other schedulers to ensure we cannot have two schedulers launching concurrent identical tasks. The default value is `auto` which chooses a suitable scheduler for the system. `git maintenance stop` doesn't have any `--scheduler` parameter because this command will try to remove the `git maintenance run` tasks from all the available schedulers. Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-02gc: remove trailing dot from "gc.log" lineÆvar Arnfjörð Bjarmason
Remove the trailing dot from the warning we emit about gc.log. It's common for various terminal UX's to allow the user to select "words", and by including the trailing dot a user wanting to select the path to gc.log will need to manually remove the trailing dot. Such a user would also probably need to adjust the path if it e.g. had spaces in it, but this should address this very common case. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Suggested-by: Jan Judas <snugar.i@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24maintenance: skip bootout/bootstrap when plist is registeredDerrick Stolee
On macOS, we use launchctl to manage the background maintenance schedule. This uses a set of .plist files to describe the schedule, but these files are also registered with 'launchctl bootstrap'. If multiple 'git maintenance start' commands run concurrently, then they can collide replacing these schedule files and registering them with launchctl. To avoid extra launchctl commands, do a check for the .plist files on disk and check if they are registered using 'launchctl list <name>'. This command will return with exit code 0 if it exists, or exit code 113 if it does not. We can test this behavior using the GIT_TEST_MAINT_SCHEDULER environment variable. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24maintenance: create `launchctl` configuration using a lock fileJohannes Schindelin
When two `git maintenance` processes try to write the `.plist` file, we need to help them with serializing their efforts. The 150ms time-out value was determined from thin air. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26Use a better name for the function interpolating pathsJohannes Schindelin
It is not immediately clear what `expand_user_path()` means, so let's rename it to `interpolate_path()`. This also opens the path for interpolating more than just a home directory. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11maintenance: fix two memory leaksLénaïc Huard
Fixes two memory leaks when running `git maintenance start` or `git maintenance stop` in `update_background_schedule`: $ valgrind --leak-check=full ~/git/bin/git maintenance start ==76584== Memcheck, a memory error detector ==76584== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==76584== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info ==76584== Command: /home/lenaic/git/bin/git maintenance start ==76584== ==76584== ==76584== HEAP SUMMARY: ==76584== in use at exit: 34,880 bytes in 252 blocks ==76584== total heap usage: 820 allocs, 568 frees, 146,414 bytes allocated ==76584== ==76584== 65 bytes in 1 blocks are definitely lost in loss record 17 of 39 ==76584== at 0x483E6AF: malloc (vg_replace_malloc.c:306) ==76584== by 0x3DC39C: xrealloc (wrapper.c:126) ==76584== by 0x3992CC: strbuf_grow (strbuf.c:98) ==76584== by 0x39A473: strbuf_vaddf (strbuf.c:392) ==76584== by 0x39BC54: xstrvfmt (strbuf.c:979) ==76584== by 0x39BD2C: xstrfmt (strbuf.c:989) ==76584== by 0x18451B: update_background_schedule (gc.c:1977) ==76584== by 0x1846F6: maintenance_start (gc.c:2011) ==76584== by 0x1847B4: cmd_maintenance (gc.c:2030) ==76584== by 0x127A2E: run_builtin (git.c:453) ==76584== by 0x127E81: handle_builtin (git.c:704) ==76584== by 0x128142: run_argv (git.c:771) ==76584== ==76584== 240 bytes in 1 blocks are definitely lost in loss record 29 of 39 ==76584== at 0x4840D7B: realloc (vg_replace_malloc.c:834) ==76584== by 0x491CE5D: getdelim (in /usr/lib/libc-2.33.so) ==76584== by 0x39ADD7: strbuf_getwholeline (strbuf.c:635) ==76584== by 0x39AF31: strbuf_getdelim (strbuf.c:706) ==76584== by 0x39B064: strbuf_getline_lf (strbuf.c:727) ==76584== by 0x184273: crontab_update_schedule (gc.c:1919) ==76584== by 0x184678: update_background_schedule (gc.c:1997) ==76584== by 0x1846F6: maintenance_start (gc.c:2011) ==76584== by 0x1847B4: cmd_maintenance (gc.c:2030) ==76584== by 0x127A2E: run_builtin (git.c:453) ==76584== by 0x127E81: handle_builtin (git.c:704) ==76584== by 0x128142: run_argv (git.c:771) ==76584== ==76584== LEAK SUMMARY: ==76584== definitely lost: 305 bytes in 2 blocks ==76584== indirectly lost: 0 bytes in 0 blocks ==76584== possibly lost: 0 bytes in 0 blocks ==76584== still reachable: 34,575 bytes in 250 blocks ==76584== suppressed: 0 bytes in 0 blocks ==76584== Reachable blocks (those to which a pointer was found) are not shown. ==76584== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==76584== ==76584== For lists of detected and suppressed errors, rerun with: -s ==76584== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0) Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr> Acked-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-16maintenance: respect remote.*.skipFetchAllDerrick Stolee
If a remote has the skipFetchAll setting enabled, then that remote is not intended for frequent fetching. It makes sense to not fetch that data during the 'prefetch' maintenance task. Skip that remote in the iteration without error. The skip_default_update member is initialized in remote.c:handle_config() as part of initializing the 'struct remote'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-16maintenance: use 'git fetch --prefetch'Derrick Stolee
The 'prefetch' maintenance task previously forced the following refspec for each remote: +refs/heads/*:refs/prefetch/<remote>/* If a user has specified a more strict refspec for the remote, then this prefetch task downloads more objects than necessary. The previous change introduced the '--prefetch' option to 'git fetch' which manipulates the remote's refspec to place all resulting refs into refs/prefetch/, with further partitioning based on the destinations of those refspecs. Update the documentation to be more generic about the destination refs. Do not mention custom refspecs explicitly, as that does not need to be highlighted in this documentation. The important part of placing refs in refs/prefetch/ remains. Reported-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-06maintenance: simplify prefetch logicDerrick Stolee
The previous logic filled a string list with the names of each remote, but instead we could simply run the appropriate 'git fetch' data directly in the remote iterator. Do this for reduced code size, but also because it sets up an upcoming change to use the remote's refspec. This data is accessible from the 'struct remote' data that is now accessible in fetch_remote(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23maintenance: fix incorrect `maintenance.repo` path with bare repositoryEric Sunshine
The periodic maintenance tasks configured by `git maintenance start` invoke `git for-each-repo` to run `git maintenance run` on each path specified by the multi-value global configuration variable `maintenance.repo`. Because `git for-each-repo` will likely be run outside of the repositories which require periodic maintenance, it is mandatory that the repository paths specified by `maintenance.repo` are absolute. Unfortunately, however, `git maintenance register` does nothing to ensure that the paths it assigns to `maintenance.repo` are indeed absolute, and may in fact -- especially in the case of a bare repository -- assign a relative path to `maintenance.repo` instead. Fix this problem by converting all paths to absolute before assigning them to `maintenance.repo`. While at it, also fix `git maintenance unregister` to convert paths to absolute, as well, in order to ensure that it can correctly remove from `maintenance.repo` a path assigned via `git maintenance register`. Reported-by: Clement Moyroud <clement.moyroud@gmail.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10maintenance: incremental strategy runs pack-refs weeklyDerrick Stolee
When the 'maintenance.strategy' config option is set to 'incremental', a default maintenance schedule is enabled. Add the 'pack-refs' task to that strategy at the weekly cadence. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10maintenance: add pack-refs taskDerrick Stolee
It is valuable to collect loose refs into a more compressed form. This is typically the packed-refs file, although this could be the reftable in the future. Having packed refs can be extremely valuable in repos with many tags or remote branches that are not modified by the local user, but still are necessary for other queries. For instance, with many exploded refs, commands such as git describe --tags --exact-match HEAD can be very slow (multiple seconds). This command in particular is used by terminal prompts to show when a detatched HEAD is pointing to an existing tag, so having it be slow causes significant delays for users. Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all --prune' to move loose refs into a packed form. For now, that is the packed-refs file, but could adjust to other file formats in the future. This is the first of several sub-tasks of the 'gc' task that could be extracted to their own tasks. In this process, we should not change the behavior of the 'gc' task since that remains the default way to keep repositories maintained. Creating a new task for one of these sub-tasks only provides more customization options for those choosing to not use the 'gc' task. It is certainly possible to have both the 'gc' and 'pack-refs' tasks enabled and run regularly. While they may repeat effort, they do not conflict in a destructive way. The 'auto_condition' function pointer is left NULL for now. We could extend this in the future to have a condition check if pack-refs should be run during 'git maintenance run --auto'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-03Merge branch 'jk/peel-iterated-oid'Junio C Hamano
The peel_ref() API has been replaced with peel_iterated_oid(). * jk/peel-iterated-oid: refs: switch peel_ref() to peel_iterated_oid()
2021-02-03Merge branch 'ds/maintenance-prefetch-cleanup'Junio C Hamano
Test clean-up plus UI improvement by hiding extra refs that the prefetch task uses from "log --decorate" output. * ds/maintenance-prefetch-cleanup: t7900: clean up some broken refs maintenance: set log.excludeDecoration durin prefetch
2021-01-25Merge branch 'tb/pack-revindex-api'Junio C Hamano
Abstract accesses to in-core revindex that allows enumerating objects stored in a packfile in the order they appear in the pack, in preparation for introducing an on-disk precomputed revindex. * tb/pack-revindex-api: (21 commits) for_each_object_in_pack(): clarify pack vs index ordering pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' pack-revindex: hide the definition of 'revindex_entry' pack-revindex: remove unused 'find_revindex_position()' pack-revindex: remove unused 'find_pack_revindex()' builtin/gc.c: guess the size of the revindex for_each_object_in_pack(): convert to new revindex API unpack_entry(): convert to new revindex API packed_object_info(): convert to new revindex API retry_bad_packed_offset(): convert to new revindex API get_delta_base_oid(): convert to new revindex API rebuild_existing_bitmaps(): convert to new revindex API try_partial_reuse(): convert to new revindex API get_size_by_pos(): convert to new revindex API show_objects_for_type(): convert to new revindex API bitmap_position_packfile(): convert to new revindex API check_object(): convert to new revindex API write_reused_pack_verbatim(): convert to new revindex API write_reused_pack_one(): convert to new revindex API write_reuse_object(): convert to new revindex API ...
2021-01-25Merge branch 'ma/more-opaque-lock-file'Junio C Hamano
Code clean-up. * ma/more-opaque-lock-file: read-cache: try not to peek into `struct {lock_,temp}file` refs/files-backend: don't peek into `struct lock_file` midx: don't peek into `struct lock_file` commit-graph: don't peek into `struct lock_file` builtin/gc: don't peek into `struct lock_file`
2021-01-21refs: switch peel_ref() to peel_iterated_oid()Jeff King
The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char *refname, const struct object_id *oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21maintenance: set log.excludeDecoration durin prefetchDerrick Stolee
The 'prefetch' task fetches refs from all remotes and places them in the refs/prefetch/<remote>/ refspace. As this task is intended to run in the background, this allows users to keep their local data very close to the remote servers' data while not updating the users' understanding of the remote refs in refs/remotes/<remote>/. However, this can clutter 'git log' decorations with copies of the refs with the full name 'refs/prefetch/<remote>/<branch>'. The log.excludeDecoration config option was added in a6be5e67 (log: add log.excludeDecoration config option, 2020-05-16) for exactly this purpose. Ensure we set this only for users that would benefit from it by assigning it at the beginning of the prefetch task. Other alternatives would be during 'git maintenance register' or 'git maintenance start', but those might assign the config even when the prefetch task is disabled by existing config. Further, users could run 'git maintenance run --task=prefetch' using their own scripting or scheduling. This provides the best coverage to automatically update the config when valuable. It is improbable, but possible, that users might want to run the prefetch task _and_ see these refs in their log decorations. This seems incredibly unlikely to me, but users can always opt-in on a command-by-command basis using --decorate-refs=refs/prefetch/. Test that this works in a few cases. In particular, ensure that our assignment of log.excludeDecoration=refs/prefetch/ is additive to other existing exclusions. Further, ensure we do not add multiple copies in multiple runs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-16Merge branch 'ds/maintenance-part-4'Junio C Hamano
Follow-up on the "maintenance part-3" which introduced scheduled maintenance tasks to support platforms whose native scheduling methods are not 'cron'. * ds/maintenance-part-4: maintenance: use Windows scheduled tasks maintenance: use launchctl on macOS maintenance: include 'cron' details in docs maintenance: extract platform-specific scheduling
2021-01-14builtin/gc.c: guess the size of the revindexTaylor Blau
'estimate_repack_memory()' takes into account the amount of memory required to load the reverse index in memory by multiplying the assumed number of objects by the size of the 'revindex_entry' struct. Prepare for hiding the definition of 'struct revindex_entry' by removing a 'sizeof()' of that type from outside of pack-revindex.c. Instead, guess that one off_t and one uint32_t are required per object. Strictly speaking, this is a worse guess than asking for 'sizeof(struct revindex_entry)' directly, since the true size of this struct is 16 bytes with padding on the end of the struct in order to align the offset field. But, this is an approximation anyway, and it does remove a use of the 'struct revindex_entry' from outside of pack-revindex internals. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-06builtin/gc: don't peek into `struct lock_file`Martin Ågren
A `struct lock_file` is pretty much just a wrapper around a tempfile. But it's easy enough to avoid relying on this. Use the wrappers that the lock file API provides rather than peeking at the temp file or even into *its* internals. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-05maintenance: use Windows scheduled tasksDerrick Stolee
Git's background maintenance uses cron by default, but this is not available on Windows. Instead, integrate with Task Scheduler. Tasks can be scheduled using the 'schtasks' command. There are several command-line options that can allow for some advanced scheduling, but unfortunately these seem to all require authenticating using a password. Instead, use the "/xml" option to pass an XML file that contains the configuration for the necessary schedule. These XML files are based on some that I exported after constructing a schedule in the Task Scheduler GUI. These options only run background maintenance when the user is logged in, and more fields are populated with the current username and SID at run-time by 'schtasks'. Since the GIT_TEST_MAINT_SCHEDULER environment variable allows us to specify 'schtasks' as the scheduler, we can test the Windows-specific logic on other platforms. Thus, add a check that the XML file written by Git is valid when xmllint exists on the system. Since we use a temporary file for the XML files sent to 'schtasks', we prefix the random characters with the frequency so it is easier to examine the proper file during tests. Instead of an exact match on the 'args' file, we 'grep' for the arguments other than the filename. There is a deficiency in the current design. Windows has two kinds of applications: GUI applications that start by "winmain()" and console applications that start by "main()". Console applications are attached to a new Console window if they are not already associated with a GUI application. This means that every hour the scheudled task launches a command window for the scheduled tasks. Not only is this visually obtrusive, but it also takes focus from whatever else the user is doing! A simple fix would be to insert a GUI application that acts as a shim between the scheduled task and Git. This is currently possible in Git for Windows by setting the <Command> tag equal to C:\Program Files\Git\git-bash.exe with options "--hide --no-needs-console --command=cmd\git.exe" followed by the arguments currently used. Since git-bash.exe is not included in Windows builds of core Git, I chose to leave out this feature. My plan is to submit a small patch to Git for Windows that converts the use of git.exe with this use of git-bash.exe in the short term. In the long term, we can consider creating this GUI shim application within core Git, perhaps in contrib/. Co-authored-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-05maintenance: use launchctl on macOSDerrick Stolee
The existing mechanism for scheduling background maintenance is done through cron. The 'crontab -e' command allows updating the schedule while cron itself runs those commands. While this is technically supported by macOS, it has some significant deficiencies: 1. Every run of 'crontab -e' must request elevated privileges through the user interface. When running 'git maintenance start' from the Terminal app, it presents a dialog box saying "Terminal.app would like to administer your computer. Administration can include modifying passwords, networking, and system settings." This is more alarming than what we are hoping to achieve. If this alert had some information about how "git" is trying to run "crontab" then we would have some reason to believe that this dialog might be fine. However, it also doesn't help that some scenarios just leave Git waiting for a response without presenting anything to the user. I experienced this when executing the command from a Bash terminal view inside Visual Studio Code. 2. While cron initializes a user environment enough for "git config --global --show-origin" to show the correct config file information, it does not set up the environment enough for Git Credential Manager Core to load credentials during a 'prefetch' task. My prefetches against private repositories required re-authenticating through UI pop-ups in a way that should not be required. The solution is to switch from cron to the Apple-recommended [1] 'launchd' tool. [1] https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html The basics of this tool is that we need to create XML-formatted "plist" files inside "~/Library/LaunchAgents/" and then use the 'launchctl' tool to make launchd aware of them. The plist files include all of the scheduling information, along with the command-line arguments split across an array of <string> tags. For example, here is my plist file for the weekly scheduled tasks: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"><dict> <key>Label</key><string>org.git-scm.git.weekly</string> <key>ProgramArguments</key> <array> <string>/usr/local/libexec/git-core/git</string> <string>--exec-path=/usr/local/libexec/git-core</string> <string>for-each-repo</string> <string>--config=maintenance.repo</string> <string>maintenance</string> <string>run</string> <string>--schedule=weekly</string> </array> <key>StartCalendarInterval</key> <array> <dict> <key>Day</key><integer>0</integer> <key>Hour</key><integer>0</integer> <key>Minute</key><integer>0</integer> </dict> </array> </dict> </plist> The schedules for the daily and hourly tasks are more complicated since we need to use an array for the StartCalendarInterval with an entry for each of the six days other than the 0th day (to avoid colliding with the weekly task), and each of the 23 hours other than the 0th hour (to avoid colliding with the daily task). The "Label" value is currently filled with "org.git-scm.git.X" where X is the frequency. We need a different plist file for each frequency. The launchctl command needs to be aligned with a user id in order to initialize the command environment. This must be done using the 'launchctl bootstrap' subcommand. This subcommand is new as of macOS 10.11, which was released in September 2015. Before that release the 'launchctl load' subcommand was recommended. The best source of information on this transition I have seen is available at [2]. The current design does not preclude a future version that detects the available fatures of 'launchctl' to use the older commands. However, it is best to rely on the newest version since Apple might completely remove the deprecated version on short notice. [2] https://babodee.wordpress.com/2016/04/09/launchctl-2-0-syntax/ To remove a schedule, we must run 'launchctl bootout' with a valid plist file. We also need to 'bootout' a task before the 'bootstrap' subcommand will succeed, if such a task already exists. The need for a user id requires us to run 'id -u' which works on POSIX systems but not Windows. Further, the need for fully-qualitifed path names including $HOME behaves differently in the Git internals and the external test suite. The $HOME variable starts with "C:\..." instead of the "/c/..." that is provided by Git in these subcommands. The test therefore has a prerequisite that we are not on Windows. The cross- platform logic still allows us to test the macOS logic on a Linux machine. We can verify the commands that were run by 'git maintenance start' and 'git maintenance stop' by injecting a script that writes the command-line arguments into GIT_TEST_MAINT_SCHEDULER. An earlier version of this patch accidentally had an opening "<dict>" tag when it should have had a closing "</dict>" tag. This was caught during manual testing with actual 'launchctl' commands, but we do not want to update developers' tasks when running tests. It appears that macOS includes the "xmllint" tool which can verify the XML format. This is useful for any system that might contain the tool, so use it whenever it is available. We strive to make these tests work on all platforms, but Windows caused some headaches. In particular, the value of getuid() called by the C code is not guaranteed to be the same as `$(id -u)` invoked by a test. This is because `git.exe` is a native Windows program, whereas the utility programs run by the test script mostly utilize the MSYS2 runtime, which emulates a POSIX-like environment. Since the purpose of the test is to check that the input to the hook is well-formed, the actual user ID is immaterial, thus we can work around the problem by making the the test UID-agnostic. Another subtle issue is the $HOME environment variable being a Windows-style path instead of a Unix-style path. We can be more flexible here instead of expecting exact path matches. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Co-authored-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21gc: fix handling of crontab magic markersMartin Ågren
On `git maintenance start`, we add a few entries to the user's cron table. We wrap our entries using two magic markers, "# BEGIN GIT MAINTENANCE SCHEDULE" and "# END GIT MAINTENANCE SCHEDULE". At a later `git maintenance stop`, we will go through the table and remove these lines. Or rather, we will remove the "BEGIN" marker, the "END" marker and everything between them. Alas, we have a bug in how we detect the "END" marker: we don't. As we loop through all the lines of the crontab, if we are in the "old region", i.e., the region we're aiming to remove, we make an early `continue` and don't get as far as checking for the "END" marker. Thus, once we've seen our "BEGIN", we remove everything until the end of the file. Rewrite the logic for identifying these markers. There are four cases that are mutually exclusive: The current line starts a region or it ends it, or it's firmly within the region, or it's outside of it (and should be printed). Signed-off-by: Martin Ågren <martin.agren@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08Merge branch 'rs/maintenance-run-outside-repo'Junio C Hamano
"git maintenance run/start/stop" needed to be run in a repository to hold the lockfile they use, but didn't make sure they are actually in a repository, which has been corrected. * rs/maintenance-run-outside-repo: t7900: fix typo: "test_execpt_success" maintenance: fix SEGFAULT when no repository
2020-12-08Merge branch 'ds/maintenance-part-3'Junio C Hamano
"git maintenance" command had trouble working in a directory whose pathname contained an ERE metacharacter like '+'. * ds/maintenance-part-3: maintenance: use 'git config --fixed-value'
2020-11-30Merge branch 'ab/gc-keep-base-option'Junio C Hamano
Fix an option name in "gc" documentation. * ab/gc-keep-base-option: gc: rename keep_base_pack variable for --keep-largest-pack gc docs: change --keep-base-pack to --keep-largest-pack
2020-11-30maintenance: fix SEGFAULT when no repositoryRafael Silva
The "git maintenance run" and "git maintenance start/stop" commands holds a file-based lock at the .git/maintenance.lock and .git/schedule.lock respectively. These locks are used to ensure only one maintenance process is executed at the time as both operations involves writing data into the git repository. The path to the lock file is built using "the_repository->objects->odb->path" that results in SEGFAULT when we have no repository available as "the_repository->objects->odb" is set to NULL. Let's teach maintenance command to use RUN_SETUP option that will provide the validation and fail when running outside of a repository. Hence fixing the SEGFAULT for all three operations and making the behaviour consistent across all subcommands. Setting the RUN_SETUP also provides the same protection for all subcommands given that the "register" and "unregister" also requires to be executed inside a repository. Furthermore let's remove the local validation implemented by the "register" and "unregister" as this will not be required anymore with the new option. Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25maintenance: use 'git config --fixed-value'Derrick Stolee
When a repository's leading directories contain regex metacharacters, the config calls for 'git maintenance register' and 'git maintenance unregister' are not careful enough. Use the new --fixed-value option to direct the config machinery to use exact string matches. This is a more robust option than escaping these arguments in a piecemeal fashion. For the test, require that we are not running on Windows since the '+' and '*' characters are not allowed on that filesystem. Reported-by: Emily Shaffer <emilyshaffer@google.com> Reported-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-24maintenance: extract platform-specific schedulingDerrick Stolee
The existing schedule mechanism using 'cron' is supported by POSIX platforms, but not Windows. It also works slightly differently on macOS to significant detriment of the user experience. To allow for new implementations on these platforms, extract a method that performs the platform-specific scheduling mechanism. This will be swapped at compile time with new implementations on specialized platforms. As we add this generality, rename GIT_TEST_CRONTAB to GIT_TEST_MAINT_SCHEDULER. Further, this variable is now parsed as "<scheduler>:<command>" so we can test platform-specific scheduling logic even when not on the correct platform. By specifying the <scheduler> in this string, we will be able to test all three sets of Git logic from a Linux machine. Co-authored-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-21gc: rename keep_base_pack variable for --keep-largest-packÆvar Arnfjörð Bjarmason
As noted in an earlier change the keep_base_pack variable name is a relic from an earlier on-list version of ae4e89e549 ("gc: add --keep-largest-pack option", 2018-04-15) before it was renamed to --keep-largest-pack. Let's change the variable name to avoid that confusion, it's easier to read the code if there's a 1=1 mapping between the variable name and option name. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-18gc: fix cast in compare_tasks_by_selection()René Scharfe
compare_tasks_by_selection() is used with QSORT and gets passed pointers to the elements of "static struct maintenance_task tasks[]". It casts the *addresses* of these passed pointers to element pointers, though, and thus effectively compares some unrelated values from the stack. Fix the casts to actually compare array elements. Detected by USan (make SANITIZE=undefined test). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-18Merge branch 'ds/maintenance-part-3'Junio C Hamano
Parts of "git maintenance" to ease writing crontab entries (and other scheduling system configuration) for it. * ds/maintenance-part-3: maintenance: add troubleshooting guide to docs maintenance: use 'incremental' strategy by default maintenance: create maintenance.strategy config maintenance: add start/stop subcommands maintenance: add [un]register subcommands for-each-repo: run subcommands on configured repos maintenance: add --schedule option and config maintenance: optionally skip --auto process
2020-11-11Merge branch 'rs/clear-commit-marks-in-repo'Junio C Hamano
Code clean-up. * rs/clear-commit-marks-in-repo: bisect: clear flags in passed repository object: allow clear_commit_marks_all to handle any repo
2020-11-02Merge branch 'ds/maintenance-commit-graph-auto-fix'Junio C Hamano
Test-coverage enhancement of running commit-graph task "git maintenance" as needed led to discovery and fix of a bug. * ds/maintenance-commit-graph-auto-fix: maintenance: core.commitGraph=false prevents writes maintenance: test commit-graph auto condition
2020-10-31object: allow clear_commit_marks_all to handle any repoRené Scharfe
Allow callers to specify the repository to use. Rename the function to repo_clear_commit_marks to document its new scope. No functional change intended. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16maintenance: use 'incremental' strategy by defaultDerrick Stolee
The 'git maintenance (register|start)' subcommands add the current repository to the global Git config so maintenance will operate on that repository. It does not specify what maintenance should occur or how often. To make it simple for users to start background maintenance with a recommended schedlue, update the 'maintenance.strategy' config option in both the 'register' and 'start' subcommands. This allows users to customize beyond the defaults using individual 'maintenance.<task>.schedule' options, but also the user can opt-out of this strategy using 'maintenance.strategy=none'. Helped-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16maintenance: create maintenance.strategy configDerrick Stolee
To provide an on-ramp for users to use background maintenance without several 'git config' commands, create a 'maintenance.strategy' config option. Currently, the only important value is 'incremental' which assigns the following schedule: * gc: never * prefetch: hourly * commit-graph: hourly * loose-objects: daily * incremental-repack: daily These tasks are chosen to minimize disruptions to foreground Git commands and use few compute resources. The 'maintenance.strategy' is intended as a baseline that can be customzied further by manually assigning 'maintenance.<task>.enabled' and 'maintenance.<task>.schedule' config options, which will override any recommendation from 'maintenance.strategy'. This operates similarly to config options like 'feature.experimental' which operate as "meta" config options that change default config values. This presents a way forward for updating the 'incremental' strategy in the future or adding new strategies. For example, a potential strategy could be to include a 'full' strategy that runs the 'gc' task weekly and no other tasks by default. Helped-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12maintenance: core.commitGraph=false prevents writesDerrick Stolee
Recently, a user had an issue due to combining fetch.writeCommitGraph=true with core.commitGraph=false. The root bug has been resolved by preventing commit-graph writes when core.commitGraph is disabled. This happens inside the 'git commit-graph write' command, but we can be more aware of this situation and prevent that process from ever starting in the 'commit-graph' maintenance task. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08maintenance: test commit-graph auto conditionDerrick Stolee
The auto condition for the commit-graph maintenance task walks refs looking for commits that are not in the commit-graph file. This was added in 4ddc79b2 (maintenance: add auto condition for commit-graph task, 2020-09-17) but was left untested. The initial goal of this change was to demonstrate the feature works properly by adding tests. However, there was an off-by-one error that caused the basic tests around maintenance.commit-graph.auto=1 to fail when it should work. The subtlety is that if a ref tip is not in the commit-graph, then we were not adding that to the total count. In the test, we see that we have only added one commit since our last commit-graph write, so the auto condition would say there is nothing to do. The fix is simple: add the check for the commit-graph position to see that the tip is not in the commit-graph file before starting our walk. Since this happens before adding to the DFS stack, we do not need to clear our (currently empty) commit list. This does add some extra complexity for the test, because we also want to verify that the walk along the parents actually does some work. This means we need to add at least two commits in a row without writing the commit-graph. However, we also need to make sure no additional refs are pointing to the middle of this list or else the for_each_ref() in should_write_commit_graph() might visit these commits as tips instead of doing a DFS walk. Hence, the last two commits are added with "git commit" instead of "test_commit". Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25maintenance: add start/stop subcommandsDerrick Stolee
Add new subcommands to 'git maintenance' that start or stop background maintenance using 'cron', when available. This integration is as simple as I could make it, barring some implementation complications. The schedule is laid out as follows: 0 1-23 * * * $cmd maintenance run --schedule=hourly 0 0 * * 1-6 $cmd maintenance run --schedule=daily 0 0 * * 0 $cmd maintenance run --schedule=weekly where $cmd is a properly-qualified 'git for-each-repo' execution: $cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo where $path points to the location of the Git executable running 'git maintenance start'. This is critical for systems with multiple versions of Git. Specifically, macOS has a system version at '/usr/bin/git' while the version that users can install resides at '/usr/local/bin/git' (symlinked to '/usr/local/libexec/git-core/git'). This will also use your locally-built version if you build and run this in your development environment without installing first. This conditional schedule avoids having cron launch multiple 'git for-each-repo' commands in parallel. Such parallel commands would likely lead to the 'hourly' and 'daily' tasks competing over the object database lock. This could lead to to some tasks never being run! Since the --schedule=<frequency> argument will run all tasks with _at least_ the given frequency, the daily runs will also run the hourly tasks. Similarly, the weekly runs will also run the daily and hourly tasks. The GIT_TEST_CRONTAB environment variable is not intended for users to edit, but instead as a way to mock the 'crontab [-l]' command. This variable is set in test-lib.sh to avoid a future test from accidentally running anything with the cron integration from modifying the user's schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our tests to check how the schedule is modified in 'git maintenance (start|stop)' commands. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25maintenance: add [un]register subcommandsDerrick Stolee
In preparation for launching background maintenance from the 'git maintenance' builtin, create register/unregister subcommands. These commands update the new 'maintenance.repos' config option in the global config so the background maintenance job knows which repositories to maintain. These commands allow users to add a repository to the background maintenance list without disrupting the actual maintenance mechanism. For example, a user can run 'git maintenance register' when no background maintenance is running and it will not start the background maintenance. A later update to start running background maintenance will then pick up this repository automatically. The opposite example is that a user can run 'git maintenance unregister' to remove the current repository from background maintenance without halting maintenance for other repositories. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>