From 1942d48380fec53f76361e9adebef15b5db9628a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 28 Aug 2020 15:45:12 +0000 Subject: maintenance: optionally skip --auto process Some commands run 'git maintenance run --auto --[no-]quiet' after doing their normal work, as a way to keep repositories clean as they are used. Currently, users who do not want this maintenance to occur would set the 'gc.auto' config option to 0 to avoid the 'gc' task from running. However, this does not stop the extra process invocation. On Windows, this extra process invocation can be more expensive than necessary. Allow users to drop this extra process by setting 'maintenance.auto' to 'false'. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt index a0706d8..06db758 100644 --- a/Documentation/config/maintenance.txt +++ b/Documentation/config/maintenance.txt @@ -1,3 +1,8 @@ +maintenance.auto:: + This boolean config option controls whether some commands run + `git maintenance run --auto` after doing their normal work. Defaults + to true. + maintenance..enabled:: This boolean config option controls whether the maintenance task with name `` is run when no `--task` option is specified to diff --git a/run-command.c b/run-command.c index 2ee59ac..ea4d0fb 100644 --- a/run-command.c +++ b/run-command.c @@ -7,6 +7,7 @@ #include "strbuf.h" #include "string-list.h" #include "quote.h" +#include "config.h" void child_process_init(struct child_process *child) { @@ -1868,8 +1869,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, int run_auto_maintenance(int quiet) { + int enabled; struct child_process maint = CHILD_PROCESS_INIT; + if (!git_config_get_bool("maintenance.auto", &enabled) && + !enabled) + return 0; + maint.git_cmd = 1; strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL); strvec_push(&maint.args, quiet ? 
"--quiet" : "--no-quiet"); diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 55116c2..c7caaa7 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -28,6 +28,19 @@ test_expect_success 'run [--auto|--quiet]' ' test_subcommand git gc --no-quiet .enabled' ' git config maintenance.gc.enabled false && git config maintenance.commit-graph.enabled true && -- cgit v0.10.2-6-g49f6 From b08ff1fee000230ad22fee823aabff2613bd4e62 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 11 Sep 2020 17:49:15 +0000 Subject: maintenance: add --schedule option and config Maintenance currently triggers when certain data-size thresholds are met, such as number of pack-files or loose objects. Users may want to run certain maintenance tasks based on frequency instead. For example, a user may want to perform a 'prefetch' task every hour, or 'gc' task every day. To help these users, update the 'git maintenance run' command to include a '--schedule=' option. The allowed frequencies are 'hourly', 'daily', and 'weekly'. These values are also allowed in a new config value 'maintenance..schedule'. The 'git maintenance run --schedule=' checks the '*.schedule' config value for each enabled task to see if the configured frequency is at least as frequent as the frequency from the '--schedule' argument. We use the following order, for full clarity: 'hourly' > 'daily' > 'weekly' Use new 'enum schedule_priority' to track these values numerically. The following cron table would run the scheduled tasks with the correct frequencies: 0 1-23 * * * git -C maintenance run --schedule=hourly 0 0 * * 1-6 git -C maintenance run --schedule=daily 0 0 * * 0 git -C maintenance run --schedule=weekly This cron schedule will run --schedule=hourly every hour except at midnight. This avoids a concurrent run with the --schedule=daily that runs at midnight every day except the first day of the week. 
This avoids a concurrent run with the --schedule=weekly that runs at midnight on the first day of the week. Since --schedule=daily also runs the 'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily' tasks, we will still see all tasks run with the proper frequencies. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt index 06db758..7058556 100644 --- a/Documentation/config/maintenance.txt +++ b/Documentation/config/maintenance.txt @@ -10,6 +10,11 @@ maintenance..enabled:: `--task` option exists. By default, only `maintenance.gc.enabled` is true. +maintenance..schedule:: + This config option controls whether or not the given `` runs + during a `git maintenance run --schedule=` command. The + value must be one of "hourly", "daily", or "weekly". + maintenance.commit-graph.auto:: This integer config option controls how often the `commit-graph` task should be run as part of `git maintenance run --auto`. If zero, then diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt index 3f5d894..ed94f66 100644 --- a/Documentation/git-maintenance.txt +++ b/Documentation/git-maintenance.txt @@ -110,7 +110,18 @@ OPTIONS only if certain thresholds are met. For example, the `gc` task runs when the number of loose objects exceeds the number stored in the `gc.auto` config setting, or when the number of pack-files - exceeds the `gc.autoPackLimit` config setting. + exceeds the `gc.autoPackLimit` config setting. Not compatible with + the `--schedule` option. + +--schedule:: + When combined with the `run` subcommand, run maintenance tasks + only if certain time conditions are met, as specified by the + `maintenance..schedule` config value for each ``. + This config value specifies a number of seconds since the last + time that task ran, according to the `maintenance..lastRun` + config value. 
The tasks that are tested are those provided by + the `--task=` option(s) or those with + `maintenance..enabled` set to true. --quiet:: Do not report progress or other information over `stderr`. diff --git a/builtin/gc.c b/builtin/gc.c index 2b99596..03b24ea 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -703,14 +703,51 @@ int cmd_gc(int argc, const char **argv, const char *prefix) return 0; } -static const char * const builtin_maintenance_run_usage[] = { - N_("git maintenance run [--auto] [--[no-]quiet] [--task=]"), +static const char *const builtin_maintenance_run_usage[] = { + N_("git maintenance run [--auto] [--[no-]quiet] [--task=] [--schedule]"), NULL }; +enum schedule_priority { + SCHEDULE_NONE = 0, + SCHEDULE_WEEKLY = 1, + SCHEDULE_DAILY = 2, + SCHEDULE_HOURLY = 3, +}; + +static enum schedule_priority parse_schedule(const char *value) +{ + if (!value) + return SCHEDULE_NONE; + if (!strcasecmp(value, "hourly")) + return SCHEDULE_HOURLY; + if (!strcasecmp(value, "daily")) + return SCHEDULE_DAILY; + if (!strcasecmp(value, "weekly")) + return SCHEDULE_WEEKLY; + return SCHEDULE_NONE; +} + +static int maintenance_opt_schedule(const struct option *opt, const char *arg, + int unset) +{ + enum schedule_priority *priority = opt->value; + + if (unset) + die(_("--no-schedule is not allowed")); + + *priority = parse_schedule(arg); + + if (!*priority) + die(_("unrecognized --schedule argument '%s'"), arg); + + return 0; +} + struct maintenance_run_opts { int auto_flag; int quiet; + enum schedule_priority schedule; }; /* Remember to update object flag allocation in object.h */ @@ -1158,6 +1195,8 @@ struct maintenance_task { maintenance_auto_fn *auto_condition; unsigned enabled:1; + enum schedule_priority schedule; + /* -1 if not selected. 
*/ int selected_order; }; @@ -1253,6 +1292,9 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts) !tasks[i].auto_condition())) continue; + if (opts->schedule && tasks[i].schedule < opts->schedule) + continue; + trace2_region_enter("maintenance", tasks[i].name, r); if (tasks[i].fn(opts)) { error(_("task '%s' failed"), tasks[i].name); @@ -1273,13 +1315,23 @@ static void initialize_task_config(void) for (i = 0; i < TASK__COUNT; i++) { int config_value; + char *config_str; - strbuf_setlen(&config_name, 0); + strbuf_reset(&config_name); strbuf_addf(&config_name, "maintenance.%s.enabled", tasks[i].name); if (!git_config_get_bool(config_name.buf, &config_value)) tasks[i].enabled = config_value; + + strbuf_reset(&config_name); + strbuf_addf(&config_name, "maintenance.%s.schedule", + tasks[i].name); + + if (!git_config_get_string(config_name.buf, &config_str)) { + tasks[i].schedule = parse_schedule(config_str); + free(config_str); + } } strbuf_release(&config_name); @@ -1323,6 +1375,9 @@ static int maintenance_run(int argc, const char **argv, const char *prefix) struct option builtin_maintenance_run_options[] = { OPT_BOOL(0, "auto", &opts.auto_flag, N_("run tasks based on the state of the repository")), + OPT_CALLBACK(0, "schedule", &opts.schedule, N_("frequency"), + N_("run tasks based on frequency"), + maintenance_opt_schedule), OPT_BOOL(0, "quiet", &opts.quiet, N_("do not report progress or other information over stderr")), OPT_CALLBACK_F(0, "task", NULL, N_("task"), @@ -1343,6 +1398,9 @@ static int maintenance_run(int argc, const char **argv, const char *prefix) builtin_maintenance_run_usage, PARSE_OPT_STOP_AT_NON_OPTION); + if (opts.auto_flag && opts.schedule) + die(_("use at most one of --auto and --schedule=")); + if (argc != 0) usage_with_options(builtin_maintenance_run_usage, builtin_maintenance_run_options); diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index c7caaa7..33d73cd 100755 --- a/t/t7900-maintenance.sh +++ 
b/t/t7900-maintenance.sh @@ -260,4 +260,44 @@ test_expect_success 'maintenance.incremental-repack.auto' ' test_subcommand git multi-pack-index write --no-progress err && + test_i18ngrep "at most one" err +' + +test_expect_success 'invalid --schedule value' ' + test_must_fail git maintenance run --schedule=annually 2>err && + test_i18ngrep "unrecognized --schedule" err +' + +test_expect_success '--schedule inheritance weekly -> daily -> hourly' ' + git config maintenance.loose-objects.enabled true && + git config maintenance.loose-objects.schedule hourly && + git config maintenance.commit-graph.enabled true && + git config maintenance.commit-graph.schedule daily && + git config maintenance.incremental-repack.enabled true && + git config maintenance.incremental-repack.schedule weekly && + + GIT_TRACE2_EVENT="$(pwd)/hourly.txt" \ + git maintenance run --schedule=hourly 2>/dev/null && + test_subcommand git prune-packed --quiet /dev/null && + test_subcommand git prune-packed --quiet /dev/null && + test_subcommand git prune-packed --quiet Date: Fri, 11 Sep 2020 17:49:16 +0000 Subject: for-each-repo: run subcommands on configured repos It can be helpful to store a list of repositories in global or system config and then iterate Git commands on that list. Create a new builtin that makes this process simple for experts. We will use this builtin to run scheduled maintenance on all configured repositories in a future change. The test is very simple, but does highlight that the "--" argument is optional. 
Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano diff --git a/.gitignore b/.gitignore index a5808fa..5eb2a2b 100644 --- a/.gitignore +++ b/.gitignore @@ -67,6 +67,7 @@ /git-filter-branch /git-fmt-merge-msg /git-for-each-ref +/git-for-each-repo /git-format-patch /git-fsck /git-fsck-objects diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt new file mode 100644 index 0000000..94bd19d --- /dev/null +++ b/Documentation/git-for-each-repo.txt @@ -0,0 +1,59 @@ +git-for-each-repo(1) +==================== + +NAME +---- +git-for-each-repo - Run a Git command on a list of repositories + + +SYNOPSIS +-------- +[verse] +'git for-each-repo' --config= [--] + + +DESCRIPTION +----------- +Run a Git command on a list of repositories. The arguments after the +known options or `--` indicator are used as the arguments for the Git +subprocess. + +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE. + +For example, we could run maintenance on each of a list of repositories +stored in a `maintenance.repo` config variable using + +------------- +git for-each-repo --config=maintenance.repo maintenance run +------------- + +This will run `git -C maintenance run` for each value `` +in the multi-valued config variable `maintenance.repo`. + + +OPTIONS +------- +--config=:: + Use the given config variable as a multi-valued list storing + absolute path names. Iterate on that list of paths to run + the given arguments. ++ +These config values are loaded from system, global, and local Git config, +as available. If `git for-each-repo` is run in a directory that is not a +Git repository, then only the system and global config is used. + + +SUBPROCESS BEHAVIOR +------------------- + +If any `git -C ` subprocess returns a non-zero exit code, +then the `git for-each-repo` process returns that exit code without running +more subprocesses. + +Each `git -C ` subprocess inherits the standard file +descriptors `stdin`, `stdout`, and `stderr`. 
+ + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Makefile b/Makefile index 65f8cfb..7c588ff 100644 --- a/Makefile +++ b/Makefile @@ -1071,6 +1071,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o BUILTIN_OBJS += builtin/fetch.o BUILTIN_OBJS += builtin/fmt-merge-msg.o BUILTIN_OBJS += builtin/for-each-ref.o +BUILTIN_OBJS += builtin/for-each-repo.o BUILTIN_OBJS += builtin/fsck.o BUILTIN_OBJS += builtin/gc.o BUILTIN_OBJS += builtin/get-tar-commit-id.o diff --git a/builtin.h b/builtin.h index 17c1c0c..ff7c6e5 100644 --- a/builtin.h +++ b/builtin.h @@ -150,6 +150,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix); int cmd_fetch_pack(int argc, const char **argv, const char *prefix); int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix); int cmd_for_each_ref(int argc, const char **argv, const char *prefix); +int cmd_for_each_repo(int argc, const char **argv, const char *prefix); int cmd_format_patch(int argc, const char **argv, const char *prefix); int cmd_fsck(int argc, const char **argv, const char *prefix); int cmd_gc(int argc, const char **argv, const char *prefix); diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c new file mode 100644 index 0000000..5bba623 --- /dev/null +++ b/builtin/for-each-repo.c @@ -0,0 +1,58 @@ +#include "cache.h" +#include "config.h" +#include "builtin.h" +#include "parse-options.h" +#include "run-command.h" +#include "string-list.h" + +static const char * const for_each_repo_usage[] = { + N_("git for-each-repo --config= "), + NULL +}; + +static int run_command_on_repo(const char *path, + void *cbdata) +{ + int i; + struct child_process child = CHILD_PROCESS_INIT; + struct strvec *args = (struct strvec *)cbdata; + + child.git_cmd = 1; + strvec_pushl(&child.args, "-C", path, NULL); + + for (i = 0; i < args->nr; i++) + strvec_push(&child.args, args->v[i]); + + return run_command(&child); +} + +int cmd_for_each_repo(int argc, const char **argv, const char *prefix) +{ + static const char 
*config_key = NULL; + int i, result = 0; + const struct string_list *values; + struct strvec args = STRVEC_INIT; + + const struct option options[] = { + OPT_STRING(0, "config", &config_key, N_("config"), + N_("config key storing a list of repository paths")), + OPT_END() + }; + + argc = parse_options(argc, argv, prefix, options, for_each_repo_usage, + PARSE_OPT_STOP_AT_NON_OPTION); + + if (!config_key) + die(_("missing --config=")); + + for (i = 0; i < argc; i++) + strvec_push(&args, argv[i]); + + values = repo_config_get_value_multi(the_repository, + config_key); + + for (i = 0; !result && i < values->nr; i++) + result = run_command_on_repo(values->items[i].string, &args); + + return result; +} diff --git a/command-list.txt b/command-list.txt index 0e3204e..581499b 100644 --- a/command-list.txt +++ b/command-list.txt @@ -94,6 +94,7 @@ git-fetch-pack synchingrepositories git-filter-branch ancillarymanipulators git-fmt-merge-msg purehelpers git-for-each-ref plumbinginterrogators +git-for-each-repo plumbinginterrogators git-format-patch mainporcelain git-fsck ancillaryinterrogators complete git-gc mainporcelain diff --git a/git.c b/git.c index 24f250d..1cab64b 100644 --- a/git.c +++ b/git.c @@ -511,6 +511,7 @@ static struct cmd_struct commands[] = { { "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT }, { "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP }, { "for-each-ref", cmd_for_each_ref, RUN_SETUP }, + { "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY }, { "format-patch", cmd_format_patch, RUN_SETUP }, { "fsck", cmd_fsck, RUN_SETUP }, { "fsck-objects", cmd_fsck, RUN_SETUP }, diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh new file mode 100755 index 0000000..136b4ec --- /dev/null +++ b/t/t0068-for-each-repo.sh @@ -0,0 +1,30 @@ +#!/bin/sh + +test_description='git for-each-repo builtin' + +. 
./test-lib.sh + +test_expect_success 'run based on configured value' ' + git init one && + git init two && + git init three && + git -C two commit --allow-empty -m "DID NOT RUN" && + git config run.key "$TRASH_DIRECTORY/one" && + git config --add run.key "$TRASH_DIRECTORY/three" && + git for-each-repo --config=run.key commit --allow-empty -m "ran" && + git -C one log -1 --pretty=format:%s >message && + grep ran message && + git -C two log -1 --pretty=format:%s >message && + ! grep ran message && + git -C three log -1 --pretty=format:%s >message && + grep ran message && + git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" && + git -C one log -1 --pretty=format:%s >message && + grep again message && + git -C two log -1 --pretty=format:%s >message && + ! grep again message && + git -C three log -1 --pretty=format:%s >message && + grep again message +' + +test_done -- cgit v0.10.2-6-g49f6 From 0c18b700810ab2e2a4fac0d5b54a817141198a27 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 11 Sep 2020 17:49:17 +0000 Subject: maintenance: add [un]register subcommands In preparation for launching background maintenance from the 'git maintenance' builtin, create register/unregister subcommands. These commands update the new 'maintenance.repos' config option in the global config so the background maintenance job knows which repositories to maintain. These commands allow users to add a repository to the background maintenance list without disrupting the actual maintenance mechanism. For example, a user can run 'git maintenance register' when no background maintenance is running and it will not start the background maintenance. A later update to start running background maintenance will then pick up this repository automatically. The opposite example is that a user can run 'git maintenance unregister' to remove the current repository from background maintenance without halting maintenance for other repositories. 
Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt index ed94f66..1c59fd0 100644 --- a/Documentation/git-maintenance.txt +++ b/Documentation/git-maintenance.txt @@ -29,6 +29,15 @@ Git repository. SUBCOMMANDS ----------- +register:: + Initialize Git config values so any scheduled maintenance will + start running on this repository. This adds the repository to the + `maintenance.repo` config variable in the current user's global + config and enables some recommended configuration values for + `maintenance..schedule`. The tasks that are enabled are safe + for running in the background without disrupting foreground + processes. + run:: Run one or more maintenance tasks. If one or more `--task` options are specified, then those tasks are run in that order. Otherwise, @@ -36,6 +45,11 @@ run:: config options are true. By default, only `maintenance.gc.enabled` is true. +unregister:: + Remove the current repository from background maintenance. This + only removes the repository from the configured list. It does not + stop the background maintenance processes from running. + TASKS ----- diff --git a/builtin/gc.c b/builtin/gc.c index 03b24ea..edf1d35 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1407,7 +1407,56 @@ static int maintenance_run(int argc, const char **argv, const char *prefix) return maintenance_run_tasks(&opts); } -static const char builtin_maintenance_usage[] = N_("git maintenance run []"); +static int maintenance_register(void) +{ + struct child_process config_set = CHILD_PROCESS_INIT; + struct child_process config_get = CHILD_PROCESS_INIT; + + /* There is no current repository, so skip registering it */ + if (!the_repository || !the_repository->gitdir) + return 0; + + config_get.git_cmd = 1; + strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo", + the_repository->worktree ? 
the_repository->worktree + : the_repository->gitdir, + NULL); + config_get.out = -1; + + if (start_command(&config_get)) + return error(_("failed to run 'git config'")); + + /* We already have this value in our config! */ + if (!finish_command(&config_get)) + return 0; + + config_set.git_cmd = 1; + strvec_pushl(&config_set.args, "config", "--add", "--global", "maintenance.repo", + the_repository->worktree ? the_repository->worktree + : the_repository->gitdir, + NULL); + + return run_command(&config_set); +} + +static int maintenance_unregister(void) +{ + struct child_process config_unset = CHILD_PROCESS_INIT; + + if (!the_repository || !the_repository->gitdir) + return error(_("no current repository to unregister")); + + config_unset.git_cmd = 1; + strvec_pushl(&config_unset.args, "config", "--global", "--unset", + "maintenance.repo", + the_repository->worktree ? the_repository->worktree + : the_repository->gitdir, + NULL); + + return run_command(&config_unset); +} + +static const char builtin_maintenance_usage[] = N_("git maintenance []"); int cmd_maintenance(int argc, const char **argv, const char *prefix) { @@ -1417,6 +1466,10 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix) if (!strcmp(argv[1], "run")) return maintenance_run(argc - 1, argv + 1, prefix); + if (!strcmp(argv[1], "register")) + return maintenance_register(); + if (!strcmp(argv[1], "unregister")) + return maintenance_unregister(); die(_("invalid subcommand: %s"), argv[1]); } diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 33d73cd..8f383d0 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -9,7 +9,7 @@ GIT_TEST_MULTI_PACK_INDEX=0 test_expect_success 'help text' ' test_expect_code 129 git maintenance -h 2>err && - test_i18ngrep "usage: git maintenance run" err && + test_i18ngrep "usage: git maintenance " err && test_expect_code 128 git maintenance barf 2>err && test_i18ngrep "invalid subcommand: barf" err && test_expect_code 129 git 
maintenance 2>err && @@ -300,4 +300,19 @@ test_expect_success '--schedule inheritance weekly -> daily -> hourly' ' test_subcommand git multi-pack-index write --no-progress before && + git maintenance register && + git config --global --get-all maintenance.repo >actual && + cp before after && + pwd >>after && + test_cmp after actual && + git maintenance unregister && + git config --global --get-all maintenance.repo >actual && + test_cmp before actual +' + test_done -- cgit v0.10.2-6-g49f6 From 2fec604f8df51fcf47551723e68a3447915b9eb9 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 11 Sep 2020 17:49:18 +0000 Subject: maintenance: add start/stop subcommands Add new subcommands to 'git maintenance' that start or stop background maintenance using 'cron', when available. This integration is as simple as I could make it, barring some implementation complications. The schedule is laid out as follows: 0 1-23 * * * $cmd maintenance run --schedule=hourly 0 0 * * 1-6 $cmd maintenance run --schedule=daily 0 0 * * 0 $cmd maintenance run --schedule=weekly where $cmd is a properly-qualified 'git for-each-repo' execution: $cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo where $path points to the location of the Git executable running 'git maintenance start'. This is critical for systems with multiple versions of Git. Specifically, macOS has a system version at '/usr/bin/git' while the version that users can install resides at '/usr/local/bin/git' (symlinked to '/usr/local/libexec/git-core/git'). This will also use your locally-built version if you build and run this in your development environment without installing first. This conditional schedule avoids having cron launch multiple 'git for-each-repo' commands in parallel. Such parallel commands would likely lead to the 'hourly' and 'daily' tasks competing over the object database lock. This could lead to some tasks never being run!
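The claim that the three cron lines never overlap can be checked with a small sketch. The helper below is hypothetical and not part of this patch; it models only the hour and weekday fields of the schedule laid out above, assuming cron's usual convention that weekday 0 is Sunday, and returns the single --schedule value that fires at minute 0 of a given hour.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of this patch: model the three cron
 * lines and report which --schedule value fires at minute 0 of the
 * given hour (0-23) on the given weekday (0-6, with 0 assumed to be
 * Sunday as in cron). Returns NULL for out-of-range input.
 */
static const char *schedule_for(int hour, int wday)
{
	if (hour < 0 || hour > 23 || wday < 0 || wday > 6)
		return NULL;
	if (hour != 0)
		return "hourly";	/* 0 1-23 * * * */
	if (wday != 0)
		return "daily";		/* 0 0 * * 1-6 */
	return "weekly";		/* 0 0 * * 0 */
}
```

Because every valid (hour, weekday) pair falls into exactly one branch, cron starts at most one 'git for-each-repo' process per hour under this model.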
Since the --schedule= argument will run all tasks with _at least_ the given frequency, the daily runs will also run the hourly tasks. Similarly, the weekly runs will also run the daily and hourly tasks. The GIT_TEST_CRONTAB environment variable is not intended for users to edit; it exists as a way to mock the 'crontab [-l]' command. This variable is set in test-lib.sh so that a future test that accidentally runs anything with the cron integration cannot modify the user's schedule. We use GIT_TEST_CRONTAB='test-tool crontab ' in our tests to check how the schedule is modified in 'git maintenance (start|stop)' commands. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt index 1c59fd0..7628a6d 100644 --- a/Documentation/git-maintenance.txt +++ b/Documentation/git-maintenance.txt @@ -45,6 +45,17 @@ run:: config options are true. By default, only `maintenance.gc.enabled` is true. +start:: + Start running maintenance on the current repository. This performs + the same config updates as the `register` subcommand, then updates + the background scheduler to run `git maintenance run --schedule` + on an hourly basis. + +stop:: + Halt the background maintenance schedule. The current repository + is not removed from the list of maintained repositories, in case + the background maintenance is restarted later. + unregister:: Remove the current repository from background maintenance. This only removes the repository from the configured list. It does not
It does not diff --git a/Makefile b/Makefile index 7c588ff..c39b39b 100644 --- a/Makefile +++ b/Makefile @@ -690,6 +690,7 @@ TEST_BUILTINS_OBJS += test-advise.o TEST_BUILTINS_OBJS += test-bloom.o TEST_BUILTINS_OBJS += test-chmtime.o TEST_BUILTINS_OBJS += test-config.o +TEST_BUILTINS_OBJS += test-crontab.o TEST_BUILTINS_OBJS += test-ctype.o TEST_BUILTINS_OBJS += test-date.o TEST_BUILTINS_OBJS += test-delta.o diff --git a/builtin/gc.c b/builtin/gc.c index edf1d35..a387f46 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -31,6 +31,7 @@ #include "refs.h" #include "remote.h" #include "object-store.h" +#include "exec-cmd.h" #define FAILED_RUN "failed to run %s" @@ -1456,6 +1457,125 @@ static int maintenance_unregister(void) return run_command(&config_unset); } +#define BEGIN_LINE "# BEGIN GIT MAINTENANCE SCHEDULE" +#define END_LINE "# END GIT MAINTENANCE SCHEDULE" + +static int update_background_schedule(int run_maintenance) +{ + int result = 0; + int in_old_region = 0; + struct child_process crontab_list = CHILD_PROCESS_INIT; + struct child_process crontab_edit = CHILD_PROCESS_INIT; + FILE *cron_list, *cron_in; + const char *crontab_name; + struct strbuf line = STRBUF_INIT; + struct lock_file lk; + char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path); + + if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) + return error(_("another process is scheduling background maintenance")); + + crontab_name = getenv("GIT_TEST_CRONTAB"); + if (!crontab_name) + crontab_name = "crontab"; + + strvec_split(&crontab_list.args, crontab_name); + strvec_push(&crontab_list.args, "-l"); + crontab_list.in = -1; + crontab_list.out = dup(lk.tempfile->fd); + crontab_list.git_cmd = 0; + + if (start_command(&crontab_list)) { + result = error(_("failed to run 'crontab -l'; your system might not support 'cron'")); + goto cleanup; + } + + /* Ignore exit code, as an empty crontab will return error. 
*/ + finish_command(&crontab_list); + + /* + * Read from the .lock file, filtering out the old + * schedule while appending the new schedule. + */ + cron_list = fdopen(lk.tempfile->fd, "r"); + rewind(cron_list); + + strvec_split(&crontab_edit.args, crontab_name); + crontab_edit.in = -1; + crontab_edit.git_cmd = 0; + + if (start_command(&crontab_edit)) { + result = error(_("failed to run 'crontab'; your system might not support 'cron'")); + goto cleanup; + } + + cron_in = fdopen(crontab_edit.in, "w"); + if (!cron_in) { + result = error(_("failed to open stdin of 'crontab'")); + goto done_editing; + } + + while (!strbuf_getline_lf(&line, cron_list)) { + if (!in_old_region && !strcmp(line.buf, BEGIN_LINE)) + in_old_region = 1; + if (in_old_region) + continue; + fprintf(cron_in, "%s\n", line.buf); + if (in_old_region && !strcmp(line.buf, END_LINE)) + in_old_region = 0; + } + + if (run_maintenance) { + struct strbuf line_format = STRBUF_INIT; + const char *exec_path = git_exec_path(); + + fprintf(cron_in, "%s\n", BEGIN_LINE); + fprintf(cron_in, + "# The following schedule was created by Git\n"); + fprintf(cron_in, "# Any edits made in this region might be\n"); + fprintf(cron_in, + "# replaced in the future by a Git command.\n\n"); + + strbuf_addf(&line_format, + "%%s %%s * * %%s \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --schedule=%%s\n", + exec_path, exec_path); + fprintf(cron_in, line_format.buf, "0", "1-23", "*", "hourly"); + fprintf(cron_in, line_format.buf, "0", "0", "1-6", "daily"); + fprintf(cron_in, line_format.buf, "0", "0", "0", "weekly"); + strbuf_release(&line_format); + + fprintf(cron_in, "\n%s\n", END_LINE); + } + + fflush(cron_in); + fclose(cron_in); + close(crontab_edit.in); + +done_editing: + if (finish_command(&crontab_edit)) { + result = error(_("'crontab' died")); + goto cleanup; + } + fclose(cron_list); + +cleanup: + rollback_lock_file(&lk); + return result; +} + +static int maintenance_start(void) +{ + if 
(maintenance_register()) + warning(_("failed to add repo to global config")); + + return update_background_schedule(1); +} + +static int maintenance_stop(void) +{ + return update_background_schedule(0); +} + static const char builtin_maintenance_usage[] = N_("git maintenance []"); int cmd_maintenance(int argc, const char **argv, const char *prefix) @@ -1466,6 +1586,10 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix) if (!strcmp(argv[1], "run")) return maintenance_run(argc - 1, argv + 1, prefix); + if (!strcmp(argv[1], "start")) + return maintenance_start(); + if (!strcmp(argv[1], "stop")) + return maintenance_stop(); if (!strcmp(argv[1], "register")) return maintenance_register(); if (!strcmp(argv[1], "unregister")) diff --git a/t/helper/test-crontab.c b/t/helper/test-crontab.c new file mode 100644 index 0000000..e7c0137 --- /dev/null +++ b/t/helper/test-crontab.c @@ -0,0 +1,35 @@ +#include "test-tool.h" +#include "cache.h" + +/* + * Usage: test-tool cron [-l] + * + * If -l is specified, then write the contents of to stdout. + * Otherwise, write from stdin into . 
+ */
+int cmd__crontab(int argc, const char **argv)
+{
+	int a;
+	FILE *from, *to;
+
+	if (argc == 3 && !strcmp(argv[2], "-l")) {
+		from = fopen(argv[1], "r");
+		if (!from)
+			return 0;
+		to = stdout;
+	} else if (argc == 2) {
+		from = stdin;
+		to = fopen(argv[1], "w");
+	} else
+		return error("unknown arguments");
+
+	while ((a = fgetc(from)) != EOF)
+		fputc(a, to);
+
+	if (argc == 3)
+		fclose(from);
+	else
+		fclose(to);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2ef..432b49d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -18,6 +18,7 @@ static struct test_cmd cmds[] = {
 	{ "bloom", cmd__bloom },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
+	{ "crontab", cmd__crontab },
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e99..7c3281e 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -8,6 +8,7 @@ int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
+int cmd__crontab(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 8f383d0..7715e40 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -315,4 +315,32 @@ test_expect_success 'register and unregister' '
 	test_cmp before actual
 '
 
+test_expect_success 'start from empty cron table' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+
+	# start registers the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=daily" cron.txt &&
+	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=hourly" cron.txt &&
+	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=weekly" cron.txt
+'
+
+test_expect_success 'stop from existing schedule' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+
+	# stop does not unregister the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	# Operation is idempotent
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+	test_must_be_empty cron.txt
+'
+
+test_expect_success 'start preserves existing schedule' '
+	echo "Important information!" >cron.txt &&
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+	grep "Important information!" cron.txt
+'
+
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ef31f40..4a60d1e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1702,3 +1702,9 @@ test_lazy_prereq SHA1 '
 test_lazy_prereq REBASE_P '
 	test -z "$GIT_TEST_SKIP_REBASE_P"
 '
+
+# Ensure that no test accidentally triggers a Git command
+# that runs 'crontab', affecting a user's cron schedule.
+# Tests that verify the cron integration must set this locally
+# to avoid errors.
+GIT_TEST_CRONTAB="exit 1"
-- 
cgit v0.10.2-6-g49f6

From a4cb1a2339c1aa4ba71ebc6d2b07d537e1071124 Mon Sep 17 00:00:00 2001
From: Derrick Stolee
Date: Thu, 15 Oct 2020 17:22:02 +0000
Subject: maintenance: create maintenance.strategy config
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:

	* gc: never
	* prefetch: hourly
	* commit-graph: hourly
	* loose-objects: daily
	* incremental-repack: daily

These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.

The 'maintenance.strategy' is intended as a baseline that can be
customized further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.

This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.

Helped-by: Martin Ågren
Signed-off-by: Derrick Stolee
Signed-off-by: Junio C Hamano

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 7058556..a5ead09 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -3,6 +3,21 @@ maintenance.auto::
 	`git maintenance run --auto` after doing their normal work. Defaults
 	to true.
 
+maintenance.strategy::
+	This string config option provides a way to specify one of a few
+	recommended schedules for background maintenance. This only affects
+	which tasks are run during `git maintenance run --schedule=X`
+	commands, provided no `--task=<task>` arguments are provided.
+	Further, if a `maintenance.<task>.schedule` config value is set,
+	then that value is used instead of the one provided by
+	`maintenance.strategy`. The possible strategy strings are:
++
+* `none`: This default setting implies no tasks are run at any schedule.
+* `incremental`: This setting optimizes for performing small maintenance
+  activities that do not delete any data. This does not schedule the `gc`
+  task, but runs the `prefetch` and `commit-graph` tasks hourly and the
+  `loose-objects` and `incremental-repack` tasks daily.
+
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
 	with name `<task>` is run when no `--task` option is specified to
diff --git a/builtin/gc.c b/builtin/gc.c
index a387f46..a8248e7 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1308,12 +1308,35 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
-static void initialize_task_config(void)
+static void initialize_maintenance_strategy(void)
+{
+	char *config_str;
+
+	if (git_config_get_string("maintenance.strategy", &config_str))
+		return;
+
+	if (!strcasecmp(config_str, "incremental")) {
+		tasks[TASK_GC].schedule = SCHEDULE_NONE;
+		tasks[TASK_COMMIT_GRAPH].enabled = 1;
+		tasks[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY;
+		tasks[TASK_PREFETCH].enabled = 1;
+		tasks[TASK_PREFETCH].schedule = SCHEDULE_HOURLY;
+		tasks[TASK_INCREMENTAL_REPACK].enabled = 1;
+		tasks[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY;
+		tasks[TASK_LOOSE_OBJECTS].enabled = 1;
+		tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
+	}
+}
+
+static void initialize_task_config(int schedule)
 {
 	int i;
 	struct strbuf config_name = STRBUF_INIT;
 	gc_config();
 
+	if (schedule)
+		initialize_maintenance_strategy();
+
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 		char *config_str;
@@ -1389,7 +1412,6 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
-	initialize_task_config();
 
 	for (i = 0; i < TASK__COUNT; i++)
 		tasks[i].selected_order = -1;
@@ -1402,6 +1424,8 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	if (opts.auto_flag && opts.schedule)
 		die(_("use at most one of --auto and --schedule=<frequency>"));
 
+	initialize_task_config(opts.schedule);
+
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 7715e40..7440a0e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -300,6 +300,55 @@ test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
 	test_subcommand git multi-pack-index write --no-progress
Date: Thu, 15 Oct 2020 17:22:03 +0000
Subject: maintenance: use 'incremental' strategy by default
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

To make it simple for users to start background maintenance with a
recommended schedule, update the 'maintenance.strategy' config option
in both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, but also the user can opt out of
this strategy using 'maintenance.strategy=none'.

Helped-by: Martin Ågren
Signed-off-by: Derrick Stolee
Signed-off-by: Junio C Hamano

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 7628a6d..b5944b4 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -37,6 +37,23 @@ register::
 	`maintenance.<task>.schedule`. The tasks that are enabled are safe
 	for running in the background without disrupting foreground
 	processes.
++
+The `register` subcommand will also set the `maintenance.strategy` config
+value to `incremental`, if this value is not previously set. The
+`incremental` strategy uses the following schedule for each maintenance
+task:
++
+--
+* `gc`: disabled.
+* `commit-graph`: hourly.
+* `prefetch`: hourly.
+* `loose-objects`: daily.
+* `incremental-repack`: daily.
+--
++
+`git maintenance register` will also disable foreground maintenance by
+setting `maintenance.auto = false` in the current repository. This config
+setting will remain after a `git maintenance unregister` command.
 
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
diff --git a/builtin/gc.c b/builtin/gc.c
index a8248e7..e3098ef 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1434,6 +1434,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 
 static int maintenance_register(void)
 {
+	char *config_value;
 	struct child_process config_set = CHILD_PROCESS_INIT;
 	struct child_process config_get = CHILD_PROCESS_INIT;
 
@@ -1441,6 +1442,15 @@ static int maintenance_register(void)
 	if (!the_repository || !the_repository->gitdir)
 		return 0;
 
+	/* Disable foreground maintenance */
+	git_config_set("maintenance.auto", "false");
+
+	/* Set maintenance strategy, if unset */
+	if (!git_config_get_string("maintenance.strategy", &config_value))
+		free(config_value);
+	else
+		git_config_set("maintenance.strategy", "incremental");
+
 	config_get.git_cmd = 1;
 	strvec_pushl(&config_get.args, "config", "--global", "--get",
 		     "maintenance.repo", the_repository->worktree ? the_repository->worktree
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 7440a0e..20184e9 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -354,11 +354,14 @@ test_expect_success 'register and unregister' '
 	git config --global --add maintenance.repo /existing1 &&
 	git config --global --add maintenance.repo /existing2 &&
 	git config --global --get-all maintenance.repo >before &&
+
 	git maintenance register &&
-	git config --global --get-all maintenance.repo >actual &&
-	cp before after &&
-	pwd >>after &&
-	test_cmp after actual &&
+	test_cmp_config false maintenance.auto &&
+	git config --global --get-all maintenance.repo >between &&
+	cp before expect &&
+	pwd >>expect &&
+	test_cmp expect between &&
+
 	git maintenance unregister &&
 	git config --global --get-all maintenance.repo >actual &&
 	test_cmp before actual
@@ -392,4 +395,13 @@ test_expect_success 'start preserves existing schedule' '
 	grep "Important information!" cron.txt
 '
 
+test_expect_success 'register preserves existing strategy' '
+	git config maintenance.strategy none &&
+	git maintenance register &&
+	test_config maintenance.strategy none &&
+	git config --unset maintenance.strategy &&
+	git maintenance register &&
+	test_config maintenance.strategy incremental
+'
+
 test_done
-- 
cgit v0.10.2-6-g49f6

From 0016b618182f642771dc589cf0090289f9fe1b4f Mon Sep 17 00:00:00 2001
From: Derrick Stolee
Date: Thu, 15 Oct 2020 17:22:04 +0000
Subject: maintenance: add troubleshooting guide to docs

The 'git maintenance run' subcommand takes a lock on the object
database to prevent concurrent processes from competing for resources.
This is an important safety measure to prevent possible repository
corruption and data loss.

This feature can lead to confusing behavior if a user is not aware of
it. Add a TROUBLESHOOTING section to the 'git maintenance' builtin
documentation that discusses these tradeoffs.

The short version of this section is that Git will not corrupt your
repository, but if the list of scheduled tasks takes longer than an
hour then some scheduled tasks may be dropped due to this object
database collision. For example, a long-running "daily" task at
midnight might prevent an "hourly" task from running at 1AM.

The opposite is also possible, but less likely as long as the "hourly"
tasks are much faster than the "daily" and "weekly" tasks.

Helped-by: Junio C Hamano
Signed-off-by: Derrick Stolee
Signed-off-by: Junio C Hamano

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index b5944b4..6fec1eb 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -175,6 +175,50 @@ OPTIONS
 	`maintenance.<task>.enabled` configured as `true` are considered.
 	See the 'TASKS' section for the list of accepted `<task>` values.
 
+
+TROUBLESHOOTING
+---------------
+The `git maintenance` command is designed to simplify the repository
+maintenance patterns while minimizing user wait time during Git commands.
+A variety of configuration options are available to allow customizing this
+process. The default maintenance options focus on operations that complete
+quickly, even on large repositories.
+
+Users may find some cases where scheduled maintenance tasks do not run as
+frequently as intended. Each `git maintenance run` command takes a lock on
+the repository's object database, and this prevents other concurrent
+`git maintenance run` commands from running on the same repository. Without
+this safeguard, competing processes could leave the repository in an
+unpredictable state.
+
+The background maintenance schedule runs `git maintenance run` processes
+on an hourly basis. Each run executes the "hourly" tasks. At midnight,
+that process also executes the "daily" tasks. At midnight on the first day
+of the week, that process also executes the "weekly" tasks. A single
+process iterates over each registered repository, performing the scheduled
+tasks for that frequency. Depending on the number of registered
+repositories and their sizes, this process may take longer than an hour.
+In this case, multiple `git maintenance run` commands may run on the same
+repository at the same time, colliding on the object database lock. This
+results in one of the two tasks not running.
+
+If you find that some maintenance windows are taking longer than one hour
+to complete, then consider reducing the complexity of your maintenance
+tasks. For example, the `gc` task is much slower than the
+`incremental-repack` task. However, this comes at a cost of a slightly
+larger object database. Consider moving more expensive tasks to be run
+less frequently.
+
+Expert users may consider scheduling their own maintenance tasks using a
+different schedule than is available through `git maintenance start` and
+Git configuration options. These users should be aware of the object
+database lock and how concurrent `git maintenance run` commands behave.
+Further, the `git gc` command should not be combined with
+`git maintenance run` commands. `git gc` modifies the object database
+but does not take the lock in the same way as `git maintenance run`. If
+possible, use `git maintenance run --task=gc` instead of `git gc`.
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
cgit v0.10.2-6-g49f6
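
Reading aid (not part of the series): the 'maintenance.strategy' patches above describe a precedence rule in which the strategy only supplies defaults and an explicit per-task schedule value wins. The following is a minimal, self-contained C sketch of that rule under stated assumptions. The function names `sketch_strategy_default` and `sketch_effective_schedule` and the string-based interface are hypothetical and do not exist in Git; the real code fills the `tasks[]` array in builtin/gc.c and matches the strategy name case-insensitively, which this sketch simplifies to an exact match.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Git function): the schedule that the
 * 'incremental' strategy assigns to a task, per the commit message.
 * Any other strategy, or no strategy at all, schedules nothing.
 */
const char *sketch_strategy_default(const char *strategy, const char *task)
{
	/* Simplification: exact match; Git itself uses strcasecmp(). */
	if (!strategy || strcmp(strategy, "incremental"))
		return "none";

	if (!strcmp(task, "prefetch") || !strcmp(task, "commit-graph"))
		return "hourly";
	if (!strcmp(task, "loose-objects") ||
	    !strcmp(task, "incremental-repack"))
		return "daily";

	/* 'gc' is not scheduled by the 'incremental' strategy. */
	return "none";
}

/*
 * Hypothetical helper (not a Git function): a per-task schedule value,
 * when set (non-NULL here), overrides the strategy's recommendation.
 */
const char *sketch_effective_schedule(const char *strategy,
				      const char *task,
				      const char *task_schedule)
{
	if (task_schedule)
		return task_schedule;
	return sketch_strategy_default(strategy, task);
}
```

Under these assumptions, `sketch_effective_schedule("incremental", "gc", NULL)` yields "none", while passing "weekly" as the third argument yields "weekly", mirroring how a 'maintenance.<task>.schedule' value is described as overriding the strategy.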