diff options
Diffstat (limited to 'Documentation/git-sparse-checkout.txt')
-rw-r--r-- | Documentation/git-sparse-checkout.txt | 466 |
1 files changed, 366 insertions, 100 deletions
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index 7c8943a..529a8ed 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -3,22 +3,32 @@ git-sparse-checkout(1) NAME ---- -git-sparse-checkout - Initialize and modify the sparse-checkout -configuration, which reduces the checkout to a set of paths -given by a list of patterns. +git-sparse-checkout - Reduce your working tree to a subset of tracked files SYNOPSIS -------- [verse] -'git sparse-checkout <subcommand> [options]' +'git sparse-checkout' (init | list | set | add | reapply | disable | check-rules) [<options>] DESCRIPTION ----------- -Initialize and modify the sparse-checkout configuration, which reduces -the checkout to a set of paths given by a list of patterns. +This command is used to create sparse checkouts, which change the +working tree from having all tracked files present to only having a +subset of those files. It can also switch which subset of files are +present, or undo and go back to having all tracked files present in +the working copy. + +The subset of files is chosen by providing a list of directories in +cone mode (the default), or by providing a list of patterns in +non-cone mode. + +When in a sparse-checkout, other Git commands behave a bit differently. +For example, switching branches will not update paths outside the +sparse-checkout directories/patterns, and `git commit -a` will not record +paths outside the sparse-checkout directories/patterns as deleted. THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN @@ -28,47 +38,63 @@ THE FUTURE. COMMANDS -------- 'list':: - Describe the patterns in the sparse-checkout file. + Describe the directories or patterns in the sparse-checkout file. -'init':: - Enable the `core.sparseCheckout` setting. If the - sparse-checkout file does not exist, then populate it with - patterns that match every file in the root directory and - no other directories, then will remove all directories tracked - by Git. Add patterns to the sparse-checkout file to - repopulate the working directory. +'set':: + Enable the necessary sparse-checkout config settings + (`core.sparseCheckout`, `core.sparseCheckoutCone`, and + `index.sparse`) if they are not already set to the desired values, + populate the sparse-checkout file from the list of arguments + following the 'set' subcommand, and update the working directory to + match. + -To avoid interfering with other worktrees, it first enables the -`extensions.worktreeConfig` setting and makes sure to set the -`core.sparseCheckout` setting in the worktree-specific config file. +To ensure that adjusting the sparse-checkout settings within a worktree +does not alter the sparse-checkout settings in other worktrees, the 'set' +subcommand will upgrade your repository config to use worktree-specific +config if not already present. The sparsity defined by the arguments to +the 'set' subcommand are stored in the worktree-specific sparse-checkout +file. See linkgit:git-worktree[1] and the documentation of +`extensions.worktreeConfig` in linkgit:git-config[1] for more details. + -When `--cone` is provided, the `core.sparseCheckoutCone` setting is -also set, allowing for better performance with a limited set of -patterns (see 'CONE PATTERN SET' below). - -'set':: - Write a set of patterns to the sparse-checkout file, as given as - a list of arguments following the 'set' subcommand. Update the - working directory to match the new patterns. Enable the - core.sparseCheckout config setting if it is not already enabled. +When the `--stdin` option is provided, the directories or patterns are +read from standard in as a newline-delimited list instead of from the +arguments. ++ +By default, the input list is considered a list of directories, matching +the output of `git ls-tree -d --name-only`. This includes interpreting +pathnames that begin with a double quote (") as C-style quoted strings. +Note that all files under the specified directories (at any depth) will +be included in the sparse checkout, as well as files that are siblings +of either the given directory or any of its ancestors (see 'CONE PATTERN +SET' below for more details). In the past, this was not the default, +and `--cone` needed to be specified or `core.sparseCheckoutCone` needed +to be enabled. ++ +When `--no-cone` is passed, the input list is considered a list of +patterns. This mode has a number of drawbacks, including not working +with some options like `--sparse-index`. As explained in the +"Non-cone Problems" section below, we do not recommend using it. + -When the `--stdin` option is provided, the patterns are read from -standard in as a newline-delimited list instead of from the arguments. +Use the `--[no-]sparse-index` option to use a sparse index (the +default is to not use it). A sparse index reduces the size of the +index to be more closely aligned with your sparse-checkout +definition. This can have significant performance advantages for +commands such as `git status` or `git add`. This feature is still +experimental. Some commands might be slower with a sparse index until +they are properly integrated with the feature. + -When `core.sparseCheckoutCone` is enabled, the input list is considered a -list of directories instead of sparse-checkout patterns. The command writes -patterns to the sparse-checkout file to include all files contained in those -directories (recursively) as well as files that are siblings of ancestor -directories. The input format matches the output of `git ls-tree --name-only`. -This includes interpreting pathnames that begin with a double quote (") as -C-style quoted strings. +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by external tools. If you have trouble +with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not +understand the sparse directory entries index extension and may fail to +interact with your repository until it is disabled. 'add':: - Update the sparse-checkout file to include additional patterns. - By default, these patterns are read from the command-line arguments, - but they can be read from stdin using the `--stdin` option. When - `core.sparseCheckoutCone` is enabled, the given patterns are interpreted - as directory names as in the 'set' subcommand. + Update the sparse-checkout file to include additional directories + (in cone mode) or patterns (in non-cone mode). By default, these + directories or patterns are read from the command-line arguments, + but they can be read from stdin using the `--stdin` option. 'reapply':: Reapply the sparsity pattern rules to paths in the working tree. @@ -79,23 +105,107 @@ C-style quoted strings. cases, it can make sense to run `git sparse-checkout reapply` later after cleaning up affected paths (e.g. resolving conflicts, undoing or committing changes, etc.). ++ +The `reapply` command can also take `--[no-]cone` and `--[no-]sparse-index` +flags, with the same meaning as the flags from the `set` command, in order +to change which sparsity mode you are using without needing to also respecify +all sparsity paths. 'disable':: Disable the `core.sparseCheckout` config setting, and restore the - working directory to include all files. Leaves the sparse-checkout - file intact so a later 'git sparse-checkout init' command may - return the working directory to the same state. + working directory to include all files. + +'init':: + Deprecated command that behaves like `set` with no specified paths. + May be removed in the future. ++ +Historically, `set` did not handle all the necessary config settings, +which meant that both `init` and `set` had to be called. Invoking +both meant the `init` step would first remove nearly all tracked files +(and in cone mode, ignored files too), then the `set` step would add +many of the tracked files (but not ignored files) back. In addition +to the lost files, the performance and UI of this combination was +poor. ++ +Also, historically, `init` would not actually initialize the +sparse-checkout file if it already existed. This meant it was +possible to return to a sparse-checkout without remembering which +paths to pass to a subsequent 'set' or 'add' command. However, +`--cone` and `--sparse-index` options would not be remembered across +the disable command, so the easy restore of calling a plain `init` +decreased in utility. + +'check-rules':: + Check whether sparsity rules match one or more paths. ++ +By default `check-rules` reads a list of paths from stdin and outputs only +the ones that match the current sparsity rules. The input is expected to consist +of one path per line, matching the output of `git ls-tree --name-only` including +that pathnames that begin with a double quote (") are interpreted as C-style +quoted strings. ++ +When called with the `--rules-file <file>` flag the input files are matched +against the sparse checkout rules found in `<file>` instead of the current ones. +The rules in the files are expected to be in the same form as accepted by `git +sparse-checkout set --stdin` (in particular, they must be newline-delimited). ++ +By default, the rules passed to the `--rules-file` option are interpreted as +cone mode directories. To pass non-cone mode patterns with `--rules-file`, +combine the option with the `--no-cone` option. ++ +When called with the `-z` flag, the format of the paths input on stdin as well +as the output paths are \0 terminated and not quoted. Note that this does not +apply to the format of the rules passed with the `--rules-file` option. -SPARSE CHECKOUT ---------------- -"Sparse checkout" allows populating the working directory sparsely. -It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell -Git whether a file in the working directory is worth looking at. If -the skip-worktree bit is set, then the file is ignored in the working -directory. Git will not populate the contents of those files, which -makes a sparse checkout helpful when working in a repository with many -files, but only a few are important to the current user. +EXAMPLES +-------- +`git sparse-checkout set MY/DIR1 SUB/DIR2`:: + + Change to a sparse checkout with all files (at any depth) under + MY/DIR1/ and SUB/DIR2/ present in the working copy (plus all + files immediately under MY/ and SUB/ and the toplevel + directory). If already in a sparse checkout, change which files + are present in the working copy to this new selection. Note + that this command will also delete all ignored files in any + directory that no longer has either tracked or + non-ignored-untracked files present. + +`git sparse-checkout disable`:: + + Repopulate the working directory with all files, disabling sparse + checkouts. + +`git sparse-checkout add SOME/DIR/ECTORY`:: + + Add all files under SOME/DIR/ECTORY/ (at any depth) to the + sparse checkout, as well as all files immediately under + SOME/DIR/ and immediately under SOME/. Must already be in a + sparse checkout before using this command. + +`git sparse-checkout reapply`:: + + It is possible for commands to update the working tree in a + way that does not respect the selected sparsity directories. + This can come from tools external to Git writing files, or + even affect Git commands because of either special cases (such + as hitting conflicts when merging/rebasing), or because some + commands didn't fully support sparse checkouts (e.g. the old + `recursive` merge backend had only limited support). This + command reapplies the existing sparse directory specifications + to make the working directory match. + +INTERNALS -- SPARSE CHECKOUT +---------------------------- + +"Sparse checkout" allows populating the working directory sparsely. It +uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell Git +whether a file in the working directory is worth looking at. If the +skip-worktree bit is set, and the file is not present in the working tree, +then its absence is ignored. Git will avoid populating the contents of +those files, which makes a sparse checkout helpful when working in a +repository with many files, but only a few are important to the current +user. The `$GIT_DIR/info/sparse-checkout` file is used to define the skip-worktree reference bitmap. When Git updates the working @@ -103,66 +213,196 @@ directory, it updates the skip-worktree bits in the index based on this file. The files matching the patterns in the file will appear in the working directory, and the rest will not. -To enable the sparse-checkout feature, run `git sparse-checkout init` to -initialize a simple sparse-checkout file and enable the `core.sparseCheckout` -config setting. Then, run `git sparse-checkout set` to modify the patterns in -the sparse-checkout file. +INTERNALS -- NON-CONE PROBLEMS +------------------------------ + +The `$GIT_DIR/info/sparse-checkout` file populated by the `set` and +`add` subcommands is defined to be a bunch of patterns (one per line) +using the same syntax as `.gitignore` files. In cone mode, these +patterns are restricted to matching directories (and users only ever +need supply or see directory names), while in non-cone mode any +gitignore-style pattern is permitted. Using the full gitignore-style +patterns in non-cone mode has a number of shortcomings: + + * Fundamentally, it makes various worktree-updating processes (pull, + merge, rebase, switch, reset, checkout, etc.) require O(N*M) pattern + matches, where N is the number of patterns and M is the number of + paths in the index. This scales poorly. + + * Avoiding the scaling issue has to be done via limiting the number + of patterns via specifying leading directory name or glob. + + * Passing globs on the command line is error-prone as users may + forget to quote the glob, causing the shell to expand it into all + matching files and pass them all individually along to + sparse-checkout set/add. While this could also be a problem with + e.g. "git grep -- *.c", mistakes with grep/log/status appear in + the immediate output. With sparse-checkout, the mistake gets + recorded at the time the sparse-checkout command is run and might + not be problematic until the user later switches branches or rebases + or merges, thus putting a delay between the user's error and when + they have a chance to catch/notice it. + + * Related to the previous item, sparse-checkout has an 'add' + subcommand but no 'remove' subcommand. Even if a 'remove' + subcommand were added, undoing an accidental unquoted glob runs + the risk of "removing too much", as it may remove entries that had + been included before the accidental add. + + * Non-cone mode uses gitignore-style patterns to select what to + *include* (with the exception of negated patterns), while + .gitignore files use gitignore-style patterns to select what to + *exclude* (with the exception of negated patterns). The + documentation on gitignore-style patterns usually does not talk in + terms of matching or non-matching, but on what the user wants to + "exclude". This can cause confusion for users trying to learn how + to specify sparse-checkout patterns to get their desired behavior. + + * Every other git subcommand that wants to provide "special path + pattern matching" of some sort uses pathspecs, but non-cone mode + for sparse-checkout uses gitignore patterns, which feels + inconsistent. + + * It has edge cases where the "right" behavior is unclear. Two examples: + + First, two users are in a subdirectory, and the first runs + git sparse-checkout set '/toplevel-dir/*.c' + while the second runs + git sparse-checkout set relative-dir + Should those arguments be transliterated into + current/subdirectory/toplevel-dir/*.c + and + current/subdirectory/relative-dir + before inserting into the sparse-checkout file? The user who typed + the first command is probably aware that arguments to set/add are + supposed to be patterns in non-cone mode, and probably would not be + happy with such a transliteration. However, many gitignore-style + patterns are just paths, which might be what the user who typed the + second command was thinking, and they'd be upset if their argument + wasn't transliterated. + + Second, what should bash-completion complete on for set/add commands + for non-cone users? If it suggests paths, is it exacerbating the + problem above? Also, if it suggests paths, what if the user has a + file or directory that begins with either a '!' or '#' or has a '*', + '\', '?', '[', or ']' in its name? And if it suggests paths, will + it complete "/pro" to "/proc" (in the root filesystem) rather than to + "/progress.txt" in the current directory? (Note that users are + likely to want to start paths with a leading '/' in non-cone mode, + for the same reason that .gitignore files often have one.) + Completing on files or directories might give nasty surprises in + all these cases. + + * The excessive flexibility made other extensions essentially + impractical. `--sparse-index` is likely impossible in non-cone + mode; even if it is somehow feasible, it would have been far more + work to implement and may have been too slow in practice. Some + ideas for adding coupling between partial clones and sparse + checkouts are only practical with a more restricted set of paths + as well. + +For all these reasons, non-cone mode is deprecated. Please switch to +using cone mode. + + +INTERNALS -- CONE MODE HANDLING +------------------------------- + +The "cone mode", which is the default, lets you specify only what +directories to include. For any directory specified, all paths below +that directory will be included, and any paths immediately under +leading directories (including the toplevel directory) will also be +included. Thus, if you specified the directory + Documentation/technical/ +then your sparse checkout would contain: + + * all files in the toplevel-directory + * all files immediately under Documentation/ + * all files at any depth under Documentation/technical/ + +Also, in cone mode, even if no directories are specified, then the +files in the toplevel directory will be included. + +When changing the sparse-checkout patterns in cone mode, Git will inspect each +tracked directory that is not within the sparse-checkout cone to see if it +contains any untracked files. If all of those files are ignored due to the +`.gitignore` patterns, then the directory will be deleted. If any of the +untracked files within that directory is not ignored, then no deletions will +occur within that directory and a warning message will appear. If these files +are important, then reset your sparse-checkout definition so they are included, +use `git add` and `git commit` to store them, then remove any remaining files +manually to ensure Git can behave optimally. + +See also the "Internals -- Cone Pattern Set" section to learn how the +directories are transformed under the hood into a subset of the +Full Pattern Set of sparse-checkout. + + +INTERNALS -- FULL PATTERN SET +----------------------------- -To repopulate the working directory with all files, use the -`git sparse-checkout disable` command. +The full pattern set allows for arbitrary pattern matches and complicated +inclusion/exclusion rules. These can result in O(N*M) pattern matches when +updating the index, where N is the number of patterns and M is the number +of paths in the index. To combat this performance issue, a more restricted +pattern set is allowed when `core.sparseCheckoutCone` is enabled. +The sparse-checkout file uses the same syntax as `.gitignore` files; +see linkgit:gitignore[5] for details. Here, though, the patterns are +usually being used to select which files to include rather than which +files to exclude. (However, it can get a bit confusing since +gitignore-style patterns have negations defined by patterns which +begin with a '!', so you can also select files to _not_ include.) -FULL PATTERN SET ----------------- +For example, to select everything, and then to remove the file +`unwanted` (so that every file will appear in your working tree except +the file named `unwanted`): -By default, the sparse-checkout file uses the same syntax as `.gitignore` -files. + git sparse-checkout set --no-cone '/*' '!unwanted' -While `$GIT_DIR/info/sparse-checkout` is usually used to specify what -files are included, you can also specify what files are _not_ included, -using negative patterns. For example, to remove the file `unwanted`: +These patterns are just placed into the +`$GIT_DIR/info/sparse-checkout` as-is, so the contents of that file +at this point would be ---------------- /* !unwanted ---------------- +See also the "Sparse Checkout" section of linkgit:git-read-tree[1] to +learn more about the gitignore-style patterns used in sparse +checkouts. -CONE PATTERN SET ----------------- -The full pattern set allows for arbitrary pattern matches and complicated -inclusion/exclusion rules. These can result in O(N*M) pattern matches when -updating the index, where N is the number of patterns and M is the number -of paths in the index. To combat this performance issue, a more restricted -pattern set is allowed when `core.sparseCheckoutCone` is enabled. +INTERNALS -- CONE PATTERN SET +----------------------------- -The accepted patterns in the cone pattern set are: +In cone mode, only directories are accepted, but they are translated into +the same gitignore-style patterns used in the full pattern set. We refer +to the particular patterns used in those mode as being of one of two types: 1. *Recursive:* All paths inside a directory are included. 2. *Parent:* All files immediately inside a directory are included. -In addition to the above two patterns, we also expect that all files in the -root directory are included. If a recursive pattern is added, then all -leading directories are added as parent patterns. - -By default, when running `git sparse-checkout init`, the root directory is -added as a parent pattern. At this point, the sparse-checkout file contains -the following patterns: +Since cone mode always includes files at the toplevel, when running +`git sparse-checkout set` with no directories specified, the toplevel +directory is added as a parent pattern. At this point, the +sparse-checkout file contains the following patterns: ---------------- /* !/*/ ---------------- -This says "include everything in root, but nothing two levels below root." +This says "include everything immediately under the toplevel +directory, but nothing at any level below that." -When in cone mode, the `git sparse-checkout set` subcommand takes a list of -directories instead of a list of sparse-checkout patterns. In this mode, -the command `git sparse-checkout set A/B/C` sets the directory `A/B/C` as -a recursive pattern, the directories `A` and `A/B` are added as parent -patterns. The resulting sparse-checkout file is now +When in cone mode, the `git sparse-checkout set` subcommand takes a +list of directories. The command `git sparse-checkout set A/B/C` sets +the directory `A/B/C` as a recursive pattern, the directories `A` and +`A/B` are added as parent patterns. The resulting sparse-checkout file +is now ---------------- /* @@ -177,14 +417,18 @@ patterns. The resulting sparse-checkout file is now Here, order matters, so the negative patterns are overridden by the positive patterns that appear lower in the file. -If `core.sparseCheckoutCone=true`, then Git will parse the sparse-checkout file -expecting patterns of these types. Git will warn if the patterns do not match. -If the patterns do match the expected format, then Git will use faster hash- -based algorithms to compute inclusion in the sparse-checkout. +Unless `core.sparseCheckoutCone` is explicitly set to `false`, Git will +parse the sparse-checkout file expecting patterns of these types. Git will +warn if the patterns do not match. If the patterns do match the expected +format, then Git will use faster hash-based algorithms to compute inclusion +in the sparse-checkout. If they do not match, git will behave as though +`core.sparseCheckoutCone` was false, regardless of its setting. -In the cone mode case, the `git sparse-checkout list` subcommand will list the -directories that define the recursive patterns. For the example sparse-checkout -file above, the output is as follows: +In the cone mode case, despite the fact that full patterns are written +to the $GIT_DIR/info/sparse-checkout file, the `git sparse-checkout +list` subcommand will list the directories that define the recursive +patterns. For the example sparse-checkout file above, the output is as +follows: -------------------------- $ git sparse-checkout list @@ -197,13 +441,35 @@ case-insensitive check. This corrects for case mismatched filenames in the directory. -SUBMODULES ----------- - -If your repository contains one or more submodules, then those submodules will -appear based on which you initialized with the `git submodule` command. If -your sparse-checkout patterns exclude an initialized submodule, then that -submodule will still appear in your working directory. +INTERNALS -- SUBMODULES +----------------------- + +If your repository contains one or more submodules, then submodules +are populated based on interactions with the `git submodule` command. +Specifically, `git submodule init -- <path>` will ensure the submodule +at `<path>` is present, while `git submodule deinit [-f] -- <path>` +will remove the files for the submodule at `<path>` (including any +untracked files, uncommitted changes, and unpushed history). Similar +to how sparse-checkout removes files from the working tree but still +leaves entries in the index, deinitialized submodules are removed from +the working directory but still have an entry in the index. + +Since submodules may have unpushed changes or untracked files, +removing them could result in data loss. Thus, changing sparse +inclusion/exclusion rules will not cause an already checked out +submodule to be removed from the working copy. Said another way, just +as `checkout` will not cause submodules to be automatically removed or +initialized even when switching between branches that remove or add +submodules, using `sparse-checkout` to reduce or expand the scope of +"interesting" files will not cause submodules to be automatically +deinitialized or initialized either. + +Further, the above facts mean that there are multiple reasons that +"tracked" files might not be present in the working copy: sparsity +pattern application from sparse-checkout, and submodule initialization +state. Thus, commands like `git grep` that work on tracked files in +the working copy may return results that are limited by either or both +of these restrictions. SEE ALSO |