summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorJeff King <peff@peff.net>2020-06-25 19:48:32 (GMT)
committerJunio C Hamano <gitster@pobox.com>2020-06-25 21:19:23 (GMT)
commit65b5d9fae7684a282f48295b645c2f9da77c2736 (patch)
tree3a5c42c4ec2c542d7327403ad712796f91680669 /Documentation
parentd5bf91fde4430532eb725425c3ef9827048af6b5 (diff)
downloadgit-65b5d9fae7684a282f48295b645c2f9da77c2736.zip
git-65b5d9fae7684a282f48295b645c2f9da77c2736.tar.gz
git-65b5d9fae7684a282f48295b645c2f9da77c2736.tar.bz2
fast-export: allow seeding the anonymized mapping
After you anonymize a repository, it can be hard to find which commits correspond between the original and the result, and thus hard to reproduce commands that triggered bugs in the original. Let's make it possible to seed the anonymization map. This lets users either: - mark names to be retained as-is, if they don't consider them secret (in which case their original commands would just work) - map names to new values, which lets them adapt the reproduction recipe to the new names without revealing the originals The implementation is fairly straight-forward. We already store each anonymized token in a hashmap (so that the same token appearing twice is converted to the same result). We can just introduce a new "seed" hashmap which is consulted first. This does make a few more promises to the user about how we'll anonymize things (e.g., token-splitting pathnames). But it's unlikely that we'd want to change those rules, even if the actual anonymization of a single token changes. And it makes things much easier for the user, who can unblind only a directory name without having to specify each path within it. One alternative to this approach would be to anonymize as we see fit, and then dump the whole refname and pathname mappings to a file. This does work, but it's a bit awkward to use (you have to manually dig the items you care about out of the mapping). Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/git-fast-export.txt29
1 files changed, 29 insertions, 0 deletions
diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index e8950de..1978dbd 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -119,6 +119,11 @@ by keeping the marks the same across runs.
the shape of the history and stored tree. See the section on
`ANONYMIZING` below.
+--anonymize-map=<from>[:<to>]::
+ Convert token `<from>` to `<to>` in the anonymized output. If
+ `<to>` is omitted, map `<from>` to itself (i.e., do not
+ anonymize it). See the section on `ANONYMIZING` below.
+
--reference-excluded-parents::
By default, running a command such as `git fast-export
master~5..master` will not include the commit master{tilde}5
@@ -238,6 +243,30 @@ collapse "User 0", "User 1", etc into "User X"). This produces a much
smaller output, and it is usually easy to quickly confirm that there is
no private data in the stream.
+Reproducing some bugs may require referencing particular commits or
+paths, which becomes challenging after refnames and paths have been
+anonymized. You can ask for a particular token to be left as-is or
+mapped to a new value. For example, if you have a bug which reproduces
+with `git rev-list sensitive -- secret.c`, you can run:
+
+---------------------------------------------------
+$ git fast-export --anonymize --all \
+ --anonymize-map=sensitive:foo \
+ --anonymize-map=secret.c:bar.c \
+ >stream
+---------------------------------------------------
+
+After importing the stream, you can then run `git rev-list foo -- bar.c`
+in the anonymized repository.
+
+Note that paths and refnames are split into tokens at slash boundaries.
+The command above would anonymize `subdir/secret.c` as something like
+`path123/bar.c`; you could then search for `bar.c` in the anonymized
+repository to determine the final pathname.
+
+To make referencing the final pathname simpler, you can map each path
+component; so if you also anonymize `subdir` to `publicdir`, then the
+final pathname would be `publicdir/bar.c`.
LIMITATIONS
-----------