path: root/builtin/fast-export.c
Age | Commit message | Author
2015-11-20 | Remove get_object_hash. | brian m. carlson
Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <> Signed-off-by: Jeff King <>
2015-11-20 | Convert struct object to object_id | brian m. carlson
struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <> Signed-off-by: Jeff King <>
2015-11-20 | Add several uses of get_object_hash. | brian m. carlson
Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <> Signed-off-by: Jeff King <>
2015-06-22 | refs: move the remaining ref module declarations to refs.h | Michael Haggerty
Some functions from the refs module were still declared in cache.h. Move them to refs.h. Signed-off-by: Michael Haggerty <> Signed-off-by: Junio C Hamano <>
2014-08-27 | teach fast-export an --anonymize option | Jeff King
Sometimes users want to report a bug they experience on their repository, but they are not at liberty to share the contents of the repository. It would be useful if they could produce a repository that has a similar shape to its history and tree, but without leaking any information. This "anonymized" repository could then be shared with developers (assuming it still replicates the original problem).

This patch implements an "--anonymize" option to fast-export, which generates a stream that can recreate such a repository. Producing a single stream makes it easy for the caller to verify that they are not leaking any useful information. You can get an overview of what will be shared by running a command like:

    git fast-export --anonymize --all |
    perl -pe 's/\d+/X/g' |
    sort -u |
    less

which will show every unique line we generate, modulo any numbers (each anonymized token is assigned a number, like "User 0", and we replace it consistently in the output).

In addition to anonymizing, this produces test cases that are relatively small (compared to the original repository) and fast to generate (compared to using filter-branch, or modifying the output of fast-export yourself). Here are numbers for git.git:

    $ time git fast-export --anonymize --all \
           --tag-of-filtered-object=drop >output
    real    0m2.883s
    user    0m2.828s
    sys     0m0.052s

    $ gzip output
    $ ls -lh output.gz | awk '{print $5}'
    2.9M

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
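The "each anonymized token is assigned a number ... and we replace it consistently" idea can be sketched as follows. This is a minimal Python illustration, not fast-export's C implementation; the `Anonymizer` class and `unique_shapes` helper are hypothetical names introduced here.

```python
import re


class Anonymizer:
    """Map each distinct original token to a stable numbered placeholder.

    Hypothetical helper for illustration only.
    """

    def __init__(self, prefix):
        self.prefix = prefix
        self.seen = {}

    def anonymize(self, token):
        # The same input always yields the same placeholder, so the shape
        # of the history survives while the original content does not.
        if token not in self.seen:
            self.seen[token] = f"{self.prefix} {len(self.seen)}"
        return self.seen[token]


def unique_shapes(lines):
    # Rough equivalent of the perl/sort pipeline in the message above:
    # collapse every run of digits to "X" and keep the unique lines.
    return sorted({re.sub(r"\d+", "X", line) for line in lines})


users = Anonymizer("User")
stream = [f"author {users.anonymize(name)}" for name in ["alice", "bob", "alice"]]
```

Collapsing the numbers afterwards leaves a short list of line shapes that a person can review before sharing the stream.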
2014-07-02 | Merge branch 'jk/commit-buffer-length' | Junio C Hamano
Move "commit->buffer" out of the in-core commit object and keep track of their lengths. Use this to optimize the code paths to validate GPG signatures in commit objects.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-06-13 | commit: record buffer length in cache | Jeff King
Most callsites which use the commit buffer try to use the cached version attached to the commit, rather than re-reading from disk. Unfortunately, that interface provides only a pointer to the NUL-terminated buffer, with no indication of the original length. For the most part, this doesn't matter. People do not put NULs in their commit messages, and the log code is happy to treat it all as a NUL-terminated string. However, some code paths do care. For example, when checking signatures, we want to be very careful that we verify all the bytes to avoid malicious trickery. This patch just adds an optional "size" out-pointer to get_commit_buffer and friends. The existing callers all pass NULL (there did not seem to be any obvious sites where we could avoid an immediate strlen() call, though perhaps with some further refactoring we could). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
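Why the recorded length matters can be shown with a small Python sketch. The `get_buffer` and `c_strlen` helpers are hypothetical stand-ins (the real interface is a C function with an optional size out-pointer); the point is only that a NUL-terminated view can be shorter than the stored object.

```python
def get_buffer(cache, commit_id, want_size=False):
    # Hypothetical analogue of a getter with an optional "size" output:
    # callers that do not care about the length simply omit it.
    buf = cache[commit_id]
    return (buf, len(buf)) if want_size else buf


def c_strlen(buf):
    # What a C strlen() would report: the bytes before the first NUL.
    idx = buf.find(b"\0")
    return len(buf) if idx < 0 else idx


cache = {"c1": b"tree abc\n\nmsg\0hidden trailing bytes"}
buf, size = get_buffer(cache, "c1", want_size=True)
```

A signature check that relied on `c_strlen(buf)` would never see the bytes after the NUL, which is the kind of trickery the message warns about.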
2014-06-13 | use get_commit_buffer everywhere | Jeff King
Each of these sites assumes that commit->buffer is valid. Since they would segfault if this was not the case, they are likely to be correct in practice. However, we can future-proof them by using get_commit_buffer. And as a side effect, we abstract away the final bare uses of commit->buffer. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2014-04-21 | fast-export: add support to delete refs | Felipe Contreras
Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2014-04-21 | fast-export: add new --refspec option | Felipe Contreras
So that we can convert the exported ref names. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
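As a rough illustration of what "convert the exported ref names" means, here is a Python sketch of applying a simple `src:dst` refspec to a ref name. The `apply_refspec` function is hypothetical and handles only an exact match or a trailing `*`; real refspecs support more (for example a leading `+`).

```python
def apply_refspec(refspec, refname):
    """Return the rewritten ref name, or None if the refspec does not match.

    Simplified, hypothetical sketch: only exact names and a trailing '*'
    wildcard on both sides are handled.
    """
    src, dst = refspec.split(":", 1)
    if src.endswith("*") and dst.endswith("*"):
        prefix = src[:-1]
        if refname.startswith(prefix):
            # Carry the part matched by '*' over to the destination side.
            return dst[:-1] + refname[len(prefix):]
        return None
    return dst if refname == src else None
```

With `refs/heads/*:refs/remotes/origin/*`, a branch such as `refs/heads/master` would appear in the stream under `refs/remotes/origin/master`.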
2014-04-21 | fast-export: improve argument parsing | Felipe Contreras
We don't want to pass arguments specific to fast-export to setup_revisions. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2013-12-17 | Merge branch 'cc/starts-n-ends-with' | Junio C Hamano
Remove a few duplicate implementations of prefix/suffix comparison functions, and rename them to starts_with and ends_with.

* cc/starts-n-ends-with:
  replace {pre,suf}fixcmp() with {starts,ends}_with()
  strbuf: introduce starts_with() and ends_with()
  builtin/remote: remove postfixcmp() and use suffixcmp() instead
  environment: normalize use of prefixcmp() by removing " != 0"
2013-12-05 | replace {pre,suf}fixcmp() with {starts,ends}_with() | Christian Couder
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <>
Signed-off-by: Junio C Hamano <>
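The mechanical rewrite relies on the old functions returning 0 on a match, so `!prefixcmp(...)` corresponds to `starts_with(...)` and a bare `prefixcmp(...)` to `!starts_with(...)`. A Python sketch of the same four substitutions, applied in the same order as the perl script (the `rewrite` function name is introduced here for illustration):

```python
import re

# The same four substitutions as the perl one-liner, in the same order.
# The negated forms must be handled first so that the bare forms do not
# match inside them.
RULES = [
    (r"!prefixcmp\(", "starts_with("),
    (r"prefixcmp\(", "!starts_with("),
    (r"!suffixcmp\(", "ends_with("),
    (r"suffixcmp\(", "!ends_with("),
]


def rewrite(line):
    for pattern, replacement in RULES:
        line = re.sub(pattern, replacement, line)
    return line
```

For instance, `rewrite('if (!prefixcmp(arg, "--"))')` gives `'if (starts_with(arg, "--"))'`.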
2013-12-05 | Merge branch 'jk/robustify-parse-commit' | Junio C Hamano
* jk/robustify-parse-commit:
  checkout: do not die when leaving broken detached HEAD
  use parse_commit_or_die instead of custom message
  use parse_commit_or_die instead of segfaulting
  assume parse_commit checks for NULL commit
  assume parse_commit checks commit->object.parsed
  log_tree_diff: die when we fail to parse a commit
2013-10-24 | use parse_commit_or_die instead of segfaulting | Jeff King
Some unchecked calls to parse_commit should obviously die on error, because their next step is to start looking at the parsed fields, which will cause a segfault. These are obvious candidates for parse_commit_or_die, which will be a strict improvement in behavior. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2013-09-17 | Merge branch 'fc/fast-export' | Junio C Hamano
Code simplification.

* fc/fast-export:
  fast-export: refactor get_tags_and_duplicates()
  fast-export: make extra_refs global
2013-09-04 | Merge branch 'sb/parseopt-boolean-removal' | Junio C Hamano
Convert most uses of OPT_BOOLEAN/OPTION_BOOLEAN that can use OPT_BOOL/OPTION_BOOLEAN which have much saner semantics, and turn remaining ones into OPT_SET_INT, OPT_COUNTUP, etc. as necessary.

* sb/parseopt-boolean-removal:
  revert: use the OPT_CMDMODE for parsing, reducing code
  checkout-index: fix negations of even numbers of -n
  config parsing options: allow one flag multiple times
  hash-object: replace stdin parsing OPT_BOOLEAN by OPT_COUNTUP
  branch, commit, name-rev: ease up boolean conditions
  checkout: remove superfluous local variable
  log, format-patch: parsing uses OPT__QUIET
  Replace deprecated OPT_BOOLEAN by OPT_BOOL
  Remove deprecated OPTION_BOOLEAN for parsing arguments
2013-09-03 | fast-export: refactor get_tags_and_duplicates() | Felipe Contreras
Split into a separate helper function get_commit() so that the part that finds the relevant commit, and the part that does something with it (handle tag object, etc.) are in different places. No functional changes. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2013-09-03 | fast-export: make extra_refs global | Felipe Contreras
There's no need to pass it around everywhere. This would make further refactoring that uses this variable easier. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2013-08-05 | Replace deprecated OPT_BOOLEAN by OPT_BOOL | Stefan Beller
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN, 2011-09-27). All occurrences of the respective variables have been reviewed and none of them relied on the counting up mechanism, but all of them were using the variable as a true boolean. This patch does not change semantics of any command intentionally. Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
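The difference between the "counting up mechanism" and a true boolean can be pictured with a small Python sketch. These two functions are hypothetical models of the two behaviours, not the parse-options API itself, and they ignore negated forms such as `--no-...`.

```python
def parse_counting(args, flag):
    # Model of a counting option: every occurrence increments the value.
    return sum(1 for arg in args if arg == flag)


def parse_bool(args, flag):
    # Model of a true boolean: the value is 1 no matter how often
    # the flag is repeated.
    return 1 if flag in args else 0
```

Code that only tests the variable for truth behaves the same under both models, which is why the conversion was expected to be safe for the reviewed call sites.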
2013-07-29 | many small typofixes | Ondřej Bílka
Signed-off-by: Ondřej Bílka <> Reviewed-by: Marc Branchaud <> Signed-off-by: Junio C Hamano <>
2013-05-07 | fast-{import,export}: use get_sha1_hex() to read from marks file | Felipe Contreras
It's wrong to call get_sha1() if they should be SHA-1s, plus inefficient. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2013-05-07 | fast-export: don't parse commits while reading marks file | Felipe Contreras
We don't need the parsed objects at this point, merely the information that they have marks. Seems to be three times faster in my setup with lots of objects. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2013-05-07 | fast-export: do not parse non-commit objects while reading marks file | Felipe Contreras
We read from the marks file and keep only marked commits, but in order to find the type of object, we are parsing the whole thing, which is slow, especially in big repositories with lots of big files. There's no need for that; we can query the object information with sha1_object_info(). Before this, loading the objects of a fresh emacs import, with 260598 blobs, took 14 minutes; after this patch, it takes 3 seconds. This is the way fast-import does it. Also die if the object is not found (like fast-import). Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
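The speedup comes from asking only for an object's type instead of inflating and parsing its whole content. A minimal Python sketch of that idea for a loose-object-style buffer (a zlib stream whose payload starts with `<type> <size>` and a NUL byte) follows. Both helper names are hypothetical, and packed objects are not covered.

```python
import zlib


def make_loose_object(obj_type, payload):
    # Build an in-memory buffer shaped like a loose object:
    # "<type> <size>\0<payload>", zlib-compressed.
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return zlib.compress(header + payload)


def object_type(compressed, peek=64):
    # Inflate at most `peek` bytes of output: enough to see the header,
    # without materializing a potentially huge payload.
    inflater = zlib.decompressobj()
    head = inflater.decompress(compressed, peek)
    return head.split(b" ", 1)[0].decode()


big_blob = make_loose_object("blob", b"x" * 1_000_000)
```

Here `object_type(big_blob)` only inflates the first few dozen bytes, however large the blob is.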
2013-04-15 | fast-export: add --signed-tags=warn-strip mode | John Keeping
This issues a warning while stripping signatures from signed tags, which allows us to use it as default behaviour for remote helpers which cannot specify how to handle signed tags. Signed-off-by: John Keeping <> Signed-off-by: Junio C Hamano <>
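A Python sketch of how the `--signed-tags` modes relate to each other, with `warn-strip` behaving like `strip` plus a warning. The function name, the warning wording, and the assumption that a signature starts at the PGP armor marker are illustrative choices, not taken from the C source.

```python
import sys

PGP_MARKER = "-----BEGIN PGP SIGNATURE-----"


def handle_signed_tag(message, mode):
    """Return the tag message to export under the given mode (sketch)."""
    idx = message.find(PGP_MARKER)
    if idx < 0:
        return message  # unsigned tag: nothing to do
    if mode == "abort":
        raise ValueError("encountered signed tag")
    if mode in ("warn", "warn-strip"):
        print("warning: signed tag encountered", file=sys.stderr)
    elif mode not in ("verbatim", "strip"):
        raise ValueError(f"unknown --signed-tags mode: {mode}")
    if mode in ("strip", "warn-strip"):
        return message[:idx]  # drop the signature block
    return message  # verbatim / warn keep the signature
```

A remote helper that cannot pass options along would get the `warn-strip` behaviour: the export continues, but the user is told that the signature was dropped.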
2013-04-12 | Sync with 'maint' | Junio C Hamano
* maint:
  Correct common spelling mistakes in comments and tests
  kwset: fix spelling in comments
  precompose-utf8: fix spelling of "want" in error message
  compat/nedmalloc: fix spelling in comments
  compat/regex: fix spelling and grammar in comments
  obstack: fix spelling of similar
  contrib/subtree: fix spelling of accidentally
  git-remote-mediawiki: spelling fixes
  doc: various spelling fixes
  fast-export: fix argument name in error messages
  Documentation: distinguish between ref and offset deltas in pack-format
  i18n: make the translation of -u advice in one go
2013-04-12 | Merge branch 'maint-1.8.1' into maint | Junio C Hamano
* maint-1.8.1:
  fast-export: fix argument name in error messages
  Documentation: distinguish between ref and offset deltas in pack-format
2013-04-12 | fast-export: fix argument name in error messages | Paul Price
The --signed-tags argument is plural, while error messages referred to --signed-tag (singular). Tweak error messages to correspond to the argument. Signed-off-by: Paul Price <> Signed-off-by: Junio C Hamano <>
2013-04-07 | fast-export: Allow pruned-references in mark file | Antoine Pelisse
fast-export can fail because of some pruned-reference when importing a mark file. The problem happens in the following scenario:

    $ git fast-export --export-marks=MARKS master
    (rewrite master)
    $ git prune
    $ git fast-export --import-marks=MARKS master

This might fail if some references have been removed by prune because some marks will refer to no longer existing commits. git-fast-export will not need these objects anyway as they were no longer reachable.

We still need to update last_numid so we don't change the mapping between marks and objects for remote-helpers. Unfortunately, the mark file should not be rewritten without lost marks if no new objects have been exported, as we could lose track of the last last_numid.

Signed-off-by: Antoine Pelisse <>
Reviewed-by: Felipe Contreras <>
Signed-off-by: Junio C Hamano <>
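The behaviour described above (skip marks whose object was pruned, but still advance the highest mark number) can be sketched in Python. The `import_marks` function and the `object_exists` callback are hypothetical; the marks file is assumed to hold one `:<number> <sha1>` pair per line.

```python
def import_marks(lines, object_exists):
    """Load marks, tolerating entries whose object no longer exists."""
    marks = {}
    last_numid = 0
    for line in lines:
        mark, sha1 = line.split()
        numid = int(mark.lstrip(":"))
        # Advance the counter even for pruned entries, so that newly
        # assigned marks never reuse a number from the old mapping.
        last_numid = max(last_numid, numid)
        if not object_exists(sha1):
            continue  # pruned: unreachable anyway, so not needed
        marks[numid] = sha1
    return marks, last_numid
```

If mark `:3` points at a pruned commit, it is dropped from the table, yet the next new mark is still `:4` rather than `:3`.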
2013-03-25 | Merge branch 'jk/fast-export-object-lookup' | Junio C Hamano
* jk/fast-export-object-lookup:
  fast-export: do not load blob objects twice
  fast-export: rename handle_object function
2013-03-17 | fast-export: do not load blob objects twice | Jeff King
When fast-export wants to export a blob object, it first calls parse_object to get a "struct object" and check whether we have already shown the object. If we haven't shown it, we then use read_sha1_file to pull it from disk and write it out.

That means we load each blob from disk twice: once for parse_object to find its type and check its sha1, and a second time when we actually output it. We can drop this to a single load by using lookup_object to check the SHOWN flag, and then checking the signature on and outputting a single buffer.

This provides modest speedups on git.git (best-of-five, "git fast-export HEAD >/dev/null"):

    [before]              [after]
    real    0m14.347s     real    0m13.780s
    user    0m14.084s     user    0m13.620s
    sys     0m0.208s      sys     0m0.100s

and somewhat more on more blob-heavy repos (this is a repository full of media files):

    [before]              [after]
    real    0m52.236s     real    0m44.451s
    user    0m50.568s     user    0m43.000s
    sys     0m1.536s      sys     0m1.284s

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
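The single-load approach can be sketched in Python: consult an in-memory "already shown" table first, then read the blob once and both verify and output that one buffer. The `export_blob` signature, the `shown` set, and the `read_object` callback are hypothetical stand-ins for the C code's object flags and readers.

```python
import hashlib


def blob_id(data):
    # A blob's object name is the SHA-1 of "blob <size>\0" plus the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def export_blob(oid, read_object, shown, out):
    if oid in shown:
        return False  # cheap in-memory check, no disk access
    data = read_object(oid)  # the only load of this blob
    if blob_id(data) != oid:
        raise ValueError(f"hash mismatch for {oid}")
    out.append(data)  # verify and output the same buffer
    shown.add(oid)
    return True
```

Exporting the same blob a second time returns early without calling `read_object` again.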
2013-03-17 | fast-export: rename handle_object function | Jeff King
The handle_object function is rather vaguely named; it only operates on blobs, and its purpose is to export the blob to the output stream. Let's call it "export_blob" to make it more clear what it does. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2012-12-03 | fast-export: make sure updated refs get updated | Felipe Contreras
When an object has already been exported (and thus is in the marks) it's flagged as SHOWN, so it will not be exported again, even if at a later time it's exported through a different ref.

We don't need the object to be exported again, but we want the ref updated, which doesn't happen.

Since we can't know if a ref was exported or not, let's just assume that if the commit was marked (flags & SHOWN), the user still wants the ref updated.

IOW: If it's specified in the command line, it will get updated, regardless of whether or not the object was marked.

So:

    % git branch test master
    % git fast-export $mark_flags master
    % git fast-export $mark_flags test

Would export 'test' properly.

Additionally, this fixes issues with remote helpers; now they can push refs whose objects have already been exported, and a few other issues as well. Update the tests accordingly.

Signed-off-by: Felipe Contreras <>
Signed-off-by: Junio C Hamano <>
2012-12-03 | fast-export: don't handle uninteresting refs | Felipe Contreras
They have been marked as UNINTERESTING for a reason, let's respect that.

Currently the first ref is handled properly, but not the rest. Assuming that all the refs point at the same commit in the following example:

    % git fast-export master ^uninteresting ^foo ^bar
    reset refs/heads/bar
    from :0

    reset refs/heads/foo
    from :0

    reset refs/heads/uninteresting
    from :0

    % git fast-export ^uninteresting ^foo ^bar master
    reset refs/heads/master
    from :0

    reset refs/heads/bar
    from :0

    reset refs/heads/foo
    from :0

Clearly this is wrong; the negative refs should be ignored.

After this patch:

    % git fast-export ^uninteresting ^foo ^bar master
    # nothing
    % git fast-export master ^uninteresting ^foo ^bar
    # nothing

And even more, it would only happen if the ref is pointing to exactly the same commit, but not otherwise:

    % git fast-export ^next next
    reset refs/heads/next
    from :0

    % git fast-export ^next next^{commit}
    # nothing
    % git fast-export ^next next~0
    # nothing
    % git fast-export ^next next~1
    # nothing
    % git fast-export ^next next~2
    # nothing

The reason this happens is that before traversing the commits, fast-export checks if any of the refs point to the same object, and any duplicated ref gets added to a list in order to issue 'reset' commands after the traversing. Unfortunately, it's not even checking if the commit is flagged as UNINTERESTING. The fix of course, is to check it.

However, in order to do it properly we need to get the UNINTERESTING flag from the command line, not from the commit object, because "^foo bar" will mark the commit 'bar' uninteresting if foo and bar point at the same commit. rev_cmdline_info, which was introduced exactly to handle this situation, contains all the information we need for get_tags_and_duplicates(), plus the ref flag. This way the rest of the positive refs will remain untouched; it's only the negative ones that change in behavior.

Signed-off-by: Felipe Contreras <>
Signed-off-by: Junio C Hamano <>
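The core of the fix, taking the UNINTERESTING flag from each command-line entry rather than from the shared commit object, can be sketched in Python. This is a loose model: the `extra_refs` function, the tuple layout, and the flag value are assumptions made for illustration, not the layout of `rev_cmdline_info`.

```python
UNINTERESTING = 1 << 1  # flag value chosen for this sketch only


def extra_refs(cmdline):
    """Return refs that should get a 'reset' after the traversal.

    `cmdline` is a list of (refname, flags, commit) tuples, where the
    flags come from how the ref was written on the command line.
    """
    seen = set()
    extra = []
    for name, flags, commit in cmdline:
        if flags & UNINTERESTING:
            continue  # negative refs ("^foo") never produce a reset
        if commit in seen:
            extra.append(name)  # duplicate of an earlier positive ref
        seen.add(commit)
    return extra
```

Because the flag is per entry, `^foo bar` with both names on one commit skips only `foo`; the commit-level flag would have tainted `bar` as well.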
2012-11-29 | fast-export: trivial cleanup | Felipe Contreras
Setting 'commit' to 'commit' is a no-op. It might have been there to avoid a compiler warning, but if so, it was the compiler to blame, and it's certainly not there any more. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2012-11-26 | fast-export: avoid importing blob marks | Felipe Contreras
We want to be able to import, and then export, using the same marks, so that we don't push things that the other side already received. Unfortunately, fast-export doesn't store blobs in the marks, but fast-import does. This creates a mismatch when fast export is reusing a mark that was previously stored by fast-import. There is no point in one tool saving blobs, and the other not, but for now let's just check in fast-export that the objects are indeed commits. Signed-off-by: Felipe Contreras <> Signed-off-by: Junio C Hamano <>
2012-08-20 | i18n: fast-export: mark parseopt strings for translation | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2012-07-09 | Merge branch 'js/fast-export-paths-with-spaces' | Junio C Hamano
"git fast-export" produced an input stream for fast-import without properly quoting pathnames when they contain SPs in them. * js/fast-export-paths-with-spaces: fast-export: quote paths with spaces
2012-06-28 | fast-export: quote paths with spaces | Jay Soffian
A path containing a space must be quoted when used as an argument to either the copy or rename commands (because unlike other commands, the path is not the final thing on the line for those commands). Commit 6280dfdc3b (fast-export: quote paths in output, 2011-08-05) previously attempted to fix fast-export's quoting by passing all paths through quote_c_style(). However, that function does not consider the space to be a character which requires quoting, so let's special-case the space inside print_path(). This will cause space-containing paths to also be quoted in other commands where such quoting is not strictly necessary, but it does not hurt to do so. The test from 6280dfdc3b did not detect this because, while it does introduce renames in the export stream, it does not actually turn on rename detection, so they were presented as pairs of deletions/adds. Using "-M" reveals the bug. Signed-off-by: Jay Soffian <> Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
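The special case can be illustrated with a simplified Python sketch: quote a path if it contains a character that needs C-style escaping, or a space. The `quote_path` function is hypothetical and escapes only double quotes, backslashes, and LF; the real quote_c_style() covers more characters.

```python
def quote_path(path):
    """Return the path as it might appear in the export stream (sketch)."""
    needs_quote = any(ch in path for ch in ('"', "\\", "\n", " "))
    if not needs_quote:
        return path
    escaped = (
        path.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'
```

A rename line built from `quote_path("old name")` and `quote_path("new")` stays unambiguous, because the space inside the first path is enclosed in quotes.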
2012-06-12 | Merge branch 'js/maint-fast-export-mark-error' | Junio C Hamano
* js/maint-fast-export-mark-error: fast-export: report SHA-1 instead of gibberish when marks exist already
2012-03-02 | Merge branch 'rs/no-no-no-parseopt' | Junio C Hamano
* rs/no-no-no-parseopt:
  parse-options: remove PARSE_OPT_NEGHELP
  parse-options: allow positivation of options starting, with no-
  test-parse-options: convert to OPT_BOOL()

Conflicts:
  builtin/grep.c
2012-02-28 | parse-options: remove PARSE_OPT_NEGHELP | René Scharfe
PARSE_OPT_NEGHELP is confusing because short options defined with that flag do the opposite of what the helptext says. It is also not needed anymore now that options starting with no- can be negated by removing that prefix. Convert its only two users to OPT_NEGBIT() and OPT_BOOL() and then remove support for PARSE_OPT_NEGHELP. Signed-off-by: Rene Scharfe <> Acked-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2011-12-21 | Fix an enum assignment issue spotted by Sun Studio | Ævar Arnfjörð Bjarmason
In builtin/fast-export.c we'd assign to variables of the tag_of_filtered_mode enum type with constants defined for the signed_tag_mode enum. We'd get the intended value since both the value we were assigning with and the one we actually wanted had the same position within their respective enums, but doing it this way makes no sense.

This issue was spotted by Sun Studio 12 Update 1:

    "builtin/fast-export.c", line 54: warning: enum type mismatch: op "=" (E_ENUM_TYPE_MISMATCH_OP)

Signed-off-by: Ævar Arnfjörð Bjarmason <>
Signed-off-by: Junio C Hamano <>
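Why the mismatched assignment still "worked" can be shown with two Python integer enums whose first members share the same position. The class and member names below are illustrative assumptions, not copied from the C source.

```python
from enum import IntEnum


class SignedTagMode(IntEnum):  # hypothetical member names
    ABORT = 0
    VERBATIM = 1
    WARN = 2
    STRIP = 3


class TagOfFilteredMode(IntEnum):  # hypothetical member names
    ERROR = 0
    DROP = 1
    REWRITE = 2


wrong = SignedTagMode.ABORT      # constant from the wrong enum
right = TagOfFilteredMode.ERROR  # constant that was actually meant
```

Both constants carry the value 0, so the program behaves as intended, yet `wrong` is not a `TagOfFilteredMode` at all, which is the mismatch the compiler warned about.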
2011-08-11 | Merge branch 'jk/fast-export-quote-path' | Junio C Hamano
* jk/fast-export-quote-path: fast-export: quote paths in output
2011-08-05 | fast-export: quote paths in output | Jeff King
Many pathnames in a fast-import stream need to be quoted. In particular:

  1. Pathnames at the end of an "M" or "D" line need quoting if they contain a LF or start with double-quote.

  2. Pathnames on a "C" or "R" line need quoting as above, but also if they contain spaces.

For (1), we weren't quoting at all. For (2), we put double-quotes around the paths to handle spaces, but ignored the possibility that they would need further quoting.

This patch checks whether each pathname needs c-style quoting, and uses it. This is slightly overkill for (1), which doesn't actually need to quote many characters that vanilla c-style quoting does. However, it shouldn't hurt, as any implementation needs to be ready to handle quoted strings anyway.

In addition to adding a test, we have to tweak a test which blindly assumed that case (2) would always use double-quotes, whether it needed to or not.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2011-07-19 | fast-export: support done feature | Sverre Rabbelier
If fast-export is being used to generate a fast-import stream that will be used afterwards it is desirable to indicate the end of the stream with the new 'done' command. Add a flag that causes fast-export to end with 'done'. Signed-off-by: Sverre Rabbelier <> Acked-by: Jeff King <> Signed-off-by: Junio C Hamano <>
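The value of an explicit terminator is that a consumer can tell a complete stream from a truncated one. A Python sketch of such a check follows, assuming the stream announces the feature with a `feature done` line and ends with a `done` line; the `stream_is_complete` function is a hypothetical consumer-side helper.

```python
def stream_is_complete(lines):
    """Return False if a stream that promised 'done' ends without it."""
    lines = [line.strip() for line in lines if line.strip()]
    if "feature done" not in lines:
        # Without the feature, end-of-file is the only terminator,
        # so truncation cannot be detected here.
        return True
    return lines[-1] == "done"
```

A stream cut off in the middle of a transfer would fail this check instead of being silently accepted as a shorter history.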
2011-02-28 | Merge branch 'mg/placeholders-are-lowercase' | Junio C Hamano
* mg/placeholders-are-lowercase:
  Make <identifier> lowercase in Documentation
  Make <identifier> lowercase as per CodingGuidelines
  Make <identifier> lowercase as per CodingGuidelines
  Make <identifier> lowercase as per CodingGuidelines
  CodingGuidelines: downcase placeholders in usage messages
2011-02-15 | Make <identifier> lowercase as per CodingGuidelines | Michael J Gruber
*.c part for matches with '"[A-Z]+"'. Signed-off-by: Michael J Gruber <> Signed-off-by: Junio C Hamano <>
2011-02-03 | struct rev_info: convert prune_data to struct pathspec | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <> Signed-off-by: Junio C Hamano <>
2010-09-08 | Merge branch 'en/d-f-conflict-fix' | Junio C Hamano
* en/d-f-conflict-fix: fast-export: ensure that a renamed file is printed after all references