path: root/vcs-svn/fast_export.c
Age | Commit message | Author
2018-05-06 | Convert remaining die*(BUG) messages | Johannes Schindelin
These were not caught by the previous commit, as they did not match the regular expression. While at it, remove the localization from one instance: we never want BUG() messages to be translated, as they target Git developers, not the end user (hence it would be quite unhelpful to not only burden the translators, but then even end up with a bug report in a language that no core Git contributor understands). Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2017-08-27 | Merge branch 'jn/vcs-svn-cleanup' | Junio C Hamano
Code clean-up.

* jn/vcs-svn-cleanup:
  vcs-svn: move remaining repo_tree functions to fast_export.h
  vcs-svn: remove repo_delete wrapper function
  vcs-svn: remove custom mode constants
  vcs-svn: remove more unused prototypes and declarations
2017-08-23 | vcs-svn: move remaining repo_tree functions to fast_export.h | Jonathan Nieder
These used to be for manipulating the in-memory repo_tree structure, but nowadays they are convenience wrappers to handle a few git-vs-svn mismatches:

 1. Git does not track empty directories but Subversion does. When looking up a path in git that Subversion thinks exists and finding nothing, we can safely assume that the path represents a directory. This is needed when a later Subversion revision modifies that directory.

 2. Subversion allows deleting a file by copying. In Git fast-import we have to handle that more explicitly as a deletion.

These are details of the tool's interaction with git fast-import. Move them to fast_export.c, where other such details are handled. This way the function names do not start with a repo_ prefix that would clash with the repository object introduced in v2.14.0-rc0~38^2~16 (repository: introduce the repository object, 2017-06-22) or an svn_ prefix that would clash with libsvn (in case someone wants to link this code with libsvn some day).

Signed-off-by: Jonathan Nieder <>
Signed-off-by: Junio C Hamano <>
2017-08-23 | vcs-svn: remove custom mode constants | Jonathan Nieder
In the rest of Git, these modes are spelled as S_IFDIR, S_IFREG | 0644, S_IFREG | 0755, and S_IFLNK. Use the same constants in svn-fe for simplicity and consistency. No functional change intended. Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2017-04-27 | timestamp_t: a new data type for timestamps | Johannes Schindelin
Git's source code assumes that unsigned long is at least as precise as time_t. Which is incorrect, and causes a lot of problems, in particular where unsigned long is only 32-bit (notably on Windows, even in 64-bit versions). So let's just use a more appropriate data type instead. In preparation for this, we introduce the new `timestamp_t` data type. By necessity, this is a very, very large patch, as it has to replace all timestamps' data type in one go. As we will use a data type that is not necessarily identical to `time_t`, we need to be very careful to use `time_t` whenever we interact with the system functions, and `timestamp_t` everywhere else. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2017-04-24 | PRItime: introduce a new "printf format" for timestamps | Johannes Schindelin
Currently, Git's source code treats all timestamps as if they were unsigned longs. Therefore, it is okay to write "%lu" when printing them. There is a substantial problem with that, though: at least on Windows, time_t is *larger* than unsigned long, and hence we will want to switch away from the ill-specified `unsigned long` data type. So let's introduce the pseudo format "PRItime" (currently simply being defined to "lu") to make it easier to change the data type used for timestamps. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
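The point of a format macro is that the type and its conversion specifier can change together in one place. A minimal sketch of the pattern follows. It assumes, for illustration only, that the timestamp type is uintmax_t and that PRItime expands to PRIuMAX; the commit above still defines PRItime as "lu". The helper name format_committer_time is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Assumption for this sketch: a timestamp type that may be wider than unsigned long. */
typedef uintmax_t timestamp_t;
#define PRItime PRIuMAX

/* Hypothetical helper: format a UTC timestamp the way a committer line ends. */
static int format_committer_time(char *out, size_t size, timestamp_t when)
{
	return snprintf(out, size, "%" PRItime " +0000", when);
}
```

Callers never spell out "%lu" or "%ju" themselves, so switching the typedef only requires updating the macro next to it.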
2016-09-14 | vcs-svn/fast_export: fix timestamp fmt specifiers | Mike Ralphson
Two instances of %ld being used for unsigned longs Signed-off-by: Mike Ralphson <> Signed-off-by: Junio C Hamano <>
2013-12-05 | strbuf: introduce starts_with() and ends_with() | Christian Couder
prefixcmp() and suffixcmp() share the common "cmp" suffix that is typically used to name functions that can be used for ordering, but they can't, because they are not antisymmetric:

	prefixcmp("foo", "foobar") < 0
	prefixcmp("foobar", "foo") == 0

We in fact do not use these functions for ordering. Replace them with functions that just check for equality. Add starts_with() and ends_with() that will be used to replace prefixcmp() and suffixcmp(), respectively, as the first step. These are named after corresponding functions/methods in programming languages, like Java, Python and Ruby.

In vcs-svn/fast_export.c, there was already an ends_with() function that did the same thing. Let's use the new one instead while at it.

Signed-off-by: Christian Couder <>
Signed-off-by: Junio C Hamano <>
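A boolean pair with the semantics described above might look like the following. This is a minimal sketch written for illustration, not the implementation added by the commit.

```c
#include <string.h>

/* Return 1 if str begins with prefix, 0 otherwise. */
static int starts_with(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	return strncmp(str, prefix, len) == 0;
}

/* Return 1 if str ends with suffix, 0 otherwise. */
static int ends_with(const char *str, const char *suffix)
{
	size_t len = strlen(str), suflen = strlen(suffix);

	if (len < suflen)
		return 0;
	return strcmp(str + len - suflen, suffix) == 0;
}
```

Because the result is a plain yes/no answer, nothing about the names suggests an ordering that the functions cannot provide.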
2012-10-07 | remote-svn: add incremental import | Florian Achleitner
Search for a note attached to the ref to update and read its 'Revision-number:' line. Start import from the next svn revision.

If there is no next revision in the svn repo, svnrdump terminates with a message on stderr and a non-zero return value. This looks a little weird, but there is no other way to know whether there is a new revision in the svn repo.

On the start of an incremental import, the parent of the first commit in the fast-import stream is set to the branch name to update. All following commits specify their parent by a mark number. Previous mark files are currently not reused.

Signed-off-by: Florian Achleitner <>
Acked-by: David Michael Barr <>
Signed-off-by: Junio C Hamano <>
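The lookup described above can be sketched as a scan over the note body. The function name next_revision_from_note, the exact "Revision-number: " key spelling with a trailing space, and the choice to return 0 when no such line exists are all assumptions made for this illustration; the commit's own parsing may differ.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Scan a note body for a line starting with "Revision-number: " and return
 * the revision to start the next import from (last imported + 1).
 * Returns 0 if no such line is found (assumed here to mean "start over").
 */
static unsigned long next_revision_from_note(const char *note)
{
	static const char key[] = "Revision-number: ";
	const char *line = note;

	while (line && *line) {
		if (!strncmp(line, key, sizeof(key) - 1))
			return strtoul(line + sizeof(key) - 1, NULL, 10) + 1;
		/* advance to the character after the next newline, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}
```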
2012-10-07 | Create a note for every imported commit containing svn metadata | Florian Achleitner
To provide metadata from svn dumps for further processing, e.g. branch detection, attach a note to each imported commit that stores additional information. The notes are currently hard-coded in refs/notes/svn/revs. Currently the following lines from the svn dump are directly accumulated in the note. This can be refined as needed.

 - "Revision-number"
 - "Node-path"
 - "Node-kind"
 - "Node-action"
 - "Node-copyfrom-path"
 - "Node-copyfrom-rev"

Signed-off-by: Florian Achleitner <>
Acked-by: David Michael Barr <>
Signed-off-by: Junio C Hamano <>
2012-10-07 | vcs-svn: add fast_export_note to create notes | Dmitry Ivankov
fast_export lacked a method to write notes to the fast-import stream. Add two new functions: fast_export_note, which is similar to fast_export_modify, and fast_export_buf_to_data, which makes it possible to write inline blobs that don't come from a line_buffer or from delta application.

To be used like this:

	fast_export_begin_commit("refs/notes/somenotes", ...)
	fast_export_note("refs/heads/master", "inline")
	fast_export_buf_to_data(&data)

or maybe

	fast_export_note("refs/heads/master", sha1)

Signed-off-by: Dmitry Ivankov <>
Signed-off-by: Florian Achleitner <>
Acked-by: David Michael Barr <>
Signed-off-by: Junio C Hamano <>
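For orientation, the stream text that an inline note boils down to can be sketched as below. The helper format_inline_note is hypothetical and is not one of the functions added by the commit; it formats into a caller-supplied buffer rather than writing to stdout, and it assumes the note body is a NUL-terminated string. The layout follows fast-import's notemodify command, "N inline <commit-ish>" followed by a "data" block.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an inline note for the given commit-ish.
 * Returns what snprintf returns (length that was or would have been written).
 */
static int format_inline_note(char *out, size_t size,
			      const char *committish, const char *body)
{
	return snprintf(out, size, "N inline %s\ndata %lu\n%s\n",
			committish, (unsigned long)strlen(body), body);
}
```

A body that may contain NUL bytes would need an explicit length and fwrite instead of "%s".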
2012-10-07 | remote-svn, vcs-svn: Enable fetching to private refs | Florian Achleitner
The reference to update by the fast-import stream is hard-coded. When fetching from a remote the remote-helper shall update refs in a private namespace, i.e. a private subdir of refs/. This namespace is defined by the 'refspec' capability, that the remote-helper advertises as a reply to the 'capabilities' command. Extend svndump and fast-export to allow passing the target ref. Update svn-fe to be compatible. Signed-off-by: Florian Achleitner <> Acked-by: David Michael Barr <> Signed-off-by: Junio C Hamano <>
2012-07-06 | vcs-svn: suppress signed/unsigned comparison warnings | David Barr
These are already safe because both sides of the comparison are nonnegative. This would normally not be important because Git is not -Wsign-compare clean anyway, but we like to keep the vcs-svn/ lib to a higher standard for convenience using it in other projects. Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2012-07-06 | vcs-svn: use strstr instead of memmem | David Barr
memmem is a GNU extension. Avoiding it makes the code clearer and makes it easier for projects that don't share git's compat/ code, such as the standalone svn-dump-fast-export project, to reuse the vcs-svn/ library. Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2012-07-06 | vcs-svn: drop no-op reset methods | David Barr
Since v1.7.5~42^2~6 (vcs-svn: remove buffer_read_string) buffer_reset() does nothing, and therefore neither does fast_export_reset(). Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2012-02-02 | vcs-svn: allow import of > 4GiB files | Jonathan Nieder
There is no reason in principle that an svn-format dump would not be able to represent a file whose length does not fit in a 32-bit integer. Use off_t consistently (instead of uint32_t) to represent file lengths so we can handle that. Most of our code is already ready to do that without this patch and already passes values of type off_t around. The type mismatch due to stragglers was noticed with gcc -Wtype-limits. Inspired-by: Ramsay Jones <> Signed-off-by: Jonathan Nieder <>
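The failure mode being avoided is silent truncation. A minimal sketch, using int64_t as a stand-in for a 64-bit off_t (an assumption, since off_t's width depends on the platform and build flags), with a hypothetical helper name:

```c
#include <stdint.h>

/* What a 32-bit length field would record for a given true length. */
static uint32_t truncated_length(int64_t len)
{
	/* conversion to an unsigned type keeps only the low 32 bits */
	return (uint32_t)len;
}
```

A file of 4 GiB plus 5 bytes would be recorded as 5 bytes long, which is why a straggling uint32_t anywhere in the chain is enough to corrupt the import.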
2011-06-23 | vcs-svn: reset first_commit_done in fast_export_init | Dmitry Ivankov
first_commit_done has zero as a default value, but it is not reset back to zero in fast_export_init. Reset it back to zero so that each export will have proper initial state. Signed-off-by: Dmitry Ivankov <> Signed-off-by: Jonathan Nieder <>
2011-06-21 | vcs-svn: do not initialize report_buffer twice | Dmitry Ivankov
When importing from a dump with deltas, first fast_export_init calls buffer_fdinit, and then init_report_buffer calls fdopen once again when processing the first delta. The second initialization is redundant and leaks a FILE *. Remove the redundant on-demand initialization to fix this. Initializing directly in fast_export_init is simpler and lets the caller pass an int specifying which fd to use instead of hard-coding REPORT_FILENO. Signed-off-by: Dmitry Ivankov <> Signed-off-by: Jonathan Nieder <>
2011-06-15 | vcs-svn: avoid hangs from corrupt deltas | Jonathan Nieder
A corrupt Subversion-format delta can request reads past the end of the preimage. Set sliding_view::max_off so such corruption is caught when it appears rather than blocking in an impossible-to-fulfill read() when input is coming from a socket or pipe. Inspired-by: Ramkumar Ramachandra <> Signed-off-by: Jonathan Nieder <>
2011-06-15 | vcs-svn: guard against overflow when computing preimage length | Jonathan Nieder
Signed integer overflow produces undefined behavior in C and off_t is a signed type. For predictable behavior, add some checks to protect in advance against overflow. On 32-bit systems ftell as called by buffer_tmpfile_prepare_to_read is likely to fail with EOVERFLOW when reading the corresponding postimage, and this patch does not fix that. So it's more of a futureproofing measure than a complete fix. Signed-off-by: Jonathan Nieder <>
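Checking in advance means comparing against the maximum before adding, since the addition itself must never overflow. A minimal sketch of that pattern, using int64_t in place of off_t and a hypothetical function name (neither is taken from the patch):

```c
#include <stdint.h>

/*
 * Add two nonnegative lengths. On success store the sum and return 0;
 * return -1 if either input is negative or the sum would not fit.
 */
static int add_lengths(int64_t a, int64_t b, int64_t *sum)
{
	if (a < 0 || b < 0 || b > INT64_MAX - a)
		return -1;
	*sum = a + b;
	return 0;
}
```

The subtraction INT64_MAX - a cannot overflow once a is known to be nonnegative, so the test itself stays within defined behavior.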
2011-06-15 | Merge branch 'db/delta-applier' into db/text-delta | Jonathan Nieder
* db/delta-applier:
  vcs-svn: cap number of bytes read from sliding view
  test-svn-fe: split off "test-svn-fe -d" into a separate function
2011-05-26 | vcs-svn: implement text-delta handling | David Barr
Handle input in Subversion's dumpfile format, version 3. This is the format produced by "svnrdump dump" and "svnadmin dump --deltas", and the main difference between v3 dumpfiles and the dumpfiles already handled is that these can include nodes whose properties and text are expressed relative to some other node.

To handle such nodes, we find which node the text and properties are based on, handle its property changes, use the cat-blob command to request the basis blob from the fast-import backend, use the svndiff0_apply() helper to apply the text delta on the fly, writing output to a temporary file, and then measure that postimage file's length and write its content to the fast-import stream.

The temporary postimage file is shared between delta-using nodes to avoid some file system overhead.

The svn-fe interface needs to be more complicated to accommodate the backward flow of information from the fast-import backend to svn-fe. The backflow fd is not needed when parsing streams without deltas, though, so existing scripts using svn-fe on v2 dumps should continue to work.

NEEDSWORK: generalize interface so caller sets the backflow fd, close temporary file before exiting

Signed-off-by: David Barr <>
Signed-off-by: Jonathan Nieder <>
Signed-off-by: David Barr <>
Signed-off-by: Jonathan Nieder <>
2011-05-26 | Merge branch 'db/svn-fe-code-purge' into svn-fe | Jonathan Nieder
* db/svn-fe-code-purge:
  vcs-svn: drop obj_pool
  vcs-svn: drop treap
  vcs-svn: drop string_pool
  vcs-svn: pass paths through to fast-import

Conflicts:
  vcs-svn/fast_export.c
  vcs-svn/fast_export.h
  vcs-svn/repo_tree.c
  vcs-svn/repo_tree.h
  vcs-svn/string_pool.c
  vcs-svn/svndump.c
  vcs-svn/trp.txt
2011-05-26 | Merge branch 'db/vcs-svn-incremental' into svn-fe | Jonathan Nieder
This teaches svn-fe to incrementally import into an existing repository (at last!) at the expense of less convenient UI. Think of it as growing pains. This opens the door to many excellent things, and it would be a bad idea to discourage people from building on it for much longer.

* db/vcs-svn-incremental:
  vcs-svn: avoid using ls command twice
  vcs-svn: use mark from previous import for parent commit
  vcs-svn: handle filenames with dq correctly
  vcs-svn: quote paths correctly for ls command
  vcs-svn: eliminate repo_tree structure
  vcs-svn: add a comment before each commit
  vcs-svn: save marks for imported commits
  vcs-svn: use higher mark numbers for blobs
  vcs-svn: set up channel to read fast-import cat-blob response

Conflicts:
  t/
  vcs-svn/fast_export.c
  vcs-svn/fast_export.h
  vcs-svn/repo_tree.c
  vcs-svn/svndump.c
2011-03-27 | vcs-svn: add missing cast to printf argument | Jonathan Nieder
gcc -m32 correctly warns:

	vcs-svn/fast_export.c: In function 'fast_export_commit':
	vcs-svn/fast_export.c:54:2: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 2 has type 'unsigned int' [-Wformat]

Fix it.

Signed-off-by: Jonathan Nieder <>
2011-03-26 | vcs-svn: handle log message with embedded NUL | Jonathan Nieder
Pass the log message by strbuf instead of as a C-style string and use fwrite instead of printf to write it to fast-import so embedded '\0' bytes can be preserved. Currently "git log" doesn't show the embedded NULs but "git cat-file commit" can. While at it, stop including system headers from repo_tree.h. git source files need to include git-compat-util.h (or cache.h or builtin.h) sooner to ensure the appropriate feature test macros are defined. Signed-off-by: Jonathan Nieder <>
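The difference between the two approaches is that printf("%s") stops at the first NUL, while fwrite is told how many bytes to emit. A minimal sketch of a length-carrying write follows; the helper name write_data is hypothetical, and the "data <len>" framing is fast-import's data command.

```c
#include <stdio.h>

/* Write "data <len>\n" followed by exactly len bytes, NULs included. */
static int write_data(FILE *out, const char *buf, size_t len)
{
	if (fprintf(out, "data %lu\n", (unsigned long)len) < 0)
		return -1;
	if (fwrite(buf, 1, len, out) != len)
		return -1;
	return fputc('\n', out) == EOF ? -1 : 0;
}
```

Called with the three bytes 'a', NUL, 'b' and a length of 3, all three bytes reach the stream, whereas strlen() on the same buffer would report 1.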
2011-03-22 | vcs-svn: pass paths through to fast-import | David Barr
Now that there is no internal representation of the repo, it is not necessary to tokenise paths. Use strbuf instead and bypass string_pool. This means svn-fe can handle arbitrarily long paths (as long as a strbuf can fit them), with arbitrarily many path components. While at it, since we now treat paths in their entirety, only quote when necessary. Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
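Quoting only when necessary can be sketched as follows. This is a deliberately simplified illustration: the function name quote_path_if_needed is hypothetical, and the set of characters treated as special here (double quote, backslash, newline) is an assumption that is narrower than what Git's own C-style quoting handles.

```c
#include <string.h>

/*
 * Copy path to out unchanged if it contains no special character;
 * otherwise wrap it in double quotes and backslash-escape '"', '\\'
 * and newline. Returns 0 on success, -1 if out is too small.
 */
static int quote_path_if_needed(char *out, size_t size, const char *path)
{
	size_t n = 0;
	const char *p;

	if (!strpbrk(path, "\"\\\n")) {
		if (strlen(path) >= size)
			return -1;
		strcpy(out, path);
		return 0;
	}
	if (size < 3)
		return -1;
	out[n++] = '"';
	for (p = path; *p; p++) {
		/* worst case: two bytes for this char, closing quote, NUL */
		if (n + 4 > size)
			return -1;
		if (*p == '"' || *p == '\\' || *p == '\n') {
			out[n++] = '\\';
			out[n++] = (*p == '\n') ? 'n' : *p;
		} else {
			out[n++] = *p;
		}
	}
	out[n++] = '"';
	out[n] = '\0';
	return 0;
}
```

Ordinary paths pass through byte for byte, so the common case costs one scan and one copy.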
2011-03-22 | Merge branch 'db/strbufs-for-metadata' into db/svn-fe-code-purge | Jonathan Nieder
* db/strbufs-for-metadata:
  vcs-svn: use strbuf for author, UUID, and URL
  vcs-svn: use strbuf for revision log

Conflicts:
  vcs-svn/fast_export.c
  vcs-svn/fast_export.h
  vcs-svn/repo_tree.c
  vcs-svn/svndump.c
2011-03-22 | Merge branch 'db/length-as-hash' (early part) into db/svn-fe-code-purge | Jonathan Nieder
* 'db/length-as-hash' (early part):
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys
  vcs-svn: improve reporting of input errors
  vcs-svn: make buffer_copy_bytes return length read
  vcs-svn: make buffer_skip_bytes return length read
  vcs-svn: improve support for reading large files

Conflicts:
  vcs-svn/fast_export.c
  vcs-svn/svndump.c
2011-03-22 | vcs-svn: use strbuf for author, UUID, and URL | David Barr
Use strbufs and strings instead of interned strings for values of rev, dump, and node fields that happen to be strings. After this change, the only remaining string_pool use is for paths in the repo_tree API and internals. Functional change: treat an empty author, UUID, or URL as none at all. So for example, in repos where the first revision has an empty svn:author property, the first rev will be treated as by "nobody" rather than by a person with empty name and email address created by prepending an @ sign to the repository UUID. Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2011-03-22 | vcs-svn: improve reporting of input errors | Jonathan Nieder
Catch input errors and exit early enough to print a reasonable diagnosis based on errno. Signed-off-by: Jonathan Nieder <> Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2011-03-07 | vcs-svn: use mark from previous import for parent commit | David Barr
With this patch, overlapping incremental imports work. Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2011-03-07 | vcs-svn: handle filenames with dq correctly | Jonathan Nieder
Quote paths passed to fast-import so filenames with double quotes are not misinterpreted. One might imagine this could help with filenames with newlines, too, but svn does not allow those. Helped-by: David Barr <> Signed-off-by: Jonathan Nieder <> Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2011-03-07 | vcs-svn: quote paths correctly for ls command | David Barr
This bug was found while importing rev 601865 of ASF. [jn: with test] Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2011-03-07 | vcs-svn: eliminate repo_tree structure | Jonathan Nieder
Rely on fast-import for information about previous revs. This requires always setting up backward flow of information, even for v2 dumps. On the plus side, it simplifies the code by quite a bit and opens the door to further simplifications. [db: adjusted to support final version of the cat-blob patch] [jn: avoiding hard-coding git's name for the empty tree for portability to other backends] Signed-off-by: Jonathan Nieder <> Signed-off-by: David Barr <> Signed-off-by: Jonathan Nieder <>
2011-03-07 | vcs-svn: add a comment before each commit | Jonathan Nieder
Current svn-fe produces output like this:

	blob
	mark :7382321
	data 5
	hello

	blob
	mark :7382322
	data 5
	Hello

	commit
	mark :3
	[...]
	M 100644 :7382321 hello.c
	M 100644 :7382322 hello2.c

This means svn-fe has to keep track of the paths modified in each commit and the corresponding marks, instead of dealing with each file as it arrives in input and then forgetting about it.

A better strategy would be to use inline blobs:

	commit
	mark :3
	[...]
	M 100644 inline hello.c
	data 5
	hello
	[...]

As a first step towards that, teach svn-fe to notice when the collection of blobs for each commit starts and write a comment ("# commit 3.") there.

Signed-off-by: Jonathan Nieder <>
2011-03-07 | vcs-svn: save marks for imported commits | Jonathan Nieder
This way, a person can use

	svnadmin dump $path |
	svn-fe |
	git fast-import --relative-marks --export-marks=svn-revs

to get a list of what commit corresponds to each svn revision (plus some irrelevant blob names) in .git/info/fast-import/svn-revs.

Signed-off-by: Jonathan Nieder <>
Signed-off-by: David Barr <>
Signed-off-by: Jonathan Nieder <>
2011-03-07 | vcs-svn: set up channel to read fast-import cat-blob response | David Barr
Set up some plumbing: teach the svndump lib to pass a file descriptor number to the fast_export lib, representing where cat-blob/ls responses can be read from, and add a get_response_line helper function to the fast_export lib to read a line from that file.

Unfortunately this means that svn-fe needs file descriptor 3 to be redirected from somewhere (preferably the cat-blob stream of a fast-import backend); otherwise it will fail:

	$ svndump <path> | svn-fe
	fatal: cannot read from file descriptor 3: Bad file descriptor

For the moment, "svn-fe 3</dev/null" works as a workaround but it will not work for very long. A fast-import backend that can retrieve old commits is needed in order to be able to fulfill svn "Node-copyfrom-rev" requests that refer to revs from a previous run.

[jn: with new change description]

Based-on-patch-by: Jonathan Nieder <>
Signed-off-by: David Barr <>
Signed-off-by: Jonathan Nieder <>
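Reading one response line from such a channel can be sketched as below. This is not the get_response_line helper from the commit: the name read_response_line and its signature are hypothetical, and the sketch takes an already-open FILE * (which a caller could obtain with fdopen(3, "r") on POSIX systems) instead of managing the descriptor itself.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line from an already-open response stream and strip the
 * trailing newline. Returns buf on success, NULL on EOF or read error.
 */
static char *read_response_line(FILE *in, char *buf, size_t size)
{
	size_t len;

	if (!fgets(buf, (int)size, in))
		return NULL;
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
	return buf;
}
```

A NULL return is the point at which a caller would report something like the "cannot read from file descriptor 3" failure quoted above.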
2011-02-26 | vcs-svn: teach line_buffer to handle multiple input files | Jonathan Nieder
Collect the line_buffer state in a newly public line_buffer struct. Callers can use multiple line_buffers to manage input from multiple files at a time. svn-fe's delta applier will use this to stream a delta from svnrdump and the preimage it applies to from fast-import at the same time. The tests don't take advantage of the new features, but I think that's okay. It is easier to find lingering examples of nonreentrant code by searching for "static" in line_buffer.c. Signed-off-by: Jonathan Nieder <>
2010-09-12 | vcs-svn: Fix some printf format compiler warnings | Ramsay Jones
In particular, on systems that define uint32_t as an unsigned long, gcc complains as follows:

	CC vcs-svn/fast_export.o
	vcs-svn/fast_export.c: In function `fast_export_modify':
	vcs-svn/fast_export.c:28: warning: unsigned int format, uint32_t arg (arg 2)
	vcs-svn/fast_export.c:28: warning: int format, uint32_t arg (arg 3)
	vcs-svn/fast_export.c: In function `fast_export_commit':
	vcs-svn/fast_export.c:42: warning: int format, uint32_t arg (arg 5)
	vcs-svn/fast_export.c:62: warning: int format, uint32_t arg (arg 2)
	vcs-svn/fast_export.c: In function `fast_export_blob':
	vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 2)
	vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 3)
	CC vcs-svn/svndump.o
	vcs-svn/svndump.c: In function `svndump_read':
	vcs-svn/svndump.c:260: warning: int format, uint32_t arg (arg 3)

In order to suppress the warnings we use the C99 format specifier macros PRIo32 and PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <>
Acked-by: Jonathan Nieder <>
Signed-off-by: Junio C Hamano <>
2010-08-15 | vcs-svn: Avoid %z in format string | Jonathan Nieder
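The macros expand to whatever conversion specifier matches uint32_t on the platform, so the same format string is correct whether uint32_t is unsigned int or unsigned long. A minimal sketch of the pattern, using a hypothetical helper format_modify that is not the code touched by the patch:

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format an "M <mode> :<mark> <path>" line. */
static int format_modify(char *out, size_t size,
			 uint32_t mode, uint32_t mark, const char *path)
{
	return snprintf(out, size, "M %06" PRIo32 " :%" PRIu32 " %s\n",
			mode, mark, path);
}
```

String literal concatenation splices the macro into the format, so no cast is needed at the call site.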
In the spirit of v1.6.4-rc0~124 (MinGW: Fix compiler warning in merge-recursive, 2009-05-23), use a 32-bit integer instead; the dump file parser does not support any better, anyway. Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2010-08-15 | Infrastructure to write revisions in fast-export format | David Barr
repo_tree maintains the exporter's state and provides a facility to call fast_export, which writes objects to stdout suitable for consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete update the current commit, based roughly on the corresponding Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

[rr: squelched compiler warnings]
[jn: removed support for maintaining state on-disk, though we may want to add it back later]

Signed-off-by: David Barr <>
Signed-off-by: Ramkumar Ramachandra <>
Signed-off-by: Jonathan Nieder <>
Signed-off-by: Junio C Hamano <>