summaryrefslogtreecommitdiff
path: root/fast-import.c
AgeCommit message (Collapse)Author
2007-02-12fast-import: Support reusing 'from' and brown paper bag fix reset.Shawn O. Pearce
It was suggested on the mailing list that being able to use `from` in any commit to reset the current branch is useful in some types of importers, such as a darcs importer. We originally did not permit resetting an existing branch with a new `from` command during a `commit` command, but this restriction was only to help debug the hacked up cvs2svn that Jon Smirl was developing in parallel with git-fast-import. It is probably more of a problem to disallow it than to allow it. So now we permit a `from` during any `commit`. While making the changes required to permit multiple `from` commands on the same branch, I discovered we no longer needed the last_commit field to be set to 0 during a reset, so that was removed. (Reset was originally setting the field to 0 to signal cmd_from() that it was OK to execute on the branch.) While poking around in this section of fast-import I also realized the `reset` command was not working as intended if the corresponding `from` command was omitted (as allowed by the BNF grammar and the code). If `from` was omitted we cleared out the tree but we left the tree SHA-1 and parent commit SHA-1 intact. This is not what the user intended in this case. Instead they would be trying to reset the branch to have no parent and to have no tree, making the branch look new-born during the next commit. We now clear these SHA-1 values during `reset`, ensuring the branch looks new-born if `from` does not get supplied. New test cases for these were also added. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-12fast-import: Hide the pack boundary commits by default.Shawn O. Pearce
Most users don't need the pack boundary information that fast-import was printing to standard output, especially if they were calling it with --quiet. Those users who do want this information probably want it captured so they can go back and use it to repack the imported repository. So dumping the boundary commits to a log file makes more sense then printing them to standard output. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07fast-import: Fix compile warningsJohannes Schindelin
Not on all platforms are size_t and unsigned long equivalent. Since I do not know how portable %z is, I play safe, and just cast the respective variables to unsigned long. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-07Don't crash fast-import if the marks cannot be exported.Shawn O. Pearce
Apparently fast-import used to die a horrible death if we were unable to open the marks file for output. This is slightly less than ideal, especially now that we dump the marks as part of the `checkpoint` command. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07Dump all refs and marks during a checkpoint in fast-import.Shawn O. Pearce
If the frontend asks us to checkpoint (via the explicit checkpoint command) its probably because they are afraid the current import will crash/fail/whatever and want to make sure they can pickup from the last checkpoint. To do that sort of recovery, we will need the current tip of every branch and tag available at the next startup. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07Teach fast-import how to sit quietly in the corner.Shawn O. Pearce
Often users will be running fast-import from within a larger frontend process, and this may be a frequent periodic tool such as a future edition of `git-svn fetch`. We don't want to bombard users with our large stats output if they won't be interested in it, so `--quiet` is now an option to make gfi be more silent. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07Teach fast-import how to clear the internal branch content.Shawn O. Pearce
Some frontends may not be able to (easily) keep track of which files are included in the branch, and which aren't. Performing this tracking can be tedious and error prone for the frontend to do, especially if its foreign data source cannot supply the changed path list on a per-commit basis. fast-import now allows a frontend to request that a branch's tree be wiped clean (reset to the empty tree) at the start of a commit, allowing the frontend to feed in all paths which belong on the branch. This is ideal for a tar-file importer frontend, for example, as the frontend just needs to reformat the tar data stream into a gfi data stream, which may be something a few Perl regexps can take care of. :) Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06S_IFLNK != 0140000Junio C Hamano
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06Don't do non-fastforward updates in fast-import.Shawn O. Pearce
If fast-import is being used to update an existing branch of a repository, the user may not want to lose commits if another process updates the same ref at the same time. For example, the user might be using fast-import to make just one or two commits against a live branch. We now perform a fast-forward check during the ref updating process. If updating a branch would cause commits in that branch to be lost, we skip over it and display the new SHA1 to standard error. This new default behavior can be overridden with `--force`, like git-push and git-fetch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06Support RFC 2822 date parsing in fast-import.Shawn O. Pearce
Since some frontends may be working with source material where the dates are only readily available as RFC 2822 strings, it is more friendly if fast-import exposes Git's parse_date() function to handle the conversion. This way the frontend doesn't need to perform the parsing itself. The new --date-format option to fast-import can be used by a frontend to select which format it will supply date strings in. The default is the standard `raw` Git format, which fast-import has always supported. Format rfc2822 can be used to activate the parse_date() function instead. Because fast-import could also be useful for creating new, current commits, the format `now` is also supported to generate the current system timestamp. The implementation of `now` is a trivial call to datestamp(), but is actually a whole whopping 3 lines so that fast-import can verify the frontend really meant `now`. As part of this change I have added validation of the `raw` date format. Prior to this change fast-import would accept anything in a `committer` command, even if it was seriously malformed. Now fast-import requires the '> ' near the end of the string and verifies the timestamp is formatted properly. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06Remove unnecessary null pointer checks in fast-import.Shawn O. Pearce
There is no need to check for a NULL pointer before invoking free(), the runtime library automatically performs this check anyway and does nothing if a NULL pointer is supplied. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06Correct minor style issue in fast-import.Shawn O. Pearce
Junio noticed that I was using a different style in fast-import for returned pointers than the rest of Git. Before merging this code into the main git.git tree I'd like to make it consistent, as this style variation was not intentional. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06Correct compiler warnings in fast-import.Shawn O. Pearce
Junio noticed these warnings/errors in fast-import when compiling with `-Werror -ansi -pedantic`. A few changes are to reduce compiler warnings, while one (in cmd_merge) is a bug fix. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06Remove --branch-log from fast-import.Shawn O. Pearce
The --branch-log option and its associated code hasn't been used in several months, as its not really very useful for debugging fast-import or a frontend. I don't plan on supporting it in this state long-term, so I'm killing it now before it gets distributed to a wider audience. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06Don't support shell-quoted refnames in fast-import.Shawn O. Pearce
The current implementation of shell-style quoted refnames and SHA-1 expressions within fast-import contains a bad memory leak. We leak the unquoted strings used by the `from` and `merge` commands, maybe others. Its also just muddling up the docs. Since Git refnames cannot contain LF, and that is our delimiter for the end of the refname, and we accept any other character as-is, there is no reason for these strings to support quoting, except to be nice to frontends. But frontends shouldn't be expecting to use funny refs in Git, and its just as simple to never quote them as it is to always pass them through the same quoting filter as pathnames. So frontends should never quote refs, or ref expressions. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-05Reduce memory usage of fast-import.Shawn O. Pearce
Some structs are allocated rather frequently, but were using integer types which were far larger than required to actually store their full value range. As packfiles are limited to 4 GiB we don't need more than 32 bits to store the offset of an object within that packfile, an `unsigned long` on a 64 bit system is likely a 64 bit unsigned value. Saving 4 bytes per object on a 64 bit system can add up fast on any sizable import. As atom strings are strictly single components in a path name these are probably limited to just 255 bytes by the underlying OS. Going to that short of a string is probably too restrictive, but certainly `unsigned int` is far too large for their lengths. `unsigned short` is a reasonable limit. Modes within a tree really only need two bytes to store their whole value; using `unsigned int` here is vast overkill. Saving 4 bytes per file entry in an active branch can add up quickly on a project with a large number of files. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-05Include checkpoint command in the BNF.Shawn O. Pearce
This command isn't encouraged (as its slow) but it does exist and is accepted, so it still should be covered in the BNF. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-30Merge branch 'master' into sp/gfiShawn O. Pearce
git-fast-import requires use of inttypes.h, but the master branch has added it to git-compat-util differently than git-fast-import originally had used it. This merge back of master to the fast-import topic is to get (and use) inttypes.h the way master is using it. This is a partially evil merge to remove the call to setup_ident(), as the master branch now contains a change which makes this implicit and therefore removed the function declaration. (commit 01754769). Conflicts: git-compat-util.h
2007-01-18Accept 'inline' file data in fast-import commit structure.Shawn O. Pearce
Its very annoying to need to specify the file content ahead of a commit and use marks to connect the individual blobs to the commit's file modification entry, especially if the frontend can't/won't generate the blob SHA1s itself. Instead it would much easier to use if we can accept the blob data at the same time as we receive each file_change line. Now fast-import accepts 'inline' instead of a mark idnum or blob SHA1 within the 'M' type file_change command. If an inline is detected the very next line must be a 'data n' command, supplying the file data. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18Support delimited data regions in fast-import.Shawn O. Pearce
During testing its nice to not have to feed the length of a data chunk to the 'data' command of fast-import. Instead we would prefer to be able to establish a data chunk much like shell's << operator and use a line delimiter to denote the end of the input. So now if a data command is started as 'data <<EOF' we will look for a terminator line containing only the string EOF on that line. Once found, we stop the data command. Everything between the two lines is used as the data value. The 'data <<' syntax is slower than 'data n', as we don't know how many bytes to expect and instead must grow our buffer on the fly. It also has the problem that the frontend must use a string which will not appear on a line by itself in the input, and the data region will always end in an LF. For these reasons real import frontends are encouraged to continue to use _only_ 'data n'. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18Remove unnecessary options from fast-import.Shawn O. Pearce
The --objects command line option is rather unnecessary. Internally we allocate objects in 5000 unit blocks, ensuring that any sort of malloc overhead is ammortized over the individual objects to almost nothing. Since most frontends don't know how many objects they will need for a given import run (and its hard for them to predict without just doing the run) we probably won't see anyone using --objects. Further since there's really no major benefit to using the option, most frontends won't even bother supplying it even if they could estimate the number of objects. So I'm removing it. The --max-objects-per-pack option was probably a mistake to even have added in the first place. The packfile format is limited to 4 GiB today; given that objects need at least 3 bytes of data (and probably need even more) there's no way we are going to exceed the limit of 1<<32-1 objects before we reach the file size limit. So I'm removing it (to slightly reduce the complexity of the code) before anyone gets any wise ideas and tries to use it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18Use fixed-size integers when writing out the index in fast-import.Shawn O. Pearce
Currently the pack .idx file format uses 32-bit unsigned integers for the fan-out table and the object offsets. We had previously defined these as 'unsigned int', but not every system will define that type to be a 32 bit value. To ensure maximum portability we should always use 'uint32_t'. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18Always use struct pack_header for pack header in fast-import.Shawn O. Pearce
Previously we were using 'unsigned int' to update the hdr_entries field of the pack header after the file had been completed and was being hashed. This may not be 32 bits on all platforms. Instead we want to always uint32_t. I'm actually cheating here by just using the pack_header like the rest of Git and letting the struct definition declare the correct type. Right now that field is still 'unsigned int' (wrong) but a pending change submitted by Simon 'corecode' Schubert changes it to uint32_t. After that change is merged in fast-import will do the right thing all of the time. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17Correct packfile edge output in fast-import.Shawn O. Pearce
Branches are only contained by a packfile if the branch actually had its most recent commit in that packfile. So new branches are set to MAX_PACK_ID to ensure they don't cause their commit to list as part of the first packfile when it closes out if the commit was actually in existance before fast-import started. Also corrected the type of last_commit to be umaxint_t to prevent overflow and wraparound on very large imports. Though that is highly unlikely to occur as we're talking 4 billion commits, which no real project has right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17Declare no-arg functions as (void) in fast-import.Shawn O. Pearce
Apparently the git convention is to declare any function which takes no arguments as taking void. I did not do this during the early fast-import development, but should have. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17Correct a few types to be unsigned in fast-import.Shawn O. Pearce
The length of an atom string cannot be negative. So make it explicit and declare it as an unsigned value. The shift width in a mark table node also cannot be negative. I'm also moving it to after the pointer arrays to prevent any possible alignment problems on a 64 bit system. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17Corrected BNF input documentation for fast-import.Shawn O. Pearce
Now that fast-import uses uintmax_t (the largest available unsigned integer type) for marks we don't want to say its an unsigned 32 bit integer in ASCII base 10 notation. It could be much larger, especially on 64 bit systems, and especially if a frontend uses a very large number of marks (1 per file revision on a very, very large import). Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Print out the edge commits for each packfile in fast-import.Shawn O. Pearce
To help callers repack very large repositories into a series of packfiles fast-import now outputs the last commits/tags it wrote to a packfile when it prints out the packfile name. This information can be feed to pack-objects --revs to repack. For the first pack of an initial import this is pretty easy (just feed those SHA1s on stdin) but for subsequent packs you want to feed the subsequent pack's final SHA1s but also all prior pack's SHA1s prefixed with the negation operator. This way the prior pack's data does not get included into the subsequent pack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Correct object_count type and stat output in fast-import.Shawn O. Pearce
Since object_count is limited to 'unsigned long' (really an unsigned 32 bit integer value) by the pack file format we may as well use exactly that type here in fast-import for that counter. An earlier change by me incorrectly made it uintmax_t. But since object_count is a counter for the current packfile only, we don't want to output its value at the end. Instead we should sum up the individual type counters and report that total, as that will cover all of the packfiles. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Correct max_packsize default in fast-import.Shawn O. Pearce
Apparently amd64 has defined 'unsigned long' to be a 64 bit value, which means -1 was way over the 4 GiB packfile limit. Whoops. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Remove unnecessary pack_fd global in fast-import.Shawn O. Pearce
Much like the pack_sha1 the pack_fd is an unnecessary global variable, we already have the fd stored in our struct packed_git *pack_data so that the core library functions in sha1_file.c are able to lookup and decompress object data that we have previously written. Keeping an extra copy of this value in our own variable is just a hold-over from earlier versions of fast-import and is now completely unnecessary. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Ensure we close the packfile after creating it in fast-import.Shawn O. Pearce
Because we are renaming the packfile into its file destination we need to be sure its not open when the rename is called, otherwise some operating systems (e.g. Windows) may prevent the rename from occurring. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Use .keep files in fast-import during processing.Shawn O. Pearce
Because fast-import automatically updates all references (heads and tags) at the end of its run the repository is corrupt unless the objects are available in the .git/objects/pack directory prior to the refs being modified. The easiest way to ensure that is true is to move the packfile and its associated index directly into the .git/objects/pack directory as soon as we have finished output to it. But the only safe way to do this is to create the a temporary .keep file for that pack, so we use the same tricks that index-pack uses when its being invoked by receive-pack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Reuse sha1 in packed_git in fast-import.Shawn O. Pearce
Rather than maintaing our own packfile level sha1 variable we can make use of the one already available in struct packed_git. Its meant for the SHA1 of the index but it can also hold the SHA1 of the packfile itself between final checksumming of the packfile and creation of the index. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Replace redundant yread() with read_in_full() in fast-import.Shawn O. Pearce
Prior to git having read_in_full() fast-import used its own private function yread to perform the header reading task. No sense in keeping that around now that read_in_full is a public, stable function. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Use uintmax_t for marks in fast-import.Shawn O. Pearce
If a frontend wants to use a mark per file revision and per commit and is doing a truly huge import (such as a 32 GiB SVN repository) we may need more than 2**32 unique mark values, especially if the frontend is unable (or unwilling) to recycle mark values. For mark idnums we should use the largest unsigned integer type available, hoping that will be at least 64 bits when we are compiled as a 64 bit executable. This way we may consume huge amounts of memory storing our mark table, but we'll at least be able to process the entire import without failing. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16Corrected buffer overflow during automatic checkpoint in fast-import.Shawn O. Pearce
If we previously were using a delta but we needed to checkpoint the current packfile and switch to a new packfile we need to throw away the delta and compress the raw object by itself, as delta chains cannot span non-thin packfiles. Unfortunately the output buffer in this case needs to grow, as the size of the compressed object may be quite a bit larger than the size of the compressed delta. I've also avoided recompressing the object if we are checkpointing and we didn't use a delta. In this case the output buffer is the correct size and has already been populated with the right data, we just need to close out the current packfile and open a new one. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Print the packfile names to stdout from fast-import.Shawn O. Pearce
Caller scripts may want to know what packfiles the fast-import process just wrote out for them. This is now output to stdout, one packfile name per line, after we checkpoint each packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Implemented automatic checkpoints within fast-import.Shawn O. Pearce
When the number of objects or number of bytes gets close to the limit allowed by the packfile format (or configured on the command line by our caller) we should automatically checkpoint the current packfile and start a new one before writing the object out. This does however require that we abandon the delta (if we had one) as its not valid in a new packfile. I also added the simple rule that if we got a delta back but the delta itself is the same size as or larger than the uncompressed object to ignore the delta and just store the object data. This should avoid some really bad behavior caused by our current delta strategy. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Optimize index creation on large object sets in fast-import.Shawn O. Pearce
When we are generating multiple packfiles at once we only need to scan the blocks of object_entry structs which contain objects for the current packfile. Because the most recent blocks are at the front of the linked list, and because all new objects going into the current file are allocated from the front of that list, we can stop scanning for objects as soon as we identify one which doesn't belong to the current packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Don't create a final empty packfile in fast-import.Shawn O. Pearce
If the last packfile is going to be empty (has 0 objects) then it shouldn't be kept after the import has terminated, as there is no point to the packfile. So rather than hashing it and making the index file, just delete the packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Implemented manual packfile switching in fast-import.Shawn O. Pearce
To help importers which are dealing with massive amounts of data fast-import needs to be able to close the packfile it is currently writing to and open a new packfile for any additional data that will be received. A new 'checkpoint' command has been introduced which can be used by the frontend import process to force this to occur at any time. This may be useful to ensure a very long running import doesn't lose any work due to unexpected failures. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Remove unnecessary duplicate_count in fast-import.Shawn O. Pearce
There is little reason to be keeping a global duplicate_count value when we also keep it per object type. The global counter can easily be computed at the end, once all processing has completed. This saves us a couple of machine instructions in an unimportant part of code. But it looks slightly better to me to not keep two counters around. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Restructure fast-import to support creating multiple packfiles.Shawn O. Pearce
Now that we are starting to see some really large projects (such as KDE or a fork of FreeBSD) get imported into Git we're running into the upper limit on packfile object count as well as overall byte length. The KDE and FreeBSD projects are both likely to require more than 4 GiB to store their current history, which means we really need multiple packfiles to handle their content. This is a fairly simple restructuring of the internal code to help us support creating multiple packfiles from within fast-import. We are now adding a 5 digit incrementing suffix to the end of the basename supplied to us by the caller, permitting up to 99,999 packs to be generated in a single fast-import run. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Misc. type cleanups within fast-import.Shawn O. Pearce
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15Improve reuse of sha1_file library within fast-import.Shawn O. Pearce
Now that the sha1_file.c library routines use the sliding mmap routines to perform efficient access to portions of a packfile I can remove that code from fast-import.c and just invoke it. One benefit is we now have reloading support for any packfile which uses OBJ_OFS_DELTA. Another is we have significantly less code to maintain. This code reuse change *requires* that fast-import generate only an OBJ_OFS_DELTA format packfile, as there is absolutely no index available to perform OBJ_REF_DELTA lookup in while unpacking an object. This is probably reasonable to require as the delta offsets result in smaller packfiles and are faster to unpack, as no index searching is required. Its also only a temporary requirement as users could always repack without offsets before making the import available to older versions of Git. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14Merge branch 'master' into sp/fast-importShawn O. Pearce
I'm bringing master in early so that the OBJ_OFS_DELTA implementation is available as part of the topic. This way git-fast-import can learn about this new slightly smaller and faster packfile format, and can generate them directly rather than needing to have them be repacked with git-pack-objects. Due to the API changes in master during the period of development of git-fast-import, a few minor tweaks to fast-import.c are needed to produce a working merge. I've done them here as part of the merge to ensure bisection always works. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14Allow creating branches without committing in fast-import.Shawn O. Pearce
Some importers may want to create a branch long before they actually commit to it, or in some cases they may never commit to the branch but they still need the ref to be created in the repository after the import is complete. This extends the 'reset ' command to automatically create a new branch if the supplied reference isn't already known as a branch. While I'm at it I also modified the syntax of the reset command to terminate with an empty line, like commit and tag operate. This just makes the command set more consistent. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14Support creation of merge commits in fast-import.Shawn O. Pearce
Some importers are able to determine when branch merges occurred within their source data. In these cases they will want to supply the correct commits to fast-import so that a proper merge commit will exist in Git. This is now supported by supplying a 'merge ' command after the commit message and optional from command. A merge is not actually performed by fast-import, its assumed that the frontend performed any sort of merging activity already and that fast-import should simply be storing its result. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14Fix repository corruption when using marks for modified blobs.Shawn O. Pearce
Apparently we did not copy the blob SHA1 into the stack variable 'sha1' when a mark is used to refer to a prior blob. This code was not previously tested as the Mozilla CVS -> git-fast-import program always fed us full SHA1s for modified blobs and did not use the mark feature there. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>