summaryrefslogtreecommitdiff
path: root/sha1_file.c
AgeCommit message (Collapse)Author
2012-02-27Merge branch 'jk/maint-avoid-streaming-filtered-contents'Junio C Hamano
* jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode
2012-02-24do not stream large files to pack when filters are in useJeff King
Because git's object format requires us to specify the number of bytes in the object in its header, we must know the size before streaming a blob into the object database. This is not a problem when adding a regular file, as we can get the size from stat(). However, when filters are in use (such as autocrlf, or the ident, filter, or eol gitattributes), we have no idea what the ultimate size will be. The current code just punts on the whole issue and ignores filter configuration entirely for files larger than core.bigfilethreshold. This can generate confusing results if you use filters for large binary files, as the filter will suddenly stop working as the file goes over a certain size. Rather than try to handle unknown input sizes with streaming, this patch just turns off the streaming optimization when filters are in use. This has a slight performance regression in a very specific case: if you have autocrlf on, but no gitattributes, a large binary file will avoid the streaming code path because we don't know beforehand whether it will need conversion or not. But if you are handling large binary files, you should be marking them as such via attributes (or at least not using autocrlf, and instead marking your text files as such). And the flip side is that if you have a large _non_-binary file, there is a correctness improvement; before we did not apply the conversion at all. The first half of the new t1051 script covers these failures on input. The second half tests the matching output code paths. These already work correctly, and do not need any adjustment. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13Merge branch 'nd/find-pack-entry-recent-cache-invalidation'Junio C Hamano
* nd/find-pack-entry-recent-cache-invalidation: find_pack_entry(): do not keep packed_git pointer locally sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()
2012-02-13Merge branch 'mm/empty-loose-error-message'Junio C Hamano
* mm/empty-loose-error-message: fsck: give accurate error message on empty loose object files
2012-02-06fsck: give accurate error message on empty loose object filesMatthieu Moy
Since 3ba7a065527a (A loose object is not corrupt if it cannot be read due to EMFILE), "git fsck" on a repository with an empty loose object file complains with the error message fatal: failed to read object <sha1>: Invalid argument This comes from a failure of mmap on this empty file, which sets errno to EINVAL. Instead of calling xmmap on empty file, we display a clean error message ourselves, and return a NULL pointer. The new message is error: object file .git/objects/09/<rest-of-sha1> is empty fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt The second line was already there before the regression in 3ba7a065527a, and the first is an additional message, that should help diagnosing the problem for the user. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-01find_pack_entry(): do not keep packed_git pointer locallyNguyễn Thái Ngọc Duy
Commit f7c22cc (always start looking up objects in the last used pack first - 2007-05-30) introduce a static packed_git* pointer as an optimization. The kept pointer however may become invalid if free_pack_by_name() happens to free that particular pack. Current code base does not access packs after calling free_pack_by_name() so it should not be a problem. Anyway, move the pointer out so that free_pack_by_name() can reset it to avoid running into troubles in future. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-01sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()Nguyễn Thái Ngọc Duy
The new helper function implements the logic to find the offset for the object in one pack and fill a pack_entry structure. The next patch will restructure the loop and will call the helper from two places. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21Appease Sun Studio by renaming "tmpfile"Ævar Arnfjörð Bjarmason
On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-17Merge branch 'jc/stream-to-pack'Junio C Hamano
* jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h
2011-12-14Merge branch 'nd/misc-cleanups' into maintJunio C Hamano
* nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()
2011-12-05Merge branch 'nd/misc-cleanups'Junio C Hamano
* nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()
2011-12-01bulk-checkin: replace fast-import based implementationJunio C Hamano
This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-16sha1_file: don't mix enum with intRamkumar Ramachandra
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27unpack_object_header_buffer(): clear the size field upon errorJunio C Hamano
The callers do not use the returned size when the function says it did not use any bytes and sets the type to OBJ_BAD, so this should not matter in practice, but it is a good code hygiene anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-21Merge branch 'jk/maint-pack-objects-compete-with-delete'Junio C Hamano
* jk/maint-pack-objects-compete-with-delete: downgrade "packfile cannot be accessed" errors to warnings pack-objects: protect against disappearing packs
2011-10-14downgrade "packfile cannot be accessed" errors to warningsJeff King
These can happen if another process simultaneously prunes a pack. But that is not usually an error condition, because a properly-running prune should have repacked the object into a new pack. So we will notice that the pack has disappeared unexpectedly, print a message, try other packs (possibly after re-scanning the list of packs), and find it in the new pack. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-14pack-objects: protect against disappearing packsJeff King
It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since c715f78, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. We look up each object as we find out about it, and the only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-05Merge branch 'wh/normalize-alt-odb-path'Junio C Hamano
* wh/normalize-alt-odb-path: sha1_file: normalize alt_odb path before comparing and storing
2011-09-07sha1_file: normalize alt_odb path before comparing and storingHui Wang
When it needs to compare and add an alt object path to the alt_odb_list, we normalize this path first since comparing normalized path is easy to get correct result. Use strbuf to replace some string operations, since it is cleaner and safer. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Hui Wang <Hui.Wang@windriver.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-29Merge branch 'jc/maint-clone-alternates'Junio C Hamano
* jc/maint-clone-alternates: clone: clone from a repository with relative alternates clone: allow more than one --reference Conflicts: builtin/clone.c
2011-08-23Merge branch 'rt/zlib-smaller-window'Junio C Hamano
* rt/zlib-smaller-window: test: consolidate definition of $LF Tolerate zlib deflation with window size < 32Kb
2011-08-23clone: clone from a repository with relative alternatesJunio C Hamano
Cloning from a local repository blindly copies or hardlinks all the files under objects/ hierarchy. This results in two issues: - If the repository cloned has an "objects/info/alternates" file, and the command line of clone specifies --reference, the ones specified on the command line get overwritten by the copy from the original repository. - An entry in a "objects/info/alternates" file can specify the object stores it borrows objects from as a path relative to the "objects/" directory. When cloning a repository with such an alternates file, if the new repository is not sitting next to the original repository, such relative paths needs to be adjusted so that they can be used in the new repository. This updates add_to_alternates_file() to take the path to the alternate object store, including the "/objects" part at the end (earlier, it was taking the path to $GIT_DIR and was adding "/objects" itself), as it is technically possible to specify in objects/info/alternates file the path of a directory whose name does not end with "/objects". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11Tolerate zlib deflation with window size < 32KbRoberto Tyley
Git currently reports loose objects as 'corrupt' if they've been deflated using a window size less than 32Kb, because the experimental_loose_object() function doesn't recognise the header byte as a zlib header. This patch makes the function tolerant of all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice it's accuracy in distingushing the standard loose-object format from the experimental (now abandoned) format. On memory constrained systems zlib may use a much smaller window size - working on Agit, I found that Android uses a 4KB window; giving a header byte of 0x48, not 0x78. Consequently all loose objects generated appear 'corrupt', which is why Agit is a read-only Git client at this time - I don't want my client to generate Git repos that other clients treat as broken :( This patch makes Git tolerant of different deflate settings - it might appear that it changes experimental_loose_object() to the point where it could incorrectly identify the experimental format as the standard one, but the two criteria (bitmask & checksum) can only give a false result for an experimental object where both of the following are true: 1) object size is exactly 8 bytes when uncompressed (bitmask) 2) [single-byte in-pack git type&size header] * 256 + [1st byte of the following zlib header] % 31 = 0 (checksum) As it happens, for all possible combinations of valid object type (1-4) and window bits (0-7), the only time when the checksum will be divisible by 31 is for 0x1838 - ie object type *1*, a Commit - which, due the fields all Commit objects must contain, could never be as small as 8 bytes in size. Given this, the combination of the two criteria (bitmask & checksum) always correctly determines the buffer format, and is more tolerant than the previous version. The alternative to this patch is simply removing support for the experimental format, which I am also totally cool with. References: Android uses a 4KB window for deflation: http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53 Code snippet searching for false positives with the zlib checksum: https://gist.github.com/1118177 Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-05Merge branch 'jc/pack-order-tweak'Junio C Hamano
* jc/pack-order-tweak: pack-objects: optimize "recency order" core: log offset pack data accesses happened
2011-08-01Merge branch 'jc/legacy-loose-object' into maintJunio C Hamano
* jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format
2011-07-19Merge branch 'jc/index-pack'Junio C Hamano
* jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c
2011-07-19Merge branch 'jc/zlib-wrap'Junio C Hamano
* jc/zlib-wrap: zlib: allow feeding more than 4GB in one go zlib: zlib can only process 4GB at a time zlib: wrap deflateBound() too zlib: wrap deflate side of the API zlib: wrap inflateInit2 used to accept only for gzip format zlib: wrap remaining calls to direct inflate/inflateEnd zlib wrapper: refactor error message formatter Conflicts: sha1_file.c
2011-07-13Merge branch 'jc/legacy-loose-object'Junio C Hamano
* jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format
2011-07-07core: log offset pack data accesses happenedJunio C Hamano
In a workload other than "git log" (without pathspec nor any option that causes us to inspect trees and blobs), the recency pack order is said to cause the access jump around quite a bit. Add a hook to allow us observe how bad it is. "git config core.logpackaccess /var/tmp/pal.txt" will give you the log in the specified file. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: zlib can only process 4GB at a timeJunio C Hamano
The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: wrap deflate side of the APIJunio C Hamano
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-08sha1_file.c: "legacy" is really the current formatJunio C Hamano
Every time I look at the read-loose-object codepath, legacy_loose_object() function makes my brain go through mental contortion. When we were playing with the experimental loose object format, it may have made sense to call the traditional format "legacy", in the hope that the experimental one will some day replace it to become official, but it never happened. This renames the function (and negates its return value) to detect if we are looking at the experimental format, and move the code around in its caller which used to do "if we are looing at legacy, do this special case, otherwise the normal case is this". The codepath to read from the loose objects in experimental format is the "unlikely" case. Someday after Git 2.0, we should drop the support of this format. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-06verify-pack: use index-pack --verifyJunio C Hamano
This finally gets rid of the inefficient verify-pack implementation that walks objects in the packfile in their object name order and replaces it with a call to index-pack --verify. As a side effect, it also removes packed_object_info_detail() API which is rather expensive. As this changes the way errors are reported (verify-pack used to rely on the usual runtime error detection routine unpack_entry() to diagnose the CRC errors in an entry in the *.idx file; index-pack --verify checks the whole *.idx file in one go), update a test that expected the string "CRC" to appear in the error message. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26sha1_file: use the correct type (ssize_t, not size_t) for read-style functionJim Meyering
Using an unsigned type, we would fail to detect a read error and then proceed to try to write (size_t)-1 bytes. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-25Merge branch 'jc/bigfile'Junio C Hamano
* jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag
2011-05-21sha1_file.c: expose helpers to read loose objectsJunio C Hamano
Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header() available to the streaming read API by exporting them via cache.h header file. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-21unpack_object_header(): make it publicJunio C Hamano
This function is used to read and skip over the per-object header in a packfile. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-21sha1_object_info_extended(): hint about objects in delta-base cacheJunio C Hamano
An object found in the delta-base cache is not guaranteed to stay there, but we know it came from a pack and it is likely to give us a quick access if we read_sha1_file() it right now, which is a piece of useful information. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20Merge branch 'jc/replacing'Junio C Hamano
* jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h Conflicts: environment.c
2011-05-19sha1_object_info_extended(): expose a bit more infoJunio C Hamano
The original interface for sha1_object_info() takes an object name and gives back a type and its size (the latter is given only when it was asked). The new interface wraps its implementation and exposes a bit more pieces of information that the interface used to discard, namely: - where the object is stored (loose? cached? packed?) - if packed, where in which packfile? Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * In the earlier round, this used u.pack.delta to record the length of the delta chain, but the caller is not necessarily interested in the length of the delta chain per-se, but may only want to know if it is a delta against another object or is stored as a deflated data. Calling packed_object_info_detail() involves walking the reverse index chain to compute the store size of the object and is unnecessarily expensive. We could resurrect the code if a new caller wants to know, but I doubt it.
2011-05-17packed_object_info_detail(): do not return a stringJunio C Hamano
Instead return an integer that can be given to typename() if the caller wants a string, just like everybody else does. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streamingJunio C Hamano
* jc/convert: convert: make it harder to screw up adding a conversion attribute convert: make it safer to add conversion attributes convert: give saner names to crlf/eol variables, types and functions convert: rename the "eol" global variable to "core_eol" * jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag * jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h
2011-05-15git_open_noatime(): drop unused parameterJunio C Hamano
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28), the extra parameter added in f2e872aa (Work around EMFILE when there are too many pack files, 2010-11-01) is not used anymore. Remove it. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-15sha1_file: typofixJunio C Hamano
The number zero is spelled "zero", not "zer0". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15read_sha1_file(): allow selective bypassing of replacement mechanismJunio C Hamano
The way "object replacement" mechanism was tucked to the read_sha1_file() interface was suboptimal in a couple of ways: - Callers that want it to die with useful diagnosis upon seeing a corrupt object does not have a way to say that they do not want any object replacement. - Callers who do not want it to die but want to handle the errors themselves are told to arrange to call read_object(), but the function does not use the replacement mechanism, and also it is a file scope static function that not many callers can call to begin with. This adds a read_sha1_file_extended() that takes a set of flags; the callers of read_sha1_file() passes a flag READ_SHA1_FILE_REPLACE to ask for object replacement mechanism to kick in. Later, we could add another flag bit to tell the function to return an error instead of dying and then remove the misguided "call read_object() yourself". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15read_sha1_file(): get rid of read_sha1_file_repl() madnessJunio C Hamano
Most callers want to silently get a replacement object, and they do not care what the real name of the replacement object is. Worse yet, no sane interface to return the underlying object without replacement is provided. Remove the function and make only the few callers that want the name of the replacement object find it themselves. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13Bigfile: teach "git add" to send a large file straight to a packJunio C Hamano
When adding a new content to the repository, we have always slurped the blob in its entirety in-core first, and computed the object name and compressed it into a loose object file. Handling large binary files (e.g. video and audio asset for games) has been problematic because of this design. At the middle level of "git add" callchain is an internal API index_fd() that takes an open file descriptor to read from the working tree file being added with its size. Teach it to call out to fast-import when adding a large blob. The write-out codepath in entry.c::write_entry() should be taught to stream, instead of reading everything in core. This should not be so hard to implement, especially if we limit ourselves only to loose object files and non-delta representation in packfiles. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09index_fd(): split into two helper functionsJunio C Hamano
Split out the case where we do not know the size of the input (hence we read everything into a strbuf before doing anything) to index_pipe(), and the other case where we mmap or read the whole data to index_bulk(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09index_fd(): turn write_object and format_check arguments into one flagJunio C Hamano
The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-13remove doubled words, e.g., s/to to/to/, and fix related typosJim Meyering
I found that some doubled words had snuck back into projects from which I'd already removed them, so now there's a "syntax-check" makefile rule in gnulib to help prevent recurrence. Running the command below spotted a few in git, too: git ls-files | xargs perl -0777 -n \ -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \ -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \ -e 'print "$ARGV:$n:$v\n"}' Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>