summaryrefslogtreecommitdiff
path: root/midx.c
AgeCommit message (Collapse)Author
2019-01-04Merge branch 'jk/loose-object-cache'Junio C Hamano
Code clean-up with optimization for the codepath that checks (non-)existence of loose objects. * jk/loose-object-cache: odb_load_loose_cache: fix strbuf leak fetch-pack: drop custom loose object cache sha1-file: use loose object cache for quick existence check object-store: provide helpers for loose_objects_cache sha1-file: use an object_directory for the main object dir handle alternates paths the same as the main object dir sha1_file_name(): overwrite buffer instead of appending rename "alternate_object_database" to "object_directory" submodule--helper: prefer strip_suffix() to ends_with() fsck: do not reuse child_process structs
2018-11-29i18n: fix small typosJean-Noël Avila
Translating the new strings introduced for v2.20 showed some typos. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-18Merge branch 'jk/unused-parameter-fixes'Junio C Hamano
Various functions have been audited for "-Wunused-parameter" warnings and bugs in them got fixed. * jk/unused-parameter-fixes: midx: double-check large object write loop assert NOARG/NONEG behavior of parse-options callbacks parse-options: drop OPT_DATE() apply: return -1 from option callback instead of calling exit(1) cat-file: report an error on multiple --batch options tag: mark "--message" option with NONEG show-branch: mark --reflog option as NONEG format-patch: mark "--no-numbered" option with NONEG status: mark --find-renames option with NONEG cat-file: mark batch options with NONEG pack-objects: mark index-version option as NONEG ls-files: mark exclude options as NONEG am: handle --no-patch-format option apply: mark include/exclude options as NONEG
2018-11-13Merge branch 'ds/test-multi-pack-index'Junio C Hamano
Tests for the recently introduced multi-pack index machinery. * ds/test-multi-pack-index: packfile: close multi-pack-index in close_all_packs multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX midx: close multi-pack-index on repack midx: fix broken free() in close_midx()
2018-11-06midx: double-check large object write loopJeff King
The write_midx_large_offsets() function takes an array of object entries, the number of entries in the array (nr_objects), and the number of entries with large offsets (nr_large_offset). But we never actually use nr_objects; instead we keep walking down the array and counting down nr_large_offset until we've seen all of the large entries. This is correct, but we can be a bit more defensive. If there were ever a mismatch between nr_large_offset and the actual set of large-offset objects, we'd walk off the end of the array. Since we know the size of the array, we can use nr_objects to make sure we don't walk too far. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-22multi-pack-index: define GIT_TEST_MULTI_PACK_INDEXDerrick Stolee
The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make sense to include it by default in those tests. Instead, add a new GIT_TEST_MULTI_PACK_INDEX environment variable that enables core.multiPackIndex and writes a multi-pack-index after each 'git repack' command. This adds extra test coverage when needed. There are a few spots in the test suite that need to react to this change: * t5319-multi-pack-index.sh: there is a test that checks that 'git repack' deletes the multi-pack-index. Disable the environment variable to ensure this still happens. * t5310-pack-bitmaps.sh: One test moves a pack-file from the object directory to an alternate. This breaks the multi-pack-index, so delete the multi-pack-index at this point, if it exists. * t9300-fast-import.sh: One test verifies the number of files in the .git/objects/pack directory is exactly 8. Exclude the multi-pack-index from this count so it is still 8 in all cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-22midx: close multi-pack-index on repackDerrick Stolee
When repacking, we may remove pack-files. This invalidates the multi-pack-index (if it exists). Previously, we removed the multi-pack-index file before removing any pack-file. In some cases, the repack command may load the multi-pack-index into memory. This may lead to later in-memory references to the non-existent pack- files. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19multi-pack-index: avoid dead store for struct progressCarlo Marcelo Arenas Belón
it is initialized unconditionally by a call to start_progress below. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-16Merge branch 'jk/oideq-hasheq-cleanup'Junio C Hamano
Code clean-up. * jk/oideq-hasheq-cleanup: more oideq/hasheq conversions
2018-10-09midx: fix broken free() in close_midx()Derrick Stolee
When closing a multi-pack-index, we intend to close each pack-file and free the struct packed_git that represents it. However, this line was previously freeing the array of pointers, not the pointer itself. This leads to a double-free issue. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04more oideq/hasheq conversionsJeff King
We added faster equality-comparison functions for hashes in 14438c4497 (introduce hasheq() and oideq(), 2018-08-28). A few topics were in-flight at the time, and can now be converted. This covers all spots found by "make coccicheck" in master (the coccicheck results were tweaked by hand for style). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: report progress during 'verify'Derrick Stolee
When verifying a multi-pack-index, the only action that takes significant time is checking the object offsets. For example, to verify a multi-pack-index containing 6.2 million objects in the Linux kernel repository takes 1.3 seconds on my machine. 99% of that time is spent looking up object offsets in each of the packfiles and comparing them to the multi-pack-index offset. Add a progress indicator for that section of the 'verify' verb. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: verify object offsetsDerrick Stolee
The 'git multi-pack-index verify' command must verify the object offsets stored in the multi-pack-index are correct. There are two ways the offset chunk can be incorrect: the pack-int-id and the object offset. Replace the BUG() statement with a die() statement, now that we may hit a bad pack-int-id during a 'verify' command on a corrupt multi-pack-index, and it is covered by a test. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: fix 32-bit vs 64-bit size checkDerrick Stolee
When loading a 64-bit offset, we intend to check that off_t can store the resulting offset. However, the condition accidentally checks the 32-bit offset to see if it is smaller than a 64-bit value. Fix it, and this will be covered by a test in the 'git multi-pack-index verify' command in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: verify oid lookup orderDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: verify oid fanout orderDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: verify missing packDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: verify packname orderDerrick Stolee
The final check we make while loading a multi-pack-index is that the packfile names are in lexicographical order. Make this error be a die() instead. In order to test this condition, we need multiple packfiles. Earlier in t5319-multi-pack-index.sh, we tested the interaction with 'git repack' but this limits us to one packfile in our object dir. Move these repack tests until after the 'verify' tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: verify corrupt chunk lookup tableDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: verify bad headerDerrick Stolee
When verifying if a multi-pack-index file is valid, we want the command to fail to signal an invalid file. Previously, we wrote an error to stderr and continued as if we had no multi-pack-index. Now, die() instead of error(). Add tests that check corrupted headers in a few ways: * Bad signature * Bad file version * Bad hash version * Truncated hash count * Extended hash count Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17multi-pack-index: add 'verify' verbDerrick Stolee
The multi-pack-index builtin writes multi-pack-index files, and uses a 'write' verb to do so. Add a 'verify' verb that checks this file matches the contents of the pack-indexes it replaces. The current implementation is a no-op, but will be extended in small increments in later commits. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20packfile: add all_packs listDerrick Stolee
If a repo contains a multi-pack-index, then the packed_git list does not contain the packfiles that are covered by the multi-pack-index. This is important for doing object lookups, abbreviations, and approximating object count. However, there are many operations that really want to iterate over all packfiles. Create a new 'all_packs' linked list that contains this list, starting with the packfiles in the multi-pack-index and then continuing along the packed_git linked list. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20midx: fix bug that skips midx with alternatesDerrick Stolee
The logic for constructing the linked list of multi-pack-indexes in the object store is incorrect. If the local object store has a multi-pack-index, but an alternate does not, then the list is dropped. Add tests that would have revealed this bug. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20midx: mark bad packed objectsDerrick Stolee
When an object fails to decompress from a pack-file, we mark the object as 'bad' so we can retry with a different copy of the object (if such a copy exists). Before now, the multi-pack-index did not update the bad objects list for the pack-files it contains, and we did not check the bad objects list when reading an object. Now, do both. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20multi-pack-index: store local propertyDerrick Stolee
A pack-file is 'local' if it is stored within the usual object directory. If it is stored in an alternate, it is non-local. Pack-files are stored using a 'pack_local' member in the packed_git struct. Add a similar 'local' member to the multi_pack_index struct and 'local' parameters to the methods that load and prepare multi- pack-indexes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: clear midx on repackDerrick Stolee
If a 'git repack' command replaces existing packfiles, then we must clear the existing multi-pack-index before moving the packfiles it references. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: use existing midx when writing new oneDerrick Stolee
Due to how Windows handles replacing a lockfile when there is an open handle, create the close_midx() method to close the existing midx before writing the new one. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: use midx in abbreviation calculationsDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: read objects from multi-pack-indexDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20config: create core.multiPackIndex settingDerrick Stolee
The core.multiPackIndex config setting controls the multi-pack- index (MIDX) feature. If false, the setting will disable all reads from the multi-pack-index file. Read this config setting in the new prepare_multi_pack_index_one() which is called during prepare_packed_git(). This check is run once per repository. Add comparison commands in t5319-multi-pack-index.sh to check typical Git behavior remains the same as the config setting is turned on and off. This currently includes 'git rev-list' and 'git log' commands to trigger several object database reads. Currently, these would only catch an error in the prepare_multi_pack_index_one(), but with later commits will catch errors in object lookups, abbreviations, and approximate object counts. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object offsetsDerrick Stolee
The final pair of chunks for the multi-pack-index file stores the object offsets. We default to using 32-bit offsets as in the pack-index version 1 format, but if there exists an offset larger than 32-bits, we use a trick similar to the pack-index version 2 format by storing all offsets at least 2^31 in a 64-bit table; we use the 32-bit table to point into that 64-bit table as necessary. We only store these 64-bit offsets if necessary, so create a test that manipulates a version 2 pack-index to fake a large offset. This allows us to test that the large offset table is created, but the data does not match the actual packfile offsets. The multi-pack-index offset does match the (corrupted) pack-index offset, so a future feature will compare these offsets during a 'verify' step. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object id fanout chunkDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object ids in a chunkDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: sort and deduplicate objects from packfilesDerrick Stolee
Before writing a list of objects and their offsets to a multi-pack-index, we need to collect the list of objects contained in the packfiles. There may be multiple copies of some objects, so this list must be deduplicated. It is possible to artificially get into a state where there are many duplicate copies of objects. That can create high memory pressure if we are to create a list of all objects before de-duplication. To reduce this memory pressure without a significant performance drop, automatically group objects by the first byte of their object id. Use the IDX fanout tables to group the data, copy to a local array, then sort. Copy only the de-duplicated entries. Select the duplicate based on the most-recent modified time of a packfile containing the object. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: read pack names into arrayDerrick Stolee
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: write pack names in chunkDerrick Stolee
The multi-pack-index needs to track which packfiles it indexes. Store these in our first required chunk. Since filenames are not well structured, add padding to keep good alignment in later chunks. Modify the 'git multi-pack-index read' subcommand to output the existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh to reflect this new output and the new expected number of chunks. Defense in depth: A pattern we are using in the multi-pack-index feature is to verify the data as we write it. We want to ensure we never write invalid data to the multi-pack-index. There are many checks that verify that the values we are writing fit the format definitions. This mainly helps developers while working on the feature, but it can also identify issues that only appear when dealing with very large data sets. These large sets are hard to encode into test cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: read packfile listDerrick Stolee
When constructing a multi-pack-index file for a given object directory, read the files within the enclosed pack directory and find matches that end with ".idx" and find the correct paired packfile using add_packed_git(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: load into memoryDerrick Stolee
Create a new multi_pack_index struct for loading multi-pack-indexes into memory. Create a test-tool builtin for reading basic information about that multi-pack-index to verify the correct data is written. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write header information to lockfileDerrick Stolee
As we begin writing the multi-pack-index format to disk, start with the basics: the 12-byte header and the 20-byte checksum footer. Start with these basics so we can add the rest of the format in small increments. As we implement the format, we will use a technique to check that our computed offsets within the multi-pack-index file match what we are actually writing. Each method that writes to the hashfile will return the number of bytes written, and we will track that those values match our expectations. Currently, write_midx_header() returns 12, but is not checked. We will check the return value in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: add 'write' verbDerrick Stolee
In anticipation of writing multi-pack-indexes, add a skeleton 'git multi-pack-index write' subcommand and send the options to a write_midx_file() method. Also create a skeleton test script that tests the 'write' subcommand. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>