From f33fb6e419ab9ca27403c8207e53f27411028420 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:23:31 -0500 Subject: pack-revindex: introduce a new API In the next several patches, we will prepare for loading a reverse index either in memory (mapping the inverse of the .idx's contents in-core), or directly from a yet-to-be-introduced on-disk format. To prepare for that, we'll introduce an API that avoids the caller explicitly indexing the revindex pointer in the packed_git structure. There are four ways to interact with the reverse index. Accordingly, four functions will be exported from 'pack-revindex.h' by the time that the existing API is removed. A caller may: 1. Load the pack's reverse index. This involves opening up the index, generating an array, and then sorting it. Since opening the index can fail, this function ('load_pack_revindex()') returns an int. Accordingly, it takes only a single argument: the 'struct packed_git' the caller wants to build a reverse index for. This function is well-suited for both the current and new API. Callers will have to continue to open the reverse index explicitly, but this function will eventually learn how to detect and load a reverse index from the on-disk format, if one exists. Otherwise, it will fallback to generating one in memory from scratch. 2. Convert a pack position into an offset. This operation is now called `pack_pos_to_offset()`. It takes a pack and a position, and returns the corresponding off_t. Any error simply calls BUG(), since the callers are not well-suited to handle a failure and keep going. 3. Convert a pack position into an index position. Same as above; this takes a pack and a position, and returns a uint32_t. This operation is known as `pack_pos_to_index()`. The same thinking about error conditions applies here as well. 4. Find the pack position for a given offset. This operation is now known as `offset_to_pack_pos()`. It takes a pack, an offset, and a pointer to a uint32_t where the position is written, if an object exists at that offset. Otherwise, -1 is returned to indicate failure. Unlike some of the callers that used to access '->offset' and '->nr' directly, the error checking around this call is somewhat more robust. This is important since callers should always pass an offset which points at the boundary of two objects. The API, unlike direct access, enforces that that is the case. This will become important in a subsequent patch where a caller which does not but could check the return value treats the signed `-1` from `find_revindex_position()` as an index into the 'revindex' array. Two design warts are carried over into the new API: - Asking for the index position of an out-of-bounds object will result in a BUG() (since no such object exists), but asking for the offset of the non-existent object at the end of the pack returns the total size of the pack. This makes it convenient for callers who always want to take the difference of two adjacent object's offsets (to compute the on-disk size) but don't want to worry about boundaries at the end of the pack. - offset_to_pack_pos() lazily loads the reverse index, but pack_pos_to_index() doesn't (callers of the former are well-suited to handle errors, but callers of the latter are not). Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-revindex.c b/pack-revindex.c index ecdde39..0ca3b54 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -203,3 +203,35 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) return p->revindex + pos; } + +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) +{ + int ret; + + if (load_pack_revindex(p) < 0) + return -1; + + ret = find_revindex_position(p, ofs); + if (ret < 0) + return ret; + *pos = ret; + return 0; +} + +uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos) +{ + if (!p->revindex) + BUG("pack_pos_to_index: reverse index not yet loaded"); + if (p->num_objects <= pos) + BUG("pack_pos_to_index: out-of-bounds object at %"PRIu32, pos); + return p->revindex[pos].nr; +} + +off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos) +{ + if (!p->revindex) + BUG("pack_pos_to_index: reverse index not yet loaded"); + if (p->num_objects < pos) + BUG("pack_pos_to_offset: out-of-bounds object at %"PRIu32, pos); + return p->revindex[pos].offset; +} diff --git a/pack-revindex.h b/pack-revindex.h index 848331d..5a218aa 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -1,6 +1,21 @@ #ifndef PACK_REVINDEX_H #define PACK_REVINDEX_H +/** + * A revindex allows converting efficiently between three properties + * of an object within a pack: + * + * - index position: the numeric position within the list of sorted object ids + * found in the .idx file + * + * - pack position: the numeric position within the list of objects in their + * order within the actual .pack file (i.e., 0 is the first object in the + * .pack, 1 is the second, and so on) + * + * - offset: the byte offset within the .pack file at which the object contents + * can be found + */ + struct packed_git; struct revindex_entry { @@ -8,9 +23,48 @@ struct revindex_entry { unsigned int nr; }; +/* + * load_pack_revindex populates the revindex's internal data-structures for the + * given pack, returning zero on success and a negative value otherwise. + */ int load_pack_revindex(struct packed_git *p); int find_revindex_position(struct packed_git *p, off_t ofs); struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs); +/* + * offset_to_pack_pos converts an object offset to a pack position. This + * function returns zero on success, and a negative number otherwise. The + * parameter 'pos' is usable only on success. + * + * If the reverse index has not yet been loaded, this function loads it lazily, + * and returns an negative number if an error was encountered. + * + * This function runs in time O(log N) with the number of objects in the pack. + */ +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos); + +/* + * pack_pos_to_index converts the given pack-relative position 'pos' by + * returning an index-relative position. + * + * If the reverse index has not yet been loaded, or the position is out of + * bounds, this function aborts. + * + * This function runs in constant time. + */ +uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos); + +/* + * pack_pos_to_offset converts the given pack-relative position 'pos' into a + * pack offset. For a pack with 'N' objects, asking for position 'N' will return + * the total size (in bytes) of the pack. + * + * If the reverse index has not yet been loaded, or the position is out of + * bounds, this function aborts. + * + * This function runs in constant time. + */ +off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos); + #endif -- cgit v0.10.2-6-g49f6 From 952fc6870d9594b364cf755e8419843b3049d98c Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:23:35 -0500 Subject: write_reuse_object(): convert to new revindex API First replace 'find_pack_revindex()' with its replacement 'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make its way through until 'write_reuse_object()' from causing a bad memory read (if 'revidx' is 'NULL') Next, replace a direct access of '->nr' with the wrapper function 'pack_pos_to_index()'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 2a00358..ab1fd85 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -419,7 +419,7 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry, { struct packed_git *p = IN_PACK(entry); struct pack_window *w_curs = NULL; - struct revindex_entry *revidx; + uint32_t pos; off_t offset; enum object_type type = oe_type(entry); off_t datalen; @@ -436,10 +436,15 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry, type, entry_size); offset = entry->in_pack_offset; - revidx = find_pack_revindex(p, offset); - datalen = revidx[1].offset - offset; + if (offset_to_pack_pos(p, offset, &pos) < 0) + die(_("write_reuse_object: could not locate %s, expected at " + "offset %"PRIuMAX" in pack %s"), + oid_to_hex(&entry->idx.oid), (uintmax_t)offset, + p->pack_name); + datalen = pack_pos_to_offset(p, pos + 1) - offset; if (!pack_to_stdout && p->index_version > 1 && - check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) { + check_pack_crc(p, &w_curs, offset, datalen, + pack_pos_to_index(p, pos))) { error(_("bad packed object CRC for %s"), oid_to_hex(&entry->idx.oid)); unuse_pack(&w_curs); -- cgit v0.10.2-6-g49f6 From 66cbd3e2fbb552aa7dc51a805fbbb9174abc5ccb Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:23:39 -0500 Subject: write_reused_pack_one(): convert to new revindex API Replace direct revindex accesses with calls to 'pack_pos_to_offset()' and 'pack_pos_to_index()'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index ab1fd85..8e40b19 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -868,8 +868,8 @@ static void write_reused_pack_one(size_t pos, struct hashfile *out, enum object_type type; unsigned long size; - offset = reuse_packfile->revindex[pos].offset; - next = reuse_packfile->revindex[pos + 1].offset; + offset = pack_pos_to_offset(reuse_packfile, pos); + next = pack_pos_to_offset(reuse_packfile, pos + 1); record_reused_object(offset, offset - hashfile_total(out)); @@ -889,11 +889,17 @@ static void write_reused_pack_one(size_t pos, struct hashfile *out, /* Convert to REF_DELTA if we must... */ if (!allow_ofs_delta) { - int base_pos = find_revindex_position(reuse_packfile, base_offset); + uint32_t base_pos; struct object_id base_oid; + if (offset_to_pack_pos(reuse_packfile, base_offset, &base_pos) < 0) + die(_("expected object at offset %"PRIuMAX" " + "in pack %s"), + (uintmax_t)base_offset, + reuse_packfile->pack_name); + nth_packed_object_id(&base_oid, reuse_packfile, - reuse_packfile->revindex[base_pos].nr); + pack_pos_to_index(reuse_packfile, base_pos)); len = encode_in_pack_object_header(header, sizeof(header), OBJ_REF_DELTA, size); -- cgit v0.10.2-6-g49f6 From 6a5c10c45f110abfef89c978cd8d5f7120f63d48 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:23:43 -0500 Subject: write_reused_pack_verbatim(): convert to new revindex API Replace a direct access to the revindex array with 'pack_pos_to_offset()'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 8e40b19..77ce558 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -952,7 +952,7 @@ static size_t write_reused_pack_verbatim(struct hashfile *out, off_t to_write; written = (pos * BITS_IN_EWORD); - to_write = reuse_packfile->revindex[written].offset + to_write = pack_pos_to_offset(reuse_packfile, written) - sizeof(struct pack_header); /* We're recording one chunk, not one object. */ -- cgit v0.10.2-6-g49f6 From eb3fd99efd82bc8d199e01634e6fb521d3a27641 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:23:47 -0500 Subject: check_object(): convert to new revindex API Replace direct accesses to the revindex with calls to 'offset_to_pack_pos()' and 'pack_pos_to_index()'. Since this caller already had some error checking (it can jump to the 'give_up' label if it encounters an error), we can easily check whether or not the provided offset points to an object in the given pack. This error checking existed prior to this patch, too, since the caller checks whether the return value from 'find_pack_revindex()' was NULL or not. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 77ce558..5b0c448 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1817,11 +1817,11 @@ static void check_object(struct object_entry *entry, uint32_t object_index) goto give_up; } if (reuse_delta && !entry->preferred_base) { - struct revindex_entry *revidx; - revidx = find_pack_revindex(p, ofs); - if (!revidx) + uint32_t pos; + if (offset_to_pack_pos(p, ofs, &pos) < 0) goto give_up; - if (!nth_packed_object_id(&base_ref, p, revidx->nr)) + if (!nth_packed_object_id(&base_ref, p, + pack_pos_to_index(p, pos))) have_base = 1; } entry->in_pack_header_size = used + used_0; -- cgit v0.10.2-6-g49f6 From 57665086af39a6a9e1a911fd4871ff5eadc568c9 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:23:52 -0500 Subject: bitmap_position_packfile(): convert to new revindex API Replace find_revindex_position() with its counterpart in the new API, offset_to_pack_pos(). Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-bitmap.c b/pack-bitmap.c index d88745f..d6861dd 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -407,11 +407,14 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git, static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git, const struct object_id *oid) { + uint32_t pos; off_t offset = find_pack_entry_one(oid->hash, bitmap_git->pack); if (!offset) return -1; - return find_revindex_position(bitmap_git->pack, offset); + if (offset_to_pack_pos(bitmap_git->pack, offset, &pos) < 0) + return -1; + return pos; } static int bitmap_position(struct bitmap_index *bitmap_git, -- cgit v0.10.2-6-g49f6 From cf98f2e8e09416b72bf984af4706c140de8794ab Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:23:56 -0500 Subject: show_objects_for_type(): convert to new revindex API Avoid storing the revindex entry directly, since this structure will soon be removed from the public interface. Instead, store the offset and index position by calling 'pack_pos_to_offset()' and 'pack_pos_to_index()', respectively. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-bitmap.c b/pack-bitmap.c index d6861dd..27a7a8a 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -711,21 +711,22 @@ static void show_objects_for_type( for (offset = 0; offset < BITS_IN_EWORD; ++offset) { struct object_id oid; - struct revindex_entry *entry; - uint32_t hash = 0; + uint32_t hash = 0, index_pos; + off_t ofs; if ((word >> offset) == 0) break; offset += ewah_bit_ctz64(word >> offset); - entry = &bitmap_git->pack->revindex[pos + offset]; - nth_packed_object_id(&oid, bitmap_git->pack, entry->nr); + index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset); + ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset); + nth_packed_object_id(&oid, bitmap_git->pack, index_pos); if (bitmap_git->hashes) - hash = get_be32(bitmap_git->hashes + entry->nr); + hash = get_be32(bitmap_git->hashes + index_pos); - show_reach(&oid, object_type, 0, hash, bitmap_git->pack, entry->offset); + show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs); } } } -- cgit v0.10.2-6-g49f6 From a78a90324d1c5ccba2ab46cb70573518dd6eeade Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:00 -0500 Subject: get_size_by_pos(): convert to new revindex API Remove another caller that holds onto a 'struct revindex_entry' by replacing the direct indexing with calls to 'pack_pos_to_offset()' and 'pack_pos_to_index()'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-bitmap.c b/pack-bitmap.c index 27a7a8a..89a528a 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -835,11 +835,11 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git, oi.sizep = &size; if (pos < pack->num_objects) { - struct revindex_entry *entry = &pack->revindex[pos]; - if (packed_object_info(the_repository, pack, - entry->offset, &oi) < 0) { + off_t ofs = pack_pos_to_offset(pack, pos); + if (packed_object_info(the_repository, pack, ofs, &oi) < 0) { struct object_id oid; - nth_packed_object_id(&oid, pack, entry->nr); + nth_packed_object_id(&oid, pack, + pack_pos_to_index(pack, pos)); die(_("unable to get size of %s"), oid_to_hex(&oid)); } } else { -- cgit v0.10.2-6-g49f6 From 011f3fd5cdacc69e2ba2adf998656adcd2045a0f Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:05 -0500 Subject: try_partial_reuse(): convert to new revindex API Remove another instance of direct revindex manipulation by calling 'pack_pos_to_offset()' instead (the caller here does not care about the index position of the object at position 'pos'). Note that we cannot just use the existing "offset" variable to store the value we get from pack_pos_to_offset(). It is incremented by unpack_object_header(), but we later need the original value. Since we'll no longer have revindex->offset to read it from, we'll store that in a separate variable ("header" since it points to the entry's header bytes). Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-bitmap.c b/pack-bitmap.c index 89a528a..1fdf7ce 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1069,23 +1069,21 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git, struct bitmap *reuse, struct pack_window **w_curs) { - struct revindex_entry *revidx; - off_t offset; + off_t offset, header; enum object_type type; unsigned long size; if (pos >= bitmap_git->pack->num_objects) return; /* not actually in the pack */ - revidx = &bitmap_git->pack->revindex[pos]; - offset = revidx->offset; + offset = header = pack_pos_to_offset(bitmap_git->pack, pos); type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size); if (type < 0) return; /* broken packfile, punt */ if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) { off_t base_offset; - int base_pos; + uint32_t base_pos; /* * Find the position of the base object so we can look it up @@ -1096,11 +1094,10 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git, * more detail. */ base_offset = get_delta_base(bitmap_git->pack, w_curs, - &offset, type, revidx->offset); + &offset, type, header); if (!base_offset) return; - base_pos = find_revindex_position(bitmap_git->pack, base_offset); - if (base_pos < 0) + if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0) return; /* -- cgit v0.10.2-6-g49f6 From 78232bf65d59601fad99b4f45a90effaf0ac1204 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:27 -0500 Subject: rebuild_existing_bitmaps(): convert to new revindex API Remove another instance of looking at the revindex directly by instead calling 'pack_pos_to_index()'. Unlike other patches, this caller only cares about the index position of each object in the loop. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-bitmap.c b/pack-bitmap.c index 1fdf7ce..60fe20f 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1392,11 +1392,10 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, for (i = 0; i < num_objects; ++i) { struct object_id oid; - struct revindex_entry *entry; struct object_entry *oe; - entry = &bitmap_git->pack->revindex[i]; - nth_packed_object_id(&oid, bitmap_git->pack, entry->nr); + nth_packed_object_id(&oid, bitmap_git->pack, + pack_pos_to_index(bitmap_git->pack, i)); oe = packlist_find(mapping, &oid); if (oe) -- cgit v0.10.2-6-g49f6 From 45bef5c064e4a41c07b1ddedd7c238c1c55ae182 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:32 -0500 Subject: get_delta_base_oid(): convert to new revindex API Replace direct accesses to the 'struct revindex' type with a call to 'pack_pos_to_index()'. Likewise drop the old-style 'find_pack_revindex()' with its replacement 'offset_to_pack_pos()' (while continuing to perform the same error checking). Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/packfile.c b/packfile.c index 86f5c8d..3e3f391 100644 --- a/packfile.c +++ b/packfile.c @@ -1235,18 +1235,18 @@ static int get_delta_base_oid(struct packed_git *p, oidread(oid, base); return 0; } else if (type == OBJ_OFS_DELTA) { - struct revindex_entry *revidx; + uint32_t base_pos; off_t base_offset = get_delta_base(p, w_curs, &curpos, type, delta_obj_offset); if (!base_offset) return -1; - revidx = find_pack_revindex(p, base_offset); - if (!revidx) + if (offset_to_pack_pos(p, base_offset, &base_pos) < 0) return -1; - return nth_packed_object_id(oid, p, revidx->nr); + return nth_packed_object_id(oid, p, + pack_pos_to_index(p, base_pos)); } else return -1; } -- cgit v0.10.2-6-g49f6 From 3a3f54dd0a70c2b6e8342b156d13e0bd941be07b Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:36 -0500 Subject: retry_bad_packed_offset(): convert to new revindex API Perform exactly the same conversion as in the previous commit to another caller within 'packfile.c'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/packfile.c b/packfile.c index 3e3f391..7c37f9e 100644 --- a/packfile.c +++ b/packfile.c @@ -1256,12 +1256,11 @@ static int retry_bad_packed_offset(struct repository *r, off_t obj_offset) { int type; - struct revindex_entry *revidx; + uint32_t pos; struct object_id oid; - revidx = find_pack_revindex(p, obj_offset); - if (!revidx) + if (offset_to_pack_pos(p, obj_offset, &pos) < 0) return OBJ_BAD; - nth_packed_object_id(&oid, p, revidx->nr); + nth_packed_object_id(&oid, p, pack_pos_to_index(p, pos)); mark_bad_packed_object(p, oid.hash); type = oid_object_info(r, &oid, NULL); if (type <= OBJ_NONE) -- cgit v0.10.2-6-g49f6 From fc150caf67d91b0eeba359d63b4ce707aba8b5ca Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:41 -0500 Subject: packed_object_info(): convert to new revindex API Convert another call of 'find_pack_revindex()' to its replacement 'pack_pos_to_offset()'. Likewise: - Avoid manipulating `struct packed_git`'s `revindex` pointer directly by removing the pointer-as-array indexing. - Add an additional guard to check that the offset 'obj_offset()' points to a real object. This should be the case with well-behaved callers to 'packed_object_info()', but isn't guarenteed. Other blocks that fill in various other values from the 'struct object_info' request handle bad inputs by setting the type to 'OBJ_BAD' and jumping to 'out'. Do the same when given a bad offset here. The previous code would have segfaulted when given a bad 'obj_offset' value, since 'find_pack_revindex()' would return 'NULL', and then the line that fills 'oi->disk_sizep' would try to access 'NULL[1]' with a stride of 16 bytes (the width of 'struct revindex_entry)'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/packfile.c b/packfile.c index 7c37f9e..bb4bb14 100644 --- a/packfile.c +++ b/packfile.c @@ -1537,8 +1537,15 @@ int packed_object_info(struct repository *r, struct packed_git *p, } if (oi->disk_sizep) { - struct revindex_entry *revidx = find_pack_revindex(p, obj_offset); - *oi->disk_sizep = revidx[1].offset - obj_offset; + uint32_t pos; + if (offset_to_pack_pos(p, obj_offset, &pos) < 0) { + error("could not find object at offset %"PRIuMAX" " + "in pack %s", (uintmax_t)obj_offset, p->pack_name); + type = OBJ_BAD; + goto out; + } + + *oi->disk_sizep = pack_pos_to_offset(p, pos + 1) - obj_offset; } if (oi->typep || oi->type_name) { -- cgit v0.10.2-6-g49f6 From 0a7e3642bc0d42a2c322886e64872ac9d736fa84 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:45 -0500 Subject: unpack_entry(): convert to new revindex API Remove direct manipulation of the 'struct revindex_entry' type as well as calls to the deprecated API in 'packfile.c:unpack_entry()'. Usual clean-up is performed (replacing '->nr' with calls to 'pack_pos_to_index()' and so on). Add an additional check to make sure that 'obj_offset()' points at a valid object. In the case this check is violated, we cannot call 'mark_bad_packed_object()' because we don't know the OID. At the top of the call stack is do_oid_object_info_extended() (via packed_object_info()), which does mark the object. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/packfile.c b/packfile.c index bb4bb14..936ab3d 100644 --- a/packfile.c +++ b/packfile.c @@ -1694,11 +1694,21 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset, } if (do_check_packed_object_crc && p->index_version > 1) { - struct revindex_entry *revidx = find_pack_revindex(p, obj_offset); - off_t len = revidx[1].offset - obj_offset; - if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) { + uint32_t pack_pos, index_pos; + off_t len; + + if (offset_to_pack_pos(p, obj_offset, &pack_pos) < 0) { + error("could not find object at offset %"PRIuMAX" in pack %s", + (uintmax_t)obj_offset, p->pack_name); + data = NULL; + goto out; + } + + len = pack_pos_to_offset(p, pack_pos + 1) - obj_offset; + index_pos = pack_pos_to_index(p, pack_pos); + if (check_pack_crc(p, &w_curs, obj_offset, len, index_pos)) { struct object_id oid; - nth_packed_object_id(&oid, p, revidx->nr); + nth_packed_object_id(&oid, p, index_pos); error("bad packed object CRC for %s", oid_to_hex(&oid)); mark_bad_packed_object(p, oid.hash); @@ -1781,11 +1791,11 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset, * This is costly but should happen only in the presence * of a corrupted pack, and is better than failing outright. */ - struct revindex_entry *revidx; + uint32_t pos; struct object_id base_oid; - revidx = find_pack_revindex(p, obj_offset); - if (revidx) { - nth_packed_object_id(&base_oid, p, revidx->nr); + if (!(offset_to_pack_pos(p, obj_offset, &pos))) { + nth_packed_object_id(&base_oid, p, + pack_pos_to_index(p, pos)); error("failed to read delta base object %s" " at offset %"PRIuMAX" from %s", oid_to_hex(&base_oid), (uintmax_t)obj_offset, -- cgit v0.10.2-6-g49f6 From b130aef65e492667229191dfd0689b74d88c6a84 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:49 -0500 Subject: for_each_object_in_pack(): convert to new revindex API Avoid looking at the 'revindex' pointer directly and instead call 'pack_pos_to_index()'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/packfile.c b/packfile.c index 936ab3d..7bb1750 100644 --- a/packfile.c +++ b/packfile.c @@ -2086,7 +2086,7 @@ int for_each_object_in_pack(struct packed_git *p, struct object_id oid; if (flags & FOR_EACH_OBJECT_PACK_ORDER) - pos = p->revindex[i].nr; + pos = pack_pos_to_index(p, i); else pos = i; -- cgit v0.10.2-6-g49f6 From 2891b434ac4b3e83b49d94b36825915734b8adcc Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:54 -0500 Subject: builtin/gc.c: guess the size of the revindex 'estimate_repack_memory()' takes into account the amount of memory required to load the reverse index in memory by multiplying the assumed number of objects by the size of the 'revindex_entry' struct. Prepare for hiding the definition of 'struct revindex_entry' by removing a 'sizeof()' of that type from outside of pack-revindex.c. Instead, guess that one off_t and one uint32_t are required per object. Strictly speaking, this is a worse guess than asking for 'sizeof(struct revindex_entry)' directly, since the true size of this struct is 16 bytes with padding on the end of the struct in order to align the offset field. But, this is an approximation anyway, and it does remove a use of the 'struct revindex_entry' from outside of pack-revindex internals. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/builtin/gc.c b/builtin/gc.c index 4c24f41..c60811f 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -301,7 +301,7 @@ static uint64_t estimate_repack_memory(struct packed_git *pack) /* and then obj_hash[], underestimated in fact */ heap += sizeof(struct object *) * nr_objects; /* revindex is used also */ - heap += sizeof(struct revindex_entry) * nr_objects; + heap += (sizeof(off_t) + sizeof(uint32_t)) * nr_objects; /* * read_sha1_file() (either at delta calculation phase, or * writing phase) also fills up the delta base cache -- cgit v0.10.2-6-g49f6 From 1c3855f33b66accf106a015a4f94c3b7775c54f9 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:24:58 -0500 Subject: pack-revindex: remove unused 'find_pack_revindex()' Now that no callers of 'find_pack_revindex()' remain, remove the function's declaration and implementation entirely. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-revindex.c b/pack-revindex.c index 0ca3b54..16baafb 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -189,21 +189,6 @@ int find_revindex_position(struct packed_git *p, off_t ofs) return -1; } -struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) -{ - int pos; - - if (load_pack_revindex(p)) - return NULL; - - pos = find_revindex_position(p, ofs); - - if (pos < 0) - return NULL; - - return p->revindex + pos; -} - int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) { int ret; diff --git a/pack-revindex.h b/pack-revindex.h index 5a218aa..f7094ba 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -30,8 +30,6 @@ struct revindex_entry { int load_pack_revindex(struct packed_git *p); int find_revindex_position(struct packed_git *p, off_t ofs); -struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs); - /* * offset_to_pack_pos converts an object offset to a pack position. This * function returns zero on success, and a negative number otherwise. The -- cgit v0.10.2-6-g49f6 From 8389855a9b97c7447383e9938730d24054f33831 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:25:02 -0500 Subject: pack-revindex: remove unused 'find_revindex_position()' Now that all 'find_revindex_position()' callers have been removed (and converted to the more descriptive 'offset_to_pack_pos()'), it is almost safe to get rid of 'find_revindex_position()' entirely. Almost, except for the fact that 'offset_to_pack_pos()' calls 'find_revindex_position()'. Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and then remove 'find_revindex_position()' entirely. This is a straightforward refactoring with one minor snag. 'offset_to_pack_pos()' used to load the index before calling 'find_revindex_position()'. That means that by the time 'find_revindex_position()' starts executing, 'p->num_objects' can be safely read. After inlining, be careful to not read 'p->num_objects' until _after_ 'load_pack_revindex()' (which loads the index as a side-effect) has been called. Another small fix that is included is converting the upper- and lower-bounds to be unsigned's instead of ints. This dates back to 92e5c77c37 (revindex: export new APIs, 2013-10-24)--ironically, the last time we introduced new APIs here--but this unifies the types. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-revindex.c b/pack-revindex.c index 16baafb..282fe92 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -169,16 +169,23 @@ int load_pack_revindex(struct packed_git *p) return 0; } -int find_revindex_position(struct packed_git *p, off_t ofs) +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) { - int lo = 0; - int hi = p->num_objects + 1; - const struct revindex_entry *revindex = p->revindex; + unsigned lo, hi; + const struct revindex_entry *revindex; + + if (load_pack_revindex(p) < 0) + return -1; + + lo = 0; + hi = p->num_objects + 1; + revindex = p->revindex; do { const unsigned mi = lo + (hi - lo) / 2; if (revindex[mi].offset == ofs) { - return mi; + *pos = mi; + return 0; } else if (ofs < revindex[mi].offset) hi = mi; else @@ -189,20 +196,6 @@ int find_revindex_position(struct packed_git *p, off_t ofs) return -1; } -int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) -{ - int ret; - - if (load_pack_revindex(p) < 0) - return -1; - - ret = find_revindex_position(p, ofs); - if (ret < 0) - return ret; - *pos = ret; - return 0; -} - uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos) { if (!p->revindex) diff --git a/pack-revindex.h b/pack-revindex.h index f7094ba..746776b 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -28,7 +28,6 @@ struct revindex_entry { * given pack, returning zero on success and a negative value otherwise. */ int load_pack_revindex(struct packed_git *p); -int find_revindex_position(struct packed_git *p, off_t ofs); /* * offset_to_pack_pos converts an object offset to a pack position. This -- cgit v0.10.2-6-g49f6 From d5bc7c60c72ee239b5c5d3e4aa808d29412f78d8 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:25:06 -0500 Subject: pack-revindex: hide the definition of 'revindex_entry' Now that all spots outside of pack-revindex.c that reference 'struct revindex_entry' directly have been removed, it is safe to hide the implementation by moving it from pack-revindex.h to pack-revindex.c. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-revindex.c b/pack-revindex.c index 282fe92..a508d5f 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -3,6 +3,11 @@ #include "object-store.h" #include "packfile.h" +struct revindex_entry { + off_t offset; + unsigned int nr; +}; + /* * Pack index for existing packs give us easy access to the offsets into * corresponding pack file where each object's data starts, but the entries diff --git a/pack-revindex.h b/pack-revindex.h index 746776b..6e0320b 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -18,11 +18,6 @@ struct packed_git; -struct revindex_entry { - off_t offset; - unsigned int nr; -}; - /* * load_pack_revindex populates the revindex's internal data-structures for the * given pack, returning zero on success and a negative value otherwise. -- cgit v0.10.2-6-g49f6 From e5dcd7841828fd4c03dfc8a5c52691ada979b7e2 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 13 Jan 2021 17:25:10 -0500 Subject: pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' To prepare for on-disk reverse indexes, remove a spot in 'offset_to_pack_pos()' that looks at the 'revindex' array in 'struct packed_git'. Even though this use of the revindex pointer is within pack-revindex.c, this clean up is still worth doing. Since the 'revindex' pointer will be NULL when reading from an on-disk reverse index (instead the 'revindex_data' pointer will be mmaped to the 'pack-*.rev' file), this call-site would have to include a conditional to lookup the offset for position 'mi' each iteration through the search. So instead of open-coding 'pack_pos_to_offset()', call it directly from within 'offset_to_pack_pos()'. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/pack-revindex.c b/pack-revindex.c index a508d5f..5e69bc7 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -177,21 +177,21 @@ int load_pack_revindex(struct packed_git *p) int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) { unsigned lo, hi; - const struct revindex_entry *revindex; if (load_pack_revindex(p) < 0) return -1; lo = 0; hi = p->num_objects + 1; - revindex = p->revindex; do { const unsigned mi = lo + (hi - lo) / 2; - if (revindex[mi].offset == ofs) { + off_t got = pack_pos_to_offset(p, mi); + + if (got == ofs) { *pos = mi; return 0; - } else if (ofs < revindex[mi].offset) + } else if (ofs < got) hi = mi; else lo = mi + 1; -- cgit v0.10.2-6-g49f6 From 779412b9d99544ae71eefabb699a109b1638f96c Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 14 Jan 2021 15:11:10 -0500 Subject: for_each_object_in_pack(): clarify pack vs index ordering We may return objects in one of two orders: how they appear in the .idx (sorted by object id) or how they appear in the packfile itself. To further complicate matters, we have two ordering variables, "i" and "pos", and it is not clear to which order they apply. Let's clarify this by using an unambiguous name where possible, and leaving a comment for the variable that does double-duty. Signed-off-by: Jeff King Acked-by: Taylor Blau Signed-off-by: Junio C Hamano diff --git a/packfile.c b/packfile.c index 7bb1750..35d50e2 100644 --- a/packfile.c +++ b/packfile.c @@ -2082,19 +2082,31 @@ int for_each_object_in_pack(struct packed_git *p, } for (i = 0; i < p->num_objects; i++) { - uint32_t pos; + uint32_t index_pos; struct object_id oid; + /* + * We are iterating "i" from 0 up to num_objects, but its + * meaning may be different, depending on the requested output + * order: + * + * - in object-name order, it is the same as the index order + * used by nth_packed_object_id(), so we can pass it + * directly + * + * - in pack-order, it is pack position, which we must + * convert to an index position in order to get the oid. + */ if (flags & FOR_EACH_OBJECT_PACK_ORDER) - pos = pack_pos_to_index(p, i); + index_pos = pack_pos_to_index(p, i); else - pos = i; + index_pos = i; - if (nth_packed_object_id(&oid, p, pos) < 0) + if (nth_packed_object_id(&oid, p, index_pos) < 0) return error("unable to get sha1 of object %u in %s", - pos, p->pack_name); + index_pos, p->pack_name); - r = cb(&oid, p, pos, data); + r = cb(&oid, p, index_pos, data); if (r) break; } -- cgit v0.10.2-6-g49f6