path: root/builtin/diff.c
diff options
authorJeff King <>2020-03-30 14:03:09 (GMT)
committerJunio C Hamano <>2020-03-30 17:59:08 (GMT)
commit600bee4e70a1672bf1e8d50c3afd02090f1f32ae (patch)
tree545d6f832db3954aeb4873982476e42b61dd7247 /builtin/diff.c
parent274b9cc25322d9ee79aa8e6d4e86f0ffe5ced925 (diff)
oid_array: use size_t for count and allocation
The oid_array object uses an "int" to store the number of items and the allocated size. It's rather unlikely for somebody to have more than 2^31 objects in a repository (the sha1's alone would be 40GB!), but if they do, we'd overflow our alloc variable. You can reproduce this case with something like: git init repo cd repo # make a pack with 2^24 objects perl -e ' my $nr = 2**24; for (my $i = 0; $i < $nr; $i++) { print "blob\n"; print "data 4\n"; print pack("N", $i); } ' | git fast-import # now make 256 copies of it; most of these objects will be duplicates, # but oid_array doesn't de-dup until all values are read and it can # sort the result. cd .git/objects/pack/ pack=$(echo *.pack) idx=$(echo *.idx) for i in $(seq 0 255); do # no need to waste disk space ln "$pack" "pack-extra-$i.pack" ln "$idx" "pack-extra-$i.idx" done # and now force an oid_array to store all of it git cat-file --batch-all-objects --batch-check which results in: fatal: size_t overflow: 32 * 18446744071562067968 So the good news is that st_mult() sees the problem (the large number is because our int wraps negative, and then that gets cast to a size_t), doing the job it was meant to: bailing in crazy situations rather than causing an undersized buffer. But we should avoid hitting this case at all, and instead limit ourselves based on what malloc() is willing to give us. We can easily do that by switching to size_t. The cat-file process above made it to ~120GB virtual set size before the integer overflow (our internal hash storage is 32-bytes now in preparation for sha256, so we'd expect ~128GB total needed, plus potentially more to copy from one realloc'd block to another)). After this patch (and about 130GB of RAM+swap), it does eventually read in the whole set. No test for obvious reasons. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
Diffstat (limited to 'builtin/diff.c')
0 files changed, 0 insertions, 0 deletions