author Jeff King <> 2018-07-24 10:50:10 (GMT)
committer Junio C Hamano <> 2018-07-24 17:19:29 (GMT)
commit 77aa03d6c7f07db4a5d34afe8f5b3a55e801057c
tree 6ac1f8e8124f5878aaf228d84bc04d5357b4d310 /utf8.c
parent 53f9a3e157dbbc901a02ac2c73346d375e24978c
reencode_string: use st_add/st_mult helpers
When converting a string with iconv, if the output buffer isn't big
enough, we grow it. But our growth is done without any concern for
integer overflow. So when we add:

  outalloc = sofar + insz * 2 + 32;

we may end up wrapping outalloc (which is a size_t), and allocating a
too-small buffer. We then manipulate it further:

  outsz = outalloc - sofar - 1;

and feed outsz back to iconv. If outalloc is wrapped and smaller than
sofar, we'll end up with a small allocation but feed a very large outsz
to iconv, which could result in it overflowing the buffer.

Can we use this to construct an attack wherein the victim clones a
repository with a very large commit object with an encoding header, and
running "git log" reencodes it into utf8, causing an overflow?

An attack of this sort is likely impossible in practice. "sofar" is how
many output bytes we've written total, and "insz" is the number of input
bytes remaining. Imagine our input doubles in size as we output it
(which is easy to do by converting latin1 to utf8, for example), and
that we start with N input bytes. Our initial output buffer also starts
at N bytes, so after the first call we'd have N/2 input bytes remaining
(insz), and have written N bytes (sofar). That means our next allocation
will be (N + N/2 * 2 + 32) bytes, or (2N + 32).

We can therefore overflow a 32-bit size_t with a commit message that's
just under 2^31 bytes, assuming it consists mostly of "doubling"
sequences (e.g., latin1 0xe1 which becomes utf8 0xc3 0xa1).

But we'll never make it that far with such a message. We'll be spending
2^31 bytes on the original string. And our initial output buffer will
also be 2^31 bytes. Which is not going to succeed on a system with a
32-bit size_t, since there will be other things using the address space,
too. The initial malloc will fail.

If we imagine instead that we can triple the size when converting, then
our second allocation becomes (N + 2/3N * 2 + 32), or (7/3N + 32). That
still requires two allocations of 3/7 of our address space (6/7 of the
total) to succeed.

If we imagine we can quadruple, it becomes (5/2N + 32); we need to be
able to allocate 4/5 of the address space to succeed.

This might start to get plausible. But is it possible to get a 4-to-1
increase in size? Probably if you're converting to some obscure
encoding. But since git defaults to utf8 for its output, that's the
likely destination encoding for an attack. And while there are
4-character utf8 sequences, it's unlikely that you'd be able to find a
single-byte source sequence in any encoding.

So this is certainly buggy code which should be fixed, but it is
probably not a useful attack vector.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
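The wraparound described in the commit message can be reproduced in isolation. The sketch below is not code from git: the names `grow_unchecked32` and `remaining_outsz32` are hypothetical, and `uint32_t` stands in for a 32-bit `size_t` so the effect shows up on any platform.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of git): the old, unchecked growth
 * formula, evaluated in 32-bit unsigned arithmetic to mimic a system
 * with a 32-bit size_t. Unsigned arithmetic wraps modulo 2^32.
 */
static uint32_t grow_unchecked32(uint32_t sofar, uint32_t insz)
{
	return (uint32_t)(sofar + insz * 2u + 32u);
}

/*
 * Hypothetical helper: the output space that would be handed back to
 * iconv after the reallocation, computed as in the old code.
 */
static uint32_t remaining_outsz32(uint32_t outalloc, uint32_t sofar)
{
	return (uint32_t)(outalloc - sofar - 1u);
}
```

With N = 2^31 - 8 input bytes that double in size, the first pass leaves sofar = 2147483640 and insz = 1073741820. `grow_unchecked32(2147483640u, 1073741820u)` then yields 16, and `remaining_outsz32(16u, 2147483640u)` yields 2147483671: a 16-byte allocation paired with an output size of roughly 2^31.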
Diffstat (limited to 'utf8.c')
1 files changed, 2 insertions, 2 deletions
diff --git a/utf8.c b/utf8.c
index d55e20c..a2fd24c 100644
--- a/utf8.c
+++ b/utf8.c
@@ -477,7 +477,7 @@ char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv, int *outs
 	iconv_ibp cp;
 
 	outsz = insz;
-	outalloc = outsz + 1; /* for terminating NUL */
+	outalloc = st_add(outsz, 1); /* for terminating NUL */
 	out = xmalloc(outalloc);
 	outpos = out;
 	cp = (iconv_ibp)in;
@@ -497,7 +497,7 @@ char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv, int *outs
 			 * converting the rest.
 			 */
 			sofar = outpos - out;
-			outalloc = sofar + insz * 2 + 32;
+			outalloc = st_add3(sofar, st_mult(insz, 2), 32);
 			out = xrealloc(out, outalloc);
 			outpos = out + sofar;
 			outsz = outalloc - sofar - 1;
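For readers unfamiliar with overflow-checked size arithmetic, here is a minimal sketch of the idea behind the patched line. It is an illustration under stated assumptions, not git's implementation: `checked_add`, `checked_mult` and `grow_checked` are hypothetical names, and they report overflow through a return value, whereas git's `st_add()`/`st_mult()` helpers abort the program on overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: store a + b in *out unless it would wrap. */
static bool checked_add(size_t a, size_t b, size_t *out)
{
	if (a > SIZE_MAX - b)
		return false;
	*out = a + b;
	return true;
}

/* Hypothetical stand-in: store a * b in *out unless it would wrap. */
static bool checked_mult(size_t a, size_t b, size_t *out)
{
	if (b != 0 && a > SIZE_MAX / b)
		return false;
	*out = a * b;
	return true;
}

/*
 * Compute sofar + insz * 2 + 32, the growth formula from the diff,
 * returning false instead of wrapping. On failure *outalloc is left
 * in an unspecified state and must not be used.
 */
static bool grow_checked(size_t sofar, size_t insz, size_t *outalloc)
{
	size_t doubled, sum;

	return checked_mult(insz, 2, &doubled) &&
	       checked_add(sofar, doubled, &sum) &&
	       checked_add(sum, 32, outalloc);
}
```

A caller would refuse to reallocate when `grow_checked()` returns false. Checking each intermediate step matters because the multiplication alone can wrap before the additions are ever reached.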