path: root/ewah
Age | Commit message | Author
2017-03-13 | Merge branch 'jk/ewah-use-right-type-in-sizeof' | Junio C Hamano
Code clean-up.
* jk/ewah-use-right-type-in-sizeof:
  ewah: fix eword_t/uint64_t confusion
2017-03-06 | ewah: fix eword_t/uint64_t confusion | Jeff King
The ewah subsystem typedefs eword_t to be uint64_t, but some code uses a bare uint64_t. This isn't a bug now, but it's a potential maintenance problem if the definition of eword_t ever changes. Let's use the correct type.
Note that we can't use COPY_ARRAY() here because the source and destination point to objects of different sizes. For that reason we'll also skip the usual "sizeof(*dst)" and use the real type, which should make it more clear that there's something tricky going on.
Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
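As a rough illustration of the pattern described above (not the actual patch), the sketch below copies words out of a raw byte buffer into an eword_t array. The helper name copy_words_from_bytes is hypothetical, and the typedef merely mirrors the one the message mentions. Because the source is a byte pointer rather than an eword_t array, the element size is written as sizeof(eword_t) instead of sizeof(*dst) or a bare sizeof(uint64_t).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the typedef described in the log; assumed here for illustration. */
typedef uint64_t eword_t;

/*
 * Hypothetical helper: copy nr words from a raw byte buffer (for example
 * an mmap'd file) into an eword_t array. The source and destination have
 * different element types, so a same-type helper like COPY_ARRAY() does
 * not apply; the size is spelled out as sizeof(eword_t) so the copy stays
 * correct even if eword_t's definition changes.
 */
static void copy_words_from_bytes(eword_t *dst, const uint8_t *src, size_t nr)
{
	memcpy(dst, src, nr * sizeof(eword_t));
}
```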
2016-02-22 | ewah: convert to REALLOC_ARRAY, etc | Jeff King
Now that we're built around xmalloc and friends, we can use helpers like REALLOC_ARRAY, ALLOC_GROW, and so on to make the code shorter and protect against integer overflow.
Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
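A minimal sketch of what overflow-checked growth helpers look like, assuming semantics similar to git's REALLOC_ARRAY and ALLOC_GROW. The names checked_mult, xrealloc_sketch, REALLOC_ARRAY_SKETCH and ALLOC_GROW_SKETCH are hypothetical stand-ins, and the roughly 1.5x growth factor is an assumption for illustration:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t eword_t;

/* Multiply two sizes, exiting instead of silently wrapping on overflow. */
static size_t checked_mult(size_t a, size_t b)
{
	if (b && a > SIZE_MAX / b) {
		fprintf(stderr, "fatal: size_t overflow: %zu * %zu\n", a, b);
		exit(128);
	}
	return a * b;
}

/* Resize or exit; never returns NULL. */
static void *xrealloc_sketch(void *ptr, size_t size)
{
	void *ret = realloc(ptr, size ? size : 1);
	if (!ret) {
		fprintf(stderr, "fatal: out of memory\n");
		exit(128);
	}
	return ret;
}

/* REALLOC_ARRAY-style: the element size comes from *(x), and the
 * multiplication by the element count is overflow-checked. */
#define REALLOC_ARRAY_SKETCH(x, n) \
	((x) = xrealloc_sketch((x), checked_mult(sizeof(*(x)), (n))))

/* ALLOC_GROW-style: ensure room for at least nr elements, growing the
 * capacity by roughly 1.5x so repeated appends stay cheap on average. */
#define ALLOC_GROW_SKETCH(x, nr, alloc) do { \
		if ((nr) > (alloc)) { \
			size_t grown_ = ((alloc) + 16) * 3 / 2; \
			(alloc) = grown_ < (nr) ? (nr) : grown_; \
			REALLOC_ARRAY_SKETCH(x, alloc); \
		} \
	} while (0)
```

The point of the macros is that the element size is taken from the pointer itself and the size multiplication is checked, so callers no longer repeat that arithmetic by hand.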
2016-02-22 | convert ewah/bitmap code to use xmalloc | Jeff King
This code was originally written with the idea that it could be spun off into its own ewah library, and uses the overridable ewah_malloc to do allocations. We plug in xmalloc as our ewah_malloc, of course. But over the years the ewah code itself has become more entangled with git, and the return value of many ewah_malloc sites is not checked.
Let's just drop the level of indirection and use xmalloc and friends directly. This saves a few lines, and will let us adapt these sites to our more advanced malloc helpers.
Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
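To show why unchecked call sites become safe once allocation goes through an xmalloc-style wrapper, here is a hypothetical sketch. xmalloc_sketch and dup_bytes are illustrative names and the error text is made up; only the behavior (report and exit instead of returning NULL) matters. The exit status 128 mirrors what git's die() uses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical xmalloc-style allocator: it never returns NULL. On
 * failure it reports the problem and exits, so callers may use the
 * result without checking it.
 */
static void *xmalloc_sketch(size_t size)
{
	void *ret = malloc(size ? size : 1);
	if (!ret) {
		fprintf(stderr, "fatal: out of memory (tried to allocate %zu bytes)\n", size);
		exit(128);
	}
	return ret;
}

/* A caller that relies on the wrapper: no NULL check is needed. */
static char *dup_bytes(const char *src, size_t len)
{
	char *dst = xmalloc_sketch(len + 1);
	memcpy(dst, src, len);
	dst[len] = '\0';
	return dst;
}
```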
2015-06-24 | Merge branch 'es/osx-header-pollutes-mask-macro' | Junio C Hamano
* es/osx-header-pollutes-mask-macro:
  ewah: use less generic macro name
  ewah/bitmap: silence warning about MASK macro redefinition
2015-06-03 | ewah: use less generic macro name | Jeff King
The ewah/ewok.h header pollutes the global namespace with "BITS_IN_WORD", without any specific notion that we are talking about the bits in an eword_t. We can give this the more specific name "BITS_IN_EWORD".
Signed-off-by: Jeff King <>
Reviewed-by: Eric Sunshine <>
Signed-off-by: Junio C Hamano <>
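The renamed macro can be derived from the word type itself. A minimal sketch, assuming eword_t is uint64_t as the log states elsewhere; ewords_for_bits is a hypothetical helper showing a typical use:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t eword_t;

/* Bits in one eword_t, derived from the type rather than hard-coded. */
#define BITS_IN_EWORD (sizeof(eword_t) * 8)

/* Hypothetical helper: number of eword_t words needed for nbits bits,
 * rounded up. */
static size_t ewords_for_bits(size_t nbits)
{
	return (nbits + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
}
```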
2015-06-03 | ewah/bitmap: silence warning about MASK macro redefinition | Eric Sunshine
On PowerPC Mac OS X (10.5.8 "Leopard" with Xcode 3.1), system header /usr/include/ppc/param.h[1] pollutes the preprocessor namespace with a macro generically named MASK. This conflicts with the same-named macro in ewah/bitmap.c. We can avoid this conflict by using a more specific name.
[1]: Included indirectly via: git-compat-util.h -> sys/sysctl.h -> sys/ucred.h -> sys/param.h -> machine/param.h -> ppc/param.h
Signed-off-by: Eric Sunshine <>
Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
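A sketch of how prefixed bit-indexing macros are typically used in an uncompressed bitmap. The names EWAH_MASK and EWAH_BLOCK are an assumption about what the renamed macros look like, and their exact definitions in ewah/bitmap.c may differ; struct toy_bitmap and its fixed 256-bit size are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t eword_t;
#define BITS_IN_EWORD (sizeof(eword_t) * 8)

/* Prefixed names cannot collide with a generic MASK from system headers. */
#define EWAH_MASK(x) ((eword_t)1 << ((x) % BITS_IN_EWORD))
#define EWAH_BLOCK(x) ((x) / BITS_IN_EWORD)

/* Hypothetical fixed-size uncompressed bitmap: 4 words = 256 bits.
 * Callers must keep pos < 256; no bounds checking is done. */
struct toy_bitmap {
	eword_t words[4];
};

static void toy_bitmap_set(struct toy_bitmap *b, size_t pos)
{
	b->words[EWAH_BLOCK(pos)] |= EWAH_MASK(pos);
}

static int toy_bitmap_get(const struct toy_bitmap *b, size_t pos)
{
	return (b->words[EWAH_BLOCK(pos)] & EWAH_MASK(pos)) != 0;
}
```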
2015-03-12 | ewah: add convenient wrapper ewah_serialize_strbuf() | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <>
Signed-off-by: Junio C Hamano <>
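The log gives no detail about ewah_serialize_strbuf() beyond its name, so the following is only a hypothetical sketch of the general pattern: a serializer that writes through a callback, plus a thin wrapper that points the callback at an in-memory growable buffer. struct byte_buf stands in for git's strbuf, and the layout (a 32-bit big-endian word count followed by big-endian words) is a simplification, not the real EWAH serialization format:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t eword_t;

/* Hypothetical growable byte buffer standing in for a strbuf. */
struct byte_buf {
	unsigned char *data;
	size_t len, alloc;
};

/* Write callback: append len bytes; returns len, or -1 on allocation failure. */
static int byte_buf_write(void *out, const void *src, size_t len)
{
	struct byte_buf *sb = out;
	if (sb->len + len > sb->alloc) {
		size_t want = (sb->len + len) * 2;
		unsigned char *p = realloc(sb->data, want);
		if (!p)
			return -1;
		sb->data = p;
		sb->alloc = want;
	}
	memcpy(sb->data + sb->len, src, len);
	sb->len += len;
	return (int)len;
}

/* Generic serializer: big-endian word count, then each word big-endian. */
static int serialize_words_to(const eword_t *words, uint32_t nr,
			      int (*write_fn)(void *, const void *, size_t),
			      void *out)
{
	unsigned char be[8];
	for (int i = 0; i < 4; i++)
		be[i] = (unsigned char)(nr >> (24 - 8 * i));
	if (write_fn(out, be, 4) < 0)
		return -1;
	for (uint32_t w = 0; w < nr; w++) {
		for (int i = 0; i < 8; i++)
			be[i] = (unsigned char)(words[w] >> (56 - 8 * i));
		if (write_fn(out, be, 8) < 0)
			return -1;
	}
	return 0;
}

/* Convenience wrapper: serialize straight into an in-memory buffer. */
static int serialize_words_to_buf(const eword_t *words, uint32_t nr,
				  struct byte_buf *sb)
{
	return serialize_words_to(words, nr, byte_buf_write, sb);
}
```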
2015-02-18 | Merge branch 'jk/pack-bitmap' | Junio C Hamano
The pack bitmap support did not build with older versions of GCC.
* jk/pack-bitmap:
  ewah: fix building with gcc < 3.4.0
2015-02-04 | ewah: fix building with gcc < 3.4.0 | Tom G. Christensen
The __builtin_ctzll function was added in gcc 3.4.0. This extends the check for gcc so that use of __builtin_ctzll is only enabled if gcc >= 3.4.0.
Signed-off-by: Tom G. Christensen <>
Reviewed-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
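A sketch of the version check described above, with a portable fallback for compilers that lack the builtin. The function name ctz64_sketch is hypothetical; as with __builtin_ctzll itself, the result is meaningless for an input of 0, so the caller must pass a non-zero word:

```c
#include <stdint.h>

/* Count trailing zero bits of a non-zero 64-bit word. */
static inline int ctz64_sketch(uint64_t x)
{
#if defined(__GNUC__) && ((__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
	/* __builtin_ctzll exists only in GCC >= 3.4.0 (and compatible compilers). */
	return __builtin_ctzll(x);
#else
	/* Portable fallback: shift until the lowest set bit reaches bit 0. */
	int n = 0;
	while ((x & 1) == 0) {
		x >>= 1;
		n++;
	}
	return n;
#endif
}
```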
2014-04-29 | ewah: delete unused ewah_read_mmap_native declaration | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <>
Signed-off-by: Junio C Hamano <>
2014-04-29 | ewah: fix constness of ewah_read_mmap | Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <>
Signed-off-by: Junio C Hamano <>
2014-04-22 | ewah_bitmap.c: do not assume size_t and eword_t are the same size | Kyle J. McKay
When buffer_grow changes the size of the buffer using realloc, it first computes and saves the rlw pointer's offset into the buffer using (uint8_t *) math before the realloc but then restores it using (eword_t *) math. In order to do this it's necessary to convert the (uint8_t *) offset into an (eword_t *) offset. It was doing this by dividing by the sizeof(size_t). Unfortunately sizeof(size_t) is not the same as sizeof(eword_t) on all platforms. This causes illegal memory accesses and other bad things to happen when attempting to use bitmaps on those platforms.
Fix this by dividing by the sizeof(eword_t) instead which will always be correct for all platforms.
Signed-off-by: Kyle J. McKay <>
Acked-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
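The offset bookkeeping can be sketched as follows. struct word_buf and word_buf_grow are simplified, hypothetical stand-ins for the bitmap's buffer and buffer_grow; the key line converts the saved byte offset back into a word offset by dividing by sizeof(eword_t):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t eword_t;

/* Hypothetical, simplified word buffer with an interior pointer. */
struct word_buf {
	eword_t *buffer;
	size_t alloc_size; /* capacity in words */
	eword_t *rlw;      /* points somewhere inside buffer */
};

static void word_buf_grow(struct word_buf *self, size_t new_size)
{
	/* Save rlw's position as a byte offset before buffer may move. */
	size_t rlw_offset = (size_t)((uint8_t *)self->rlw - (uint8_t *)self->buffer);
	eword_t *p = realloc(self->buffer, new_size * sizeof(eword_t));
	if (!p) {
		fprintf(stderr, "fatal: out of memory\n");
		exit(128);
	}
	self->buffer = p;
	self->alloc_size = new_size;
	/* Byte offset -> word offset: divide by the size of the element
	 * type actually stored, i.e. sizeof(eword_t), not sizeof(size_t). */
	self->rlw = self->buffer + (rlw_offset / sizeof(eword_t));
}
```

Dividing by sizeof(size_t) instead gives the right answer only where size_t is 8 bytes; with a 4-byte size_t the restored pointer lands twice as far into the buffer as it should.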
2014-02-12 | ewah: unconditionally ntohll ewah data | Jeff King
Commit a201c20 tried to optimize out a loop like:
    for (i = 0; i < len; i++)
        data[i] = ntohll(data[i]);
in the big-endian case, because we know that ntohll is a noop, and we do not need to pay the cost of the loop at all. However, it mistakenly assumed that __BYTE_ORDER was always defined, whereas it may not be on systems which do not define it by default, and where we did not need to define it to set up the ntohll macro. This includes OS X and Windows.
We could muck with the ordering in compat/bswap.h to make sure it is defined unconditionally, but it is simpler still to just execute the loop unconditionally. That avoids the application code knowing anything about these magic macros, and lets it depend only on having ntohll defined. And since the resulting loop looks like (on a big-endian system):
    for (i = 0; i < len; i++)
        data[i] = data[i];
any decent compiler can probably optimize it out.
Original report and analysis by Brian Gernhardt.
Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
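A minimal sketch of the unconditional conversion, assuming 64-bit words stored in network (big-endian) byte order. ntohll_sketch is a hypothetical portable replacement for the ntohll macro from compat/bswap.h: it reassembles the value byte by byte, so it needs no __BYTE_ORDER check at all, and on a big-endian host it reduces to the identity:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Interpret the in-memory bytes of net as a big-endian 64-bit value. */
static uint64_t ntohll_sketch(uint64_t net)
{
	unsigned char b[8];
	uint64_t host = 0;
	memcpy(b, &net, sizeof(b));
	for (int i = 0; i < 8; i++)
		host = (host << 8) | b[i];
	return host;
}

/* Convert every word unconditionally; no byte-order macro is consulted. */
static void words_ntohll(uint64_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		data[i] = ntohll_sketch(data[i]);
}
```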
2014-01-23 | ewah: support platforms that require aligned reads | Vicent Marti
The caller may hand us an unaligned buffer (e.g., because it is an mmap of a file with many ewah bitmaps). On some platforms (like SPARC) this can cause a bus error. We can fix it with a combination of get_be32 and moving the data into an aligned buffer (which we would do anyway, but we can move it before fixing the endianness).
Signed-off-by: Vicent Marti <>
Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
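A sketch of the approach the message describes: read a header field with a byte-wise big-endian load, copy the word payload into aligned storage with memcpy, and only then fix the byte order. get_be32_sketch and read_words_unaligned are hypothetical names, and the layout in the test (a 32-bit value followed by one 64-bit word, deliberately misaligned) is made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef uint64_t eword_t;

/* Big-endian 32-bit load that never dereferences a wider unaligned type. */
static uint32_t get_be32_sketch(const void *ptr)
{
	const unsigned char *p = ptr;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Move nr big-endian words from a possibly unaligned buffer into the
 * aligned array dst first, then fix the byte order in place. Only the
 * aligned copy is ever accessed as eword_t.
 */
static void read_words_unaligned(eword_t *dst, const unsigned char *src, size_t nr)
{
	memcpy(dst, src, nr * sizeof(eword_t));
	for (size_t i = 0; i < nr; i++) {
		unsigned char b[8];
		eword_t v = 0;
		memcpy(b, &dst[i], sizeof(b));
		for (int k = 0; k < 8; k++)
			v = (v << 8) | b[k];
		dst[i] = v;
	}
}
```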
2013-12-30 | ewah: compressed bitmap implementation | Vicent Marti
EWAH is a word-aligned compressed variant of a bitset (i.e. a data structure that acts as a 0-indexed boolean array for many entries). It uses a 64-bit run-length encoding (RLE) compression scheme, trading some compression for better processing speed.
The goal of this word-aligned implementation is not to achieve the best compression, but rather to improve query processing time. As it stands right now, this EWAH implementation will always be more efficient storage-wise than its uncompressed alternative.
EWAH arrays will be used as the on-disk format to store reachability bitmaps for all objects in a repository while keeping reasonable sizes, in the same way that JGit does.
This EWAH implementation is a mostly straightforward port of the original `javaewah` library that JGit currently uses. The library is self-contained and has been embedded whole (4 files) inside the `ewah` folder to ease redistribution. The library is re-licensed under the GPLv2 with the permission of Daniel Lemire, the original author.
The source code for the C version can be found on GitHub:
The original Java implementation can also be found on GitHub:
[jc: stripped debug-only code per Peff's $gmane/239768]
Signed-off-by: Vicent Marti <>
Signed-off-by: Jeff King <>
Helped-by: Ramsay Jones <>
Signed-off-by: Junio C Hamano <>
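To make the run-length scheme concrete, here is a hypothetical decoder sketch. It assumes the marker-word layout used by javaewah-style EWAH, where bit 0 is the running bit, the next 32 bits hold the number of fill words, and the top 31 bits count the literal words that follow the marker; treat that layout as an assumption rather than a description of git's exact code. ewah_expand_sketch expands the stream into plain 64-bit words:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t eword_t;

/* Assumed marker layout: bit 0 = running bit, bits 1..32 = run length,
 * bits 33..63 = number of literal words following the marker. */
#define RLW_RUNNING_BITS 32
#define RLW_RUNNING_MASK ((((eword_t)1) << RLW_RUNNING_BITS) - 1)

/*
 * Expand a compressed word stream into plain words. Returns the number
 * of words written to out, or (size_t)-1 if the input is truncated or
 * out_cap is too small.
 */
static size_t ewah_expand_sketch(const eword_t *in, size_t in_len,
				 eword_t *out, size_t out_cap)
{
	size_t i = 0, n = 0;
	while (i < in_len) {
		eword_t marker = in[i++];
		/* A run of identical words: all ones or all zeros. */
		eword_t fill = (marker & 1) ? ~(eword_t)0 : 0;
		size_t run = (size_t)((marker >> 1) & RLW_RUNNING_MASK);
		size_t lits = (size_t)(marker >> (1 + RLW_RUNNING_BITS));

		if (run > out_cap - n || lits > in_len - i ||
		    lits > out_cap - n - run)
			return (size_t)-1;
		for (size_t k = 0; k < run; k++)
			out[n++] = fill;
		/* Literal (uncompressed) words are copied through unchanged. */
		for (size_t k = 0; k < lits; k++)
			out[n++] = in[i++];
	}
	return n;
}
```

For example, a marker with running bit 1, run length 2 and one literal, followed by the literal 0x5, expands to two all-ones words and then 0x5.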