summaryrefslogtreecommitdiff
path: root/ewah
AgeCommit message (Collapse)Author
2021-03-14use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08bitmap: implement bitmap_is_subset()Derrick Stolee
The bitmap_is_subset() function checks if the 'self' bitmap contains any bitmaps that are not on in the 'other' bitmap. Up until this patch, it had a declaration, but no implementation or callers. A subsequent patch will want this function, so implement it here. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08ewah: add bitmap_dup() functionJeff King
There's no easy way to make a copy of a bitmap. Obviously a caller can iterate over the bits and set them one by one in a new bitmap, but we can go much faster by copying whole words with memcpy(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08ewah: implement bitmap_or()Jeff King
We have a function to bitwise-OR an ewah into an uncompressed bitmap, but not to OR two uncompressed bitmaps. Let's add it. Interestingly, we have a public header declaration going back to e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the function was never implemented. That was all OK since there were no users of 'bitmap_or()', but a first caller will be added in a couple of patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08ewah: make bitmap growth less aggressiveJeff King
If you ask to set a bit in the Nth word and we haven't yet allocated that many slots in our array, we'll increase the bitmap size to 2*N. This means we might frequently end up with bitmaps that are twice the necessary size (as soon as you ask for the biggest bit, we'll size up to twice that). But if we just allocate as many words as were asked for, we may not grow fast enough. The worst case there is setting bit 0, then 1, etc. Each time we grow we'd just extend by one more word, giving us linear reallocations (and quadratic memory copies). A middle ground is relying on alloc_nr(), which causes us to grow by a factor of roughly 3/2 instead of 2. That's less aggressive than doubling, and it may help avoid fragmenting memory. (If we start with N, then grow twice, our total is N*(3/2)^2 = 9N/4. After growing twice, that array of size 9N/4 can fit into the space vacated by the original array and first growth, N+3N/2 = 10N/4 > 9N/4, leading to less fragmentation in memory). Our worst case is still 3/2N wasted bits (you set bit N-1, then setting bit N causes us to grow by 3/2), but our average should be much better. This isn't usually that big a deal, but it will matter as we shift the reachability bitmap generation code to store more bitmaps in memory. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08ewah: factor out bitmap growthJeff King
We auto-grow bitmaps when somebody asks to set a bit whose position is outside of our currently allocated range. Other operations besides single bit-setting might need to do this, too, so let's pull it into its own function. Note that we change the semantics a little: you now ask for the number of words you'd like to have, not the id of the block you'd like to write to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW()Taylor Blau
'ewah/ewah_bitmap.c:buffer_grow()' is responsible for growing the buffer used to store the bits of an EWAH bitmap. It is essentially doing the same task as the 'ALLOC_GROW()' macro, so use that instead. This simplifies the callers of 'buffer_grow()', who no longer have to ask for a specific size, but rather specify how much of the buffer they need. They also no longer need to guard 'buffer_grow()' behind an if statement, since 'ALLOC_GROW()' (and, by extension, 'buffer_grow()') is a noop if the buffer is already large enough. But, the most significant change is that this fixes a bug when calling buffer_grow() with both 'alloc_size' and 'new_size' set to 1. In this case, truncating integer math will leave the new size set to 1, causing the buffer to never grow. Instead, let alloc_nr() handle this, which asks for '(new_size + 16) * 3 / 2' instead of 'new_size * 3 / 2'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-02Merge branch 'jk/object-filter-with-bitmap'Junio C Hamano
The object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal. There however are some cases where they can work together, and they were taught about them. * jk/object-filter-with-bitmap: rev-list --count: comment on the use of count_right++ pack-objects: support filters with bitmaps pack-bitmap: implement BLOB_LIMIT filtering pack-bitmap: implement BLOB_NONE filtering bitmap: add bitmap_unset() function rev-list: use bitmap filters for traversal pack-bitmap: basic noop bitmap filter infrastructure rev-list: allow commit-only bitmap traversals t5310: factor out bitmap traversal comparison rev-list: allow bitmaps when counting objects rev-list: make --count work with --objects rev-list: factor out bitmap-optimized routines pack-bitmap: refuse to do a bitmap traversal with pathspecs rev-list: fallback to non-bitmap traversal when filtering pack-bitmap: fix leak of haves/wants object lists pack-bitmap: factor out type iterator initialization
2020-02-14bitmap: add bitmap_unset() functionJeff King
We've never needed to unset an individual bit in a bitmap until now. Typically they start with all bits unset and we bitmap_set() them, or we are applying another bitmap as a mask. But the easiest way to apply an object filter to a bitmap result will be to unset the individual bits. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23ewah/bitmap: introduce bitmap_word_alloc()Jeff King
In a following commit we will need to allocate a variable number of bitmap words, instead of always 32, so let's add bitmap_word_alloc() for this purpose. Note that we have to adjust the block growth in bitmap_set(), since a caller could now use an initial size of "0" (we don't plan to do that, but it doesn't hurt to be defensive). Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-29ewok_rlw.h: add missing 'inline' to function definitionRamsay Jones
The 'ewok_rlw.h' header file contains the rlw_get_run_bit() function definition, which is marked as 'static' but not 'inline'. At least when compiled by gcc, with the default -O2 optimization level, the function is actually inlined and leaves no static version in the ewah_bitmap.o and ewah_rlw.o object files. Despite this, add the missing 'inline' keyword to better describe the intended behaviour. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-20ewah/ewok_rlw.h: add missing include (hdr-check)Ramsay Jones
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-21ewah: delete unused 'rlwit_discharge_empty()'Junio C Hamano
Complete the removal of unused 'ewah bitmap' code by removing the now unused 'rlwit_discharge_empty()' function. Also, the 'ewah_clear()' function can now be made a file-scope static symbol. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah: drop ewah_serialize_native functionJeff King
We don't call this function, and never have. The on-disk bitmap format uses network-byte-order integers, meaning that we cannot use the native-byte-order format written here. Let's drop it in the name of simplicity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah: drop ewah_deserialize functionJeff King
We don't call this function, and in fact never have since it was added (at least not in iterations of the ewah patches that got merged). Instead we use ewah_read_mmap(). Let's drop the unused code. Note to anybody who later wants to resurrect this: it does not check for integer overflow in the ewah data size, meaning it may be possible to convince the code to allocate a too-small buffer and read() into it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah_io: delete unused 'ewah_serialize()'Derrick Stolee
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah_bitmap: delete unused 'ewah_or()'Derrick Stolee
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah_bitmap: delete unused 'ewah_not()'Derrick Stolee
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah_bitmap: delete unused 'ewah_and_not()'Derrick Stolee
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah_bitmap: delete unused 'ewah_and()'Derrick Stolee
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah/bitmap.c: delete unused 'bitmap_each_bit()'Derrick Stolee
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah/bitmap.c: delete unused 'bitmap_clear()'Derrick Stolee
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18ewah_read_mmap: bounds-check mmap readsJeff King
The on-disk ewah format tells us how big the ewah data is, and we blindly read that much from the buffer without considering whether the mmap'd data is long enough, which can lead to out-of-bound reads. Let's make sure we have data available before reading it, both for the ewah header/footer as well as for the bit data itself. In particular: - keep our ptr/len pair in sync as we move through the buffer, and check it before each read - check the size for integer overflow (this should be impossible on 64-bit, as the size is given as a 32-bit count of 8-byte words, but is possible on a 32-bit system) - return the number of bytes read as an ssize_t instead of an int, again to prevent integer overflow - compute the return value using a pointer difference; this should yield the same result as the existing code, but makes it more obvious that we got our computations right The included test is far from comprehensive, as it just picks a static point at which to truncate the generated bitmap. But in practice this will hit in the middle of an ewah and make sure we're at least exercising this code. Reported-by: Luat Nguyen <root@l4w.io> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-09Replace Free Software Foundation address in license noticesTodd Zullinger
The mailing address for the FSF has changed over the years. Rather than updating the address across all files, refer readers to gnu.org, as the GNU GPL documentation now suggests for license notices. The mailing address is retained in the full license files (COPYING and LGPL-2.1). The old address is still present in t/diff-lib/COPYING. This is intentional, as the file is used in tests and the contents are not expected to change. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-10use DIV_ROUND_UPRené Scharfe
Convert code that divides and rounds up to use DIV_ROUND_UP to make the intent clearer and reduce the number of magic constants. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-13Merge branch 'jk/ewah-use-right-type-in-sizeof'Junio C Hamano
Code clean-up. * jk/ewah-use-right-type-in-sizeof: ewah: fix eword_t/uint64_t confusion
2017-03-06ewah: fix eword_t/uint64_t confusionJeff King
The ewah subsystem typedefs eword_t to be uint64_t, but some code uses a bare uint64_t. This isn't a bug now, but it's a potential maintenance problem if the definition of eword_t ever changes. Let's use the correct type. Note that we can't use COPY_ARRAY() here because the source and destination point to objects of different sizes. For that reason we'll also skip the usual "sizeof(*dst)" and use the real type, which should make it more clear that there's something tricky going on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22ewah: convert to REALLOC_ARRAY, etcJeff King
Now that we're built around xmalloc and friends, we can use helpers like REALLOC_ARRAY, ALLOC_GROW, and so on to make the code shorter and protect against integer overflow. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22convert ewah/bitmap code to use xmallocJeff King
This code was originally written with the idea that it could be spun off into its own ewah library, and uses the overrideable ewah_malloc to do allocations. We plug in xmalloc as our ewah_malloc, of course. But over the years the ewah code itself has become more entangled with git, and the return value of many ewah_malloc sites is not checked. Let's just drop the level of indirection and use xmalloc and friends directly. This saves a few lines, and will let us adapt these sites to our more advanced malloc helpers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-24Merge branch 'es/osx-header-pollutes-mask-macro'Junio C Hamano
* es/osx-header-pollutes-mask-macro: ewah: use less generic macro name ewah/bitmap: silence warning about MASK macro redefinition
2015-06-03ewah: use less generic macro nameJeff King
The ewah/ewok.h header pollutes the global namespace with "BITS_IN_WORD", without any specific notion that we are talking about the bits in an eword_t. We can give this the more specific name "BITS_IN_EWORD". Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-03ewah/bitmap: silence warning about MASK macro redefinitionEric Sunshine
On PowerPC Mac OS X (10.5.8 "Leopard" with Xcode 3.1), system header /usr/include/ppc/param.h[1] pollutes the preprocessor namespace with a macro generically named MASK. This conflicts with the same-named macro in ewah/bitmap.c. We can avoid this conflict by using a more specific name. [1]: Included indirectly via: git-compat-util.h -> sys/sysctl.h -> sys/ucred.h -> sys/param.h -> machine/param.h -> ppc/param.h Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12ewah: add convenient wrapper ewah_serialize_strbuf()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-18Merge branch 'jk/pack-bitmap'Junio C Hamano
The pack bitmap support did not build with older versions of GCC. * jk/pack-bitmap: ewah: fix building with gcc < 3.4.0
2015-02-04ewah: fix building with gcc < 3.4.0Tom G. Christensen
The __builtin_ctzll function was added in gcc 3.4.0. This extends the check for gcc so that use of __builtin_ctzll is only enabled if gcc >= 3.4.0. Signed-off-by: Tom G. Christensen <tgc@statsbiblioteket.dk> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-29ewah: delete unused ewah_read_mmap_native declarationNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-29ewah: fix constness of ewah_read_mmapNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-22ewah_bitmap.c: do not assume size_t and eword_t are the same sizeKyle J. McKay
When buffer_grow changes the size of the buffer using realloc, it first computes and saves the rlw pointer's offset into the buffer using (uint8_t *) math before the realloc but then restores it using (eword_t *) math. In order to do this it's necessary to convert the (uint8_t *) offset into an (eword_t *) offset. It was doing this by dividing by the sizeof(size_t). Unfortunately sizeof(size_t) is not same as sizeof(eword_t) on all platforms. This causes illegal memory accesses and other bad things to happen when attempting to use bitmaps on those platforms. Fix this by dividing by the sizeof(eword_t) instead which will always be correct for all platforms. Signed-off-by: Kyle J. McKay <mackyle@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-12ewah: unconditionally ntohll ewah dataJeff King
Commit a201c20 tried to optimize out a loop like: for (i = 0; i < len; i++) data[i] = ntohll(data[i]); in the big-endian case, because we know that ntohll is a noop, and we do not need to pay the cost of the loop at all. However, it mistakenly assumed that __BYTE_ORDER was always defined, whereas it may not be on systems which do not define it by default, and where we did not need to define it to set up the ntohll macro. This includes OS X and Windows. We could muck with the ordering in compat/bswap.h to make sure it is defined unconditionally, but it is simpler to still to just execute the loop unconditionally. That avoids the application code knowing anything about these magic macros, and lets it depend only on having ntohll defined. And since the resulting loop looks like (on a big-endian system): for (i = 0; i < len; i++) data[i] = data[i]; any decent compiler can probably optimize it out. Original report and analysis by Brian Gernhardt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23ewah: support platforms that require aligned readsVicent Marti
The caller may hand us an unaligned buffer (e.g., because it is an mmap of a file with many ewah bitmaps). On some platforms (like SPARC) this can cause a bus error. We can fix it with a combination of get_be32 and moving the data into an aligned buffer (which we would do anyway, but we can move it before fixing the endianness). Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30ewah: compressed bitmap implementationVicent Marti
EWAH is a word-aligned compressed variant of a bitset (i.e. a data structure that acts as a 0-indexed boolean array for many entries). It uses a 64-bit run-length encoding (RLE) compression scheme, trading some compression for better processing speed. The goal of this word-aligned implementation is not to achieve the best compression, but rather to improve query processing time. As it stands right now, this EWAH implementation will always be more efficient storage-wise than its uncompressed alternative. EWAH arrays will be used as the on-disk format to store reachability bitmaps for all objects in a repository while keeping reasonable sizes, in the same way that JGit does. This EWAH implementation is a mostly straightforward port of the original `javaewah` library that JGit currently uses. The library is self-contained and has been embedded whole (4 files) inside the `ewah` folder to ease redistribution. The library is re-licensed under the GPLv2 with the permission of Daniel Lemire, the original author. The source code for the C version can be found on GitHub: https://github.com/vmg/libewok The original Java implementation can also be found on GitHub: https://github.com/lemire/javaewah [jc: stripped debug-only code per Peff's $gmane/239768] Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>