path: root/oidset.h
Age | Commit message | Author
2020-05-01 | Merge branch 'tb/commit-graph-split-strategy' | Junio C Hamano
"git commit-graph write" learned different ways to write out split files.

* tb/commit-graph-split-strategy:
  Revert "commit-graph.c: introduce '--[no-]check-oids'"
  commit-graph.c: introduce '--[no-]check-oids'
  commit-graph.h: replace 'commit_hex' with 'commits'
  oidset: introduce 'oidset_size'
  builtin/commit-graph.c: introduce split strategy 'replace'
  builtin/commit-graph.c: introduce split strategy 'no-merge'
  builtin/commit-graph.c: support for '--split[=<strategy>]'
  t/helper/test-read-graph.c: support commit-graph chains
2020-04-22 | Merge branch 'jk/oid-array-cleanups' | Junio C Hamano
Code cleanup.

* jk/oid-array-cleanups:
  oidset: stop referring to sha1-array
  ref-filter: stop referring to "sha1 array"
  bisect: stop referring to sha1_array
  test-tool: rename sha1-array to oid-array
  oid_array: rename source file from sha1-array
  oid_array: use size_t for iteration
  oid_array: use size_t for count and allocation
2020-04-15 | oidset: introduce 'oidset_size' | Taylor Blau
Occasionally, it may be useful for callers to know the number of object IDs in an oidset. Right now, the only way to compute this is to call 'kh_size' on the internal 'kh_set_oid_t'.

Similar to how we wrap other 'kh_*' functions over the 'oidset' type, let's allow callers to compute this value by introducing 'oidset_size'. We will add its first caller in the subsequent commit.

Signed-off-by: Taylor Blau <>
Signed-off-by: Junio C Hamano <>
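The wrapping pattern this message describes can be sketched in a few lines. This is a minimal illustration, not git's code: `toy_table`, `toy_table_size`, `toy_set`, and `toy_set_size` are hypothetical stand-ins for the khash table, 'kh_size', the oidset type, and 'oidset_size'.

```c
#include <stddef.h>

/* Hypothetical stand-in for the internal hash table type. */
struct toy_table {
	size_t size;      /* number of live keys */
	size_t n_buckets; /* allocated buckets */
};

/* Low-level accessor, playing the role the message assigns to 'kh_size'. */
#define toy_table_size(t) ((t)->size)

/* Hypothetical stand-in for the set type that hides the table. */
struct toy_set {
	struct toy_table table;
};

/* Thin wrapper so callers never have to reach into the internal table. */
static inline size_t toy_set_size(const struct toy_set *set)
{
	return toy_table_size(&set->table);
}
```

A caller holding a `struct toy_set *` then asks `toy_set_size(set)` and stays unaffected if the internal table type changes later.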
2020-03-30 | oidset: stop referring to sha1-array | Jeff King
This has been oid_array for some time, though the source only recently moved from sha1-array.[ch] to oid-array.[ch]. In either case, we should say "oid-array" here.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2020-03-25 | Merge branch 'en/oidset-uninclude-hashmap' | Junio C Hamano
Code clean-up.

* en/oidset-uninclude-hashmap:
  oidset: remove unnecessary include
2020-03-15 | oidset: remove unnecessary include | Elijah Newren
When commit 8b2f8cbcb1 ("oidset: use khash", 2018-10-04) moved from using oidmap to khash, it replaced the oidmap.h include with includes for both hashmap.h and khash.h. Since the hashmap.h header is unnecessary, and the point of the patch was to switch from hashmap (used by oidmap) to khash.h, remove the unnecessary include.

Signed-off-by: Elijah Newren <>
Signed-off-by: Junio C Hamano <>
2019-07-19 | Merge branch 'br/blame-ignore' | Junio C Hamano
"git blame" learned to "ignore" commits in the history, whose effects (as well as their presence) get ignored.

* br/blame-ignore:
  t8014: remove unnecessary braces
  blame: drop some unused function parameters
  blame: add a test to cover blame_coalesce()
  blame: use the fingerprint heuristic to match ignored lines
  blame: add a fingerprint heuristic to match ignored lines
  blame: optionally track line fingerprints during fill_blame_origin()
  blame: add config options for the output of ignored or unblamable lines
  blame: add the ability to ignore commits and their changes
  blame: use a helper function in blame_chunk()
  Move oidset_parse_file() to oidset.c
  fsck: rename and touch up init_skiplist()
2019-06-20 | khash: rename kh_oid_t to kh_oid_set | Jeff King
khash lets us define a hash as either a map or a set (i.e., with no "value" type). For the oid maps we define, "oid" is the set and "oid_map" is the map. As the bug in the previous commit shows, it's easy to pick the wrong one.

So let's make the names more distinct: "oid_set" and "oid_map".

An alternative naming scheme would be to actually name the type after the key/value types. So e.g., "oid" _would_ be the set, since it has no value type. And "oid_map" would become "oid_void" or similar (and "oid_pos" becomes "oid_int"). That's better in some ways: it's more regular, and a given map type can be more easily reused in multiple contexts (e.g., something storing an "int" that isn't a "pos"). But it's also slightly less descriptive.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2019-05-16 | Move oidset_parse_file() to oidset.c | Barret Rhoden
Signed-off-by: Barret Rhoden <>
Signed-off-by: Junio C Hamano <>
2019-04-01 | khash: move oid hash table definition | brian m. carlson
Move the oid khash table definition to khash.h and define a typedef for it, similar to the one we have for unsigned char pointers. Define variants that are maps as well.

Signed-off-by: brian m. carlson <>
Signed-off-by: Junio C Hamano <>
2018-10-04 | oidset: uninline oidset_init() | René Scharfe
There is no need to inline oidset_init(), as it's typically only called twice in the lifetime of an oidset (once at the beginning and at the end by oidset_clear()) and kh_resize_* is quite big, so move its definition to oidset.c. Document it while we're at it.

Signed-off-by: Rene Scharfe <>
Signed-off-by: Junio C Hamano <>
2018-10-04 | oidset: use khash | René Scharfe
Reimplement oidset using khash.h in order to reduce its memory footprint and make it faster.

Performance of a command that mainly checks for duplicate objects using an oidset, with master and Clang 6.0.1:

  $ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'"

  $ /usr/bin/time $cmd >/dev/null
  0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k
  0inputs+0outputs (0major+11204minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
    Time (mean ± σ): 250.0 ms ± 6.0 ms [User: 225.9 ms, System: 23.6 ms]
    Range (min … max): 242.0 ms … 261.1 ms

And with this patch:

  $ /usr/bin/time $cmd >/dev/null
  0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k
  0inputs+0outputs (0major+8318minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
    Time (mean ± σ): 151.9 ms ± 4.9 ms [User: 130.5 ms, System: 21.2 ms]
    Range (min … max): 148.2 ms … 170.4 ms

Initial-patch-by: Jeff King <>
Signed-off-by: Rene Scharfe <>
Signed-off-by: Junio C Hamano <>
2018-01-08 | oidset: don't return value from oidset_init | Thomas Gummerer
c3a9ad3117 ("oidset: add iterator methods to oidset", 2017-11-21) introduced a 'oidset_init()' function in oidset.h, which has void as return type, but returns an expression. This makes the solaris compiler fail with:

  "oidset.h", line 30: void function cannot return value

As the return type is void, and even the return type of the expression we're trying to return (oidmap_init) is void just remove the return statement to fix the compiler error.

Signed-off-by: Thomas Gummerer <>
Signed-off-by: Junio C Hamano <>
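The problem and its fix can be shown with a small sketch. The C standard does not allow a return statement with an expression in a function whose return type is void, even when the expression itself has type void; some compilers accept it as an extension, which is how such code can go unnoticed. The names `inner_init`, `outer_init`, and `init_calls` below are hypothetical stand-ins, not the functions from oidset.h.

```c
#include <stddef.h>

/* Counts calls so the effect of the wrapper can be observed. */
static int init_calls;

/* Hypothetical stand-in for a void initializer such as oidmap_init(). */
static void inner_init(int *state)
{
	*state = 0;
	init_calls++;
}

/*
 * Form that strict C compilers reject: a void function whose
 * return statement carries an expression.
 *
 * static void outer_init(int *state)
 * {
 *	return inner_init(state);
 * }
 */

/* Portable form: call the helper and let the function end normally. */
static void outer_init(int *state)
{
	inner_init(state);
}
```

Both forms behave the same where the first one compiles, so dropping the `return` keyword is a safe fix.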
2017-11-22 | oidset: add iterator methods to oidset | Jeff Hostetler
Add the usual iterator methods to oidset. Add oidset_remove().

Signed-off-by: Jeff Hostetler <>
Reviewed-by: Jonathan Tan <>
Signed-off-by: Junio C Hamano <>
2017-10-01 | oidmap: map with OID as key | Jonathan Tan
This is similar to using the hashmap in hashmap.c, but with an easier-to-use API. In particular, custom entry comparisons no longer need to be written, and lookups can be done without constructing a temporary entry structure.

This is implemented as a thin wrapper over the hashmap API. In particular, this means that there is an additional 4-byte overhead due to the fact that the first 4 bytes of the hash are redundantly stored. For now, I'm taking the simpler approach, but if need be, we can reimplement oidmap without affecting the callers significantly.

oidset has been updated to use oidmap.

Signed-off-by: Jonathan Tan <>
Signed-off-by: Junio C Hamano <>
2017-02-08 | add oidset API | Jeff King
This is similar to many of our uses of sha1-array, but it overcomes one limitation of a sha1-array: when you are de-duplicating a large input with relatively few unique entries, sha1-array uses 20 bytes per non-unique entry. Whereas this set will use memory linear in the number of unique entries (albeit a few more than 20 bytes due to hashmap overhead).

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
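The memory argument can be illustrated with a toy set that stores each ID once, however often it appears in the input. This is a sketch under stated assumptions, not git's oidset: every `toy_*` name is hypothetical, IDs are assumed to be 20 bytes, and the table uses simple linear probing rather than git's hashmap or khash.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OID_LEN 20 /* assumption: 20-byte IDs, as in the message */

struct toy_oid { unsigned char hash[TOY_OID_LEN]; };

/* Hypothetical open-addressing set; cap is always a power of two. */
struct toy_oidset {
	struct toy_oid *slots;
	unsigned char *used;
	size_t cap;
	size_t nr; /* number of unique IDs stored */
};

static void toy_oidset_init(struct toy_oidset *set, size_t cap)
{
	set->cap = cap;
	set->nr = 0;
	set->slots = calloc(cap, sizeof(*set->slots));
	set->used = calloc(cap, 1);
	if (!set->slots || !set->used)
		abort();
}

static void toy_oidset_clear(struct toy_oidset *set)
{
	free(set->slots);
	free(set->used);
	memset(set, 0, sizeof(*set));
}

/* IDs are already hash output, so their leading bytes serve as the hash. */
static size_t toy_hash(const struct toy_oid *oid)
{
	size_t h;
	memcpy(&h, oid->hash, sizeof(h));
	return h;
}

static int toy_oidset_insert(struct toy_oidset *set, const struct toy_oid *oid);

/* Double the table and reinsert every stored ID. */
static void toy_oidset_grow(struct toy_oidset *set)
{
	struct toy_oidset bigger;
	size_t i;

	toy_oidset_init(&bigger, set->cap * 2);
	for (i = 0; i < set->cap; i++)
		if (set->used[i])
			toy_oidset_insert(&bigger, &set->slots[i]);
	toy_oidset_clear(set);
	*set = bigger;
}

/* Returns 1 if oid was already present, 0 if it was newly added. */
static int toy_oidset_insert(struct toy_oidset *set, const struct toy_oid *oid)
{
	size_t i;

	if ((set->nr + 1) * 2 > set->cap) /* keep the load factor at or below 1/2 */
		toy_oidset_grow(set);
	i = toy_hash(oid) & (set->cap - 1);
	while (set->used[i]) {
		if (!memcmp(set->slots[i].hash, oid->hash, TOY_OID_LEN))
			return 1; /* duplicate: nothing new is stored */
		i = (i + 1) & (set->cap - 1);
	}
	set->slots[i] = *oid;
	set->used[i] = 1;
	set->nr++;
	return 0;
}

/* Counts the distinct IDs among n inputs, duplicates included in n. */
static size_t toy_count_unique(const struct toy_oid *ids, size_t n)
{
	struct toy_oidset set;
	size_t i, unique;

	toy_oidset_init(&set, 16);
	for (i = 0; i < n; i++)
		toy_oidset_insert(&set, &ids[i]);
	unique = set.nr;
	toy_oidset_clear(&set);
	return unique;
}
```

An array-based approach would append all n inputs before de-duplicating, while the table here only grows with the number of distinct IDs, at the cost of some per-slot overhead.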