path: root/oidset.c
AgeCommit message (Collapse)Author
2021-09-12oidset: make oidset_size() an inline functionRené Scharfe
oidset_size() just reads a single word from memory and returns it. Avoid the function call overhead for this trivial operation by turning it into an inline function. While we're at it, declare its parameter const to allow it to be used on read-only oidsets. Suggested-by: Jeff King <> Signed-off-by: René Scharfe <> Signed-off-by: Junio C Hamano <>
2020-11-10blame: silently ignore invalid ignore file objectsRené Scharfe
Since 610e2b9240 (blame: validate and peel the object names on the ignore list, 2020-09-24) git blame reports checks if objects specified with --ignore-rev and in files loaded with --ignore-revs-file and config option blame.ignoreRevsFile are actual objects and dies if they aren't. The intent is to report typos to the user. This also breaks the ability to use a single ignore file for multiple repositories. Typos are presumably less likely in files than on the command line, so alerting is less useful here. Restore that feature by skipping non-commits without dying. Reported-by: Jean-Yves Avenard <> Signed-off-by: René Scharfe <> Reviewed-by: Barret Rhoden <> Signed-off-by: Junio C Hamano <>
2020-09-25blame: validate and peel the object names on the ignore listJunio C Hamano
The command reads list of object names to place on the ignore list either from the command line or from a file, but they are not checked with their object type (those read from the file are not even checked for object existence). Extend the oidset_parse_file() API and allow it to take a callback that can be used to die (e.g. when an inappropriate input is read) or modify the object name read (e.g. when a tag pointing at a commit is read, and the caller wants a commit object name), and use it in the code that handles ignore list. Signed-off-by: Junio C Hamano <>
2020-04-15oidset: introduce 'oidset_size'Taylor Blau
Occasionally, it may be useful for callers to know the number of object IDs in an oidset. Right now, the only way to compute this is to call 'kh_size' on the internal 'kh_set_oid_t'. Similar to how we wrap other 'kh_*' functions over the 'oidset' type, let's allow callers to compute this value by introducing 'oidset_size'. We will add its first caller in the subsequent commit. Signed-off-by: Taylor Blau <> Signed-off-by: Junio C Hamano <>
2019-07-19Merge branch 'br/blame-ignore'Junio C Hamano
"git blame" learned to "ignore" commits in the history, whose effects (as well as their presence) get ignored. * br/blame-ignore: t8014: remove unnecessary braces blame: drop some unused function parameters blame: add a test to cover blame_coalesce() blame: use the fingerprint heuristic to match ignored lines blame: add a fingerprint heuristic to match ignored lines blame: optionally track line fingerprints during fill_blame_origin() blame: add config options for the output of ignored or unblamable lines blame: add the ability to ignore commits and their changes blame: use a helper function in blame_chunk() Move oidset_parse_file() to oidset.c fsck: rename and touch up init_skiplist()
2019-06-20khash: rename kh_oid_t to kh_oid_setJeff King
khash lets us define a hash as either a map or a set (i.e., with no "value" type). For the oid maps we define, "oid" is the set and "oid_map" is the map. As the bug in the previous commit shows, it's easy to pick the wrong one. So let's make the names more distinct: "oid_set" and "oid_map". An alternative naming scheme would be to actually name the type after the key/value types. So e.g., "oid" _would_ be the set, since it has no value type. And "oid_map" would become "oid_void" or similar (and "oid_pos" becomes "oid_int"). That's better in some ways: it's more regular, and a given map type can be more reasily reused in multiple contexts (e.g., something storing an "int" that isn't a "pos"). But it's also slightly less descriptive. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2019-05-16Move oidset_parse_file() to oidset.cBarret Rhoden
Signed-off-by: Barret Rhoden <> Signed-off-by: Junio C Hamano <>
2018-10-04oidset: uninline oidset_init()René Scharfe
There is no need to inline oidset_init(), as it's typically only called twice in the lifetime of an oidset (once at the beginning and at the end by oidset_clear()) and kh_resize_* is quite big, so move its definition to oidset.c. Document it while we're at it. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2018-10-04oidset: use khashRené Scharfe
Reimplement oidset using khash.h in order to reduce its memory footprint and make it faster. Performance of a command that mainly checks for duplicate objects using an oidset, with master and Clang 6.0.1: $ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'" $ /usr/bin/time $cmd >/dev/null 0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k 0inputs+0outputs (0major+11204minor)pagefaults 0swaps $ hyperfine "$cmd" Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)' Time (mean ± σ): 250.0 ms ± 6.0 ms [User: 225.9 ms, System: 23.6 ms] Range (min … max): 242.0 ms … 261.1 ms And with this patch: $ /usr/bin/time $cmd >/dev/null 0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k 0inputs+0outputs (0major+8318minor)pagefaults 0swaps $ hyperfine "$cmd" Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)' Time (mean ± σ): 151.9 ms ± 4.9 ms [User: 130.5 ms, System: 21.2 ms] Range (min … max): 148.2 ms … 170.4 ms Initial-patch-by: Jeff King <> Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2017-11-22oidset: add iterator methods to oidsetJeff Hostetler
Add the usual iterator methods to oidset. Add oidset_remove(). Signed-off-by: Jeff Hostetler <> Reviewed-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2017-10-01oidmap: map with OID as keyJonathan Tan
This is similar to using the hashmap in hashmap.c, but with an easier-to-use API. In particular, custom entry comparisons no longer need to be written, and lookups can be done without constructing a temporary entry structure. This is implemented as a thin wrapper over the hashmap API. In particular, this means that there is an additional 4-byte overhead due to the fact that the first 4 bytes of the hash is redundantly stored. For now, I'm taking the simpler approach, but if need be, we can reimplement oidmap without affecting the callers significantly. oidset has been updated to use oidmap. Signed-off-by: Jonathan Tan <> Signed-off-by: Junio C Hamano <>
2017-06-30hashmap.h: compare function has access to a data fieldStefan Beller
When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2017-02-08add oidset APIJeff King
This is similar to many of our uses of sha1-array, but it overcomes one limitation of a sha1-array: when you are de-duplicating a large input with relatively few unique entries, sha1-array uses 20 bytes per non-unique entry. Whereas this set will use memory linear in the number of unique entries (albeit a few more than 20 bytes due to hashmap overhead). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>