path: root/hex.c
AgeCommit message (Collapse)Author
2021-04-27hex: print objects using the hash algorithm memberbrian m. carlson
Now that all code paths correctly set the hash algorithm member of struct object_id, write an object's hex representation using the hash algorithm member embedded in it. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2021-04-27hex: default to the_hash_algo on zero algorithm valuebrian m. carlson
There are numerous places in the codebase where we assume we can initialize data by zeroing all its bytes. However, when we do that with a struct object_id, it leaves the structure with a zero value for the algorithm, which is invalid. We could forbid this pattern and require that all struct object_id instances be initialized using oidclr, but this seems burdensome and it's unnatural to most C programmers. Instead, if the algorithm is zero, assume we wanted to use the default hash algorithm instead. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2021-04-27hash: set, copy, and use algo field in struct object_idbrian m. carlson
Now that struct object_id has an algorithm field, we should populate it. This will allow us to handle object IDs in any supported algorithm and distinguish between them. Ensure that the field is written whenever we write an object ID by storing it explicitly every time we write an object. Set values for the empty blob and tree values as well. In addition, use the algorithm field to compare object IDs. Note that because we zero-initialize struct object_id in many places throughout the codebase, we default to the default algorithm in cases where the algorithm field is zero rather than explicitly initialize all of those locations. This leads to a branch on every comparison, but the alternative is to compare the entire buffer each time and padding the buffer for SHA-1. That alternative ranges up to 3.9% worse than this approach on the perf t0001, t1450, and t1451. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2020-02-24hex: add functions to parse hex object IDs in any algorithmbrian m. carlson
There are some places where we need to parse a hex object ID in any algorithm without knowing beforehand which algorithm is in use. An example is when parsing fast-import marks. Add a get_oid_hex_any to parse an object ID and return the algorithm it belongs to, and additionally add parse_oid_hex_any which is the equivalent change for parse_oid_hex. If the object is not parseable, we return GIT_HASH_UNKNOWN. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2020-02-24hex: introduce parsing variants taking hash algorithmsbrian m. carlson
Introduce variants of get_oid_hex and parse_oid_hex that parse an arbitrary hash algorithm, implementing internal functions to avoid duplication. These functions can be used in the transport code to parse refs properly. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2019-11-13hex: drop sha1_to_hex()Jeff King
There's only a single caller left of sha1_to_hex(), since everybody that has an object name in "unsigned char[]" now uses hash_to_hex() instead. This case is in the sha1dc wrapper, where we print a hex sha1 when we find a collision. This one will always be sha1, regardless of the current hash algorithm, so we can't use hash_to_hex() here. In practice we'd probably not be running sha1 at all if it isn't the current algorithm, but it's possible we might still occasionally need to compute a sha1 in a post-sha256 world. Since sha1_to_hex() is just a wrapper for hash_to_hex_algop(), let's call that ourselves. There's value in getting rid of the sha1-specific wrapper to de-clutter the global namespace, and to make sure nobody uses it (and as with sha1_to_hex_r() in the previous patch, we'll drop the coccinelle transformations, too). The sha1_to_hex() function is mentioned in a comment; we can easily swap that out for oid_to_hex() to give a better example. Also update the comment that was left stale when we added "struct object_id *" as a way to name an object and added functions to convert it to hex. The function is also mentioned in some test vectors in t4100, but that's not runnable code, so there's no point in trying to clean it up. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2019-11-11hex: drop sha1_to_hex_r()Jeff King
There are no callers left; everybody uses oid_to_hex_r() or hash_to_hex_algop_r(). This used to actually be the underlying implementation for oid_to_hex_r(), but that's no longer the case since 47edb64997 (hex: introduce functions to print arbitrary hashes, 2018-11-14). Let's get rid of it to de-clutter and to make sure nobody uses it. Likewise we can drop the coccinelle rules that mention it, since the compiler will make it quite clear that the code does not work. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2018-11-14hex: introduce functions to print arbitrary hashesbrian m. carlson
Currently, we have functions that turn an arbitrary SHA-1 value or an object ID into hex format, either using a static buffer or with a user-provided buffer. Add variants of these functions that can handle an arbitrary hash algorithm, specified by constant. Update the documentation as well. While we're at it, remove the "extern" declaration from this family of functions, since it's not needed and our style now recommends against it. We use the variant taking the algorithm structure pointer as the internal variant, since taking an algorithm pointer is the easiest way to handle all of the variants in use. Note that we maintain these functions because there are hashes which must change based on the hash algorithm in use but are not object IDs (such as pack checksums). Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2018-07-16hex: switch to using the_hash_algobrian m. carlson
Instead of using the GIT_SHA1_* constants, switch to using the_hash_algo to convert object IDs to and from hex format. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2017-11-01notes: move hex_to_bytes() to hex.c and export itRené Scharfe
Make the function for converting pairs of hexadecimal digits to binary available to other call sites. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2017-03-27Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZbrian m. carlson
Since we will likely be introducing a new hash function at some point, and that hash function might be longer than 40 hex characters, use the constant GIT_MAX_HEXSZ, which is designed to be suitable for allocations, instead of GIT_SHA1_HEXSZ. This will ease the transition down the line by distinguishing between places where we need to allocate memory suitable for the largest hash from those where we need to handle the current hash. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2017-02-20hex: introduce parse_oid_hexbrian m. carlson
Introduce a function, parse_oid_hex, which parses a hexadecimal object ID and if successful, sets a pointer to just beyond the last character. This allows for simpler, more robust parsing without needing to hard-code integer values throughout the codebase. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2016-10-26hex: make wraparound of the index into ring-buffer explicitRené Scharfe
Overflow is defined for unsigned integers, but not for signed ones. We could make the ring-buffer index in sha1_to_hex() and get_pathname() unsigned to be on the safe side to resolve this, but let's make it explicit that we are wrapping around at whatever the number of elements the ring-buffer has. The compiler is smart enough to turn modulus into bitmask for these codepaths that use ring-buffers of a size that is a power of 2. Signed-off-by: René Scharfe <> Signed-off-by: Junio C Hamano <>
2016-09-19Merge branch 'rs/hex2chr' into maintJunio C Hamano
Code cleanup. * rs/hex2chr: introduce hex2chr() for converting two hexadecimal digits to a character
2016-09-07introduce hex2chr() for converting two hexadecimal digits to a characterRené Scharfe
Add and use a helper function that decodes the char value of two hexadecimal digits. It returns a negative number on error, avoids running over the end of the given string and doesn't shift negative values. Signed-off-by: Rene Scharfe <> Signed-off-by: Junio C Hamano <>
2016-06-28hex: add oid_to_hex_r()brian m. carlson
This function works just like sha1_to_hex_r, except that it takes a pointer to struct object_id instead of a pointer to unsigned char. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2015-09-25add reentrant variants of sha1_to_hex and find_unique_abbrevJeff King
The sha1_to_hex and find_unique_abbrev functions always write into reusable static buffers. There are a few problems with this: - future calls overwrite our result. This is especially annoying with find_unique_abbrev, which does not have a ring of buffers, so you cannot even printf() a result that has two abbreviated sha1s. - if you want to put the result into another buffer, we often strcpy, which looks suspicious when auditing for overflows. This patch introduces sha1_to_hex_r and find_unique_abbrev_r, which write into a user-provided buffer. Of course this is just punting on the overflow-auditing, as the buffer obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is much easier to audit, since that is a well-known size. We retain the non-reentrant forms, which just become thin wrappers around the reentrant ones. This patch also adds a strbuf variant of find_unique_abbrev, which will be handy in later patches. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2015-03-14define utility functions for object IDsbrian m. carlson
There are several utility functions (hashcmp and friends) that are used for comparing object IDs (SHA-1 values). Using these functions, which take pointers to unsigned char, with struct object_id requires tiresome access to the sha1 member, which bloats code and violates the desired encapsulation. Provide wrappers around these functions for struct object_id for neater, more maintainable code. Use the new constants to avoid the hard-coded 20s and 40s throughout the original functions. These functions simply call the underlying pointer-to-unsigned-char versions to ensure that any performance improvements will be passed through to the new functions. Signed-off-by: brian m. carlson <> Signed-off-by: Junio C Hamano <>
2015-03-05Merge branch 'sb/hex-object-name-is-at-most-41-bytes-long' into maintJunio C Hamano
Code clean-up. * sb/hex-object-name-is-at-most-41-bytes-long: hex.c: reduce memory footprint of sha1_to_hex static buffers
2015-02-13hex.c: reduce memory footprint of sha1_to_hex static buffersStefan Beller
41 bytes is the exact number of bytes needed for having the returned hex string represented. 50 seems to be an arbitrary number, such that there are no benefits from alignment to certain address boundaries. Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2011-10-05get_sha1_hex(): do not read past a NUL characterMichael Haggerty
Previously, get_sha1_hex() would read one character past the end of a null-terminated string whose strlen was an even number less than 40. Although the function correctly returned -1 in these cases, the extra memory access might have been to uninitialized (or even, conceivably, unallocated) memory. Add a check to avoid reading past the end of a string. This problem was discovered by Thomas Rast <> using valgrind. Signed-off-by: Michael Haggerty <> Signed-off-by: Junio C Hamano <>
2010-01-22slim down "git show-index"Linus Torvalds
As the documentation says, this is primarily for debugging, and in the longer term we should rename it to test-show-index or something. In the meantime, just avoid xmalloc (which slurps in the rest of git), and separating out the trivial hex functions into "hex.o". This results in [torvalds@nehalem git]$ size git-show-index text data bss dec hex filename 222818 2276 112688 337782 52776 git-show-index (before) 5696 624 1264 7584 1da0 git-show-index (after) which is a whole lot better. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>