path: root/hash.h
Age | Commit message | Author
2018-02-09 | hash: update obsolete reference to SHA1_HEADER | brian m. carlson
We moved away from SHA1_HEADER to a preprocessor if chain, but didn't update the comment discussing the platform defines. Update this comment so it reflects the current state of our codebase. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 | hash: create union for hash context allocation | brian m. carlson
In various parts of our code, we want to allocate a structure representing the internal state of a hash algorithm. The original implementation of the hash algorithm abstraction assumed we would do that using heap allocations, and added a context size element to struct git_hash_algo. However, most of the existing code uses stack allocations and conversion would needlessly complicate various parts of the code.

Add a union for the purpose of allocating hash contexts on the stack and a typedef for ease of use. Use this union for defining the init, update, and final functions to avoid casts. Remove the ctxsz element for struct git_hash_algo, which is no longer very useful.

This does mean that stack allocations will grow slightly as additional hash functions are added, but this should not be a significant problem, since we don't allocate many hash contexts. The improved usability and benefits from avoiding dynamic allocation outweigh this small downside.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
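The union approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from hash.h: every `toy_` name, both context structs, and the byte-sum "hash" are hypothetical stand-ins chosen so the example stays short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy contexts standing in for real per-algorithm state. */
struct toy_sum_ctx { uint32_t sum; };
struct toy_xor_ctx { uint8_t acc; size_t len; };

/*
 * One union large enough for any supported context, so callers can
 * declare it on the stack without knowing which algorithm is in use.
 */
union toy_hash_ctx {
	struct toy_sum_ctx sum;
	struct toy_xor_ctx xor8;
};
typedef union toy_hash_ctx toy_hash_ctx;

/* The function types take the union itself, so call sites need no casts. */
typedef void (*toy_init_fn)(toy_hash_ctx *ctx);
typedef void (*toy_update_fn)(toy_hash_ctx *ctx, const void *in, size_t len);
typedef uint32_t (*toy_final_fn)(toy_hash_ctx *ctx);

static void sum_init(toy_hash_ctx *ctx)
{
	ctx->sum.sum = 0;
}

static void sum_update(toy_hash_ctx *ctx, const void *in, size_t len)
{
	const unsigned char *p = in;

	while (len--)
		ctx->sum.sum += *p++;
}

static uint32_t sum_final(toy_hash_ctx *ctx)
{
	return ctx->sum.sum;
}

/* Hash one buffer with a stack-allocated context; no malloc is involved. */
static uint32_t toy_hash_buf(toy_init_fn init, toy_update_fn update,
			     toy_final_fn final, const void *buf, size_t len)
{
	toy_hash_ctx ctx;

	init(&ctx);
	update(&ctx, buf, len);
	return final(&ctx);
}
```

The cost mentioned in the message is visible here: `sizeof(toy_hash_ctx)` is the size of the largest member, so every stack allocation pays for the biggest algorithm even when a smaller one is in use.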
2018-02-02 | hash: move SHA-1 macros to hash.h | brian m. carlson
Most of the other code dealing with SHA-1 and other hashes is located in hash.h, which is in turn loaded by cache.h. Move the SHA-1 macros to hash.h as well, so we can use them in additional hash-related items in the future. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-13 | Add structure representing hash algorithm | brian m. carlson
Since in the future we want to support an additional hash algorithm, add a structure that represents a hash algorithm and all the data that must go along with it. Add a constant to allow easy enumeration of hash algorithms. Implement function typedefs to create an abstract API that can be used by any hash algorithm, and wrappers for the existing SHA1 functions that conform to this API.

Expose a value for hex size as well as binary size. While one will always be twice the other, the two values are both used extremely commonly throughout the codebase and providing both leads to improved readability.

Don't include an entry in the hash algorithm structure for the null object ID. As this value is all zeros, any suitably sized all-zero object ID can be used, and there's no need to store a given one on a per-hash basis.

The current hash function transition plan envisions a time when we will accept input from the user that might be in SHA-1 or in the NewHash format. Since we cannot know which the user has provided, add a constant representing the unknown algorithm to allow us to indicate that we must look the correct value up. Provide dummy API functions that die in this case.

Finally, include git-compat-util.h in hash.h so that the required types are available. This aids people using automated tools in their editors.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
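A rough sketch of the kind of structure this message describes is shown below. It is an assumption-laden illustration rather than the actual definition: the `toy_` names, the field layout, and the lookup helper are all hypothetical. Only the SHA-1 sizes (20 raw bytes, 40 hex characters) are taken as fact.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical algorithm identifiers; index 0 is the "unknown" slot. */
#define TOY_HASH_UNKNOWN 0
#define TOY_HASH_SHA1    1
#define TOY_HASH_NALGOS  2

struct toy_hash_algo {
	const char *name;        /* NULL for the unknown algorithm */
	size_t rawsz;            /* binary digest size in bytes */
	size_t hexsz;            /* hex digest size, always 2 * rawsz */
	void (*init)(void *ctx); /* one of several per-algorithm hooks */
};

/* Dummy hook for the unknown algorithm: using it is a bug, so die. */
static void toy_init_unknown(void *ctx)
{
	(void)ctx;
	fprintf(stderr, "fatal: trying to init unknown hash\n");
	abort();
}

/* Placeholder for a real SHA-1 init wrapper; state setup is omitted. */
static void toy_init_sha1(void *ctx)
{
	(void)ctx;
}

static const struct toy_hash_algo toy_hash_algos[TOY_HASH_NALGOS] = {
	{ NULL, 0, 0, toy_init_unknown },
	{ "sha-1", 20, 40, toy_init_sha1 },
};

/* Map a name to an identifier, falling back to the unknown slot. */
static int toy_lookup_algo(const char *name)
{
	int i;

	for (i = 1; i < TOY_HASH_NALGOS; i++)
		if (!strcmp(name, toy_hash_algos[i].name))
			return i;
	return TOY_HASH_UNKNOWN;
}
```

Storing both `rawsz` and `hexsz` is redundant by construction, which matches the readability argument in the message: callers can write `algo->hexsz` instead of repeating `algo->rawsz * 2`.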
2017-08-16 | sha1dc: build git plumbing code more explicitly | Takashi Iwai
The plumbing code between sha1dc and git is defined in sha1dc_git.[ch], but these files aren't compiled or included directly; they are pulled in only indirectly from the sha1dc code. This is slightly confusing when you try to trace the build flow.

This patch brings the following changes for simplification:

- Make sha1dc_git.c stand-alone and build it from the Makefile
- Make sha1dc_git.h the common header, which in turn includes the appropriate sha1.h depending on the build configuration
- Move comments for the plumbing code from the header to the definitions

This is also meant as preliminary work for further plumbing with an external sha1dc shlib.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-03 | sha1dc: optionally use sha1collisiondetection as a submodule | Ævar Arnfjörð Bjarmason
Add an option to use the sha1collisiondetection library from the submodule in sha1collisiondetection/ instead of the copy in the sha1dc/ directory. This allows us to try out the submodule in sha1collisiondetection without breaking the build for anyone who's not expecting it as we work out any kinks. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17 | Makefile: add DC_SHA1 knob | Jeff King
This knob lets you use the sha1dc implementation from:

  https://github.com/cr-marcstevens/sha1collisiondetection

which can detect certain types of collision attacks (even when we only see half of the colliding pair). So it mitigates any attack which consists of getting the "good" half of a collision into a trusted repository, and then later replacing it with the "bad" half. The "good" half is rejected by the victim's version of Git (and even if they run an old version of Git, any sha1dc-enabled git will complain loudly if it ever has to interact with the object).

The big downside is that it's slower than either the openssl or block-sha1 implementations. Here are some timings based off of linux.git:

  - compute sha1 over whole packfile
      sha1dc: 3.580s
    blk-sha1: 2.046s (-43%)
     openssl: 1.335s (-62%)

  - rev-list --all --objects
      sha1dc: 33.512s
    blk-sha1: 33.514s (+0.0%)
     openssl: 33.650s (+0.4%)

  - git log --no-merges -10000 -p
      sha1dc: 8.124s
    blk-sha1: 7.986s (-1.6%)
     openssl: 8.203s (+0.9%)

  - index-pack --verify
      sha1dc: 4m19s
    blk-sha1: 2m57s (-32%)
     openssl: 2m19s (-42%)

So overall the sha1 computation with collision detection is about 1.75x slower than block-sha1, and 2.7x slower than openssl. But of course most operations do more than just sha1. Normal object access isn't really slowed at all (both the +/- changes there are well within the run-to-run noise); any changes are drowned out by the other work Git is doing.

The most-affected operation is `index-pack --verify`, which is essentially just computing the sha1 on every object. This is similar to the `index-pack` invocation that the receiver of a push or fetch would perform. So clearly there's some extra CPU load here.

There will also be some latency for the user, though keep in mind that such an operation will generally be network bound (this is about a 1.2GB packfile). Some of that extra CPU is "free" in the sense that we use it while the pack is streaming in anyway. But most of it comes during the delta-resolution phase, after the whole pack has been received. So we can imagine that for this (quite large) push, the user might have to wait an extra 100 seconds over openssl (which is what we use now). If we assume they can push to us at 20Mbit/s, that's 480s for a 1.2GB pack, which is only 20% slower.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
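The closing estimate can be checked with a few lines of arithmetic. The helpers below are hypothetical and exist only for this check; they assume decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits), which is what makes the 480s figure come out.

```c
/* Seconds needed to move `bytes` over a link of `mbit_per_s` Mbit/s. */
static double toy_transfer_seconds(double bytes, double mbit_per_s)
{
	return bytes * 8.0 / (mbit_per_s * 1e6);
}

/* Extra wait expressed as a fraction of the transfer time. */
static double toy_relative_overhead(double extra_s, double transfer_s)
{
	return extra_s / transfer_s;
}
```

With these, `toy_transfer_seconds(1.2e9, 20.0)` gives 480 seconds, and `toy_relative_overhead(100.0, 480.0)` is roughly 0.21, which the message rounds to 20%. The 1.75x and 2.7x figures follow the same way from the whole-packfile timings: 3.580 / 2.046 and 3.580 / 1.335.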
2017-03-15 | hash.h: move SHA-1 implementation selection into a header file | brian m. carlson
Many developers use functionality in their editors that allows for quick syntax checks, including warning about questionable constructs. This functionality allows rapid development with fewer errors. However, such functionality generally does not allow the specification of project-specific defines or command-line options.

Since the SHA1_HEADER include is not defined in such a case, developers see spurious errors when using these tools. Furthermore, there are known implementations of "cc" whose '#include' is unhappy with this construct.

Instead of using SHA1_HEADER, create a hash.h header and use #if and #elif to select the desired header. Have the Makefile pass an appropriate option to help the header select the right implementation to use.

[jc: make BLK_SHA1 the fallback default as discussed on list, e.g. <20170314201424.vccij5z2ortq4a4o@sigill.intra.peff.net>; also remove SHA1_HEADER and SHA1_HEADER_SQ that are no longer used].

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
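The #if/#elif selection with a fallback default might look something like the sketch below. The `TOY_USE_*` macros and the backend strings are hypothetical, and a string stands in for the `#include` of a real implementation header so that the example compiles on its own.

```c
/*
 * Hypothetical selection macros: a build system would pass one of
 * them on the command line, for example -DTOY_USE_OPENSSL. A tool
 * that passes no defines at all still gets a valid configuration
 * through the final #else branch.
 */
#if defined(TOY_USE_PPC)
#define TOY_SHA1_BACKEND "ppc"
#elif defined(TOY_USE_OPENSSL)
#define TOY_SHA1_BACKEND "openssl"
#elif defined(TOY_USE_DC)
#define TOY_SHA1_BACKEND "sha1dc"
#else /* fallback default */
#define TOY_SHA1_BACKEND "block-sha1"
#endif

/* Report which backend the preprocessor chain picked. */
static const char *toy_sha1_backend(void)
{
	return TOY_SHA1_BACKEND;
}
```

The fallback branch is what addresses the editor problem described in the message: with an `#include SHA1_HEADER` style construct, a missing define is a hard error, whereas the chain degrades to a working default.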
2013-11-18 | remove old hash.[ch] implementation | Karsten Blees
Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-17 | Preallocate hash tables when the number of inserts are known in advance | Nguyễn Thái Ngọc Duy
This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this saves about 100ms out of 3s of user time. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
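The effect of preallocation can be illustrated with a toy open-addressing table. This is a sketch under stated assumptions, not the old hash.c code: the `toy_` names, the 50% load limit, the starting size of 64, and the restriction to distinct nonzero keys are all choices made for the example.

```c
#include <stdlib.h>

struct toy_table {
	unsigned *slots;   /* 0 marks an empty slot, so keys must be nonzero */
	size_t size;       /* number of slots: 0 or a power of two */
	size_t used;       /* number of stored keys */
	unsigned nr_grows; /* rebuilds triggered by inserts */
};

/* Linear probing; the table is kept at most half full, so this ends. */
static void toy_place(unsigned *slots, size_t size, unsigned key)
{
	size_t i = key & (size - 1);

	while (slots[i] && slots[i] != key)
		i = (i + 1) & (size - 1);
	slots[i] = key;
}

/* Allocate a bigger slot array and reinsert every existing key. */
static void toy_grow(struct toy_table *t, size_t new_size)
{
	unsigned *old = t->slots;
	size_t old_size = t->size, i;

	t->slots = calloc(new_size, sizeof(*t->slots));
	if (!t->slots)
		abort();
	t->size = new_size;
	for (i = 0; i < old_size; i++)
		if (old[i])
			toy_place(t->slots, new_size, old[i]);
	free(old);
}

/* Size the table once so that nr_inserts keys fit without a rebuild. */
static void toy_prealloc(struct toy_table *t, size_t nr_inserts)
{
	size_t size = 64;

	while (size < nr_inserts * 2)
		size *= 2;
	if (size > t->size)
		toy_grow(t, size);
}

/* Insert a key that is assumed to be nonzero and not yet present. */
static void toy_insert(struct toy_table *t, unsigned key)
{
	if ((t->used + 1) * 2 > t->size) {
		toy_grow(t, t->size ? t->size * 2 : 64);
		t->nr_grows++;
	}
	toy_place(t->slots, t->size, key);
	t->used++;
}
```

Inserting 1000 keys into a zero-initialized table triggers several rebuilds, each of which reinserts everything stored so far; calling `toy_prealloc(&table, 1000)` first brings `nr_grows` down to zero for the same inserts.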
2011-02-19 | for_each_hash: allow passing a 'void *data' pointer to callback | Linus Torvalds
For the find_exact_renames() function, this allows us to pass the diff_options structure pointer to the low-level routines. We will use that to distinguish between the "rename" and "copy" cases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
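The general pattern of threading a `void *data` pointer through an iterator to its callback can be sketched as below. The names are hypothetical and the iterator walks a plain array rather than a hash table; `struct toy_options` merely plays the role that the diff_options pointer plays in the message.

```c
#include <stddef.h>

/* Callback type: receives each element plus the caller's opaque pointer. */
typedef int (*toy_each_fn)(int value, void *data);

/* Call fn on every item, forwarding `data` untouched; sum the results. */
static int toy_for_each(const int *items, size_t nr, toy_each_fn fn, void *data)
{
	size_t i;
	int sum = 0;

	for (i = 0; i < nr; i++)
		sum += fn(items[i], data);
	return sum;
}

/* Hypothetical options structure that the low-level callback consults. */
struct toy_options {
	int threshold;
};

/* Returns 1 for values above the caller-supplied threshold, else 0. */
static int toy_count_above(int value, void *data)
{
	const struct toy_options *opts = data;

	return value > opts->threshold;
}
```

The iterator never inspects `data`, so the same `toy_for_each` serves callbacks that need different context without resorting to global variables.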
2008-03-09 | Add 'const' where appropriate to index handling functions | Linus Torvalds
This is part of an effort to make the source index of 'unpack_trees()' const, so that the compiler helps us verify that we only access it for reading. The constification also extended to some of the hashing helpers that get called indirectly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-27 | Do linear-time/space rename logic for exact renames | Linus Torvalds
This implements a smarter rename detector for exact renames, which rather than doing a pairwise comparison (time O(m*n)) will just hash the files into a hash-table (size O(n+m)), and only do pairwise comparisons to renames that have the same hash (time O(n+m) except for unrealistic hash collisions, which we just cull aggressively).

Admittedly the exact rename case is not nearly as interesting as the generic case, but it's an important case nonetheless. A similar general approach should work for the generic case too, but even then you do need to handle the exact renames/copies separately (to avoid the inevitable added cost factor that comes from the _size_ of the file), so this is worth doing.

In the expectation that we will indeed do the same hashing trick for the general rename case, this code uses a generic hash-table implementation that can be used for other things too. In fact, we might be able to consolidate some of our existing hash tables with the new generic code in hash.[ch].

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
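The hashing idea can be sketched in a few dozen lines. This is a simplified illustration, not the actual rename detector: the `toy_` names are hypothetical, file contents are short strings hashed with a simple multiplicative string hash, and the bucket count is fixed, whereas Git works with object hashes and a growable table.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NBUCKETS 64

struct toy_file {
	const char *path;
	const char *content;
	struct toy_file *next; /* chain of sources sharing a bucket */
};

/* Simple multiplicative string hash standing in for a content hash. */
static unsigned toy_hash(const char *s)
{
	unsigned h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/*
 * For every destination, store in match[i] the index of a source with
 * identical content, or -1 if there is none. Returns the match count.
 * Sources are bucketed once (O(n)); each destination then compares only
 * against sources in its own bucket instead of against all of them.
 */
static size_t toy_find_exact_renames(struct toy_file *src, size_t nsrc,
				     const struct toy_file *dst, size_t ndst,
				     int *match)
{
	struct toy_file *buckets[TOY_NBUCKETS] = { NULL };
	size_t i, found = 0;

	for (i = 0; i < nsrc; i++) {
		unsigned b = toy_hash(src[i].content) % TOY_NBUCKETS;

		src[i].next = buckets[b];
		buckets[b] = &src[i];
	}
	for (i = 0; i < ndst; i++) {
		struct toy_file *p = buckets[toy_hash(dst[i].content) % TOY_NBUCKETS];

		match[i] = -1;
		for (; p; p = p->next) {
			if (!strcmp(p->content, dst[i].content)) {
				match[i] = (int)(p - src);
				found++;
				break;
			}
		}
	}
	return found;
}
```

With well-distributed hashes each bucket holds only a handful of candidates, so the total work stays close to O(n+m) rather than the O(m*n) of comparing every source against every destination.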