path: root/xdiff/xhistogram.c
AgeCommit message (Collapse)Author
2021-12-05xdiff: drop xpparam_t parameter from histogram cmp_recs()Jeff King
Since 663c5ad035 (diff histogram: intern strings, 2021-11-17), our cmp_recs() does not call xdl_recmatch(), and thus no longer needs an xpparam_t struct from which to get the flags. We can drop the unused parameter from the function, as well as the macro which wraps it. There's no functional change here; it's just simplifying things (and making -Wunused-parameter happier). Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2021-12-05xdiff: drop CMP_ENV macro from xhistogramJeff King
This macro has been unused since 43ca7530df (xdiff/xhistogram: rely on xdl_trim_ends(), 2011-08-01). The function that called it went away there, and its use in the CMP() macro was inlined. It probably should have been deleted then, but nobody noticed. Signed-off-by: Jeff King <> Signed-off-by: Junio C Hamano <>
2021-11-19diff histogram: intern stringsPhillip Wood
Histogram is the only diff algorithm not to call xdl_classify_record(). xdl_classify_record() ensures that the hash values of two strings that are not equal differ which means that it is not necessary to use xdl_recmatch() when comparing lines, all that is necessary is to compare the hash values. This gives a 7% reduction in the runtime of "git log --patch" when using the histogram diff algorithm. Test HEAD^ HEAD ----------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.18(0.14+0.04) 0.19(0.17+0.02) +5.6% 4000.2: log --raw -3000 (tree-only) 0.99(0.77+0.21) 0.98(0.78+0.20) -1.0% 4000.3: log -p -3000 (Myers) 4.84(4.31+0.51) 4.81(4.15+0.64) -0.6% 4000.4: log -p -3000 --histogram 6.34(5.86+0.46) 5.87(5.19+0.66) -7.4% 4000.5: log -p -3000 --patience 5.39(4.60+0.76) 5.35(4.60+0.73) -0.7% Signed-off-by: Phillip Wood <> Signed-off-by: Junio C Hamano <>
2020-10-20merge-base, xdiff: zero out xpparam_t structuresMichał Kępień
xpparam_t structures are usually zero-initialized before their specific fields are assigned to, but there are three locations in the tree where that does not happen. Add the missing memset() calls in order to make initialization of xpparam_t structures consistent tree-wide and to prevent stack garbage from being used as field values. Signed-off-by: Michał Kępień <> Signed-off-by: Junio C Hamano <>
2019-07-29xdiff: remove duplicate headers from xhistogram.cCarlo Marcelo Arenas Belón
8c912eea94 ("teach --histogram to diff", 2011-07-12) included them, but were already part of xinclude.h Signed-off-by: Carlo Marcelo Arenas Belón <> Signed-off-by: Junio C Hamano <>
2018-07-23xdiff/histogram: remove tail recursionStefan Beller
When running the same reproduction script as the previous patch, it turns out the stack is too small, which can be easily avoided. Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2018-07-19xdiff/xhistogram: move index allocation into find_lcsStefan Beller
This fixes a memory issue when recursing a lot, which can be reproduced as seq 1 100000 >one seq 1 4 100000 >two git diff --no-index --histogram one two Before this patch, histogram_diff would call itself recursively before calling free_index, which would mean a lot of memory is allocated during the recursion and only freed afterwards. By moving the memory allocation (and its free call) into find_lcs, the memory is free'd before we recurse, such that memory is reused in the next step of the recursion instead of using new memory. This addresses only the memory pressure, not the run time complexity, that is also awful for the corner case outlined above. Helpful in understanding the code (in addition to the sparse history of this file), was which reproduces most of the code comments of the JGit implementation. Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2018-07-19xdiff/xhistogram: factor out memory cleanup into free_index()Stefan Beller
This will be useful in the next patch as we'll introduce multiple callers. Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2018-07-19xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diffStefan Beller
By passing the 'xpp' and 'env' argument directly to the function 'fall_back_to_classic_diff', we eliminate an occurrence of the 'index' in histogram_diff, which will prove useful in a bit. While at it, move it up in the file. This will make the diff of one of the next patches more legible. Signed-off-by: Stefan Beller <> Signed-off-by: Junio C Hamano <>
2013-04-12Correct common spelling mistakes in comments and testsStefano Lattarini
Most of these were found using Lucas De Marchi's codespell tool. Signed-off-by: Stefano Lattarini <> Signed-off-by: Jonathan Nieder <> Acked-by: Matthieu Moy <> Signed-off-by: Junio C Hamano <>
2012-02-19xdiff: PATIENCE/HISTOGRAM are not independent option bitsJunio C Hamano
Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <>
2011-08-08xdiff/xhistogram: drop need for additional variableTay Ray Chuan
Having an additional variable (ptr) instead of changing line(1|2) and count(1|2) was for debugging purposes. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-08-08xdiff/xhistogram: rely on xdl_trim_ends()Tay Ray Chuan
Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend set by xdl_trim_ends() that similarly tells us where the first unmatched line from the start and end occurs. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-08-08xdiff/xhistogram: rework handling of recursed resultsTay Ray Chuan
Previously we were over-complicating matters by trying to combine the recursed results. Now, terminate immediately if a recursive call failed and return its result. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>
2011-07-12teach --histogram to diffTay Ray Chuan
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show that it is faster than its --patience cousin, as well as the default Meyers algorithm. The implementation has been reworked to use structs and pointers, instead of bitmasks, thus doing away with JGit's 2^28 line limit. We also use xdiff's default hash table implementation (xdl_hash_bits() with XDL_HASHLONG()) for convenience. Signed-off-by: Tay Ray Chuan <> Signed-off-by: Junio C Hamano <>