summaryrefslogtreecommitdiff
path: root/diffcore-rename.c
AgeCommit message (Collapse)Author
2018-10-19Merge branch 'nd/the-index'Junio C Hamano
Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...
2018-09-21diff.c: reduce implicit dependency on the_indexNguyễn Thái Ngọc Duy
diff and textconv code has so widespread use that it's hard to simply update their api and all call sites at once because it would result in a big patch. For now reduce the_index references to two places: diff_setup() and fill_textconv(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() != 0" to "!oideq()"Jeff King
This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16object-store: move object access functions to object-store.hStefan Beller
This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-15Merge branch 'po/object-id'Junio C Hamano
Conversion from uchar[20] to struct object_id continues. * po/object-id: sha1_file: rename hash_sha1_file_literally sha1_file: convert write_loose_object to object_id sha1_file: convert force_object_loose to object_id sha1_file: convert write_sha1_file to object_id notes: convert write_notes_tree to object_id notes: convert combine_notes_* to object_id commit: convert commit_tree* to object_id match-trees: convert splice_tree to object_id cache: clear whole hash buffer with oidclr sha1_file: convert hash_sha1_file to object_id dir: convert struct sha1_stat to use object_id sha1_file: convert pretend_sha1_file to object_id
2018-01-30sha1_file: convert hash_sha1_file to object_idPatryk Obara
Convert the declaration and definition of hash_sha1_file to use struct object_id and adjust all function calls. Rename this function to hash_object_file. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-22Use MOVE_ARRAYSZEDER Gábor
Use the helper macro MOVE_ARRAY to move arrays. This is shorter and safer, as it automatically infers the size of elements. Patch generated by Coccinelle and contrib/coccinelle/array.cocci in Travis CI's static analysis build job. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-03diffcore-rename: make diff-tree -l0 mean -l<large>Jonathan Tan
In the documentation of diff-tree, it is stated that the -l option "prevents rename/copy detection from running if the number of rename/copy targets exceeds the specified number". The documentation does not mention any special handling for the number 0, but the implementation before commit 9f7e4bfa3b ("diff: remove silent clamp of renameLimit", 2017-11-13) treated 0 as a special value indicating that the rename limit is to be a very large number instead. The commit 9f7e4bfa3b changed that behavior, treating 0 as 0. Revert this behavior to what it was previously. This allows existing scripts and tools that use "-l0" to continue working. The alternative (to have "-l0" suppress rename detection) is probably much less useful, since users can just refrain from specifying -M and/or -C to have the same effect. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-15diff: remove silent clamp of renameLimitElijah Newren
In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14), the renameLimit was clamped to 32767. This appears to have been to simply avoid integer overflow in the following computation: num_create * num_src <= rename_limit * rename_limit although it also could be viewed as a hardcoded bound on the amount of CPU time we're willing to allow users to tell git to spend on handling renames. An upper bound may make sense, but unfortunately this upper bound was neither communicated to the users, nor documented anywhere. Although large limits can make things slow, we have users who would be ecstatic to have a small five file change be correctly cherry picked even if they have to manually specify a large limit and wait ten minutes for the renames to be detected. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-15progress: fix progress meters when dealing with lots of workElijah Newren
The possibility of setting merge.renameLimit beyond 2^16 raises the possibility that the values passed to progress can exceed 2^32. Use uint64_t, because it "ought to be enough for anybody". :-) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: make struct diff_flags members lowercaseBrandon Williams
Now that the flags stored in struct diff_flags are being accessed directly and not through macros, change all struct members from being uppercase to lowercase. This conversion is done using the following semantic patch: @@ expression E; @@ - E.RECURSIVE + E.recursive @@ expression E; @@ - E.TREE_IN_RECURSIVE + E.tree_in_recursive @@ expression E; @@ - E.BINARY + E.binary @@ expression E; @@ - E.TEXT + E.text @@ expression E; @@ - E.FULL_INDEX + E.full_index @@ expression E; @@ - E.SILENT_ON_REMOVE + E.silent_on_remove @@ expression E; @@ - E.FIND_COPIES_HARDER + E.find_copies_harder @@ expression E; @@ - E.FOLLOW_RENAMES + E.follow_renames @@ expression E; @@ - E.RENAME_EMPTY + E.rename_empty @@ expression E; @@ - E.HAS_CHANGES + E.has_changes @@ expression E; @@ - E.QUICK + E.quick @@ expression E; @@ - E.NO_INDEX + E.no_index @@ expression E; @@ - E.ALLOW_EXTERNAL + E.allow_external @@ expression E; @@ - E.EXIT_WITH_STATUS + E.exit_with_status @@ expression E; @@ - E.REVERSE_DIFF + E.reverse_diff @@ expression E; @@ - E.CHECK_FAILED + E.check_failed @@ expression E; @@ - E.RELATIVE_NAME + E.relative_name @@ expression E; @@ - E.IGNORE_SUBMODULES + E.ignore_submodules @@ expression E; @@ - E.DIRSTAT_CUMULATIVE + E.dirstat_cumulative @@ expression E; @@ - E.DIRSTAT_BY_FILE + E.dirstat_by_file @@ expression E; @@ - E.ALLOW_TEXTCONV + E.allow_textconv @@ expression E; @@ - E.TEXTCONV_SET_VIA_CMDLINE + E.textconv_set_via_cmdline @@ expression E; @@ - E.DIFF_FROM_CONTENTS + E.diff_from_contents @@ expression E; @@ - E.DIRTY_SUBMODULES + E.dirty_submodules @@ expression E; @@ - E.IGNORE_UNTRACKED_IN_SUBMODULES + E.ignore_untracked_in_submodules @@ expression E; @@ - E.IGNORE_DIRTY_SUBMODULES + E.ignore_dirty_submodules @@ expression E; @@ - E.OVERRIDE_SUBMODULE_CONFIG + E.override_submodule_config @@ expression E; @@ - E.DIRSTAT_BY_LINE + E.dirstat_by_line @@ expression E; @@ - E.FUNCCONTEXT + E.funccontext @@ expression E; @@ - E.PICKAXE_IGNORE_CASE + E.pickaxe_ignore_case @@ expression E; @@ - E.DEFAULT_FOLLOW_RENAMES + E.default_follow_renames Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: remove DIFF_OPT_TST macroBrandon Williams
Remove the `DIFF_OPT_TST` macro and instead access the flags directly. This conversion is done using the following semantic patch: @@ expression E; identifier fld; @@ - DIFF_OPT_TST(&E, fld) + E.flags.fld @@ type T; T *ptr; identifier fld; @@ - DIFF_OPT_TST(ptr, fld) + ptr->flags.fld Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-19progress: simplify "delayed" progress APIJunio C Hamano
We used to expose the full power of the delayed progress API to the callers, so that they can specify, not just the message to show and expected total amount of work that is used to compute the percentage of work performed so far, the percent-threshold parameter P and the delay-seconds parameter N. The progress meter starts to show at N seconds into the operation only if we have not yet completed P per-cent of the total work. Most callers used either (0%, 2s) or (50%, 1s) as (P, N), but there are oddballs that chose more random-looking values like 95%. For a smoother workload, (50%, 1s) would allow us to start showing the progress meter earlier than (0%, 2s), while keeping the chance of not showing progress meter for long running operation the same as the latter. For a task that would take 2s or more to complete, it is likely that less than half of it would complete within the first second, if the workload is smooth. But for a spiky workload whose earlier part is easier, such a setting is likely to fail to show the progress meter entirely and (0%, 2s) is more appropriate. But that is merely a theory. Realistically, it is of dubious value to ask each codepath to carefully consider smoothness of their workload and specify their own setting by passing two extra parameters. Let's simplify the API by dropping both parameters and have everybody use (0%, 2s). Oh, by the way, the percent-threshold parameter and the structure member were consistently misspelled, which also is now fixed ;-) Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30hashmap.h: compare function has access to a data fieldStefan Beller
When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24Merge branch 'ab/free-and-null'Junio C Hamano
A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro. * ab/free-and-null: *.[ch] refactoring: make use of the FREE_AND_NULL() macro coccinelle: make use of the "expression" FREE_AND_NULL() rule coccinelle: add a rule to make "expression" code use FREE_AND_NULL() coccinelle: make use of the "type" FREE_AND_NULL() rule coccinelle: add a rule to make "type" code use FREE_AND_NULL() git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-16coccinelle: make use of the "type" FREE_AND_NULL() ruleÆvar Arnfjörð Bjarmason
Apply the result of the just-added coccinelle rule. This manually excludes a few occurrences, mostly things that resulted in many FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-05diffcore-rename: use is_empty_blob_oidBrandon Williams
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02diff: convert fill_filespec to struct object_idBrandon Williams
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-17Merge branch 'tk/diffcore-delta-remove-unused'Junio C Hamano
Code cleanup. * tk/diffcore-delta-remove-unused: diffcore-delta: remove unused parameter to diffcore_count_changes()
2016-11-14diffcore-delta: remove unused parameter to diffcore_count_changes()Tobias Klauser
The delta_limit parameter to diffcore_count_changes() has been unused since commit ba23bbc8e ("diffcore-delta: make change counter to byte oriented again.", 2006-03-04). Remove the parameter and adjust all callers. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29use QSORTRené Scharfe
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code base, replacing calls of qsort(3) with QSORT. The resulting code is shorter and supports empty arrays with NULL pointers. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01pass constants as first argument to st_mult()René Scharfe
The result of st_mult() is the same no matter the order of its arguments. It invokes the macro unsigned_mult_overflows(), which divides the second parameter by the first one. Pass constants first to allow that division to be done already at compile time. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28diff: rename struct diff_filespec's sha1_valid memberbrian m. carlson
Now that this struct's sha1 member is called "oid", update the comment and the sha1_valid member to be called "oid_valid" instead. The following Coccinelle semantic patch was used to implement this, followed by the transformations in object_id.cocci: @@ struct diff_filespec o; @@ - o.sha1_valid + o.oid_valid @@ struct diff_filespec *p; @@ - p->sha1_valid + p->oid_valid Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28diff: convert struct diff_filespec to struct object_idbrian m. carlson
Convert struct diff_filespec's sha1 member to use a struct object_id called "oid" instead. The following Coccinelle semantic patch was used to implement this, followed by the transformations in object_id.cocci: @@ struct diff_filespec o; @@ - o.sha1 + o.oid.hash @@ struct diff_filespec *p; @@ - p->sha1 + p->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-29Merge branch 'sg/diff-multiple-identical-renames' into maintJunio C Hamano
"git diff -M" used to work better when two originally identical files A and B got renamed to X/A and X/B by pairing A to X/A and B to X/B, but this was broken in the 2.0 timeframe. * sg/diff-multiple-identical-renames: diffcore: fix iteration order of identical files during rename detection
2016-04-13Merge branch 'sg/diff-multiple-identical-renames'Junio C Hamano
"git diff -M" used to work better when two originally identical files A and B got renamed to X/A and X/B by pairing A to X/A and B to X/B, but this was broken in the 2.0 timeframe. * sg/diff-multiple-identical-renames: diffcore: fix iteration order of identical files during rename detection
2016-03-30diffcore: fix iteration order of identical files during rename detectionSZEDER Gábor
If the two paths 'dir/A/file' and 'dir/B/file' have identical content and the parent directory is renamed, e.g. 'git mv dir other-dir', then diffcore reports the following exact renames: renamed: dir/B/file -> other-dir/A/file renamed: dir/A/file -> other-dir/B/file While technically not wrong, this is confusing not only for the user, but also for git commands that make decisions based on rename information, e.g. 'git log --follow other-dir/A/file' follows 'dir/B/file' past the rename. This behavior is a side effect of commit v2.0.0-rc4~8^2~14 (diffcore-rename.c: simplify finding exact renames, 2013-11-14): the hashmap storing sources returns entries from the same bucket, i.e. sources matching the current destination, in LIFO order. Thus the iteration first examines 'other-dir/A/file' and 'dir/B/file' and, upon finding identical content and basename, reports an exact rename. Other hashmap users are apparently happy with the current iteration order over the entries of a bucket. Changing the iteration order would risk upsetting other hashmap users and would increase the memory footprint of each bucket by a pointer to the tail element. Fill the hashmap with source entries in reverse order to restore the original exact rename detection behavior. Reported-by: Bill Okara <billokara@gmail.com> Signed-off-by: SZEDER Gábor <szeder@ira.uka.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22use st_add and st_mult for allocation size computationJeff King
If our size computation overflows size_t, we may allocate a much smaller buffer than we expected and overflow it. It's probably impossible to trigger an overflow in most of these sites in practice, but it is easy enough convert their additions and multiplications into overflow-checking variants. This may be fixing real bugs, and it makes auditing the code easier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-10Merge branch 'jk/diffcore-rename-duplicate'Junio C Hamano
A corrupt input to "git diff -M" can cause us to segfault. * jk/diffcore-rename-duplicate: diffcore-rename: avoid processing duplicate destinations diffcore-rename: split locate_rename_dst into two functions
2015-02-27diffcore-rename: avoid processing duplicate destinationsJeff King
The rename code cannot handle an input where we have duplicate destinations (i.e., more than one diff_filepair in the queue with the same string in its pair->two->path). We end up allocating only one slot in the rename_dst mapping. If we fill in the diff_filepair for that slot, when we re-queue the results, we may queue that filepair multiple times. When the diff is finally flushed, the filepair is processed and free()d multiple times, leading to heap corruption. This situation should only happen when a tree diff sees duplicates in one of the trees (see the added test for a detailed example). Rather than handle it, the sanest thing is just to turn off rename detection altogether for the diff. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-27diffcore-rename: split locate_rename_dst into two functionsJeff King
This function manages the mapping of destination pathnames to filepairs, and it handles both insertion and lookup. This makes the return value a bit confusing, as we return a newly created entry (even though no caller cares), and have no room to indicate to the caller that an entry already existed. Instead, let's break this up into two distinct functions, both backed by a common binary search. The binary search will use our normal "return the index if we found something, or negative index minus one to show where it would have gone" semantics. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-18diff.c: allow to pass more flags to diff_populate_filespecNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07hashmap: add simplified hashmap_get_from_hash() APIKarsten Blees
Hashmap entries are typically looked up by just a key. The hashmap_get() API expects an initialized entry structure instead, to support compound keys. This flexibility is currently only needed by find_dir_entry() in name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other (currently five) call sites of hashmap_get() have to set up a near emtpy entry structure, resulting in duplicate code like this: struct hashmap_entry keyentry; hashmap_entry_init(&keyentry, hash(key)); return hashmap_get(map, &keyentry, key); Add a hashmap_get_from_hash() API that allows hashmap lookups by just specifying the key and its hash code, i.e.: return hashmap_get_from_hash(map, hash(key), key); Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07hashmap: factor out getting a hash code from a SHA1Karsten Blees
Copying the first bytes of a SHA1 is duplicated in six places, however, the implications (the actual value would depend on the endianness of the platform) is documented only once. Add a properly documented API for this. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-18Merge branch 'dd/use-alloc-grow'Junio C Hamano
Replace open-coded reallocation with ALLOC_GROW() macro. * dd/use-alloc-grow: sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() read-cache.c: use ALLOC_GROW() in add_index_entry() builtin/mktree.c: use ALLOC_GROW() in append_to_tree() attr.c: use ALLOC_GROW() in handle_attr_line() dir.c: use ALLOC_GROW() in create_simplify() reflog-walk.c: use ALLOC_GROW() replace_object.c: use ALLOC_GROW() in register_replace_object() patch-ids.c: use ALLOC_GROW() in add_commit() diffcore-rename.c: use ALLOC_GROW() diff.c: use ALLOC_GROW() commit.c: use ALLOC_GROW() in register_commit_graft() cache-tree.c: use ALLOC_GROW() in find_subtree() bundle.c: use ALLOC_GROW() in add_to_ref_list() builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
2014-03-14Merge branch 'nd/i18n-progress'Junio C Hamano
Mark the progress indicators from various time-consuming commands for i18n/l10n. * nd/i18n-progress: i18n: mark all progress lines for translation
2014-03-03diffcore-rename.c: use ALLOC_GROW()Dmitry S. Dolzhenko
Use ALLOC_GROW() instead of open-coding it in locate_rename_dst() and register_rename_src(). Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24i18n: mark all progress lines for translationNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18diffcore-rename.c: use new hash map implementationKarsten Blees
Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18diffcore-rename.c: simplify finding exact renamesKarsten Blees
The find_exact_renames function currently only uses the hash table for grouping, i.e.: 1. add sources 2. add destinations 3. iterate all buckets, per bucket: 4. split sources from destinations 5. iterate destinations, per destination: 6. iterate sources to find best match This can be simplified by utilizing the lookup functionality of the hash table, i.e.: 1. add sources 2. iterate destinations, per destination: 3. lookup sources matching the current destination 4. iterate sources to find best match This saves several iterations and file_similarity allocations for the destinations. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18diffcore-rename.c: move code around to prepare for the next patchKarsten Blees
No actual code changes, just move hash_filespec up and outdent part of find_identical_files. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-17Preallocate hash tables when the number of inserts are known in advanceNguyễn Thái Ngọc Duy
This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this reduces about 100ms out of 3s user time. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27Merge branch 'jk/maint-null-in-trees'Junio C Hamano
We do not want a link to 0{40} object stored anywhere in our objects. * jk/maint-null-in-trees: fsck: detect null sha1 in tree entries do not write null sha1s to on-disk index diff: do not use null sha1 as a sentinel value
2012-07-29diff: do not use null sha1 as a sentinel valueJeff King
The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-23teach diffcore-rename to optionally ignore empty contentJeff King
Our rename detection is a heuristic, matching pairs of removed and added files with similar or identical content. It's unlikely to be wrong when there is actual content to compare, and we already take care not to do inexact rename detection when there is not enough content to produce good results. However, we always do exact rename detection, even when the blob is tiny or empty. It's easy to get false positives with an empty blob, simply because it is an obvious content to use as a boilerplate (e.g., when telling git that an empty directory is worth tracking via an empty .gitignore). This patch lets callers specify whether or not they are interested in using empty files as rename sources and destinations. The default is "yes", keeping the original behavior. It works by detecting the empty-blob sha1 for rename sources and destinations. One more flexible alternative would be to allow the caller to specify a minimum size for a blob to be "interesting" for rename detection. But that would catch small boilerplate files, not large ones (e.g., if you had the GPL COPYING file in many directories). A better alternative would be to allow a "-rename" gitattribute to allow boilerplate files to be marked as such. I'll leave the complexity of that solution until such time as somebody actually wants it. The complaints we've seen so far revolve around empty files, so let's start with the simple thing. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-02Merge branch 'mz/maint-rename-unmerged'Junio C Hamano
* mz/maint-rename-unmerged: diffcore-rename: don't consider unmerged path as source
2011-04-29diffcore-rename.c: avoid set-but-not-used warningJim Meyering
Since 9d8a5a5 (diffcore-rename: refactor "too many candidates" logic, 2011-01-06), diffcore_rename() initializes num_src but does not use it anymore. "-Wunused-but-set-variable" in gcc-4.6 complains about this. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-24diffcore-rename: don't consider unmerged path as sourceMartin von Zweigbergk
Since e9c8409 (diff-index --cached --raw: show tree entry on the LHS for unmerged entries., 2007-01-05), an unmerged entry should be detected by using DIFF_PAIR_UNMERGED(p), not by noticing both one and two sides of the filepair records mode=0 entries. However, it forgot to update some parts of the rename detection logic. This only makes difference in the "diff --cached" codepath where an unmerged filepair carries information on the entries that came from the tree. It probably hasn't been noticed for a long time because nobody would run "diff -M" during a conflict resolution, but "git status" uses rename detection when it internally runs "diff-index" and "diff-files" and gives nonsense results. In an unmerged pair, "one" side can have a valid filespec to record the tree entry (e.g. what's in HEAD) when running "diff --cached". This can be used as a rename source to other paths in the index that are not unmerged. The path that is unmerged by definition does not have the final content yet (i.e. "two" side cannot have a valid filespec), so it can never be a rename destination. Use the DIFF_PAIR_UNMERGED() to detect unmerged filepair correctly, and allow the valid "one" side of an unmerged filepair to be considered a potential rename source, but never to be considered a rename destination. Commit message and first two test cases by Junio, the rest by Martin. Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22diffcore-rename: fall back to -C when -C -C busts the rename limitJunio C Hamano
When there are too many paths in the project, the number of rename source candidates "git diff -C -C" finds will exceed the rename detection limit, and no inexact rename detection is performed. We however could fall back to "git diff -C" if the number of modified paths is sufficiently small. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22diffcore-rename: record filepair for rename srcJunio C Hamano
This will allow us to later skip unmodified entries added due to "-C -C". We might also want to do something similar to rename_dst side, but that would only be for the sake of symmetry and not necessary for this series. Signed-off-by: Junio C Hamano <gitster@pobox.com>