summaryrefslogtreecommitdiff
path: root/read-cache.c
AgeCommit message (Collapse)Author
2018-02-15Merge branch 'bc/hash-algo'Junio C Hamano
More abstraction of hash function from the codepath. * bc/hash-algo: hash: update obsolete reference to SHA1_HEADER bulk-checkin: abstract SHA-1 usage csum-file: abstract uses of SHA-1 csum-file: rename sha1file to hashfile read-cache: abstract away uses of SHA-1 pack-write: switch various SHA-1 values to abstract forms pack-check: convert various uses of SHA-1 to abstract forms fast-import: switch various uses of SHA-1 to the_hash_algo sha1_file: switch uses of SHA-1 to the_hash_algo builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo builtin/index-pack: improve hash function abstraction hash: create union for hash context allocation hash: move SHA-1 macros to hash.h
2018-02-15Merge branch 'nd/trace-index-ops'Junio C Hamano
* nd/trace-index-ops: trace: measure where the time is spent in the index-heavy operations
2018-02-15Merge branch 'po/object-id'Junio C Hamano
Conversion from uchar[20] to struct object_id continues. * po/object-id: sha1_file: rename hash_sha1_file_literally sha1_file: convert write_loose_object to object_id sha1_file: convert force_object_loose to object_id sha1_file: convert write_sha1_file to object_id notes: convert write_notes_tree to object_id notes: convert combine_notes_* to object_id commit: convert commit_tree* to object_id match-trees: convert splice_tree to object_id cache: clear whole hash buffer with oidclr sha1_file: convert hash_sha1_file to object_id dir: convert struct sha1_stat to use object_id sha1_file: convert pretend_sha1_file to object_id
2018-02-13Merge branch 'nd/shared-index-fix'Junio C Hamano
Code clean-up. * nd/shared-index-fix: read-cache: don't write index twice if we can't write shared index read-cache.c: move tempfile creation/cleanup out of write_shared_index read-cache.c: change type of "temp" in write_shared_index()
2018-02-13Merge branch 'sg/cocci-move-array'Junio C Hamano
Code clean-up. * sg/cocci-move-array: Use MOVE_ARRAY
2018-02-13Merge branch 'tg/split-index-fixes'Junio C Hamano
The split-index mode had a few corner case bugs fixed. * tg/split-index-fixes: travis: run tests with GIT_TEST_SPLIT_INDEX split-index: don't write cache tree with null oid entries read-cache: fix reading the shared index for other repos
2018-02-02read-cache: abstract away uses of SHA-1brian m. carlson
Convert various uses of direct calls to SHA-1 and 20- and 40-based constants to use the_hash_algo instead. Don't yet convert the on-disk data structures, which will be handled in a future commit. Adjust some comments so as not to refer explicitly to SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02trace: measure where the time is spent in the index-heavy operationsNguyễn Thái Ngọc Duy
All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not. An unoptimized git-status would give something like below: 0.001791141 s: read cache ... 0.004011363 s: preload index 0.000516161 s: refresh index 0.003139257 s: git command: ... 'status' '--porcelain=2' 0.006788129 s: diff-files 0.002090267 s: diff-index 0.001885735 s: initialize name hash 0.032013138 s: read directory 0.051781209 s: git command: './git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30sha1_file: convert write_sha1_file to object_idPatryk Obara
Convert the definition and declaration of write_sha1_file to struct object_id and adjust usage of this function. This commit also converts static function write_sha1_file_prepare, as it is closely related. Rename these functions to write_object_file and write_object_file_prepare respectively. Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents wherever possible. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24read-cache: don't write index twice if we can't write shared indexNguyễn Thái Ngọc Duy
In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is read only", 2014-06-13), we tried to make sure we can still write an index, even if the shared index can not be written. We did so by just calling 'do_write_locked_index()' just before 'write_shared_index()'. 'do_write_locked_index()' always at least closes the tempfile nowadays, and used to close or commit the lockfile if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was introduced. COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of 'write_locked_index()'. After calling 'write_shared_index()', we call 'write_split_index()', which calls 'do_write_locked_index()' again, which then tries to use the closed lockfile again, but in fact fails to do so as it's already closed. This eventually leads to a segfault. Make sure to write the main index only once. [nd: most of the commit message and investigation done by Thomas, I only tweaked the solution a bit] Helped-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-22Use MOVE_ARRAYSZEDER Gábor
Use the helper macro MOVE_ARRAY to move arrays. This is shorter and safer, as it automatically infers the size of elements. Patch generated by Coccinelle and contrib/coccinelle/array.cocci in Travis CI's static analysis build job. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19split-index: don't write cache tree with null oid entriesThomas Gummerer
In a96d3cc3f6 ("cache-tree: reject entries with null sha1", 2017-04-21) we made sure that broken cache entries do not get propagated to new trees. Part of that was making sure not to re-use an existing cache tree that includes a null oid. It did so by dropping the cache tree in 'do_write_index()' if one of the entries contains a null oid. In split index mode however, there are two invocations to 'do_write_index()', one for the shared index and one for the split index. The cache tree is only written once, to the split index. As we only loop through the elements that are effectively being written by the current invocation, that may not include the entry with a null oid in the split index (when it is already written to the shared index), where we write the cache tree. Therefore in split index mode we may still end up writing the cache tree, even though there is an entry with a null oid in the index. Fix this by checking for null oids in prepare_to_write_split_index, where we loop the entries of the shared index as well as the entries for the split index. This fixes t7009 with GIT_TEST_SPLIT_INDEX. Also add a new test that's more specifically showing the problem. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19read-cache: fix reading the shared index for other reposThomas Gummerer
read_index_from() takes a path argument for the location of the index file. For reading the shared index in split index mode however it just ignores that path argument, and reads it from the gitdir of the current repository. This works as long as an index in the_repository is read. Once that changes, such as when we read the index of a submodule, or of a different working tree than the current one, the gitdir of the_repository will no longer contain the appropriate shared index, and git will fail to read it. For example t3007-ls-files-recurse-submodules.sh was broken with GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also broken in a similar manner, probably by introducing struct repository there, although I didn't track down the exact commit for that. be489d02d2 ("revision.c: --indexed-objects add objects from all worktrees", 2017-08-23) breaks with split index mode in a similar manner, not erroring out when it can't read the index, but instead carrying on with pruning, without taking the index of the worktree into account. Fix this by passing an additional gitdir parameter to read_index_from, to indicate where it should look for and read the shared index from. read_cache_from() defaults to using the gitdir of the_repository. As it is mostly a convenience macro, having to pass get_git_dir() for every call seems overkill, and if necessary users can have more control by using read_index_from(). Helped-by: Brandon Williams <bmwill@google.com> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16read-cache.c: move tempfile creation/cleanup out of write_shared_indexNguyễn Thái Ngọc Duy
For one thing, we have more consistent cleanup procedure now and always keep errno intact. The real purpose is the ability to break out of write_locked_index() early when mks_tempfile() fails in the next patch. It's more awkward to do it if this mks_tempfile() is still inside write_shared_index(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16read-cache.c: change type of "temp" in write_shared_index()Nguyễn Thái Ngọc Duy
This local variable 'temp' will be passed in from the caller in the next patch. To reduce patch noise, let's change its type now while it's still a local variable and get all the trival conversion out of the next patch. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-27Merge branch 'tb/add-renormalize'Junio C Hamano
"git add --renormalize ." is a new and safer way to record the fact that you are correcting the end-of-line convention and other "convert_to_git()" glitches in the in-repository data. * tb/add-renormalize: add: introduce "--renormalize"
2017-11-21Merge branch 'av/fsmonitor'Junio C Hamano
Various fixes to bp/fsmonitor topic. * av/fsmonitor: fsmonitor: simplify determining the git worktree under Windows fsmonitor: store fsmonitor bitmap before splitting index fsmonitor: read from getcwd(), not the PWD environment variable fsmonitor: delay updating state until after split index is merged fsmonitor: document GIT_TRACE_FSMONITOR fsmonitor: don't bother pretty-printing JSON from watchman fsmonitor: set the PWD to the top of the working tree
2017-11-21Merge branch 'bp/fsmonitor'Junio C Hamano
We learned to talk to watchman to speed up "git status" and other operations that need to see which paths have been modified. * bp/fsmonitor: fsmonitor: preserve utf8 filenames in fsmonitor-watchman log fsmonitor: read entirety of watchman output fsmonitor: MINGW support for watchman integration fsmonitor: add a performance test fsmonitor: add a sample integration script for Watchman fsmonitor: add test cases for fsmonitor extension split-index: disable the fsmonitor extension when running the split index test fsmonitor: add a test tool to dump the index extension update-index: add fsmonitor support to update-index ls-files: Add support in ls-files to display the fsmonitor valid bit fsmonitor: add documentation for the fsmonitor extension. fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files. update-index: add a new --force-write-index option preload-index: add override to enable testing preload-index bswap: add 64 bit endianness helper get_be64
2017-11-17add: introduce "--renormalize"Torsten Bögershausen
Make it safer to normalize the line endings in a repository. Files that had been commited with CRLF will be commited with LF. The old way to normalize a repo was like this: # Make sure that there are not untracked files $ echo "* text=auto" >.gitattributes $ git read-tree --empty $ git add . $ git commit -m "Introduce end-of-line normalization" The user must make sure that there are no untracked files, otherwise they would have been added and tracked from now on. The new "add --renormalize" does not add untracked files: $ echo "* text=auto" >.gitattributes $ git add --renormalize . $ git commit -m "Introduce end-of-line normalization" Note that "git add --renormalize <pathspec>" is the short form for "git add -u --renormalize <pathspec>". While at it, document that the same renormalization may be needed, whenever a clean filter is added or changed. Helped-By: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-15Merge branch 'bp/read-index-from-skip-verification'Junio C Hamano
Drop (perhaps overly cautious) sanity check before using the index read from the filesystem at runtime. * bp/read-index-from-skip-verification: read_index_from(): speed index loading by skipping verification of the entry order
2017-11-10fsmonitor: store fsmonitor bitmap before splitting indexAlex Vandiver
ba1b9cac ("fsmonitor: delay updating state until after split index is merged", 2017-10-27) resolved the problem of the fsmonitor data being applied to the non-base index when reading; however, a similar problem exists when writing the index. Specifically, writing of the fsmonitor extension happens only after the work to split the index has been applied -- as such, the information in the index is only for the non-"base" index, and thus the extension information contains only partial data. When saving, compute the ewah bitmap before the index is split, and store it in the fsmonitor_dirty field, mirroring the behavior that occurred during reading. fsmonitor_dirty is kept from being leaked by being freed when the extension data is written -- which always happens precisely once, no matter the split index configuration. Signed-off-by: Alex Vandiver <alexmv@dropbox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-08read_index_from(): speed index loading by skipping verification of the entry ↵Ben Peart
order There is code in post_read_index_from() to catch out of order entries when reading an index file. This order verification is ~13% of the cost of every call to read_index_from(). Update check_ce_order() so that it skips this verification unless the "verify_ce_order" global variable is set. Teach fsck to force this verification. The effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD HEAD~1 -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.41(0.04+0.04) 0.50(0.00+0.10) +22.0% Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-06Merge branch 'bc/object-id'Junio C Hamano
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (25 commits) refs/files-backend: convert static functions to object_id refs: convert read_raw_ref backends to struct object_id refs: convert peel_object to struct object_id refs: convert resolve_ref_unsafe to struct object_id worktree: convert struct worktree to object_id refs: convert resolve_gitlink_ref to struct object_id Convert remaining callers of resolve_gitlink_ref to object_id sha1_file: convert index_path and index_fd to struct object_id refs: convert reflog_expire parameter to struct object_id refs: convert read_ref_at to struct object_id refs: convert peel_ref to struct object_id builtin/pack-objects: convert to struct object_id pack-bitmap: convert traverse_bitmap_commit_list to object_id refs: convert dwim_log to struct object_id builtin/reflog: convert remaining unsigned char uses to object_id refs: convert dwim_ref and expand_ref to struct object_id refs: convert read_ref and read_ref_full to object_id refs: convert resolve_refdup and refs_resolve_refdup to struct object_id Convert check_connected to use struct object_id refs: update ref transactions to use struct object_id ...
2017-10-16refs: convert resolve_gitlink_ref to struct object_idbrian m. carlson
Convert the declaration and definition of resolve_gitlink_ref to use struct object_id and apply the following semantic patch: @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3.hash) + resolve_gitlink_ref(E1, E2, &E3) @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3->hash) + resolve_gitlink_ref(E1, E2, E3) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16Convert remaining callers of resolve_gitlink_ref to object_idbrian m. carlson
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07read_cache: roll back lock in `update_index_if_able()`Martin Ågren
`update_index_if_able()` used to always commit the lock or roll it back. Commit 03b866477 (read-cache: new API write_locked_index instead of write_index/write_cache, 2014-06-13) stopped rolling it back in case a write was not even attempted. This change in behavior is not motivated in the commit message and appears to be accidental: the `else`-path was removed, although that changed the behavior in case the `if` shortcuts. Reintroduce the rollback and document this behavior. While at it, move the documentation on this function from the function definition to the function declaration in cache.h. If `write_locked_index(..., COMMIT_LOCK)` fails, it will roll back the lock for us (see the previous commit). Noticed-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07read-cache: leave lock in right state in `write_locked_index()`Martin Ågren
If the original version of `write_locked_index()` returned with an error, it didn't roll back the lockfile unless the error occured at the very end, during closing/committing. See commit 03b866477 (read-cache: new API write_locked_index instead of write_index/write_cache, 2014-06-13). In commit 9f41c7a6b (read-cache: close index.lock in do_write_index, 2017-04-26), we learned to close the lock slightly earlier in the callstack. That was mostly a side-effect of lockfiles being implemented using temporary files, but didn't cause any real harm. Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05) introduced a subtle bug. If the temporary file is deleted (i.e., the lockfile is rolled back), the tempfile-pointer in the `struct lock_file` will be left dangling. Thus, an attempt to reuse the lockfile, or even just to roll it back, will induce undefined behavior -- most likely a crash. Besides not crashing, we clearly want to make things consistent. The guarantees which the lockfile-machinery itself provides is A) if we ask to commit and it fails, roll back, and B) if we ask to close and it fails, do _not_ roll back. Let's do the same for consistency. Do not delete the temporary file in `do_write_index()`. One of its callers, `write_locked_index()` will thereby avoid rolling back the lock. The other caller, `write_shared_index()`, will delete its temporary file anyway. Both of these callers will avoid undefined behavior (crashing). Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock before returning. If we have already succeeded and committed, it will be a noop. Simplify the existing callers where we now have a superfluous call to `rollback_lockfile()`. That should keep future readers from wondering why the callers are inconsistent. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07read-cache: drop explicit `CLOSE_LOCK`-flagMartin Ågren
`write_locked_index()` takes two flags: `COMMIT_LOCK` and `CLOSE_LOCK`. At most one is allowed. But it is also possible to use no flag, i.e., `0`. But when `write_locked_index()` calls `do_write_index()`, the temporary file, a.k.a. the lockfile, will be closed. So passing `0` is effectively the same as `CLOSE_LOCK`, which seems like a bug. We might feel tempted to restructure the code in order to close the file later, or conditionally. It also feels a bit unfortunate that we simply "happen" to close the lock by way of an implementation detail of lockfiles. But note that we need to close the temporary file before `stat`-ing it, at least on Windows. See 9f41c7a6b (read-cache: close index.lock in do_write_index, 2017-04-26). Drop `CLOSE_LOCK` and make it explicit that `write_locked_index()` always closes the lock. Whether it is also committed is governed by the remaining flag, `COMMIT_LOCK`. This means we neither have nor suggest that we have a mode to write the index and leave the file open. Whatever extra contents we might eventually want to write, we should probably write it from within `write_locked_index()` itself anyway. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01fsmonitor: teach git to optionally utilize a file system monitor to speed up ↵Ben Peart
detecting new or changed files. When the index is read from disk, the fsmonitor index extension is used to flag the last known potentially dirty index entries. The registered core.fsmonitor command is called with the time the index was last updated and returns the list of files changed since that time. This list is used to flag any additional dirty cache entries and untracked cache directories. We can then use this valid state to speed up preload_index(), ie_match_stat(), and refresh_cache_ent() as they do not need to lstat() files to detect potential changes for those entries marked CE_FSMONITOR_VALID. In addition, if the untracked cache is turned on valid_cached_dir() can skip checking directories for new or changed files as fsmonitor will invalidate the cache only for those directories that have been identified as having potential changes. To keep the CE_FSMONITOR_VALID state accurate during git operations; when git updates a cache entry to match the current state on disk, it will now set the CE_FSMONITOR_VALID bit. Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit is cleared and the corresponding untracked cache directory is marked invalid. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-28Merge branch 'jk/fallthrough'Junio C Hamano
Many codepaths have been updated to squelch -Wimplicit-fallthrough warnings from Gcc 7 (which is a good code hygiene). * jk/fallthrough: consistently use "fallthrough" comments in switches curl_trace(): eliminate switch fallthrough test-line-buffer: simplify command parsing
2017-09-25Merge branch 'jk/write-in-full-fix'Junio C Hamano
Many codepaths did not diagnose write failures correctly when disks go full, due to their misuse of write_in_full() helper function, which have been corrected. * jk/write-in-full-fix: read_pack_header: handle signed/unsigned comparison in read result config: flip return value of store_write_*() notes-merge: use ssize_t for write_in_full() return value pkt-line: check write_in_full() errors against "< 0" convert less-trivial versions of "write_in_full() != len" avoid "write_in_full(fd, buf, len) != len" pattern get-tar-commit-id: check write_in_full() return against 0 config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25Merge branch 'kw/write-index-reduce-alloc'Junio C Hamano
A hotfix to a topic already in 'master'. * kw/write-index-reduce-alloc: read-cache: fix index corruption with index v4 Add t/helper/test-write-cache to .gitignore
2017-09-22consistently use "fallthrough" comments in switchesJeff King
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14avoid "write_in_full(fd, buf, len) != len" patternJeff King
The return value of write_in_full() is either "-1", or the requested number of bytes[1]. If we make a partial write before seeing an error, we still return -1, not a partial value. This goes back to f6aa66cb95 (write_in_full: really write in full or return error on disk full., 2007-01-11). So checking anything except "was the return value negative" is pointless. And there are a couple of reasons not to do so: 1. It can do a funny signed/unsigned comparison. If your "len" is signed (e.g., a size_t) then the compiler will promote the "-1" to its unsigned variant. This works out for "!= len" (unless you really were trying to write the maximum size_t bytes), but is a bug if you check "< len" (an example of which was fixed recently in config.c). We should avoid promoting the mental model that you need to check the length at all, so that new sites are not tempted to copy us. 2. Checking for a negative value is shorter to type, especially when the length is an expression. 3. Linus says so. In d34cf19b89 (Clean up write_in_full() users, 2007-01-11), right after the write_in_full() semantics were changed, he wrote: I really wish every "write_in_full()" user would just check against "<0" now, but this fixes the nasty and stupid ones. Appeals to authority aside, this makes it clear that writing it this way does not have an intentional benefit. It's a historical curiosity that we never bothered to clean up (and which was undoubtedly cargo-culted into new sites). So let's convert these obviously-correct cases (this includes write_str_in_full(), which is just a wrapper for write_in_full()). [1] A careful reader may notice there is one way that write_in_full() can return a different value. If we ask write() to write N bytes and get a return value that is _larger_ than N, we could return a larger total. But besides the fact that this would imply a totally broken version of write(), it would already invoke undefined behavior. Our internal remaining counter is an unsigned size_t, which means that subtracting too many byte will wrap it around to a very large number. So we'll instantly begin reading off the end of the buffer, trying to write gigabytes (or petabytes) of data. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-08read-cache: fix index corruption with index v4Thomas Gummerer
ce012deb98 ("read-cache: avoid allocating every ondisk entry when writing", 2017-08-21) changed the way cache entries are written to the index file. While previously it wrote the name to an struct that was allocated using xcalloc(), it now uses ce_write() directly. Previously ce_namelen - common bytes were written to the cache entry, which would automatically make it nul terminated, as it was allocated using calloc. Now we are writing ce_namelen - common + 1 bytes directly from the ce->name to the index. If CE_STRIP_NAME however gets set in the split index case ce->ce_namelen is set to 0 without changing the actual ce->name buffer. When index-v4, this results in the first character of ce->name being written out instead of just a terminating nul charcter. As index-v4 requires the terminating nul character as terminator of the name when reading it back, this results in a corrupted index. Fix that by only writing ce_namelen - common bytes directly from ce->name to the index, and adding the nul terminator in an extra call to ce_write. This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in config.mak and running the test suite (t1700 specifically broke). Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06tempfile: auto-allocate tempfiles on heapJeff King
The previous commit taught the tempfile code to give up ownership over tempfiles that have been renamed or deleted. That makes it possible to use a stack variable like this: struct tempfile t; create_tempfile(&t, ...); ... if (!err) rename_tempfile(&t, ...); else delete_tempfile(&t); But doing it this way has a high potential for creating memory errors. The tempfile we pass to create_tempfile() ends up on a global linked list, and it's not safe for it to go out of scope until we've called one of those two deactivation functions. Imagine that we add an early return from the function that forgets to call delete_tempfile(). With a static or heap tempfile variable, the worst case is that the tempfile hangs around until the program exits (and some functions like setup_shallow_temporary rely on this intentionally, creating a tempfile and then leaving it for later cleanup). But with a stack variable as above, this is a serious memory error: the variable goes out of scope and may be filled with garbage by the time the tempfile code looks at it. Let's see if we can make it harder to get this wrong. Since many callers need to allocate arbitrary numbers of tempfiles, we can't rely on static storage as a general solution. So we need to turn to the heap. We could just ask all callers to pass us a heap variable, but that puts the burden on them to call free() at the right time. Instead, let's have the tempfile code handle the heap allocation _and_ the deallocation (when the tempfile is deactivated and removed from the list). This changes the return value of all of the creation functions. For the cleanup functions (delete and rename), we'll add one extra bit of safety: instead of taking a tempfile pointer, we'll take a pointer-to-pointer and set it to NULL after freeing the object. This makes it safe to double-call functions like delete_tempfile(), as the second call treats the NULL input as a noop. Several callsites follow this pattern. The resulting patch does have a fair bit of noise, as each caller needs to be converted to handle: 1. Storing a pointer instead of the struct itself. 2. Passing the pointer instead of taking the struct address. 3. Handling a "struct tempfile *" return instead of a file descriptor. We could play games to make this less noisy. For example, by defining the tempfile like this: struct tempfile { struct heap_allocated_part_of_tempfile { int fd; ...etc } *actual_data; } Callers would continue to have a "struct tempfile", and it would be "active" only when the inner pointer was non-NULL. But that just makes things more awkward in the long run. There aren't that many callers, so we can simply bite the bullet and adjust all of them. And the compiler makes it easy for us to find them all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06lockfile: do not rollback lock on failed closeJeff King
Since the lockfile code is based on the tempfile code, it has some of the same problems, including that close_lock_file() erases the tempfile's filename buf, making it hard for the caller to write a good error message. In practice this comes up less for lockfiles than for straight tempfiles, since we usually just report the refname. But there is at least one buggy case in write_ref_to_lockfile(). Besides, given the coupling between the lockfile and tempfile modules, it's less confusing if their close() functions have the same semantics. Just as the previous commit did for close_tempfile(), let's teach close_lock_file() and its wrapper close_ref() not to rollback on error. And just as before, we'll give them new "gently" names to catch any new callers that are added. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06tempfile: do not delete tempfile on failed closeJeff King
When close_tempfile() fails, we delete the tempfile and reset the fields of the tempfile struct. This makes it easier for callers to return without cleaning up, but it also makes this common pattern: if (close_tempfile(tempfile)) return error_errno("error closing %s", tempfile->filename.buf); wrong, because the "filename" field has been reset after the failed close. And it's not easy to fix, as in many cases we don't have another copy of the filename (e.g., if it was created via one of the mks_tempfile functions, and we just have the original template string). Let's drop the feature that a failed close automatically deletes the file. This puts the burden on the caller to do the deletion themselves, but this isn't that big a deal. Callers which do: if (write(...) || close_tempfile(...)) { delete_tempfile(...); return -1; } already had to call delete when the write() failed, and so aren't affected. Likewise, any caller which just calls die() in the error path is OK; we'll delete the tempfile during the atexit handler. Because this patch changes the semantics of close_tempfile() without changing its signature, all callers need to be manually checked and converted to the new scheme. This patch covers all in-tree callers, but there may be others for not-yet-merged topics. To catch these, we rename the function to close_tempfile_gently(), which will attract compile-time attention to new callers. (Technically the original could be considered "gentle" already in that it didn't die() on errors, but this one is even more so). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-27Merge branch 'kw/write-index-reduce-alloc'Junio C Hamano
We used to spend more than necessary cycles allocating and freeing piece of memory while writing each index entry out. This has been optimized. * kw/write-index-reduce-alloc: read-cache: avoid allocating every ondisk entry when writing read-cache: fix memory leak in do_write_index perf: add test for writing the index
2017-08-21read-cache: avoid allocating every ondisk entry when writingKevin Willford
When writing the index for each entry an ondisk struct will be allocated and freed in ce_write_entry. We can do better by using a ondisk struct on the stack for each entry. This is accomplished by using a stack ondisk_cache_entry_extended outside looping through the entries in do_write_index. Only the fixed fields of this struct are used when writing and depending on whether it is extended or not the flags2 field will be written. The name field is not used and instead the cache_entry name field is used directly when writing out the name. Because ce_write is using a buffer and memcpy to fill the buffer before flushing to disk, we don't have to worry about doing multiple ce_write calls. Running the p0007-write-cache.sh tests would save anywhere between 3-7% when the index had over a million entries with no performance degradation on small repos. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21read-cache: fix memory leak in do_write_indexKevin Willford
The previous_name_buf was never getting released when there was an error in ce_write_entry or allow was false and execution was returned to the caller. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21sha1_file: convert index_fd to struct object_idPatryk Obara
Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21sha1_file: convert index_path to struct object_idPatryk Obara
Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21read-cache: convert to struct object_idPatryk Obara
Replace hashcmp with oidcmp. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17use MOVE_ARRAYRené Scharfe
Simplify the code for moving members inside of an array and make it more robust by using the helper macro MOVE_ARRAY. It calculates the size based on the specified number of elements for us and supports NULL pointers when that number is zero. Raw memmove(3) calls with NULL can cause the compiler to (over-eagerly) optimize out later NULL checks. This patch was generated with contrib/coccinelle/array.cocci and spatch (Coccinelle). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-05Merge branch 'cc/shared-index-permfix'Junio C Hamano
The split index code did not honor core.sharedrepository setting correctly. * cc/shared-index-permfix: t1700: make sure split-index respects core.sharedrepository t1301: move modebits() to test-lib-functions.sh read-cache: use shared perms when writing shared index
2017-06-25read-cache: use shared perms when writing shared indexChristian Couder
Since f6ecc62dbf (write_shared_index(): use tempfile module, 2015-08-10) write_shared_index() has been using mks_tempfile() to create the temporary file that will become the shared index. But even before that, it looks like the functions used to create this file didn't call adjust_shared_perm(), which means that the shared index file has always been created with 600 permissions regardless of the shared permission settings. Because of that, on repositories created with `git init --shared=all` and using the split index feature, one gets an error like: fatal: .git/sharedindex.a52f910b489bc462f187ab572ba0086f7b5157de: index file open failed: Permission denied when another user performs any operation that reads the shared index. Call adjust_shared_perm() on the temporary file created by mks_tempfile() ourselves to adjust the permission bits. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24Merge branch 'ab/free-and-null'Junio C Hamano
A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro. * ab/free-and-null: *.[ch] refactoring: make use of the FREE_AND_NULL() macro coccinelle: make use of the "expression" FREE_AND_NULL() rule coccinelle: add a rule to make "expression" code use FREE_AND_NULL() coccinelle: make use of the "type" FREE_AND_NULL() rule coccinelle: add a rule to make "type" code use FREE_AND_NULL() git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24Merge branch 'bw/config-h'Junio C Hamano
Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h
2017-06-24Merge branch 'nd/split-index-unshare'Junio C Hamano
* nd/split-index-unshare: Revert "split-index: add and use unshare_split_index()"