summaryrefslogtreecommitdiff
path: root/repository.c
AgeCommit message (Collapse)Author
2022-03-13Merge branch 'ab/plug-random-leaks'Junio C Hamano
Plug random memory leaks. * ab/plug-random-leaks: repository.c: free the "path cache" in repo_clear() range-diff: plug memory leak in read_patches() range-diff: plug memory leak in common invocation lockfile API users: simplify and don't leak "path" commit-graph: stop fill_oids_from_packs() progress on error and free() commit-graph: fix memory leak in misused string_list API submodule--helper: fix trivial leak in module_add() transport: stop needlessly copying bundle header references bundle: call strvec_clear() on allocated strvec remote-curl.c: free memory in cmd_main() urlmatch.c: add and use a *_release() function diff.c: free "buf" in diff_words_flush() merge-base: free() allocated "struct commit **" list index-pack: fix memory leaks
2022-03-04repository.c: free the "path cache" in repo_clear()Ævar Arnfjörð Bjarmason
The "struct path_cache" added in 102de880d24 (path.c: migrate global git_path_* to take a repository argument, 2018-05-17) is only used directly by code in repository.[ch] (but populated in path.[ch]). Let's move this code to repository.[ch], and stop leaking this memory when we run repo_clear(). To avoid the cast change it from a "const char *" to a "char *". This also removes the "PATH_CACHE_INIT" macro, which has never been used for anything. For the "struct repository" we already make a hard assumption that it (and "the_repository") can be identically initialized by making it a "static" variable, so making use of a "PATH_CACHE_INIT" somewhere would have been confusing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14repo_read_index: clear SKIP_WORKTREE bit from files present in worktreeElijah Newren
The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-04Merge branch 'ns/tmp-objdir'Junio C Hamano
New interface into the tmp-objdir API to help in-core use of the quarantine feature. * ns/tmp-objdir: tmp-objdir: disable ref updates when replacing the primary odb tmp-objdir: new API for creating temporary writable databases
2021-12-08tmp-objdir: disable ref updates when replacing the primary odbNeeraj Singh
When creating a subprocess with a temporary ODB, we set the GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not to update refs, since the tmp-objdir may go away. Introduce a similar mechanism for in-process temporary ODBs when we call tmp_objdir_replace_primary_odb. Now both mechanisms set the disable_ref_updates flag on the odb, which is queried by the ref_transaction_prepare function. Peff's test case [1] was invoking ref updates via the cachetextconv setting. That particular code silently does nothing when a ref update is forbidden. See the call to notes_cache_put in fill_textconv where errors are ignored. [1] https://lore.kernel.org/git/YVOn3hDsb5pnxR53@coredump.intra.peff.net/ Reported-by: Jeff King <peff@peff.net> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-19remote: move static variables into per-repository structGlen Choo
remote.c does not works with non-the_repository because it stores its state as static variables. To support non-the_repository, we can use a per-repository struct for the remotes subsystem. Prepare for this change by defining a struct remote_state that holds the remotes subsystem state and move the static variables of remote.c into the_repository->remote_state. This introduces no behavioral or API changes. Signed-off-by: Glen Choo <chooglen@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-06Merge branch 'jt/add-submodule-odb-clean-up'Junio C Hamano
More code paths that use the hack to add submodule's object database to the set of alternate object store have been cleaned up. * jt/add-submodule-odb-clean-up: revision: remove "submodule" from opt struct repository: support unabsorbed in repo_submodule_init submodule: remove unnecessary unabsorbed fallback
2021-09-15submodule: extract path to submodule gitdir funcJonathan Tan
We currently store each submodule gitdir in ".git/modules/<name>", but this has problems with some submodule naming schemes, as described in a comment in submodule_name_to_gitdir() in this patch. Extract the determination of the location of a submodule's gitdir into its own function submodule_name_to_gitdir(). For now, the problem remains unsolved, but this puts us in a better position for finding a solution. This was motivated, at $DAYJOB, by a part of Android's repo hierarchy [1]. In particular, there is a repo "build", and several repos of the form "build/<name>". This is based on earlier work by Brandon Williams [2]. [1] https://android.googlesource.com/platform/ [2] https://lore.kernel.org/git/20180808223323.79989-2-bmwill@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09repository: support unabsorbed in repo_submodule_initJonathan Tan
In preparation for a subsequent commit that migrates code using add_submodule_odb() to repo_submodule_init(), teach repo_submodule_init() to support submodules with unabsorbed gitdirs. (See the documentation for "git submodule absorbgitdirs" for more information about absorbed and unabsorbed gitdirs.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28promisor-remote: support per-repository configJonathan Tan
Instead of using global variables to store promisor remote information, store this config in struct repository instead, and add repository-agnostic non-static functions corresponding to the existing non-static functions that only work on the_repository. The actual lazy-fetching of missing objects currently does not work on repositories other than the_repository, and will still not work after this commit, so add a BUG message explaining this. A subsequent commit will remove this limitation. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28repository: move global r_f_p_c to repo structJonathan Tan
Move repository_format_partial_clone, which is currently a global variable, into struct repository. (Full support for per-repository partial clone config will be done in a subsequent commit - this is split into its own commit because of the extent of the changes needed.) The new repo-specific variable cannot be set in check_repository_format_gently() (as is currently), because that function does not know which repo it is operating on (or even whether the value is important); therefore this responsibility is delegated to the outermost caller that knows. Of all the outermost callers that know (found by looking at all functions that call clear_repository_format()), I looked at those that either read from the main Git directory or write into a struct repository. These callers have been modified accordingly (write to the_repository in the former case and write to the given struct repository in the latter case). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-03-30sparse-index: add guard to ensure full indexDerrick Stolee
Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded. This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time. Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes. If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-14use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24repository: add repo reference to index_stateDerrick Stolee
It will be helpful to add behavior to index operations that might trigger an object lookup. Since each index belongs to a specific repository, add a 'repo' pointer to struct index_state that allows access to this repository. Add a BUG() statement if the repo already has an index, and the index already has a repo, but somehow the index points to a different repo. This will prevent future changes from needing to pass an additional 'struct repository *repo' parameter and instead rely only on the 'struct index_state *istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30repository: enable SHA-256 support by defaultbrian m. carlson
Now that we have a complete SHA-256 implementation in Git, let's enable it so people can use it. Remove the ENABLE_SHA256 define constant everywhere it's used. Add tests for initializing a repository with SHA-256. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24repository: require a build flag to use SHA-256brian m. carlson
At this point, SHA-256 support is experimental and some behavior may change. To avoid surprising unsuspecting users, require a build flag, ENABLE_SHA256, to allow use of a non-SHA-1 algorithm. Enable this flag by default when the DEVELOPER make flag is set so that contributors can test this case adequately. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10Fix spelling errors in code commentsElijah Newren
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20Merge branch 'ma/clear-repository-format'Junio C Hamano
The setup code has been cleaned up to avoid leaks around the repository_format structure. * ma/clear-repository-format: setup: fix memory leaks with `struct repository_format` setup: free old value before setting `work_tree`
2019-02-28setup: fix memory leaks with `struct repository_format`Martin Ågren
After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22trace2: create new combined trace facilityJeff Hostetler
Create a new unified tracing facility for git. The eventual intent is to replace the current trace_printf* and trace_performance* routines with a unified set of git_trace2* routines. In addition to the usual printf-style API, trace2 provides higer-level event verbs with fixed-fields allowing structured data to be written. This makes post-processing and analysis easier for external tools. Trace2 defines 3 output targets. These are set using the environment variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT". These may be set to "1" or to an absolute pathname (just like the current GIT_TRACE). * GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command summary data. * GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE. It extends the output with columns for the command process, thread, repo, absolute and relative elapsed times. It reports events for child process start/stop, thread start/stop, and per-thread function nesting. * GIT_TR2_EVENT is a new structured format. It writes event data as a series of JSON records. Calls to trace2 functions log to any of the 3 output targets enabled without the need to call different trace_printf* or trace_performance* routines. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-07Merge branch 'nd/the-index-final'Junio C Hamano
The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument
2019-01-29Merge branch 'sb/submodule-recursive-fetch-gets-the-tip'Junio C Hamano
"git fetch --recurse-submodules" may not fetch the necessary commit that is bound to the superproject, which is getting corrected. * sb/submodule-recursive-fetch-gets-the-tip: fetch: ensure submodule objects fetched submodule.c: fetch in submodules git directory instead of in worktree submodule: migrate get_next_submodule to use repository structs repository: repo_submodule_init to take a submodule struct submodule: store OIDs in changed_submodule_names submodule.c: tighten scope of changed_submodule_names struct submodule.c: sort changed_submodule_names before searching it submodule.c: fix indentation sha1-array: provide oid_array_filter
2019-01-24cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switchNguyễn Thái Ngọc Duy
By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14repository.c: replace hold_locked_index() with repo_hold_locked_index()Nguyễn Thái Ngọc Duy
hold_locked_index() assumes the index path at $GIT_DIR/index. This is not good for places that take an arbitrary index_state instead of the_index, which is basically everywhere except builtin/. Replace it with repo_hold_locked_index(). hold_locked_index() remains as a wrapper around repo_hold_locked_index() to reduce changes in builtin/ Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-05repository: repo_submodule_init to take a submodule structStefan Beller
When constructing a struct repository for a submodule for some revision of the superproject where the submodule is not contained in the index, it may not be present in the working tree currently either. In that situation giving a 'path' argument is not useful. Upgrade the repo_submodule_init function to take a struct submodule instead. The submodule struct can be obtained via submodule_from_{path, name} or an artificial submodule struct can be passed in. While we are at it, rename the repository struct in the repo_submodule_init function, which is to be initialized, to a name that is not confused with the struct submodule as easily. Perform such renames in similar functions as well. Also move its documentation into the header file. Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13sha1-file: use an object_directory for the main object dirJeff King
Our handling of alternate object directories is needlessly different from the main object directory. As a result, many places in the code basically look like this: do_something(r->objects->objdir); for (odb = r->objects->alt_odb_list; odb; odb = odb->next) do_something(odb->path); That gets annoying when do_something() is non-trivial, and we've resorted to gross hacks like creating fake alternates (see find_short_object_filename()). Instead, let's give each raw_object_store a unified list of object_directory structs. The first will be the main store, and everything after is an alternate. Very few callers even care about the distinction, and can just loop over the whole list (and those who care can just treat the first element differently). A few observations: - we don't need r->objects->objectdir anymore, and can just mechanically convert that to r->objects->odb->path - object_directory's path field needs to become a real pointer rather than a FLEX_ARRAY, in order to fill it with expand_base_dir() - we'll call prepare_alt_odb() earlier in many functions (i.e., outside of the loop). This may result in us calling it even when our function would be satisfied looking only at the main odb. But this doesn't matter in practice. It's not a very expensive operation in the first place, and in the majority of cases it will be a noop. We call it already (and cache its results) in prepare_packed_git(), and we'll generally check packs before loose objects. So essentially every program is going to call it immediately once per program. Arguably we should just prepare_alt_odb() immediately upon setting up the repository's object directory, which would save us sprinkling calls throughout the code base (and forgetting to do so has been a source of subtle bugs in the past). But I've stopped short of that here, since there are already a lot of other moving parts in this patch. - Most call sites just get shorter. The check_and_freshen() functions are an exception, because they have entry points to handle local and nonlocal directories separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-25Merge branch 'sb/object-store-alloc'Junio C Hamano
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-alloc: alloc: allow arbitrary repositories for alloc functions object: allow create_object to handle arbitrary repositories object: allow grow_object_hash to handle arbitrary repositories alloc: add repository argument to alloc_commit_index alloc: add repository argument to alloc_report alloc: add repository argument to alloc_object_node alloc: add repository argument to alloc_tag_node alloc: add repository argument to alloc_commit_node alloc: add repository argument to alloc_tree_node alloc: add repository argument to alloc_blob_node object: add repository argument to grow_object_hash object: add repository argument to create_object repository: introduce parsed objects field
2018-05-30Merge branch 'nd/repo-clear-keep-the-index'Junio C Hamano
the_repository->index is not a allocated piece of memory but repo_clear() indiscriminately attempted to free(3) it, which has been corrected. * nd/repo-clear-keep-the-index: repository: fix free problem with repo_clear(the_repository)
2018-05-10repository: fix free problem with repo_clear(the_repository)Nguyễn Thái Ngọc Duy
the_repository is special. One of the special things about it is that it does not allocate a new index_state object like submodules but points to the global the_index variable instead. As a global variable, the_index cannot be free()'d. Add an exception for this in repo_clear(). In the future perhaps we would be able to allocate the_repository's index on heap too. Then we can revert this. the_repository->index remains pointed to a clean the_index even after repo_clear() so that it could still be used next time (e.g. in a crazy use case where a dev switches repo in the same process). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09repository: introduce parsed objects fieldStefan Beller
Convert the existing global cache for parsed objects (obj_hash) into repository-specific parsed object caches. Existing code that uses obj_hash are modified to use the parsed object cache of the_repository; future patches will use the parsed object caches of other repositories. Another future use case for a pool of objects is ease of memory management in revision walking: If we can free the rev-list related memory early in pack-objects (e.g. part of repack operation) then it could lower memory pressure significantly when running on large repos. While this has been discussed on the mailing list lately, this series doesn't implement this. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-29submodule: fixup nested submodules after moving the submoduleStefan Beller
connect_work_tree_and_git_dir is used to connect a submodule worktree with its git directory and vice versa after events that require a reconnection such as moving around the working tree. As submodules can have nested submodules themselves, we'd also want to fix the nested submodules when asked to. Add an option to recurse into the nested submodules and connect them as well. As submodules are identified by their name (which determines their git directory in relation to their superproject's git directory) internally and by their path in the working tree of the superproject, we need to make sure that the mapping of name <-> path is kept intact. We can do that in the git-mv command by writing out the gitmodules file first and then forcing a reload of the submodule config machinery. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-29submodule-config: remove submodule_from_cacheStefan Beller
This continues the story of bf12fcdf5e (submodule-config: store the_submodule_cache in the_repository, 2017-06-22). The previous patch taught submodule_from_path to take a repository into account, such that submodule_from_{path, cache} are the same now. Remove submodule_from_cache, migrating all its callers to submodule_from_path. Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-23repository: introduce raw object store fieldStefan Beller
The raw object store field will contain any objects needed for access to objects in a given repository. This patch introduces the raw object store and populates it with the `objectdir`, which used to be part of the repository struct. As the struct gains members, we'll also populate the function to clear the memory for these members. In a later step, we'll introduce a struct object_parser, that will complement the object parsing in a repository struct: The raw object parser is the layer that will provide access to raw object content, while the higher level object parser code will parse raw objects and keeps track of parenthood and other object relationships using 'struct object'. For now only add the lower level to the repository struct. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05repository: delete ignore_env memberNguyễn Thái Ngọc Duy
This variable was added because the repo_set_gitdir() was created to cover both submodule and main repos, but these two are initialized a bit differently so ignore_env == 0 means main repo, while ignore_env != 0 is submodules. Since the difference part (env variables) has been moved out of repo_set_gitdir(), this function works the same way for both repo types and ignore_env is not needed anymore. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05sha1_file.c: move delayed getenv(altdb) back to setup_git_env()Nguyễn Thái Ngọc Duy
getenv() is supposed to work on the main repository only. This delayed getenv() code in sha1_file.c makes it more difficult to convert sha1_file.c to a generic object store that could be used by both submodule and main repositories. Move the getenv() back in setup_git_env() where other env vars are also fetched. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05repository.c: delete dead functionsNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05repository.c: move env-related setup code back to environment.cNguyễn Thái Ngọc Duy
It does not make sense that generic repository code contains handling of environment variables, which are specific for the main repository only. Refactor repo_set_gitdir() function to take $GIT_DIR and optionally _all_ other customizable paths. These optional paths can be NULL and will be calculated according to the default directory layout. Note that some dead functions are left behind to reduce diff noise. They will be deleted in the next patch. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05repository: initialize the_repository in main()Nguyễn Thái Ngọc Duy
This simplifies initialization of struct repository and anything inside. Easier to read. Easier to add/remove fields. Everything will go through main() common-main.c so this should cover all programs, including t/helper. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13Merge branch 'tg/split-index-fixes'Junio C Hamano
The split-index mode had a few corner case bugs fixed. * tg/split-index-fixes: travis: run tests with GIT_TEST_SPLIT_INDEX split-index: don't write cache tree with null oid entries read-cache: fix reading the shared index for other repos
2018-01-19repository: pre-initialize hash algo pointerbrian m. carlson
There are various git subcommands (among them, clone) which don't set up the repository (that is, they lack RUN_SETUP or RUN_SETUP_GENTLY) but end up needing to have information about the hash algorithm in use. Because the hash algorithm is part of struct repository and it's only initialized in repository setup, we can end up dereferencing a NULL pointer in some cases if we call one of these subcommands and look up the empty blob or empty tree values. A "git clone" of a project that has two paths that differ only in case suffers from this if it is run on a case insensitive platform. When the command attempts to check out one of these two paths after checking out the other one, the checkout codepath needs to see if the version that is already on the filesystem (which should not happen if the FS were case sensitive) is dirty, and it needs to exercise the hashing code at that point. In the future, we can add a command line option for this or read it from the configuration, but until we're ready to expose that functionality to the user, simply initialize the repository structure to use the current hash algorithm, SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19read-cache: fix reading the shared index for other reposThomas Gummerer
read_index_from() takes a path argument for the location of the index file. For reading the shared index in split index mode however it just ignores that path argument, and reads it from the gitdir of the current repository. This works as long as an index in the_repository is read. Once that changes, such as when we read the index of a submodule, or of a different working tree than the current one, the gitdir of the_repository will no longer contain the appropriate shared index, and git will fail to read it. For example t3007-ls-files-recurse-submodules.sh was broken with GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also broken in a similar manner, probably by introducing struct repository there, although I didn't track down the exact commit for that. be489d02d2 ("revision.c: --indexed-objects add objects from all worktrees", 2017-08-23) breaks with split index mode in a similar manner, not erroring out when it can't read the index, but instead carrying on with pruning, without taking the index of the worktree into account. Fix this by passing an additional gitdir parameter to read_index_from, to indicate where it should look for and read the shared index from. read_cache_from() defaults to using the gitdir of the_repository. As it is mostly a convenience macro, having to pass get_git_dir() for every call seems overkill, and if necessary users can have more control by using read_index_from(). Helped-by: Brandon Williams <bmwill@google.com> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-28repository: fix a sparse 'using integer as NULL pointer' warningRamsay Jones
Commit 78a6766802 ("Integrate hash algorithm support with repo setup", 2017-11-12) added a 'const struct git_hash_algo *hash_algo' field to the repository structure, without modifying the initializer of the 'the_repo' variable. This does not actually introduce a bug, since the '0' initializer for the 'ignore_env:1' bit-field is interpreted as a NULL pointer (hence the warning), and the final field (now with no initializer) receives a default '0'. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-13Integrate hash algorithm support with repo setupbrian m. carlson
In future versions of Git, we plan to support an additional hash algorithm. Integrate the enumeration of hash algorithms with repository setup, and store a pointer to the enumerated data in struct repository. Of course, we currently only support SHA-1, so hard-code this value in read_repository_format. In the future, we'll enumerate this value from the configuration. Add a constant, the_hash_algo, which points to the hash_algo structure pointer in the repository global. Note that this is the hash which is used to serialize data to disk, not the hash which is used to display items to the user. The transition plan anticipates that these may be different. We can add an additional element in the future (say, ui_hash_algo) to provide for this case. Include repository.h in cache.h since we now need to have access to these struct and variable definitions. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-02repository: use FREE_AND_NULLRené Scharfe
Use the macro FREE_AND_NULL to release allocated objects and clear their pointers. This is shorter and documents the intent better by combining the two related operations into one. Patch generated with Coccinelle and contrib/coccinelle/free.cocci. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06set_git_dir: handle feeding gitdir to itselfJeff King
Ideally we'd free the existing gitdir field before assigning the new one, to avoid a memory leak. But we can't do so safely because some callers do the equivalent of: set_git_dir(get_git_dir()); We can detect that case as a noop, but there are even more complicated cases like: set_git_dir(remove_leading_path(worktree, get_git_dir()); where we really do need to do some work, but the original string must remain valid. Rather than put the burden on callers to make a copy of the string (only to free it later, since we'll make a copy of it ourselves), let's solve the problem inside set_git_dir(). We can make a copy of the pointer for the old gitdir, and then avoid freeing it until after we've made our new copy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06repository: free fields before overwriting themJeff King
It's possible that the repository data may be initialized twice (e.g., after doing a chdir() to the top of the worktree we may have to adjust a relative git_dir path). We should free() any existing fields before assigning to them to avoid leaks. This should be safe, as the fields are set based on the environment or on other strings like the gitdir or commondir. That makes it impossible that we are feeding an alias to the just-freed string. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-22Merge branch 'bw/grep-recurse-submodules'Junio C Hamano
"git grep --recurse-submodules" has been reworked to give a more consistent output across submodule boundary (and do its thing without having to fork a separate process). * bw/grep-recurse-submodules: grep: recurse in-process using 'struct repository' submodule: merge repo_read_gitmodules and gitmodules_config submodule: check for unmerged .gitmodules outside of config parsing submodule: check for unstaged .gitmodules outside of config parsing submodule: remove fetch.recursesubmodules from submodule-config parsing submodule: remove submodule.fetchjobs from submodule-config parsing config: add config_from_gitmodules cache.h: add GITMODULES_FILE macro repository: have the_repository use the_index repo_read_index: don't discard the index
2017-07-18repository: have the_repository use the_indexBrandon Williams
Have the index state which is stored in 'the_repository' be a pointer to the in-core index 'the_index'. This makes it easier to begin transitioning more parts of the code base to operate on a 'struct repository'. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-18repo_read_index: don't discard the indexBrandon Williams
Have 'repo_read_index()' behave more like the other read_index family of functions and don't discard the index if it has already been populated and instead rely on the quick return of read_index_from which has: /* istate->initialized covers both .git/index and .git/sharedindex.xxx */ if (istate->initialized) return istate->cache_nr; Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>