summaryrefslogtreecommitdiff
path: root/refs.c
AgeCommit message (Collapse)Author
2024-03-05Merge branch 'kn/for-all-refs'Junio C Hamano
"git for-each-ref" learned "--include-root-refs" option to show even the stuff outside the 'refs/' hierarchy. * kn/for-all-refs: for-each-ref: add new option to include root refs ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR' refs: introduce `refs_for_each_include_root_refs()` refs: extract out `loose_fill_ref_dir_regular_file()` refs: introduce `is_pseudoref()` and `is_headref()`
2024-03-05Merge branch 'jk/reflog-special-cases-fix'Junio C Hamano
The logic to access reflog entries by date and number had ugly corner cases at the boundaries, which have been cleaned up. * jk/reflog-special-cases-fix: read_ref_at(): special-case ref@{0} for an empty reflog get_oid_basic(): special-case ref@{n} for oldest reflog entry Revert "refs: allow @{n} to work with n-sized reflog"
2024-02-26read_ref_at(): special-case ref@{0} for an empty reflogJeff King
The previous commit special-cased get_oid_basic()'s handling of ref@{n} for a reflog with n entries. But its special case doesn't work for ref@{0} in an empty reflog, because read_ref_at() dies when it notices the empty reflog! We can make this work by special-casing this in read_ref_at(). It's somewhat gross, for two reasons: 1. We have no reflog entry to describe in the "msg" out-parameter. So we have to leave it uninitialized or make something up. 2. Likewise, we have no oid to put in the "oid" out-parameter. Leaving it untouched is actually the best thing here, as all of the callers will have initialized it with the current ref value via repo_dwim_log(). This is rather subtle, but it is how things worked in 6436a20284 (refs: allow @{n} to work with n-sized reflog, 2021-01-07) before we reverted it. The key difference from 6436a20284 here is that we'll return "1" to indicate that we _didn't_ find the requested reflog entry. Coupled with the special-casing in get_oid_basic() in the previous commit, that's enough to make looking up ref@{0} work, and we can flip 6436a20284's test back to expect_success. It also means that the call in show-branch which segfaulted with 6436a20284 (and which is now tested in t3202) remains OK. The caller notices that we could not find any reflog entry, and so it breaks out of its loop, showing nothing. This is different from the current behavior of producing an error, but it's just as reasonable (and is exactly what we'd do if you asked it to walk starting at ref@{1} but there was only 1 entry). Thus nobody should actually look at the reflog entry info we return. But we'll still put in some fake values just to be on the safe side, since this is such a subtle and confusing interface. Likewise, we'll document what's going on in a comment above the function declaration. If this were a function with a lot of callers, the footgun would probably not be worth it. But it has only ever had two callers in its 18-year existence, and it seems unlikely to grow more. So let's hold our noses and let users enjoy the convenience of a simulated ref@{0}. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26get_oid_basic(): special-case ref@{n} for oldest reflog entryJeff King
The goal of 6436a20284 (refs: allow @{n} to work with n-sized reflog, 2021-01-07) was that if we have "n" entries in a reflog, we should still be able to resolve ref@{n} by looking at the "old" value of the oldest entry. Commit 6436a20284 tried to put the logic into read_ref_at() by shifting its idea of "n" by one. But we reverted that in the previous commit, since it led to bugs in other callers which cared about the details of the reflog entry we found. Instead, let's put the special case into the caller that resolves @{n}, as it cares only about the oid. read_ref_at() is even kind enough to return the "old" value from the final reflog; it just returns "1" to signal to us that we ran off the end of the reflog. But we can notice in the caller that we read just enough records for that "old" value to be the one we're looking for, and use it. Note that read_ref_at() could notice this case, too, and just return 0. But we don't want to do that, because the caller must be made aware that we only found the oid, not an actual reflog entry (and the call sites in show-branch do care about this). There is one complication, though. When read_ref_at() hits a truncated reflog, it will return the "old" value of the oldest entry only if it is not the null oid. Otherwise, it actually returns the "new" value from that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a ref created recently will produce the initial value as a convenience (even though technically it did not exist 20 years ago). But this convenience is only useful for time-based cutoffs. For count-based cutoffs, get_oid_basic() has always simply complained about going too far back: $ git rev-parse HEAD@{20} fatal: log for 'HEAD' only has 16 entries and we should continue to do so, rather than returning a nonsense value (there's even a test in t1508 already which covers this). So let's have the d1a4489a56 code kick in only when doing timestamp-based cutoffs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26Revert "refs: allow @{n} to work with n-sized reflog"Jeff King
This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526. The idea of that commit is that if read_ref_at() is counting back to the Nth reflog but the reflog is short by one entry (e.g., because it was pruned), we can find the oid of the missing entry by looking at the "before" oid value of the entry that comes after it (whereas before, we looked at the "after" value of each entry and complained that we couldn't find the one from before the truncation). This works fine for resolving the oid of ref@{n}, as it is used by get_oid_basic(), which does not look at any other aspect of the reflog we found (e.g., its timestamp or message). But there's another caller of read_ref_at(): in show-branch we use it to walk over the reflog, and we do care about the reflog entry. And so that commit broke "show-branch --reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1} as ref@{2}, and so on. For example, in the new test in t3202 we produce: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (0 seconds ago) commit: three ! [branch@{2}] (60 seconds ago) commit: two ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^ instead of the correct: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (60 seconds ago) commit: two ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^ ! [branch@{3}] (2 minutes ago) commit: one But there's another bug, too: because it is looking at the "old" value of the reflog after the one we're interested in, it has to special-case ref@{0} (since there isn't anything after it). That's why it doesn't show the offset bug in the output above. But this special-case code fails to handle the situation where the reflog is empty or missing; it returns success even though the reflog message out-parameter has been left uninitialized. You can't trigger this through get_oid_basic(), but "show-branch --reflog" will pretty reliably segfault as it tries to access the garbage pointer. Fixing the segfault would be pretty easy. But the off-by-one problem is inherent in this approach. So let's start by reverting the commit to give us a clean slate to work with. This isn't a pure revert; all of the code changes are reverted, but for the tests: 1. We'll flip the cases in t1508 to expect_failure; making these work was the goal of 6436a2028, and we'll want to use them for our replacement approach. 2. There's a test in t3202 for "show-branch --reflog", but it expects the broken output! It was added by f2463490c4 (show-branch: show reflog message, 2021-12-02) which was fixing another bug, and I think the author simply didn't notice that the second line showed the wrong reflog. Rather than fixing that test, let's replace it with one that is more thorough (while still covering the reflog message fix from that commit). We'll use a longer reflog, which lets us see more entries (thus making the "off by one" pattern much more clear). And we'll use a more recent timestamp for "now" so that our relative dates have more resolution. That lets us see that the reflog dates are correct (whereas when you are 4 years away, two entries that are 60 seconds apart will have the same "4 years ago" relative date). Because we're adjusting the repository state, I've moved this new test to the end of the script, leaving the other tests undisturbed. We'll also add a new test which covers the missing reflog case; previously it segfaulted, but now it reports the empty reflog). Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23refs: introduce `refs_for_each_include_root_refs()`Karthik Nayak
Introduce a new ref iteration flag `DO_FOR_EACH_INCLUDE_ROOT_REFS`, which will be used to iterate over regular refs plus pseudorefs and HEAD. Refs which fall outside the `refs/` and aren't either pseudorefs or HEAD are more of a grey area. This is because we don't block the users from creating such refs but they are not officially supported. Introduce `refs_for_each_include_root_refs()` which calls `do_for_each_ref()` with this newly introduced flag. In `refs/files-backend.c`, introduce a new function `add_pseudoref_and_head_entries()` to add pseudorefs and HEAD to the `ref_dir`. We then finally call `add_pseudoref_and_head_entries()` whenever the `DO_FOR_EACH_INCLUDE_ROOT_REFS` flag is set. Any new ref backend will also have to implement similar changes on its end. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23refs: introduce `is_pseudoref()` and `is_headref()`Karthik Nayak
Introduce two new functions `is_pseudoref()` and `is_headref()`. This provides the necessary functionality for us to add pseudorefs and HEAD to the loose ref cache in the files backend, allowing us to build tooling to print these refs. The `is_pseudoref()` function internally calls `is_pseudoref_syntax()` but adds onto it by also checking to ensure that the pseudoref either ends with a "_HEAD" suffix or matches a list of exceptions. After which we also parse the contents of the pseudoref to ensure that it conforms to the ref format. We cannot directly add the new syntax checks to `is_pseudoref_syntax()` because the function is also used by `is_current_worktree_ref()` and making it stricter to match only known pseudorefs might have unintended consequences due to files like 'BISECT_START' which isn't a pseudoref but sometimes contains object ID. Keeping this in mind, we leave `is_pseudoref_syntax()` as is and create `is_pseudoref()` which is stricter. Ideally we'd want to move the new syntax checks to `is_pseudoref_syntax()` but a prerequisite for this would be to actually remove the exception list by converting those pseudorefs to also contain a '_HEAD' suffix and perhaps move bisect related files like 'BISECT_START' to a new directory similar to the 'rebase-merge' directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: drop unused params from the reflog iterator callbackPatrick Steinhardt
The ref and reflog iterators share much of the same underlying code to iterate over the corresponding entries. This results in some weird code because the reflog iterator also exposes an object ID as well as a flag to the callback function. Neither of these fields do refer to the reflog though -- they refer to the corresponding ref with the same name. This is quite misleading. In practice at least the object ID cannot really be implemented in any other way as a reflog does not have a specific object ID in the first place. This is further stressed by the fact that none of the callbacks except for our test helper make use of these fields. Split up the infrastucture so that ref and reflog iterators use separate callback signatures. This allows us to drop the nonsensical fields from the reflog iterator. Note that internally, the backends still use the same shared infra to iterate over both types. As the backends should never end up being called directly anyway, this is not much of a problem and thus kept as-is for simplicity's sake. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: always treat iterators as orderedPatrick Steinhardt
In the preceding commit we have converted the reflog iterator of the "files" backend to be ordered, which was the only remaining ref iterator that wasn't ordered. Refactor the ref iterator infrastructure so that we always assume iterators to be ordered, thus simplifying the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07refs: introduce reftable backendPatrick Steinhardt
Due to scalability issues, Shawn Pearce has originally proposed a new "reftable" format more than six years ago [1]. Initially, this new format was implemented in JGit with promising results. Around two years ago, we have then added the "reftable" library to the Git codebase via a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have landed all the low-level code to read and write reftables. Notably missing though was the integration of this low-level code into the Git code base in the form of a new ref backend that ties all of this together. This gap is now finally closed by introducing a new "reftable" backend into the Git codebase. This new backend promises to bring some notable improvements to Git repositories: - It becomes possible to do truly atomic writes where either all refs are committed to disk or none are. This was not possible with the "files" backend because ref updates were split across multiple loose files. - The disk space required to store many refs is reduced, both compared to loose refs and packed-refs. This is enabled both by the reftable format being a binary format, which is more compact, and by prefix compression. - We can ignore filesystem-specific behaviour as ref names are not encoded via paths anymore. This means there is no need to handle case sensitivity on Windows systems or Unicode precomposition on macOS. - There is no need to rewrite the complete refdb anymore every time a ref is being deleted like it was the case for packed-refs. This means that ref deletions are now constant time instead of scaling linearly with the number of refs. - We can ignore file/directory conflicts so that it becomes possible to store both "refs/heads/foo" and "refs/heads/foo/bar". - Due to this property we can retain reflogs for deleted refs. We have previously been deleting reflogs together with their refs to avoid file/directory conflicts, which is not necessary anymore. - We can properly enumerate all refs. With the "files" backend it is not easily possible to distinguish between refs and non-refs because they may live side by side in the gitdir. Not all of these improvements are realized with the current "reftable" backend implementation. At this point, the new backend is supposed to be a drop-in replacement for the "files" backend that is used by basically all Git repositories nowadays. It strives for 1:1 compatibility, which means that a user can expect the same behaviour regardless of whether they use the "reftable" backend or the "files" backend for most of the part. Most notably, this means we artificially limit the capabilities of the "reftable" backend to match the limits of the "files" backend. It is not possible to create refs that would end up with file/directory conflicts, we do not retain reflogs, we perform stricter-than-necessary checks. This is done intentionally due to two main reasons: - It makes it significantly easier to land the "reftable" backend as tests behave the same. It would be tough to argue for each and every single test that doesn't pass with the "reftable" backend. - It ensures compatibility between repositories that use the "files" backend and repositories that use the "reftable" backend. Like this, hosters can migrate their repositories to use the "reftable" backend without causing issues for clients that use the "files" backend in their clones. It is expected that these artificial limitations may eventually go away in the long term. Performance-wise things very much depend on the actual workload. The following benchmarks compare the "files" and "reftable" backends in the current version: - Creating N refs in separate transactions shows that the "files" backend is ~50% faster. This is not surprising given that creating a ref only requires us to create a single loose ref. The "reftable" backend will also perform auto compaction on updates. In real-world workloads we would likely also want to perform pack loose refs, which would likely change the picture. Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1) Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms] Range (min … max): 1.8 ms … 4.3 ms 133 runs Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1) Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms] Range (min … max): 2.4 ms … 2.9 ms 132 runs Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000) Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s] Range (min … max): 1.969 s … 1.980 s 3 runs Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000) Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s] Range (min … max): 2.597 s … 2.622 s 3 runs Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000) Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s] Range (min … max): 198.189 s … 198.670 s 3 runs Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000) Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s] Range (min … max): 290.223 s … 298.761 s 3 runs - Creating N refs in a single transaction shows that the "files" backend is significantly slower once we start to write many refs. The "reftable" backend only needs to update two files, whereas the "files" backend needs to write one file per ref. Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms] Range (min … max): 1.8 ms … 2.6 ms 151 runs Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1) Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms] Range (min … max): 2.4 ms … 3.4 ms 148 runs Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000) Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms] Range (min … max): 148.5 ms … 167.8 ms 15 runs Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000) Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms] Range (min … max): 56.3 ms … 72.9 ms 40 runs Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000) Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s] Range (min … max): 152.165 s … 153.542 s 3 runs Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000) Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s] Range (min … max): 51.769 s … 52.012 s 3 runs - Deleting a ref in a fully-packed repository shows that the "files" backend scales with the number of refs. The "reftable" backend has constant-time deletions. Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1) Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms] Range (min … max): 1.6 ms … 2.1 ms 316 runs Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms] Range (min … max): 1.7 ms … 2.1 ms 294 runs Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000) Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms] Range (min … max): 1.9 ms … 2.5 ms 287 runs Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms] Range (min … max): 1.8 ms … 2.1 ms 217 runs Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000) Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms] Range (min … max): 224.6 ms … 245.2 ms 6 runs Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms] Range (min … max): 2.0 ms … 2.1 ms 3 runs - Listing all refs shows no significant advantage for either of the backends. The "files" backend is a bit faster, but not by a significant margin. When repositories are not packed the "reftable" backend outperforms the "files" backend because the "reftable" backend performs auto-compaction. Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1729 runs Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.8 ms 1816 runs Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true) Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms] Range (min … max): 4.1 ms … 4.6 ms 645 runs Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms] Range (min … max): 4.2 ms … 5.9 ms 643 runs Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true) Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s] Range (min … max): 2.511 s … 2.627 s 10 runs Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true) Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s] Range (min … max): 2.692 s … 2.752 s 10 runs Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.9 ms 1834 runs Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.0 ms 1840 runs Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false) Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms] Range (min … max): 13.3 ms … 14.5 ms 208 runs Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms] Range (min … max): 4.3 ms … 6.2 ms 624 runs Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false) Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s] Range (min … max): 11.965 s … 12.370 s 10 runs Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false) Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s] Range (min … max): 2.769 s … 2.836 s 10 runs - Printing a single ref shows no real difference between the "files" and "reftable" backends. Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms] Range (min … max): 1.4 ms … 1.8 ms 1779 runs Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.5 ms 1753 runs Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 1.9 ms 1840 runs Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1831 runs Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1848 runs Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1762 runs So overall, performance depends on the usecases. Except for many sequential writes the "reftable" backend is roughly on par or significantly faster than the "files" backend though. Given that the "files" backend has received 18 years of optimizations by now this can be seen as a win. Furthermore, we can expect that the "reftable" backend will grow faster over time when attention turns more towards optimizations. The complete test suite passes, except for those tests explicitly marked to require the REFFILES prerequisite. Some tests in t0610 are marked as failing because they depend on still-in-flight bug fixes. Tests can be run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT environment variable to "reftable". There is a single known conceptual incompatibility with the dumb HTTP transport. As "info/refs" SHOULD NOT contain the HEAD reference, and because the "HEAD" file is not valid anymore, it is impossible for the remote client to figure out the default branch without changing the protocol. This shortcoming needs to be handled in a subsequent patch series. As the reftable library has already been introduced a while ago, this commit message will not go into the details of how exactly the on-disk format works. Please refer to our preexisting technical documentation at Documentation/technical/reftable for this. [1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/ Original-idea-by: Shawn Pearce <spearce@spearce.org> Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-30Merge branch 'ps/not-so-many-refs-are-special'Junio C Hamano
Define "special ref" as a very narrow set that consists of FETCH_HEAD and MERGE_HEAD, and clarify everything else that used to be classified as such are actually just pseudorefs. * ps/not-so-many-refs-are-special: Documentation: add "special refs" to the glossary refs: redefine special refs refs: convert MERGE_AUTOSTASH to become a normal pseudo-ref sequencer: introduce functions to handle autostashes via refs refs: convert AUTO_MERGE to become a normal pseudo-ref sequencer: delete REBASE_HEAD in correct repo when picking commits sequencer: clean up pseudo refs with REF_NO_DEREF
2024-01-26Merge branch 'ps/worktree-refdb-initialization'Junio C Hamano
Instead of manually creating refs/ hierarchy on disk upon a creation of a secondary worktree, which is only usable via the files backend, use the refs API to populate it. * ps/worktree-refdb-initialization: builtin/worktree: create refdb via ref backend worktree: expose interface to look up worktree by name builtin/worktree: move setup of commondir file earlier refs/files: skip creation of "refs/{heads,tags}" for worktrees setup: move creation of "refs/" into the files backend refs: prepare `refs_init_db()` for initializing worktree refs
2024-01-19refs: redefine special refsPatrick Steinhardt
Now that our list of special refs really only contains refs which have actually-special semantics, let's redefine what makes a special ref. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-19refs: convert MERGE_AUTOSTASH to become a normal pseudo-refPatrick Steinhardt
Similar to the preceding conversion of the AUTO_MERGE pseudo-ref, let's convert the MERGE_AUTOSTASH ref to become a normal pseudo-ref as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-19refs: convert AUTO_MERGE to become a normal pseudo-refPatrick Steinhardt
In 70c70de616 (refs: complete list of special refs, 2023-12-14) we have inrtoduced a new `is_special_ref()` function that classifies some refs as being special. The rule is that special refs are exclusively read and written via the filesystem directly, whereas normal refs exclucsively go via the refs API. The intent of that commit was to record the status quo so that we know to route reads of such special refs consistently. Eventually, the list should be reduced to its bare minimum of refs which really are special, namely FETCH_HEAD and MERGE_HEAD. Follow up on this promise and convert the AUTO_MERGE ref to become a normal pseudo-ref by using the refs API to both read and write it instead of accessing the filesystem directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-16Merge branch 'ps/refstorage-extension'Junio C Hamano
Introduce a new extension "refstorage" so that we can mark a repository that uses a non-default ref backend, like reftable. * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq
2024-01-08refs: prepare `refs_init_db()` for initializing worktree refsPatrick Steinhardt
The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02Merge branch 'ps/pseudo-refs'Junio C Hamano
Assorted changes around pseudoref handling. * ps/pseudo-refs: bisect: consistently write BISECT_EXPECTED_REV via the refdb refs: complete list of special refs refs: propagate errno when reading special refs fails wt-status: read HEAD and ORIG_HEAD via the refdb
2024-01-02setup: start tracking ref storage formatPatrick Steinhardt
In order to discern which ref storage format a repository is supposed to use we need to start setting up and/or discovering the format. This needs to happen in two separate code paths. - The first path is when we create a repository via `init_db()`. When we are re-initializing a preexisting repository we need to retain the previously used ref storage format -- if the user asked for a different format then this indicates an error and we error out. Otherwise we either initialize the repository with the format asked for by the user or the default format, which currently is the "files" backend. - The second path is when discovering repositories, where we need to read the config of that repository. There is not yet any way to configure something other than the "files" backend, so we can just blindly set the ref storage format to this backend. Wire up this logic so that we have the ref storage format always readily available when needed. As there is only a single backend and because it is not configurable we cannot yet verify that this tracking works as expected via tests, but tests will be added in subsequent commits. To countermand this ommission now though, raise a BUG() in case the ref storage format is not set up properly in `ref_store_init()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02refs: refactor logic to look up storage backendsPatrick Steinhardt
In order to look up ref storage backends, we're currently using a linked list of backends, where each backend is expected to set up its `next` pointer to the next ref storage backend. This is kind of a weird setup as backends need to be aware of other backends without much of a reason. Refactor the code so that the array of backends is centrally defined in "refs.c", where each backend is now identified by an integer constant. Expose functions to translate from those integer constants to the name and vice versa, which will be required by subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14bisect: consistently write BISECT_EXPECTED_REV via the refdbPatrick Steinhardt
We're inconsistently writing BISECT_EXPECTED_REV both via the filesystem and via the refdb, which violates the newly established rules for how special refs must be treated. This works alright in practice with the reffiles reference backend, but will cause bugs once we gain additional backends. Fix this issue and consistently write BISECT_EXPECTED_REV via the refdb so that it is no longer a special ref. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14refs: complete list of special refsPatrick Steinhardt
We have some references that are more special than others. The reason for them being special is that they either do not follow the usual format of references, or that they are written to the filesystem directly by the respective owning subsystem and thus circumvent the reference backend. This works perfectly fine right now because the reffiles backend will know how to read those refs just fine. But with the prospect of gaining a new reference backend implementation we need to be a lot more careful here: - We need to make sure that we are consistent about how those refs are written. They must either always be written via the filesystem, or they must always be written via the reference backend. Any mixture will lead to inconsistent state. - We need to make sure that such special refs are always handled specially when reading them. We're already mostly good with regard to the first item, except for `BISECT_EXPECTED_REV` which will be addressed in a subsequent commit. But the current list of special refs is missing some refs that really should be treated specially. Right now, we only treat `FETCH_HEAD` and `MERGE_HEAD` specially here. Introduce a new function `is_special_ref()` that contains all current instances of special refs to fix the reading path. Note that this is only a temporary measure where we record and rectify the current state. Ideally, the list of special refs should in the end only contain `FETCH_HEAD` and `MERGE_HEAD` again because they both may reference multiple objects and can contain annotations, so they indeed are special. Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14refs: propagate errno when reading special refs failsPatrick Steinhardt
Some refs in Git are more special than others due to reasons explained in the next commit. These refs are read via `refs_read_special_head()`, but this function doesn't behave the same as when we try to read a normal ref. Most importantly, we do not propagate `failure_errno` in the case where the reference does not exist, which is behaviour that we rely on in many parts of Git. Fix this bug by propagating errno when `strbuf_read_file()` fails. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17refs: deduplicate code to delete referencesPatrick Steinhardt
Both the files and the packed-refs reference backends now use the same generic transactions-based code to delete references. Let's pull these implementations up into `refs_delete_refs()` to deduplicate the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-25Merge branch 'jk/unused-parameter'Junio C Hamano
Mark-up unused parameters in the code so that we can eventually enable -Wunused-parameter by default. * jk/unused-parameter: t/helper: mark unused callback void data parameters tag: mark unused parameters in each_tag_name_fn callbacks rev-parse: mark unused parameter in for_each_abbrev callback replace: mark unused parameter in each_mergetag_fn callback replace: mark unused parameter in ref callback merge-tree: mark unused parameter in traverse callback fsck: mark unused parameters in various fsck callbacks revisions: drop unused "opt" parameter in "tweak" callbacks count-objects: mark unused parameter in alternates callback am: mark unused keep_cr parameters http-push: mark unused parameter in xml callback http: mark unused parameters in curl callbacks do_for_each_ref_helper(): mark unused repository parameter test-ref-store: drop unimplemented reflog-expire command
2023-07-21Merge branch 'tb/refs-exclusion-and-packed-refs'Junio C Hamano
Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`
2023-07-14do_for_each_ref_helper(): mark unused repository parameterJeff King
This function gets a repository parameter because it's a callback for do_for_each_repo_ref_iterator(). But it's just a wrapper that passes along each call to a regular each_ref_fn callback, and the latter doesn't accept a repository argument. Probably in the long run all of the each_ref_fn callbacks should get a repository parameter, too. But changing that now would require updates all over the code base. Until that happens, let's annotate this wrapper callback to quiet the compiler's -Wunused-parameter warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs.h: implement `hidden_refs_to_excludes()`Taylor Blau
In subsequent commits, we'll teach `receive-pack` and `upload-pack` to use the new jump list feature in the packed-refs iterator by ignoring references which are mentioned via its respective hideRefs lists. However, the packed-ref jump lists cannot handle un-hiding rules (that begin with '!'), or namespace comparisons (that begin with '^'). Add a convenience function to the refs.h API to detect when either of these conditions are met, and returns an appropriate value to pass as excluded patterns. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs.h: let `for_each_namespaced_ref()` take excluded patternsTaylor Blau
A future commit will want to call `for_each_namespaced_ref()` with a list of excluded patterns. We could introduce a variant of that function, say, `for_each_namespaced_ref_exclude()` which takes the extra parameter, and reimplement the original function in terms of that. But all but one caller (in `http-backend.c`) will supply the new parameter, so add the new parameter to `for_each_namespaced_ref()` itself instead of introducing a new function. For now, supply NULL for the list of excluded patterns at all callers to avoid changing behavior, which we will do in a future change. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10revision.h: store hidden refs in a `strvec`Taylor Blau
In subsequent commits, it will be convenient to have a 'const char **' of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`, etc.), instead of a `string_list`. Convert spots throughout the tree that store the list of hidden refs from a `string_list` to a `strvec`. Note that in `parse_hide_refs_config()` there is an ugly const-cast used to avoid an extra copy of each value before trimming any trailing slash characters. This could instead be written as: ref = xstrdup(value); len = strlen(ref); while (len && ref[len - 1] == '/') ref[--len] = '\0'; strvec_push(hide_refs, ref); free(ref); but the double-copy (once when calling `xstrdup()`, and another via `strvec_push()`) is wasteful. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs: plumb `exclude_patterns` argument throughoutTaylor Blau
The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs.c: rename `ref_filter`Jeff King
The refs machinery has its own implementation of a `ref_filter` (used by `for-each-ref`), which is distinct from the `ref-filter.h` API (also used by `for-each-ref`, among other things). Rename the one within refs.c to more clearly indicate its purpose. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05treewide: remove unnecessary includes for wrapper.hCalvin Wan
Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21git-compat-util.h: remove unneccessary include of wildmatch.hElijah Newren
The include of wildmatch.h in git-compat-util.h was added in cebcab189aa (Makefile: add USE_WILDMATCH to use wildmatch as fnmatch, 2013-01-01) as a way to be able to compile-time force any calls to fnmatch() to instead invoke wildmatch(). The defines and inline function were removed in 70a8fc999d9 (stop using fnmatch (either native or compat), 2014-02-15), and this include in git-compat-util.h has been unnecessary ever since. Remove the include from git-compat-util.h, but add it to the .c files that had omitted the direct #include they needed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-12pack-refs: teach --exclude option to exclude refs from being packedJohn Cai
At GitLab, we have a system that creates ephemeral internal refs that don't live long before getting deleted. Having an option to exclude certain refs from a packed-refs file allows these internal references to be deleted much more efficiently. Add an --exclude option to the pack-refs builtin, and use the ref exclusions API to exclude certain refs from being packed into the final packed-refs file Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: remove cache.h inclusion due to object-name.h changesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on advice.hElijah Newren
Dozens of files made use of advice functions, without explicitly including advice.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include advice.h if they are using it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28post-cocci: adjust comments for recent repo_* migrationÆvar Arnfjörð Bjarmason
In preceding commits we changed many calls to macros that were providing a "the_repository" argument to invoke corresponding repo_*() function instead. Let's follow-up and adjust references to those in comments, which coccinelle didn't (and inherently can't) catch. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21wrapper.h: move declarations for wrapper.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove unnecessary cache.h inclusion from a few headersElijah Newren
Ever since a64215b6cd ("object.h: stop depending on cache.h; make cache.h depend on object.h", 2023-02-24), we have a few headers that could have replaced their include of cache.h with an include of object.h. Make that change now. Some C files had to start including cache.h after this change (or some smaller header it had brought in), because the C files were depending on things from cache.h but were only formerly implicitly getting cache.h through one of these headers being modified in this patch. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'en/header-cleanup'Junio C Hamano
Code clean-up to clarify the rule that "git-compat-util.h" must be the first to be included. * en/header-cleanup: diff.h: remove unnecessary include of object.h Remove unnecessary includes of builtin.h treewide: replace cache.h with more direct headers, where possible replace-object.h: move read_replace_refs declaration from cache.h to here object-store.h: move struct object_info from cache.h dir.h: refactor to no longer need to include cache.h object.h: stop depending on cache.h; make cache.h depend on object.h ident.h: move ident-related declarations out of cache.h pretty.h: move has_non_ascii() declaration from commit.h cache.h: remove dependence on hex.h; make other files include it explicitly hex.h: move some hex-related declarations from cache.h hash.h: move some oid-related declarations from cache.h alloc.h: move ALLOC_GROW() functions from cache.h treewide: remove unnecessary cache.h includes in source files treewide: remove unnecessary cache.h includes treewide: remove unnecessary git-compat-util.h includes in headers treewide: ensure one of the appropriate headers is sourced first
2023-02-27Merge branch 'jk/shorten-unambiguous-ref-wo-sscanf'Junio C Hamano
sscanf(3) used in "git symbolic-ref --short" implementation found to be not working reliably on macOS in UTF-8 locales. Rewrite the code to avoid sscanf() altogether to work it around. * jk/shorten-unambiguous-ref-wo-sscanf: shorten_unambiguous_ref(): avoid sscanf() shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant shorten_unambiguous_ref(): avoid integer truncation