summaryrefslogtreecommitdiff
path: root/rev-list.c
AgeCommit message (Collapse)Author
2006-05-18Make "git rev-list" be a builtinLinus Torvalds
This was surprisingly easy. The diff is truly minimal: rename "main()" to "cmd_rev_list()" in rev-list.c, and rename the whole file to reflect its new built-in status. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-18Merge branch 'lt/logopt'Junio C Hamano
* lt/logopt: Fix "git log --stat": make sure to set recursive with --stat. combine-diff: show diffstat with the first parent. git.c: LOGSIZE is unused after log printing cleanup. Log message printout cleanups (#3): fix --pretty=oneline Log message printout cleanups (#2) Log message printout cleanups rev-list --header: output format fix Fixes for option parsing log/whatchanged/show - log formatting cleanup. Simplify common default options setup for built-in log family. Tentative built-in "git show" Built-in git-whatchanged. rev-list option parser fix. Split init_revisions() out of setup_revisions() Fix up rev-list option parsing. Fix up default abbrev in setup_revisions() argument parser. Common option parsing for "git log --diff" and friends
2006-04-17Log message printout cleanupsLinus Torvalds
On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17Merge branch 'jc/boundary'Junio C Hamano
* jc/boundary: rev-list --boundary: show boundary commits even when limited otherwise.
2006-04-17rev-list --header: output format fixJunio C Hamano
Initial fix prepared by Johannes, but I did it slightly differently. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17rev-list --boundary: show boundary commits even when limited otherwise.Junio C Hamano
The boundary commits are shown for UI like gitk to draw them as soon as topo-order sorting allows, and should not be omitted by get_revision() filtering logic. As long as their immediate child commits are shown, we should not filter them out. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-16rev-list option parser fix.Junio C Hamano
The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15Fix up rev-list option parsing.Junio C Hamano
rev-list does not take diff options, so barf after seeing some. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15rev-list --bisect: limit list before bisecting.Junio C Hamano
I noticed bisect does not work well without both good and bad. Running this script in git.git repository would give you quite different results: #!/bin/sh initial=e83c5163316f89bfbde7d9ab23ca2e25604af290 mid0=`git rev-list --bisect ^$initial --all` git rev-list $mid0 | wc -l git rev-list ^$mid0 --all | wc -l mid1=`git rev-list --bisect --all` git rev-list $mid1 | wc -l git rev-list ^$mid1 --all | wc -l The $initial commit is the very first commit you made. The first midpoint bisects things evenly as designed, but the latter does not. The reason I got interested in this was because I was wondering if something like the following would help people converting a huge repository from foreign SCM, or preparing a repository to be fetched over plain dumb HTTP only: #!/bin/sh N=4 P=.git/objects/pack bottom= while test 0 \< $N do N=$((N-1)) if test -z "$bottom" then newbottom=`git rev-list --bisect --all` else newbottom=`git rev-list --bisect ^$bottom --all` fi if test -z "$bottom" then rev_list="$newbottom" elif test 0 = $N then rev_list="^$bottom --all" else rev_list="^$bottom $newbottom" fi p=$(git rev-list --unpacked --objects $rev_list | git pack-objects $P/pack) git show-index <$P/pack-$p.idx | wc -l bottom=$newbottom done The idea is to pack older half of the history to one pack, then older half of the remaining history to another, to continue a few times, using finer granularity as we get closer to the tip. This may not matter, since for a truly huge history, running bisect number of times could be quite time consuming, and we might be better off running "git rev-list --all" once into a temporary file, and manually pick cut-off points from the resulting list of commits. After all we are talking about "approximately half" for such an usage, and older history does not matter much. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-11blame and friends: adjust to multiple pathspec change.Junio C Hamano
This makes things that include revision.h build again. Blame is also built, but I am not sure how well it works (or how well it worked to begin with) -- it was relying on tree-diff to be using whatever pathspec was used the last time, which smells a bit suspicious. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-09Make "--parents" logs also be incrementalLinus Torvalds
The parent rewriting feature caused us to create the whole history in one go, and then simplify it later, because of how rewrite_parents() had been written. However, with a little tweaking, it's perfectly possible to do even that one incrementally. Right now, this doesn't really much matter, because every user of "--parents" will probably generally _also_ use "--topo-order", which will cause the old non-incremental behaviour anyway. However, I'm hopeful that we could make even the topological sort incremental, or at least _partially_ so (for example, make it incremental up to the first merge). In the meantime, this at least moves things in the right direction, and removes a strange special case. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-07rev-list --abbrev-commitJunio C Hamano
This should make --pretty=oneline a whole lot more readable for people using 80-column terminals. Originally from Eric Wong. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-01Move "--parent" parsing into generic revision.c library codeLinus Torvalds
Not only do we do it in both rev-list.c and git.c, the revision walking code will soon want to know whether we should rewrite parenthood information or not. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-30tree/diff header cleanup.Junio C Hamano
Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29rev-list --boundaryJunio C Hamano
With the new --boundary flag, the output from rev-list includes the UNINTERESING commits at the boundary, which are usually not shown. Their object names are prefixed with '-'. For example, with this graph: C side / A---B---D master You would get something like this: $ git rev-list --boundary --header --parents side..master D B tree D^{tree} parent B ... log message for commit D here ... \0-B A tree B^{tree} parent A ... log message for commit B here ... \0 Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29rev-list: memory usage reduction.Junio C Hamano
We do not need to track object refs, neither we need to save commit unless we are doing verbose header. A lot of traversal happens inside prepare_revision_walk() these days so setting things up before calling that function is necessary. Signed-off-by: Junio C Hamano <junkio@cox.net> Acked-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22rev-list --timestampJunio C Hamano
This prefixes the raw commit timestamp to the output. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-11rev-lib: Make it easy to do rename tracking (take 2)Fredrik Kuivinen
prune_fn in the rev_info structure is called in place of try_to_simplify_commit. This makes it possible to do rename tracking with a custom try_to_simplify_commit-like function. This commit also introduces init_revisions which initialises the rev_info structure with default values. Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01git-log (internal): more options.Junio C Hamano
This ports the following options from rev-list based git-log implementation: * -<n>, -n<n>, and -n <n>. I am still wondering if we want this natively supported by setup_revisions(), which already takes --max-count. We may want to move them in the next round. Also I am not sure if we can get away with not setting revs->limited when we set max-count. The latest rev-list.c and revision.c in this series do not, so I left them as they are. * --pretty and --pretty=<fmt>. * --abbrev=<n> and --no-abbrev. The previous commit already handles time-based limiters (--since, --until and friends). The remaining things that rev-list based git-log happens to do are not useful in a pure log-viewing purposes, and not ported: * --bisect (obviously). * --header. I am actually in favor of doing the NUL terminated record format, but rev-list based one always passed --pretty, which defeated this option. Maybe next round. * --parents. I do not think of a reason a log viewer wants this. The flag is primarily for feeding squashed history via pipe to downstream tools. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01Rip out merge-order and make "git log <paths>..." work again.Linus Torvalds
Well, assuming breaking --merge-order is fine, here's a patch (on top of the other ones) that makes git log <filename> actually work, as far as I can tell. I didn't add the logic for --before/--after flags, but that should be pretty trivial, and is independent of this anyway. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-28git-rev-list libification: rev-list walkingLinus Torvalds
This actually moves the "meat" of the revision walking from rev-list.c to the new library code in revision.h. It introduces the new functions void prepare_revision_walk(struct rev_info *revs); struct commit *get_revision(struct rev_info *revs); to prepare and then walk the revisions that we have. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-27Splitting rev-list into revisions lib, end of beginning.Linus Torvalds
This makes the rewrite easier to validate in that revision flag parsing and warlking part are now all in rev_info structure. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-27rev-list split: minimum fixup.Junio C Hamano
This fixes "the other end has commit X but since then we tagged that commit with tag T, and he says he wants T -- what is the list of objects we need to send him?" question: git-rev-list --objects ^X T We ended up sending everything since the beginning of time X-<. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26First cut at libifying revlist generationLinus Torvalds
This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-25Merge branches 'jc/rev-list' and 'jc/pack-thin'Junio C Hamano
* jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next")
2006-02-24rev-list --objects: use full pathname to help hashing.Junio C Hamano
This helps to group the same files from different revs together, while spreading files with the same basename in different directories, to help pack-object. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-24rev-list --objects-edge: remove duplicated edge commit output.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22rev-list.c: fix non-grammatical comments.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20rev-list --objects-edgeJunio C Hamano
This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16topo-order: make --date-order optional.Junio C Hamano
This adds --date-order to rev-list; it is similar to topo order in the sense that no parent comes before all of its children, but otherwise things are still ordered in the commit timestamp order. The same flag is also added to show-branch. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-10rev-list: default to abbreviate merge parent names under --pretty.Junio C Hamano
When we prettyprint commit log messages, merge parent names were often very long and there was no way to abbreviate it. This changes them to be abbreviated by default, and non-default abbreviations can be specified with --no-abbrev or --abbrev=<n> options. Note that this affects only the prettyprinted parent names. The output from --show-parents is meant for machine consumption and is not affected by this flag.
2006-02-02rev-list: omit duplicated parents.Junio C Hamano
Showing the same parent more than once for a commit does not make much sense downstream, so stop it. This can happen with an incorrectly made merge commit that merges the same parent twice, but can happen in an otherwise sane development history while squishing the history by taking into account only commits that touch specified paths. For example, $ git rev-list --max-count=1 --parents addafaf -- rev-list.c would have to show this commit ancestry graph: .---o---. / \ .---*---o---. / 93b74bc \ ---*---o---o-----o---o-----o addafaf d8f6b34 \ / .---o---o---. \ / .---*---. 3815f42 where 5 independent development tracks, only two of which have changes in the specified paths since they forked. The last change for the other three development tracks was done by the same commit before they forked, and we were showing that three times. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-01rev-list: allow -<n> as shorthand for --max-count=<n>Eric Wong
This builds on top of the previous one. Traditionally, head(1) and tail(1) allow their line limits to be parsed this way. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-01rev-list: allow -n<n> as shorthand for --max-count=<n>Eric Wong
Both -n<n> and -n <n> are supported. POSIX versions of head(1) and tail(1) allow their line limits to be parsed this way. I find --max-count to be a commonly used option, and also similar in spirit to head/tail, so I decided to make life easier on my worn out (and lazy :) fingers with this patch. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28Merge lt/revlist,jc/diff,jc/revparse,jc/abbrevJunio C Hamano
2006-01-28pretty_print_commit(): pass commit object instead of commit->buffer.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28diff-tree: abbreviate merge parent object names with --abbrev --pretty.Junio C Hamano
When --abbrev is in effect, abbreviate the merge parent names in prettyprinted output. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28rev-list --remove-empty: add minimum help and doc entry.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28rev-list: stop when the file disappearsLinus Torvalds
The one thing I've considered doing (I really should) is to add a "stop when you don't find the file" option to "git-rev-list". This patch does some of the work towards that: it removes the "parent" thing when the file disappears, so a "git annotate" could do do something like git-rev-list --remove-empty --parents HEAD -- "$filename" and it would get a good graph that stops when the filename disappears (it's not perfect though: it won't remove all the unintersting commits). It also simplifies the logic of finding tree differences a bit, at the cost of making it a tad less efficient. The old logic was two-phase: it would first simplify _only_ merges tree as it traversed the tree, and then simplify the linear parts of the remainder independently. That was pretty optimal from an efficiency standpoint because it avoids doing any comparisons that we can see are unnecessary, but it made it much harder to understand than it really needed to be. The new logic is a lot more straightforward, and compares the trees as it traverses the graph (ie everything is a single phase). That makes it much easier to stop graph traversal at any point where a file disappears. As an example, let's say that you have a git repository that has had a file called "A" some time in the past. That file gets renamed to B, and then gets renamed back again to A. The old "git-rev-list" would show two commits: the commit that renames B to A (because it changes A) _and_ as its parent the commit that renames A to B (because it changes A). With the new --remove-empty flag, git-rev-list will show just the commit that renames B to A as the "root" commit, and stop traversal there (because that's what you want for "annotate" - you want to stop there, and for every "root" commit you then separately see if it really is a new file, or if the paths history disappeared because it was renamed from some other file). With this patch, you should be able to basically do a "poor mans 'git annotate'" with a fairly simple loop: push("HEAD", "$filename") while (revision,filename = pop()) { for each i in $(git-rev-list --parents --remove-empty $revision -- "$filename") pseudo-parents($i) = git-rev-list parents for that line if (pseudo-parents($i) is non-empty) { show diff of $i against pseudo-parents continue } /* See if the _real_ parents of $i had a rename */ parent($i) = real-parent($i) if (find-rename in $parent($i)->$i) push $parent($i), "old-name" } which should be doable in perl or something (doing stacks in shell is just too painful to be worth it, so I'm not going to do this). Anybody want to try? Linus
2006-01-25Make git-rev-list and git-rev-parse argument parsing stricterLinus Torvalds
If you pass it a filename without the "--" marker to separate it from revision information and flags, we now require that the file in question actually exists. This makes mis-typed revision information not be silently just considered a strange filename. With the "--" marker, you can continue to pass in filenames that do not actually exists - useful for querying what happened to a file that you no longer have in the repository. [ All scripts should use the "--" format regardless, to make things unambiguous. So this change should not affect any existing tools ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-20rev-list --objects: fix object list without commit.Junio C Hamano
Earlier, "rev-list --objects <sha1>" for an object chain that does not have any commit failed with a usage message. This fixes "send-pack remote $tag" where tag points at a non-commit (e.g. a blob). Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-29bisect: limit the searchspace by pathspecsLinus Torvalds
It was surprisingly easy to do. git bisect start <pathspec> followed by all the normal "git bisect good/bad" stuff. Almost totally untested, and I guarantee that if your pathnames have spaces in them (or your GIT_DIR has spaces in it) this won't work. I don't know how to fix that, my shell programming isn't good enough. This involves small changes to make "git-rev-list --bisect" work in the presense of a pathspec limiter, and then truly trivial (and that's the broken part) changes to make "git bisect" save away and use the pathspec. I tried one bisection, and a "git bisect visualize", and it all looked correct. But hey, don't be surprised if it has problems. Linus Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-22max-count in terms of intersectionLuben Tuikov
When a path designation is given, max-count counts the number of commits therein (intersection), not globally. This avoids the case where in case path has been inactive for the last N commits, --max-count=N and path designation at git-rev-list is given, would give no commits. Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-31Update usage string and documentation for git-rev-list.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26git-rev-list: do not forget non-commit refsLinus Torvalds
What happens is that the new logic decides that if it can't look up a commit reference (ie "get_commit_reference()" returns NULL), the thing must be a pathname. Fair enough. But wrong. The thing is, it may be a perfectly fine ref that _isn't_ a commit. In git, you have a tag that points to your PGP key, and in the kernel, I have a tag that points to a tree (and a direct ref that points to that tree too, for that matter). So the rule is (as for all the other programs that mix revs and pathnames) not that we only accept commit references, but _any_ valid object ref. If the object then isn't a commit ref, git-rev-list will either ignore it, or add it to the list of non-commit objects (if using "--objects"). The solution is to move the "get_sha1()" out of get_commit_reference(), and into the callers. In fact, we already _have_ the SHA1 in the case of the handle_all() loop, since for_each_ref() will have done it for us, so this is the correct thing to do anyway. This patch (on top of the original one) does exactly that. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26git-rev-list: make --dense the default (and introduce "--sparse")Linus Torvalds
This actually does three things: - make "--dense" the default for git-rev-list. Since dense is a no-op if no filenames are given, this doesn't actually change any historical behaviour, but it's logically the right default (if we want to prune on filenames, do it fully. The sparse "merge-only" thing may be useful, but it's not what you'd normally expect) - make "git-rev-parse" show the default revision control before it shows any pathnames. This was a real bug, but nobody would ever have noticed, because the default thing tends to only make sense for git-rev-list, and git-rev-list didn't use to take pathnames. - it changes "git-rev-list" to match the other commands that take a mix of revisions and filenames - it no longer requires the "--" before filenames (although you still need to do it if a filename could be confused with a revision name, eg "gitk" in the git archive) This all just makes for much more pleasant and obvous usage. Just doing a gitk t/ does the obvious thing: it will show the history as it concerns the "t/" subdirectory. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26git-rev-list: fix "--dense" flagLinus Torvalds
Right now --dense will _always_ show the root commit. I didn't do the logic that does the diff against an empty tree. I was lazy. This patch does that. The first round was incorrect but this patch is even slightly tested, and might do a better job. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-23git-rev-list: add "--dense" flagLinus Torvalds
This is what the recent git-rev-list changes have all been gearing up for. When we use a path filter to git-rev-list, the new "--dense" flag asks git-rev-list to compress the history so that it _only_ contains commits that change files in the path filter. It also rewrites the parent information so that tools like "gitk" will see the result as a dense history tree. For example, on the current kernel archive: [torvalds@g5 linux]$ git-rev-list HEAD | wc -l 9904 [torvalds@g5 linux]$ git-rev-list HEAD -- kernel | wc -l 5442 [torvalds@g5 linux]$ git-rev-list --dense HEAD -- kernel | wc -l 356 which shows that while we have almost ten thousand commits, we can prune down the work to slightly more than half by only following the merges that are interesting. But further, we can then compress the history to just 356 entries that actually make changes to the kernel subdirectory. To see this in action, try something like gitk --dense -- gitk to see just the history that affects gitk. Or, to show that true parallel development still remains parallel, do gitk --dense -- daemon.c which shows some parallel commits in the current git tree. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-23Teach git-rev-list to follow just a specified set of filesLinus Torvalds
This is the first cut at a git-rev-list that knows to ignore commits that don't change a certain file (or set of files). NOTE! For now it only prunes _merge_ commits, and follows the parent where there are no differences in the set of files specified. In the long run, I'd like to make it re-write the straight-line history too, but for now the merge simplification is much more fundamentally important (the rewriting of straight-line history is largely a separate simplification phase, but the merge simplification needs to happen early if we want to optimize away unnecessary commit parsing). If all parents of a merge change some of the files, the merge is left as is, so the end result is in no way guaranteed to be a linear history, but it will often be a lot /more/ linear than the full tree, since it prunes out parents that didn't matter for that set of files. As an example from the current kernel: [torvalds@g5 linux]$ git-rev-list HEAD | wc -l 9885 [torvalds@g5 linux]$ git-rev-list HEAD -- Makefile | wc -l 4084 [torvalds@g5 linux]$ git-rev-list HEAD -- drivers/usb | wc -l 5206 and you can also use 'gitk' to more visually see the pruning of the history tree, with something like gitk -- drivers/usb showing a simplified history that tries to follow the first parent in a merge that is the parent that fully defines drivers/usb/. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-19Optimize common case of git-rev-listLinus Torvalds
I took a look at webgit, and it looks like at least for the "projects" page, the most common operation ends up being basically git-rev-list --header --parents --max-count=1 HEAD Now, the thing is, the way "git-rev-list" works, it always keeps on popping the parents and parsing them in order to build the list of parents, and it turns out that even though we just want a single commit, git-rev-list will invariably look up _three_ generations of commits. It will parse: - the commit we want (it obviously needs this) - it's parent(s) as part of the "pop_most_recent_commit()" logic - it will then pop one of the parents before it notices that it doesn't need any more - and as part of popping the parent, it will parse the grandparent (again due to "pop_most_recent_commit()". Now, I've strace'd it, and it really is pretty efficient on the whole, but if things aren't nicely cached, and with long-latency IO, doing those two extra objects (at a minimum - if the parent is a merge it will be more) is just wasted time, and potentially a lot of it. So here's a quick special-case for the trivial case of "just one commit, and no date-limits or other special rules". Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>