summaryrefslogtreecommitdiff
path: root/fsck-objects.c
AgeCommit message (Collapse)Author
2007-01-29git-fsck-objects is now synonym to git-fsckJunio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-22fsck-objects: refactor checking for connectivityLinus Torvalds
This separates the connectivity check into separate codepaths, one for reachable objects and the other for unreachable ones, while adding a lot of comments to explain what is going on. When checking an unreachable object, unlike a reachable one, we do not have to complain if it does not exist (we used to complain about a missing blob even when the only thing that references it is a tree that is dangling). Also we do not have to check and complain about objects that are referenced by an unreachable object. This makes the messages from fsck-objects a lot less noisy and more useful. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-11Better error messages for corrupt databasesLinus Torvalds
This fixes another problem that Andy's case showed: git-fsck-objects reports nonsensical results for corrupt objects. There were actually two independent and confusing problems: - when we had a zero-sized file and used map_sha1_file, mmap() would return EINVAL, and git-fsck-objects would report that as an insane and confusing error. I don't know when this was introduced, it might have been there forever. - when "parse_object()" returned NULL, fsck would say "object not found", which can be very confusing, since obviously the object might "exist", it's just unparseable because it's totally corrupt. So this just makes "xmmap()" return NULL for a zero-sized object (which is a valid thing pointer, exactly the same way "malloc()" can return NULL for a zero-sized allocation). That fixes the first problem (but we could have fixed it in the caller too - I don't personally much care whichever way it goes, but maybe somebody should check that the NO_MMAP case does something sane in this case too?). And the second problem is solved by just making the error message slightly clearer - the failure to parse an object may be because it's missing or corrupt, not necessarily because it's not "found". Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-09Sanitize for_each_reflog_ent()Johannes Schindelin
It used to ignore the return value of the helper function; now, it expects it to return 0, and stops iteration upon non-zero return values; this value is then passed on as the return value of for_each_reflog_ent(). Further, it makes no sense to force the parsing upon the helper functions; for_each_reflog_ent() now calls the helper function with old and new sha1, the email, the timestamp & timezone, and the message. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-21Protect commits recorded in reflog from pruning.Junio C Hamano
This teaches fsck-objects and prune to protect objects referred to by reflog entries. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20simplify inclusion of system header files.Junio C Hamano
This is a mechanical clean-up of the way *.c files include system header files. (1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules; (2) the first #include must be "git-compat-util.h" or one of our own header file that includes it first (e.g. config.h, builtin.h, pkt-line.h); (3) system headers that are included in "git-compat-util.h" need not be included in individual C source files. (4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-21Tell between packed, unpacked and symbolic refs.Junio C Hamano
This adds a "int *flag" parameter to resolve_ref() and makes for_each_ref() family to call callback function with an extra "int flag" parameter. They are used to give two bits of information (REF_ISSYMREF and REF_ISPACKED) about the ref. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-21Add callback data to for_each_ref() family.Junio C Hamano
This is a long overdue fix to the API for for_each_ref() family of functions. It allows the callers to specify a callback data pointer, so that the caller does not have to use static variables to communicate with the callback funciton. The updated for_each_ref() family takes a function of type int (*fn)(const char *, const unsigned char *, void *) and a void pointer as parameters, and calls the function with the name of the ref and its SHA-1 with the caller-supplied void pointer as parameters. The commit updates two callers, builtin-name-rev.c and builtin-pack-refs.c as an example. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-18fsck-objects: adjust to resolve_ref() clean-up.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-02Replace uses of strdup with xstrdup.Shawn Pearce
Like xmalloc and xrealloc xstrdup dies with a useful message if the native strdup() implementation returns NULL rather than a valid pointer. I just tried to use xstrdup in new code and found it to be missing. However I expected it to be present as xmalloc and xrealloc are already commonly used throughout the code. [jc: removed the part that deals with last_XXX, which I am finding more and more dubious these days.] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-29git-fsck-objects: lacking default references should not be fatalLinus Torvalds
The comment added says it all: if we have lost all references in a git archive, git-fsck-objects should still work, so instead of dying it should just notify the user about that condition. This change was triggered by me just doing a "git-init-db" and then populating that empty git archive with a pack/index file to look at it. Having git-fsck-objects not work just because I didn't have any references handy was rather irritating, since part of the reason for running git-fsck-objects in the first place was to _find_ the missing references. However, "--unreachable" really doesn't make sense in that situation, and we want to turn it off to protect anybody who uses the old "git prune" shell-script (rather than the modern built-in). The old pruning script used to remove all objects that were reported as unreachable, and without any refs, that obviously means everything - not worth it. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23Convert memcpy(a,b,20) to hashcpy(a,b).Shawn Pearce
This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void*. This is a reasonable tradeoff as most call sites already use unsigned char* and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-16remove unnecessary initializationsDavid Rientjes
[jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15make inline is_null_sha1 globalDavid Rientjes
Replace sha1 comparisons to null_sha1 with a global inline (which previously an unused static inline in builtin-apply.c) [jc: with a fix from Jonas Fonseca.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15Make fsck_dir void.David Rientjes
Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-13Remove TYPE_* constant macros and use object_type enums consistently.Linus Torvalds
This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-30Abstract out accesses to object hash arrayLinus Torvalds
There are a few special places where some programs accessed the object hash array directly, which bothered me because I wanted to play with some simple re-organizations. So this patch makes the object hash array data structures all entirely local to object.c, and the few users who wanted to look at it now get to use a function to query how many object index entries there can be, and to actually access the array. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-18Remove "refs" field from "struct object"Linus Torvalds
This shrinks "struct object" to the absolutely minimal size possible. It now contains /only/ the object flags and the SHA1 hash name of the object. The "refs" field, which is really needed only for fsck, is maintained in a separate hashed lookup-table, allowing all normal users to totally ignore it. This helps memory usage, although not as much as I hoped: it looks like the allocation overhead of malloc (and the alignment constraints in particular) means that while the structure size shrinks, the actual allocation overhead mostly does not. [ That said: memory usage is actually down, but not as much as it should be: I suspect just one of the object types actually ended up shrinking its effective allocation size. To get to the next level, we probably need specialized allocators that don't pad the allocation more than necessary. ] The separation makes for some code cleanup, though, and makes the ref tracking that fsck wants a clearly separate thing. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-18Shrink "struct object" a bitLinus Torvalds
This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30fsck-objects: avoid unnecessary tree_entry_list usageLinus Torvalds
Prime example of where the raw tree parser is easier for everybody. [jc: "Aieee" one-liner fix from the list applied. ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30Remove "tree->entries" tree-entry list from tree parserLinus Torvalds
Instead, just use the tree buffer directly, and use the tree-walk infrastructure to walk the buffers instead of the tree-entry list. The tree-entry list is inefficient, and generates tons of small allocations for no good reason. The tree-walk infrastructure is generally no harder to use than following a linked list, and allows us to do most tree parsing in-place. Some programs still use the old tree-entry lists, and are a bit painful to convert without major surgery. For them we have a helper function that creates a temporary tree-entry list on demand. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30Make "tree_entry" have a SHA1 instead of a union of object pointersLinus Torvalds
This is preparatory work for further cleanups, where we try to make tree_entry look more like the more efficient tree-walk descriptor. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30Make "struct tree" contain the pointer to the tree bufferLinus Torvalds
This allows us to avoid allocating information for names etc, because we can just use the information from the tree buffer directly. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-04fsck-objects: do not segfault on missing tree in cache-treeJunio C Hamano
Even if trees are missing in cache-tree, we should continue and check the rest of the object database. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02fsck-objects: mark objects reachable from cache-treeJunio C Hamano
When fsck-objects scanned cache-tree, it forgot to mark the trees it found reachable and in use. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-25Teach fsck-objects about cache-tree.Junio C Hamano
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-09fsck-objects: Remove --standaloneJunio C Hamano
The fsck-objects command (back then it was called fsck-cache) used to complain if objects referred to by files in .git/refs/ or objects stored in files under .git/objects/??/ were not found as stand-alone SHA1 files (i.e. found in alternate object pools or packed archives stored under .git/objects/pack). Back then, packs and alternates were new curiosity and having everything as loose objects were the norm. When we adjusted the behaviour of fsck-cache to consider objects found in packs are OK, we introduced the --standalone flag as a backward compatibility measure. It still correctly checks if your repository is complete and consists only of loose objects, so in that sense it is doing the "right" thing, but checking that is pointless these days. This commit removes --standalone flag. See also: 23676d407c63a6f67f8ce3ff192199bda03e6a03 8a498a05c3c6b2f53db669b24f36257ab213eb4c Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26Use setenv(), fix warningsTimo Hirvonen
- Fix -Wundef -Wold-style-definition warnings - Make pll_free() static [jc: original patch by Timo had another unrelated bits: - Use setenv() instead of putenv() I'm postponing that part for now.] Signed-off-by: Timo Hirvonen <tihirvon@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12Use a hashtable for objects instead of a sorted listJohannes Schindelin
In a simple test, this brings down the CPU time from 47 sec to 22 sec. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-22fsck-objects: support platforms without d_ino in struct dirent.Junio C Hamano
The d_ino field is only used for performance reasons in fsck-objects. On a typical filesystem, i-number tends to have a strong correlation with where the actual bits sit on the disk platter, and we sort the entries to allow us scan things that ought to be close together together. If the platform lacks support for it, it is not a big deal. Just do not use d_ino for sorting, and scan them unsorted. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-29fsck-objects: work from subdirectory.Junio C Hamano
Not much point making it work from subdirectory, but for a consistency make it so. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-15Rework object refs tracking to reduce memory usageSergey Vlasov
Store pointers to referenced objects in a variable sized array instead of linked list. This cuts down memory usage of utilities which use object references; e.g., git-fsck-objects --full on the git.git repository consumes about 2 MB of memory tracked by Massif instead of 7 MB before the change. Object refs are still the biggest consumer of memory (57%), but the malloc overhead for a single block instead of a linked list is substantially smaller. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-15git-fsck-objects: Free tree entries after useSergey Vlasov
The Massif tool of Valgrind revealed that parsed tree entries occupy more than 60% of memory allocated by git-fsck-objects. These entries can be freed immediately after use, which significantly decreases memory consumption. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-08Create object subdirectories on demandLinus Torvalds
This makes it possible to have a "sparse" git object subdirectory structure, something that has become much more attractive now that people use pack-files all the time. As a result of pack-files, a git object directory doesn't necessarily have any individual objects lying around, and in that case it's just wasting space to keep the empty first-level object directories around: on many filesystems the 256 empty directories will be aboue 1MB of diskspace. Even more importantly, after you re-pack a project that _used_ to be unpacked, you could be left with huge directories that no longer contain anything, but that waste space and take time to look through. With this change, "git prune-packed" can just do an rmdir() on the directories, and they'll get removed if empty, and re-created on demand. This patch also tries to fix up "write_sha1_from_fd()" to use the new common infrastructure for creating the object files, closing a hole where we might otherwise leave half-written objects in the object database. [jc: I unoptimized the part that really removes the fan-out directories to ease transition. init-db still wastes 1MB of diskspace to hold 256 empty fan-outs, and prune-packed rmdir()'s the grown but empty directories, but runs mkdir() immediately after that -- reducing the saving from 150KB to 146KB. These parts will be re-introduced when everybody has the on-demand capability.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-02Add git-symbolic-refJunio C Hamano
This adds the counterpart of git-update-ref that lets you read and create "symbolic refs". By default it uses a symbolic link to represent ".git/HEAD -> refs/heads/master", but it can be compiled to use the textfile symbolic ref. The places that did 'readlink .git/HEAD' and 'ln -s refs/heads/blah .git/HEAD' have been converted to use new git-symbolic-ref command, so that they can deal with either implementation. Signed-off-by: Junio C Hamano <junio@twinsun.com>
2005-09-28[PATCH] Make some needlessly global stuff staticPeter Hagervall
Insert 'static' where appropriate. Signed-off-by: Peter Hagervall <hager@cs.umu.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-27[PATCH] Provide access to git_dir through get_git_dir().Sven Verdoolaege
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-20[PATCH] Make the git-fsck-objects diagnostics more usefulPetr Baudis
Actually report what exactly is wrong with the object, instead of an ambiguous 'bad sha1 file' or such. In places where we already do, unify the format and clean the messages up. Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-09Retire support for old environment variables.Junio C Hamano
We have deprecated the old environment variable names for quite a while and now it's time to remove them. Gone are: SHA1_FILE_DIRECTORIES AUTHOR_DATE AUTHOR_EMAIL AUTHOR_NAME COMMIT_AUTHOR_EMAIL COMMIT_AUTHOR_NAME SHA1_FILE_DIRECTORY Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-08Big tool rename.Junio C Hamano
As promised, this is the "big tool rename" patch. The primary differences since 0.99.6 are: (1) git-*-script are no more. The commands installed do not have any such suffix so users do not have to remember if something is implemented as a shell script or not. (2) Many command names with 'cache' in them are renamed with 'index' if that is what they mean. There are backward compatibility symblic links so that you and Porcelains can keep using the old names, but the backward compatibility support is expected to be removed in the near future. Signed-off-by: Junio C Hamano <junkio@cox.net>